
Non-stationary Transformer Debug Parameters

§1 PyCharm Run Configuration

Script path: D:\1sudyta\1ai-self\aistyle\TFB\scripts\run_benchmark.py

Parameters

--config-path rolling_forecast_config.json
--data-name-list ETTh1.csv
--strategy-args {"horizon": 4}
--model-name time_series_library.Nonstationary_Transformer
--model-hyper-params {"batch_size":2,"seq_len":12,"horizon":4,"d_model":8,"d_ff":16,"n_heads":4,"e_layers":2,"d_layers":1,"factor":1,"output_attention":0,"p_hidden_dims":[32],"p_hidden_layers":1,"num_epochs":1,"patience":3,"lr":0.001,"loss":"MSE","dropout":0.0,"embed":"timeF"}
--adapter transformer_adapter
--deterministic full
--gpus 0
--num-workers 1
--timeout 60000
--save-path debug/Nonstationary

Working directory: D:\1sudyta\1ai-self\aistyle\TFB


§2 VSCode launch.json

json
{
  "name": "Nonstationary Debug",
  "type": "debugpy",
  "request": "launch",
  "program": "${workspaceFolder}/scripts/run_benchmark.py",
  "args": [
    "--config-path", "rolling_forecast_config.json",
    "--data-name-list", "ETTh1.csv",
    "--strategy-args", "{\"horizon\": 4}",
    "--model-name", "time_series_library.Nonstationary_Transformer",
    "--model-hyper-params", "{\"batch_size\":2,\"seq_len\":12,\"horizon\":4,\"d_model\":8,\"d_ff\":16,\"n_heads\":4,\"e_layers\":2,\"d_layers\":1,\"factor\":1,\"output_attention\":0,\"p_hidden_dims\":[32],\"p_hidden_layers\":1,\"num_epochs\":1,\"patience\":3,\"lr\":0.001,\"loss\":\"MSE\",\"dropout\":0.0,\"embed\":\"timeF\"}",
    "--adapter", "transformer_adapter",
    "--deterministic", "full",
    "--gpus", "0",
    "--num-workers", "1",
    "--timeout", "60000",
    "--save-path", "debug/Nonstationary"
  ],
  "cwd": "${workspaceFolder}",
  "console": "integratedTerminal"
}
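
The escaped JSON inside "args" is easy to mistype by hand. A minimal sketch, assuming only the flags and values listed in §1 and §2, that generates the argument list with `json.dumps` so the quoting is handled automatically (`build_args` is a hypothetical helper name, not part of TFB):

```python
import json

# Hyper-parameters copied from the run configuration above.
hyper_params = {
    "batch_size": 2, "seq_len": 12, "horizon": 4,
    "d_model": 8, "d_ff": 16, "n_heads": 4,
    "e_layers": 2, "d_layers": 1, "factor": 1,
    "output_attention": 0,
    "p_hidden_dims": [32], "p_hidden_layers": 1,
    "num_epochs": 1, "patience": 3, "lr": 0.001,
    "loss": "MSE", "dropout": 0.0, "embed": "timeF",
}


def build_args(params):
    """Return the CLI arguments in the same order as the configuration above."""
    return [
        "--config-path", "rolling_forecast_config.json",
        "--data-name-list", "ETTh1.csv",
        "--strategy-args", json.dumps({"horizon": params["horizon"]}),
        "--model-name", "time_series_library.Nonstationary_Transformer",
        "--model-hyper-params", json.dumps(params, separators=(",", ":")),
        "--adapter", "transformer_adapter",
        "--deterministic", "full",
        "--gpus", "0",
        "--num-workers", "1",
        "--timeout", "60000",
        "--save-path", "debug/Nonstationary",
    ]


args = build_args(hyper_params)

# Serialising the list once more yields text that can be pasted as the
# "args" array of launch.json, with all inner quotes escaped.
launch_args_json = json.dumps(args, indent=2)
print(launch_args_json)
```

Keeping the hyper-parameters in one dict also means `horizon` in `--strategy-args` and in `--model-hyper-params` cannot drift apart.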

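§3 First Principles for Parameter Choices

| Parameter | Value | Rationale |
|---|---|---|
| data-name-list | ETTh1.csv | N=5 variables; the toy parameter enc_in=5 is inferred automatically |
| batch_size | 2 | Minimal, so tensor shapes are easy to trace |
| seq_len | 12 | Differs from every other main dimension, so axes cannot be confused |
| horizon | 4 | = pred_len; label_len is set automatically to seq_len//2 = 6 |
| d_model | 8 | Small enough; with n_heads=4, d_keys=2 |
| d_ff | 16 | = d_model × 2 |
| n_heads | 4 | d_keys = d_model/n_heads = 2 |
| e_layers | 2 | Exercises the Encoder loop |
| d_layers | 1 | Single-layer Decoder, kept simple |
| p_hidden_dims | [32] | One 32-unit hidden layer, far smaller than the default [128,128], so tracing is clearer |
| p_hidden_layers | 1 | Matches the length of p_hidden_dims |
| factor | 1 | Retained for DSAttention (which does not actually use factor) |
| output_attention | 0 | Attention matrices are not returned, which simplifies the output |
| dropout | 0.0 | Disables dropout randomness |
| num_epochs | 1 | A single epoch is enough to walk the forward path |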
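§4 Parameter Meaning Quick Reference

| Parameter | Access path in code | What it determines |
|---|---|---|
| seq_len | configs.seq_len | Encoder input length; output_dim of delta_learner; in_channels of the Projector's Conv1d |
| pred_len | configs.pred_len (set automatically from horizon) | Length of the zeros block in forecast(); length sliced off in forward() |
| label_len | Inferred automatically = seq_len//2 | Length of the history segment of x_dec_new |
| enc_in | Inferred automatically from the number of data columns | The 2×enc_in concatenated dimension in the Projector; input dimension of DataEmbedding |
| d_model | configs.d_model | Output dimension of DataEmbedding; d_keys×n_heads in AttentionLayer |
| n_heads | configs.n_heads | Number of heads; d_keys = d_model/n_heads |
| p_hidden_dims | configs.p_hidden_dims | Hidden-layer widths of the Projector backbone (a list) |
| p_hidden_layers | configs.p_hidden_layers | Number of layers in the Projector backbone |
| output_attention | configs.output_attention | Whether DSAttention returns the A matrix |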

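§5 Loop Coverage Verification

| Loop / branch | How it is covered | Covered by current parameters? |
|---|---|---|
| e_layers Encoder loop | e_layers=2 | ✅ EncoderLayer runs 2 times |
| d_layers Decoder loop | d_layers=1 | ✅ DecoderLayer runs 1 time |
| Encoder DSAttention (mask_flag=False) | Built into the Encoder | |
| Decoder self-attn (mask_flag=True, delta=None) | Built into DecoderLayer | |
| Decoder cross-attn (mask_flag=False, delta=delta) | Built into DecoderLayer | |
| tau is None / delta is None branches | Decoder self-attn passes delta=None | |
| Projector backbone loop over hidden_layers-1=0 | p_hidden_layers=1 | ✅ Loop body does not execute; control goes straight to the final layer |
| forward() → forecast() path | task_name="short_term_forecast" | |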
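
To make the Projector row concrete, here is a minimal sketch of how the Linear layer sizes fall out of p_hidden_dims and p_hidden_layers. It assumes the backbone is built as a first Linear from 2×enc_in, a loop over hidden_layers-1 further Linear layers, and a final Linear to output_dim, as the notes describe; `projector_linear_dims` is a hypothetical helper that only does the bookkeeping and builds no real layers:

```python
def projector_linear_dims(enc_in, hidden_dims, hidden_layers, output_dim):
    """Return the (in, out) sizes of each Linear layer and the loop count."""
    dims = [(2 * enc_in, hidden_dims[0])]          # first layer
    loop_iterations = 0
    for i in range(hidden_layers - 1):             # the loop checked in the table
        dims.append((hidden_dims[i], hidden_dims[i + 1]))
        loop_iterations += 1
    dims.append((hidden_dims[-1], output_dim))     # final layer
    return dims, loop_iterations


# Debug configuration: the loop body never runs.
tau_dims, tau_loops = projector_linear_dims(5, [32], 1, output_dim=1)
delta_dims, delta_loops = projector_linear_dims(5, [32], 1, output_dim=12)

# Default-sized backbone from §3, for contrast: the loop runs once.
default_dims, default_loops = projector_linear_dims(5, [128, 128], 2, output_dim=1)

print(tau_dims, tau_loops)
print(delta_dims, delta_loops)
print(default_dims, default_loops)
```

If the loop body itself needs a breakpoint hit, p_hidden_dims=[32,32] with p_hidden_layers=2 would be the smallest change that executes it once.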

§6 Shape Tracing Quick Reference

ETTh1 (N=5), batch_size=2, seq_len=12, pred_len=4, label_len=6:

| Step | Shape |
|---|---|
| x_enc input | (2, 12, 5) |
| x_raw backup | (2, 12, 5) |
| mean_enc | (2, 1, 5) |
| std_enc | (2, 1, 5) |
| x_enc_norm | (2, 12, 5) |
| tau_learner output (before .exp()) | (2, 1) |
| τ (after .exp()) | (2, 1) |
| δ (delta_learner output) | (2, 12) |
| x_dec_new | (2, 10, 5) |
| enc_embedding output | (2, 12, 8) |
| Encoder output enc_out | (2, 12, 8) |
| dec_embedding output | (2, 10, 8) |
| Decoder output (before Linear) | (2, 10, 8) |
| Decoder output (after Linear) | (2, 10, 5) |
| dec_out after de-normalization | (2, 10, 5) |
| forecast() return value | (2, 10, 5) |
| forward() slice [:,-4:,:] | (2, 4, 5) |
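
The shapes in this table can be regenerated for other toy settings with plain tuple arithmetic. A minimal sketch under the rules stated in these notes (label_len = seq_len//2, decoder length = label_len + pred_len); `trace_shapes` is a hypothetical helper and touches no tensors:

```python
def trace_shapes(batch, seq_len, pred_len, enc_in, d_model):
    """Return the expected shape at each step as plain tuples."""
    label_len = seq_len // 2
    dec_len = label_len + pred_len
    return {
        "x_enc": (batch, seq_len, enc_in),
        "mean_enc": (batch, 1, enc_in),
        "std_enc": (batch, 1, enc_in),
        "tau": (batch, 1),
        "delta": (batch, seq_len),
        "x_dec_new": (batch, dec_len, enc_in),
        "enc_out": (batch, seq_len, d_model),
        "dec_embedding": (batch, dec_len, d_model),
        "dec_out": (batch, dec_len, enc_in),
        "forward_out": (batch, pred_len, enc_in),  # dec_out[:, -pred_len:, :]
    }


shapes = trace_shapes(batch=2, seq_len=12, pred_len=4, enc_in=5, d_model=8)
for step, shape in shapes.items():
    print(f"{step:14s} {shape}")
```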

Inside DSAttention (Encoder self-attn):

| Step | Shape |
|---|---|
| queries/keys/values entering AttentionLayer | (2, 12, 8) |
| After Linear projection and .view into heads | (2, 12, 4, 2) |
| τ .unsqueeze(1).unsqueeze(1) | (2, 1, 1, 1) |
| δ .unsqueeze(1).unsqueeze(1) | (2, 1, 1, 12) |
| scores = einsum("blhe,bshe->bhls") | (2, 4, 12, 12) |
| scores × τ + δ | (2, 4, 12, 12) |
| A = softmax(scale × scores) | (2, 4, 12, 12) |
| V = einsum("bhls,bshd->blhd") | (2, 12, 4, 2) |
| .view(2,12,8), then out_projection | (2, 12, 8) |
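
The broadcasting in the table can be checked outside the debugger. The sketch below reproduces the same steps with NumPy on random data; it is an illustration of the shape arithmetic only, not the library's DSAttention code. It assumes scale = 1/sqrt(d_keys) and omits masking, dropout, and the out_projection layer:

```python
import numpy as np

rng = np.random.default_rng(0)
B, L, S, H, E = 2, 12, 12, 4, 2  # batch, query len, key len, heads, d_keys

queries = rng.standard_normal((B, L, H, E))
keys = rng.standard_normal((B, S, H, E))
values = rng.standard_normal((B, S, H, E))
tau = np.exp(rng.standard_normal((B, 1)))  # positive, like tau_learner(...).exp()
delta = rng.standard_normal((B, S))

# Equivalent of .unsqueeze(1).unsqueeze(1) on each factor.
tau_b = tau[:, None, None, :]      # (B, 1, 1, 1)
delta_b = delta[:, None, None, :]  # (B, 1, 1, S)

# De-stationary scores: rescale by tau and shift by delta via broadcasting.
scores = np.einsum("blhe,bshe->bhls", queries, keys) * tau_b + delta_b

# Numerically stable softmax over the key axis.
scale = 1.0 / np.sqrt(E)
z = scale * scores
z = z - z.max(axis=-1, keepdims=True)
A = np.exp(z)
A = A / A.sum(axis=-1, keepdims=True)

V = np.einsum("bhls,bshd->blhd", A, values)
out = V.reshape(B, L, H * E)       # merge heads back to d_model

print(tau_b.shape, delta_b.shape, scores.shape, A.shape, V.shape, out.shape)
```

Because δ has one entry per key position, it broadcasts over heads and query positions, which is why its trailing dimension must equal the key length.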

§7 Key Breakpoints

| # | File | Location | Purpose |
|---|---|---|---|
| 1 | adapters_for_transformers.py | First line of _process | Confirm the x_enc/x_dec shapes |
| 2 | Nonstationary_Transformer.py | First line of forecast | Enter the main chain |
| 3 | Nonstationary_Transformer.py | x_raw = x_enc.clone().detach() | Confirm the raw backup is (2,12,5) |
| 4 | Nonstationary_Transformer.py | x_enc = x_enc / std_enc | Verify the x_enc shape is unchanged after normalization |
| 5 | Nonstationary_Transformer.py | tau = self.tau_learner(...).exp() | Verify tau shape (2,1) and that all values are positive |
| 6 | Nonstationary_Transformer.py | delta = self.delta_learner(...) | Verify delta shape (2,12) |
| 7 | Nonstationary_Transformer.py | x_dec_new = torch.cat(...) | Verify x_dec_new shape (2,10,5) |
| 8 | SelfAttention_Family.py | First line of DSAttention.forward | Verify queries shape (2,12,4,2) |
| 9 | SelfAttention_Family.py | tau = ... tau.unsqueeze(1).unsqueeze(1) | Verify tau expands to (2,1,1,1) |
| 10 | SelfAttention_Family.py | scores = einsum(...) * tau + delta | Verify scores shape (2,4,12,12) |
| 11 | SelfAttention_Family.py | A = softmax(...) | Verify A shape (2,4,12,12) and that each row sums to 1 |
| 12 | Nonstationary_Transformer.py | dec_out = dec_out * std_enc + mean_enc | Verify the de-normalized output is (2,10,5) |
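
At each breakpoint the shape check can be typed into the debugger's evaluate or watch window. A minimal sketch of a reusable helper for that; `expect_shape` is a hypothetical name, and it only assumes the object exposes a `.shape` attribute, which PyTorch tensors and NumPy arrays both do:

```python
def expect_shape(name, tensor, expected):
    """Raise AssertionError if tensor.shape differs from the expected tuple."""
    actual = tuple(tensor.shape)
    if actual != tuple(expected):
        raise AssertionError(f"{name}: expected {tuple(expected)}, got {actual}")
    return True


class FakeTensor:
    """Stand-in with only a .shape attribute, so the demo needs no torch."""

    def __init__(self, shape):
        self.shape = shape


# Breakpoints 5 and 6 from the table, replayed on stand-in objects.
ok_tau = expect_shape("tau", FakeTensor((2, 1)), (2, 1))
ok_delta = expect_shape("delta", FakeTensor((2, 12)), (2, 12))
print(ok_tau, ok_delta)
```

In a live session the call would look like `expect_shape("tau", tau, (2, 1))`, with the message naming the variable that drifted.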

§8 Comparison with the Autoformer Debug Parameters

| Dimension | Autoformer | Non-stationary |
|---|---|---|
| adapter | transformer_adapter | transformer_adapter |
| dataset | ETTh1.csv (N=5) | ETTh1.csv (N=5) |
| seq_len | 12 | 12 |
| pred_len | 4 | 4 |
| Key parameters | moving_avg=3, e_layers=2 | p_hidden_dims=[32], p_hidden_layers=1 |
| New component | Auto-Correlation (FFT) | Projector (tau/delta) |
| Attention modification | Replaced by AutoCorrelation | ×τ+δ applied to the scores |
| Decoder delta | N/A | self-attn delta=None; cross-attn delta=delta |
| Shape turning point | rfft in AutoCorrelation (2,12,4,7) | scores × tau + delta (2,4,12,12) |
