Non-stationary Transformer: Debug Arguments
§1 PyCharm Run Configuration
Script path: D:\1sudyta\1ai-self\aistyle\TFB\scripts\run_benchmark.py
Parameters:
--config-path rolling_forecast_config.json
--data-name-list ETTh1.csv
--strategy-args {"horizon": 4}
--model-name time_series_library.Nonstationary_Transformer
--model-hyper-params {"batch_size":2,"seq_len":12,"horizon":4,"d_model":8,"d_ff":16,"n_heads":4,"e_layers":2,"d_layers":1,"factor":1,"output_attention":0,"p_hidden_dims":[32],"p_hidden_layers":1,"num_epochs":1,"patience":3,"lr":0.001,"loss":"MSE","dropout":0.0,"embed":"timeF"}
--adapter transformer_adapter
--deterministic full
--gpus 0
--num-workers 1
--timeout 60000
--save-path debug/Nonstationary
Working directory: D:\1sudyta\1ai-self\aistyle\TFB
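Hand-escaping the two JSON-valued arguments is error-prone. Below is a minimal, standard-library-only sketch. It rebuilds both arguments as Python dicts and serializes them with `json.dumps`. Dumping the whole argument list a second time then produces the `\"`-escaped strings that a launch.json `args` array needs. `build_args` and `compact_json` are hypothetical helpers, not part of TFB. The argument values are copied from the configuration above.

```python
import json

# Toy hyperparameters copied from the --model-hyper-params value above.
HYPER_PARAMS = {
    "batch_size": 2, "seq_len": 12, "horizon": 4, "d_model": 8, "d_ff": 16,
    "n_heads": 4, "e_layers": 2, "d_layers": 1, "factor": 1,
    "output_attention": 0, "p_hidden_dims": [32], "p_hidden_layers": 1,
    "num_epochs": 1, "patience": 3, "lr": 0.001, "loss": "MSE",
    "dropout": 0.0, "embed": "timeF",
}


def compact_json(obj) -> str:
    """Serialize without spaces, matching the compact style used above."""
    return json.dumps(obj, separators=(",", ":"))


def build_args(hyper_params: dict, horizon: int) -> list:
    """Hypothetical helper: the argument list passed to run_benchmark.py."""
    return [
        "--config-path", "rolling_forecast_config.json",
        "--data-name-list", "ETTh1.csv",
        "--strategy-args", compact_json({"horizon": horizon}),
        "--model-name", "time_series_library.Nonstationary_Transformer",
        "--model-hyper-params", compact_json(hyper_params),
        "--adapter", "transformer_adapter",
        "--deterministic", "full",
        "--gpus", "0",
        "--num-workers", "1",
        "--timeout", "60000",
        "--save-path", "debug/Nonstationary",
    ]


if __name__ == "__main__":
    # Dumping the whole list adds the \" escaping that launch.json "args" needs.
    print(json.dumps(build_args(HYPER_PARAMS, horizon=4), indent=2))
```

Calling `json.loads` on the `--model-hyper-params` element recovers the original dict. This is a quick way to confirm that an edited string is still valid JSON.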
§2 VSCode launch.json
This object is one entry in the `configurations` array of `.vscode/launch.json`:

```json
{
  "name": "Nonstationary Debug",
  "type": "debugpy",
  "request": "launch",
  "program": "${workspaceFolder}/scripts/run_benchmark.py",
  "args": [
    "--config-path", "rolling_forecast_config.json",
    "--data-name-list", "ETTh1.csv",
    "--strategy-args", "{\"horizon\": 4}",
    "--model-name", "time_series_library.Nonstationary_Transformer",
    "--model-hyper-params", "{\"batch_size\":2,\"seq_len\":12,\"horizon\":4,\"d_model\":8,\"d_ff\":16,\"n_heads\":4,\"e_layers\":2,\"d_layers\":1,\"factor\":1,\"output_attention\":0,\"p_hidden_dims\":[32],\"p_hidden_layers\":1,\"num_epochs\":1,\"patience\":3,\"lr\":0.001,\"loss\":\"MSE\",\"dropout\":0.0,\"embed\":\"timeF\"}",
    "--adapter", "transformer_adapter",
    "--deterministic", "full",
    "--gpus", "0",
    "--num-workers", "1",
    "--timeout", "60000",
    "--save-path", "debug/Nonstationary"
  ],
  "cwd": "${workspaceFolder}",
  "console": "integratedTerminal"
}
```

§3 Parameter choices from first principles
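| Parameter | Value | Rationale |
|---|---|---|
| data-name-list | ETTh1.csv | N=7 variables (HUFL, HULL, MUFL, MULL, LUFL, LULL, OT); enc_in=7 is inferred from the data, so the toy config does not set it |
| batch_size | 2 | Minimal, which keeps tensor shapes easy to trace |
| seq_len | 12 | Differs from every other key dimension, so it is easy to spot in shapes |
| horizon | 4 | Becomes pred_len; label_len is set automatically to seq_len//2 = 6 |
| d_model | 8 | Small enough; with n_heads=4, d_keys=2 |
| d_ff | 16 | = d_model × 2 |
| n_heads | 4 | d_keys = d_model / n_heads = 2 |
| e_layers | 2 | Runs the Encoder loop more than once |
| d_layers | 1 | Single Decoder layer, kept simple |
| p_hidden_dims | [32] | One hidden layer of width 32, much smaller than the default [128, 128], so traces are clearer |
| p_hidden_layers | 1 | Matches the length of p_hidden_dims |
| factor | 1 | Kept for the constructor signature; DSAttention stores factor but never uses it |
| output_attention | 0 | Attention matrices are not returned, which simplifies the outputs |
| dropout | 0.0 | Disables dropout randomness |
| num_epochs | 1 | A single training epoch keeps the run short (forward still runs once per batch) |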
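The derived sizes in the table can be recomputed in a few lines. Below is a minimal sketch. It assumes ETTh1's 7 channels for `n_vars` and the rules stated in the table (pred_len = horizon, label_len = seq_len // 2, d_keys = d_model // n_heads). `ToyConfig` and `derive_dims` are hypothetical names. The two `ValueError` checks are sanity checks for this walkthrough, not constraints enforced by the model code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    # Values from the table above; n_vars=7 assumes ETTh1's 7 channels.
    batch_size: int = 2
    seq_len: int = 12
    horizon: int = 4
    d_model: int = 8
    d_ff: int = 16
    n_heads: int = 4
    p_hidden_dims: tuple = (32,)
    p_hidden_layers: int = 1
    n_vars: int = 7


def derive_dims(cfg: ToyConfig) -> dict:
    """Hypothetical helper: recompute the derived sizes the table mentions."""
    if cfg.d_model % cfg.n_heads != 0:
        raise ValueError("this walkthrough assumes d_model divisible by n_heads")
    if len(cfg.p_hidden_dims) != cfg.p_hidden_layers:
        raise ValueError("this walkthrough assumes len(p_hidden_dims) == p_hidden_layers")
    label_len = cfg.seq_len // 2          # label_len = seq_len // 2
    pred_len = cfg.horizon                # pred_len comes from horizon
    return {
        "pred_len": pred_len,
        "label_len": label_len,
        "dec_len": label_len + pred_len,  # time length of x_dec_new
        "d_keys": cfg.d_model // cfg.n_heads,
        "enc_in": cfg.n_vars,
    }


print(derive_dims(ToyConfig()))
```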
§4 Parameter quick reference
| Parameter | Access path in code | What it determines |
|---|---|---|
| seq_len | configs.seq_len | Encoder input length; output_dim of delta_learner; in_channels of the Projector's Conv1d |
| pred_len | configs.pred_len (set from horizon) | Length of the zero block in forecast(); length kept by the slice in forward() |
| label_len | inferred as seq_len//2 | Length of the history segment in x_dec_new |
| enc_in | inferred from the number of data columns | The 2×enc_in concatenation width in the Projector; input width of DataEmbedding |
| d_model | configs.d_model | Output width of DataEmbedding; d_keys×n_heads in AttentionLayer |
| n_heads | configs.n_heads | Number of heads; d_keys = d_model/n_heads |
| p_hidden_dims | configs.p_hidden_dims | Width of each hidden layer in the Projector backbone (a list) |
| p_hidden_layers | configs.p_hidden_layers | Number of layers in the Projector backbone |
| output_attention | configs.output_attention | Whether DSAttention returns the A matrix |
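To make the seq_len, enc_in and p_hidden_* rows concrete, the sketch below lists the Projector's layer sizes. The structure is my reading of the Time-Series-Library Projector. A Conv1d with in_channels=seq_len collapses the time axis to 1. Its output is concatenated with the per-variable statistic (2 rows), flattened to 2×enc_in, and fed to an MLP. Treat the layer list as an assumption to verify at a breakpoint. ReLU activations are omitted, and `projector_layers` is a hypothetical name.

```python
def projector_layers(enc_in, seq_len, hidden_dims, hidden_layers, output_dim):
    """Hypothetical: describe the Projector's layers as (kind, in, out) tuples."""
    # Conv1d over the time axis: seq_len channels in, 1 channel out -> (B, 1, enc_in).
    layers = [("Conv1d", seq_len, 1)]
    # cat([conv_out, stats], dim=1) -> (B, 2, enc_in), flattened -> (B, 2 * enc_in).
    layers.append(("Linear", 2 * enc_in, hidden_dims[0]))
    # Extra hidden layers: range(hidden_layers - 1) is empty when hidden_layers == 1.
    for i in range(hidden_layers - 1):
        layers.append(("Linear", hidden_dims[i], hidden_dims[i + 1]))
    # Final projection to output_dim (1 for tau, seq_len for delta).
    layers.append(("Linear", hidden_dims[-1], output_dim))
    return layers


tau_layers = projector_layers(7, 12, [32], 1, output_dim=1)
delta_layers = projector_layers(7, 12, [32], 1, output_dim=12)
print(tau_layers)
print(delta_layers)
```

With N=7, the first Linear takes 14 inputs. This is a handy number to look for when inspecting `tau_learner` in the debugger.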
§5 Loop coverage check
| Loop / branch | How it is covered | Covered by the current parameters? |
|---|---|---|
| Encoder loop over e_layers | e_layers=2 | ✅ EncoderLayer runs 2 times |
| Decoder loop over d_layers | d_layers=1 | ✅ DecoderLayer runs 1 time |
| Encoder DSAttention (mask_flag=False) | built into Encoder | ✅ |
| Decoder self-attn (mask_flag=True, delta=None) | built into DecoderLayer | ✅ |
| Decoder cross-attn (mask_flag=False, delta=delta) | built into DecoderLayer | ✅ |
| `delta is None` branch | Decoder self-attn passes delta=None | ✅ (tau is always passed, so the `tau is None` branch is not hit) |
| Projector backbone loop `range(hidden_layers - 1)` | p_hidden_layers=1 | ✅ Loop body does not run (range(0)); control goes straight to the final layer |
| forecast() → forward() path | task_name="short_term_forecast" | ✅ |
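A breakpoint inside `DSAttention.forward` is hit once per attention call. It helps to know in advance how many calls one forward pass makes and what query/key lengths each one sees. The sketch below enumerates them under the Encoder/Decoder wiring described in the table. Every Encoder layer does self-attention with delta. Every Decoder layer does self-attention with delta=None, then cross-attention with delta. `dsattention_calls` is a hypothetical name.

```python
def dsattention_calls(e_layers, d_layers, seq_len, dec_len):
    """Hypothetical: list each DSAttention call in one forward pass.

    Returns (site, L, S, receives_delta), where L is the query length and
    S the key length.
    """
    calls = []
    for i in range(e_layers):
        calls.append((f"encoder[{i}].self", seq_len, seq_len, True))
    for i in range(d_layers):
        calls.append((f"decoder[{i}].self", dec_len, dec_len, False))  # delta=None
        calls.append((f"decoder[{i}].cross", dec_len, seq_len, True))  # delta=delta
    return calls


for call in dsattention_calls(e_layers=2, d_layers=1, seq_len=12, dec_len=10):
    print(call)
```

With the toy arguments this gives 4 hits per forward pass. Only the first two have L = S = 12.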
§6 Shape tracing quick reference
ETTh1 (N=7), batch_size=2, seq_len=12, pred_len=4, label_len=6:
| Step | Shape |
|---|---|
| x_enc input | (2, 12, 7) |
| x_raw backup | (2, 12, 7) |
| mean_enc | (2, 1, 7) |
| std_enc | (2, 1, 7) |
| x_enc after normalization | (2, 12, 7) |
| tau_learner output (before .exp()) | (2, 1) |
| τ (after .exp()) | (2, 1) |
| δ (delta_learner output) | (2, 12) |
| x_dec_new (label_len 6 + pred_len 4) | (2, 10, 7) |
| enc_embedding output | (2, 12, 8) |
| Encoder output enc_out | (2, 12, 8) |
| dec_embedding output | (2, 10, 8) |
| Decoder output (before Linear) | (2, 10, 8) |
| Decoder output (after Linear) | (2, 10, 7) |
| dec_out after de-normalization | (2, 10, 7) |
| forecast() return | (2, 10, 7) |
| forward() slice [:, -4:, :] | (2, 4, 7) |
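The table can be regenerated for other toy sizes with plain arithmetic. Below is a sketch under two assumptions. First, c_out equals the number of input variables, as in multivariate-to-multivariate forecasting. Second, label_len = seq_len // 2, as stated in §3. `trace_forecast_shapes` is a hypothetical name.

```python
def trace_forecast_shapes(B, seq_len, pred_len, n_vars, d_model):
    """Hypothetical: expected shapes along forecast()/forward(), as tuples."""
    label_len = seq_len // 2
    dec_len = label_len + pred_len
    return {
        "x_enc": (B, seq_len, n_vars),
        "mean_enc": (B, 1, n_vars),              # mean over time, keepdim=True
        "std_enc": (B, 1, n_vars),
        "tau": (B, 1),                           # tau_learner(...).exp()
        "delta": (B, seq_len),                   # delta_learner output_dim = seq_len
        "x_dec_new": (B, dec_len, n_vars),       # last label_len steps + pred_len zeros
        "enc_out": (B, seq_len, d_model),
        "dec_embed": (B, dec_len, d_model),
        "dec_out": (B, dec_len, n_vars),         # after Linear(d_model, c_out)
        "forward_out": (B, pred_len, n_vars),    # dec_out[:, -pred_len:, :]
    }


for step, shape in trace_forecast_shapes(2, 12, 4, 7, 8).items():
    print(f"{step:12s} {shape}")
```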
Inside DSAttention (Encoder self-attention):
| Step | Shape |
|---|---|
| queries/keys/values entering AttentionLayer | (2, 12, 8) |
| Linear projection, then .view into heads | (2, 12, 4, 2) |
| τ.unsqueeze(1).unsqueeze(1) | (2, 1, 1, 1) |
| δ.unsqueeze(1).unsqueeze(1) | (2, 1, 1, 12) |
| scores = einsum("blhe,bshe->bhls") | (2, 4, 12, 12) |
| scores × τ + δ | (2, 4, 12, 12) |
| A = softmax(scale × scores), scale = 1/√E = 1/√2 | (2, 4, 12, 12) |
| V = einsum("bhls,bshd->blhd") | (2, 12, 4, 2) |
| .view(2, 12, 8), then out_projection | (2, 12, 8) |
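The rescaling step can be checked by hand on a tiny input. Below is a pure-Python sketch for one batch element and one head, with no mask, as in Encoder self-attention. It computes softmax(scale × (q·kᵀ × τ + δ)) row by row. δ is indexed by key position, so every query row gets the same δ vector, which matches the (B, 1, 1, S) broadcast in the table. `ds_attention_weights` is a hypothetical name and the inputs are made up.

```python
import math


def ds_attention_weights(q, k, tau, delta, scale):
    """Hypothetical single-head, single-batch sketch of the rescaled softmax.

    q: L x E and k: S x E as lists of lists; tau: positive scalar; delta: length S.
    """
    weights = []
    for qi in q:
        # Raw score per key, times tau, plus delta[s]; delta is shared by all rows.
        row = [sum(a * b for a, b in zip(qi, ks)) * tau + d for ks, d in zip(k, delta)]
        logits = [scale * r for r in row]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # L = 3 queries, E = 2
k = [[1.0, 0.0], [0.0, 1.0]]              # S = 2 keys
A = ds_attention_weights(q, k, tau=2.0, delta=[0.0, 0.5], scale=1 / math.sqrt(2))
for row in A:
    print([round(w, 3) for w in row], "sum:", round(sum(row), 6))
```

The softmax is shift-invariant, so a δ that is constant across keys cancels out. Only δ's variation over the S key positions changes A. In the third row, both raw scores are equal, and δ alone tilts the weight toward key 1.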
§7 Key breakpoints
| # | File | Location | Purpose |
|---|---|---|---|
| 1 | adapters_for_transformers.py | first line of _process | Check the x_enc/x_dec shapes |
| 2 | Nonstationary_Transformer.py | first line of forecast | Enter the main path |
| 3 | Nonstationary_Transformer.py | x_raw = x_enc.clone().detach() | Confirm the raw backup is (2,12,7) |
| 4 | Nonstationary_Transformer.py | after x_enc = x_enc / std_enc | Verify that normalization leaves the x_enc shape unchanged |
| 5 | Nonstationary_Transformer.py | after tau = self.tau_learner(...).exp() | Verify tau is (2,1) and all values are positive |
| 6 | Nonstationary_Transformer.py | after delta = self.delta_learner(...) | Verify delta is (2,12) |
| 7 | Nonstationary_Transformer.py | after x_dec_new = torch.cat(...) | Verify x_dec_new is (2,10,7) |
| 8 | SelfAttention_Family.py | first line of DSAttention.forward | Verify queries is (2,12,4,2) on Encoder calls; Decoder calls give (2,10,4,2). Hit 4 times per forward pass |
| 9 | SelfAttention_Family.py | tau = ... tau.unsqueeze(1).unsqueeze(1) | Verify tau is expanded to (2,1,1,1) |
| 10 | SelfAttention_Family.py | scores = einsum(...) * tau + delta | Verify scores is (2,4,12,12) on Encoder calls; Decoder self-attn gives (2,4,10,10), cross-attn (2,4,10,12) |
| 11 | SelfAttention_Family.py | after A = softmax(...) | Verify A is (2,4,12,12) and each row sums to 1 |
| 12 | Nonstationary_Transformer.py | after dec_out = dec_out * std_enc + mean_enc | Verify the de-normalized output is (2,10,7) |
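Typing expected shapes into the debug console at every stop gets repetitive. Below is a hypothetical helper, `check_shape`, that can be pasted into the PyCharm or VS Code debug console. The `EXPECTED` values are the toy shapes from the table above, assuming N=7 for ETTh1. It relies only on `tensor.shape` being tuple-like, which holds for `torch.Size`. Breakpoint 8 can be limited to Encoder calls with the condition `queries.shape[1] == 12`.

```python
# Expected shapes at the breakpoints above (B=2, seq_len=12, label_len=6,
# pred_len=4, N=7, n_heads=4, d_keys=2). N=7 assumes ETTh1's 7 channels.
EXPECTED = {
    "x_raw": (2, 12, 7),
    "tau": (2, 1),
    "delta": (2, 12),
    "x_dec_new": (2, 10, 7),
    "queries_enc": (2, 12, 4, 2),
    "queries_dec": (2, 10, 4, 2),
    "tau_expanded": (2, 1, 1, 1),
    "scores_enc": (2, 4, 12, 12),
    "dec_out": (2, 10, 7),
}


def check_shape(name, tensor):
    """Hypothetical helper: compare tensor.shape with EXPECTED[name].

    torch.Size is a tuple subclass, so tuple(tensor.shape) compares cleanly.
    """
    actual = tuple(tensor.shape)
    expected = EXPECTED[name]
    if actual != expected:
        raise AssertionError(f"{name}: expected {expected}, got {actual}")
    return True
```

At breakpoint 5, for example, evaluating `check_shape("tau", tau)` returns `True`. A wrong shape raises with both shapes in the message.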
§8 Comparison with the Autoformer debug arguments
| Aspect | Autoformer | Non-stationary |
|---|---|---|
| adapter | transformer_adapter | transformer_adapter |
| dataset | ETTh1.csv (N=7) | ETTh1.csv (N=7) |
| seq_len | 12 | 12 |
| pred_len | 4 | 4 |
| Key arguments | moving_avg=3, e_layers=2 | p_hidden_dims=[32], p_hidden_layers=1 |
| New component | Auto-Correlation (FFT) | Projector (tau/delta) |
| Attention change | Replaced by AutoCorrelation | Scores rescaled by ×τ + δ |
| Decoder delta | N/A | self-attn delta=None; cross-attn delta=delta |
| Key shape change | AutoCorrelation rfft (2,12,4,7) | scores × tau + delta (2,4,12,12) |
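The "key shape change" row contrasts two different axes. A real FFT over a length-n sequence returns n // 2 + 1 frequency bins, so the trailing 7 in the Autoformer shape is the frequency axis for seq_len=12. It is unrelated to ETTh1 also having 7 variables. DSAttention instead keeps the full time-domain query×key grid. The sketch below only restates these two rules. Both function names are hypothetical.

```python
def rfft_bins(n):
    # A real FFT of length n returns n // 2 + 1 frequency bins.
    return n // 2 + 1


def ds_scores_shape(batch, n_heads, query_len, key_len):
    # DSAttention scores keep the full query x key grid: (B, H, L, S).
    return (batch, n_heads, query_len, key_len)


print(rfft_bins(12))                 # frequency axis for seq_len = 12
print(ds_scores_shape(2, 4, 12, 12))  # Encoder self-attention scores
```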