Autoformer Debug Parameters

Abstract

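This note does one thing:
it records the PyCharm run parameters used to study Autoformer's code flow, chosen so that every key loop inside the model executes at least once.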

1. PyCharm Configuration

Script path

text

D:\1sudyta\1ai-self\aistyle\TFB\scripts\run_benchmark.py

Working directory

text

D:\1sudyta\1ai-self\aistyle\TFB

Environment variables

text

KMP_DUPLICATE_LIB_OK=TRUE

2. Parameters

Copy the entire line below into PyCharm's Parameters field:

text

--config-path rolling_forecast_config.json --data-name-list cif_2016_dataset_1.csv --model-name time_series_library.Autoformer --adapter transformer_adapter --model-hyper-params "{\"batch_size\":2,\"d_model\":32,\"d_ff\":128,\"e_layers\":1,\"d_layers\":1,\"factor\":1,\"horizon\":6,\"label_len\":6,\"n_heads\":2,\"moving_avg\":3,\"norm\":true,\"seq_len\":24,\"dropout\":0.0,\"lr\":0.0001,\"num_epochs\":1,\"num_workers\":0,\"output_attention\":0}" --strategy-args "{\"horizon\":6,\"tv_ratio\":0.8,\"train_ratio_in_tv\":0.75,\"stride\":6,\"num_rollings\":2}" --num-workers 1 --timeout 600 --save-path debug\cif1_Autoformer_rolling_min

2.1 VSCode Debug Configuration

VSCode does not take the parameters as a single line; they go into the args array of .vscode/launch.json.

First, run this in VSCode:

text

Ctrl+Shift+P
-> Python: Select Interpreter
-> choose D:\Anaconda\envs\tfb\python.exe

Then create or edit this file in the repository root:

text

D:\1sudyta\1ai-self\aistyle\TFB\.vscode\launch.json

Add the following configuration:

json

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "TFB Autoformer rolling debug",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}\\scripts\\run_benchmark.py",
      "cwd": "${workspaceFolder}",
      "console": "integratedTerminal",
      "env": {
        "KMP_DUPLICATE_LIB_OK": "TRUE"
      },
      "args": [
        "--config-path", "rolling_forecast_config.json",
        "--data-name-list", "cif_2016_dataset_1.csv",
        "--model-name", "time_series_library.Autoformer",
        "--adapter", "transformer_adapter",
        "--model-hyper-params", "{\"batch_size\":2,\"d_model\":32,\"d_ff\":128,\"e_layers\":1,\"d_layers\":1,\"factor\":1,\"horizon\":6,\"label_len\":6,\"n_heads\":2,\"moving_avg\":3,\"norm\":true,\"seq_len\":24,\"dropout\":0.0,\"lr\":0.0001,\"num_epochs\":1,\"num_workers\":0,\"output_attention\":0}",
        "--strategy-args", "{\"horizon\":6,\"tv_ratio\":0.8,\"train_ratio_in_tv\":0.75,\"stride\":6,\"num_rollings\":2}",
        "--num-workers", "1",
        "--timeout", "600",
        "--save-path", "debug\\cif1_Autoformer_rolling_min"
      ]
    }
  ]
}

Key difference:

text

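PyCharm Parameters:
A single-line string; quotes inside the JSON must be escaped as \".

VSCode args:
One array element per argument; each JSON argument is passed whole as a single string element.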
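Both forms can be generated from one Python dict, which avoids hand-escaping the JSON. The sketch below is an illustration and not part of TFB: the helper names `compact`, `vscode_args`, and `pycharm_parameters` are hypothetical, and only a subset of the flags above is shown.

```python
import json

# Subset of the hyperparameters from the command above (illustrative only).
model_hyper_params = {"batch_size": 2, "d_model": 32, "seq_len": 24, "horizon": 6}
strategy_args = {"horizon": 6, "stride": 6, "num_rollings": 2}


def compact(obj):
    """Serialize without spaces so the JSON stays one command-line token."""
    return json.dumps(obj, separators=(",", ":"))


def vscode_args(model_params, strategy):
    """One list element per argument; each JSON blob is a single element."""
    return [
        "--model-hyper-params", compact(model_params),
        "--strategy-args", compact(strategy),
    ]


def pycharm_parameters(model_params, strategy):
    """Single line: wrap each JSON blob in quotes and escape the inner quotes."""
    def quote(text):
        return '"' + text.replace('"', '\\"') + '"'

    return " ".join([
        "--model-hyper-params", quote(compact(model_params)),
        "--strategy-args", quote(compact(strategy)),
    ])


print(pycharm_parameters(model_hyper_params, strategy_args))
print(json.dumps(vscode_args(model_hyper_params, strategy_args), indent=2))
```

The second `json.dumps` call adds the escaping that launch.json itself needs, so its output can be pasted into the args array.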

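3. First Principle Behind These Parameters

These parameters are not meant to chase benchmark scores; they are meant for reading the code. The goals are: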

text

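1. Small data, so debugging is fast.
2. seq_len / label_len / horizon are small, so shapes are easy to read.
3. e_layers=1 guarantees the for loop in Encoder.forward runs at least once.
4. d_layers=1 guarantees the for loop in Decoder.forward runs at least once.
5. factor=1 with seq_len=24 / dec_len=12 keeps AutoCorrelation's top_k at 1 or more.
6. moving_avg=3 makes the moving_avg window in series_decomp small enough to compute by hand.
7. dropout=0.0 reduces randomness, so tensors at breakpoints are easier to compare.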
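Goal 6 can be checked by hand with a few lines of plain Python. The sketch below assumes the usual Autoformer-style decomposition: the series is padded by repeating its first and last values, averaged with a sliding window of size moving_avg, and the residual is treated as the seasonal part. The function name `decompose` and the sample series are made up for illustration; the real series_decomp works on (batch, length, channel) tensors.

```python
def decompose(series, kernel_size=3):
    """Split a 1-D series into (seasonal, trend) with an edge-padded moving average."""
    pad = (kernel_size - 1) // 2
    # Repeat the first and last values so the trend keeps the input length.
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    trend = [
        sum(padded[i:i + kernel_size]) / kernel_size
        for i in range(len(series))
    ]
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend


series = [3.0, 6.0, 9.0, 6.0, 3.0]  # made-up values
seasonal, trend = decompose(series, kernel_size=3)
print(trend)     # [4.0, 6.0, 7.0, 6.0, 4.0]
print(seasonal)  # [-1.0, 0.0, 2.0, 0.0, -1.0]
```

With a window of 3, each trend value is the mean of three neighbours, which is easy to verify against the tensor shown at a breakpoint.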

4. Parameter Meanings

Data and Strategy Parameters

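Parameter	Current value	Purpose
`--config-path`	`rolling_forecast_config.json`	Selects the rolling forecast strategy
`--data-name-list`	`cif_2016_dataset_1.csv`	Small dataset, so the run reaches the model's main path quickly
`horizon`	`6`	Predicts 6 future steps per window; also mapped to `pred_len=6`
`tv_ratio`	`0.8`	The first 80% of the series forms the train/valid region
`train_ratio_in_tv`	`0.75`	Splits the training segment off inside the train/valid region
`stride`	`6`	Each rolling step moves forward by 6 steps
`num_rollings`	`2`	Rolls only twice to keep debugging short
`--num-workers`	`1`	Only 1 worker in the outer benchmark layer, so breakpoints are easy to hit
`--timeout`	`600`	Each task may run for at most 600 seconds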
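To see roughly which part of the series each setting touches, the ratios can be turned into index ranges. This is a sketch under assumptions, not TFB's implementation: the series length `n=100` is invented, boundaries are rounded down with `int`, and the rolling windows are assumed to start right after the train/valid region and advance by `stride`. The actual boundaries should be confirmed at a breakpoint in the rolling forecast strategy.

```python
def sketch_split(n, tv_ratio, train_ratio_in_tv, horizon, stride, num_rollings):
    """Return (train, valid, windows) as half-open index ranges (assumed semantics)."""
    tv_end = int(n * tv_ratio)                   # end of the train/valid region
    train_end = int(tv_end * train_ratio_in_tv)  # training part inside that region
    windows = []
    for k in range(num_rollings):
        start = tv_end + k * stride
        if start + horizon > n:                  # stop if the window would overrun
            break
        windows.append((start, start + horizon))
    return (0, train_end), (train_end, tv_end), windows


train, valid, windows = sketch_split(
    n=100, tv_ratio=0.8, train_ratio_in_tv=0.75, horizon=6, stride=6, num_rollings=2
)
print(train, valid, windows)  # (0, 60) (60, 80) [(80, 86), (86, 92)]
```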

Model Parameters

Parameter	Current value	Meaning inside Autoformer
`model-name`	`time_series_library.Autoformer`	Loads `models/Autoformer.py`
`adapter`	`transformer_adapter`	Goes through the unified `TransformerAdapter._process(...)` interface
`batch_size`	`2`	Two samples per training batch, so shapes stay small
`seq_len`	`24`	Length of the encoder's input history
`horizon`	`6`	`Config` sets `pred_len=6`
`label_len`	`6`	Length of the historical prefix fed to the decoder
`d_model`	`32`	embedding / autocorrelation hidden size
`n_heads`	`2`	Number of AutoCorrelation heads
`d_keys=d_values`	`16`	Derived from `d_model // n_heads = 32 // 2`
`d_ff`	`128`	Hidden size of the Conv1d FFN in EncoderLayer / DecoderLayer
`e_layers`	`1`	The encoder loop runs once
`d_layers`	`1`	The decoder loop runs once
`moving_avg`	`3`	Kernel size of the moving average in series decomposition
`factor`	`1`	AutoCorrelation uses `top_k = int(factor * log(length))`
`dropout`	`0.0`	Disables dropout for easier debugging
`num_epochs`	`1`	Trains for 1 epoch only, enough to enter the training forward pass
`output_attention`	`0`	Does not return corr in addition; look at the main output first
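The derived quantities in this table can be checked before launching the debugger. A minimal sketch follows; `check_debug_params` is a hypothetical helper, and the odd-`moving_avg` condition assumes the decomposition pads `(moving_avg - 1) // 2` values on each side, as in the common Autoformer implementation.

```python
import math


def check_debug_params(p):
    """Return derived values for a hyperparameter dict; raise if a key loop would be skipped."""
    if p["d_model"] % p["n_heads"] != 0:
        raise ValueError("d_model must be divisible by n_heads")
    if p["moving_avg"] % 2 == 0:
        raise ValueError("moving_avg should be odd so the trend keeps the input length")
    dec_len = p["label_len"] + p["horizon"]
    derived = {
        "d_keys": p["d_model"] // p["n_heads"],
        "dec_len": dec_len,
        "enc_top_k": int(p["factor"] * math.log(p["seq_len"])),
        "dec_top_k": int(p["factor"] * math.log(dec_len)),
    }
    loop_counts = (p["e_layers"], p["d_layers"], derived["enc_top_k"], derived["dec_top_k"])
    if min(loop_counts) < 1:
        raise ValueError("at least one key loop would run zero times")
    return derived


params = {"d_model": 32, "n_heads": 2, "moving_avg": 3, "factor": 1,
          "seq_len": 24, "label_len": 6, "horizon": 6, "e_layers": 1, "d_layers": 1}
print(check_debug_params(params))
# {'d_keys': 16, 'dec_len': 12, 'enc_top_k': 3, 'dec_top_k': 2}
```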

5. Why This Run Covers the Key Loops

5.1 The Encoder loop runs at least once

When Autoformer constructs the Encoder:

python

self.encoder = Encoder(
    [
        EncoderLayer(...)
        for l in range(configs.e_layers)
    ],
    norm_layer=my_Layernorm(configs.d_model),
)

Currently:

text

e_layers = 1

Therefore:

text

range(1) = [0]
EncoderLayer_0 is always created
The for attn_layer loop in Encoder.forward runs at least once

Corresponding code:

python

for attn_layer in self.attn_layers:
    x, attn = attn_layer(x, attn_mask=attn_mask)
    attns.append(attn)

5.2 The Decoder loop runs at least once

When Autoformer constructs the Decoder:

python

self.decoder = Decoder(
    [
        DecoderLayer(...)
        for l in range(configs.d_layers)
    ],
    norm_layer=my_Layernorm(configs.d_model),
    projection=nn.Linear(configs.d_model, configs.c_out, bias=True),
)

Currently:

text

d_layers = 1

Therefore:

text

range(1) = [0]
DecoderLayer_0 is always created
The for layer loop in Decoder.forward runs at least once

Corresponding code:

python

for layer in self.layers:
    x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
    trend = trend + residual_trend
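The bookkeeping of this loop can be tried without PyTorch. In the sketch below, `StubLayer` is a made-up stand-in for DecoderLayer that works on plain floats; it only shows that with `d_layers=1` the loop body runs once and the trend is accumulated once.

```python
class StubLayer:
    """Hypothetical stand-in for DecoderLayer: returns (seasonal, residual_trend)."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, cross):
        self.calls += 1
        residual_trend = 0.25 * x      # pretend a quarter of x is trend
        return x - residual_trend, residual_trend


def run_decoder(layers, x, cross, trend):
    """Same control flow as the loop above: refine x, accumulate trend."""
    for layer in layers:
        x, residual_trend = layer(x, cross)
        trend = trend + residual_trend
    return x, trend


d_layers = 1
layers = [StubLayer() for _ in range(d_layers)]
x, trend = run_decoder(layers, x=8.0, cross=None, trend=1.0)
print(layers[0].calls, x, trend)  # 1 6.0 3.0
```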

5.3 AutoCorrelation's `for i in range(top_k)` runs at least once

In training mode, AutoCorrelation uses:

python

top_k = int(self.factor * math.log(length))
for i in range(top_k):
    pattern = torch.roll(tmp_values, -int(index[i]), -1)
    delays_agg = delays_agg + pattern * weight

Currently:

text

factor = 1
encoder length = 24
decoder length = label_len + pred_len = 12

Therefore:

text

encoder top_k = int(1 * log(24)) = 3
decoder top_k = int(1 * log(12)) = 2

This means:

text

The time delay aggregation loop in the Encoder self-correlation runs 3 times.
The time delay aggregation loop in the Decoder self-correlation runs 2 times.
The Decoder cross-correlation is aligned to length L_Q=12, so its top_k = int(1 * log(12)) = 2 as well.
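For a single series, the training-mode aggregation can be reproduced in plain Python. The sketch below is a simplification under stated assumptions: it uses one series as query, key, and value, computes the circular correlation by direct summation instead of rfft/irfft, and omits the averaging over heads, channels, and batch that the real code performs before picking delays. The names `circular_corr` and `time_delay_agg` are illustrative.

```python
import math


def circular_corr(q, k):
    """corr[tau] = sum_t q[(t + tau) % L] * k[t], the result of irfft(rfft(q) * conj(rfft(k)))."""
    L = len(q)
    return [sum(q[(t + tau) % L] * k[t] for t in range(L)) for tau in range(L)]


def time_delay_agg(values, corr, factor=1):
    """Pick the top_k delays by correlation and blend the rolled copies of values."""
    L = len(values)
    top_k = int(factor * math.log(L))
    if top_k < 1:
        raise ValueError("series too short: top_k would be 0")
    delays = sorted(range(L), key=lambda tau: corr[tau], reverse=True)[:top_k]
    # Softmax over the selected correlations (shifted by the max for stability).
    peak = max(corr[d] for d in delays)
    exps = [math.exp(corr[d] - peak) for d in delays]
    weights = [e / sum(exps) for e in exps]
    agg = [0.0] * L
    for d, w in zip(delays, weights):                     # runs top_k times
        rolled = [values[(t + d) % L] for t in range(L)]  # like torch.roll(values, -d, -1)
        agg = [a + w * r for a, r in zip(agg, rolled)]
    return delays, weights, agg


series = [1.0, 2.0, 3.0, 4.0] * 3   # made-up series: length 12, period 4
delays, weights, agg = time_delay_agg(series, circular_corr(series, series))
print(delays, weights)  # [0, 4] [0.5, 0.5]
print(agg == series)    # True
```

Because the sample series repeats every 4 steps, delay 4 correlates as strongly as delay 0, and blending the two rolled copies returns the original series.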

5.4 All three decompositions inside DecoderLayer run

DecoderLayer.forward(...) always executes:

python

x = x + self.dropout(self.self_attention(x, x, x, attn_mask=x_mask)[0])
x, trend1 = self.decomp1(x)

x = x + self.dropout(
    self.cross_attention(x, cross, cross, attn_mask=cross_mask)[0]
)
x, trend2 = self.decomp2(x)

y = x
y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
y = self.dropout(self.conv2(y).transpose(-1, 1))
x, trend3 = self.decomp3(x + y)

With d_layers=1, each of these three decompositions runs at least once:

text

decomp1: decomposition after self-attn
decomp2: decomposition after cross-attn
decomp3: decomposition after the FFN

6. Key Shapes in This Small Example

cif_2016_dataset_1.csv is a small dataset and is usually univariate. When debugging, the main thing to check is:

text

C = enc_in = dec_in = c_out = 1

Core inputs of the current batch:

text

x_enc / input:          (2, 24, 1)
x_mark_enc / input_mark:(2, 24, time_feature_dim)

target:                 (2, 12, 1)
x_mark_dec / target_mark:(2, 12, time_feature_dim)

TransformerAdapter._process(...) builds the decoder input:

text

label_len = 6
pred_len = horizon = 6
decoder length = label_len + pred_len = 12

dec_input = cat(target[:, :6, :], zeros_like(target[:, -6:, :]), dim=1)
          = (2, 12, 1)
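The same concatenation can be mimicked with nested lists to confirm the shape arithmetic. This is a sketch only: the real adapter operates on tensors, and the helpers `shape` and `build_dec_input` plus the dummy target values are assumptions made for illustration.

```python
def shape(nested):
    """Return the shape of a regular nested list, e.g. (2, 12, 1)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def build_dec_input(target, label_len, pred_len):
    """Keep the first label_len steps and replace the last pred_len steps with zeros."""
    dec_input = []
    for sample in target:                        # iterate over the batch
        known = [list(step) for step in sample[:label_len]]
        placeholder = [[0.0] * len(step) for step in sample[-pred_len:]]
        dec_input.append(known + placeholder)    # concatenate along the time axis
    return dec_input


# Dummy target of shape (batch=2, label_len + pred_len=12, channels=1).
target = [[[float(t)] for t in range(12)] for _ in range(2)]
dec_input = build_dec_input(target, label_len=6, pred_len=6)
print(shape(dec_input))              # (2, 12, 1)
print([s[0] for s in dec_input[0]])  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```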

After entering Autoformer.forward:

text

x_enc:      (2, 24, 1)
x_mark_enc: (2, 24, time_feature_dim)
x_dec:      (2, 12, 1)
x_mark_dec: (2, 12, time_feature_dim)

First half of forecast(...):

text

mean:          mean(x_enc, dim=1) → (2,1,1) → repeat pred_len → (2,6,1)
zeros:         (2,6,1)
seasonal_init: self.decomp(x_enc)[0] → (2,24,1)
trend_init:    self.decomp(x_enc)[1] → (2,24,1)

trend_init decoder side:
  cat(trend_init[:, -6:, :], mean) → (2,12,1)

seasonal_init decoder side:
  cat(seasonal_init[:, -6:, :], zeros) → (2,12,1)

After embedding:

text

enc_embedding(x_enc, x_mark_enc):             (2,24,32)
dec_embedding(seasonal_init, x_mark_dec):     (2,12,32)

Encoder:

text

Encoder.forward input:  (2,24,32)
EncoderLayer input:     (2,24,32)
AutoCorrelationLayer:
  Q/K/V projection:     (2,24,32)
  view heads:           (2,24,2,16)
AutoCorrelation:
  permute for FFT:      (2,2,16,24)
  corr:                 (2,2,16,24)
Encoder output:         (2,24,32)

Decoder:

text

Decoder.forward input x:     (2,12,32)
Decoder.forward cross:       (2,24,32)
trend input:                 (2,12,1)

DecoderLayer self-attn:
  Q=K=V=x:                   (2,12,32)
  heads:                     (2,12,2,16)

DecoderLayer cross-attn:
  Q=x:                       (2,12,32)
  K/V=cross:                 (2,24,32)
  AutoCorrelation truncates K/V to L_Q=12 internally

Decoder seasonal_part:       (2,12,1)
Decoder trend_part:          (2,12,1)
final dec_out:               (2,12,1)
return dec_out[:, -6:, :]:   (2,6,1)
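When stepping through the debugger it helps to have the expected shapes in one place. The helper below derives them from the hyperparameters; it is a hypothetical convenience function, not part of TFB, and it assumes a univariate series (`channels=1`) as stated above.

```python
def expected_shapes(batch, seq_len, label_len, pred_len, d_model, n_heads, channels=1):
    """Shapes to compare against tensors seen at breakpoints (layout: batch first)."""
    dec_len = label_len + pred_len
    d_keys = d_model // n_heads
    return {
        "x_enc": (batch, seq_len, channels),
        "x_dec": (batch, dec_len, channels),
        "enc_embedded": (batch, seq_len, d_model),
        "dec_embedded": (batch, dec_len, d_model),
        "enc_heads": (batch, seq_len, n_heads, d_keys),
        "enc_corr": (batch, n_heads, d_keys, seq_len),  # after permute(0, 2, 3, 1)
        "dec_heads": (batch, dec_len, n_heads, d_keys),
        "dec_out": (batch, dec_len, channels),
        "prediction": (batch, pred_len, channels),      # dec_out[:, -pred_len:, :]
    }


shapes = expected_shapes(batch=2, seq_len=24, label_len=6, pred_len=6, d_model=32, n_heads=2)
for name, value in shapes.items():
    print(f"{name:13s} {value}")
```

Changing one hyperparameter and rerunning the helper shows which breakpoint shapes should change with it.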

7. Breakpoint Order

On the first pass, follow the code flow only; do not rush to compute the FFT by hand.

ts_benchmark/baselines/time_series_library/adapters_for_transformers.py
- TransformerAdapter._process(...)
- Check that dec_input = label_len steps of history + horizon zero placeholders.
ts_benchmark/baselines/time_series_library/models/Autoformer.py
- forward(...)
- Watch the call enter forecast(...).
ts_benchmark/baselines/time_series_library/models/Autoformer.py
- forecast(...)
- See how mean / zeros / seasonal_init / trend_init build the decoder's two input paths.
ts_benchmark/baselines/time_series_library/layers/Autoformer_EncDec.py
- series_decomp.forward(...)
- Look at moving_avg(x) and res = x - moving_mean.
ts_benchmark/baselines/time_series_library/layers/Embed.py
- DataEmbedding_wo_pos.forward(...)
- Look at value_embedding + temporal_embedding; note that Autoformer has no position embedding.
ts_benchmark/baselines/time_series_library/layers/Autoformer_EncDec.py
- Encoder.forward(...)
- With e_layers=1, confirm the for loop is entered once.
ts_benchmark/baselines/time_series_library/layers/Autoformer_EncDec.py
- EncoderLayer.forward(...)
- Follow attention -> decomp1 -> FFN -> decomp2.
ts_benchmark/baselines/time_series_library/layers/AutoCorrelation.py
- AutoCorrelationLayer.forward(...)
- Follow Q/K/V projection -> view(B,L,H,-1).
ts_benchmark/baselines/time_series_library/layers/AutoCorrelation.py
- AutoCorrelation.forward(...)
- Follow rfft -> conj -> irfft -> time_delay_agg_training/inference.
ts_benchmark/baselines/time_series_library/layers/Autoformer_EncDec.py
- Decoder.forward(...)
- With d_layers=1, confirm the for loop is entered once.
ts_benchmark/baselines/time_series_library/layers/Autoformer_EncDec.py
- DecoderLayer.forward(...)
- Follow self_attention -> decomp1 -> cross_attention -> decomp2 -> FFN -> decomp3 -> residual_trend projection.

8. Current Learning Path

text

run_benchmark
-> pipeline
-> eval_model
-> RollingForecast._eval_batch
-> forecast_fit
-> TransformerAdapter._process
-> Autoformer.forward
-> Autoformer.forecast
-> initial series_decomp: seasonal_init / trend_init
-> enc_embedding
-> Encoder.forward
-> EncoderLayer.forward
-> AutoCorrelationLayer.forward
-> AutoCorrelation.forward
-> dec_embedding
-> Decoder.forward
-> DecoderLayer.forward
-> self AutoCorrelation + cross AutoCorrelation + 3 decompositions
-> seasonal_part + trend_part
-> output[:, -pred_len:, :]

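The first principle for this pass:

First get a clear picture of how Autoformer splits the input into a seasonal path and a trend path, uses AutoCorrelation in place of attention, keeps decomposing and accumulating the trend through the encoder-decoder, and finally adds seasonal + trend to produce the forecast.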

9. Important Runtime Note: CPU and the `.cuda()` Risk

In the current Autoformer implementation, AutoCorrelation.time_delay_agg_inference(...) and time_delay_agg_full(...) contain a hard-coded call:

python

init_index = torch.arange(length). ... .cuda()

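This means:

text

If the machine has no usable NVIDIA CUDA device, a full benchmark run may fail on .cuda() once it reaches the prediction/eval stage.

During training, when self.training=True, the code takes this branch instead: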

python

time_delay_agg_training(...)

This branch has no .cuda() call, so the training forward pass can at least be debugged end to end:

text

Autoformer.forward
-> forecast
-> Encoder
-> Decoder
-> AutoCorrelation.forward
-> time_delay_agg_training

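To run the prediction stage of rolling forecast end to end on CPU, the code needs a separate fix later: replace .cuda() with a call that follows the input's device, for example: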

python

.to(values.device)

This is outside the scope of the debug parameters themselves, but it matters a great deal for Autoformer.
