Level 1: Configuration Enters PatchTST

Abstract

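This page covers Level 1 of 00-PatchTST总览与Level树 (the PatchTST overview and Level tree).
It addresses one thing: how command-line arguments are connected, step by step, to PatchTST(config), and which parameters are already fixed at the framework level and cannot be controlled from outside.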

1. Overall flow diagram

2. The command line in the current example

Take ETTh1 with horizon=96 as the example (scripts/multivariate_forecast/ETTh1_script/PatchTST.sh):

bash

python ./scripts/run_benchmark.py \
  --config-path "rolling_forecast_config.json" \
  --data-name-list "ETTh1.csv" \
  --strategy-args '{"horizon": 96}' \
  --model-name "time_series_library.PatchTST" \
  --model-hyper-params '{
      "d_ff": 2048,
      "d_model": 512,
      "e_layers": 1,
      "factor": 3,
      "horizon": 96,
      "n_headers": 2,
      "norm": true,
      "seq_len": 96
  }' \
  --adapter "transformer_adapter" \
  --gpus 0
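To make the first step from the command line to a Python dict concrete, here is a minimal sketch. It is not the parsing code of run_benchmark.py, which is not shown on this page; the reduced parser below declares only two of the flags from the example and assumes the JSON string is decoded with json.loads.

```python
import argparse
import json

# Hypothetical, reduced parser: only two of the flags from the example command.
parser = argparse.ArgumentParser()
parser.add_argument("--model-name", type=str)
parser.add_argument("--model-hyper-params", type=str, default="{}")

# Simulated command line (a shortened version of the example above).
argv = [
    "--model-name", "time_series_library.PatchTST",
    "--model-hyper-params",
    '{"d_model": 512, "e_layers": 1, "horizon": 96, "n_headers": 2, "seq_len": 96}',
]
args = parser.parse_args(argv)

# The JSON string becomes a plain dict; keys are kept verbatim, typos included.
hyper_params = json.loads(args.model_hyper_params)
print(sorted(hyper_params))
```

Nothing at this stage validates key names, so whatever is typed in the JSON string travels onward unchanged.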

3. Which parameters are closely tied to PatchTST

Parameter	Source	Role in PatchTST
`seq_len`	`model_hyper_params`	Input sequence length; determines patch_num
`horizon` → `pred_len`	`model_hyper_params` + adapter	Forecast length; output dimension of FlattenHead
`d_model`	`model_hyper_params`	Embedding dimension
`d_ff`	`model_hyper_params`	Hidden dimension of the FFN
`e_layers`	`model_hyper_params`	Number of EncoderLayer layers
`patch_len`	`model_hyper_params` (has a default)	Patch length, default 16
`stride`	`model_hyper_params` (has a default)	Sliding step, default 8
`factor`	`model_hyper_params`	Passed to FullAttention; only ProbAttention uses it, FullAttention ignores it
`dropout`	`model_hyper_params` (has a default)	Dropout in several places
`n_heads`	`model_hyper_params` (has a default)	Number of attention heads

4. ⚠️ n_headers: a typo in the script

The .sh script sets "n_headers": 2, but PatchTST.__init__ reads config.n_heads:

python

# PatchTST.py
AttentionLayer(
    ...,
    config.d_model,
    config.n_heads,   # ← reads n_heads
)

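n_headers and n_heads are two different attribute names. n_headers is written into config but never read anywhere; config.n_heads falls back to the adapter's default (usually 8).

Practical impact: when this script runs, n_heads is not 2 but the default value. The script itself has a bug.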
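The silent fallback can be reproduced with a small sketch. The defaults dict, the set of known keys, and both helper functions are assumptions made for illustration; the real adapter may merge parameters differently.

```python
from types import SimpleNamespace

# Assumed defaults, for illustration only; the real adapter defaults may differ.
ASSUMED_DEFAULTS = {"n_heads": 8, "d_model": 512}

# Hypothetical list of keys the model is expected to read.
KNOWN_KEYS = {
    "d_ff", "d_model", "e_layers", "factor", "horizon", "n_heads",
    "norm", "seq_len", "patch_len", "stride", "dropout",
}


def build_config(user_params):
    """Overlay user parameters on the defaults and expose them as attributes."""
    merged = {**ASSUMED_DEFAULTS, **user_params}
    return SimpleNamespace(**merged)


def unknown_keys(user_params):
    """Return user-supplied keys that nothing is expected to read."""
    return sorted(set(user_params) - KNOWN_KEYS)


user_params = {"d_model": 512, "n_headers": 2}
config = build_config(user_params)

print(config.n_heads)             # 8: the default, not the intended 2
print(config.n_headers)           # 2: stored, but never read by the model
print(unknown_keys(user_params))  # ['n_headers']
```

A check like unknown_keys, run before the model is built, is one way to surface this kind of typo instead of letting the default win silently.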

5. The default-value issue with patch_len and stride

The signature of PatchTST.__init__ is:

python

# PatchTST.py:30
def __init__(self, config):

The old signature is still there as a commented-out line:

python

# def __init__(self, config, patch_len=16, stride=8):

This indicates that patch_len=16 and stride=8 are the paper's defaults. They are now read from config instead:

python

self.patch_len = config.patch_len
self.stride = config.stride

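Because the constructor no longer supplies fallbacks, both values must already exist on config. The table in section 3 lists defaults of 16 and 8 for them; if those defaults are actually present on the config object, omitting the keys from model_hyper_params is harmless. If they are not, the config object has neither attribute and the two reads above raise AttributeError. Passing patch_len and stride explicitly on the command line avoids depending on either case.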
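The failure mode and one possible guard can be sketched as follows. SimpleNamespace stands in for the adapter's config object, and the getattr fallback is a hypothetical alternative, not what PatchTST.py does.

```python
from types import SimpleNamespace


def read_patch_settings(config):
    """Mirror the two plain attribute reads quoted above."""
    return config.patch_len, config.stride


def read_patch_settings_with_fallback(config, patch_len=16, stride=8):
    """Hypothetical variant that falls back to the paper's defaults."""
    return (
        getattr(config, "patch_len", patch_len),
        getattr(config, "stride", stride),
    )


# A config that carries neither patch_len nor stride.
bare_config = SimpleNamespace(seq_len=96)

try:
    read_patch_settings(bare_config)
except AttributeError as exc:
    print("plain read failed:", exc)

print(read_patch_settings_with_fallback(bare_config))  # (16, 8)
```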

6. padding is hard-coded

python

# PatchTST.py:41
padding = self.stride

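padding is always equal to stride and cannot be configured from outside. This means:

The right-side padding length always equals the stride.
This keeps the information at the end of the sequence from being cut off (the last patch always reaches the end of the original sequence or close to it).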
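The effect of the right-side padding can be seen in a pure-Python sketch. It assumes the padding repeats the last value (replication padding) and that patches are taken with a plain sliding window; both helpers are illustrative stand-ins, not the PatchTST code.

```python
def pad_right(series, stride):
    """Append `stride` copies of the last value (assumed replication padding)."""
    return list(series) + [series[-1]] * stride


def sliding_patches(series, patch_len, stride):
    """Slide a window of length patch_len over series with step stride."""
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


series = list(range(9))  # seq_len = 9, values 0..8

without_padding = sliding_patches(series, patch_len=4, stride=2)
with_padding = sliding_patches(pad_right(series, 2), patch_len=4, stride=2)

print(without_padding[-1])  # [4, 5, 6, 7]: the final value 8 is never covered
print(with_padding[-1])     # [6, 7, 8, 8]: the final value is now inside a patch
print(len(without_padding), len(with_padding))  # 3 4
```

With seq_len=9, patch_len=4, stride=2, the unpadded windows stop one step short of the last value; padding by stride adds exactly one more patch that covers it.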

7. Computing head_nf

python

# PatchTST.py:73-75
self.head_nf = config.d_model * int(
    (config.seq_len - self.patch_len) / self.stride + 2
)

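What the formula means:

(seq_len - patch_len) / stride: without padding, how many stride steps fit after the first patch
+ 2: +1 for the first patch itself (the usual off-by-one in the unfold window count), and +1 for the extra patch that padding adds

Plugging in the toy parameters (d_model=8, seq_len=9, patch_len=4, stride=2):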

head_nf = 8 × int((9-4)/2 + 2)
        = 8 × int(2.5 + 2)
        = 8 × int(4.5)
        = 8 × 4
        = 32

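Python's int() truncates toward zero, which for positive numbers is the same as floor, so here it is equivalent to doing the division with //.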
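As a cross-check, the formula can be compared against a direct count of sliding windows over a sequence padded by stride on the right. The function names are illustrative, and the extra parameter combinations are chosen only for the comparison.

```python
def head_nf(d_model, seq_len, patch_len, stride):
    """The formula quoted above from PatchTST.py."""
    return d_model * int((seq_len - patch_len) / stride + 2)


def patch_count_after_padding(seq_len, patch_len, stride):
    """Number of sliding windows over a sequence padded by stride on the right."""
    padded_len = seq_len + stride
    return (padded_len - patch_len) // stride + 1


print(head_nf(8, 9, 4, 2))  # 32, matching the toy calculation

# With d_model=1 the formula reduces to the patch count itself.
for seq_len, patch_len, stride in [(9, 4, 2), (96, 16, 8), (336, 16, 8)]:
    count = patch_count_after_padding(seq_len, patch_len, stride)
    assert head_nf(1, seq_len, patch_len, stride) == count
    print(seq_len, patch_len, stride, count)
```

For seq_len=96 with patch_len=16 and stride=8, both computations give 12 patches, so d_model=512 would yield head_nf = 6144.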

8. The _init_model() instantiation process

python

# adapters_for_transformers.py
def _init_model(self):
    return self.model_class(self.config)  # equivalent to PatchTST(config)

Only config is passed, with no other arguments. PatchTST.__init__(self, config) takes exactly one argument besides self, so the call works.
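The same calling pattern in isolation, as a sketch: ToyModel and ToyAdapter are hypothetical stand-ins for PatchTST and the adapter, and only the shape of the call is taken from the snippet above.

```python
from types import SimpleNamespace


class ToyModel:
    """Stand-in for PatchTST: the constructor takes a single config object."""

    def __init__(self, config):
        self.n_heads = config.n_heads


class ToyAdapter:
    """Hypothetical adapter that stores a model class and a config."""

    def __init__(self, model_class, config):
        self.model_class = model_class
        self.config = config

    def _init_model(self):
        # Same shape as the snippet above: the config is the only argument.
        return self.model_class(self.config)


adapter = ToyAdapter(ToyModel, SimpleNamespace(n_heads=8))
model = adapter._init_model()
print(type(model).__name__, model.n_heads)
```

Because everything reaches the model through the single config object, any value the model needs has to be an attribute of that object before _init_model() runs.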

9. Next step

Continue with: 02-Level2-数据进入PatchTST
