Layer 1: forecast() main chain

Called by Autoformer.forward() in the long_term_forecast branch.
This layer covers the full execution sequence of forecast(): decomp initialization → embedding → encoder → decoder → merged output.

1. Position in the parent layer

Autoformer.forward()
  └─ self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)   ← this document
       ├─ self.decomp(x_enc)                               → see 03D-Layer2D-SeriesDecomp
       ├─ self.enc_embedding(x_enc, x_mark_enc)            → see 03A-Layer2A-DataEmbedding
       ├─ self.encoder(enc_out)                            → see 03B-Layer2B-Encoder
       ├─ self.dec_embedding(seasonal_init, x_mark_dec)    → see 03A-Layer2A-DataEmbedding
       └─ self.decoder(dec_out, enc_out, trend=trend_init) → see 03C-Layer2C-Decoder

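2. I/O interface

| | shape (toy) | Meaning |
|---|---|---|
| Input `x_enc` | `(2, 12, 5)` | encoder window: B=2, seq_len=12, enc_in=5 |
| Input `x_mark_enc` | `(2, 12, 4)` | encoder time features |
| Input `x_dec` | `(2, 10, 5)` | decoder input (6 historical steps + 4 zero steps) |
| Input `x_mark_dec` | `(2, 10, 4)` | decoder time features |
| Output | `(2, 10, 5)` | trend_part + seasonal_part (full dec_len = label_len + pred_len = 10) |

forward() then slices this output with [:, -4:, :] → (2, 4, 5).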

3. Sequence diagram (concrete layer)

4. Semantic grouping diagram (index layer)

5. Step-by-step reading

5.0 Full original code

```python
def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
    # decomp init
    mean = torch.mean(x_enc, dim=1).unsqueeze(1).repeat(1, self.pred_len, 1)
    zeros = torch.zeros(
        [x_dec.shape[0], self.pred_len, x_dec.shape[2]], device=x_enc.device
    )
    seasonal_init, trend_init = self.decomp(x_enc)
    # decoder input
    trend_init = torch.cat([trend_init[:, -self.label_len :, :], mean], dim=1)
    seasonal_init = torch.cat(
        [seasonal_init[:, -self.label_len :, :], zeros], dim=1
    )
    # enc
    enc_out = self.enc_embedding(x_enc, x_mark_enc)
    enc_out, attns = self.encoder(enc_out, attn_mask=None)
    # dec
    dec_out = self.dec_embedding(seasonal_init, x_mark_dec)
    seasonal_part, trend_part = self.decoder(
        dec_out, enc_out, x_mask=None, cross_mask=None, trend=trend_init
    )
    # final
    dec_out = trend_part + seasonal_part
    return dec_out
```

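5.1 Big picture: why the decoder input is initialized this way

Design rationale from the paper: the two initialization strategies of the generative decoder (Wu et al., NeurIPS 2021 §3.3)

Autoformer's decoder generates all steps in one parallel pass (non-autoregressive), so the pred_len positions must be pre-filled with initial values that the model then refines, rather than starting from random noise. Each of the two paths has a signal-processing justification.

seasonal filled with zeros: required by the zero-mean definition
Classical time-series decomposition defines the seasonal component as zero-mean, $\sum_{t} s_{t} = 0$: the seasonal term captures the periodic deviation from the trend, and a long-run mean of zero is part of that definition.
The best uninformative prior for the future seasonal term is therefore $\hat{s}_{t + h} = 0$. Starting from zero lets AutoCorrelation fill in the prediction by finding lags via FFT and shifting with roll, so the initial value introduces no bias.

trend filled with mean(x_enc): zero-order extrapolation
The trend captures the slowly varying long-term direction. The working assumption is that, when the direction of the trend is unknown, the historical mean is the minimum-variance unbiased estimate:
$\hat{\tau}_{t + h} \approx \bar{x} = \frac{1}{T} \sum_{t = 1}^{T} x_{t}$
In the paper's words: "We use the mean of the encoder input as a rough trend estimation for the prediction part."
Both alternatives are worse: filling with zeros gives a large bias whenever the series mean is nonzero, and filling with the last trend step is sensitive to endpoint noise. The mean is the most robust zero-order baseline, and the decoder's decomp extracts a trend residual in every layer and accumulates it as a refinement.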
Division of labor between the two paths

| | History region (label_len) | Prediction region (pred_len) | Refined by |
|---|---|---|---|
| `seasonal_init` | true historical seasonal component | 0 (no prior) | AutoCorrelation discovering lags |
| `trend_init` | true historical trend | mean(x_enc) (mean baseline) | per-layer decomp accumulation in DecoderLayer |
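One way to make the "mean as zero-order baseline" argument concrete is to compare constant fills by their squared error against the history window. The sketch below is plain Python and not part of Autoformer; the helper name `sse_of_constant` and the three candidates are illustrative assumptions, and the last raw value stands in for "last trend step". It only shows that the mean is the best constant fit to the observed window, which is a weaker statement than any claim about future values.

```python
# Toy encoder window (the toy x_enc[0, :, 0] used elsewhere in these notes).
x = [2, 4, 3, 5, 6, 4, 7, 5, 8, 6, 9, 7]


def sse_of_constant(series, c):
    """Sum of squared errors when every step is approximated by the constant c."""
    return sum((v - c) ** 2 for v in series)


candidates = {
    "zero": 0.0,               # fill with zeros
    "last": float(x[-1]),      # fill with the last observed value (stand-in, see lead-in)
    "mean": sum(x) / len(x),   # fill with the window mean
}

errors = {name: sse_of_constant(x, c) for name, c in candidates.items()}
best = min(errors, key=errors.get)

for name, err in errors.items():
    print(f"{name:5s} fill={candidates[name]:.2f}  sse={err:.2f}")
print("best constant:", best)
```

On this window the mean wins by a wide margin over zero, and by a smaller margin over the last value, which matches the ordering argued in the text.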

A small example (B=1, seq_len=4, label_len=2, pred_len=2, enc_in=1) to check the intuition:

x_enc[0,:,0] = [1, 3, 5, 7]
mean = (1+3+5+7)/4 = 4

seasonal_init raw (len=4): [s0, s1, s2, s3]
trend_init    raw (len=4): [t0, t1, t2, t3]  ← moving_avg output

After concatenation (len=4 = label_len=2 + pred_len=2):
seasonal_init: [s2, s3, 0, 0]     ← last 2 historical steps + zero placeholders
trend_init:    [t2, t3, 4, 4]     ← last 2 historical steps + mean fill

Full shape chain:

decomp(x_enc)            → seasonal_init(2,12,5), trend_init(2,12,5)
mean = mean(x_enc,1)     → (2,5) → unsqueeze(1) → (2,1,5) → repeat(1,4,1) → (2,4,5)
zeros                    → (2,4,5)
trend_init slice+cat     → (2,6,5) + (2,4,5) → (2,10,5)
seasonal_init slice+cat  → (2,6,5) + (2,4,5) → (2,10,5)
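The shape chain can also be written as a small bookkeeping function, which makes it easy to re-derive the shapes for other settings. This is a sketch that only manipulates shape tuples and creates no tensors; the function name `decoder_init_shapes` and the dictionary keys are made up for illustration, and it assumes 0 < label_len <= seq_len.

```python
def decoder_init_shapes(batch, seq_len, label_len, pred_len, channels):
    """Trace the shapes produced while building the decoder inputs.

    Shape bookkeeping only. Assumes 0 < label_len <= seq_len so that
    [:, -label_len:, :] keeps exactly label_len steps.
    """
    if not 0 < label_len <= seq_len:
        raise ValueError("expected 0 < label_len <= seq_len")

    shapes = {}
    shapes["decomp_out"] = (batch, seq_len, channels)   # seasonal_init and trend_init
    shapes["mean_2d"] = (batch, channels)               # torch.mean(x_enc, dim=1)
    shapes["mean_3d"] = (batch, 1, channels)            # .unsqueeze(1)
    shapes["mean"] = (batch, pred_len, channels)        # .repeat(1, pred_len, 1)
    shapes["zeros"] = (batch, pred_len, channels)
    shapes["history"] = (batch, label_len, channels)    # [:, -label_len:, :]
    # torch.cat(..., dim=1) adds the lengths along the time axis.
    shapes["decoder_init"] = (batch, label_len + pred_len, channels)
    return shapes


toy = decoder_init_shapes(batch=2, seq_len=12, label_len=6, pred_len=4, channels=5)
for name, shape in toy.items():
    print(f"{name:13s} {shape}")
```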

5.2 Step 1: decomp initialization

```python
mean = torch.mean(x_enc, dim=1).unsqueeze(1).repeat(1, self.pred_len, 1)
zeros = torch.zeros(
    [x_dec.shape[0], self.pred_len, x_dec.shape[2]], device=x_enc.device
)
seasonal_init, trend_init = self.decomp(x_enc)
```

torch.mean(x_enc, dim=1) averages over the L dimension (12 steps), giving (2, 5); unsqueeze(1) → (2, 1, 5); .repeat(1, 4, 1) → (2, 4, 5), which copies the mean to the 4 prediction positions. zeros has shape (2, 4, 5); x_dec contributes only its batch and channel sizes.

self.decomp(x_enc) returns (seasonal_init, trend_init), both of shape (2, 12, 5).

Toy values (batch=0, var=0): let x_enc[0,:,0] = [2,4,3,5,6,4,7,5,8,6,9,7]; then
mean[0,0,0] = (2+4+...+7)/12 = 66/12 = 5.5.

→ For the internals of decomp(), see [[03D-Layer2D-SeriesDecomp]]
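The mean-then-repeat step can be checked by hand with nested lists standing in for tensors. This is a plain-Python sketch, not the torch code itself; `mean_over_time` and `repeat_over_horizon` are hypothetical helper names, and the input uses B=1, C=1 with the twelve toy values (which sum to 66).

```python
def mean_over_time(x_enc):
    """Average a (B, L, C) nested list over the time axis, giving (B, C)."""
    return [
        [sum(step[c] for step in sample) / len(sample) for c in range(len(sample[0]))]
        for sample in x_enc
    ]


def repeat_over_horizon(mean_bc, pred_len):
    """Copy each (C,) mean vector pred_len times, giving (B, pred_len, C)."""
    return [[list(vec) for _ in range(pred_len)] for vec in mean_bc]


toy_values = [2, 4, 3, 5, 6, 4, 7, 5, 8, 6, 9, 7]
x_enc = [[[v] for v in toy_values]]          # B=1, L=12, C=1

mean_bc = mean_over_time(x_enc)              # like torch.mean(x_enc, dim=1)
mean = repeat_over_horizon(mean_bc, pred_len=4)

print(mean_bc)  # [[5.5]]
print(mean)     # [[[5.5], [5.5], [5.5], [5.5]]]
```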

5.3 Step 2: building the decoder input (concatenation)

```python
trend_init = torch.cat([trend_init[:, -self.label_len :, :], mean], dim=1)
seasonal_init = torch.cat(
    [seasonal_init[:, -self.label_len :, :], zeros], dim=1
)
```

Take the last 6 steps of trend_init ([:, -6:, :] → (2, 6, 5)) and concatenate them with the mean fill mean (2, 4, 5) → (2, 10, 5).

Take the last 6 steps of seasonal_init → (2, 6, 5) and concatenate them with the all-zero zeros (2, 4, 5) → (2, 10, 5).

Illustration:

| Path | History region (label_len=6) | Prediction region (pred_len=4) | Semantics |
|---|---|---|---|
| `seasonal_init` | `seasonal_init[:, -6:, :]`, true historical seasonal component | all zeros | learn seasonality from a blank start |
| `trend_init` | `trend_init[:, -6:, :]`, true historical trend | `mean(x_enc)` repeated | learn the trend from the mean baseline |

Toy numeric trace (batch=0, var=0):

Output of decomp(x_enc) (see [[03D-Layer2D-SeriesDecomp]]; moving average with kernel=3):

trend_init[0, :, 0] = [2.67, 3.00, 4.00, 4.67, 5.00, 5.67, 5.33, 6.67, 6.33, 7.67, 7.33, 7.67]

seasonal_init[0, :, 0] = [-0.67, 1.00, -1.00, 0.33, 1.00, -1.67, 1.67, -1.67, 1.67, -1.67, 1.67, -0.67]
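The two lists above can be reproduced with a short moving-average decomposition. This is a plain-Python sketch under stated assumptions, not the library code: it assumes an odd kernel size and edge-value (replicate) padding of (kernel_size - 1) // 2 steps on each side, and the function names are chosen here for illustration.

```python
def moving_average(series, kernel_size):
    """Moving average with edge-value padding so the output keeps the input length.

    Assumes kernel_size is odd.
    """
    pad = (kernel_size - 1) // 2
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    return [
        sum(padded[i:i + kernel_size]) / kernel_size
        for i in range(len(series))
    ]


def decompose(series, kernel_size):
    """Split a series into (seasonal, trend) with trend = moving average."""
    trend = moving_average(series, kernel_size)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend


x = [2, 4, 3, 5, 6, 4, 7, 5, 8, 6, 9, 7]
seasonal, trend = decompose(x, kernel_size=3)

trend_rounded = [round(v, 2) for v in trend]
seasonal_rounded = [round(v, 2) for v in seasonal]
print(trend_rounded)
print(seasonal_rounded)
```

By construction seasonal + trend gives back the input at every step, which is why the decoder can treat the two streams separately and add them at the end.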

mean[0,:,0] (the mean of the 12 steps of x_enc[0,:,0], $(2 + 4 + 3 + 5 + 6 + 4 + 7 + 5 + 8 + 6 + 9 + 7) / 12 = 66 / 12 = 5.5$, repeated 4 times):

mean[0, :, 0] = [5.5, 5.5, 5.5, 5.5]

Concatenation (taking the last 6 steps, i.e. indices 6..11):

trend_init[:, -6:, :][0,:,0]   = [5.33, 6.67, 6.33, 7.67, 7.33, 7.67]   ← history region
mean[0,:,0]                    = [5.5,  5.5,  5.5,  5.5 ]                ← prediction region
─────────────────────────────────────────────────────────────────
trend_init (after cat)[0,:,0]  = [5.33, 6.67, 6.33, 7.67, 7.33, 7.67, 5.5, 5.5, 5.5, 5.5]
                                    ←────── label_len=6 ──────→  ←── pred_len=4 ──→

seasonal_init[:, -6:, :][0,:,0]= [1.67, -1.67, 1.67, -1.67, 1.67, -0.67] ← history region
zeros[0,:,0]                   = [0,    0,     0,    0   ]                 ← prediction region
─────────────────────────────────────────────────────────────────
seasonal_init (after cat)[0,:,0] = [1.67, -1.67, 1.67, -1.67, 1.67, -0.67, 0, 0, 0, 0]
                                    ←────── label_len=6 ──────→  ←── pred_len=4 ──→
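The slice-and-concatenate step can be replayed on plain lists, with list concatenation standing in for torch.cat(..., dim=1). A minimal sketch: the decomp outputs are copied from the rounded toy lists in this section, and the mean is recomputed from the twelve raw toy values (66 / 12).

```python
# Raw toy history for batch=0, var=0.
x = [2, 4, 3, 5, 6, 4, 7, 5, 8, 6, 9, 7]

# decomp outputs as listed in the notes (kernel=3 moving average, 2 decimals).
trend = [2.67, 3.00, 4.00, 4.67, 5.00, 5.67, 5.33, 6.67, 6.33, 7.67, 7.33, 7.67]
seasonal = [-0.67, 1.00, -1.00, 0.33, 1.00, -1.67, 1.67, -1.67, 1.67, -1.67, 1.67, -0.67]

label_len, pred_len = 6, 4
mean_value = sum(x) / len(x)  # 66 / 12

# History region: last label_len steps. Prediction region: mean fill or zero fill.
trend_init = trend[-label_len:] + [mean_value] * pred_len
seasonal_init = seasonal[-label_len:] + [0.0] * pred_len

print(trend_init)
print(seasonal_init)
print(len(trend_init), len(seasonal_init))  # 10 10
```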

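Comparing the initial values of the two paths

The fill chosen for the prediction region (the last 4 steps) reflects a different prior on each path:
trend_init is filled with 5.5 (the historical mean): the trend term carries a first-order stationarity prior, the mean is the simplest uninformative baseline, and the decoder's decomp refines from there
seasonal_init is filled with 0: the seasonal term is zero-mean, so it starts from a blank and is determined entirely by AutoCorrelation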

5.4 Step 3: encoder path

```python
enc_out = self.enc_embedding(x_enc, x_mark_enc)
enc_out, attns = self.encoder(enc_out, attn_mask=None)
```

enc_embedding maps (2,12,5)+(2,12,4) → (2,12,8); the encoder runs 2 EncoderLayers (each with AutoCorrelation + 2× decomp) and keeps the shape at (2,12,8).

→ For the embedding, see [[03A-Layer2A-DataEmbedding]]; for the encoder, see [[03B-Layer2B-Encoder]]

5.5 Step 4: decoder path

```python
dec_out = self.dec_embedding(seasonal_init, x_mark_dec)
seasonal_part, trend_part = self.decoder(
    dec_out, enc_out, x_mask=None, cross_mask=None, trend=trend_init
)
```

dec_embedding maps (2,10,5)+(2,10,4) → (2,10,8); the decoder takes the seasonal stream ((2,10,8)) and the initial trend (trend_init, (2,10,5)) and, after 1 DecoderLayer, returns two outputs:

seasonal_part (2, 10, 5): the seasonal component, projected by the decoder's final Linear(8→5)
trend_part (2, 10, 5): the trend component, the accumulated result of the sum of the 3 decomp extractions after Conv1d projection

→ See [[03C-Layer2C-Decoder]]

5.6 Step 5: merged output

```python
dec_out = trend_part + seasonal_part
return dec_out
```

The two outputs are added element-wise → (2, 10, 5). forward() then slices [:, -4:, :] → (2, 4, 5).

Toy values (batch=0, t=6, var=0): seasonal_part[0,6,0]=0.12, trend_part[0,6,0]=5.03, dec_out[0,6,0]=5.15.
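The merge and the later slice in forward() can be sketched on nested lists to show which positions survive. This is an illustrative plain-Python stand-in for the tensor operations; `merge_and_slice` is a hypothetical helper, and every position other than t=6 is a zero placeholder chosen for this sketch, not a model output.

```python
def merge_and_slice(trend_part, seasonal_part, pred_len):
    """Add two (B, L, C) nested lists element-wise, then keep the last pred_len steps."""
    merged = [
        [
            [t + s for t, s in zip(trend_step, seasonal_step)]
            for trend_step, seasonal_step in zip(trend_sample, seasonal_sample)
        ]
        for trend_sample, seasonal_sample in zip(trend_part, seasonal_part)
    ]
    forecast = [sample[-pred_len:] for sample in merged]  # like [:, -pred_len:, :]
    return merged, forecast


# B=1, L=10, C=1. Only t=6 carries the toy values from the text.
trend_part = [[[0.0] for _ in range(10)]]
seasonal_part = [[[0.0] for _ in range(10)]]
trend_part[0][6][0] = 5.03
seasonal_part[0][6][0] = 0.12

merged, forecast = merge_and_slice(trend_part, seasonal_part, pred_len=4)
print(len(merged[0]), len(forecast[0]))  # 10 4
print(round(forecast[0][0][0], 2))       # t=6 is the first of the 4 forecast steps
```

With label_len=6 and pred_len=4, index 6 of the full decoder output is the first step that survives the slice, so the toy value at t=6 is the first forecast value.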

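6. Drill-down subcomponents

| Subcomponent | Responsibility | Lower-layer document |
|---|---|---|
| `series_decomp` | moving_avg decomposition into trend and seasonal | [[03D-Layer2D-SeriesDecomp]] |
| `DataEmbedding_wo_pos` | Token + Temporal embedding (no positional encoding) | [[03A-Layer2A-DataEmbedding]] |
| `Encoder` | 2 EncoderLayers + my_Layernorm | [[03B-Layer2B-Encoder]] |
| `Decoder` | 1 DecoderLayer + trend accumulation + projection | [[03C-Layer2C-Decoder]] |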
