Layer 3: EncoderLayer close reading

Called 2 times in a loop by Encoder.forward() ([[03B-Layer2B-Encoder]]).
This layer covers: AutoCorrelation self-attention + residual + decomp1 + FFN + residual + decomp2.

1. Position in the parent layer

Encoder.forward()
  └─ for attn_layer in self.attn_layers:
       x, attn = attn_layer(x)   ← EncoderLayer (this document)
            └─ self.attention(x, x, x)  → see 04-Layer4-AutoCorrelationLayer

2. I/O interface definition

Tensor	shape	Meaning
Input `x`	`(2, 12, 8)`	Seasonal representation at the current layer
Output `res`	`(2, 12, 8)`	Seasonal representation refined by AutoCorrelation + 2× decomp
Output `attn`	`None`	`None` when `output_attention=False`

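3. Sequence diagram (concrete level)

4. Semantic grouping diagram (index level)

Each decomp in the Encoder keeps only seasonal and discards trend outright. This contrasts with the Decoder, which accumulates every trend.

5. Step-by-step close reading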

5.0 Complete original code

python

class EncoderLayer(nn.Module):
    def __init__(
        self,
        attention,
        d_model,
        d_ff=None,
        moving_avg=25,
        dropout=0.1,
        activation="relu",
    ):
        super(EncoderLayer, self).__init__()
        d_ff = d_ff or 4 * d_model
        self.attention = attention
        self.conv1 = nn.Conv1d(
            in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False
        )
        self.conv2 = nn.Conv1d(
            in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False
        )
        self.decomp1 = series_decomp(moving_avg)
        self.decomp2 = series_decomp(moving_avg)
        self.dropout = nn.Dropout(dropout)
        self.activation = F.relu if activation == "relu" else F.gelu

    def forward(self, x, attn_mask=None):
        new_x, attn = self.attention(x, x, x, attn_mask=attn_mask)
        x = x + self.dropout(new_x)
        x, _ = self.decomp1(x)
        y = x
        y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
        y = self.dropout(self.conv2(y).transpose(-1, 1))
        res, _ = self.decomp2(x + y)
        return res, attn

5.1 Residual block 1: AutoCorrelation + decomp1

Step 1: AutoCorrelation attention

python

new_x, attn = self.attention(x, x, x, attn_mask=attn_mask)
x = x + self.dropout(new_x)

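self.attention is an AutoCorrelationLayer. It receives Q=K=V=x (self-attention) and returns new_x (2,12,8). Dropout is applied to new_x, and after the residual addition x (2,12,8) carries the information aggregated by attention.

→ For AutoCorrelationLayer internals, see [[04-Layer4-AutoCorrelationLayer]]

Step 2: decomp1 decomposition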

python

x, _ = self.decomp1(x)

decomp1(x) returns (seasonal, trend1). The `_` receives and discards trend1 (2,12,8); x keeps the seasonal component (2,12,8).
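The decomposition step can be sketched as follows. This is a minimal illustration, not the project's `series_decomp`: the function name `series_decomp_sketch` is hypothetical, and padding the series by repeating its first and last time steps is an assumption made here so that the trend keeps length L (see [[03D-Layer2D-SeriesDecomp]] for the actual implementation).

```python
import torch
import torch.nn.functional as F


def series_decomp_sketch(x: torch.Tensor, kernel_size: int = 25):
    """Split x of shape (B, L, C) into (seasonal, trend).

    Assumptions (not taken from this document): kernel_size is odd, and
    the series is padded by repeating its first and last time steps.
    """
    pad = (kernel_size - 1) // 2
    front = x[:, :1, :].repeat(1, pad, 1)
    end = x[:, -1:, :].repeat(1, pad, 1)
    padded = torch.cat([front, x, end], dim=1)  # (B, L + 2*pad, C)
    # avg_pool1d pools over the last axis, so move time there and back.
    trend = F.avg_pool1d(padded.transpose(1, 2), kernel_size, stride=1)
    trend = trend.transpose(1, 2)               # (B, L, C)
    seasonal = x - trend
    return seasonal, trend


torch.manual_seed(0)
x = torch.randn(2, 12, 8)
seasonal, trend = series_decomp_sketch(x, kernel_size=25)

# EncoderLayer keeps only the first return value: x, _ = decomp1(x)
flat = torch.full((2, 12, 8), 3.0)
flat_seasonal, flat_trend = series_decomp_sketch(flat)
```

By construction seasonal + trend reproduces x, so discarding `_` removes exactly the moving-average part. For the constant input `flat`, the whole signal is trend and the seasonal part is zero.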

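Why does the Encoder discard trend?

The Encoder's job is to extract the periodic patterns of the input sequence for use by the Decoder's cross-attention. The trend component is a low-frequency, long-term movement and contributes little lag information to time-delay aggregation (AutoCorrelation). Discarding trend lets the Encoder focus on seasonal features, so it passes a cleaner periodic signal to the Decoder through cross-attention.
The Decoder, by contrast, must also maintain the trend (the final prediction needs trend + seasonal), so all 3 decomp calls in DecoderLayer keep the trend and accumulate it.

5.2 Residual block 2: FFN + decomp2

Step 3: Conv1d FFN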

python

y = x
y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
y = self.dropout(self.conv2(y).transpose(-1, 1))

conv1 = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=1, bias=False)
conv2 = nn.Conv1d(in_channels=16, out_channels=8, kernel_size=1, bias=False)

Conv1d expects (B, C, L), whereas y is (2, 12, 8) in (B, L, C) layout:

y.transpose(-1, 1):      (2, 12, 8) → (2, 8, 12)
conv1:                   (2, 8, 12) → (2, 16, 12)
activation + dropout:    (2, 16, 12)
conv2:                   (2, 16, 12) → (2, 8, 12)
.transpose(-1, 1):       (2, 8, 12) → (2, 12, 8)

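A Conv1d with kernel_size=1 is equivalent to a position-wise Linear (each time step is processed independently), so it has the same semantics as the Transformer FFN. Expressing it as Conv1d is an implementation choice that does not change the computation; it is only the reason the two transposes are needed.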
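The equivalence can be checked numerically. The sketch below assumes the toy sizes used in this document (d_model=8, d_ff=16) and copies the Conv1d weight into an `nn.Linear` for comparison; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_ff = 8, 16

conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1, bias=False)
linear = nn.Linear(d_model, d_ff, bias=False)

# Conv1d weight has shape (out, in, 1); Linear weight has shape (out, in).
with torch.no_grad():
    linear.weight.copy_(conv1.weight.squeeze(-1))

y = torch.randn(2, 12, 8)                               # (B, L, C)
via_conv = conv1(y.transpose(-1, 1)).transpose(-1, 1)   # (2, 12, 16)
via_linear = linear(y)                                  # (2, 12, 16)

same = torch.allclose(via_conv, via_linear, atol=1e-5)
print(via_conv.shape, same)
```

With shared weights, both paths apply the same 8 → 16 projection to every time step, which is why the Conv1d version needs a transpose before and after while the Linear version does not.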

Toy values (batch=0, t=0): y[0,0,:] = [a0,...,a7] (8-dim) → conv1 expands to 16 → activation (relu or gelu) → conv2 projects back to 8 → y[0,0,:] = [b0,...,b7].

Step 4: decomp2

python

res, _ = self.decomp2(x + y)

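x + y is the result of the residual connection, (2, 12, 8). decomp2 decomposes it again: `_` discards trend2 and res keeps seasonal.

The final res (2, 12, 8) is this layer's output: a seasonal representation obtained after AutoCorrelation aggregation and two rounds of trend removal.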
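The four steps can be run end to end to confirm the I/O contract from section 2. This is a sketch under stated assumptions: `IdentityAttention` is a hypothetical stand-in for AutoCorrelationLayer that returns its values unchanged, `SeriesDecompSketch` assumes edge-replication padding, and `EncoderLayerSketch` mirrors the forward pass in 5.0 with relu fixed as the activation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IdentityAttention(nn.Module):
    """Hypothetical stand-in for AutoCorrelationLayer (no real attention)."""

    def forward(self, q, k, v, attn_mask=None):
        return v, None  # attn is None, as with output_attention=False


class SeriesDecompSketch(nn.Module):
    """Moving-average decomposition; edge-replication padding is assumed."""

    def __init__(self, kernel_size):
        super().__init__()
        self.kernel_size = kernel_size

    def forward(self, x):
        pad = (self.kernel_size - 1) // 2
        front = x[:, :1, :].repeat(1, pad, 1)
        end = x[:, -1:, :].repeat(1, pad, 1)
        padded = torch.cat([front, x, end], dim=1)
        trend = F.avg_pool1d(padded.transpose(1, 2), self.kernel_size, stride=1)
        trend = trend.transpose(1, 2)
        return x - trend, trend


class EncoderLayerSketch(nn.Module):
    def __init__(self, attention, d_model, d_ff, moving_avg=25, dropout=0.1):
        super().__init__()
        self.attention = attention
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1, bias=False)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size=1, bias=False)
        self.decomp1 = SeriesDecompSketch(moving_avg)
        self.decomp2 = SeriesDecompSketch(moving_avg)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        new_x, attn = self.attention(x, x, x, attn_mask=attn_mask)
        x = x + self.dropout(new_x)           # step 1: attention + residual
        x, _ = self.decomp1(x)                # step 2: drop trend1
        y = self.dropout(F.relu(self.conv1(x.transpose(-1, 1))))
        y = self.dropout(self.conv2(y).transpose(-1, 1))  # step 3: FFN
        res, _ = self.decomp2(x + y)          # step 4: drop trend2
        return res, attn


torch.manual_seed(0)
layer = EncoderLayerSketch(IdentityAttention(), d_model=8, d_ff=16).eval()
x = torch.randn(2, 12, 8)
with torch.no_grad():
    res, attn = layer(x)
print(res.shape, attn)
```

Calling `.eval()` disables dropout so the run is deterministic. Swapping `IdentityAttention` for the real AutoCorrelationLayer should leave the shapes unchanged, since the layer only requires that attention return a tensor shaped like its input.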

6. Drill-down subcomponents

Subcomponent	Responsibility	Lower-level document
`AutoCorrelationLayer`	Linear Q/K/V + view into multiple heads + FFT cross-correlation	[[04-Layer4-AutoCorrelationLayer]]
`series_decomp` (decomp1, decomp2)	moving_avg → trend; x − trend → seasonal	[[03D-Layer2D-SeriesDecomp]]
