Layer 2B: Encoder (Distilling Scheduling Layer)

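Step ④ of the parent layer (Layer 1): self.encoder(enc_out, attn_mask=None).
This document covers only the Encoder.forward level (distilling branch routing + loop scheduling + final LayerNorm).
For the child layer EncoderLayer and everything below it, see 03B1-Layer3-EncoderLayer.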

1. Position in the parent layer

long_forecast()
  └─ ④ enc_out, attns = self.encoder(enc_out, attn_mask=None)   ← this document
          ├─ EncoderLayer 0 + ConvLayer 0   (seq: 10 → 6)       → see Layer3
          └─ EncoderLayer 1 (last layer)    (seq: 6 → 6)        → see Layer3

2. I/O interface definition

python

def forward(self, x, attn_mask=None, tau=None, delta=None):

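| | Shape (toy) | Meaning |
|---|---|---|
| Input `x` | `(3, 10, 8)` = `(B, seq_len, d_model)` | Output of the encoder embedding |
| Output `x` | `(3, 6, 8)` | Encoder representation after distilling (seq 10→6) |
| Output `attns` | `[None, None]` | List of attention weights (`output_attention=False`) |

In this call attn_mask, tau, and delta are all None and are passed through to EncoderLayer unchanged.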

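3. Sequence diagram (concrete level)

4. Semantic grouping diagram (index level)

Encoder only does "management": choose a path → schedule EncoderLayer + ConvLayer round by round → finish with a normalization.

5. Step-by-step walkthrough

5.0 Full original code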

python

import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
        super(Encoder, self).__init__()
        self.attn_layers = nn.ModuleList(attn_layers)
        self.conv_layers = (
            nn.ModuleList(conv_layers) if conv_layers is not None else None
        )
        self.norm = norm_layer

    def forward(self, x, attn_mask=None, tau=None, delta=None):
        # x [B, L, D]
        attns = []
        if self.conv_layers is not None:
            # distilling branch: alternate attention and sequence compression
            for i, (attn_layer, conv_layer) in enumerate(
                zip(self.attn_layers, self.conv_layers)
            ):
                # delta is only forwarded to the first attention layer
                delta = delta if i == 0 else None
                x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
                x = conv_layer(x)
                attns.append(attn)
            # the last attention layer has no paired conv layer
            x, attn = self.attn_layers[-1](x, tau=tau, delta=None)
            attns.append(attn)
        else:
            # plain stacking: attention layers only
            for attn_layer in self.attn_layers:
                x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
                attns.append(attn)

        if self.norm is not None:
            x = self.norm(x)
        return x, attns

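5.1 Branch routing (if/else)

Purpose of this section

Whether conv_layers is None decides which path runs: the distilling path (Informer) or the plain stacking path (PatchTST and similar models).

| Branch | Condition | Iteration logic | Applicable models |
|---|---|---|---|
| `if` (distilling) | `conv_layers is not None` | attn_layer and conv_layer alternate; attn_layers[-1] then runs once on its own | Informer |
| `else` (plain stacking) | `conv_layers is None` | attn_layer only, chained one after another | PatchTST / generic Transformer |

Informer is constructed with conv_layers=[ConvLayer_0] (e_layers-1=1 layer), so it takes the if branch.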
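The routing rule can be sketched without PyTorch. The helper `plan_stages` below is hypothetical (it is not part of the source); it assumes, as stated above, that a distilling encoder receives `e_layers - 1` conv layers, and it lists the order of stages each branch would run.

```python
def plan_stages(e_layers, distil):
    """Return the ordered stage names an Encoder-like module would run.

    Hypothetical helper for illustration only; the names are not from the source.
    """
    attn = [f"EncoderLayer_{i}" for i in range(e_layers)]
    # Assumption: distilling means e_layers - 1 conv layers, otherwise None.
    conv = [f"ConvLayer_{i}" for i in range(e_layers - 1)] if distil else None

    stages = []
    if conv is not None:
        # Distilling branch: zip stops at the shorter list (the conv layers).
        for a, c in zip(attn, conv):
            stages += [a, c]
        stages.append(attn[-1])  # last EncoderLayer runs without a ConvLayer
    else:
        stages = list(attn)  # plain stacking
    return stages


print(plan_stages(2, True))
print(plan_stages(2, False))
```

With the toy setting `e_layers=2`, the distilling plan is EncoderLayer_0, ConvLayer_0, EncoderLayer_1, which matches the tree in section 1.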

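5.2 Distilling loop scheduling

Purpose of this section

zip(attn_layers, conv_layers) pairs each EncoderLayer with a ConvLayer and loops over the pairs, alternating "attention → compression"; afterwards attn_layers[-1] is called on its own to run the last EncoderLayer.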

python

for i, (attn_layer, conv_layer) in enumerate(
    zip(self.attn_layers, self.conv_layers)
):
    delta = delta if i == 0 else None
    x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
    x = conv_layer(x)
    attns.append(attn)
x, attn = self.attn_layers[-1](x, tau=tau, delta=None)
attns.append(attn)

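zip(self.attn_layers, self.conv_layers) loops over e_layers-1=1 pair of (EncoderLayer, ConvLayer). In each round, EncoderLayer applies attention (shape unchanged) and ConvLayer compresses the sequence dimension.

After the loop, attn_layers[-1] is called on its own (the last EncoderLayer, which has no paired ConvLayer).

delta carries its real value only when i==0 and is set to None afterwards, because delta is an external offset signal meant for the first encoder layer.

x: (3,10,8) → EncoderLayer 0 → (3,10,8) → ConvLayer 0 → (3,6,8)
                                                           ↑
                                                     seq 10→6 (distilling)
(3,6,8) → EncoderLayer 1 (attn_layers[-1]) → (3,6,8)

attns = [None, None]

For the internals of EncoderLayer, see 03B1-Layer3-EncoderLayer.
For the sequence-compression logic of ConvLayer, see §5.3 ConvLayer below.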
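The scheduling above can be traced on shape tuples alone. The sketch below uses hypothetical stub classes (`StubEncoderLayer`, `StubConvLayer`) in place of the real modules, and the length rule inside `StubConvLayer` is an assumption chosen to reproduce the 10 → 6 step; only the control flow in `run_distilling` mirrors the source.

```python
class StubEncoderLayer:
    """Shape-preserving stand-in for EncoderLayer; records the delta it receives."""

    def __init__(self):
        self.seen_delta = "not called"

    def __call__(self, shape, attn_mask=None, tau=None, delta=None):
        self.seen_delta = delta
        return shape, None  # (output shape, attention weights)


class StubConvLayer:
    """Stand-in for ConvLayer that only models the sequence-length change."""

    def __call__(self, shape):
        b, seq_len, d_model = shape
        # Assumed length rule; it reproduces the 10 -> 6 step quoted in the text.
        return (b, (seq_len + 1) // 2 + 1, d_model)


def run_distilling(shape, attn_layers, conv_layers, attn_mask=None, tau=None, delta=None):
    """Mirror the control flow of the distilling branch on shape tuples."""
    attns = []
    for i, (attn_layer, conv_layer) in enumerate(zip(attn_layers, conv_layers)):
        delta = delta if i == 0 else None  # only the first layer sees delta
        shape, attn = attn_layer(shape, attn_mask=attn_mask, tau=tau, delta=delta)
        shape = conv_layer(shape)
        attns.append(attn)
    # Last EncoderLayer: no paired ConvLayer, delta forced to None.
    shape, attn = attn_layers[-1](shape, tau=tau, delta=None)
    attns.append(attn)
    return shape, attns


layers = [StubEncoderLayer(), StubEncoderLayer()]
out_shape, attns = run_distilling((3, 10, 8), layers, [StubConvLayer()], delta="offset")
print(out_shape, attns, [layer.seen_delta for layer in layers])
```

The trace ends at shape `(3, 6, 8)` with one `attns` entry per EncoderLayer, and only the first stub sees the non-None `delta`.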

5.3 ConvLayer: sequence compression (the core of distilling)

Transformer_EncDec.py:6 (__init__) / Transformer_EncDec.py:20 (forward)

Conv1d(k=3, circular) → BatchNorm → ELU → MaxPool(stride=2) compresses seq from 10 to 6:

(3,10,8) → permute → (3,8,10) → Conv1d → (3,8,12) → BN+ELU → MaxPool → (3,8,6) → transpose → (3,6,8)

Full step-by-step reading (with toy values, the circular padding derivation, and the MaxPool window calculation) → [[03B2-Layer3-ConvLayer]]
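The 10 → 12 → 6 lengths follow from the standard output-length formulas for a 1-D convolution and a 1-D max pool. The sketch below checks that arithmetic; the conv padding of 2 and the pool kernel size 3 with padding 1 are assumptions chosen because they reproduce the shapes quoted above, so confirm them against Transformer_EncDec.py before relying on them. The function names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, padding, stride=1, dilation=1):
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def maxpool1d_out_len(length, kernel_size, stride, padding=0):
    """Output length of a 1-D max pool with dilation 1 (floor mode)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def distill_len(seq_len):
    """Sequence length after one ConvLayer, under the assumed hyperparameters."""
    # Assumed: Conv1d(kernel_size=3, padding=2), MaxPool1d(kernel_size=3, stride=2, padding=1)
    after_conv = conv1d_out_len(seq_len, kernel_size=3, padding=2)
    return maxpool1d_out_len(after_conv, kernel_size=3, stride=2, padding=1)


print(conv1d_out_len(10, kernel_size=3, padding=2))  # length after Conv1d
print(distill_len(10))                               # length after MaxPool
```

BatchNorm and ELU act element-wise on the sequence axis, so they do not appear in the length calculation.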

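5.4 Final normalization

Purpose of this section

After all EncoderLayer and ConvLayer calls have run, a single LayerNorm is applied to the final token sequence so that the value range is consistent before the result is returned to the parent layer.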

python

if self.norm is not None:
    x = self.norm(x)
return x, attns

self.norm = LayerNorm(8); it normalizes over the last dimension and leaves the shape unchanged at (3,6,8).
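As a rough illustration of what normalizing over the last dimension means, the sketch below applies the layer-norm formula to a single token vector of length 8 in plain Python. It is a simplification: the function `layer_norm` is hypothetical and omits the learnable scale and shift that `nn.LayerNorm` also carries, and the token values are made up.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize one token vector over its feature dimension (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)  # biased variance
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


token = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # one token, d_model = 8 (made-up values)
out = layer_norm(token)

out_mean = sum(out) / len(out)
out_var = sum((v - out_mean) ** 2 for v in out) / len(out)
print(len(out), round(out_mean, 6), round(out_var, 4))
```

Each of the 3 × 6 token vectors is normalized independently in this way, which is why the output shape stays (3,6,8).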

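6. Drill-down: child components

| Child component | Responsibility | Lower-level document |
|---|---|---|
| `EncoderLayer` (`attn_layer`) | Transformer block: ProbSparse attention with residual + FFN with residual | [[03B1-Layer3-EncoderLayer]] |
| `ConvLayer` (`conv_layer`) | Conv1d + BN + ELU + MaxPool(stride=2); sequence compression (the core of distilling) | [[03B2-Layer3-ConvLayer]] |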
