
Layer 2C: Decoder Close Reading

Called from the forecast() main chain ([[02-Layer1-forecast主链]]):
seasonal_part, trend_part = self.decoder(dec_out, enc_out, x_mask=None, cross_mask=None, trend=trend_init)


1. Position in the parent layer

forecast()
  └─ self.decoder(dec_out, enc_out, trend=trend_init)  ← this document
       └─ DecoderLayer[0](x, cross)                   → see 03C1-Layer3-DecoderLayer

2. I/O interface

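| Tensor | Shape | Meaning |
| --- | --- | --- |
| Input x | (2, 10, 8) | dec_embedding output (seasonal path) |
| Input cross | (2, 12, 8) | encoder output (keys and values for cross-attention) |
| Input trend | (2, 10, 5) | initial trend (the trend_init built by forecast) |
| Output seasonal_part | (2, 10, 5) | seasonal component (after the d_model→c_out projection) |
| Output trend_part | (2, 10, 5) | trend component (trend_init + accumulated residual_trend) |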
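
The shapes in this table can be checked with a small run. This is a sketch under stated assumptions: the Decoder class is copied from the listing in section 5.0, ZeroTrendLayer is a hypothetical stub that stands in for the real DecoderLayer, and plain nn.LayerNorm stands in for my_Layernorm.

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):
    # Copied from the source listing in section 5.0.
    def __init__(self, layers, norm_layer=None, projection=None):
        super(Decoder, self).__init__()
        self.layers = nn.ModuleList(layers)
        self.norm = norm_layer
        self.projection = projection

    def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None):
        for layer in self.layers:
            x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
            trend = trend + residual_trend
        if self.norm is not None:
            x = self.norm(x)
        if self.projection is not None:
            x = self.projection(x)
        return x, trend


class ZeroTrendLayer(nn.Module):
    """Hypothetical stub: passes x through and emits a zero trend of width c_out."""

    def __init__(self, c_out):
        super().__init__()
        self.c_out = c_out

    def forward(self, x, cross, x_mask=None, cross_mask=None):
        return x, x.new_zeros(x.shape[0], x.shape[1], self.c_out)


d_model, c_out = 8, 5
decoder = Decoder(
    [ZeroTrendLayer(c_out)],
    norm_layer=nn.LayerNorm(d_model),     # assumption: stand-in for my_Layernorm
    projection=nn.Linear(d_model, c_out),
)

x = torch.randn(2, 10, d_model)           # dec_embedding output
cross = torch.randn(2, 12, d_model)       # encoder output
trend_init = torch.randn(2, 10, c_out)    # initial trend

seasonal_part, trend_part = decoder(x, cross, trend=trend_init)
print(seasonal_part.shape, trend_part.shape)
```

Because the stub contributes a zero trend, trend_part equals trend_init here; with a real DecoderLayer it would differ by the accumulated residual_trend.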

3. Sequence diagram (concrete level)


4. Semantic grouping diagram (index level)


5. Step-by-step close reading

5.0 Complete original code

```python
class Decoder(nn.Module):
    def __init__(self, layers, norm_layer=None, projection=None):
        super(Decoder, self).__init__()
        self.layers = nn.ModuleList(layers)
        self.norm = norm_layer
        self.projection = projection

    def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None):
        for layer in self.layers:
            x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
            trend = trend + residual_trend

        if self.norm is not None:
            x = self.norm(x)

        if self.projection is not None:
            x = self.projection(x)
        return x, trend
```

5.1 DecoderLayer loop + trend accumulation

```python
for layer in self.layers:
    x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
    trend = trend + residual_trend
```

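With d_layers=1 the loop runs exactly once. DecoderLayer returns two tensors:

  • x (2,10,8): the updated seasonal representation
  • residual_trend (2,10,5): the sum of the trends extracted by this layer's three decomp steps, projected to c_out=5 by a Conv1d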

trend = trend_init + residual_trend: the initial trend (padded with the historical mean) plus the trend increment extracted by this layer gives the final trend_part (2,10,5).

If d_layers=2 (a tunable hyperparameter), the loop runs twice and trend = trend + residual_trend is applied on each pass, so the trend is refined progressively.
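
The accumulation can be reproduced in isolation with two layers. A minimal sketch, assuming a hypothetical StubDecoderLayer that returns x unchanged plus a constant trend increment; the values 5.0 and 0.03 follow the toy numbers used in these notes, while the second increment of 0.01 is made up for illustration.

```python
import torch
import torch.nn as nn


class StubDecoderLayer(nn.Module):
    """Hypothetical stand-in for DecoderLayer with a constant trend increment."""

    def __init__(self, c_out, increment):
        super().__init__()
        self.c_out = c_out
        self.increment = increment

    def forward(self, x, cross, x_mask=None, cross_mask=None):
        batch, length, _ = x.shape
        residual_trend = torch.full((batch, length, self.c_out), self.increment)
        return x, residual_trend


def accumulate_trend(layers, x, cross, trend):
    # Same loop as Decoder.forward, without the norm and projection steps.
    for layer in layers:
        x, residual_trend = layer(x, cross)
        trend = trend + residual_trend  # out-of-place: the caller's tensor is untouched
    return x, trend


x = torch.randn(2, 10, 8)
cross = torch.randn(2, 12, 8)
trend_init = torch.full((2, 10, 5), 5.0)

# d_layers=2: two increments are added on top of trend_init.
layers = nn.ModuleList([StubDecoderLayer(5, 0.03), StubDecoderLayer(5, 0.01)])
_, trend_part = accumulate_trend(layers, x, cross, trend_init)

print(trend_part.shape)
print(trend_part[0, 6, 0].item())  # approximately 5.04 in float32
```

Since the update is out-of-place, trend_init keeps its original values after the call, which matters if forecast() reuses it.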

→ For DecoderLayer internals, see [[03C1-Layer3-DecoderLayer]]


5.2 my_Layernorm + Linear projection

```python
if self.norm is not None:
    x = self.norm(x)

if self.projection is not None:
    x = self.projection(x)
return x, trend
```

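my_Layernorm(x) works as in the Encoder: apply LayerNorm, then subtract the mean over the time dimension, which preserves the seasonal character of the representation.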
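
The two steps described above can be sketched as follows. This is an illustrative reimplementation, not the source listing: the class name MyLayerNormSketch is hypothetical, and it assumes the mean is taken over dim=1 (time) of a (batch, length, d_model) tensor.

```python
import torch
import torch.nn as nn


class MyLayerNormSketch(nn.Module):
    """LayerNorm over channels, then remove each series' mean along time."""

    def __init__(self, channels):
        super().__init__()
        self.layernorm = nn.LayerNorm(channels)

    def forward(self, x):
        x_hat = self.layernorm(x)                    # (B, L, d_model)
        time_mean = x_hat.mean(dim=1, keepdim=True)  # (B, 1, d_model)
        return x_hat - time_mean                     # broadcast over L


norm = MyLayerNormSketch(8)
out = norm(torch.randn(2, 10, 8))

print(out.shape)
print(out.mean(dim=1).abs().max().item())  # close to zero
```

After the subtraction every channel has (numerically) zero mean over the 10 time steps, so no constant level leaks into the seasonal path.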

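self.projection = nn.Linear(d_model, c_out), with d_model=8 and c_out=5, projects the seasonal representation from d_model dimensions back to the number of original variables:

$$\text{seasonal\_part} = x W_{\text{proj}}^{T}, \qquad W_{\text{proj}} \in \mathbb{R}^{5 \times 8}$$

  • input x (2, 10, 8) → Linear(8→5) → seasonal_part (2, 10, 5)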

trend_part does not pass through projection (it was already projected to c_out=5 by the Conv1d inside DecoderLayer) and is returned as is, with shape (2, 10, 5).
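
The projection step can be checked against the matrix form directly. A small sketch, assuming the default nn.Linear settings; note that nn.Linear also adds a bias term by default, which the formula in these notes leaves out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

projection = nn.Linear(8, 5)  # weight has shape (c_out, d_model) = (5, 8)
x = torch.randn(2, 10, 8)     # seasonal representation after the norm

seasonal_part = projection(x)

# Manual equivalent: x @ W^T + b, applied to the last dimension only.
manual = x @ projection.weight.T + projection.bias

print(projection.weight.shape)
print(seasonal_part.shape)
print(torch.allclose(seasonal_part, manual, atol=1e-5))
```

The linear layer acts on the last dimension only, so the batch and time dimensions (2, 10) pass through unchanged.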

Toy values: trend_part[0,6,0] = trend_init[0,6,0] + residual_trend[0,6,0] = 5.0 + 0.03 = 5.03


6. Drill-down subcomponents

| Subcomponent | Responsibility | Lower-level document |
| --- | --- | --- |
| DecoderLayer | 3 stages (masked self-attn + cross-attn + FFN) + 3× decomp + trend routing | [[03C1-Layer3-DecoderLayer]] |
