
Layer 2B: PastDecomposableMixing (PDM)

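1. Position in the parent layer

Inside forecast(), the statement enc_out_list = self.pdm_blocks[i](enc_out_list) runs in a loop, for i in range(self.layer), for a total of e_layers = 2 iterations. The input and output shapes are identical on every iteration.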

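2. I/O interface

| Parameter | Shape | Description |
|---|---|---|
| x_list (input) | [(6,24,8), (6,12,8), (6,6,8)] | Three scales, each BN×T×d |
| Return value | [(6,24,8), (6,12,8), (6,6,8)] | Same shapes; features after cross-scale mixing |

In CI mode, BN = 2×3 = 6 (batch size times number of channels) and d = d_model = 8.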
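As a rough illustration of where these three shapes can come from, the sketch below builds a multi-scale list by repeated average pooling along the time axis. The input is random, the helper name make_multiscale is hypothetical, and the pooling window of 2 with two downsampling steps is an assumption chosen to reproduce the shapes in the table; the real model builds x_list upstream of PDM and may differ in detail.

```python
import torch
import torch.nn as nn


def make_multiscale(x, window=2, n_down=2):
    """Hypothetical helper: x is (BN, T, d); returns progressively coarser copies."""
    pool = nn.AvgPool1d(kernel_size=window)  # stride defaults to kernel_size
    x_list = [x]
    cur = x.permute(0, 2, 1)                 # (BN, d, T): AvgPool1d pools the last axis
    for _ in range(n_down):
        cur = pool(cur)                      # halves the time length when window=2
        x_list.append(cur.permute(0, 2, 1))  # back to (BN, T_i, d)
    return x_list


x = torch.randn(6, 24, 8)
x_list = make_multiscale(x)
shapes = [tuple(t.shape) for t in x_list]
print(shapes)
```

PDM then receives this list and returns a list with exactly the same shapes.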

3. Sequence diagram

4. Semantic grouping diagram

5. Step-by-step walkthrough

§5.0 Full original code

python
class PastDecomposableMixing(nn.Module):
    def __init__(self, configs):
        super(PastDecomposableMixing, self).__init__()
        self.seq_len = configs.seq_len
        self.pred_len = configs.pred_len
        self.down_sampling_window = configs.down_sampling_window

        self.layer_norm = nn.LayerNorm(configs.d_model)
        self.dropout = nn.Dropout(configs.dropout)
        self.channel_independence = configs.channel_independence

        if configs.decomp_method == "moving_avg":
            self.decompsition = series_decomp(configs.moving_avg)
        elif configs.decomp_method == "dft_decomp":
            self.decompsition = DFT_series_decomp(configs.top_k)
        else:
            raise ValueError("decompsition is error")

        if configs.channel_independence == 0:
            self.cross_layer = nn.Sequential(
                nn.Linear(in_features=configs.d_model, out_features=configs.d_ff),
                nn.GELU(),
                nn.Linear(in_features=configs.d_ff, out_features=configs.d_model),
            )

        self.mixing_multi_scale_season = MultiScaleSeasonMixing(configs)
        self.mixing_multi_scale_trend = MultiScaleTrendMixing(configs)

        self.out_cross_layer = nn.Sequential(
            nn.Linear(in_features=configs.d_model, out_features=configs.d_ff),
            nn.GELU(),
            nn.Linear(in_features=configs.d_ff, out_features=configs.d_model),
        )

    def forward(self, x_list):
        length_list = []
        for x in x_list:
            _, T, _ = x.size()
            length_list.append(T)

        # Decompose to obtain the season and trend
        season_list = []
        trend_list = []
        for x in x_list:
            season, trend = self.decompsition(x)
            if self.channel_independence == 0:
                season = self.cross_layer(season)
                trend = self.cross_layer(trend)
            season_list.append(season.permute(0, 2, 1))
            trend_list.append(trend.permute(0, 2, 1))

        # bottom-up season mixing
        out_season_list = self.mixing_multi_scale_season(season_list)
        # top-down trend mixing
        out_trend_list = self.mixing_multi_scale_trend(trend_list)

        out_list = []
        for ori, out_season, out_trend, length in zip(
            x_list, out_season_list, out_trend_list, length_list
        ):
            out = out_season + out_trend
            if self.channel_independence:
                out = ori + self.out_cross_layer(out)
            out_list.append(out[:, :length, :])
        return out_list

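§5.1 Big picture

Design intuition: mixing trend and season at the same scale makes the two interfere with each other. High-frequency noise contaminates the trend, and the trend background blurs the seasonal peaks. PDM separates them first, then lets each propagate in the direction that suits it best: season is compressed from fine to coarse scales (the fine scales carry the richest oscillation detail), while trend is broadcast from coarse to fine scales (the coarse scales have already been smoothed by AvgPool).

A small example ties this together (BN=2, d=4, three scales with T=8, 4, 2). series_decomp splits each scale into season and trend, and both are permuted to (2, 4, T_i) so that the time axis is last and Linear can act on it. SeasonMixing compresses the (2, 4, 8) tensor of scale 0 with Linear(8→4) and adds it to scale 1. TrendMixing expands the (2, 4, 2) tensor of scale 2 with Linear(2→4) and adds it to scale 1, then expands that result with Linear(4→8) and adds it to scale 0. Finally, season + trend at each scale is permuted back to (2, T_i, 4), and the residual ori + out_cross_layer(out) produces the output.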
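The toy walk-through above can be traced in a few lines of PyTorch. This is only a sketch under stated assumptions: the decomposed season and trend tensors are random stand-ins, each scale transition is a single plain Linear (the real mixing blocks may be deeper), the season path is assumed to continue from scale 1 to scale 2 in the same way, and the final residual with out_cross_layer is left out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
BN, d = 2, 4
lengths = [8, 4, 2]

# Stand-ins for the decomposed parts, already permuted to (BN, d, T_i).
season = [torch.randn(BN, d, T) for T in lengths]
trend = [torch.randn(BN, d, T) for T in lengths]

# Assumption: one plain Linear per scale transition.
down = [nn.Linear(8, 4), nn.Linear(4, 2)]  # season: fine -> coarse
up = [nn.Linear(2, 4), nn.Linear(4, 8)]    # trend: coarse -> fine

# Bottom-up season mixing: each coarser scale receives the compressed finer one.
out_season = [season[0]]
for i, layer in enumerate(down):
    out_season.append(season[i + 1] + layer(out_season[-1]))

# Top-down trend mixing: start at the coarsest scale and walk back to the finest.
out_trend = [trend[-1]]
for i, layer in enumerate(up):
    out_trend.append(trend[-2 - i] + layer(out_trend[-1]))
out_trend.reverse()  # restore fine -> coarse order

# Merge and permute back to (BN, T_i, d).
merged = [(s + t).permute(0, 2, 1) for s, t in zip(out_season, out_trend)]
merged_shapes = [tuple(m.shape) for m in merged]
print(merged_shapes)
```

Because Linear acts on the last axis, keeping time last during mixing is what lets a single layer map one scale's length to the next.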

§5.2 Step 1: record length_list

python
length_list = []
for x in x_list:
    _, T, _ = x.size()
    length_list.append(T)

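Shape notes: iterate over x_list and record the time length T of each scale, to be used for the final safety crop.

Toy values: length_list = [24, 12, 6]. The lengths are recorded up front because the output dimension of the Linear layers in the later mixing steps is exactly the downsampled target length, so in theory the shapes are already correct. If seq_len is not divisible by the window, however, small discrepancies may appear, and out[:, :length, :] acts as a safeguard.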
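To make the length bookkeeping concrete, here is a small pure-Python sketch. It assumes each scale's length is seq_len floor-divided by window**i (the helper scale_lengths is hypothetical, not part of the model) and shows that slicing to the recorded length is a no-op whenever the tensor already has that length.

```python
def scale_lengths(seq_len, window, n_down):
    """Hypothetical helper: time length per scale under floor division."""
    return [seq_len // (window ** i) for i in range(n_down + 1)]


even = scale_lengths(24, 2, 2)
odd = scale_lengths(25, 2, 2)
print(even)  # [24, 12, 6]
print(odd)   # [25, 12, 6]

# Slicing to the full length changes nothing;
# slicing to a shorter length drops the trailing steps.
row = list(range(12))
unchanged = row[:12]
cropped = row[:10]
print(unchanged == row, len(cropped))  # True 10
```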

§5.3 Step 2: series_decomp decomposition + permute

python
for x in x_list:
    season, trend = self.decompsition(x)
    if self.channel_independence == 0:
        season = self.cross_layer(season)
        trend = self.cross_layer(trend)
    season_list.append(season.permute(0, 2, 1))
    trend_list.append(trend.permute(0, 2, 1))

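Shape notes: decompsition = series_decomp(moving_avg=3). For each x of shape (6, T_i, 8), series_decomp applies a 3-point moving average to x to obtain trend, and season = x - trend; both have shape (6, T_i, 8). season.permute(0, 2, 1) moves the T axis to the end, giving (6, 8, T_i), because the Linear layers in the subsequent MultiScaleSeasonMixing act on the last (time) dimension. In CI mode the channel_independence == 0 branch is not executed.

Toy values (scale 0, T=24): the input x has shape (6, 24, 8). After series_decomp, season has shape (6, 24, 8) and trend has shape (6, 24, 8). season.permute(0, 2, 1) yields (6, 8, 24), which is appended to season_list. After the permute at all three scales: season_list = [(6,8,24), (6,8,12), (6,8,6)] and trend_list = [(6,8,24), (6,8,12), (6,8,6)].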
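The decomposition step can be sketched as follows. This is a minimal stand-in written for illustration, not the repository's series_decomp: the class name MovingAvgDecomp is hypothetical, the edge handling (repeating the first and last time steps so the output keeps length T) is an assumption, and it only preserves the length for odd kernel sizes.

```python
import torch
import torch.nn as nn


class MovingAvgDecomp(nn.Module):
    """Sketch of a moving-average decomposition (hypothetical stand-in)."""

    def __init__(self, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=1)

    def forward(self, x):  # x: (BN, T, d)
        pad = (self.kernel_size - 1) // 2
        # Assumption: repeat the edge values so the trend keeps length T.
        front = x[:, :1, :].repeat(1, pad, 1)
        end = x[:, -1:, :].repeat(1, pad, 1)
        padded = torch.cat([front, x, end], dim=1)
        # AvgPool1d pools the last axis, so move time there and back.
        trend = self.avg(padded.permute(0, 2, 1)).permute(0, 2, 1)
        season = x - trend
        return season, trend


x = torch.randn(6, 24, 8)
season, trend = MovingAvgDecomp(3)(x)
season_p = season.permute(0, 2, 1)  # time axis last, ready for Linear
print(tuple(season.shape), tuple(trend.shape), tuple(season_p.shape))
```

By construction season + trend reproduces x, so the split loses no information; it only routes the smooth and oscillating parts to different mixing paths.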

§5.4 Step 3: MultiScaleSeasonMixing (bottom-up)

python
out_season_list = self.mixing_multi_scale_season(season_list)

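Shape notes: the input is season_list = [(6,8,24), (6,8,12), (6,8,6)] (T is last). The output is out_season_list = [(6,24,8), (6,12,8), (6,6,8)] (already permuted back to the B, T, d layout). Internally, Linear layers compress the fine-grained (T=24) information and add it to the coarser scales (T=12, 6).

Toy values: see [[03B1-Layer3-SeasonMixing]] for the full trace. Result: out_season_list = [(6,24,8), (6,12,8), (6,6,8)].

§5.5 Step 4: MultiScaleTrendMixing (top-down)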

python
out_trend_list = self.mixing_multi_scale_trend(trend_list)

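Shape notes: the input is trend_list = [(6,8,24), (6,8,12), (6,8,6)]. The output is out_trend_list = [(6,24,8), (6,12,8), (6,6,8)] (already permuted back, and restored from reversed order to the original order). Internally, Linear layers expand the coarse-grained (T=6) information and add it to the finer scales (T=12, 24).

Toy values: see [[03B2-Layer3-TrendMixing]] for the full trace. Result: out_trend_list = [(6,24,8), (6,12,8), (6,6,8)].

§5.6 Step 5: merge + residual connection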

python
for ori, out_season, out_trend, length in zip(
    x_list, out_season_list, out_trend_list, length_list
):
    out = out_season + out_trend
    if self.channel_independence:
        out = ori + self.out_cross_layer(out)
    out_list.append(out[:, :length, :])

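Shape notes (CI mode): taking scale 0 as an example, ori = x_list[0] has shape (6, 24, 8), and out_season and out_trend each have shape (6, 24, 8). out = out_season + out_trend is an element-wise sum of shape (6, 24, 8) that brings season and trend back together. out_cross_layer (Linear(8→16) → GELU → Linear(16→8)) acts on the last dimension and outputs (6, 24, 8). ori + out_cross_layer(out) is a Transformer-style residual connection. The crop out[:, :24, :] is a no-op here (length = 24), so the shape stays (6, 24, 8).

Toy values: out_cross_layer applies a two-layer MLP independently to the 8-dimensional vector $u$ at each position of the (6, 24, 8) tensor: $h = \mathrm{GELU}(W_1 u)$, $v = W_2 h$, where $W_1 \in \mathbb{R}^{16 \times 8}$ and $W_2 \in \mathbb{R}^{8 \times 16}$ (bias terms omitted). After the residual, out = ori + out_cross_layer(out) has shape (6, 24, 8). After all three scales are processed, out_list = [(6,24,8), (6,12,8), (6,6,8)], exactly the shapes of the input x_list.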
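The position-wise behaviour of the MLP can be checked directly. The sketch below rebuilds a layer with the same structure as out_cross_layer (d_model=8, d_ff=16) on random stand-in tensors and verifies that a single (batch, time) position gives the same result whether it is processed alone or as part of the full tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_ff = 8, 16
out_cross_layer = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.GELU(),
    nn.Linear(d_ff, d_model),
)

ori = torch.randn(6, 24, d_model)
mixed = torch.randn(6, 24, d_model)  # stand-in for out_season + out_trend

out = ori + out_cross_layer(mixed)   # residual connection

# nn.Linear stores its weight as (out_features, in_features).
w1_shape = tuple(out_cross_layer[0].weight.shape)
w2_shape = tuple(out_cross_layer[2].weight.shape)

# The MLP only touches the last axis, so positions do not interact.
single = out_cross_layer(mixed[2, 5])
same = torch.allclose(single, out_cross_layer(mixed)[2, 5], atol=1e-5)
print(tuple(out.shape), w1_shape, w2_shape, same)
```

Mixing across time and across scales therefore happens only in the season and trend mixing modules; this MLP mixes features within one time step.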

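Difference between the CI and CD merge paths

CI mode (channel_independence=1, the default in this note): out = ori + out_cross_layer(out), a residual of the form input + mixed result, in the style of a Transformer block.
CD mode (channel_independence=0): this residual is skipped and out = out_season + out_trend is used directly (in CD mode, cross_layer was already applied to season and trend themselves during the decomposition in step 2). Both paths produce an out_list with the same shapes, but the features mean different things.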
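The two merge paths can be compared side by side on one scale. In this sketch the function merge is a hypothetical wrapper that mirrors the loop body of forward(), the tensors are random stand-ins, and the same (6, 24, 8) shape is reused for both modes purely to make the comparison easy.

```python
import torch
import torch.nn as nn


def merge(ori, out_season, out_trend, channel_independence, out_cross_layer):
    """Hypothetical wrapper around the merge step for a single scale."""
    out = out_season + out_trend
    if channel_independence:
        out = ori + out_cross_layer(out)  # CI: residual around the MLP
    return out[:, : ori.size(1), :]       # safety crop to the recorded length


torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(8, 16), nn.GELU(), nn.Linear(16, 8))
ori = torch.randn(6, 24, 8)
s = torch.randn(6, 24, 8)
t = torch.randn(6, 24, 8)

ci_out = merge(ori, s, t, 1, mlp)
cd_out = merge(ori, s, t, 0, mlp)
print(tuple(ci_out.shape), tuple(cd_out.shape))
print(torch.equal(cd_out, s + t))
```

In the CD path the output is just the sum of the two mixed components, so the original input reaches the output only through the decomposition and mixing, not through a skip connection.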

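6. Drill-down into subcomponents

| Subcomponent | Responsibility | Document |
|---|---|---|
| MultiScaleSeasonMixing | Bottom-up season mixing (fine → coarse) | [[03B1-Layer3-SeasonMixing]] |
| MultiScaleTrendMixing | Top-down trend mixing (coarse → fine) | [[03B2-Layer3-TrendMixing]] |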
