Layer 2B: `PastDecomposableMixing` (PDM)

1. Position in the parent layer

Inside `forecast()`, the loop `for i in range(self.layer): enc_out_list = self.pdm_blocks[i](enc_out_list)` runs `e_layers = 2` times. Each iteration returns a list whose shapes match its input.
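The loop contract described above can be sketched as follows. `ShapePreservingBlock` is a hypothetical stand-in for a PDM block, not the real module; it is only there to show that a list of per-scale tensors goes in and a list with the same shapes comes out on each of the `e_layers` iterations.

```python
import torch
import torch.nn as nn


class ShapePreservingBlock(nn.Module):
    """Hypothetical stand-in for one PDM block: same shapes in, same shapes out."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x_list):
        # Acts on the feature axis only, so every scale keeps its length.
        return [x + self.proj(x) for x in x_list]


e_layers, d_model = 2, 8
pdm_blocks = nn.ModuleList([ShapePreservingBlock(d_model) for _ in range(e_layers)])

enc_out_list = [torch.randn(6, t, d_model) for t in (24, 12, 6)]
in_shapes = [tuple(x.shape) for x in enc_out_list]

for i in range(e_layers):
    enc_out_list = pdm_blocks[i](enc_out_list)

out_shapes = [tuple(x.shape) for x in enc_out_list]
```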

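2. I/O interface

| Parameter | Shape | Description |
| --- | --- | --- |
| `x_list` (input) | `[(6,24,8),(6,12,8),(6,6,8)]` | Three scales, each $(B \cdot N) \times T \times d$ |
| Return value | `[(6,24,8),(6,12,8),(6,6,8)]` | Same shapes; features after cross-scale mixing |

In CI (channel-independent) mode, $B \cdot N = 2 \times 3 = 6$ and $d = d_{\text{model}} = 8$.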
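As a rough illustration of where these shapes come from, the sketch below folds the variables into the batch axis and halves the time length twice. The pooling window of 2 is inferred from the lengths 24, 12, 6, and `embed = nn.Linear(1, d_model)` is a hypothetical placeholder for whatever embedding the parent layer actually applies.

```python
import torch
import torch.nn as nn

B, T, N, d_model = 2, 24, 3, 8
window, n_down = 2, 2                      # assumed: 24 -> 12 -> 6

x = torch.randn(B, T, N)                   # (2, 24, 3)

# Channel independence: fold the N variables into the batch axis.
x_ci = x.permute(0, 2, 1).reshape(B * N, T, 1)   # (6, 24, 1)

# Build the coarser scales by average pooling over time.
pool = nn.AvgPool1d(kernel_size=window)
scales = [x_ci]
cur = x_ci.permute(0, 2, 1)                # (6, 1, 24): pooling expects time last
for _ in range(n_down):
    cur = pool(cur)
    scales.append(cur.permute(0, 2, 1))

# Hypothetical embedding: lift the single value per step to d_model features.
embed = nn.Linear(1, d_model)
x_list = [embed(s) for s in scales]
shapes = [tuple(t.shape) for t in x_list]
```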

3. Sequence diagram

4. Semantic grouping diagram

5. Step-by-step walkthrough

§5.0 Complete original code

```python
import torch.nn as nn

# series_decomp, DFT_series_decomp, MultiScaleSeasonMixing and
# MultiScaleTrendMixing are defined elsewhere in the model source.


class PastDecomposableMixing(nn.Module):
    def __init__(self, configs):
        super(PastDecomposableMixing, self).__init__()
        self.seq_len = configs.seq_len
        self.pred_len = configs.pred_len
        self.down_sampling_window = configs.down_sampling_window

        self.layer_norm = nn.LayerNorm(configs.d_model)
        self.dropout = nn.Dropout(configs.dropout)
        self.channel_independence = configs.channel_independence

        if configs.decomp_method == "moving_avg":
            self.decompsition = series_decomp(configs.moving_avg)
        elif configs.decomp_method == "dft_decomp":
            self.decompsition = DFT_series_decomp(configs.top_k)
        else:
            raise ValueError("decompsition is error")

        # Only built in channel-dependent (CD) mode.
        if configs.channel_independence == 0:
            self.cross_layer = nn.Sequential(
                nn.Linear(in_features=configs.d_model, out_features=configs.d_ff),
                nn.GELU(),
                nn.Linear(in_features=configs.d_ff, out_features=configs.d_model),
            )

        self.mixing_multi_scale_season = MultiScaleSeasonMixing(configs)
        self.mixing_multi_scale_trend = MultiScaleTrendMixing(configs)

        self.out_cross_layer = nn.Sequential(
            nn.Linear(in_features=configs.d_model, out_features=configs.d_ff),
            nn.GELU(),
            nn.Linear(in_features=configs.d_ff, out_features=configs.d_model),
        )

    def forward(self, x_list):
        # Remember each scale's time length for the final crop.
        length_list = []
        for x in x_list:
            _, T, _ = x.size()
            length_list.append(T)

        # Decompose to obtain the season and trend
        season_list = []
        trend_list = []
        for x in x_list:
            season, trend = self.decompsition(x)
            if self.channel_independence == 0:
                season = self.cross_layer(season)
                trend = self.cross_layer(trend)
            season_list.append(season.permute(0, 2, 1))
            trend_list.append(trend.permute(0, 2, 1))

        # bottom-up season mixing
        out_season_list = self.mixing_multi_scale_season(season_list)
        # top-down trend mixing
        out_trend_list = self.mixing_multi_scale_trend(trend_list)

        out_list = []
        for ori, out_season, out_trend, length in zip(
            x_list, out_season_list, out_trend_list, length_list
        ):
            out = out_season + out_trend
            if self.channel_independence:
                out = ori + self.out_cross_layer(out)
            out_list.append(out[:, :length, :])
        return out_list
```

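§5.1 High-level logic

Design intuition: mixing trend and seasonality at the same scale lets them interfere with each other. High-frequency noise contaminates the trend, and the trend background blurs seasonal peaks. PDM first separates the two, then propagates each in the direction that suits it best: seasonality is compressed from fine to coarse scales (the fine scale carries the richest oscillation detail), and trend is broadcast from coarse to fine scales (the coarse scale has already been smoothed by AvgPool).

A small example ($B \cdot N = 2$, $d = 4$, three scales $T = 8, 4, 2$) ties the steps together. `series_decomp` splits each scale into season and trend, and both are permuted to $(2, 4, T_{i})$ so that the time axis comes last and a Linear layer can act on it. SeasonMixing compresses scale 0, $(2, 4, 8)$, with Linear $(8 \to 4)$ and adds it to scale 1; the updated scale 1 is then compressed with Linear $(4 \to 2)$ and added to scale 2. TrendMixing expands scale 2, $(2, 4, 2)$, with Linear $(2 \to 4)$ and adds it to scale 1, then expands that result with Linear $(4 \to 8)$ and adds it to scale 0. Finally, the season and trend outputs at each scale, permuted back to $(2, T_{i}, 4)$, are summed and passed through the residual `ori + out_cross_layer(out)` to form the output.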
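The small example can be traced in code. This is a minimal sketch under the simplification used in the paragraph above, namely one `nn.Linear` per scale transition; the actual mixing modules may use a deeper stack. Random tensors stand in for the decomposed season and trend, and the final residual step is left out to keep the trace short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
BN, d = 2, 4
lengths = [8, 4, 2]                        # scale 0 (fine) ... scale 2 (coarse)

# Stand-ins for the permuted season/trend parts, shape (BN, d, T_i).
season = [torch.randn(BN, d, t) for t in lengths]
trend = [torch.randn(BN, d, t) for t in lengths]

# Bottom-up season mixing: compress the finer scale, add it to the next coarser one.
down = [nn.Linear(8, 4), nn.Linear(4, 2)]
out_season = [season[0]]
carry = season[0]
for i, layer in enumerate(down):
    carry = season[i + 1] + layer(carry)
    out_season.append(carry)

# Top-down trend mixing: expand the coarser scale, add it to the next finer one.
up = [nn.Linear(2, 4), nn.Linear(4, 8)]
out_trend = [trend[2]]
carry = trend[2]
for i, layer in enumerate(up):
    carry = trend[1 - i] + layer(carry)
    out_trend.insert(0, carry)             # keep the list in finest-first order

# Recombine per scale and restore the (BN, T_i, d) layout.
mixed = [(s + t).permute(0, 2, 1) for s, t in zip(out_season, out_trend)]
shapes = [tuple(m.shape) for m in mixed]
```

Note that the finest season scale and the coarsest trend scale pass through unchanged; every other scale receives information from its neighbour in the propagation direction.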

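§5.2 Step 1: record length_list

```python
length_list = []
for x in x_list:
    _, T, _ = x.size()          # x is (B*N, T, d); only T is needed
    length_list.append(T)
```

Shape notes: iterate over `x_list` and record each scale's time length $T$ for the final safety crop.

Toy values: `length_list = [24, 12, 6]`. The lengths are recorded up front because the Linear layers in the mixing modules emit exactly the downsampled target lengths, so the shapes should already be correct. If `seq_len` is not divisible by the window, however, small length differences may appear, and `out[:, :length, :]` acts as a safeguard.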
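To make the safeguard concrete, here is a small sketch of the record-then-crop pattern. The `mixed` tensors are hypothetical mixing outputs, and the first one is deliberately one step too long so that the crop has something to do.

```python
import torch

x_list = [torch.randn(6, t, 8) for t in (24, 12, 6)]
length_list = [x.size(1) for x in x_list]

# Hypothetical mixing outputs: the first one is one step too long on purpose.
mixed = [torch.randn(6, 25, 8), torch.randn(6, 12, 8), torch.randn(6, 6, 8)]

# Slicing trims anything beyond the recorded length and is a no-op otherwise.
cropped = [out[:, :length, :] for out, length in zip(mixed, length_list)]
cropped_shapes = [tuple(c.shape) for c in cropped]
```

Slicing can only shorten a tensor. An output that came back too short would pass through unchanged, so the crop guards against overshoot only.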

§5.3 Step 2: series_decomp decomposition + permute

```python
for x in x_list:
    season, trend = self.decompsition(x)
    if self.channel_independence == 0:      # CD mode only
        season = self.cross_layer(season)
        trend = self.cross_layer(trend)
    season_list.append(season.permute(0, 2, 1))   # (B*N, T, d) -> (B*N, d, T)
    trend_list.append(trend.permute(0, 2, 1))
```

Shape notes: here `decompsition = series_decomp(moving_avg=3)`. For each `x` of shape $(6, T_{i}, 8)$, `series_decomp` takes a 3-point moving average of `x` as the trend and sets `season = x - trend`; both have shape $(6, T_{i}, 8)$. `season.permute(0, 2, 1)` moves the $T$ axis to the end, giving $(6, 8, T_{i})$, because the Linear layers in `MultiScaleSeasonMixing` act on the last (time) dimension. In CI mode the `channel_independence == 0` branch is not executed.

Toy values (scale 0, $T = 24$): the input `x` has shape (6, 24, 8). After `series_decomp`, `season` and `trend` both have shape (6, 24, 8). `season.permute(0, 2, 1)` → (6, 8, 24) is appended to `season_list`. After permuting all three scales: `season_list = [(6,8,24), (6,8,12), (6,8,6)]` and `trend_list = [(6,8,24), (6,8,12), (6,8,6)]`.
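A moving-average decomposition of this kind might look like the sketch below. `moving_avg_decomp` is a hypothetical helper, not the library's `series_decomp`; in particular, padding by repeating the first and last time step (so the trend keeps the input length) is an assumption, and the helper expects an odd kernel size.

```python
import torch
import torch.nn.functional as F


def moving_avg_decomp(x: torch.Tensor, kernel_size: int = 3):
    """Split x of shape (batch, time, features) into (season, trend).

    Assumption: the series is padded by repeating its first and last time
    step, so the trend has the same length as the input (odd kernel sizes).
    """
    pad = (kernel_size - 1) // 2
    front = x[:, :1, :].repeat(1, pad, 1)
    back = x[:, -1:, :].repeat(1, pad, 1)
    padded = torch.cat([front, x, back], dim=1)
    # avg_pool1d pools over the last axis, so move time there and back.
    trend = F.avg_pool1d(padded.permute(0, 2, 1), kernel_size, stride=1).permute(0, 2, 1)
    season = x - trend
    return season, trend


x = torch.randn(6, 24, 8)
season, trend = moving_avg_decomp(x, kernel_size=3)
season_p = season.permute(0, 2, 1)         # (6, 8, 24): time is now the last axis

# Tiny numeric check on a single ramp series 1, 2, 3, 4.
ramp = torch.tensor([1.0, 2.0, 3.0, 4.0]).view(1, 4, 1)
_, ramp_trend = moving_avg_decomp(ramp)
```

With this padding the ramp is extended to 1, 1, 2, 3, 4, 4, so the 3-point averages are 4/3, 2, 3 and 11/3.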

§5.4 Step 3: MultiScaleSeasonMixing (bottom-up)

```python
out_season_list = self.mixing_multi_scale_season(season_list)
```

Shape notes: the input is `season_list = [(6,8,24),(6,8,12),(6,8,6)]` ($T$ last). The output is `out_season_list = [(6,24,8),(6,12,8),(6,6,8)]` (already permuted back to the $B, T, d$ layout). Internally, Linear layers compress the fine-scale ($T = 24$) information and add it to the coarser scales ($T = 12, 6$).

Toy values: see [[03B1-Layer3-SeasonMixing]] for the full trace. Result: `out_season_list = [(6,24,8),(6,12,8),(6,6,8)]`.
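For readers who want the shape flow without opening the linked page, here is a minimal sketch of a bottom-up mixer. `SeasonMixingSketch` is hypothetical and assumes a single `nn.Linear` per fine-to-coarse transition; the real `MultiScaleSeasonMixing` may differ in depth and activation.

```python
import torch
import torch.nn as nn


class SeasonMixingSketch(nn.Module):
    """Hypothetical bottom-up mixer: one Linear per fine-to-coarse transition."""

    def __init__(self, lengths):
        super().__init__()
        self.down = nn.ModuleList(
            [nn.Linear(lengths[i], lengths[i + 1]) for i in range(len(lengths) - 1)]
        )

    def forward(self, season_list):
        # season_list[i] has shape (B*N, d, T_i), finest scale first.
        out_high = season_list[0]
        out_list = [out_high.permute(0, 2, 1)]
        for i, layer in enumerate(self.down):
            # Compress the finer scale along time and add it to the coarser one.
            out_high = season_list[i + 1] + layer(out_high)
            out_list.append(out_high.permute(0, 2, 1))   # back to (B*N, T_i, d)
        return out_list


mixer = SeasonMixingSketch([24, 12, 6])
season_list = [torch.randn(6, 8, t) for t in (24, 12, 6)]
out_season_list = mixer(season_list)
season_shapes = [tuple(o.shape) for o in out_season_list]
finest_unchanged = torch.equal(out_season_list[0], season_list[0].permute(0, 2, 1))
```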

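§5.5 Step 4: MultiScaleTrendMixing (top-down)

```python
out_trend_list = self.mixing_multi_scale_trend(trend_list)
```

Shape notes: the input is `trend_list = [(6,8,24),(6,8,12),(6,8,6)]`. The output is `out_trend_list = [(6,24,8),(6,12,8),(6,6,8)]` (already permuted back, and restored from reversed order to the original finest-first order). Internally, Linear layers expand the coarse-scale ($T = 6$) information and add it to the finer scales ($T = 12, 24$).

Toy values: see [[03B2-Layer3-TrendMixing]] for the full trace. Result: `out_trend_list = [(6,24,8),(6,12,8),(6,6,8)]`.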
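The reversal mentioned in the shape notes is easiest to see in code. `TrendMixingSketch` below is a hypothetical top-down mixer with one `nn.Linear` per coarse-to-fine transition, which is an assumption about the internals: it reverses the list, walks from coarse to fine, then reverses the results back.

```python
import torch
import torch.nn as nn


class TrendMixingSketch(nn.Module):
    """Hypothetical top-down mixer: one Linear per coarse-to-fine transition."""

    def __init__(self, lengths):
        super().__init__()
        # For lengths [24, 12, 6] this builds Linear(6 -> 12), then Linear(12 -> 24).
        self.up = nn.ModuleList(
            [nn.Linear(lengths[i + 1], lengths[i]) for i in reversed(range(len(lengths) - 1))]
        )

    def forward(self, trend_list):
        rev = trend_list[::-1]             # coarsest scale first
        out_low = rev[0]
        out_list = [out_low.permute(0, 2, 1)]
        for i, layer in enumerate(self.up):
            # Expand the coarser scale along time and add it to the finer one.
            out_low = rev[i + 1] + layer(out_low)
            out_list.append(out_low.permute(0, 2, 1))
        out_list.reverse()                 # restore finest-first order
        return out_list


mixer = TrendMixingSketch([24, 12, 6])
trend_list = [torch.randn(6, 8, t) for t in (24, 12, 6)]
out_trend_list = mixer(trend_list)
trend_shapes = [tuple(o.shape) for o in out_trend_list]
coarsest_unchanged = torch.equal(out_trend_list[-1], trend_list[-1].permute(0, 2, 1))
```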

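§5.6 Step 5: merge + residual connection

```python
for ori, out_season, out_trend, length in zip(
    x_list, out_season_list, out_trend_list, length_list
):
    out = out_season + out_trend                 # merge season and trend
    if self.channel_independence:                # CI mode: residual around the MLP
        out = ori + self.out_cross_layer(out)
    out_list.append(out[:, :length, :])          # safety crop to the input length
```

Shape notes (CI mode): taking scale 0 as an example, `ori = x_list[0]`, `out_season` and `out_trend` all have shape (6, 24, 8). `out = out_season + out_trend` → (6, 24, 8) is an element-wise sum that merges season and trend. `out_cross_layer` (Linear $8 \to 16$ → GELU → Linear $16 \to 8$) acts on the last dimension and returns (6, 24, 8). `ori + out_cross_layer(out)` is a Transformer-style residual connection. `out[:, :24, :]` is a no-op here (`length = 24`), so the shape stays (6, 24, 8).

Toy values: `out_cross_layer` applies a two-layer MLP independently to the 8-dimensional vector $u$ at every position of the (6, 24, 8) tensor: $h = \mathrm{GELU}(W_{1} u)$, $v = W_{2} h$, with $W_{1} \in \mathbb{R}^{16 \times 8}$ and $W_{2} \in \mathbb{R}^{8 \times 16}$ (biases omitted). After the residual, `out = ori + out_cross_layer(out)` has shape (6, 24, 8). After all three scales are processed, `out_list = [(6,24,8),(6,12,8),(6,6,8)]`, identical in shape to the input `x_list`.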
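The position-wise behaviour and the weight shapes can be checked with a short sketch. Random tensors stand in for `ori`, `out_season` and `out_trend`, and `d_ff = 16` follows the toy configuration above. Unlike the formula, `nn.Linear` includes a bias term by default.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_ff = 8, 16
out_cross_layer = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.GELU(),
    nn.Linear(d_ff, d_model),
)

ori = torch.randn(6, 24, d_model)
out_season = torch.randn(6, 24, d_model)
out_trend = torch.randn(6, 24, d_model)

mixed = out_season + out_trend
out = ori + out_cross_layer(mixed)         # residual around the position-wise MLP
out = out[:, :24, :]                       # no-op: the length is already 24

# Weight shapes correspond to W1 (16 x 8) and W2 (8 x 16).
w1_shape = tuple(out_cross_layer[0].weight.shape)
w2_shape = tuple(out_cross_layer[2].weight.shape)

# Position-wise: feeding one 8-dim vector gives the same row as the batched call.
single = out_cross_layer(mixed[0, 0])
batched = out_cross_layer(mixed)[0, 0]
position_wise = torch.allclose(single, batched, atol=1e-5)
```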

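CI vs. CD merge paths

CI mode (`channel_independence=1`, the default in this document): `out = ori + out_cross_layer(out)`, a residual of the form "input + mixed result", in the style of a Transformer block. CD mode (`channel_independence=0`): this residual is skipped and `out = out_season + out_trend` is used directly (in CD mode `cross_layer` has already been applied to season and trend themselves during the decomposition in step 2). Both paths preserve the per-scale input shapes; what differs is the meaning of the features.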
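The two paths differ only in one branch, which the following sketch isolates. `merge` is a hypothetical free function that mirrors the loop body of `forward`; the tensors are random stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def merge(ori, out_season, out_trend, out_cross_layer, channel_independence):
    """Hypothetical helper mirroring the per-scale merge step."""
    out = out_season + out_trend
    if channel_independence:               # CI: residual around the output MLP
        out = ori + out_cross_layer(out)
    return out[:, : ori.size(1), :]        # crop to the input length


mlp = nn.Sequential(nn.Linear(8, 16), nn.GELU(), nn.Linear(16, 8))
ori = torch.randn(6, 24, 8)
out_season = torch.randn(6, 24, 8)
out_trend = torch.randn(6, 24, 8)

ci_out = merge(ori, out_season, out_trend, mlp, channel_independence=1)
cd_out = merge(ori, out_season, out_trend, mlp, channel_independence=0)
```

In this sketch the CD result is exactly `out_season + out_trend`, while the CI result additionally depends on `ori` and on the MLP weights.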

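6. Drill-down subcomponents

| Subcomponent | Responsibility | Document |
| --- | --- | --- |
| `MultiScaleSeasonMixing` | Bottom-up season mixing (fine → coarse) | [[03B1-Layer3-SeasonMixing]] |
| `MultiScaleTrendMixing` | Top-down trend mixing (coarse → fine) | [[03B2-Layer3-TrendMixing]] |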
