DLinear Wrap-up

1. End-to-end flow diagram

Notation: B = batch size, T = seq_len, P = pred_len, C = enc_in

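2. Tensor shape summary

| Step | Operation | Shape change | Toy shape |
| --- | --- | --- | --- |
| Input | `x_enc` | n/a | `(2, 6, 3)` |
| series_decomp | moving_avg + residual | unchanged, two outputs | `(2,6,3)` × 2 |
| permute | (B,T,C)→(B,C,T) | axes 1 and 2 swapped | `(2,3,6)` |
| Linear_Seasonal | seq_len→pred_len | last dim 6→2 | `(2,3,2)` |
| Linear_Trend | seq_len→pred_len | last dim 6→2 | `(2,3,2)` |
| Add | element-wise | unchanged | `(2,3,2)` |
| permute | (B,C,P)→(B,P,C) | axes 1 and 2 swapped | `(2,2,3)` |
| Slice | `[:,-pred_len:,:]` | unchanged (length is already pred_len) | `(2,2,3)` |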
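
The rows of this table can be traced in a few lines of PyTorch. This is a minimal sketch on the toy sizes, not the model's actual code: the variable names, `kernel_size = 3`, and the edge-replication padding used to keep the moving average at length T are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

B, T, P, C = 2, 6, 2, 3   # toy sizes: batch, seq_len, pred_len, enc_in
kernel_size = 3           # assumed odd moving-average window (toy value)

x_enc = torch.randn(B, T, C)                      # (2, 6, 3)

# series_decomp: moving average over time, then the residual.
# Assumption: both ends are padded by repeating the edge value so length stays T.
pad = (kernel_size - 1) // 2
front = x_enc[:, :1, :].repeat(1, pad, 1)
back = x_enc[:, -1:, :].repeat(1, pad, 1)
padded = torch.cat([front, x_enc, back], dim=1)   # (B, T + 2*pad, C)
avg = nn.AvgPool1d(kernel_size, stride=1)         # pools over the last axis
trend = avg(padded.permute(0, 2, 1)).permute(0, 2, 1)   # (2, 6, 3)
seasonal = x_enc - trend                                 # (2, 6, 3)

# permute: move time to the last axis, the one nn.Linear maps
seasonal_t = seasonal.permute(0, 2, 1)            # (2, 3, 6)
trend_t = trend.permute(0, 2, 1)                  # (2, 3, 6)

# two linear heads, seq_len -> pred_len, shared across channels
linear_seasonal = nn.Linear(T, P)
linear_trend = nn.Linear(T, P)
out_cp = linear_seasonal(seasonal_t) + linear_trend(trend_t)   # (2, 3, 2)

# permute back and slice the last pred_len steps (a no-op here)
out = out_cp.permute(0, 2, 1)                     # (2, 2, 3)
y = out[:, -P:, :]                                # (2, 2, 3)

for name, t in [("trend", trend), ("seasonal_t", seasonal_t),
                ("out_cp", out_cp), ("y", y)]:
    print(name, tuple(t.shape))
```

Because the seasonal part is defined as the residual, `seasonal + trend` reproduces `x_enc`, so the decomposition itself loses no information.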

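3. What each document covers

| Document | Code covered | Key tensors |
| --- | --- | --- |
| 01-Layer0-接入界面 (access interface) | `Config` + `__init__` + `_process` + `forward` branches | `(B,seq_len,C)` enters the model |
| 02-Layer1-encoder主链 (encoder main chain) | `encoder()`, full flow | `(B,T,C)→(B,C,T)→(B,C,P)→(B,P,C)` |
| 03-Layer2-series_decomp | `moving_avg` + `series_decomp` | `(B,T,C)→seasonal+trend (B,T,C)` |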

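4. Core design decisions in DLinear

| Decision | Approach | Rationale |
| --- | --- | --- |
| Series decomposition | moving_avg + residual | Each Linear head handles a single pattern (trend or seasonal), which makes fitting easier |
| permute wrapped around Linear | `(B,T,C)→(B,C,T)→Linear→(B,C,P)` | `nn.Linear` acts only on the last dimension, so the time axis has to be moved there |
| individual=False | All variables share one set of linear heads | Fewer parameters, faster training, and a regularizing effect on multivariate data |
| Weights initialized to 1/seq_len | `(1/seq_len) * torch.ones([pred_len, seq_len])` | Equivalent to starting from a forecast that averages the history with uniform weights (up to the bias term), a reasonable starting point |
| x_dec / x_mark dropped | forward uses only x_enc | DLinear needs no decoder; the framework keeps one interface and each model uses only the inputs it needs |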
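
The 1/seq_len initialization can be checked directly. The sketch below is illustrative only: the toy sizes and the input series are made up, and the bias is zeroed purely so that the averaging effect of the weights is visible in isolation (with PyTorch's default bias initialization the output would be the mean plus a small offset).

```python
import torch
import torch.nn as nn

seq_len, pred_len = 6, 2   # toy sizes

linear = nn.Linear(seq_len, pred_len)
# Every weight is 1/seq_len, so each output is a uniform average of the input window
linear.weight = nn.Parameter((1.0 / seq_len) * torch.ones(pred_len, seq_len))
with torch.no_grad():
    linear.bias.zero_()    # assumption for this demo only

x = torch.arange(1.0, 7.0).reshape(1, 1, seq_len)   # one channel with values 1..6
y = linear(x)                                        # shape (1, 1, pred_len)

print(x.mean().item())   # mean of the history
print(y)                 # every forecast step is (approximately) that mean
```

Each of the `pred_len` outputs starts as the same historical mean; training then moves the weights away from this uniform average.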

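5. Comparison with similar models

| Aspect | DLinear | Informer | PatchTST |
| --- | --- | --- | --- |
| Core operation | Linear layers | ProbSparse attention | Patching + full attention |
| Series decomposition | ✓ moving_avg | ✗ (decomposition blocks come from Autoformer, not Informer) | ✗ |
| Channel-independent | ✓ (weights shared across channels by default; per-channel weights with individual=True) | ✗ | ✓ always |
| Parameter count | Tiny (about 2×seq_len×pred_len weights plus biases when individual=False) | Large | Medium |
| Compute complexity | O(T·P) per channel, linear in T | O(T log T) | O((T/S)²), S = patch stride |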
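
The parameter-count row can be sanity-checked by counting the parameters of two shared linear heads. This is a rough sketch assuming individual=False; the helper name `linear_head_params` and the sizes 96/96 are hypothetical values chosen for illustration.

```python
import torch.nn as nn

def linear_head_params(seq_len: int, pred_len: int) -> int:
    """Count parameters of two shared Linear heads (seasonal + trend)."""
    heads = [nn.Linear(seq_len, pred_len) for _ in range(2)]
    return sum(p.numel() for head in heads for p in head.parameters())

seq_len, pred_len = 96, 96   # hypothetical sizes
n = linear_head_params(seq_len, pred_len)

# weights: 2 * seq_len * pred_len, biases: 2 * pred_len
print(n, 2 * seq_len * pred_len + 2 * pred_len)
```

The count does not depend on the number of channels C, because the same two heads are applied to every channel.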

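6. What to read next

iTransformer: takes DLinear's per-variable treatment a step further. Each variable's whole series becomes one token, and attention runs along the variable dimension.
TimesNet: reshapes the 1D series into 2D so that a CNN can be applied, a contrast to DLinear's purely linear approach.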