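FEDformer Overview

1. Problem and Motivation

Autoformer's remaining limitations: Autoformer uses the FFT to compute time-delay correlation (Auto-Correlation), picks the top-k lags, and aggregates over those delays. This improves on attention, but:

It is still local: the top-k lags are selected by average correlation strength, which ignores the individual contribution of each frequency in the spectrum.
It does not exploit the global structure of the spectrum: the model knows which lags matter most, but not the low-rank structure of the spectrum.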

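FEDformer's core claim:

The useful information in a time series is highly sparse in the spectrum: only $M \ll L$ Fourier modes carry most of the energy. Attention can therefore operate directly on $M$ frequencies at a cost of $O(M \cdot L)$, which reduces to $O(L)$, i.e. linear in the sequence length, when $M$ is held fixed.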
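The sparsity claim can be checked on a toy signal. The sketch below is illustrative only: the synthetic series, the noise level, and the names `energy_ratio` and `rel_error` are assumptions, not taken from the paper or from TFB. It keeps the M strongest rfft bins and measures how much of the bin energy and of the signal they recover.

```python
import numpy as np

L = 96                                   # sequence length (illustrative)
M = 4                                    # number of Fourier modes to keep
t = np.arange(L)
rng = np.random.default_rng(0)

# Synthetic series (assumption): two sinusoids plus small Gaussian noise.
x = np.sin(2 * np.pi * 4 * t / L) + 0.5 * np.sin(2 * np.pi * 12 * t / L)
x = x + 0.05 * rng.standard_normal(L)

spectrum = np.fft.rfft(x)                # L // 2 + 1 complex bins
energy = np.abs(spectrum) ** 2

top = np.sort(np.argsort(energy)[-M:])   # indices of the M strongest bins
kept = np.zeros_like(spectrum)
kept[top] = spectrum[top]
x_hat = np.fft.irfft(kept, n=L)          # reconstruct from M modes only

energy_ratio = energy[top].sum() / energy.sum()
rel_error = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"{M} of {len(spectrum)} bins hold {energy_ratio:.1%} of the bin energy")
print(f"relative reconstruction error: {rel_error:.3f}")
```

On real data the modes are not known in advance, which is why the model selects them by a fixed rule (random or lowest) rather than by energy as done here.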

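Informer   → "O(L²) is too slow"               → ProbSparse → O(L log L)
Autoformer → "periodicity is ignored"          → Auto-Correlation (FFT autocorrelation) + decomposition
FEDformer  → "Auto-Correlation is still local" → full frequency-domain attention: operate directly on M frequencies → O(L)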

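2. Core Innovations

2.1 FourierBlock (Frequency-Domain Self-Attention)

Standard self-attention computes $Q K^{T}$ in the time domain. FEDformer transforms $Q$ to the frequency domain and applies a linear transform on only $M$ selected frequencies (chosen at random or as the lowest modes):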

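Q (B, L, H, E)
  → permute → (B, H, E, L)
  → rfft    → (B, H, E, L//2+1)  [complex spectrum]
  → select M frequencies, index = [ω₁, ..., ωM] (random or low)
  → for each ωᵢ (i = its bin in the spectrum, wi = its position in index): out_ft[:,:,:,wi] = compl_mul1d(Q_freq[:,:,:,i], W[:,:,:,wi])
     ← each frequency gets its own complex linear transform, W[:,:,:,wi] ∈ ℂ^{H×E×E}
  → irfft(out_ft, n=L) → (B, H, E, L)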

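Formula: $\hat{y}_{\omega_{i}}^{(h, o)} = \sum_{e} \hat{q}_{\omega_{i}}^{(h, e)} \cdot W_{\omega_{i}}^{(h, e, o)}$, where $W = W_{1} + j W_{2}$ (real and imaginary parts are parameterized separately). The sum runs over the input channel $e$ only; the head index $h$ is kept, matching einsum("bhi,hio->bho").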
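The FourierBlock pipeline can be sketched in a few lines. This is a minimal sketch, not the TFB source: NumPy stands in for PyTorch, and the function name `fourier_block`, the random weights, the scale factor, and the way `index` is drawn are all assumptions made for illustration. The read-from-bin `i`, write-to-slot `wi` indexing mirrors the pipeline described in this section.

```python
import numpy as np

B, L, H, E = 3, 12, 8, 2           # toy sizes used in this document
M = 4                              # number of retained modes
rng = np.random.default_rng(0)

q = rng.standard_normal((B, L, H, E))
# "random" mode selection (assumed): M distinct bins from [0, L // 2).
index = np.sort(rng.choice(L // 2, size=M, replace=False))

# One complex E x E matrix per head and per retained mode: W = W1 + j * W2.
scale = 1.0 / (H * E) ** 2         # arbitrary small scale for the sketch
w = scale * (rng.random((H, E, E, M)) + 1j * rng.random((H, E, E, M)))

def fourier_block(q, w, index):
    """Frequency-domain linear map on the selected modes (sketch)."""
    B, L, H, E = q.shape
    x = np.transpose(q, (0, 2, 3, 1))              # (B, H, E, L)
    x_ft = np.fft.rfft(x, axis=-1)                 # (B, H, E, L // 2 + 1)
    out_ft = np.zeros((B, H, w.shape[2], L // 2 + 1), dtype=complex)
    for wi, i in enumerate(index):
        # Sum over the input channel only; the head axis h is kept.
        out_ft[..., wi] = np.einsum("bhi,hio->bho", x_ft[..., i], w[..., wi])
    return np.fft.irfft(out_ft, n=L, axis=-1)      # (B, H, E, L)

out = fourier_block(q, w, index)
print(out.shape)
```

Because every step is linear, doubling the input doubles the output, which is a quick sanity check that no softmax-style normalization is involved.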

2.2 FourierCrossAttention (Frequency-Domain Cross-Attention)

In the decoder's cross-attention, Q comes from the decoder (dec_len=10) and K/V come from the encoder (seq_len=12). The frequency-domain cross-attention proceeds as follows:

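Q_freq (M_q modes) × K_freq (M_kv modes) → frequency-domain attention matrix (M_q × M_kv)
→ tanh activation (real and imaginary parts separately)
→ × K_freq (used as V) → weighted frequency-domain representation of V (M_q modes)
→ × learnable weight W → projection
→ scatter back onto the frequency axis → irfft → time-domain output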
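The five steps above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the TFB implementation: the helper names `select_modes` and `fourier_cross_attention`, the "low" mode selection, the random weights, and the absence of any output normalization are all choices made here for brevity.

```python
import numpy as np

B, H, E = 3, 8, 2
L_q, L_kv = 10, 12                 # dec_len and seq_len from this document
M_q, M_kv = 4, 4                   # retained modes (assumed equal here)
rng = np.random.default_rng(0)

q = rng.standard_normal((B, H, E, L_q))     # already permuted to (B, H, E, L)
k = rng.standard_normal((B, H, E, L_kv))
index_q = np.arange(M_q)                    # "low" selection: first M bins (assumed)
index_kv = np.arange(M_kv)
w = (rng.random((H, E, E, M_q)) + 1j * rng.random((H, E, E, M_q))) / (H * E) ** 2

def select_modes(x, index):
    """rfft along time, then keep only the bins listed in index."""
    return np.fft.rfft(x, axis=-1)[..., index]           # (B, H, E, M)

def fourier_cross_attention(q, k, w, index_q, index_kv):
    L = q.shape[-1]
    q_ft = select_modes(q, index_q)                       # (B, H, E, M_q)
    k_ft = select_modes(k, index_kv)                      # (B, H, E, M_kv)
    # Mode-by-mode score matrix, contracted over the channel axis E.
    scores = np.einsum("bhex,bhey->bhxy", q_ft, k_ft)     # (B, H, M_q, M_kv)
    # tanh on the real and imaginary parts separately.
    scores = np.tanh(scores.real) + 1j * np.tanh(scores.imag)
    # The spectrum of K doubles as V.
    mixed = np.einsum("bhxy,bhey->bhex", scores, k_ft)    # (B, H, E, M_q)
    proj = np.einsum("bhex,heox->bhox", mixed, w)         # (B, H, E, M_q)
    out_ft = np.zeros(q.shape[:3] + (L // 2 + 1,), dtype=complex)
    out_ft[..., index_q] = proj                           # scatter to the frequency axis
    return np.fft.irfft(out_ft, n=L, axis=-1)             # (B, H, E, L_q)

out = fourier_cross_attention(q, k, w, index_q, index_kv)
print(out.shape)
```

Note that the differing lengths of Q and K/V never meet directly: both are reduced to a fixed number of modes first, so the score matrix is M_q × M_kv regardless of dec_len and seq_len.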

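2.3 Cross-Model Comparison Table

Dimension	DLinear	Informer	Autoformer	Non-stationary	FEDformer	iTransformer
Core problem	Simple linear extrapolation	$O (L^{2})$ efficiency	Missing periodicity	Over-stationarization	Auto-Corr still local	Permutation invariance
Attention type	None	ProbSparse	Auto-Correlation	DSAttention	FourierBlock	Inverted FullAttention
Attention complexity	n/a	$O (L \log L)$	$O (L \log L)$	$O (L^{2})$	$O (M \cdot L)$	$O (N^{2})$
Time vs. frequency domain	Time-domain Linear	Time-domain sparse	FFT autocorrelation → time domain	Time domain (distribution correction)	Direct frequency-domain operation	Time domain (inverted)
Decomposition	None	None	Series Decomp	None	Series Decomp	None
Non-stationarity handling	None	None	Trend decomposition	τ/δ injection	Trend decomposition (same as Autoformer)	Instance Norm
Token semantics	Time step	Time step	Time step	Time step	Time step	Variate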

2.4 FEDformer = Autoformer Skeleton + FourierBlock Replacement

FEDformer shares every skeleton component with Autoformer (series_decomp, and the residual/FFN/LayerNorm structure of EncoderLayer and DecoderLayer). The only replacement is the attention mechanism:

Position	Autoformer	FEDformer
Encoder self-attn	AutoCorrelation	FourierBlock
Decoder self-attn	AutoCorrelation	FourierBlock
Decoder cross-attn	AutoCorrelation	FourierCrossAttention

3. Architecture Overview (paper-level mermaid)

Overall architecture SVG:

4. TFB Call Chain

5. BFS Document Tree

File	Contents	Reason for a separate page
[[01-Layer0-接入界面]]	TransformerAdapter + three FEDformer-specific parameters	Separate I/O, multi-step configuration
[[02-Layer1-forecast主链]]	series_decomp + two-branch construction + enc + dec	Main chain has ≥8 steps
[[03A-Layer2A-FourierBlock]]	FFT → complex linear transform → irfft; ⚠️ n_heads=8 hard-coded	≥5 steps + view quirk
[[03B-Layer2B-FourierCrossAttention]]	Frequency-domain Q×K attention + tanh + V weighting	4 einsum steps + different seq_len

Series Decomp and the EncoderLayer/DecoderLayer skeleton: the code is identical to Autoformer's; see the [[Autoformer]] document series (03D-Layer2D-SeriesDecomp, 03B1-Layer3-EncoderLayer, 03C1-Layer3-DecoderLayer). FEDformer differs only in that the attention components are replaced by FourierBlock / FourierCrossAttention.

6. Paper Component → Code Mapping

Paper component	TFB implementation	Source file	Close-reading document
FEB-f (Frequency Enhanced Block, Fourier version)	`FourierBlock`	`FourierCorrelation.py`	[[03A-Layer2A-FourierBlock]]
FEA-f (Frequency Enhanced Attention, Fourier version; the cross-attention block)	`FourierCrossAttention`	`FourierCorrelation.py`	[[03B-Layer2B-FourierCrossAttention]]
Series Decomp	`series_decomp` (same as Autoformer)	`Autoformer_EncDec.py`	[[Autoformer/03D-Layer2D-SeriesDecomp]]
MOEDecomp	⚠️ Not implemented; TFB uses series_decomp instead	n/a	n/a
Encoder Layer	`EncoderLayer` (same as Autoformer)	`Autoformer_EncDec.py`	[[Autoformer/03B1-Layer3-EncoderLayer]]
Decoder Layer	`DecoderLayer` (same as Autoformer)	`Autoformer_EncDec.py`	[[Autoformer/03C1-Layer3-DecoderLayer]]

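7. Global Toy Parameters

Parameter	Value	Description
B	3	Batch size
seq_len	12	Encoder input length
label_len	6	Decoder history prefix
pred_len	4	Number of forecast steps
dec_len	10	= label_len + pred_len
enc_in = dec_in = c_out	5	Number of variates
d_model	16	Hidden dimension (must be divisible by 8)
n_heads	8	Must equal 8 (hard-coded in FourierBlock, see §8)
d_keys = E	2	= d_model / n_heads = 16 / 8
d_ff	32	FFN hidden dimension
modes	4	Number of modes used by frequency attention (user setting); effective value = min(4, seq_len//2 = 6) = 4
mode_select	random	Mode selection strategy
moving_avg	7	Decomposition kernel size (odd; overrides Autoformer's default of 25)
time_dims	4	Time-feature dimension (embed="timeF")
e_layers	2	Number of encoder layers
d_layers	1	Number of decoder layers

Dimension uniqueness check: B=3, seq_len=12, label_len=6, pred_len=4, dec_len=10, enc_in=5, d_model=16, n_heads=8, d_keys=2, modes=4 (actual_modes happens to equal pred_len and time_dims, all 4, but the three always sit on different tensor dimensions, so they cannot be confused while tracing), d_ff=32, moving_avg=7.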
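The derived values in the table follow mechanically from the user-set ones. The helper below is a hypothetical convenience (the name `derived_dims` and the dictionary layout are not part of TFB) that recomputes them so a different toy configuration can be checked quickly.

```python
# User-set toy values, copied from the parameter table.
cfg = dict(seq_len=12, label_len=6, pred_len=4, d_model=16, n_heads=8, modes=4)

def derived_dims(cfg):
    """Recompute the dependent sizes used throughout the trace."""
    if cfg["d_model"] % cfg["n_heads"] != 0:
        raise ValueError("d_model must be divisible by n_heads")
    return {
        "dec_len": cfg["label_len"] + cfg["pred_len"],             # decoder input length
        "d_keys": cfg["d_model"] // cfg["n_heads"],                # per-head width E
        "rfft_bins": cfg["seq_len"] // 2 + 1,                      # size of the rfft axis
        "actual_modes": min(cfg["modes"], cfg["seq_len"] // 2),    # effective mode count
    }

dims = derived_dims(cfg)
print(dims)
# {'dec_len': 10, 'd_keys': 2, 'rfft_bins': 7, 'actual_modes': 4}
```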

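8. Key Quirks (read before running)

⚠️ FourierBlock hard-codes n_heads=8

In FourierBlock.__init__, the first dimension of weights1 is fixed at 8 and the n_heads argument is not used:
```python
self.weights1 = nn.Parameter(
    self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index), ...)
)
```
n_heads must therefore equal 8; otherwise the h dimension in einsum("bhi,hio->bho") does not match and a runtime error is raised. TFB's default n_heads=8 satisfies this constraint. FourierCrossAttention uses its num_heads argument correctly and does not have this problem.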
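The failure mode is easy to reproduce without the model. The sketch below is an assumption-laden stand-in: it uses NumPy rather than PyTorch, a single frequency bin, and a hypothetical helper `mix_channels`; NumPy reports the mismatch as a ValueError, whereas PyTorch would raise its own error type.

```python
import numpy as np

def mix_channels(n_heads, d_model=16, hard_coded_heads=8, B=3):
    """Apply a weight whose head axis is fixed at hard_coded_heads."""
    E = d_model // n_heads
    x_ft = np.ones((B, n_heads, E), dtype=complex)       # one frequency bin of Q
    w = np.ones((hard_coded_heads,
                 d_model // hard_coded_heads,
                 d_model // hard_coded_heads), dtype=complex)
    return np.einsum("bhi,hio->bho", x_ft, w)

print(mix_channels(n_heads=8).shape)      # head axes agree, so this works

try:
    mix_channels(n_heads=4)               # h is 4 in the input but 8 in the weight
except ValueError as err:
    print("n_heads=4 fails:", err)
```

With n_heads=4 both the h axis (4 vs 8) and the i axis (4 vs 2) disagree, so the contraction cannot be formed.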

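AutoCorrelationLayer shape-semantics quirk

After calling inner_correlation (here FourierBlock), AutoCorrelationLayer.forward() executes:
```python
out = out.view(B, L, -1)
```
FourierBlock returns (B, H, E, L) = (3, 8, 2, 12), but view(3, 12, -1) assumes the input is laid out as (B, L, H*E). The element counts match (3×8×2×12 = 3×12×16 = 576), so the reshape does not fail, but each "time step" ends up holding data mixed from different heads and different time positions. The model still converges (out_projection is a fully connected layer and can learn a mapping from the mixed representation), but this is a known deviation in layout semantics.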
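To see the mixing concretely, the sketch below fills a (B, H, E, L) array with each element's own time index and compares the reshape as written with a permute-then-reshape. NumPy's reshape stands in for view here, and the names `as_written` and `intended` are illustrative; the "intended" layout is an assumption about what a (B, L, H*E) consumer would expect, not a statement about how TFB should be patched.

```python
import numpy as np

B, H, E, L = 3, 8, 2, 12
# Every element stores the time step it belongs to.
time_index = np.broadcast_to(np.arange(L), (B, H, E, L))

# What view(B, L, -1) does: reinterpret memory order without moving the time axis.
as_written = time_index.reshape(B, L, -1)
# Layout a (B, L, H*E) consumer would expect: move L forward first.
intended = np.transpose(time_index, (0, 3, 1, 2)).reshape(B, L, -1)

print(as_written[0, 0])   # time steps that end up in "row 0"
print(intended[0, 0])     # all entries belong to time step 0
```

In the as-written layout, row 0 holds time steps 0 through 11 of the first (head, channel) pair followed by steps 0 through 3 of the next one, while the intended layout holds sixteen values that all belong to time step 0.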

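9. Recommended Reading Paths

To grasp FEDformer's design intent quickly: [[00-总览]] → §2 Core Innovations → [[03A-Layer2A-FourierBlock]] § Macro Logic → comparison table in [[04-收束]]

Full code close reading: [[00-总览]] → [[01-Layer0-接入界面]] → [[02-Layer1-forecast主链]] → [[03A-Layer2A-FourierBlock]] → [[03B-Layer2B-FourierCrossAttention]] → [[04-收束]]

Created: 2026-04-26 · Complete BFS close-reading document (replaces the partial close-reading version of 2026-04-24)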
