Layer 4: AutoCorrelationLayer Close Reading

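Called by EncoderLayer ([[03B1-Layer3-EncoderLayer]]) and DecoderLayer ([[03C1-Layer3-DecoderLayer]]), for four call sites in total: self-attention in EncoderLayer[0] and EncoderLayer[1], plus masked self-attention and cross-attention in DecoderLayer.
This layer is the bridge between the d_model format and the multi-head format; the attention computation itself is delegated to AutoCorrelation.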

1. Position in the parent layers

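| Call site | mask_flag | Q source | K/V source | L_Q | L_K |
|---|---|---|---|---|---|
| EncoderLayer[0,1] self-attention | False | enc_out (2,12,8) | enc_out (2,12,8) | 12 | 12 |
| DecoderLayer masked self-attention | True | dec_out (2,10,8) | dec_out (2,10,8) | 10 | 10 |
| DecoderLayer cross-attention | False | dec_out (2,10,8) | enc_out (2,12,8) | 10 | 12 |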
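The shapes in the table above can be traced with plain arithmetic. The sketch below is not part of the model code: `trace_shapes` is a hypothetical helper, and it assumes the toy settings used throughout this note (B=2, d_model=8, n_heads=4).

```python
def trace_shapes(B, L, S, d_model=8, n_heads=4):
    """Return the tensor shapes AutoCorrelationLayer works with at one call site.

    Hypothetical helper for illustration; it only performs shape arithmetic.
    """
    d_head = d_model // n_heads              # default d_keys / d_values
    return {
        "queries": (B, L, n_heads, d_head),  # after projection + view
        "keys": (B, S, n_heads, d_head),
        "values": (B, S, n_heads, d_head),
        "out": (B, L, d_model),              # after head merge + out_projection
    }


call_sites = {
    "encoder self-attention": trace_shapes(B=2, L=12, S=12),
    "decoder masked self-attention": trace_shapes(B=2, L=10, S=10),
    "decoder cross-attention": trace_shapes(B=2, L=10, S=12),
}

for name, shapes in call_sites.items():
    print(name, shapes)
```

Only the cross-attention row has different lengths for queries (L=10) and keys/values (S=12); the output length always follows the queries.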

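2. I/O interface definition

Using EncoderLayer self-attention (toy configuration) as the example:

| Tensor | Shape | Meaning |
|---|---|---|
| Input `queries` | `(2, 12, 8)` | encoder tokens (in self-attention, Q = K = V) |
| Input `keys` | `(2, 12, 8)` | same as above |
| Input `values` | `(2, 12, 8)` | same as above |
| Output `out` | `(2, 12, 8)` | attention-aggregated result, projected back to d_model |
| Output `attn` | `None` | None when `output_attention=False` |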

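3. Sequence diagram (concrete level)

4. Semantic grouping diagram (index level)

The structure is identical to Informer's AttentionLayer, with one difference: the inner attention is AutoCorrelation (FFT cross-correlation) instead of ProbAttention (sparse query selection).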

5. Step-by-step reading

5.0 Full original code

```python
import torch.nn as nn


class AutoCorrelationLayer(nn.Module):
    def __init__(self, correlation, d_model, n_heads, d_keys=None, d_values=None):
        super(AutoCorrelationLayer, self).__init__()

        # Per-head width defaults to an even split of d_model
        d_keys = d_keys or (d_model // n_heads)
        d_values = d_values or (d_model // n_heads)

        self.inner_correlation = correlation
        self.query_projection = nn.Linear(d_model, d_keys * n_heads)
        self.key_projection = nn.Linear(d_model, d_keys * n_heads)
        self.value_projection = nn.Linear(d_model, d_values * n_heads)
        self.out_projection = nn.Linear(d_values * n_heads, d_model)
        self.n_heads = n_heads

    def forward(self, queries, keys, values, attn_mask):
        B, L, _ = queries.shape  # L: query length
        _, S, _ = keys.shape     # S: key/value length (may differ from L)
        H = self.n_heads

        # Project, then split the last dimension into H heads: (B, len, H, d_head)
        queries = self.query_projection(queries).view(B, L, H, -1)
        keys = self.key_projection(keys).view(B, S, H, -1)
        values = self.value_projection(values).view(B, S, H, -1)

        out, attn = self.inner_correlation(queries, keys, values, attn_mask)
        # Merge heads back: (B, L, H, D) -> (B, L, H * D)
        out = out.view(B, L, -1)

        return self.out_projection(out), attn
```
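To exercise the I/O contract from section 2 without the real AutoCorrelation, the layer can be wired to a placeholder. This is a minimal sketch under stated assumptions: `PassThroughCorrelation` is a hypothetical stand-in that performs no FFT and simply returns `values`, so it is only valid when L == S (self-attention). The layer class repeats the listing above so the block runs on its own.

```python
import torch
import torch.nn as nn


class PassThroughCorrelation(nn.Module):
    """Hypothetical stand-in for AutoCorrelation (no FFT, no lag aggregation).

    It returns `values` unchanged, which respects the (B, L, H, D) output
    contract only when L == S, i.e. in self-attention.
    """

    def forward(self, queries, keys, values, attn_mask):
        return values.contiguous(), None


class AutoCorrelationLayer(nn.Module):
    """Same logic as the listing above, repeated so this sketch runs alone."""

    def __init__(self, correlation, d_model, n_heads, d_keys=None, d_values=None):
        super().__init__()
        d_keys = d_keys or (d_model // n_heads)
        d_values = d_values or (d_model // n_heads)
        self.inner_correlation = correlation
        self.query_projection = nn.Linear(d_model, d_keys * n_heads)
        self.key_projection = nn.Linear(d_model, d_keys * n_heads)
        self.value_projection = nn.Linear(d_model, d_values * n_heads)
        self.out_projection = nn.Linear(d_values * n_heads, d_model)
        self.n_heads = n_heads

    def forward(self, queries, keys, values, attn_mask):
        B, L, _ = queries.shape
        _, S, _ = keys.shape
        H = self.n_heads
        queries = self.query_projection(queries).view(B, L, H, -1)
        keys = self.key_projection(keys).view(B, S, H, -1)
        values = self.value_projection(values).view(B, S, H, -1)
        out, attn = self.inner_correlation(queries, keys, values, attn_mask)
        out = out.view(B, L, -1)
        return self.out_projection(out), attn


# Toy encoder self-attention call: B=2, L=S=12, d_model=8, 4 heads
layer = AutoCorrelationLayer(PassThroughCorrelation(), d_model=8, n_heads=4)
enc_out = torch.randn(2, 12, 8)
out, attn = layer(enc_out, enc_out, enc_out, attn_mask=None)

print(tuple(out.shape), attn)  # (2, 12, 8) None
```

Swapping the placeholder for the real inner module should leave the outer shapes unchanged, since the wrapper only projects, reshapes, and projects back.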

5.1 Format conversion IN: d_model → multi-head

```python
B, L, _ = queries.shape
_, S, _ = keys.shape
H = self.n_heads

queries = self.query_projection(queries).view(B, L, H, -1)
keys    = self.key_projection(keys).view(B, S, H, -1)
values  = self.value_projection(values).view(B, S, H, -1)
```

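With the toy configuration, d_keys = d_model // n_heads = 8 // 4 = 2.

query_projection = nn.Linear(8, 8) computes $y = x W_{Q}^{T} + b_{Q}$ (nn.Linear includes a bias by default) on the last dimension, so the output is still (2,12,8).
.view(2, 12, 4, -1) splits the last dimension of size 8 into (H=4, d_keys=2), giving (2, 12, 4, 2).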
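The constructor's default-size rule can be written out as a small calculation. The helper below is hypothetical (`projection_sizes` does not exist in the source); it mirrors the `d_keys or (d_model // n_heads)` logic and reports each Linear layer as (in_features, out_features). The second call, with d_model=10, is an assumed example that is not taken from this note.

```python
def projection_sizes(d_model, n_heads, d_keys=None, d_values=None):
    """Mirror the constructor's default-size logic (illustrative helper)."""
    d_keys = d_keys or (d_model // n_heads)
    d_values = d_values or (d_model // n_heads)
    return {
        "query/key projection": (d_model, d_keys * n_heads),
        "value projection": (d_model, d_values * n_heads),
        "out projection": (d_values * n_heads, d_model),
    }


toy = projection_sizes(d_model=8, n_heads=4)      # the toy setting of this note
uneven = projection_sizes(d_model=10, n_heads=4)  # d_model not divisible by n_heads

print(toy)
print(uneven)
```

In the uneven case the floor division gives a head width of 2, so the inner width is 8 rather than 10; the out projection maps it back to d_model, which is why the layer's output width does not depend on divisibility.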

queries[0, 0, :] = [p0,p1,p2,p3,p4,p5,p6,p7]
              ↓ .view(4, 2)
Head 0: [p0, p1]
Head 1: [p2, p3]
Head 2: [p4, p5]
Head 3: [p6, p7]
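The same split can be reproduced in plain Python. This is only an analogue of what `.view(H, -1)` does to the last dimension of one token; `split_heads` is a hypothetical helper, and the labels p0..p7 are the placeholders from the diagram above.

```python
def split_heads(vector, n_heads):
    """Split a flat feature vector into n_heads contiguous chunks.

    Pure-Python analogue of `.view(H, -1)` applied to the last dimension.
    """
    if len(vector) % n_heads != 0:
        raise ValueError("feature size must be divisible by n_heads")
    d_head = len(vector) // n_heads
    return [vector[h * d_head:(h + 1) * d_head] for h in range(n_heads)]


token = ["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7"]
heads = split_heads(token, n_heads=4)

for h, chunk in enumerate(heads):
    print(f"Head {h}: {chunk}")
```

Each head receives a contiguous slice of the projected features; no data is reordered.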

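L and S are read separately: in cross-attention L=10 (decoder) ≠ S=12 (encoder), so reusing L for the keys and values would make .view() fail. The keys hold 2 × 12 × 8 = 192 elements, which cannot be reshaped to (2, 10, 4, -1) because 192 is not divisible by 2 × 10 × 4 = 80.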
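The arithmetic behind that failure can be reproduced without PyTorch. The helper below is a hypothetical illustration, not PyTorch's implementation: it resolves a single -1 the only way a reshape can, by dividing the element count by the product of the known dimensions.

```python
from math import prod


def infer_view_shape(input_shape, target_shape):
    """Resolve a single -1 in target_shape by division, or raise if impossible.

    Illustrative helper only; it imitates the shape check, not PyTorch itself.
    """
    numel = prod(input_shape)
    known = prod(d for d in target_shape if d != -1)
    if target_shape.count(-1) != 1 or numel % known != 0:
        raise ValueError(f"shape {target_shape} is invalid for input of size {numel}")
    return tuple(numel // known if d == -1 else d for d in target_shape)


keys_shape = (2, 12, 8)  # encoder output used as keys in cross-attention

ok = infer_view_shape(keys_shape, (2, 12, 4, -1))  # uses S = 12
print(ok)

try:
    infer_view_shape(keys_shape, (2, 10, 4, -1))   # wrongly reuses L = 10
    failed = False
except ValueError as err:
    failed = True
    print(err)
```

With S=12 the last dimension resolves to 2; with L=10 the 192 elements cannot be divided into groups of 80, so the reshape is rejected.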

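| Operation | Code | Notes |
|---|---|---|
| Linear Q projection | `query_projection(queries)` | $(B, L, 8) \to (B, L, 8)$, $W_{Q} \in \mathbb{R}^{8 \times 8}$ |
| Multi-head split | `.view(B, L, H, -1)` | last dimension $8 = 4 \times 2$ split into (H=4, d_keys=2) |
| K/V split | `.view(B, S, H, -1)` | same form; S may differ from L (cross-attn) |
| Format check | `(B,L,H,D)` | AutoCorrelation expects this format and permutes it internally to (B,H,E,L) |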

5.2 Attention computation (delegated to AutoCorrelation, step three)

```python
out, attn = self.inner_correlation(queries, keys, values, attn_mask)
```

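In the self-attention case the arguments are queries=(2,12,4,2), keys=(2,12,4,2), values=(2,12,4,2).

AutoCorrelation takes the (B, L, H, E) format, permutes it internally to (B,H,E,L), performs FFT cross-correlation, and returns out with shape (2,12,4,2) = (B,L,H,D).

→ See [[04A-Layer5-AutoCorrelation]] for details.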
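As background for why the time axis L is moved to the last position, the sketch below checks the standard identity that a circular cross-correlation along one axis equals an inverse transform of DFT(q) · conj(DFT(k)). It is a generic illustration, not AutoCorrelation's code: it uses a naive O(n²) DFT as a stand-in for an FFT, works on a single 1-D series (one batch item, head, and channel), and omits the top-k lag aggregation entirely. The sample values are made up.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    return [v / n for v in out] if inverse else out


def corr_direct(q, k):
    """R[tau] = sum_t q[(t + tau) mod n] * k[t], computed from the definition."""
    n = len(q)
    return [sum(q[(t + tau) % n] * k[t] for t in range(n)) for tau in range(n)]


def corr_fourier(q, k):
    """Same quantity via the frequency domain: IDFT(DFT(q) * conj(DFT(k)))."""
    q_f, k_f = dft(q), dft(k)
    product = [a * b.conjugate() for a, b in zip(q_f, k_f)]
    return [v.real for v in dft(product, inverse=True)]


q = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]  # made-up series of length L = 6
k = [0.5, -1.0, 2.0, 1.0, 0.0, 1.5]

direct = corr_direct(q, k)
fourier = corr_fourier(q, k)
max_gap = max(abs(a - b) for a, b in zip(direct, fourier))
print(max_gap)
```

The two results agree up to floating-point rounding. The transform runs along the time axis of each series, which is consistent with the layout (B,H,E,L) placing L last.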

5.3 Format conversion OUT: multi-head → d_model

```python
out = out.view(B, L, -1)
return self.out_projection(out), attn
```

The out returned by AutoCorrelation has the format (B, L, H, D) = (2, 12, 4, 2).
.view(2, 12, -1) merges the last two dimensions → (2, 12, 8).
out_projection = Linear(8, 8) applies the final linear mixing → (2, 12, 8).

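Format difference from Informer's AttentionLayer

Informer's ProbAttention returns the (B, H, L_Q, D) format, which needs a transpose before .view().
AutoCorrelation returns the (B, L, H, D) format directly, so .view(B, L, -1) can be applied as is. The two formats differ, but the .view() calls in AutoCorrelationLayer and AttentionLayer are each correct for their own layer.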
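Why the layout matters for `.view()` can be seen from storage order alone. The sketch below is a library-free illustration: `rows_after_view` is a hypothetical helper, the batch axis is omitted, and the sizes L=3, H=2, D=2 are chosen for brevity rather than taken from this note. It lists which time steps land in each row when a contiguous tensor is viewed as (L, H·D).

```python
from itertools import product


def rows_after_view(axes, sizes):
    """Group a contiguous tensor's elements into L rows, as `.view(L, -1)` would.

    axes  : axis names in storage order, e.g. ("L", "H", "D")
    sizes : dict mapping axis name -> length
    Returns, for each row, the sorted time indices l that ended up in it.
    A view never moves data, so rows are filled in storage (row-major) order.
    """
    storage = list(product(*(range(sizes[a]) for a in axes)))
    row_len = len(storage) // sizes["L"]
    l_pos = axes.index("L")
    return [
        sorted({idx[l_pos] for idx in storage[r * row_len:(r + 1) * row_len]})
        for r in range(sizes["L"])
    ]


sizes = {"L": 3, "H": 2, "D": 2}
time_first = rows_after_view(("L", "H", "D"), sizes)   # (L, H, D) layout
heads_first = rows_after_view(("H", "L", "D"), sizes)  # (H, L, D) layout

print(time_first)   # [[0], [1], [2]]
print(heads_first)  # [[0, 1], [0, 2], [1, 2]]
```

With the time-first layout every merged row contains exactly one time step, so merging heads by a view is safe. With a heads-first layout the rows mix time steps, which is why a transpose is needed before the view in that case.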

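6. Drill-down subcomponents

| Subcomponent | Responsibility | Child document |
|---|---|---|
| `AutoCorrelation` | FFT period detection + top-k lag time-delay aggregation | [[04A-Layer5-AutoCorrelation]] |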
