03A-Layer2A-DataEmbedding
Location of this file
Parent: [[02-Layer1-forecast主链]]
Entry code: enc_out = self.enc_embedding(x_enc, x_mark_enc)
Entry function: DataEmbedding.forward(x, x_mark)
Output tensor: enc_out, whose shape changes from (B, seq_len, enc_in) to (B, seq_len, d_model).
1. Sequential tree of this layer
1.1 Semantic grouping diagram
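2. Input and output interface
| Variable | Toy shape | Meaning |
|---|---|---|
| x | (3,8,4) | Normalized historical value sequence |
| x_mark | (3,8,4) | Time features; at hourly frequency these usually correspond to continuous encodings of month/day/weekday/hour |
| value_embedding(x) | (3,8,6) | Conv1d maps the 4 variables to 6 hidden channels |
| temporal_embedding(x_mark) | (3,8,6) | Linear(4 -> 6) maps the time features to the hidden channels |
| position_embedding(x) | (1,8,6) | Sine/cosine positional encoding, broadcastable over the batch |
| output | (3,8,6) | Embedding obtained by summing the three sources of information |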
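The shapes in the table only add up because the (1,8,6) positional term broadcasts against the (3,8,6) terms. A minimal sketch of that right-aligned broadcasting rule on plain shape tuples follows; `broadcast_shape` is a hypothetical helper written for this note, not part of the library.

```python
def broadcast_shape(a, b):
    """Return the broadcast result of two shapes using the right-aligned rule."""
    result = []
    # Walk both shapes from the last axis backwards; missing leading axes count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 axis stretches to match the other operand.
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


value_shape = (3, 8, 6)      # value_embedding(x)
temporal_shape = (3, 8, 6)   # temporal_embedding(x_mark)
position_shape = (1, 8, 6)   # position_embedding(x)

output_shape = broadcast_shape(
    broadcast_shape(value_shape, temporal_shape), position_shape
)
print(output_shape)  # (3, 8, 6)
```

The batch axis of size 1 is the only one that stretches; the time and hidden axes must already agree.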
3. Source code reference
Location: ts_benchmark/baselines/time_series_library/layers/Embed.py
```python
class DataEmbedding(nn.Module):
    def __init__(self, c_in, d_model, embed_type="fixed", freq="h", dropout=0.1):
        super(DataEmbedding, self).__init__()
        self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
        self.position_embedding = PositionalEmbedding(d_model=d_model)
        self.temporal_embedding = (
            TemporalEmbedding(d_model=d_model, embed_type=embed_type, freq=freq)
            if embed_type != "timeF"
            else TimeFeatureEmbedding(d_model=d_model, embed_type=embed_type, freq=freq)
        )
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x, x_mark):
        if x_mark is None:
            x = self.value_embedding(x) + self.position_embedding(x)
        else:
            x = (
                self.value_embedding(x)
                + self.temporal_embedding(x_mark)
                + self.position_embedding(x)
            )
        return self.dropout(x)
```
In TFB's transformer_adapter, TimesNet usually runs with embed_type="timeF", so temporal_embedding is a TimeFeatureEmbedding.
4. value_embedding: values enter the hidden space
Source:
```python
class TokenEmbedding(nn.Module):
    def __init__(self, c_in, d_model):
        super(TokenEmbedding, self).__init__()
        padding = 1 if torch.__version__ >= "1.5.0" else 2
        self.tokenConv = nn.Conv1d(
            in_channels=c_in,
            out_channels=d_model,
            kernel_size=3,
            padding=padding,
            padding_mode="circular",
            bias=False,
        )
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(
                    m.weight, mode="fan_in", nonlinearity="leaky_relu"
                )

    def forward(self, x):
        x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
        return x
```
Shape pipeline:
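```text
x:                 (3,8,4)
x.permute(0,2,1):  (3,4,8)
Conv1d(4 -> 6):    (3,6,8)
transpose(1,2):    (3,8,6)
```
What Conv1d does mathematically: at each time step it forms a linear combination over a length-3 neighbourhood, while mapping the 4 variable channels to 6 hidden channels. With kernel_size=3 and circular padding of 1, the time length stays at 8.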
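The toy example looks at one batch element, one output channel, and time position t=3. The global toy setup has 4 variables and a kernel of length 3; the weights below are chosen to be easy to compute by hand: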
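```text
Variable 0 at t=2,3,4: [1, 2, 3]
Variable 1 at t=2,3,4: [4, 5, 6]
Variable 2 at t=2,3,4: [7, 8, 9]
Variable 3 at t=2,3,4: [10, 11, 12]

Kernel of out_channel0:
Variable 0 weights: [0.10, 0.10, 0.10]
Variable 1 weights: [0.20, 0.20, 0.20]
Variable 2 weights: [0.05, 0.05, 0.05]
Variable 3 weights: [0.01, 0.01, 0.01]

output =
(1+2+3)*0.10 + (4+5+6)*0.20 + (7+8+9)*0.05 + (10+11+12)*0.01
= 0.60 + 3.00 + 1.20 + 0.33
= 5.13
```
At run time the real weights come from kaiming_normal_ initialization and are updated during training; the computation rule is exactly the same as in the toy above.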
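The hand calculation above can be reproduced with a small pure-Python sketch of one output channel of a kernel-size-3 convolution with circular padding. `circular_conv1d_single` is a hypothetical helper for illustration, and the zero values at time steps other than t=2,3,4 are filler assumed here, since the toy example does not specify them.

```python
def circular_conv1d_single(x, kernel, t):
    """One output channel of a kernel-size-3 convolution at time t, circular padding.

    x: list of time steps, each a list of per-variable values, shape (seq_len, c_in)
    kernel: list of per-variable weight triples, shape (c_in, 3)
    """
    seq_len = len(x)
    total = 0.0
    for c, weights in enumerate(kernel):
        for k, w in enumerate(weights):
            # Offsets -1, 0, +1 around t; the modulo wraps past either end.
            total += w * x[(t + k - 1) % seq_len][c]
    return total


seq_len = 8
x = [[0.0] * 4 for _ in range(seq_len)]   # filler zeros (assumption)
x[2] = [1.0, 4.0, 7.0, 10.0]              # variables 0..3 at t=2
x[3] = [2.0, 5.0, 8.0, 11.0]              # variables 0..3 at t=3
x[4] = [3.0, 6.0, 9.0, 12.0]              # variables 0..3 at t=4

kernel = [[0.10] * 3, [0.20] * 3, [0.05] * 3, [0.01] * 3]

out_t3 = circular_conv1d_single(x, kernel, t=3)
print(round(out_t3, 2))  # 5.13
```

At t=0 the same function reads time steps 7, 0, and 1, which is what circular padding means for the first position.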
5. temporal_embedding: time features enter the hidden space
Source:
```python
class TimeFeatureEmbedding(nn.Module):
    def __init__(self, d_model, embed_type="timeF", freq="h"):
        super(TimeFeatureEmbedding, self).__init__()
        freq_map = {"h": 4, "t": 5, "s": 6, "m": 1, "a": 1, "w": 2, "d": 3, "b": 3}
        d_inp = freq_map[freq]
        self.embed = nn.Linear(d_inp, d_model, bias=False)

    def forward(self, x):
        return self.embed(x)
```
At hourly frequency, freq="h":
```text
x_mark:          (3,8,4)
Linear(4 -> 6):  (3,8,6)
```
The toy example looks at a single time point:
```text
x_mark[0,3,:] = [0.25, 0.50, 0.75, 1.00]

If the weights of hidden channel 0 are:
w = [1.0, -1.0, 0.5, 2.0]

temporal_embedding[0,3,0]
= 0.25*1.0 + 0.50*(-1.0) + 0.75*0.5 + 1.00*2.0
= 2.125
```
6. position_embedding: position information enters the hidden space
Source:
```python
class PositionalEmbedding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEmbedding, self).__init__()
        # Compute the positional encodings once in log space.
        pe = torch.zeros(max_len, d_model).float()
        pe.require_grad = False
        position = torch.arange(0, max_len).float().unsqueeze(1)
        div_term = (
            torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)
        ).exp()
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        self.register_buffer("pe", pe)

    def forward(self, x):
        return self.pe[:, : x.size(1)]
```
Formula (reconstructed from div_term in the code above):

$$
\mathrm{PE}(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
\mathrm{PE}(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
$$
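As a cross-check of the sine/cosine encoding, here is a minimal pure-Python sketch that builds the same table for the toy sizes (seq_len=8, d_model=6). `sinusoidal_position_embedding` is a hypothetical name; the guard for odd d_model is an addition of this sketch and is not in the source.

```python
import math


def sinusoidal_position_embedding(seq_len, d_model):
    """Return a seq_len x d_model table; even columns use sin, odd columns use cos."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for j in range(0, d_model, 2):
            # Same frequency as div_term in the source: exp(-j * ln(10000) / d_model).
            angle = pos * math.exp(-j * math.log(10000.0) / d_model)
            pe[pos][j] = math.sin(angle)
            if j + 1 < d_model:  # guard for odd d_model (assumption of this sketch)
                pe[pos][j + 1] = math.cos(angle)
    return pe


pe = sinusoidal_position_embedding(seq_len=8, d_model=6)
print(pe[0])               # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(round(pe[3][0], 3))  # 0.141
```

Column 0 has frequency 1, so its entry at position 3 is simply sin(3) in radians.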
Shape:
```text
self.pe:             (1,5000,6)
self.pe[:, :8]:      (1,8,6)
broadcast to batch:  (3,8,6)
```
7. Three-way sum
Corresponding source:
```python
x = (
    self.value_embedding(x)
    + self.temporal_embedding(x_mark)
    + self.position_embedding(x)
)
return self.dropout(x)
```
The toy example looks only at batch=0, time=3, hidden=0:
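```text
value_embedding[0,3,0]    = 5.130
temporal_embedding[0,3,0] = 2.125
position_embedding[0,3,0] = sin(3) ≈ 0.141
sum = 5.130 + 2.125 + 0.141 = 7.396

After dropout:
training mode: each element is zeroed with probability p, otherwise scaled by 1/(1-p)
eval mode: stays 7.396
```
8. Exit back to the upper layer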
```text
DataEmbedding output enc_out: (3,8,6)
Back to [[02-Layer1-forecast主链]]
Next step: predict_linear extends the time length from 8 to 13
```
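Before moving on, the toy numbers for the temporal embedding, the three-way sum, and dropout can be re-derived in one place. This is a minimal sketch in plain Python; `dropout_scale` is a hypothetical helper that models inverted dropout on a single element, and p=0.1 is taken from the constructor default shown in the source, so the actual configuration may differ.

```python
import math

# Toy values taken from the sections above.
x_mark_t3 = [0.25, 0.50, 0.75, 1.00]   # time features at batch 0, time 3
w_hidden0 = [1.0, -1.0, 0.5, 2.0]      # weights of hidden channel 0 (bias=False)

# Linear layer without bias: a plain dot product.
temporal = sum(f * w for f, w in zip(x_mark_t3, w_hidden0))
value = 5.13                 # value_embedding[0,3,0] from the Conv1d toy
position = math.sin(3)       # position_embedding[0,3,0]

total = value + temporal + position


def dropout_scale(v, p, keep):
    """Inverted dropout on one element: zero it if dropped, else scale by 1/(1-p)."""
    return v / (1.0 - p) if keep else 0.0


eval_out = total                                         # eval mode: identity
train_kept = dropout_scale(total, p=0.1, keep=True)      # element survives
train_dropped = dropout_scale(total, p=0.1, keep=False)  # element is zeroed

print(temporal)              # 2.125
print(round(total, 3))       # 7.396
print(round(train_kept, 3))  # 8.218
print(train_dropped)         # 0.0
```

The 1/(1-p) scaling keeps the expected value of each element unchanged between training and evaluation, which is why eval mode can pass the sum through untouched.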