
Notes on a small bug: the loss-computation output loses a dimension during layer-wise fine-tuning of a large model

2025/9/15 19:28:23 Source: https://blog.csdn.net/m0_56243424/article/details/148610156

1. If the layer in question is a Linear layer

from typing import Iterator, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F


def _compute_mse_on_batch(
    layer: nn.Module,
    batch_iter: Iterator[Tuple[torch.Tensor, torch.Tensor]],
    **kwargs,
) -> torch.Tensor:
    inps_batch, outs_batch = next(batch_iter)
    print("Initial inps_batch:", inps_batch.shape)
    print("Initial outs_batch:", outs_batch.shape)
    # print("Any NaNs in inps_batch:", torch.isnan(inps_batch).any())
    # print("Any NaNs in outs_batch:", torch.isnan(outs_batch).any())
    # if inps_batch.shape[0] != 1:
    #     for name, value in list(kwargs.items()):
    #         if isinstance(value, torch.Tensor) and value.shape[0] == 1:
    #             if name not in ("attention_mask", "position_ids"):
    #                 warnings.warn(f"Tiling an unexpected kwarg {name} over batch size; make sure this is valid.")
    #             repeats = [len(inps_batch)] + [1 for _ in range(value.ndim - 1)]
    #             kwargs[name] = value.tile(*repeats)
    outs_prediction = layer(inps_batch, **kwargs)
    assert outs_prediction.shape == outs_batch.shape
    loss = F.mse_loss(outs_prediction, outs_batch)
    # print("Computed loss:", loss.item())
    return loss
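A minimal usage sketch of the Linear case (the layer sizes and one-batch iterator here are illustrative assumptions, not the original training setup): the layer returns a plain tensor, so a direct assignment preserves the batch dimension and the shape assertion passes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical setup: a Linear "layer under tuning" and a one-batch iterator
# of (input, target-output) pairs, mirroring what batch_iter yields.
layer = nn.Linear(16, 8)
batch_iter = iter([(torch.randn(4, 16), torch.randn(4, 8))])

inps_batch, outs_batch = next(batch_iter)
outs_prediction = layer(inps_batch)        # plain tensor, shape [4, 8]
assert outs_prediction.shape == outs_batch.shape
loss = F.mse_loss(outs_prediction, outs_batch)
print(loss.item())                         # a non-negative scalar
```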

2. If the layer in question is a Transformer layer

def _compute_mse_on_batch(
    layer: nn.Module,
    batch_iter: Iterator[Tuple[torch.Tensor, torch.Tensor]],
    **kwargs,
) -> torch.Tensor:
    inps_batch, outs_batch = next(batch_iter)
    print("Initial inps_batch:", inps_batch.shape)
    print("Initial outs_batch:", outs_batch.shape)
    # print("Any NaNs in inps_batch:", torch.isnan(inps_batch).any())
    # print("Any NaNs in outs_batch:", torch.isnan(outs_batch).any())
    # if inps_batch.shape[0] != 1:
    #     for name, value in list(kwargs.items()):
    #         if isinstance(value, torch.Tensor) and value.shape[0] == 1:
    #             if name not in ("attention_mask", "position_ids"):
    #                 warnings.warn(f"Tiling an unexpected kwarg {name} over batch size; make sure this is valid.")
    #             repeats = [len(inps_batch)] + [1 for _ in range(value.ndim - 1)]
    #             kwargs[name] = value.tile(*repeats)
    outs_prediction, *_unused = layer(inps_batch, **kwargs)
    # print("outs_prediction device in loss:", outs_prediction.device)
    # print(" outs_batch device in loss:",  outs_batch.device)
    assert outs_prediction.shape == outs_batch.shape
    loss = F.mse_loss(outs_prediction, outs_batch)
    # print("Computed loss:", loss.item())
    return loss

Note that if we also write `outs_prediction, *_unused = layer(inps_batch, **kwargs)` in the linear-layer version, the bug appears: a linear layer returns a plain tensor rather than a tuple, so the starred unpacking iterates over the tensor's first (batch) dimension. `outs_prediction` then becomes just the first row of the output: for an input of shape [batch_size, input_dim] we expect an output of shape [batch_size, output_dim], but we get only [output_dim].
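The dimension loss described above can be reproduced in a few lines, together with one defensive fix (the `isinstance` normalization is my suggestion, not the original author's code):

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 3)
x = torch.randn(2, 4)            # [batch_size, input_dim]

# Buggy pattern: a Linear returns a plain tensor, so starred unpacking
# iterates over dim 0 and keeps only the first row.
bad_out, *_rest = lin(x)
print(bad_out.shape)             # torch.Size([3]) — batch dimension lost

# Defensive fix: normalize the return value so tensor-returning and
# tuple-returning layers behave the same.
result = lin(x)
out = result[0] if isinstance(result, tuple) else result
print(out.shape)                 # torch.Size([2, 3])
```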
