基于DeepSeek-R1-Distill-Llama-8B的健康管理助手微调过程

本次创新实训项目的主要任务是利用DEEPSEEK提供的开源模型，通过微调技术，实现一个专注于健康管理与医疗咨询的人工智能助手。本文详细记录我们如何对DeepSeek-R1-Distill-Llama-8B模型进行微调，以满足健康医疗领域应用的需求。

为什么选择DeepSeek-R1-Distill-Llama-8B？

我们选择DeepSeek-R1-Distill-Llama-8B模型主要基于以下原因：

模型规模合适：8B参数规模在GPU资源有限的条件下也能高效训练。
中文理解能力强：特别适合医疗咨询类问题，语言表述清晰且专业。
开放与自由：DeepSeek系列的开源特性让我们能灵活地微调和部署。

数据集的选择和介绍

我们使用的是FreedomIntelligence医疗推理数据集。该数据集专注于医疗推理任务，每条数据由以下三个部分组成：

Question：医学问题描述。
Complex_CoT：详细的医学推理过程。
Response：医学建议或诊疗方案。

数据示例：

{"Question": "一个患有急性阑尾炎的病人已经发病5天，腹痛稍有减轻但仍然发热...","Complex_CoT": "考虑病程较长，阑尾可能已形成脓肿，需要进一步处理...","Response": "建议首先进行保守治疗，如有必要再考虑手术干预。"
}

LoRA 微调原理详解

LoRA (Low-Rank Adaptation) 是一种高效的微调技术，通过冻结原模型的参数，仅通过低秩矩阵来适应新任务。具体而言，LoRA在原始权重矩阵 $W_0 \in \mathbb{R}^{d \times d}$ 基础上，增加了两个低秩矩阵 $\in \mathbb{R}^{r \times d}$ 和 $\in \mathbb{R}^{d \times r}$ ，实现权重微调：

$\Delta W = BA \quad (r \ll d)$

实际更新后的权重表示为：

$W = W_0 + BA$

LoRA的参数设置包括：

r (rank)：控制模型微调的容量与精度，通常取8至64。
lora_alpha：放大系数，用于调整LoRA微调的学习强度，通常与r取相近数值。

通过LoRA，能够极大降低训练成本与显存占用，仅用少量参数即可有效微调。

微调实现过程

环境配置

!pip install unsloth bitsandbytes transformers datasets trl

模型加载与量化

使用Unsloth进行高效加载（使用4-bit量化）：

from unsloth import FastLanguageModelmodel, tokenizer = FastLanguageModel.from_pretrained("unsloth/DeepSeek-R1-Distill-Llama-8B",max_seq_length=2048, load_in_4bit=True
)

数据集处理

构建适合模型微调的Prompt模板：

from datasets import load_datasetEOS = tokenizer.eos_tokendef formatting_prompts_func(examples):texts = []for q, cot, ans in zip(examples["Question"], examples["Complex_CoT"], examples["Response"]):text = f"""Below is an instruction...
### Question:
{q}### Response:
<think>
{cot}
</think>
{ans}{EOS}"""texts.append(text)return {"text": texts}dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT",'zh',split="train[:500]"
).map(formatting_prompts_func, batched=True)

LoRA微调参数配置

model = FastLanguageModel.get_peft_model(model,r=16,  # 设定秩大小lora_alpha=16,  # LoRA放缩因子target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],use_gradient_checkpointing="unsloth"
)

微调训练

使用TRL库的SFTTrainer进行高效训练：

from trl import SFTTrainer
from transformers import TrainingArgumentstrainer = SFTTrainer(model=model,tokenizer=tokenizer,train_dataset=dataset,dataset_text_field="text",args=TrainingArguments(per_device_train_batch_size=2,gradient_accumulation_steps=4,max_steps=60,learning_rate=2e-4,fp16=True,output_dir="outputs",logging_steps=1)
)trainer.train()

微调后模型简单推理验证

FastLanguageModel.for_inference(model)question = "“最近感觉睡眠质量差，晚上容易醒来，白天精神也不好，应该如何调理？"
inputs = tokenizer([question], return_tensors="pt").to("cuda")outputs = model.generate(input_ids=inputs.input_ids,attention_mask=inputs.attention_mask,max_new_tokens=1024
)response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)