Fine-Tuning the DeepSeek-R1-8b Model Locally

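1. Document Overview and Server Preparation

1.1 Document Overview

This document records the problems encountered along the way, such as downloading the large model files, Python library version conflicts, and wandb authentication, together with their solutions.

It is a step-by-step guide to fine-tuning the DeepSeek-R1-Distill-Llama-8B model; anyone with basic computer software knowledge should be able to reproduce it.

An NVIDIA A40 is used here. If your GPU has less memory or compute, try a DeepSeek-R1 distilled model with fewer parameters, such as the 1.5B version, which can run on CPU alone.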

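1.2 Server and GPU Preparation

The server used here is a physical x86_64 machine based on the Hygon C86 5380. Details:

Hostname	IP	OS	Specs	GPU	Notes
controller01	172.20.0.21	Ubuntu 20.04.3 LTS (amd64)	32 cores, 64 GB RAM, 960 GB disk	NVIDIA A40 x 1

Key software versions (the last three are Python libraries; it is recommended to create a conda virtual environment and install exactly these versions. With the other versions fixed, any unsloth version other than 2025.2.5 raised errors in this setup):

GPU driver version: 550.54.15
CUDA version: V12.4.131
torch version: 2.6.0
transformers version: 4.48.3
unsloth version: 2025.2.5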
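
Because the versions above matter so much, it can help to check the active environment before opening the notebook. The following is a minimal sketch using only the standard library; the helper names are made up for illustration, and stripping a local suffix such as "+cu124" from the torch version is an assumption about how the wheel reports itself.

```python
from importlib import metadata

# Versions pinned in this document.
PINNED = {"torch": "2.6.0", "transformers": "4.48.3", "unsloth": "2025.2.5"}


def base_version(version):
    """Drop a local build suffix such as '+cu124'; None stays None."""
    return version.split("+", 1)[0] if version else None


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def compare_versions(pinned, installed):
    """Return {package: (expected, found)} for every package that differs."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pinned.items()
        if base_version(installed.get(name)) != expected
    }


mismatches = compare_versions(PINNED, installed_versions(PINNED))
for name, (expected, found) in mismatches.items():
    print(f"{name}: expected {expected}, found {found}")
if not mismatches:
    print("All pinned versions match.")
```

Run it inside the conda environment that the notebook kernel uses; a package that is not installed is reported with "found None".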

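1.3 What Fine-Tuning a Large Model Means

Fine-tuning is the process of further training an already pre-trained large model on a domain-specific dataset. The goal is to improve the model's performance on a particular task so that it adapts better to that domain. A key part of this is tuning the hyperparameters (such as learning rate, batch size, and number of training epochs). In plain terms: adjust the values of some of the model's parameters so that it performs better on a specific dataset.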

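2. Downloading the Required Files

2.1 Downloading the Model Files

The original plan was to download the model files and dataset from Hugging Face, but Hugging Face cannot be reached from mainland China without a proxy. This document therefore downloads both from ModelScope, which is accessible on a normal domestic network.

#ModelScope page: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/files
#If --local_dir is not given, files are saved under the .cache subdirectory of the user's home directory
#The model is about 16 GB in total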
root@t1-gpu:/opt/code_repos/AI_models/unsloth-DeepSeek-R1-Distill-Llama-8B# modelscope download --model unsloth/DeepSeek-R1-Distill-Llama-8B --local_dir ./
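
With a download of this size, an interrupted transfer is easy to miss. The sketch below checks that every weight shard named in model.safetensors.index.json is present in the download directory. It assumes the index file has the usual "weight_map" key that maps tensor names to shard files; missing_shards is a hypothetical helper, not part of the modelscope CLI.

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return shard file names listed in the index but absent from model_dir."""
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # "weight_map" maps each tensor name to the shard file that stores it.
    shards = set(index["weight_map"].values())
    return sorted(name for name in shards if not (model_dir / name).is_file())
```

For example, missing_shards("/opt/code_repos/AI_models/unsloth-DeepSeek-R1-Distill-Llama-8B") should return an empty list once all shards have arrived. It only checks that the files exist, not that their contents are intact.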

2.2 Downloading the Training Dataset

On ModelScope, search for the "medical-o1-reasoning-SFT" dataset and download it. (Two datasets with this name exist, uploaded by different users; I chose the one with more downloads: https://modelscope.cn/datasets/AI-ModelScope/medical-o1-reasoning-SFT)

root@t1-gpu:/opt/code_repos/AI_datasets/AI-ModelScope---medical-o1-reasoning-SFT# modelscope download --dataset AI-ModelScope/medical-o1-reasoning-SFT --local_dir ./

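3. Other Preparation

3.1 Preparing a wandb Token

Register an account on the wandb website, create a token (API key), and keep it for later use.

Notes:

wandb stands for "Weights and Biases".
About W&B: W&B is a platform that helps data scientists track their models, datasets, system information, and more. With a few lines of code you can start tracking all of these. It is free for individual use. Team use is usually paid, but teams working for academic purposes can use it for free. W&B can be used with your framework of choice, such as TensorFlow, Keras, PyTorch, SKlearn, or fastai.

All tracked information is sent to a dedicated project page in the W&B UI, where you can open high-quality visualizations, aggregate information, and compare models or parameters.

3.1.1 Registering a wandb Account

I registered with my own GitHub account. You can also register manually with an email address and password (the option at the bottom of the sign-up page).

3.1.2 Getting the Token

Go to your account's home page on the wandb website and copy your wandb token from there.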

3.2 Preparing the Jupyter Environment

The following uses Ubuntu 20.04.3 LTS (amd64) as an example to show how to install a Jupyter environment.

3.2.1 Installing conda

root@controller01:/opt/installPkgs# mkdir /root/.pip
root@controller01:/opt/installPkgs# cat > /root/.pip/pip.conf  <<EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = https://mirrors.aliyun.com/pypi/simple
EOF

root@controller01:/opt/installPkgs# wget --user-agent="Mozilla" https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2024.06-1-Linux-x86_64.sh

root@controller01:/opt/installPkgs# bash Anaconda3-2024.06-1-Linux-x86_64.sh
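###The installer asks several questions; keep the defaults by pressing Enter, or type yes and press Enter (the whole installation may take a few minutes)
###conda's default installation directory is /root/anaconda3
###The last lines of the installer output look like this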
....
You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes             #type yes here and press Enter
no change     /root/anaconda3/condabin/conda
no change     /root/anaconda3/bin/conda
no change     /root/anaconda3/bin/conda-env
no change     /root/anaconda3/bin/activate
no change     /root/anaconda3/bin/deactivate
no change     /root/anaconda3/etc/profile.d/conda.sh
no change     /root/anaconda3/etc/fish/conf.d/conda.fish
no change     /root/anaconda3/shell/condabin/Conda.psm1
no change     /root/anaconda3/shell/condabin/conda-hook.ps1
no change     /root/anaconda3/lib/python3.12/site-packages/xontrib/conda.xsh
no change     /root/anaconda3/etc/profile.d/conda.csh
modified      /root/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

Thank you for installing Anaconda3!
root@controller01:/opt/installPkgs# 

#Reload the shell configuration, which activates the default base conda environment
root@controller01:/opt/installPkgs# source ~/.bashrc
(base) root@controller01:/opt/installPkgs# which python
/root/anaconda3/bin/python
(base) root@controller01:/opt/installPkgs# which pip
/root/anaconda3/bin/pip
(base) root@controller01:/opt/installPkgs# python -V
Python 3.12.4
(base) root@controller01:/opt/installPkgs# pip -V
pip 24.0 from /root/anaconda3/lib/python3.12/site-packages/pip (python 3.12)


#List all Python environments managed by conda
(base) root@controller01:/opt/installPkgs# conda env list
# conda environments:
#
base                  *  /root/anaconda3

(base) root@controller01:/opt/installPkgs#

3.2.2 Creating the self-llm Virtual Python Environment

#Update the package list
(base) root@controller01:/opt/installPkgs# apt-get update

#Prepare Python 3 and pip (create a virtual environment with the conda installed above)
(base) root@controller01:/opt/installPkgs# conda create -n self-llm python=3.12
...
The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main 
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 
  ca-certificates    pkgs/main/linux-64::ca-certificates-2024.12.31-h06a4308_0 
  expat              pkgs/main/linux-64::expat-2.6.4-h6a678d5_0 
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 
  libffi             pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 
  libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 
  libuuid            pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 
  ncurses            pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 
  openssl            pkgs/main/linux-64::openssl-3.0.15-h5eee18b_0 
  pip                pkgs/main/linux-64::pip-25.0-py312h06a4308_0 
  python             pkgs/main/linux-64::python-3.12.9-h5148396_0 
  readline           pkgs/main/linux-64::readline-8.2-h5eee18b_0 
  setuptools         pkgs/main/linux-64::setuptools-75.8.0-py312h06a4308_0 
  sqlite             pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 
  tk                 pkgs/main/linux-64::tk-8.6.14-h39e8969_0 
  tzdata             pkgs/main/noarch::tzdata-2025a-h04d1e81_0 
  wheel              pkgs/main/linux-64::wheel-0.45.1-py312h06a4308_0 
  xz                 pkgs/main/linux-64::xz-5.6.4-h5eee18b_1 
  zlib               pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 


Proceed ([y]/n)? Y         #type y here and press Enter
###Then wait while the Python packages are downloaded and extracted. The last lines of the output look like this
...
# To activate this environment, use                                                                         
##     $ conda activate self-llm                                                                             
# To deactivate an active environment, use                                                                   
##     $ conda deactivate
(base) root@controller01:/opt/installPkgs# 

#List all Python environments managed by conda
(base) root@controller01:/opt/installPkgs# conda env list
# conda environments:
#
base                     /root/anaconda3
self-llm              *  /root/anaconda3/envs/self-llm

#Activate the self-llm environment and run commands inside it
(base) root@controller01:/opt/installPkgs# conda activate self-llm 
(self-llm) root@controller01:/opt/installPkgs# python -V
Python 3.12.9
(self-llm) root@controller01:/opt/installPkgs# pip -V
pip 25.0 from /root/anaconda3/envs/self-llm/lib/python3.12/site-packages/pip (python 3.12)
(self-llm) root@controller01:/opt/installPkgs#

3.2.3 Installing Jupyter

#Install the IPython interactive shell
(self-llm) root@controller01:/opt/installPkgs# pip install ipython

#Install Jupyter itself
(self-llm) root@controller01:/opt/installPkgs# apt install jupyter-core jupyter-notebook -qy
#Create a working directory
(self-llm) root@controller01:/opt/installPkgs# mkdir finetunning-bigmodles

#Start jupyter notebook
(self-llm) root@controller01:/opt/installPkgs# jupyter notebook --allow-root --ip=0.0.0.0 --no-browser
[I 01:29:38.744 NotebookApp] Serving notebooks from local directory: /opt/installPkgs
[I 01:29:38.744 NotebookApp] The Jupyter Notebook is running at:
[I 01:29:38.744 NotebookApp] http://a10:8888/?token=96f565e66d149b1803474ff3bf7c15e43fbaf9cc59c192ba
[I 01:29:38.744 NotebookApp]  or http://127.0.0.1:8888/?token=96f565e66d149b1803474ff3bf7c15e43fbaf9cc59c192ba
[I 01:29:38.744 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 01:29:38.755 NotebookApp] No web browser found: could not locate runnable browser.
[C 01:29:38.756 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/nbserver-41207-open.html
    Or copy and paste one of these URLs:
        http://a10:8888/?token=96f565e66d149b1803474ff3bf7c15e43fbaf9cc59c192ba
     or http://127.0.0.1:8888/?token=96f565e66d149b1803474ff3bf7c15e43fbaf9cc59c192ba
[I 01:29:53.755 NotebookApp] 302 GET /?token=96f565e66d149b1803474ff3bf7c15e43fbaf9cc59c192ba (192.168.254.84) 2.07ms

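#Replace 127.0.0.1 in the URL above with the server's IP to reach the Jupyter environment. For example, the server used here has IP 172.20.0.21, so the URL is:
http://172.20.0.21:8888/?token=96f565e66d149b1803474ff3bf7c15e43fbaf9cc59c192ba
#The token parameter at the end works like an access password. After a successful login, the Jupyter root directory is the directory where "jupyter notebook --allow-root --ip=0.0.0.0" was run, here /opt/installPkgs, and the page lists all files and folders under it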
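
The host substitution described above can also be done programmatically, for instance when the URL is copied from a log. This is a small sketch with the standard library; public_url is a name chosen here for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def public_url(local_url, server_ip):
    """Swap the host of a Jupyter URL for server_ip, keeping port, path and token."""
    parts = urlsplit(local_url)
    netloc = f"{server_ip}:{parts.port}" if parts.port else server_ip
    return urlunsplit(parts._replace(netloc=netloc))


local = "http://127.0.0.1:8888/?token=96f565e66d149b1803474ff3bf7c15e43fbaf9cc59c192ba"
print(public_url(local, "172.20.0.21"))
```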

To customize Jupyter's root directory and other settings, edit its configuration file (already created by default: /root/.jupyter/jupyter_notebook_config.py). Jupyter reads this file first on every start.

(self-llm) root@controller01:/opt/installPkgs/finetunning-bigmodles# vi /root/.jupyter/jupyter_notebook_config.py
...
# c.ServerApp.notebook_dir = ''
c.ServerApp.notebook_dir = '/opt/installPkgs/finetunning-bigmodles/'
...

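After changing this file, restart the Jupyter process for the change to take effect.

3.2.4 Troubleshooting

Starting jupyter notebook reports "Fail to get yarn configuration".

This error does not affect the use of jupyter notebook.

Solutions:

#Option 1: the error does not affect jupyter notebook, so it can be ignored for now

#Option 2: upgrade Node (it was v10.19.6 before; upgrading to v12.22.6 is enough). Reference: https://discourse.jupyter.org/t/jupyter-lab-4-0-6-error-on-startup-about-yarn-configuration-and-worker-threads/21859
#Upgrade nodejs with nvm: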
nvm install 12.22.6

4. Fine-Tuning in a .ipynb Notebook

4.1 Opening an Existing .ipynb File

For example, download deepseekr1_8b本地微调.ipynb and save it to /opt/installPkgs/finetunning-bigmodles

root@controller01:/opt/installPkgs/finetunning-bigmodles# ll
total 104
drwxr-xr-x  2 root root  4096 Feb 17 10:02 ./
drwxr-xr-x 12 root root  4096 Feb 17 09:48 ../
-rw-r--r--  1 root root 95152 Feb 17 09:46 deepseekr1_8b本地微调.ipynb

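Then locate the file in the Jupyter web interface and open it with "Notebook":

After opening, the interface looks like this:

4.1.1 Marking the .ipynb File as Trusted

4.1.2 Selecting a Kernel for the .ipynb File

4.1.3 Running the .ipynb File

The downloaded deepseekr1_8b本地微调.ipynb needs a few changes; otherwise it fails or runs very slowly on a server without proxy access.

Cells not listed below need no changes and can be run as-is.

Cell 1 - Install unsloth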

%%capture
!pip install unsloth==2025.2.5 -i https://mirrors.aliyun.com/pypi/simple

# Also get the latest nightly Unsloth!
!pip install --force-reinstall --no-cache-dir --no-deps git+https://gitee.com/sy-jiang/unsloth.git -i https://mirrors.aliyun.com/pypi/simple
#The command above installs the latest unsloth (2025.2.12), which causes problems later, so reinstall unsloth 2025.2.5
!pip install unsloth==2025.2.5 -i https://mirrors.aliyun.com/pypi/simple

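Then select the cell and press Shift+Enter, or click the Run button, to execute it.

The first cell takes a while because the Python packages it installs are large. While a cell is still running, a "*" is shown in front of it.

When a cell finishes, the notebook's execution counter is incremented by 1 and the result is shown at the cell's upper left. Here this is the first execution on the page, so it shows 1.

Cell 3 - Install wandb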

#!pip install wandb
#Changed as follows to use a domestic pip mirror
!pip install wandb -i https://mirrors.aliyun.com/pypi/simple

Cell 4 - Initialize wandb

The key below is the token obtained from the wandb website; replace it with your own wandb token.

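## No environment variable is used here; the wandb token is filled in directly. If you do not have a token, get one from the wandb website
## For security, only the first part of my token is pasted here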
import wandb

wandb.login(key="b11f575fd0f6c8cae0cb016b24")
run = wandb.init(
    project='my fint-tune on deepseek r1 with medical data',
    job_type="training",
    anonymous="allow"
)

If the cell above fails, use the following approach:

#1) First log in to wandb.ai from the self-llm virtual Python environment; on success the credentials are saved in /root/.netrc
(self-llm) root@controller01:/opt/installPkgs/finetunning-bigmodles# wandb login --relogin
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
(self-llm) root@controller01:/opt/installPkgs/finetunning-bigmodles# 
(self-llm) root@controller01:/opt/installPkgs/finetunning-bigmodles# vi /root/.netrc 
(self-llm) root@controller01:/opt/installPkgs/finetunning-bigmodles# 
(self-llm) root@controller01:/opt/installPkgs/finetunning-bigmodles# cat /root/.netrc   
machine api.wandb.ai
  login user
  password b11f575fd0f6c8cae0cb016b24
  
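#2) Change the cell to the following and run it again
## The key is no longer passed in the cell; wandb picks it up from /root/.netrc written in step 1)
## wandb.init has a default timeout of 90s; because the site is hosted overseas, init often takes longer than that, so the timeout is raised to 1200s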
import wandb
import os
#os.environ["WANDB_API_KEY"] = 'b11f575fd0f6c8cae0cb016b24'
##os.environ["WANDB_MODE"] = "offline"
os.environ['WANDB_INIT_TIMEOUT'] = '1200'  #Increase timeout settings
os.environ['WANDB_DEBUG'] = "true"           #Enable debugging
#wandb.login(key="b11f575fd0f6c8cae0cb016b24")
run = wandb.init(
    project='my fint-tune on deepseek r1 with medical data',
    job_type="training",
    anonymous="allow",
    settings=wandb.Settings(init_timeout=1200)
)

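#3) On success the output looks like this

#4) Then, following the output above, open your home page on wandb.ai, where you should see something similar. After that, continue with the next cell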
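
Since `wandb login` stores the key in /root/.netrc, the key never has to be pasted into the notebook. As a rough sketch, the entry can be read back with Python's standard netrc module; wandb_key_from_netrc is a hypothetical helper and the default host name is taken from the .netrc listing above.

```python
import netrc


def wandb_key_from_netrc(path, host="api.wandb.ai"):
    """Return the password stored for host in a netrc file, or None if absent."""
    entry = netrc.netrc(path).authenticators(host)
    # authenticators() yields (login, account, password) or None.
    return entry[2] if entry else None
```

One possible use is os.environ["WANDB_API_KEY"] = wandb_key_from_netrc("/root/.netrc") before calling wandb.init, which keeps the token out of the saved .ipynb file.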

Cell 5 - Load the Model Files / Create the Model Instance

This cell requires an NVIDIA GPU on your server. I use an NVIDIA A40, which has about 44 GB of GPU memory.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/opt/code_repos/AI_models/unsloth-DeepSeek-R1-Distill-Llama-8B", # change this to your local model path; in my case the model files were downloaded to local disk beforehand
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Cells 6 and 7 - Inference Before Fine-Tuning

Cell 6 provides the question in both English and Chinese:

#question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
question = "一名61岁女性，长期在咳嗽或打喷嚏等活动时不自觉排尿，但夜间无漏尿，现接受妇科检查和Q-tip检查。基于这些发现，膀胱造瘘术最有可能揭示她的残余容量和逼尿肌收缩情况？"


FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
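
The final print keeps everything after "### Response:" in one string. To compare reasoning and answer separately, the decoded text can be split further at the </think> tag. This is only a sketch: it assumes the prompt ends with "### Response:" followed by an opening <think> tag, as in the training template, and split_response is a name invented here.

```python
def split_response(decoded, eos_token=""):
    """Split decoded model output into (reasoning, answer)."""
    # Everything after the last "### Response:" marker is the model's reply.
    reply = decoded.rsplit("### Response:", 1)[-1]
    if "</think>" in reply:
        thinking, answer = reply.split("</think>", 1)
    else:
        # No closing tag, e.g. generation stopped at max_new_tokens.
        thinking, answer = reply, ""
    if eos_token:
        answer = answer.replace(eos_token, "")
    thinking = thinking.replace("<think>", "", 1).strip()
    return thinking, answer.strip()


demo = "### Response:\n<think>\nstep 1\n</think>\nfinal answer"
print(split_response(demo))
```

In the notebook it could be called as split_response(response[0], tokenizer.eos_token).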

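Running cells 6 and 7 produces the following output:

Cell 8 - Create the LoRA Model Instance

This cell needs no changes. It uses LoRA, which fine-tunes the model quickly and efficiently by training small low-rank matrices. (Some background knowledge of LoRA is needed to follow it.)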

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
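
To get a feel for what r=16 means, the number of weights LoRA adds can be worked out by hand: each adapted projection gets two small matrices, A (r x in) and B (out x r). The layer shapes below are not stated in this document; they are assumed from the Llama-3.1-8B architecture that this distilled model is based on, so treat the result as an estimate.

```python
# Assumed Llama-3.1-8B shapes (not stated in this document): hidden size 4096,
# MLP size 14336, 8 KV heads x 128 dims = 1024, 32 decoder layers.
HIDDEN, MLP, KV, LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) of each projection named in target_modules.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(shapes, r, layers):
    """LoRA adds A (r x in) and B (out x r) per module: r * (in + out) weights."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


trainable = lora_params(SHAPES, r=16, layers=LAYERS)
print(f"LoRA trainable parameters at r=16: {trainable:,}")  # 41,943,040
```

Under these assumptions the adapters hold about 42 million weights, roughly half a percent of an 8-billion-parameter model, which is why this kind of fine-tuning fits on a single GPU.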

Cells 9 and 10 - Preparing and Formatting the Dataset

Preprocess the dataset so that every record follows the template in cell 9:

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

The function in cell 10 below formats the dataset by filling the question, chain of thought, and response into the prompt above:

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
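
To see what the formatting function produces without loading the tokenizer or the real dataset, it can be run on a toy batch. In the sketch below the template is a shortened stand-in for train_prompt_style, "<eos>" is a placeholder for tokenizer.eos_token, and the sample record is invented.

```python
# Shortened stand-in for train_prompt_style; the real template is in cell 9.
TEMPLATE = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"
EOS_TOKEN = "<eos>"  # placeholder; the notebook uses tokenizer.eos_token


def formatting_prompts_func(examples):
    """Fill question, chain of thought and response into the template."""
    texts = [
        TEMPLATE.format(question, cot, response) + EOS_TOKEN
        for question, cot, response in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


# A batch as dataset.map(..., batched=True) passes it: one list per column.
batch = {
    "Question": ["What is 2 + 2?"],
    "Complex_CoT": ["Add the two numbers."],
    "Response": ["4"],
}
print(formatting_prompts_func(batch)["text"][0])
```

The appended EOS token is what teaches the model to stop after the answer; without it, generation tends to run on until max_new_tokens.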

Cell 11 - Load the Dataset

from datasets import load_dataset

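#Arguments to load_dataset: the first is the dataset directory, the second selects the language ("en" for English, "zh" for Chinese). The rest is unchanged (split = "train[0:500]" uses the first 500 records)
dataset = load_dataset("/opt/code_repos/AI_datasets/AI-ModelScope---medical-o1-reasoning-SFT", "zh",split = "train[0:500]") # the dataset was likewise downloaded beforehand and placed on local disk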
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

Before running cell 11, do the following:

#Back up the README.md of unsloth-DeepSeek-R1-Distill-Llama-8B
root@controller01:/opt/code_repos/AI_models/unsloth-DeepSeek-R1-Distill-Llama-8B# mv README.md unsloth-DeepSeek-R1-Distill-Llama-8B---README.md 

#Copy the *.json files and README.md of the AI-ModelScope---medical-o1-reasoning-SFT dataset into the directory above
root@controller01:/opt/code_repos/AI_models/unsloth-DeepSeek-R1-Distill-Llama-8B# cp -p /opt/code_repos/AI_datasets/AI-ModelScope---medical-o1-reasoning-SFT/*.json ./
root@controller01:/opt/code_repos/AI_models/unsloth-DeepSeek-R1-Distill-Llama-8B# cp -p /opt/code_repos/AI_datasets/AI-ModelScope---medical-o1-reasoning-SFT/README.md ./

#All files at this point
root@controller01:/opt/code_repos/AI_models/unsloth-DeepSeek-R1-Distill-Llama-8B# ls -alh
total 16G
drwxr-xr-x 3 root root 4.0K Feb 17 16:21 .
drwxr-xr-x 6 root root 4.0K Feb 14 10:49 ..
-rw-r--r-- 1 root root  959 Feb 13 17:59 config.json
-rw-r--r-- 1 root root   73 Feb 13 17:59 configuration.json
-rw-r--r-- 1 root root  236 Feb 13 17:59 generation_config.json
-rw-r--r-- 1 root root  62M Feb 13 21:36 medical_o1_sft_Chinese.json
-rw-r--r-- 1 root root  71M Feb 13 21:36 medical_o1_sft.json
-rw-r--r-- 1 root root 4.7G Feb 13 19:11 model-00001-of-00004.safetensors
-rw-r--r-- 1 root root 4.7G Feb 13 19:02 model-00002-of-00004.safetensors
-rw-r--r-- 1 root root 4.6G Feb 13 19:10 model-00003-of-00004.safetensors
-rw-r--r-- 1 root root 1.1G Feb 13 18:17 model-00004-of-00004.safetensors
-rw-r--r-- 1 root root  24K Feb 13 17:59 model.safetensors.index.json
-rw------- 1 root root  952 Feb 13 19:11 .msc
-rw-r--r-- 1 root root   36 Feb 13 19:11 .mv
-rw-r--r-- 1 root root 1.3K Feb 13 21:36 README.md
-rw-r--r-- 1 root root  483 Feb 13 17:59 special_tokens_map.json
drwxr-xr-x 2 root root 4.0K Feb 13 19:11 ._____temp
-rw-r--r-- 1 root root  52K Feb 13 17:59 tokenizer_config.json
-rw-r--r-- 1 root root  17M Feb 13 18:00 tokenizer.json
-rw-r--r-- 1 root root  16K Feb 13 17:59 unsloth-DeepSeek-R1-Distill-Llama-8B---README.md
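
Before loading, the dataset file can be given a quick sanity check with plain Python. The sketch below assumes that medical_o1_sft_Chinese.json is a single JSON array of objects carrying the three columns used in cell 10; both helper names are made up for illustration.

```python
import json

REQUIRED = ("Question", "Complex_CoT", "Response")


def first_records(path, limit=500):
    """Load a JSON array file and return its first `limit` records."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    return records[:limit]


def records_missing_fields(records):
    """Return indexes of records that lack any of the three required columns."""
    return [
        index
        for index, record in enumerate(records)
        if any(key not in record for key in REQUIRED)
    ]
```

For example, records_missing_fields(first_records("medical_o1_sft_Chinese.json")) should return an empty list if the first 500 records are complete.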

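Running cell 11 produces the following output:

Cell 12 - Create the SFTTrainer Instance

The trl Python library is used for training (fine-tuning).

TrainingArguments(...) takes many parameters that control training (deep learning in this case). The last one, output_dir, sets the directory where checkpoints are written.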

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
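
A little arithmetic shows what these settings mean in practice. The sketch below works out the effective batch size and how much of the 500-record subset is seen in 60 steps, and approximates the learning-rate curve for warmup_steps=5 with lr_scheduler_type="linear". The schedule formula is my reading of a linear warmup-then-decay schedule, not code taken from transformers, and a single GPU is assumed.

```python
def effective_batch(per_device, grad_accum, num_gpus=1):
    """Samples consumed per optimizer step."""
    return per_device * grad_accum * num_gpus


def linear_lr(step, base_lr=2e-4, warmup=5, total=60):
    """Linear warmup to base_lr, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


batch = effective_batch(per_device=2, grad_accum=4)  # 2 * 4 = 8
samples = batch * 60  # samples seen with max_steps=60
print(f"effective batch: {batch}, samples seen: {samples}, epochs: {samples / 500:.2f}")
for step in (0, 5, 30, 60):
    print(f"step {step:2d}: lr = {linear_lr(step):.2e}")
```

With these numbers the run sees 480 of the 500 records, so it stops just short of one full epoch.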

Cell 13 - Model Training

This cell occasionally fails with a message like [failed to upsert bucket: returned error 401 Unauthorized: {"errors":[{"message":"user is not logged in"...]. If that happens, try running it again.

trainer_stats = trainer.train()

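While the cell is running normally, the output looks like this:

When the cell finishes normally, it looks like this:

Cell 14 - Save the Fine-Tuned Model

The cell needs no changes. After it runs, the output looks like this.

On wandb.ai, the run for the corresponding project now shows the status Finished:

Clicking the project name shows the following details:

Cell 15 - Inference After Fine-Tuning

Only the question is changed from English to Chinese; everything else stays the same.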

#question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
###Only the question is changed from English to Chinese; everything else stays the same
question = "一名61岁女性，长期在咳嗽或打喷嚏等活动时不自觉排尿，但夜间无漏尿，现接受妇科检查和Q-tip检查。基于这些发现，膀胱造瘘术最有可能揭示她的残余容量和逼尿肌收缩情况？"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

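Running cell 15 produces the following output:

Below is a comparison of the chain of thought and the result before and after fine-tuning (the fine-tuned result is somewhat more concise and accurate):

Cell 16 - Inference on Another Question After Fine-Tuning

This cell also provides the question in both English and Chinese: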

#question = "A 59-year-old man presents with a fever, chills, night sweats, and generalized fatigue, and is found to have a 12 mm vegetation on the aortic valve. Blood cultures indicate gram-positive, catalase-negative, gamma-hemolytic cocci in chains that do not grow in a 6.5% NaCl medium. What is the most likely predisposing factor for this patient's condition?"
question = "59岁男性，发热，寒颤，盗汗，全身疲劳，主动脉瓣上有12毫米的赘生物。血液培养显示革兰氏阳性，过氧化氢酶阴性，γ -溶血性球菌链，不能在6.5%的NaCl培养基中生长。这个病人的病情最有可能的诱发因素是什么？"

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

The output is as follows:

Cell 17 - Save the Model Weights Locally

The cell needs no changes; run it as-is. The output is shown below.

# new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"
new_model_local = "DeepSeek-R1-Medical-COT"
model.save_pretrained(new_model_local) # Local saving
tokenizer.save_pretrained(new_model_local)

A "DeepSeek-R1-Medical-COT" directory is also created under the Jupyter root directory, with the following contents:
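
One way to confirm that the saved directory holds the LoRA adapter is to read back its configuration. The sketch below assumes that save_pretrained on the PEFT model wrote an adapter_config.json containing the r, lora_alpha and target_modules fields; adapter_summary is a hypothetical helper.

```python
import json
from pathlib import Path


def adapter_summary(save_dir):
    """Read the main LoRA settings from adapter_config.json in save_dir."""
    config_path = Path(save_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": sorted(config.get("target_modules", [])),
    }
```

If the assumption holds, adapter_summary("DeepSeek-R1-Medical-COT") should report r=16 and lora_alpha=16, matching the values passed in cell 8.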

Cells 18 and 19 - Publish the Model to Hugging Face

Hugging Face requires a proxy to access, so this step is skipped.

model.push_to_hub(new_model_online) # Online saving
tokenizer.push_to_hub(new_model_online) # Online saving

#----------------
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)
model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")