1. Installing the LLaMA-Factory Environment
All of the following steps are performed on Ubuntu 20.04 LTS.
Installing a Python Virtual Environment
```bash
root@ksp-registry:~# wget --user-agent="Mozilla" https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
root@ksp-registry:~# bash Anaconda3-2024.06-1-Linux-x86_64.sh
...
You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
...

# Reload ~/.bashrc
root@ksp-registry:~# source ~/.bashrc

# List all conda-managed Python environments
root@ksp-registry:~# conda env list
root@ksp-registry:~# conda create -n llamafactory python=3.12

# Activate the new environment
root@ksp-registry:~# conda activate llamafactory
```
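Once the environment is activated, a one-line sanity check confirms that it really runs Python 3.12:

```python
# Confirm the active environment's Python version (expect 3.12.x).
import sys

print(sys.version)
```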
Installing CUDA
Assuming the NVIDIA driver is already installed, download the appropriate CUDA installer from https://developer.nvidia.com/cuda-toolkit-archive. The screenshots below show the CUDA download and installation steps:
[Screenshots: selecting the CUDA version and target platform on the NVIDIA CUDA Toolkit archive page]
CUDA 12.4 is already installed on the author's machine:
```bash
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
```
Installing LLaMA-Factory
```bash
(llamafactory) root@ksp-registry:/opt/code_repos# git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
# If the GitHub repository is unreachable, use https://gitee.com/sy-jiang/LLaMA-Factory.git instead

# Switch pip to a domestic mirror
(llamafactory) root@ksp-registry:/opt/code_repos# mkdir /root/.pip
(llamafactory) root@ksp-registry:/opt/code_repos# tee /root/.pip/pip.conf <<EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = https://mirrors.aliyun.com/pypi/simple
EOF
(llamafactory) root@ksp-registry:/opt/code_repos# cd LLaMA-Factory

# Install LLaMA-Factory and its dependencies
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# pip install -e ".[torch,metrics]"
# Key Python package versions installed:
# torch==2.6.0 transformers==4.50.0

# The command above did not install the metrics package, so install it separately
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# pip install metrics
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# pip show metrics
# metrics==0.0.2
```
Verify in Python that the installed PyTorch build can see CUDA:

```python
import torch

print(torch.__version__)
print(torch.cuda.is_available())
print(torch.version.cuda)
```
Verifying LLaMA-Factory
```bash
# Verify that the installation succeeded
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# llamafactory-cli version
----------------------------------------------------------
| Welcome to LLaMA Factory, version 0.9.3.dev0           |
|                                                        |
| Project page: https://github.com/hiyouga/LLaMA-Factory |
----------------------------------------------------------

# Show the help text
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# llamafactory-cli help
```
2. Command-Line Usage (Optional)
```bash
# Install the modelscope Python package
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# pip install "modelscope>=1.11.0"

# Download model files and datasets from the ModelScope community instead of Hugging Face (important)
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# export USE_MODELSCOPE_HUB=1

# Back up the example config file
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# cp examples/train_lora/llama3_lora_sft.yaml examples/train_lora/llama3_lora_sft.yaml.bak

# Edit examples/train_lora/llama3_lora_sft.yaml: the default model name/path refers to
# Hugging Face and may not exist on ModelScope
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# vi examples/train_lora/llama3_lora_sft.yaml
# model_name_or_path: modelscope/Meta-Llama-3-8B-Instruct

# Start fine-tuning (the last few lines of the training output are shown below)
(llamafactory) root@ksp-registry:/opt/code_repos/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
...
[INFO|tokenization_utils_base.py:2510] 2025-03-31 16:20:20,622 >> tokenizer config file saved in saves/llama3-8b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2025-03-31 16:20:20,622 >> Special tokens file saved in saves/llama3-8b/lora/sft/special_tokens_map.json
***** train metrics *****
  epoch                    =     2.9826
  total_flos               = 22730138GF
  train_loss               =     0.9173
  train_runtime            = 0:34:54.13
  train_samples_per_second =      1.563
  train_steps_per_second   =      0.195
Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
[WARNING|2025-03-31 16:20:21] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-03-31 16:20:21] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|modelcard.py:449] 2025-03-31 16:20:21,163 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
```
The generated file "saves/llama3-8b/lora/sft/training_loss.png" looks like this:
[Figure: training loss curve from saves/llama3-8b/lora/sft/training_loss.png]
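Because USE_MODELSCOPE_HUB=1 makes LLaMA-Factory pull models from ModelScope, the base model can also be pre-downloaded by hand. Below is a minimal sketch using the modelscope package's snapshot_download; the model ID is simply the one from the yaml edit above (an assumption, not a verified ModelScope ID), so adjust it if the download fails.

```python
# Minimal sketch: pre-download the base model from ModelScope so the
# training run does not have to fetch it on the fly.
from modelscope import snapshot_download

# Model ID taken from the yaml edit above (assumption; adjust if needed).
model_dir = snapshot_download("modelscope/Meta-Llama-3-8B-Instruct")
print(model_dir)  # local cache path; can also be used directly as model_name_or_path
```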
Tip: below is the yaml file used when fine-tuning the DeepSeek-R1-Distill-Qwen-1.5B model:
```yaml
### model
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: alpaca_zh_demo
template: deepseekr1
cutoff_len: 2048
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/DeepSeek-R1-1.5B-Distill/lora/sft
logging_steps: 5
save_steps: 100
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.00005
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
```
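Note that with per_device_train_batch_size: 2 and gradient_accumulation_steps: 8, the effective batch size is 2 × 8 = 16 samples per GPU per optimizer step (multiplied again by the number of GPUs when training with DDP).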
3. WebUI Usage
3.1 Launch and Usage (Fine-Tuning Llama-3-8B-Instruct)
References: https://llamafactory.readthedocs.io/zh-cn/latest/getting_started/webui.html and https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/README_zh.md
```bash
# Launch the WebUI with the following command
(llamafactory) root@ksp-registry:~# llamafactory-cli webui

# You can prepend environment variables such as CUDA_VISIBLE_DEVICES=0,1,2 to the
# "llamafactory-cli webui" command; the device count shown in the web interface will
# then be 3. If the variable is not set, the default is likewise the total number of
# available accelerator devices (3 here).
# (llamafactory) root@ksp-registry:~# CUDA_VISIBLE_DEVICES=0,1,2 llamafactory-cli webui

# If the server running this process has IP 172.20.0.22, enter "http://172.20.0.22:7860/"
# in a browser on your laptop to see the interface below.
```
[Screenshot: the LLaMA-Factory WebUI after opening http://172.20.0.22:7860/]
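The device count the WebUI displays is simply what the process itself can see. A quick plain-PyTorch check (a minimal sketch, not LLaMA-Factory-specific) of the effect of CUDA_VISIBLE_DEVICES:

```python
# Run with: CUDA_VISIBLE_DEVICES=0,1,2 python check_devices.py
# (check_devices.py is a hypothetical file name for this snippet)
import torch

# Prints 3 when CUDA_VISIBLE_DEVICES=0,1,2, or when the variable is unset
# on a machine with exactly three GPUs.
print(torch.cuda.device_count())
```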
Configure the fine-tuning run (the "Checkpoint path" field in the screenshots below does not need a value; any option not explicitly mentioned or modified should be left at its default):
[Screenshots: fine-tuning configuration in the WebUI]
Once fine-tuning actually starts, logs scroll continuously at the very bottom of the page, as shown below:
[Screenshot: training logs scrolling at the bottom of the WebUI page]
By default, fine-tuning can use all available GPUs (see https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/README_zh.md). As shown below, however, only one GPU on the author's server was actually in use during fine-tuning.
[Screenshot: GPU utilization during fine-tuning, with only one GPU in use]
The subdirectory "./saves/Llama-3-8B-Instruct/lora/train_2025-03-31-16-27-21" under the current directory contains the output configuration and the fine-tuning logs.
3.2 Fine-Tuning DeepSeek-R1-Distill-Qwen-1.5B
[Screenshots: WebUI configuration for fine-tuning DeepSeek-R1-Distill-Qwen-1.5B]
The training parameters are saved in the following directory:
[Screenshot: directory containing the saved training parameters]
3.3 Troubleshooting
[Screenshots: error messages encountered during fine-tuning, resolved by the commands below]
```bash
pip install bitsandbytes bitsandbytes-cuda117==0.26.0.post2

# The author could not get the following command to run successfully
# (it can be ignored for now; skipping it is fine):
# pip install "transformers[audio,deepspeed,ftfy,onnx,sentencepiece,timm,tokenizers,video,vision]==4.50.0"
```
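After installing, a quick import check (a minimal sketch, nothing LLaMA-Factory-specific) confirms that bitsandbytes is usable:

```python
# Verify that bitsandbytes imports cleanly after the pip install above.
import bitsandbytes as bnb

print(bnb.__version__)
```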
4. Chatting with the Fine-Tuned Model
To show the effect of fine-tuning, the chat results before and after are compared here.
4.1 Before Fine-Tuning
[Screenshot: chat with the model before fine-tuning]
The reply above introducing the company 深圳证通电子 (Shenzhen Zhengtong Electronics) is jumbled and mostly wrong!
[Screenshot: another reply from the model before fine-tuning]
The LLM's full reply is as follows:
[Screenshot: the LLM's full reply before fine-tuning]
4.2 After Fine-Tuning
Since the inference engine used here is vLLM, install the relevant Python package before starting the llamafactory webui service:
```bash
(llamafactory) root@ksp-registry:~#
```
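Whichever way the vLLM package was installed, a quick import check (a minimal sketch) confirms it is available:

```python
# Verify that vLLM is importable before restarting the WebUI service.
import vllm

print(vllm.__version__)
```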
If the llamafactory webui service is already running, restart it after the installation and then proceed with the following steps.
[Screenshots: chat with the model after fine-tuning]
As you can see, after fine-tuning, the replies introducing the two companies above are noticeably more concise and correct (relatively accurate).
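For reference, the before/after comparison can also be reproduced outside the WebUI. The following is a minimal sketch using transformers and peft, assuming the LoRA adapter was saved under the output_dir from the yaml above (saves/DeepSeek-R1-1.5B-Distill/lora/sft); it is not the exact code path the WebUI runs.

```python
# Minimal sketch: chat with the fine-tuned model outside the WebUI.
# Assumes the LoRA adapter sits in saves/DeepSeek-R1-1.5B-Distill/lora/sft
# (the output_dir from the yaml above); paths may differ on your machine.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# "Before fine-tuning": generate with `model` as-is.
# "After fine-tuning": attach the LoRA adapter first, as below.
model = PeftModel.from_pretrained(model, "saves/DeepSeek-R1-1.5B-Distill/lora/sft")

prompt = "请介绍一下深圳证通电子公司。"  # the same kind of question used in the screenshots
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```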
5. References
https://github.com/hiyouga/LLaMA-Factory/blob/main/README_zh.md
https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/README_zh.md
https://llamafactory.readthedocs.io/zh-cn/latest/getting_started/sft.html
https://zhuanlan.zhihu.com/p/695287607
https://nvidia.csdn.net/675a7eabf3b8a55e4e980776.html