LoRA: Low-Rank Adaptation of Large Language Models
Paper: arXiv:2106.09685
This is a DeepSeek-7B-Chat model fine-tuned with LoRA to speak in the style of 甄嬛 (Zhen Huan).
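For orientation, a LoRA fine-tune of this kind is typically set up with peft's LoraConfig: the base model is frozen and only small low-rank adapter matrices are trained. The sketch below is illustrative only; the rank, scaling, and target-module values are assumptions, since the actual training hyperparameters for this checkpoint are not published here.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative LoRA setup; r, lora_alpha, and target_modules are
# assumed values, not the ones used to train this checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    trust_remote_code=True
)
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                  # rank of the low-rank update matrices
    lora_alpha=32,        # scaling factor applied to the update
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"]  # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable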
# Install the basic dependencies
pip install torch transformers peft accelerate safetensors
# Or pin minimum versions (quote the specifiers so the shell does not treat ">" as a redirect)
pip install "torch>=2.0.0"
pip install "transformers>=4.35.2"
pip install "peft>=0.7.0"
pip install "accelerate>=0.25.0"
pip install "safetensors>=0.4.1"
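To confirm the environment meets these minimum versions, you can print what is actually installed:

import torch, transformers, peft, accelerate, safetensors

# Compare installed versions against the requirements above.
for module in (torch, transformers, peft, accelerate, safetensors):
    print(module.__name__, module.__version__)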
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    trust_remote_code=True,
    torch_dtype=torch.half,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    use_fast=False,
    trust_remote_code=True
)

# Load the LoRA weights
model = PeftModel.from_pretrained(
    base_model,
    "fage13141/fage",
    torch_dtype=torch.half,
    device_map="auto"
)

# Usage example
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
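Since the base model is a chat model, results are usually better when the prompt is wrapped in the tokenizer's chat template rather than passed raw. A minimal sketch, assuming the DeepSeek tokenizer ships a chat template (apply_chat_template requires transformers >= 4.34, which the requirements above satisfy):

# Format the prompt with the model's chat template
messages = [{"role": "user", "content": "Your prompt here"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)

If you plan to serve the model, model.merge_and_unload() folds the LoRA weights into the base model so inference no longer routes through the adapter layers.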
In the generate call you can adjust the following parameters to control generation quality: max_new_tokens (upper bound on generated tokens), temperature (sampling randomness), top_p (nucleus-sampling threshold), and repetition_penalty (discourages repeated tokens). Note that temperature and top_p only take effect when sampling is enabled with do_sample=True.
Example:
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,          # enable sampling so temperature/top_p take effect
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1
)
Troubleshooting
Out of GPU memory: load the base model with load_in_8bit=True, or fall back to CPU with device_map="cpu" (slow).
Model fails to load: check that trust_remote_code=True is set and that your installed versions meet the requirements above.
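A minimal sketch of the 8-bit loading mentioned above; it assumes the bitsandbytes package is installed (pip install bitsandbytes), which is not among the requirements listed earlier:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Quantize the base model to 8-bit to reduce GPU memory use
# (requires bitsandbytes; on older transformers versions you can
# pass load_in_8bit=True directly instead of a quantization_config).
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "fage13141/fage")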
Base model: deepseek-ai/deepseek-llm-7b-chat