# LFM2-v8-RL-10k Adapter (SOTA)
A bidirectional Korean↔English translation LoRA adapter trained with GRPO and a COMET/CHrF++ reinforcement-learning reward.
⚠️ This adapter must be used together with the base model!
Base Model: gyung/lfm2-1.2b-koen-mt-v6.4-merged
## Key Results
- A 1.2B model that beats 4B models: 1.78 CHrF++ points above Gemma-3 (4B)
- SOTA reached after only 400 training steps (0.78 epoch)
- Perfect honorific consistency: the polite "-합니다" style is applied across all 1,012 evaluation samples (a rough check is sketched after this list)
- Trainable on a free Google Colab T4 GPU
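The honorific-consistency claim can be spot-checked with a simple heuristic. The snippet below is only an illustrative sketch (not the evaluation script behind this card): it counts how many outputs end in a formal "-니다" ending.

```python
import re

# Formal polite endings such as "합니다", "했습니다", "입니다" all end in "-니다".
POLITE_ENDING = re.compile(r'니다[.!?"\s]*$')

def polite_ratio(translations: list[str]) -> float:
    """Fraction of outputs that close with a formal '-니다' ending."""
    hits = sum(bool(POLITE_ENDING.search(t.strip())) for t in translations)
    return hits / max(len(translations), 1)

print(polite_ratio(["빠른 갈색 여우가 게으른 개를 뛰어넘습니다.", "그는 어제 도착했습니다."]))  # 1.0
```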
## Performance (Flores-200 Benchmark, 1,012 Samples)
| Rank | Model | CHrF++ | BLEU | Params |
|---|---|---|---|---|
| 1 | Google Translate | 39.27 | 18.18 | - (API) |
| 2 | π LFM2-v8-RL (Step 400) | 34.61 | 13.21 | 1.2B |
| 3 | Gemma-3-4B-it-GGUF | 32.83 | 11.36 | 4B |
| 4 | LFM2-1.2B (Base) | 27.23 | 6.43 | 1.2B |
| 5 | Qwen3-4B-GGUF (Base) | 25.62 | - | 4B |
A 1.2B model decisively outperforms the 4B models!
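The scores above can be reproduced with sacrebleu on the FLORES-200 devtest split (1,012 sentences). The sketch below is a reconstruction under assumptions: the card does not publish its evaluation script, the `facebook/flores` config and column names are my guess, and the BLEU tokenizer behind the table is not stated. `hypotheses` should be the model's Korean outputs in dataset order, e.g. produced with the `translate()` helper sketched in the Usage section below.

```python
from datasets import load_dataset
from sacrebleu.metrics import BLEU, CHRF

def evaluate_en_ko(hypotheses: list[str]) -> dict[str, float]:
    """Score EN→KO outputs against the FLORES-200 devtest references."""
    flores = load_dataset("facebook/flores", "eng_Latn-kor_Hang", split="devtest")
    references = flores["sentence_kor_Hang"]
    chrf = CHRF(word_order=2)  # word_order=2 gives chrF++
    bleu = BLEU()              # default tokenizer; the card does not state which one was used
    return {
        "chrf++": chrf.corpus_score(hypotheses, [references]).score,
        "bleu": bleu.corpus_score(hypotheses, [references]).score,
    }
```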
## Performance Gains During Training
| Step | Epoch | CHrF++ | BLEU | Notes |
|---|---|---|---|---|
| 0 | 0.00 | 33.53 | 12.63 | v6.4 Base |
| 200 | 0.39 | 34.10 | 12.93 | +0.57 CHrF++ |
| 300 | 0.59 | 34.19 | 13.24 | New high |
| 400 | 0.78 | 34.61 | 13.21 | SOTA |
## Improvements in v8
- Honorific/style consistency: the "-합니다" / "-했습니다" endings are applied consistently across all samples
- Natural sentence structure: long, complex sentences (e.g., the FLORES sentence about a hospital in Devonport, Tasmania) are rendered naturally
- Context-aware translation: "While" is rendered flexibly as "반면", "동안", etc., depending on context
- Technical terminology: "rachis" is translated correctly as "우축"
## Known Limitations
- Proper-noun hallucination: "George W. Bush" is mistranslated as "조지 워싱턴" (George Washington), a bias carried over from the base model
- Occasional terminology errors: mistakes can still occur on certain scientific terms

💡 Planned fix: hallucination correction via DPO in v9 (an illustrative preference pair is sketched below).
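For context, TRL's `DPOTrainer` consumes preference data with `prompt`, `chosen`, and `rejected` fields. The pair below is purely hypothetical (invented sentences, not taken from any v9 dataset) and only illustrates how the proper-noun error could be targeted:

```python
# Hypothetical preference pair targeting the proper-noun hallucination.
dpo_example = {
    "prompt": "Translate to Korean.\nGeorge W. Bush visited the city in 2001.",
    "chosen": "조지 W. 부시는 2001년에 그 도시를 방문했습니다.",    # name preserved
    "rejected": "조지 워싱턴은 2001년에 그 도시를 방문했습니다.",  # hallucinated name
}
```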
## Usage

### 1. Load the Adapter (Recommended)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "gyung/lfm2-1.2b-koen-mt-v6.4-merged",
    device_map="auto",
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("gyung/lfm2-1.2b-koen-mt-v6.4-merged")

# Load and merge the adapter
model = PeftModel.from_pretrained(base_model, "gyung/lfm2-1.2b-koen-mt-v8-rl-10k-adapter")
model = model.merge_and_unload()  # improves inference speed
```
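Optionally, save the merged weights once so that later sessions can load them directly with `AutoModelForCausalLM.from_pretrained` and skip the PEFT merge step (the output directory name below is arbitrary):

```python
# Optional: persist the merged model for faster loading next time.
model.save_pretrained("lfm2-koen-v8-merged")
tokenizer.save_pretrained("lfm2-koen-v8-merged")
```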
### 2. Run a Translation
```python
# English → Korean
messages = [
    {"role": "system", "content": "Translate to Korean."},
    {"role": "user", "content": "The quick brown fox jumps over the lazy dog."}
]

# Korean → English (use one of the two message lists)
messages = [
    {"role": "system", "content": "Translate to English."},
    {"role": "user", "content": "빠른 갈색 여우가 게으른 개를 뛰어넘습니다."}
]

input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05
)

translation = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(translation)
```
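For repeated calls, the snippet above can be wrapped in a small helper that reuses the `model` and `tokenizer` loaded in step 1. This is a convenience sketch; the `direction` argument is an invented parameter, not part of the released code.

```python
def translate(text: str, direction: str = "en2ko", max_new_tokens: int = 256) -> str:
    """Translate one sentence with the merged model/tokenizer loaded in step 1."""
    system = "Translate to Korean." if direction == "en2ko" else "Translate to English."
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    outputs = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.3,
        min_p=0.15,
        repetition_penalty=1.05,
    )
    return tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)

print(translate("The quick brown fox jumps over the lazy dog."))
print(translate("빠른 갈색 여우가 게으른 개를 뛰어넘습니다.", direction="ko2en"))
```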
## Training Details

| Item | Value |
|---|---|
| Base Model | gyung/lfm2-1.2b-koen-mt-v6.4-merged |
| Method | GRPO (Group Relative Policy Optimization) |
| Reward | COMET + CHrF++ |
| Dataset Size | 10,000 samples (bidirectional) |
| Training Steps | 400 |
| Effective Batch Size | 128 |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | all-linear |
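The table maps naturally onto TRL's `GRPOTrainer` with a PEFT `LoraConfig`. The sketch below is a reconstruction under stated assumptions, not the actual training script: the learning rate, group size (`num_generations`), the 8 × 16 batch split, the dataset file name, and the dataset columns (`prompt`, `reference`) are all guesses, and only a chrF++ reward is shown here (the real run also used COMET; see the methodology section below).

```python
from datasets import load_dataset
from peft import LoraConfig
from sacrebleu.metrics import CHRF
from trl import GRPOConfig, GRPOTrainer

chrf = CHRF(word_order=2)  # chrF++

def chrf_reward(completions, reference, **kwargs):
    """Sentence-level chrF++ rescaled to [0, 1]; `reference` is an assumed dataset column."""
    return [chrf.sentence_score(c, [r]).score / 100.0 for c, r in zip(completions, reference)]

peft_config = LoraConfig(
    r=32,                         # LoRA rank from the table
    lora_alpha=64,                # LoRA alpha from the table
    target_modules="all-linear",  # as listed in the table
    task_type="CAUSAL_LM",
)

args = GRPOConfig(
    output_dir="lfm2-v8-grpo",
    max_steps=400,
    per_device_train_batch_size=8,   # 8 × 16 accumulation = 128 effective (assumed split)
    gradient_accumulation_steps=16,
    num_generations=8,               # GRPO group size; not stated on the card
    max_completion_length=256,
    learning_rate=1e-5,              # not stated on the card
    fp16=True,                       # Colab T4 has no bf16 support
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="gyung/lfm2-1.2b-koen-mt-v6.4-merged",
    reward_funcs=chrf_reward,
    args=args,
    # Placeholder file; assumes a JSONL dataset with "prompt" and "reference" columns.
    train_dataset=load_dataset("json", data_files="rl_10k.jsonl", split="train"),
    peft_config=peft_config,
)
trainer.train()
```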
## Training Methodology

A "Data-Centric + Progressive Learning" pipeline:
- SFT (Step 1): LFM2-1.2B base → trained on 200k + 88k high-quality translation pairs → v6.4
- GRPO (Step 2): v6.4 → 10k bidirectional RL samples with a COMET/CHrF++ reward (sketched below) → v8 (SOTA)

💡 Key idea: instead of blindly piling on more data, analyze the model's weaknesses and add targeted data.
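Here is a minimal sketch of how sentence-level COMET and chrF++ could be blended into a single reward. The checkpoint (`Unbabel/wmt22-comet-da`), the 50/50 weighting, and the function signature are assumptions; the card only states that both metrics were used as the reward signal.

```python
from comet import download_model, load_from_checkpoint
from sacrebleu.metrics import CHRF

chrf = CHRF(word_order=2)
# Assumed checkpoint: the card does not name the COMET model it used.
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

def comet_chrf_reward(sources, completions, references, weight=0.5):
    """Blend sentence-level COMET and chrF++ into one scalar reward per sample."""
    comet_scores = comet_model.predict(
        [{"src": s, "mt": c, "ref": r} for s, c, r in zip(sources, completions, references)],
        batch_size=8,
    ).scores  # roughly in [0, 1]
    chrf_scores = [
        chrf.sentence_score(c, [r]).score / 100.0  # rescale 0-100 to 0-1
        for c, r in zip(completions, references)
    ]
    return [weight * co + (1.0 - weight) * ch for co, ch in zip(comet_scores, chrf_scores)]
```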
## Future Plans
- Further RL training: continue to 1,200+ steps (as recommended by the ProRL paper)
- Hallucination fixes via DPO: correct proper-noun errors such as "George W. Bush"
- Apply DuPO: back-translation-based self-verification (ByteDance paper)
- LFM2-2.6B-Exp: training on the new base model
## Related Links
- Base Model (v6.4): gyung/lfm2-1.2b-koen-mt-v6.4-merged
- Previous Adapter (v5): gyung/lfm2-1.2b-koen-mt-v5-rl-10k-adapter
- GitHub Repository: LFM2-KoEn-Tuning
- Liquid AI Official Cookbook: English-to-Korean Example
## Citation
```bibtex
@misc{lfm2-koen-v8-rl,
  author = {gyung},
  title = {LFM2-1.2B-KoEn-MT-v8-RL: GRPO-Enhanced Bidirectional Korean-English Translation Adapter},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v8-rl-10k-adapter}
}
```
## License
This model inherits the LFM Open License v1.0 from the base LFM2 model.