πŸ† LFM2-v8-RL-10k Adapter (SOTA)

A bidirectional Korean↔English translation LoRA adapter trained with GRPO reinforcement learning using COMET/CHrF++ rewards

⚠️ This adapter must be used together with the base model!

Base Model: gyung/lfm2-1.2b-koen-mt-v6.4-merged

πŸ… 핡심 μ„±κ³Ό

  • 1.2B λͺ¨λΈμ΄ 4B λͺ¨λΈμ„ 압도! Gemma-3 (4B)보닀 1.78 CHrF++ λ†’μŒ
  • 단 400 Step (0.78 Epoch)만의 ν•™μŠ΅μœΌλ‘œ SOTA 달성
  • μ‘΄λŒ“λ§ 일관성 μ™„λ²½: 1012개 μƒ˜ν”Œ μ „μ²΄μ—μ„œ "~ν•©λ‹ˆλ‹€" μ–΄λ―Έ 일관 적용
  • Google Colab T4 무료 GPU둜 ν•™μŠ΅ κ°€λŠ₯

πŸ“Š Performance (Flores-200 Benchmark, 1,012 Samples)

| Rank | Model | CHrF++ | BLEU | Params |
|------|-------|--------|------|--------|
| 1 | Google Translate | 39.27 | 18.18 | - (API) |
| 2 | πŸ† LFM2-v8-RL (Step 400) | 34.61 | 13.21 | 1.2B |
| 3 | Gemma-3-4B-it-GGUF | 32.83 | 11.36 | 4B |
| 4 | LFM2-1.2B (Base) | 27.23 | 6.43 | 1.2B |
| 5 | Qwen3-4B-GGUF (Base) | 25.62 | - | 4B |

βœ… A 1.2B model decisively outperforming the 4B baselines!
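
The scores above can be reproduced with sacrebleu's corpus-level CHrF++ and BLEU. A minimal sketch, assuming `hyps` and `refs` are the decoded model outputs and gold translations for the 1,012 Flores-200 samples (the exact evaluation harness is not published with this card):

```python
# Scoring sketch with sacrebleu (assumed evaluation stack, not the official harness).
from sacrebleu.metrics import BLEU, CHRF

hyps = ["λΉ λ₯Έ κ°ˆμƒ‰ μ—¬μš°κ°€ 게으λ₯Έ 개λ₯Ό λ›°μ–΄λ„˜μŠ΅λ‹ˆλ‹€."]   # model outputs
refs = [["λΉ λ₯Έ κ°ˆμƒ‰ μ—¬μš°κ°€ 게으λ₯Έ 개λ₯Ό λ›°μ–΄λ„˜λŠ”λ‹€."]]  # one reference stream, aligned with hyps

chrf = CHRF(word_order=2)  # word_order=2 selects chrF++
bleu = BLEU()
print(chrf.corpus_score(hyps, refs))
print(bleu.corpus_score(hyps, refs))
```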

πŸ“ˆ Performance Across Training Steps

| Step | Epoch | CHrF++ | BLEU | Notes |
|------|-------|--------|------|-------|
| 0 | 0.00 | 33.53 | 12.63 | v6.4 base |
| 200 | 0.39 | 34.10 | 12.93 | +0.57 gain |
| 300 | 0.59 | 34.19 | 13.24 | Historic high |
| 400 | 0.78 | 34.61 | 13.21 | πŸ† SOTA |

✨ Strengths of v8

  1. μ‘΄λŒ“λ§/문체 일관성: "ν•©λ‹ˆλ‹€", "ν–ˆμŠ΅λ‹ˆλ‹€" μ–΄λ―Έκ°€ μ „ μƒ˜ν”Œμ— 걸쳐 μΌκ΄€λ˜κ²Œ 적용
  2. μžμ—°μŠ€λŸ¬μš΄ λ¬Έμž₯ ꡬ쑰: "νƒ€μŠ€λ§ˆλ‹ˆμ•„ μ£Ό 데본포트의 메이버 병원 μžκΈˆμ„..." 같은 λ³΅μž‘ν•œ λ¬Έμž₯도 μžμ—°μŠ€λŸ½κ²Œ 처리
  3. λ¬Έλ§₯ 인식 λ²ˆμ—­: "While"을 λ¬Έλ§₯에 따라 "반면", "λ™μ•ˆ" λ“±μœΌλ‘œ μœ μ—°ν•˜κ²Œ λ²ˆμ—­
  4. μ „λ¬Έ μš©μ–΄ 처리: "rachis"λ₯Ό "μš°μΆ•"으둜 μ •ν™•ν•˜κ²Œ λ²ˆμ—­

⚠️ Known Limitations

  1. Proper-noun hallucination: "George W. Bush" is mistranslated as "μ‘°μ§€ μ›Œμ‹±ν„΄" (George Washington), a bias inherited from the base model
  2. Occasional terminology errors: mistakes are still possible on certain scientific terms

πŸ’‘ Planned fix: hallucination correction via DPO (scheduled for v9); an illustrative preference pair follows below
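
For context, DPO learns from preference pairs. A hypothetical pair targeting the proper-noun hallucination above, written in TRL DPOTrainer's "prompt"/"chosen"/"rejected" schema (illustrative values only, not from the actual v9 dataset):

```python
# Hypothetical DPO preference pair (illustrative; the v9 dataset is not published).
preference_pair = {
    "prompt": "Translate to Korean.\nGeorge W. Bush visited the hospital.",
    "chosen": "μ‘°μ§€ W. λΆ€μ‹œκ°€ 병원을 λ°©λ¬Έν–ˆμŠ΅λ‹ˆλ‹€.",    # correct proper noun
    "rejected": "μ‘°μ§€ μ›Œμ‹±ν„΄μ΄ 병원을 λ°©λ¬Έν–ˆμŠ΅λ‹ˆλ‹€.",   # hallucinated name
}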

πŸ”§ Usage

1. Load the Adapter (Recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "gyung/lfm2-1.2b-koen-mt-v6.4-merged",
    device_map="auto",
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("gyung/lfm2-1.2b-koen-mt-v6.4-merged")

# Load the adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, "gyung/lfm2-1.2b-koen-mt-v8-rl-10k-adapter")
model = model.merge_and_unload()  # merging speeds up inference
```
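
Optionally, the merged weights can be saved so the adapter-load step is skipped next time (the output path below is a hypothetical local directory):

```python
# Optional: persist the merged model for direct loading later.
model.save_pretrained("./lfm2-v8-merged")
tokenizer.save_pretrained("./lfm2-v8-merged")
```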

2. Run Translation

# μ˜μ–΄ β†’ ν•œκ΅­μ–΄
messages = [
    {"role": "system", "content": "Translate to Korean."},
    {"role": "user", "content": "The quick brown fox jumps over the lazy dog."}
]

# ν•œκ΅­μ–΄ β†’ μ˜μ–΄
messages = [
    {"role": "system", "content": "Translate to English."},
    {"role": "user", "content": "λΉ λ₯Έ κ°ˆμƒ‰ μ—¬μš°κ°€ 게으λ₯Έ 개λ₯Ό λ›°μ–΄λ„˜μŠ΅λ‹ˆλ‹€."}
]

input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05
)

translation = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(translation)
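
The two steps above can be wrapped in a small convenience function; this helper is illustrative (not part of the released code) and assumes the `model` and `tokenizer` objects created in step 1:

```python
# Illustrative helper around the snippet above.
def translate(text: str, to_korean: bool = True) -> str:
    system = "Translate to Korean." if to_korean else "Translate to English."
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    outputs = model.generate(
        input_ids, max_new_tokens=256, do_sample=True,
        temperature=0.3, min_p=0.15, repetition_penalty=1.05,
    )
    return tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)

print(translate("The quick brown fox jumps over the lazy dog."))
```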

πŸ“ˆ Training Details

| Item | Value |
|------|-------|
| Base Model | gyung/lfm2-1.2b-koen-mt-v6.4-merged |
| Method | GRPO (Group Relative Policy Optimization) |
| Reward | COMET + CHrF++ |
| Dataset Size | 10,000 samples (bidirectional) |
| Training Steps | 400 |
| Effective Batch Size | 128 |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | all-linear |
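
These hyperparameters map onto TRL's GRPOTrainer roughly as follows. This is a minimal sketch under the assumption that TRL is the training stack: the actual script, the COMET reward weighting, and the 10k dataset are not published, so the reward below uses chrF++ only and the dataset is a toy stand-in:

```python
# Minimal GRPO sketch (assumed setup, not the released training script).
from datasets import Dataset
from peft import LoraConfig
from sacrebleu.metrics import CHRF
from trl import GRPOConfig, GRPOTrainer

# Toy stand-in for the 10k bidirectional pairs; extra columns such as
# "reference" are forwarded to the reward function as keyword arguments.
train_dataset = Dataset.from_dict({
    "prompt": ["Translate to Korean.\nGood morning."],
    "reference": ["쒋은 μ•„μΉ¨μž…λ‹ˆλ‹€."],
})

chrf = CHRF(word_order=2)  # chrF++

def chrf_reward(completions, reference, **kwargs):
    # One scalar reward per sampled completion in each group, normalized to [0, 1].
    return [chrf.sentence_score(c, [r]).score / 100.0
            for c, r in zip(completions, reference)]

trainer = GRPOTrainer(
    model="gyung/lfm2-1.2b-koen-mt-v6.4-merged",
    reward_funcs=chrf_reward,  # a COMET-based callable would be added alongside
    args=GRPOConfig(
        output_dir="lfm2-v8-rl",
        max_steps=400,
        per_device_train_batch_size=8,
        gradient_accumulation_steps=16,  # 8 * 16 = effective batch size 128
    ),
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=32, lora_alpha=64, target_modules="all-linear"),
)
trainer.train()
```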

πŸ”§ Training Methodology

"Data-Centric + Progressive Learning" νŒŒμ΄ν”„λΌμΈ:

  1. SFT (Step 1): LFM2-1.2B base β†’ trained on 200k + 88k high-quality translation pairs β†’ v6.4
  2. GRPO (Step 2): v6.4 β†’ 10k bidirectional RL samples with COMET/CHrF++ rewards β†’ v8 (SOTA)

πŸ’‘ Key idea: rather than blindly adding more data, analyze the model's weaknesses and add targeted data

πŸš€ Future Plans

  1. Further RL training: continue to 1,200+ steps (as recommended by the ProRL paper)
  2. Hallucination fixes via DPO: correct proper-noun mistranslations such as "George W. Bush"
  3. DuPO: back-translation-based self-verification (ByteDance paper)
  4. LFM2-2.6B-Exp: training on the new base model

πŸ“ Citation

```bibtex
@misc{lfm2-koen-v8-rl,
  author = {gyung},
  title = {LFM2-1.2B-KoEn-MT-v8-RL: GRPO-Enhanced Bidirectional Korean-English Translation Adapter},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v8-rl-10k-adapter}
}
```

πŸ“„ License

This adapter inherits the LFM Open License v1.0 from the base LFM2 model.
