Qwen3-IMDB-Reward-0.6B
A reward model trained on IMDb movie-review preference pairs
- Final Loss: ~0.08
- Reward Gap: 0.109 (chosen > rejected)
- Dataset: 10k IMDb preference pairs
- Base Model: Qwen/Qwen3-0.6B (merged LoRA)
Performance
- Loss trajectory: 0.77 → 0.08 over 140 steps
- Reward (chosen vs rejected): 0.638 vs 0.529
- Ready for PPO/RLHF/DPO pipelines
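The loss values above are consistent with a standard Bradley–Terry pairwise objective, loss = −log σ(r_chosen − r_rejected), which is the usual choice for reward-model training (the card does not state the objective explicitly, so this is an assumption). A minimal pure-Python sketch:

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(reward gap)."""
    gap = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-gap)))

# With no separation (gap = 0), the loss is ln 2 ~ 0.693,
# close to the reported starting loss of 0.77.
print(round(bt_loss(0.0, 0.0), 3))  # 0.693

# A large logit gap drives the loss toward zero,
# which is what the final ~0.08 loss reflects.
print(round(bt_loss(3.0, 0.0), 3))  # 0.049
```

Under this objective, a final loss of ~0.08 implies the model separates chosen and rejected responses by a sizable logit margin on most pairs.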
Usage
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "Sachinkry/qwen3-imdb-reward-0.6b",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Sachinkry/qwen3-imdb-reward-0.6b")
model.eval()

# Single inference: score one (prompt, response) pair
prompt = "What did you think about the movie?"
response = "Brilliant film, excellent acting!"
inputs = tokenizer(f"{prompt} {response}", return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits.squeeze().sigmoid().item()
print(f"Reward: {reward:.3f}")
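In RLHF-style use you typically score several candidate responses for one prompt and keep the highest-reward one. A minimal ranking helper (pure Python; `score_fn` stands in for the model call above, and the toy scores below are taken from this card's example outputs):

```python
def rank_responses(prompt, responses, score_fn):
    """Return (reward, response) pairs sorted by reward, highest first."""
    scored = [(score_fn(f"{prompt} {resp}"), resp) for resp in responses]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy stand-in for the model: a lookup of the rewards reported in this card.
toy_scores = {
    "What did you think about the movie? Brilliant film!": 0.638,
    "What did you think about the movie? Total waste...": 0.529,
}
ranked = rank_responses(
    "What did you think about the movie?",
    ["Brilliant film!", "Total waste..."],
    toy_scores.get,
)
print(ranked[0][1])  # Brilliant film!
```

Swapping `toy_scores.get` for a function that tokenizes the text and runs the model reproduces the real ranking behavior.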
Example Outputs
"What did you think about the movie? Brilliant film!" β 0.638 β
"What did you think about the movie? Total waste..." β 0.529 β
Gap: 0.109 β
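These rewards are sigmoid-squashed logits, as in the usage snippet above. A small sketch reproducing the reported gap; the raw logit values here are assumed for illustration (chosen so the sigmoid outputs match the card's numbers):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw logits whose sigmoids match the card's example rewards.
chosen_logit, rejected_logit = 0.567, 0.116

chosen_r = sigmoid(chosen_logit)
rejected_r = sigmoid(rejected_logit)
print(round(chosen_r, 3), round(rejected_r, 3))  # 0.638 0.529
print(round(chosen_r - rejected_r, 3))           # 0.109
```

Note that the sigmoid compresses logit differences near 0.5, so a 0.109 gap in sigmoid space corresponds to a noticeably larger gap in raw logit space.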
Evaluation results
- Final Loss on Sachinkry/imdb-preference-10k (self-reported): 0.080
- Reward Gap on Sachinkry/imdb-preference-10k (self-reported): 0.109