Qwen3-IMDB-Reward-0.6B

A reward model for movie-review sentiment preferences, fine-tuned from Qwen3-0.6B on IMDb preference pairs.

  • Final Loss: ~0.08
  • Reward Gap: 0.109 (chosen over rejected)
  • Dataset: 10k IMDb preference pairs
  • Base Model: Qwen/Qwen3-0.6B (merged LoRA)

πŸ† Performance

Loss trajectory: 0.77 → 0.08 over 140 training steps
Reward gap (chosen vs. rejected): 0.638 vs. 0.529 (gap 0.109)
Ready for use as a reward signal in PPO-style RLHF pipelines
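The loss reported above is the standard pairwise (Bradley-Terry) reward-modeling objective: minimize `-log sigmoid(r_chosen - r_rejected)`, so the loss falls toward 0 as the chosen reward pulls above the rejected one. A minimal sketch in PyTorch (the helper name is illustrative, not from this repo):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # -log(sigmoid(r_chosen - r_rejected)): pushes the chosen reward above
    # the rejected one; the loss shrinks toward 0 as the gap widens.
    return -F.logsigmoid(chosen - rejected).mean()

# A zero gap gives log(2) ~ 0.693; a wide gap gives a loss near 0.
print(pairwise_reward_loss(torch.tensor([3.0]), torch.tensor([0.0])).item())
```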

🚀 Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "Sachinkry/qwen3-imdb-reward-0.6b",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Sachinkry/qwen3-imdb-reward-0.6b")
model.eval()

# Single inference: score one prompt/response pair
prompt = "What did you think about the movie?"
response = "Brilliant film, excellent acting!"
inputs = tokenizer(f"{prompt} {response}", return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits.squeeze().sigmoid().item()
print(f"Reward: {reward:.3f}")
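The single-pair call above can be batched to rerank several candidate responses for the same prompt. The `rank_responses` helper below is an illustrative sketch, not part of this repo, and it assumes the tokenizer has a pad token configured:

```python
import torch

def rank_responses(model, tokenizer, prompt, responses):
    # Score each candidate in one batched forward pass and sort by reward,
    # highest first. Assumes a single-logit sequence-classification head.
    texts = [f"{prompt} {r}" for r in responses]
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        rewards = model(**inputs).logits.squeeze(-1).sigmoid()
    order = rewards.argsort(descending=True)
    return [(responses[i], rewards[i].item()) for i in order]
```

This is the typical shape of a best-of-n sampling step: generate n candidates with a policy model, then keep the top-ranked one.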

📊 Example Outputs

"What did you think about the movie? Brilliant film!" → 0.638 ✓
"What did you think about the movie? Total waste..." → 0.529 ✗
Gap: 0.109 ✓
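With a model and tokenizer loaded as in the Usage section, the gap check above can be reproduced with a small helper (`reward_gap` is illustrative, not part of this repo):

```python
import torch

def reward_gap(model, tokenizer, prompt, positive, negative):
    # Score a positive and a negative response the same way as in Usage
    # and return (r_pos, r_neg, gap); a positive gap means the model
    # prefers the positive response.
    rewards = []
    for response in (positive, negative):
        inputs = tokenizer(f"{prompt} {response}", return_tensors="pt")
        with torch.no_grad():
            rewards.append(model(**inputs).logits.squeeze().sigmoid().item())
    return rewards[0], rewards[1], rewards[0] - rewards[1]
```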