AI & ML interests
Aligning LLMs to be helpful, honest, harmless, and huggy (H4)
Recent Activity
edbeeching updated a model 4 days ago
edbeeching published a model 5 days ago
sergiopaniego posted an update 13 days ago (Post · 529)
ICYMI: a great blog by @kashif and @stas on Ulysses Sequence Parallelism: train with million-token contexts on 4×H100s, with 12x longer sequences and 3.7x higher throughput.
Learn how to integrate it with Accelerate, Transformers, and TRL ⤵️
https://huggingface.co/blog/ulysses-sp
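The core trick behind Ulysses-style sequence parallelism can be sketched in plain Python (this is an illustrative toy, not the actual Accelerate/Transformers/TRL integration): each of P workers holds seq_len / P tokens, and an all-to-all exchange trades the sequence split for a head split, so each worker computes exact attention for a subset of heads over the full sequence.

```python
# Toy sketch of Ulysses-style sequence parallelism (illustrative only;
# function names here are made up, not the library's API).

def shard_sequence(tokens, world_size):
    """Split a token sequence evenly across `world_size` workers."""
    chunk = len(tokens) // world_size
    return [tokens[i * chunk:(i + 1) * chunk] for i in range(world_size)]

def all_to_all_heads(seq_shards, num_heads):
    """After the exchange, each worker sees the FULL sequence for a
    subset of attention heads (num_heads / world_size heads each)."""
    world_size = len(seq_shards)
    full_seq = [t for shard in seq_shards for t in shard]
    heads_per_rank = num_heads // world_size
    return [{"heads": list(range(r * heads_per_rank, (r + 1) * heads_per_rank)),
             "tokens": full_seq}
            for r in range(world_size)]

tokens = list(range(16))            # stand-in for a 16-token sequence
shards = shard_sequence(tokens, 4)  # 4 workers, e.g. 4×H100
ranks = all_to_all_heads(shards, num_heads=8)
```

Because activations for the sequence dimension are sharded P ways before and after attention, the same memory budget fits roughly P times more tokens, which is where the "longer sequences on the same GPUs" headline comes from.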
clefourrier authored a paper 13 days ago
sergiopaniego posted an update 14 days ago (Post · 307)
We just released a big blog surveying 16 OSS frameworks for async RL training of LLMs!
We're building a new async GRPO trainer for TRL, and as a first step we needed to understand how the ecosystem solves this problem today.
The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.
The ecosystem converged on the same fix: separate inference and training onto different GPU pools, add a rollout buffer, and sync weights asynchronously.
We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.
This survey is step one. The async GRPO trainer for TRL is next!
https://huggingface.co/blog/async-rl-training-landscape
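The rollout-buffer-plus-staleness pattern the survey describes can be sketched in a few lines (names and the staleness policy here are illustrative, not TRL's API): inference workers push rollouts tagged with the policy version that generated them, and the trainer pops a batch while discarding rollouts that are too stale relative to the current weights.

```python
# Minimal sketch of an async RL rollout buffer with staleness
# management (illustrative only; not any framework's actual API).
from collections import deque

class RolloutBuffer:
    def __init__(self, max_staleness=2):
        self.buf = deque()
        self.max_staleness = max_staleness

    def push(self, rollout, policy_version):
        """Inference workers tag each rollout with the weights version
        that generated it."""
        self.buf.append((rollout, policy_version))

    def pop_batch(self, batch_size, current_version):
        """Return up to batch_size rollouts no older than max_staleness."""
        batch = []
        while self.buf and len(batch) < batch_size:
            rollout, version = self.buf.popleft()
            if current_version - version <= self.max_staleness:
                batch.append(rollout)  # fresh enough to train on
            # stale rollouts are silently discarded
        return batch

buf = RolloutBuffer(max_staleness=2)
for v in range(5):                  # rollouts from policy versions 0..4
    buf.push(f"rollout-v{v}", v)
batch = buf.pop_batch(batch_size=8, current_version=4)
```

The interesting design axis is exactly what the survey compares: how aggressively to discard stale rollouts trades off GPU utilization against off-policy bias.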
sergiopaniego posted an update 15 days ago (Post · 312)
Nemotron 3 Super by @nvidia is here! NVIDIA's hybrid Mamba2/Transformer models are now natively supported in transformers (no trust_remote_code needed)
Fine-tune them with TRL in just a few lines of code. Notebook + script included to get started right away. goooo!
- Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb
- Script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_nemotron_3.py
- Collection with all the models: https://huggingface.co/collections/nvidia/nvidia-nemotron-v3
Discussion #7: How to reproduce the results in your blog? (13) opened about 2 months ago by 141forever
alvarobartt posted an update 21 days ago (Post · 3393)
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥
> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API
https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
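Since the deployment exposes an OpenAI-compatible Chat Completions API, a request is just a standard chat payload carrying audio content. A hedged sketch of what that payload might look like (the model name and the audio content fields are placeholders, not the documented API; see the linked Azure example for the real request shape):

```python
# Hedged sketch of a Chat Completions-style ASR request payload.
# "vibevoice-asr" and the input_audio content part are assumptions
# for illustration, not the verified deployment contract.
import json

payload = {
    "model": "vibevoice-asr",          # placeholder deployment name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "input_audio",
             "input_audio": {"data": "<base64-encoded audio>",
                             "format": "wav"}},
            {"type": "text",
             "text": "Transcribe with speaker labels and timestamps."},
        ],
    }],
}
body = json.dumps(payload)  # what an OpenAI-compatible client would POST
```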
sergiopaniego posted an update 23 days ago (Post · 562)
did you know you can train agentic models with RL deploying the environments on HF Spaces? 🤗
with TRL + OpenEnv, your training script connects to remote environments hosted as Spaces
want to train faster? → just add more Spaces (TRL handles the parallelization natively)
we used this to train a model to solve the trolley problem in CARLA. 2 HF Spaces running a full driving simulator, each on a T4 GPU
full write-up with code and results → https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl
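The "just add more Spaces" scaling idea boils down to distributing rollout requests across several environment endpoints. A toy sketch (the Space URLs are hypothetical and TRL handles this dispatch natively; this only illustrates the shape of the idea):

```python
# Illustrative round-robin dispatch of rollouts across environment
# Spaces (not TRL's internals; URLs are made up).
from itertools import cycle

space_urls = [
    "https://user-carla-env-0.hf.space",   # hypothetical Space URL
    "https://user-carla-env-1.hf.space",   # hypothetical Space URL
]
dispatcher = cycle(space_urls)

# Assign 4 rollout requests across the 2 Spaces: with N Spaces,
# N episodes can run in parallel.
assignments = [next(dispatcher) for _ in range(4)]
```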
sergiopaniego posted an update 24 days ago (Post · 428)
Qwen3.5 dense (smol 🤏) models just dropped
- natively multimodal
- 0.8B · 2B · 4B · 9B (+ base variants)
- 262K context extensible to 1M
- built-in thinking
fine-tune them with TRL out of the box → SFT, GRPO, DPO and more!
examples: https://huggingface.co/docs/trl/example_overview
collection: https://huggingface.co/collections/Qwen/qwen35
sayakpaul authored 2 papers 24 days ago
sergiopaniego posted an update 28 days ago (Post · 2412)
What happens when you make an LLM drive a car where physics are real and actions can't be undone?
I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.
The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.
The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.
This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.
Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py
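The observe/brake/change-lane loop above can be sketched as a toy policy-and-reward pair (everything here is made up for illustration; the real environment and reward rubric live in OpenEnv's carla_env and the linked training script):

```python
# Toy sketch of the tool-call loop: a stand-in policy picks one of the
# env's actions, and a toy reward favors avoiding pedestrians.
# Illustrative only; not the carla_env API.

def choose_action(observation):
    """What a trained model's tool call might look like."""
    if observation.get("pedestrian_ahead"):
        return "brake"
    if observation.get("obstacle_in_lane"):
        return "change_lane"
    return "observe"

def step_reward(observation, action):
    """Toy reward: avoiding a pedestrian is rewarded, ignoring one is not."""
    if observation.get("pedestrian_ahead"):
        return 1.0 if action in ("brake", "change_lane") else -1.0
    return 0.0

obs = {"pedestrian_ahead": True}
action = choose_action(obs)
reward = step_reward(obs, action)
```

During GRPO training, it is this scalar reward signal, not gradient access to the simulator, that teaches the model which tool calls to emit.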
albertvillanova posted an update 28 days ago (Post · 2110)
🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.
This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)
We’re excited to see what the community builds on top of this.
If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗
The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0
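An agent-readable CLI means an agent can assemble and execute a training run as a plain command. A sketch of building such an invocation programmatically (the flag names follow TRL's script arguments and the model/dataset names are placeholders; check the release notes and docs for your version):

```python
# Sketch of assembling a TRL CLI SFT invocation as an agent might.
# Model and dataset names are placeholders for illustration.
import shlex

cmd = [
    "trl", "sft",
    "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
    "--dataset_name", "trl-lib/Capybara",
    "--output_dir", "out/sft-run",
]
printable = shlex.join(cmd)  # shell-safe string an agent could execute
```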
sayakpaul
submitted a
paper to Daily Papers 28 days ago
sayakpaul
authored a
paper about 1 month ago
qgallouedec posted an update about 1 month ago (Post · 2889)
@CohereLabs just released 🌿 Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages 🌍! But there’s a catch:
Tiny Aya is just a language model. It doesn’t support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:
How hard is it to turn Tiny Aya into an agent?
Turns out… it’s simple, thanks to Hugging Face TRL.
We’re sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.
Small model. Global reach. Agent capabilities.
👉 https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
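Training for tool calling starts with data in the right shape. A hedged sketch of what one tool-calling SFT example can look like as chat messages (the schema below follows the common OpenAI-style "tools" format and the tool itself is hypothetical; the notebook linked above shows the exact format TRL expects):

```python
# Hedged sketch of a tool-calling SFT training example.
# "get_weather" is a hypothetical tool for illustration.
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

example = {
    "messages": [
        {"role": "user", "content": "What's the weather in Dakar?"},
        # The target completion: the assistant emits a structured tool call.
        {"role": "assistant", "tool_calls": [{
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": json.dumps({"city": "Dakar"})},
        }]},
    ],
    "tools": tools,
}
```

SFT on examples like this teaches the model to emit the structured call instead of free-form text, which is the capability the post says base Tiny Aya lacks.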
sergiopaniego posted an update about 1 month ago (Post · 1494)
Tiny Aya 🌿 just dropped from @CohereLabs, a really powerful multilingual small model!
To celebrate, we cooked up fresh resources to train it for tool calling 🔧
> Free Google Colab guide: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
> Standalone training script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_tiny_aya_tool_calling.py
lewtun submitted a paper to Daily Papers about 1 month ago