Models: 811

Active filters: ppo, trl

baek26/all_8113_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 29, 2024 • 6

baek26/all_4814_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 29, 2024 • 6

pkbiswas/Phi-3-Detoxified-PPO-LoRa

Reinforcement Learning • Updated May 18, 2024 • 5

stvnl/ppo_model_en

Reinforcement Learning • Updated May 2, 2024 • 4

hanyinwang/layer-project-diagnostic-mistral

Reinforcement Learning • Updated May 3, 2024 • 9

baek26/all_6618_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 6

baek26/all_8243_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 5

baek26/all_6959_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 5

baek26/all_2022_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 4

baek26/Ours-crossrl2

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 6

baek26/all_1445_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 6

baek26/all_3769_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 8, 2024 • 5

pkbiswas/Phi-3-Detoxified-PPO-QLoRa

Reinforcement Learning • Updated May 10, 2024 • 6

lctzz540/bunboppo

Reinforcement Learning • Updated May 14, 2024 • 8

baek26/bart-cnndm-oracle

Reinforcement Learning • 0.1B • Updated May 13, 2024 • 6

baek26/cnn_dailymail_7898_cnn_dailymail_8824_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 13, 2024 • 6

baek26/cnn_dailymail_5321_cnn_dailymail_8824_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 13, 2024 • 6

baek26/cnn_dailymail_5862_cnn_dailymail_8824_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 13, 2024 • 6

baek26/cnn_dailymail_5425_cnn_dailymail_8824_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 13, 2024 • 6

baek26/cnn_dailymail_4146_cnn_dailymail_8824_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 13, 2024 • 6

ignacioct/my_ppo_model

Reinforcement Learning • 3B • Updated May 14, 2024 • 5

baek26/dialogsum_784_bart-dialogsum_rl

Reinforcement Learning • 0.1B • Updated May 19, 2024 • 5

baek26/dialogsum_2749_bart-dialogsum_rl

Reinforcement Learning • 0.1B • Updated May 19, 2024 • 5

baek26/all_1000_bart-all_rl

Reinforcement Learning • 0.1B • Updated May 20, 2024 • 6

baek26/all_2245_bart-all_rl

Reinforcement Learning • 0.1B • Updated May 20, 2024 • 6

baek26/all_9929_bart-all_rl

Reinforcement Learning • 0.1B • Updated May 20, 2024 • 6

baek26/all_4293_bart-all_rl

Reinforcement Learning • 0.1B • Updated May 21, 2024 • 6

baek26/all_8929_bart-all_rl

Reinforcement Learning • 0.1B • Updated May 21, 2024 • 3

baek26/all_9529_bart-all_rl

Reinforcement Learning • 0.1B • Updated May 21, 2024 • 4

baek26/all_5356_bart-all_rl

Reinforcement Learning • 0.1B • Updated May 22, 2024 • 6