AI & ML interests
DeepRL, RL finetuning
Organizations
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
Viewer
• Updated • 27k • 11
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offline-sandbox
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 62.1k • 28
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 62.1k • 17
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 62.1k • 29
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 91.9k • 13
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 91.9k • 116
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 91.9k • 21
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 62.5k • 62
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 62.5k • 38
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 62.5k • 16
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 99k • 65
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 99k • 74
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 99.1k • 15
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 62k • 25
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 62k • 21
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 62k • 11
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 100k • 36
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 100k • 10
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 100k • 22
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 61.6k • 25
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 61.6k • 80
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 61.6k • 28
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated • 93.8k • 9
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated • 93.8k • 93
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated • 96.6k • 19
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
Viewer
• Updated • 26.6k • 10
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offline-sandbox