Recent Activity

Update README.md

#2 opened 6 days ago by
hypothetical
hypothetical 
posted an update 7 days ago
We thought it would be easier, but we have finally integrated cuDNN Paged Attention into our models!

Read the article here: https://app.thestage.ai/blog/Integrating-cuDNN-Paged-Attention-to-TheStage-AI-Inference?id=8

Llama-8B with cuDNN paged attention, including B200 support: TheStageAI/Elastic-Llama-3.1-8B-Instruct
Mistral-Small-24B with cuDNN paged attention, including B200 support: TheStageAI/Elastic-Mistral-Small-3.1-24B-Instruct-2503
hypothetical 
posted an update 14 days ago
We have updated our transcription model: TheStageAI/thewhisper-large-v3-turbo

– 6.00 WER on the English Open ASR Leaderboard
– 4.74 WER on the Multilingual Open ASR Leaderboard
– Beats NVIDIA Parakeet (6.34 WER) and Whisper-large-v3-turbo (7.8 WER)
– Strong improvements in Arabic, Hindi, Chinese
– Maintains quality with background and environmental noise
– Optimized inference engines for NVIDIA and Apple
– Hugging Face Transformers interface for easy use
– Best-in-class speed on NVIDIA GPUs and power efficiency on Apple devices
– NVIDIA Jetson Thor support
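
For readers unfamiliar with the metric quoted above: WER (word error rate) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, usually reported as a percentage. Below is a minimal sketch of that calculation. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization, so it will not reproduce leaderboard numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Assumption: words are split on whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition, lower is better, and a score of 6.00 corresponds to roughly six word errors per hundred reference words.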
hypothetical 
posted an update about 2 months ago
Hello everyone! Maybe someone wants to test our framework for automated model compression. Here is what it can produce: move the slider to compress or accelerate the model, select the point you like, and compile. I can give access; we are now improving it and collecting comments from users.

TheStageAI/ANNA-LLM