Collections

Collections including paper arxiv:2511.15848

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Step-Audio-R1 is the first audio language model to successfully unlock test-time compute scaling.

stepfun-ai/Step-Audio-R1

Audio-Text-to-Text • 33B • Updated 12 days ago • 590 • 131
Step Audio R1

Running • 33
Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published 24 days ago • 51

TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models

Paper • 2511.02802 • Published Nov 4 • 14
Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning

Paper • 2511.02818 • Published Nov 4 • 15
Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published 24 days ago • 51

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published 24 days ago • 51
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Paper • 2410.17799 • Published Oct 23, 2024 • 6
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5 • 51

Music Flamingo: Scaling Music Understanding in Audio Language Models

Paper • 2511.10289 • Published about 1 month ago • 10
Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published 24 days ago • 51

A Definition of AGI

Paper • 2510.18212 • Published Oct 21 • 34
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Paper • 2511.16664 • Published 23 days ago • 25
Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published 24 days ago • 51
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story

Paper • 2511.15210 • Published 25 days ago • 87
