DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion Paper • 2510.20766 • Published Oct 23 • 34
Advancing Speech Understanding in Speech-Aware Language Models with GRPO Paper • 2509.16990 • Published Sep 21 • 18
Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15 • 27
Story2Board: A Training-Free Approach for Expressive Storyboard Generation Paper • 2508.09983 • Published Aug 13 • 68
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences Paper • 2508.03542 • Published Aug 5 • 4
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 180
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning Paper • 2507.22565 • Published Jul 30 • 9
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Paper • 2507.16814 • Published Jul 22 • 21
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games Paper • 2506.05309 • Published Jun 5 • 16
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation Paper • 2506.08570 • Published Jun 10 • 33
WHISTRESS: Enriching Transcriptions with Sentence Stress Detection Paper • 2505.19103 • Published May 25 • 13
Fast Text-to-Audio Generation with Adversarial Post-Training Paper • 2505.08175 • Published May 13 • 25
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Paper • 2504.01137 • Published Apr 1 • 21
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published Apr 3 • 31
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published Apr 1 • 37