Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 2 days ago • 135
4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer Paper • 2512.05060 • Published 2 days ago • 17
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards Paper • 2512.00425 • Published 7 days ago • 45
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published 5 days ago • 48
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 4 days ago • 25
Self-Calibration Collection Efficient Test-Time Scaling via Self-Calibration https://arxiv.org/abs/2503.00031 • 7 items • Updated Jun 8 • 3
PosS-Speculative-Decoding Collection This collection contains models of the paper "PosS: Position Specialist Generates Better Draft for Speculative Decoding" • 9 items • Updated Jun 5 • 2
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published 5 days ago • 47
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion Paper • 2512.04926 • Published 2 days ago • 28
PixelDiT: Pixel Diffusion Transformers for Image Generation Paper • 2511.20645 • Published 11 days ago • 24
Article Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms 17 days ago • 29
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand 3 days ago • 39