LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 10 days ago
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published 2 days ago
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 8 days ago
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published 8 days ago
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published 10 days ago
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos Paper • 2512.10881 • Published Dec 11, 2025
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6, 2025
Composing Concepts from Images and Videos via Concept-prompt Binding Paper • 2512.09824 • Published Dec 10, 2025
OmniPSD: Layered PSD Generation with Diffusion Transformer Paper • 2512.09247 • Published Dec 10, 2025
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation Paper • 2512.07831 • Published Dec 8, 2025
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards Paper • 2512.00473 • Published Nov 29, 2025
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs Paper • 2512.07525 • Published Dec 8, 2025