OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Paper • 2511.20211 • Published 11 days ago • 12 • 2
OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Paper • 2511.20211 • Published 11 days ago • 12
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences Paper • 2306.07906 • Published Jun 13, 2023 • 13
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments Paper • 2402.14672 • Published Feb 22, 2024 • 1
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Paper • 2404.03648 • Published Apr 4, 2024 • 30
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18, 2024 • 33
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024 • 17
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 49
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2 • 124
OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Paper • 2511.20211 • Published 11 days ago • 12
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices Paper • 2506.03737 • Published Jun 4
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20 • 67
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9 • 125
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6 • 116
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios Paper • 2509.21766 • Published Sep 26 • 23
Weighted-Reward Preference Optimization for Implicit Model Fusion Paper • 2412.03187 • Published Dec 4, 2024 • 12
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 11
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation Paper • 2412.03558 • Published Dec 4, 2024 • 20