-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
Collections
Discover the best community collections!
Collections including paper arxiv:2503.21620
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
Paper • 2503.16874 • Published • 44 -
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Paper • 2503.21729 • Published • 29 -
Modifying Large Language Model Post-Training for Diverse Creative Writing
Paper • 2503.17126 • Published • 36 -
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Paper • 2503.21620 • Published • 62
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 29 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65 -
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published • 25 -
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Paper • 2406.10819 • Published • 2
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 29 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
Paper • 2506.14477 • Published -
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Paper • 2406.10227 • Published • 9 -
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Paper • 2506.03143 • Published • 53 -
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Paper • 2503.21620 • Published • 62
-
RuCCoD: Towards Automated ICD Coding in Russian
Paper • 2502.21263 • Published • 133 -
Unified Reward Model for Multimodal Understanding and Generation
Paper • 2503.05236 • Published • 122 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27
-
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
Paper • 2502.17110 • Published • 13 -
WebGames: Challenging General-Purpose Web-Browsing AI Agents
Paper • 2502.18356 • Published • 14 -
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Paper • 2502.18906 • Published • 12 -
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Paper • 2503.02268 • Published • 11
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 44 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 29 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
Paper • 2506.14477 • Published -
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Paper • 2406.10227 • Published • 9 -
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Paper • 2506.03143 • Published • 53 -
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Paper • 2503.21620 • Published • 62
-
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
Paper • 2503.16874 • Published • 44 -
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Paper • 2503.21729 • Published • 29 -
Modifying Large Language Model Post-Training for Diverse Creative Writing
Paper • 2503.17126 • Published • 36 -
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Paper • 2503.21620 • Published • 62
-
RuCCoD: Towards Automated ICD Coding in Russian
Paper • 2502.21263 • Published • 133 -
Unified Reward Model for Multimodal Understanding and Generation
Paper • 2503.05236 • Published • 122 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 29 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65 -
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published • 25 -
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Paper • 2406.10819 • Published • 2
-
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
Paper • 2502.17110 • Published • 13 -
WebGames: Challenging General-Purpose Web-Browsing AI Agents
Paper • 2502.18356 • Published • 14 -
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Paper • 2502.18906 • Published • 12 -
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Paper • 2503.02268 • Published • 11
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 44 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63