Collections
Discover the best community collections!
Collections including paper arxiv:2509.17765

- RoboOmni: Proactive Robot Manipulation in Omni-modal Context
  Paper • 2510.23763 • Published • 53
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 89
- Qwen3-Omni Technical Report
  Paper • 2509.17765 • Published • 139
- InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
  Paper • 2510.13747 • Published • 29

- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
  Paper • 2507.15061 • Published • 60
- WebDancer: Towards Autonomous Information Seeking Agency
  Paper • 2505.22648 • Published • 33
- ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
  Paper • 2509.13313 • Published • 79
- WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
  Paper • 2509.13305 • Published • 91

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
  Paper • 2509.05263 • Published • 10
- Symbolic Graphics Programming with Large Language Models
  Paper • 2509.05208 • Published • 46
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 104

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- The Art of Scaling Reinforcement Learning Compute for LLMs
  Paper • 2510.13786 • Published • 30
- Attention Is All You Need for KV Cache in Diffusion LLMs
  Paper • 2510.14973 • Published • 39
- BitNet Distillation
  Paper • 2510.13998 • Published • 54
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 48

- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
  Paper • 2509.09372 • Published • 239
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 210
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 225
- Why Language Models Hallucinate
  Paper • 2509.04664 • Published • 193

- LIMI: Less is More for Agency
  Paper • 2509.17567 • Published • 102
- Qwen3-Omni Technical Report
  Paper • 2509.17765 • Published • 139
- GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
  Paper • 2509.17437 • Published • 17
- EpiCache: Episodic KV Cache Management for Long Conversational Question Answering
  Paper • 2509.17396 • Published • 19

- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 18
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48