Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published 2 days ago • 60
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 208
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation Paper • 2509.23866 • Published Sep 28 • 13
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling Paper • 2506.08672 • Published Jun 10 • 30
ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection Paper • 2505.16475 • Published May 22 • 3
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Paper • 2505.13308 • Published May 19 • 27
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts Paper • 2503.22952 • Published Mar 29 • 17
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Paper • 2502.18890 • Published Feb 26 • 30
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 Dec 9, 2022 • 376
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published Sep 2, 2024 • 27
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published Jun 24, 2024 • 26
RAM: Towards an Ever-Improving Memory System by Learning from Communications Paper • 2404.12045 • Published Apr 18, 2024 • 2
In-Context Editing: Learning Knowledge from Self-Induced Distributions Paper • 2406.11194 • Published Jun 17, 2024 • 20