Xiaobo Wang

Yofuria

https://github.com/Yofuria

AI & ML interests

Natural Language Processing

Recent Activity

updated a dataset about 14 hours ago

Yofuria/UltraFeedback-ms-swift-thinkstep

published a dataset about 14 hours ago

Yofuria/UltraFeedback-ms-swift-thinkstep

upvoted a paper 1 day ago

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Paper • 2512.07461 • Published 2 days ago • 60

upvoted a collection 7 days ago

Qwen3

Collection

84 items • Updated Aug 6 • 1.48k

upvoted a paper about 1 month ago

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6 • 208

upvoted a paper about 2 months ago

Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

Paper • 2509.23866 • Published Sep 28 • 13

upvoted a paper 4 months ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 315

upvoted 2 papers 6 months ago

RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

Paper • 2506.08672 • Published Jun 10 • 30

ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection

Paper • 2505.16475 • Published May 22 • 3

upvoted a paper 7 months ago

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Paper • 2505.13308 • Published May 19 • 27

upvoted a paper 8 months ago

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Paper • 2503.22952 • Published Mar 29 • 17

upvoted a paper 9 months ago

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

Paper • 2502.18890 • Published Feb 26 • 30

upvoted an article 10 months ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 376

upvoted 4 papers over 1 year ago

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published Sep 2, 2024 • 27

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published Jun 24, 2024 • 26

RAM: Towards an Ever-Improving Memory System by Learning from Communications

Paper • 2404.12045 • Published Apr 18, 2024 • 2

In-Context Editing: Learning Knowledge from Self-Induced Distributions

Paper • 2406.11194 • Published Jun 17, 2024 • 20
