15 20 4

Jiawei Wang

Jarvis1111

https://jarvisustc.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper 24 days ago

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

upvoted a paper 27 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

upvoted a paper about 1 month ago

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

View all activity

Organizations

None yet

upvoted a paper 24 days ago

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Paper • 2512.04324 • Published 25 days ago • 149

upvoted a paper 27 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 28 days ago • 93

upvoted a paper about 1 month ago

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Paper • 2511.08577 • Published Nov 11 • 105

commented a paper 2 months ago

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Paper • 2509.09265 • Published Sep 11 • 47 •

authored a paper 3 months ago

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28 • 173

upvoted a paper 3 months ago

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28 • 173

authored a paper 4 months ago

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Paper • 2509.09265 • Published Sep 11 • 47

upvoted a paper 4 months ago

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Paper • 2509.09265 • Published Sep 11 • 47

commented a paper 4 months ago

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Paper • 2509.09265 • Published Sep 11 • 47 •

upvoted 4 papers 4 months ago

Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7 • 149

Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4 • 194

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24 • 80

authored a paper 5 months ago

WideSearch: Benchmarking Agentic Broad Info-Seeking

Paper • 2508.07999 • Published Aug 11 • 110

upvoted a paper 5 months ago

WideSearch: Benchmarking Agentic Broad Info-Seeking

Paper • 2508.07999 • Published Aug 11 • 110

commented a paper 5 months ago

WideSearch: Benchmarking Agentic Broad Info-Seeking

Paper • 2508.07999 • Published Aug 11 • 110 •

upvoted 2 papers 5 months ago

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 180

Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 266

liked a dataset 5 months ago

ByteDance-Seed/WideSearch

Viewer • Updated Sep 8 • 200 • 9.86k • 31

upvoted a paper 5 months ago

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28 • 82

Jiawei Wang

AI & ML interests

Recent Activity

Organizations

Jarvis1111's activity