Xiaoqian Wu
PandaQQ
AI & ML interests
None yet
Recent Activity
- updated a collection about 2 months ago: scene4D
- updated a collection 3 months ago: digital_twins
- updated a collection 3 months ago: scene4D
Organizations
robot
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
  Paper • 2503.15558 • Published • 50
- Humanoid Policy ~ Human Policy
  Paper • 2503.13441 • Published
- RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
  Paper • 2503.16408 • Published • 42
- Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
  Paper • 2503.19757 • Published • 51
diffusion
- Learning Few-Step Diffusion Models by Trajectory Distribution Matching
  Paper • 2503.06674 • Published • 8
- Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
  Paper • 2503.16430 • Published • 34
- Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
  Paper • 2503.19904 • Published • 2
- Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
  Paper • 2504.14899 • Published • 20
RL
digital_twins
- CHOrD: Generation of Collision-Free, House-Scale, and Organized Digital Twins for 3D Indoor Scenes with Controllable Floor Plans and Optimal Layouts
  Paper • 2503.11958 • Published • 5
- Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation
  Paper • 2503.13424 • Published • 30
- PhysX: Physical-Grounded 3D Asset Generation
  Paper • 2507.12465 • Published • 43
- Gaussian Splatting with Discretized SDF for Relightable Assets
  Paper • 2507.15629 • Published • 23
scene4D
- 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
  Paper • 2503.10437 • Published • 33
- Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
  Paper • 2503.09642 • Published • 19
- VGGT: Visual Geometry Grounded Transformer
  Paper • 2503.11651 • Published • 35
- 1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
  Paper • 2503.16422 • Published • 14
reasoning
- Grounded Reinforcement Learning for Visual Reasoning
  Paper • 2505.23678 • Published • 2
- Sherlock: Self-Correcting Reasoning in Vision-Language Models
  Paper • 2505.22651 • Published • 49
- V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
  Paper • 2506.09985 • Published • 29