StoryMem: Multi-shot Long Video Storytelling with Memory Paper • 2512.19539 • Published 4 days ago • 16
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 8 days ago • 188
VABench: A Comprehensive Benchmark for Audio-Video Generation Paper • 2512.09299 • Published 17 days ago • 7
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 139
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
Trace Anything: Representing Any Video in 4D via Trajectory Fields Paper • 2510.13802 • Published Oct 15 • 30
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published Jul 5 • 51
Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models Paper • 2506.12776 • Published Jun 15 • 2