PubTables-1M: Towards comprehensive table extraction from unstructured documents Paper • 2110.00061 • Published Sep 30, 2021
Optimized Table Tokenization for Table Structure Recognition Paper • 2305.03393 • Published May 5, 2023
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16, 2025
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published Oct 13, 2025
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding Paper • 2408.14764 • Published Aug 27, 2024
OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation Paper • 2510.26213 • Published Oct 30, 2025
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns Paper • 2511.10390 • Published Nov 2025
Structured Document Translation via Format Reinforcement Learning Paper • 2512.05100 • Published Dec 2025