CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation Paper • 2409.02098 • Published Sep 3, 2024 • 2
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20, 2025 • 19
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Paper • 2012.14740 • Published Dec 29, 2020 • 2
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 84
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published Sep 14, 2024 • 8
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 53
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models Paper • 2405.13974 • Published May 22, 2024 • 10
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task May 16, 2024 • 17