---
title: olmOCR Markdown Converter
emoji: 📝
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
python_version: 3.11
license: mit
---

# olmOCR Markdown Converter

This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.

- ✅ Vision + text anchor OCR pipeline (via `olmOCR`)
- ✅ Extracts semantic structure via PDF TOC
- ✅ Outputs clean `.txt` in markdown format
- ✅ Hugging Face **Gradio Space with GPU support**

## Example Use

Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure.

---

Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)