---
title: olmOCR Markdown Converter
emoji: 📝
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
python_version: 3.11
license: mit
---

# olmOCR Markdown Converter

This Space uses the `olmOCR` model pipeline to convert PDFs, including scientific papers, into Markdown-formatted `.txt` files. These files retain the document's structure, headers, and basic math formatting, so they are ready for Calibre/Kindle or for downstream parsing.

- ✅ Vision + text-anchor OCR pipeline (via `olmOCR`)
- ✅ Extracts semantic structure from the PDF table of contents (TOC)
- ✅ Outputs clean Markdown in `.txt` files
- ✅ Runs as a Hugging Face **Gradio Space with GPU support**
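To show roughly how a PDF's TOC can drive Markdown structure, here is a minimal sketch. It is not necessarily what this Space's `app.py` does. It assumes TOC entries in the `[level, title, page]` shape that PyMuPDF's `Document.get_toc()` returns in its default simple mode. The helper name `toc_to_markdown_headers` is hypothetical.

```python
from typing import List, Sequence


def toc_to_markdown_headers(toc: List[Sequence], max_depth: int = 6) -> List[str]:
    """Hypothetical helper: turn PDF outline entries into Markdown header lines.

    Assumes each entry starts with (level, title, ...), e.g. the
    [level, title, page] lists PyMuPDF's Document.get_toc() returns.
    """
    headers = []
    for entry in toc:
        level, title = entry[0], entry[1]
        # Clamp the outline level into Markdown's valid ATX heading range 1..max_depth.
        depth = min(max(level, 1), max_depth)
        headers.append("#" * depth + " " + title.strip())
    return headers
```

For example, the outline `[(1, "Introduction", 1), (2, "Related Work", 2)]` becomes `"# Introduction"` and `"## Related Work"`. Clamping keeps deeply nested outlines from producing invalid seven-plus `#` headings.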

## Example Use

Upload a scientific paper as a PDF, then download a Markdown-formatted `.txt` version with its headers and inline structure preserved.

---

Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)