Ragify Documentation
Everything you need to convert PDFs to structured output — from the web app to full API integration.
Quick start
Convert your first PDF in 2 minutes
Output formats
JSON, Markdown, HTML, Plain Text, Tagged PDF, Annotated PDF
Parser options
Reading order, tables, images, PII sanitization and more
API reference
Authenticate and call the REST API from any client
What is Ragify?
Ragify is a PDF-to-structured-output SaaS. You upload a PDF and receive clean, machine-readable output in one or more formats. It is designed specifically for AI/RAG pipelines, data extraction, and document automation.
The parsing engine is opendataloader-pdf, a high-performance Java engine that, in the default mode, processes most PDFs in under 5 seconds without any external AI calls. An optional Hybrid AI mode (Pro/Business) routes complex pages to a dedicated AI backend for higher accuracy on tables, formulas, and scanned documents.
How it works
Upload
Submit a PDF via the web UI or via POST /jobs through the API.
Configure
Choose output formats and parser options (reading order, table detection, images, page range, etc.).
Process
The job is queued and processed asynchronously. Free jobs go to q_free, Pro to q_pro, Business to q_business (highest priority).
Download
When status is done, download your output files. Files are available for 30 days, then automatically deleted.
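The four steps above can be sketched on the client side as follows. This is a minimal illustration, not the official client: the queue names and the `done` status come from the text, while `queue_for_plan`, `wait_until_done`, the `fetch_status` callable, and the polling interval and timeout are assumptions made for this example. Only the `done` status is documented here, so any other status is treated as "keep waiting" until the timeout expires.

```python
import time
from typing import Callable

# Queue names are taken from the "Process" step; the lowercase plan keys
# are this sketch's own labels, not documented identifiers.
QUEUE_BY_PLAN = {"free": "q_free", "pro": "q_pro", "business": "q_business"}


def queue_for_plan(plan: str) -> str:
    """Return the queue a job on the given plan is routed to."""
    return QUEUE_BY_PLAN[plan.lower()]


def wait_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll fetch_status() until it returns "done"; return the poll count.

    fetch_status is a hypothetical callable wrapping whatever status
    lookup the API reference documents. Raises TimeoutError if the job
    has not reached "done" once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if fetch_status() == "done":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach status 'done' in time")
        sleep(interval_s)
```

Passing `fetch_status` and `sleep` as parameters keeps the loop independent of any HTTP library, so the real status request from the API reference can be plugged in without changing the polling logic.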
Plans and limits
| Plan | Pages/month | Max file size | Formats | API keys |
|---|---|---|---|---|
| Free | 50 | 25 MB | Markdown, Plain Text | — |
| Pro — €12/mo | 500 | 150 MB | All 6 + images | 3 |
| Business — €49/mo | Unlimited | Unlimited | All 6 + images | 10 |
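As a rough illustration of how the table might be applied before an upload, the sketch below encodes the limits and checks a file against them. The numbers are copied from the table; the `PlanLimits` type, the `check_upload` function, reading the "—" in the API keys column as zero keys, and treating the limits as inclusive are assumptions of this example, and the service's own enforcement may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlanLimits:
    pages_per_month: Optional[int]  # None means unlimited
    max_file_mb: Optional[int]      # None means unlimited
    api_keys: int


# Values copied from the table above ("—" read as 0 API keys: an assumption).
PLANS = {
    "free": PlanLimits(pages_per_month=50, max_file_mb=25, api_keys=0),
    "pro": PlanLimits(pages_per_month=500, max_file_mb=150, api_keys=3),
    "business": PlanLimits(pages_per_month=None, max_file_mb=None, api_keys=10),
}


def check_upload(
    plan: str, file_mb: float, pages: int, pages_used_this_month: int = 0
) -> List[str]:
    """Return a list of limit violations; an empty list means the upload fits."""
    limits = PLANS[plan]
    problems = []
    if limits.max_file_mb is not None and file_mb > limits.max_file_mb:
        problems.append(
            f"file is {file_mb} MB, plan allows {limits.max_file_mb} MB"
        )
    if limits.pages_per_month is not None:
        total = pages_used_this_month + pages
        if total > limits.pages_per_month:
            problems.append(
                f"{total} pages this month, plan allows {limits.pages_per_month}"
            )
    return problems
```

For example, `check_upload("free", file_mb=30, pages=60)` reports both a file-size and a page-quota violation, while the same file on the Business plan returns an empty list.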
Supported PDF types
- Digital PDFs — text-based, exported from Word, InDesign, LaTeX, etc. Best accuracy, fastest processing.
- Multi-column layouts — academic papers, newspapers. Accurate with xycut reading order.
- Table-heavy documents — financial reports, invoices. Cluster table method recommended.
- Password-protected PDFs — pass the password in options. It is never stored.
- Scanned PDFs — require Hybrid AI mode with OCR (Pro/Business).
- Multi-language — all languages supported. OCR engine auto-detects language in Hybrid mode.
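The recommendations in this list could be collected into a small helper that picks parser options per document type. This is only a sketch: the values `xycut` and `cluster` and the Hybrid/OCR requirement for scans come from the list, but every option key (`reading_order`, `table_method`, `hybrid`, `ocr`, `password`), the document-type labels, and the `build_options` function are hypothetical names chosen for illustration. The real option names are the ones documented under Parser options.

```python
from typing import Any, Dict, Optional

# Hypothetical option keys. Only the values "xycut" and "cluster" and the
# Hybrid AI + OCR requirement for scanned PDFs come from the list above.
RECOMMENDED: Dict[str, Dict[str, Any]] = {
    "digital": {},                                # defaults are sufficient
    "multi_column": {"reading_order": "xycut"},
    "table_heavy": {"table_method": "cluster"},
    "scanned": {"hybrid": True, "ocr": True},     # Pro/Business only
}


def build_options(kind: str, password: Optional[str] = None) -> Dict[str, Any]:
    """Return a fresh options dict for the given document type.

    If the PDF is password-protected, the password is added to the
    options, mirroring "pass the password in options" above.
    """
    options = dict(RECOMMENDED[kind])  # copy so callers cannot mutate the table
    if password is not None:
        options["password"] = password
    return options
```

A password-protected academic paper would then be described by `build_options("multi_column", password="...")`, with the resulting dictionary mapped onto the actual option names before submitting the job.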
Content safety
Ragify applies a set of content safety filters on every parse job. These filters protect against prompt-injection attacks hidden inside PDFs using rendering tricks such as tiny invisible text, off-page elements, or hidden OCG layers. The filters are always active and cannot be disabled by users.
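To make the three rendering tricks concrete, here is a minimal sketch of the kind of check such a filter could perform on extracted text spans. It does not describe Ragify's actual implementation: the `TextSpan` type, the 1-point font-size threshold, and the simple bounding test on the span origin are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextSpan:
    text: str
    font_size: float            # in points
    x: float                    # span origin, in page units
    y: float
    layer_visible: bool = True  # False if the span sits in a hidden OCG layer


def hidden_reason(
    span: TextSpan,
    page_width: float,
    page_height: float,
    min_font_size: float = 1.0,  # assumed threshold, not a documented value
) -> Optional[str]:
    """Return why a span would be treated as hidden, or None if it looks visible."""
    if span.font_size < min_font_size:
        return "tiny text"
    if not (0 <= span.x <= page_width and 0 <= span.y <= page_height):
        return "off-page element"
    if not span.layer_visible:
        return "hidden layer"
    return None


def visible_spans(
    spans: List[TextSpan], page_width: float, page_height: float
) -> List[TextSpan]:
    """Keep only spans that pass all three checks."""
    return [
        s for s in spans if hidden_reason(s, page_width, page_height) is None
    ]
```

On a 612 by 792 point page, a 0.1-point span reading "ignore previous instructions" would be dropped as tiny text, while ordinary 11-point body text inside the page bounds is kept.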