
Ragify Documentation

Everything you need to convert PDFs to structured output — from the web app to full API integration.

What is Ragify?

Ragify is a PDF-to-structured-output SaaS. You upload a PDF and receive clean, machine-readable output in one or more formats. It is designed specifically for AI/RAG pipelines, data extraction, and document automation.

The parsing engine is opendataloader-pdf — a high-performance Java-based engine that processes most PDFs in under 5 seconds without any external AI calls in the default mode. An optional Hybrid AI mode (Pro/Business) routes complex pages to a dedicated AI backend for significantly higher accuracy on tables, formulas, and scanned documents.

How it works

  1. Upload: submit a PDF via the web UI, or via POST /jobs through the API.
  2. Configure: choose output formats and parser options (reading order, table detection, images, page range, etc.).
  3. Process: the job is queued and processed asynchronously. Free jobs go to q_free, Pro jobs to q_pro, and Business jobs to q_business (highest priority).
  4. Download: when the job status is done, download your output files. Files are available for 30 days and are then deleted automatically.
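Because jobs are processed asynchronously, an API client has to wait until the status reaches done before downloading. The following is a minimal polling sketch, not Ragify's client library: `get_status` is a hypothetical callable standing in for whatever status request your client makes, and the status values other than done (such as "queued" and "failed") are assumptions, since only done is documented here.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until its status is "done".

    get_status is a hypothetical callable wrapping your own status request.
    Only the "done" value comes from the docs; "failed" is an assumption.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "done":
            return status
        if status == "failed":  # assumed terminal status, not documented
            raise RuntimeError("job failed")
        if waited >= timeout:
            raise TimeoutError(
                f"job not done after {timeout} s (last status: {status!r})"
            )
        sleep(poll_interval)
        waited += poll_interval
```

Injecting `sleep` keeps the helper testable without real delays; a production client might replace the fixed interval with a backoff.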

Plans and limits

Plan                Pages/mo    Max file    Formats                 API keys
Free                50          25 MB       Markdown, Plain Text    -
Pro (€12/mo)        500         150 MB      All 6 + images          3
Business (€49/mo)   Unlimited   Unlimited   All 6 + images          10
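A client that wants to check a file against its plan before uploading can encode the page and file-size limits as data. This is a hypothetical pre-check sketch: the `PLANS` mapping and `can_submit` helper are illustrative names, "Unlimited" is modelled as `None`, and treating the limits as inclusive (a 25 MB file, or a monthly total of exactly 50 pages, is still allowed on Free) is an assumption. Formats and API keys are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanLimits:
    pages_per_month: Optional[int]  # None means unlimited
    max_file_mb: Optional[int]      # None means unlimited


# Values copied from the plans table; keys are hypothetical identifiers.
PLANS = {
    "free": PlanLimits(pages_per_month=50, max_file_mb=25),
    "pro": PlanLimits(pages_per_month=500, max_file_mb=150),
    "business": PlanLimits(pages_per_month=None, max_file_mb=None),
}


def can_submit(plan: str, file_mb: float, pages: int, pages_used: int) -> bool:
    """Return True if a file of this size and page count fits the plan.

    pages_used is the number of pages already consumed this month.
    Limits are treated as inclusive (an assumption).
    """
    limits = PLANS[plan]
    if limits.max_file_mb is not None and file_mb > limits.max_file_mb:
        return False
    if (
        limits.pages_per_month is not None
        and pages_used + pages > limits.pages_per_month
    ):
        return False
    return True
```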

Note

Monthly page counts reset on the 1st of each calendar month (UTC). Unused pages do not carry over. Files and outputs are automatically deleted 30 days after upload.
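The reset and retention rules translate directly into date arithmetic. A minimal sketch, assuming the reset happens at 00:00 UTC on the 1st and that "30 days after upload" means exactly 30 × 24 hours; actual deletion may run on a schedule rather than at that exact instant. Both helper names are hypothetical, and both expect timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone


def next_quota_reset(now: datetime) -> datetime:
    """Start of the next calendar month in UTC (expects an aware datetime)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


def deletion_time(uploaded_at: datetime) -> datetime:
    """Files and outputs are deleted 30 days after upload."""
    return uploaded_at + timedelta(days=30)
```

Converting to UTC first matters near month boundaries: 23:30 on 31 March at UTC-2 is already 1 April in UTC, so the next reset is 1 May.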

Supported PDF types

  • Digital PDFs — text-based, exported from Word, InDesign, LaTeX, etc. Best accuracy, fastest processing.
  • Multi-column layouts — academic papers, newspapers. Accurate with xycut reading order.
  • Table-heavy documents — financial reports, invoices. Cluster table method recommended.
  • Password-protected PDFs — pass the password in the parser options; it is never stored.
  • Scanned PDFs — require Hybrid AI mode with OCR (Pro/Business).
  • Multi-language — all languages supported. OCR engine auto-detects language in Hybrid mode.
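The xycut reading order presumably refers to the XY-cut family of algorithms, which recursively split a page at empty horizontal and vertical gaps so that each column is read top to bottom before the next. The sketch below is a generic widest-gap variant written for illustration; it is not opendataloader-pdf's implementation, and the `(x0, y0, x1, y1)` box format with y growing downward is an assumption.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def widest_gap(intervals: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Return (gap_start, gap_end) of the widest empty gap in a 1-D projection."""
    if not intervals:
        return None
    spans = sorted(intervals)
    best = None
    reach = spans[0][1]  # furthest end seen so far
    for start, end in spans[1:]:
        if start > reach and (best is None or start - reach > best[1] - best[0]):
            best = (reach, start)
        reach = max(reach, end)
    return best


def xy_cut(boxes: List[Box]) -> List[Box]:
    """Order boxes by recursively cutting at the widest horizontal or vertical gap."""
    if len(boxes) <= 1:
        return list(boxes)
    x_gap = widest_gap([(b[0], b[2]) for b in boxes])
    y_gap = widest_gap([(b[1], b[3]) for b in boxes])
    if x_gap is None and y_gap is None:
        # No clean cut: fall back to top-to-bottom, then left-to-right.
        return sorted(boxes, key=lambda b: (b[1], b[0]))
    x_width = x_gap[1] - x_gap[0] if x_gap else -1.0
    y_width = y_gap[1] - y_gap[0] if y_gap else -1.0
    if x_width > y_width:
        axis, cut = 0, x_gap[0]  # vertical cut: left part is read first
    else:
        axis, cut = 1, y_gap[0]  # horizontal cut: top part is read first
    # Every box lies entirely before or after the gap on the chosen axis.
    first = [b for b in boxes if b[axis + 2] <= cut]
    second = [b for b in boxes if b[axis + 2] > cut]
    return xy_cut(first) + xy_cut(second)
```

Cutting only at the single widest gap is what keeps a two-column page from being interleaved: the column gutter is usually wider than the spacing between lines, so the page splits into columns before it splits into lines.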

Content safety

Ragify applies a set of content safety filters on every parse job. These filters protect against prompt-injection attacks hidden inside PDFs using rendering tricks such as tiny invisible text, off-page elements, or hidden OCG layers. The filters are always active and cannot be disabled by users.
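To make the idea concrete, here is an illustrative filter over extracted text spans that drops the three kinds of hidden content named above. It is not Ragify's filter: the `TextSpan` fields, the 1 pt `MIN_FONT_SIZE_PT` threshold, and the `in_hidden_layer` flag (standing in for membership in a switched-off OCG layer) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextSpan:
    text: str
    font_size: float                          # in points
    bbox: Tuple[float, float, float, float]   # (x0, y0, x1, y1) in page units
    in_hidden_layer: bool = False             # hypothetical: span sits in a hidden OCG layer


MIN_FONT_SIZE_PT = 1.0  # hypothetical threshold for "tiny" text


def is_off_page(bbox: Tuple[float, float, float, float], page_width: float, page_height: float) -> bool:
    """True if the box lies entirely outside the page area."""
    x0, y0, x1, y1 = bbox
    return x1 <= 0 or y1 <= 0 or x0 >= page_width or y0 >= page_height


def filter_spans(spans: List[TextSpan], page_width: float, page_height: float) -> List[TextSpan]:
    """Keep only spans a human reader could plausibly see on the rendered page."""
    kept = []
    for span in spans:
        if span.font_size < MIN_FONT_SIZE_PT:
            continue  # tiny text
        if is_off_page(span.bbox, page_width, page_height):
            continue  # positioned outside the visible page
        if span.in_hidden_layer:
            continue  # hidden optional-content layer
        kept.append(span)
    return kept
```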