Output formats
Ragify produces up to 6 output formats per job. Free accounts get Markdown and Plain Text; Pro and Business unlock all formats plus image extraction.
| Format | Available on |
|---|---|
| markdown | Free+ |
| json | Pro+ |
| html | Pro+ |
| text | Free+ |
| tagged-pdf | Pro+ |
| annotated-pdf | Pro+ |
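The availability list above can be expressed as a small lookup, which is handy when a client needs to hide formats the current account cannot request. This is only a sketch: the function name `formats_for_plan` and the plan ordering (Business includes everything in Pro) are assumptions for illustration, not part of Ragify's API.

```python
# Minimum plan per format key, transcribed from the availability list above.
MIN_PLAN = {
    "markdown": "free",
    "json": "pro",
    "html": "pro",
    "text": "free",
    "tagged-pdf": "pro",
    "annotated-pdf": "pro",
}

# Assumed ordering: each plan includes everything from the plans below it.
PLAN_RANK = {"free": 0, "pro": 1, "business": 2}


def formats_for_plan(plan: str) -> list[str]:
    """Return the format keys available on the given plan, in listing order."""
    rank = PLAN_RANK[plan.lower()]
    return [fmt for fmt, min_plan in MIN_PLAN.items() if PLAN_RANK[min_plan] <= rank]


print(formats_for_plan("free"))  # ['markdown', 'text']
```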
Markdown
Format key: markdown
The most versatile format. Ideal for RAG pipelines, LLM ingestion, and human review.
- Headings mapped to #, ##, ### by hierarchy
- Tables rendered as GFM (GitHub Flavored Markdown) pipe tables
- Lists, bold, italic, strikethrough (if detect_strikethrough is enabled)
- Images embedded as relative-path references or base64, depending on the image_output setting
- Page breaks as horizontal rules (---)
Example output:

# Q3 Financial Report

## Revenue Overview

| Quarter | Revenue | Growth |
|---------|---------|--------|
| Q1 2026 | €2.1M | +18% |
| Q2 2026 | €2.3M | +21% |
| Q3 2026 | €2.4M | +34% |

Year-over-year growth reached **34%** driven by expansion in the European market.
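For downstream use, the Markdown output can be split into pages on the horizontal-rule lines and its pipe tables turned back into rows. The sketch below assumes a page break is a line containing exactly `---` and that cells contain no escaped pipes; the helper names `split_pages` and `parse_pipe_table` are made up for this example.

```python
def split_pages(markdown: str) -> list[str]:
    """Split Markdown output into pages on lines that are exactly '---'."""
    pages, current = [], []
    for line in markdown.splitlines():
        if line.strip() == "---":
            pages.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    pages.append("\n".join(current).strip())
    return [page for page in pages if page]


def parse_pipe_table(lines: list[str]) -> list[list[str]]:
    """Turn GFM pipe-table lines into rows of cells, skipping the delimiter row."""
    rows = []
    for line in lines:
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # delimiter row such as |---|---|
        rows.append(cells)
    return rows


sample = """# Q3 Financial Report
## Revenue Overview
| Quarter | Revenue | Growth |
|---------|---------|--------|
| Q1 2026 | €2.1M | +18% |
---
Second page text."""

pages = split_pages(sample)
table_lines = [line for line in pages[0].splitlines() if line.startswith("|")]
print(parse_pipe_table(table_lines))
```

The table delimiter row is not mistaken for a page break because it starts and ends with pipes, so it never equals `---` after stripping whitespace.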
JSON
Format key: json
The richest format. Every document element is an object with type, content, page number, bounding box, font information, and hierarchy. Best for programmatic processing and data extraction.
{
  "number of pages": 12,
  "title": "Q3 Financial Report 2026",
  "kids": [
    {
      "type": "heading",
      "level": "Title",
      "content": "Q3 Financial Report",
      "page_number": 1,
      "bbox": [72, 54, 540, 80]
    },
    {
      "type": "table",
      "content": "Quarter | Revenue | Growth\nQ1 | €2.1M | +18%",
      "page_number": 3,
      "bbox": [72, 200, 540, 350],
      "rows": [
        ["Quarter", "Revenue", "Growth"],
        ["Q1 2026", "€2.1M", "+18%"]
      ]
    },
    {
      "type": "paragraph",
      "content": "Year-over-year growth reached 34%...",
      "page_number": 3,
      "font_size": 11,
      "font_name": "Arial"
    }
  ]
}

Element types
| Type | Description |
|---|---|
| heading | Section heading with level (Title, H1–H6) |
| paragraph | Body text block |
| table | Table with rows/cells, also serialised as plain text in content |
| list_item | Bullet or numbered list item |
| figure | Image or diagram with optional caption |
| formula | Mathematical formula (Hybrid mode) |
| caption | Figure or table caption |
| page_header | Running header (requires include_header_footer) |
| page_footer | Running footer (requires include_header_footer) |
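A typical first step with the JSON output is to group elements by type and pull table rows into records. The sketch below uses only keys that appear in the sample above (`kids`, `type`, `rows`, `page_number`); it assumes `kids` is a flat list as in the sample and that the first table row is a header. The helper names are illustrative.

```python
import json
from collections import defaultdict

# Trimmed copy of the sample document shown above.
raw = """
{
  "number of pages": 12,
  "kids": [
    {"type": "heading", "level": "Title", "content": "Q3 Financial Report", "page_number": 1},
    {"type": "table", "content": "Quarter | Revenue | Growth", "page_number": 3,
     "rows": [["Quarter", "Revenue", "Growth"], ["Q1 2026", "€2.1M", "+18%"]]},
    {"type": "paragraph", "content": "Year-over-year growth reached 34%...", "page_number": 3}
  ]
}
"""


def group_by_type(doc: dict) -> dict[str, list[dict]]:
    """Bucket the top-level elements by their 'type' field."""
    grouped = defaultdict(list)
    for element in doc.get("kids", []):
        grouped[element.get("type", "unknown")].append(element)
    return dict(grouped)


def table_records(table: dict) -> list[dict]:
    """Convert a table element to dicts, treating the first row as the header (assumption)."""
    header, *body = table["rows"]
    return [dict(zip(header, row)) for row in body]


doc = json.loads(raw)
by_type = group_by_type(doc)
records = table_records(by_type["table"][0])
print(records)
```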
HTML
Format key: html
A self-contained HTML file with embedded styles. Suitable for web rendering, archiving, or feeding into HTML-aware processing pipelines.
- Full document structure with <h1>–<h6>, <p>, <table>, <ul>
- Basic CSS included inline, so the file renders in any browser
- If image_output=embedded is set, images are base64-encoded data URIs inside the HTML
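When images are embedded as data URIs, they can be recovered from the HTML with the standard library alone. This is a minimal sketch that assumes the images appear as `<img>` tags with a `data:<mime>;base64,<payload>` source; the class name `DataUriImages` and the sample markup are invented for the example and do not reflect Ragify's exact HTML.

```python
import base64
from html.parser import HTMLParser


class DataUriImages(HTMLParser):
    """Collect (mime_type, bytes) for every <img> whose src is a base64 data URI."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[tuple[str, bytes]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if not src.startswith("data:") or ";base64," not in src:
            return  # external reference or non-base64 data URI
        header, payload = src.split(",", 1)
        mime_type = header[len("data:"):].split(";", 1)[0]
        self.images.append((mime_type, base64.b64decode(payload)))


# Stand-in markup; the payload is just the 8-byte PNG signature.
html = '<p>Chart:</p><img alt="chart" src="data:image/png;base64,iVBORw0KGgo=">'
parser = DataUriImages()
parser.feed(html)
mime, data = parser.images[0]
print(mime, len(data))
```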
Plain Text
Format key: text
Raw text extraction with minimal formatting. Reading order is applied (xycut by default). Useful for full-text search indexing, simple NLP pipelines, or when structure is not needed.
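To illustrate the search-indexing use case, plain-text output can be fed into a very small inverted index. This is a toy sketch, not a recommendation of a particular search stack; the file names and the functions `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents containing every term in the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Two made-up plain-text outputs keyed by a made-up file name.
docs = {
    "report.txt": "Year-over-year growth reached 34% driven by expansion in the European market.",
    "memo.txt": "Growth in the Asian market slowed.",
}
index = build_index(docs)
print(search(index, "european market"))
```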
Tagged PDF
Format key: tagged-pdf
A PDF with accessibility tags (PDF/UA) embedded. The document structure (headings, paragraphs, tables, reading order) is written into the PDF's tag (structure) tree, which is what assistive technology reads. Suitable for accessibility compliance, screen readers, and archiving workflows.
Annotated PDF
Format key: annotated-pdf
The original PDF with bounding boxes drawn as visible annotations around each detected element. Primarily a debugging and quality-control tool, useful for verifying that the parser is correctly identifying headings, tables, and paragraphs.
Image extraction (ZIP)
Format key: images
When image_output is set to external or embedded, images extracted from the PDF are packaged into a ZIP archive alongside your other output files. The ZIP contains one file per image (PNG or JPEG depending on image_format).
- external — images saved as separate files in the ZIP. References in Markdown/HTML point to relative paths.
- embedded — images base64-encoded inside the HTML output. Requires HTML format to be selected.
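Once the archive is downloaded, the extracted images can be read with Python's `zipfile` module. The sketch below builds a stand-in archive in memory so it runs on its own; the member names (`images/page1_img1.png`, `output.md`) and the helper `list_images` are assumptions, since the actual layout inside Ragify's ZIP is not specified here.

```python
import io
import zipfile
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(zip_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: image bytes} for PNG/JPEG files in the archive."""
    images = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if PurePosixPath(info.filename).suffix.lower() in IMAGE_SUFFIXES:
                images[info.filename] = archive.read(info)
    return images


# Build a stand-in archive in memory; the member names are made up for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("images/page1_img1.png", b"\x89PNG\r\n\x1a\n")
    archive.writestr("output.md", "![figure](images/page1_img1.png)")

images = list_images(buffer.getvalue())
print(sorted(images))
```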