Mineru API Documentation
This document describes the FastAPI interface for the Mineru document parsing service.
Overview
The Mineru API provides endpoints for parsing documents (PDFs, images) using advanced OCR and layout analysis. It supports both pipeline and VLM backends for different use cases.
Base URL
http://localhost:8000/api/v1/mineru
Endpoints
1. Health Check
GET /health
Check if the Mineru service is running.
Response:
{
  "status": "healthy",
  "service": "mineru"
}
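A minimal sketch of a client-side health probe, assuming the service is reachable at the base URL above and returns exactly the JSON shown. The helper names `is_healthy` and `check_health` are hypothetical, and the standard-library `urllib` is used here only to keep the sketch dependency-free.

```python
import json
from urllib.request import urlopen

# Base URL taken from this document; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/v1/mineru"


def is_healthy(payload) -> bool:
    """Return True if the payload matches the documented healthy response."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("service") == "mineru"
    )


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Call GET /health and report whether the service looks healthy.

    Network failures, HTTP errors and non-JSON bodies all count as unhealthy.
    """
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        return False
    return is_healthy(payload)
```

Calling `check_health()` before submitting a large document avoids uploading a file to a service that is not running.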
2. Parse Document
POST /parse
Parse a document using Mineru's advanced parsing capabilities.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | File | Required | The document file to parse (PDF, PNG, JPEG, JPG) |
| lang | string | "ch" | Language option ('ch', 'en', 'korean', 'japan', etc.) |
| backend | string | "pipeline" | Backend for parsing ('pipeline', 'vlm-transformers', 'vlm-sglang-engine', 'vlm-sglang-client') |
| method | string | "auto" | Method for parsing ('auto', 'txt', 'ocr') |
| server_url | string | null | Server URL for vlm-sglang-client backend |
| start_page_id | int | 0 | Start page ID for parsing |
| end_page_id | int | null | End page ID for parsing |
| formula_enable | boolean | true | Enable formula parsing |
| table_enable | boolean | true | Enable table parsing |
| draw_layout_bbox | boolean | true | Whether to draw layout bounding boxes |
| draw_span_bbox | boolean | true | Whether to draw span bounding boxes |
| dump_md | boolean | true | Whether to dump markdown files |
| dump_middle_json | boolean | true | Whether to dump middle JSON files |
| dump_model_output | boolean | true | Whether to dump model output files |
| dump_orig_pdf | boolean | true | Whether to dump original PDF files |
| dump_content_list | boolean | true | Whether to dump content list files |
| make_md_mode | string | "MM_MD" | The mode for making markdown content |
Response:
{
  "status": "success",
  "file_name": "document_name",
  "outputs": {
    "markdown": "/path/to/document_name.md",
    "middle_json": "/path/to/document_name_middle.json",
    "model_output": "/path/to/document_name_model.json",
    "content_list": "/path/to/document_name_content_list.json",
    "original_pdf": "/path/to/document_name_origin.pdf",
    "layout_pdf": "/path/to/document_name_layout.pdf",
    "span_pdf": "/path/to/document_name_span.pdf"
  },
  "output_directory": "/path/to/output/directory"
}
3. Download Processed File
GET /download/{file_path}
Download a processed file from the Mineru output directory.
Parameters:
file_path: Path to the file relative to the mineru output directory
Response: File download
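Because `file_path` is relative to the Mineru output directory, while the parse response lists full paths, a client may need to convert one into the other. The sketch below shows one way to do that. It rests on two assumptions that this document does not confirm: that the caller knows the server's output root (the hypothetical `output_root` argument), and that the paths in `outputs` lie under it. The function name `download_url` is also hypothetical.

```python
import posixpath
from urllib.parse import quote

# Base URL taken from this document; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/v1/mineru"


def download_url(output_path: str, output_root: str, base_url: str = BASE_URL) -> str:
    """Build a /download URL for a path reported in the parse response.

    output_root is an ASSUMPTION: the directory the server resolves
    file_path against. The response format alone does not reveal it.
    """
    rel = posixpath.relpath(output_path, output_root)
    # Refuse paths outside the root; the server restricts downloads anyway.
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{output_path!r} is not inside {output_root!r}")
    # quote() keeps '/' separators and escapes spaces and other unsafe characters.
    return f"{base_url}/download/{quote(rel)}"
```

For example, with a hypothetical root of `/data/mineru`, the path `/data/mineru/abc/doc.md` maps to `.../download/abc/doc.md`.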
Usage Examples
Python Example
import requests

# Parse a document
with open('document.pdf', 'rb') as f:
    files = {'file': ('document.pdf', f, 'application/pdf')}
    params = {
        'lang': 'ch',
        'backend': 'pipeline',
        'method': 'auto',
        'formula_enable': True,
        'table_enable': True
    }
    # Send the request while the file is still open
    response = requests.post(
        'http://localhost:8000/api/v1/mineru/parse',
        files=files,
        params=params
    )

if response.status_code == 200:
    result = response.json()
    print(f"Parsed successfully: {result['file_name']}")

    # Download the markdown file
    md_path = result['outputs']['markdown']
    download_response = requests.get(
        f'http://localhost:8000/api/v1/mineru/download/{md_path}'
    )
    with open('output.md', 'wb') as f:
        f.write(download_response.content)
cURL Example
# Parse a document
curl -X POST "http://localhost:8000/api/v1/mineru/parse" \
-F "file=@document.pdf" \
-F "lang=ch" \
-F "backend=pipeline" \
-F "method=auto"
# Download a processed file
curl -X GET "http://localhost:8000/api/v1/mineru/download/path/to/file.md" \
-o downloaded_file.md
Backend Options
Pipeline Backend
- Use case: General purpose, more robust
- Advantages: Better for complex layouts, supports multiple languages
- Parameter: backend=pipeline
VLM Backends
- vlm-transformers: General purpose VLM
- vlm-sglang-engine: Faster engine-based approach
- vlm-sglang-client: Fastest client-based approach (requires server_url)
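The backend rules above can be checked on the client before a request is sent. The following is a rough sketch, not part of the API: `build_parse_params` is a hypothetical helper, and the server URL in the usage note is a placeholder. The allowed values are copied from the parameter table in this document.

```python
from typing import Optional

# Allowed values as listed in the /parse parameter table.
BACKENDS = {"pipeline", "vlm-transformers", "vlm-sglang-engine", "vlm-sglang-client"}
METHODS = {"auto", "txt", "ocr"}


def build_parse_params(
    lang: str = "ch",
    backend: str = "pipeline",
    method: str = "auto",
    server_url: Optional[str] = None,
) -> dict:
    """Validate backend-related options and return a params dict for /parse."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if backend == "vlm-sglang-client" and not server_url:
        raise ValueError("backend 'vlm-sglang-client' requires server_url")

    params = {"lang": lang, "backend": backend, "method": method}
    # Only include server_url when set, so the server default (null) applies otherwise.
    if server_url:
        params["server_url"] = server_url
    return params
```

A call such as `build_parse_params(backend="vlm-sglang-client", server_url="http://localhost:30000")` returns a dict that can be passed as the request parameters. The port here is only an example.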
Language Support
Supported languages for the pipeline backend:
- ch: Chinese (Simplified)
- en: English
- korean: Korean
- japan: Japanese
- chinese_cht: Chinese (Traditional)
- ta: Tamil
- te: Telugu
- ka: Kannada
Output Files
The API generates various output files depending on the parameters:
- Markdown (.md): Structured text content
- Middle JSON (.json): Intermediate parsing results
- Model Output (.json or .txt): Raw model predictions
- Content List (.json): Structured content list
- Original PDF: Copy of the input file
- Layout PDF: PDF with layout bounding boxes
- Span PDF: PDF with span bounding boxes
Error Handling
The API returns appropriate HTTP status codes:
- 200: Success
- 400: Bad request (invalid parameters, unsupported file type)
- 404: File not found
- 500: Internal server error
Error responses include a detail message explaining the issue.
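One way a client might turn these responses into readable messages is sketched below. It assumes the detail message arrives as a `detail` field in a JSON body, which is FastAPI's default for `HTTPException` but is not spelled out in this document. The helper name `describe_error` is hypothetical.

```python
import json

# Status codes and meanings as listed in this document.
STATUS_LABELS = {
    400: "Bad request",
    404: "File not found",
    500: "Internal server error",
}


def describe_error(status_code: int, body: str) -> str:
    """Format an error response as '<code> <label>[: <detail>]'.

    ASSUMPTION: the body is JSON with a 'detail' field. If it is not,
    the message falls back to the status code and label alone.
    """
    label = STATUS_LABELS.get(status_code, "Unexpected status")
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # Body was not JSON, or was JSON but not an object.
        detail = None
    if detail:
        return f"{status_code} {label}: {detail}"
    return f"{status_code} {label}"
```

With the `requests` library used elsewhere in this document, this could be called as `describe_error(response.status_code, response.text)` whenever the status code is not 200.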
Testing
Use the provided test script to verify the API:
python test_mineru_api.py
Notes
- The API creates unique output directories for each request to avoid conflicts
- Temporary files are automatically cleaned up after processing
- File downloads are restricted to the processed folder for security
- Large files may take time to process depending on the backend and document complexity