Mineru API Documentation

This document describes the FastAPI interface for the Mineru document parsing service.

Overview

The Mineru API provides endpoints for parsing documents (PDFs and PNG/JPEG images) using OCR and layout analysis. It supports a pipeline backend and several VLM backends, described under Backend Options.

Base URL

http://localhost:8000/api/v1/mineru

Endpoints

1. Health Check

GET /health

Check if the Mineru service is running.

Response:

{
  "status": "healthy",
  "service": "mineru"
}
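A client can use this endpoint as a readiness probe before submitting work. The following is a minimal sketch using only the standard library; the helper names `is_healthy` and `check_health` are illustrative and not part of the service.

```python
import json
import urllib.request

# Base URL as given in this document
BASE_URL = "http://localhost:8000/api/v1/mineru"


def is_healthy(payload) -> bool:
    """Return True when the payload matches the documented health response."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("service") == "mineru"
    )


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Call GET /health and report whether the service answered as documented."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        # Connection failures (URLError is an OSError) and invalid JSON
        # are both treated as "not healthy".
        return False
```

Calling `check_health()` returns `False` rather than raising when the service is unreachable, which keeps the probe easy to use in a retry loop.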

2. Parse Document

POST /parse

Upload a document and parse it with the selected backend. On success, the response lists the paths of the generated output files.

Parameters:

Parameter	Type	Default	Description
`file`	File	Required	The document file to parse (PDF, PNG, JPEG, JPG)
`lang`	string	"ch"	Language option ('ch', 'en', 'korean', 'japan', etc.)
`backend`	string	"pipeline"	Backend for parsing ('pipeline', 'vlm-transformers', 'vlm-sglang-engine', 'vlm-sglang-client')
`method`	string	"auto"	Method for parsing ('auto', 'txt', 'ocr')
`server_url`	string	null	Server URL for vlm-sglang-client backend
`start_page_id`	int	0	Index of the first page to parse (the default, 0, is the first page)
`end_page_id`	int	null	Index of the last page to parse (null parses through the last page)
`formula_enable`	boolean	true	Enable formula parsing
`table_enable`	boolean	true	Enable table parsing
`draw_layout_bbox`	boolean	true	Whether to draw layout bounding boxes
`draw_span_bbox`	boolean	true	Whether to draw span bounding boxes
`dump_md`	boolean	true	Whether to dump markdown files
`dump_middle_json`	boolean	true	Whether to dump middle JSON files
`dump_model_output`	boolean	true	Whether to dump model output files
`dump_orig_pdf`	boolean	true	Whether to dump original PDF files
`dump_content_list`	boolean	true	Whether to dump content list files
`make_md_mode`	string	"MM_MD"	The mode for making markdown content

Response:

{
  "status": "success",
  "file_name": "document_name",
  "outputs": {
    "markdown": "/path/to/document_name.md",
    "middle_json": "/path/to/document_name_middle.json",
    "model_output": "/path/to/document_name_model.json",
    "content_list": "/path/to/document_name_content_list.json",
    "original_pdf": "/path/to/document_name_origin.pdf",
    "layout_pdf": "/path/to/document_name_layout.pdf",
    "span_pdf": "/path/to/document_name_span.pdf"
  },
  "output_directory": "/path/to/output/directory"
}
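On the client side it can help to load this response into a small typed structure. The sketch below assumes the field names shown above; the `ParseResult` class is hypothetical, and treating missing or null `outputs` entries as "not produced" (for example when a `dump_*` flag is false) is an assumption, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ParseResult:
    """Client-side view of the /parse response (illustrative, not part of the API)."""

    status: str
    file_name: str
    output_directory: str
    outputs: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, payload: dict) -> "ParseResult":
        # Keep only outputs that carry a path. Dropping missing or null
        # entries is an assumption about how disabled outputs are reported.
        outputs = {k: v for k, v in (payload.get("outputs") or {}).items() if v}
        return cls(
            status=payload["status"],
            file_name=payload["file_name"],
            output_directory=payload["output_directory"],
            outputs=outputs,
        )

    @property
    def succeeded(self) -> bool:
        return self.status == "success"
```

With the `requests` call from the Python example, this would be used as `ParseResult.from_dict(response.json())`.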

3. Download Processed File

GET /download/{file_path}

Download a processed file from the Mineru output directory.

Parameters:

file_path: Path to the file, relative to the Mineru output directory

Response: File download
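Because `file_path` is relative to the Mineru output directory while the parse response may report full paths, a client may need to convert one into the other. The sketch below shows one way to do that; it assumes the caller knows the server's output root (passed as the hypothetical `output_root` argument), which this document does not specify. The function names are illustrative.

```python
import posixpath
from urllib.parse import quote

# Base URL as given in this document
BASE_URL = "http://localhost:8000/api/v1/mineru"


def relative_output_path(path: str, output_root: str) -> str:
    """Express `path` relative to `output_root`, rejecting paths outside it."""
    rel = posixpath.relpath(path, output_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{path!r} is not inside {output_root!r}")
    return rel


def download_url(path: str, output_root: str, base_url: str = BASE_URL) -> str:
    """Build the GET /download/{file_path} URL for a reported output path."""
    rel = relative_output_path(path, output_root)
    # quote() percent-encodes spaces and other unsafe characters but
    # leaves "/" intact, so the directory structure is preserved.
    return f"{base_url}/download/{quote(rel)}"
```

Rejecting paths outside the root on the client mirrors the server-side restriction mentioned in the Notes and fails early instead of producing a request the server would refuse.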

Usage Examples

Python Example

import requests

# Parse a document
with open('document.pdf', 'rb') as f:
    files = {'file': ('document.pdf', f, 'application/pdf')}
    params = {
        'lang': 'ch',
        'backend': 'pipeline',
        'method': 'auto',
        'formula_enable': True,
        'table_enable': True
    }
    
    response = requests.post(
        'http://localhost:8000/api/v1/mineru/parse',
        files=files,
        params=params
    )
    
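    if response.status_code == 200:
        result = response.json()
        print(f"Parsed successfully: {result['file_name']}")

        # Download the markdown file. /download expects a path relative to
        # the Mineru output directory.
        md_path = result['outputs']['markdown']
        download_response = requests.get(
            f'http://localhost:8000/api/v1/mineru/download/{md_path}'
        )
        download_response.raise_for_status()

        # Use a distinct name so the upload handle `f` is not shadowed
        with open('output.md', 'wb') as out:
            out.write(download_response.content)
    else:
        # Error responses carry a detail message explaining the issue
        print(f"Parse failed ({response.status_code}): {response.text}")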

cURL Example

# Parse a document
curl -X POST "http://localhost:8000/api/v1/mineru/parse" \
  -F "file=@document.pdf" \
  -F "lang=ch" \
  -F "backend=pipeline" \
  -F "method=auto"

# Download a processed file
curl -X GET "http://localhost:8000/api/v1/mineru/download/path/to/file.md" \
  -o downloaded_file.md

Backend Options

Pipeline Backend

Use case: General-purpose parsing; the more robust option
Advantages: Handles complex layouts well and supports multiple languages
Parameter: backend=pipeline

VLM Backends

vlm-transformers: General-purpose VLM backend
vlm-sglang-engine: Faster, engine-based backend
vlm-sglang-client: Fastest, client-based backend; requires server_url
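The backend rules above can be checked on the client before a request is sent. This is a minimal sketch, assuming the allowed values listed in the parameter table; `build_parse_options` is a hypothetical helper, and whether the resulting options travel as query parameters or form fields depends on how the server declares them.

```python
from typing import Dict, Optional

# Allowed values as listed in the /parse parameter table
BACKENDS = ("pipeline", "vlm-transformers", "vlm-sglang-engine", "vlm-sglang-client")
METHODS = ("auto", "txt", "ocr")


def build_parse_options(
    backend: str = "pipeline",
    method: str = "auto",
    lang: str = "ch",
    server_url: Optional[str] = None,
) -> Dict[str, str]:
    """Validate backend-related options and return them as a flat dict."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if backend == "vlm-sglang-client" and not server_url:
        raise ValueError("backend 'vlm-sglang-client' requires server_url")

    options = {"backend": backend, "method": method, "lang": lang}
    if server_url:
        options["server_url"] = server_url
    return options
```

For example, `build_parse_options(backend="vlm-sglang-client", server_url="http://127.0.0.1:30000")` returns a dict that includes `server_url`; the address shown is a placeholder, not a documented default.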

Language Support

Supported languages for the pipeline backend:

ch: Chinese (Simplified)
en: English
korean: Korean
japan: Japanese
chinese_cht: Chinese (Traditional)
ta: Tamil
te: Telugu
ka: Kannada

Output Files

The API generates various output files depending on the parameters:

Markdown (.md): Structured text content
Middle JSON (.json): Intermediate parsing results
Model Output (.json or .txt): Raw model predictions
Content List (.json): Structured content list
Original PDF: Copy of the input file
Layout PDF: PDF with layout bounding boxes
Span PDF: PDF with span bounding boxes

Error Handling

The API uses the following HTTP status codes:

200: Success
400: Bad request (invalid parameters, unsupported file type)
404: File not found
500: Internal server error

Error responses include a detail message explaining the issue.
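A client can turn a failed response into a readable message by combining the status code with the detail text. The sketch below assumes the error body is JSON with a top-level `detail` key, which is the usual FastAPI shape but is not spelled out here; `describe_error` is an illustrative name.

```python
import json

# Status codes and meanings as listed in this section
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad request",
    404: "File not found",
    500: "Internal server error",
}


def describe_error(status_code: int, body: str) -> str:
    """Format a status code and optional detail message for logging."""
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    try:
        payload = json.loads(body)
    except ValueError:
        # Body was not JSON (for example an HTML error page from a proxy)
        payload = None
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if detail is None:
        return f"{status_code} {meaning}"
    return f"{status_code} {meaning}: {detail}"
```

With `requests`, this would be called as `describe_error(response.status_code, response.text)` whenever the status code is not 200.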

Testing

Use the provided test script to verify the API:

python test_mineru_api.py

Notes

The API creates unique output directories for each request to avoid conflicts
Temporary files are automatically cleaned up after processing
File downloads are restricted to the processed folder for security
Large files may take time to process depending on the backend and document complexity