Mineru API Documentation

This document describes the FastAPI interface for the Mineru document parsing service.

Overview

The Mineru API provides endpoints for parsing documents (PDFs, images) using advanced OCR and layout analysis. It supports both pipeline and VLM backends for different use cases.

Base URL

http://localhost:8000/api/v1/mineru

Endpoints

1. Health Check

GET /health

Check if the Mineru service is running.

Response:

{
  "status": "healthy",
  "service": "mineru"
}
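A health probe along these lines can gate a batch job before any upload starts. This is a minimal sketch using only the standard library; the helper names `is_healthy` and `check_health` and the 5-second timeout are illustrative choices, not part of the service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1/mineru"


def is_healthy(payload) -> bool:
    """Return True if the payload matches the documented healthy response."""
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "healthy" and payload.get("service") == "mineru"


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Call GET /health; connection errors and malformed bodies count as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        # URLError/HTTPError derive from OSError; JSON decoding errors from ValueError.
        return False
```

Calling `check_health()` returns `False` rather than raising when the service is unreachable, which keeps the caller's retry logic simple.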

2. Parse Document

POST /parse

Upload a document and parse it with the selected backend. The file is sent as multipart form data.

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| file | File | Required | The document file to parse (PDF, PNG, JPEG, JPG) |
| lang | string | "ch" | Language option ('ch', 'en', 'korean', 'japan', etc.) |
| backend | string | "pipeline" | Backend for parsing ('pipeline', 'vlm-transformers', 'vlm-sglang-engine', 'vlm-sglang-client') |
| method | string | "auto" | Method for parsing ('auto', 'txt', 'ocr') |
| server_url | string | null | Server URL for vlm-sglang-client backend |
| start_page_id | int | 0 | Start page ID for parsing |
| end_page_id | int | null | End page ID for parsing |
| formula_enable | boolean | true | Enable formula parsing |
| table_enable | boolean | true | Enable table parsing |
| draw_layout_bbox | boolean | true | Whether to draw layout bounding boxes |
| draw_span_bbox | boolean | true | Whether to draw span bounding boxes |
| dump_md | boolean | true | Whether to dump markdown files |
| dump_middle_json | boolean | true | Whether to dump middle JSON files |
| dump_model_output | boolean | true | Whether to dump model output files |
| dump_orig_pdf | boolean | true | Whether to dump original PDF files |
| dump_content_list | boolean | true | Whether to dump content list files |
| make_md_mode | string | "MM_MD" | The mode for making markdown content |
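Because most parameters have defaults, a client usually only needs to override a few of them. The sketch below merges overrides into the documented defaults, rejects misspelled names, and renders values as strings. The helper name `build_parse_params` is hypothetical, and the lowercase "true"/"false" rendering is an assumption about what the server accepts.

```python
# Defaults copied from the parameter table; None means "do not send".
DEFAULTS = {
    "lang": "ch",
    "backend": "pipeline",
    "method": "auto",
    "server_url": None,
    "start_page_id": 0,
    "end_page_id": None,
    "formula_enable": True,
    "table_enable": True,
    "draw_layout_bbox": True,
    "draw_span_bbox": True,
    "dump_md": True,
    "dump_middle_json": True,
    "dump_model_output": True,
    "dump_orig_pdf": True,
    "dump_content_list": True,
    "make_md_mode": "MM_MD",
}


def build_parse_params(**overrides):
    """Merge overrides into the documented defaults and stringify the values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    fields = {}
    for name, value in merged.items():
        if value is None:
            continue  # leave optional parameters out entirely
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean spelling
        fields[name] = str(value)
    return fields
```

The resulting dictionary can be passed to an HTTP client either as query parameters or as form fields; which one the server expects depends on how the endpoint declares them.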

Response:

{
  "status": "success",
  "file_name": "document_name",
  "outputs": {
    "markdown": "/path/to/document_name.md",
    "middle_json": "/path/to/document_name_middle.json",
    "model_output": "/path/to/document_name_model.json",
    "content_list": "/path/to/document_name_content_list.json",
    "original_pdf": "/path/to/document_name_origin.pdf",
    "layout_pdf": "/path/to/document_name_layout.pdf",
    "span_pdf": "/path/to/document_name_span.pdf"
  },
  "output_directory": "/path/to/output/directory"
}
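A client will typically want to check the status field and then collect the output paths. The following sketch does that for the response shape shown above. The function name `available_outputs` is illustrative, and skipping empty entries assumes that a disabled output is either absent or empty in the response, which the service may handle differently.

```python
def available_outputs(result: dict) -> dict:
    """Return {output name: path} for a successful /parse response."""
    if result.get("status") != "success":
        raise ValueError(f"Parse did not succeed: {result.get('status')!r}")
    outputs = result.get("outputs") or {}
    # Assumption: an output that was not generated is missing or has an empty path.
    return {name: path for name, path in outputs.items() if path}


# Trimmed copy of the documented response, used here only as sample data.
sample = {
    "status": "success",
    "file_name": "document_name",
    "outputs": {
        "markdown": "/path/to/document_name.md",
        "middle_json": "/path/to/document_name_middle.json",
        "span_pdf": None,
    },
    "output_directory": "/path/to/output/directory",
}

for name, path in available_outputs(sample).items():
    print(f"{name}: {path}")
```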

3. Download Processed File

GET /download/{file_path}

Download a processed file from the Mineru output directory.

Parameters:

  • file_path: Path to the file relative to the mineru output directory

Response: File download
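Since file_path becomes part of the URL, characters such as spaces need percent-encoding. This sketch builds the download URL from a relative path; the helper name `download_url` and the client-side rejection of absolute paths and ".." segments are illustrative precautions, not documented server behaviour.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000/api/v1/mineru"


def download_url(file_path: str, base_url: str = BASE_URL) -> str:
    """Build the GET /download/{file_path} URL for a relative output path."""
    parts = file_path.split("/")
    if not file_path or file_path.startswith("/") or ".." in parts:
        raise ValueError("file_path must be a non-empty relative path without '..'")
    # Percent-encode each segment but keep "/" as the separator.
    encoded = "/".join(quote(part, safe="") for part in parts if part)
    return f"{base_url}/download/{encoded}"


print(download_url("job 1/report.md"))
```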

Usage Examples

Python Example

import requests

BASE_URL = 'http://localhost:8000/api/v1/mineru'

# Parse a document
with open('document.pdf', 'rb') as pdf_file:
    files = {'file': ('document.pdf', pdf_file, 'application/pdf')}
    params = {
        'lang': 'ch',
        'backend': 'pipeline',
        'method': 'auto',
        'formula_enable': True,
        'table_enable': True
    }

    response = requests.post(f'{BASE_URL}/parse', files=files, params=params)

# Raise on 4xx/5xx responses instead of silently skipping the download
response.raise_for_status()
result = response.json()
print(f"Parsed successfully: {result['file_name']}")

# Download the markdown file. The download endpoint expects a path relative
# to the Mineru output directory; adjust md_path if the server returns an absolute path.
md_path = result['outputs']['markdown']
download_response = requests.get(f'{BASE_URL}/download/{md_path}')
download_response.raise_for_status()

with open('output.md', 'wb') as out_file:
    out_file.write(download_response.content)

cURL Example

# Parse a document
curl -X POST "http://localhost:8000/api/v1/mineru/parse" \
  -F "file=@document.pdf" \
  -F "lang=ch" \
  -F "backend=pipeline" \
  -F "method=auto"

# Download a processed file
curl -X GET "http://localhost:8000/api/v1/mineru/download/path/to/file.md" \
  -o downloaded_file.md

Backend Options

Pipeline Backend

  • Use case: General-purpose parsing; the default and more robust choice
  • Advantages: Handles complex layouts and supports multiple languages
  • Usage: backend=pipeline

VLM Backends

  • vlm-transformers: General purpose VLM
  • vlm-sglang-engine: Faster engine-based approach
  • vlm-sglang-client: Fastest client-based approach (requires server_url)

Language Support

Supported languages for the pipeline backend:

  • ch: Chinese (Simplified)
  • en: English
  • korean: Korean
  • japan: Japanese
  • chinese_cht: Chinese (Traditional)
  • ta: Tamil
  • te: Telugu
  • ka: Kannada

Output Files

The API generates various output files depending on the parameters:

  1. Markdown (.md): Structured text content
  2. Middle JSON (.json): Intermediate parsing results
  3. Model Output (.json or .txt): Raw model predictions
  4. Content List (.json): Structured content list
  5. Original PDF: Copy of the input file
  6. Layout PDF: PDF with layout bounding boxes
  7. Span PDF: PDF with span bounding boxes
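To predict which entries a response should contain, the request flags can be mapped to output names. The mapping below is inferred from the parameter and output names alone and is an assumption, not confirmed service behaviour; `expected_outputs` is a hypothetical helper.

```python
# Assumed correspondence between request flags and keys in the "outputs" object.
FLAG_TO_OUTPUT = {
    "dump_md": "markdown",
    "dump_middle_json": "middle_json",
    "dump_model_output": "model_output",
    "dump_content_list": "content_list",
    "dump_orig_pdf": "original_pdf",
    "draw_layout_bbox": "layout_pdf",
    "draw_span_bbox": "span_pdf",
}


def expected_outputs(flags: dict) -> list:
    """List the outputs implied by the given flags; unset flags default to True."""
    return [output for flag, output in FLAG_TO_OUTPUT.items() if flags.get(flag, True)]


print(expected_outputs({"draw_span_bbox": False, "dump_orig_pdf": False}))
```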

Error Handling

The API returns the following HTTP status codes:

  • 200: Success
  • 400: Bad request (invalid parameters, unsupported file type)
  • 404: File not found
  • 500: Internal server error

Error responses include a detail message explaining the issue.
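On the client side, the status code and detail message can be combined into one readable error string. This is a sketch assuming the error body is JSON with a "detail" key, which is the shape FastAPI uses for HTTPException responses; the function name `describe_error` is illustrative.

```python
# Meanings taken from the status code list above.
STATUS_MEANINGS = {
    400: "Bad request",
    404: "File not found",
    500: "Internal server error",
}


def describe_error(status_code: int, body) -> str:
    """Format a non-200 response as '<code> <meaning>: <detail>'."""
    label = STATUS_MEANINGS.get(status_code, "Unexpected status")
    # Assumption: the body is a JSON object carrying the message under "detail".
    detail = body.get("detail") if isinstance(body, dict) else None
    if detail:
        return f"{status_code} {label}: {detail}"
    return f"{status_code} {label}"


print(describe_error(400, {"detail": "Unsupported file type"}))
```

With an HTTP client, `body` would be the decoded JSON of the response, or `None` if the body is not valid JSON.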

Testing

Use the provided test script to verify the API:

python test_mineru_api.py

Notes

  • The API creates unique output directories for each request to avoid conflicts
  • Temporary files are automatically cleaned up after processing
  • File downloads are restricted to the processed folder for security
  • Large files may take time to process depending on the backend and document complexity