# Mineru API Documentation

This document describes the FastAPI interface for the Mineru document parsing service.

## Overview

The Mineru API provides endpoints for parsing documents (PDFs, images) using advanced OCR and layout analysis. It supports both pipeline and VLM backends for different use cases.

## Base URL

```
http://localhost:8000/api/v1/mineru
```

## Endpoints

### 1. Health Check

**GET** `/health`

Check if the Mineru service is running.

**Response:**

```json
{
  "status": "healthy",
  "service": "mineru"
}
```

### 2. Parse Document

**POST** `/parse`

Parse a document using Mineru's advanced parsing capabilities.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `file` | File | Required | The document file to parse (PDF, PNG, JPEG, JPG) |
| `lang` | string | "ch" | Language option ('ch', 'en', 'korean', 'japan', etc.) |
| `backend` | string | "pipeline" | Backend for parsing ('pipeline', 'vlm-transformers', 'vlm-sglang-engine', 'vlm-sglang-client') |
| `method` | string | "auto" | Method for parsing ('auto', 'txt', 'ocr') |
| `server_url` | string | null | Server URL for vlm-sglang-client backend |
| `start_page_id` | int | 0 | Start page ID for parsing |
| `end_page_id` | int | null | End page ID for parsing |
| `formula_enable` | boolean | true | Enable formula parsing |
| `table_enable` | boolean | true | Enable table parsing |
| `draw_layout_bbox` | boolean | true | Whether to draw layout bounding boxes |
| `draw_span_bbox` | boolean | true | Whether to draw span bounding boxes |
| `dump_md` | boolean | true | Whether to dump markdown files |
| `dump_middle_json` | boolean | true | Whether to dump middle JSON files |
| `dump_model_output` | boolean | true | Whether to dump model output files |
| `dump_orig_pdf` | boolean | true | Whether to dump original PDF files |
| `dump_content_list` | boolean | true | Whether to dump content list files |
| `make_md_mode` | string | "MM_MD" | The mode for making markdown content |

**Response:**

```json
{
  "status": "success",
  "file_name": "document_name",
  "outputs": {
    "markdown": "/path/to/document_name.md",
    "middle_json": "/path/to/document_name_middle.json",
    "model_output": "/path/to/document_name_model.json",
    "content_list": "/path/to/document_name_content_list.json",
    "original_pdf": "/path/to/document_name_origin.pdf",
    "layout_pdf": "/path/to/document_name_layout.pdf",
    "span_pdf": "/path/to/document_name_span.pdf"
  },
  "output_directory": "/path/to/output/directory"
}
```

### 3. Download Processed File

**GET** `/download/{file_path}`

Download a processed file from the Mineru output directory.

**Parameters:**

- `file_path`: Path to the file relative to the mineru output directory

**Response:** File download

## Usage Examples

### Python Example

```python
import requests

# Parse a document
with open('document.pdf', 'rb') as f:
    files = {'file': ('document.pdf', f, 'application/pdf')}
    params = {
        'lang': 'ch',
        'backend': 'pipeline',
        'method': 'auto',
        'formula_enable': True,
        'table_enable': True
    }
    response = requests.post(
        'http://localhost:8000/api/v1/mineru/parse',
        files=files,
        params=params
    )

if response.status_code == 200:
    result = response.json()
    print(f"Parsed successfully: {result['file_name']}")

    # Download the markdown file
    md_path = result['outputs']['markdown']
    download_response = requests.get(
        f'http://localhost:8000/api/v1/mineru/download/{md_path}'
    )
    with open('output.md', 'wb') as f:
        f.write(download_response.content)
```

### cURL Example

```bash
# Parse a document
curl -X POST "http://localhost:8000/api/v1/mineru/parse" \
  -F "file=@document.pdf" \
  -F "lang=ch" \
  -F "backend=pipeline" \
  -F "method=auto"

# Download a processed file
curl -X GET "http://localhost:8000/api/v1/mineru/download/path/to/file.md" \
  -o downloaded_file.md
```

## Backend Options

### Pipeline Backend

- **Use case**: General purpose, more robust
- **Advantages**: Better for complex layouts, supports multiple languages
- **Command**: `backend=pipeline`

### VLM Backends

- **vlm-transformers**: General purpose VLM
- **vlm-sglang-engine**: Faster engine-based approach
- **vlm-sglang-client**: Fastest client-based approach (requires server_url)

## Language Support

Supported languages for the pipeline backend:

- `ch`: Chinese (Simplified)
- `en`: English
- `korean`: Korean
- `japan`: Japanese
- `chinese_cht`: Chinese (Traditional)
- `ta`: Tamil
- `te`: Telugu
- `ka`: Kannada

## Output Files

The API generates various output files depending on the parameters:

1. **Markdown** (`.md`): Structured text content
2. **Middle JSON** (`.json`): Intermediate parsing results
3. **Model Output** (`.json` or `.txt`): Raw model predictions
4. **Content List** (`.json`): Structured content list
5. **Original PDF**: Copy of the input file
6. **Layout PDF**: PDF with layout bounding boxes
7. **Span PDF**: PDF with span bounding boxes

## Error Handling

The API returns appropriate HTTP status codes:

- `200`: Success
- `400`: Bad request (invalid parameters, unsupported file type)
- `404`: File not found
- `500`: Internal server error

Error responses include a detail message explaining the issue.

## Testing

Use the provided test script to verify the API:

```bash
python test_mineru_api.py
```

## Notes

- The API creates unique output directories for each request to avoid conflicts
- Temporary files are automatically cleaned up after processing
- File downloads are restricted to the processed folder for security
- Large files may take time to process depending on the backend and document complexity
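
### Client-Side Validation Sketch

The backend options and status codes described in this document can be checked on the client before and after a request. The following is a minimal sketch, not part of the service: the helper names `build_parse_params` and `describe_error` are hypothetical, the allowed values are copied from the parameter table, and the error body is assumed to be JSON with a `detail` field (FastAPI's default shape), which the service may or may not follow.

```python
import json

# Allowed values copied from the /parse parameter table in this document.
BACKENDS = {"pipeline", "vlm-transformers", "vlm-sglang-engine", "vlm-sglang-client"}
METHODS = {"auto", "txt", "ocr"}

# Status codes listed in the Error Handling section.
STATUS_LABELS = {
    400: "bad request",
    404: "file not found",
    500: "internal server error",
}


def build_parse_params(lang="ch", backend="pipeline", method="auto", server_url=None):
    """Validate a subset of /parse parameters and return them as a dict.

    Hypothetical client-side helper; the service may validate differently.
    """
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    # The document states that vlm-sglang-client requires server_url.
    if backend == "vlm-sglang-client" and not server_url:
        raise ValueError("server_url is required for the vlm-sglang-client backend")

    params = {"lang": lang, "backend": backend, "method": method}
    # Omit server_url when unset so the server applies its own default (null).
    if server_url:
        params["server_url"] = server_url
    return params


def describe_error(status_code, body_text):
    """Format an error response as a one-line message.

    Assumes a JSON body with a "detail" field; falls back to the raw text.
    """
    label = STATUS_LABELS.get(status_code, "unexpected status")
    try:
        detail = json.loads(body_text).get("detail", body_text)
    except (ValueError, TypeError, AttributeError):
        detail = body_text
    return f"{status_code} ({label}): {detail}"
```

The dict returned by `build_parse_params` can be passed to `requests.post` in place of the hand-written `params` dict in the Python example, and `describe_error(response.status_code, response.text)` can be called whenever the status code is not `200`.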