# Mineru API Documentation

This document describes the FastAPI interface for the Mineru document parsing service.

## Overview

The Mineru API provides endpoints for parsing documents (PDFs, images) using advanced OCR and layout analysis. It supports a pipeline backend and several VLM (vision-language model) backends for different use cases.

## Base URL

```
http://localhost:8000/api/v1/mineru
```

## Endpoints

### 1. Health Check

**GET** `/health`

Check if the Mineru service is running.

**Response:**

```json
{
  "status": "healthy",
  "service": "mineru"
}
```
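A client can poll this endpoint before sending parse requests, for example right after starting the service. The sketch below is a minimal example that uses only the standard library and assumes the local base URL above; `is_healthy` and `wait_until_healthy` are hypothetical helper names, and the attempt count, delay, and timeout are arbitrary choices.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/api/v1/mineru"


def is_healthy(payload: dict) -> bool:
    """Return True if a /health body matches the documented healthy response."""
    return payload.get("status") == "healthy" and payload.get("service") == "mineru"


def wait_until_healthy(base_url: str = BASE_URL, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll GET /health until the service reports healthy or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
                if is_healthy(json.load(response)):
                    return True
        except (OSError, ValueError):
            # Connection errors, HTTP errors, timeouts, or a non-JSON body: retry later.
            pass
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` returns `True` once the service answers with the documented body, or `False` after roughly `attempts * delay` seconds without a healthy reply.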
### 2. Parse Document

**POST** `/parse`

Parse a document using Mineru's advanced parsing capabilities.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `file` | File | Required | The document file to parse (PDF, PNG, JPEG, JPG) |
| `lang` | string | `"ch"` | Language option (`ch`, `en`, `korean`, `japan`, etc.) |
| `backend` | string | `"pipeline"` | Parsing backend (`pipeline`, `vlm-transformers`, `vlm-sglang-engine`, `vlm-sglang-client`) |
| `method` | string | `"auto"` | Parsing method (`auto`, `txt`, `ocr`) |
| `server_url` | string | `null` | Server URL; required for the `vlm-sglang-client` backend |
| `start_page_id` | int | `0` | Start page ID for parsing |
| `end_page_id` | int | `null` | End page ID for parsing |
| `formula_enable` | boolean | `true` | Enable formula parsing |
| `table_enable` | boolean | `true` | Enable table parsing |
| `draw_layout_bbox` | boolean | `true` | Whether to draw layout bounding boxes |
| `draw_span_bbox` | boolean | `true` | Whether to draw span bounding boxes |
| `dump_md` | boolean | `true` | Whether to dump Markdown files |
| `dump_middle_json` | boolean | `true` | Whether to dump middle JSON files |
| `dump_model_output` | boolean | `true` | Whether to dump model output files |
| `dump_orig_pdf` | boolean | `true` | Whether to dump the original PDF file |
| `dump_content_list` | boolean | `true` | Whether to dump content list files |
| `make_md_mode` | string | `"MM_MD"` | Mode used to generate the Markdown content |

**Response:**

```json
{
  "status": "success",
  "file_name": "document_name",
  "outputs": {
    "markdown": "/path/to/document_name.md",
    "middle_json": "/path/to/document_name_middle.json",
    "model_output": "/path/to/document_name_model.json",
    "content_list": "/path/to/document_name_content_list.json",
    "original_pdf": "/path/to/document_name_origin.pdf",
    "layout_pdf": "/path/to/document_name_layout.pdf",
    "span_pdf": "/path/to/document_name_span.pdf"
  },
  "output_directory": "/path/to/output/directory"
}
```
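On the client side, the response above can be loaded into a small typed structure. This is a sketch, not part of the Mineru API: `ParseResult` and `parse_result` are hypothetical names. Entries in `outputs` are treated as optional because which files are produced depends on the request parameters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ParseResult:
    """Client-side view of a successful POST /parse response."""

    file_name: str
    output_directory: str
    outputs: dict[str, str] = field(default_factory=dict)

    def path_for(self, kind: str) -> str | None:
        """Return the path for an output kind such as "markdown", or None if absent."""
        return self.outputs.get(kind)


def parse_result(body: dict) -> ParseResult:
    """Check the documented response shape and wrap it in a ParseResult."""
    if body.get("status") != "success":
        raise ValueError(f"parse did not succeed: status={body.get('status')!r}")
    return ParseResult(
        file_name=body["file_name"],
        output_directory=body["output_directory"],
        # Keep only non-empty paths; the set of files depends on the request flags.
        outputs={kind: path for kind, path in body.get("outputs", {}).items() if path},
    )
```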
### 3. Download Processed File

**GET** `/download/{file_path}`

Download a processed file from the Mineru output directory.

**Parameters:**

- `file_path`: Path to the file, relative to the Mineru output directory

**Response:** The requested file, returned as a download
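
Because `file_path` is embedded in the URL path, a client should keep it relative and percent-encode it. The helper below is a hypothetical client-side sketch using only the standard library. Rejecting absolute paths and `..` segments mirrors the documented restriction of downloads to the output directory, but it is only a convenience check; the server remains responsible for enforcing that restriction.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

BASE_URL = "http://localhost:8000/api/v1/mineru"


def build_download_url(file_path: str, base_url: str = BASE_URL) -> str:
    """Build a GET /download URL for a path relative to the Mineru output directory."""
    path = PurePosixPath(file_path)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"file_path must be relative and inside the output directory: {file_path!r}")
    # Percent-encode unsafe characters (e.g. spaces) but keep "/" as the separator.
    return f"{base_url}/download/{quote(path.as_posix(), safe='/')}"
```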
## Usage Examples

### Python Example

```python
import requests

BASE_URL = 'http://localhost:8000/api/v1/mineru'

# Parse a document
with open('document.pdf', 'rb') as f:
    files = {'file': ('document.pdf', f, 'application/pdf')}
    params = {
        'lang': 'ch',
        'backend': 'pipeline',
        'method': 'auto',
        'formula_enable': True,
        'table_enable': True,
    }

    response = requests.post(f'{BASE_URL}/parse', files=files, params=params)

if response.status_code == 200:
    result = response.json()
    print(f"Parsed successfully: {result['file_name']}")

    # Download the Markdown file.
    # /download expects a path relative to the Mineru output directory,
    # so convert md_path first if the API returns absolute paths.
    md_path = result['outputs']['markdown']
    download_response = requests.get(f'{BASE_URL}/download/{md_path}')
    download_response.raise_for_status()

    with open('output.md', 'wb') as out:
        out.write(download_response.content)
else:
    print(f"Parse failed ({response.status_code}): {response.text}")
```
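The example above sends only a few parameters and relies on the server defaults for the rest. The sketch below makes those defaults explicit by merging overrides into the values from the parameter table. `build_parse_params` is a hypothetical helper; encoding booleans as `"true"`/`"false"` strings and omitting `null` defaults are choices made here, not behavior specified by this document.

```python
from __future__ import annotations

# Defaults copied from the /parse parameter table.
PARSE_DEFAULTS = {
    "lang": "ch",
    "backend": "pipeline",
    "method": "auto",
    "server_url": None,
    "start_page_id": 0,
    "end_page_id": None,
    "formula_enable": True,
    "table_enable": True,
    "draw_layout_bbox": True,
    "draw_span_bbox": True,
    "dump_md": True,
    "dump_middle_json": True,
    "dump_model_output": True,
    "dump_orig_pdf": True,
    "dump_content_list": True,
    "make_md_mode": "MM_MD",
}


def build_parse_params(**overrides) -> dict[str, str]:
    """Merge overrides into the documented defaults and encode every value as a string."""
    unknown = set(overrides) - set(PARSE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {}
    for key, value in {**PARSE_DEFAULTS, **overrides}.items():
        if value is None:
            continue  # Omit null-default parameters such as server_url.
        if isinstance(value, bool):  # Check bool before str(): True is also an int.
            params[key] = "true" if value else "false"
        else:
            params[key] = str(value)
    return params
```

The resulting dict can be passed to `requests.post` as `params=` (query string) or `data=` (form fields), whichever the endpoint declares.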
### cURL Example

```bash
# Parse a document
curl -X POST "http://localhost:8000/api/v1/mineru/parse" \
  -F "file=@document.pdf" \
  -F "lang=ch" \
  -F "backend=pipeline" \
  -F "method=auto"

# Download a processed file
curl -X GET "http://localhost:8000/api/v1/mineru/download/path/to/file.md" \
  -o downloaded_file.md
```

## Backend Options

### Pipeline Backend

- **Use case**: General-purpose parsing; more robust than the VLM backends
- **Advantages**: Better for complex layouts; supports multiple languages
- **Parameter**: `backend=pipeline`

### VLM Backends

- **vlm-transformers**: General-purpose VLM backend
- **vlm-sglang-engine**: Faster, engine-based approach
- **vlm-sglang-client**: Fastest, client-based approach (requires `server_url`)
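
Since `vlm-sglang-client` is the only backend that needs `server_url`, a client can catch a missing URL before uploading the file. A minimal sketch, assuming the backend and method names listed in the parameter table; `check_backend_options` is a hypothetical helper.

```python
from __future__ import annotations

BACKENDS = {"pipeline", "vlm-transformers", "vlm-sglang-engine", "vlm-sglang-client"}
METHODS = {"auto", "txt", "ocr"}


def check_backend_options(backend: str, method: str = "auto", server_url: str | None = None) -> None:
    """Raise ValueError if the backend, method, or server_url combination is invalid."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(BACKENDS)}")
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(METHODS)}")
    if backend == "vlm-sglang-client" and not server_url:
        raise ValueError("backend 'vlm-sglang-client' requires server_url")
```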
## Language Support

Supported languages for the pipeline backend:

- `ch`: Chinese (Simplified)
- `en`: English
- `korean`: Korean
- `japan`: Japanese
- `chinese_cht`: Chinese (Traditional)
- `ta`: Tamil
- `te`: Telugu
- `ka`: Kannada

## Output Files

The API generates various output files depending on the parameters:

1. **Markdown** (`.md`): Structured text content
2. **Middle JSON** (`.json`): Intermediate parsing results
3. **Model Output** (`.json` or `.txt`): Raw model predictions
4. **Content List** (`.json`): Structured content list
5. **Original PDF**: Copy of the input file
6. **Layout PDF**: PDF with layout bounding boxes
7. **Span PDF**: PDF with span bounding boxes
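
The `outputs` keys in the `/parse` response appear to line up with the `dump_*` and `draw_*` flags. The mapping below is an assumption inferred from the parameter names and the example response, not a documented contract; `expected_output_keys` is a hypothetical helper that predicts which entries a client should look for.

```python
from __future__ import annotations

# Assumed correspondence between request flags and `outputs` keys in the response.
FLAG_TO_OUTPUT = {
    "dump_md": "markdown",
    "dump_middle_json": "middle_json",
    "dump_model_output": "model_output",
    "dump_content_list": "content_list",
    "dump_orig_pdf": "original_pdf",
    "draw_layout_bbox": "layout_pdf",
    "draw_span_bbox": "span_pdf",
}


def expected_output_keys(**flags: bool) -> set[str]:
    """Return the output keys expected for the given flags (unspecified flags default to True)."""
    unknown = set(flags) - set(FLAG_TO_OUTPUT)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return {key for flag, key in FLAG_TO_OUTPUT.items() if flags.get(flag, True)}
```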
## Error Handling

The API returns the following HTTP status codes:

- `200`: Success
- `400`: Bad request (invalid parameters, unsupported file type)
- `404`: File not found
- `500`: Internal server error

Error responses include a `detail` message explaining the issue.
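
FastAPI services usually return errors as a JSON body with a `detail` field. The sketch below turns a status code and a decoded body into a readable exception. `MineruAPIError` and `raise_for_mineru_error` are hypothetical names, and handling `detail` as either a string or a list of objects with a `msg` field (the shape FastAPI uses for request validation errors) is an assumption about this service.

```python
class MineruAPIError(Exception):
    """Raised for non-200 responses from the Mineru API."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(f"HTTP {status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def raise_for_mineru_error(status_code: int, body: object) -> None:
    """Do nothing for 200; otherwise raise MineruAPIError carrying the error detail."""
    if status_code == 200:
        return
    detail = body.get("detail", body) if isinstance(body, dict) else body
    if isinstance(detail, list):
        # Validation errors may carry a list of items; join their "msg" fields.
        detail = "; ".join(
            str(item.get("msg", item)) if isinstance(item, dict) else str(item)
            for item in detail
        )
    raise MineruAPIError(status_code, str(detail))
```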
## Testing

Use the provided test script to verify the API:

```bash
python test_mineru_api.py
```

## Notes

- The API creates a unique output directory for each request to avoid conflicts.
- Temporary files are automatically cleaned up after processing.
- For security, file downloads are restricted to the processed output folder.
- Large files may take time to process, depending on the backend and document complexity.