# MagicDoc API Service
A FastAPI service that converts documents to Markdown using the Magic-Doc library. Its API is designed to be compatible with the existing Mineru API interface.
## Features
- Converts DOC, DOCX, PPT, PPTX, and PDF files to Markdown
- RESTful API interface compatible with Mineru API
- Docker containerization with LibreOffice dependencies
- Health check endpoint
- File upload support
## API Endpoints
### Health Check
```
GET /health
```
Returns service health status.
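A readiness check against this endpoint could be sketched as follows. The sketch assumes the service is reachable at `http://localhost:8002` (the docker-compose port given in this README) and treats any HTTP 200 as healthy, because the response body format is not documented here. The helper names `health_url` and `is_healthy` are illustrative and not part of the service.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Join the base URL and the /health path without doubling slashes."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str = "http://localhost:8002", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False
```

Calling `is_healthy()` before submitting documents avoids sending uploads to a container that is still starting.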
### File Parse
```
POST /file_parse
```
Converts an uploaded document to Markdown.
**Parameters:**
- `files`: File upload (required)
- `output_dir`: Output directory (default: "./output")
- `lang_list`: Language list (default: "ch")
- `backend`: Backend type (default: "pipeline")
- `parse_method`: Parse method (default: "auto")
- `formula_enable`: Enable formula processing (default: true)
- `table_enable`: Enable table processing (default: true)
- `return_md`: Return markdown (default: true)
- `return_middle_json`: Return middle JSON (default: false)
- `return_model_output`: Return model output (default: false)
- `return_content_list`: Return content list (default: false)
- `return_images`: Return images (default: false)
- `start_page_id`: Start page ID (default: 0)
- `end_page_id`: End page ID (default: 99999)
**Response:**
```json
{
  "markdown": "converted markdown content",
  "md": "converted markdown content",
  "content": "converted markdown content",
  "text": "converted markdown content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
```
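The request and response above can be exercised from Python with the standard library alone. This is a minimal sketch under two assumptions: the optional parameters are sent as multipart form fields alongside the `files` upload (the README does not say whether they are form or query parameters), and the service is reachable at `http://localhost:8002`. The helpers `encode_multipart` and `parse_file` are hypothetical names introduced for illustration.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple


def encode_multipart(filename: str, data: bytes,
                     fields: Optional[Dict[str, str]] = None) -> Tuple[bytes, str]:
    """Build a multipart/form-data body with one `files` part plus text fields."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Plain text fields such as return_md or parse_method (assumed to be form fields).
    for name, value in (fields or {}).items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(header.encode("utf-8"))
    # The uploaded document goes in the `files` field named in the parameter list.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def parse_file(path: str, base_url: str = "http://localhost:8002",
               fields: Optional[Dict[str, str]] = None) -> dict:
    """POST a document to /file_parse and return the decoded JSON response."""
    source = Path(path)
    body, content_type = encode_multipart(source.name, source.read_bytes(), fields)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/file_parse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

With the service running, `parse_file("document.docx", fields={"return_md": "true"})["markdown"]` would return the converted text. In the sample response the `markdown`, `md`, `content`, and `text` keys all carry the same string, so reading any one of them should be sufficient.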
## Running with Docker
### Build and run with docker-compose
```bash
cd magicdoc
docker-compose up --build
```
The service will be available at `http://localhost:8002`.
### Build and run with Docker
```bash
cd magicdoc
docker build -t magicdoc-api .
docker run -p 8002:8000 magicdoc-api
```
## Integration with Document Processors
This service is designed as a drop-in replacement for the Mineru API in the existing document processors. To switch over, update the configuration in your document processors:
```python
# In docx_processor.py or pdf_processor.py
self.magicdoc_base_url = getattr(settings, 'MAGICDOC_API_URL', 'http://magicdoc-api:8000')
```
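To show how that setting resolves in practice, here is a small sketch of a processor that reads the base URL and derives the parse endpoint from it. The class `ProcessorSketch` and the `file_parse_url` property are hypothetical; only the `MAGICDOC_API_URL` setting name and its default come from the snippet above.

```python
from types import SimpleNamespace


class ProcessorSketch:
    """Illustrative stand-in for docx_processor.py / pdf_processor.py."""

    def __init__(self, settings):
        # Fall back to the in-network address when the setting is absent.
        self.magicdoc_base_url = getattr(
            settings, "MAGICDOC_API_URL", "http://magicdoc-api:8000"
        )

    @property
    def file_parse_url(self) -> str:
        """Full URL of the POST /file_parse endpoint."""
        return self.magicdoc_base_url.rstrip("/") + "/file_parse"


# No setting defined: the default container address is used.
inside_network = ProcessorSketch(SimpleNamespace())

# Setting overridden, for example when calling from the Docker host.
from_host = ProcessorSketch(SimpleNamespace(MAGICDOC_API_URL="http://localhost:8002/"))
```

The default targets port 8000 because that is the port the container listens on. Port 8002 applies only when calling from the host through the `8002:8000` mapping shown in the Docker section.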
## Dependencies
- Python 3.10
- LibreOffice (installed in Docker container)
- Magic-Doc library
- FastAPI
- Uvicorn
## Storage
The service creates the following directories:
- `storage/uploads/`: For uploaded files
- `storage/processed/`: For processed files
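How the service creates these directories is not shown in this README. One way to do it at startup is sketched below, assuming the paths are relative to the working directory. The function name `ensure_storage_dirs` is illustrative.

```python
from pathlib import Path
from typing import Dict, Union


def ensure_storage_dirs(root: Union[str, Path] = "storage") -> Dict[str, Path]:
    """Create the uploads/ and processed/ directories if they are missing."""
    base = Path(root)
    dirs = {name: base / name for name in ("uploads", "processed")}
    for directory in dirs.values():
        # exist_ok makes the call safe to repeat on every startup.
        directory.mkdir(parents=True, exist_ok=True)
    return dirs
```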