# MagicDoc API Service

A FastAPI service that provides document-to-Markdown conversion using the Magic-Doc library. It is designed to be compatible with the existing Mineru API interface.

## Features

- Converts DOC, DOCX, PPT, PPTX, and PDF files to markdown
- RESTful API interface compatible with Mineru API
- Docker containerization with LibreOffice dependencies
- Health check endpoint
- File upload support

## API Endpoints

### Health Check
```
GET /health
```
Returns service health status.
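As an illustration, a client could poll this endpoint until the container is ready. The sketch below assumes that an HTTP 200 response means "healthy" (the response body is not documented here) and uses hypothetical helper names `check_health` and `wait_until_healthy`.

```python
import time
import urllib.request


def check_health(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed meaning: healthy)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(base_url, attempts=10, delay=1.0, check=check_health):
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # back off before the next probe
    return False
```

Passing `check` as a parameter keeps the retry loop independent of the network call, so it can be exercised with a stub.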

### File Parse
```
POST /file_parse
```
Converts an uploaded document to Markdown.

**Parameters:**
- `files`: File upload (required)
- `output_dir`: Output directory (default: "./output")
- `lang_list`: Language list (default: "ch")
- `backend`: Backend type (default: "pipeline")
- `parse_method`: Parse method (default: "auto")
- `formula_enable`: Enable formula processing (default: true)
- `table_enable`: Enable table processing (default: true)
- `return_md`: Return markdown (default: true)
- `return_middle_json`: Return middle JSON (default: false)
- `return_model_output`: Return model output (default: false)
- `return_content_list`: Return content list (default: false)
- `return_images`: Return images (default: false)
- `start_page_id`: Start page ID (default: 0)
- `end_page_id`: End page ID (default: 99999)
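A request with these parameters can be sketched using only the Python standard library. This is a minimal sketch, not the service's own client: it assumes the parameters are read as multipart form fields and that booleans are accepted as `true`/`false` strings, neither of which this README confirms. The helper names `build_multipart` and `post_file_parse` are hypothetical, and the default base URL is the docker-compose address given in this README.

```python
import json
import os
import urllib.request
import uuid

# Defaults copied from the parameter list; sent as form fields (assumption).
DEFAULT_FIELDS = {
    "output_dir": "./output",
    "lang_list": "ch",
    "backend": "pipeline",
    "parse_method": "auto",
    "formula_enable": "true",
    "table_enable": "true",
    "return_md": "true",
    "return_middle_json": "false",
    "return_model_output": "false",
    "return_content_list": "false",
    "return_images": "false",
    "start_page_id": "0",
    "end_page_id": "99999",
}


def build_multipart(filename, data, overrides=None, boundary=None):
    """Encode one file plus the form fields; return (body_bytes, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    fields = {**DEFAULT_FIELDS, **(overrides or {})}
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The uploaded document goes in the required `files` field.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + data + b'\r\n'
    )
    parts.append(f'--{boundary}--\r\n'.encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_file_parse(path, base_url="http://localhost:8002", overrides=None):
    """Upload one file to /file_parse and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(os.path.basename(path), fh.read(), overrides)
    request = urllib.request.Request(
        f"{base_url}/file_parse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, `post_file_parse("document.docx", overrides={"return_images": "true"})` would upload the file with every default except `return_images`.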

**Response:**
```json
{
  "markdown": "converted markdown content",
  "md": "converted markdown content",
  "content": "converted markdown content",
  "text": "converted markdown content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
```
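The sample response carries the same converted text under four keys. A client could read it as in the sketch below, which takes the first non-empty one. It assumes that any `status` other than `"success"` signals a failure (only the success case is documented here); `extract_markdown` is a hypothetical helper name.

```python
# Keys that carry the converted text in the sample response, in lookup order.
MARKDOWN_KEYS = ("markdown", "md", "content", "text")


def extract_markdown(payload):
    """Return the converted markdown from a /file_parse response dict."""
    status = payload.get("status")
    if status != "success":
        # Assumption: anything other than "success" is treated as an error.
        raise ValueError(f"conversion failed: status={status!r}")
    for key in MARKDOWN_KEYS:
        value = payload.get(key)
        if value:
            return value
    raise KeyError("no markdown field in response")
```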

## Running with Docker

### Build and run with docker-compose
```bash
cd magicdoc
docker-compose up --build
```

The service will be available at `http://localhost:8002`.

### Build and run with Docker
```bash
cd magicdoc
docker build -t magicdoc-api .
docker run -p 8002:8000 magicdoc-api
```

The `-p 8002:8000` option publishes container port 8000 on host port 8002, so the service is again reachable at `http://localhost:8002`.

## Integration with Document Processors

This service is designed to be compatible with the existing document processors. To use it instead of the Mineru API, update the configuration in your document processors:

```python
# In docx_processor.py or pdf_processor.py
self.magicdoc_base_url = getattr(settings, 'MAGICDOC_API_URL', 'http://magicdoc-api:8000')
```
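The `getattr` fallback above can be illustrated on its own. In this sketch, `SimpleNamespace` stands in for the real settings object, and `resolve_magicdoc_url` and `file_parse_url` are hypothetical helpers, not part of the existing processors.

```python
from types import SimpleNamespace

# Default taken from the snippet above: the service name and container port.
DEFAULT_MAGICDOC_URL = "http://magicdoc-api:8000"


def resolve_magicdoc_url(settings):
    """Use settings.MAGICDOC_API_URL when it is defined, else the default."""
    return getattr(settings, "MAGICDOC_API_URL", DEFAULT_MAGICDOC_URL).rstrip("/")


def file_parse_url(settings):
    """Build the full /file_parse URL for the resolved base URL."""
    return f"{resolve_magicdoc_url(settings)}/file_parse"


# No override: falls back to the default base URL.
print(file_parse_url(SimpleNamespace()))
# Override, e.g. when calling the published host port from outside Docker.
print(file_parse_url(SimpleNamespace(MAGICDOC_API_URL="http://localhost:8002/")))
```

Stripping the trailing slash keeps the joined URL well-formed whichever way the setting is written.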

## Dependencies

- Python 3.10
- LibreOffice (installed in Docker container)
- Magic-Doc library
- FastAPI
- Uvicorn

## Storage

The service creates the following directories:
- `storage/uploads/`: For uploaded files
- `storage/processed/`: For processed files
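The service creates these directories itself. To pre-create the same layout (for example, before mounting it as a volume), something like the following would work. It assumes the paths are relative to the working directory; `ensure_storage_dirs` is a hypothetical helper, not the service's own code.

```python
from pathlib import Path

# Subdirectories named in the list above.
STORAGE_SUBDIRS = ("uploads", "processed")


def ensure_storage_dirs(base="storage"):
    """Create base/uploads and base/processed if missing; return their paths."""
    base_path = Path(base)
    created = {}
    for name in STORAGE_SUBDIRS:
        path = base_path / name
        # parents=True creates `base` too; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```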