# MagicDoc Service Setup Guide

This guide explains how to set up and use the MagicDoc API service as an alternative to the Mineru API for document processing.

## Overview

The MagicDoc service provides a FastAPI-based REST API that converts various document formats (DOC, DOCX, PPT, PPTX, PDF) to Markdown using the Magic-Doc library. It's designed to be compatible with your existing document processors.

## Quick Start

### 1. Build and Run the Service

```bash
cd magicdoc
./start.sh
```

Or manually:
```bash
cd magicdoc
docker-compose up --build -d
```

### 2. Verify the Service

```bash
# Check health
curl http://localhost:8002/health

# View API documentation (`open` is macOS-only; on Linux use `xdg-open`, or open the URL in a browser)
open http://localhost:8002/docs
```

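
Because the container can take a moment to come up after `docker-compose up`, a script may need to wait for the health endpoint before sending documents. The following is a minimal sketch, not part of the service itself: `wait_for_health` is a hypothetical helper, and it assumes that `/health` answers with HTTP 200 once the service is ready.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url, timeout=60.0, interval=2.0, opener=urllib.request.urlopen):
    """Poll GET {base_url}/health until it returns HTTP 200 or the timeout expires.

    Assumption: a healthy service answers /health with status 200.
    `opener` is injectable so the function can be exercised without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(f"{base_url}/health", timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet (or non-2xx answer); retry below
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_health("http://localhost:8002")` from the host returns `True` as soon as the endpoint responds and `False` if it never does within the timeout.
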
### 3. Test with Sample Files

```bash
cd magicdoc
python test_api.py
```

## API Compatibility

The MagicDoc API is designed to be compatible with your existing Mineru API interface:

### Endpoint: `POST /file_parse`

**Request Format:**
- File upload via multipart form data
- Same parameters as Mineru API (most are optional)

**Response Format:**
```json
{
  "markdown": "converted content",
  "md": "converted content",
  "content": "converted content",
  "text": "converted content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
```

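
Since the response carries the converted text under several keys, a client can read whichever one is present. The sketch below illustrates one way to do that; `extract_markdown` is a hypothetical helper, and treating any `status` other than `"success"` as a failure is an assumption, since only the success case is documented here.

```python
def extract_markdown(payload):
    """Return the converted text from a /file_parse response dict.

    Assumption: any status other than "success" means the conversion failed.
    """
    if payload.get("status") != "success":
        raise ValueError(f"conversion failed: status={payload.get('status')!r}")
    # Try the documented keys in order and return the first string found.
    for key in ("markdown", "md", "content", "text"):
        value = payload.get(key)
        if isinstance(value, str):
            return value
    raise KeyError("response contains none of: markdown, md, content, text")
```

A processor that previously read one of these keys from a Mineru response can call this helper instead of hard-coding the key name.
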
## Integration with Existing Processors

To use MagicDoc instead of Mineru in your existing processors:

### 1. Update Configuration

Add to your settings:
```python
# Use the service name from inside the Compose network;
# from the host, use http://localhost:8002 instead.
MAGICDOC_API_URL = "http://magicdoc-api:8000"
MAGICDOC_TIMEOUT = 300
```

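
If the same code has to run both inside the Compose network and on the host, the two settings can be read from the environment instead of being hard-coded. This is only a sketch: `load_magicdoc_settings` is a hypothetical helper, and interpreting `MAGICDOC_TIMEOUT` as a number of seconds is an assumption.

```python
import os


def load_magicdoc_settings(env=None):
    """Read MagicDoc settings from environment variables, with local defaults."""
    env = os.environ if env is None else env
    return {
        # Strip a trailing slash so paths can be appended as f"{api_url}/file_parse".
        "api_url": env.get("MAGICDOC_API_URL", "http://localhost:8002").rstrip("/"),
        # Assumed to be seconds; the value 300 comes from the settings above.
        "timeout": int(env.get("MAGICDOC_TIMEOUT", "300")),
    }
```

Inside a container, setting `MAGICDOC_API_URL=http://magicdoc-api:8000` switches the client to the service name without code changes.
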
### 2. Modify Processors

Replace Mineru API calls with MagicDoc API calls. See `integration_example.py` for detailed examples.

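
For orientation, the sketch below shows what such a call could look like using only the Python standard library. It is not taken from `integration_example.py`: `build_multipart` and `convert_with_magicdoc` are hypothetical names, and the form field name `file` is an assumption that should be checked against `app/main.py`.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, data):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_with_magicdoc(path, api_url="http://localhost:8002", timeout=300):
    """POST a document to /file_parse and return the parsed JSON response."""
    path = Path(path)
    # "file" is an assumed form field name, not confirmed by this guide.
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{api_url}/file_parse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `convert_with_magicdoc("document.docx")` would return a dictionary in the response format described under API Compatibility. File names containing double quotes are not escaped in this sketch.
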
### 3. Update Docker Compose

Add the MagicDoc service to your main `docker-compose.yml`:
```yaml
services:
  magicdoc-api:
    build:
      context: ./magicdoc
      dockerfile: Dockerfile
    ports:
      - "8002:8000"
    volumes:
      - ./magicdoc/storage:/app/storage
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
```

## Service Architecture

```
magicdoc/
├── app/
│   ├── __init__.py
│   └── main.py              # FastAPI application
├── Dockerfile               # Container definition
├── docker-compose.yml       # Service orchestration
├── requirements.txt         # Python dependencies
├── README.md                # Service documentation
├── SETUP.md                 # This setup guide
├── test_api.py              # API testing script
├── integration_example.py   # Integration examples
└── start.sh                 # Startup script
```

## Dependencies

- **Python 3.10**: Base runtime
- **LibreOffice**: Document processing (installed in container)
- **Magic-Doc**: Document conversion library
- **FastAPI**: Web framework
- **Uvicorn**: ASGI server

## Troubleshooting

### Service Won't Start
1. Check that Docker is running
2. Verify that port 8002 is available
3. Check the logs: `docker-compose logs`

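
The port check in step 2 can also be done from Python. The following is a small sketch with a hypothetical helper name; it treats a port as free when nothing accepts a TCP connection on it, which is a reasonable proxy but not a guarantee that Docker can bind it.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0
```

If `port_is_free(8002)` returns `False` before the service is started, another process is already listening on the port.
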
### File Conversion Fails
1. Verify that LibreOffice is working in the container
2. Check that the file format is supported
3. Review the API logs for errors

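
A client can rule out unsupported formats before uploading. The sketch below checks the file extension against the formats named in the Overview; `is_supported` is a hypothetical helper, and the set of extensions is an assumption derived from that list rather than from the service code.

```python
from pathlib import Path

# Extensions taken from the formats listed in the Overview (assumed to be complete).
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx", ".pdf"}


def is_supported(filename):
    """Return True if the file extension matches a format MagicDoc is said to convert."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive and looks only at the final extension, so it does not detect a mislabeled file.
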
### Integration Issues
1. Verify the API endpoint URL
2. Check network connectivity between services
3. Ensure the response format is compatible with your processors

## Performance Considerations

- MagicDoc is generally faster than Mineru for simple documents
- The LibreOffice dependency increases the container image size
- Consider caching for repeated conversions
- Monitor memory usage for large files

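
One way to approach the caching suggestion is to key results by a hash of the file content, so that identical uploads are converted only once. This is a minimal in-memory sketch; `ConversionCache` is a hypothetical class, and a production setup would likely need eviction and persistent storage.

```python
import hashlib


class ConversionCache:
    """In-memory cache keyed by the SHA-256 digest of the file content."""

    def __init__(self, convert):
        self._convert = convert  # callable: bytes -> converted markdown
        self._results = {}

    def get(self, data):
        """Return the cached conversion for `data`, converting on first use."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._results:
            self._results[key] = self._convert(data)
        return self._results[key]
```

The `convert` callable can be any function that sends the bytes to the service and returns the converted text; the cache itself never contacts the API.
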
## Security Notes

- The service is intended to run on an internal network; the Compose example above also publishes it on host port 8002, so restrict access accordingly
- File uploads are temporary
- No persistent storage of uploaded files
- Consider adding authentication for production use

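
As a starting point for the authentication suggestion, the sketch below checks a shared API key in constant time. Nothing here is part of the current service: `is_authorized` and the `X-API-Key` header name are hypothetical, and a real deployment may prefer a different scheme.

```python
import hmac


def is_authorized(headers, expected_key, header_name="X-API-Key"):
    """Return True if the request headers carry the expected API key.

    hmac.compare_digest avoids leaking the key through timing differences.
    An empty expected key never authorizes anything.
    """
    supplied = headers.get(header_name, "")
    return bool(expected_key) and hmac.compare_digest(
        supplied.encode("utf-8"), expected_key.encode("utf-8")
    )
```

A check like this could run before each request is handled, rejecting the request when it returns `False`.
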