MagicDoc Service Setup Guide
This guide explains how to set up and use the MagicDoc API service as an alternative to the Mineru API for document processing.
Overview
The MagicDoc service provides a FastAPI-based REST API that converts various document formats (DOC, DOCX, PPT, PPTX, PDF) to markdown using the Magic-Doc library. It's designed to be compatible with your existing document processors.
Quick Start
1. Build and Run the Service
cd magicdoc
./start.sh
Or manually:
cd magicdoc
docker-compose up --build -d
2. Verify the Service
# Check health
curl http://localhost:8002/health
# View API documentation (open is macOS-specific; on Linux use xdg-open, or load the URL in a browser)
open http://localhost:8002/docs
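The same health check can be scripted, for example to wait for the container before running tests. The following is a minimal sketch using only the standard library. It assumes that a healthy service answers GET /health with HTTP 200; the function name is_healthy is hypothetical and not part of the MagicDoc code.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8002", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed contract)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unreachable")
```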
3. Test with Sample Files
cd magicdoc
python test_api.py
API Compatibility
The MagicDoc API is designed to be compatible with your existing Mineru API interface:
Endpoint: POST /file_parse
Request Format:
- File upload via multipart form data
- Same parameters as Mineru API (most are optional)
Response Format:
{
  "markdown": "converted content",
  "md": "converted content",
  "content": "converted content",
  "text": "converted content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
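Because the converted text is repeated under several keys, a caller can read whichever key its existing processor already expects. A minimal sketch of a tolerant reader follows. The helper name extract_markdown is hypothetical, and treating any status other than "success" as a failure is an assumption, since only the success shape is shown above.

```python
from typing import Any, Mapping

# Keys that carry the converted text, in the order listed in the response format.
CONTENT_KEYS = ("markdown", "md", "content", "text")


def extract_markdown(payload: Mapping[str, Any]) -> str:
    """Return the converted text from a /file_parse response body."""
    status = payload.get("status")
    if status != "success":
        # Assumption: failed conversions report a different status value.
        raise ValueError(f"conversion did not succeed: status={status!r}")
    for key in CONTENT_KEYS:
        value = payload.get(key)
        if isinstance(value, str):
            return value
    raise KeyError(f"none of {CONTENT_KEYS} found in response")


if __name__ == "__main__":
    sample = {"md": "# Title", "time_cost": 1.23, "status": "success"}
    print(extract_markdown(sample))
```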
Integration with Existing Processors
To use MagicDoc instead of Mineru in your existing processors:
1. Update Configuration
Add to your settings:
MAGICDOC_API_URL = "http://magicdoc-api:8000"  # from other containers; from the host use http://localhost:8002
MAGICDOC_TIMEOUT = 300
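If your settings are read from the environment rather than hard-coded, the two values can be loaded as in the sketch below. The function name load_magicdoc_settings is hypothetical, and reading from environment variables is an assumption about your project, not something the MagicDoc service requires.

```python
import os
from typing import Mapping, Optional


def load_magicdoc_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the MagicDoc settings, falling back to the defaults shown above."""
    env = os.environ if env is None else env
    return {
        # Strip a trailing slash so "/file_parse" can be appended safely.
        "MAGICDOC_API_URL": env.get("MAGICDOC_API_URL", "http://magicdoc-api:8000").rstrip("/"),
        # Timeout in seconds; environment values arrive as strings.
        "MAGICDOC_TIMEOUT": int(env.get("MAGICDOC_TIMEOUT", "300")),
    }


if __name__ == "__main__":
    print(load_magicdoc_settings())
```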
2. Modify Processors
Replace Mineru API calls with MagicDoc API calls. See integration_example.py for detailed examples.
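As a rough illustration of what such a call can look like, here is a standard-library sketch that uploads one file to POST /file_parse. It is not the content of integration_example.py. The helper names and the multipart field name "files" are assumptions; check the service's /docs page for the field name it actually expects.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

MAGICDOC_API_URL = "http://localhost:8002"  # or http://magicdoc-api:8000 between containers
MAGICDOC_TIMEOUT = 300  # seconds


def build_multipart(field_name: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_with_magicdoc(path: str, field_name: str = "files") -> dict:
    """Upload a document and return the decoded JSON response."""
    file_path = Path(path)
    body, content_type = build_multipart(field_name, file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        f"{MAGICDOC_API_URL}/file_parse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=MAGICDOC_TIMEOUT) as response:
        return json.loads(response.read().decode("utf-8"))
```

A processor would then call parse_with_magicdoc("document.docx") and read the converted text from the "markdown" key of the returned dictionary.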
3. Update Docker Compose
Add the MagicDoc service to your main docker-compose.yml:
services:
  magicdoc-api:
    build:
      context: ./magicdoc
      dockerfile: Dockerfile
    ports:
      - "8002:8000"
    volumes:
      - ./magicdoc/storage:/app/storage
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
Service Architecture
magicdoc/
├── app/
│   ├── __init__.py
│   └── main.py              # FastAPI application
├── Dockerfile               # Container definition
├── docker-compose.yml       # Service orchestration
├── requirements.txt         # Python dependencies
├── README.md                # Service documentation
├── SETUP.md                 # This setup guide
├── test_api.py              # API testing script
├── integration_example.py   # Integration examples
└── start.sh                 # Startup script
Dependencies
- Python 3.10: Base runtime
- LibreOffice: Document processing (installed in container)
- Magic-Doc: Document conversion library
- FastAPI: Web framework
- Uvicorn: ASGI server
Troubleshooting
Service Won't Start
- Check Docker is running
- Verify port 8002 is available
- Check logs:
docker-compose logs
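One way to check whether port 8002 is available is to try binding to it. The sketch below does this with the standard library; the function name port_is_free is hypothetical. It binds on all interfaces because Docker publishes ports on all interfaces by default.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True


if __name__ == "__main__":
    print("port 8002 is free" if port_is_free(8002) else "port 8002 is in use")
```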
File Conversion Fails
- Verify LibreOffice is working in container
- Check file format is supported
- Review API logs for errors
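A client-side format check can rule out unsupported files before they reach the service. The sketch below uses the formats listed in the Overview (DOC, DOCX, PPT, PPTX, PDF); the helper name is_supported is hypothetical, and matching on the file extension alone is a simplification.

```python
from pathlib import Path

# Formats named in the Overview section of this guide.
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx", ".pdf"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one MagicDoc is documented to convert."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.DOCX", "slides.pptx", "notes.txt"):
        print(name, is_supported(name))
```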
Integration Issues
- Verify API endpoint URL
- Check network connectivity between services
- Ensure response format compatibility
Performance Considerations
- MagicDoc is generally faster than Mineru for simple documents
- The LibreOffice dependency increases the container image size
- Consider caching for repeated conversions
- Monitor memory usage for large files
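The caching suggestion above can be sketched as an in-memory cache keyed by the SHA-256 digest of the file contents, so that identical uploads are converted only once. The class name ConversionCache and the convert callable are hypothetical; a real deployment might prefer a bounded or persistent cache.

```python
import hashlib
from typing import Callable, Dict


class ConversionCache:
    """Memoize conversions by content hash so identical files are converted once."""

    def __init__(self, convert: Callable[[bytes, str], str]) -> None:
        self._convert = convert  # e.g. a function that calls POST /file_parse
        self._results: Dict[str, str] = {}

    def get_markdown(self, data: bytes, filename: str) -> str:
        # Key on the bytes, not the name: renamed copies still hit the cache.
        key = hashlib.sha256(data).hexdigest()
        if key not in self._results:
            self._results[key] = self._convert(data, filename)
        return self._results[key]


if __name__ == "__main__":
    cache = ConversionCache(lambda data, name: f"# {name} ({len(data)} bytes)")
    print(cache.get_markdown(b"example", "a.docx"))
    print(cache.get_markdown(b"example", "copy-of-a.docx"))  # served from the cache
```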
Security Notes
- Service is intended to run on an internal network (note that the Compose example above also publishes port 8002 on the host)
- File uploads are temporary
- No persistent storage of uploaded files
- Consider adding authentication for production use
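One simple form of authentication is a shared API key checked on every request. The sketch below shows only the comparison step, using the standard library. The header name X-API-Key and the function name is_authorized are assumptions for illustration; the current service does not define them.

```python
import hmac
from typing import Mapping

API_KEY_HEADER = "X-API-Key"  # hypothetical header name


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the request headers carry the expected API key."""
    supplied = headers.get(API_KEY_HEADER, "")
    if not expected_key:
        # Refuse everything when no key is configured, rather than allowing all.
        return False
    # compare_digest runs in constant time, which avoids leaking the key via timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


if __name__ == "__main__":
    print(is_authorized({"X-API-Key": "s3cret"}, "s3cret"))
    print(is_authorized({}, "s3cret"))
```

In a FastAPI application a check like this would typically run in a dependency or middleware before the upload is processed.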