# MagicDoc Service Setup Guide

This guide explains how to set up and use the MagicDoc API service as an alternative to the Mineru API for document processing.

## Overview

The MagicDoc service provides a FastAPI-based REST API that converts documents in several formats (DOC, DOCX, PPT, PPTX, PDF) to markdown using the Magic-Doc library. It is designed to be compatible with your existing document processors.

## Quick Start

### 1. Build and Run the Service

```bash
cd magicdoc
./start.sh
```

Or manually:

```bash
cd magicdoc
docker-compose up --build -d
```

### 2. Verify the Service

```bash
# Check health
curl http://localhost:8002/health

# View the interactive API documentation (macOS `open`; use `xdg-open` on Linux)
open http://localhost:8002/docs
```

### 3. Test with Sample Files

```bash
cd magicdoc
python test_api.py
```

## API Compatibility

The MagicDoc API is designed to be compatible with your existing Mineru API interface.

### Endpoint: `POST /file_parse`

**Request Format:**

- File upload via multipart form data
- Same parameters as the Mineru API (most are optional)

**Response Format:**

```json
{
  "markdown": "converted content",
  "md": "converted content",
  "content": "converted content",
  "text": "converted content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
```

The converted markdown is repeated under the `markdown`, `md`, `content`, and `text` keys, so clients can read whichever key they already expect.

## Integration with Existing Processors

To use MagicDoc instead of Mineru in your existing processors:

### 1. Update Configuration

Add to your settings:

```python
# Inside the Docker Compose network, use the service name and container port.
MAGICDOC_API_URL = "http://magicdoc-api:8000"
# From the host, use the published port instead: "http://localhost:8002"
MAGICDOC_TIMEOUT = 300  # request timeout, in seconds
```

### 2. Modify Processors

Replace Mineru API calls with MagicDoc API calls. See `integration_example.py` for detailed examples.

### 3. Update Docker Compose

Add the MagicDoc service to your main `docker-compose.yml`:

```yaml
services:
  magicdoc-api:
    build:
      context: ./magicdoc
      dockerfile: Dockerfile
    ports:
      - "8002:8000"
    volumes:
      - ./magicdoc/storage:/app/storage
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
```

## Service Architecture

```
magicdoc/
├── app/
│   ├── __init__.py
│   └── main.py                # FastAPI application
├── Dockerfile                 # Container definition
├── docker-compose.yml         # Service orchestration
├── requirements.txt           # Python dependencies
├── README.md                  # Service documentation
├── SETUP.md                   # This setup guide
├── test_api.py                # API testing script
├── integration_example.py     # Integration examples
└── start.sh                   # Startup script
```

## Dependencies

- **Python 3.10**: Base runtime
- **LibreOffice**: Document processing (installed in the container)
- **Magic-Doc**: Document conversion library
- **FastAPI**: Web framework
- **Uvicorn**: ASGI server

## Troubleshooting

### Service Won't Start

1. Check that Docker is running.
2. Verify that port 8002 is available.
3. Check the logs: `docker-compose logs`

### File Conversion Fails

1. Verify that LibreOffice works inside the container.
2. Check that the file format is supported (DOC, DOCX, PPT, PPTX, PDF).
3. Review the API logs for errors.

### Integration Issues

1. Verify the API endpoint URL (`http://magicdoc-api:8000` from other containers, `http://localhost:8002` from the host).
2. Check network connectivity between services.
3. Ensure your processors read the converted content from the response format shown under "API Compatibility".

## Performance Considerations

- MagicDoc is generally faster than Mineru for simple documents.
- The LibreOffice dependency increases the container image size.
- Consider caching results for repeated conversions.
- Monitor memory usage when converting large files.

## Security Notes

- The service is intended to run on an internal network. The `8002:8000` port mapping also publishes it on the host, so restrict access to that port.
- File uploads are temporary.
- Uploaded files are not stored persistently.
- Consider adding authentication for production use.
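The "Update Configuration" step under "Integration with Existing Processors" can be sketched in Python as follows. This is a minimal sketch that loads the two settings at runtime. The `MagicDocSettings` class and the `load_magicdoc_settings` helper are hypothetical. Reading the values from environment variables is also an assumption, since the guide only shows them as constants. The defaults mirror the guide's host-side URL and 300-second timeout.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MagicDocSettings:
    api_url: str
    timeout: float  # seconds


def load_magicdoc_settings(env: Optional[Mapping[str, str]] = None) -> MagicDocSettings:
    """Hypothetical helper: read MAGICDOC_API_URL / MAGICDOC_TIMEOUT with defaults."""
    # Assumption: settings come from environment variables (the guide shows constants).
    env = os.environ if env is None else env
    # Default to the host port published by docker-compose ("8002:8000").
    api_url = env.get("MAGICDOC_API_URL", "http://localhost:8002").rstrip("/")
    timeout = float(env.get("MAGICDOC_TIMEOUT", "300"))
    if timeout <= 0:
        raise ValueError("MAGICDOC_TIMEOUT must be positive")
    return MagicDocSettings(api_url=api_url, timeout=timeout)
```

Inside another container on the same Compose network, set `MAGICDOC_API_URL=http://magicdoc-api:8000` instead of relying on the default.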
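For the "Modify Processors" step, here is a hypothetical standard-library client for `POST /file_parse` that returns the markdown from the response format shown under "API Compatibility". Several details are assumptions. The multipart field name `files` is not confirmed, so check the request schema at `/docs`. The client also treats any `status` other than `"success"` as a failure. The names `convert_file`, `build_multipart`, and `extract_markdown` are illustrative and are not taken from `integration_example.py`.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Dict, Tuple

# The response format repeats the converted markdown under these keys.
CONTENT_KEYS = ("markdown", "md", "content", "text")


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_markdown(payload: Dict[str, Any]) -> str:
    """Pull the markdown out of a /file_parse JSON response."""
    # Assumption: any status other than "success" means the conversion failed.
    if payload.get("status") != "success":
        raise RuntimeError(f"MagicDoc conversion failed: {payload!r}")
    for key in CONTENT_KEYS:
        value = payload.get(key)
        if isinstance(value, str):
            return value
    raise KeyError("response contains none of " + ", ".join(CONTENT_KEYS))


def convert_file(path: str, api_url: str = "http://localhost:8002",
                 timeout: float = 300, field: str = "files") -> str:
    """Upload one document to POST /file_parse and return its markdown."""
    # Assumption: the upload field is named "files"; confirm the name at /docs.
    file_path = Path(path)
    body, content_type = build_multipart(field, file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        f"{api_url.rstrip('/')}/file_parse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_markdown(json.load(response))
```

From the host, `convert_file("document.docx")` would upload the file and return its markdown. Keeping `extract_markdown` separate lets existing Mineru-based processors reuse only the response handling.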
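After `./start.sh` or `docker-compose up --build -d`, the container may take a moment before it accepts requests. The sketch below automates the `curl http://localhost:8002/health` check from "Verify the Service" by polling until the service responds. It assumes that a healthy service answers `/health` with HTTP 200. `wait_for_magicdoc` is a hypothetical helper, and its `check` parameter exists only so the probe can be replaced, for example in tests.

```python
import time
import urllib.request
from typing import Callable


def _http_ok(url: str, timeout: float) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


def wait_for_magicdoc(base_url: str = "http://localhost:8002", attempts: int = 30,
                      delay: float = 2.0,
                      check: Callable[[str, float], bool] = _http_ok) -> bool:
    """Poll /health until it succeeds or the attempts run out."""
    # Assumption: a healthy service returns HTTP 200 from /health.
    health_url = f"{base_url.rstrip('/')}/health"
    for attempt in range(attempts):
        if check(health_url, 5.0):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next probe
    return False
```

Calling `wait_for_magicdoc()` before `python test_api.py` avoids spurious failures while the container is still starting.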