MagicDoc Service Setup Guide

This guide explains how to set up and use the MagicDoc API service as an alternative to the Mineru API for document processing.

Overview

The MagicDoc service provides a FastAPI-based REST API that converts documents (DOC, DOCX, PPT, PPTX, PDF) to Markdown using the Magic-Doc library. It is designed to be compatible with your existing document processors.

Quick Start

1. Build and Run the Service

cd magicdoc
./start.sh

Or manually:

cd magicdoc
docker-compose up --build -d

2. Verify the Service

# Check health
curl http://localhost:8002/health

# View API documentation (open is macOS-only; on Linux use xdg-open, or paste the URL into a browser)
open http://localhost:8002/docs
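
If you would rather script the health check (for example, to wait for the container before running tests), the sketch below polls the /health endpoint. It assumes only that the endpoint answers with HTTP 200 once the service is up. The function name, retry count, and delay are illustrative and not part of the MagicDoc code.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8002/health", attempts=10, delay=2.0,
                    opener=urllib.request.urlopen):
    """Return True once `url` answers with HTTP 200, False after `attempts` tries.

    `opener` is injectable so the function can be exercised without a network.
    """
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx status: not ready yet.
            pass
        time.sleep(delay)
    return False
```

Calling wait_for_health() after ./start.sh blocks until the service responds and returns False if it never does.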

3. Test with Sample Files

cd magicdoc
python test_api.py

API Compatibility

The MagicDoc API is designed to be compatible with the Mineru API you already use:

Endpoint: POST /file_parse

Request Format:

  • File upload via multipart form data
  • Same parameters as Mineru API (most are optional)

Response Format:

{
  "markdown": "converted content",
  "md": "converted content", 
  "content": "converted content",
  "text": "converted content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
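
Because the same converted text appears under several keys, a processor can read whichever one it already expects. The helper below is a minimal sketch of that lookup. The key names come from the response above. The function name is hypothetical, and treating any status other than "success" as a failure is an assumption, since the error response format is not documented here.

```python
import json

# Keys taken from the response example, in order of preference.
MARKDOWN_KEYS = ("markdown", "md", "content", "text")


def extract_markdown(payload):
    """Return the converted text from a parsed /file_parse response."""
    # Assumption: a failed conversion reports a status other than "success".
    if payload.get("status") != "success":
        raise ValueError(f"conversion failed: status={payload.get('status')!r}")
    for key in MARKDOWN_KEYS:
        value = payload.get(key)
        if isinstance(value, str):
            return value
    raise KeyError("response contains none of: " + ", ".join(MARKDOWN_KEYS))


def extract_markdown_from_json(raw):
    """Convenience wrapper for a raw JSON response body (str or bytes)."""
    return extract_markdown(json.loads(raw))
```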

Integration with Existing Processors

To use MagicDoc instead of Mineru in your existing processors:

1. Update Configuration

Add to your settings:

MAGICDOC_API_URL = "http://magicdoc-api:8000"  # from another container; use http://localhost:8002 from the host
MAGICDOC_TIMEOUT = 300
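
One way to supply these settings without hard-coding them is to read them from environment variables. This is a sketch, not the project's actual settings module: load_magicdoc_settings is a hypothetical helper, and interpreting MAGICDOC_TIMEOUT as a number of seconds is an assumption.

```python
import os


def load_magicdoc_settings(env=os.environ):
    """Read MagicDoc settings from `env`, falling back to the defaults above."""
    return {
        # Strip a trailing slash so "/file_parse" can be appended safely.
        "api_url": env.get("MAGICDOC_API_URL", "http://magicdoc-api:8000").rstrip("/"),
        # Assumption: the timeout is expressed in seconds.
        "timeout": int(env.get("MAGICDOC_TIMEOUT", "300")),
    }
```

When running a processor on the host rather than in a container, set MAGICDOC_API_URL=http://localhost:8002 before starting it.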

2. Modify Processors

Replace Mineru API calls with MagicDoc API calls. See integration_example.py for detailed examples.
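
As a rough illustration of what such a call can look like, the sketch below uploads a file to POST /file_parse using only the standard library. The function names are hypothetical, and the form field name "file" is an assumption; check app/main.py or integration_example.py for the field the service actually expects.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, data, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_file(path, api_url="http://localhost:8002", timeout=300, field_name="file"):
    """Upload `path` to the /file_parse endpoint and return the decoded JSON."""
    path = Path(path)
    body, content_type = build_multipart(field_name, path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{api_url}/file_parse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A processor would then call parse_file("document.docx") and read the converted text from the returned dictionary.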

3. Update Docker Compose

Add the MagicDoc service to your main docker-compose.yml:

services:
  magicdoc-api:
    build:
      context: ./magicdoc
      dockerfile: Dockerfile
    ports:
      - "8002:8000"
    volumes:
      - ./magicdoc/storage:/app/storage
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped

Service Architecture

magicdoc/
├── app/
│   ├── __init__.py
│   └── main.py              # FastAPI application
├── Dockerfile               # Container definition
├── docker-compose.yml       # Service orchestration
├── requirements.txt         # Python dependencies
├── README.md                # Service documentation
├── SETUP.md                 # This setup guide
├── test_api.py              # API testing script
├── integration_example.py   # Integration examples
└── start.sh                 # Startup script

Dependencies

  • Python 3.10: Base runtime
  • LibreOffice: Document processing (installed in container)
  • Magic-Doc: Document conversion library
  • FastAPI: Web framework
  • Uvicorn: ASGI server

Troubleshooting

Service Won't Start

  1. Check that Docker is running
  2. Verify that port 8002 is available
  3. Check the logs: docker-compose logs
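
To check the port from Python rather than by hand, the following sketch tries to connect to it on the local machine: if something accepts the connection, the port is already taken. port_is_free is a hypothetical helper, and it only detects listeners reachable on the given host address.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is in use.
        return sock.connect_ex((host, port)) != 0
```

If port_is_free(8002) returns False before you start the service, another process is using the port; either stop it or change the host side of the "8002:8000" mapping.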

File Conversion Fails

  1. Verify that LibreOffice is working in the container
  2. Check that the file format is supported (DOC, DOCX, PPT, PPTX, PDF)
  3. Review the API logs for errors

Integration Issues

  1. Verify the API endpoint URL
  2. Check network connectivity between the services
  3. Ensure your processor reads a field that the response actually contains (markdown, md, content, or text)

Performance Considerations

  • MagicDoc is generally faster than Mineru for simple documents
  • The LibreOffice dependency increases the container image size
  • Consider caching for repeated conversions
  • Monitor memory usage for large files
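
For the caching suggestion, one simple approach is to key cached results on a hash of the file contents, so identical uploads are converted only once. This is a sketch under assumptions: ConversionCache is a hypothetical class that does not exist in the service, and the convert callable stands in for whatever function calls the MagicDoc API.

```python
import hashlib
from pathlib import Path


class ConversionCache:
    """Store converted Markdown on disk, keyed by the SHA-256 of the input bytes."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, data):
        return self.cache_dir / (hashlib.sha256(data).hexdigest() + ".md")

    def get_or_convert(self, data, convert):
        """Return cached Markdown for `data`, calling `convert(data)` on a miss."""
        cached = self._path_for(data)
        if cached.exists():
            return cached.read_text(encoding="utf-8")
        markdown = convert(data)
        cached.write_text(markdown, encoding="utf-8")
        return markdown
```

Hashing the contents rather than the filename means a renamed copy still hits the cache, while an edited file with the same name does not.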

Security Notes

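  • The service is intended to run on an internal network; note that the Docker Compose example also publishes port 8002 on the host
  • Uploaded files are treated as temporary
  • Uploaded files are not stored persistently
  • Consider adding authentication for production use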
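
If you do add authentication, a shared API key checked on every request is one of the simplest options. The sketch below shows only the comparison step. The X-API-Key header name and the api_key_is_valid helper are hypothetical; the service does not currently define either.

```python
import hmac


def api_key_is_valid(headers, expected_key, header_name="X-API-Key"):
    """Return True if `headers` carries the expected API key.

    `headers` is any mapping of header names to values. An empty
    `expected_key` never validates, so a missing configuration fails closed.
    """
    supplied = headers.get(header_name, "")
    if not expected_key:
        return False
    # compare_digest avoids leaking the key through timing differences.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

In the FastAPI app, a check like this would typically run in a dependency or middleware and reject the request with HTTP 401 when it returns False.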