# MagicDoc API Service

A FastAPI service that provides document-to-markdown conversion using the Magic-Doc library. This service is designed to be compatible with the existing Mineru API interface.

## Features

- Converts DOC, DOCX, PPT, PPTX, and PDF files to markdown
- RESTful API interface compatible with the Mineru API
- Docker containerization with LibreOffice dependencies
- Health check endpoint
- File upload support

## API Endpoints

### Health Check

```
GET /health
```

Returns the service health status.

### File Parse

```
POST /file_parse
```

Converts an uploaded document to markdown.

**Parameters:**

- `files`: File upload (required)
- `output_dir`: Output directory (default: "./output")
- `lang_list`: Language list (default: "ch")
- `backend`: Backend type (default: "pipeline")
- `parse_method`: Parse method (default: "auto")
- `formula_enable`: Enable formula processing (default: true)
- `table_enable`: Enable table processing (default: true)
- `return_md`: Return markdown (default: true)
- `return_middle_json`: Return middle JSON (default: false)
- `return_model_output`: Return model output (default: false)
- `return_content_list`: Return content list (default: false)
- `return_images`: Return images (default: false)
- `start_page_id`: Start page ID (default: 0)
- `end_page_id`: End page ID (default: 99999)

**Response:**

```json
{
  "markdown": "converted markdown content",
  "md": "converted markdown content",
  "content": "converted markdown content",
  "text": "converted markdown content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
```

## Running with Docker

### Build and run with docker-compose

```bash
cd magicdoc
docker-compose up --build
```

The service will be available at `http://localhost:8002`.

### Build and run with Docker

```bash
cd magicdoc
docker build -t magicdoc-api .
docker run -p 8002:8000 magicdoc-api
```

## Integration with Document Processors

This service is designed to be compatible with the existing document processors.
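As an illustration of calling the endpoints documented above, here is a minimal client sketch that uses only the Python standard library. It assumes the service is reachable at `http://localhost:8002` (the host port from the Docker examples) and that the parameters are sent as multipart form fields next to the `files` upload. `BASE_URL`, `encode_multipart`, `extract_markdown`, `parse_file`, and `is_healthy` are hypothetical names and are not taken from the service code.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: host port from the docker-compose / docker run examples.
BASE_URL = "http://localhost:8002"

# The documented response repeats the markdown under these keys; try them in order.
MARKDOWN_KEYS = ("markdown", "md", "content", "text")


def encode_multipart(fields: dict[str, str], file_field: str,
                     filename: str, data: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body and return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'
    )
    parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_markdown(response: dict) -> str:
    """Return the first non-empty markdown field from a /file_parse response."""
    for key in MARKDOWN_KEYS:
        value = response.get(key)
        if value:
            return value
    raise ValueError(f"no markdown in response (status={response.get('status')!r})")


def parse_file(path: str, base_url: str = BASE_URL, timeout: float = 300.0) -> str:
    """POST a document to /file_parse and return the converted markdown."""
    file_path = Path(path)
    # Booleans are sent as "true"/"false" strings, which FastAPI parses for bool form fields.
    fields = {"return_md": "true", "formula_enable": "true", "table_enable": "true"}
    body, content_type = encode_multipart(fields, "files", file_path.name,
                                          file_path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/file_parse", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_markdown(json.loads(resp.read()))


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```

With the service running, `parse_file("document.docx")` should return the markdown string. The service appears to duplicate the markdown under four keys so that older callers keep working. Falling back across all of them means the client does not depend on which alias is populated.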
To use it instead of the Mineru API, update the configuration in your document processors:

```python
# In docx_processor.py or pdf_processor.py
# Falls back to the docker-compose service name when MAGICDOC_API_URL is not set
self.magicdoc_base_url = getattr(settings, 'MAGICDOC_API_URL', 'http://magicdoc-api:8000')
```

## Dependencies

- Python 3.10
- LibreOffice (installed in the Docker container)
- Magic-Doc library
- FastAPI
- Uvicorn

## Storage

The service creates the following directories:

- `storage/uploads/`: For uploaded files
- `storage/processed/`: For processed files
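The README lists the storage directories but does not show how files are placed in them. The following is a minimal sketch that assumes paths are relative to the service's working directory and that each upload is saved under `storage/uploads/` with a unique prefix. `ensure_storage`, `save_upload`, and the naming scheme are hypothetical and do not reflect the service's actual code.

```python
import uuid
from pathlib import Path

# Assumption: storage/ lives in the service's working directory.
STORAGE_ROOT = Path("storage")


def ensure_storage(root: Path = STORAGE_ROOT) -> tuple[Path, Path]:
    """Create uploads/ and processed/ under root if missing (safe to call repeatedly)."""
    uploads, processed = root / "uploads", root / "processed"
    for directory in (uploads, processed):
        directory.mkdir(parents=True, exist_ok=True)
    return uploads, processed


def save_upload(data: bytes, filename: str, root: Path = STORAGE_ROOT) -> Path:
    """Write uploaded bytes under uploads/ with a unique prefix to avoid name collisions."""
    uploads, _ = ensure_storage(root)
    # Keep only the final path component so a crafted filename cannot escape uploads/.
    safe_name = Path(filename).name
    target = uploads / f"{uuid.uuid4().hex}_{safe_name}"
    target.write_bytes(data)
    return target
```

The random prefix lets two clients upload files with the same name concurrently without overwriting each other.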