MagicDoc API Service
A FastAPI service that converts documents to markdown using the Magic-Doc library. It is designed to be compatible with the existing Mineru API interface.
Features
- Converts DOC, DOCX, PPT, PPTX, and PDF files to markdown
- RESTful API interface compatible with Mineru API
- Docker containerization with LibreOffice dependencies
- Health check endpoint
- File upload support
API Endpoints
Health Check
GET /health
Returns service health status.
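A minimal sketch of a health probe, using only the Python standard library. The base URL assumes the host port mapping shown in the Docker section (8002), and the helper names `health_url` and `is_healthy` are hypothetical. The response body is not documented here, so the sketch only checks for an HTTP 200 status.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Join the base URL and the /health path without doubling slashes."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str = "http://localhost:8002", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `is_healthy()` before submitting documents is one way to fail fast when the container is not up yet.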
File Parse
POST /file_parse
Converts an uploaded document to markdown.
Parameters:
- files: File upload (required)
- output_dir: Output directory (default: "./output")
- lang_list: Language list (default: "ch")
- backend: Backend type (default: "pipeline")
- parse_method: Parse method (default: "auto")
- formula_enable: Enable formula processing (default: true)
- table_enable: Enable table processing (default: true)
- return_md: Return markdown (default: true)
- return_middle_json: Return middle JSON (default: false)
- return_model_output: Return model output (default: false)
- return_content_list: Return content list (default: false)
- return_images: Return images (default: false)
- start_page_id: Start page ID (default: 0)
- end_page_id: End page ID (default: 99999)
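The optional parameters and their defaults can be collected in one place on the client side. The sketch below is an illustration, not part of the service: `DEFAULT_PARSE_OPTIONS` and `build_form_fields` are hypothetical names, and encoding booleans as "true"/"false" strings is an assumption about how the form fields are sent.

```python
# Defaults copied from the parameter list; "files" is sent separately as the upload.
DEFAULT_PARSE_OPTIONS = {
    "output_dir": "./output",
    "lang_list": "ch",
    "backend": "pipeline",
    "parse_method": "auto",
    "formula_enable": True,
    "table_enable": True,
    "return_md": True,
    "return_middle_json": False,
    "return_model_output": False,
    "return_content_list": False,
    "return_images": False,
    "start_page_id": 0,
    "end_page_id": 99999,
}


def build_form_fields(**overrides) -> dict:
    """Merge caller overrides into the defaults and render every value as text."""
    unknown = set(overrides) - set(DEFAULT_PARSE_OPTIONS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    merged = {**DEFAULT_PARSE_OPTIONS, **overrides}
    # Assumed encoding: booleans become "true"/"false", everything else str().
    return {
        key: (str(value).lower() if isinstance(value, bool) else str(value))
        for key, value in merged.items()
    }
```

For example, `build_form_fields(end_page_id=10)` keeps every other default and limits parsing to the first pages.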
Response:
{
  "markdown": "converted markdown content",
  "md": "converted markdown content",
  "content": "converted markdown content",
  "text": "converted markdown content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
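The response carries the same markdown under four keys ("markdown", "md", "content", "text"), presumably so that clients written against different field names keep working. A client can read whichever is present; the helper name `extract_markdown` is hypothetical, and treating any status other than "success" as an error is an assumption, since failure responses are not documented here.

```python
def extract_markdown(payload: dict) -> str:
    """Return the converted markdown from a /file_parse response payload."""
    if payload.get("status") != "success":
        # Assumption: anything other than "success" signals a failed conversion.
        raise ValueError(f"conversion failed: status={payload.get('status')!r}")
    # Try the alias keys in the order they appear in the documented response.
    for key in ("markdown", "md", "content", "text"):
        value = payload.get(key)
        if isinstance(value, str):
            return value
    raise KeyError("no markdown field found in response")
```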
Running with Docker
Build and run with docker-compose
cd magicdoc
docker-compose up --build
The service will be available at http://localhost:8002
Build and run with Docker
cd magicdoc
docker build -t magicdoc-api .
docker run -p 8002:8000 magicdoc-api
Integration with Document Processors
This service is designed to be compatible with the existing document processors. To use it instead of the Mineru API, update the configuration in your document processors:
# In docx_processor.py or pdf_processor.py
self.magicdoc_base_url = getattr(settings, 'MAGICDOC_API_URL', 'http://magicdoc-api:8000')
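The `getattr` fallback above can be sketched in isolation as follows. The `resolve_magicdoc_url` helper and the `SimpleNamespace` stand-in for the settings object are hypothetical. The default points at the container port (8000) under the hostname `magicdoc-api`, which presumably resolves only from inside the same Docker network; from the host, the mapped port 8002 applies instead.

```python
from types import SimpleNamespace

DEFAULT_MAGICDOC_URL = "http://magicdoc-api:8000"


def resolve_magicdoc_url(settings) -> str:
    """Read MAGICDOC_API_URL from settings, falling back to the default."""
    url = getattr(settings, "MAGICDOC_API_URL", DEFAULT_MAGICDOC_URL)
    # Strip a trailing slash so paths such as /file_parse can be appended directly.
    return url.rstrip("/")


# Stand-in settings objects for illustration only.
inside_network = SimpleNamespace()
from_host = SimpleNamespace(MAGICDOC_API_URL="http://localhost:8002/")
```

Here `resolve_magicdoc_url(inside_network)` yields the default, while `resolve_magicdoc_url(from_host)` yields the host-mapped address.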
Dependencies
- Python 3.10
- LibreOffice (installed in Docker container)
- Magic-Doc library
- FastAPI
- Uvicorn
Storage
The service creates the following directories:
- storage/uploads/: For uploaded files
- storage/processed/: For processed files
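A minimal sketch of how these directories might be created at startup, assuming they live under a `storage/` folder relative to some root. The `ensure_storage_dirs` helper is hypothetical and not taken from the service code.

```python
from pathlib import Path


def ensure_storage_dirs(root: str = ".") -> dict:
    """Create storage/uploads and storage/processed under root if missing."""
    base = Path(root) / "storage"
    dirs = {"uploads": base / "uploads", "processed": base / "processed"}
    for path in dirs.values():
        # exist_ok makes the call safe to repeat on every startup.
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```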