feat: add magicdoc
parent a16b69475e
commit 0820d7bba2

@ -0,0 +1,42 @@
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies including LibreOffice
RUN apt-get update && apt-get install -y \
    build-essential \
    libreoffice \
    libreoffice-writer \
    libreoffice-calc \
    libreoffice-impress \
    wget \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip and install uv
RUN pip install --upgrade pip
RUN pip install uv

# Configure uv and install Magic-Doc (the fairy-doc package)
ENV UV_SYSTEM_PYTHON=1
RUN uv pip install --system -U "fairy-doc[cpu]"

# Copy requirements and install the remaining Python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY app/ ./app/

# Create storage directories
RUN mkdir -p storage/uploads storage/processed

# Expose the port the app runs on
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Command to run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

@ -0,0 +1,94 @@
# MagicDoc API Service

A FastAPI service that converts documents to Markdown using the Magic-Doc library. It is designed to be compatible with the existing Mineru API interface.

## Features

- Converts DOC, DOCX, PPT, PPTX, and PDF files to Markdown
- RESTful API interface compatible with the Mineru API
- Docker containerization with LibreOffice dependencies
- Health check endpoint
- File upload support

## API Endpoints

### Health Check
```
GET /health
```
Returns service health status.

### File Parse
```
POST /file_parse
```
Converts an uploaded document to Markdown.

**Parameters:**

All parameters are multipart form fields. The current handler only uses `files`; the other fields are accepted for compatibility with Mineru clients and are ignored.

- `files`: File upload (required)
- `output_dir`: Output directory (default: "./output")
- `lang_list`: Language list (default: "ch")
- `backend`: Backend type (default: "pipeline")
- `parse_method`: Parse method (default: "auto")
- `formula_enable`: Enable formula processing (default: true)
- `table_enable`: Enable table processing (default: true)
- `return_md`: Return markdown (default: true)
- `return_middle_json`: Return middle JSON (default: false)
- `return_model_output`: Return model output (default: false)
- `return_content_list`: Return content list (default: false)
- `return_images`: Return images (default: false)
- `start_page_id`: Start page ID (default: 0)
- `end_page_id`: End page ID (default: 99999)
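
The fields above travel as multipart form data next to the file. A minimal sketch of how a client might assemble them, assuming booleans are encoded as lower-case strings; `build_parse_form` and `DEFAULT_FORM` are hypothetical names, not part of the service:

```python
from typing import Any, Dict

# Defaults copied from the parameter list above.
DEFAULT_FORM: Dict[str, Any] = {
    "output_dir": "./output",
    "lang_list": "ch",
    "backend": "pipeline",
    "parse_method": "auto",
    "formula_enable": True,
    "table_enable": True,
    "return_md": True,
    "return_middle_json": False,
    "return_model_output": False,
    "return_content_list": False,
    "return_images": False,
    "start_page_id": 0,
    "end_page_id": 99999,
}


def build_parse_form(**overrides: Any) -> Dict[str, str]:
    """Return the form fields for POST /file_parse as strings (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_FORM)
    if unknown:
        raise ValueError(f"unknown form fields: {sorted(unknown)}")
    merged = {**DEFAULT_FORM, **overrides}
    # Booleans are written as "true"/"false"; every other value uses str().
    return {
        key: (str(value).lower() if isinstance(value, bool) else str(value))
        for key, value in merged.items()
    }
```

The resulting dict can be passed as the `data=` argument of an HTTP client such as `requests.post`, alongside the `files` upload.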

**Response:**
```json
{
  "markdown": "converted markdown content",
  "md": "converted markdown content",
  "content": "converted markdown content",
  "text": "converted markdown content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
```
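
The response carries the same converted text under four keys. A client-side sketch that reads whichever one is present; `extract_markdown` is a hypothetical helper and the lookup order is an assumption:

```python
from typing import Any, Mapping, Optional

# Keys that carry the converted text in the response shown above.
MARKDOWN_KEYS = ("markdown", "md", "content", "text")


def extract_markdown(response: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Return the first non-empty markdown string in the response, else None."""
    if not response:
        return None
    for key in MARKDOWN_KEYS:
        value = response.get(key)
        if isinstance(value, str) and value.strip():
            return value
    return None
```

Returning `None` rather than raising leaves the caller free to decide whether an empty conversion is an error.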

## Running with Docker

### Build and run with docker-compose
```bash
cd magicdoc
docker-compose up --build
```

The service will be available at `http://localhost:8002`.

### Build and run with Docker
```bash
cd magicdoc
docker build -t magicdoc-api .
docker run -p 8002:8000 magicdoc-api
```

## Integration with Document Processors

This service is designed to be compatible with the existing document processors. To use it instead of the Mineru API, update the configuration in your document processors:

```python
# In docx_processor.py or pdf_processor.py
self.magicdoc_base_url = getattr(settings, 'MAGICDOC_API_URL', 'http://magicdoc-api:8000')
```

## Dependencies

- Python 3.10
- LibreOffice (installed in Docker container)
- Magic-Doc library
- FastAPI
- Uvicorn

## Storage

The Docker image creates the following directories (the current `/file_parse` handler writes uploads to a temporary file and does not use them):
- `storage/uploads/`: For uploaded files
- `storage/processed/`: For processed files

@ -0,0 +1,152 @@
# MagicDoc Service Setup Guide

This guide explains how to set up and use the MagicDoc API service as an alternative to the Mineru API for document processing.

## Overview

The MagicDoc service provides a FastAPI-based REST API that converts various document formats (DOC, DOCX, PPT, PPTX, PDF) to Markdown using the Magic-Doc library. It is designed to be compatible with your existing document processors.

## Quick Start

### 1. Build and Run the Service

```bash
cd magicdoc
./start.sh
```

Or manually:
```bash
cd magicdoc
docker-compose up --build -d
```

### 2. Verify the Service

```bash
# Check health
curl http://localhost:8002/health

# View API documentation (`open` is macOS-only; use xdg-open on Linux)
open http://localhost:8002/docs
```
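
The container health check in this commit allows a 60-second start period, so a single request right after `docker-compose up` may fail. A rough polling sketch; `http_probe` and `wait_until_healthy` are hypothetical names, and the URL assumes the 8002 port mapping used in this guide:

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8002/health") -> bool:
    """Return True if GET url answers 200 with a JSON body whose status is "healthy"."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return (
                resp.status == 200
                and isinstance(body, dict)
                and body.get("status") == "healthy"
            )
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = http_probe,
    attempts: int = 12,
    delay: float = 5.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the retry loop testable without a running container.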

### 3. Test with Sample Files

```bash
cd magicdoc
python test_api.py
```

## API Compatibility

The MagicDoc API is designed to be compatible with your existing Mineru API interface:

### Endpoint: `POST /file_parse`

**Request Format:**
- File upload via multipart form data
- Same form fields as the Mineru API; only `files` is required, and the others are accepted but ignored

**Response Format:**
```json
{
  "markdown": "converted content",
  "md": "converted content",
  "content": "converted content",
  "text": "converted content",
  "time_cost": 1.23,
  "filename": "document.docx",
  "status": "success"
}
```

## Integration with Existing Processors

To use MagicDoc instead of Mineru in your existing processors:

### 1. Update Configuration

Add to your settings:
```python
MAGICDOC_API_URL = "http://magicdoc-api:8000"  # or http://localhost:8002
MAGICDOC_TIMEOUT = 300
```
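
A processor can read these two settings with fallbacks so that it still works when they are absent. A small sketch; `MagicDocConfig` and `load_magicdoc_config` are hypothetical names, and the defaults are the values shown above:

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class MagicDocConfig:
    base_url: str
    timeout: int

    @property
    def parse_url(self) -> str:
        """Full URL of the file_parse endpoint."""
        return f"{self.base_url.rstrip('/')}/file_parse"


def load_magicdoc_config(settings: Any) -> MagicDocConfig:
    """Read MAGICDOC_* attributes from a settings object, falling back to defaults."""
    return MagicDocConfig(
        base_url=getattr(settings, "MAGICDOC_API_URL", "http://magicdoc-api:8000"),
        timeout=int(getattr(settings, "MAGICDOC_TIMEOUT", 300)),
    )


# Local development: point at the published host port instead of the service name.
local = load_magicdoc_config(SimpleNamespace(MAGICDOC_API_URL="http://localhost:8002/"))
print(local.parse_url)  # http://localhost:8002/file_parse
```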

### 2. Modify Processors

Replace Mineru API calls with MagicDoc API calls. See `integration_example.py` for detailed examples.

### 3. Update Docker Compose

Add the MagicDoc service to your main docker-compose.yml:
```yaml
services:
  magicdoc-api:
    build:
      context: ./magicdoc
      dockerfile: Dockerfile
    ports:
      - "8002:8000"
    volumes:
      - ./magicdoc/storage:/app/storage
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
```

## Service Architecture

```
magicdoc/
├── app/
│   ├── __init__.py
│   └── main.py              # FastAPI application
├── Dockerfile               # Container definition
├── docker-compose.yml       # Service orchestration
├── requirements.txt         # Python dependencies
├── README.md                # Service documentation
├── SETUP.md                 # This setup guide
├── test_api.py              # API testing script
├── integration_example.py   # Integration examples
└── start.sh                 # Startup script
```

## Dependencies

- **Python 3.10**: Base runtime
- **LibreOffice**: Document processing (installed in container)
- **Magic-Doc**: Document conversion library
- **FastAPI**: Web framework
- **Uvicorn**: ASGI server

## Troubleshooting

### Service Won't Start
1. Check Docker is running
2. Verify port 8002 is available
3. Check logs: `docker-compose logs`

### File Conversion Fails
1. Verify LibreOffice is working in container
2. Check file format is supported
3. Review API logs for errors

### Integration Issues
1. Verify API endpoint URL
2. Check network connectivity between services
3. Ensure response format compatibility

## Performance Considerations

- MagicDoc is generally faster than Mineru for simple documents
- LibreOffice dependency adds container size
- Consider caching for repeated conversions
- Monitor memory usage for large files
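
One way to act on the caching suggestion is to key results by a hash of the file bytes. A minimal in-memory sketch; `ConversionCache` is a hypothetical class, and `convert` stands in for whatever function performs the conversion:

```python
import hashlib
from typing import Callable, Dict


class ConversionCache:
    """Cache markdown output by the SHA-256 digest of the input bytes."""

    def __init__(self, convert: Callable[[bytes], str]) -> None:
        self._convert = convert
        self._results: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, data: bytes) -> str:
        """Return cached markdown for data, converting only on the first request."""
        key = hashlib.sha256(data).hexdigest()
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = self._convert(data)
        self._results[key] = result
        return result
```

The dict grows with every distinct file, so a real deployment would likely need an eviction policy or an external store.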

## Security Notes

- The service is meant to run on an internal network; note that the compose file also publishes it on host port 8002
- Uploaded files are written to a temporary file and deleted after conversion
- No persistent storage of uploaded files
- Consider adding authentication for production use
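
As a starting point for the authentication suggestion, a shared API key can be compared in constant time. This is a sketch only: the `X-API-Key` header name and the `is_authorized` helper are assumptions, not something the service implements.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the X-API-Key header matches expected_key."""
    if not expected_key:
        # Refuse every request when no key is configured.
        return False
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-api-key", "")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```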
@ -0,0 +1 @@
# MagicDoc FastAPI Application

@ -0,0 +1,96 @@
import logging
import os
import shutil
import tempfile

from fastapi import FastAPI, File, Form, HTTPException, UploadFile
from fastapi.responses import JSONResponse
from magic_doc.docconv import DocConverter

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="MagicDoc API", version="1.0.0")

# Global converter instance
converter = DocConverter(s3_config=None)


@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy", "service": "magicdoc-api"}


@app.post("/file_parse")
async def parse_file(
    files: UploadFile = File(...),
    output_dir: str = Form("./output"),
    lang_list: str = Form("ch"),
    backend: str = Form("pipeline"),
    parse_method: str = Form("auto"),
    formula_enable: bool = Form(True),
    table_enable: bool = Form(True),
    return_md: bool = Form(True),
    return_middle_json: bool = Form(False),
    return_model_output: bool = Form(False),
    return_content_list: bool = Form(False),
    return_images: bool = Form(False),
    start_page_id: int = Form(0),
    end_page_id: int = Form(99999),
):
    """
    Parse document file and convert to markdown.

    Compatible with the Mineru API interface. Only `files` is used; the other
    form fields are accepted for compatibility and ignored.
    """
    try:
        logger.info(f"Processing file: {files.filename}")

        # Save the upload to a temporary file, keeping the original extension
        # (filename may be missing, so fall back to an empty suffix)
        suffix = os.path.splitext(files.filename or "")[1]
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as temp_file:
            shutil.copyfileobj(files.file, temp_file)
            temp_file_path = temp_file.name

        try:
            # Convert file to markdown using magic-doc.
            # Note: this call is synchronous and blocks the event loop while it runs.
            markdown_content, time_cost = converter.convert(temp_file_path, conv_timeout=300)

            logger.info(f"Successfully converted {files.filename} to markdown in {time_cost:.2f}s")

            # Return response compatible with Mineru API
            response = {
                "markdown": markdown_content,
                "md": markdown_content,  # Alternative field name
                "content": markdown_content,  # Alternative field name
                "text": markdown_content,  # Alternative field name
                "time_cost": time_cost,
                "filename": files.filename,
                "status": "success",
            }

            return JSONResponse(content=response)

        finally:
            # Clean up temporary file
            if os.path.exists(temp_file_path):
                os.unlink(temp_file_path)

    except Exception as e:
        logger.error(f"Error processing file {files.filename}: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Error processing file: {str(e)}")


@app.get("/")
async def root():
    """Root endpoint with service information"""
    return {
        "service": "MagicDoc API",
        "version": "1.0.0",
        "description": "Document to Markdown conversion service using Magic-Doc",
        "endpoints": {
            "health": "/health",
            "file_parse": "/file_parse",
        },
    }


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

@ -0,0 +1,26 @@
version: '3.8'

services:
  magicdoc-api:
    build:
      context: .
      dockerfile: Dockerfile
    platform: linux/amd64
    ports:
      - "8002:8000"
    volumes:
      - ./storage/uploads:/app/storage/uploads
      - ./storage/processed:/app/storage/processed
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

# Named volumes are declared here but unused: the service above uses bind mounts
volumes:
  uploads:
  processed:

@ -0,0 +1,144 @@
"""
Example of how to integrate MagicDoc API with existing document processors
"""

# Example modification for docx_processor.py
# Replace the Mineru API configuration with MagicDoc API configuration

import logging
import os
from typing import Any, Dict, Optional

import requests

# DocumentProcessor, OllamaClient, and settings come from the host project
# and are not defined in this example.

logger = logging.getLogger(__name__)


class DocxDocumentProcessor(DocumentProcessor):
    def __init__(self, input_path: str, output_path: str):
        super().__init__()
        self.input_path = input_path
        self.output_path = output_path
        self.output_dir = os.path.dirname(output_path)
        self.name_without_suff = os.path.splitext(os.path.basename(input_path))[0]

        # Setup work directory for temporary files
        self.work_dir = os.path.join(
            os.path.dirname(output_path),
            ".work",
            os.path.splitext(os.path.basename(input_path))[0]
        )
        os.makedirs(self.work_dir, exist_ok=True)

        self.ollama_client = OllamaClient(model_name=settings.OLLAMA_MODEL, base_url=settings.OLLAMA_API_URL)

        # MagicDoc API configuration (instead of Mineru)
        self.magicdoc_base_url = getattr(settings, 'MAGICDOC_API_URL', 'http://magicdoc-api:8000')
        self.magicdoc_timeout = getattr(settings, 'MAGICDOC_TIMEOUT', 300)  # 5 minutes timeout

    def _call_magicdoc_api(self, file_path: str) -> Optional[Dict[str, Any]]:
        """
        Call MagicDoc API to convert DOCX to markdown

        Args:
            file_path: Path to the DOCX file

        Returns:
            API response as dictionary (an exception is raised on failure)
        """
        try:
            url = f"{self.magicdoc_base_url}/file_parse"

            with open(file_path, 'rb') as file:
                files = {
                    'files': (
                        os.path.basename(file_path),
                        file,
                        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
                    )
                }

                # Prepare form data - simplified compared to Mineru
                data = {
                    'output_dir': './output',
                    'lang_list': 'ch',
                    'backend': 'pipeline',
                    'parse_method': 'auto',
                    'formula_enable': True,
                    'table_enable': True,
                    'return_md': True,
                    'return_middle_json': False,
                    'return_model_output': False,
                    'return_content_list': False,
                    'return_images': False,
                    'start_page_id': 0,
                    'end_page_id': 99999
                }

                logger.info(f"Calling MagicDoc API for DOCX processing at {url}")
                # The request must be sent while the file is still open
                response = requests.post(
                    url,
                    files=files,
                    data=data,
                    timeout=self.magicdoc_timeout
                )

            if response.status_code == 200:
                result = response.json()
                logger.info("Successfully received response from MagicDoc API for DOCX")
                return result
            else:
                error_msg = f"MagicDoc API returned status code {response.status_code}: {response.text}"
                logger.error(error_msg)
                raise Exception(error_msg)

        except requests.exceptions.Timeout:
            error_msg = f"MagicDoc API request timed out after {self.magicdoc_timeout} seconds"
            logger.error(error_msg)
            raise Exception(error_msg)
        except requests.exceptions.RequestException as e:
            error_msg = f"Error calling MagicDoc API for DOCX: {str(e)}"
            logger.error(error_msg)
            raise Exception(error_msg)
        except Exception as e:
            error_msg = f"Unexpected error calling MagicDoc API for DOCX: {str(e)}"
            logger.error(error_msg)
            raise Exception(error_msg)

    def read_content(self) -> str:
        logger.info("Starting DOCX content processing with MagicDoc API")

        # Call MagicDoc API to convert DOCX to markdown
        magicdoc_response = self._call_magicdoc_api(self.input_path)

        # Extract markdown content from the response
        # (_extract_markdown_from_response is not defined in this example;
        # it is expected to exist in the host processor)
        markdown_content = self._extract_markdown_from_response(magicdoc_response)

        if not markdown_content:
            raise Exception("No markdown content found in MagicDoc API response for DOCX")

        logger.info(f"Successfully extracted {len(markdown_content)} characters of markdown content from DOCX")

        # Save the raw markdown content to work directory for reference
        md_output_path = os.path.join(self.work_dir, f"{self.name_without_suff}.md")
        with open(md_output_path, 'w', encoding='utf-8') as file:
            file.write(markdown_content)

        logger.info(f"Saved raw markdown content from DOCX to {md_output_path}")

        return markdown_content


# Configuration changes needed in settings.py:
"""
# Add these settings to your configuration
MAGICDOC_API_URL = "http://magicdoc-api:8000"  # or http://localhost:8002 for local development
MAGICDOC_TIMEOUT = 300  # 5 minutes timeout
"""

# Docker Compose integration:
"""
# Add to your main docker-compose.yml
services:
  magicdoc-api:
    build:
      context: ./magicdoc
      dockerfile: Dockerfile
    ports:
      - "8002:8000"
    volumes:
      - ./magicdoc/storage:/app/storage
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
"""

@ -0,0 +1,5 @@
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
# fairy-doc[cpu]==0.1.0
pydantic==2.5.0

@ -0,0 +1,34 @@
#!/bin/bash

# MagicDoc API Service Startup Script

echo "Starting MagicDoc API Service..."

# Check if Docker is running
if ! docker info > /dev/null 2>&1; then
    echo "Error: Docker is not running. Please start Docker first."
    exit 1
fi

# Build and start the service
echo "Building and starting MagicDoc API service..."
docker-compose up --build -d

# Wait for service to be ready
echo "Waiting for service to be ready..."
sleep 10

# Check health; remember the result so the script exits non-zero on failure
status=0
echo "Checking service health..."
if curl -f http://localhost:8002/health > /dev/null 2>&1; then
    echo "✅ MagicDoc API service is running successfully!"
    echo "🌐 Service URL: http://localhost:8002"
    echo "📖 API Documentation: http://localhost:8002/docs"
    echo "🔍 Health Check: http://localhost:8002/health"
else
    echo "❌ Service health check failed. Check logs with: docker-compose logs"
    status=1
fi

echo ""
echo "To stop the service, run: docker-compose down"
echo "To view logs, run: docker-compose logs -f"

exit "$status"

@ -0,0 +1,92 @@
#!/usr/bin/env python3
"""
Test script for MagicDoc API
"""

import os

import requests


def test_health_check(base_url="http://localhost:8002"):
    """Test health check endpoint"""
    try:
        response = requests.get(f"{base_url}/health", timeout=10)
        print(f"Health check status: {response.status_code}")
        print(f"Response: {response.json()}")
        return response.status_code == 200
    except Exception as e:
        print(f"Health check failed: {e}")
        return False


def test_file_parse(base_url="http://localhost:8002", file_path=None):
    """Test file parse endpoint"""
    if not file_path or not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        return False

    try:
        with open(file_path, 'rb') as f:
            files = {'files': (os.path.basename(file_path), f, 'application/octet-stream')}
            data = {
                'output_dir': './output',
                'lang_list': 'ch',
                'backend': 'pipeline',
                'parse_method': 'auto',
                'formula_enable': True,
                'table_enable': True,
                'return_md': True,
                'return_middle_json': False,
                'return_model_output': False,
                'return_content_list': False,
                'return_images': False,
                'start_page_id': 0,
                'end_page_id': 99999
            }

            # Send the request while the file is still open; the timeout
            # matches the service's 300-second conversion limit
            response = requests.post(f"{base_url}/file_parse", files=files, data=data, timeout=300)
            print(f"File parse status: {response.status_code}")

            if response.status_code == 200:
                result = response.json()
                print(f"Success! Converted {len(result.get('markdown', ''))} characters")
                print(f"Time cost: {result.get('time_cost', 'N/A')}s")
                return True
            else:
                print(f"Error: {response.text}")
                return False

    except Exception as e:
        print(f"File parse failed: {e}")
        return False


def main():
    """Main test function"""
    print("Testing MagicDoc API...")

    # Test health check
    print("\n1. Testing health check...")
    if not test_health_check():
        print("Health check failed. Make sure the service is running.")
        return

    # Test file parse (if sample file exists)
    print("\n2. Testing file parse...")
    sample_files = [
        "../sample_doc/20220707_na_decision-2.docx",
        "../sample_doc/20220707_na_decision-2.pdf",
        "../sample_doc/short_doc.md"
    ]

    for sample_file in sample_files:
        if os.path.exists(sample_file):
            print(f"Testing with {sample_file}...")
            if test_file_parse(file_path=sample_file):
                print("File parse test passed!")
                break
        else:
            print(f"Sample file not found: {sample_file}")

    print("\nTest completed!")


if __name__ == "__main__":
    main()