tigermren 2075218955 feat: official full support for docx 2025-08-18 01:15:40 +08:00
backend feat: official full support for docx 2025-08-18 01:15:40 +08:00
frontend feat: add error message display 2025-08-17 23:26:59 +08:00
magicdoc fix: resolve issues with the magic-doc package 2025-08-18 01:01:58 +08:00
mineru Initial commit 2025-07-20 21:54:24 +08:00
sample_doc Initial commit 2025-07-20 21:54:24 +08:00
.dockerignore Initial commit 2025-07-20 21:54:24 +08:00
.gitignore Initial commit 2025-07-20 21:54:24 +08:00
DOCKER_COMPOSE_README.md Initial commit 2025-07-20 21:54:24 +08:00
DOCKER_MIGRATION_GUIDE.md Initial commit 2025-07-20 21:54:24 +08:00
MIGRATION_QUICK_REFERENCE.md Initial commit 2025-07-20 21:54:24 +08:00
README.md Initial commit 2025-07-20 21:54:24 +08:00
docker-compose.yml feat: official full support for docx 2025-08-18 01:15:40 +08:00
export-images.sh Initial commit 2025-07-20 21:54:24 +08:00
import-images.sh Initial commit 2025-07-20 21:54:24 +08:00
setup-unified-docker.sh Initial commit 2025-07-20 21:54:24 +08:00

README.md

Document Processing App

This project processes legal documents by redacting sensitive information such as personal names and company names. It uses the Ollama API with selected models for text processing. The application monitors a specified directory for new files, processes them automatically, and saves the results to a target path.
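
As a rough illustration of the text-processing step, the sketch below sends a redaction prompt to a local Ollama server through its `/api/generate` endpoint. It is a minimal sketch, not the project's actual `ollama_client.py`: the function names, the prompt wording, the `[REDACTED]` placeholder, and the model name `llama3` are all assumptions made for the example, and the server is assumed to listen on Ollama's default port.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_redaction_request(text: str, model: str = "llama3") -> dict:
    """Build the JSON payload for one non-streaming generate call.

    The prompt wording and default model name are hypothetical.
    """
    prompt = (
        "Replace every personal name and company name in the following "
        "text with [REDACTED]. Return only the rewritten text.\n\n" + text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def redact(text: str, model: str = "llama3") -> str:
    """Send the text to Ollama and return the model's rewritten version."""
    payload = json.dumps(build_redaction_request(text, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        # With "stream": False the reply is a single JSON object whose
        # "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

Calling `redact("...")` requires a running Ollama instance with the chosen model already pulled. Keeping payload construction in its own function makes the prompt easy to inspect and test without network access.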

Project Structure

doc-processing-app
├── src
│   ├── main.py               # Entry point of the application
│   ├── config
│   │   └── settings.py       # Configuration settings for paths
│   ├── services
│   │   ├── file_monitor.py    # Monitors directory for new files
│   │   ├── document_processor.py # Handles document processing logic
│   │   └── ollama_client.py   # Interacts with the Ollama API
│   ├── utils
│   │   └── file_utils.py      # Utility functions for file operations
│   └── models
│       └── document.py        # Represents the structure of a document
├── tests
│   └── test_document_processor.py # Unit tests for DocumentProcessor
├── requirements.txt           # Project dependencies
├── .env.example               # Example environment variables
└── README.md                  # Project documentation

Setup Instructions

  1. Clone the repository:

    git clone <repository-url>
    cd doc-processing-app
    
  2. Install LibreOffice (required for document processing). On macOS with Homebrew:

    brew install libreoffice
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    pip install -U "magic-pdf[full]"
    
  4. Configure the application by editing src/config/settings.py to set the paths for the object storage and the target directory.

  5. Create a .env file based on the .env.example file to set up necessary environment variables.
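
The configuration steps above could be wired together roughly as follows. This is only a sketch of one possible `settings.py`: the variable names `WATCH_DIR`, `TARGET_DIR`, and `OLLAMA_MODEL`, their default values, and the helper names are hypothetical, so check `.env.example` for the keys the project actually expects.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def parse_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    if not path.exists():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    watch_dir: Path
    target_dir: Path
    ollama_model: str


def load_settings(env_path: Path = Path(".env")) -> Settings:
    # Real environment variables take precedence over the .env file.
    env = {**parse_env_file(env_path), **os.environ}
    return Settings(
        # All three keys and their defaults are hypothetical.
        watch_dir=Path(env.get("WATCH_DIR", "./incoming")),
        target_dir=Path(env.get("TARGET_DIR", "./processed")),
        ollama_model=env.get("OLLAMA_MODEL", "llama3"),
    )
```

Letting the process environment override the file is a common convention because it allows the same code to run unchanged under Docker Compose, where values are usually injected as environment variables.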

Usage

To run the application, execute the following command:

python src/main.py

The application then monitors the configured directory for new documents. When a new document appears there, it is processed automatically and the result is saved to the target path.
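
The monitoring behaviour described above can be approximated with a simple polling loop. The sketch below uses only the standard library and is not taken from `file_monitor.py`, which may well rely on a file-system event library instead; the function names and the two-second interval are assumptions.

```python
import time
from pathlib import Path
from typing import Callable, List, Set


def scan_for_new_files(watch_dir: Path, seen: Set[Path]) -> List[Path]:
    """Return files in watch_dir not seen before, and record them in seen."""
    new_files = sorted(
        p for p in watch_dir.iterdir() if p.is_file() and p not in seen
    )
    seen.update(new_files)
    return new_files


def monitor(
    watch_dir: Path, handle: Callable[[Path], None], interval: float = 2.0
) -> None:
    """Poll watch_dir forever, passing each new file to handle exactly once."""
    seen: Set[Path] = set()
    while True:
        for path in scan_for_new_files(watch_dir, seen):
            handle(path)
        time.sleep(interval)
```

A call such as `monitor(Path("./incoming"), print)` would print each new file path as it appears. Polling can pick up a file that is still being written, so a production version would usually wait until the file size stops changing or have uploaders rename files into place.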

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.