Commit Graph

9 Commits

Author SHA1 Message Date
oliviamn 0f158c159b Enhance PDF content masking by introducing mapping prompts
- Added a new function `get_masking_mapping_prompt` to generate prompts for creating a mapping of original names/companies to their masked versions.
- Updated `PdfDocumentProcessor` to utilize the new mapping prompt, processing each sentence individually for improved content masking.
2025-05-08 00:04:50 +08:00
oliviamn 7d0be5aa8a Abstract out the prompts 2025-05-06 00:13:19 +08:00
oliviamn 815427a509 Write files to the hidden .work directory under the output folder 2025-05-05 23:34:10 +08:00
oliviamn e6fb9b9a83 Adjust directory structure 2025-05-05 20:33:08 +08:00
oliviamn edca9a87a0 Refactor PdfDocumentProcessor to enhance PDF content processing
- Updated read_content method to return raw bytes instead of extracted text.
- Modified process_content method to handle bytes and generate multiple output files including markdown, JSON, and processed PDFs.
- Implemented directory setup for image storage and output management.
- Integrated PymuDocDataset for PDF classification and processing based on OCR capabilities.
2025-05-05 19:15:03 +08:00
oliviamn 6acf3e5423 Update requirements.txt to upgrade requests and add magic-pdf dependency 2025-05-05 18:53:22 +08:00
tigermren 592fb66f40 Enhance document processing with Ollama integration and update .gitignore
- Added OllamaClient for document processing in TxtDocumentProcessor.
- Updated process_content method to use Ollama API for content masking.
- Refactored FileMonitor to utilize DocumentService with OllamaClient.
- Removed unnecessary log files and Python cache files.
- Added test file for document processing validation.
2025-04-23 01:09:33 +08:00
tigermren fc68c243bb add gitignore 2025-04-23 00:06:39 +08:00
tigermren 0904ab5073 Initial commit: Document processing app with Ollama integration 2025-04-23 00:02:10 +08:00