legal-doc-masker

Commit Graph

Author	SHA1	Message	Date
oliviamn	0f158c159b	Enhance PDF content masking by introducing mapping prompts - Added a new function `get_masking_mapping_prompt` to generate prompts for creating a mapping of original names/companies to their masked versions. - Updated `PdfDocumentProcessor` to utilize the new mapping prompt, processing each sentence individually for improved content masking.	2025-05-08 00:04:50 +08:00
oliviamn	7d0be5aa8a	Abstract the prompts out into their own module	2025-05-06 00:13:19 +08:00
oliviamn	815427a509	Write files to the hidden .work directory under the output folder	2025-05-05 23:34:10 +08:00
oliviamn	e6fb9b9a83	Restructure the directory layout	2025-05-05 20:33:08 +08:00
oliviamn	edca9a87a0	Refactor PdfDocumentProcessor to enhance PDF content processing - Updated read_content method to return raw bytes instead of extracted text. - Modified process_content method to handle bytes and generate multiple output files including markdown, JSON, and processed PDFs. - Implemented directory setup for image storage and output management. - Integrated PymuDocDataset for PDF classification and processing based on OCR capabilities.	2025-05-05 19:15:03 +08:00
oliviamn	6acf3e5423	Update requirements.txt to upgrade requests and add magic-pdf dependency	2025-05-05 18:53:22 +08:00
tigermren	592fb66f40	Enhance document processing with Ollama integration and update .gitignore - Added OllamaClient for document processing in TxtDocumentProcessor. - Updated process_content method to use Ollama API for content masking. - Refactored FileMonitor to utilize DocumentService with OllamaClient. - Removed unnecessary log files and Python cache files. - Added test file for document processing validation.	2025-04-23 01:09:33 +08:00
tigermren	fc68c243bb	add gitignore	2025-04-23 00:06:39 +08:00
tigermren	0904ab5073	Initial commit: Document processing app with Ollama integration	2025-04-23 00:02:10 +08:00

9 Commits, All Branches