legal-doc-masker

Commit Graph

Author	SHA1	Message	Date
oliviamn	b3cf9f98a7	refine	2025-05-25 16:45:48 +08:00
oliviamn	24c5bbd5d7	refine: remove the document data folder and replace it with sample_doc	2025-05-25 16:43:32 +08:00
oliviamn	13ef24a3da	feat: add frontend	2025-05-25 00:37:20 +08:00
oliviamn	900a614b09	refine: fix import path issues	2025-05-25 00:04:19 +08:00
oliviamn	3e9c44e8c4	refine: copy the contents of the original src into backend/app/core	2025-05-24 23:28:33 +08:00
oliviamn	e0695e7f0e	refine: src rename to core	2025-05-24 22:13:20 +08:00
oliviamn	76b0351f8f	feat: add backend	2025-05-24 22:06:28 +08:00
oliviamn	47e78c35bb	Add Markdown document processing support and enhance document handling - Introduced `MarkdownDocumentProcessor` for handling markdown files, including reading and saving content. - Updated `DocumentProcessorFactory` to include support for markdown file types. - Enhanced existing document processors to utilize a shared initialization method for OllamaClient. - Implemented chunking and mapping logic in `DocumentProcessor` for improved content processing and masking. - Added utility class `LLMJsonExtractor` for extracting and parsing JSON from LLM outputs.	2025-05-24 21:05:48 +08:00
oliviamn	caa4d6d2ef	Update README.md to clarify installation steps and add LibreOffice dependency	2025-05-24 14:55:04 +08:00
oliviamn	5abfa4998d	Implement docx-to-md conversion	2025-05-21 00:15:01 +08:00
oliviamn	0f158c159b	Enhance PDF content masking by introducing mapping prompts - Added a new function `get_masking_mapping_prompt` to generate prompts for creating a mapping of original names/companies to their masked versions. - Updated `PdfDocumentProcessor` to utilize the new mapping prompt, processing each sentence individually for improved content masking.	2025-05-08 00:04:50 +08:00
oliviamn	7d0be5aa8a	Abstract out the prompts	2025-05-06 00:13:19 +08:00
oliviamn	815427a509	Write files to the hidden .work directory under the output folder	2025-05-05 23:34:10 +08:00
oliviamn	e6fb9b9a83	Adjust directory structure	2025-05-05 20:33:08 +08:00
oliviamn	edca9a87a0	Refactor PdfDocumentProcessor to enhance PDF content processing - Updated read_content method to return raw bytes instead of extracted text. - Modified process_content method to handle bytes and generate multiple output files including markdown, JSON, and processed PDFs. - Implemented directory setup for image storage and output management. - Integrated PymuDocDataset for PDF classification and processing based on OCR capabilities.	2025-05-05 19:15:03 +08:00
oliviamn	6acf3e5423	Update requirements.txt to upgrade requests and add magic-pdf dependency	2025-05-05 18:53:22 +08:00
tigermren	592fb66f40	Enhance document processing with Ollama integration and update .gitignore - Added OllamaClient for document processing in TxtDocumentProcessor. - Updated process_content method to use Ollama API for content masking. - Refactored FileMonitor to utilize DocumentService with OllamaClient. - Removed unnecessary log files and Python cache files. - Added test file for document processing validation.	2025-04-23 01:09:33 +08:00
tigermren	fc68c243bb	add gitignore	2025-04-23 00:06:39 +08:00
tigermren	0904ab5073	Initial commit: Document processing app with Ollama integration	2025-04-23 00:02:10 +08:00

19 Commits, All Branches