Commit Graph

2 Commits

Author SHA1 Message Date
oliviamn edca9a87a0 Refactor PdfDocumentProcessor to enhance PDF content processing
- Updated read_content method to return raw bytes instead of extracted text.
- Modified process_content method to handle bytes and generate multiple output files including markdown, JSON, and processed PDFs.
- Implemented directory setup for image storage and output management.
- Integrated PymuDocDataset for PDF classification and processing based on OCR capabilities.
2025-05-05 19:15:03 +08:00
tigermren 0904ab5073 Initial commit: Document processing app with Ollama integration 2025-04-23 00:02:10 +08:00
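
The interface change described in commit edca9a87a0 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only the names PdfDocumentProcessor, read_content, and process_content come from the commit message. The constructor arguments, the setup_directories helper, the "images" subdirectory, and the output file names are assumptions made for the example. The PymuDocDataset classification step is left as a comment because its usage is not shown in the log.

```python
from pathlib import Path


class PdfDocumentProcessor:
    """Hypothetical sketch; only the class and method names come from the commit message."""

    def __init__(self, input_path, output_dir):
        # Both parameters are assumptions for illustration.
        self.input_path = Path(input_path)
        self.output_dir = Path(output_dir)
        self.image_dir = self.output_dir / "images"  # assumed layout

    def read_content(self) -> bytes:
        # Return the raw PDF bytes instead of extracted text.
        return self.input_path.read_bytes()

    def setup_directories(self) -> None:
        # Create the output directory and the image directory beneath it.
        self.image_dir.mkdir(parents=True, exist_ok=True)

    def process_content(self, content: bytes) -> dict:
        if not isinstance(content, (bytes, bytearray)):
            raise TypeError("process_content expects raw bytes")
        self.setup_directories()
        stem = self.input_path.stem
        # The commit classifies the PDF with PymuDocDataset here to choose
        # between an OCR path and a text path; that step is omitted.
        # This sketch only plans the output paths and writes nothing.
        return {
            "markdown": self.output_dir / f"{stem}.md",
            "json": self.output_dir / f"{stem}.json",
            "pdf": self.output_dir / f"{stem}_processed.pdf",
        }
```

A caller would pass the result of read_content() straight into process_content() and receive the planned markdown, JSON, and processed PDF paths.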