dev #2

Merged
tigeren merged 21 commits from dev into main 2025-08-20 02:20:42 +00:00

21 Commits

Author SHA1 Message Date
tigermren 84f9ef2b18 feat: add the previously missed ID card numbers and social security numbers 2025-08-20 00:11:56 +08:00
tigermren a001c26e8d feat: improve company name simplification performance 2025-08-19 23:28:56 +08:00
tigerenwork eb33dc137e feat: improve chunking to avoid truncation 2025-08-19 17:43:05 +08:00
tigerenwork ffa31d33de feat: filter out low-confidence entities 2025-08-19 17:26:30 +08:00
tigerenwork 24f452818a feat: update the replacement algorithm to handle matched tokens that contain spaces 2025-08-19 16:08:49 +08:00
tigermren 40dd0de1b3 feat: improve NER chunking 2025-08-19 02:15:05 +08:00
tigermren d446ac1854 feat: use an NER model for recognition 2025-08-19 01:36:08 +08:00
tigermren 2075218955 feat: officially support docx in full 2025-08-18 01:15:40 +08:00
tigermren afddcf4dd7 fix: resolve issues with the magic-doc package 2025-08-18 01:01:58 +08:00
tigermren 0820d7bba2 feat: add magicdoc 2025-08-18 00:40:39 +08:00
tigermren a16b69475e refine: tidy up files 2025-08-17 23:33:56 +08:00
tigermren 84499f52ea feat: display error messages 2025-08-17 23:26:59 +08:00
tigermren 256e263cff feat: enable docx parsing (not yet supported by mineru-api) 2025-08-17 23:12:45 +08:00
tigermren 1138683da1 refine: adjust Docker setup 2025-08-17 20:16:07 +08:00
tigermren c85e166208 feat: refactor ollama with built-in retry logic and schema validation 2025-08-17 20:09:00 +08:00
tigermren 70b6617c5e refine: restructure documentation 2025-08-17 20:02:37 +08:00
tigermren 1dd2f3884c refine: new masking rules for ID card numbers and social security codes 2025-08-17 15:59:12 +08:00
tigermren 2c985bc963 feat: address masking hides house numbers, streets, residential compounds, etc. 2025-08-17 15:30:52 +08:00
tigermren 437e010aee feat: configure the test runner 2025-08-17 14:11:29 +08:00
tigermren b3be522358 feat: mask company names 2025-08-17 13:56:25 +08:00
tigermren 2c4ecfd6b0 feat: mask Chinese names by the pinyin initials of the surname and given name 2025-08-16 16:37:24 +08:00