Library Scan Enhancement Requirements
📋 Current State Analysis
✅ Existing Capabilities
- File Discovery: Recursive scanning of library paths using glob patterns
- Multi-format Support: Videos (9 formats), Photos (8 formats), Text files (18 formats)
- Thumbnail Generation: FFmpeg-based with hashed folder structure
- Video Analysis: Codec detection and transcoding requirement analysis
- Database Integration: Complete media metadata storage with proper indexing
- Batch Processing: Both individual library and bulk scanning options
❌ Critical Gaps
- No File Deletion Handling: Deleted files remain in database as orphaned records
- No Thumbnail Verification: Missing/corrupted thumbnails aren't regenerated on re-scan
🎯 Enhanced Requirements
Requirement 1: File Deletion Cleanup
Description: Automatically detect and remove database entries for files that no longer exist on disk
Priority: 🔴 P0 - Critical
Acceptance Criteria:
- Compare database records with actual file system state
- Identify orphaned database entries (files that exist in DB but not on disk)
- Remove orphaned entries from database
- Log cleanup actions to console
- Handle errors gracefully (continue scan if cleanup fails)
Technical Requirements:
- File existence verification using fs.access() or fs.stat()
- Delete operation for each orphaned record
- Error logging for debugging
- No transaction rollback needed (simple delete operations)
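The cleanup step could be sketched as follows, assuming Node.js and the fs.access() check named above. The record shape (`id`, `path`) and the `deleteRecord` callback are hypothetical stand-ins for the project's actual database layer, and treating only `ENOENT` as "file gone" is a conservative assumption rather than a stated requirement.

```javascript
const fs = require('node:fs/promises');

// Resolves to true when the path is accessible, false when it does not exist.
// Other errors (for example EACCES) are rethrown so that an unreadable
// file is not mistaken for a deleted one. This is an assumption of the sketch.
async function fileExists(filePath) {
  try {
    await fs.access(filePath);
    return true;
  } catch (err) {
    if (err.code === 'ENOENT') return false;
    throw err;
  }
}

// Deletes database records whose files no longer exist on disk.
// `records` is an array of { id, path }; `deleteRecord(id)` is a hypothetical
// callback wrapping the project's real delete query.
async function removeOrphanedRecords(records, deleteRecord) {
  let filesRemoved = 0;
  for (const record of records) {
    try {
      if (await fileExists(record.path)) continue;
      await deleteRecord(record.id);
      filesRemoved += 1;
      console.log(`[scan] Removed orphaned record ${record.id}: ${record.path}`);
    } catch (err) {
      // Log and keep going so one failed check or delete does not abort the scan.
      console.error(`[scan] Cleanup failed for ${record.path}:`, err);
    }
  }
  return filesRemoved;
}
```

The returned count can feed the `filesRemoved` statistic in the scan response. Each record is deleted independently, which matches the "no transaction rollback" constraint.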
User Stories:
- As a user, when I delete files from my library folder, I want them automatically removed from the database during the next scan
- As a user, I want the database to accurately reflect what's actually on disk
Requirement 2: Thumbnail Recovery
Description: Detect and regenerate missing thumbnail files during library scan
Priority: 🔴 P0 - Critical
Acceptance Criteria:
- Verify thumbnail file existence for each media record
- Detect missing thumbnail files (path exists in DB but file missing on disk)
- Regenerate missing thumbnails during scan
- Continue processing if thumbnail generation fails (use fallback)
- Log thumbnail regeneration actions
Technical Requirements:
- Thumbnail file validation using fs.stat()
- Re-use existing thumbnail generation logic
- Handle thumbnail generation failures gracefully
- Use existing fallback thumbnail mechanism
- No additional database fields needed
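A minimal sketch of the verification step, assuming Node.js and the fs.stat() check named above. `generateThumbnail` and `applyFallback` are hypothetical hooks standing in for the existing generation logic and fallback mechanism. Treating a zero-byte file as missing is an assumption of this sketch.

```javascript
const fs = require('node:fs/promises');

// True when the stored thumbnail path points at a non-empty regular file.
async function thumbnailIsValid(thumbnailPath) {
  if (!thumbnailPath) return false;
  try {
    const stats = await fs.stat(thumbnailPath);
    return stats.isFile() && stats.size > 0;
  } catch {
    return false;
  }
}

// Regenerates thumbnails that are missing on disk.
// `records` is an array of { id, path, thumbnail }.
// `generateThumbnail(record)` and `applyFallback(record)` are hypothetical
// hooks for the project's existing code.
async function recoverThumbnails(records, generateThumbnail, applyFallback) {
  let thumbnailsRegenerated = 0;
  for (const record of records) {
    if (await thumbnailIsValid(record.thumbnail)) continue;
    try {
      await generateThumbnail(record);
      thumbnailsRegenerated += 1;
      console.log(`[scan] Regenerated thumbnail for ${record.path}`);
    } catch (err) {
      console.error(`[scan] Thumbnail generation failed for ${record.path}:`, err);
      try {
        await applyFallback(record);
      } catch (fallbackErr) {
        // Even a failing fallback must not stop the scan.
        console.error(`[scan] Fallback failed for ${record.path}:`, fallbackErr);
      }
    }
  }
  return thumbnailsRegenerated;
}
```

Only successful regenerations are counted, so the result maps directly onto the `thumbnailsRegenerated` statistic.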
User Stories:
- As a user, when thumbnails are accidentally deleted, I want them automatically regenerated during the next scan
- As a user, when thumbnail generation previously failed, I want the scan to retry automatically
🏗️ Technical Architecture Requirements
Database Schema
No schema changes required - Use existing tables:
- media table already has path and thumbnail fields
- No new fields needed
Scan Process Flow
1. File Discovery (existing)
├── Scan library path for media files
└── Get existing database records
2. File Deletion Cleanup (NEW)
├── For each database record:
│ ├── Check if file exists on disk
│ └── If not: DELETE from database
└── Log cleanup actions
3. File Processing (existing + enhanced)
├── For each discovered file:
│ ├── Check if already in database (existing)
│ ├── If new: Insert and generate thumbnail (existing)
│ └── If exists: Verify thumbnail (NEW)
4. Thumbnail Verification (NEW)
├── For each existing media record:
│ ├── Check if thumbnail file exists
│ ├── If missing: Regenerate thumbnail
│ ├── If generation fails: Use fallback
│ └── Log regeneration actions
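The four phases above could be wired together roughly like this. Every function on the `steps` object is a hypothetical hook, not an existing function in the codebase, and the sketch assumes that paths stored in the database compare equal as strings to the paths found on disk.

```javascript
// Runs the scan phases in order and collects the counters.
// All members of `steps` are hypothetical hooks supplied by the caller.
async function runScan(steps) {
  const stats = {
    filesProcessed: 0,
    filesAdded: 0,
    filesRemoved: 0,
    thumbnailsRegenerated: 0,
  };

  // 1. File discovery: files on disk plus the records already stored.
  const diskPaths = await steps.discoverFiles();
  const dbRecords = await steps.loadRecords();

  // 2. File deletion cleanup: a failure here is logged, the scan continues.
  try {
    stats.filesRemoved = await steps.removeOrphans(dbRecords);
  } catch (err) {
    console.error('[scan] Cleanup phase failed:', err);
  }

  // 3. File processing: insert files that are not in the database yet.
  const knownPaths = new Set(dbRecords.map((record) => record.path));
  for (const filePath of diskPaths) {
    stats.filesProcessed += 1;
    if (knownPaths.has(filePath)) continue;
    try {
      await steps.insertFile(filePath);
      stats.filesAdded += 1;
    } catch (err) {
      console.error(`[scan] Failed to add ${filePath}:`, err);
    }
  }

  // 4. Thumbnail verification: only for records whose file is still on disk.
  const onDisk = new Set(diskPaths);
  const existing = dbRecords.filter((record) => onDisk.has(record.path));
  try {
    stats.thumbnailsRegenerated = await steps.recoverThumbnails(existing);
  } catch (err) {
    console.error('[scan] Thumbnail phase failed:', err);
  }

  return stats;
}
```

Passing the phases in as functions keeps the sketch independent of the database and FFmpeg code and makes each phase easy to stub in tests.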
API Enhancements
No new API endpoints needed - Enhance existing scan endpoint:
// Use existing endpoint
POST /api/scan
// No request body changes
{
  "libraryId": number // Optional: specific library
}
// Response includes new statistics
{
  "success": true,
  "message": "Scan completed",
  "stats": {
    "filesProcessed": number,
    "filesAdded": number,
    "filesRemoved": number, // NEW
    "thumbnailsRegenerated": number // NEW
  }
}
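For illustration, a caller could read the new statistics as sketched below. The `baseUrl` parameter and the injectable `fetchImpl` are assumptions of this example; only the POST /api/scan route, the optional `libraryId`, and the `stats` field names come from this document.

```javascript
// Calls the existing scan endpoint and returns its statistics.
// `baseUrl` and `fetchImpl` are illustrative; fetchImpl defaults to the
// global fetch, which is available in Node.js 18 and later.
async function triggerScan(baseUrl, libraryId, fetchImpl = globalThis.fetch) {
  // Omit libraryId entirely to scan all libraries.
  const body = libraryId === undefined ? {} : { libraryId };
  const res = await fetchImpl(`${baseUrl}/api/scan`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Scan request failed with HTTP ${res.status}`);
  }
  const data = await res.json();
  // Default the two new counters to 0 so the caller also works against
  // a server that does not report them yet.
  return { filesRemoved: 0, thumbnailsRegenerated: 0, ...data.stats };
}
```

Because the request body is unchanged and the new fields are additive, existing callers that ignore `filesRemoved` and `thumbnailsRegenerated` should keep working.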
📊 Implementation Priority
| Feature | Priority | Effort | Impact |
|---|---|---|---|
| File Deletion Detection | 🔴 P0 | Medium (3-4h) | Critical |
| Missing Thumbnail Regeneration | 🔴 P0 | Medium (3-4h) | Critical |
Total Estimated Time: 6-8 hours
🎯 Success Metrics
Functional Metrics
- Database Accuracy: 100% of deleted files removed from database
- Thumbnail Recovery: >90% of missing thumbnails regenerated successfully
- Error Tolerance: Scan completes even if individual files fail
Quality Metrics
- No Regressions: Existing scan functionality works as before
- Error Handling: Individual file failures don't stop entire scan
- Logging: All actions logged for debugging
🔍 Non-Requirements
The following are explicitly excluded from this enhancement:
- ❌ Real-time progress reporting / WebSocket updates
- ❌ Scan session tracking / history
- ❌ Concurrent processing / worker threads
- ❌ Incremental scanning (only changed files)
- ❌ Content-based duplicate detection
- ❌ Advanced error recovery / retry mechanisms
- ❌ Soft delete / undo functionality
- ❌ Performance optimizations beyond current implementation
- ❌ UI changes / progress bars
- ❌ Database transactions (use simple operations)
📝 Technical Constraints
- Backward Compatibility: Must work with existing database schema
- Simple Implementation: No complex architectural changes
- Error Tolerance: Individual failures should not stop scan
- Minimal Dependencies: Use existing libraries and utilities
- Code Reuse: Leverage existing thumbnail generation code
🧪 Testing Requirements
Manual Testing Scenarios
- File Deletion Test
  - Add files to library and scan
  - Delete some files from disk
  - Re-scan library
  - Verify deleted files removed from database
- Thumbnail Recovery Test
  - Add files to library and scan
  - Delete thumbnail files from disk
  - Re-scan library
  - Verify thumbnails regenerated
- Error Handling Test
  - Create files that cause thumbnail failures
  - Run scan
  - Verify scan completes despite failures
Unit Tests
- Test file existence checking
- Test thumbnail file verification
- Test database deletion operations
- Test error handling
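As an illustration of the first item, a file existence test could look like the sketch below. The project's test runner is not stated here, so the example uses only Node's built-in assert module and a temporary directory. `fileExists` is a hypothetical helper defined inline so that the example is self-contained.

```javascript
const { strictEqual } = require('node:assert');
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper under test: true when the path is accessible.
async function fileExists(filePath) {
  try {
    await fs.access(filePath);
    return true;
  } catch {
    return false;
  }
}

// Exercises the helper against a real temporary directory.
async function testFileExistenceChecking() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'scan-test-'));
  try {
    const present = path.join(dir, 'present.jpg');
    await fs.writeFile(present, 'data');

    // An existing file is reported as present.
    strictEqual(await fileExists(present), true);
    // A path that was never created is reported as missing.
    strictEqual(await fileExists(path.join(dir, 'missing.jpg')), false);
    // A file deleted after the first check is reported as missing.
    await fs.rm(present);
    strictEqual(await fileExists(present), false);
  } finally {
    // Always clean up the temporary directory.
    await fs.rm(dir, { recursive: true, force: true });
  }
}

testFileExistenceChecking().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

The same temporary-directory pattern extends to thumbnail verification. The database deletion tests would need a stub or test database, which is outside this sketch.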
Document Status: ✅ Complete
Implementation Scope: Focused on 2 core requirements
Estimated Time: 6-8 hours
Last Updated: October 14, 2025
Next Steps: Review architecture design document for technical implementation details.