Library Scan Enhancement Requirements

📋 Current State Analysis

Existing Capabilities

  • File Discovery: Recursive scanning of library paths using glob patterns
  • Multi-format Support: Videos (9 formats), Photos (8 formats), Text files (18 formats)
  • Thumbnail Generation: FFmpeg-based with hashed folder structure
  • Video Analysis: Codec detection and transcoding requirement analysis
  • Database Integration: Complete media metadata storage with proper indexing
  • Batch Processing: Both individual library and bulk scanning options

Critical Gaps

  1. No File Deletion Handling: Deleted files remain in database as orphaned records
  2. No Thumbnail Verification: Missing/corrupted thumbnails aren't regenerated on re-scan

🎯 Enhanced Requirements

Requirement 1: File Deletion Cleanup

Description: Automatically detect and remove database entries for files that no longer exist on disk

Priority: 🔴 P0 - Critical

Acceptance Criteria:

  • Compare database records with actual file system state
  • Identify orphaned database entries (records whose file no longer exists on disk)
  • Remove orphaned entries from database
  • Log cleanup actions to console
  • Handle errors gracefully (continue scan if cleanup fails)

Technical Requirements:

  • File existence verification using fs.access() or fs.stat()
  • Delete operation for each orphaned record
  • Error logging for debugging
  • No transaction rollback needed (simple delete operations)
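A minimal sketch of the cleanup step described above, assuming a Node.js runtime. The `MediaRecord` shape and the `deleteRecord` callback are hypothetical stand-ins for the project's existing database layer, not its actual API. One design choice worth noting: only an `ENOENT` error is treated as "file deleted", so a permissions problem does not cause records to be removed.

```typescript
import { access } from "node:fs/promises";

// Hypothetical record shape; the real media table may carry more columns.
interface MediaRecord {
  id: number;
  path: string;
}

// True only when the file is confirmed absent (ENOENT). Other errors, such as
// EACCES, are treated as "still present" so they never trigger a delete.
async function isMissing(filePath: string): Promise<boolean> {
  try {
    await access(filePath);
    return false;
  } catch (err) {
    return (err as { code?: string }).code === "ENOENT";
  }
}

// Deletes orphaned records one at a time and returns how many were removed.
// `deleteRecord` stands in for whatever delete call the project already has.
async function cleanupOrphans(
  records: MediaRecord[],
  deleteRecord: (id: number) => Promise<void> | void,
): Promise<number> {
  let removed = 0;
  for (const record of records) {
    try {
      if (await isMissing(record.path)) {
        await deleteRecord(record.id);
        removed += 1;
        console.log(`[scan] removed orphaned record ${record.id}: ${record.path}`);
      }
    } catch (err) {
      // A failed delete is logged and skipped so the scan can continue.
      console.error(`[scan] cleanup failed for ${record.path}:`, err);
    }
  }
  return removed;
}
```

The returned count could feed the `filesRemoved` statistic in the scan response.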

User Stories:

  • As a user, when I delete files from my library folder, I want them automatically removed from the database during the next scan
  • As a user, I want the database to accurately reflect what's actually on disk

Requirement 2: Thumbnail Recovery

Description: Detect and regenerate missing thumbnail files during library scan

Priority: 🔴 P0 - Critical

Acceptance Criteria:

  • Verify thumbnail file existence for each media record
  • Detect missing thumbnail files (path exists in DB but file missing on disk)
  • Regenerate missing thumbnails during scan
  • Continue processing if thumbnail generation fails (use fallback)
  • Log thumbnail regeneration actions

Technical Requirements:

  • Thumbnail file validation using fs.stat()
  • Re-use existing thumbnail generation logic
  • Handle thumbnail generation failures gracefully
  • Use existing fallback thumbnail mechanism
  • No additional database fields needed
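The verification step could look roughly like the following sketch. The `regenerate` and `useFallback` callbacks are hypothetical placeholders for the existing thumbnail generation and fallback logic, and the `MediaRecord` shape is assumed from the `path` and `thumbnail` fields this document mentions.

```typescript
import { stat } from "node:fs/promises";

// Hypothetical record shape based on the path and thumbnail fields.
interface MediaRecord {
  id: number;
  path: string;
  thumbnail: string | null;
}

type ThumbnailOutcome = "ok" | "regenerated" | "fallback";

// A thumbnail counts as present only if the stored path points at a regular file.
async function thumbnailExists(thumbnailPath: string | null): Promise<boolean> {
  if (!thumbnailPath) return false;
  try {
    const info = await stat(thumbnailPath);
    return info.isFile();
  } catch {
    return false;
  }
}

// Checks one record and regenerates its thumbnail if needed.
// `regenerate` and `useFallback` are assumed wrappers around existing code.
async function verifyThumbnail(
  record: MediaRecord,
  regenerate: (record: MediaRecord) => Promise<void>,
  useFallback: (record: MediaRecord) => Promise<void>,
): Promise<ThumbnailOutcome> {
  if (await thumbnailExists(record.thumbnail)) return "ok";
  try {
    await regenerate(record);
    console.log(`[scan] regenerated thumbnail for ${record.path}`);
    return "regenerated";
  } catch (err) {
    // Generation failed: log it and switch to the fallback thumbnail.
    console.error(`[scan] thumbnail generation failed for ${record.path}:`, err);
    await useFallback(record);
    return "fallback";
  }
}
```

Counting the `"regenerated"` outcomes across all records would give the `thumbnailsRegenerated` statistic.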

User Stories:

  • As a user, when thumbnails are accidentally deleted, I want them automatically regenerated during the next scan
  • As a user, when thumbnail generation previously failed, I want the next scan to attempt it again

🏗️ Technical Architecture Requirements

Database Schema

No schema changes required. The existing tables are sufficient:

  • media table already has path and thumbnail fields
  • No new fields needed

Scan Process Flow

1. File Discovery (existing)
   ├── Scan library path for media files
   └── Get existing database records

2. File Deletion Cleanup (NEW)
   ├── For each database record:
   │   ├── Check if file exists on disk
   │   └── If not: DELETE from database
   └── Log cleanup actions

3. File Processing (existing + enhanced)
   └── For each discovered file:
       ├── Check if already in database (existing)
       ├── If new: Insert and generate thumbnail (existing)
       └── If exists: Verify thumbnail (NEW)

4. Thumbnail Verification (NEW)
   └── For each existing media record:
       ├── Check if thumbnail file exists
       ├── If missing: Regenerate thumbnail
       ├── If generation fails: Use fallback
       └── Log regeneration actions
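To illustrate step 3 of the flow, the split between new files and files that only need thumbnail verification can be expressed as a small pure function. This is a sketch under assumptions: `planFileProcessing`, `ScanPlan`, and the `MediaRecord` shape are hypothetical names, and matching records by exact `path` string is assumed rather than confirmed.

```typescript
// Hypothetical record shape based on the path and thumbnail fields.
interface MediaRecord {
  id: number;
  path: string;
  thumbnail: string | null;
}

interface ScanPlan {
  toInsert: string[]; // discovered files with no database record yet
  toVerify: MediaRecord[]; // discovered files already in the database
}

// Splits discovered files into "new" and "already known" by exact path match.
function planFileProcessing(
  discoveredPaths: string[],
  existing: MediaRecord[],
): ScanPlan {
  const byPath = new Map<string, MediaRecord>();
  for (const record of existing) byPath.set(record.path, record);

  const plan: ScanPlan = { toInsert: [], toVerify: [] };
  for (const filePath of discoveredPaths) {
    const record = byPath.get(filePath);
    if (record) plan.toVerify.push(record);
    else plan.toInsert.push(filePath);
  }
  return plan;
}
```

Because the function touches neither the disk nor the database, it is straightforward to cover with unit tests.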

API Enhancements

No new API endpoints needed. The existing scan endpoint is enhanced:

// Use existing endpoint
POST /api/scan

// No request body changes
{
  "libraryId": number  // Optional: specific library
}

// Response includes new statistics
{
  "success": true,
  "message": "Scan completed",
  "stats": {
    "filesProcessed": number,
    "filesAdded": number,
    "filesRemoved": number,        // NEW
    "thumbnailsRegenerated": number // NEW
  }
}
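The request and response shapes above could be captured as TypeScript types along these lines. The interface names and the `buildScanResponse` helper are illustrative assumptions; only the field names come from the specification above.

```typescript
// Request body for POST /api/scan; omitting libraryId scans every library.
interface ScanRequest {
  libraryId?: number;
}

interface ScanStats {
  filesProcessed: number;
  filesAdded: number;
  filesRemoved: number; // NEW: orphaned records deleted
  thumbnailsRegenerated: number; // NEW: missing thumbnails rebuilt
}

interface ScanResponse {
  success: boolean;
  message: string;
  stats: ScanStats;
}

// Hypothetical helper: fills any counter the scan did not report with zero.
function buildScanResponse(stats: Partial<ScanStats>): ScanResponse {
  return {
    success: true,
    message: "Scan completed",
    stats: {
      filesProcessed: 0,
      filesAdded: 0,
      filesRemoved: 0,
      thumbnailsRegenerated: 0,
      ...stats,
    },
  };
}
```

Defaulting the new counters to zero keeps the response shape stable for existing clients that only read `filesProcessed` and `filesAdded`.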

📊 Implementation Priority

Feature                          Priority   Effort          Impact
File Deletion Detection          🔴 P0      Medium (3-4h)   Critical
Missing Thumbnail Regeneration   🔴 P0      Medium (3-4h)   Critical

Total Estimated Time: 6-8 hours


🎯 Success Metrics

Functional Metrics

  • Database Accuracy: 100% of records for deleted files are removed from the database after a re-scan
  • Thumbnail Recovery: >90% of missing thumbnails regenerated successfully
  • Error Tolerance: Scan completes even if individual files fail

Quality Metrics

  • No Regressions: Existing scan functionality works as before
  • Error Handling: Individual file failures don't stop entire scan
  • Logging: All cleanup and thumbnail regeneration actions are logged for debugging

🔍 Non-Requirements

The following are explicitly excluded from this enhancement:

  • Real-time progress reporting / WebSocket updates
  • Scan session tracking / history
  • Concurrent processing / worker threads
  • Incremental scanning (only changed files)
  • Content-based duplicate detection
  • Advanced error recovery / retry mechanisms
  • Soft delete / undo functionality
  • Performance optimizations beyond current implementation
  • UI changes / progress bars
  • Database transactions (use simple operations)

📝 Technical Constraints

  1. Backward Compatibility: Must work with existing database schema
  2. Simple Implementation: No complex architectural changes
  3. Error Tolerance: Individual failures should not stop scan
  4. Minimal Dependencies: Use existing libraries and utilities
  5. Code Reuse: Leverage existing thumbnail generation code

🧪 Testing Requirements

Manual Testing Scenarios

  1. File Deletion Test

    • Add files to library and scan
    • Delete some files from disk
    • Re-scan library
    • Verify deleted files removed from database

  2. Thumbnail Recovery Test

    • Add files to library and scan
    • Delete thumbnail files from disk
    • Re-scan library
    • Verify thumbnails regenerated

  3. Error Handling Test

    • Create files that cause thumbnail failures
    • Run scan
    • Verify scan completes despite failures

Unit Tests

  • Test file existence checking
  • Test thumbnail file verification
  • Test database deletion operations
  • Test error handling

Document Status: Complete
Implementation Scope: Focused on 2 core requirements
Estimated Time: 6-8 hours
Last Updated: October 14, 2025

Next Steps: Review architecture design document for technical implementation details.