# Library Scan Enhancement Architecture

## 🏗️ **System Architecture Overview**

### **Simplified Scan Enhancement**

```
┌────────────────────────────────────────────────────────────┐
│                   Enhanced Scanner                         │
│                   (scanner.ts)                             │
├────────────────────────────────────────────────────────────┤
│  1. File Discovery           (existing)                    │
│  2. File Deletion Cleanup    (NEW)                         │
│  3. File Processing          (existing)                    │
│  4. Thumbnail Verification   (NEW)                         │
└────────────────────────────────────────────────────────────┘
```

**Design Philosophy**: Make minimal changes to the existing scanner and add two new verification steps.

---

## 🔧 **Component Enhancements**

### **Enhanced Scanner Flow**

```typescript
// File: src/lib/scanner.ts

const scanLibrary = async (library: { id: number; path: string }) => {
  const db = getDatabase();

  // 1. FILE DISCOVERY (existing)
  const allFiles = await glob(`${library.path}/**/*.*`, { nodir: true });
  // ... existing filtering logic splits allFiles into the three lists below ...
  const mediaFiles = [...filteredVideoFiles, ...filteredPhotoFiles, ...filteredTextFiles];

  // 2. FILE DELETION CLEANUP (NEW)
  await cleanupDeletedFiles(db, library.id, mediaFiles);

  // 3. FILE PROCESSING (existing + enhanced)
  for (const file of mediaFiles) {
    const existingMedia = db.prepare("SELECT * FROM media WHERE path = ?").get(file);

    if (existingMedia) {
      // 4. THUMBNAIL VERIFICATION (NEW)
      await verifyAndRegenerateThumbnail(existingMedia);
      continue;
    }

    // Existing: Insert new media with thumbnail generation
  }
};
```
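The flow above relies on the existing filtering step that narrows the glob results to video, photo, and text files. A minimal sketch of what such a filter might look like follows; the extension lists and the names `classifyFile` and `filterMediaFiles` are assumptions for illustration, not the scanner's actual configuration, and paths are assumed to use `/` separators.

```typescript
// Extension lists are illustrative assumptions, not the scanner's real configuration.
const VIDEO_EXTENSIONS = new Set(["mp4", "mkv", "webm"]);
const PHOTO_EXTENSIONS = new Set(["jpg", "jpeg", "png", "gif"]);
const TEXT_EXTENSIONS = new Set(["txt", "md"]);

type MediaType = "video" | "photo" | "text";

// Return the media type for a path, or null when the extension is not recognized.
// Assumes forward-slash separators in the incoming paths.
function classifyFile(filePath: string): MediaType | null {
  const name = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".hidden"
  const ext = name.slice(dot + 1).toLowerCase();
  if (VIDEO_EXTENSIONS.has(ext)) return "video";
  if (PHOTO_EXTENSIONS.has(ext)) return "photo";
  if (TEXT_EXTENSIONS.has(ext)) return "text";
  return null;
}

// Keep only recognized media files, preserving discovery order.
function filterMediaFiles(allFiles: string[]): string[] {
  return allFiles.filter((file) => classifyFile(file) !== null);
}
```

Matching on a lowercased extension keeps `Movie.MKV` and `movie.mkv` in the same bucket, which matters on case-preserving filesystems.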


### **1. File Deletion Cleanup** (NEW)

**Purpose**: Remove database entries for files that no longer exist on disk

**Implementation**:
```typescript
async function cleanupDeletedFiles(
  db: Database, 
  libraryId: number, 
  currentFiles: string[]
): Promise<{ removed: number }> {
  // Get all media records for this library
  const dbRecords = db.prepare(
    "SELECT id, path FROM media WHERE library_id = ?"
  ).all(libraryId) as { id: number; path: string }[];
  
  // Create set of current file paths for fast lookup
  const currentFileSet = new Set(currentFiles);
  
  let removed = 0;
  
  // Check each database record
  for (const record of dbRecords) {
    // If file doesn't exist in current scan
    if (!currentFileSet.has(record.path)) {
      try {
        // Verify file truly doesn't exist on disk
        await fs.access(record.path);
        // File exists but wasn't in scan - possibly outside glob pattern
        continue;
      } catch {
        // File doesn't exist - remove from database
        db.prepare("DELETE FROM media WHERE id = ?").run(record.id);
        console.log(`Removed orphaned record: ${record.path}`);
        removed++;
      }
    }
  }
  
  console.log(`Cleanup complete: ${removed} orphaned records removed`);
  return { removed };
}
```

**Key Features**:
- Double-checks file existence on disk before deletion (safety)
- Keeps records for files that exist on disk but were not matched by the scan
- Logs each deletion for transparency
- Returns statistics for reporting
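
One caveat with the existence check: `fs.access()` can reject for reasons other than a missing file (a permission error or an unmounted drive, for example), and a bare `catch` treats every rejection as a deletion. The sketch below shows one possible stricter decision step, assuming Node-style errors that carry a `code` property. `classifyAccessError` and `planCleanup` are hypothetical names, not part of the existing scanner, and the probe is synchronous only to keep the sketch easy to test.

```typescript
// Hypothetical helpers: names and shapes are illustrative assumptions.
type Existence = "present" | "missing" | "unknown";

interface MediaRow {
  id: number;
  path: string;
}

// Map the outcome of an access check to a tri-state result.
// `error` is null (or undefined) when the check succeeded.
function classifyAccessError(error: unknown): Existence {
  if (error === null || error === undefined) return "present";
  const code = (error as { code?: unknown }).code;
  // ENOENT is the errno code Node reports for a path that does not exist.
  return code === "ENOENT" ? "missing" : "unknown";
}

// Decide which rows are safe to delete, without touching the database.
// `probe` returns the error from the access check, or null on success.
function planCleanup(
  rows: MediaRow[],
  currentFiles: Set<string>,
  probe: (path: string) => unknown
): { deleteIds: number[]; skipped: string[] } {
  const deleteIds: number[] = [];
  const skipped: string[] = [];
  for (const row of rows) {
    if (currentFiles.has(row.path)) continue; // found by the current scan
    const state = classifyAccessError(probe(row.path));
    if (state === "missing") deleteIds.push(row.id);
    else if (state === "unknown") skipped.push(row.path); // keep the record, report it
    // "present": the file exists outside the glob pattern, so keep it silently
  }
  return { deleteIds, skipped };
}
```

In the real scanner the probe would wrap `await fs.access(path)` and return the caught error; records in `skipped` stay in the database until a later scan can confirm their state.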

### **2. Thumbnail Verification** (NEW)

**Purpose**: Detect and regenerate missing thumbnails for existing media

**Implementation**:
```typescript
async function verifyAndRegenerateThumbnail(
  media: MediaRecord
): Promise<{ regenerated: boolean }> {
  // Skip if using fallback thumbnail
  if (media.thumbnail.includes('/fallback/')) {
    return { regenerated: false };
  }

  // Get full path from URL
  const thumbnailPath = getThumbnailPathFromUrl(media.thumbnail);

  try {
    // Check if thumbnail file exists
    await fs.access(thumbnailPath);
    return { regenerated: false }; // Thumbnail exists, no action needed
  } catch {
    // Thumbnail missing - regenerate
    console.log(`Regenerating missing thumbnail for: ${media.path}`);
    const db = getDatabase(); // same accessor scanLibrary uses

    try {
      const { folderPath, fullPath, url } = ThumbnailManager.getThumbnailPath(media.path);
      ThumbnailManager.ensureDirectory(folderPath);

      if (media.type === 'video') {
        await generateVideoThumbnail(media.path, fullPath);
      } else if (media.type === 'photo') {
        await generatePhotoThumbnail(media.path, fullPath);
      }

      // Update database with new thumbnail path
      db.prepare("UPDATE media SET thumbnail = ? WHERE id = ?")
        .run(url, media.id);

      console.log(`Successfully regenerated thumbnail: ${media.path}`);
      return { regenerated: true };
    } catch (error) {
      console.warn(`Failed to regenerate thumbnail for ${media.path}:`, error);

      // Use fallback thumbnail
      const fallbackUrl = ThumbnailManager.getFallbackThumbnailUrl(media.type);
      db.prepare("UPDATE media SET thumbnail = ? WHERE id = ?")
        .run(fallbackUrl, media.id);

      return { regenerated: false };
    }
  }
}

function getThumbnailPathFromUrl(url: string): string {
  // Convert URL like /thumbnails/ab/cd/file.png
  // to full path like /path/to/public/thumbnails/ab/cd/file.png
  return path.join(process.cwd(), 'public', url);
}
```


**Key Features**:
- Verifies file existence before attempting regeneration
- Reuses existing thumbnail generation functions
- Updates database with new thumbnail path
- Falls back to type-based placeholder on failure
- Logs all actions for debugging
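
The branching in `verifyAndRegenerateThumbnail` can also be written as a pure decision step, which makes the edge cases easier to see and test. The sketch below is an illustration under stated assumptions rather than the scanner's actual code: `decideThumbnailAction`, `ThumbnailState`, and `ThumbnailAction` are hypothetical names, and it explicitly handles a null `thumbnail` column and media types without a generator (such as `text`), two cases the implementation does not spell out.

```typescript
// Hypothetical decision helper; names are illustrative assumptions.
type MediaType = "video" | "photo" | "text";
type ThumbnailAction = "keep" | "regenerate" | "use-fallback";

interface ThumbnailState {
  type: MediaType;
  thumbnail: string | null; // URL stored in media.thumbnail
  existsOnDisk: boolean;    // result of the file existence check
}

function decideThumbnailAction(state: ThumbnailState): ThumbnailAction {
  // Only video and photo have generator functions in this document.
  const canGenerate = state.type === "video" || state.type === "photo";

  if (state.thumbnail === null) {
    return canGenerate ? "regenerate" : "use-fallback";
  }
  // Fallback placeholders are never regenerated (same rule as the scanner).
  if (state.thumbnail.includes("/fallback/")) return "keep";
  if (state.existsOnDisk) return "keep";
  return canGenerate ? "regenerate" : "use-fallback";
}
```

Separating the decision from the I/O means the unit tests for this step need neither a database nor thumbnail files on disk.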

---

## 🔄 **Enhanced Scan Process Flow**

### **Detailed Process Steps**

```
┌─────────────────────────────────────────────────────────────┐
│ 1. START SCAN                                               │
│    ├── Receive library ID or scan all                       │
│    └── Initialize statistics counters                       │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ 2. FILE DISCOVERY (existing)                                │
│    ├── Glob library path for all media files                │
│    ├── Filter by video/photo/text extensions                │
│    └── Build list of current files                          │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ 3. FILE DELETION CLEANUP (NEW)                              │
│    ├── Get all database records for library                 │
│    ├── For each database record:                            │
│    │   ├── Check if file in current scan results            │
│    │   ├── If not: Verify file doesn't exist on disk        │
│    │   └── If missing: DELETE from database                 │
│    └── Log cleanup statistics                               │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ 4. FILE PROCESSING LOOP                                     │
│    For each discovered file:                                │
│    ├── Check if file already in database                    │
│    ├── If NEW:                                              │
│    │   ├── Generate thumbnail (existing)                    │
│    │   ├── Analyze video codec (existing)                   │
│    │   └── INSERT into database (existing)                  │
│    └── If EXISTS:                                           │
│        └── Verify & regenerate thumbnail (NEW) →            │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ 5. THUMBNAIL VERIFICATION (NEW)                             │
│    ├── Check if thumbnail file exists on disk               │
│    ├── If missing: Regenerate thumbnail                     │
│    ├── Update database with new path                        │
│    └── Log regeneration actions                             │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ 6. COMPLETE SCAN                                            │
│    ├── Log final statistics                                 │
│    └── Return success                                       │
└─────────────────────────────────────────────────────────────┘
```


---

## 🗄️ **Database Operations**

### **No Schema Changes Required**

Use existing tables and fields:

```sql
-- Existing media table (no changes)
CREATE TABLE media (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  library_id INTEGER,
  path TEXT NOT NULL UNIQUE,
  type TEXT NOT NULL,
  title TEXT,
  size INTEGER,
  thumbnail TEXT,
  codec_info TEXT DEFAULT '{}',
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (library_id) REFERENCES libraries (id)
);
```

### **Database Operations**

```typescript
// 1. Get all media for library
const records = db.prepare(
  "SELECT id, path, type, thumbnail FROM media WHERE library_id = ?"
).all(libraryId);

// 2. Delete orphaned record
db.prepare("DELETE FROM media WHERE id = ?").run(recordId);

// 3. Update thumbnail path
db.prepare("UPDATE media SET thumbnail = ? WHERE id = ?")
  .run(newThumbnailUrl, mediaId);
```

**No transactions needed**: these are simple, independent operations
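
Per-row deletes are adequate when only a few files go missing between scans. If an entire folder disappears at once, grouping the ids into fewer statements might reduce overhead. The sketch below is one possible way to build such batches, not part of the proposed design: `buildBatchedDeletes` is a hypothetical helper, and the default batch size of 500 is an arbitrary assumption chosen to stay well under SQLite's bound-parameter limit.

```typescript
// Hypothetical helper; name, shape, and default batch size are assumptions.
interface DeleteBatch {
  sql: string;
  params: number[];
}

function buildBatchedDeletes(ids: number[], batchSize = 500): DeleteBatch[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: DeleteBatch[] = [];
  for (let start = 0; start < ids.length; start += batchSize) {
    const params = ids.slice(start, start + batchSize);
    // One "?" placeholder per id keeps the statement parameterized.
    const placeholders = params.map(() => "?").join(", ");
    batches.push({
      sql: `DELETE FROM media WHERE id IN (${placeholders})`,
      params,
    });
  }
  return batches;
}
```

Each batch could then be executed with `db.prepare(batch.sql).run(...batch.params)`, following the same `prepare`/`run` pattern as the operations listed in this section.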


---

## 📊 **Statistics Tracking**

### **Scan Statistics Object**

```typescript
interface ScanStats {
  filesProcessed: number;        // Total files scanned
  filesAdded: number;            // New files inserted (existing)
  filesRemoved: number;          // Orphaned records deleted (NEW)
  thumbnailsRegenerated: number; // Missing thumbnails recreated (NEW)
  errors: number;                // Total errors encountered
}
```
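
Because a scan can cover either one library or all of them, per-library statistics may need to be combined into a single report. A minimal sketch follows, assuming the `ScanStats` shape defined in this section; `emptyStats`, `mergeStats`, and `summarize` are hypothetical helper names.

```typescript
// Mirrors the ScanStats interface defined in this section.
interface ScanStats {
  filesProcessed: number;
  filesAdded: number;
  filesRemoved: number;
  thumbnailsRegenerated: number;
  errors: number;
}

// Hypothetical helpers, not part of the existing scanner.
function emptyStats(): ScanStats {
  return {
    filesProcessed: 0,
    filesAdded: 0,
    filesRemoved: 0,
    thumbnailsRegenerated: 0,
    errors: 0,
  };
}

// Add two statistics objects field by field without mutating either input.
function mergeStats(total: ScanStats, next: ScanStats): ScanStats {
  return {
    filesProcessed: total.filesProcessed + next.filesProcessed,
    filesAdded: total.filesAdded + next.filesAdded,
    filesRemoved: total.filesRemoved + next.filesRemoved,
    thumbnailsRegenerated: total.thumbnailsRegenerated + next.thumbnailsRegenerated,
    errors: total.errors + next.errors,
  };
}

// Combine the results of scanning several libraries into one report.
function summarize(perLibrary: ScanStats[]): ScanStats {
  return perLibrary.reduce(mergeStats, emptyStats());
}
```

Returning a fresh object from `mergeStats` keeps each library's own statistics intact for per-library logging.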

### **Statistics Collection**

```typescript
const scanLibrary = async (library: { id: number; path: string }) => {
  const stats: ScanStats = {
    filesProcessed: 0,
    filesAdded: 0,
    filesRemoved: 0,
    thumbnailsRegenerated: 0,
    errors: 0
  };

  // ... scan logic ...

  // Log final statistics
  console.log('Scan complete:', stats);
  return stats;
};
```


---

## 🔒 **Error Handling Strategy**

### **Error Tolerance Approach**

```typescript
// Principle: Individual failures should not stop entire scan

// File deletion cleanup
for (const record of dbRecords) {
  try {
    // Check and delete if needed
    await cleanupRecord(record);
  } catch (error) {
    console.error(`Error cleaning up ${record.path}:`, error);
    stats.errors++;
    // Continue to next record
  }
}

// Thumbnail verification
for (const file of mediaFiles) {
  try {
    const existingMedia = getExistingMedia(file);
    if (existingMedia) {
      await verifyAndRegenerateThumbnail(existingMedia);
    }
  } catch (error) {
    console.error(`Error processing ${file}:`, error);
    stats.errors++;
    // Continue to next file
  }
}
```

**Key Principles**:
- Wrap each file operation in try-catch
- Log errors but continue processing
- Track error count in statistics
- No transaction rollback (simple operations)
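
Both loops follow the same pattern, so the tolerance logic could be factored into one helper. The sketch below is an illustration under that assumption; `forEachTolerant` and `ErrorCounter` are hypothetical names that do not exist in the scanner.

```typescript
// Hypothetical helper; the name and signature are illustrative assumptions.
interface ErrorCounter {
  errors: number;
}

// Run `handler` for each item in order. A failure is logged and counted,
// then processing continues with the next item.
async function forEachTolerant<T>(
  items: readonly T[],
  label: (item: T) => string,
  handler: (item: T) => Promise<void>,
  stats: ErrorCounter
): Promise<void> {
  for (const item of items) {
    try {
      await handler(item); // sequential, same as the current scanner
    } catch (error) {
      console.error(`Error processing ${label(item)}:`, error);
      stats.errors++;
      // Continue with the next item
    }
  }
}
```

With this helper, the cleanup loop would read `await forEachTolerant(dbRecords, (r) => r.path, cleanupRecord, stats)`, where `cleanupRecord` is the per-record function assumed in the snippet above.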

---

## 🚀 **Implementation Strategy**

### **Code Changes Required**

**Single file modification**: `src/lib/scanner.ts`

```typescript
// Add helper functions at top of file
async function cleanupDeletedFiles(...) { /* ... */ }
async function verifyAndRegenerateThumbnail(...) { /* ... */ }
function getThumbnailPathFromUrl(...) { /* ... */ }

// Modify existing scanLibrary function
const scanLibrary = async (library: { id: number; path: string }) => {
  const db = getDatabase();

  // Initialize statistics
  const stats = {
    filesProcessed: 0,
    filesAdded: 0,
    filesRemoved: 0,
    thumbnailsRegenerated: 0,
    errors: 0
  };

  // Existing file discovery code
  const allFiles = await glob(`${library.path}/**/*.*`, { nodir: true });
  // ... existing filtering logic ...

  // NEW: Cleanup deleted files
  const cleanupResult = await cleanupDeletedFiles(db, library.id, mediaFiles);
  stats.filesRemoved = cleanupResult.removed;

  // Existing file processing loop
  for (const file of mediaFiles) {
    stats.filesProcessed++;

    const existingMedia = db.prepare("SELECT * FROM media WHERE path = ?").get(file);

    if (existingMedia) {
      // NEW: Verify thumbnail for existing files
      const thumbResult = await verifyAndRegenerateThumbnail(existingMedia);
      if (thumbResult.regenerated) stats.thumbnailsRegenerated++;
      continue;
    }

    // Existing new file processing
    // ... existing thumbnail generation and insert logic ...
    stats.filesAdded++;
  }

  // Log final statistics
  console.log('Scan complete:', stats);
  return stats;
};
```


**No other files require changes**

---

## 🎯 **Performance Considerations**

### **Performance Impact**

| **Operation** | **Impact** | **Mitigation** |
|--------------|-----------|---------------|
| File existence checks | Low | Use fast `fs.access()` |
| Database queries | Low | One extra `SELECT` per library, plus one `DELETE` or `UPDATE` per affected record |
| Thumbnail regeneration | Medium | Only for missing thumbnails |
| Overall scan time | +10-20% (estimate) | Acceptable for data integrity |

### **Optimization Notes**

- File existence checks are fast I/O operations
- Database deletions are simple, indexed operations
- Thumbnail regeneration only happens for missing files
- Memory overhead is limited to the library's `id`/`path` rows and one `Set` of current file paths held during cleanup
- Sequential processing (same as current)

---

## 📈 **Testing Strategy**

### **Unit Testing**

```typescript
describe('cleanupDeletedFiles', () => {
  it('should remove records for deleted files', async () => {
    // Setup: Create DB records for files that don't exist
    // Execute: Run cleanup
    // Verify: Records removed from database
  });

  it('should keep records for existing files', async () => {
    // Setup: Create DB records for existing files
    // Execute: Run cleanup
    // Verify: Records still in database
  });
});

describe('verifyAndRegenerateThumbnail', () => {
  it('should skip if thumbnail exists', async () => {
    // Setup: Media record with existing thumbnail
    // Execute: Verify
    // Verify: No regeneration attempted
  });

  it('should regenerate if thumbnail missing', async () => {
    // Setup: Media record with missing thumbnail
    // Execute: Verify
    // Verify: Thumbnail regenerated and DB updated
  });
});
```

### **Integration Testing**

```typescript
describe('Enhanced Scanner', () => {
  it('should complete full scan with cleanup and verification', async () => {
    // Setup: Library with mixed scenarios (new, existing, deleted, missing thumbnails)
    // Execute: Full library scan
    // Verify: All scenarios handled correctly
  });
});
```


---

## 🔗 **Related Documentation**

- [Requirements Document](LIBRARY_SCAN_ENHANCEMENT_REQUIREMENTS.md)
- [Implementation Plan](LIBRARY_SCAN_ENHANCEMENT_IMPLEMENTATION.md)
- [Summary](LIBRARY_SCAN_ENHANCEMENT_SUMMARY.md)

---

*Document Status*: ✅ **Complete**  
*Architecture Type*: Minimal, focused enhancement  
*Implementation Complexity*: Low-Medium  
*Last Updated*: October 14, 2025