Adds file hash-based matching for Audiobookshelf library items to ensure 100% accurate ASIN assignment for RMAB-organized content. Removes fuzzy matching from library availability checks, making all matching ASIN-only to eliminate false positives and race conditions. Updates database schema, processors, and matcher utilities; adds new tests and documentation for the new matching strategy. Removes obsolete scripts, Dockerfile, and related tests; updates docker-compose for test environments.
File Hash-Based Library Matching
Status: ✅ Implemented | Accurate ASIN matching for RMAB-organized audiobooks
Overview
Solves false positive matches in Audiobookshelf fuzzy search by using file hash matching for RMAB-downloaded content.
Problem
- New ABS items without ASIN → fuzzy Audible search by title/author
- Risk: Wrong book matches (e.g., "Foundation" → "Foundation and Empire")
- Result: Incorrect metadata, false positives
Solution
File Hash Matching Strategy:
- Generate SHA256 hash of audio filenames during organization
- Store hash in Audiobook.filesHash field
- During library scan: compare ABS item files against database hashes
- Match found → Use request's ASIN for 100% accurate metadata
- No match → Fallback to fuzzy search (external content)
How It Works
Organization Phase
File: src/lib/processors/organize-files.processor.ts
const filesHash = generateFilesHash(result.audioFiles);
await prisma.audiobook.update({
  data: {
    filesHash: filesHash, // SHA256 of sorted audio filenames
    // ... other fields
  }
});
Library Scan Phase
Files: scan-plex.processor.ts, plex-recently-added.processor.ts
Phase 1: File Hash Matching (Items WITHOUT ASIN)
const itemsWithoutAsin = libraryItems.filter(item => !item.asin && item.externalId);
for (const item of itemsWithoutAsin) {
  // 1. Fetch ABS item details
  const absItem = await getABSItem(item.externalId);
  // 2. Generate hash from ABS audio filenames
  const audioFilenames = absItem.media.audioFiles.map(f => f.metadata.filename);
  const itemHash = generateFilesHash(audioFilenames);
  // 3. Query for matching RMAB download
  const matched = await prisma.audiobook.findFirst({
    where: { filesHash: itemHash, status: 'completed' }
  });
  // 4. Trigger metadata match (with ASIN if matched, undefined if not)
  await triggerABSItemMatch(item.externalId, matched?.audibleAsin);
}
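Steps 2 to 4 of the loop above can be isolated into a pure function, which makes the matching rule easy to test without ABS or a database. The following is a minimal sketch under stated assumptions: resolveAsinByHash, CompletedDownload, and HashFn are hypothetical names, the in-memory array stands in for the Prisma query, and the empty-list guard is an assumption rather than confirmed behavior of the processors.

```typescript
// Hypothetical shapes for illustration; the real Audiobook model has more fields.
type HashFn = (filenames: string[]) => string;

interface CompletedDownload {
  filesHash: string | null;
  audibleAsin: string;
  status: string;
}

// Returns the ASIN of a completed RMAB download whose stored hash equals the
// hash of the ABS item's audio filenames, or undefined when nothing matches.
function resolveAsinByHash(
  audioFilenames: string[],
  downloads: CompletedDownload[],
  hashFn: HashFn
): string | undefined {
  // Assumed guard: an item with no audio files has nothing meaningful to hash,
  // so it goes straight to the fuzzy fallback.
  if (audioFilenames.length === 0) return undefined;

  const itemHash = hashFn(audioFilenames);
  for (const download of downloads) {
    if (download.status === "completed" && download.filesHash === itemHash) {
      return download.audibleAsin;
    }
  }
  return undefined;
}
```

The return value maps directly onto the second argument of triggerABSItemMatch: a string requests an ASIN-based match, and undefined leaves ABS to fall back to fuzzy search.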
Phase 2: Request Matching
// Match requests to library items and mark as available
const match = await findPlexMatch({
  asin: audiobook.audibleAsin,
  title: audiobook.title,
  author: audiobook.author
});
if (match) {
  // Update audiobook and request status
  await prisma.audiobook.update({ data: { absItemId: match.plexGuid } });
  await prisma.request.update({ data: { status: 'available' } });
  // No metadata match triggering needed:
  // - Items without ASIN: Already handled in Phase 1
  // - Items with ASIN: Already have correct metadata
}
Hash Generation Algorithm
File: src/lib/utils/files-hash.ts
Process:
- Extract basenames from file paths
- Filter to audio extensions: .m4b, .m4a, .mp3, .mp4, .aa, .aax
- Normalize to lowercase (case-insensitive)
- Sort alphabetically (deterministic order)
- Generate SHA256: crypto.createHash('sha256').update(JSON.stringify(sorted)).digest('hex')
Properties:
- Deterministic: Same files → same hash (regardless of order/path)
- Path-agnostic: Only basenames matter
- Case-insensitive: "CHAPTER 01.mp3" === "chapter 01.mp3"
- Fast: Indexed database lookup on filesHash (logarithmic with a default B-tree index, effectively constant at this scale)
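The process above can be sketched as follows. This is a minimal illustration, not the contents of src/lib/utils/files-hash.ts: the name generateFilesHashSketch, the AUDIO_EXTENSIONS constant, and the choice to return null when no audio files remain are all assumptions.

```typescript
import { createHash } from "node:crypto";
import { basename, extname } from "node:path";

// Extensions listed in the process description above.
const AUDIO_EXTENSIONS = [".m4b", ".m4a", ".mp3", ".mp4", ".aa", ".aax"];

// Hypothetical sketch of the hashing steps; returns a 64-character hex digest,
// or null when the input contains no audio files (assumed behavior).
function generateFilesHashSketch(filePaths: string[]): string | null {
  const sorted = filePaths
    .map((p) => basename(p).toLowerCase()) // basenames only, case-insensitive
    .filter((name) => AUDIO_EXTENSIONS.indexOf(extname(name)) !== -1)
    .sort(); // deterministic order regardless of input order

  if (sorted.length === 0) return null;

  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}
```

Because only basenames are hashed, two different books whose files share identical names (for example generic track01.mp3 naming) would produce the same hash under this sketch, so the approach relies on organized filenames being distinctive per book.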
Database Schema
Model: Audiobook
model Audiobook {
  // ... existing fields
  filesHash String? @map("files_hash") @db.Text // SHA256 (64 chars)
  @@index([filesHash]) // Indexed lookups (B-tree by default)
}
Migration: 20260126100000_add_audiobook_files_hash
Implementation Details
Metadata Match Strategy
Phase 1 (File Hash): Handle NEW items WITHOUT ASIN
- Filter: libraryItems.filter(item => !item.asin && item.externalId)
- Trigger metadata match with file-hash-matched ASIN or undefined
- This is the ONLY phase that triggers ABS metadata matching
Phase 2 (Request Match): Match requests, no metadata triggering
- Match requests to library items by ASIN/title/author
- Update request status to 'available'
- No metadata match triggering; items either:
  - Were handled in Phase 1 (new items without ASIN)
  - Already have correct metadata (items with ASIN from ABS)
Why This Works:
- Single source of truth: Only file hash phase triggers metadata matching
- No redundant API calls: Items with ASIN already have correct metadata
- Clean separation: Phase 1 = metadata, Phase 2 = request matching
- Simple and efficient: No duplicate checks, no wasted API calls
Edge Cases
Externally-Added Content
- User manually imports audiobook to ABS (not via RMAB)
- No matching filesHash in database
- Fallback: Fuzzy metadata match (current behavior preserved)
Modified Files
- User adds/removes chapters after organization
- ABS hash won't match RMAB hash
- Fallback: Fuzzy metadata match
Existing Content (Before Feature)
- Audiobooks organized before hash feature
- filesHash field is NULL
- Behavior: Continues using fuzzy matching
- Future: Admin job could backfill hashes (out of scope)
Chapter-Merged Files
- 20 MP3s → 1 M4B via chapter merging
- Hash generated AFTER merging
- Works correctly: Hash reflects final organized state
Multiple Downloads (Same Book)
- User re-downloads same audiobook (different edition/request)
- Multiple records with same filesHash
- Solution: findFirst() returns first match (acceptable - same ASIN)
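The "same ASIN" assumption behind findFirst() can be checked explicitly rather than taken on trust. The sketch below is illustrative only: pickAsinForHash and HashRecord are hypothetical names, the array stands in for a database query, and treating conflicting ASINs as "no match" is an assumed policy, not documented behavior.

```typescript
// Hypothetical record shape; only the two fields relevant to matching.
interface HashRecord {
  filesHash: string | null;
  audibleAsin: string;
}

// Collects the distinct ASINs stored for a hash. A single distinct ASIN is a
// safe match; several distinct ASINs are flagged as ambiguous (assumed policy).
function pickAsinForHash(
  records: HashRecord[],
  hash: string
): { asin: string | undefined; ambiguous: boolean } {
  const asins: string[] = [];
  for (const record of records) {
    if (record.filesHash === hash && asins.indexOf(record.audibleAsin) === -1) {
      asins.push(record.audibleAsin);
    }
  }
  if (asins.length === 1) return { asin: asins[0], ambiguous: false };
  return { asin: undefined, ambiguous: asins.length > 1 };
}
```

With this shape, duplicate downloads of the same book still resolve to their shared ASIN, while an unexpected conflict can be logged and routed to the fuzzy fallback.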
Performance
Storage:
- New index: one entry per row holding the 64-character hash plus index overhead (modest)
- SHA256 hash: 64 characters per record
API Calls:
- One additional getABSItem() call per item without ASIN
- Typical response: ~1-5KB JSON
- Latency: ~50-100ms per call
Database:
- Index lookup: O(log n) with the default B-tree index (a true hash index would have to be declared explicitly); fast either way
Impact:
- 10 items without ASIN → +500-1000ms per scan (acceptable)
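The impact figure follows from multiplying the item count by the per-call latency range. A small sketch of that arithmetic, assuming the getABSItem() calls run sequentially and that the 50-100ms range above holds (estimateScanOverheadMs is a hypothetical helper, not part of the codebase):

```typescript
// Estimates the added scan time as [best case, worst case] in milliseconds,
// assuming one sequential getABSItem() call per item without an ASIN.
function estimateScanOverheadMs(
  itemsWithoutAsin: number,
  perCallMs: [number, number] = [50, 100]
): [number, number] {
  return [itemsWithoutAsin * perCallMs[0], itemsWithoutAsin * perCallMs[1]];
}

// 10 items at 50-100ms each gives the 500-1000ms range quoted above.
const [bestCaseMs, worstCaseMs] = estimateScanOverheadMs(10);
console.log(`+${bestCaseMs}-${worstCaseMs}ms per scan`);
```

The cost grows linearly, so under the same assumptions a scan with 200 items lacking an ASIN would add roughly 10 to 20 seconds.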
Logging
Organization:
[INFO] Generated files hash: abc123def456... (5 audio files)
Library Scan (Match Found):
[INFO] File hash match found for "Foundation" → ASIN: B08G9PRS1K (from "Foundation (Unabridged)")
[INFO] Triggered metadata match with ASIN B08G9PRS1K for: "Foundation"
Library Scan (No Match):
[INFO] No file match found, triggering fuzzy metadata match for: "The Expanse"
Benefits
✅ 100% Accurate Matching - RMAB-organized content always gets correct ASIN
✅ Path-Agnostic - Works regardless of folder structure differences
✅ Fast Lookups - Indexed database query on the filesHash field
✅ Graceful Fallback - External content still works via fuzzy matching
✅ No Breaking Changes - Existing content continues working
Testing
Unit Tests: tests/utils/files-hash.test.ts
- Hash generation correctness
- Deterministic behavior
- Edge case handling
Integration Tests: tests/processors/*.test.ts
- Hash storage during organization
- Hash matching during library scan
- Fallback to fuzzy matching
Related
- Audiobookshelf Integration - Backend mode
- File Organization - Organization flow
- Database Schema - Audiobook model