File Hash-Based Library Matching
Status: ✅ Implemented | Accurate ASIN matching for RMAB-organized audiobooks
Overview
Solves false positive matches in Audiobookshelf fuzzy search by using file hash matching for RMAB-downloaded content.
Problem
- New ABS items without ASIN → fuzzy Audible search by title/author
- Risk: Wrong book matches (e.g., "Foundation" → "Foundation and Empire")
- Result: Incorrect metadata, false positives
Solution
File Hash Matching Strategy:
- Generate SHA256 hash of audio filenames during organization
- Store hash in Audiobook.filesHash field
- During library scan: compare ABS item files against database hashes
- Match found → Use request's ASIN for 100% accurate metadata
- No match → Fallback to fuzzy search (external content)
How It Works
Organization Phase
File: src/lib/processors/organize-files.processor.ts
const filesHash = generateFilesHash(result.audioFiles);
await prisma.audiobook.update({
  data: {
    filesHash: filesHash, // SHA256 of sorted audio filenames
    // ... other fields
  }
});
Library Scan Phase
Files: scan-plex.processor.ts, plex-recently-added.processor.ts
Phase 1: File Hash Matching (Items WITHOUT ASIN)
const itemsWithoutAsin = libraryItems.filter(item => !item.asin && item.externalId);
for (const item of itemsWithoutAsin) {
  // 1. Fetch ABS item details
  const absItem = await getABSItem(item.externalId);

  // 2. Generate hash from ABS audio filenames
  const audioFilenames = absItem.media.audioFiles.map(f => f.metadata.filename);
  const itemHash = generateFilesHash(audioFilenames);

  // 3. Query for matching RMAB download
  const matched = await prisma.audiobook.findFirst({
    where: { filesHash: itemHash, status: 'completed' }
  });

  // 4. Trigger metadata match (with ASIN if matched, undefined if not)
  await triggerABSItemMatch(item.externalId, matched?.audibleAsin);
}
Phase 2: Request Matching
// Match requests to library items and mark as available
const match = await findPlexMatch({
asin: audiobook.audibleAsin,
title: audiobook.title,
author: audiobook.author
});
if (match) {
// Update audiobook and request status
await prisma.audiobook.update({ data: { absItemId: match.plexGuid } });
await prisma.request.update({ data: { status: 'available' } });
// No metadata match triggering needed:
// - Items without ASIN: Already handled in Phase 1
// - Items with ASIN: Already have correct metadata
}
Hash Generation Algorithm
File: src/lib/utils/files-hash.ts
Process:
- Extract basenames from file paths
- Filter to audio extensions: .m4b, .m4a, .mp3, .mp4, .aa, .aax
- Normalize to lowercase (case-insensitive)
- Sort alphabetically (deterministic order)
- Generate SHA256: crypto.createHash('sha256').update(JSON.stringify(sorted)).digest('hex')
Properties:
- Deterministic: Same files → same hash (regardless of order/path)
- Path-agnostic: Only basenames matter
- Case-insensitive: "CHAPTER 01.mp3" === "chapter 01.mp3"
- Fast: Single indexed database lookup (O(log n) with Prisma's default B-tree index, effectively constant at library scale)
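The process above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of src/lib/utils/files-hash.ts: the names sketchFilesHash and AUDIO_EXTENSIONS are hypothetical, and hashing an empty list (rather than returning null) is an assumption.

```typescript
import { createHash } from 'node:crypto';
import { basename, extname } from 'node:path';

// Extensions listed in this document; the constant name is hypothetical.
const AUDIO_EXTENSIONS = new Set(['.m4b', '.m4a', '.mp3', '.mp4', '.aa', '.aax']);

// Hypothetical sketch of the hashing steps: basename, lowercase, filter, sort, SHA256.
function sketchFilesHash(filePaths: string[]): string {
  const names = filePaths
    .map((p) => basename(p).toLowerCase())                  // path-agnostic, case-insensitive
    .filter((name) => AUDIO_EXTENSIONS.has(extname(name)))  // audio files only
    .sort();                                                // deterministic order
  return createHash('sha256').update(JSON.stringify(names)).digest('hex');
}

// Same audio files under a different folder, order, and casing.
const a = sketchFilesHash([
  '/downloads/Book/Chapter 02.mp3',
  '/downloads/Book/CHAPTER 01.MP3',
  '/downloads/Book/cover.jpg',
]);
const b = sketchFilesHash([
  '/library/Author/Book/chapter 01.mp3',
  '/library/Author/Book/chapter 02.mp3',
]);

console.log(a === b); // true
```

The non-audio cover.jpg is dropped by the extension filter, which is why the two inputs hash identically.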
Database Schema
Model: Audiobook
model Audiobook {
  // ... existing fields
  filesHash String? @map("files_hash") @db.Text // SHA256 (64 chars)

  @@index([filesHash]) // Indexed for fast lookups
}
Migration: 20260126100000_add_audiobook_files_hash
Implementation Details
Metadata Match Strategy
Phase 1 (File Hash): Handle NEW items WITHOUT ASIN
- Filter: libraryItems.filter(item => !item.asin)
- Trigger metadata match with file-hash-matched ASIN or undefined
- This is the ONLY phase that triggers ABS metadata matching
Phase 2 (Request Match): Match requests, no metadata triggering
- Match requests to library items by ASIN/title/author
- Update request status to 'available'
- No metadata match triggering - items either:
- Were handled in Phase 1 (new items without ASIN)
- Already have correct metadata (items with ASIN from ABS)
Why This Works:
- Single source of truth: Only file hash phase triggers metadata matching
- No redundant API calls: Items with ASIN already have correct metadata
- Clean separation: Phase 1 = metadata, Phase 2 = request matching
- Simple and efficient: No duplicate checks, no wasted API calls
Edge Cases
Externally-Added Content
- User manually imports audiobook to ABS (not via RMAB)
- No matching filesHash in database
- Fallback: Fuzzy metadata match (current behavior preserved)
Modified Files
- User adds/removes chapters after organization
- ABS hash won't match RMAB hash
- Fallback: Fuzzy metadata match
Existing Content (Before Feature)
- Audiobooks organized before hash feature
- filesHash field is NULL
- Behavior: Continues using fuzzy matching
- Future: Admin job could backfill hashes (out of scope)
Chapter-Merged Files
- 20 MP3s → 1 M4B via chapter merging
- Hash generated AFTER merging
- Works correctly: Hash reflects final organized state
Coerced Files (Plex Format Coercion)
- Files renamed from .mp4 → .m4b (or single-file .m4a → .m4b) by Plex format coercion
- Hash generated AFTER coercion → reflects post-coercion filenames
- Works correctly going forward: ABS sees post-coercion names, hash matches
- Pre-existing library entries hashed before coercion was enabled will NOT match post-coercion files — retroactive library sweep is out of scope (see issue #166)
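To see why hashes taken before coercion stop matching, consider this small sketch. It is illustrative only: hashNames and coerceNames are hypothetical stand-ins, not the project's generateFilesHash or coerceToPlexCompatible, and the rename rule is inferred from the bullets above.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical stand-in for the hash helper: SHA256 over sorted, lowercased names.
const hashNames = (names: string[]): string =>
  createHash('sha256')
    .update(JSON.stringify(names.map((n) => n.toLowerCase()).sort()))
    .digest('hex');

// Assumed rename rule: .mp4 -> .m4b always; .m4a -> .m4b only when the
// list holds a single audio file. The input is assumed to be audio files only.
function coerceNames(names: string[]): string[] {
  return names.map((name) => {
    if (/\.mp4$/i.test(name)) return name.replace(/\.mp4$/i, '.m4b');
    if (names.length === 1 && /\.m4a$/i.test(name)) return name.replace(/\.m4a$/i, '.m4b');
    return name;
  });
}

const before = ['Foundation.mp4'];
const after = coerceNames(before); // ['Foundation.m4b']

// A hash stored before coercion no longer matches the renamed file...
console.log(hashNames(before) === hashNames(after)); // false
// ...while a hash generated after coercion matches what the library scan sees.
console.log(hashNames(after) === hashNames(['foundation.m4b'])); // true
```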
Multiple Downloads (Same Book)
- User re-downloads same audiobook (different edition/request)
- Multiple records with same filesHash
- Solution: findFirst() returns first match (acceptable - same ASIN)
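The duplicate-hash case can be illustrated with an in-memory stand-in for the database query. This is a sketch only: AudiobookRecord and findFirstByHash are hypothetical, the record IDs and the ASIN B000000000 are placeholders, and the real lookup goes through prisma.audiobook.findFirst.

```typescript
// Hypothetical record shape holding only the fields used by the hash lookup.
interface AudiobookRecord {
  id: string;
  filesHash: string | null;
  status: string;
  audibleAsin: string;
}

// In-memory equivalent of: findFirst({ where: { filesHash: itemHash, status: 'completed' } })
function findFirstByHash(records: AudiobookRecord[], itemHash: string): AudiobookRecord | undefined {
  return records.find((r) => r.filesHash === itemHash && r.status === 'completed');
}

const records: AudiobookRecord[] = [
  { id: 'a1', filesHash: 'abc123', status: 'completed', audibleAsin: 'B08G9PRS1K' },
  { id: 'a2', filesHash: 'abc123', status: 'completed', audibleAsin: 'B08G9PRS1K' }, // re-download of the same book
  { id: 'a3', filesHash: null, status: 'completed', audibleAsin: 'B000000000' },     // organized before the hash feature
];

console.log(findFirstByHash(records, 'abc123')?.audibleAsin);  // B08G9PRS1K
console.log(findFirstByHash(records, 'nomatch')?.audibleAsin); // undefined
```

Either duplicate record yields the same ASIN, so it does not matter which one is returned. A result of undefined corresponds to the fuzzy-match fallback.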
Performance
Storage:
- New index: one entry per row holding the 64-character hash plus index overhead (small relative to the row)
- SHA256 hash: 64 characters per record
API Calls:
- One additional getABSItem() call per item without ASIN
- Typical response: ~1-5KB JSON
- Latency: ~50-100ms per call
Database:
- Index lookup on filesHash: O(log n) with Prisma's default B-tree index (fast in practice)
Impact:
- 10 items without ASIN → +500-1000ms per scan (acceptable)
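The impact figure follows from multiplying the item count by the per-call latency. A minimal sketch of that arithmetic, assuming the calls run sequentially as in the Phase 1 loop (estimateAddedLatencyMs is a hypothetical helper, not part of the codebase):

```typescript
// Estimated extra scan time when each item without an ASIN costs one sequential API call.
function estimateAddedLatencyMs(
  itemsWithoutAsin: number,
  perCallMs: { min: number; max: number },
): { min: number; max: number } {
  return {
    min: itemsWithoutAsin * perCallMs.min,
    max: itemsWithoutAsin * perCallMs.max,
  };
}

console.log(estimateAddedLatencyMs(10, { min: 50, max: 100 })); // { min: 500, max: 1000 }
```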
Logging
Organization:
[INFO] Generated files hash: abc123def456... (5 audio files)
Library Scan (Match Found):
[INFO] File hash match found for "Foundation" → ASIN: B08G9PRS1K (from "Foundation (Unabridged)")
[INFO] Triggered metadata match with ASIN B08G9PRS1K for: "Foundation"
Library Scan (No Match):
[INFO] No file match found, triggering fuzzy metadata match for: "The Expanse"
Benefits
✅ 100% Accurate Matching - RMAB-organized content always gets correct ASIN
✅ Path-Agnostic - Works regardless of folder structure differences
✅ Fast Lookups - Single indexed database query
✅ Graceful Fallback - External content still works via fuzzy matching
✅ No Breaking Changes - Existing content continues working
Testing
Unit Tests: tests/utils/files-hash.test.ts
- Hash generation correctness
- Deterministic behavior
- Edge case handling
Integration Tests: tests/processors/*.test.ts
- Hash storage during organization
- Hash matching during library scan
- Fallback to fuzzy matching
Related
- Audiobookshelf Integration - Backend mode
- File Organization - Organization flow
- Database Schema - Audiobook model