mirror of https://github.com/kikootwo/ReadMeABook.git synced 2026-06-02 20:30:10 +00:00

kikootwo f23afc1ba2 Add Plex format coercion (.mp4 → .m4b)

Implement Plex-compatible file-extension coercion to avoid Plex silently ignoring .mp4 (and single-file .m4a) audiobooks (issue #166). Adds a DB migration and configuration key (plex_format_coercion_enabled, default true), exposes a toggle in the setup wizard and Admin Paths settings, and persists/reads the setting in the admin/setup APIs.

Introduces src/lib/utils/format-coercion.ts (coerceToPlexCompatible) and related constants in src/lib/constants/audio-formats.ts (PLEX_COMPATIBLE_EXTENSIONS, COERCION_RENAME_MAP, DRM_EXTENSIONS, TRANSCODE_REQUIRED_EXTENSIONS). The organize-files processor now runs coercion after organizing/tagging and before generating the filesHash and triggering scans; coercion is idempotent, never overwrites existing targets, logs warnings on DRM/transcode/permission errors, and is non-fatal.

Adds unit tests for the coercion util and updates processor & setup UI tests. Updates documentation (TABLEOFCONTENTS, file-organization, fixes/file-hash-matching, settings-pages) describing behavior, config, and constraints.

2026-05-15 19:33:59 -04:00

File Hash-Based Library Matching

Status: ✅ Implemented | Accurate ASIN matching for RMAB-organized audiobooks

Overview

Prevents false-positive matches from Audiobookshelf's fuzzy search by matching RMAB-downloaded content on a hash of its audio filenames.

Problem

New ABS items without ASIN → fuzzy Audible search by title/author
Risk: Wrong book matches (e.g., "Foundation" → "Foundation and Empire")
Result: Incorrect metadata, false positives

Solution

File Hash Matching Strategy:

Generate SHA256 hash of audio filenames during organization
Store hash in Audiobook.filesHash field
During library scan: compare ABS item files against database hashes
Match found → Use request's ASIN for 100% accurate metadata
No match → Fallback to fuzzy search (external content)

How It Works

Organization Phase

File: src/lib/processors/organize-files.processor.ts

const filesHash = generateFilesHash(result.audioFiles);
await prisma.audiobook.update({
  where: { id: audiobook.id },  // assumed: the record being organized
  data: {
    filesHash,  // SHA256 of sorted audio filenames
    // ... other fields
  }
});

Library Scan Phase

Files: scan-plex.processor.ts, plex-recently-added.processor.ts

Phase 1: File Hash Matching (Items WITHOUT ASIN)

const itemsWithoutAsin = libraryItems.filter(item => !item.asin && item.externalId);

for (const item of itemsWithoutAsin) {
  // 1. Fetch ABS item details
  const absItem = await getABSItem(item.externalId);

  // 2. Generate hash from ABS audio filenames
  const audioFilenames = absItem.media.audioFiles.map(f => f.metadata.filename);
  const itemHash = generateFilesHash(audioFilenames);

  // 3. Query for matching RMAB download
  const matched = await prisma.audiobook.findFirst({
    where: { filesHash: itemHash, status: 'completed' }
  });

  // 4. Trigger metadata match (with ASIN if matched, undefined if not)
  await triggerABSItemMatch(item.externalId, matched?.audibleAsin);
}

Phase 2: Request Matching

// Match requests to library items and mark as available
const match = await findPlexMatch({
  asin: audiobook.audibleAsin,
  title: audiobook.title,
  author: audiobook.author
});

if (match) {
  // Update audiobook and request status
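  // where clauses are assumed here; `request` stands for the request tied to this audiobook
  await prisma.audiobook.update({ where: { id: audiobook.id }, data: { absItemId: match.plexGuid } });
  await prisma.request.update({ where: { id: request.id }, data: { status: 'available' } });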

  // No metadata match triggering needed:
  // - Items without ASIN: Already handled in Phase 1
  // - Items with ASIN: Already have correct metadata
}

Hash Generation Algorithm

File: src/lib/utils/files-hash.ts

Process:

Extract basenames from file paths
Filter to audio extensions: .m4b, .m4a, .mp3, .mp4, .aa, .aax
Normalize to lowercase (case-insensitive)
Sort alphabetically (deterministic order)
Generate SHA256: crypto.createHash('sha256').update(JSON.stringify(sorted)).digest('hex')

Properties:

Deterministic: Same files → same hash (regardless of order/path)
Path-agnostic: Only basenames matter
Case-insensitive: "CHAPTER 01.mp3" === "chapter 01.mp3"
Fast: Single indexed database lookup on filesHash
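
The steps and properties above can be sketched as follows. This is a minimal illustration, not the contents of src/lib/utils/files-hash.ts: the name generateFilesHashSketch and the sample paths are made up for the example, and the real function may differ in detail.

```typescript
import { createHash } from "node:crypto";
import { basename, extname } from "node:path";

// Extensions taken from the Process list above.
const AUDIO_EXTENSIONS = new Set([".m4b", ".m4a", ".mp3", ".mp4", ".aa", ".aax"]);

// Hypothetical sketch of the hashing steps; not the project's actual implementation.
function generateFilesHashSketch(filePaths: string[]): string {
  const sorted = filePaths
    .map((p) => basename(p).toLowerCase())                  // basenames only, case-insensitive
    .filter((name) => AUDIO_EXTENSIONS.has(extname(name)))  // keep audio files only
    .sort();                                                // deterministic order
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}

// Same files seen under different folders, in a different order and case.
const a = generateFilesHashSketch([
  "/books/Foundation/Chapter 02.mp3",
  "/books/Foundation/CHAPTER 01.mp3",
  "/books/Foundation/cover.jpg", // ignored: not an audio extension
]);
const b = generateFilesHashSketch([
  "/abs/library/chapter 01.mp3",
  "/abs/library/chapter 02.mp3",
]);

console.log(a === b);  // true
console.log(a.length); // 64
```

Because only lowercase basenames enter the hash, the RMAB side and the ABS side can compute it independently without agreeing on folder layout.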

Database Schema

Model: Audiobook

model Audiobook {
  // ... existing fields
  filesHash String? @map("files_hash") @db.Text  // SHA256 (64 chars)

  @@index([filesHash])  // Indexed for fast equality lookups
}

Migration: 20260126100000_add_audiobook_files_hash

Implementation Details

Metadata Match Strategy

Phase 1 (File Hash): Handle NEW items WITHOUT ASIN

Filter: libraryItems.filter(item => !item.asin && item.externalId)
Trigger metadata match with file-hash-matched ASIN or undefined
This is the ONLY phase that triggers ABS metadata matching

Phase 2 (Request Match): Match requests, no metadata triggering

Match requests to library items by ASIN/title/author
Update request status to 'available'
No metadata match triggering - items either:
- Were handled in Phase 1 (new items without ASIN)
- Already have correct metadata (items with ASIN from ABS)

Why This Works:

Single source of truth: Only file hash phase triggers metadata matching
No redundant API calls: Items with ASIN already have correct metadata
Clean separation: Phase 1 = metadata, Phase 2 = request matching
Simple and efficient: No duplicate checks, no wasted API calls
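
The Phase 1 decision described above can be modeled in memory as below. This is a sketch under assumptions: the LibraryItem and CompletedDownload shapes and the function planMetadataMatches are hypothetical, and a Map stands in for the database query.

```typescript
// Hypothetical shapes for illustration; the real models carry more fields.
interface LibraryItem {
  externalId: string;
  asin?: string;
  filesHash: string; // hash computed from the item's audio filenames
}

interface CompletedDownload {
  filesHash: string;
  audibleAsin: string;
}

// Decide which ASIN (if any) to pass to the metadata match for each item.
function planMetadataMatches(
  items: LibraryItem[],
  downloads: CompletedDownload[],
): Map<string, string | undefined> {
  const asinByHash = new Map<string, string>();
  for (const d of downloads) {
    // First record wins, loosely mirroring findFirst().
    if (!asinByHash.has(d.filesHash)) asinByHash.set(d.filesHash, d.audibleAsin);
  }

  const plan = new Map<string, string | undefined>();
  for (const item of items) {
    if (item.asin) continue; // already has an ASIN: no metadata match is triggered
    plan.set(item.externalId, asinByHash.get(item.filesHash)); // undefined => fuzzy fallback
  }
  return plan;
}

const plan = planMetadataMatches(
  [
    { externalId: "item-1", filesHash: "aaa" },
    { externalId: "item-2", filesHash: "zzz" },
    { externalId: "item-3", asin: "B08G9PRS1K", filesHash: "aaa" },
  ],
  [{ filesHash: "aaa", audibleAsin: "B08G9PRS1K" }],
);

console.log(plan.get("item-1")); // B08G9PRS1K
console.log(plan.has("item-2")); // true (value undefined: fuzzy match)
console.log(plan.has("item-3")); // false (skipped: already has an ASIN)
```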

Edge Cases

Externally-Added Content

User manually imports audiobook to ABS (not via RMAB)
No matching filesHash in database
Fallback: Fuzzy metadata match (current behavior preserved)

Modified Files

User adds/removes chapters after organization
ABS hash won't match RMAB hash
Fallback: Fuzzy metadata match

Existing Content (Before Feature)

Audiobooks organized before hash feature
filesHash field is NULL
Behavior: Continues using fuzzy matching
Future: Admin job could backfill hashes (out of scope)

Chapter-Merged Files

20 MP3s → 1 M4B via chapter merging
Hash generated AFTER merging
Works correctly: Hash reflects final organized state

Coerced Files (Plex Format Coercion)

Files renamed from .mp4 → .m4b (or single-file .m4a → .m4b) by Plex format coercion
Hash generated AFTER coercion → reflects post-coercion filenames
Works correctly going forward: ABS sees post-coercion names, hash matches
Pre-existing library entries hashed before coercion was enabled will NOT match post-coercion files. A retroactive library sweep is out of scope (see issue #166)
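
A small sketch of why ordering matters here. hashNames is a simplified, hypothetical stand-in for generateFilesHash (lowercase, sort, SHA256 of the JSON array), and the filenames are invented for the example.

```typescript
import { createHash } from "node:crypto";

// Simplified stand-in for generateFilesHash; assumes inputs are already basenames.
function hashNames(names: string[]): string {
  const sorted = names.map((n) => n.toLowerCase()).sort();
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}

const storedBeforeCoercion = hashNames(["Foundation.mp4"]); // hash stored by an older run
const storedAfterCoercion = hashNames(["Foundation.m4b"]);  // hash stored when coercion runs first
const seenByScan = hashNames(["foundation.m4b"]);           // what the library scan computes

console.log(seenByScan === storedAfterCoercion);  // true
console.log(seenByScan === storedBeforeCoercion); // false
```

The extension is part of the filename, so a rename alone is enough to change the hash. That is why the hash has to be generated after coercion.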

Multiple Downloads (Same Book)

User re-downloads same audiobook (different edition/request)
Multiple records with same filesHash
Solution: findFirst() returns first match (acceptable - same ASIN)

Performance

Storage:

New index: one entry per row holding the 64-character hash plus per-entry overhead (small in absolute terms)
SHA256 hash: 64 characters per record

API Calls:

One additional getABSItem() call per item without ASIN
Typical response: ~1-5KB JSON
Latency: ~50-100ms per call

Database:

Index lookup: single equality lookup on the filesHash index (O(log n) with Prisma's default B-tree index; negligible at library scale)

Impact:

10 items without ASIN → +500-1000ms per scan (acceptable)
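
The estimate above is simple multiplication. A rough sketch, assuming one sequential getABSItem() call per item and the 50-100ms latency range quoted above (estimateScanOverheadMs is a hypothetical helper, not part of the codebase):

```typescript
// Rough estimate only: one sequential getABSItem() call per item without an ASIN.
function estimateScanOverheadMs(
  itemsWithoutAsin: number,
  minLatencyMs = 50,
  maxLatencyMs = 100,
): [number, number] {
  return [itemsWithoutAsin * minLatencyMs, itemsWithoutAsin * maxLatencyMs];
}

const [low, high] = estimateScanOverheadMs(10);
console.log(`${low}-${high}ms`); // 500-1000ms
```

The overhead grows linearly with the number of items lacking an ASIN, so a first scan of a large, freshly imported library would take proportionally longer than the 10-item case.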

Logging

Organization:

[INFO] Generated files hash: abc123def456... (5 audio files)

Library Scan (Match Found):

[INFO] File hash match found for "Foundation" → ASIN: B08G9PRS1K (from "Foundation (Unabridged)")
[INFO] Triggered metadata match with ASIN B08G9PRS1K for: "Foundation"

Library Scan (No Match):

[INFO] No file match found, triggering fuzzy metadata match for: "The Expanse"

Benefits

✅ 100% Accurate Matching - RMAB-organized content always gets correct ASIN
✅ Path-Agnostic - Works regardless of folder structure differences
✅ Fast Lookups - Single indexed database query
✅ Graceful Fallback - External content still works via fuzzy matching
✅ No Breaking Changes - Existing content continues working

Testing

Unit Tests: tests/utils/files-hash.test.ts

Hash generation correctness
Deterministic behavior
Edge case handling

Integration Tests: tests/processors/*.test.ts

Hash storage during organization
Hash matching during library scan
Fallback to fuzzy matching

Related

Audiobookshelf Integration - Backend mode
File Organization - Organization flow
Database Schema - Audiobook model
