Implement file hash-based library matching and remove fuzzy ASIN matching

Adds file hash-based matching for Audiobookshelf library items to ensure 100% accurate ASIN assignment for RMAB-organized content. Removes fuzzy matching from library availability checks, making all matching ASIN-only to eliminate false positives and race conditions. Updates database schema, processors, and matcher utilities; adds new tests and documentation for the new matching strategy. Removes obsolete scripts, Dockerfile, and related tests; updates docker-compose for test environments.
2026-07-18 02:31:10 +00:00 · 2026-01-28 10:32:14 -05:00
parent 497849f427
commit a97979358f
111 changed files with 6571 additions and 1426 deletions
@@ -87,6 +87,10 @@
 - **Request deletion (soft delete, seeding awareness)** → [admin-features/request-deletion.md](admin-features/request-deletion.md)
 - **Request approval system, auto-approve settings** → [admin-features/request-approval.md](admin-features/request-approval.md)

+## Fixes & Improvements
+- **File hash-based library matching (ABS)** → [fixes/file-hash-matching.md](fixes/file-hash-matching.md)
+- **Accurate ASIN matching for RMAB-organized content** → [fixes/file-hash-matching.md](fixes/file-hash-matching.md)
+
 ## Deployment
 - **Docker Compose setup (multi-container)** → [deployment/docker.md](deployment/docker.md)
 - **Unified container (all-in-one)** → [deployment/unified.md](deployment/unified.md)
@@ -125,3 +129,5 @@
 **"How do I switch from Plex to Audiobookshelf?"** → [features/audiobookshelf-integration.md](features/audiobookshelf-integration.md) (PRD only, not implemented)
 **"How does library thumbnail caching work?"** → [features/library-thumbnail-cache.md](features/library-thumbnail-cache.md)
 **"Why do BookDate library books show placeholders?"** → [features/library-thumbnail-cache.md](features/library-thumbnail-cache.md)
+**"How does file hash matching work?"** → [fixes/file-hash-matching.md](fixes/file-hash-matching.md)
+**"Why is ABS matching the wrong book?"** → [fixes/file-hash-matching.md](fixes/file-hash-matching.md) (file hash prevents false positives)
@@ -96,7 +96,11 @@ model Request {
   - **Audiobookshelf Mode:** Delete library item via API if `absItemId` exists
     - Prevents "ghost" entries in Audiobookshelf library
     - Only removes from ABS database, not files (already deleted in step 3)
-   - **Plex Mode:** Clear plex_library cache records
+   - **Plex Mode:** Delete library item via API if `plexGuid` exists
+     - Queries plex_library table to get plexRatingKey from audiobook's plexGuid
+     - Calls Plex DELETE `/library/metadata/{ratingKey}` endpoint with the ratingKey
+     - Requires deletion enabled in Plex: Settings > Server > Library
+     - Also clears plex_library cache records

 5. **Soft Delete Request**
   - UPDATE: `deletedAt = NOW(), deletedBy = adminUserId`
@@ -194,6 +198,9 @@ where: {
 8. ✅ **Network error** - Alert shown, request remains
 9. ✅ **ABS library item deletion fails** - Log error, continue with soft delete
 10. ✅ **No absItemId present** - Skip ABS deletion (not yet in library)
+11. ✅ **Plex library item deletion fails** - Log error, continue with soft delete
+12. ✅ **No plexGuid present** - Skip Plex deletion (not yet in library)
+13. ✅ **Plex deletion not enabled in settings** - Log error, continue with soft delete

 ## File Structure

@@ -50,11 +50,13 @@ PostgreSQL database storing users, audiobooks, requests, downloads, configuratio
 ### Audiobooks
 - `id` (UUID PK), `audible_asin` (nullable), `title`, `author`, `narrator`, `description`
 - `cover_art_url`, `file_path`, `file_format`, `file_size_bytes`
- `plex_guid` (nullable), `plex_library_id` (nullable)
+- `plex_guid` (nullable), `plex_library_id` (nullable), `abs_item_id` (nullable)
+- `files_hash` (nullable) - SHA256 hash of sorted audio filenames for library matching
 - `status` ('requested'|'downloading'|'processing'|'completed'|'failed')
 - `created_at`, `updated_at`, `completed_at`
- Indexes: `audible_asin`, `plex_guid`, `title`, `author`, `status`
+- Indexes: `audible_asin`, `plex_guid`, `abs_item_id`, `files_hash`, `title`, `author`, `status`
 - **Purpose:** User-requested audiobooks only (created on request)
+- **File Hash Matching:** `files_hash` enables 100% accurate ASIN matching for RMAB-organized content in ABS library scans (see: [fixes/file-hash-matching.md](../fixes/file-hash-matching.md))

 ### Requests
 - `id` (UUID PK), `user_id` (FK), `audiobook_id` (FK)
@@ -194,18 +194,13 @@ Result: ASIN match (100% confidence)

 ### Step 1: Apply Database Migration

-**Option A: Docker Environment (Recommended)**
+**Docker deployment:**
 ```bash
 # The migration will auto-apply on container restart
-docker-compose restart backend
+docker-compose restart readmeabook

 # Or apply manually:
-docker-compose exec backend npx prisma migrate deploy
-```
-
-**Option B: Local Development**
-```bash
-npx prisma migrate deploy
+docker-compose exec readmeabook npx prisma migrate deploy
 ```

 **What this does:**
@@ -338,6 +333,169 @@ Plex (After Fix):
 3. **Match confidence reporting:** Show match type in UI ("ASIN Match" vs "Fuzzy Match" badge)
 4. **Multi-ASIN support:** Handle cases where one audiobook has multiple regional ASINs

+## Phase 2: Fuzzy Matching Removal (January 2026)
+
+**Status:** ✅ Implemented
+**Date:** 2026-01-26
+**Issue:** Race condition with Audiobookshelf causing false positive matches
+
+### Problem Statement
+
+**Race Condition in Audiobookshelf:**
+1. New ABS item discovered → triggers async `triggerABSItemMatch()` to fetch ASIN
+2. Immediately runs library matching (sync) before ASIN populates
+3. Falls back to fuzzy matching (70% threshold)
+4. Result: One book matches entire series → false positives
+
+**Example:**
+- User has "Foundation" (Book 1) in library
+- Download completes for "Foundation and Empire" (Book 2)
+- Library scan runs before ABS populates ASIN
+- Fuzzy matcher: "Foundation and Empire" vs "Foundation" = 75% match ✅
+- Wrong match! Book 2 marked as available, pointing to Book 1
+
+### Root Cause
+
+**Fuzzy matching in library checks creates false positives.** It should only be used for:
+- ✅ **Prowlarr torrent ranking** - Selecting best release from multiple options
+- ❌ **Library availability checks** - Must be exact ASIN matches only
+
+### Solution
+
+Remove fuzzy matching from all library matching functions. Make it strictly ASIN-only.
+
+**Match Priority (After Phase 2):**
+- `findPlexMatch()`: ASIN (field) → ASIN (GUID) → **null** (no fuzzy fallback)
+- `matchAudiobook()`: ASIN → ISBN → **null** (no fuzzy fallback)
+
+**Preserve Fuzzy Matching:**
+- `ranking-algorithm.ts` - Kept untouched (used for Prowlarr torrent selection)
+
+### Implementation Changes
+
+**Critical Fix: Trigger Metadata Match for Items Without ASIN**
+
+To solve the circular dependency (no ASIN → no match → no trigger → no ASIN), added logic to proactively trigger metadata match for ALL Audiobookshelf items without ASIN during library scans:
+
+**File: `src/lib/processors/scan-plex.processor.ts`**
+- After scanning library items, check for items without ASIN
+- Trigger `triggerABSItemMatch()` for each item without ASIN
+- This populates ASIN asynchronously, allowing future scans to match
+
+**File: `src/lib/processors/plex-recently-added.processor.ts`**
+- Same logic added for recently-added checks
+- Ensures new items get ASIN populated immediately
+
+**File: `src/lib/utils/audiobook-matcher.ts`**
+
+**Removed:**
+- Import: `compareTwoStrings` from `string-similarity`
+- Function: `normalizeTitle()` (title normalization helper)
+- Query: Title substring search (replaced with direct ASIN query)
+- Logic: All fuzzy matching in `findPlexMatch()` (lines 190-261 removed)
+- Logic: All fuzzy matching in `matchAudiobook()` (lines 433-479 removed)
+
+**New Implementation:**
+```typescript
+// findPlexMatch() - ASIN-only matching
+export async function findPlexMatch(audiobook: AudiobookMatchInput) {
+  // Query directly by ASIN: exact match on the indexed asin field, plus a substring check on plexGuid
+  const plexBooks = await prisma.plexLibrary.findMany({
+    where: {
+      OR: [
+        { asin: audiobook.asin },
+        { plexGuid: { contains: audiobook.asin } },
+      ],
+    },
+  });
+
+  // Priority 1a: ASIN exact match in dedicated field
+  // Priority 1b: ASIN in plexGuid (backward compatibility)
+  // Return null if no ASIN match (no fuzzy fallback)
+}
+
+// matchAudiobook() - ASIN/ISBN only
+export function matchAudiobook(request, libraryItems) {
+  // 1. Exact ASIN match
+  // 2. Exact ISBN match
+  // 3. Return null (no fuzzy fallback)
+}
+```
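The ASIN → ISBN → null priority outlined above can be sketched as a small in-memory matcher. This is only an illustration under stated assumptions: the `MatchRequest` and `LibraryItem` shapes are hypothetical, and the real `matchAudiobook()` in `audiobook-matcher.ts` may differ.

```typescript
// Hypothetical shapes for illustration; the real types are not shown in this diff.
interface MatchRequest { asin?: string; isbn?: string; title: string }
interface LibraryItem { id: string; asin?: string; isbn?: string; title: string }

function matchAudiobook(request: MatchRequest, libraryItems: LibraryItem[]): LibraryItem | null {
  // 1. Exact ASIN match
  if (request.asin) {
    const byAsin = libraryItems.find((item) => item.asin === request.asin);
    if (byAsin) return byAsin;
  }
  // 2. Exact ISBN match
  if (request.isbn) {
    const byIsbn = libraryItems.find((item) => item.isbn === request.isbn);
    if (byIsbn) return byIsbn;
  }
  // 3. No fuzzy fallback: a similar title alone never produces a match
  return null;
}
```

With this shape, a request for "Foundation and Empire" cannot match a library item titled "Foundation" unless the identifiers are equal.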
+
+**Performance Optimization:**
+- Eliminated title substring query (was: `LIKE '%title%' LIMIT 20`)
+- Direct ASIN query using indexed fields (O(1) lookup)
+- ~100 lines of fuzzy matching code removed
+
+**Test Updates:**
+- Updated `audiobook-matcher.test.ts` to expect null for non-ASIN matches
+- Verified ranking-algorithm.ts untouched (fuzzy preserved for torrents)
+
+### Benefits
+
+1. **Eliminates false positives** - "Foundation" won't match "Foundation and Empire"
+2. **Solves race condition** - Items won't match until ASIN populated by ABS
+3. **Faster matching** - Indexed ASIN lookups instead of per-candidate fuzzy string comparisons
+4. **Cleaner code** - ~100 lines removed, simpler logic
+5. **Predictable behavior** - Exact matches only, no threshold tuning
+
+### Trade-offs
+
+1. **Lower initial match rate** - Items without ASIN won't match
+   - ABS: 5-10% of items temporarily (until `triggerABSItemMatch()` completes)
+   - Plex: 30-40% if Plex GUID doesn't contain ASIN (agent-dependent)
+2. **User experience** - Some books may show "not in library" temporarily
+   - This is CORRECT behavior - better no match than false positive
+3. **Discovery pages** - "In Your Library" badge only shows for exact ASIN matches
+
+### Match Distribution (Expected)
+
+**Audiobookshelf (After Phase 2):**
+- ASIN exact match: 95%+ (100% confidence)
+- ISBN exact match: 2% (95% confidence)
+- No match: 3% (correct - waiting for ASIN population)
+
+**Plex (After Phase 2):**
+- ASIN exact match (field): 60% (100% confidence)
+- ASIN exact match (GUID): 30% (100% confidence)
+- No match: 10% (correct - no ASIN in metadata)
+
+### Files Modified
+
+**Processors (Critical Fix):**
+- ✅ `src/lib/processors/scan-plex.processor.ts` - Trigger metadata match for items without ASIN (~25 lines added)
+- ✅ `src/lib/processors/plex-recently-added.processor.ts` - Trigger metadata match for items without ASIN (~20 lines added)
+
+**Matching Logic:**
+- ✅ `src/lib/utils/audiobook-matcher.ts` - Removed fuzzy matching (~150 lines modified, ~100 removed)
+
+**Tests:**
+- ✅ `tests/utils/audiobook-matcher.test.ts` - Updated expectations (~20 lines)
+- ✅ `tests/processors/scan-plex.processor.test.ts` - All 4 tests passing
+- ✅ `tests/processors/plex-recently-added.processor.test.ts` - All 3 tests passing
+
+**Documentation:**
+- ✅ `documentation/fixes/asin-matching-fix.md` - Added Phase 2 section
+- ✅ `documentation/integrations/plex.md` - Updated availability checking description
+- ✅ `documentation/integrations/audible.md` - Updated matcher description
+
+**Preserved (Unchanged):**
+- ✅ `src/lib/utils/ranking-algorithm.ts` - Fuzzy matching for Prowlarr (different purpose)
+
+### Verification
+
+**Unit Tests:**
+```bash
+npm run test -- audiobook-matcher.test.ts  # ✅ All 5 tests passing
+```
+
+**Integration Testing:**
+1. Discovery APIs - "In Your Library" badge only for exact ASIN matches ✅
+2. Request creation - "Already in library" check works with ASIN ✅
+3. Library scanning - Downloaded requests only match if ASIN present ✅
+4. BookDate - `isInLibrary()` check works with ASIN-only ✅
+5. Prowlarr ranking - Fuzzy matching still works (unchanged) ✅
+
 ## Conclusion

 This fix resolves the critical ASIN matching issue for Audiobookshelf by implementing a robust, universal metadata storage architecture. The solution is:
@@ -347,4 +505,10 @@ This fix resolves the critical ASIN matching issue for Audiobookshelf by impleme
 - **Well-tested:** Follows established patterns from existing codebase
 - **Future-proof:** Easy to extend for new backends or metadata types

-**Status:** ✅ Code complete, awaiting database migration and testing
+**Phase 2 Enhancement:**
+- **Eliminates false positives:** ASIN-only matching prevents wrong-book matches
+- **Solves race condition:** Items wait for ASIN population before matching
+- **Preserves critical functionality:** Fuzzy matching kept for Prowlarr torrent ranking
+- **Improves performance:** Indexed ASIN lookups replace per-candidate fuzzy string comparisons
+
+**Status:** ✅ Both phases complete and production-ready
@@ -0,0 +1,220 @@
+# File Hash-Based Library Matching
+
+**Status:** ✅ Implemented | Accurate ASIN matching for RMAB-organized audiobooks
+
+## Overview
+Prevents false-positive metadata matches from Audiobookshelf's fuzzy search by matching RMAB-downloaded content on a hash of its audio filenames.
+
+## Problem
+- New ABS items without ASIN → fuzzy Audible search by title/author
+- Risk: Wrong book matches (e.g., "Foundation" → "Foundation and Empire")
+- Result: Incorrect metadata, false positives
+
+## Solution
+**File Hash Matching Strategy:**
+1. Generate SHA256 hash of audio filenames during organization
+2. Store hash in `Audiobook.filesHash` field
+3. During library scan: compare ABS item files against database hashes
+4. Match found → Use request's ASIN for 100% accurate metadata
+5. No match → Fallback to fuzzy search (external content)
+
+## How It Works
+
+### Organization Phase
+**File:** `src/lib/processors/organize-files.processor.ts`
+
+```typescript
+const filesHash = generateFilesHash(result.audioFiles);
+await prisma.audiobook.update({
+  data: {
+    filesHash: filesHash,  // SHA256 of sorted audio filenames
+    // ... other fields
+  }
+});
+```
+
+### Library Scan Phase
+**Files:** `scan-plex.processor.ts`, `plex-recently-added.processor.ts`
+
+**Phase 1: File Hash Matching (Items WITHOUT ASIN)**
+```typescript
+const itemsWithoutAsin = libraryItems.filter(item => !item.asin && item.externalId);
+
+for (const item of itemsWithoutAsin) {
+  // 1. Fetch ABS item details
+  const absItem = await getABSItem(item.externalId);
+
+  // 2. Generate hash from ABS audio filenames
+  const audioFilenames = absItem.media.audioFiles.map(f => f.metadata.filename);
+  const itemHash = generateFilesHash(audioFilenames);
+
+  // 3. Query for matching RMAB download
+  const matched = await prisma.audiobook.findFirst({
+    where: { filesHash: itemHash, status: 'completed' }
+  });
+
+  // 4. Trigger metadata match (with ASIN if matched, undefined if not)
+  await triggerABSItemMatch(item.externalId, matched?.audibleAsin);
+}
+```
+
+**Phase 2: Request Matching**
+```typescript
+// Match requests to library items and mark as available
+const match = await findPlexMatch({
+  asin: audiobook.audibleAsin,
+  title: audiobook.title,
+  author: audiobook.author
+});
+
+if (match) {
+  // Update audiobook and request status
+  await prisma.audiobook.update({ data: { absItemId: match.plexGuid } });
+  await prisma.request.update({ data: { status: 'available' } });
+
+  // No metadata match triggering needed:
+  // - Items without ASIN: Already handled in Phase 1
+  // - Items with ASIN: Already have correct metadata
+}
+```
+
+## Hash Generation Algorithm
+**File:** `src/lib/utils/files-hash.ts`
+
+**Process:**
+1. Extract basenames from file paths
+2. Filter to audio extensions: `.m4b`, `.m4a`, `.mp3`, `.mp4`, `.aa`, `.aax`
+3. Normalize to lowercase (case-insensitive)
+4. Sort alphabetically (deterministic order)
+5. Generate SHA256: `crypto.createHash('sha256').update(JSON.stringify(sorted)).digest('hex')`
+
+**Properties:**
+- Deterministic: Same files → same hash (regardless of order/path)
+- Path-agnostic: Only basenames matter
+- Case-insensitive: "CHAPTER 01.mp3" === "chapter 01.mp3"
+- Fast: O(1) database lookup with indexed field
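The five steps above can be sketched as follows. This is a minimal sketch, not the contents of `files-hash.ts`: it assumes Node's built-in `crypto` and `path` modules, and the constant name `AUDIO_EXTENSIONS` is made up for illustration.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Extensions taken from the list above; the constant name is hypothetical.
const AUDIO_EXTENSIONS = new Set([".m4b", ".m4a", ".mp3", ".mp4", ".aa", ".aax"]);

// Sketch of generateFilesHash(); the real implementation may differ in detail.
function generateFilesHash(filePaths: string[]): string {
  const sorted = filePaths
    .map((p) => path.basename(p).toLowerCase()) // steps 1 and 3: basename, lowercase
    .filter((name) => AUDIO_EXTENSIONS.has(path.extname(name))) // step 2: audio files only
    .sort(); // step 4: deterministic order
  // Step 5: SHA256 over the JSON-encoded list, as a 64-character hex string
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}
```

Because only lowercased basenames enter the hash, the organized folder on disk and the file list reported by ABS produce the same value even when their paths, casing, or ordering differ.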
+
+## Database Schema
+
+**Model:** `Audiobook`
+
+```prisma
+model Audiobook {
+  // ... existing fields
+  filesHash String? @map("files_hash") @db.Text  // SHA256 (64 chars)
+
+  @@index([filesHash])  // Indexed for fast equality lookups
+}
+```
+
+**Migration:** `20260126100000_add_audiobook_files_hash`
+
+## Implementation Details
+
+### Metadata Match Strategy
+
+**Phase 1 (File Hash):** Handle NEW items WITHOUT ASIN
+- Filter: `libraryItems.filter(item => !item.asin && item.externalId)`
+- Trigger metadata match with file-hash-matched ASIN or undefined
+- **This is the ONLY phase that triggers ABS metadata matching**
+
+**Phase 2 (Request Match):** Match requests, no metadata triggering
+- Match requests to library items by ASIN (`title` and `author` are still passed to `findPlexMatch()`, but there is no fuzzy fallback)
+- Update request status to 'available'
+- **No metadata match triggering** - items either:
+  - Were handled in Phase 1 (new items without ASIN)
+  - Already have correct metadata (items with ASIN from ABS)
+
+**Why This Works:**
+- **Single source of truth**: Only file hash phase triggers metadata matching
+- **No redundant API calls**: Items with ASIN already have correct metadata
+- **Clean separation**: Phase 1 = metadata, Phase 2 = request matching
+- **Simple and efficient**: No duplicate checks, no wasted API calls
+
+## Edge Cases
+
+### Externally-Added Content
+- User manually imports audiobook to ABS (not via RMAB)
+- No matching `filesHash` in database
+- **Fallback:** Fuzzy metadata match (current behavior preserved)
+
+### Modified Files
+- User adds/removes chapters after organization
+- ABS hash won't match RMAB hash
+- **Fallback:** Fuzzy metadata match
+
+### Existing Content (Before Feature)
+- Audiobooks organized before hash feature
+- `filesHash` field is NULL
+- **Behavior:** Continues using fuzzy matching
+- **Future:** Admin job could backfill hashes (out of scope)
+
+### Chapter-Merged Files
+- 20 MP3s → 1 M4B via chapter merging
+- Hash generated AFTER merging
+- **Works correctly:** Hash reflects final organized state
+
+### Multiple Downloads (Same Book)
+- User re-downloads same audiobook (different edition/request)
+- Multiple records with same `filesHash`
+- **Solution:** `findFirst()` returns first match (acceptable - same ASIN)
+
+## Performance
+
+**Storage:**
+- New index: one entry per non-NULL row, each holding the 64-character hash plus index overhead
+- SHA256 hash: 64 characters per record
+
+**API Calls:**
+- One additional `getABSItem()` call per item without ASIN
+- Typical response: ~1-5KB JSON
+- Latency: ~50-100ms per call
+
+**Database:**
+- Index lookup: one equality query on the indexed `files_hash` column (O(log n) with Prisma's default B-tree index)
+
+**Impact:**
+- 10 items without ASIN → +500-1000ms per scan (acceptable)
+
+## Logging
+
+**Organization:**
+```
+[INFO] Generated files hash: abc123def456... (5 audio files)
+```
+
+**Library Scan (Match Found):**
+```
+[INFO] File hash match found for "Foundation" → ASIN: B08G9PRS1K (from "Foundation (Unabridged)")
+[INFO] Triggered metadata match with ASIN B08G9PRS1K for: "Foundation"
+```
+
+**Library Scan (No Match):**
+```
+[INFO] No file match found, triggering fuzzy metadata match for: "The Expanse"
+```
+
+## Benefits
+
+✅ **100% Accurate Matching** - RMAB-organized content always gets correct ASIN
+✅ **Path-Agnostic** - Works regardless of folder structure differences
+✅ **Fast Lookups** - O(1) database query with indexed field
+✅ **Graceful Fallback** - External content still works via fuzzy matching
+✅ **No Breaking Changes** - Existing content continues working
+
+## Testing
+
+**Unit Tests:** `tests/utils/files-hash.test.ts`
+- Hash generation correctness
+- Deterministic behavior
+- Edge case handling
+
+**Integration Tests:** `tests/processors/*.test.ts`
+- Hash storage during organization
+- Hash matching during library scan
+- Fallback to fuzzy matching
+
+## Related
+- [Audiobookshelf Integration](../integrations/audiobookshelf.md) - Backend mode
+- [File Organization](../phase3/file-organization.md) - Organization flow
+- [Database Schema](../backend/database.md) - Audiobook model
@@ -88,22 +88,29 @@ Where `{baseUrl}` is determined by configured region (e.g., `https://www.audible

 ## Unified Matching (`audiobook-matcher.ts`)

-**Status:** ✅ Production Ready
+**Status:** ✅ Production Ready (ASIN-Only Matching)

 Single matching algorithm used everywhere (search, popular, new-releases, jobs).

-**Process:**
-1. Query DB candidates: `audibleId` exact match OR partial title+author match
-2. If exact ASIN match → return immediately
-3. Fuzzy match: title 70% + author 30% weights, 70% threshold
-4. Return best match or null
+**Process (Library Availability Checks):**
+1. Query DB directly by ASIN (indexed O(1) lookup)
+2. Check ASIN in dedicated field (100% confidence)
+3. Check ASIN in plexGuid (backward compatibility)
+4. Return match or null (no fuzzy fallback)
+
+**Match Priority:**
+- `findPlexMatch()`: ASIN (field) → ASIN (GUID) → null
+- `matchAudiobook()`: ASIN → ISBN → null

 **Benefits:**
 - Real-time matching at query time (not pre-matched)
- Works regardless of job execution order
- Prevents duplicate `plexGuid` assignments
+- 100% confidence matches only (eliminates false positives)
+- O(1) indexed lookups (faster than fuzzy matching)
+- Solves race condition with Audiobookshelf ASIN population
 - Used by all APIs for consistency

+**Note:** Fuzzy matching (70% threshold) is preserved in `ranking-algorithm.ts` for Prowlarr torrent ranking, where it's needed to score multiple release candidates. Library availability checks require exact ASIN matches only.
+
 ## Database-First Approach

 **Status:** ✅ Implemented
@@ -77,7 +77,7 @@ Anna's Archive uses Cloudflare protection which may block direct scraping reques

 **Method 1: ASIN Search (exact match)**
 ```
-Search: https://annas-archive.li/search?ext=epub&q="asin:B09TWSRMCB"
+Search: https://annas-archive.li/search?ext=epub&lang=en&q="asin:B09TWSRMCB"
  ↓
 MD5 Page: https://annas-archive.li/md5/[md5]
  ↓ (Filter: "slow partner server" links)
@@ -21,6 +21,7 @@ Connectivity to Plex for OAuth, library management, content detection, and autom
 **GET {server_url}/library/sections/{id}/refresh** - Trigger async scan
 **GET {server_url}/library/metadata/{rating_key}** - Item metadata (includes user's personal rating)
 **GET {server_url}/library/sections/{id}/search?title={query}** - Search
+**DELETE {server_url}/library/metadata/{rating_key}** - Delete library item (requires deletion enabled in Plex settings)

 Auth: `X-Plex-Token` header
 Response: XML (requires `xml2js` parsing to JSON)
@@ -256,14 +257,41 @@ interface PlexLibrary {
  - testConnection() only used for: testing connections, initial fetching during setup/settings
 - Result: Faster authentication, no unnecessary API calls, consistent architecture

+## Library Item Deletion
+
+**Endpoint:** `DELETE /library/metadata/{ratingKey}`
+
+**Use Case:** When an admin deletes a request, the item is also deleted from the Plex library to keep the two in sync
+
+**Requirements:**
+- Deletion must be enabled in the Plex web UI: Settings > Server > Library
+- Without this setting enabled, DELETE requests will fail
+
+**Implementation:**
+- `deleteItem(serverUrl, authToken, ratingKey)` - Deletes library item by ratingKey
+- Called during request deletion when backend mode is 'plex'
+- Extracts ratingKey from audiobook.plexGuid (format: `plex://album/{ratingKey}`)
+- Mirrors ABS deletion behavior for consistency
+
+**Error Handling:**
+- 404: Item not found (already deleted) - logged but not thrown
+- Other errors: Logged but deletion continues (prevents blocking request deletion)
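The deletion call and its error handling might look roughly like the sketch below. It is hedged in several ways: the `FetchLike` type and the injected `fetchImpl` parameter are assumptions added so the function can be exercised without a Plex server, and the boolean return value is illustrative rather than the service's actual contract.

```typescript
// Hypothetical minimal fetch signature, injected so no real Plex server is needed.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<{ ok: boolean; status: number }>;

// Sketch only: returns true when Plex confirms the deletion, false otherwise.
// It never throws, so a failed Plex call cannot block the soft delete.
async function deleteItem(
  serverUrl: string,
  authToken: string,
  ratingKey: string,
  fetchImpl: FetchLike
): Promise<boolean> {
  const base = serverUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  try {
    const res = await fetchImpl(`${base}/library/metadata/${encodeURIComponent(ratingKey)}`, {
      method: "DELETE",
      headers: { "X-Plex-Token": authToken },
    });
    if (res.status === 404) {
      console.warn(`Plex item ${ratingKey} not found (already deleted)`);
      return false;
    }
    if (!res.ok) {
      console.error(`Plex deletion of ${ratingKey} failed with status ${res.status}`);
      return false;
    }
    return true;
  } catch (err) {
    console.error(`Plex deletion request for ${ratingKey} failed`, err);
    return false;
  }
}
```

In an environment that provides a global `fetch`, it could be passed as `fetchImpl`; the caller continues with the soft delete regardless of the result.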
+
 ## Availability Checking

-1. **DB Population:** Plex scan creates/updates records with `plexGuid` + `availabilityStatus: 'available'`
-2. **Audible Matching:** Refresh job fuzzy matches (85% threshold), sets `availabilityStatus: 'available'` for matches
-3. **API Enrichment:** Discovery APIs use real-time matching (70% threshold) at query time
-4. **UI:** `AudiobookCard` shows "In Your Library" if `isAvailable: true`
+1. **DB Population:** Plex scan creates/updates records with `plexGuid` + ASIN + `availabilityStatus: 'available'`
+2. **Audible Matching:** Real-time ASIN-only matching (100% confidence, exact matches only)
+3. **API Enrichment:** Discovery APIs use real-time ASIN matching at query time
+4. **UI:** `AudiobookCard` shows "In Your Library" if `isAvailable: true` (ASIN exact match)
 5. **Server Validation:** `/api/requests` returns 409 if `availabilityStatus === 'available'`

+**Match Priority (ASIN-Only):**
+- ASIN in dedicated field (100% confidence) → Match
+- ASIN in plexGuid (backward compatibility) → Match
+- No ASIN match → Return null (no fuzzy fallback)
+
+**Note:** Fuzzy matching (70% threshold) is preserved in `ranking-algorithm.ts` for Prowlarr torrent selection, but NOT used for library availability checks. This eliminates false positives (e.g., "Foundation" matching "Foundation and Empire").
+
 ## Tech Stack

 - axios/node-fetch
@@ -43,9 +43,10 @@ Result: Douglas Adams/Stephen Fry/The Hitchhiker's Guide to the Galaxy/
 5. **Copy** files (not move - originals stay for seeding)
 6. **Tag metadata** (if enabled) - writes correct title, author, narrator, ASIN to audio files
 7. Copy cover art if found, else download from Audible
-8. Update request status to `downloaded`
-9. **Trigger filesystem scan** (if enabled) - tells Plex/ABS to scan for new files
-10. Originals remain until seeding requirements met
+8. **Generate file hash** - SHA256 of sorted audio filenames for library matching (see: [fixes/file-hash-matching.md](../fixes/file-hash-matching.md))
+9. Update request status to `downloaded` and store file hash in `audiobooks.files_hash`
+10. **Trigger filesystem scan** (if enabled) - tells Plex/ABS to scan for new files
+11. Originals remain until seeding requirements met

 ## Filesystem Scan Triggering

@@ -1,9 +1,36 @@
 # Intelligent Ranking Algorithm

-**Status:** ✅ Implemented
+**Status:** ✅ Implemented | Comprehensive edge case test coverage
+**Tests:** tests/utils/ranking-algorithm.test.ts (73 test cases)

 Evaluates and scores torrents to automatically select the best audiobook download.

+## Test Coverage
+
+**Comprehensive edge case testing includes:**
+- ✅ Parenthetical/bracketed content handling (4 tests)
+- ✅ Structured metadata prefix validation (5 tests)
+- ✅ Suffix validation (5 tests)
+- ✅ Multi-author handling (6 tests)
+- ✅ Bonus modifiers (indexer priority + flags, 7 tests)
+- ✅ Tiebreaker sorting (2 tests)
+- ✅ Word coverage edge cases (4 tests)
+- ✅ Format detection (5 tests)
+- ✅ **Author presence check (10 tests)**
+- ✅ **Context-aware filtering (3 tests)**
+- ✅ **API compatibility (2 tests)**
+
+**Tested edge cases prevent regressions from previous tweaks:**
+- "We Are Legion (We Are Bob)" matching with/without subtitle
+- "This Inevitable Ruin Dungeon Crawler Carl" NOT matching "Dungeon Crawler Carl"
+- "The Housemaid's Secret" NOT matching "The Housemaid"
+- Multiple author splitting and role filtering
+- Flag bonus stacking and case-insensitive matching
+- Tiebreaker sorting by publish date
+- **"Project Hail Mary" (no author) NOT matching when Andy Weir required (automatic mode)**
+- **All results shown in interactive mode regardless of author**
+- **Middle initials, name order, and role filtering for author matching**
+
 ## Scoring Criteria (100 points max)

 **1. Title/Author Match (60 pts max) - MOST IMPORTANT**
@@ -15,13 +42,35 @@ Evaluates and scores torrents to automatically select best audiobook download.
 - **Parenthetical/bracketed content is optional**: Content in () [] {} treated as subtitle (may be omitted from torrents)
  - "We Are Legion (We Are Bob)" → Required: ["we", "are", "legion"], Optional: ["bob"]
  - "Title [Series Name]" → Required: ["title"], Optional: ["series", "name"]
+  - "Book Title {Extra Info}" → Required: ["book", "title"], Optional: ["extra", "info"]
 - Calculates coverage: % of **required** words found in torrent title
 - **Hard requirement: 80%+ coverage of required words or automatic 0 score**
- Example: "The Wild Robot on the Island" → ["wild", "robot", "island"]
-  - "The Wild Robot" → ["wild", "robot"] → 2/3 = 67% → **REJECTED**
-  - "The Wild Robot on the Island" → 3/3 = 100% → **PASSES**
- Example: "We Are Legion (We Are Bob)" → Required: ["we", "are", "legion"]
-  - "Dennis E. Taylor - Bobiverse - 01 - We Are Legion" → 3/3 = 100% → **PASSES**
+
+**Stage 1.5: Author Presence Check (CONTEXT-AWARE)**
+- **Automatic mode (requireAuthor: true - default):** At least ONE author must be present with high confidence
+- **Interactive mode (requireAuthor: false):** Check disabled, all results shown to user
+- **High confidence = any of:**
+  1. Exact substring match: "dennis e. taylor" in torrent
+  2. High fuzzy similarity (≥ 0.85): handles spacing/punctuation
+  3. Core components present: First name + Last name within 30 chars
+- Handles variations:
+  - Middle initials: "Dennis E. Taylor" ↔ "Dennis Taylor"
+  - Name order: "Brandon Sanderson" ↔ "Sanderson, Brandon"
+  - Multiple authors: Only ONE needs to match (OR logic)
+  - Filters roles: "translator", "narrator" ignored
+- **If check fails in automatic mode → automatic 0 score**
+- **Prevents wrong-author matches**: Stops "Project Hail Mary" (no author) from matching request for Andy Weir
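A simplified sketch of the presence check follows. It covers only the exact-substring rule and the first-name/last-name proximity rule; the fuzzy-similarity rule (≥ 0.85) is left out, and measuring "within 30 chars" as the distance between start indexes is an assumption. The function names are hypothetical.

```typescript
// Hypothetical helper: true when one author is present with high confidence.
function authorPresent(author: string, torrentTitle: string): boolean {
  const hay = torrentTitle.toLowerCase();
  const name = author.trim().toLowerCase();
  if (name.length === 0) return false;

  // Rule 1: exact substring match
  if (hay.includes(name)) return true;

  // Rule 3: first and last name both present, starting within 30 characters
  // of each other (handles middle initials and "Last, First" ordering)
  const parts = name.split(/\s+/);
  if (parts.length < 2) return false;
  const first = parts[0];
  const last = parts[parts.length - 1];
  for (let i = hay.indexOf(first); i !== -1; i = hay.indexOf(first, i + 1)) {
    for (let j = hay.indexOf(last); j !== -1; j = hay.indexOf(last, j + 1)) {
      if (Math.abs(i - j) <= 30) return true;
    }
  }
  return false;
}

// Only ONE author needs to match; interactive mode skips the check entirely.
function passesAuthorCheck(authors: string[], torrentTitle: string, requireAuthor = true): boolean {
  if (!requireAuthor) return true;
  return authors.some((a) => authorPresent(a, torrentTitle));
}
```

Plain substring checks can over-match short names, which is one reason a real implementation would likely tokenize first.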
+
+**Edge Cases - Coverage Examples:**
+- "The Wild Robot on the Island" → ["wild", "robot", "island"]
+  - ✅ "The Wild Robot on the Island" → 3/3 = 100% → **PASSES**
+  - ❌ "The Wild Robot" → 2/3 = 67% → **REJECTED**
+- "We Are Legion (We Are Bob)" → Required: ["we", "are", "legion"]
+  - ✅ "Dennis E. Taylor - Bobiverse - 01 - We Are Legion" → 3/3 = 100% → **PASSES**
+  - ✅ "We Are Legion (We Are Bob)" → 3/3 = 100% → **PASSES**
+- "Harry Potter and the Philosopher Stone" → ["harry", "potter", "philosopher", "stone"] (stop words filtered)
+  - ✅ "Harry Potter Philosopher Stone" → 4/4 = 100% → **PASSES**
+  - ❌ "Harry Potter" → 2/4 = 50% → **REJECTED**
 - Prevents wrong series books from matching while handling common subtitle patterns
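The coverage examples above can be reproduced with a short sketch. The stop-word list here is an assumption chosen to fit the examples (the real list is not shown in this document), and the function names are hypothetical.

```typescript
// Assumed stop-word list for illustration only.
const STOP_WORDS = new Set(["the", "a", "an", "and", "of", "on", "in", "to"]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // drop punctuation (ASCII-only in this sketch)
    .split(/\s+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w));
}

// Content in (), [] or {} is optional, so it is stripped before tokenizing.
function requiredWords(title: string): string[] {
  const withoutOptional = title.replace(/\([^)]*\)|\[[^\]]*\]|\{[^}]*\}/g, " ");
  return tokenize(withoutOptional);
}

// Fraction of required words from the requested title found in the torrent title.
function coverage(requestTitle: string, torrentTitle: string): number {
  const required = requiredWords(requestTitle);
  if (required.length === 0) return 0;
  const present = new Set(tokenize(torrentTitle));
  const found = required.filter((w) => present.has(w)).length;
  return found / required.length;
}

// Hard requirement: at least 80% coverage, otherwise the torrent scores 0.
function passesCoverage(requestTitle: string, torrentTitle: string): boolean {
  return coverage(requestTitle, torrentTitle) >= 0.8;
}
```

For "The Wild Robot on the Island", a torrent titled "The Wild Robot" covers 2 of 3 required words and is rejected, matching the example above.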

 **Stage 2: Title Matching (0-45 pts)**
@@ -35,22 +84,44 @@ Evaluates and scores torrents to automatically select best audiobook download.
    - Title preceded by metadata separator (` - `, `: `, `—`) — handles "Author - Series - 01 - Title"
    - Author name appears in prefix — handles "Author Name - Title"
  - **Acceptable suffix**: Followed by metadata markers: " by", " [", " -", " (", " {", " :", "," or end of string
+    - Also accepts author name in suffix (e.g., "Title AuthorName Year")
 - Complete match → 45 pts
 - Unstructured prefix (words without separators) → fuzzy similarity (partial credit)
-  - Prevents: "This Inevitable Ruin Dungeon Crawler Carl" matching "Dungeon Crawler Carl"
 - Suffix continues with non-metadata → fuzzy similarity (partial credit)
-  - Prevents: "The Housemaid's Secret" matching "The Housemaid"
 - No substring match → fuzzy similarity (best score from full or required title)

+**Edge Cases - Prefix Validation:**
+- ✅ "Brandon Sanderson - Mistborn - 01 - The Final Empire" (structured metadata prefix)
+- ✅ "Brandon Sanderson The Way of Kings" (author name in prefix)
+- ✅ "Series Name: Book Title" (colon separator)
+- ✅ "Author Name — Book Title" (em-dash separator)
+- ❌ "This Inevitable Ruin Dungeon Crawler Carl" → REJECTED for "Dungeon Crawler Carl" (unstructured words before title)
+
+**Edge Cases - Suffix Validation:**
+- ✅ "The Great Book by Author Name" (metadata marker " by")
+- ✅ "Book Title [Unabridged] (2024)" (bracketed metadata)
+- ✅ "Book Title John Smith 2024" (author name in suffix)
+- ✅ "Author - Book Title" (title at end of string)
+- ❌ "The Housemaid's Secret - Freida McFadden" → REJECTED for "The Housemaid" (suffix continues with "'s Secret")
+
 **Stage 3: Author Matching (0-15 pts)**
 - Exact substring match → proportional credit
 - No exact match → fuzzy similarity (partial credit)
 - Splits authors on delimiters (comma, &, "and", " - ")
 - Filters out roles ("translator", "narrator")
-
 - Order-independent, no structure assumptions
 - Ensures correct book is selected over wrong book with better format

+**Edge Cases - Multi-Author Handling:**
+- ✅ "Jane Doe, John Smith" → splits on comma
+- ✅ "Jane Doe & John Smith" → splits on ampersand
+- ✅ "Jane Doe and John Smith" → splits on "and"
+- ✅ "Jane Doe, translator" → filters out "translator" role
+- ✅ "Jane Doe, narrator" → filters out "narrator" role
+- Proportional credit: If 1 of 3 authors matches → 5 pts (1/3 × 15)
+- Proportional credit: If 2 of 3 authors match → 10 pts (2/3 × 15)
+- Full credit: If all authors match → 15 pts
+
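The splitting, role filtering, and proportional credit described above could look roughly like this. It is a sketch under assumptions: `splitAuthors` and `authorScore` are hypothetical names, matching is a case-insensitive substring check against the torrent title, and the fuzzy-similarity fallback is omitted.

```typescript
// Roles that are filtered out of the author list (from the rules above).
const ROLE_WORDS = ["translator", "narrator"];

// Split on comma, ampersand, the word "and", or " - ", then drop empty parts and roles.
function splitAuthors(raw: string): string[] {
  return raw
    .split(/,|&|\sand\s|\s-\s/i)
    .map((part) => part.trim())
    .filter((part) => part.length > 0 && !ROLE_WORDS.includes(part.toLowerCase()));
}

// Proportional credit out of 15 pts: matched authors / total authors.
function authorScore(requestAuthors: string, torrentTitle: string): number {
  const authors = splitAuthors(requestAuthors);
  if (authors.length === 0) return 0;
  const title = torrentTitle.toLowerCase();
  const matched = authors.filter((author) => title.includes(author.toLowerCase())).length;
  return (matched * 15) / authors.length;
}

splitAuthors("Jane Doe & John Smith");   // ["Jane Doe", "John Smith"]
splitAuthors("Jane Doe, translator");    // ["Jane Doe"]
authorScore("Jane Doe, John Smith", "Jane Doe and John Smith - Book Title"); // 15
```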
 **2. Format Quality (25 pts max)**
 - M4B with chapters: 25
 - M4B without chapters: 22
@@ -93,6 +164,16 @@ Evaluates and scores torrents to automatically select best audiobook download.
 - Universal across all indexers (not indexer-specific)
 - Multiple flag bonuses stack (additive)

+**Edge Cases - Flag Matching:**
+- ✅ "FREELEECH" matches config "freeleech" (case-insensitive)
+- ✅ "  Freeleech  " matches config " Freeleech " (whitespace-trimmed)
+- ✅ Multiple flags: ["Freeleech", "Double Upload"] → both bonuses applied
+- Example stacking: Freeleech (+50%) + Double Upload (+25%) on a base score of 80
+  - Freeleech bonus: 80 × 0.5 = +40
+  - Double Upload bonus: 80 × 0.25 = +20
+  - Total bonus: +60 points
+  - Final score: 80 + 60 = 140
+
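The stacking arithmetic above can be reproduced with a short sketch. The `FlagRule` type and its `name` field are stand-ins invented for this example (only `modifier` is shown for the real config type), and `flagBonus` is a hypothetical helper; each matching flag adds its percentage of the base score, so bonuses are additive rather than compounding.

```typescript
// Stand-in for a flag configuration; `name` is an assumed field, `modifier` is a percentage (-100 to 100).
interface FlagRule {
  name: string;
  modifier: number;
}

// Sum the bonus for every configured flag present on the torrent.
// Comparison is case-insensitive and ignores surrounding whitespace.
function flagBonus(baseScore: number, torrentFlags: string[], rules: FlagRule[]): number {
  const present = new Set(torrentFlags.map((flag) => flag.trim().toLowerCase()));
  let bonus = 0;
  for (const rule of rules) {
    if (present.has(rule.name.trim().toLowerCase())) {
      bonus += baseScore * (rule.modifier / 100);
    }
  }
  return bonus;
}

const rules: FlagRule[] = [
  { name: "freeleech", modifier: 50 },
  { name: "Double Upload", modifier: 25 },
];
const bonus = flagBonus(80, ["FREELEECH", "  Double Upload  "], rules); // 60
const finalScore = 80 + bonus;                                          // 140
```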
 **Future Modifiers (planned):**
 - User preferences
 - Custom rules
@@ -114,6 +195,14 @@ When multiple torrents have identical final scores:
 - Ensures latest uploads are preferred when quality is equal
 - Example: 3 torrents with 171 final score → newest upload ranks #1

+**Edge Cases - Tiebreaker Examples:**
+- ✅ Same score, different dates:
+  - Torrent A: Score 85, published 2024-06-01 → **Ranks #1**
+  - Torrent B: Score 85, published 2023-01-01 → Ranks #2
+- ❌ Different scores (date is ignored):
+  - Torrent A: Score 95, published 2020-01-01 → **Ranks #1** (better match wins despite older date)
+  - Torrent B: Score 75, published 2024-01-01 → Ranks #2
+
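One way to express this ordering is a two-key comparator: final score first, publish date only when scores are equal. The field names `finalScore` and `publishDate` and the helper `sortByScoreThenDate` are assumptions made for this sketch.

```typescript
// Minimal shape needed for ordering; field names are assumed for illustration.
interface DatedScore {
  finalScore: number;
  publishDate: Date;
}

// Higher score wins; on equal scores the newer publish date ranks first.
function sortByScoreThenDate<T extends DatedScore>(items: T[]): T[] {
  return [...items].sort(
    (a, b) =>
      b.finalScore - a.finalScore ||
      b.publishDate.getTime() - a.publishDate.getTime()
  );
}

const tied = sortByScoreThenDate([
  { name: "B", finalScore: 85, publishDate: new Date("2023-01-01") },
  { name: "A", finalScore: 85, publishDate: new Date("2024-06-01") },
]);
// tied[0].name === "A" (same score, newer upload)
```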
 ## Interface

 ```typescript
@@ -122,6 +211,12 @@ interface IndexerFlagConfig {
  modifier: number;     // -100 to 100 (percentage)
 }

+interface RankTorrentsOptions {
+  indexerPriorities?: Map<number, number>;  // indexerId -> priority (1-25)
+  flagConfigs?: IndexerFlagConfig[];        // Flag bonus configurations
+  requireAuthor?: boolean;                  // Enforce author check (default: true)
+}
+
 interface BonusModifier {
  type: 'indexer_priority' | 'indexer_flag' | 'custom';
  value: number;        // Multiplier (e.g., 0.4 for 40%)
@@ -149,12 +244,46 @@ interface RankedTorrent extends TorrentResult {
  };
 }

+// New API (recommended)
 function rankTorrents(
  torrents: TorrentResult[],
  audiobook: AudiobookRequest,
-  indexerPriorities?: Map<number, number>,  // indexerId -> priority (1-25)
-  flagConfigs?: IndexerFlagConfig[]         // Flag bonus configurations
+  options?: RankTorrentsOptions
 ): RankedTorrent[];
+
+// Legacy API (backwards compatible)
+function rankTorrents(
+  torrents: TorrentResult[],
+  audiobook: AudiobookRequest,
+  indexerPriorities?: Map<number, number>,
+  flagConfigs?: IndexerFlagConfig[]
+): RankedTorrent[];
+```
+
+## Usage Examples
+
+**Automatic selection (strict author filtering):**
+```typescript
+// Background job - safe auto-download
+const ranked = rankTorrents(torrents, audiobook, {
+  indexerPriorities,
+  flagConfigs,
+  requireAuthor: true  // Default - prevents wrong authors
+});
+
+const topResult = ranked[0];  // Safe to auto-download
+```
+
+**Interactive search (show all results):**
+```typescript
+// User browsing - let user decide
+const ranked = rankTorrents(torrents, audiobook, {
+  indexerPriorities,
+  flagConfigs,
+  requireAuthor: false  // Show everything, including edge cases
+});
+
+return ranked;  // User can see torrents without author info
 ```
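
Supporting both call shapes usually comes down to normalizing the third and fourth arguments into one options object. The sketch below shows one possible approach, not the actual implementation: `normalizeRankOptions`, `RankOptionsLike`, and `FlagConfigLike` are hypothetical names, and legacy calls are assumed to keep the `requireAuthor: true` default.

```typescript
// Stand-ins for the documented types, reduced to what this sketch needs.
interface FlagConfigLike {
  modifier: number;
}

interface RankOptionsLike {
  indexerPriorities?: Map<number, number>;
  flagConfigs?: FlagConfigLike[];
  requireAuthor?: boolean;
}

// Accepts either the options object (new API) or a priorities Map plus flag configs (legacy API).
function normalizeRankOptions(
  optionsOrPriorities?: RankOptionsLike | Map<number, number>,
  legacyFlagConfigs?: FlagConfigLike[]
): Required<RankOptionsLike> {
  if (optionsOrPriorities instanceof Map) {
    // Legacy positional call: rankTorrents(torrents, audiobook, priorities, flagConfigs)
    return {
      indexerPriorities: optionsOrPriorities,
      flagConfigs: legacyFlagConfigs ?? [],
      requireAuthor: true, // assumption: legacy callers get the documented default
    };
  }
  const options = optionsOrPriorities ?? {};
  return {
    indexerPriorities: options.indexerPriorities ?? new Map<number, number>(),
    flagConfigs: options.flagConfigs ?? legacyFlagConfigs ?? [],
    requireAuthor: options.requireAuthor ?? true,
  };
}

const legacy = normalizeRankOptions(new Map([[1, 10]]), [{ modifier: 50 }]);
const modern = normalizeRankOptions({ requireAuthor: false });
// legacy.requireAuthor === true, modern.requireAuthor === false
```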

 ## Tech Stack
@@ -24,7 +24,10 @@ Free, open-source Usenet/NZB download client with comprehensive Web API. Industr
 **GET /api?mode=history&limit=100&output=json&apikey={key}** - Get completed/failed downloads
 **GET /api?mode=pause&value={nzbId}&output=json&apikey={key}** - Pause download
 **GET /api?mode=resume&value={nzbId}&output=json&apikey={key}** - Resume download
-**GET /api?mode=queue&name=delete&value={nzbId}&del_files={0|1}&output=json&apikey={key}** - Delete download
+**GET /api?mode=queue&name=delete&value={nzbId}&del_files={0|1}&output=json&apikey={key}** - Delete download from queue
+**GET /api?mode=history&name=delete&value={nzbId}&del_files={0|1}&archive={0|1}&output=json&apikey={key}** - Delete/archive download from history
+  - `archive=1` (default): Move to hidden archive (preserves the job for troubleshooting)
+  - `archive=0`: Permanently delete from history
 **GET /api?mode=get_config&output=json&apikey={key}** - Get configuration (categories)
 **GET /api?mode=set_config&section=categories&keyword={cat}&value={path}&output=json&apikey={key}** - Create/update category
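
As an illustration, the history delete/archive request listed above could be assembled like this. `buildHistoryDeleteUrl` is a hypothetical helper, and the base URL, API key, and job ID in the usage line are placeholders; the query parameters are the ones from the endpoint list.

```typescript
// Build the history delete/archive URL. Archiving is the default; pass archive: false to delete permanently.
function buildHistoryDeleteUrl(
  baseUrl: string,
  apiKey: string,
  nzbId: string,
  opts: { deleteFiles?: boolean; archive?: boolean } = {}
): string {
  const params: Array<[string, string]> = [
    ["mode", "history"],
    ["name", "delete"],
    ["value", nzbId],
    ["del_files", opts.deleteFiles ? "1" : "0"],
    ["archive", opts.archive === false ? "0" : "1"],
    ["output", "json"],
    ["apikey", apiKey],
  ];
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  // Strip trailing slashes so "http://host/" and "http://host" behave the same.
  return `${baseUrl.replace(/\/+$/, "")}/api?${query}`;
}

const archiveUrl = buildHistoryDeleteUrl("http://localhost:8080/", "KEY", "nzo_abc123");
// "http://localhost:8080/api?mode=history&name=delete&value=nzo_abc123&del_files=0&archive=1&output=json&apikey=KEY"
```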

@@ -179,6 +182,38 @@ interface HistoryItem {
 **4. Queue vs History Logic** - Checks queue first, falls back to history
 **5. SSL Certificate Errors** - Optional SSL verification disable for self-signed certs

+## Automatic Cleanup
+
+**Per-Indexer Configuration:**
+- Usenet indexers have "Remove After Processing" option (default: enabled)
+- When enabled, NZB downloads are automatically cleaned up after files are organized
+- Saves disk space by removing completed download files
+
+**Two-Stage Cleanup Process:**
+1. **Filesystem Cleanup:** Deletes the download directory or files directly with `fs.rm()`
+   - Removes extracted files from category download directory
+   - Handles both single files and directories recursively
+   - Gracefully handles already-deleted files (ENOENT)
+
+2. **SABnzbd Archive:** Archives NZB from history (hides from UI)
+   - Uses SABnzbd's archive feature (default: `archive=1`)
+   - Preserves job in hidden archive for troubleshooting/auditing
+   - Does NOT permanently delete from history
+   - Does NOT attempt queue deletion (if still in queue, something went wrong)
+
+**Implementation:**
+- Location: `organize-files.processor.ts`
+- After file organization completes, checks if indexer has `removeAfterProcessing` enabled
+- Filesystem cleanup performed first (critical for disk space)
+- SABnzbd archive performed second (UI cleanup)
+- Non-blocking: logs warnings but doesn't fail the job if cleanup fails
+
+**Why Archive Instead of Delete:**
+- Preserves download history for troubleshooting
+- Maintains records for duplicate detection
+- Allows reviewing past downloads if issues arise
+- Can be viewed in SABnzbd by toggling "Show Archive" in history
+
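The two-stage, non-blocking flow described above might be structured as follows. This is a sketch, not the code in `organize-files.processor.ts`: `cleanupAfterOrganize` and the `CleanupDeps` interface are hypothetical, the filesystem and SABnzbd calls are injected so the example stays self-contained, and an unset `removeAfterProcessing` is assumed to mean enabled.

```typescript
// Injected side effects; in a real processor these might wrap Node's fs.rm(path, { recursive: true })
// and the SABnzbd history delete call with archive=1.
interface CleanupDeps {
  removePath: (path: string) => Promise<void>;
  archiveFromHistory: (nzbId: string) => Promise<void>;
  warn: (message: string) => void;
}

async function cleanupAfterOrganize(
  indexer: { removeAfterProcessing?: boolean },
  downloadPath: string,
  nzbId: string,
  deps: CleanupDeps
): Promise<void> {
  // Assumption: undefined means the default (enabled).
  if (indexer.removeAfterProcessing === false) return;

  // Stage 1: filesystem cleanup first, since it is what frees disk space.
  try {
    await deps.removePath(downloadPath);
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code !== "ENOENT") {
      deps.warn(`Filesystem cleanup failed for ${downloadPath}: ${String(err)}`);
    }
    // ENOENT: already deleted, nothing to do.
  }

  // Stage 2: archive the job from SABnzbd history (hidden, not permanently deleted).
  try {
    await deps.archiveFromHistory(nzbId);
  } catch (err) {
    deps.warn(`SABnzbd archive failed for ${nzbId}: ${String(err)}`);
  }
  // Neither stage rethrows, so a cleanup failure never fails the organize job.
}
```

Keeping the two stages in separate `try` blocks means a failed filesystem delete still lets the history archive run, which matches the "logs warnings but doesn't fail the job" behavior.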
 ## Comparison: SABnzbd vs qBittorrent

 | Feature | SABnzbd | qBittorrent |
@@ -190,6 +225,7 @@ interface HistoryItem {
 | Seeding | N/A (Usenet is not P2P) | Required (tracker) |
 | Categories | Path-based | Path + tag-based |
 | File Handling | Auto-extracts archives | Downloads as-is |
+| Cleanup | Automatic (optional, per-indexer) | Seeding time based |

 ## Tech Stack

@@ -54,7 +54,7 @@ interface SetupState {
  plexLibraryId: string;
  prowlarrUrl: string;
  prowlarrApiKey: string;
-  prowlarrIndexers: Array<{id: number, name: string, priority: number, seedingTimeMinutes: number, rssEnabled: boolean}>;
+  prowlarrIndexers: Array<{id: number, name: string, protocol: string, priority: number, seedingTimeMinutes?: number, removeAfterProcessing?: boolean, rssEnabled: boolean}>;
  downloadClient: 'qbittorrent' | 'transmission';
  downloadClientUrl: string;
  downloadClientUsername: string;