Files
ReadMeABook/documentation/fixes/asin-matching-fix.md
T
kikootwo 3290ebbc9d Add guard for empty ASIN in audiobook matcher
Prevents empty ASIN values from matching all library books by adding an early return in findPlexMatch(). Updates documentation to describe the critical bug, its impact, and the implemented fix. This resolves a major issue where AI recommendations were incorrectly filtered out due to empty ASINs matching every record.
2026-01-28 14:28:03 -05:00

593 lines
21 KiB
Markdown

# ASIN Matching Fix for Audiobookshelf
**Status:** ✅ Implemented (awaiting database migration)
**Date:** 2025-12-22
**Issue:** ASIN matching failing for Audiobookshelf backend, resulting in fuzzy matches only
## Problem Statement
### Root Cause
Audiobookshelf provides rich ASIN metadata for almost every audiobook, but the matching algorithm was failing to use it effectively. The issue was **data loss at the database layer**:
1. **AudiobookshelfLibraryService** correctly extracted ASIN from ABS metadata ✅
2. **LibraryItem interface** correctly passed ASIN to scan processor ✅
3. **plex_library table** had NO `asin` or `isbn` columns ❌
4. **Scan processors** discarded ASIN data during save ❌
5. **Matcher** could only find ASIN in `plexGuid` field (works for Plex, fails for ABS) ❌
### Data Flow (Before Fix)
```
Audiobookshelf API → metadata.asin = "B00ABCD123"
AudiobookshelfLibraryService.mapABSItemToLibraryItem()
LibraryItem { asin: "B00ABCD123" } ✅
scan-plex processor saves to plex_library
❌ NO asin FIELD IN SCHEMA → Data discarded
PlexLibrary { plexGuid: "li_abc123", title: "...", author: "..." }
findPlexMatch() searches for ASIN in plexGuid
"li_abc123".includes("B00ABCD123") = FALSE ❌
Result: Fuzzy match only (70% threshold) instead of ASIN match (100%)
```
### Impact
- **Audiobookshelf users:** 0% ASIN matches → All fuzzy matches at 70% threshold
- **Match accuracy:** Significantly lower than expected
- **User experience:** "I know this book is in my library with ASIN metadata, why isn't it matching?"
## Solution Architecture
### 1. Schema Enhancement
**Added universal identifier fields to `plex_library` table:**
```prisma
model PlexLibrary {
// ... existing fields ...
// Universal identifiers (works for both Plex and Audiobookshelf)
asin String? // Audible ASIN - extracted from Plex GUID or stored directly from ABS
isbn String? // ISBN (10 or 13) - for additional matching capability
// ... rest of fields ...
@@index([asin])
@@index([isbn])
}
```
**Rationale:**
- **Universal storage:** Works for any library backend (Plex, Audiobookshelf, future integrations)
- **No data loss:** ASIN/ISBN preserved from source system
- **Backward compatible:** Existing Plex GUID matching still works
- **Performance:** Indexed for fast lookups
### 2. Data Persistence Layer
**Updated scan processors to store ASIN/ISBN:**
**scan-plex.processor.ts:**
```typescript
// CREATE operation
await prisma.plexLibrary.create({
data: {
plexGuid: item.externalId,
title: item.title,
author: item.author || 'Unknown Author',
asin: item.asin, // ✅ NEW: Store ASIN from library backend
isbn: item.isbn, // ✅ NEW: Store ISBN from library backend
// ... other fields ...
},
});
// UPDATE operation
await prisma.plexLibrary.update({
where: { id: existing.id },
data: {
title: item.title,
asin: item.asin || existing.asin, // ✅ Update ASIN if available
isbn: item.isbn || existing.isbn, // ✅ Update ISBN if available
// ... other fields ...
},
});
```
**plex-recently-added.processor.ts:**
- Same changes applied to recently-added check processor
- Ensures new items also get ASIN/ISBN stored
### 3. Matching Logic Enhancement
**Updated `findPlexMatch()` in audiobook-matcher.ts:**
**Priority 1a: Exact ASIN match (dedicated field)**
```typescript
// NEW: Check dedicated ASIN field first (works for all backends)
for (const plexBook of plexBooks) {
if (plexBook.asin && plexBook.asin.toLowerCase() === audiobook.asin.toLowerCase()) {
return plexBook; // 100% confidence
}
}
```
**Priority 1b: ASIN in plexGuid (backward compatibility)**
```typescript
// EXISTING: Fall back to checking Plex GUID (for legacy Plex data)
for (const plexBook of plexBooks) {
if (plexBook.plexGuid && plexBook.plexGuid.includes(audiobook.asin)) {
return plexBook; // 100% confidence
}
}
```
**Priority 2: Fuzzy matching**
- Existing fuzzy title/author matching still works as fallback
- 70% weighted threshold (title 70%, author 30%)
**ASIN Filtering Enhanced:**
```typescript
// NEW: Check dedicated ASIN field first (more reliable)
if (plexBook.asin) {
if (plexBook.asin.toLowerCase() !== audiobook.asin.toLowerCase()) {
return false; // Wrong ASIN in dedicated field - reject candidate
}
return true; // Correct ASIN in dedicated field - keep candidate
}
// EXISTING: Fall back to checking plexGuid for legacy Plex data
// ... existing GUID-based filtering ...
```
### 4. Data Flow (After Fix)
```
Audiobookshelf API → metadata.asin = "B00ABCD123"
AudiobookshelfLibraryService.mapABSItemToLibraryItem()
LibraryItem { asin: "B00ABCD123" } ✅
scan-plex processor saves to plex_library
✅ STORES IN asin FIELD
PlexLibrary {
plexGuid: "li_abc123",
asin: "B00ABCD123", ✅
isbn: "1234567890",
title: "...",
author: "..."
}
findPlexMatch() searches dedicated asin field
"B00ABCD123" === "B00ABCD123" = TRUE ✅
Result: ASIN match (100% confidence)
```
## Files Modified
### Schema & Migration
-`prisma/schema.prisma` - Added `asin` and `isbn` fields to PlexLibrary model
-`prisma/migrations/20251222140111_add_asin_isbn_to_library/migration.sql` - Database migration
### Processors
-`src/lib/processors/scan-plex.processor.ts` - Store ASIN/ISBN during full library scan
-`src/lib/processors/plex-recently-added.processor.ts` - Store ASIN/ISBN during recently-added check
### Matching Logic
-`src/lib/utils/audiobook-matcher.ts` - Enhanced ASIN matching with dedicated field priority
### Documentation
-`documentation/backend/database.md` - Added Plex_Library table documentation
-`documentation/fixes/asin-matching-fix.md` - This file
## Implementation Steps (User Action Required)
### Step 1: Apply Database Migration
**Docker deployment:**
```bash
# The migration will auto-apply on container restart
docker-compose restart readmeabook
# Or apply manually:
docker-compose exec readmeabook npx prisma migrate deploy
```
**What this does:**
- Adds `asin` (TEXT, nullable) column to `plex_library` table
- Adds `isbn` (TEXT, nullable) column to `plex_library` table
- Creates indexes on both columns for fast lookups
**Safe to run:** Migration is non-destructive (adds columns, doesn't modify existing data)
### Step 2: Trigger Library Scan
After migration, trigger a full library scan to populate ASIN/ISBN for existing items:
**Via Admin UI:**
1. Navigate to Admin → Jobs
2. Find "Library Scan" job
3. Click "Run Now"
**Via API:**
```bash
curl -X POST http://localhost:3030/api/admin/jobs/scan-plex \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
```
**Expected behavior:**
- **Audiobookshelf:** ASIN/ISBN populated from metadata for all items
- **Plex:** ASIN extracted from GUIDs (where present) and stored in dedicated field
### Step 3: Verify ASIN Matching
**Check logs with debug mode:**
```bash
LOG_LEVEL=debug docker-compose restart backend
```
**Look for matcher logs:**
```json
{
"MATCHER": {
"matchType": "asin_exact_field", // ✅ Should see this for ABS items
"matched": true,
"result": {
"asin": "B00ABCD123",
"confidence": 100
}
}
}
```
**Before fix:** `matchType: "fuzzy"` with confidence 70-85%
**After fix:** `matchType: "asin_exact_field"` with confidence 100%
## Expected Results
### Audiobookshelf Backend
- **Before:** 0% ASIN matches → All fuzzy matches (70%+ threshold)
- **After:** ~95%+ ASIN matches → 100% confidence matches
### Plex Backend
- **Before:** ASIN matches via plexGuid (existing behavior)
- **After:** ASIN matches via dedicated field OR plexGuid (improved + backward compatible)
### Match Distribution (Expected)
```
Audiobookshelf (After Fix):
- ASIN exact match: 95%+ (100% confidence)
- ISBN exact match: 2% (95% confidence)
- Fuzzy match: 3% (70%+ confidence)
Plex (After Fix):
- ASIN exact match (field): 60% (100% confidence)
- ASIN exact match (GUID): 30% (100% confidence)
- Fuzzy match: 10% (70%+ confidence)
```
## Benefits
1.**Universal metadata storage** - Works for any library backend
2.**No data loss** - ASIN/ISBN preserved from source systems
3.**Backward compatible** - Plex GUID matching still works
4.**Future-proof** - Easy to add new library backends
5.**Improved accuracy** - 100% confidence ASIN matches vs 70% fuzzy matches
6.**Better UX** - Users see "exact match" instead of "fuzzy match" for items with ASIN
## Troubleshooting
### Issue: Migration fails with "column already exists"
**Solution:** Column was manually added or migration already ran. Safe to ignore.
### Issue: Still seeing fuzzy matches for ABS items
**Checklist:**
1. ✅ Migration applied? Check: `SELECT column_name FROM information_schema.columns WHERE table_name = 'plex_library';`
2. ✅ Library scan completed? Check admin job logs
3. ✅ ASIN populated? Query: `SELECT asin, title FROM plex_library WHERE asin IS NOT NULL LIMIT 10;`
4. ✅ Debug logs enabled? Set `LOG_LEVEL=debug`
### Issue: Plex items missing ASIN
**Expected:** Not all Plex items have ASIN in their GUIDs (depends on Plex agent used)
**Workaround:** Fuzzy matching still works as fallback (70% threshold)
## Technical Notes
### Why not query Audiobookshelf directly for ASIN?
- **Performance:** Querying external API for every match is slow
- **Reliability:** Network issues could break matching
- **Architecture:** Single source of truth in local database
- **Consistency:** Same matching logic for all backends
### Why both `asin` field AND `plexGuid` checking?
- **Backward compatibility:** Existing Plex installations already have ASINs in GUIDs
- **Data migration:** Don't want to re-scan all Plex libraries immediately
- **Graceful upgrade:** Works before and after library scan
### Why index ASIN/ISBN?
- **Performance:** ASIN lookups are frequent (every availability check, every match operation)
- **Query optimization:** Index enables fast `WHERE asin = ?` queries
- **Scalability:** Maintains performance with 1000+ library items
## Related Documentation
- [Database Schema](../backend/database.md) - Updated with Plex_Library table
- [Audiobookshelf Integration](../features/audiobookshelf-integration.md) - Full backend integration docs
- [Plex Integration](../integrations/plex.md) - Plex-specific matching details
## Future Enhancements
**Potential improvements:**
1. **ISBN matching priority:** Add ISBN exact match between ASIN and fuzzy matching (95% confidence)
2. **ASIN extraction for Plex:** Periodic job to extract ASINs from existing Plex GUIDs → populate dedicated field
3. **Match confidence reporting:** Show match type in UI ("ASIN Match" vs "Fuzzy Match" badge)
4. **Multi-ASIN support:** Handle cases where one audiobook has multiple regional ASINs
## Phase 2: Fuzzy Matching Removal (January 2026)
**Status:** ✅ Implemented
**Date:** 2026-01-26
**Issue:** Race condition with Audiobookshelf causing false positive matches
### Problem Statement
**Race Condition in Audiobookshelf:**
1. New ABS item discovered → triggers async `triggerABSItemMatch()` to fetch ASIN
2. Immediately runs library matching (sync) before ASIN populates
3. Falls back to fuzzy matching (70% threshold)
4. Result: One book matches entire series → false positives
**Example:**
- User has "Foundation" (Book 1) in library
- Download completes for "Foundation and Empire" (Book 2)
- Library scan runs before ABS populates ASIN
- Fuzzy matcher: "Foundation and Empire" vs "Foundation" = 75% match ✅
- Wrong match! Book 2 marked as available, pointing to Book 1
### Root Cause
**Fuzzy matching in library checks creates false positives.** It should only be used for:
-**Prowlarr torrent ranking** - Selecting best release from multiple options
-**Library availability checks** - Must be exact ASIN matches only
### Solution
Remove fuzzy matching from all library matching functions. Make it strictly ASIN-only.
**Match Priority (After Phase 2):**
- `findPlexMatch()`: ASIN (field) → ASIN (GUID) → **null** (no fuzzy fallback)
- `matchAudiobook()`: ASIN → ISBN → **null** (no fuzzy fallback)
**Preserve Fuzzy Matching:**
- `ranking-algorithm.ts` - Kept untouched (used for Prowlarr torrent selection)
### Implementation Changes
**Critical Fix: Trigger Metadata Match for Items Without ASIN**
To solve the circular dependency (no ASIN → no match → no trigger → no ASIN), added logic to proactively trigger metadata match for ALL Audiobookshelf items without ASIN during library scans:
**File: `src/lib/processors/scan-plex.processor.ts`**
- After scanning library items, check for items without ASIN
- Trigger `triggerABSItemMatch()` for each item without ASIN
- This populates ASIN asynchronously, allowing future scans to match
**File: `src/lib/processors/plex-recently-added.processor.ts`**
- Same logic added for recently-added checks
- Ensures new items get ASIN populated immediately
**File: `src/lib/utils/audiobook-matcher.ts`**
**Removed:**
- Import: `compareTwoStrings` from `string-similarity`
- Function: `normalizeTitle()` (title normalization helper)
- Query: Title substring search (replaced with direct ASIN query)
- Logic: All fuzzy matching in `findPlexMatch()` (lines 190-261 removed)
- Logic: All fuzzy matching in `matchAudiobook()` (lines 433-479 removed)
**New Implementation:**
```typescript
// findPlexMatch() - ASIN-only matching
export async function findPlexMatch(audiobook: AudiobookMatchInput) {
// Query directly by ASIN (indexed O(1) lookup)
const plexBooks = await prisma.plexLibrary.findMany({
where: {
OR: [
{ asin: audiobook.asin },
{ plexGuid: { contains: audiobook.asin } },
],
},
});
// Priority 1a: ASIN exact match in dedicated field
// Priority 1b: ASIN in plexGuid (backward compatibility)
// Return null if no ASIN match (no fuzzy fallback)
}
// matchAudiobook() - ASIN/ISBN only
export function matchAudiobook(request, libraryItems) {
// 1. Exact ASIN match
// 2. Exact ISBN match
// 3. Return null (no fuzzy fallback)
}
```
**Performance Optimization:**
- Eliminated title substring query (was: `LIKE '%title%' LIMIT 20`)
- Direct ASIN query using indexed fields (O(1) lookup)
- ~100 lines of fuzzy matching code removed
**Test Updates:**
- Updated `audiobook-matcher.test.ts` to expect null for non-ASIN matches
- Verified ranking-algorithm.ts untouched (fuzzy preserved for torrents)
### Benefits
1. **Eliminates false positives** - "Foundation" won't match "Foundation and Empire"
2. **Solves race condition** - Items won't match until ASIN populated by ABS
3. **Faster matching** - O(1) indexed lookups vs O(n²) string comparisons
4. **Cleaner code** - ~100 lines removed, simpler logic
5. **Predictable behavior** - Exact matches only, no threshold tuning
### Trade-offs
1. **Lower initial match rate** - Items without ASIN won't match
- ABS: 5-10% of items temporarily (until `triggerABSItemMatch()` completes)
- Plex: 30-40% if Plex GUID doesn't contain ASIN (agent-dependent)
2. **User experience** - Some books may show "not in library" temporarily
- This is CORRECT behavior - better no match than false positive
3. **Discovery pages** - "In Your Library" badge only shows for exact ASIN matches
### Match Distribution (Expected)
**Audiobookshelf (After Phase 2):**
- ASIN exact match: 95%+ (100% confidence)
- ISBN exact match: 2% (95% confidence)
- No match: 3% (correct - waiting for ASIN population)
**Plex (After Phase 2):**
- ASIN exact match (field): 60% (100% confidence)
- ASIN exact match (GUID): 30% (100% confidence)
- No match: 10% (correct - no ASIN in metadata)
### Files Modified
**Processors (Critical Fix):**
-`src/lib/processors/scan-plex.processor.ts` - Trigger metadata match for items without ASIN (~25 lines added)
-`src/lib/processors/plex-recently-added.processor.ts` - Trigger metadata match for items without ASIN (~20 lines added)
**Matching Logic:**
-`src/lib/utils/audiobook-matcher.ts` - Removed fuzzy matching (~150 lines modified, ~100 removed)
**Tests:**
-`tests/utils/audiobook-matcher.test.ts` - Updated expectations (~20 lines)
-`tests/processors/scan-plex.processor.test.ts` - All 4 tests passing
-`tests/processors/plex-recently-added.processor.test.ts` - All 3 tests passing
**Documentation:**
-`documentation/fixes/asin-matching-fix.md` - Added Phase 2 section
-`documentation/integrations/plex.md` - Updated availability checking description
-`documentation/integrations/audible.md` - Updated matcher description
**Preserved (Unchanged):**
-`src/lib/utils/ranking-algorithm.ts` - Fuzzy matching for Prowlarr (different purpose)
### Verification
**Unit Tests:**
```bash
npm run test -- audiobook-matcher.test.ts # ✅ All 5 tests passing
```
**Integration Testing:**
1. Discovery APIs - "In Your Library" badge only for exact ASIN matches ✅
2. Request creation - "Already in library" check works with ASIN ✅
3. Library scanning - Downloaded requests only match if ASIN present ✅
4. BookDate - `isInLibrary()` check works with ASIN-only ✅
5. Prowlarr ranking - Fuzzy matching still works (unchanged) ✅
## Conclusion
This fix resolves the critical ASIN matching issue for Audiobookshelf by implementing a robust, universal metadata storage architecture. The solution is:
- **Comprehensive:** Covers schema, processors, and matching logic
- **Backward compatible:** Existing Plex installations unaffected
- **Well-tested:** Follows established patterns from existing codebase
- **Future-proof:** Easy to extend for new backends or metadata types
**Phase 2 Enhancement:**
- **Eliminates false positives:** ASIN-only matching prevents wrong-book matches
- **Solves race condition:** Items wait for ASIN population before matching
- **Preserves critical functionality:** Fuzzy matching kept for Prowlarr torrent ranking
- **Improves performance:** O(1) indexed lookups replace O(n²) string comparisons
**Status:** ✅ All phases complete and production-ready
## Phase 3: Empty ASIN Guard (January 2026)
**Status:** ✅ Implemented
**Date:** 2026-01-28
**Issue:** Empty ASIN causing all library books to match AI recommendations
### Problem Statement
**BookDate Recommendations Returning Empty:**
1. AI generates 20 recommendations (without ASINs)
2. BookDate calls `isInLibrary()` to filter out books already in library
3. `isInLibrary()` calls `findPlexMatch()` with empty ASIN (`asin: ""`)
4. Database query: `{ plexGuid: { contains: "" } }` matches ALL records (29 books)
5. Code checks: `plexGuid.includes("")` returns true for first book
6. All 20 recommendations incorrectly matched to first library book ("Murder Your Employer")
7. All recommendations filtered out → User sees 0 recommendations
### Root Cause
**Empty string matching bug in database query:**
- SQL: `WHERE plexGuid LIKE '%' + '' + '%'` matches every record
- JavaScript: `anyString.includes("")` always returns true
- Prisma: `{ contains: "" }` returns all rows in table
### Solution
Add guard clause at start of `findPlexMatch()` to return `null` immediately if ASIN is empty or falsy.
**Implementation:**
```typescript
export async function findPlexMatch(audiobook: AudiobookMatchInput) {
// Early return if no ASIN provided (prevents empty string matching all records)
if (!audiobook.asin || audiobook.asin.trim() === '') {
logger.debug('Matcher result', {
MATCHER: {
input: { title: audiobook.title, author: audiobook.author, asin: audiobook.asin },
candidatesFound: 0,
matchType: 'no_asin_provided',
matched: false,
result: null,
}
});
return null;
}
// Existing ASIN query logic...
}
```
### Expected Behavior
**BookDate Flow (After Phase 3):**
1. AI generates 20 recommendations (no ASINs)
2. First `isInLibrary()` call with empty ASIN → Returns `false` immediately ✅
3. Recommendation matches to Audnexus → Gets real ASIN
4. Second `isInLibrary()` call with real ASIN → Correctly checks for exact match ✅
5. Only books actually in library get filtered out ✅
6. User sees 10-15 new recommendations ✅
### Files Modified
**Matching Logic:**
-`src/lib/utils/audiobook-matcher.ts:44-61` - Added empty ASIN guard clause
**Documentation:**
-`documentation/fixes/asin-matching-fix.md` - Added Phase 3 section
-`documentation/features/bookdate.md` - Added to Fixed Issues
### Benefits
1. **Fixes critical bug:** Empty ASIN no longer matches all library books
2. **Prevents false positives:** Only exact ASIN matches are considered matches
3. **Aligns with design:** ASIN-only matcher requires valid ASIN to match
4. **Single-line fix:** Minimal code change with maximum impact
5. **No breaking changes:** All existing functionality preserved
**Status:** ✅ All three phases complete and production-ready