ReadMeABook/documentation/integrations/plex.md
kikootwo a97979358f Implement file hash-based library matching and remove fuzzy ASIN matching
Adds file hash-based matching for Audiobookshelf library items to ensure 100% accurate ASIN assignment for RMAB-organized content. Removes fuzzy matching from library availability checks, making all matching ASIN-only to eliminate false positives and race conditions. Updates database schema, processors, and matcher utilities; adds new tests and documentation for the new matching strategy. Removes obsolete scripts, Dockerfile, and related tests; updates docker-compose for test environments.
2026-01-28 11:42:00 -05:00


Plex Media Server Integration

Status: Implemented

Connectivity to Plex for OAuth, library management, content detection, and automatic scanning. Database stores all audiobooks from Plex as source of truth for availability.

Data Flow

  1. Plex Scan Job → Fetches all audiobooks → Populates DB with availabilityStatus: 'available'
  2. Audible Refresh → Matches by ASIN against Plex data in DB (no fuzzy matching) → Sets availabilityStatus: 'available' for matches
  3. UI → Queries DB → Shows "In Your Library" badge → Prevents duplicate requests

Key Principle: Database reflects Plex content. Audible data matched against this.

Core Endpoints

  • GET {server_url}/identity - Server info (machineIdentifier, version, platform); also used for access verification
  • GET {server_url}/library/sections - List libraries with IDs and types
  • GET {server_url}/library/sections/{id}/all?type=9 - All albums (type 9 = audiobooks)
  • GET {server_url}/library/sections/{id}/all?type=9&sort=addedAt:desc&X-Plex-Container-Start=0&X-Plex-Container-Size=10 - Recently added (lightweight polling)
  • GET {server_url}/library/sections/{id}/refresh - Trigger async scan
  • GET {server_url}/library/metadata/{rating_key} - Item metadata (includes user's personal rating)
  • GET {server_url}/library/sections/{id}/search?title={query} - Search
  • DELETE {server_url}/library/metadata/{rating_key} - Delete library item (requires deletion enabled in Plex settings)
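As an illustration of the recently-added endpoint above, here is a minimal sketch of a URL builder. The helper name `buildRecentlyAddedUrl` and its parameters are hypothetical and not taken from the codebase; only the path and query parameters come from the endpoint list.

```typescript
// Hypothetical helper: builds the "recently added" polling URL from the endpoint list.
function buildRecentlyAddedUrl(serverUrl: string, sectionId: string, size = 10): string {
  const base = serverUrl.replace(/\/+$/, ""); // tolerate a trailing slash on the server URL
  const params = [
    "type=9", // album-level items (audiobooks)
    "sort=addedAt:desc", // newest first
    "X-Plex-Container-Start=0", // first page
    `X-Plex-Container-Size=${size}`, // page size
  ];
  return `${base}/library/sections/${encodeURIComponent(sectionId)}/all?${params.join("&")}`;
}
```

Keeping the page size as a parameter makes it easy to widen the polling window without touching the rest of the query.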

Auth: X-Plex-Token header

Response: XML by default (parsed to JSON with xml2js); JSON when Accept: application/json is set

API Docs: /PlexMediaServerAPIDocs.json

Security: During OAuth, user's accessible servers are fetched from plex.tv/api/v2/resources. Only users with the configured server in their resource list can authenticate.

Plex OAuth

Base: https://plex.tv/api/v2

  1. POST /pins → Get PIN id and code
  2. Build auth URL: https://app.plex.tv/auth#?clientID={id}&code={code}
  3. GET /pins/{id} → Poll until authToken populated
  4. GET /users/account → Get user info with token
  5. Security check: Get server machineIdentifier from configured server
  6. Security check: Fetch user's accessible servers (GET plex.tv/api/v2/resources with user token)
  7. Security check: Verify configured server's machineIdentifier is in user's resource list
  8. Only grant access if server found in user's accessible resources (validates shared access)
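Steps 2 and 5-8 can be sketched as two small pure functions. This is a sketch under assumptions, not the project's implementation: the `PlexResource` shape is a reduced, assumed view of a plex.tv resources entry (in particular, that `clientIdentifier` corresponds to the server's machineIdentifier), and both function names are hypothetical.

```typescript
// Assumed subset of one entry returned by plex.tv/api/v2/resources.
interface PlexResource {
  name: string;
  clientIdentifier: string; // assumed to equal the server's machineIdentifier
  provides: string; // comma-separated roles, e.g. "server" or "client,player"
}

// Step 2: build the URL the user opens to approve the PIN.
function buildAuthUrl(clientId: string, pinCode: string): string {
  return `https://app.plex.tv/auth#?clientID=${encodeURIComponent(clientId)}&code=${encodeURIComponent(pinCode)}`;
}

// Steps 5-8: grant access only if the configured server is in the user's resources.
function userCanAccessServer(resources: PlexResource[], machineIdentifier: string): boolean {
  return resources.some(
    (r) => r.provides.split(",").includes("server") && r.clientIdentifier === machineIdentifier,
  );
}
```

Filtering on `provides` avoids treating a player or client device that happens to share an identifier as the configured server.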

Audiobook Detection

  • Plex has no dedicated audiobook type
  • Stored as Music library (type="artist")
  • Admin selects library during setup
  • Query with type=9 for Album-level items (books)
  • item.title = book name, item.parentTitle = author
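The field mapping above can be sketched as follows. The `PlexAlbumItem` and `BookSummary` types and the "Unknown Author" fallback are assumptions made for illustration; only the title/parentTitle mapping comes from this document.

```typescript
// Minimal shape of an album-level (type=9) item, limited to fields named in this document.
interface PlexAlbumItem {
  ratingKey: string;
  guid: string;
  title: string; // book name
  parentTitle?: string; // author (the album's artist)
}

interface BookSummary {
  ratingKey: string;
  plexGuid: string;
  title: string;
  author: string;
}

// Maps an album item to a book record: title stays the title, parentTitle becomes the author.
function toBookSummary(item: PlexAlbumItem): BookSummary {
  return {
    ratingKey: item.ratingKey,
    plexGuid: item.guid,
    title: item.title,
    author: item.parentTitle ?? "Unknown Author", // fallback string is an assumption
  };
}
```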

Library Scanning

Full Library Scan

Scan Process:

  1. Fetch all audiobooks via API (type=9)
  2. For each:
    • Exists by plexGuid? Update metadata
    • New? Create entry in plex_library table
  3. Match downloaded requests (status: 'downloaded'):
    • Uses centralized audiobook-matcher.ts (ASIN matching, title normalization, narrator support)
    • Matched → Update request status to 'available' + link plexGuid
  4. Return summary (total, new count, updated count, matched downloads)
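Steps 1, 2 and 4 of the scan can be sketched with an in-memory map standing in for the plex_library table. The names `ScannedBook`, `ScanSummary` and `reconcileLibrary` are hypothetical, and step 3 (matching downloaded requests) is deliberately left out.

```typescript
interface ScannedBook {
  plexGuid: string;
  title: string;
  author: string;
}

interface ScanSummary {
  total: number;
  newCount: number;
  updatedCount: number;
}

// `existing` is an in-memory stand-in for the plex_library table, keyed by plexGuid.
function reconcileLibrary(existing: Map<string, ScannedBook>, fetched: ScannedBook[]): ScanSummary {
  let newCount = 0;
  let updatedCount = 0;
  for (const book of fetched) {
    if (existing.has(book.plexGuid)) {
      updatedCount++; // known plexGuid: refresh metadata
    } else {
      newCount++; // unseen plexGuid: create entry
    }
    existing.set(book.plexGuid, book);
  }
  return { total: fetched.length, newCount, updatedCount };
}
```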

Trigger: Scheduled (every 6 hours by default) or manual admin action

Note: Heavy operation; scans the entire library

Recently Added Check (Lightweight Polling)

Process:

  1. Query top 10 items sorted by addedAt:desc with pagination
  2. For each item:
    • New? Create in plex_library table
    • Existing? Update metadata
  3. Match downloaded requests:
    • Uses centralized audiobook-matcher.ts (same as full scan and homepage)
    • Searches entire plex_library table for matches
  4. Return summary (new, updated, matched downloads)

Trigger: Scheduled (every 5 minutes by default), enabled by default

Benefits: Lightweight polling for new items + comprehensive matching for downloaded requests

Note: Requests transition: pending → searching → downloading → processing → downloaded → available (after detection)

Auto-Completion of Stuck Requests

Library scans (full and incremental) now check all non-terminal requests for matches:

Eligible statuses:

  • pending, searching, downloading, processing, downloaded
  • failed, awaiting_search, awaiting_import, warn

Excluded statuses:

  • available (already completed)
  • cancelled (user cancelled)
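The eligibility rule above amounts to "everything except the two terminal statuses". A minimal sketch, with the type and function names chosen here for illustration only:

```typescript
type RequestStatus =
  | "pending" | "searching" | "downloading" | "processing" | "downloaded"
  | "failed" | "awaiting_search" | "awaiting_import" | "warn"
  | "available" | "cancelled";

// Terminal statuses are never touched by a library scan.
const TERMINAL_STATUSES: ReadonlyArray<RequestStatus> = ["available", "cancelled"];

function isEligibleForAutoCompletion(status: RequestStatus): boolean {
  return !TERMINAL_STATUSES.includes(status);
}
```

Expressing the rule as an exclusion list means a newly introduced intermediate status is eligible by default, which matches the "all non-terminal requests" wording.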

Use Case:

  1. Request stuck in 'awaiting_search' or 'failed' status
  2. User manually imports audiobook to library (via Plex/ABS or external tool)
  3. Next library scan (manual trigger or scheduled recently-added check)
  4. Request auto-matches and marks as 'available'
  5. Error messages and retry counters cleared

State Cleanup on Match:

  • errorMessage → null
  • searchAttempts → 0
  • downloadAttempts → 0
  • importAttempts → 0
  • completedAt → scan timestamp
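The cleanup can be sketched as a pure function that returns the updated request. The `RequestState` shape and the name `markAvailable` are assumptions; the field names and reset values are the ones listed above.

```typescript
interface RequestState {
  status: string;
  errorMessage: string | null;
  searchAttempts: number;
  downloadAttempts: number;
  importAttempts: number;
  completedAt: Date | null;
}

// Returns a copy of the request in its completed state; the input is not mutated.
function markAvailable(request: RequestState, scanTime: Date): RequestState {
  return {
    ...request,
    status: "available",
    errorMessage: null,
    searchAttempts: 0,
    downloadAttempts: 0,
    importAttempts: 0,
    completedAt: scanTime,
  };
}
```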

Edge Cases:

  • Active downloads/jobs continue but become no-ops (download completes, organize skips)
  • Torrent/NZB remains in download client (manual cleanup if desired)

Logging:

  • Transitions from non-downloaded statuses logged with original status: Match found! "Book" → "Library Book" (was 'failed')
  • Provides visibility into which stuck requests were auto-completed

Data Models

interface PlexAudiobook {
  ratingKey: string;
  guid: string;
  title: string;
  author: string; // from parentTitle
  narrator?: string;
  duration: number; // ms
  year?: number;
  summary?: string;
  thumb?: string;
  addedAt: number;
  updatedAt: number;
  filePath: string;
  userRating?: number; // personal rating of the user whose token fetched the item
}

interface PlexLibrary {
  id: string;
  title: string;
  type: string; // "artist", "audio"
  locations: string[];
  itemCount: number;
}

BookDate Ratings

Problem: The library scan runs with the system Plex token, so the cache holds the ratings of whoever owns that token. Different users need their own ratings for recommendations.

Solution:

  1. Local admin users: Use cached ratings (from system Plex token)
  2. Plex-authenticated users (including admins): Fetch library with user's token to get personal ratings

How Per-User Ratings Work:

  • Key insight: /library/sections/{id}/all returns items with the authenticated user's ratings
  • Plex ratings are tied to user accounts (stored on plex.tv), not the server
  • When fetched with a user's token, each item includes that user's personal userRating
  • No special permissions needed - works for all authenticated users (admin and non-admin)

Implementation:

  • getLibraryContent(serverUrl, userToken, libraryId) - Fetches library with user-specific ratings
  • Returns PlexAudiobook[] with userRating field specific to the authenticated user
  • Plex-authenticated users: Fetch full library (~1-2s), match by plexGuid/ratingKey against cached structure
  • Local admin: Use cached ratings (skip API call, user has no Plex account)
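The matching step for Plex-authenticated users might look like the sketch below, which overlays a user's ratings onto the cached structure by plexGuid first and ratingKey second. The type names and `applyUserRatings` are hypothetical, and dropping the cached rating when the user has none is an assumption about the intended behavior.

```typescript
interface RatedItem {
  ratingKey: string;
  guid: string;
  userRating?: number;
}

interface CachedBook {
  plexGuid: string;
  ratingKey: string;
  title: string;
  userRating?: number; // rating cached from the system token
}

function applyUserRatings(cached: CachedBook[], userItems: RatedItem[]): CachedBook[] {
  const byGuid = new Map<string, RatedItem>();
  const byRatingKey = new Map<string, RatedItem>();
  for (const item of userItems) {
    byGuid.set(item.guid, item);
    byRatingKey.set(item.ratingKey, item);
  }
  return cached.map((book) => {
    const match = byGuid.get(book.plexGuid) ?? byRatingKey.get(book.ratingKey);
    // Replace the system-token rating with this user's rating (undefined if unrated).
    return { ...book, userRating: match?.userRating };
  });
}
```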

BookDate Integration:

  • enrichWithUserRatings(userId, cachedBooks) - Determines user type and returns appropriate ratings
    • Local admin (plexId starts with 'local-') → cached ratings from system token (no API call)
    • Plex-authenticated (everyone else) → user's plex.tv token + stored machineIdentifier → server access token → fetch library with user's ratings

Notes:

  • System Plex token (configured during setup) is used for library scanning, testing, admin operations only
  • Cached ratings reflect whoever owns that system token
  • Local admins use cached ratings because they don't have Plex accounts (user.authToken is bcrypt hash)
  • Token types: Plex uses two token types per the API documentation
    • plex.tv OAuth tokens: For authenticating to plex.tv services
    • Server access tokens: For talking to individual PMS instances
    • Must call /api/v2/resources with plex.tv token + machineIdentifier to get server-specific access tokens
    • Each server in user's resources list has its own accessToken
  • Security: machineIdentifier stored in Configuration during setup to avoid accessing system token for user operations
  • BookDate correctly fetches server-specific access tokens without touching the system Plex token
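The decision described in these notes can be sketched as a single function: local admins use the cache, everyone else needs the server-specific access token from their own resources list. `ServerResource` is an assumed, reduced shape of a plex.tv resources entry, `chooseRatingSource` is a hypothetical name, and throwing when the server is missing is an assumption rather than documented behavior.

```typescript
type RatingSource =
  | { kind: "cached" }
  | { kind: "user"; serverAccessToken: string };

// Assumed subset of a plex.tv resources entry.
interface ServerResource {
  clientIdentifier: string; // assumed to equal the server's machineIdentifier
  accessToken?: string; // server-specific token for this user
}

function chooseRatingSource(
  plexId: string,
  resources: ServerResource[],
  machineIdentifier: string, // stored in Configuration during setup
): RatingSource {
  // Local admins have no Plex account, so only cached ratings are available.
  if (plexId.startsWith("local-")) return { kind: "cached" };

  const server = resources.find((r) => r.clientIdentifier === machineIdentifier);
  const token = server ? server.accessToken : undefined;
  if (!token) {
    throw new Error("Configured server not found in the user's resources");
  }
  return { kind: "user", serverAccessToken: token };
}
```

Note that the system Plex token never appears in this path; only the stored machineIdentifier and the user's own resources are consulted.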

Fixed Issues

1. Response Format Handling

  • Issue: Server info "unknown", libraries failing to load
  • Cause: Modern Plex returns JSON when Accept: application/json set, not XML
  • Fix: Added JSON handling alongside XML parsing, optional chaining for $ attributes
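A sketch of handling both response shapes: xml2js places element attributes under a `$` key, while the JSON body carries them directly on MediaContainer. The type names and `readIdentity` are hypothetical, and the "unknown" fallback is an assumption based on the symptom described above.

```typescript
interface IdentityAttrs {
  machineIdentifier?: string;
  version?: string;
  platform?: string;
}

// Either xml2js output (attributes under `$`) or Plex's JSON body.
interface IdentityBody {
  MediaContainer?: IdentityAttrs & { $?: IdentityAttrs };
}

function readIdentity(body: IdentityBody): Required<IdentityAttrs> {
  const container = body.MediaContainer;
  // Prefer xml2js attributes when present, otherwise read the JSON fields directly.
  const attrs: IdentityAttrs = container?.$ ?? container ?? {};
  return {
    machineIdentifier: attrs.machineIdentifier ?? "unknown",
    version: attrs.version ?? "unknown",
    platform: attrs.platform ?? "unknown",
  };
}
```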

2. OAuth Callback Missing pinId

  • Issue: "Missing pinId parameter" after auth
  • Fix: Modified getOAuthUrl() to append pinId to callback URL

3. Scan Architecture

  • Issue: Scan matched requests against Plex instead of populating the library (0 matches when DB empty)
  • User Feedback: "Seeing books on homepage I know are in library"
  • Fix: Rewrote to populate ALL Plex audiobooks to DB as source of truth, Audible matches against this

4. Mapping Artist Instead of Album

  • Issue: Author names as titles, undefined authors
  • Cause: Querying without type=9 returned Artist items, not Albums
  • Fix: Added type=9 parameter, changed grandparentTitle to parentTitle for author

5. Immediate Plex Search After File Organization (400 Error)

  • Issue: organize_files job triggered match_plex immediately after copying files
  • Cause: Plex hadn't scanned new files yet, search API returned 400 error
  • User Experience: Error logs despite successful download
  • Fix: Removed immediate match_plex trigger, changed workflow:
    • organize_files → status: 'downloaded' (green)
    • Scheduled scan_plex (every 6 hours) → matches downloaded requests → status: 'available'

6. Recently Added Check Used Different Matching Criteria

  • Issue: Recently added check didn't match downloaded requests that full scan matched
  • Cause: Recently added used AND logic (title >= 70% AND author >= 70%), full scan used weighted average (title × 0.7 + author × 0.3 >= 0.7)
  • User Experience: "The Tenant" → "The Tenant (Unabridged)" matched in full scan but not in recently added check
  • Fix: Changed recently added check to use same weighted scoring algorithm as full scan
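The difference between the two historical criteria is easiest to see side by side. This sketch takes precomputed similarity scores in the range 0 to 1 as inputs; the function names are hypothetical and the scores in the usage note are made up for illustration, not measured from "The Tenant".

```typescript
const THRESHOLD = 0.7;

// Old recently-added rule: both scores had to clear the threshold.
function matchesWithAndLogic(titleScore: number, authorScore: number): boolean {
  return titleScore >= THRESHOLD && authorScore >= THRESHOLD;
}

// Full-scan rule: weighted average of title and author similarity.
function matchesWithWeightedScore(titleScore: number, authorScore: number): boolean {
  return titleScore * 0.7 + authorScore * 0.3 >= THRESHOLD;
}
```

With a title score of 0.8 and an author score of 0.6, the AND rule rejects the pair because the author score is below 0.7, while the weighted rule accepts it (0.8 × 0.7 + 0.6 × 0.3 = 0.74).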

7. Scan Methods Not Using Centralized Matcher

  • Issue: Full scan and recently added check had custom matching logic, different from homepage matcher
  • Cause: Each component implemented its own fuzzy matching without title normalization, ASIN matching, or narrator support
  • User Experience: Inconsistent matching behavior across the application
  • Fix: Both scan methods now use audiobook-matcher.ts utility (same as homepage)
    • ASIN matching: Checks plexGuid for exact ASIN (100% confidence)
    • Title normalization: Removes "(Unabridged)", "(Abridged)", etc.
    • Narrator matching: Can match narrator to Plex author field
    • ASIN filtering: Rejects candidates with wrong ASINs in plexGuid
    • Consistent 70% weighted threshold everywhere
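Title normalization of the kind described above could look like the following sketch. It handles only the two suffixes named in this document; lowercasing and whitespace collapsing are assumptions, and `normalizeTitle` is not claimed to match the function in audiobook-matcher.ts.

```typescript
function normalizeTitle(title: string): string {
  return title
    .replace(/\s*\((?:un)?abridged\)/gi, "") // drop "(Unabridged)" / "(Abridged)"
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim()
    .toLowerCase();
}
```

After normalization, "The Tenant (Unabridged)" and "The Tenant" compare equal, which is the case that previously matched in one scan path but not the other.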

8. BookDate Token Decryption Failures

  • Issue: Decryption errors when fetching user ratings for BookDate recommendations
  • User Experience: "Failed to decrypt user authToken" / "Failed to decrypt system Plex token"
  • Cause: Tokens may be stored as plain text (from before encryption implementation or different encryption key)
  • Fix: Added fallback to use tokens as plain text if decryption fails
    • User Plex token: Try decrypt, fallback to plain text
    • System Plex token: Try decrypt, fallback to plain text (before architectural fix)
    • Allows BookDate to function with both encrypted and plain text tokens
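The fallback can be sketched as a small wrapper around whatever decryption routine is in use. The `decrypt` parameter is injected here because the project's actual encryption service is not shown in this document; `decryptOrPlain` is a hypothetical name.

```typescript
// Tries to decrypt a stored token; if that fails, treats the stored value as plain text.
function decryptOrPlain(stored: string, decrypt: (value: string) => string): string {
  try {
    return decrypt(stored);
  } catch {
    // Likely stored before encryption existed, or encrypted under a different key.
    return stored;
  }
}
```

One trade-off of this approach is that a genuinely corrupted ciphertext is passed along as if it were a token, so the later Plex request is what surfaces the failure.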

9. BookDate Accessing System Token for User Operations (Architectural Fix)

  • Issue: Every BookDate user request was decrypting system Plex token to get machineIdentifier
  • User Experience: Unnecessary decryption operations, security concern (users shouldn't access admin token)
  • Cause: machineIdentifier was fetched via testConnection() using system token for each user request
  • Fix: Store machineIdentifier in Configuration during setup, use stored value for user operations
    • Added plex_machine_identifier to Configuration table
    • Setup/complete route saves machineIdentifier from test-plex response
    • config.service.ts returns machineIdentifier from config
    • enrichWithUserRatings() uses stored machineIdentifier (no system token access)
    • System token now only used for: library scanning, setup, testing, admin operations
    • User flow: user's plex.tv token + stored machineIdentifier → server access token
  • Security: Users never access or decrypt the system Plex token

10. OAuth Callback Re-fetching machineIdentifier (Architectural Fix)

  • Issue: auth/plex/callback route was calling testConnection() to fetch machineIdentifier on every user login
  • User Experience: Unnecessary Plex API call on every authentication (adds latency, wastes resources)
  • Cause: Inconsistent architecture - setup/settings save machineIdentifier, but callback re-fetched it
  • Fix: Use stored machineIdentifier from config (via getPlexConfig().machineIdentifier)
    • auth/plex/callback now reads from database instead of API call
    • Consistent with BookDate and other user operations
    • testConnection() only used for: testing connections, initial fetching during setup/settings
  • Result: Faster authentication, no unnecessary API calls, consistent architecture

Library Item Deletion

Endpoint: DELETE /library/metadata/{ratingKey}

Use Case: When admin deletes a request, also delete from Plex library to keep in sync

Requirements:

  • Deletion must be enabled: Settings > Server > Library in Plex webui
  • Without this setting enabled, DELETE requests will fail

Implementation:

  • deleteItem(serverUrl, authToken, ratingKey) - Deletes library item by ratingKey
  • Called during request deletion when backend mode is 'plex'
  • Extracts ratingKey from audiobook.plexGuid (format: plex://album/{ratingKey})
  • Mirrors ABS deletion behavior for consistency
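Extracting the ratingKey from the plexGuid format given above can be sketched as follows. The function name is hypothetical, and returning null for any other guid format is an assumption about how non-matching values should be treated.

```typescript
// Extracts {ratingKey} from "plex://album/{ratingKey}"; returns null for other formats.
function extractRatingKey(plexGuid: string): string | null {
  const match = /^plex:\/\/album\/([^/?#]+)$/.exec(plexGuid);
  return match?.[1] ?? null;
}
```

Returning null instead of throwing lets the caller skip the Plex deletion and still complete the request deletion.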

Error Handling:

  • 404: Item not found (already deleted) - logged but not thrown
  • Other errors: Logged but deletion continues (prevents blocking request deletion)

Availability Checking

  1. DB Population: Plex scan creates/updates records with plexGuid + ASIN + availabilityStatus: 'available'
  2. Audible Matching: Real-time ASIN-only matching (100% confidence, exact matches only)
  3. API Enrichment: Discovery APIs use real-time ASIN matching at query time
  4. UI: AudiobookCard shows "In Your Library" if isAvailable: true (ASIN exact match)
  5. Server Validation: /api/requests returns 409 if availabilityStatus === 'available'

Match Priority (ASIN-Only):

  • ASIN in dedicated field (100% confidence) → Match
  • ASIN in plexGuid (backward compatibility) → Match
  • No ASIN match → Return null (no fuzzy fallback)
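The priority order can be sketched as two lookups followed by a null result. `LibraryItem` and `findByAsin` are hypothetical names, and checking the plexGuid with a case-insensitive substring test is an assumption about how the backward-compatible path works.

```typescript
interface LibraryItem {
  plexGuid: string;
  asin?: string | null; // dedicated ASIN field
  title: string;
}

function findByAsin(asin: string, library: LibraryItem[]): LibraryItem | null {
  const wanted = asin.toUpperCase();
  // 1. Dedicated ASIN field.
  const byField = library.find((item) => item.asin?.toUpperCase() === wanted);
  if (byField) return byField;
  // 2. ASIN embedded in plexGuid (backward compatibility).
  const byGuid = library.find((item) => item.plexGuid.toUpperCase().includes(wanted));
  // 3. No ASIN match: no fuzzy fallback.
  return byGuid ?? null;
}
```

Because titles are never compared, "Foundation" cannot be mistaken for "Foundation and Empire" unless the two share an ASIN.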

Note: Fuzzy matching (70% threshold) is preserved in ranking-algorithm.ts for Prowlarr torrent selection, but NOT used for library availability checks. This eliminates false positives (e.g., "Foundation" matching "Foundation and Empire").

Tech Stack

  • axios/node-fetch
  • xml2js (XML → JSON)
  • string-similarity (fuzzy matching)