# Audible Integration

**Status:** ✅ Implemented (Audnexus API + Web Scraping)

Audiobook metadata from Audnexus API (primary) and Audible.com scraping (fallback) for discovery, search, and detail pages.

## Detail Page Strategy

**Primary: Audnexus API**
- Endpoint: `https://api.audnex.us/books/{asin}`
- Structured JSON response (no parsing needed)
- Provides: title, authors, narrators, description, duration, rating, genres, cover art
- Free, no API key required
- ~95% success rate for popular audiobooks

**Fallback: Audible Scraping**
- Used when Audnexus returns 404
- Parse Audible HTML with Cheerio
- Multiple selector strategies with promotional text filtering
- Extract JSON-LD structured data when available

## Discovery Strategy (Popular/New/Search)

- Parse Audible HTML with Cheerio
- Multi-page scraping (20 items/page)
- Rate limit: max 10 req/min, 1.5s delay between pages
- Cache results in database (24hr TTL)

## Data Sources

1. **Best Sellers:** `https://www.audible.com/adblbestsellers`
2. **New Releases:** `https://www.audible.com/newreleases`
3. **Search:** `https://www.audible.com/search?keywords={query}`
4. **Detail Page:** `https://www.audible.com/pd/{asin}`

## Metadata Extracted

- ASIN (Audible ID)
- Title, author, narrator
- Duration (minutes), release date, rating
- Description, cover art URL
- Genres/categories

## Unified Matching (`audiobook-matcher.ts`)

**Status:** ✅ Production Ready

Single matching algorithm used everywhere (search, popular, new-releases, jobs).

**Process:**
1. Query DB candidates: `audibleId` exact match OR partial title+author match
2. If exact ASIN match → return immediately
3. Fuzzy match: title 70% + author 30% weights, 70% threshold
4. Return best match or null

**Benefits:**
- Real-time matching at query time (not pre-matched)
- Works regardless of job execution order
- Prevents duplicate `plexGuid` assignments
- Used by all APIs for consistency

## Database-First Approach

**Status:** ✅ Implemented

Discovery APIs serve cached data from DB with real-time matching.

**Flow:**
1. `audible_refresh` job runs daily → fetches 200 popular + 200 new releases
2. Downloads and caches cover thumbnails locally (reduces Audible load)
3. Stores in DB with flags (`isPopular`, `isNewRelease`) and rankings
4. Cleans up unused thumbnails after sync
5. API routes query DB → apply real-time matching → return enriched results
6. Homepage loads instantly (no Audible API hits)

## Thumbnail Caching

**Status:** ✅ Implemented

Cover images cached locally to reduce external requests and improve performance.

**Features:**
- Downloads covers during `audible_refresh` job
- Stores in `/app/cache/thumbnails` (Docker volume)
- Serves via `/api/cache/thumbnails/[filename]`
- Auto-cleanup of unused thumbnails
- Falls back to original URL if cache fails
- 24-hour browser cache headers

**Implementation:**
- Service: `src/lib/services/thumbnail-cache.service.ts`
- API Route: `src/app/api/cache/thumbnails/[filename]/route.ts`
- Storage: Docker volume `cache` mounted at `/app/cache`
- Filename: `{asin}.{ext}` (e.g., `B08G9PRS1K.jpg`)

**API Endpoints:**

**GET /api/audiobooks/popular?page=1&limit=20**

**GET /api/audiobooks/new-releases?page=1&limit=20**

Response:
```typescript
{
  success: boolean;
  audiobooks: EnrichedAudibleAudiobook[];
  count: number;
  totalCount: number;
  page: number;
  totalPages: number;
  hasMore: boolean;
  lastSync: string | null; // ISO timestamp
  message?: string; // if no data
}
```

## Data Models

```typescript
interface AudibleAudiobook {
  asin: string;
  title: string;
  author: string;
  narrator?: string;
  description?: string;
  coverArtUrl?: string;
  durationMinutes?: number;
  releaseDate?: string;
  rating?: number;
  genres?: string[];
}

interface EnrichedAudibleAudiobook extends AudibleAudiobook {
  availabilityStatus: 'available' | 'requested' | 'unknown';
  isAvailable: boolean;
  plexGuid: string | null;
  dbId: string;
}
```

## Tech Stack

- axios (HTTP)
- cheerio (HTML parsing)
- Redis (caching, optional)
- Database (PostgreSQL)
- string-similarity (matching)

## Fixed Issues

**Search returning empty results (2026-01-07)**
- **Problem:** Audible changed HTML structure for search results from `.productListItem` to `.s-result-item`
- **Impact:** All search queries returned 0 results
- **Fix:** Updated `search()` method to support both `.s-result-item` (current) and `.productListItem` (legacy)
- **Selectors updated:**
  - Main: `.s-result-item, .productListItem`
  - Title: `h2` (new) or `h3 a` (legacy)
  - Author: `a[href*="/author/"]` (new) or `.authorLabel` (legacy)
  - Narrator: `a[href*="searchNarrator="]` (new) or `.narratorLabel` (legacy)
  - Runtime: `span:contains("Length:")` (new) or `.runtimeLabel` (legacy)
  - Rating: `.a-icon-star span` (new) or `.ratingsLabel` (legacy)
- **Location:** `src/lib/integrations/audible.service.ts:235`

**Some audiobooks missing from search results (2026-01-07)**
- **Problem:** ASIN extraction only matched `/pd/` URLs but some audiobooks use `/ac/` URLs
- **Impact:** Books like "Beatitude" by DJ Krimmer (ASIN: B0DVH7XL36) were skipped
- **Fix:** Updated ASIN regex to match both `/pd/` and `/ac/` URL patterns: `/\/(?:pd|ac)\/[^\/]+\/([A-Z0-9]{10})/`
- **Location:** `src/lib/integrations/audible.service.ts:75, 161, 240`
- **Affects:** `getPopularAudiobooks()`, `getNewReleases()`, `search()` methods
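
To make the ASIN fix concrete, here is a minimal sketch of how the updated pattern behaves. The regex is the one quoted in the fix; the `extractAsin` helper and the URL slugs are hypothetical and only illustrate the idea, so the actual code in `audible.service.ts` may be organized differently.

```typescript
// Pattern quoted in the fix: matches /pd/{slug}/{ASIN} and /ac/{slug}/{ASIN}.
const ASIN_PATTERN = /\/(?:pd|ac)\/[^\/]+\/([A-Z0-9]{10})/;

// Hypothetical helper (not taken from the service): returns the ASIN or null.
function extractAsin(href: string): string | null {
  const match = ASIN_PATTERN.exec(href);
  return match?.[1] ?? null;
}

// Illustrative links; the slug "Beatitude-Audiobook" is made up for this example.
console.log(extractAsin("/pd/Beatitude-Audiobook/B0DVH7XL36?ref=search")); // "B0DVH7XL36"
console.log(extractAsin("/ac/Beatitude-Audiobook/B0DVH7XL36"));            // "B0DVH7XL36"
console.log(extractAsin("/pd/B0DVH7XL36"));                                // null
```

The last call shows that the pattern expects a slug segment between `/pd/` (or `/ac/`) and the ASIN, so a bare `/pd/{asin}` link would not match it.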
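
The process listed under Unified Matching (exact ASIN match first, then a fuzzy score weighted 70% title and 30% author with a 70% threshold) can be sketched as follows. This is an illustration under stated assumptions, not the contents of `audiobook-matcher.ts`: the `Candidate` shape, `findMatch`, and `similarity` are hypothetical names, the candidates are assumed to be already fetched from the DB, and a small bigram Dice coefficient stands in for the `string-similarity` package so the block has no dependencies.

```typescript
// Hypothetical candidate row, assumed to come from the DB query in step 1.
interface Candidate {
  id: string;
  audibleId: string | null;
  title: string;
  author: string;
}

// Stand-in similarity: Dice coefficient over character bigrams, range 0..1.
function similarity(a: string, b: string): number {
  const x = a.toLowerCase().replace(/\s+/g, "");
  const y = b.toLowerCase().replace(/\s+/g, "");
  if (x === y) return 1;
  if (x.length < 2 || y.length < 2) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < x.length - 1; i++) {
    const gram = x.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  let overlap = 0;
  for (let i = 0; i < y.length - 1; i++) {
    const gram = y.slice(i, i + 2);
    const remaining = counts.get(gram) ?? 0;
    if (remaining > 0) {
      counts.set(gram, remaining - 1);
      overlap++;
    }
  }
  return (2 * overlap) / (x.length - 1 + (y.length - 1));
}

const TITLE_WEIGHT = 0.7;
const AUTHOR_WEIGHT = 0.3;
const THRESHOLD = 0.7;

function findMatch(
  target: { asin: string; title: string; author: string },
  candidates: Candidate[],
): Candidate | null {
  // Step 2: an exact ASIN match is returned immediately.
  const exact = candidates.find((c) => c.audibleId === target.asin);
  if (exact) return exact;

  // Step 3: weighted fuzzy score; only scores at or above the threshold count.
  let best: Candidate | null = null;
  let bestScore = 0;
  for (const c of candidates) {
    const score =
      TITLE_WEIGHT * similarity(target.title, c.title) +
      AUTHOR_WEIGHT * similarity(target.author, c.author);
    if (score >= THRESHOLD && score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  // Step 4: best match, or null when nothing clears the threshold.
  return best;
}
```

Because the match is computed from whatever candidates exist at query time, the result does not depend on the order in which sync jobs ran, which is the property the Benefits list describes.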