Merge branch 'main' into feature/bulk-import-folder-fallback

Resolves conflicts in src/lib/integrations/audible.service.ts.

main switched the ASIN-detail fallback from HTML scraping to the JSON
catalog API (fetchAudibleDetailsFromApi), removing scrapeAudibleDetails.
The PR's lookupAsinFast was a fail-fast variant of the same pattern that
getAudiobookDetails now performs (Audnexus -> catalog API), so it's
redundant.

- Drop the lookupAsinFast method (delete entire HEAD-side conflict block)
- Take main's fetchAudibleDetailsFromApi verbatim (the scrapeAudibleDetails
  maxRetries parameterization is moot)
- In bulk-import scan route, swap lookupAsinFast for getAudiobookDetails

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kikootwo
2026-05-14 16:14:25 -04:00
36 changed files with 3952 additions and 1532 deletions
+3
@@ -45,6 +45,8 @@
- **Web scraping (popular, new releases)** → [integrations/audible.md](integrations/audible.md)
- **Database caching, real-time matching** → [integrations/audible.md](integrations/audible.md)
- **Book covers API for login page** → [frontend/pages/login.md](frontend/pages/login.md)
- **Dedup & works table (cross-ASIN identity)** → [integrations/audible.md](integrations/audible.md#dedup--works-table)
- **Multi-narrator capture in HTML scrapers** → [integrations/audible.md](integrations/audible.md#narrator-capture-in-html-scrapers)
## E-book Support (First-Class)
- **First-class ebook requests, separate tracking** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
@@ -144,6 +146,7 @@
**"How do I delete requests?"** → [admin-features/request-deletion.md](admin-features/request-deletion.md)
**"How do I approve/deny user requests?"** → [admin-features/request-approval.md](admin-features/request-approval.md)
**"How do I enable auto-approve for requests?"** → [admin-features/request-approval.md](admin-features/request-approval.md)
**"How does the admin book info modal work?"** → [admin-features/request-approval.md](admin-features/request-approval.md#ui-features), [frontend/components.md](frontend/components.md#component-apis)
**"How do I customize audiobook folder organization?"** → [settings-pages.md](settings-pages.md#audiobook-organization-template), [phase3/file-organization.md](phase3/file-organization.md#target-structure)
**"How do I deploy?"** → [deployment/docker.md](deployment/docker.md) (multi-container), [deployment/unified.md](deployment/unified.md) (all-in-one)
**"How do I use the unified container?"** → [deployment/unified.md](deployment/unified.md)
@@ -259,8 +259,11 @@ Update user (includes autoApproveRequests field)
- Title and author
- User avatar and username
- Request timestamp (relative: "2 hours ago")
- Info button (ⓘ, top-right corner) — opens AudiobookDetailsModal for full book details
- Approve button (green, checkmark icon)
- Search button (blue, magnifier icon) — opens InteractiveTorrentSearchModal
- Deny button (red, X icon)
- **Info modal:** `AudiobookDetailsModal` rendered with `adminActions` prop containing Approve/Search/Deny buttons, allowing admin to review full book details (cover, description, series, genres, narrator, etc.) without leaving the approval workflow
- Auto-refreshes every 10 seconds (SWR)
- Loading states on buttons during approval/denial
- Success/error toast notifications
@@ -7,7 +7,7 @@ Sends notifications for audiobook request events (pending approval, approved, av
## Key Details
- **Backends:** Apprise (API), Discord (webhooks), ntfy (API), Pushover (API)
- **Events:** request_pending_approval, request_approved, request_available, request_error, issue_reported
- **Events:** request_pending_approval, request_approved, request_grabbed, request_available, request_error, issue_reported
- **Encryption:** AES-256-GCM for sensitive config (webhook URLs, API keys, notification URLs)
- **Delivery:** Async via Bull job queue (priority 5)
- **Failure Handling:** Non-blocking, Promise.allSettled (one backend fails, others succeed)
@@ -33,11 +33,14 @@ model NotificationBackend {
|-------|---------|------------------------|
| request_pending_approval | User creates request | Request needs admin approval |
| request_approved | Admin approves OR auto-approval | Request approved (manual or auto) |
| request_grabbed | Torrent/NZB added to download client | Download handed off to configured download client (title resolves by type) — **opt-in: existing backends do not auto-subscribe; enable in Settings** |
| request_available | Plex/ABS scan or ebook download completes | Request available (title resolves by type) |
| request_error | Download/import fails | Request failed at any stage |
| issue_reported | User reports issue | User reports problem with available audiobook |
**Dynamic Titles:** Events can define `titleByRequestType` in `notification-events.ts` for type-specific titles.
- `request_grabbed` + `requestType: 'audiobook'` → "Audiobook Grabbed"
- `request_grabbed` + `requestType: 'ebook'` → "Ebook Grabbed"
- `request_available` + `requestType: 'audiobook'` → "Audiobook Available"
- `request_available` + `requestType: 'ebook'` → "Ebook Available"
- `request_available` + no requestType → "Request Available" (fallback)
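The title resolution described above can be sketched as follows. This is a minimal illustration, not the contents of `notification-events.ts`: the `NotificationEventDef` shape, the `EVENTS` map, the `resolveTitle` helper, and the fallback title for `request_grabbed` are all assumptions; only `titleByRequestType` and the type-specific titles come from the notes above.

```typescript
// Hypothetical shapes; only `titleByRequestType` and the listed titles are from the docs.
type RequestType = 'audiobook' | 'ebook';

interface NotificationEventDef {
  title: string; // fallback used when no type-specific title applies
  titleByRequestType?: Partial<Record<RequestType, string>>;
}

const EVENTS: Record<string, NotificationEventDef> = {
  request_grabbed: {
    title: 'Request Grabbed', // assumed fallback, not stated in the docs
    titleByRequestType: { audiobook: 'Audiobook Grabbed', ebook: 'Ebook Grabbed' },
  },
  request_available: {
    title: 'Request Available',
    titleByRequestType: { audiobook: 'Audiobook Available', ebook: 'Ebook Available' },
  },
};

// Prefer the type-specific title; fall back to the generic one.
function resolveTitle(event: string, requestType?: RequestType): string {
  const def = EVENTS[event];
  if (!def) throw new Error(`Unknown notification event: ${event}`);
  return (requestType && def.titleByRequestType?.[requestType]) ?? def.title;
}
```

Keeping the fallback on the event itself means an event without `titleByRequestType` needs no special casing.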
@@ -66,6 +69,11 @@ model NotificationBackend {
- Approve (with or without pre-selected torrent): After job triggered → request_approved
- Deny: No notification
**Download Grabbed (processor: download-torrent)**
- After `client.addDownload()` succeeds and `DownloadHistory` record created → request_grabbed
- `message` field: `"${torrent.title} via ${indexer} (${clientType})"`
- `requestType`: from `request.type` (audiobook/ebook)
**Audiobook Available (processors: scan-plex, plex-recently-added)**
- After `status: 'available'` update → request_available (requestType: 'audiobook')
- Includes user info in query (plexUsername)
+2 -1
@@ -30,7 +30,7 @@ src/components/
**Audiobooks**
- **AudiobookCard** ✅ - Cover, title, author, narrator, duration, request button, clickable to open details modal. Shows "Requested by [username]" when someone else has requested the book, "Requested" when current user has requested it
- **AudiobookGrid** - Responsive grid (1/2/3/4 cols)
- **AudiobookDetailsModal** ✅ - Full-screen modal with comprehensive metadata (description, genres, rating, release date, narrator, request functionality). Shows requesting user's name when applicable
- **AudiobookDetailsModal** ✅ - Full-screen modal with comprehensive metadata (description, genres, rating, release date, narrator, language, format, publisher, request functionality). Shows requesting user's name when applicable
**Requests**
- **RequestCard** ✅ - Cover, title, author, status badge, progress bar, timestamps, action buttons (cancel, manual search, interactive search)
@@ -113,6 +113,7 @@ interface AudiobookDetailsModalProps {
requestStatus?: string | null;
isAvailable?: boolean;
requestedByUsername?: string | null;
adminActions?: React.ReactNode; // Optional admin buttons (Approve/Search/Deny) rendered as second row in action bar
}
interface RequestCardProps {
+204 -130
@@ -1,104 +1,131 @@
# Audible Integration
**Status:** Implemented (Audnexus API + Web Scraping)
**Status:** Implemented | Hybrid — curated HTML for discovery refresh + Audible JSON catalog API for user-facing real-time + Audnexus for per-ASIN details
Audiobook metadata from Audnexus API (primary) and Audible.com scraping (fallback) for discovery, search, and detail pages.
## Overview
## Detail Page Strategy
Audiobook metadata for discovery, search, and detail pages. Split by access pattern:
**Primary: Audnexus API**
- Endpoint: `https://api.audnex.us/books/{asin}`
- Structured JSON response (no parsing needed)
- Provides: title, authors, narrators, description, duration, rating, genres, cover art
- Free, no API key required
- ~95% success rate for popular audiobooks
- **Nightly discovery refresh** (popular / new releases / category lists) — scraped from Audible's **curated HTML storefronts** (`www.audible.<tld>/adblbestsellers`, `/newreleases`, `/search?node=<id>`). The HTML pages reflect Audible's own editorial picks.
- **User-facing real-time** (search, author books, categories listing, per-ASIN details) — Audible's unauthenticated public **JSON catalog API** (`api.audible.<tld>/1.0/catalog/*`).
- **Per-ASIN detail lookups** — Audnexus (`api.audnex.us/books/{asin}`) primary; catalog API used as fallback when Audnexus returns 404.
**Fallback: Audible Scraping**
- Used when Audnexus returns 404
- Parse Audible HTML with Cheerio
- Multiple selector strategies with promotional text filtering
- Extract JSON-LD structured data when available
## Architecture
- **Curated HTML (refresh job only):** the three methods called solely by `audible-refresh.processor.ts` (`getPopularAudiobooks`, `getNewReleases`, `getCategoryBooks`) scrape Audible's storefront HTML to inherit editorial curation. Beefed-up retry/backoff knobs (12 retries, 3-min jittered cap) handle 503 storms patiently on the nightly job without slowing healthy users.
- **JSON catalog API (real-time):** `search`, `searchByAuthorAsin`, `getCategories` (categories listing), and `fetchAudibleDetailsFromApi` (per-ASIN fallback). Same endpoint used by the official Audible mobile apps. No authentication, no API key, no user credentials, no special headers.
- **Audnexus (per-ASIN):** `getAudiobookDetails` and `getRuntime` prefer Audnexus, with catalog API fallback for `getAudiobookDetails`.
- **`www.audible.<tld>`:** Used by HTML refresh scraping, by `audible-series.ts`, and by `getBaseUrl()` for "View on Audible" link generation.
## Data Sources
### Nightly refresh (HTML — `htmlClient`, baseURL `www.audible.<tld>`)
| Operation | Endpoint | Key params |
|---|---|---|
| Popular | `/adblbestsellers` | `pageSize=50`, `page=<n>` (omitted on first page) |
| New releases | `/newreleases` | `pageSize=50`, `page=<n>` (omitted on first page) |
| Category books | `/search` | `node=<categoryId>&pageSize=50&sort=popularity-rank&page=<n>` |
Parsed via cheerio. Selectors: `.productListItem` (popular/new releases), `.s-result-item, .productListItem` (categories).
### Real-time (JSON catalog API — `apiClient`, baseURL `api.audible.<tld>`)
| Operation | Endpoint | Key params |
|---|---|---|
| Search | `/1.0/catalog/products` | `keywords=<q>` |
| Author books | `/1.0/catalog/products` | `author=<name>` (name, NOT ASIN) |
| Categories listing | `/1.0/catalog/categories` | (none) |
| Single product | `/1.0/catalog/products/{asin}` | — |
| Audnexus (per-ASIN) | `https://api.audnex.us/books/{asin}` | `region={audnexusParam}` |
All `products` endpoints share:
- `num_results` — max **50** (service constant `AUDIBLE_PAGE_SIZE = 50`)
- `page`: **0-indexed at the API** (service public interface is 1-indexed; the service subtracts 1 at the call site). See Gotchas.
- `response_groups=<CATALOG_RESPONSE_GROUPS>`
## `response_groups` Constant
`CATALOG_RESPONSE_GROUPS = 'contributors,product_desc,product_attrs,product_extended_attrs,media,rating,series,category_ladders,product_details'`
Populates every `AudibleAudiobook` field. Covered:
- `contributors` → authors (with ASINs), narrators
- `product_desc` → `publisher_summary`, `merchandising_summary`
- `product_attrs` / `product_extended_attrs` / `product_details` → title, release_date, language, runtime_length_min
- `media` → `product_images` (cover URLs, uses `500` variant)
- `rating` → `overall_distribution.display_stars`
- `series` → array of `{asin, title, sequence}`
- `category_ladders` → genre names (deduped, capped at 5)
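As a rough illustration of the last two mappings, the sketch below derives a deduplicated, capped genre list and a cover URL from a catalog product. The `CategoryLadder` shape (a `ladder` array of `{id, name}` nodes) and the helper names `extractGenres` and `pickCover` are assumptions made for this example; the service's actual field handling may differ.

```typescript
// Assumed response shapes; not copied from the service.
interface CategoryLadder {
  ladder: { id: string; name: string }[];
}

// Collect genre names across all ladders, dropping duplicates and keeping at most `max`.
function extractGenres(ladders: CategoryLadder[] | undefined, max = 5): string[] {
  const seen = new Set<string>();
  (ladders ?? []).forEach(({ ladder }) => {
    ladder.forEach(({ name }) => {
      const trimmed = name.trim();
      if (trimmed) seen.add(trimmed); // Set preserves first-seen order
    });
  });
  return Array.from(seen).slice(0, max);
}

// Read the 500px cover variant; a missing key yields undefined.
function pickCover(productImages: Record<string, string> | undefined): string | undefined {
  return productImages?.['500'];
}
```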
## Gotchas
- **Catalog API cannot filter preorders or surface curated bestsellers.** The API's `BestSellers` sort is a right-now velocity rank that spikes on launch-day promos and preorder windows; the `-ReleaseDate` sort returns 100% future preorders. There is no server-side `release_time`, `released-only`, `customer_rights`, or alternate sort (`Reviewed`, `MostListened`, etc.) — every plausible variant was tested and silently ignored. This is why the nightly refresh job uses the curated HTML storefront pages instead.
- **`author=` takes a name, not an ASIN.** The catalog API has no ASIN-based author param. `searchByAuthorAsin()` queries by name, then filters client-side: keeps only products where `products[].authors[].asin === authorAsin`. Preserves ASIN-authoritative author identity. Also filters by `product.language` via `isAcceptedLanguage()` for the configured region.
- **Invalid ASIN returns HTTP 200 with stub body.** `/1.0/catalog/products/{asin}` responds 200 with `{product: {asin: INPUT}}` and no other fields. `fetchAudibleDetailsFromApi()` detects this via missing `product.title` and returns `null`.
- **`publisher_summary` is HTML.** Service strips tags via inline `stripHtml()` helper (regex-based, no cheerio) before populating `description`. Falls back to `merchandising_summary` (plain text) if `publisher_summary` missing.
- **Series is an array.** `products[].series[]` — a book may belong to multiple series. Service picks the first entry with non-empty `sequence`, else the first entry. `sequence` is cleaned by extracting first `/\d+(?:\.\d+)?/` match for numeric ordering.
- **Stub `product_images`:** cover URL reads from `product_images['500']`; missing keys fall back to `undefined`.
- **`page` is 0-indexed (catalog API only).** Despite the default value appearing to be 1, the API returns items `(page * num_results)` through `((page + 1) * num_results - 1)`. So `page=1` fetches items 51-100, not 1-50. All catalog-API service methods accept a 1-indexed `page` and subtract 1 at the axios call. The symptom of getting this wrong is silent: queries whose `total_results ≤ num_results` return an empty `products` array while `total_results` is populated (e.g. author searches for small catalogues). HTML paths use Audible's native 1-indexed `page` query param and omit it on the first page.
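Three of the gotchas above (page indexing, stub responses, series selection) are easy to get subtly wrong, so here is a small sketch of each. The helper names `toApiPage`, `isStubProduct`, `pickSeries`, and `cleanSequence` and the trimmed-down `CatalogProduct` type are hypothetical; they illustrate the described behavior rather than reproduce `audible.service.ts`.

```typescript
// Assumed minimal shapes for illustration only.
interface CatalogSeries { asin: string; title: string; sequence?: string }
interface CatalogProduct { asin: string; title?: string; series?: CatalogSeries[] }

// Public interface is 1-indexed; the catalog API expects 0-indexed pages.
function toApiPage(page: number): number {
  return Math.max(0, Math.floor(page) - 1);
}

// An invalid ASIN still returns HTTP 200, but the product carries no title.
function isStubProduct(product: CatalogProduct | undefined): boolean {
  return !product || !product.title;
}

// Prefer the first series entry with a non-empty sequence, else the first entry.
function pickSeries(series: CatalogSeries[] | undefined): CatalogSeries | undefined {
  if (!series || series.length === 0) return undefined;
  return series.find((s) => (s.sequence ?? '').trim() !== '') ?? series[0];
}

// Keep only the first numeric token so "Book 2.5" sorts as 2.5.
function cleanSequence(sequence: string | undefined): string | undefined {
  const match = sequence?.match(/\d+(?:\.\d+)?/);
  return match ? match[0] : undefined;
}
```

Clamping in `toApiPage` is a defensive choice in this sketch: a caller passing `0` gets the first page rather than a negative index.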
## Rate Limiting & Resilience
- **Real-time JSON API paths:** 503s are uncommon. `fetchWithRetry()` uses jittered exponential backoff, 5 retries, retries on 503/429/5xx. API responses include `Cache-Control: private, max-age=1800`.
- **Nightly HTML refresh paths:** 503s are more likely (HTML storefront is more rate-sensitive). Same `fetchWithRetry()`, but with `HTML_MAX_RETRIES=12` and `HTML_MAX_BACKOFF_MS=180_000` (3-minute cap on jittered backoff). Healthy refreshes still complete fast (per-page success on attempt 0); users hit by sustained 503 storms grind through patiently rather than abandoning the refresh.
- **`AdaptivePacer`:** inter-page delay 2-4 s baseline, scales up multiplicatively under retry pressure, with a 45-60 s circuit-breaker cooldown after 3 consecutive retry-pages.
- **Per-batch cooldowns** in `audible-refresh.processor.ts`: 15-30 s between popular/new-releases, 10-20 s between categories.
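To make the capped, jittered backoff concrete, here is a minimal sketch. It is not the `jitteredBackoff` helper from `scrape-resilience.ts`, whose signature is not shown here: the function names, the 1-second base delay, and the "full jitter" strategy (a uniform draw between zero and the capped exponential) are assumptions. Only the 180,000 ms cap and the retryable status classes come from the notes above.

```typescript
// Hypothetical helper: exponential growth, capped, then fully jittered.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000, // assumed base delay
  maxBackoffMs = 180_000, // 3-minute cap used by the HTML refresh paths
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const exponential = baseMs * 2 ** attempt;
  const capped = Math.min(exponential, maxBackoffMs);
  return Math.floor(random() * capped);
}

// Retry on rate limiting and on any server error (503 is covered by the 5xx range).
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}
```

With a cap in place, attempt 10 and attempt 12 draw from the same 0 to 180 s range, so late retries stop growing without bound while still being spread out by the jitter.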
## Region Configuration
**Status:** Implemented
Configurable Audible region for accurate metadata matching across different international Audible stores.
Configurable Audible region for accurate metadata matching across international stores.
**Supported Regions:**
- United States (`us`) - `audible.com` (default, English)
- Canada (`ca`) - `audible.ca` (English)
- United Kingdom (`uk`) - `audible.co.uk` (English)
- Australia (`au`) - `audible.com.au` (English)
- India (`in`) - `audible.in` (English)
- Germany (`de`) - `audible.de` (non-English)
- Spain (`es`) - `audible.es` (non-English)
- French (`fr`) - `audible.fr` (non-English)
**`isEnglish` Flag:**
- Each region has `isEnglish: boolean` in `AudibleRegionConfig`
- Non-English regions (`isEnglish: false`) display an amber warning in all region dropdowns (setup wizard + admin settings)
- Warning text: "Many features such as search, discovery, and metadata matching are not yet fully supported for non-English regions."
- Dropdown options for non-English regions show `*` suffix (e.g., "Germany *")
| Code | Name | HTML baseUrl | apiBaseUrl | isEnglish |
|---|---|---|---|---|
| `us` | United States | `https://www.audible.com` | `https://api.audible.com` | true (default) |
| `ca` | Canada | `https://www.audible.ca` | `https://api.audible.ca` | true |
| `uk` | United Kingdom | `https://www.audible.co.uk` | `https://api.audible.co.uk` | true |
| `au` | Australia | `https://www.audible.com.au` | `https://api.audible.com.au` | true |
| `in` | India | `https://www.audible.in` | `https://api.audible.in` | true |
| `de` | Germany | `https://www.audible.de` | `https://api.audible.de` | false |
| `es` | Spain | `https://www.audible.es` | `https://api.audible.es` | false |
| `fr` | France | `https://www.audible.fr` | `https://api.audible.fr` | false |
**Why Regions Matter:**
- Each Audible region uses different ASINs for the same audiobook
- Metadata engines (Audnexus/Audible Agent) in Plex/Audiobookshelf must match RMAB's region
- Mismatched regions cause poor search results and failed metadata matching
**`AudibleRegionConfig` fields:** `code`, `name`, `baseUrl`, `apiBaseUrl`, `audnexusParam`, `language`.
**`isEnglish` flag:**
- Non-English regions show amber warning in region dropdowns (setup wizard + admin settings): "Many features such as search, discovery, and metadata matching are not yet fully supported for non-English regions."
- Dropdown options for non-English regions show `*` suffix.
**Why regions matter:**
- Each Audible region uses different ASINs for the same audiobook.
- Metadata engines (Audnexus / Audible Agent) in Plex / Audiobookshelf must match RMAB's region.
**Configuration:**
- Key: `audible.region` (stored in database)
- Default: `us`
- Set during: Setup wizard (Backend Selection step) or Admin Settings (Library tab)
- Help text instructs users to match their metadata engine region
- Auto-detection: Service checks config before each request and re-initializes if region changed.
- Cache clearing: Region change clears ConfigService cache and AudibleService state.
- Automatic refresh: Region change triggers `audible_refresh` job.
**Implementation:**
- `AudibleService` loads region from config on initialization
- Dynamically builds base URL: `AUDIBLE_REGIONS[region].baseUrl`
- Audnexus API calls include region parameter: `?region={code}`
- IP redirect prevention: `?ipRedirectOverride=true` on all Audible requests (region only)
- **Locale enforcement:** `?language=english` query parameter on all Audible requests (forces English content regardless of server IP geolocation)
- Configuration service helper: `getAudibleRegion()` returns configured region
- **Auto-detection of region changes**: Service checks config before each request and re-initializes if region changed
- **Cache clearing**: When region changes, ConfigService cache and AudibleService initialization are cleared
- **Automatic refresh**: Changing region automatically triggers `audible_refresh` job to fetch new data
**Per-region HTTP clients (on init):**
- `apiClient`: `baseURL=apiBaseUrl`, `Accept: application/json`, `User-Agent: ReadMeABook/1.0`, no language/ipRedirect params. Used for the real-time JSON catalog operations (search, author books, categories listing, per-ASIN details fallback).
- `htmlClient`: `baseURL=baseUrl`, rotating browser headers (`pickUserAgent` + `getBrowserHeaders`), default params `ipRedirectOverride=true` + `language=<audibleLocaleParam>`. Used by the nightly discovery refresh (`/adblbestsellers`, `/newreleases`, `/search?node=...`), by `audible-series.ts`, and by `getBaseUrl()`-based link generation.
- Audnexus calls include `region=<audnexusParam>`.
**Files:**
- Types: `src/lib/types/audible.ts`
- Service: `src/lib/integrations/audible.service.ts`
- Series (HTML): `src/lib/integrations/audible-series.ts`
- Config: `src/lib/services/config.service.ts`
- API: `src/app/api/admin/settings/audible/route.ts`
## Discovery Strategy (Popular/New/Search)
- Parse Audible HTML with Cheerio
- Multi-page scraping (20 items/page)
- Rate limit: max 10 req/min, 1.5s delay between pages
- Cache results in database (24hr TTL)
## Data Sources
URLs dynamically built based on configured region:
1. **Best Sellers:** `{baseUrl}/adblbestsellers`
2. **New Releases:** `{baseUrl}/newreleases`
3. **Search:** `{baseUrl}/search?keywords={query}&ipRedirectOverride=true`
4. **Detail Page:** `{baseUrl}/pd/{asin}?ipRedirectOverride=true`
5. **Audnexus API:** `https://api.audnex.us/books/{asin}?region={code}`
Where `{baseUrl}` is determined by configured region (e.g., `https://www.audible.co.uk` for UK).
## Metadata Extracted
- ASIN (Audible ID)
- Title, author, narrator
- Duration (minutes), release date, rating
- Description, cover art URL
- Genres/categories
## Unified Matching (`audiobook-matcher.ts`)
**Status:** Production Ready (ASIN-Only Matching)
Single matching algorithm used everywhere (search, popular, new-releases, jobs).
@@ -112,50 +139,80 @@ Single matching algorithm used everywhere (search, popular, new-releases, jobs).
- `findPlexMatch()`: ASIN (field) → ASIN (GUID) → null
- `matchAudiobook()`: ASIN → ISBN → null
**Benefits:**
- Real-time matching at query time (not pre-matched)
- 100% confidence matches only (eliminates false positives)
- O(1) indexed lookups (faster than fuzzy matching)
- Solves race condition with Audiobookshelf ASIN population
- Used by all APIs for consistency
**Note:** Fuzzy matching (70% threshold) is preserved in `ranking-algorithm.ts` for Prowlarr torrent ranking. Library availability checks require exact ASIN matches only.
**Note:** Fuzzy matching (70% threshold) is preserved in `ranking-algorithm.ts` for Prowlarr torrent ranking, where it's needed to score multiple release candidates. Library availability checks require exact ASIN matches only.
## Dedup & Works Table
**Status:** ✅ Implemented | Two-pass dedup on every discovery view + cross-batch identity via works table
Discovery views (search, author books, series detail) collapse duplicate Audible listings for the same recording (publisher re-listings, regional re-issues, full-cast vs single-narrator productions) into a single card. Two passes run in sequence:
1. **Local pass — `deduplicateAndCollectGroups()`** (`src/lib/utils/deduplicate-audiobooks.ts`)
- Stateless, in-memory. Keys books by normalized title + sorted narrator set + duration (±max(5%, 10 min) tolerance), with subtitle compatibility to keep distinct series entries separate.
- Picks a canonical representative per group by `metadataScore()` (cover + rating + duration + description + narrator + release date + genres).
- Emits `DedupGroup[]` describing every multi-ASIN collapse → handed to `persistDedupGroups()` for the works table.
2. **Works pass — `collapseByExistingWorks()`** (`src/lib/services/works.service.ts`)
- Async DB lookup. Reads `work_asins` for every ASIN in the local-passed list and collapses any books sharing a `workId` to one representative (same `metadataScore()` ranking).
- Catches duplicates the local pass misses: source-metadata divergence (e.g. HTML scraper captured different narrators), cross-page splits (paginated series), or non-matching field shapes.
- Degrades gracefully — returns the input unchanged on DB failure (view still renders).
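The two passes can be sketched in memory as below. This is an illustration under stated assumptions, not the code in `deduplicate-audiobooks.ts` or `works.service.ts`: the 5% tolerance is taken relative to the longer duration, narrators are assumed to arrive as a comma-separated string, `metadataScore` simply counts populated fields, and the works lookup is a plain `Map` (`workIdByAsin`) standing in for the `work_asins` query.

```typescript
// Assumed minimal book shape for illustration.
interface Book {
  asin: string; title: string; narrator?: string; durationMinutes?: number;
  coverArtUrl?: string; rating?: number; description?: string;
  releaseDate?: string; genres?: string[];
}

// Durations match within max(5%, 10 min); 5% of the longer one is an assumption.
function durationsCompatible(a: number, b: number): boolean {
  const tolerance = Math.max(0.05 * Math.max(a, b), 10);
  return Math.abs(a - b) <= tolerance;
}

// Order-insensitive narrator key: lowercase, trim, sort.
function normalizeNarrator(narrator: string | undefined): string {
  return (narrator ?? '')
    .split(',')
    .map((n) => n.trim().toLowerCase())
    .filter(Boolean)
    .sort()
    .join(',');
}

// Simplified score: one point per populated metadata field.
function metadataScore(b: Book): number {
  const fields = [b.coverArtUrl, b.rating, b.durationMinutes, b.description,
    b.narrator, b.releaseDate, b.genres?.length];
  return fields.filter(Boolean).length;
}

// Works pass: keep one representative per workId; unknown ASINs pass through.
function collapseByWorkId(books: Book[], workIdByAsin: Map<string, string>): Book[] {
  const bestByWork = new Map<string, Book>();
  books.forEach((book) => {
    const workId = workIdByAsin.get(book.asin);
    if (!workId) return;
    const current = bestByWork.get(workId);
    if (!current || metadataScore(book) > metadataScore(current)) {
      bestByWork.set(workId, book);
    }
  });
  return books.filter((book) => {
    const workId = workIdByAsin.get(book.asin);
    return !workId || bestByWork.get(workId) === book;
  });
}
```

Filtering the original array at the end keeps the input ordering, so ranked lists stay ranked after the collapse.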
### Works Table Schema
- `Work { id, title, author }` — one row per logical book
- `WorkAsin { id, workId, asin, narrator?, durationMinutes?, isCanonical, source, createdAt }` — many ASINs per Work
### Population Layers
- **Layer 1 (auto):** `persistDedupGroups()` writes whenever the local pass finds a duplicate. Merges across pre-existing works when a new group spans them.
- **Layer 2 (seed):** `seedAsin()` writes a single-ASIN work at request creation time, ensuring every requested ASIN has an entry to grow from.
### Read Paths
- **`collapseByExistingWorks()`** — view-level collapse (this section).
- **`getSiblingAsins()`** — library availability matching (`audiobook-matcher.ts`), request-creation duplicate prevention (`request-creator.service.ts`), ignored-audiobook expansion. Returns sibling ASINs grouped by input ASIN.
### Narrator Capture in HTML Scrapers
- HTML scrapers (`audible-series.ts`, the two `parse*Items` parsers in `audible.service.ts`) capture **all** narrator anchors via `extractAllNarrators()` (`src/lib/utils/extract-narrator.ts`). Multi-narrator productions render each name as its own `<a href="?searchNarrator=...">` link; capturing only the first (prior bug) made co-narrated audiobooks fail to dedup. Order is not significant — `normalizeNarrator()` sorts before comparison.
### Wired Routes
- `src/app/api/audiobooks/search/route.ts`
- `src/app/api/authors/[asin]/books/route.ts`
- `src/app/api/series/[asin]/route.ts`
Watched-list background jobs (`watched-lists.service.ts`) run the local pass only — they don't render a view, and the downstream `request-creator.service.ts` already does sibling-aware dedup at request creation time.
## Database-First Approach
**Status:** Implemented
Discovery APIs serve cached data from DB with real-time matching.
**Flow:**
1. `audible_refresh` job runs daily → fetches 200 popular + 200 new releases + user-configured categories
2. Downloads and caches cover thumbnails locally (reduces Audible load)
3. Stores metadata in `audible_cache`, ranked entries in `audible_cache_categories` with reserved IDs (`__popular__`, `__new_releases__`) and user category IDs
4. Cleans up unused thumbnails after sync
5. API routes query `AudibleCacheCategory` by categoryId → join with `AudibleCache` metadata → apply real-time matching → return enriched results
6. Homepage loads instantly (no Audible API hits)
1. `audible_refresh` cron runs daily → fetches 200 popular + 200 new releases + user-configured categories by scraping Audible's curated HTML storefronts (`/adblbestsellers`, `/newreleases`, `/search?node=<id>&sort=popularity-rank`).
2. Downloads and caches cover thumbnails locally.
3. Stores metadata in `audible_cache`, ranked entries in `audible_cache_categories` with reserved IDs (`__popular__`, `__new_releases__`) and user category IDs.
4. Cleans up unused thumbnails after sync.
5. API routes query `AudibleCacheCategory` by categoryId → join with `AudibleCache` metadata → apply real-time matching → return enriched results.
6. Homepage loads instantly (no Audible HTTP hits at request time).
## Thumbnail Caching
**Status:** Implemented
Cover images cached locally to reduce external requests and improve performance.
Cover images cached locally to reduce external requests.
**Features:**
- Downloads covers during `audible_refresh` job
- Stores in `/app/cache/thumbnails` (Docker volume)
- Serves via `/api/cache/thumbnails/[filename]`
- Auto-cleanup of unused thumbnails
- Falls back to original URL if cache fails
- 24-hour browser cache headers
- Downloads covers during `audible_refresh` job.
- Stores in `/app/cache/thumbnails` (Docker volume).
- Serves via `/api/cache/thumbnails/[filename]`.
- Auto-cleanup of unused thumbnails.
- Falls back to original URL if cache fails.
- 24-hour browser cache headers.
- Filename: `{asin}.{ext}` (e.g. `B08G9PRS1K.jpg`).
**Implementation:**
**Files:**
- Service: `src/lib/services/thumbnail-cache.service.ts`
- API Route: `src/app/api/cache/thumbnails/[filename]/route.ts`
- Storage: Docker volume `cache` mounted at `/app/cache`
- Filename: `{asin}.{ext}` (e.g., `B08G9PRS1K.jpg`)
**API Endpoints:**
## App-Level API Endpoints
**GET /api/audiobooks/popular?page=1&limit=20**
**GET /api/audiobooks/new-releases?page=1&limit=20**
@@ -182,6 +239,7 @@ interface AudibleAudiobook {
asin: string;
title: string;
author: string;
authorAsin?: string;
narrator?: string;
description?: string;
coverArtUrl?: string;
@@ -189,6 +247,12 @@ interface AudibleAudiobook {
releaseDate?: string;
rating?: number;
genres?: string[];
series?: string;
seriesPart?: string;
seriesAsin?: string;
language?: string;
formatType?: string;
publisherName?: string;
}
interface EnrichedAudibleAudiobook extends AudibleAudiobook {
@@ -197,48 +261,58 @@ interface EnrichedAudibleAudiobook extends AudibleAudiobook {
plexGuid: string | null;
dbId: string;
}
interface AudibleSearchResult {
query: string;
results: AudibleAudiobook[];
totalResults: number;
page: number;
hasMore: boolean;
}
interface AuthorBooksResult {
books: AudibleAudiobook[];
hasMore: boolean;
page: number;
totalResults: number;
}
```
## Tech Stack
- axios (HTTP)
- cheerio (HTML parsing)
- Redis (caching, optional)
- Database (PostgreSQL)
- string-similarity (matching)
- `axios` (HTTP, two clients: `apiClient` for JSON catalog API, `htmlClient` for HTML refresh + series scraping)
- `cheerio` (HTML parsing for refresh job and `audible-series.ts`)
- Audnexus API (per-ASIN details, primary)
- PostgreSQL (`audible_cache`, `audible_cache_categories`)
## Fixed Issues
**Search returning empty results (2026-01-07)**
- **Problem:** Audible changed HTML structure for search results from `.productListItem` to `.s-result-item`
- **Impact:** All search queries returned 0 results
- **Fix:** Updated `search()` method to support both `.s-result-item` (current) and `.productListItem` (legacy)
- **Selectors updated:**
- Main: `.s-result-item, .productListItem`
- Title: `h2` (new) or `h3 a` (legacy)
- Author: `a[href*="/author/"]` (new) or `.authorLabel` (legacy)
- Narrator: `a[href*="searchNarrator="]` (new) or `.narratorLabel` (legacy)
- Runtime: `span:contains("Length:")` (new) or `.runtimeLabel` (legacy)
- Rating: `.a-icon-star span` (new) or `.ratingsLabel` (legacy)
- **Location:** `src/lib/integrations/audible.service.ts:235`
**Series-page duplicates not collapsing across user views (2026-05-14)**
- **Problem:** Two re-listings of the same audiobook (same title, same narrator set, same duration, different ASINs) showed as two cards on series detail pages, even after the works table had already linked them via search-page dedup.
- **Root cause (two-part):** (1) HTML scrapers used `$el.find('a[href*="searchNarrator="]').first()` for multi-narrator productions, capturing only the first co-narrator. So two listings of the same recording landed in `deduplicateAndCollectGroups` with mismatched single-narrator strings and never merged. (2) `deduplicateAndCollectGroups` was stateless — it wrote to the works table but never read it back, so even when one path (e.g. search) successfully merged two ASINs and persisted the Work, every other path (series, author books) re-derived the dedup decision from scratch and split them again.
- **Fix:** (1) New `extractAllNarrators()` helper (`src/lib/utils/extract-narrator.ts`) captures every `searchNarrator=` anchor and joins them; all three HTML scrapers route through it. (2) New `collapseByExistingWorks()` consults the works table after the local pass and collapses any remaining books sharing a `workId`. Wired into the three user-facing discovery routes (search / author books / series detail). Skipped for watched-list background jobs — those feed `request-creator.service.ts` which already does sibling-aware dedup.
- **Location:** `src/lib/utils/extract-narrator.ts` (new); `src/lib/integrations/audible-series.ts` (parseSeriesBooks); `src/lib/integrations/audible.service.ts` (parseProductListItems + parseSearchResultItems); `src/lib/utils/deduplicate-audiobooks.ts` (`metadataScore` exported); `src/lib/services/works.service.ts` (`collapseByExistingWorks` added); three API routes updated.
**Some audiobooks missing from search results (2026-01-07)**
- **Problem:** ASIN extraction only matched `/pd/` URLs but some audiobooks use `/ac/` URLs
- **Impact:** Books like "Beatitude" by DJ Krimmer (ASIN: B0DVH7XL36) were skipped
- **Fix:** Updated ASIN regex to match both `/pd/` and `/ac/` URL patterns: `/\/(?:pd|ac)\/[^\/]+\/([A-Z0-9]{10})/`
- **Location:** `src/lib/integrations/audible.service.ts:75, 161, 240`
- **Affects:** `getPopularAudiobooks()`, `getNewReleases()`, `search()` methods
**Discovery refresh reverted to curated HTML scraping (2026-05-14)**
- **Problem:** After switching all catalog ops to the JSON catalog API in `f564d0a`, the nightly discovery refresh (Popular / New Releases / user-configured Categories) started serving junk: New Releases became 100% preorders out to 2027, and Popular was dominated by launch-day no-name shovelware.
- **Root cause:** `products_sort_by=BestSellers` ranks by current sales velocity, which spikes on launch promos and preorder windows; `-ReleaseDate` returns all catalog items in date order with no released-only filter. The catalog API exposes no server-side filter to exclude preorders and no sort by established popularity (verified by exhaustively testing `release_time`, `availability_status`, `customer_rights`, and the `Reviewed`/`MostListened`/`SalesRank` sorts; all were silently ignored or rejected). Curating client-side would have made RMAB the editorial curator, a job Audible's storefront pages already do well.
- **Fix:** Hybrid architecture — the three refresh-only methods (`getPopularAudiobooks`, `getNewReleases`, `getCategoryBooks`) went back to scraping Audible's curated HTML storefronts (`/adblbestsellers`, `/newreleases`, `/search?node=<id>&sort=popularity-rank`). All user-facing real-time paths (search, author books, categories listing, per-ASIN details) stayed on the JSON catalog API. To keep the higher-503-risk HTML traffic resilient on the unattended nightly job, `fetchWithRetry()` accepts an optional `maxBackoffMs` cap and HTML callers use `HTML_MAX_RETRIES=12` + `HTML_MAX_BACKOFF_MS=180_000` (3-min cap). Healthy users finish quickly; 503-blocked users grind through patiently.
- **Location:** `src/lib/integrations/audible.service.ts` (three methods + two private parsers `parseProductListItems` / `parseSearchResultItems`); `src/lib/utils/scrape-resilience.ts` (`jitteredBackoff` cap parameter).
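
The effect of the backoff cap can be illustrated with a small sketch. Only the two constants come from the fix above; the function signature, the 1-second base delay, and the half-jitter formula are assumptions and may differ from the real `jitteredBackoff` in `scrape-resilience.ts`.

```typescript
// Values quoted in the fix above.
const HTML_MAX_RETRIES = 12;
const HTML_MAX_BACKOFF_MS = 180_000;

// Hypothetical signature: exponential backoff, capped, then jittered into
// the range [cap/2, cap]. `random` is injectable so the result is testable.
function jitteredBackoff(
  attempt: number,
  baseMs: number = 1_000, // assumed base delay
  maxBackoffMs: number = Number.POSITIVE_INFINITY,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** attempt;
  const capped = Math.min(exponential, maxBackoffMs);
  return Math.round(capped / 2 + random() * (capped / 2));
}

// Upper bound on total wait across all HTML retries under these assumptions.
function worstCaseTotalWaitMs(): number {
  let total = 0;
  for (let attempt = 0; attempt < HTML_MAX_RETRIES; attempt++) {
    total += jitteredBackoff(attempt, 1_000, HTML_MAX_BACKOFF_MS, () => 1);
  }
  return total;
}
```

Without the cap, the later attempts of a 12-retry schedule would wait far longer than three minutes each; with it, early retries stay short and only persistently blocked runs reach the 3-minute ceiling.
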
**Audiobookshelf metadata matching not respecting configured region (2026-01-28)**
- **Problem:** `triggerABSItemMatch()` hardcoded `'audible'` provider (audible.com) instead of respecting user's configured Audible region.
- **Impact:** Users with non-US regions (CA, UK, AU, IN) had incorrect metadata matching in Audiobookshelf, causing wrong ASINs.
- **Fix:** Added `mapRegionToABSProvider()` to convert RMAB region codes to Audiobookshelf provider values. US → `'audible'`, others → `'audible.{region}'` (e.g. `'audible.ca'`, `'audible.uk'`).
- **Location:** `src/lib/services/audiobookshelf/api.ts:14, 147`
- **Affects:** All Audiobookshelf metadata matching operations
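
The mapping described in the fix is small enough to sketch in full. The rule (US maps to `'audible'`, every other region to `'audible.{region}'`) is from the entry above; the lowercase string input is an assumption about how RMAB stores region codes.

```typescript
// Sketch of the region-to-provider rule; input casing is an assumption.
function mapRegionToABSProvider(region: string): string {
  const code = region.trim().toLowerCase();
  return code === "us" ? "audible" : `audible.${code}`;
}

// Regions named in the impact note above.
for (const region of ["us", "ca", "uk", "au", "in"]) {
  console.log(region, "->", mapRegionToABSProvider(region));
}
```
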
**Non-English locale pages served to users outside US (2026-02-05)**
- **Problem:** Audible uses IP geolocation to serve locale-specific pages (e.g. Spanish content for Dominican Republic IPs). `ipRedirectOverride=true` only prevents region redirects (audible.com → audible.co.uk), NOT language/locale changes.
- **Impact:** Users self-hosting from non-English-speaking countries got non-English content on HTML-scraped surfaces (e.g. bestsellers and new releases on the homepage).
- **Fix:** Added `language=<audibleLocaleParam>` as a default param on `htmlClient` (axios default params). Still in effect for the remaining HTML path (`audible-series.ts`). **Not applied to `apiClient`**: the catalog JSON API is region-bound via `apiBaseUrl` and does not require the language param.
- **Location:** `src/lib/integrations/audible.service.ts` → `initialize()` (htmlClient params)
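
The split between the two clients can be illustrated without axios by building the request URLs directly. This is only a sketch: the helper names, the example paths, the API host, and the `"english"` locale value are assumptions; the real code sets these as default params on `htmlClient`.

```typescript
// HTML requests carry both overrides: one against region redirects,
// one against IP-based language selection.
function buildHtmlUrl(baseUrl: string, path: string, localeParam: string): string {
  const url = new URL(path, baseUrl);
  url.searchParams.set("ipRedirectOverride", "true");
  url.searchParams.set("language", localeParam);
  return url.toString();
}

// Catalog API requests are region-bound by their base URL, so no language param is added.
function buildApiUrl(apiBaseUrl: string, path: string): string {
  return new URL(path, apiBaseUrl).toString();
}

// Illustrative values only.
console.log(buildHtmlUrl("https://www.audible.com", "/series/Example/B000000000", "english"));
console.log(buildApiUrl("https://api.audible.com", "/1.0/catalog/products/B000000000"));
```
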
## Related
- [Audiobookshelf Integration](./audiobookshelf.md)
- [Plex Integration](./plex.md)
- [Ranking Algorithm](../phase3/ranking-algorithm.md)