Expose language, formatType, and publisherName from the Audible catalog. Update audible.service to map format_type and publisher_name (and language) into the AudibleAudiobook model, update AudiobookDetailsModal to display language and format using the CSS "capitalize" class, and update documentation to list the new fields. Add unit tests to verify the mappings, details propagation, and behavior when fields are omitted.
Audible Integration
Status: Implemented | Hybrid — curated HTML for discovery refresh + Audible JSON catalog API for user-facing real-time + Audnexus for per-ASIN details
Overview
Audiobook metadata for discovery, search, and detail pages. Split by access pattern:
- Nightly discovery refresh (popular / new releases / category lists): scraped from Audible's curated HTML storefronts (www.audible.<tld>/adblbestsellers, /newreleases, /search?node=<id>). The HTML pages reflect Audible's own editorial picks.
- User-facing real-time (search, author books, categories listing, per-ASIN details): Audible's unauthenticated public JSON catalog API (api.audible.<tld>/1.0/catalog/*).
- Per-ASIN detail lookups: Audnexus (api.audnex.us/books/{asin}) primary; catalog API used as fallback when Audnexus returns 404.
Architecture
- Curated HTML (refresh job only): the three methods called solely by audible-refresh.processor.ts (getPopularAudiobooks, getNewReleases, getCategoryBooks) scrape Audible's storefront HTML to inherit editorial curation. Beefed-up retry/backoff knobs (12 retries, 3-min jittered cap) handle 503 storms patiently on the nightly job without slowing healthy users.
- JSON catalog API (real-time): search, searchByAuthorAsin, getCategories (categories listing), and fetchAudibleDetailsFromApi (per-ASIN fallback). Same endpoint used by the official Audible mobile apps. No authentication, no API key, no user credentials, no special headers.
- Audnexus (per-ASIN): getAudiobookDetails and getRuntime prefer Audnexus, with catalog API fallback for getAudiobookDetails.
- www.audible.<tld>: Used by HTML refresh scraping, by audible-series.ts, and by getBaseUrl() for "View on Audible" link generation.
Data Sources
Nightly refresh (HTML — htmlClient, baseURL www.audible.<tld>)
| Operation | Endpoint | Key params |
|---|---|---|
| Popular | /adblbestsellers | pageSize=50, page=<n> (omitted on first page) |
| New releases | /newreleases | pageSize=50, page=<n> (omitted on first page) |
| Category books | /search | node=<categoryId>&pageSize=50&sort=popularity-rank&page=<n> |
Parsed via cheerio. Selectors: .productListItem (popular/new releases), .s-result-item, .productListItem (categories).
Real-time (JSON catalog API — apiClient, baseURL api.audible.<tld>)
| Operation | Endpoint | Key params |
|---|---|---|
| Search | /1.0/catalog/products | keywords=<q> |
| Author books | /1.0/catalog/products | author=<name> (name, NOT ASIN) |
| Categories listing | /1.0/catalog/categories | (none) |
| Single product | /1.0/catalog/products/{asin} | - |
| Audnexus (per-ASIN) | https://api.audnex.us/books/{asin} | region={audnexusParam} |
All products endpoints share:
- num_results: max 50 (service constant AUDIBLE_PAGE_SIZE = 50)
- page: 0-indexed at the API (service public interface is 1-indexed; the service subtracts 1 at the call site). See Gotchas.
- response_groups=<CATALOG_RESPONSE_GROUPS>
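The shared parameters and the 1-indexed to 0-indexed page conversion can be sketched as follows. This is a minimal illustration, not the service's actual code: buildCatalogParams is a hypothetical helper name, and only the two constants come from this document.

```typescript
// Constants as documented in this file.
const AUDIBLE_PAGE_SIZE = 50;
const CATALOG_RESPONSE_GROUPS =
  'contributors,product_desc,product_attrs,product_extended_attrs,media,rating,series,category_ladders,product_details';

// Hypothetical helper: builds the query params shared by all products endpoints.
// `page` is 1-indexed (the service's public interface); the API wants 0-indexed.
function buildCatalogParams(
  page: number,
  extra: Record<string, string> = {},
): Record<string, string | number> {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  const params: Record<string, string | number> = { ...extra };
  params['num_results'] = AUDIBLE_PAGE_SIZE;
  params['page'] = page - 1; // subtract 1 at the call site
  params['response_groups'] = CATALOG_RESPONSE_GROUPS;
  return params;
}
```

A search for the first page would then pass something like buildCatalogParams(1, { keywords: 'dune' }), which yields page=0 on the wire.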
response_groups Constant
CATALOG_RESPONSE_GROUPS = 'contributors,product_desc,product_attrs,product_extended_attrs,media,rating,series,category_ladders,product_details'
Populates every AudibleAudiobook field. Covered:
- contributors → authors (with ASINs), narrators
- product_desc → publisher_summary, merchandising_summary
- product_attrs / product_extended_attrs / product_details → title, release_date, language, runtime_length_min
- media → product_images (cover URLs, uses 500 variant)
- rating → overall_distribution.display_stars
- series → array of {asin, title, sequence}
- category_ladders → genre names (deduped, capped at 5)
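As a rough illustration of how a few of these response fields could land on the model, including language, format_type, and publisher_name, here is a minimal sketch. The interface names, the function name mapCatalogFields, and the assumed shape of category_ladders (an array of ladders, each holding named entries) are assumptions for illustration, not the service's real types.

```typescript
// Hypothetical, trimmed-down input shape: only the catalog fields this sketch reads.
interface CatalogProductSketch {
  asin: string;
  language?: string;
  format_type?: string;
  publisher_name?: string;
  product_images?: Record<string, string>;
  category_ladders?: { ladder: { name: string }[] }[]; // assumed shape
}

// Hypothetical output shape: a subset of AudibleAudiobook.
interface MappedFieldsSketch {
  asin: string;
  coverArtUrl: string | undefined;
  genres: string[] | undefined;
  language: string | undefined;
  formatType: string | undefined;
  publisherName: string | undefined;
}

function mapCatalogFields(product: CatalogProductSketch): MappedFieldsSketch {
  // Collect genre names across all ladders, deduped and capped at 5.
  const genres: string[] = [];
  for (const { ladder } of product.category_ladders ?? []) {
    for (const { name } of ladder) {
      if (genres.length < 5 && genres.indexOf(name) === -1) genres.push(name);
    }
  }
  return {
    asin: product.asin,
    coverArtUrl: product.product_images?.['500'], // missing key stays undefined
    genres: genres.length > 0 ? genres : undefined,
    language: product.language,
    formatType: product.format_type,
    publisherName: product.publisher_name,
  };
}
```

Omitted fields simply stay undefined, which matches the optional fields on AudibleAudiobook.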
Gotchas
- Catalog API cannot filter preorders or surface curated bestsellers. The API's BestSellers sort is a right-now velocity rank that spikes on launch-day promos and preorder windows; the -ReleaseDate sort returns 100% future preorders. There is no server-side release_time, released-only, customer_rights, or alternate sort (Reviewed, MostListened, etc.): every plausible variant was tested and silently ignored. This is why the nightly refresh job uses the curated HTML storefront pages instead.
- author= takes a name, not an ASIN. The catalog API has no ASIN-based author param. searchByAuthorAsin() queries by name, then filters client-side: keeps only products where products[].authors[].asin === authorAsin. Preserves ASIN-authoritative author identity. Also filters by product.language via isAcceptedLanguage() for the configured region.
- Invalid ASIN returns HTTP 200 with stub body. /1.0/catalog/products/{asin} responds 200 with {product: {asin: INPUT}} and no other fields. fetchAudibleDetailsFromApi() detects this via missing product.title and returns null.
- publisher_summary is HTML. Service strips tags via inline stripHtml() helper (regex-based, no cheerio) before populating description. Falls back to merchandising_summary (plain text) if publisher_summary is missing.
- Series is an array. products[].series[]: a book may belong to multiple series. Service picks the first entry with non-empty sequence, else the first entry. sequence is cleaned by extracting the first /\d+(?:\.\d+)?/ match for numeric ordering.
- Stub product_images: cover URL reads from product_images['500']; missing keys fall back to undefined.
- page is 0-indexed (catalog API only). Despite the default value appearing to be 1, the API returns items (page * num_results) through ((page + 1) * num_results - 1). So page=1 fetches items 51–100, not 1–50. All catalog-API service methods accept a 1-indexed page and subtract 1 at the axios call. The symptom of getting this wrong is silent: queries whose total_results ≤ num_results return an empty products array while total_results is populated (e.g. author searches for small catalogues). HTML paths use Audible's native 1-indexed page query param and omit it on the first page.
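Two of the gotchas above, series selection and stub detection, can be sketched in a few lines. This is a hedged illustration of the described behaviour only: pickSeries, cleanSequence, and isStubProduct are hypothetical names, and the entry shape is trimmed to the fields mentioned in this section.

```typescript
interface SeriesEntrySketch {
  asin?: string;
  title?: string;
  sequence?: string;
}

// First entry with a non-empty sequence, else the first entry, else undefined.
function pickSeries(series: SeriesEntrySketch[] | undefined): SeriesEntrySketch | undefined {
  if (!series || series.length === 0) return undefined;
  return series.find((s) => (s.sequence ?? '').trim() !== '') ?? series[0];
}

// Keep only the first integer or decimal found in the raw sequence string.
function cleanSequence(sequence: string | undefined): string | undefined {
  const match = /\d+(?:\.\d+)?/.exec(sequence ?? '');
  return match ? match[0] : undefined;
}

// An invalid ASIN comes back as HTTP 200 with only `asin` set; a missing title marks the stub.
function isStubProduct(product: { asin: string; title?: string }): boolean {
  return !product.title;
}
```

With these, a sequence such as "Book 2.5" reduces to "2.5", and a range such as "1-3" reduces to "1".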
Rate Limiting & Resilience
- Real-time JSON API paths: 503s are uncommon. fetchWithRetry() uses jittered exponential backoff, 5 retries, retries on 503/429/5xx. API responses include Cache-Control: private, max-age=1800.
- Nightly HTML refresh paths: 503s are more likely (HTML storefront is more rate-sensitive). Same fetchWithRetry(), but with HTML_MAX_RETRIES=12 and HTML_MAX_BACKOFF_MS=180_000 (3-minute cap on jittered backoff). Healthy refreshes still complete fast (per-page success on attempt 0); users hit by sustained 503 storms grind through patiently rather than abandoning the refresh.
- AdaptivePacer: inter-page delay 2–4 s baseline, scales up multiplicatively under retry pressure, with a 45–60 s circuit-breaker cooldown after 3 consecutive retry-pages.
- Per-batch cooldowns in audible-refresh.processor.ts: 15–30 s between popular/new-releases, 10–20 s between categories.
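The capped, jittered backoff described above can be pictured with a small sketch. It assumes a "full jitter" scheme (a uniform random delay up to an exponentially growing ceiling) and a 1-second base delay; neither detail is stated in this document, and backoffDelayMs and isRetryableStatus are hypothetical names rather than the helpers in scrape-resilience.ts.

```typescript
const HTML_MAX_BACKOFF_MS = 180_000; // 3-minute cap, as documented above

// Delay before retry number `attempt` (0-based). `random` is injectable for tests.
function backoffDelayMs(
  attempt: number,
  maxBackoffMs: number,
  baseMs: number = 1000, // assumed base delay
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxBackoffMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retry on 429 and any 5xx (which includes 503).
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}
```

The cap matters most on late attempts: without it, attempt 11 of a 12-retry run would have a ceiling of over half an hour instead of 3 minutes.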
Region Configuration
Status: Implemented
Configurable Audible region for accurate metadata matching across international stores.
Supported Regions:
| Code | Name | HTML baseUrl | apiBaseUrl | isEnglish |
|---|---|---|---|---|
| us | United States | https://www.audible.com | https://api.audible.com | true (default) |
| ca | Canada | https://www.audible.ca | https://api.audible.ca | true |
| uk | United Kingdom | https://www.audible.co.uk | https://api.audible.co.uk | true |
| au | Australia | https://www.audible.com.au | https://api.audible.com.au | true |
| in | India | https://www.audible.in | https://api.audible.in | true |
| de | Germany | https://www.audible.de | https://api.audible.de | false |
| es | Spain | https://www.audible.es | https://api.audible.es | false |
| fr | France | https://www.audible.fr | https://api.audible.fr | false |
AudibleRegionConfig fields: code, name, baseUrl, apiBaseUrl, audnexusParam, language.
isEnglish flag:
- Non-English regions show amber warning in region dropdowns (setup wizard + admin settings): "Many features such as search, discovery, and metadata matching are not yet fully supported for non-English regions."
- Dropdown options for non-English regions show a * suffix.
Why regions matter:
- Each Audible region uses different ASINs for the same audiobook.
- Metadata engines (Audnexus / Audible Agent) in Plex / Audiobookshelf must match RMAB's region.
Configuration:
- Key: audible.region (stored in database)
- Default: us
- Set during: Setup wizard (Backend Selection step) or Admin Settings (Library tab)
- Auto-detection: Service checks config before each request and re-initializes if region changed.
- Cache clearing: Region change clears ConfigService cache and AudibleService state.
- Automatic refresh: Region change triggers audible_refresh job.
Per-region HTTP clients (on init):
- apiClient: baseURL=apiBaseUrl, Accept: application/json, User-Agent: ReadMeABook/1.0, no language/ipRedirect params. Used for the real-time JSON catalog operations (search, author books, categories listing, per-ASIN details fallback).
- htmlClient: baseURL=baseUrl, rotating browser headers (pickUserAgent + getBrowserHeaders), default params ipRedirectOverride=true + language=<audibleLocaleParam>. Used by the nightly discovery refresh (/adblbestsellers, /newreleases, /search?node=...), by audible-series.ts, and by getBaseUrl()-based link generation.
- Audnexus calls include region=<audnexusParam>.
Files:
- Types: src/lib/types/audible.ts
- Service: src/lib/integrations/audible.service.ts
- Series (HTML): src/lib/integrations/audible-series.ts
- Config: src/lib/services/config.service.ts
- API: src/app/api/admin/settings/audible/route.ts
Unified Matching (audiobook-matcher.ts)
Status: Production Ready (ASIN-Only Matching)
Single matching algorithm used everywhere (search, popular, new-releases, jobs).
Process (Library Availability Checks):
- Query DB directly by ASIN (indexed O(1) lookup)
- Check ASIN in dedicated field (100% confidence)
- Check ASIN in plexGuid (backward compatibility)
- Return match or null (no fuzzy fallback)
Match Priority:
- findPlexMatch(): ASIN (field) → ASIN (GUID) → null
- matchAudiobook(): ASIN → ISBN → null
Note: Fuzzy matching (70% threshold) is preserved in ranking-algorithm.ts for Prowlarr torrent ranking. Library availability checks require exact ASIN matches only.
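The findPlexMatch() priority order can be illustrated with an in-memory sketch. The real implementation queries the database by ASIN; here a plain array stands in for it, and findByAsin, LibraryItemSketch, and the assumption that a legacy plexGuid embeds the ASIN as a substring are all hypothetical.

```typescript
interface LibraryItemSketch {
  id: string;
  asin?: string | null;
  plexGuid?: string | null;
}

// ASIN (dedicated field) → ASIN (inside plexGuid) → null. No fuzzy fallback.
function findByAsin(items: LibraryItemSketch[], asin: string): LibraryItemSketch | null {
  const byField = items.find((item) => item.asin === asin);
  if (byField) return byField;
  // Backward compatibility: assumed here to mean the GUID string contains the ASIN.
  const byGuid = items.find((item) => (item.plexGuid ?? '').indexOf(asin) !== -1);
  return byGuid ?? null;
}
```

A title-only near match never qualifies in this path; anything without the exact ASIN yields null.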
Dedup & Works Table
Status: ✅ Implemented | Two-pass dedup on every discovery view + cross-batch identity via works table
Discovery views (search, author books, series detail) collapse duplicate Audible listings for the same recording (publisher re-listings, regional re-issues, full-cast vs single-narrator productions) into a single card. Two passes run in sequence:
- Local pass: deduplicateAndCollectGroups() (src/lib/utils/deduplicate-audiobooks.ts)
  - Stateless, in-memory. Keys books by normalized title + sorted narrator set + duration (±max(5%, 10 min) tolerance), with subtitle compatibility to keep distinct series entries separate.
  - Picks a canonical representative per group by metadataScore() (cover + rating + duration + description + narrator + release date + genres).
  - Emits DedupGroup[] describing every multi-ASIN collapse → handed to persistDedupGroups() for the works table.
- Works pass: collapseByExistingWorks() (src/lib/services/works.service.ts)
  - Async DB lookup. Reads work_asins for every ASIN in the local-passed list and collapses any books sharing a workId to one representative (same metadataScore() ranking).
  - Catches duplicates the local pass misses: source-metadata divergence (e.g. HTML scraper captured different narrators), cross-page splits (paginated series), or non-matching field shapes.
  - Degrades gracefully: returns the input unchanged on DB failure (view still renders).
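The duration tolerance from the local pass and the workId collapse from the works pass can be sketched as below. Both functions are illustrative assumptions: durationsMatch takes the 5% relative to the longer of the two durations (the document does not say which), and collapseByWorkId uses a precomputed score and an in-memory Map in place of metadataScore() and the work_asins lookup.

```typescript
// ±max(5%, 10 min) tolerance; 5% is taken of the longer duration (assumption).
function durationsMatch(aMinutes: number, bMinutes: number): boolean {
  const tolerance = Math.max(0.05 * Math.max(aMinutes, bMinutes), 10);
  return Math.abs(aMinutes - bMinutes) <= tolerance;
}

interface ScoredBookSketch {
  asin: string;
  score: number; // stands in for metadataScore()
}

// Keep one representative (highest score) per workId; books with no work stay as-is.
function collapseByWorkId(
  books: ScoredBookSketch[],
  workIdByAsin: Map<string, string>,
): ScoredBookSketch[] {
  const keyOf = (b: ScoredBookSketch): string => workIdByAsin.get(b.asin) ?? `asin:${b.asin}`;
  const best = new Map<string, ScoredBookSketch>();
  for (const book of books) {
    const current = best.get(keyOf(book));
    if (!current || book.score > current.score) best.set(keyOf(book), book);
  }
  // Filter the original list so surviving representatives keep their input order.
  return books.filter((b) => best.get(keyOf(b)) === b);
}
```

For short books the 10-minute floor dominates (100 vs 111 minutes do not match), while for long ones the 5% term does (600 vs 628 minutes match).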
Works Table Schema
- Work { id, title, author }: one row per logical book
- WorkAsin { id, workId, asin, narrator?, durationMinutes?, isCanonical, source, createdAt }: many ASINs per Work
Population Layers
- Layer 1 (auto): persistDedupGroups() writes whenever the local pass finds a duplicate. Merges across pre-existing works when a new group spans them.
- Layer 2 (seed): seedAsin() writes a single-ASIN work at request creation time, ensuring every requested ASIN has an entry to grow from.
Read Paths
- collapseByExistingWorks(): view-level collapse (this section).
- getSiblingAsins(): library availability matching (audiobook-matcher.ts), request-creation duplicate prevention (request-creator.service.ts), ignored-audiobook expansion. Returns sibling ASINs grouped by input ASIN.
Narrator Capture in HTML Scrapers
- HTML scrapers (audible-series.ts, the two parse*Items parsers in audible.service.ts) capture all narrator anchors via extractAllNarrators() (src/lib/utils/extract-narrator.ts). Multi-narrator productions render each name as its own <a href="?searchNarrator=..."> link; capturing only the first (prior bug) made co-narrated audiobooks fail to dedup. Order is not significant: normalizeNarrator() sorts before comparison.
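Why narrator order does not matter can be shown with a small order-insensitive key. This is a sketch of the idea only: narratorKey is a hypothetical name, and the comma separator, lowercasing, and "|" joiner are assumptions rather than the actual behaviour of normalizeNarrator().

```typescript
// Build a comparison key that ignores narrator order, case, and stray whitespace.
function narratorKey(narrator: string | undefined): string {
  return (narrator ?? '')
    .split(',') // assumed separator between joined narrator names
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0)
    .sort()
    .join('|');
}
```

Two listings that name the same co-narrators in a different order then produce the same key, while a listing that captured only the first narrator does not, which is the mismatch the prior bug caused.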
Wired Routes
- src/app/api/audiobooks/search/route.ts
- src/app/api/authors/[asin]/books/route.ts
- src/app/api/series/[asin]/route.ts
Watched-list background jobs (watched-lists.service.ts) run the local pass only — they don't render a view, and the downstream request-creator.service.ts already does sibling-aware dedup at request creation time.
Database-First Approach
Status: Implemented
Discovery APIs serve cached data from DB with real-time matching.
Flow:
- audible_refresh cron runs daily → fetches 200 popular + 200 new releases + user-configured categories by scraping Audible's curated HTML storefronts (/adblbestsellers, /newreleases, /search?node=<id>&sort=popularity-rank).
- Downloads and caches cover thumbnails locally.
- Stores metadata in audible_cache, ranked entries in audible_cache_categories with reserved IDs (__popular__, __new_releases__) and user category IDs.
- Cleans up unused thumbnails after sync.
- API routes query AudibleCacheCategory by categoryId → join with AudibleCache metadata → apply real-time matching → return enriched results.
- Homepage loads instantly (no Audible HTTP hits at request time).
Thumbnail Caching
Status: Implemented
Cover images cached locally to reduce external requests.
- Downloads covers during audible_refresh job.
- Stores in /app/cache/thumbnails (Docker volume).
- Serves via /api/cache/thumbnails/[filename].
- Auto-cleanup of unused thumbnails.
- Falls back to original URL if cache fails.
- 24-hour browser cache headers.
- Filename: {asin}.{ext} (e.g. B08G9PRS1K.jpg).
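The {asin}.{ext} naming scheme could be derived from a cover URL along these lines. This is a sketch under assumptions: thumbnailFilename is a hypothetical name, and taking the extension from the URL path with a jpg default is a guess at reasonable behaviour, not the thumbnail cache service's documented logic.

```typescript
// Derive "{asin}.{ext}" from a cover URL, defaulting to jpg when no extension is present.
function thumbnailFilename(asin: string, coverUrl: string): string {
  const path = coverUrl.replace(/[?#].*$/, ''); // drop query string and fragment
  const match = /\.([a-z0-9]+)$/i.exec(path);
  const ext = match?.[1]?.toLowerCase() ?? 'jpg'; // assumed default
  return `${asin}.${ext}`;
}
```

Keying the file by ASIN keeps the served path stable even when the upstream cover URL changes.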
Files:
- Service: src/lib/services/thumbnail-cache.service.ts
- API Route: src/app/api/cache/thumbnails/[filename]/route.ts
- Storage: Docker volume cache mounted at /app/cache
App-Level API Endpoints
GET /api/audiobooks/popular?page=1&limit=20
GET /api/audiobooks/new-releases?page=1&limit=20
Response:
{
success: boolean;
audiobooks: EnrichedAudibleAudiobook[];
count: number;
totalCount: number;
page: number;
totalPages: number;
hasMore: boolean;
lastSync: string | null; // ISO timestamp
message?: string; // if no data
}
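The pagination fields in the response relate to one another roughly as sketched here. It assumes totalPages is the ceiling of totalCount / limit and that hasMore means a later page exists; the actual routes may compute these differently (for example from a database count), and pageEnvelope is a hypothetical name.

```typescript
interface PageEnvelopeSketch<T> {
  items: T[];
  count: number; // items on this page
  totalCount: number; // items across all pages
  page: number; // 1-indexed
  totalPages: number;
  hasMore: boolean;
}

// Slice an already-ranked list into one 1-indexed page plus its counters.
function pageEnvelope<T>(all: T[], page: number, limit: number): PageEnvelopeSketch<T> {
  const totalCount = all.length;
  const totalPages = Math.ceil(totalCount / limit);
  const start = (page - 1) * limit;
  const items = all.slice(start, start + limit);
  return { items, count: items.length, totalCount, page, totalPages, hasMore: page < totalPages };
}
```

With 45 cached entries and limit=20, page 3 carries the last 5 entries and hasMore is false.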
Data Models
interface AudibleAudiobook {
asin: string;
title: string;
author: string;
authorAsin?: string;
narrator?: string;
description?: string;
coverArtUrl?: string;
durationMinutes?: number;
releaseDate?: string;
rating?: number;
genres?: string[];
series?: string;
seriesPart?: string;
seriesAsin?: string;
language?: string;
formatType?: string;
publisherName?: string;
}
interface EnrichedAudibleAudiobook extends AudibleAudiobook {
availabilityStatus: 'available' | 'requested' | 'unknown';
isAvailable: boolean;
plexGuid: string | null;
dbId: string;
}
interface AudibleSearchResult {
query: string;
results: AudibleAudiobook[];
totalResults: number;
page: number;
hasMore: boolean;
}
interface AuthorBooksResult {
books: AudibleAudiobook[];
hasMore: boolean;
page: number;
totalResults: number;
}
Tech Stack
- axios (HTTP, two clients: apiClient for JSON catalog API, htmlClient for HTML refresh + series scraping)
- cheerio (HTML parsing for refresh job and audible-series.ts)
- Audnexus API (per-ASIN details, primary)
- PostgreSQL (audible_cache, audible_cache_categories)
Fixed Issues
Series-page duplicates not collapsing across user views (2026-05-14)
- Problem: Two re-listings of the same audiobook (same title, same narrator set, same duration, different ASINs) showed as two cards on series detail pages, even after the works table had already linked them via search-page dedup.
- Root cause (two-part): (1) HTML scrapers used $el.find('a[href*="searchNarrator="]').first() for multi-narrator productions, capturing only the first co-narrator. So two listings of the same recording landed in deduplicateAndCollectGroups with mismatched single-narrator strings and never merged. (2) deduplicateAndCollectGroups was stateless: it wrote to the works table but never read it back, so even when one path (e.g. search) successfully merged two ASINs and persisted the Work, every other path (series, author books) re-derived the dedup decision from scratch and split them again.
- Fix: (1) New extractAllNarrators() helper (src/lib/utils/extract-narrator.ts) captures every searchNarrator= anchor and joins them; all three HTML scrapers route through it. (2) New collapseByExistingWorks() consults the works table after the local pass and collapses any remaining books sharing a workId. Wired into the three user-facing discovery routes (search / author books / series detail). Skipped for watched-list background jobs, since those feed request-creator.service.ts, which already does sibling-aware dedup.
- Location: src/lib/utils/extract-narrator.ts (new); src/lib/integrations/audible-series.ts (parseSeriesBooks); src/lib/integrations/audible.service.ts (parseProductListItems + parseSearchResultItems); src/lib/utils/deduplicate-audiobooks.ts (metadataScore exported); src/lib/services/works.service.ts (collapseByExistingWorks added); three API routes updated.
Discovery refresh reverted to curated HTML scraping (2026-05-14)
- Problem: After switching all catalog ops to the JSON catalog API in f564d0a, the nightly discovery refresh (Popular / New Releases / user-configured Categories) started serving junk: New Releases became 100% preorders out to 2027, and Popular was dominated by launch-day no-name shovelware.
- Root cause: products_sort_by=BestSellers is a right-now sales velocity rank that spikes on launch promos and preorder windows; -ReleaseDate returns all catalog items in date order with no released-only filter. The catalog API exposes no server-side filter to exclude preorders or sort by established popularity (verified by exhaustively testing release_time, availability_status, customer_rights, and the Reviewed / MostListened / SalesRank sorts, all silently ignored or rejected). Doing the curation client-side would have made RMAB the editorial curator, which Audible's storefront pages already do well.
- Fix: Hybrid architecture. The three refresh-only methods (getPopularAudiobooks, getNewReleases, getCategoryBooks) went back to scraping Audible's curated HTML storefronts (/adblbestsellers, /newreleases, /search?node=<id>&sort=popularity-rank). All user-facing real-time paths (search, author books, categories listing, per-ASIN details) stayed on the JSON catalog API. To keep the higher-503-risk HTML traffic resilient on the unattended nightly job, fetchWithRetry() accepts an optional maxBackoffMs cap and HTML callers use HTML_MAX_RETRIES=12 + HTML_MAX_BACKOFF_MS=180_000 (3-min cap). Healthy users finish quickly; 503-blocked users grind through patiently.
- Location: src/lib/integrations/audible.service.ts (three methods + two private parsers parseProductListItems / parseSearchResultItems); src/lib/utils/scrape-resilience.ts (jitteredBackoff cap parameter).
Audiobookshelf metadata matching not respecting configured region (2026-01-28)
- Problem: triggerABSItemMatch() hardcoded 'audible' provider (audible.com) instead of respecting user's configured Audible region.
- Impact: Users with non-US regions (CA, UK, AU, IN) had incorrect metadata matching in Audiobookshelf, causing wrong ASINs.
- Fix: Added mapRegionToABSProvider() to convert RMAB region codes to Audiobookshelf provider values. US → 'audible', others → 'audible.{region}' (e.g. 'audible.ca', 'audible.uk').
- Location: src/lib/services/audiobookshelf/api.ts:14, 147
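The region-to-provider rule described in this fix is simple enough to sketch directly. This follows only what the fix notes state (US → 'audible', others → 'audible.{region}'); regionToAbsProviderSketch is a hypothetical stand-in, not the code of mapRegionToABSProvider(), and whether every listed region has a matching Audiobookshelf provider is not confirmed here.

```typescript
// Region codes from the Supported Regions table.
type AudibleRegionCode = 'us' | 'ca' | 'uk' | 'au' | 'in' | 'de' | 'es' | 'fr';

// US maps to the bare provider name; every other region gets a ".{region}" suffix.
function regionToAbsProviderSketch(region: AudibleRegionCode): string {
  return region === 'us' ? 'audible' : `audible.${region}`;
}
```

Passing the configured region through a mapping like this, rather than a hardcoded 'audible', is what keeps Audiobookshelf matching against the same store RMAB uses.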
Non-English locale pages served to users outside US (2026-02-05)
- Problem: Audible uses IP geolocation to serve locale-specific pages. ipRedirectOverride=true only prevents region redirects, NOT language/locale changes.
- Impact: Users self-hosting from non-English-speaking countries got non-English content on HTML-scraped surfaces.
- Fix: Added language=<audibleLocaleParam> default param on htmlClient (axios default params). Still in effect for the remaining HTML path (audible-series.ts). Not applied to apiClient: the catalog JSON API is region-bound via apiBaseUrl and does not require the language param.
- Location: src/lib/integrations/audible.service.ts, in initialize() (htmlClient params)