mirror of
https://github.com/kikootwo/ReadMeABook.git
synced 2026-06-02 20:30:10 +00:00
Audible: HTML refresh, multi-narrator & works dedup
Switch the nightly discovery refresh to scraping Audible's curated HTML storefronts (popular, new releases, category pages) while keeping real-time user paths on the JSON catalog API. Add HTML resilience knobs (more retries, capped jittered backoff, AdaptivePacer changes, and per-batch cooldowns) so nightly jobs survive 503 storms instead of failing. Implement multi-narrator capture via a new extractAllNarrators helper and update the parsers to preserve all narrator anchors. Introduce two-pass dedup: the in-memory deduplicateAndCollectGroups followed by collapseByExistingWorks, which consults the works table; export metadataScore for consistent representative selection; and persist dedup groups (fire-and-forget). Wire collapseByExistingWorks into the search/author/series routes and add defensive dedup to the refresh processor. Add HTML parsing helpers, runtime/lang-aware parsing, a jitteredBackoff cap, and tests for the new behaviors.
@@ -45,6 +45,8 @@
 - **Web scraping (popular, new releases)** → [integrations/audible.md](integrations/audible.md)
 - **Database caching, real-time matching** → [integrations/audible.md](integrations/audible.md)
 - **Book covers API for login page** → [frontend/pages/login.md](frontend/pages/login.md)
+- **Dedup & works table (cross-ASIN identity)** → [integrations/audible.md](integrations/audible.md#dedup--works-table)
+- **Multi-narrator capture in HTML scrapers** → [integrations/audible.md](integrations/audible.md#narrator-capture-in-html-scrapers)
 
 ## E-book Support (First-Class)
 - **First-class ebook requests, separate tracking** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
@@ -1,29 +1,40 @@
 # Audible Integration
 
-**Status:** Implemented | Unauthenticated Audible JSON catalog API (primary) + Audnexus API (per-ASIN details)
+**Status:** Implemented | Hybrid — curated HTML for discovery refresh + Audible JSON catalog API for user-facing real-time + Audnexus for per-ASIN details
 
 ## Overview
 
-Audiobook metadata for discovery, search, and detail pages. All catalog operations (search, popular, new releases, categories, category books, author books, single-product details) now call Audible's unauthenticated public JSON catalog API (`api.audible.<tld>/1.0/catalog/*`). Per-ASIN detail lookups prefer Audnexus; the catalog API is used as fallback.
+Audiobook metadata for discovery, search, and detail pages. Split by access pattern:
 
+- **Nightly discovery refresh** (popular / new releases / category lists) — scraped from Audible's **curated HTML storefronts** (`www.audible.<tld>/adblbestsellers`, `/newreleases`, `/search?node=<id>`). The HTML pages reflect Audible's own editorial picks.
+- **User-facing real-time** (search, author books, categories listing, per-ASIN details) — Audible's unauthenticated public **JSON catalog API** (`api.audible.<tld>/1.0/catalog/*`).
+- **Per-ASIN detail lookups** — Audnexus (`api.audnex.us/books/{asin}`) primary; catalog API used as fallback when Audnexus returns 404.
+
 ## Architecture
 
-- **Primary data source:** Audible JSON catalog API, same endpoint used by the official Audible mobile apps. No authentication, no API key, no user credentials, no special headers.
-- **Per-ASIN details:** Audnexus (`api.audnex.us/books/{asin}`) remains primary; catalog API (`/1.0/catalog/products/{asin}`) is the fallback when Audnexus returns 404.
-- **HTML scraping:** Removed from `audible.service.ts`. The only remaining HTML path is `audible-series.ts` (series-page scraping, out of scope).
-- **`www.audible.<tld>`:** Still used by `audible-series.ts` and by `getBaseUrl()` for "View on Audible" link generation. Not used for any catalog operation.
+- **Curated HTML (refresh job only):** the three methods called solely by `audible-refresh.processor.ts` (`getPopularAudiobooks`, `getNewReleases`, `getCategoryBooks`) scrape Audible's storefront HTML to inherit editorial curation. Beefed-up retry/backoff knobs (12 retries, 3-min jittered cap) handle 503 storms patiently on the nightly job without slowing healthy users.
+- **JSON catalog API (real-time):** `search`, `searchByAuthorAsin`, `getCategories` (categories listing), and `fetchAudibleDetailsFromApi` (per-ASIN fallback). Same endpoint used by the official Audible mobile apps. No authentication, no API key, no user credentials, no special headers.
+- **Audnexus (per-ASIN):** `getAudiobookDetails` and `getRuntime` prefer Audnexus, with catalog API fallback for `getAudiobookDetails`.
+- **`www.audible.<tld>`:** Used by HTML refresh scraping, by `audible-series.ts`, and by `getBaseUrl()` for "View on Audible" link generation.
 
 ## Data Sources
 
-All catalog operations are HTTP GET against `{apiBaseUrl}` (region-dependent, e.g. `https://api.audible.com`):
+### Nightly refresh (HTML — `htmlClient`, baseURL `www.audible.<tld>`)
 
+| Operation | Endpoint | Key params |
+|---|---|---|
+| Popular | `/adblbestsellers` | `pageSize=50`, `page=<n>` (omitted on first page) |
+| New releases | `/newreleases` | `pageSize=50`, `page=<n>` (omitted on first page) |
+| Category books | `/search` | `node=<categoryId>&pageSize=50&sort=popularity-rank&page=<n>` |
+
+Parsed via cheerio. Selectors: `.productListItem` (popular/new releases), `.s-result-item, .productListItem` (categories).
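The parameter rules in the nightly-refresh table can be sketched as a small request builder. This is only an illustration under stated assumptions: `buildStorefrontRequest`, `StorefrontList`, and `StorefrontRequest` are hypothetical names, not identifiers from this commit, and the real `htmlClient` additionally applies its own default params (`ipRedirectOverride`, `language`).

```typescript
// Hypothetical helper, for illustration only. Endpoints and params are taken
// from the table above; the function and type names are assumptions.
type StorefrontList =
  | { kind: 'popular' }
  | { kind: 'newReleases' }
  | { kind: 'category'; categoryId: string };

interface StorefrontRequest {
  path: string;
  params: Record<string, string>;
}

function buildStorefrontRequest(list: StorefrontList, page: number): StorefrontRequest {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page is 1-indexed and must be >= 1');
  }

  const params: Record<string, string> = { pageSize: '50' };
  let path = '/search';

  if (list.kind === 'popular') {
    path = '/adblbestsellers';
  } else if (list.kind === 'newReleases') {
    path = '/newreleases';
  } else {
    // Category listing: /search filtered by node, sorted by popularity rank.
    params.node = list.categoryId;
    params.sort = 'popularity-rank';
  }

  // The HTML storefront uses a 1-indexed `page`, omitted on the first page.
  if (page > 1) {
    params.page = String(page);
  }
  return { path, params };
}
```

For example, `buildStorefrontRequest({ kind: 'popular' }, 1)` yields `/adblbestsellers` with only `pageSize`, while page 3 of a category adds `node`, `sort`, and `page=3`.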
+
+### Real-time (JSON catalog API — `apiClient`, baseURL `api.audible.<tld>`)
+
 | Operation | Endpoint | Key params |
 |---|---|---|
 | Search | `/1.0/catalog/products` | `keywords=<q>` |
 | Author books | `/1.0/catalog/products` | `author=<name>` (name, NOT ASIN) |
-| Popular | `/1.0/catalog/products` | `products_sort_by=BestSellers` |
-| New releases | `/1.0/catalog/products` | `products_sort_by=-ReleaseDate` |
-| Category books | `/1.0/catalog/products` | `category_id=<id>&products_sort_by=BestSellers` |
 | Categories listing | `/1.0/catalog/categories` | (none) |
 | Single product | `/1.0/catalog/products/{asin}` | — |
 | Audnexus (per-ASIN) | `https://api.audnex.us/books/{asin}` | `region={audnexusParam}` |
@@ -48,20 +59,20 @@ Populates every `AudibleAudiobook` field. Covered:
 
 ## Gotchas
 
+- **Catalog API cannot filter preorders or surface curated bestsellers.** The API's `BestSellers` sort is a right-now velocity rank that spikes on launch-day promos and preorder windows; the `-ReleaseDate` sort returns 100% future preorders. There is no server-side `release_time`, `released-only`, `customer_rights`, or alternate sort (`Reviewed`, `MostListened`, etc.) — every plausible variant was tested and silently ignored. This is why the nightly refresh job uses the curated HTML storefront pages instead.
 - **`author=` takes a name, not an ASIN.** The catalog API has no ASIN-based author param. `searchByAuthorAsin()` queries by name, then filters client-side: keeps only products where `products[].authors[].asin === authorAsin`. Preserves ASIN-authoritative author identity. Also filters by `product.language` via `isAcceptedLanguage()` for the configured region.
 - **Invalid ASIN returns HTTP 200 with stub body.** `/1.0/catalog/products/{asin}` responds 200 with `{product: {asin: INPUT}}` and no other fields. `fetchAudibleDetailsFromApi()` detects this via missing `product.title` and returns `null`.
 - **`publisher_summary` is HTML.** Service strips tags via inline `stripHtml()` helper (regex-based, no cheerio) before populating `description`. Falls back to `merchandising_summary` (plain text) if `publisher_summary` missing.
 - **Series is an array.** `products[].series[]` — a book may belong to multiple series. Service picks the first entry with non-empty `sequence`, else the first entry. `sequence` is cleaned by extracting first `/\d+(?:\.\d+)?/` match for numeric ordering.
 - **Stub `product_images`:** cover URL reads from `product_images['500']`; missing keys fall back to `undefined`.
-- **`page` is 0-indexed.** Despite the default value appearing to be 1, the API returns items `(page * num_results)` through `((page + 1) * num_results - 1)`. So `page=1` fetches items 51–100, not 1–50. All service methods accept a 1-indexed `page` and subtract 1 at the axios call. The symptom of getting this wrong is silent: queries whose `total_results ≤ num_results` return an empty `products` array while `total_results` is populated (e.g. author searches for small catalogues).
+- **`page` is 0-indexed (catalog API only).** Despite the default value appearing to be 1, the API returns items `(page * num_results)` through `((page + 1) * num_results - 1)`. So `page=1` fetches items 51–100, not 1–50. All catalog-API service methods accept a 1-indexed `page` and subtract 1 at the axios call. The symptom of getting this wrong is silent: queries whose `total_results ≤ num_results` return an empty `products` array while `total_results` is populated (e.g. author searches for small catalogues). HTML paths use Audible's native 1-indexed `page` query param and omit it on the first page.
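The 0-indexed `page` gotcha can be made concrete with a short sketch. The helper names `toCatalogPage` and `catalogItemRange` are hypothetical and chosen for illustration; the arithmetic follows the formula quoted in the bullet, expressed here as 1-based item positions.

```typescript
// Hypothetical helpers illustrating the catalog API's 0-indexed `page`.

/** Convert the service's 1-indexed page into the API's 0-indexed `page`. */
function toCatalogPage(servicePage: number): number {
  if (!Number.isInteger(servicePage) || servicePage < 1) {
    throw new RangeError('servicePage is 1-indexed and must be >= 1');
  }
  return servicePage - 1;
}

/** 1-based positions of the items returned for a given 0-indexed API page. */
function catalogItemRange(
  apiPage: number,
  numResults: number,
): { first: number; last: number } {
  return {
    first: apiPage * numResults + 1,
    last: (apiPage + 1) * numResults,
  };
}
```

Passing the service page straight through (`page=1` with `num_results=50`) would select items 51 to 100, which is why a catalogue with 50 or fewer results comes back empty; converting first with `toCatalogPage(1)` selects items 1 to 50.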
 
 ## Rate Limiting & Resilience
 
-- 503s still possible but dramatically less frequent than the HTML surface.
-- `fetchWithRetry()` — jittered exponential backoff, 5 retries, retries on 503/429/5xx.
-- `AdaptivePacer` circuit-breaker preserved.
-- Inter-page base delay on API paths: **500–1500ms** (down from 2000–4000ms for HTML).
-- API responses include `Cache-Control: private, max-age=1800`.
+- **Real-time JSON API paths:** 503s are uncommon. `fetchWithRetry()` uses jittered exponential backoff, 5 retries, retries on 503/429/5xx. API responses include `Cache-Control: private, max-age=1800`.
+- **Nightly HTML refresh paths:** 503s are more likely (HTML storefront is more rate-sensitive). Same `fetchWithRetry()`, but with `HTML_MAX_RETRIES=12` and `HTML_MAX_BACKOFF_MS=180_000` (3-minute cap on jittered backoff). Healthy refreshes still complete fast (per-page success on attempt 0); users hit by sustained 503 storms grind through patiently rather than abandoning the refresh.
+- **`AdaptivePacer`** — inter-page delay 2–4 s baseline, scales up multiplicatively under retry pressure, with a 45–60 s circuit-breaker cooldown after 3 consecutive retry-pages.
+- **Per-batch cooldowns** in `audible-refresh.processor.ts` — 15–30 s between popular/new-releases, 10–20 s between categories.
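A capped jittered backoff of the kind described above might look like the sketch below. Only the two constants come from this document; the base delay, the "full jitter" strategy, and the name `cappedJitteredBackoff` are assumptions made for illustration and may differ from what `scrape-resilience.ts` actually does.

```typescript
// Values quoted in the documentation above.
const HTML_MAX_RETRIES = 12;
const HTML_MAX_BACKOFF_MS = 180_000; // 3-minute cap

// Assumption for illustration: 1 s base delay, doubled per attempt.
const ASSUMED_BASE_DELAY_MS = 1_000;

/**
 * Hypothetical sketch: exponential backoff with full jitter, clamped to a cap.
 * `random` is injectable so the behaviour can be checked deterministically.
 */
function cappedJitteredBackoff(
  attempt: number,
  maxBackoffMs: number = Infinity,
  random: () => number = Math.random,
): number {
  const exponential = ASSUMED_BASE_DELAY_MS * Math.pow(2, attempt);
  const ceiling = Math.min(exponential, maxBackoffMs);
  // Full jitter: pick a delay uniformly in [0, ceiling).
  return Math.floor(random() * ceiling);
}

/** Worst-case delay per attempt for an HTML refresh call under these assumptions. */
function htmlBackoffCeilings(): number[] {
  const ceilings: number[] = [];
  for (let attempt = 0; attempt < HTML_MAX_RETRIES; attempt++) {
    ceilings.push(
      Math.min(ASSUMED_BASE_DELAY_MS * Math.pow(2, attempt), HTML_MAX_BACKOFF_MS),
    );
  }
  return ceilings;
}
```

The point of the cap is that late attempts stop doubling: without it, attempt 11 alone could wait over half an hour under the assumed base delay, whereas with the cap no single wait exceeds three minutes.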
 
 ## Region Configuration
 
@@ -101,8 +112,8 @@ Configurable Audible region for accurate metadata matching across international
 - Automatic refresh: Region change triggers `audible_refresh` job.
 
 **Per-region HTTP clients (on init):**
-- `apiClient` — `baseURL=apiBaseUrl`, `Accept: application/json`, `User-Agent: ReadMeABook/1.0`, no language/ipRedirect params.
-- `htmlClient` — `baseURL=baseUrl`, browser headers, default params `ipRedirectOverride=true` + `language=<audibleLocaleParam>`. Used only by `audible-series.ts` and `getBaseUrl()`-based link generation.
+- `apiClient` — `baseURL=apiBaseUrl`, `Accept: application/json`, `User-Agent: ReadMeABook/1.0`, no language/ipRedirect params. Used for the real-time JSON catalog operations (search, author books, categories listing, per-ASIN details fallback).
+- `htmlClient` — `baseURL=baseUrl`, rotating browser headers (`pickUserAgent` + `getBrowserHeaders`), default params `ipRedirectOverride=true` + `language=<audibleLocaleParam>`. Used by the nightly discovery refresh (`/adblbestsellers`, `/newreleases`, `/search?node=...`), by `audible-series.ts`, and by `getBaseUrl()`-based link generation.
 - Audnexus calls include `region=<audnexusParam>`.
 
 **Files:**
@@ -130,6 +141,44 @@ Single matching algorithm used everywhere (search, popular, new-releases, jobs).
 
 **Note:** Fuzzy matching (70% threshold) is preserved in `ranking-algorithm.ts` for Prowlarr torrent ranking. Library availability checks require exact ASIN matches only.
 
+## Dedup & Works Table
+
+**Status:** ✅ Implemented | Two-pass dedup on every discovery view + cross-batch identity via works table
+
+Discovery views (search, author books, series detail) collapse duplicate Audible listings for the same recording (publisher re-listings, regional re-issues, full-cast vs single-narrator productions) into a single card. Two passes run in sequence:
+
+1. **Local pass — `deduplicateAndCollectGroups()`** (`src/lib/utils/deduplicate-audiobooks.ts`)
+   - Stateless, in-memory. Keys books by normalized title + sorted narrator set + duration (±max(5%, 10 min) tolerance), with subtitle compatibility to keep distinct series entries separate.
+   - Picks a canonical representative per group by `metadataScore()` (cover + rating + duration + description + narrator + release date + genres).
+   - Emits `DedupGroup[]` describing every multi-ASIN collapse → handed to `persistDedupGroups()` for the works table.
+
+2. **Works pass — `collapseByExistingWorks()`** (`src/lib/services/works.service.ts`)
+   - Async DB lookup. Reads `work_asins` for every ASIN in the local-passed list and collapses any books sharing a `workId` to one representative (same `metadataScore()` ranking).
+   - Catches duplicates the local pass misses: source-metadata divergence (e.g. HTML scraper captured different narrators), cross-page splits (paginated series), or non-matching field shapes.
+   - Degrades gracefully — returns the input unchanged on DB failure (view still renders).
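The two passes described above can be illustrated with an in-memory sketch. Everything here is hypothetical: the `Book` shape, the equal-weight `sketchMetadataScore`, the choice to take 5% of the longer duration, and the `Map` standing in for the `work_asins` lookup are assumptions, not the project's actual implementation.

```typescript
// Hypothetical stand-in for the project's audiobook type.
interface Book {
  asin: string;
  title: string;
  coverArtUrl?: string;
  rating?: number;
  durationMinutes?: number;
  description?: string;
  narrator?: string;
  releaseDate?: string;
  genres?: string[];
}

// Local-pass duration check with a tolerance of max(5%, 10 min).
// Assumption: the 5% is measured against the longer of the two durations.
function durationsCompatible(aMinutes: number, bMinutes: number): boolean {
  const tolerance = Math.max(0.05 * Math.max(aMinutes, bMinutes), 10);
  return Math.abs(aMinutes - bMinutes) <= tolerance;
}

// Assumption: one point per populated field; the real weights are not documented here.
function sketchMetadataScore(book: Book): number {
  let score = 0;
  if (book.coverArtUrl) score++;
  if (book.rating !== undefined) score++;
  if (book.durationMinutes !== undefined) score++;
  if (book.description) score++;
  if (book.narrator) score++;
  if (book.releaseDate) score++;
  if (book.genres && book.genres.length > 0) score++;
  return score;
}

// Works-pass sketch: `workIdByAsin` stands in for the `work_asins` DB lookup.
function collapseByWorkId(books: Book[], workIdByAsin: Map<string, string>): Book[] {
  // Pick the best-scoring book per work.
  const bestByWork = new Map<string, Book>();
  for (const book of books) {
    const workId = workIdByAsin.get(book.asin);
    if (workId === undefined) continue;
    const current = bestByWork.get(workId);
    if (!current || sketchMetadataScore(book) > sketchMetadataScore(current)) {
      bestByWork.set(workId, book);
    }
  }

  // Emit each work once, at the position of its first member.
  const emitted = new Set<string>();
  const result: Book[] = [];
  for (const book of books) {
    const workId = workIdByAsin.get(book.asin);
    if (workId === undefined) {
      result.push(book); // unknown to the works table: keep as-is
      continue;
    }
    if (emitted.has(workId)) continue;
    emitted.add(workId);
    result.push(bestByWork.get(workId)!);
  }
  return result;
}
```

Books the works table has never seen pass through untouched, which mirrors the documented graceful degradation: with an empty lookup the function returns its input unchanged.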
+
+### Works Table Schema
+- `Work { id, title, author }` — one row per logical book
+- `WorkAsin { id, workId, asin, narrator?, durationMinutes?, isCanonical, source, createdAt }` — many ASINs per Work
+
+### Population Layers
+- **Layer 1 (auto):** `persistDedupGroups()` writes whenever the local pass finds a duplicate. Merges across pre-existing works when a new group spans them.
+- **Layer 2 (seed):** `seedAsin()` writes a single-ASIN work at request creation time, ensuring every requested ASIN has an entry to grow from.
+
+### Read Paths
+- **`collapseByExistingWorks()`** — view-level collapse (this section).
+- **`getSiblingAsins()`** — library availability matching (`audiobook-matcher.ts`), request-creation duplicate prevention (`request-creator.service.ts`), ignored-audiobook expansion. Returns sibling ASINs grouped by input ASIN.
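A sibling lookup of the shape described for `getSiblingAsins()` can be sketched over in-memory rows. This is an assumption-laden illustration: `siblingAsinsFor` and `WorkAsinRow` are hypothetical names, the rows stand in for a `work_asins` query, and excluding the input ASIN from its own sibling list is a guess at the intended semantics.

```typescript
// Hypothetical subset of the WorkAsin columns listed in the schema above.
interface WorkAsinRow {
  workId: string;
  asin: string;
}

/**
 * Sketch: for each input ASIN, return the other ASINs linked to the same work.
 * ASINs with no work entry map to an empty list.
 */
function siblingAsinsFor(
  inputAsins: string[],
  rows: WorkAsinRow[],
): Map<string, string[]> {
  const workIdByAsin = new Map<string, string>();
  const asinsByWork = new Map<string, string[]>();

  for (const row of rows) {
    workIdByAsin.set(row.asin, row.workId);
    const members = asinsByWork.get(row.workId) ?? [];
    members.push(row.asin);
    asinsByWork.set(row.workId, members);
  }

  const result = new Map<string, string[]>();
  for (const asin of inputAsins) {
    const workId = workIdByAsin.get(asin);
    const members = workId === undefined ? [] : (asinsByWork.get(workId) ?? []);
    // Assumption: an ASIN is not its own sibling.
    result.set(asin, members.filter((member) => member !== asin));
  }
  return result;
}
```

Grouping the result by input ASIN lets a caller such as a library matcher check each requested book against all of its known re-listings in one pass.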
+
+### Narrator Capture in HTML Scrapers
+- HTML scrapers (`audible-series.ts`, the two `parse*Items` parsers in `audible.service.ts`) capture **all** narrator anchors via `extractAllNarrators()` (`src/lib/utils/extract-narrator.ts`). Multi-narrator productions render each name as its own `<a href="?searchNarrator=...">` link; capturing only the first (prior bug) made co-narrated audiobooks fail to dedup. Order is not significant — `normalizeNarrator()` sorts before comparison.
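Why capturing every narrator anchor matters for dedup can be shown without any HTML parsing. The sketch below takes the anchor texts as plain strings; `joinNarrators` and `narratorKey` are hypothetical stand-ins for `extractAllNarrators()` and `normalizeNarrator()`, and the `", "` separator and lowercasing are assumptions. The narrator names are placeholders.

```typescript
// Hypothetical stand-in for extractAllNarrators(): joins every anchor text,
// dropping empties and repeated names.
function joinNarrators(anchorTexts: string[]): string {
  const names = anchorTexts
    .map((text) => text.trim())
    .filter((text) => text.length > 0);
  return Array.from(new Set(names)).join(', ');
}

// Hypothetical stand-in for normalizeNarrator(): order-insensitive comparison key.
function narratorKey(narrators: string): string {
  return narrators
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0)
    .sort()
    .join('|');
}

// Two listings of the same recording, with anchors in different order.
const listingA = joinNarrators(['Jane Doe', 'John Roe']);
const listingB = joinNarrators([' John Roe ', 'Jane Doe']);

// With all anchors captured, the keys agree regardless of order.
const sameWithAll = narratorKey(listingA) === narratorKey(listingB);

// With only the first anchor captured (the prior bug), they diverge.
const sameWithFirstOnly = narratorKey('Jane Doe') === narratorKey('John Roe');
```

Under these assumptions `sameWithAll` is true and `sameWithFirstOnly` is false, which is the mismatch that kept co-narrated listings from merging in the local pass.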
+
+### Wired Routes
+- `src/app/api/audiobooks/search/route.ts`
+- `src/app/api/authors/[asin]/books/route.ts`
+- `src/app/api/series/[asin]/route.ts`
+
+Watched-list background jobs (`watched-lists.service.ts`) run the local pass only — they don't render a view, and the downstream `request-creator.service.ts` already does sibling-aware dedup at request creation time.
+
 ## Database-First Approach
 
 **Status:** Implemented
@@ -137,12 +186,12 @@ Single matching algorithm used everywhere (search, popular, new-releases, jobs).
 Discovery APIs serve cached data from DB with real-time matching.
 
 **Flow:**
-1. `audible_refresh` cron runs daily → fetches 200 popular + 200 new releases + user-configured categories via catalog API.
+1. `audible_refresh` cron runs daily → fetches 200 popular + 200 new releases + user-configured categories by scraping Audible's curated HTML storefronts (`/adblbestsellers`, `/newreleases`, `/search?node=<id>&sort=popularity-rank`).
 2. Downloads and caches cover thumbnails locally.
 3. Stores metadata in `audible_cache`, ranked entries in `audible_cache_categories` with reserved IDs (`__popular__`, `__new_releases__`) and user category IDs.
 4. Cleans up unused thumbnails after sync.
 5. API routes query `AudibleCacheCategory` by categoryId → join with `AudibleCache` metadata → apply real-time matching → return enriched results.
-6. Homepage loads instantly (no Audible API hits).
+6. Homepage loads instantly (no Audible HTTP hits at request time).
 
 ## Thumbnail Caching
 
@@ -228,12 +277,25 @@ interface AuthorBooksResult {
 
 ## Tech Stack
 
-- `axios` (HTTP, two clients: `apiClient` for JSON catalog, `htmlClient` for series-page scraping only)
+- `axios` (HTTP, two clients: `apiClient` for JSON catalog API, `htmlClient` for HTML refresh + series scraping)
+- `cheerio` (HTML parsing for refresh job and `audible-series.ts`)
 - Audnexus API (per-ASIN details, primary)
 - PostgreSQL (`audible_cache`, `audible_cache_categories`)
 
 ## Fixed Issues
 
+**Series-page duplicates not collapsing across user views (2026-05-14)**
+- **Problem:** Two re-listings of the same audiobook (same title, same narrator set, same duration, different ASINs) showed as two cards on series detail pages, even after the works table had already linked them via search-page dedup.
+- **Root cause (two-part):** (1) HTML scrapers used `$el.find('a[href*="searchNarrator="]').first()` for multi-narrator productions, capturing only the first co-narrator. So two listings of the same recording landed in `deduplicateAndCollectGroups` with mismatched single-narrator strings and never merged. (2) `deduplicateAndCollectGroups` was stateless — it wrote to the works table but never read it back, so even when one path (e.g. search) successfully merged two ASINs and persisted the Work, every other path (series, author books) re-derived the dedup decision from scratch and split them again.
+- **Fix:** (1) New `extractAllNarrators()` helper (`src/lib/utils/extract-narrator.ts`) captures every `searchNarrator=` anchor and joins them; all three HTML scrapers route through it. (2) New `collapseByExistingWorks()` consults the works table after the local pass and collapses any remaining books sharing a `workId`. Wired into the three user-facing discovery routes (search / author books / series detail). Skipped for watched-list background jobs — those feed `request-creator.service.ts` which already does sibling-aware dedup.
+- **Location:** `src/lib/utils/extract-narrator.ts` (new); `src/lib/integrations/audible-series.ts` (parseSeriesBooks); `src/lib/integrations/audible.service.ts` (parseProductListItems + parseSearchResultItems); `src/lib/utils/deduplicate-audiobooks.ts` (`metadataScore` exported); `src/lib/services/works.service.ts` (`collapseByExistingWorks` added); three API routes updated.
+
+**Discovery refresh reverted to curated HTML scraping (2026-05-14)**
+- **Problem:** After switching all catalog ops to the JSON catalog API in `f564d0a`, the nightly discovery refresh (Popular / New Releases / user-configured Categories) started serving junk: New Releases became 100% preorders out to 2027, and Popular was dominated by launch-day no-name shovelware.
+- **Root cause:** `products_sort_by=BestSellers` is a right-now sales velocity rank that spikes on launch promos and preorder windows; `-ReleaseDate` returns all catalog items in date order with no released-only filter. The catalog API exposes no server-side filter to exclude preorders or sort by established popularity (verified by exhaustively testing `release_time`, `availability_status`, `customer_rights`, `Reviewed`/`MostListened`/`SalesRank` sorts — all silently ignored or rejected). Doing the curation client-side would have made RMAB the editorial curator, which Audible's storefront pages already do well.
+- **Fix:** Hybrid architecture — the three refresh-only methods (`getPopularAudiobooks`, `getNewReleases`, `getCategoryBooks`) went back to scraping Audible's curated HTML storefronts (`/adblbestsellers`, `/newreleases`, `/search?node=<id>&sort=popularity-rank`). All user-facing real-time paths (search, author books, categories listing, per-ASIN details) stayed on the JSON catalog API. To keep the higher-503-risk HTML traffic resilient on the unattended nightly job, `fetchWithRetry()` accepts an optional `maxBackoffMs` cap and HTML callers use `HTML_MAX_RETRIES=12` + `HTML_MAX_BACKOFF_MS=180_000` (3-min cap). Healthy users finish quickly; 503-blocked users grind through patiently.
+- **Location:** `src/lib/integrations/audible.service.ts` (three methods + two private parsers `parseProductListItems` / `parseSearchResultItems`); `src/lib/utils/scrape-resilience.ts` (`jitteredBackoff` cap parameter).
+
 **Audiobookshelf metadata matching not respecting configured region (2026-01-28)**
 - **Problem:** `triggerABSItemMatch()` hardcoded `'audible'` provider (audible.com) instead of respecting user's configured Audible region.
 - **Impact:** Users with non-US regions (CA, UK, AU, IN) had incorrect metadata matching in Audiobookshelf, causing wrong ASINs.
@@ -7,7 +7,7 @@ import { NextRequest, NextResponse } from 'next/server';
 import { getAudibleService } from '@/lib/integrations/audible.service';
 import { enrichAudiobooksWithMatches } from '@/lib/utils/audiobook-matcher';
 import { deduplicateAndCollectGroups } from '@/lib/utils/deduplicate-audiobooks';
-import { persistDedupGroups } from '@/lib/services/works.service';
+import { persistDedupGroups, collapseByExistingWorks } from '@/lib/services/works.service';
 import { getCurrentUser } from '@/lib/middleware/auth';
 import { RMABLogger } from '@/lib/utils/logger';
 import { annotateWithIgnoreStatus } from '@/lib/utils/ignored-audiobooks';
@@ -41,16 +41,19 @@ export async function GET(request: NextRequest) {
 const currentUser = getCurrentUser(request);
 const userId = currentUser?.sub || undefined;
 
-// Deduplicate before enrichment to avoid wasted DB queries on duplicate entries
+// Two-pass dedup: local title/narrator/duration matching first, then collapse
+// any remaining duplicates that the works table already knows are the same book
+// (handles cases where source metadata diverges across paths or pages).
 const { books: dedupedResults, groups } = deduplicateAndCollectGroups(results.results);
 
-// Fire-and-forget: persist dedup groups to works table for cross-ASIN matching
 if (groups.length > 0) {
   persistDedupGroups(groups).catch(() => {});
 }
 
+const collapsedResults = await collapseByExistingWorks(dedupedResults);
+
 // Enrich search results with availability and request status information
-const enrichedResults = await enrichAudiobooksWithMatches(dedupedResults, userId);
+const enrichedResults = await enrichAudiobooksWithMatches(collapsedResults, userId);
 
 // Annotate with per-user ignore status
 const annotatedResults = await annotateWithIgnoreStatus(enrichedResults, userId);
@@ -7,7 +7,7 @@ import { NextRequest, NextResponse } from 'next/server';
 import { getAudibleService } from '@/lib/integrations/audible.service';
 import { enrichAudiobooksWithMatches } from '@/lib/utils/audiobook-matcher';
 import { deduplicateAndCollectGroups } from '@/lib/utils/deduplicate-audiobooks';
-import { persistDedupGroups } from '@/lib/services/works.service';
+import { persistDedupGroups, collapseByExistingWorks } from '@/lib/services/works.service';
 import { getCurrentUser } from '@/lib/middleware/auth';
 import { RMABLogger } from '@/lib/utils/logger';
 import { annotateWithIgnoreStatus } from '@/lib/utils/ignored-audiobooks';
@@ -56,17 +56,20 @@ export async function GET(
 const audibleService = getAudibleService();
 const result = await audibleService.searchByAuthorAsin(authorName.trim(), asin, page);
 
-// Deduplicate before enrichment to avoid wasted DB queries on duplicate entries
+// Two-pass dedup: local title/narrator/duration matching first, then collapse
+// any remaining duplicates that the works table already knows are the same book
+// (handles cases where source metadata diverges across paths or pages).
 const { books: dedupedBooks, groups } = deduplicateAndCollectGroups(result.books);
 
-// Fire-and-forget: persist dedup groups to works table for cross-ASIN matching
 if (groups.length > 0) {
   persistDedupGroups(groups).catch(() => {});
 }
 
+const collapsedBooks = await collapseByExistingWorks(dedupedBooks);
+
 // Enrich with library availability and request status
 const userId = currentUser.sub || undefined;
-const enrichedBooks = await enrichAudiobooksWithMatches(dedupedBooks, userId);
+const enrichedBooks = await enrichAudiobooksWithMatches(collapsedBooks, userId);
 
 // Annotate with per-user ignore status
 const annotatedBooks = await annotateWithIgnoreStatus(enrichedBooks, userId);
@@ -9,7 +9,7 @@ import { RMABLogger } from '@/lib/utils/logger';
 import { scrapeSeriesPage } from '@/lib/integrations/audible-series';
 import { enrichAudiobooksWithMatches } from '@/lib/utils/audiobook-matcher';
 import { deduplicateAndCollectGroups } from '@/lib/utils/deduplicate-audiobooks';
-import { persistDedupGroups } from '@/lib/services/works.service';
+import { persistDedupGroups, collapseByExistingWorks } from '@/lib/services/works.service';
 import { annotateWithIgnoreStatus } from '@/lib/utils/ignored-audiobooks';
 
 const logger = RMABLogger.create('API.Series.Detail');
@@ -52,17 +52,20 @@ export async function GET(
 );
 }
 
-// Deduplicate before enrichment to avoid wasted DB queries on duplicate entries
+// Two-pass dedup: local title/narrator/duration matching first, then collapse
+// any remaining duplicates that the works table already knows are the same book
+// (handles cases where source metadata diverges across paths or pages).
 const { books: dedupedBooks, groups } = deduplicateAndCollectGroups(detail.books);
 
-// Fire-and-forget: persist dedup groups to works table for cross-ASIN matching
 if (groups.length > 0) {
   persistDedupGroups(groups).catch(() => {});
 }
 
+const collapsedBooks = await collapseByExistingWorks(dedupedBooks);
+
 // Enrich books with library availability and request status
|
// Enrich books with library availability and request status
|
||||||
const userId = currentUser.sub || undefined;
|
const userId = currentUser.sub || undefined;
|
||||||
const enrichedBooks = await enrichAudiobooksWithMatches(dedupedBooks, userId);
|
const enrichedBooks = await enrichAudiobooksWithMatches(collapsedBooks, userId);
|
||||||
|
|
||||||
// Annotate with per-user ignore status
|
// Annotate with per-user ignore status
|
||||||
const annotatedBooks = await annotateWithIgnoreStatus(enrichedBooks, userId);
|
const annotatedBooks = await annotateWithIgnoreStatus(enrichedBooks, userId);
|
||||||
|
|||||||
@@ -19,6 +19,7 @@ import {
|
|||||||
import { RMABLogger } from '../utils/logger';
|
import { RMABLogger } from '../utils/logger';
|
||||||
import { parseRuntime } from '../utils/parse-runtime';
|
import { parseRuntime } from '../utils/parse-runtime';
|
||||||
import { randomDelay } from '../utils/scrape-resilience';
|
import { randomDelay } from '../utils/scrape-resilience';
|
||||||
|
import { extractAllNarrators } from '../utils/extract-narrator';
|
||||||
|
|
||||||
const logger = RMABLogger.create('Audible.Series');
|
const logger = RMABLogger.create('Audible.Series');
|
||||||
|
|
||||||
@@ -442,10 +443,8 @@ function parseSeriesBooks(
|
|||||||
const authorHref = authorLink.attr('href') || '';
|
const authorHref = authorLink.attr('href') || '';
|
||||||
const authorAsinMatch = authorHref.match(/\/author\/[^/]+\/([A-Z0-9]{10})/);
|
const authorAsinMatch = authorHref.match(/\/author\/[^/]+\/([A-Z0-9]{10})/);
|
||||||
|
|
||||||
// Narrator
|
// Narrator — capture all narrator links (multi-narrator productions are common)
|
||||||
const narratorText = $el.find('a[href*="searchNarrator="]').first().text().trim() ||
|
const narratorText = extractAllNarrators($, $el);
|
||||||
$el.find('.narratorLabel').text().trim() ||
|
|
||||||
'';
|
|
||||||
|
|
||||||
// Cover art
|
// Cover art
|
||||||
const coverArtUrl = $el.find('img').first().attr('src')?.replace(/\._.*_\./, '._SL500_.') || '';
|
const coverArtUrl = $el.find('img').first().attr('src')?.replace(/\._.*_\./, '._SL500_.') || '';
|
||||||
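The cover-art line in this hunk rewrites Audible's image size token so that every cached cover uses the same resolution. A minimal sketch of that substitution follows; the helper name `toLargeCover` and the sample URL are illustrative assumptions, and only the regex and the replacement string come from the diff.

```typescript
// Hypothetical helper name; the regex and replacement mirror the diff above.
function toLargeCover(url: string): string {
  // Image URLs carry a size token such as "._SL200_." before the file extension.
  // Swapping it for "._SL500_." requests a 500px rendition of the same image.
  return url.replace(/\._.*_\./, '._SL500_.');
}

// Illustrative URL, not taken from the source.
const sampleCover = toLargeCover('https://example.com/images/I/51abcDEF._SL200_.jpg');
console.log(sampleCover);
```

A URL without a size token passes through unchanged, which is why the parsers can apply the replacement unconditionally.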
|
|||||||
@@ -4,21 +4,26 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
import axios, { AxiosInstance } from 'axios';
|
import axios, { AxiosInstance } from 'axios';
|
||||||
|
import * as cheerio from 'cheerio';
|
||||||
import { RMABLogger } from '../utils/logger';
|
import { RMABLogger } from '../utils/logger';
|
||||||
import { getConfigService } from '../services/config.service';
|
import { getConfigService } from '../services/config.service';
|
||||||
import { AudibleRegion, AUDIBLE_REGIONS, DEFAULT_AUDIBLE_REGION } from '../types/audible';
|
import { AudibleRegion, AUDIBLE_REGIONS, DEFAULT_AUDIBLE_REGION } from '../types/audible';
|
||||||
import {
|
import {
|
||||||
getLanguageForRegion,
|
getLanguageForRegion,
|
||||||
isAcceptedLanguage,
|
isAcceptedLanguage,
|
||||||
|
stripPrefixes,
|
||||||
|
buildContainsSelector,
|
||||||
|
type LanguageConfig,
|
||||||
} from '../constants/language-config';
|
} from '../constants/language-config';
|
||||||
import {
|
import {
|
||||||
pickUserAgent,
|
pickUserAgent,
|
||||||
getBrowserHeaders,
|
getBrowserHeaders,
|
||||||
jitteredBackoff,
|
jitteredBackoff,
|
||||||
randomDelay,
|
|
||||||
AdaptivePacer,
|
AdaptivePacer,
|
||||||
FetchResultMeta,
|
FetchResultMeta,
|
||||||
} from '../utils/scrape-resilience';
|
} from '../utils/scrape-resilience';
|
||||||
|
import { parseRuntime as parseRuntimeUtil } from '../utils/parse-runtime';
|
||||||
|
import { extractAllNarrators } from '../utils/extract-narrator';
|
||||||
|
|
||||||
const logger = RMABLogger.create('Audible');
|
const logger = RMABLogger.create('Audible');
|
||||||
|
|
||||||
@@ -27,6 +32,13 @@ const AUDIBLE_PAGE_SIZE = 50;
|
|||||||
const CATALOG_RESPONSE_GROUPS =
|
const CATALOG_RESPONSE_GROUPS =
|
||||||
'contributors,product_desc,product_attrs,product_extended_attrs,media,rating,series,category_ladders,product_details';
|
'contributors,product_desc,product_attrs,product_extended_attrs,media,rating,series,category_ladders,product_details';
|
||||||
|
|
||||||
|
// Retry/backoff knobs for HTML scraping (nightly refresh job only).
|
||||||
|
// Healthy users still finish quickly — per-page success returns on attempt 0
|
||||||
|
// with a 2-4s inter-page delay. Struggling users grind through 503 storms
|
||||||
|
// patiently: up to ~12 retries per request, with each backoff capped at 3 min.
|
||||||
|
const HTML_MAX_RETRIES = 12;
|
||||||
|
const HTML_MAX_BACKOFF_MS = 180_000;
|
||||||
|
|
||||||
export interface AudibleAudiobook {
|
export interface AudibleAudiobook {
|
||||||
asin: string;
|
asin: string;
|
||||||
title: string;
|
title: string;
|
||||||
@@ -298,6 +310,7 @@ export class AudibleService {
|
|||||||
config: any = {},
|
config: any = {},
|
||||||
maxRetries: number = 5,
|
maxRetries: number = 5,
|
||||||
client: AxiosInstance = this.htmlClient,
|
client: AxiosInstance = this.htmlClient,
|
||||||
|
maxBackoffMs: number = Number.POSITIVE_INFINITY,
|
||||||
): Promise<{ data: any; meta: FetchResultMeta }> {
|
): Promise<{ data: any; meta: FetchResultMeta }> {
|
||||||
let lastError: Error | null = null;
|
let lastError: Error | null = null;
|
||||||
let retriesUsed = 0;
|
let retriesUsed = 0;
|
||||||
@@ -324,7 +337,7 @@ export class AudibleService {
|
|||||||
|
|
||||||
retriesUsed++;
|
retriesUsed++;
|
||||||
|
|
||||||
const backoffMs = jitteredBackoff(attempt);
|
const backoffMs = jitteredBackoff(attempt, 1000, maxBackoffMs);
|
||||||
logger.info(
|
logger.info(
|
||||||
` Request failed (${status || 'network error'}), retrying in ${backoffMs}ms (attempt ${attempt + 1}/${maxRetries})...`,
|
` Request failed (${status || 'network error'}), retrying in ${backoffMs}ms (attempt ${attempt + 1}/${maxRetries})...`,
|
||||||
);
|
);
|
||||||
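The change above passes a base delay and a cap into `jitteredBackoff`. The helper's body is not part of this hunk, so the following is only a sketch of one common way to build a capped, jittered exponential backoff. The name `cappedJitteredBackoff`, the "equal jitter" formula, and the injectable `random` parameter are all assumptions made for illustration.

```typescript
// Hypothetical stand-in for jitteredBackoff; the real helper's formula is not shown in this diff.
function cappedJitteredBackoff(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = Number.POSITIVE_INFINITY,
  random: () => number = Math.random, // injectable so the sketch can be tested deterministically
): number {
  // Exponential growth: base, 2x base, 4x base, ...
  const exponential = baseMs * 2 ** attempt;
  // Clamp before jittering so the result can never exceed maxMs.
  const ceiling = Math.min(exponential, maxMs);
  // "Equal jitter": half of the delay is fixed, half is random.
  return Math.floor(ceiling / 2 + random() * (ceiling / 2));
}

// With a 180_000 ms cap, attempt 11 stays below three minutes.
console.log(cappedJitteredBackoff(11, 1000, 180_000));
```

Without a cap, a twelfth retry at a 1000 ms base would wait over an hour under this formula, which is presumably why the nightly HTML path supplies `HTML_MAX_BACKOFF_MS` while other callers keep the unbounded default.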
@@ -379,6 +392,12 @@ export class AudibleService {
|
|||||||
throw lastError || new Error('External API request failed after retries');
|
throw lastError || new Error('External API request failed after retries');
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Popular audiobooks from Audible's curated /adblbestsellers HTML page.
|
||||||
|
* Uses HTML scraping (not the catalog API) because the API's BestSellers sort
|
||||||
|
* is a right-now velocity rank that surfaces launch-day shovelware and preorders;
|
||||||
|
* the HTML page reflects Audible's editorial curation.
|
||||||
|
*/
|
||||||
async getPopularAudiobooks(limit: number = 20): Promise<AudibleAudiobook[]> {
|
async getPopularAudiobooks(limit: number = 20): Promise<AudibleAudiobook[]> {
|
||||||
await this.initialize();
|
await this.initialize();
|
||||||
|
|
||||||
@@ -395,42 +414,36 @@ export class AudibleService {
|
|||||||
logger.info(` Fetching page ${page}/${maxPages}...`);
|
logger.info(` Fetching page ${page}/${maxPages}...`);
|
||||||
|
|
||||||
const { data: response, meta } = await this.fetchWithRetry(
|
const { data: response, meta } = await this.fetchWithRetry(
|
||||||
'/1.0/catalog/products',
|
'/adblbestsellers',
|
||||||
{
|
{
|
||||||
params: {
|
params: {
|
||||||
products_sort_by: 'BestSellers',
|
ipRedirectOverride: 'true',
|
||||||
num_results: AUDIBLE_PAGE_SIZE,
|
pageSize: AUDIBLE_PAGE_SIZE,
|
||||||
page: page - 1,
|
...(page > 1 ? { page } : {}),
|
||||||
response_groups: CATALOG_RESPONSE_GROUPS,
|
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
5,
|
HTML_MAX_RETRIES,
|
||||||
this.apiClient,
|
this.htmlClient,
|
||||||
|
HTML_MAX_BACKOFF_MS,
|
||||||
);
|
);
|
||||||
|
|
||||||
const envelope: CatalogProductsResponse = response.data;
|
const foundOnPage = this.parseProductListItems(
|
||||||
const products = envelope.products ?? [];
|
response.data,
|
||||||
const totalResults = envelope.total_results ?? 0;
|
audiobooks,
|
||||||
|
limit,
|
||||||
|
);
|
||||||
|
|
||||||
for (const product of products) {
|
logger.info(` Found ${foundOnPage} audiobooks on page ${page}`);
|
||||||
if (audiobooks.length >= limit) break;
|
|
||||||
if (audiobooks.some((b) => b.asin === product.asin)) continue;
|
if (foundOnPage < AUDIBLE_PAGE_SIZE / 2) {
|
||||||
audiobooks.push(mapCatalogProduct(product));
|
logger.info(` Reached end of available pages`);
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
logger.info(` Found ${products.length} audiobooks on page ${page}`);
|
|
||||||
|
|
||||||
const hasMore =
|
|
||||||
totalResults > 0
|
|
||||||
? totalResults > page * AUDIBLE_PAGE_SIZE
|
|
||||||
: products.length >= AUDIBLE_PAGE_SIZE;
|
|
||||||
|
|
||||||
if (!hasMore) break;
|
|
||||||
|
|
||||||
page++;
|
page++;
|
||||||
|
|
||||||
if (page <= maxPages && audiobooks.length < limit) {
|
if (page <= maxPages && audiobooks.length < limit) {
|
||||||
await this.delay(this.apiPageDelay(meta));
|
await this.delay(this.pacer.reportPageResult(meta));
|
||||||
}
|
}
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
logger.error(`Failed to fetch page ${page} of popular audiobooks`, {
|
logger.error(`Failed to fetch page ${page} of popular audiobooks`, {
|
||||||
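Because the HTML pages carry no `total_results` envelope, the new loop stops when a page yields fewer than half a page of new books. The sketch below simulates that stop rule over in-memory pages. The function name `collectUntilShortPage` and the use of plain ASIN strings are assumptions for illustration; the real code parses HTML with the parser methods added later in this commit.

```typescript
// Hypothetical simulation of the pagination loop; each inner array stands in for one parsed page.
function collectUntilShortPage(
  pages: string[][],
  limit: number,
  pageSize: number,
): { asins: string[]; pagesFetched: number } {
  const asins: string[] = [];
  let pagesFetched = 0;

  for (const page of pages) {
    if (asins.length >= limit) break;
    pagesFetched++;

    // Count only books that are new to the accumulator, as the parsers do.
    let foundOnPage = 0;
    for (const asin of page) {
      if (asins.length >= limit) break;
      if (asins.includes(asin)) continue;
      asins.push(asin);
      foundOnPage++;
    }

    // A page that is less than half full is treated as the end of the listing.
    if (foundOnPage < pageSize / 2) break;
  }

  return { asins, pagesFetched };
}

const demo = collectUntilShortPage([['A', 'B', 'C', 'D'], ['D', 'E'], ['F', 'G', 'H', 'I']], 100, 4);
console.log(demo.asins.join(','), demo.pagesFetched);
```

One consequence of this design, visible in the sketch, is that a page consisting mostly of already-seen books also ends the loop, since duplicates do not count toward `foundOnPage`.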
@@ -445,6 +458,11 @@ export class AudibleService {
|
|||||||
return audiobooks;
|
return audiobooks;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* New release audiobooks from Audible's curated /newreleases HTML page.
|
||||||
|
* Uses HTML scraping (not the catalog API) because the API's -ReleaseDate sort
|
||||||
|
* returns 100% future preorders with no released-only filter available.
|
||||||
|
*/
|
||||||
async getNewReleases(limit: number = 20): Promise<AudibleAudiobook[]> {
|
async getNewReleases(limit: number = 20): Promise<AudibleAudiobook[]> {
|
||||||
await this.initialize();
|
await this.initialize();
|
||||||
|
|
||||||
@@ -461,42 +479,36 @@ export class AudibleService {
|
|||||||
logger.info(` Fetching page ${page}/${maxPages}...`);
|
logger.info(` Fetching page ${page}/${maxPages}...`);
|
||||||
|
|
||||||
const { data: response, meta } = await this.fetchWithRetry(
|
const { data: response, meta } = await this.fetchWithRetry(
|
||||||
'/1.0/catalog/products',
|
'/newreleases',
|
||||||
{
|
{
|
||||||
params: {
|
params: {
|
||||||
products_sort_by: '-ReleaseDate',
|
ipRedirectOverride: 'true',
|
||||||
num_results: AUDIBLE_PAGE_SIZE,
|
pageSize: AUDIBLE_PAGE_SIZE,
|
||||||
page: page - 1,
|
...(page > 1 ? { page } : {}),
|
||||||
response_groups: CATALOG_RESPONSE_GROUPS,
|
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
5,
|
HTML_MAX_RETRIES,
|
||||||
this.apiClient,
|
this.htmlClient,
|
||||||
|
HTML_MAX_BACKOFF_MS,
|
||||||
);
|
);
|
||||||
|
|
||||||
const envelope: CatalogProductsResponse = response.data;
|
const foundOnPage = this.parseProductListItems(
|
||||||
const products = envelope.products ?? [];
|
response.data,
|
||||||
const totalResults = envelope.total_results ?? 0;
|
audiobooks,
|
||||||
|
limit,
|
||||||
|
);
|
||||||
|
|
||||||
for (const product of products) {
|
logger.info(` Found ${foundOnPage} audiobooks on page ${page}`);
|
||||||
if (audiobooks.length >= limit) break;
|
|
||||||
if (audiobooks.some((b) => b.asin === product.asin)) continue;
|
if (foundOnPage < AUDIBLE_PAGE_SIZE / 2) {
|
||||||
audiobooks.push(mapCatalogProduct(product));
|
logger.info(` Reached end of available pages`);
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
logger.info(` Found ${products.length} audiobooks on page ${page}`);
|
|
||||||
|
|
||||||
const hasMore =
|
|
||||||
totalResults > 0
|
|
||||||
? totalResults > page * AUDIBLE_PAGE_SIZE
|
|
||||||
: products.length >= AUDIBLE_PAGE_SIZE;
|
|
||||||
|
|
||||||
if (!hasMore) break;
|
|
||||||
|
|
||||||
page++;
|
page++;
|
||||||
|
|
||||||
if (page <= maxPages && audiobooks.length < limit) {
|
if (page <= maxPages && audiobooks.length < limit) {
|
||||||
await this.delay(this.apiPageDelay(meta));
|
await this.delay(this.pacer.reportPageResult(meta));
|
||||||
}
|
}
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
logger.error(`Failed to fetch page ${page} of new releases`, {
|
logger.error(`Failed to fetch page ${page} of new releases`, {
|
||||||
@@ -791,6 +803,11 @@ export class AudibleService {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Category audiobooks from Audible's HTML /search?node=<categoryId> page,
|
||||||
|
* sorted by popularity-rank. Uses HTML scraping (not the catalog API) so
|
||||||
|
* results match Audible's curated category-storefront ordering.
|
||||||
|
*/
|
||||||
async getCategoryBooks(categoryId: string, limit: number = 200): Promise<AudibleAudiobook[]> {
|
async getCategoryBooks(categoryId: string, limit: number = 200): Promise<AudibleAudiobook[]> {
|
||||||
await this.initialize();
|
await this.initialize();
|
||||||
|
|
||||||
@@ -805,43 +822,35 @@ export class AudibleService {
|
|||||||
while (audiobooks.length < limit && page <= maxPages) {
|
while (audiobooks.length < limit && page <= maxPages) {
|
||||||
try {
|
try {
|
||||||
const { data: response, meta } = await this.fetchWithRetry(
|
const { data: response, meta } = await this.fetchWithRetry(
|
||||||
'/1.0/catalog/products',
|
'/search',
|
||||||
{
|
{
|
||||||
params: {
|
params: {
|
||||||
category_id: categoryId,
|
ipRedirectOverride: 'true',
|
||||||
products_sort_by: 'BestSellers',
|
node: categoryId,
|
||||||
num_results: AUDIBLE_PAGE_SIZE,
|
pageSize: AUDIBLE_PAGE_SIZE,
|
||||||
page: page - 1,
|
sort: 'popularity-rank',
|
||||||
response_groups: CATALOG_RESPONSE_GROUPS,
|
...(page > 1 ? { page } : {}),
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
5,
|
HTML_MAX_RETRIES,
|
||||||
this.apiClient,
|
this.htmlClient,
|
||||||
|
HTML_MAX_BACKOFF_MS,
|
||||||
);
|
);
|
||||||
|
|
||||||
const envelope: CatalogProductsResponse = response.data;
|
const foundOnPage = this.parseSearchResultItems(
|
||||||
const products = envelope.products ?? [];
|
response.data,
|
||||||
const totalResults = envelope.total_results ?? 0;
|
audiobooks,
|
||||||
|
limit,
|
||||||
|
);
|
||||||
|
|
||||||
for (const product of products) {
|
logger.info(`Category ${categoryId}: found ${foundOnPage} books on page ${page}`);
|
||||||
if (audiobooks.length >= limit) break;
|
|
||||||
if (audiobooks.some((b) => b.asin === product.asin)) continue;
|
|
||||||
audiobooks.push(mapCatalogProduct(product));
|
|
||||||
}
|
|
||||||
|
|
||||||
logger.info(`Category ${categoryId}: found ${products.length} books on page ${page}`);
|
if (foundOnPage < AUDIBLE_PAGE_SIZE / 2) break;
|
||||||
|
|
||||||
const hasMore =
|
|
||||||
totalResults > 0
|
|
||||||
? totalResults > page * AUDIBLE_PAGE_SIZE
|
|
||||||
: products.length >= AUDIBLE_PAGE_SIZE;
|
|
||||||
|
|
||||||
if (!hasMore) break;
|
|
||||||
|
|
||||||
page++;
|
page++;
|
||||||
|
|
||||||
if (page <= maxPages && audiobooks.length < limit) {
|
if (page <= maxPages && audiobooks.length < limit) {
|
||||||
await this.delay(this.apiPageDelay(meta));
|
await this.delay(this.pacer.reportPageResult(meta));
|
||||||
}
|
}
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
logger.error(`Failed to fetch category ${categoryId} page ${page}`, {
|
logger.error(`Failed to fetch category ${categoryId} page ${page}`, {
|
||||||
@@ -858,12 +867,148 @@ export class AudibleService {
|
|||||||
return audiobooks;
|
return audiobooks;
|
||||||
}
|
}
|
||||||
|
|
||||||
private apiPageDelay(meta: FetchResultMeta): number {
|
private getLangConfig(): LanguageConfig {
|
||||||
if (meta.retriesUsed > 0) {
|
return getLanguageForRegion(this.region);
|
||||||
return this.pacer.reportPageResult(meta);
|
|
||||||
}
|
}
|
||||||
this.pacer.reportPageResult(meta);
|
|
||||||
return randomDelay(500, 1500);
|
private parseRuntime(runtimeText: string): number | undefined {
|
||||||
|
return parseRuntimeUtil(runtimeText, this.getLangConfig());
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Parse the `.productListItem` blocks used by /adblbestsellers and /newreleases.
|
||||||
|
* Pushes matched books into `audiobooks` (skipping duplicates and respecting `limit`)
|
||||||
|
* and returns the count parsed from this page.
|
||||||
|
*/
|
||||||
|
private parseProductListItems(
|
||||||
|
html: string,
|
||||||
|
audiobooks: AudibleAudiobook[],
|
||||||
|
limit: number,
|
||||||
|
): number {
|
||||||
|
const $ = cheerio.load(html);
|
||||||
|
const langConfig = this.getLangConfig();
|
||||||
|
let foundOnPage = 0;
|
||||||
|
|
||||||
|
$('.productListItem').each((_index, element) => {
|
||||||
|
if (audiobooks.length >= limit) return false;
|
||||||
|
|
||||||
|
const $el = $(element);
|
||||||
|
|
||||||
|
const asin =
|
||||||
|
$el.find('li').attr('data-asin') ||
|
||||||
|
$el.find('a').attr('href')?.match(/\/(?:pd|ac)\/[^\/]+\/([A-Z0-9]{10})/)?.[1] ||
|
||||||
|
'';
|
||||||
|
if (!asin) return;
|
||||||
|
if (audiobooks.some((book) => book.asin === asin)) return;
|
||||||
|
|
||||||
|
const title =
|
||||||
|
$el.find('h3 a').text().trim() ||
|
||||||
|
$el.find('.bc-heading a').text().trim();
|
||||||
|
|
||||||
|
const authorText =
|
||||||
|
$el.find('.authorLabel').text().trim() ||
|
||||||
|
$el.find('.bc-size-small .bc-text-bold').first().text().trim();
|
||||||
|
|
||||||
|
const authorHref = $el.find('a[href*="/author/"]').first().attr('href') || '';
|
||||||
|
const authorAsinMatch = authorHref.match(/\/author\/[^\/]+\/([A-Z0-9]{10})/);
|
||||||
|
|
||||||
|
// Narrator — capture all narrator links (multi-narrator productions are common);
|
||||||
|
// fall back to .narratorLabel text, then to the bc-text-bold sibling for layouts
|
||||||
|
// that omit both anchor links and the .narratorLabel span.
|
||||||
|
const narratorText =
|
||||||
|
extractAllNarrators($, $el) ||
|
||||||
|
$el.find('.bc-size-small .bc-text-bold').eq(1).text().trim();
|
||||||
|
|
||||||
|
const coverArtUrl = $el.find('img').attr('src') || '';
|
||||||
|
|
||||||
|
const ratingText = $el.find('.ratingsLabel').text().trim();
|
||||||
|
const rating = ratingText ? parseFloat(ratingText.split(' ')[0]) : undefined;
|
||||||
|
|
||||||
|
audiobooks.push({
|
||||||
|
asin,
|
||||||
|
title,
|
||||||
|
author: stripPrefixes(authorText, langConfig.scraping.authorPrefixes),
|
||||||
|
authorAsin: authorAsinMatch?.[1] || undefined,
|
||||||
|
narrator: stripPrefixes(narratorText, langConfig.scraping.narratorPrefixes),
|
||||||
|
coverArtUrl: coverArtUrl.replace(/\._.*_\./, '._SL500_.'),
|
||||||
|
rating,
|
||||||
|
});
|
||||||
|
|
||||||
|
foundOnPage++;
|
||||||
|
});
|
||||||
|
|
||||||
|
return foundOnPage;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Parse the `.s-result-item` / `.productListItem` blocks used by
|
||||||
|
* /search?node=<categoryId>. Pushes matched books into `audiobooks`
|
||||||
|
* (skipping duplicates and respecting `limit`) and returns the count parsed
|
||||||
|
* from this page.
|
||||||
|
*/
|
||||||
|
private parseSearchResultItems(
|
||||||
|
html: string,
|
||||||
|
audiobooks: AudibleAudiobook[],
|
||||||
|
limit: number,
|
||||||
|
): number {
|
||||||
|
const $ = cheerio.load(html);
|
||||||
|
const langConfig = this.getLangConfig();
|
||||||
|
let foundOnPage = 0;
|
||||||
|
|
||||||
|
$('.s-result-item, .productListItem').each((_index, element) => {
|
||||||
|
if (audiobooks.length >= limit) return false;
|
||||||
|
|
||||||
|
const $el = $(element);
|
||||||
|
|
||||||
|
const asin =
|
||||||
|
$el.find('li').attr('data-asin') ||
|
||||||
|
$el.find('a').attr('href')?.match(/\/(?:pd|ac)\/[^\/]+\/([A-Z0-9]{10})/)?.[1] ||
|
||||||
|
'';
|
||||||
|
if (!asin) return;
|
||||||
|
if (audiobooks.some((b) => b.asin === asin)) return;
|
||||||
|
|
||||||
|
const title =
|
||||||
|
$el.find('h2').first().text().trim() ||
|
||||||
|
$el.find('h3 a').text().trim() ||
|
||||||
|
$el.find('.bc-heading a').text().trim();
|
||||||
|
|
||||||
|
const authorLink = $el.find('a[href*="/author/"]').first();
|
||||||
|
const authorText =
|
||||||
|
authorLink.text().trim() ||
|
||||||
|
$el.find('.authorLabel').text().trim();
|
||||||
|
const authorHref = authorLink.attr('href') || '';
|
||||||
|
const authorAsinMatch = authorHref.match(/\/author\/[^\/]+\/([A-Z0-9]{10})/);
|
||||||
|
|
||||||
|
// Narrator — capture all narrator links (multi-narrator productions are common)
|
||||||
|
const narratorText = extractAllNarrators($, $el);
|
||||||
|
|
||||||
|
const coverArtUrl = $el.find('img').attr('src') || '';
|
||||||
|
|
||||||
|
const runtimeText =
|
||||||
|
$el.find('.runtimeLabel').text().trim() ||
|
||||||
|
$el.find(buildContainsSelector('span', langConfig.scraping.lengthLabels)).text().trim();
|
||||||
|
const durationMinutes = this.parseRuntime(runtimeText);
|
||||||
|
|
||||||
|
const ratingText =
|
||||||
|
$el.find('.ratingsLabel').text().trim() ||
|
||||||
|
$el.find('.a-icon-star span').first().text().trim();
|
||||||
|
const rating = ratingText ? parseFloat(ratingText.split(' ')[0]) : undefined;
|
||||||
|
|
||||||
|
audiobooks.push({
|
||||||
|
asin,
|
||||||
|
title,
|
||||||
|
author: stripPrefixes(authorText, langConfig.scraping.authorPrefixes),
|
||||||
|
authorAsin: authorAsinMatch?.[1] || undefined,
|
||||||
|
narrator: stripPrefixes(narratorText, langConfig.scraping.narratorPrefixes),
|
||||||
|
coverArtUrl: coverArtUrl.replace(/\._.*_\./, '._SL500_.'),
|
||||||
|
durationMinutes,
|
||||||
|
rating,
|
||||||
|
});
|
||||||
|
|
||||||
|
foundOnPage++;
|
||||||
|
});
|
||||||
|
|
||||||
|
return foundOnPage;
|
||||||
}
|
}
|
||||||
|
|
||||||
private async delay(ms: number): Promise<void> {
|
private async delay(ms: number): Promise<void> {
|
||||||
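Both parsers fall back to pulling the ASIN out of the product link when no `data-asin` attribute is present. A small sketch of that fallback, assuming a hypothetical helper named `asinFromHref` (the pattern is the one used in the diff; the sample paths are invented):

```typescript
// Same pattern as the parsers above: /pd/<slug>/<ASIN> or /ac/<slug>/<ASIN>.
const ASIN_IN_HREF = /\/(?:pd|ac)\/[^/]+\/([A-Z0-9]{10})/;

// Hypothetical helper; returns an empty string when no ASIN can be found,
// matching the `|| ''` fallback in the parsers.
function asinFromHref(href: string | undefined): string {
  return href?.match(ASIN_IN_HREF)?.[1] ?? '';
}

// Illustrative paths, not taken from the source.
console.log(asinFromHref('/pd/Some-Title-Audiobook/B08G9PRS1K?ref=x'));
console.log(asinFromHref('/author/Some-Name/B000APZGGS'));
```

Author links share the ten-character ID shape but use an `/author/` prefix, so they do not match the product pattern and are handled by the separate author regex.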
|
|||||||
@@ -138,16 +138,37 @@ async function persistSectionBooks(
|
|||||||
logger: ReturnType<typeof RMABLogger.forJob>,
|
logger: ReturnType<typeof RMABLogger.forJob>,
|
||||||
labelForErrors: string,
|
labelForErrors: string,
|
||||||
): Promise<number> {
|
): Promise<number> {
|
||||||
|
// Defensive dedup: the (asin, categoryId) unique constraint means a duplicate ASIN
|
||||||
|
// in `books` crashes the second .create() with P2002. The HTML parser already dedupes
|
||||||
|
// per page and across pages against the cumulative accumulator, but a warn-on-fire
|
||||||
|
// signal here lets us detect upstream surprises (e.g. Audible serving the same item
|
||||||
|
// in both a carousel and the main grid) without the noisy duplicate-key Postgres
|
||||||
|
// errors. Keep the first occurrence so Audible's editorial ordering is preserved.
|
||||||
|
const seenAsins = new Set<string>();
|
||||||
|
const dedupedBooks = books.filter((b) => {
|
||||||
|
if (!b?.asin || seenAsins.has(b.asin)) return false;
|
||||||
|
seenAsins.add(b.asin);
|
||||||
|
return true;
|
||||||
|
});
|
||||||
|
const droppedCount = books.length - dedupedBooks.length;
|
||||||
|
if (droppedCount > 0) {
|
||||||
|
logger.warn(
|
||||||
|
`Dropped ${droppedCount} duplicate or ASIN-less book(s) from ${categoryId} input list before persist`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
// Wipe previous entries for this section
|
// Wipe previous entries for this section
|
||||||
logger.info(`Clearing previous data for ${categoryId}...`);
|
logger.info(`Clearing previous data for ${categoryId}...`);
|
||||||
await prisma.audibleCacheCategory.deleteMany({
|
await prisma.audibleCacheCategory.deleteMany({
|
||||||
where: { categoryId },
|
where: { categoryId },
|
||||||
});
|
});
|
||||||
logger.info(`Cleared previous entries for ${categoryId}, saving ${books.length} books...`);
|
logger.info(
|
||||||
|
`Cleared previous entries for ${categoryId}, saving ${dedupedBooks.length} books...`,
|
||||||
|
);
|
||||||
|
|
||||||
let saved = 0;
|
let saved = 0;
|
||||||
for (let i = 0; i < books.length; i++) {
|
for (let i = 0; i < dedupedBooks.length; i++) {
|
||||||
const book = books[i];
|
const book = dedupedBooks[i];
|
||||||
try {
|
try {
|
||||||
// Cache thumbnail if coverArtUrl exists
|
// Cache thumbnail if coverArtUrl exists
|
||||||
let cachedCoverPath: string | null = null;
|
let cachedCoverPath: string | null = null;
|
||||||
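The defensive dedup in this hunk is a first-occurrence filter keyed on ASIN. The sketch below isolates it as a standalone function. The name `dedupeByAsin` and the `SectionBook` shape are assumptions for illustration; the filter body follows the diff.

```typescript
// Minimal illustrative shape; the real book type carries many more fields.
interface SectionBook {
  asin?: string;
  title: string;
}

// Hypothetical helper: keeps the first book seen for each ASIN and reports how many were dropped.
function dedupeByAsin<T extends { asin?: string }>(books: T[]): { kept: T[]; dropped: number } {
  const seen = new Set<string>();
  const kept = books.filter((b) => {
    // Entries without an ASIN are dropped too, since they cannot satisfy the unique key.
    if (!b?.asin || seen.has(b.asin)) return false;
    seen.add(b.asin);
    return true;
  });
  return { kept, dropped: books.length - kept.length };
}

const sampleBooks: SectionBook[] = [
  { asin: 'A', title: 'first' },
  { asin: 'B', title: 'other' },
  { asin: 'A', title: 'second' },
  { title: 'no asin' },
];
const { kept, dropped } = dedupeByAsin(sampleBooks);
console.log(kept.map((b) => b.title).join(','), dropped);
```

Keeping the first occurrence preserves the storefront ordering. Note that the dropped count includes entries with no ASIN as well as true duplicates.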
|
|||||||
@@ -9,7 +9,8 @@
|
|||||||
|
|
||||||
import { prisma } from '@/lib/db';
|
import { prisma } from '@/lib/db';
|
||||||
import { RMABLogger } from '@/lib/utils/logger';
|
import { RMABLogger } from '@/lib/utils/logger';
|
||||||
import type { DedupGroup } from '@/lib/utils/deduplicate-audiobooks';
|
import { metadataScore, type DedupGroup } from '@/lib/utils/deduplicate-audiobooks';
|
||||||
|
import type { AudibleAudiobook } from '@/lib/integrations/audible.service';
|
||||||
|
|
||||||
const logger = RMABLogger.create('WorksService');
|
const logger = RMABLogger.create('WorksService');
|
||||||
|
|
||||||
@@ -182,6 +183,96 @@ export async function seedAsin(
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// View-level collapse (consult the works table after local dedup)
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Collapse books that already share a Work record according to the works table.
|
||||||
|
*
|
||||||
|
* The local `deduplicateAndCollectGroups()` pass is title/narrator/duration-based
|
||||||
|
* and stateless — it can fail to merge ASINs whose source metadata diverges (e.g.
|
||||||
|
* a series-page scrape captures different "first narrators" for two ASINs of the
|
||||||
|
* same recording, or two paginated pages each contain one ASIN and never compare
|
||||||
|
* them). The works table is the durable source of truth for "same book" identity,
|
||||||
|
* populated by every prior dedup pass and by request-time seeding. This pass
|
||||||
|
* applies that knowledge to the current view.
|
||||||
|
*
|
||||||
|
* Behavior:
|
||||||
|
* - Books whose ASINs map to a shared workId collapse to a single representative
|
||||||
|
* chosen by `metadataScore()` (same ranking as local dedup).
|
||||||
|
* - Books not present in any work, or in single-ASIN works, pass through untouched.
|
||||||
|
* - Original ordering is preserved (the kept representative sits at the position
|
||||||
|
* of the first occurrence of its work in the input list).
|
||||||
|
* - DB failure is non-fatal: the input list is returned unchanged so the view
|
||||||
|
* still renders (degrades to local-dedup-only behavior).
|
||||||
|
*/
|
||||||
|
export async function collapseByExistingWorks(
|
||||||
|
books: AudibleAudiobook[],
|
||||||
|
): Promise<AudibleAudiobook[]> {
|
||||||
|
if (books.length <= 1) return books;
|
||||||
|
|
||||||
|
try {
|
||||||
|
const asins = books.map(b => b.asin);
|
||||||
|
const entries = await prisma.workAsin.findMany({
|
||||||
|
where: { asin: { in: asins } },
|
||||||
|
select: { asin: true, workId: true },
|
||||||
|
});
|
||||||
|
|
||||||
|
if (entries.length === 0) return books;
|
||||||
|
|
||||||
|
// Map ASIN → workId for fast lookup in the loop below
|
||||||
|
const asinToWorkId = new Map<string, string>();
|
||||||
|
for (const entry of entries) {
|
||||||
|
asinToWorkId.set(entry.asin, entry.workId);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Walk the input once, preserving position. For each work seen, keep a
|
||||||
|
// running "best" book; for books not in any work, emit immediately.
|
||||||
|
const result: AudibleAudiobook[] = [];
|
||||||
|
const workIdToResultIndex = new Map<string, number>();
|
||||||
|
|
||||||
|
for (const book of books) {
|
||||||
|
const workId = asinToWorkId.get(book.asin);
|
||||||
|
if (!workId) {
|
||||||
|
result.push(book);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
const existingIndex = workIdToResultIndex.get(workId);
|
||||||
|
if (existingIndex === undefined) {
|
||||||
|
workIdToResultIndex.set(workId, result.length);
|
||||||
|
result.push(book);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// A sibling from this work is already in the result. Keep whichever
|
||||||
|
// has the richer metadata; on tie, keep the earlier entry (already there).
|
||||||
|
const existing = result[existingIndex];
|
||||||
|
if (metadataScore(book) > metadataScore(existing)) {
|
||||||
|
result[existingIndex] = book;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const collapsed = books.length - result.length;
|
||||||
|
if (collapsed > 0) {
|
||||||
|
logger.debug('Collapsed books via works table', {
|
||||||
|
inputCount: books.length,
|
||||||
|
outputCount: result.length,
|
||||||
|
collapsed,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
return result;
|
||||||
|
} catch (error) {
|
||||||
|
logger.error('collapseByExistingWorks failed; returning input unchanged', {
|
||||||
|
error: error instanceof Error ? error.message : String(error),
|
||||||
|
bookCount: books.length,
|
||||||
|
});
|
||||||
|
return books;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
// ---------------------------------------------------------------------------
|
||||||
// Sibling ASIN lookup (for library matching expansion)
|
// Sibling ASIN lookup (for library matching expansion)
|
||||||
// ---------------------------------------------------------------------------
|
// ---------------------------------------------------------------------------
|
||||||
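The core of `collapseByExistingWorks` is a single ordered pass that keeps one representative per work. The sketch below reproduces that pass in memory, replacing the `prisma.workAsin.findMany` lookup with a plain `Map`. The names `WorkBook`, `simpleScore`, and `collapseByWork` are assumptions, and `simpleScore` counts only the two fields visible in this diff; the real `metadataScore()` may weigh more.

```typescript
// Minimal illustrative shape, not the full AudibleAudiobook type.
interface WorkBook {
  asin: string;
  title: string;
  coverArtUrl?: string;
  rating?: number;
}

// Simplified stand-in for metadataScore(): one point per populated field.
function simpleScore(book: WorkBook): number {
  let score = 0;
  if (book.coverArtUrl) score++;
  if (book.rating != null) score++;
  return score;
}

// Collapse books that share a workId, keeping the richest entry at the
// position where that work first appeared.
function collapseByWork(books: WorkBook[], asinToWorkId: Map<string, string>): WorkBook[] {
  const result: WorkBook[] = [];
  const workIdToIndex = new Map<string, number>();

  for (const book of books) {
    const workId = asinToWorkId.get(book.asin);
    if (!workId) {
      result.push(book); // not in any known work: pass through untouched
      continue;
    }
    const index = workIdToIndex.get(workId);
    if (index === undefined) {
      workIdToIndex.set(workId, result.length);
      result.push(book);
      continue;
    }
    // Strictly greater, so ties keep the earlier entry.
    const existing = result[index] as WorkBook;
    if (simpleScore(book) > simpleScore(existing)) result[index] = book;
  }
  return result;
}

const collapsedDemo = collapseByWork(
  [
    { asin: 'A1', title: 'Dune' },
    { asin: 'X9', title: 'Other' },
    { asin: 'A2', title: 'Dune', coverArtUrl: 'c.jpg', rating: 4.5 },
  ],
  new Map([['A1', 'w1'], ['A2', 'w1']]),
);
console.log(collapsedDemo.map((b) => b.asin).join(','));
```

In the example, `A2` replaces `A1` but stays in the first slot, so the ordering of the view does not shift when a richer duplicate shows up later in the list.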
|
|||||||
@@ -109,7 +109,12 @@ export function areDurationsCompatible(a?: number, b?: number): boolean {
 // Metadata scoring (for picking best representative)
 // ---------------------------------------------------------------------------
 
-function metadataScore(book: AudibleAudiobook): number {
+/**
+ * Score a book by how much metadata it carries. Used as the tie-breaker when
+ * collapsing duplicates — the entry with the richest metadata wins. Exported
+ * so the works-table collapse pass can apply the same ranking.
+ */
+export function metadataScore(book: AudibleAudiobook): number {
   let score = 0;
   if (book.coverArtUrl) score++;
   if (book.rating != null) score++;
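The tie-break described in the `metadataScore` doc comment can be sketched as below. This is a minimal illustration, not the repository's code: the `Book` shape, the simplified `score` function (it only counts the two fields visible in this hunk) and `pickRepresentative` are hypothetical names introduced here.

```typescript
// Hypothetical, trimmed-down book shape; the real AudibleAudiobook has more fields.
interface Book {
  asin: string;
  coverArtUrl?: string;
  rating?: number | null;
}

// Simplified stand-in for metadataScore: one point per populated field.
function score(book: Book): number {
  let s = 0;
  if (book.coverArtUrl) s++;
  if (book.rating != null) s++;
  return s;
}

// Keep the first entry unless a later duplicate carries strictly more metadata.
function pickRepresentative(duplicates: Book[]): Book | undefined {
  let best: Book | undefined;
  for (const book of duplicates) {
    if (best === undefined || score(book) > score(best)) best = book;
  }
  return best;
}

const winner = pickRepresentative([
  { asin: 'B000000001' },
  { asin: 'B000000002', coverArtUrl: 'https://images.example.com/cover.jpg', rating: 4.5 },
  { asin: 'B000000003', rating: 4.0 },
]);

// On a tie the earlier entry is kept, because the comparison is strict.
const tie = pickRepresentative([
  { asin: 'B00000000A', rating: 3.0 },
  { asin: 'B00000000B', rating: 5.0 },
]);
```

Using a strict `>` mirrors the comparison in the collapse pass, where an existing entry is replaced only when the incoming book scores higher.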
@@ -0,0 +1,37 @@
+/**
+ * Component: Narrator Extraction Utility
+ * Documentation: documentation/integrations/audible.md
+ *
+ * Shared helper for Audible HTML scrapers. Audible product listings render
+ * each narrator as a separate `<a href="?searchNarrator=...">` link; using
+ * `.first()` on that selector silently drops co-narrators and breaks dedup
+ * for multi-narrator productions (e.g. full-cast audiobooks). This helper
+ * captures every narrator link and joins them, falling back to the
+ * `.narratorLabel` span when no anchor links are present.
+ */
+
+import type * as cheerio from 'cheerio';
+import type { AnyNode } from 'domhandler';
+
+/**
+ * Extract a comma-joined narrator string from an Audible product list item.
+ *
+ * Order is not semantically significant — downstream `normalizeNarrator()`
+ * sorts before comparison — but document-order preserves a stable, legible
+ * value for caching and logging.
+ */
+export function extractAllNarrators(
+  $: cheerio.CheerioAPI,
+  $el: cheerio.Cheerio<AnyNode>,
+): string {
+  const links = $el.find('a[href*="searchNarrator="]');
+  if (links.length > 0) {
+    const names: string[] = [];
+    links.each((_, link) => {
+      const name = $(link).text().trim();
+      if (name) names.push(name);
+    });
+    if (names.length > 0) return names.join(', ');
+  }
+  return $el.find('.narratorLabel').text().trim();
+}
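The doc comment says narrator order does not matter because `normalizeNarrator()` sorts before comparison. That function is not part of this diff, so the sketch below uses a hypothetical `normalizeNarratorSketch` (split on commas, trim, lowercase, sort) purely to show why a truncated narrator list would stop matching the full cast while a reordered one still matches.

```typescript
// Hypothetical stand-in for normalizeNarrator(); the real implementation is not shown in this diff.
function normalizeNarratorSketch(narrator: string): string {
  return narrator
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0)
    .sort()
    .join(', ');
}

const documentOrder = 'Kristin Atherton, Roy McMillan, Clare Corbett';
const otherOrder = 'Clare Corbett, Kristin Atherton, Roy McMillan';
const firstOnly = 'Kristin Atherton'; // what a `.first()`-style scrape would capture

// Same cast in a different order normalizes to the same key.
const sameCast =
  normalizeNarratorSketch(documentOrder) === normalizeNarratorSketch(otherOrder);

// A truncated cast normalizes to a different key, so the two entries would not be deduplicated.
const truncatedMatches =
  normalizeNarratorSketch(firstOnly) === normalizeNarratorSketch(documentOrder);
```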
@@ -38,12 +38,18 @@ export function getBrowserHeaders(userAgent: string): Record<string, string> {
 }
 
 /**
- * Jittered exponential backoff: 2^attempt * baseMs * random(0.5, 1.5)
+ * Jittered exponential backoff: 2^attempt * baseMs * random(0.5, 1.5),
+ * optionally capped so high attempt counts don't produce absurd waits.
  * Avoids predictable retry timing that is trivially fingerprinted.
  */
-export function jitteredBackoff(attempt: number, baseMs: number = 1000): number {
+export function jitteredBackoff(
+  attempt: number,
+  baseMs: number = 1000,
+  maxBackoffMs: number = Number.POSITIVE_INFINITY,
+): number {
   const jitter = 0.5 + Math.random(); // 0.5 – 1.5
-  return Math.round(Math.pow(2, attempt) * baseMs * jitter);
+  const raw = Math.pow(2, attempt) * baseMs * jitter;
+  return Math.round(Math.min(raw, maxBackoffMs));
 }
 
 /** Random integer in [minMs, maxMs] */
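To see what the cap changes in practice, the sketch below repeats the `jitteredBackoff` formula with the random source injected so the numbers are reproducible. `cappedBackoff`, the `rand` parameter and the 30000 ms cap are assumptions made for illustration; the diff does not show which cap the callers pass.

```typescript
// Same formula as jitteredBackoff, with an injectable random source (hypothetical helper).
function cappedBackoff(
  attempt: number,
  baseMs: number = 1000,
  maxBackoffMs: number = Number.POSITIVE_INFINITY,
  rand: () => number = Math.random,
): number {
  const jitter = 0.5 + rand(); // in [0.5, 1.5) because rand() is in [0, 1)
  const raw = Math.pow(2, attempt) * baseMs * jitter;
  return Math.round(Math.min(raw, maxBackoffMs));
}

const midpoint = () => 0.5; // fixes the jitter factor at exactly 1.0

// Without a cap, attempt 10 waits 2^10 * 1000 ms, roughly 17 minutes.
const uncapped = cappedBackoff(10, 1000, Number.POSITIVE_INFINITY, midpoint);

// With a cap, the same attempt is clamped to the ceiling.
const capped = cappedBackoff(10, 1000, 30000, midpoint);

// Early attempts stay below the cap and are unaffected by it.
const early = cappedBackoff(2, 1000, 30000, midpoint);
```

Because the clamp is applied after the jitter, every attempt past the ceiling returns exactly `maxBackoffMs`; callers that want jitter at the ceiling would need to add it separately.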
@@ -81,6 +81,122 @@ function apiResponse(envelope: object) {
   return { data: envelope };
 }
 
+// ---------------------------------------------------------------------------
+// HTML fixture helpers (for getPopularAudiobooks / getNewReleases / getCategoryBooks,
+// which scrape Audible's curated HTML pages)
+// ---------------------------------------------------------------------------
+
+interface HtmlBookOverrides {
+  asin?: string;
+  title?: string;
+  author?: string;
+  authorAsin?: string;
+  /** Single-narrator shorthand; mutually exclusive with `narrators`. */
+  narrator?: string;
+  /** Multi-narrator productions render each name as its own searchNarrator anchor. */
+  narrators?: string[];
+  coverArtUrl?: string;
+  rating?: number;
+}
+
+/** Render one or more narrator anchor links suitable for embedding in .narratorLabel. */
+function renderNarratorLinks(names: string[]): string {
+  return names
+    .map(
+      (name) =>
+        `<a href="/search?searchNarrator=${encodeURIComponent(name)}">${name}</a>`,
+    )
+    .join(', ');
+}
+
+/**
+ * Produces a single .productListItem block matching the selectors parsed by
+ * parseProductListItems(). The parser looks for an `<li data-asin>` descendant,
+ * with an `<a href="/pd/...">` fallback — using a real `<li>` here both
+ * exercises the primary path and keeps the markup well-formed.
+ */
+function makeProductListItemHtml(overrides: HtmlBookOverrides = {}): string {
+  const {
+    asin = 'B000000001',
+    title = 'Test Book',
+    author = 'Test Author',
+    authorAsin = 'A000000001',
+    narrator = 'Test Narrator',
+    narrators,
+    coverArtUrl = 'https://images.example.com/cover._SL500_.jpg',
+    rating = 4.5,
+  } = overrides;
+
+  // Real Audible storefront markup embeds each narrator as its own anchor inside
+  // .narratorLabel for multi-narrator productions. The single-narrator case keeps
+  // the original plain-text span for backward compatibility with existing tests.
+  const narratorMarkup = narrators && narrators.length > 0
+    ? `<span class="narratorLabel">Narrated by: ${renderNarratorLinks(narrators)}</span>`
+    : `<span class="narratorLabel">${narrator}</span>`;
+
+  return `
+    <div class="productListItem">
+      <ul>
+        <li data-asin="${asin}">
+          <img src="${coverArtUrl}" />
+          <h3><a href="/pd/test/${asin}">${title}</a></h3>
+          <a class="authorLabel" href="/author/test/${authorAsin}">${author}</a>
+          ${narratorMarkup}
+          <span class="ratingsLabel">${rating} out of 5</span>
+        </li>
+      </ul>
+    </div>
+  `;
+}
+
+/**
+ * Produces a single .s-result-item block matching the selectors parsed by
+ * parseSearchResultItems(). Used for /search?node=<categoryId> category pages.
+ */
+function makeSearchResultItemHtml(overrides: HtmlBookOverrides = {}): string {
+  const {
+    asin = 'B000000001',
+    title = 'Test Book',
+    author = 'Test Author',
+    authorAsin = 'A000000001',
+    narrator = 'Test Narrator',
+    narrators,
+    coverArtUrl = 'https://images.example.com/cover._SL500_.jpg',
+    rating = 4.5,
+  } = overrides;
+
+  const narratorLinks = narrators && narrators.length > 0
+    ? renderNarratorLinks(narrators)
+    : `<a href="/search?searchNarrator=${encodeURIComponent(narrator)}">${narrator}</a>`;
+
+  return `
+    <div class="s-result-item">
+      <ul>
+        <li data-asin="${asin}">
+          <img src="${coverArtUrl}" />
+          <h2><a href="/pd/test/${asin}">${title}</a></h2>
+          <a href="/author/test/${authorAsin}">${author}</a>
+          ${narratorLinks}
+          <span class="ratingsLabel">${rating} out of 5</span>
+        </li>
+      </ul>
+    </div>
+  `;
+}
+
+/** Wrap one or more item-HTML strings in a minimal page document. */
+function makeHtmlPage(items: string[]): string {
+  return `<html><body>${items.join('')}</body></html>`;
+}
+
+/**
+ * Produces the value that client.get() should resolve to for HTML responses.
+ * cheerio.load() is called on response.data, so .data must be the raw HTML string.
+ */
+function htmlResponse(html: string) {
+  return { data: html };
+}
+
 // ---------------------------------------------------------------------------
 // Test setup
 // ---------------------------------------------------------------------------
@@ -683,61 +799,66 @@ describe('AudibleService', () => {
   });
 
   // -------------------------------------------------------------------------
-  // getPopularAudiobooks()
+  // getPopularAudiobooks() — HTML scraping of /adblbestsellers
   // -------------------------------------------------------------------------
 
   describe('getPopularAudiobooks()', () => {
-    it('uses products_sort_by: BestSellers', async () => {
-      apiClientMock.get.mockResolvedValue(apiResponse(makeProductsResponse([])));
+    it('hits /adblbestsellers on the htmlClient with pageSize=50', async () => {
+      htmlClientMock.get.mockResolvedValue(htmlResponse(makeHtmlPage([makeProductListItemHtml()])));
 
       const service = new AudibleService();
       await service.getPopularAudiobooks(1);
 
-      expect(apiClientMock.get.mock.calls[0][1].params.products_sort_by).toBe('BestSellers');
+      expect(htmlClientMock.get).toHaveBeenCalledWith(
+        '/adblbestsellers',
+        expect.objectContaining({
+          params: expect.objectContaining({ pageSize: 50 }),
+        }),
+      );
     });
 
-    it('subtracts 1 from public page=1 before calling the API', async () => {
-      apiClientMock.get.mockResolvedValue(apiResponse(makeProductsResponse([])));
+    it('does not include a page param on the first request (only from page 2 onward)', async () => {
+      htmlClientMock.get.mockResolvedValue(htmlResponse(makeHtmlPage([makeProductListItemHtml()])));
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
 
       await service.getPopularAudiobooks(1);
-      expect(apiClientMock.get.mock.calls[0][1].params.page).toBe(0);
+      expect(htmlClientMock.get.mock.calls[0][1].params.page).toBeUndefined();
       delaySpy.mockRestore();
     });
 
-    it('makes a second call with page=1 when paginating to page 2', async () => {
-      const page1Products = Array.from({ length: 50 }, (_, i) =>
-        makeProduct({ asin: `B${String(i).padStart(9, '0')}`, title: `Book ${i}` }),
+    it('includes page=2 on the second request when paginating', async () => {
+      const page1Items = Array.from({ length: 50 }, (_, i) =>
+        makeProductListItemHtml({ asin: `B${String(i).padStart(9, '0')}`, title: `Book ${i}` }),
       );
-      const page2Products = Array.from({ length: 25 }, (_, i) =>
-        makeProduct({ asin: `B${String(i + 50).padStart(9, '0')}`, title: `Book ${i + 50}` }),
+      const page2Items = Array.from({ length: 25 }, (_, i) =>
+        makeProductListItemHtml({ asin: `B${String(i + 50).padStart(9, '0')}`, title: `Book ${i + 50}` }),
       );
 
-      apiClientMock.get
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page1Products, 75)))
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page2Products, 75)));
+      htmlClientMock.get
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page1Items)))
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page2Items)));
 
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
 
       await service.getPopularAudiobooks(75);
 
-      expect(apiClientMock.get.mock.calls[1][1].params.page).toBe(1);
+      expect(htmlClientMock.get.mock.calls[1][1].params.page).toBe(2);
       delaySpy.mockRestore();
     });
 
-    it('paginates and returns up to the requested limit', async () => {
-      const page1Products = Array.from({ length: 50 }, (_, i) =>
-        makeProduct({ asin: `B${String(i).padStart(9, '0')}`, title: `Book ${i}` }),
+    it('paginates across pages and returns up to the requested limit', async () => {
+      const page1Items = Array.from({ length: 50 }, (_, i) =>
+        makeProductListItemHtml({ asin: `B${String(i).padStart(9, '0')}`, title: `Book ${i}` }),
       );
-      const page2Products = Array.from({ length: 25 }, (_, i) =>
-        makeProduct({ asin: `B${String(i + 50).padStart(9, '0')}`, title: `Book ${i + 50}` }),
+      const page2Items = Array.from({ length: 25 }, (_, i) =>
+        makeProductListItemHtml({ asin: `B${String(i + 50).padStart(9, '0')}`, title: `Book ${i + 50}` }),
       );
 
-      apiClientMock.get
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page1Products, 75)))
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page2Products, 75)));
+      htmlClientMock.get
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page1Items)))
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page2Items)));
 
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
@@ -747,176 +868,338 @@ describe('AudibleService', () => {
       delaySpy.mockRestore();
     });
 
-    it('stops early when a page returns fewer than the page size', async () => {
-      const products = [makeProduct()];
-      apiClientMock.get.mockResolvedValueOnce(apiResponse(makeProductsResponse(products, 1)));
+    it('stops early when a page returns fewer than half the page size', async () => {
+      htmlClientMock.get.mockResolvedValueOnce(
+        htmlResponse(makeHtmlPage([makeProductListItemHtml()])),
+      );
 
       const service = new AudibleService();
       const results = await service.getPopularAudiobooks(50);
 
       expect(results).toHaveLength(1);
-      expect(apiClientMock.get).toHaveBeenCalledTimes(1);
+      expect(htmlClientMock.get).toHaveBeenCalledTimes(1);
     });
 
     it('deduplicates by ASIN across pages', async () => {
-      const sharedProduct = makeProduct({ asin: 'BDUP000001', title: 'Duplicated Book' });
-      const uniqueProduct = makeProduct({ asin: 'BUNIQ000001', title: 'Unique Book' });
+      const sharedAsin = 'BDUP000001';
+      const uniqueAsin = 'BUNIQ000001';
 
-      apiClientMock.get
-        .mockResolvedValueOnce(
-          apiResponse(makeProductsResponse([sharedProduct], 51)),
-        )
-        .mockResolvedValueOnce(
-          // page 2 returns the same ASIN plus a new one
-          apiResponse(makeProductsResponse([sharedProduct, uniqueProduct], 51)),
-        );
+      // Build a "full" first page (50 items, all with the shared ASIN duplicated as filler)
+      // so the parser proceeds to page 2.
+      const page1Items = [
+        makeProductListItemHtml({ asin: sharedAsin, title: 'Duplicated Book' }),
+        ...Array.from({ length: 49 }, (_, i) =>
+          makeProductListItemHtml({ asin: `BFILL${String(i).padStart(5, '0')}`, title: `Filler ${i}` }),
+        ),
+      ];
+      const page2Items = [
+        makeProductListItemHtml({ asin: sharedAsin, title: 'Duplicated Book' }),
+        makeProductListItemHtml({ asin: uniqueAsin, title: 'Unique Book' }),
+        ...Array.from({ length: 48 }, (_, i) =>
+          makeProductListItemHtml({ asin: `BFILL2${String(i).padStart(4, '0')}`, title: `Filler2 ${i}` }),
+        ),
+      ];
+
+      htmlClientMock.get
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page1Items)))
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page2Items)));
 
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
-      const results = await service.getPopularAudiobooks(100);
+      const results = await service.getPopularAudiobooks(150);
 
       const asins = results.map((r) => r.asin);
-      expect(asins.filter((a) => a === 'BDUP000001')).toHaveLength(1);
+      expect(asins.filter((a) => a === sharedAsin)).toHaveLength(1);
+      expect(asins).toContain(uniqueAsin);
       delaySpy.mockRestore();
     });
 
     it('returns empty array on error without throwing', async () => {
       const error: Error & { response?: { status: number } } = new Error('Not Found');
       error.response = { status: 404 };
-      apiClientMock.get.mockRejectedValue(error);
+      htmlClientMock.get.mockRejectedValue(error);
 
       const service = new AudibleService();
       const results = await service.getPopularAudiobooks(5);
 
       expect(results).toEqual([]);
     });
+
+    it('uses htmlClient (not apiClient) for the request', async () => {
+      htmlClientMock.get.mockResolvedValue(htmlResponse(makeHtmlPage([makeProductListItemHtml()])));
+
+      const service = new AudibleService();
+      await service.getPopularAudiobooks(1);
+
+      expect(htmlClientMock.get).toHaveBeenCalled();
+      expect(apiClientMock.get).not.toHaveBeenCalled();
+    });
+
+    it('maps title, author, narrator, and rating from the parsed item', async () => {
+      htmlClientMock.get.mockResolvedValue(
+        htmlResponse(
+          makeHtmlPage([
+            makeProductListItemHtml({
+              asin: 'B0HTMLMAP1',
+              title: 'Mapped Title',
+              author: 'Mapped Author',
+              authorAsin: 'A00MAPAUTH',
+              narrator: 'Mapped Narrator',
+              rating: 4.7,
+            }),
+          ]),
+        ),
+      );
+
+      const service = new AudibleService();
+      const [book] = await service.getPopularAudiobooks(1);
+
+      expect(book.asin).toBe('B0HTMLMAP1');
+      expect(book.title).toBe('Mapped Title');
+      expect(book.author).toBe('Mapped Author');
+      expect(book.authorAsin).toBe('A00MAPAUTH');
+      expect(book.narrator).toBe('Mapped Narrator');
+      expect(book.rating).toBeCloseTo(4.7);
+    });
+
+    it('captures every co-narrator on multi-narrator productions (regression: prior code took only the first link)', async () => {
+      htmlClientMock.get.mockResolvedValue(
+        htmlResponse(
+          makeHtmlPage([
+            makeProductListItemHtml({
+              asin: 'B0FULLCAST',
+              narrators: [
+                'Kristin Atherton',
+                'Roy McMillan',
+                'Clare Corbett',
+                'Tom Bateman',
+                'Patience Tomlinson',
+                'Shaheen Khan',
+              ],
+            }),
+          ]),
+        ),
+      );
+
+      const service = new AudibleService();
+      const [book] = await service.getPopularAudiobooks(1);
+
+      // Every narrator must round-trip — order is not significant downstream,
+      // but document order should be preserved for stable cache values.
+      expect(book.narrator).toBe(
+        'Kristin Atherton, Roy McMillan, Clare Corbett, Tom Bateman, Patience Tomlinson, Shaheen Khan',
+      );
+    });
   });
 
   // -------------------------------------------------------------------------
-  // getNewReleases()
+  // getNewReleases() — HTML scraping of /newreleases
   // -------------------------------------------------------------------------
 
   describe('getNewReleases()', () => {
-    it('uses products_sort_by: -ReleaseDate', async () => {
-      apiClientMock.get.mockResolvedValue(apiResponse(makeProductsResponse([])));
+    it('hits /newreleases on the htmlClient with pageSize=50', async () => {
+      htmlClientMock.get.mockResolvedValue(htmlResponse(makeHtmlPage([makeProductListItemHtml()])));
 
       const service = new AudibleService();
       await service.getNewReleases(1);
 
-      expect(apiClientMock.get.mock.calls[0][1].params.products_sort_by).toBe('-ReleaseDate');
+      expect(htmlClientMock.get).toHaveBeenCalledWith(
+        '/newreleases',
+        expect.objectContaining({
+          params: expect.objectContaining({ pageSize: 50 }),
+        }),
+      );
     });
 
-    it('subtracts 1 from public page=1 before calling the API', async () => {
-      apiClientMock.get.mockResolvedValue(apiResponse(makeProductsResponse([])));
+    it('does not include a page param on the first request', async () => {
+      htmlClientMock.get.mockResolvedValue(htmlResponse(makeHtmlPage([makeProductListItemHtml()])));
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
 
       await service.getNewReleases(1);
-      expect(apiClientMock.get.mock.calls[0][1].params.page).toBe(0);
+      expect(htmlClientMock.get.mock.calls[0][1].params.page).toBeUndefined();
       delaySpy.mockRestore();
     });
 
-    it('subtracts 1 from public page=2 when paginating to the second page', async () => {
-      const page1Products = Array.from({ length: 50 }, (_, i) =>
-        makeProduct({ asin: `B${String(i).padStart(9, '0')}` }),
+    it('includes page=2 on the second request when paginating', async () => {
+      const page1Items = Array.from({ length: 50 }, (_, i) =>
+        makeProductListItemHtml({ asin: `B${String(i).padStart(9, '0')}` }),
+      );
+      const page2Items = Array.from({ length: 50 }, (_, i) =>
+        makeProductListItemHtml({ asin: `B${String(i + 50).padStart(9, '0')}` }),
       );
-      const page2Products = [makeProduct({ asin: 'BNEW000099' })];
 
-      apiClientMock.get
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page1Products, 51)))
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page2Products, 51)));
+      htmlClientMock.get
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page1Items)))
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page2Items)));
 
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
 
-      await service.getNewReleases(51);
-      expect(apiClientMock.get.mock.calls[1][1].params.page).toBe(1);
+      await service.getNewReleases(100);
+      expect(htmlClientMock.get.mock.calls[1][1].params.page).toBe(2);
       delaySpy.mockRestore();
     });
 
     it('deduplicates by ASIN across pages', async () => {
-      const sharedProduct = makeProduct({ asin: 'BDUP000002' });
-      apiClientMock.get
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse([sharedProduct], 51)))
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse([sharedProduct], 51)));
+      const sharedAsin = 'BDUP000002';
+
+      const page1Items = [
+        makeProductListItemHtml({ asin: sharedAsin }),
+        ...Array.from({ length: 49 }, (_, i) =>
+          makeProductListItemHtml({ asin: `BNEW${String(i).padStart(6, '0')}` }),
+        ),
+      ];
+      const page2Items = [
+        makeProductListItemHtml({ asin: sharedAsin }),
+        ...Array.from({ length: 49 }, (_, i) =>
+          makeProductListItemHtml({ asin: `BNEW2${String(i).padStart(5, '0')}` }),
+        ),
+      ];
+
+      htmlClientMock.get
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page1Items)))
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page2Items)));
 
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
-      const results = await service.getNewReleases(100);
+      const results = await service.getNewReleases(150);
 
-      expect(results.filter((r) => r.asin === 'BDUP000002')).toHaveLength(1);
+      expect(results.filter((r) => r.asin === sharedAsin)).toHaveLength(1);
       delaySpy.mockRestore();
     });
 
     it('returns empty array on error without throwing', async () => {
       const error: Error & { response?: { status: number } } = new Error('Not Found');
       error.response = { status: 404 };
-      apiClientMock.get.mockRejectedValue(error);
+      htmlClientMock.get.mockRejectedValue(error);
 
       const service = new AudibleService();
       const results = await service.getNewReleases(5);
 
       expect(results).toEqual([]);
     });
+
+    it('uses htmlClient (not apiClient) for the request', async () => {
+      htmlClientMock.get.mockResolvedValue(htmlResponse(makeHtmlPage([makeProductListItemHtml()])));
+
+      const service = new AudibleService();
+      await service.getNewReleases(1);
+
+      expect(htmlClientMock.get).toHaveBeenCalled();
+      expect(apiClientMock.get).not.toHaveBeenCalled();
+    });
   });
 
   // -------------------------------------------------------------------------
-  // getCategoryBooks()
+  // getCategoryBooks() — HTML scraping of /search?node=<categoryId>
   // -------------------------------------------------------------------------
 
   describe('getCategoryBooks()', () => {
-    it('sends category_id and BestSellers sort param', async () => {
-      apiClientMock.get.mockResolvedValue(apiResponse(makeProductsResponse([])));
+    it('hits /search on the htmlClient with node, pageSize, and popularity-rank sort', async () => {
+      htmlClientMock.get.mockResolvedValue(
+        htmlResponse(makeHtmlPage([makeSearchResultItemHtml()])),
+      );
 
       const service = new AudibleService();
       await service.getCategoryBooks('18685580011', 1);
 
-      const params = apiClientMock.get.mock.calls[0][1].params;
-      expect(params.category_id).toBe('18685580011');
-      expect(params.products_sort_by).toBe('BestSellers');
+      const params = htmlClientMock.get.mock.calls[0][1].params;
+      expect(htmlClientMock.get.mock.calls[0][0]).toBe('/search');
+      expect(params.node).toBe('18685580011');
+      expect(params.pageSize).toBe(50);
+      expect(params.sort).toBe('popularity-rank');
     });
 
-    it('subtracts 1 from public page=1 before calling the API', async () => {
-      apiClientMock.get.mockResolvedValue(apiResponse(makeProductsResponse([])));
+    it('does not include a page param on the first request', async () => {
+      htmlClientMock.get.mockResolvedValue(
+        htmlResponse(makeHtmlPage([makeSearchResultItemHtml()])),
+      );
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
 
       await service.getCategoryBooks('CAT001', 1);
-      expect(apiClientMock.get.mock.calls[0][1].params.page).toBe(0);
+      expect(htmlClientMock.get.mock.calls[0][1].params.page).toBeUndefined();
       delaySpy.mockRestore();
     });
 
-    it('subtracts 1 from public page=2 when paginating to the second page', async () => {
-      const page1Products = Array.from({ length: 50 }, (_, i) =>
-        makeProduct({ asin: `B${String(i).padStart(9, '0')}` }),
+    it('includes page=2 on the second request when paginating', async () => {
+      const page1Items = Array.from({ length: 50 }, (_, i) =>
+        makeSearchResultItemHtml({ asin: `B${String(i).padStart(9, '0')}` }),
+      );
+      const page2Items = Array.from({ length: 50 }, (_, i) =>
+        makeSearchResultItemHtml({ asin: `B${String(i + 50).padStart(9, '0')}` }),
       );
-      const page2Products = [makeProduct({ asin: 'BCAT000099' })];
 
-      apiClientMock.get
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page1Products, 51)))
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse(page2Products, 51)));
+      htmlClientMock.get
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page1Items)))
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page2Items)));
 
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
 
-      await service.getCategoryBooks('CAT001', 51);
-      expect(apiClientMock.get.mock.calls[1][1].params.page).toBe(1);
+      await service.getCategoryBooks('CAT001', 100);
+      expect(htmlClientMock.get.mock.calls[1][1].params.page).toBe(2);
       delaySpy.mockRestore();
     });
 
     it('deduplicates by ASIN across pages', async () => {
-      const sharedProduct = makeProduct({ asin: 'BDUP000003' });
-      apiClientMock.get
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse([sharedProduct], 51)))
-        .mockResolvedValueOnce(apiResponse(makeProductsResponse([sharedProduct], 51)));
+      const sharedAsin = 'BDUP000003';
+
+      const page1Items = [
+        makeSearchResultItemHtml({ asin: sharedAsin }),
+        ...Array.from({ length: 49 }, (_, i) =>
+          makeSearchResultItemHtml({ asin: `BCAT${String(i).padStart(6, '0')}` }),
+        ),
+      ];
+      const page2Items = [
+        makeSearchResultItemHtml({ asin: sharedAsin }),
+        ...Array.from({ length: 49 }, (_, i) =>
+          makeSearchResultItemHtml({ asin: `BCAT2${String(i).padStart(5, '0')}` }),
+        ),
+      ];
+
+      htmlClientMock.get
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page1Items)))
+        .mockResolvedValueOnce(htmlResponse(makeHtmlPage(page2Items)));
 
       const service = new AudibleService();
       const delaySpy = vi.spyOn(service as any, 'delay').mockResolvedValue(undefined);
-      const results = await service.getCategoryBooks('CAT001', 100);
+      const results = await service.getCategoryBooks('CAT001', 150);
 
-      expect(results.filter((r) => r.asin === 'BDUP000003')).toHaveLength(1);
+      expect(results.filter((r) => r.asin === sharedAsin)).toHaveLength(1);
       delaySpy.mockRestore();
     });
 
||||||
|
it('uses htmlClient (not apiClient) for the request', async () => {
|
||||||
|
htmlClientMock.get.mockResolvedValue(
|
||||||
|
htmlResponse(makeHtmlPage([makeSearchResultItemHtml()])),
|
||||||
|
);
|
||||||
|
|
||||||
|
const service = new AudibleService();
|
||||||
|
await service.getCategoryBooks('CAT001', 1);
|
||||||
|
|
||||||
|
expect(htmlClientMock.get).toHaveBeenCalled();
|
||||||
|
expect(apiClientMock.get).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('captures every co-narrator on multi-narrator productions (regression: prior code took only the first link)', async () => {
|
||||||
|
htmlClientMock.get.mockResolvedValue(
|
||||||
|
htmlResponse(
|
||||||
|
makeHtmlPage([
|
||||||
|
makeSearchResultItemHtml({
|
||||||
|
asin: 'B0FULLCAST',
|
||||||
|
narrators: ['Alice', 'Bob', 'Carol', 'Dan'],
|
||||||
|
}),
|
||||||
|
]),
|
||||||
|
),
|
||||||
|
);
|
||||||
|
|
||||||
|
const service = new AudibleService();
|
||||||
|
const [book] = await service.getCategoryBooks('CAT001', 1);
|
||||||
|
|
||||||
|
expect(book.narrator).toBe('Alice, Bob, Carol, Dan');
|
||||||
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
// -------------------------------------------------------------------------
|
// -------------------------------------------------------------------------
|
||||||
|
|||||||
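Reviewer note: the category tests above assert that the first HTML request carries no `page` parameter and that the second carries `page=2`. A minimal sketch of that convention follows, assuming a hypothetical helper named `pageParams`; the real `getCategoryBooks` implementation is not shown in this diff and may build its parameters differently.

```typescript
// Hypothetical helper (not from the commit): builds the query params for a
// storefront page, where page numbers are 1-based.
function pageParams(pageNumber: number): { page?: number } {
  // The first page is requested without a `page` param, matching the
  // `toBeUndefined()` assertion; later pages send their 1-based number.
  return pageNumber <= 1 ? {} : { page: pageNumber };
}

const firstPage = pageParams(1);
const secondPage = pageParams(2);
```

This differs from the old JSON-API assertions in the same tests, which expected a zero-based `page` value.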
@@ -198,4 +198,69 @@ describe('processAudibleRefresh', () => {
     const { processAudibleRefresh } = await import('@/lib/processors/audible-refresh.processor');
     await expect(processAudibleRefresh({ jobId: 'job-2' })).rejects.toThrow('DB down');
   });
+
+  it('deduplicates ASINs in the input list before persisting, preserving order', async () => {
+    // Two `A` entries should collapse to one. Final ranks must be contiguous
+    // (1, 2, 3) and follow Audible's editorial ordering (A, B, C).
+    const popular = [
+      { asin: 'A', title: 'Book A', author: 'X', coverArtUrl: null },
+      { asin: 'B', title: 'Book B', author: 'X', coverArtUrl: null },
+      { asin: 'A', title: 'Book A (duplicate)', author: 'X', coverArtUrl: null },
+      { asin: 'C', title: 'Book C', author: 'X', coverArtUrl: null },
+    ];
+
+    audibleServiceMock.getPopularAudiobooks.mockResolvedValue(popular);
+    audibleServiceMock.getNewReleases.mockResolvedValue([]);
+    thumbnailCacheMock.cleanupUnusedThumbnails.mockResolvedValue(0);
+    prismaMock.audibleCache.upsert.mockResolvedValue({});
+    prismaMock.audibleCacheCategory.deleteMany.mockResolvedValue({ count: 0 });
+    prismaMock.audibleCacheCategory.create.mockResolvedValue({});
+    prismaMock.userHomeSection.findMany.mockResolvedValue([]);
+    prismaMock.audibleCache.findMany.mockResolvedValue([]);
+
+    const { processAudibleRefresh } = await import('@/lib/processors/audible-refresh.processor');
+    const result = await processAudibleRefresh({ jobId: 'job-dedup' });
+
+    expect(result.popularSaved).toBe(3);
+
+    // Only 3 category entries created — the duplicate `A` was dropped.
+    const popularCreates = (prismaMock.audibleCacheCategory.create.mock.calls as Array<[{ data: { asin: string; categoryId: string; rank: number } }]>)
+      .map((c) => c[0].data)
+      .filter((d) => d.categoryId === '__popular__');
+    expect(popularCreates).toHaveLength(3);
+    expect(popularCreates.map((d) => d.asin)).toEqual(['A', 'B', 'C']);
+    expect(popularCreates.map((d) => d.rank)).toEqual([1, 2, 3]);
+
+    // upsert called once per unique ASIN, not per input row.
+    expect(prismaMock.audibleCache.upsert).toHaveBeenCalledTimes(3);
+  });
+
+  it('drops entries with missing ASINs as part of dedup', async () => {
+    const popular = [
+      { asin: 'A', title: 'Book A', author: 'X', coverArtUrl: null },
+      { asin: '', title: 'Book with empty asin', author: 'X', coverArtUrl: null },
+      { asin: null, title: 'Book with null asin', author: 'X', coverArtUrl: null },
+      { asin: 'B', title: 'Book B', author: 'X', coverArtUrl: null },
+    ];
+
+    audibleServiceMock.getPopularAudiobooks.mockResolvedValue(popular as any);
+    audibleServiceMock.getNewReleases.mockResolvedValue([]);
+    thumbnailCacheMock.cleanupUnusedThumbnails.mockResolvedValue(0);
+    prismaMock.audibleCache.upsert.mockResolvedValue({});
+    prismaMock.audibleCacheCategory.deleteMany.mockResolvedValue({ count: 0 });
+    prismaMock.audibleCacheCategory.create.mockResolvedValue({});
+    prismaMock.userHomeSection.findMany.mockResolvedValue([]);
+    prismaMock.audibleCache.findMany.mockResolvedValue([]);
+
+    const { processAudibleRefresh } = await import('@/lib/processors/audible-refresh.processor');
+    const result = await processAudibleRefresh({ jobId: 'job-empty-asin' });
+
+    expect(result.popularSaved).toBe(2);
+
+    const popularCreates = (prismaMock.audibleCacheCategory.create.mock.calls as Array<[{ data: { asin: string; categoryId: string; rank: number } }]>)
+      .map((c) => c[0].data)
+      .filter((d) => d.categoryId === '__popular__');
+    expect(popularCreates.map((d) => d.asin)).toEqual(['A', 'B']);
+    expect(popularCreates.map((d) => d.rank)).toEqual([1, 2]);
+  });
 });
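Reviewer note: the two refresh-processor tests above pin down three behaviours: duplicate ASINs collapse to one row, rows with an empty or null ASIN are dropped, and ranks stay contiguous in the original order. The sketch below shows one way to get all three in a single pass. `dedupeByAsin`, `RefreshItem`, and `RankedItem` are hypothetical names, and keeping the first occurrence of a duplicate is an assumption; the processor's actual code is not part of this diff.

```typescript
// Hypothetical, reduced shapes; the real audiobook type carries more fields.
interface RefreshItem {
  asin: string | null | undefined;
  title: string;
}

interface RankedItem {
  asin: string;
  title: string;
  rank: number;
}

// Keep the first occurrence of each ASIN (assumption), drop rows without an
// ASIN, and assign contiguous 1-based ranks in the surviving order.
function dedupeByAsin(items: RefreshItem[]): RankedItem[] {
  const seen = new Set<string>();
  const ranked: RankedItem[] = [];
  for (const item of items) {
    if (!item.asin || seen.has(item.asin)) continue;
    seen.add(item.asin);
    ranked.push({ asin: item.asin, title: item.title, rank: ranked.length + 1 });
  }
  return ranked;
}

const rankedPopular = dedupeByAsin([
  { asin: 'A', title: 'Book A' },
  { asin: '', title: 'Book with empty asin' },
  { asin: 'B', title: 'Book B' },
  { asin: 'A', title: 'Book A (duplicate)' },
  { asin: null, title: 'Book with null asin' },
  { asin: 'C', title: 'Book C' },
]);
```

Assigning the rank from the output length rather than the input index is what keeps ranks gap-free after rows are dropped.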
@@ -6,6 +6,15 @@
 import { beforeEach, describe, expect, it, vi } from 'vitest';
 import { createPrismaMock } from '../helpers/prisma';
 import type { DedupGroup } from '@/lib/utils/deduplicate-audiobooks';
+import type { AudibleAudiobook } from '@/lib/integrations/audible.service';
+
+function makeBook(overrides: Partial<AudibleAudiobook> & { asin: string }): AudibleAudiobook {
+  return {
+    title: 'Test Book',
+    author: 'Test Author',
+    ...overrides,
+  };
+}
 
 const prismaMock = createPrismaMock();
 
@@ -304,3 +313,183 @@ describe('getSiblingAsins', () => {
     expect(result.has('ASIN_LONELY')).toBe(false);
   });
 });
+
+describe('collapseByExistingWorks', () => {
+  beforeEach(() => {
+    vi.clearAllMocks();
+    vi.resetModules();
+  });
+
+  it('returns input unchanged when the list is empty or has one entry', async () => {
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    expect(await collapseByExistingWorks([])).toEqual([]);
+    expect(prismaMock.workAsin.findMany).not.toHaveBeenCalled();
+
+    const single = [makeBook({ asin: 'A1' })];
+    expect(await collapseByExistingWorks(single)).toEqual(single);
+    expect(prismaMock.workAsin.findMany).not.toHaveBeenCalled();
+  });
+
+  it('returns input unchanged when none of the ASINs are in any work', async () => {
+    prismaMock.workAsin.findMany.mockResolvedValue([]);
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    const books = [
+      makeBook({ asin: 'A1', title: 'Alpha' }),
+      makeBook({ asin: 'A2', title: 'Beta' }),
+    ];
+
+    const result = await collapseByExistingWorks(books);
+    expect(result).toEqual(books);
+  });
+
+  it('collapses two ASINs that share a work to a single representative', async () => {
+    prismaMock.workAsin.findMany.mockResolvedValue([
+      { asin: 'A1', workId: 'work-1' },
+      { asin: 'A2', workId: 'work-1' },
+    ]);
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    const books = [
+      makeBook({ asin: 'A1', title: 'The Passengers', coverArtUrl: 'cover.jpg' }),
+      makeBook({ asin: 'A2', title: 'The Passengers' }),
+    ];
+
+    const result = await collapseByExistingWorks(books);
+    expect(result).toHaveLength(1);
+    // A1 wins — it has the cover URL (higher metadata score)
+    expect(result[0].asin).toBe('A1');
+  });
+
+  it('keeps the richest-metadata entry when collapsing, regardless of input order', async () => {
+    prismaMock.workAsin.findMany.mockResolvedValue([
+      { asin: 'A1', workId: 'work-1' },
+      { asin: 'A2', workId: 'work-1' },
+    ]);
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    // A1 first (sparse), A2 second (rich) — A2 should win on score
+    const books = [
+      makeBook({ asin: 'A1', title: 'Book' }),
+      makeBook({
+        asin: 'A2',
+        title: 'Book',
+        coverArtUrl: 'cover.jpg',
+        rating: 4.5,
+        durationMinutes: 600,
+        narrator: 'Full Cast',
+        description: 'Rich book',
+        releaseDate: '2024-01-01',
+        genres: ['Fiction'],
+      }),
+    ];
+
+    const result = await collapseByExistingWorks(books);
+    expect(result).toHaveLength(1);
+    expect(result[0].asin).toBe('A2');
+  });
+
+  it('preserves position of the work in the input order', async () => {
+    prismaMock.workAsin.findMany.mockResolvedValue([
+      { asin: 'A2', workId: 'work-1' },
+      { asin: 'A4', workId: 'work-1' },
+    ]);
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    const books = [
+      makeBook({ asin: 'A1', title: 'Alpha' }),
+      makeBook({ asin: 'A2', title: 'Beta' }),
+      makeBook({ asin: 'A3', title: 'Gamma' }),
+      makeBook({ asin: 'A4', title: 'Beta' }),
+      makeBook({ asin: 'A5', title: 'Delta' }),
+    ];
+
+    const result = await collapseByExistingWorks(books);
+    // A2 and A4 collapse to one entry at position 1 (the first occurrence)
+    expect(result.map(b => b.asin)).toEqual(['A1', 'A2', 'A3', 'A5']);
+  });
+
+  it('handles multiple independent works in the same batch', async () => {
+    prismaMock.workAsin.findMany.mockResolvedValue([
+      { asin: 'A1', workId: 'work-1' },
+      { asin: 'A2', workId: 'work-1' },
+      { asin: 'B1', workId: 'work-2' },
+      { asin: 'B2', workId: 'work-2' },
+      { asin: 'B3', workId: 'work-2' },
+    ]);
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    const books = [
+      makeBook({ asin: 'A1' }),
+      makeBook({ asin: 'B1' }),
+      makeBook({ asin: 'A2' }),
+      makeBook({ asin: 'B2' }),
+      makeBook({ asin: 'B3' }),
+      makeBook({ asin: 'C1' }),
+    ];
+
+    const result = await collapseByExistingWorks(books);
+    expect(result.map(b => b.asin)).toEqual(['A1', 'B1', 'C1']);
+  });
+
+  it('passes through books that are not in any work alongside collapsed ones', async () => {
+    prismaMock.workAsin.findMany.mockResolvedValue([
+      { asin: 'A1', workId: 'work-1' },
+      { asin: 'A2', workId: 'work-1' },
+    ]);
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    const books = [
+      makeBook({ asin: 'STANDALONE_1', title: 'Standalone 1' }),
+      makeBook({ asin: 'A1', title: 'Same Book' }),
+      makeBook({ asin: 'STANDALONE_2', title: 'Standalone 2' }),
+      makeBook({ asin: 'A2', title: 'Same Book' }),
+    ];
+
+    const result = await collapseByExistingWorks(books);
+    expect(result).toHaveLength(3);
+    expect(result.map(b => b.asin)).toEqual(['STANDALONE_1', 'A1', 'STANDALONE_2']);
+  });
+
+  it('returns input unchanged on DB failure (does not throw)', async () => {
+    prismaMock.workAsin.findMany.mockRejectedValue(new Error('DB exploded'));
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    const books = [
+      makeBook({ asin: 'A1' }),
+      makeBook({ asin: 'A2' }),
+    ];
+
+    const result = await collapseByExistingWorks(books);
+    expect(result).toEqual(books);
+  });
+
+  it('only queries the workAsin table once per call', async () => {
+    prismaMock.workAsin.findMany.mockResolvedValue([
+      { asin: 'A1', workId: 'work-1' },
+      { asin: 'A2', workId: 'work-1' },
+    ]);
+
+    const { collapseByExistingWorks } = await import('@/lib/services/works.service');
+
+    await collapseByExistingWorks([
+      makeBook({ asin: 'A1' }),
+      makeBook({ asin: 'A2' }),
+      makeBook({ asin: 'A3' }),
+    ]);
+
+    expect(prismaMock.workAsin.findMany).toHaveBeenCalledTimes(1);
+    expect(prismaMock.workAsin.findMany).toHaveBeenCalledWith({
+      where: { asin: { in: ['A1', 'A2', 'A3'] } },
+      select: { asin: true, workId: true },
+    });
+  });
+});
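Reviewer note: taken together, the `collapseByExistingWorks` tests above describe a collapse that keeps one representative per work, puts it in the slot where the work first appeared, prefers the entry with the higher metadata score, and leaves books outside any work untouched. The sketch below shows that logic as a pure function under stated assumptions. `collapseByWork` takes the ASIN-to-work mapping as a plain `Map` instead of querying the `workAsin` table, and the scoring here is a made-up stand-in for the commit's exported `metadataScore`, whose real weights are not shown in this diff.

```typescript
// Hypothetical, reduced book shape for illustration only.
interface Book {
  asin: string;
  title: string;
  coverArtUrl?: string;
  narrator?: string;
  rating?: number;
}

// Stand-in scoring (assumption): one point per populated optional field.
function sketchMetadataScore(book: Book): number {
  let score = 0;
  if (book.coverArtUrl) score += 1;
  if (book.narrator) score += 1;
  if (book.rating !== undefined) score += 1;
  return score;
}

// Collapse books that share a work into one representative, kept at the
// position of the work's first occurrence. Ties keep the earlier entry.
function collapseByWork(books: Book[], workByAsin: Map<string, string>): Book[] {
  if (books.length < 2) return books;
  const slotByWork = new Map<string, number>();
  const result: Book[] = [];
  for (const book of books) {
    const workId = workByAsin.get(book.asin);
    if (workId === undefined) {
      result.push(book); // not part of any known work: pass through
      continue;
    }
    const slot = slotByWork.get(workId);
    if (slot === undefined) {
      slotByWork.set(workId, result.length);
      result.push(book);
      continue;
    }
    const current = result[slot];
    if (current !== undefined && sketchMetadataScore(book) > sketchMetadataScore(current)) {
      result[slot] = book; // richer sibling takes over the existing slot
    }
  }
  return result;
}

const collapsed = collapseByWork(
  [
    { asin: 'A1', title: 'Book' },
    { asin: 'S1', title: 'Standalone' },
    { asin: 'A2', title: 'Book', coverArtUrl: 'cover.jpg' },
  ],
  new Map([
    ['A1', 'work-1'],
    ['A2', 'work-1'],
  ]),
);
```

In the real service the mapping would come from the single `findMany` call the last test asserts, and a failed lookup would return the input unchanged, as the DB-failure test requires.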
@@ -0,0 +1,95 @@
+/**
+ * Component: Narrator Extraction Utility Tests
+ * Documentation: documentation/integrations/audible.md
+ */
+
+import { describe, expect, it } from 'vitest';
+import * as cheerio from 'cheerio';
+import { extractAllNarrators } from '@/lib/utils/extract-narrator';
+
+function load(html: string) {
+  const $ = cheerio.load(`<div id="item">${html}</div>`);
+  return { $, $el: $('#item') };
+}
+
+describe('extractAllNarrators', () => {
+  it('returns the single narrator name when only one searchNarrator link is present', () => {
+    const { $, $el } = load(
+      `<a href="/search?searchNarrator=Andy%20Serkis">Andy Serkis</a>`,
+    );
+    expect(extractAllNarrators($, $el)).toBe('Andy Serkis');
+  });
+
+  it('joins multiple narrator names from separate searchNarrator links', () => {
+    const { $, $el } = load(`
+      <a href="/search?searchNarrator=Kristin%20Atherton">Kristin Atherton</a>,
+      <a href="/search?searchNarrator=Roy%20McMillan">Roy McMillan</a>,
+      <a href="/search?searchNarrator=Clare%20Corbett">Clare Corbett</a>,
+      <a href="/search?searchNarrator=Tom%20Bateman">Tom Bateman</a>,
+      <a href="/search?searchNarrator=Patience%20Tomlinson">Patience Tomlinson</a>,
+      <a href="/search?searchNarrator=Shaheen%20Khan">Shaheen Khan</a>
+    `);
+    expect(extractAllNarrators($, $el)).toBe(
+      'Kristin Atherton, Roy McMillan, Clare Corbett, Tom Bateman, Patience Tomlinson, Shaheen Khan',
+    );
+  });
+
+  it('preserves document order (downstream sorts before comparing, but order should be stable)', () => {
+    const { $, $el } = load(`
+      <a href="/search?searchNarrator=Z">Zelda</a>
+      <a href="/search?searchNarrator=A">Alice</a>
+      <a href="/search?searchNarrator=M">Mallory</a>
+    `);
+    expect(extractAllNarrators($, $el)).toBe('Zelda, Alice, Mallory');
+  });
+
+  it('falls back to .narratorLabel text when no searchNarrator links exist', () => {
+    const { $, $el } = load(
+      `<span class="narratorLabel">Narrated by: Single Narrator</span>`,
+    );
+    expect(extractAllNarrators($, $el)).toBe('Narrated by: Single Narrator');
+  });
+
+  it('prefers searchNarrator links over .narratorLabel when both are present', () => {
+    const { $, $el } = load(`
+      <span class="narratorLabel">Narrated by: ONLY ONE</span>
+      <a href="/search?searchNarrator=First">First</a>
+      <a href="/search?searchNarrator=Second">Second</a>
+    `);
+    expect(extractAllNarrators($, $el)).toBe('First, Second');
+  });
+
+  it('returns empty string when neither links nor .narratorLabel exist', () => {
+    const { $, $el } = load(`<span>some other content</span>`);
+    expect(extractAllNarrators($, $el)).toBe('');
+  });
+
+  it('skips empty link text and joins only non-empty names', () => {
+    const { $, $el } = load(`
+      <a href="/search?searchNarrator=A"></a>
+      <a href="/search?searchNarrator=B">Bob</a>
+      <a href="/search?searchNarrator=C"> </a>
+      <a href="/search?searchNarrator=D">Diana</a>
+    `);
+    expect(extractAllNarrators($, $el)).toBe('Bob, Diana');
+  });
+
+  it('trims whitespace from each captured name', () => {
+    const { $, $el } = load(`
+      <a href="/search?searchNarrator=A"> Alice </a>
+      <a href="/search?searchNarrator=B">
+        Bob
+      </a>
+    `);
+    expect(extractAllNarrators($, $el)).toBe('Alice, Bob');
+  });
+
+  it('falls back to .narratorLabel when all searchNarrator links are empty', () => {
+    const { $, $el } = load(`
+      <a href="/search?searchNarrator=A"></a>
+      <a href="/search?searchNarrator=B"> </a>
+      <span class="narratorLabel">Fallback Narrator</span>
+    `);
+    expect(extractAllNarrators($, $el)).toBe('Fallback Narrator');
+  });
+});
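Reviewer note: the narrator tests above reduce to a small rule once the DOM lookup is set aside: trim every `searchNarrator` link text, discard the empty ones, join the rest with `, ` in document order, and fall back to the `.narratorLabel` text only when no name survives. The sketch below isolates that rule. `joinNarrators` is a hypothetical helper that takes already-extracted strings; the real `extractAllNarrators` takes a cheerio root and element, and its selector logic is not shown in this diff.

```typescript
// Hypothetical helper: `linkTexts` are the raw texts of the narrator anchors
// in document order; `fallbackLabel` is the raw `.narratorLabel` text ('' if absent).
function joinNarrators(linkTexts: string[], fallbackLabel: string): string {
  const names = linkTexts
    .map((text) => text.trim())
    .filter((text) => text.length > 0);
  // Links win whenever at least one non-empty name exists; the label is
  // returned as-is (trimmed), with no "Narrated by:" prefix stripping.
  return names.length > 0 ? names.join(', ') : fallbackLabel.trim();
}

const fullCast = joinNarrators(['', 'Bob', ' ', ' Diana '], 'Narrated by: ONLY ONE');
const labelOnly = joinNarrators(['', ' '], 'Fallback Narrator');
```

Keeping the join separate from the DOM query makes the regression named in the commit (only the first link was captured) easy to test without any HTML fixtures.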
@@ -67,6 +67,24 @@ describe('jitteredBackoff', () => {
     expect(value).toBeGreaterThanOrEqual(250);
     expect(value).toBeLessThanOrEqual(750);
   });
+
+  it('caps the result at maxBackoffMs when the raw backoff would exceed it', () => {
+    // attempt=10 with base=1000 produces 2^10 * 1000 * [0.5..1.5] = 512_000..1_536_000,
+    // all of which exceed a 60_000ms cap.
+    for (let i = 0; i < 50; i++) {
+      const value = jitteredBackoff(10, 1000, 60_000);
+      expect(value).toBeLessThanOrEqual(60_000);
+    }
+  });
+
+  it('returns the un-capped jittered value when below the cap', () => {
+    // attempt=0 with base=1000 produces 500..1500, all below a 60_000ms cap.
+    for (let i = 0; i < 50; i++) {
+      const value = jitteredBackoff(0, 1000, 60_000);
+      expect(value).toBeGreaterThanOrEqual(500);
+      expect(value).toBeLessThanOrEqual(1500);
+    }
+  });
 });
 
 describe('randomDelay', () => {
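Reviewer note: the comments in the two new `jitteredBackoff` tests imply the formula `2^attempt * base * [0.5..1.5]`, clamped to `maxBackoffMs`. The sketch below follows that reading. `cappedJitteredBackoff` is a hypothetical name, and the injectable `random` parameter is an addition for deterministic checks; the real `jitteredBackoff` signature is only known from its three call arguments in the tests.

```typescript
// A minimal sketch, assuming exponential growth with a 0.5x..1.5x jitter factor.
function cappedJitteredBackoff(
  attempt: number,
  baseMs: number,
  maxBackoffMs: number = Number.POSITIVE_INFINITY,
  random: () => number = Math.random, // assumption: injectable for testing
): number {
  // random() is in [0, 1), so the jitter factor is in [0.5, 1.5).
  const raw = Math.pow(2, attempt) * baseMs * (0.5 + random());
  // Clamp so late attempts never wait longer than the cap.
  return Math.min(raw, maxBackoffMs);
}

const midpoint = () => 0.5;
const early = cappedJitteredBackoff(0, 1000, 60_000, midpoint);
const late = cappedJitteredBackoff(10, 1000, 60_000, midpoint);
```

Applying the cap after the jitter, as here, means every sufficiently late attempt waits exactly `maxBackoffMs`; applying jitter after the cap would keep some spread at the ceiling, which is a design choice the tests above do not constrain.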