barbercollie/ReadMeABook

mirror of https://github.com/kikootwo/ReadMeABook.git synced 2026-06-02 20:30:10 +00:00

kikootwo a3ba192fbd Initial commit

2026-01-28 11:41:24 -05:00

4.0 KiB

Audible Integration

Status: ✅ Implemented (Audnexus API + Web Scraping)

Audiobook metadata comes from two sources: the Audnexus API (primary source for detail pages) and Audible.com scraping (fallback for detail pages, and the source for discovery and search).

Detail Page Strategy

Primary: Audnexus API

Endpoint: https://api.audnex.us/books/{asin}
Structured JSON response (no HTML parsing needed)
Provides: title, authors, narrators, description, duration, rating, genres, cover art
Free, no API key required
~95% success rate for popular audiobooks
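A minimal sketch of the primary lookup. Only the endpoint comes from this document. The `HttpGet` type is a hypothetical injection point (the project lists axios for HTTP), and treating non-404 failures as errors is an assumption. A `null` result signals that the caller should try another source.

```typescript
// Hypothetical HTTP abstraction so the sketch can run without network access.
interface HttpResult {
  status: number;
  body: Record<string, unknown>;
}
type HttpGet = (url: string) => Promise<HttpResult>;

const AUDNEXUS_BASE = "https://api.audnex.us/books";

// Builds the detail endpoint for one ASIN.
function audnexusUrl(asin: string): string {
  return `${AUDNEXUS_BASE}/${encodeURIComponent(asin)}`;
}

// Returns the JSON body on success, or null when Audnexus has no record (404).
async function fetchFromAudnexus(
  asin: string,
  get: HttpGet
): Promise<Record<string, unknown> | null> {
  const res = await get(audnexusUrl(asin));
  if (res.status === 404) return null;
  if (res.status < 200 || res.status >= 300) {
    // Assumption: other failures are surfaced rather than silently ignored.
    throw new Error(`Audnexus request failed with status ${res.status}`);
  }
  return res.body;
}
```

Injecting `get` keeps the function testable with a stub and leaves the choice of HTTP client to the caller.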

Fallback: Audible Scraping

Used when Audnexus returns 404
Parse Audible HTML with Cheerio
Multiple selector strategies with promotional text filtering
Extract JSON-LD structured data when available
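The JSON-LD step can be illustrated without any dependencies. The sketch below uses a regular expression as a stand-in for the Cheerio-based parsing described above; the function name `extractJsonLd` is hypothetical, and no assumption is made about which fields Audible places in its JSON-LD.

```typescript
// Collects every parseable JSON-LD payload embedded in an HTML document.
// Dependency-free stand-in: the project itself parses HTML with Cheerio.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      results.push(JSON.parse(match[1] || ""));
    } catch {
      // Skip malformed blocks rather than failing the whole page.
    }
  }
  return results;
}
```

Skipping malformed blocks means one broken script tag does not discard usable data elsewhere on the page; the selector-based strategies remain available when nothing parses.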

Discovery Strategy (Popular/New/Search)

Parse Audible HTML with Cheerio
Multi-page scraping (20 items/page)
Rate limit: max 10 req/min, 1.5s delay between pages
Cache results in database (24hr TTL)
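The two limits above interact, so a small scheduling helper may make them concrete. This is a sketch under stated assumptions: the function name `nextAllowedAt` and the rolling-window interpretation of "max 10 req/min" are not taken from the project.

```typescript
const MIN_GAP_MS = 1500;   // delay between pages (from the list above)
const WINDOW_MS = 60000;   // one minute
const MAX_PER_WINDOW = 10; // max requests per minute (from the list above)

// Given the timestamps (ms, ascending) of earlier requests, returns the
// earliest time at which the next request may be sent without breaking
// either the per-page delay or the per-minute cap.
function nextAllowedAt(history: number[], now: number): number {
  let earliest = now;
  const last = history[history.length - 1];
  if (last !== undefined) {
    earliest = Math.max(earliest, last + MIN_GAP_MS);
  }
  if (history.length >= MAX_PER_WINDOW) {
    // The request MAX_PER_WINDOW places back must have left the window.
    const oldest = history[history.length - MAX_PER_WINDOW];
    if (oldest !== undefined) {
      earliest = Math.max(earliest, oldest + WINDOW_MS);
    }
  }
  return earliest;
}
```

A 1.5 s gap alone would permit 40 requests per minute, so over a long run the per-minute cap is the tighter constraint. At 20 items per page, a 200-item list needs 10 page requests, which fits inside a single one-minute window.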

Data Sources

Best Sellers: https://www.audible.com/adblbestsellers
New Releases: https://www.audible.com/newreleases
Search: https://www.audible.com/search?keywords={query}
Detail Page: https://www.audible.com/pd/{asin}
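For reference, the URL templates above could be captured in a small helper. The object name `audibleUrls` is hypothetical; the one design point it illustrates is that the search query should be URL-encoded before it is placed in the `keywords` parameter.

```typescript
const AUDIBLE_BASE = "https://www.audible.com";

// URL builders for the four sources listed above.
const audibleUrls = {
  bestSellers: `${AUDIBLE_BASE}/adblbestsellers`,
  newReleases: `${AUDIBLE_BASE}/newreleases`,
  search: (query: string): string =>
    `${AUDIBLE_BASE}/search?keywords=${encodeURIComponent(query)}`,
  detail: (asin: string): string =>
    `${AUDIBLE_BASE}/pd/${encodeURIComponent(asin)}`,
};
```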

Metadata Extracted

ASIN (Audible ID)
Title, author, narrator
Duration (minutes), release date, rating
Description, cover art URL
Genres/categories

Unified Matching (`audiobook-matcher.ts`)

Status: ✅ Production Ready

Single matching algorithm used everywhere (search, popular, new-releases, jobs).

Process:

Query DB candidates: audibleId exact match OR partial title+author match
If exact ASIN match → return immediately
Fuzzy match: weighted score of title similarity (70%) and author similarity (30%), with a 70% acceptance threshold
Return best match or null
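The process above can be sketched as follows. This is not the code in `audiobook-matcher.ts`: the `Candidate` shape, the function names, and the inclusive threshold are assumptions, and the similarity function is a hand-written Sørensen-Dice coefficient over character bigrams used as a dependency-free stand-in for the string-similarity package listed in the tech stack.

```typescript
interface Candidate {
  id: string;
  audibleId: string | null;
  title: string;
  author: string;
}

const TITLE_WEIGHT = 0.7;
const AUTHOR_WEIGHT = 0.3;
const THRESHOLD = 0.7;

// Splits a normalised string into overlapping two-character chunks.
function bigrams(s: string): string[] {
  const clean = s.toLowerCase().replace(/\s+/g, " ").trim();
  const out: string[] = [];
  for (let i = 0; i < clean.length - 1; i++) out.push(clean.substring(i, i + 2));
  return out;
}

// Sorensen-Dice coefficient over character bigrams, in the range [0, 1].
function similarity(a: string, b: string): number {
  const x = bigrams(a);
  const y = bigrams(b);
  if (x.length === 0 || y.length === 0) {
    return a.trim().toLowerCase() === b.trim().toLowerCase() ? 1 : 0;
  }
  const counts: Record<string, number> = Object.create(null);
  for (const g of x) counts[g] = (counts[g] || 0) + 1;
  let overlap = 0;
  for (const g of y) {
    if ((counts[g] || 0) > 0) {
      counts[g] = (counts[g] || 0) - 1;
      overlap++;
    }
  }
  return (2 * overlap) / (x.length + y.length);
}

// Step 2: an exact ASIN match wins immediately.
// Steps 3 and 4: otherwise return the best weighted score at or above the threshold.
function findMatch(
  target: { asin: string; title: string; author: string },
  candidates: Candidate[]
): Candidate | null {
  for (const c of candidates) {
    if (c.audibleId === target.asin) return c;
  }
  let best: Candidate | null = null;
  let bestScore = 0;
  for (const c of candidates) {
    const score =
      TITLE_WEIGHT * similarity(target.title, c.title) +
      AUTHOR_WEIGHT * similarity(target.author, c.author);
    if (score >= THRESHOLD && score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```

With these weights, an exact title match reaches the threshold on its own, while a matching author alone contributes at most 0.3 and never produces a match.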

Benefits:

Real-time matching at query time (not pre-matched)
Works regardless of job execution order
Prevents duplicate plexGuid assignments
Used by all APIs for consistency

Database-First Approach

Status: ✅ Implemented

Discovery APIs serve cached data from DB with real-time matching.

Flow:

audible_refresh job runs daily → fetches 200 popular + 200 new releases
Downloads and caches cover thumbnails locally (reduces Audible load)
Stores in DB with flags (isPopular, isNewRelease) and rankings
Cleans up unused thumbnails after sync
API routes query DB → apply real-time matching → return enriched results
Homepage is served entirely from cached DB data (no live requests to Audible at page load)
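One way to picture the "flags and rankings" step is as a merge of the two fetched lists into a single row per ASIN, since a title can be both popular and a new release. This is only a sketch: `CachedRow`, `popularRank`, `newReleaseRank`, and `buildRefreshRows` are hypothetical names, and real persistence would go through the database layer.

```typescript
interface CachedRow {
  asin: string;
  isPopular: boolean;
  popularRank: number | null;    // hypothetical column name
  isNewRelease: boolean;
  newReleaseRank: number | null; // hypothetical column name
}

// Merges two ranked ASIN lists into one row per ASIN with flags and 1-based ranks.
function buildRefreshRows(popularAsins: string[], newReleaseAsins: string[]): CachedRow[] {
  const rows: CachedRow[] = [];
  const byAsin: Record<string, CachedRow> = Object.create(null);

  const rowFor = (asin: string): CachedRow => {
    let row = byAsin[asin];
    if (!row) {
      row = { asin, isPopular: false, popularRank: null, isNewRelease: false, newReleaseRank: null };
      byAsin[asin] = row;
      rows.push(row);
    }
    return row;
  };

  popularAsins.forEach((asin, i) => {
    const row = rowFor(asin);
    row.isPopular = true;
    if (row.popularRank === null) row.popularRank = i + 1; // keep the best rank on duplicates
  });
  newReleaseAsins.forEach((asin, i) => {
    const row = rowFor(asin);
    row.isNewRelease = true;
    if (row.newReleaseRank === null) row.newReleaseRank = i + 1;
  });
  return rows;
}
```

Keeping one row per ASIN lets the popular and new-releases routes filter the same table by flag and order by the corresponding rank.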

Thumbnail Caching

Status: ✅ Implemented

Cover images cached locally to reduce external requests and improve performance.

Features:

Downloads covers during audible_refresh job
Stores in /app/cache/thumbnails (Docker volume)
Serves via /api/cache/thumbnails/[filename]
Auto-cleanup of unused thumbnails
Falls back to original URL if cache fails
24-hour browser cache headers

Implementation:

Service: src/lib/services/thumbnail-cache.service.ts
API Route: src/app/api/cache/thumbnails/[filename]/route.ts
Storage: Docker volume cache mounted at /app/cache
Filename: {asin}.{ext} (e.g., B08G9PRS1K.jpg)
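The naming scheme can be sketched as a pair of pure helpers. The directory and route prefix come from this document; the function names, the allowed-extension list, the `jpg` default, and the filename safety check are assumptions added for illustration, not the contents of `thumbnail-cache.service.ts`.

```typescript
const CACHE_DIR = "/app/cache/thumbnails";
const ROUTE_PREFIX = "/api/cache/thumbnails";
const ALLOWED_EXT = ["jpg", "jpeg", "png", "webp", "gif"]; // assumption

// Derives "{asin}.{ext}" from the cover URL, defaulting to jpg (assumption).
function thumbnailFilename(asin: string, coverUrl: string): string {
  const withoutQuery = coverUrl.replace(/[?#].*$/, "");
  const lastSegment = withoutQuery.substring(withoutQuery.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  const candidate = dot > 0 ? lastSegment.substring(dot + 1).toLowerCase() : "";
  const ext = ALLOWED_EXT.indexOf(candidate) !== -1 ? candidate : "jpg";
  return `${asin}.${ext}`;
}

// Rejects anything that is not a plain "name.ext", so a route handler
// cannot be tricked into reading outside the cache directory.
function isSafeFilename(name: string): boolean {
  return /^[A-Za-z0-9]+\.[A-Za-z0-9]+$/.test(name);
}

// Where the file lives on disk and the URL the browser requests.
function diskPath(filename: string): string {
  return `${CACHE_DIR}/${filename}`;
}
function publicUrl(filename: string): string {
  return `${ROUTE_PREFIX}/${filename}`;
}
```

Because the filename is keyed by ASIN, cleanup can compare the files on disk against the ASINs still present in the database.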

API Endpoints:

GET /api/audiobooks/popular?page=1&limit=20
GET /api/audiobooks/new-releases?page=1&limit=20

Response:

{
  success: boolean;
  audiobooks: EnrichedAudibleAudiobook[];
  count: number;
  totalCount: number;
  page: number;
  totalPages: number;
  hasMore: boolean;
  lastSync: string | null; // ISO timestamp
  message?: string; // if no data
}
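The pagination fields in this response relate to each other in a fixed way, which the sketch below spells out. It slices an in-memory array for illustration only; the real routes would presumably page in the database query. `PagedResponse`, `paginate`, and the message text are hypothetical.

```typescript
interface PagedResponse<T> {
  success: boolean;
  audiobooks: T[];
  count: number;
  totalCount: number;
  page: number;
  totalPages: number;
  hasMore: boolean;
  lastSync: string | null;
  message?: string;
}

// Builds the response envelope for one page of already-enriched rows.
function paginate<T>(
  rows: T[],
  page: number,
  limit: number,
  lastSync: string | null
): PagedResponse<T> {
  const safeLimit = Math.max(1, limit);
  const safePage = Math.max(1, page);
  const totalCount = rows.length;
  const totalPages = Math.ceil(totalCount / safeLimit);
  const start = (safePage - 1) * safeLimit;
  const audiobooks = rows.slice(start, start + safeLimit);

  const response: PagedResponse<T> = {
    success: true,
    audiobooks,
    count: audiobooks.length,
    totalCount,
    page: safePage,
    totalPages,
    hasMore: safePage < totalPages,
    lastSync,
  };
  if (totalCount === 0) {
    // Hypothetical wording for the "if no data" case.
    response.message = "No cached data yet";
  }
  return response;
}
```

Here `count` is the size of the current page and `totalCount` the size of the whole cached list, so a client can stop requesting pages as soon as `hasMore` is false.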

Data Models

interface AudibleAudiobook {
  asin: string;
  title: string;
  author: string;
  narrator?: string;
  description?: string;
  coverArtUrl?: string;
  durationMinutes?: number;
  releaseDate?: string;
  rating?: number;
  genres?: string[];
}

interface EnrichedAudibleAudiobook extends AudibleAudiobook {
  availabilityStatus: 'available' | 'requested' | 'unknown';
  isAvailable: boolean;
  plexGuid: string | null;
  dbId: string;
}

Tech Stack

axios (HTTP)
cheerio (HTML parsing)
Redis (caching, optional)
Database (PostgreSQL)
string-similarity (matching)
