Adds file hash-based matching for Audiobookshelf library items to ensure 100% accurate ASIN assignment for RMAB-organized content. Removes fuzzy matching from library availability checks, making all matching ASIN-only to eliminate false positives and race conditions. Updates database schema, processors, and matcher utilities; adds new tests and documentation for the new matching strategy. Removes obsolete scripts, Dockerfile, and related tests; updates docker-compose for test environments.
10 KiB
E-book Sidecar
Status: ✅ Implemented | Optional e-book downloads from Anna's Archive
Overview
Automatically downloads e-books from Anna's Archive to accompany audiobooks, placing them in the same folder.
Key Details
- When: Runs during file organization (after audiobook copied, after cover art)
- Matching: ASIN-based search (exact match)
- Non-blocking: Failures don't affect audiobook download
- Atomic: Either succeeds or fails gracefully
- Location: E-book placed in same directory as audiobook
- Filename:
[Title] - [Author].[format](sanitized)
Configuration
Admin Settings → E-book Sidecar tab
| Key | Default | Options | Description |
|---|---|---|---|
ebook_sidecar_enabled |
false |
true/false |
Enable feature |
ebook_sidecar_preferred_format |
epub |
epub, pdf, mobi, azw3, any |
Preferred format |
ebook_sidecar_base_url |
https://annas-archive.li |
URL | Base URL (mirror resilience) |
ebook_sidecar_flaresolverr_url |
`` (empty) | URL | FlareSolverr proxy URL (optional) |
Stored in: Configuration table (database)
FlareSolverr Integration
Anna's Archive uses Cloudflare protection which may block direct scraping requests. FlareSolverr solves this by using a headless browser to bypass the protection.
What is FlareSolverr?
- Proxy server using headless Chrome/Chromium
- Automatically solves Cloudflare challenges
- Returns HTML content after challenge is solved
- Open source: https://github.com/FlareSolverr/FlareSolverr
When to Use FlareSolverr
- Required: When e-book downloads consistently fail with no search results
- Optional: If direct requests work (depends on Cloudflare's current state)
- Recommended: For reliable, consistent downloads
Setup
- Run FlareSolverr via Docker:
docker run -d --name flaresolverr -p 8191:8191 ghcr.io/flaresolverr/flaresolverr:latest - In Admin Settings → E-book Sidecar, enter:
http://localhost:8191 - Click "Test Connection" to verify
How It Works
- Requests are routed through FlareSolverr
- FlareSolverr loads the page in headless Chrome
- If Cloudflare challenge appears, it waits for solution
- HTML is returned after page loads
- Falls back to direct requests if FlareSolverr fails
Performance Impact
- First request: ~5-10 seconds (browser startup)
- Subsequent requests: ~2-5 seconds per page
- Total time: ~15-30 seconds per e-book (vs ~5-15 without)
How It Works
Flow
- Trigger: File organization completes audiobook copy
- Check:
ebook_sidecar_enabled === 'true' - Search: Try ASIN first (if available), then fall back to title + author
- Extract MD5: First search result → MD5 hash
- Get Download Links: Find "no waitlist" slow download links
- Extract URL: Parse slow download page for actual file server URL
- Download: Stream file to audiobook directory
- Rename: Sanitize filename based on metadata
Scraping Strategy
Method 1: ASIN Search (exact match)
Search: https://annas-archive.li/search?ext=epub&lang=en&q="asin:B09TWSRMCB"
↓
MD5 Page: https://annas-archive.li/md5/[md5]
↓ (Filter: "slow partner server" links)
Slow Download: https://annas-archive.li/slow_download/[md5]/0/5
↓ (Parse for actual download URL)
File Server: http://[server-ip]:port/path/to/file.epub
Method 2: Title + Author Search (fallback)
Search: https://annas-archive.li/search?q=The+Housemaid+Freida+McFadden
&ext=epub
&content=book_nonfiction&content=book_fiction&content=book_unknown
&lang=en
↓
(Same flow as ASIN search from MD5 page onwards)
Matching Priority
- ASIN (exact match - most accurate, if available)
- Title + Author (fuzzy match with book/language filters)
Retry Logic
- Max attempts: 5 slow download links
- Timeout: 60 seconds per download
- Delays: 1.5 seconds between requests
- Retries: 3x for 5xx errors with exponential backoff
Format Support
| Format | Extension | Recommended | Notes |
|---|---|---|---|
| EPUB | .epub |
✅ Yes | Most compatible with e-readers |
.pdf |
⚠️ Sometimes | Best for fixed-layout books | |
| MOBI | .mobi |
⚠️ Legacy | Kindle (older devices) |
| AZW3 | .azw3 |
⚠️ Sometimes | Kindle (newer devices) |
| Any | [first available] |
❌ No | Downloads first match |
Recommendation: Use EPUB for maximum compatibility.
File Naming
Pattern: [Title] - [Author].[format]
Sanitization:
- Remove invalid chars:
<>:"/\|?* - Collapse multiple spaces
- Trim leading/trailing spaces and dots
- Limit to 100 characters
Examples:
The Housemaid - Freida McFadden.epubProject Hail Mary - Andy Weir.pdf
Error Handling
Graceful Failures (non-blocking):
- No ASIN available → Skip silently (log info)
- No search results → Log warning, continue audiobook
- No download links → Log warning, continue audiobook
- All downloads fail → Log error, continue audiobook
- Download timeout → Log error, continue audiobook
Never Blocks Audiobook:
- All e-book errors are non-fatal
- Audiobook organization completes regardless
- Errors logged to job events (visible in admin)
Logging
Success (with FlareSolverr):
E-book sidecar enabled, searching for e-book...
Using FlareSolverr at http://localhost:8191
Searching by ASIN: B09TWSRMCB (format: epub)...
Found via ASIN: 3b6f9c0f1665c4ba6e3214d43c37e1de
Found MD5: 3b6f9c0f1665c4ba6e3214d43c37e1de
Found 8 download link(s)
Attempting download link 1/5...
Downloading from: 93.123.118.12
E-book downloaded: The Housemaid - Freida McFadden.epub
Success (ASIN match, direct):
E-book sidecar enabled, searching for e-book...
Searching by ASIN: B09TWSRMCB (format: epub)...
Found via ASIN: 3b6f9c0f1665c4ba6e3214d43c37e1de
Found MD5: 3b6f9c0f1665c4ba6e3214d43c37e1de
Found 8 download link(s)
Attempting download link 1/5...
Downloading from: 93.123.118.12
E-book downloaded: The Housemaid - Freida McFadden.epub
Success (Title fallback):
E-book sidecar enabled, searching for e-book...
Searching by ASIN: B09TWSRMCB (format: epub)...
No results for ASIN, falling back to title + author search...
Searching by title + author: "The Housemaid" by Freida McFadden...
Found via title search: 3b6f9c0f1665c4ba6e3214d43c37e1de
Found MD5: 3b6f9c0f1665c4ba6e3214d43c37e1de
Found 8 download link(s)
E-book downloaded: The Housemaid - Freida McFadden.epub
Failure:
E-book sidecar enabled, searching for e-book...
Searching by ASIN: B09TWSRMCB (format: epub)...
No results for ASIN, falling back to title + author search...
Searching by title + author: "The Housemaid" by Freida McFadden...
No search results found for title: "The Housemaid" by Freida McFadden
E-book download failed: No search results found (tried ASIN and title+author)
Troubleshooting
E-book Not Downloaded
Cause: No matching e-book in Anna's Archive (tried ASIN and title+author) Solution: Not all audiobooks have e-book equivalents, this is expected
Cause: ASIN mismatch (Anna's Archive has different ASIN) Solution: Feature now automatically falls back to title + author search
Cause: All download links failed Solution: Check job logs for errors, may be temporary server issues
Wrong Format Downloaded
Cause: Preferred format not available Solution: Anna's Archive doesn't have that format, falls back to available format
Download Timeout
Cause: Slow file server or large file Solution: Automatic retry with next download link
Feature Not Working
Cause: Feature disabled Solution: Admin Settings → E-book Sidecar → Enable toggle
Cloudflare Blocking
Cause: Anna's Archive has Cloudflare protection enabled Solution: Configure FlareSolverr (see FlareSolverr Integration section)
Symptoms:
- No search results found
- Requests timing out
- Errors about Cloudflare challenge
FlareSolverr Not Working
Cause: FlareSolverr not running or unreachable Solution:
- Verify FlareSolverr is running:
docker ps | grep flaresolverr - Check URL is correct (usually
http://localhost:8191) - Test connection in Admin Settings
Cause: FlareSolverr timing out Solution: FlareSolverr may need more time; check container logs for errors
Security & Legal
Important Notes:
- Anna's Archive is a shadow library
- Use at your own discretion and responsibility
- Ensure compliance with local laws and regulations
- Feature is optional and disabled by default
- No API key required (web scraping)
Privacy:
- User-Agent:
ReadMeABook/1.0 (Audiobook Automation) - No tracking or analytics
- Distributed (each user scrapes for themselves)
Technical Implementation
Files:
- Service:
src/lib/services/ebook-scraper.ts - Integration:
src/lib/utils/file-organizer.ts(line 265+) - Settings API:
src/app/api/admin/settings/ebook/route.ts - FlareSolverr Test API:
src/app/api/admin/settings/ebook/test-flaresolverr/route.ts - UI:
src/app/admin/settings/page.tsx(ebook tab)
Dependencies:
- axios (HTTP requests)
- cheerio (HTML parsing)
- fs/promises (file operations)
Caching:
- MD5 lookups cached in-memory (prevents re-scraping same ASIN)
- Cache cleared on service restart
Performance
Impact:
- Network: 3-5 requests per e-book (search, MD5, slow download pages)
- Time: ~5-15 seconds per e-book (depends on file server)
- Storage: E-books typically 1-50 MB
- CPU: Minimal (streaming download)
Limitations
- Match Accuracy: Title + author search may return wrong book if title is common
- Format Availability: Depends on Anna's Archive catalog
- Download Speed: Depends on file server load
- Language: Title search filters for English books only
- Success Rate: ~70-90% (ASIN has higher accuracy, title fallback is less precise)
Future Enhancements
- ISBN-13 fallback matching (between ASIN and title search)
- Format preference priority list (try EPUB, then PDF, then MOBI)
- Per-request override (API endpoint)
- Statistics tracking (success rate, formats, match method)
- Rate limit monitoring
- Relevance scoring for title search results
Related
- File Organization - Where e-book download happens
- Settings Pages - Configuration UI
- Configuration Service - Settings storage