Add multi-source ebook search & processing

Refactor ebook flow to support multiple sources (Anna's Archive direct downloads + Prowlarr indexer search) and unify handling with existing audiobook processors. Key changes:
- search-ebook.processor: rewritten to try Anna's Archive first, then fall back to indexer search; adds Prowlarr result grouping, ranking (rankEbookTorrents), and handlers that route results to the direct-download or download-torrent flow.
- organize-files.processor: enriches audiobook/ebook metadata from AudibleCache (year, narrator), treats indexer downloads specially (seed retention), adds optional NZB cleanup/archive logic, and improves retryable error detection.
- file-organizer: organizeEbook now accepts additional metadata and an isIndexerDownload flag and supports directories vs single-file paths.
- API/UI: include request.type in admin requests API and remove the “coming soon” notice from Ebook settings tab.
- fetch-ebook route: removed blocking error for indexer-only mode so the flow can proceed when indexer search is enabled.
- Documentation: update TOC, ebook-sidecar, settings-pages, and ranking-algorithm docs to describe indexer search, unified ebook ranking, configuration, and flows.
These changes enable indexer-based ebook discovery, ranking, and downloads while preserving existing Anna's Archive behavior and reusing audiobook download processors where possible.
kikootwo
2026-02-02 12:27:54 -05:00
parent 433123fcc3
commit 9dd09ec836
11 changed files with 1142 additions and 238 deletions
+5 -2
@@ -41,10 +41,11 @@
## E-book Support (First-Class)
- **First-class ebook requests, separate tracking** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
- **Multi-source ebook downloads (Anna's Archive + Indexer Search)** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
- **Ebook indexer search (Prowlarr with ebook categories)** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md#flow-indexer-search)
- **ASIN-based matching, format selection** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
- **Ebook ranking algorithm (inverted size scoring)** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
- **Ebook ranking algorithm (unified with audiobooks)** → [phase3/ranking-algorithm.md](phase3/ranking-algorithm.md#ebook-torrent-ranking)
- **Direct HTTP downloads from Anna's Archive** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
- **Ebook delete behavior (files only)** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
- **Ebook delete behavior (files only, torrents seed)** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md#delete-behavior)
- **Ebook settings (3-section UI)** → [settings-pages.md](settings-pages.md#e-book-sidecar)
- **Indexer categories (audiobook/ebook tabs)** → [settings-pages.md](settings-pages.md#indexer-categories-tabbed)
@@ -116,7 +117,9 @@
**"How does e-book support work?"** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
**"How do I enable e-book downloads?"** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md), [settings-pages.md](settings-pages.md#e-book-sidecar)
**"How do I configure ebook sources (Anna's Archive vs Indexer)?"** → [settings-pages.md](settings-pages.md#e-book-sidecar)
**"How does ebook indexer search work?"** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md#flow-indexer-search)
**"How do I configure ebook categories per indexer?"** → [settings-pages.md](settings-pages.md#indexer-categories-tabbed)
**"How does ebook ranking work?"** → [phase3/ranking-algorithm.md](phase3/ranking-algorithm.md#ebook-torrent-ranking)
**"What happens when I delete an ebook request?"** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md#delete-behavior)
**"Why do ebook requests have an orange badge?"** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md#ui-representation)
**"How do scheduled jobs work?"** → [backend/services/scheduler.md](backend/services/scheduler.md)
+66 -36
@@ -1,9 +1,9 @@
# E-book Support
**Status:** ✅ Implemented | First-class ebook requests with multi-source support (Anna's Archive + future Indexer Search)
**Status:** ✅ Implemented | First-class ebook requests with multi-source support (Anna's Archive + Indexer Search)
## Overview
Ebooks are first-class citizens in RMAB, with their own request type, tracking, and UI representation. When an audiobook request completes, an ebook request is automatically created (if a source is enabled). Supports multiple sources: Anna's Archive (direct HTTP) and Indexer Search (via Prowlarr, coming soon).
Ebooks are first-class citizens in RMAB, with their own request type, tracking, and UI representation. When an audiobook request completes, an ebook request is automatically created (if a source is enabled). Two sources are supported: Anna's Archive (direct HTTP) and Indexer Search (via Prowlarr with ebook categories).
## Key Details
@@ -14,15 +14,35 @@ Ebooks are first-class citizens in RMAB, with their own request type, tracking,
- **UI Badge:** Orange (#f16f19) ebook badge to distinguish from audiobooks
- **Separate Tracking:** Own progress, status, and error handling
### Source Priority
1. **Anna's Archive** (if enabled) - Direct HTTP downloads
   - Searched first via ASIN, then title + author
   - Uses FlareSolverr if configured (Cloudflare bypass)
2. **Indexer Search** (if enabled, and no Anna's Archive result)
   - Searches Prowlarr with ebook categories (default: 7020)
   - Ranks using unified ranking algorithm with ebook-specific scoring
   - Downloads via qBittorrent (torrents) or SABnzbd (Usenet)
3. **Both disabled** → Ebook downloads disabled entirely
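
The priority order above can be sketched roughly as follows. This is a minimal illustration, not the processor's actual code: `EbookSourceSettings`, `resolveEbookSources`, and `searchWithFallback` are hypothetical names, and only the two setting keys named in the comments come from the documentation.

```typescript
// Hypothetical types and helpers for illustration only.
type EbookSource = 'annas_archive' | 'indexer';

interface EbookSourceSettings {
  annasArchiveEnabled: boolean;  // mirrors `ebook_annas_archive_enabled`
  indexerSearchEnabled: boolean; // mirrors `ebook_indexer_search_enabled`
}

// Returns the enabled sources in priority order.
// An empty array means ebook downloads are disabled entirely.
function resolveEbookSources(settings: EbookSourceSettings): EbookSource[] {
  const sources: EbookSource[] = [];
  if (settings.annasArchiveEnabled) sources.push('annas_archive');
  if (settings.indexerSearchEnabled) sources.push('indexer');
  return sources;
}

// Tries each source in order and stops at the first one that yields a result.
// `search` is supplied by the caller and returns null when nothing was found.
async function searchWithFallback<T>(
  sources: EbookSource[],
  search: (source: EbookSource) => Promise<T | null>,
): Promise<{ source: EbookSource; result: T } | null> {
  for (const source of sources) {
    const result = await search(source);
    if (result !== null) return { source, result };
  }
  return null;
}
```

With both sources enabled, `resolveEbookSources` lists Anna's Archive first, so the indexer is only queried when the Anna's Archive search returns nothing.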
### Flow (Anna's Archive)
1. Audiobook organization completes
2. Ebook request created automatically (if Anna's Archive enabled)
2. Ebook request created automatically (if source enabled)
3. `search_ebook` job searches Anna's Archive
4. `start_direct_download` downloads via HTTP
5. `organize_files` copies to audiobook folder
6. Request marked as `downloaded` (terminal)
7. "Available" notification sent
### Flow (Indexer Search)
1. Audiobook organization completes
2. Ebook request created automatically (if source enabled)
3. `search_ebook` job searches indexers (if Anna's Archive failed/disabled)
4. `download_torrent` job adds to qBittorrent/SABnzbd (reuses audiobook processor)
5. `monitor_download` tracks progress
6. `organize_files` copies to audiobook folder
7. Request marked as `downloaded` (terminal)
8. Torrent left to seed (respects seeding limits)
### Configuration
**Admin Settings → E-book Sidecar tab** (3 sections)
@@ -37,7 +57,7 @@ Ebooks are first-class citizens in RMAB, with their own request type, tracking,
#### Section 2: Indexer Search
| Key | Default | Description |
|-----|---------|-------------|
| `ebook_indexer_search_enabled` | `false` | Enable Indexer Search (not yet implemented) |
| `ebook_indexer_search_enabled` | `false` | Enable Indexer Search via Prowlarr |
*Note: Ebook categories are configured per-indexer in Settings → Indexers → Edit Indexer → EBook tab*
@@ -46,11 +66,6 @@ Ebooks are first-class citizens in RMAB, with their own request type, tracking,
|-----|---------|---------|-------------|
| `ebook_sidecar_preferred_format` | `epub` | `epub, pdf, mobi, azw3, any` | Preferred format |
### Source Priority
- If **Anna's Archive** is enabled → Use Anna's Archive (current behavior)
- If **only Indexer Search** is enabled → Log "not yet implemented", skip gracefully
- If **both disabled** → Ebook downloads disabled entirely
## Database Schema
**Request model additions:**
@@ -66,25 +81,36 @@ childRequests Request[] @relation("EbookParent")
## Job Processors
### search_ebook
- Searches Anna's Archive by ASIN first, then title + author
- Creates download history record with `downloadClient: 'direct'`
- Triggers `start_direct_download` job
- Searches Anna's Archive first (if enabled), then indexers (if enabled)
- Anna's Archive: Creates download history with `downloadClient: 'direct'`, triggers `start_direct_download`
- Indexer: Triggers `download_torrent` job (reuses audiobook processor)
### start_direct_download
- Downloads file via HTTP with progress tracking
- Tries multiple slow download links on failure
- Triggers `organize_files` on success
### monitor_direct_download
- Future use for async download monitoring
- Currently, most tracking happens in start_direct_download
### download_torrent (shared with audiobooks)
- Routes to qBittorrent (torrents) or SABnzbd (Usenet)
- Creates download history with indexer metadata
- Triggers `monitor_download` job
## Ranking Algorithm
## Ranking Algorithm (Indexer Results)
Ebook ranking (for future multi-source support):
- **Format Score:** 40 pts (exact match) to 10 pts (different format)
- **Size Score:** 30 pts (inverse - smaller files preferred)
- **Source Score:** 30 pts (Anna's Archive gets full score)
Ebook torrent ranking uses the unified algorithm with ebook-specific scoring:
| Component | Points | Description |
|-----------|--------|-------------|
| **Title/Author Match** | 60 pts | Reuses audiobook matching logic (word coverage, author presence) |
| **Format Match** | 10 pts | 10 pts if matches preferred format, 0 otherwise |
| **Size Quality** | 15 pts | Inverted: < 5MB = 15pts, 5-15MB = 10pts, 15-20MB = 5pts |
| **Seeder Count** | 15 pts | Logarithmic scaling (same as audiobooks) |
**Filtering:**
- Files > 20 MB are filtered out (too large for ebooks)
- Dual threshold: base score >= 50 AND final score >= 50
**Bonus System:** Same as audiobooks (indexer priority, flag bonuses)
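
As a rough sketch of the dual-threshold filter described above: the `ScoredTorrent` shape and the `selectCandidates` helper are hypothetical, and sorting by final score is an assumption. How bonuses turn a base score into a final score is not shown here.

```typescript
// Hypothetical shape: baseScore is the 0-100 component total,
// finalScore is the score after indexer-priority and flag bonuses.
interface ScoredTorrent {
  title: string;
  baseScore: number;
  finalScore: number;
}

const MIN_SCORE = 50; // threshold stated in the docs for both scores

// A torrent must clear the threshold on BOTH scores to be kept.
function passesDualThreshold(t: ScoredTorrent): boolean {
  return t.baseScore >= MIN_SCORE && t.finalScore >= MIN_SCORE;
}

// Keeps only passing torrents, best final score first (assumed ordering).
function selectCandidates(torrents: ScoredTorrent[]): ScoredTorrent[] {
  return torrents
    .filter(passesDualThreshold)
    .sort((a, b) => b.finalScore - a.finalScore);
}
```

The point of checking the base score separately is that bonuses alone cannot lift a poor title/author match over the line.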
## Delete Behavior
@@ -94,6 +120,7 @@ Ebook ranking (for future multi-source support):
- Does NOT delete from backend library (Plex/ABS)
- Does NOT clear audiobook availability linkage
- Soft-deletes the ebook request record
- Torrents left to seed (respects seeding limits)
## UI Representation
@@ -124,7 +151,7 @@ Configure URL in Admin Settings → E-book Sidecar: `http://localhost:8191`
- Subsequent: ~2-5 seconds per page
- Total: ~15-30 seconds per ebook
## Scraping Strategy
## Scraping Strategy (Anna's Archive)
### Method 1: ASIN Search (exact match)
```
@@ -161,17 +188,19 @@ Search: https://annas-archive.li/search?q=Title+Author&ext=epub&lang=en
## Technical Files
**Processors:**
- `src/lib/processors/search-ebook.processor.ts`
- `src/lib/processors/direct-download.processor.ts`
- `src/lib/processors/search-ebook.processor.ts` - Multi-source search
- `src/lib/processors/direct-download.processor.ts` - Anna's Archive downloads
- `src/lib/processors/download-torrent.processor.ts` - Indexer downloads (shared)
- `src/lib/processors/organize-files.processor.ts` (ebook branch)
**Services:**
- `src/lib/services/ebook-scraper.ts`
- `src/lib/services/ebook-scraper.ts` - Anna's Archive scraping
- `src/lib/services/job-queue.service.ts` (ebook job types)
**Utils:**
- `src/lib/utils/file-organizer.ts` (`organizeEbook` method)
- `src/lib/utils/ranking-algorithm.ts` (`rankEbooks` function)
- `src/lib/utils/ranking-algorithm.ts` (`rankEbookTorrents` function)
- `src/lib/utils/indexer-grouping.ts` (supports `'ebook'` type)
**UI:**
- `src/components/requests/RequestCard.tsx` (ebook badge)
@@ -183,17 +212,10 @@ Search: https://annas-archive.li/search?q=Title+Author&ext=epub&lang=en
| Format | Extension | Recommended |
|--------|-----------|-------------|
| EPUB | `.epub` | Yes |
| PDF | `.pdf` | ⚠️ Sometimes |
| MOBI | `.mobi` | ⚠️ Legacy |
| AZW3 | `.azw3` | ⚠️ Sometimes |
## Limitations
1. Indexer Search not yet implemented (settings ready, search stubbed)
2. Title search may return wrong book for common titles
3. Download speed depends on file server load
4. English books only (title search filter)
| EPUB | `.epub` | Yes |
| PDF | `.pdf` | Sometimes |
| MOBI | `.mobi` | Legacy |
| AZW3 | `.azw3` | Sometimes |
## Indexer Categories
@@ -203,8 +225,16 @@ Indexer configuration supports separate category arrays for audiobooks and ebook
Categories are configured per-indexer via the tabbed interface in the Edit Indexer modal.
## Limitations
1. Title search may return the wrong book for common titles
2. Download speed depends on file server load (Anna's Archive)
3. English books only (title search filter for Anna's Archive)
4. Format detection from torrent titles may be imprecise
## Related
- [File Organization](../phase3/file-organization.md) - Ebook organization
- [Settings Pages](../settings-pages.md) - Configuration UI
- [Ranking Algorithm](../phase3/ranking-algorithm.md) - Ebook ranking
- [Request Deletion](../admin-features/request-deletion.md) - Delete behavior
- [Prowlarr Integration](../phase3/prowlarr.md) - Indexer search
+74
@@ -286,6 +286,80 @@ const ranked = rankTorrents(torrents, audiobook, {
return ranked; // User can see torrents without author info
```
## Ebook Torrent Ranking
The ranking algorithm also supports ebook torrents from indexers with ebook-specific scoring.
### Unified Code Architecture
Ebook ranking **reuses** the following from audiobook ranking:
- `scoreMatch()` - Title/author matching (60 pts)
- `scoreSeeders()` - Seeder count scoring (15 pts)
- Bonus modifier system (indexer priority, flag bonuses)
- Dual threshold filtering (base >= 50, final >= 50)
### Ebook-Specific Scoring
**Format Match (10 pts max)**
- 10 pts if torrent format matches preferred format
- 0 pts otherwise (no partial credit)
- Format detected from torrent title keywords: `.epub`, `.pdf`, `.mobi`, `.azw3`, etc.
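
One way the title-based detection could look, as a sketch under stated assumptions: `detectEbookFormat` and `scoreFormat` are hypothetical names, the format list is limited to the four formats named in the docs, and the `any` preference is not handled because its scoring is not described here.

```typescript
// Formats named in the documentation; the real keyword list may be longer.
const KNOWN_FORMATS = ['epub', 'pdf', 'mobi', 'azw3'] as const;
type EbookFormat = (typeof KNOWN_FORMATS)[number];

// Returns the first known format that appears as a standalone token in the
// title (e.g. ".epub", "[EPUB]", " epub "), or null if none is found.
function detectEbookFormat(title: string): EbookFormat | null {
  const lower = title.toLowerCase();
  for (const format of KNOWN_FORMATS) {
    const pattern = new RegExp(`(^|[^a-z0-9])${format}([^a-z0-9]|$)`);
    if (pattern.test(lower)) return format;
  }
  return null;
}

// 10 pts on an exact match with the preferred format, 0 otherwise.
function scoreFormat(title: string, preferredFormat: string): number {
  const detected = detectEbookFormat(title);
  return detected !== null && detected === preferredFormat.toLowerCase() ? 10 : 0;
}
```

Matching on token boundaries avoids false positives from words that merely contain a format name, though titles that omit the format entirely still score 0.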
**Size Quality (15 pts max, INVERTED)**
- < 5 MB: 15 pts (optimal for ebooks)
- 5-15 MB: 10 pts (may have images)
- 15-20 MB: 5 pts (large but acceptable)
- > 20 MB: **Filtered out** (too large for ebooks)
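
The inverted size tiers can be expressed as a small function. This is a sketch: `scoreEbookSize` is a hypothetical name, 1 MB is assumed to be 1024 × 1024 bytes, and the handling of the exact 15 MB boundary (placed in the 10-point tier) is an assumption, since the tiers above do not say which side it falls on.

```typescript
const MB = 1024 * 1024; // assumption: binary megabytes

// Returns the size score, or null when the file is over 20 MB
// and should be filtered out instead of scored.
function scoreEbookSize(sizeBytes: number): number | null {
  const sizeMb = sizeBytes / MB;
  if (sizeMb > 20) return null; // too large for an ebook
  if (sizeMb < 5) return 15;    // optimal
  if (sizeMb <= 15) return 10;  // may include images
  return 5;                     // large but acceptable
}
```

Returning `null` rather than 0 keeps "filtered out" distinct from "scored poorly", so callers cannot accidentally rank an oversized file.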
### Ebook vs Audiobook Comparison
| Component | Audiobook | Ebook |
|-----------|-----------|-------|
| Title/Author | 60 pts (reused) | 60 pts (reused) |
| Format | 10 pts (M4B > M4A > MP3) | 10 pts (match = 10, else 0) |
| Size | 15 pts (larger = better) | 15 pts (smaller = better) |
| Seeders | 15 pts (reused) | 15 pts (reused) |
| Size Filter | < 20 MB filtered | > 20 MB filtered |
### Ebook Interface
```typescript
interface EbookTorrentRequest {
  title: string;
  author: string;
  preferredFormat: string; // 'epub', 'pdf', 'mobi', etc.
}

interface RankEbookTorrentsOptions {
  indexerPriorities?: Map<number, number>;
  flagConfigs?: IndexerFlagConfig[];
  requireAuthor?: boolean; // Default: true
}

function rankEbookTorrents(
  torrents: TorrentResult[],
  ebook: EbookTorrentRequest,
  options?: RankEbookTorrentsOptions
): RankedEbookTorrent[];
```
### Ebook Usage Example
```typescript
// Ebook search from indexers
const ranked = rankEbookTorrents(prowlarrResults, {
  title: 'Project Hail Mary',
  author: 'Andy Weir',
  preferredFormat: 'epub',
}, {
  indexerPriorities,
  flagConfigs,
  requireAuthor: true,
});

// Top result is safe to auto-download; undefined if nothing passed filtering
const bestEbook = ranked[0];
```
## Tech Stack
- string-similarity (fuzzy matching)
+4 -4
@@ -85,7 +85,7 @@ src/app/admin/settings/
- FlareSolverr URL (optional, for Cloudflare bypass)
2. **Indexer Search Section**
- Enable toggle for indexer-based ebook search (not yet implemented)
- Enable toggle for indexer-based ebook search via Prowlarr
- Hint directing users to Indexers tab for category configuration
3. **General Settings Section** (visible when any source enabled)
@@ -95,14 +95,14 @@ src/app/admin/settings/
| Key | Default | Description |
|-----|---------|-------------|
| `ebook_annas_archive_enabled` | `false` | Enable Anna's Archive |
| `ebook_indexer_search_enabled` | `false` | Enable Indexer Search (stubbed) |
| `ebook_indexer_search_enabled` | `false` | Enable Indexer Search via Prowlarr |
| `ebook_sidecar_preferred_format` | `epub` | Preferred format |
| `ebook_sidecar_base_url` | `https://annas-archive.li` | Anna's Archive mirror |
| `ebook_sidecar_flaresolverr_url` | `` | FlareSolverr URL |
**Behavior:**
- If Anna's Archive enabled → Downloads work (current implementation)
- If only Indexer Search enabled → Gracefully logs "not yet implemented"
- If Anna's Archive enabled → Searches Anna's Archive first
- If Indexer Search enabled → Falls back to indexer search if Anna's Archive fails/disabled
- If both disabled → Ebook downloads completely off
## Indexer Categories (Tabbed)