diff --git a/documentation/TABLEOFCONTENTS.md b/documentation/TABLEOFCONTENTS.md
index d10cc6b..bb677cb 100644
--- a/documentation/TABLEOFCONTENTS.md
+++ b/documentation/TABLEOFCONTENTS.md
@@ -36,6 +36,11 @@
 - **Database caching, real-time matching** → [integrations/audible.md](integrations/audible.md)
 - **Book covers API for login page** → [frontend/pages/login.md](frontend/pages/login.md)
 
+## E-book Sidecar
+- **Optional e-book downloads from Anna's Archive** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
+- **ASIN-based matching, format selection** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
+- **Non-blocking, atomic failures** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
+
 ## Automation Pipeline
 - **Full pipeline overview** → [phase3/README.md](phase3/README.md)
 - **Search via Prowlarr (torrents + NZBs)** → [phase3/prowlarr.md](phase3/prowlarr.md)
@@ -76,6 +81,8 @@
 **"How do torrent downloads work?"** → [phase3/qbittorrent.md](phase3/qbittorrent.md), [backend/services/jobs.md](backend/services/jobs.md)
 **"How do Usenet/NZB downloads work?"** → [phase3/sabnzbd.md](phase3/sabnzbd.md), [backend/services/jobs.md](backend/services/jobs.md)
 **"How does Plex matching work?"** → [integrations/plex.md](integrations/plex.md)
+**"How does the e-book sidecar work?"** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md)
+**"How do I enable e-book downloads?"** → [integrations/ebook-sidecar.md](integrations/ebook-sidecar.md), [settings-pages.md](settings-pages.md)
 **"How do scheduled jobs work?"** → [backend/services/scheduler.md](backend/services/scheduler.md)
 **"How do I configure external services?"** → [setup-wizard.md](setup-wizard.md), [settings-pages.md](settings-pages.md)
 **"What's the database schema?"** → [backend/database.md](backend/database.md)
diff --git a/documentation/integrations/ebook-sidecar.md b/documentation/integrations/ebook-sidecar.md
new file mode 100644
index 0000000..6e96dbc
--- 
/dev/null
+++ b/documentation/integrations/ebook-sidecar.md
@@ -0,0 +1,307 @@
+# E-book Sidecar
+
+**Status:** ✅ Implemented | Optional e-book downloads from Anna's Archive
+
+## Overview
+Automatically downloads e-books from Anna's Archive to accompany audiobooks, placing them in the same folder.
+
+## Key Details
+- **When:** Runs during file organization (after audiobook copied, after cover art)
+- **Matching:** ASIN-based search (exact match), falling back to title + author search
+- **Non-blocking:** Failures don't affect audiobook download
+- **Atomic:** Either succeeds or fails gracefully
+- **Location:** E-book placed in same directory as audiobook
+- **Filename:** `[Title] - [Author].[format]` (sanitized)
+
+## Configuration
+
+**Admin Settings → E-book Sidecar tab**
+
+| Key | Default | Options | Description |
+|-----|---------|---------|-------------|
+| `ebook_sidecar_enabled` | `false` | `true/false` | Enable feature |
+| `ebook_sidecar_preferred_format` | `epub` | `epub, pdf, mobi, azw3, any` | Preferred format |
+| `ebook_sidecar_base_url` | `https://annas-archive.li` | URL | Base URL (mirror resilience) |
+| `ebook_sidecar_flaresolverr_url` | `` (empty) | URL | FlareSolverr proxy URL (optional) |
+
+**Stored in:** `Configuration` table (database)
+
+## FlareSolverr Integration
+
+Anna's Archive uses Cloudflare protection, which may block direct scraping requests. FlareSolverr works around this by loading pages in a headless browser that solves the challenge.
+
+### What is FlareSolverr?
+- Proxy server using headless Chrome/Chromium
+- Automatically solves Cloudflare challenges
+- Returns HTML content after challenge is solved
+- Open source: https://github.com/FlareSolverr/FlareSolverr
+
+### When to Use FlareSolverr
+- **Required:** When e-book downloads consistently fail with no search results
+- **Optional:** If direct requests work (depends on Cloudflare's current state)
+- **Recommended:** For reliable, consistent downloads
+
+### Setup
+1. 
Run FlareSolverr via Docker: + ```bash + docker run -d --name flaresolverr -p 8191:8191 ghcr.io/flaresolverr/flaresolverr:latest + ``` +2. In Admin Settings → E-book Sidecar, enter: `http://localhost:8191` +3. Click "Test Connection" to verify + +### How It Works +1. Requests are routed through FlareSolverr +2. FlareSolverr loads the page in headless Chrome +3. If Cloudflare challenge appears, it waits for solution +4. HTML is returned after page loads +5. Falls back to direct requests if FlareSolverr fails + +### Performance Impact +- **First request:** ~5-10 seconds (browser startup) +- **Subsequent requests:** ~2-5 seconds per page +- **Total time:** ~15-30 seconds per e-book (vs ~5-15 without) + +## How It Works + +### Flow +1. **Trigger:** File organization completes audiobook copy +2. **Check:** `ebook_sidecar_enabled === 'true'` +3. **Search:** Try ASIN first (if available), then fall back to title + author +4. **Extract MD5:** First search result → MD5 hash +5. **Get Download Links:** Find "no waitlist" slow download links +6. **Extract URL:** Parse slow download page for actual file server URL +7. **Download:** Stream file to audiobook directory +8. **Rename:** Sanitize filename based on metadata + +### Scraping Strategy + +**Method 1: ASIN Search (exact match)** +``` +Search: https://annas-archive.li/search?ext=epub&q="asin:B09TWSRMCB" + ↓ +MD5 Page: https://annas-archive.li/md5/[md5] + ↓ (Filter: "slow partner server" links) +Slow Download: https://annas-archive.li/slow_download/[md5]/0/5 + ↓ (Parse for actual download URL) +File Server: http://[server-ip]:port/path/to/file.epub +``` + +**Method 2: Title + Author Search (fallback)** +``` +Search: https://annas-archive.li/search?q=The+Housemaid+Freida+McFadden + &ext=epub + &content=book_nonfiction&content=book_fiction&content=book_unknown + &lang=en + ↓ +(Same flow as ASIN search from MD5 page onwards) +``` + +### Matching Priority +1. **ASIN** (exact match - most accurate, if available) +2. 
**Title + Author** (fuzzy match with book/language filters)
+
+### Retry Logic
+- **Max attempts:** 5 slow download links
+- **Timeout:** 60 seconds per download
+- **Delays:** 1.5 seconds between requests
+- **Retries:** 3x for 5xx errors with exponential backoff
+
+## Format Support
+
+| Format | Extension | Recommended | Notes |
+|--------|-----------|-------------|-------|
+| EPUB | `.epub` | ✅ Yes | Most compatible with e-readers |
+| PDF | `.pdf` | ⚠️ Sometimes | Best for fixed-layout books |
+| MOBI | `.mobi` | ⚠️ Legacy | Kindle (older devices) |
+| AZW3 | `.azw3` | ⚠️ Sometimes | Kindle (newer devices) |
+| Any | `[first available]` | ❌ No | Downloads first match |
+
+**Recommendation:** Use EPUB for maximum compatibility.
+
+## File Naming
+
+**Pattern:** `[Title] - [Author].[format]`
+
+**Sanitization:**
+- Remove invalid chars: `<>:"/\|?*`
+- Collapse multiple spaces
+- Trim leading/trailing spaces and dots
+- Limit to 100 characters
+
+**Examples:**
+- `The Housemaid - Freida McFadden.epub`
+- `Project Hail Mary - Andy Weir.pdf`
+
+## Error Handling
+
+**Graceful Failures (non-blocking):**
+- No ASIN available → Skip ASIN search, use title + author search (log info)
+- No search results → Log warning, continue audiobook
+- No download links → Log warning, continue audiobook
+- All downloads fail → Log error, continue audiobook
+- Download timeout → Log error, continue audiobook
+
+**Never Blocks Audiobook:**
+- All e-book errors are non-fatal
+- Audiobook organization completes regardless
+- Errors logged to job events (visible in admin)
+
+## Logging
+
+**Success (with FlareSolverr):**
+```
+E-book sidecar enabled, searching for e-book...
+Using FlareSolverr at http://localhost:8191
+Searching by ASIN: B09TWSRMCB (format: epub)...
+Found via ASIN: 3b6f9c0f1665c4ba6e3214d43c37e1de
+Found MD5: 3b6f9c0f1665c4ba6e3214d43c37e1de
+Found 8 download link(s)
+Attempting download link 1/5...
+Downloading from: 93.123.118.12 +E-book downloaded: The Housemaid - Freida McFadden.epub +``` + +**Success (ASIN match, direct):** +``` +E-book sidecar enabled, searching for e-book... +Searching by ASIN: B09TWSRMCB (format: epub)... +Found via ASIN: 3b6f9c0f1665c4ba6e3214d43c37e1de +Found MD5: 3b6f9c0f1665c4ba6e3214d43c37e1de +Found 8 download link(s) +Attempting download link 1/5... +Downloading from: 93.123.118.12 +E-book downloaded: The Housemaid - Freida McFadden.epub +``` + +**Success (Title fallback):** +``` +E-book sidecar enabled, searching for e-book... +Searching by ASIN: B09TWSRMCB (format: epub)... +No results for ASIN, falling back to title + author search... +Searching by title + author: "The Housemaid" by Freida McFadden... +Found via title search: 3b6f9c0f1665c4ba6e3214d43c37e1de +Found MD5: 3b6f9c0f1665c4ba6e3214d43c37e1de +Found 8 download link(s) +E-book downloaded: The Housemaid - Freida McFadden.epub +``` + +**Failure:** +``` +E-book sidecar enabled, searching for e-book... +Searching by ASIN: B09TWSRMCB (format: epub)... +No results for ASIN, falling back to title + author search... +Searching by title + author: "The Housemaid" by Freida McFadden... 
+No search results found for title: "The Housemaid" by Freida McFadden
+E-book download failed: No search results found (tried ASIN and title+author)
+```
+
+## Troubleshooting
+
+### E-book Not Downloaded
+
+**Cause:** No matching e-book in Anna's Archive (tried ASIN and title+author)
+**Solution:** None needed; not every audiobook has an e-book equivalent, so this is expected
+
+**Cause:** ASIN mismatch (Anna's Archive has different ASIN)
+**Solution:** The feature automatically falls back to title + author search
+
+**Cause:** All download links failed
+**Solution:** Check job logs for errors; the cause may be temporary server issues
+
+### Wrong Format Downloaded
+
+**Cause:** Preferred format not available
+**Solution:** Anna's Archive doesn't have that format, so the download falls back to an available format
+
+### Download Timeout
+
+**Cause:** Slow file server or large file
+**Solution:** Automatic retry with next download link
+
+### Feature Not Working
+
+**Cause:** Feature disabled
+**Solution:** Admin Settings → E-book Sidecar → Enable toggle
+
+### Cloudflare Blocking
+
+**Cause:** Anna's Archive has Cloudflare protection enabled
+**Solution:** Configure FlareSolverr (see FlareSolverr Integration section)
+
+**Symptoms:**
+- No search results found
+- Requests timing out
+- Errors about Cloudflare challenge
+
+### FlareSolverr Not Working
+
+**Cause:** FlareSolverr not running or unreachable
+**Solution:**
+1. Verify FlareSolverr is running: `docker ps | grep flaresolverr`
+2. Check URL is correct (usually `http://localhost:8191`)
+3. 
Test connection in Admin Settings
+
+**Cause:** FlareSolverr timing out
+**Solution:** FlareSolverr may need more time; check container logs for errors
+
+## Security & Legal
+
+**Important Notes:**
+- Anna's Archive is a shadow library
+- Use at your own discretion and responsibility
+- Ensure compliance with local laws and regulations
+- Feature is optional and disabled by default
+- No API key required (web scraping)
+
+**Privacy:**
+- User-Agent: `ReadMeABook/1.0 (Audiobook Automation)`
+- No tracking or analytics
+- Distributed (each user scrapes for themselves)
+
+## Technical Implementation
+
+**Files:**
+- Service: `src/lib/services/ebook-scraper.ts`
+- Integration: `src/lib/utils/file-organizer.ts` (line 265+)
+- Settings API: `src/app/api/admin/settings/ebook/route.ts`
+- FlareSolverr Test API: `src/app/api/admin/settings/ebook/test-flaresolverr/route.ts`
+- UI: `src/app/admin/settings/page.tsx` (ebook tab)
+
+**Dependencies:**
+- axios (HTTP requests)
+- cheerio (HTML parsing)
+- fs/promises (file operations)
+
+**Caching:**
+- MD5 lookups cached in-memory (prevents re-scraping same ASIN)
+- Cache cleared on service restart
+
+## Performance
+
+**Impact:**
+- **Network:** 3-5 requests per e-book (search, MD5, slow download pages)
+- **Time:** ~5-15 seconds per e-book with direct requests, ~15-30 seconds via FlareSolverr (depends on file server)
+- **Storage:** E-books typically 1-50 MB
+- **CPU:** Minimal (streaming download)
+
+## Limitations
+
+1. **Match Accuracy:** Title + author search may return wrong book if title is common
+2. **Format Availability:** Depends on Anna's Archive catalog
+3. **Download Speed:** Depends on file server load
+4. **Language:** Title search filters for English books only
+5. 
**Success Rate:** ~70-90% (ASIN has higher accuracy, title fallback is less precise) + +## Future Enhancements + +- ISBN-13 fallback matching (between ASIN and title search) +- Format preference priority list (try EPUB, then PDF, then MOBI) +- Per-request override (API endpoint) +- Statistics tracking (success rate, formats, match method) +- Rate limit monitoring +- Relevance scoring for title search results + +## Related +- [File Organization](../phase3/file-organization.md) - Where e-book download happens +- [Settings Pages](../settings-pages.md) - Configuration UI +- [Configuration Service](../backend/services/config.md) - Settings storage diff --git a/documentation/phase3/ranking-algorithm.md b/documentation/phase3/ranking-algorithm.md index 7cde914..5d3edc5 100644 --- a/documentation/phase3/ranking-algorithm.md +++ b/documentation/phase3/ranking-algorithm.md @@ -30,11 +30,16 @@ Evaluates and scores torrents to automatically select best audiobook download. - Example: "We Are Legion (We Are Bob)" tries both full title and "We Are Legion" - Handles torrents that include subtitle AND those that omit it - Complete title match requirements (both must be true): - - No significant words BEFORE matched title (prevents "This Inevitable Ruin Dungeon Crawler Carl, Book 7") - - Followed by metadata markers: " by", " [", " -", " (", " {", " :", "," + - **Acceptable prefix** (any of these): + - No significant words before title (clean match) + - Title preceded by metadata separator (` - `, `: `, `—`) — handles "Author - Series - 01 - Title" + - Author name appears in prefix — handles "Author Name - Title" + - **Acceptable suffix**: Followed by metadata markers: " by", " [", " -", " (", " {", " :", "," or end of string - Complete match → 35 pts -- Title has prefix/suffix words OR continues with more words → fuzzy similarity (partial credit) -- Prevents series confusion: "The Housemaid" vs "The Housemaid's Secret", "Dungeon Crawler Carl" vs "Book 7" +- Unstructured prefix 
(words without separators) → fuzzy similarity (partial credit) + - Prevents: "This Inevitable Ruin Dungeon Crawler Carl" matching "Dungeon Crawler Carl" +- Suffix continues with non-metadata → fuzzy similarity (partial credit) + - Prevents: "The Housemaid's Secret" matching "The Housemaid" - No substring match → fuzzy similarity (best score from full or required title) **Stage 3: Author Matching (0-15 pts)** diff --git a/src/app/admin/components/RecentRequestsTable.tsx b/src/app/admin/components/RecentRequestsTable.tsx index b1e05ce..14c1e5a 100644 --- a/src/app/admin/components/RecentRequestsTable.tsx +++ b/src/app/admin/components/RecentRequestsTable.tsx @@ -102,6 +102,9 @@ export function RecentRequestsTable({ requests }: RecentRequestsTableProps) { await mutate('/api/admin/requests/recent'); await mutate('/api/admin/metrics'); + // Invalidate audiobook caches to update request status on home/search pages + await mutate((key) => typeof key === 'string' && key.includes('/api/audiobooks')); + // Close dialog setShowDeleteConfirm(false); setSelectedRequest(null); diff --git a/src/app/admin/settings/page.tsx b/src/app/admin/settings/page.tsx index d695c4a..0dd85dc 100644 --- a/src/app/admin/settings/page.tsx +++ b/src/app/admin/settings/page.tsx @@ -82,6 +82,12 @@ interface Settings { mediaDir: string; metadataTaggingEnabled: boolean; }; + ebook: { + enabled: boolean; + preferredFormat: string; + baseUrl: string; + flaresolverrUrl: string; + }; } interface PendingUser { @@ -127,7 +133,7 @@ export default function AdminSettings() { const [message, setMessage] = useState<{ type: 'success' | 'error'; text: string } | null>( null ); - const [activeTab, setActiveTab] = useState<'library' | 'auth' | 'prowlarr' | 'download' | 'paths' | 'account' | 'bookdate'>('library'); + const [activeTab, setActiveTab] = useState<'library' | 'auth' | 'prowlarr' | 'download' | 'paths' | 'ebook' | 'account' | 'bookdate'>('library'); // Password change form state const [passwordForm, 
setPasswordForm] = useState({ @@ -147,6 +153,14 @@ export default function AdminSettings() { const [testingBookdate, setTestingBookdate] = useState(false); const [clearingBookdateSwipes, setClearingBookdateSwipes] = useState(false); + // FlareSolverr testing state + const [testingFlaresolverr, setTestingFlaresolverr] = useState(false); + const [flaresolverrTestResult, setFlaresolverrTestResult] = useState<{ + success: boolean; + message: string; + responseTime?: number; + } | null>(null); + useEffect(() => { fetchSettings(); fetchCurrentUser(); @@ -460,6 +474,73 @@ export default function AdminSettings() { } }; + const handleSaveEbookSettings = async () => { + if (!settings) return; + + setSaving(true); + setMessage(null); + + try { + const response = await fetchWithAuth('/api/admin/settings/ebook', { + method: 'PUT', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ + enabled: settings.ebook?.enabled || false, + format: settings.ebook?.preferredFormat || 'epub', + baseUrl: settings.ebook?.baseUrl || 'https://annas-archive.li', + flaresolverrUrl: settings.ebook?.flaresolverrUrl || '', + }), + }); + + if (!response.ok) { + throw new Error('Failed to save e-book settings'); + } + + setMessage({ type: 'success', text: 'E-book sidecar settings saved successfully!' }); + // Update original settings to reflect the saved state + setOriginalSettings(JSON.parse(JSON.stringify(settings))); + setTimeout(() => setMessage(null), 3000); + } catch (error) { + setMessage({ + type: 'error', + text: error instanceof Error ? 
error.message : 'Failed to save e-book settings', + }); + } finally { + setSaving(false); + } + }; + + const testFlaresolverrConnection = async () => { + if (!settings?.ebook?.flaresolverrUrl) { + setFlaresolverrTestResult({ + success: false, + message: 'Please enter a FlareSolverr URL first', + }); + return; + } + + setTestingFlaresolverr(true); + setFlaresolverrTestResult(null); + + try { + const response = await fetchWithAuth('/api/admin/settings/ebook/test-flaresolverr', { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ url: settings.ebook.flaresolverrUrl }), + }); + + const result = await response.json(); + setFlaresolverrTestResult(result); + } catch (error) { + setFlaresolverrTestResult({ + success: false, + message: error instanceof Error ? error.message : 'Test failed', + }); + } finally { + setTestingFlaresolverr(false); + } + }; + const testPlexConnection = async () => { if (!settings) return; @@ -924,6 +1005,7 @@ export default function AdminSettings() { { id: 'prowlarr', label: 'Indexers', icon: '🔍' }, { id: 'download', label: 'Download Client', icon: '⬇️' }, { id: 'paths', label: 'Paths', icon: '📁' }, + { id: 'ebook', label: 'E-book Sidecar', icon: '📖' }, { id: 'bookdate', label: 'BookDate', icon: '📚' }, ...(isLocalAdmin ? [{ id: 'account', label: 'Account', icon: '🔒' }] : []), ]; @@ -1915,6 +1997,201 @@ export default function AdminSettings() { )} + {/* E-book Sidecar Tab */} + {activeTab === 'ebook' && ( +
+
+

+ E-book Sidecar +

+

+ Automatically download e-books from Anna's Archive to accompany your audiobooks. + E-books are placed in the same folder as the audiobook files. +

+
+ + {/* Enable Toggle */} +
+
+ { + setSettings({ + ...settings, + ebook: { ...settings.ebook, enabled: e.target.checked }, + }); + }} + className="mt-1 h-5 w-5 rounded border-gray-300 text-blue-600 focus:ring-blue-500" + /> +
+ +

+ When enabled, the system will search for e-books matching your audiobook's ASIN + and download them to the same folder. +

+
+
+
+ + {/* Format Selection */} + {settings.ebook?.enabled && ( +
+ + +

+ EPUB is recommended for most e-readers. "Any format" will download the first available format. +

+
+ )} + + {/* Base URL (Advanced) */} + {settings.ebook?.enabled && ( +
+ + { + setSettings({ + ...settings, + ebook: { ...settings.ebook, baseUrl: e.target.value }, + }); + }} + placeholder="https://annas-archive.li" + className="font-mono" + /> +

+ Change this if the primary Anna's Archive mirror is unavailable. +

+
+ )} + + {/* FlareSolverr (Optional - for Cloudflare bypass) */} + {settings.ebook?.enabled && ( +
+
+ +
+ { + setSettings({ + ...settings, + ebook: { ...settings.ebook, flaresolverrUrl: e.target.value }, + }); + setFlaresolverrTestResult(null); + }} + placeholder="http://localhost:8191" + className="font-mono flex-1" + /> + +
+

+ FlareSolverr helps bypass Cloudflare protection on Anna's Archive. + Leave empty if not needed. +

+ {flaresolverrTestResult && ( +
+ {flaresolverrTestResult.success ? '✓ ' : '✗ '} + {flaresolverrTestResult.message} +
+ )} +
+ {!settings.ebook?.flaresolverrUrl && ( +
+

+ Note: Without FlareSolverr, e-book downloads may fail if Anna's Archive + has Cloudflare protection enabled. Success rates are typically lower without it. +

+
+ )} +
+ )} + + {/* Info Box */} +
+

+ How it works +

+ +
+ + {/* Warning Box */} +
+

+ ⚠️ Important Note +

+

+ Anna's Archive is a shadow library. Use of this feature is at your own discretion and responsibility. + Ensure compliance with your local laws and regulations. +

+
+ + {/* Save Button */} +
+ +
+
+ )} + {/* BookDate Tab */} {activeTab === 'bookdate' && (
@@ -2738,8 +3015,8 @@ export default function AdminSettings() { )}
- {/* Footer - Hide for Account tab */} - {activeTab !== 'account' && activeTab !== 'bookdate' && ( + {/* Footer - Hide for Account, BookDate, and E-book tabs (they have their own save buttons) */} + {activeTab !== 'account' && activeTab !== 'bookdate' && activeTab !== 'ebook' && (
)} + + {/* Processing Badge - show when status is 'downloaded' */} + {audiobook.requestStatus === 'downloaded' && ( +
+ + + + + Processing +
+ )}
{/* Content */} @@ -162,12 +173,17 @@ export function AudiobookCard({ } // Check if book is requested and in progress (non-re-requestable statuses) - const inProgressStatuses = ['pending', 'awaiting_search', 'searching', 'downloading', 'processing', 'awaiting_import']; + const inProgressStatuses = ['pending', 'awaiting_search', 'searching', 'downloading', 'processing', 'downloaded', 'awaiting_import']; if (audiobook.isRequested && audiobook.requestStatus && inProgressStatuses.includes(audiobook.requestStatus)) { - // Show who requested it - const buttonText = audiobook.requestedByUsername - ? `Requested by ${audiobook.requestedByUsername}` - : 'Requested'; + // Special text for 'downloaded' status (waiting for Plex scan) + let buttonText; + if (audiobook.requestStatus === 'downloaded') { + buttonText = 'Processing...'; + } else { + buttonText = audiobook.requestedByUsername + ? `Requested by ${audiobook.requestedByUsername}` + : 'Requested'; + } return (