mirror of https://github.com/kikootwo/ReadMeABook.git synced 2026-06-02 20:30:10 +00:00

Files

T

kikootwo 4b90b35748 Add Transmission/NZBGet and per-client paths and much more

Extend multi-download-client support to include Transmission and NZBGet and introduce per-client custom download paths. Adds protocol mapping and new client types, Transmission/NZBGet integration services, API CRUD and validation changes, UI components/modal updates and live path previews, and manager routing by protocol. Includes DB migrations (download_path on download_history, interactive_search_access on users), schema updates, and related processor/service fixes and tests to ensure backward compatibility and proper path resolution.

2026-02-09 19:45:43 -05:00

15 KiB

Raw Blame History

Intelligent Ranking Algorithm

Status: ✅ Implemented | Comprehensive edge case test coverage Tests: tests/utils/ranking-algorithm.test.ts (80+ test cases)

Evaluates and scores torrents to automatically select best audiobook download.

Test Coverage

Comprehensive edge case testing includes:

✅ Parenthetical/bracketed content handling (4 tests)
✅ Structured metadata prefix validation (5 tests)
✅ Suffix validation (5 tests)
✅ Multi-author handling (6 tests)
✅ Bonus modifiers (indexer priority + flags, 7 tests)
✅ Tiebreaker sorting (2 tests)
✅ Word coverage edge cases (4 tests)
✅ Format detection (5 tests)
✅ Author presence check (10 tests)
✅ Context-aware filtering (3 tests)
✅ API compatibility (2 tests)
✅ CamelCase and punctuation separator handling (7 tests)

Tested edge cases prevent regressions from previous tweaks:

"We Are Legion (We Are Bob)" matching with/without subtitle
"This Inevitable Ruin Dungeon Crawler Carl" NOT matching "Dungeon Crawler Carl"
"The Housemaid's Secret" NOT matching "The Housemaid"
Multiple author splitting and role filtering
Flag bonus stacking and case-insensitive matching
Tiebreaker sorting by publish date
"Project Hail Mary" (no author) NOT matching when Andy Weir required (automatic mode)
All results shown in interactive mode regardless of author
Middle initials, name order, and role filtering for author matching

Scoring Criteria (100 points max)

1. Title/Author Match (60 pts max) - MOST IMPORTANT

Pre-Processing: Text Normalization

All titles and author names are normalized before matching
CamelCase splitting: "TheCorrespondent" → "the correspondent"
Punctuation to spaces: "Twelve.Months-Jim" → "twelve months jim"
Preserves apostrophes: "O'Brien" remains "o'brien"
Handles common indexer naming patterns (NZB, torrent scene releases)

Examples of normalization:

"VirginaEvans TheCorrespondent" → "virgina evans the correspondent"
"Twelve.Months-Jim.Butcher" → "twelve months jim butcher"
"Author_Name-Book.Title.2024" → "author name book title 2024"

Multi-Stage Matching:

Stage 1: Word Coverage Filter (MANDATORY)

Extracts significant words from request (filters stop words: "the", "a", "an", "of", "on", "in", "at", "by", "for")
Parenthetical/bracketed content is optional: Content in () [] {} treated as subtitle (may be omitted from torrents)
- "We Are Legion (We Are Bob)" → Required: ["we", "are", "legion"], Optional: ["bob"]
- "Title [Series Name]" → Required: ["title"], Optional: ["series", "name"]
- "Book Title {Extra Info}" → Required: ["book", "title"], Optional: ["extra", "info"]
Calculates coverage: % of required words found in torrent title
Hard requirement: 80%+ coverage of required words or automatic 0 score

Stage 1.5: Author Presence Check (CONTEXT-AWARE)

Automatic mode (requireAuthor: true - default): At least ONE author must be present with high confidence
Interactive mode (requireAuthor: false): Check disabled, all results shown to user
High confidence = any of:
1. Exact substring match: "dennis e. taylor" in torrent
2. High fuzzy similarity (≥ 0.85): handles spacing/punctuation
3. Core components present: First name + Last name within 30 chars
Handles variations:
- Middle initials: "Dennis E. Taylor" ↔ "Dennis Taylor"
- Name order: "Brandon Sanderson" ↔ "Sanderson, Brandon"
- Multiple authors: Only ONE needs to match (OR logic)
- Filters roles: "translator", "narrator" ignored
If check fails in automatic mode → automatic 0 score
Prevents wrong-author matches: Stops "Project Hail Mary" (no author) from matching request for Andy Weir

Edge Cases - Coverage Examples:

"The Wild Robot on the Island" → ["wild", "robot", "island"]
- ✅ "The Wild Robot on the Island" → 3/3 = 100% → PASSES
- ❌ "The Wild Robot" → 2/3 = 67% → REJECTED
"We Are Legion (We Are Bob)" → Required: ["we", "are", "legion"]
- ✅ "Dennis E. Taylor - Bobiverse - 01 - We Are Legion" → 3/3 = 100% → PASSES
- ✅ "We Are Legion (We Are Bob)" → 3/3 = 100% → PASSES
"Harry Potter and the Philosopher Stone" → ["harry", "potter", "philosopher", "stone"] (stop words filtered)
- ✅ "Harry Potter Philosopher Stone" → 4/4 = 100% → PASSES
- ❌ "Harry Potter" → 2/4 = 50% → REJECTED
Prevents wrong series books from matching while handling common subtitle patterns

Stage 2: Title Matching (0-45 pts)

Only scored if Stage 1 passes
Tries full title first, then required title (without parentheses) if no match
- Example: "We Are Legion (We Are Bob)" tries both full title and "We Are Legion"
- Handles torrents that include subtitle AND those that omit it
Complete title match requirements (both must be true):
- Acceptable prefix (any of these):
  - No significant words before title (clean match)
  - Title preceded by metadata separator (-, : , —) — handles "Author - Series - 01 - Title"
  - Author name appears in prefix — handles "Author Name - Title"
- Acceptable suffix: Followed by metadata markers: " by", " [", " -", " (", " {", " :", "," or end of string
  - Also accepts author name in suffix (e.g., "Title AuthorName Year")
Complete match → 45 pts
Unstructured prefix (words without separators) → fuzzy similarity (partial credit)
Suffix continues with non-metadata → fuzzy similarity (partial credit)
No substring match → fuzzy similarity (best score from full or required title)

Edge Cases - Prefix Validation:

✅ "Brandon Sanderson - Mistborn - 01 - The Final Empire" (structured metadata prefix)
✅ "Brandon Sanderson The Way of Kings" (author name in prefix)
✅ "Series Name: Book Title" (colon separator)
✅ "Author Name — Book Title" (em-dash separator)
❌ "This Inevitable Ruin Dungeon Crawler Carl" → REJECTED for "Dungeon Crawler Carl" (unstructured words before title)

Edge Cases - Suffix Validation:

✅ "The Great Book by Author Name" (metadata marker " by")
✅ "Book Title [Unabridged] (2024)" (bracketed metadata)
✅ "Book Title John Smith 2024" (author name in suffix)
✅ "Author - Book Title" (title at end of string)
❌ "The Housemaid's Secret - Freida McFadden" → REJECTED for "The Housemaid" (suffix continues with "'s Secret")

Stage 3: Author Matching (0-15 pts)

Exact substring match → proportional credit
No exact match → fuzzy similarity (partial credit)
Splits authors on delimiters (comma, &, "and", " - ")
Filters out roles ("translator", "narrator")
Order-independent, no structure assumptions
Ensures correct book is selected over wrong book with better format

Edge Cases - Multi-Author Handling:

✅ "Jane Doe, John Smith" → splits on comma
✅ "Jane Doe & John Smith" → splits on ampersand
✅ "Jane Doe and John Smith" → splits on "and"
✅ "Jane Doe, translator" → filters out "translator" role
✅ "Jane Doe, narrator" → filters out "narrator" role
Proportional credit: If 1 of 3 authors matches → 5 pts (1/3 × 15)
Proportional credit: If 2 of 3 authors match → 10 pts (2/3 × 15)
Full credit: If all authors match → 15 pts

2. Format Quality (10 pts max)

M4B with chapters: 10
M4B without chapters: 9
FLAC: 7 (lossless audio)
M4A: 6
MP3: 4
Other: 1

3. Seeder Count (15 pts max)

Formula: Math.min(15, Math.log10(seeders + 1) * 6)
1 seeder: 0pts, 10 seeders: 6pts, 100 seeders: 12pts, 1000+: 15pts
Note: Usenet/NZB results without seeders get full 15 pts (centralized availability)

Bonus Points System

Extensible multiplicative bonus system for external quality factors:

Indexer Priority Bonus (configurable 1-25, default: 10)

Formula: bonusPoints = baseScore × (priority / 25)
Priority 10/25 (40%) → 95 base score → +38 bonus = 133 final
Priority 20/25 (80%) → 95 base score → +76 bonus = 171 final
Priority 25/25 (100%) → 95 base score → +95 bonus = 190 final
Ensures high-quality torrent from low-priority indexer beats low-quality from high-priority
Bonus scales with quality (better torrents get more benefit from priority)

Indexer Flag Bonus (configurable -100% to +100%, default: 0%)

Formula: bonusPoints = baseScore × (modifier / 100)
Positive modifiers reward desired flags (e.g., "Freeleech" at +50%)
- +50% modifier → 85 base score → +42.5 bonus = 127.5 final
Negative modifiers penalize undesired flags (e.g., "Unwanted" at -60%)
- -60% modifier → 85 base score → -51 penalty = 34 final
Dual threshold filtering:
- Base score must be ≥ 50 (quality minimum)
- Final score must be ≥ 50 (not disqualified by negative bonuses)
- Negative bonuses can disqualify otherwise good torrents
Flag extraction from Prowlarr API:
- downloadVolumeFactor: 0 → "Freeleech"
- downloadVolumeFactor: <1 → "Partial Freeleech"
- uploadVolumeFactor: >1 → "Double Upload"
Case-insensitive, whitespace-trimmed matching
Universal across all indexers (not indexer-specific)
Multiple flag bonuses stack (additive)

Edge Cases - Flag Matching:

✅ "FREELEECH" matches config "freeleech" (case-insensitive)
✅ " Freeleech " matches config " Freeleech " (whitespace-trimmed)
✅ Multiple flags: ["Freeleech", "Double Upload"] → both bonuses applied
Example stacking: Freeleech (+50%) + Double Upload (+25%) on 80 base score
- Freeleech bonus: 80 × 0.5 = +40
- Double Upload bonus: 80 × 0.25 = +20
- Total bonus: +60 points
- Final score: 80 + 60 = 140

Future Modifiers (planned):

User preferences
Custom rules

Final Score Calculation:

Calculate base score (0-100) using standard criteria
Calculate bonus modifiers (indexer priority, flag bonuses, etc.)
Sum bonus points
Final score = base score + bonus points
Apply dual threshold filter:
- Base score ≥ 50 (quality minimum)
- Final score ≥ 50 (not disqualified by negative bonuses)
Sort by final score (descending), then publish date (descending)

Tiebreaker Sorting

When multiple torrents have identical final scores:

Secondary sort: Publish date descending (newest first)
Ensures latest uploads are preferred when quality is equal
Example: 3 torrents with 171 final score → newest upload ranks #1

Edge Cases - Tiebreaker Examples:

✅ Same score, different dates:
- Torrent A: Score 85, published 2024-06-01 → Ranks #1
- Torrent B: Score 85, published 2023-01-01 → Ranks #2
❌ Different scores, ignore date:
- Torrent A: Score 95, published 2020-01-01 → Ranks #1 (better match wins despite older date)
- Torrent B: Score 75, published 2024-01-01 → Ranks #2

Interface

interface IndexerFlagConfig {
  name: string;         // Flag name (e.g., "Freeleech")
  modifier: number;     // -100 to 100 (percentage)
}

interface RankTorrentsOptions {
  indexerPriorities?: Map<number, number>;  // indexerId -> priority (1-25)
  flagConfigs?: IndexerFlagConfig[];        // Flag bonus configurations
  requireAuthor?: boolean;                  // Enforce author check (default: true)
}

interface BonusModifier {
  type: 'indexer_priority' | 'indexer_flag' | 'custom';
  value: number;        // Multiplier (e.g., 0.4 for 40%)
  points: number;       // Calculated bonus points
  reason: string;       // Human-readable explanation
}

interface TorrentResult {
  // ... existing fields
  flags?: string[];     // Extracted flags from Prowlarr API
}

interface RankedTorrent extends TorrentResult {
  score: number;              // Base score (0-100)
  bonusModifiers: BonusModifier[];
  bonusPoints: number;        // Sum of all bonus points
  finalScore: number;         // score + bonusPoints
  rank: number;
  breakdown: {
    formatScore: number;
    seederScore: number;
    matchScore: number;
    totalScore: number;      // Same as score
    notes: string[];
  };
}

// New API (recommended)
function rankTorrents(
  torrents: TorrentResult[],
  audiobook: AudiobookRequest,
  options?: RankTorrentsOptions
): RankedTorrent[];

// Legacy API (backwards compatible)
function rankTorrents(
  torrents: TorrentResult[],
  audiobook: AudiobookRequest,
  indexerPriorities?: Map<number, number>,
  flagConfigs?: IndexerFlagConfig[]
): RankedTorrent[];

Usage Examples

Automatic selection (strict author filtering):

// Background job - safe auto-download
const ranked = rankTorrents(torrents, audiobook, {
  indexerPriorities,
  flagConfigs,
  requireAuthor: true  // Default - prevents wrong authors
});

const topResult = ranked[0];  // Safe to auto-download

Interactive search (show all results):

// User browsing - let user decide
const ranked = rankTorrents(torrents, audiobook, {
  indexerPriorities,
  flagConfigs,
  requireAuthor: false  // Show everything, including edge cases
});

return ranked;  // User can see torrents without author info

Ebook Torrent Ranking

The ranking algorithm also supports ebook torrents from indexers with ebook-specific scoring.

Unified Code Architecture

Ebook ranking reuses the following from audiobook ranking:

scoreMatch() - Title/author matching (60 pts)
scoreSeeders() - Seeder count scoring (15 pts)
Bonus modifier system (indexer priority, flag bonuses)
Dual threshold filtering (base >= 50, final >= 50)

Ebook-Specific Scoring

Format Match (10 pts max)

10 pts if torrent format matches preferred format
0 pts otherwise (no partial credit)
Format detected from torrent title keywords: .epub, .pdf, .mobi, .azw3, etc.

Size Quality (15 pts max, INVERTED)

< 5 MB: 15 pts (optimal for ebooks)
5-15 MB: 10 pts (may have images)
15-20 MB: 5 pts (large but acceptable)
20 MB: Filtered out (too large for ebooks)

Ebook vs Audiobook Comparison

Component	Audiobook	Ebook
Title/Author	60 pts (reused)	60 pts (reused)
Format	10 pts (M4B > M4A > MP3)	10 pts (match = 10, else 0)
Size	15 pts (larger = better)	15 pts (smaller = better)
Seeders	15 pts (reused)	15 pts (reused)
Size Filter	< 20 MB filtered	> 20 MB filtered

Ebook Interface

interface EbookTorrentRequest {
  title: string;
  author: string;
  preferredFormat: string;  // 'epub', 'pdf', 'mobi', etc.
}

interface RankEbookTorrentsOptions {
  indexerPriorities?: Map<number, number>;
  flagConfigs?: IndexerFlagConfig[];
  requireAuthor?: boolean;  // Default: true
}

function rankEbookTorrents(
  torrents: TorrentResult[],
  ebook: EbookTorrentRequest,
  options?: RankEbookTorrentsOptions
): RankedEbookTorrent[];

Ebook Usage Example

// Ebook search from indexers
const ranked = rankEbookTorrents(prowlarrResults, {
  title: 'Project Hail Mary',
  author: 'Andy Weir',
  preferredFormat: 'epub',
}, {
  indexerPriorities,
  flagConfigs,
  requireAuthor: true,
});

const bestEbook = ranked[0];  // Safe to auto-download

Tech Stack

string-similarity (fuzzy matching)
Regex for format detection

15 KiB Raw Blame History Unescape Escape