Voice Fingerprinting

Flag character pairs whose dialogue patterns are nearly identical — they "sound" the same.

What It Does

Extracts dialogue attributed to named characters and builds a voice profile for each, measuring:

  • Average sentence length
  • Contraction rate (e.g., "can't" vs. "cannot")
  • Question frequency
  • Vocabulary overlap (Jaccard similarity)

Character pairs with near-identical profiles are flagged — they may be indistinguishable to readers.
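
As a rough illustration of the four measurements listed above, the following Python sketch builds a profile from a list of dialogue sentences. It is not the tool's actual implementation: the names `VoiceProfile`, `build_profile`, and `jaccard` are hypothetical, the contraction pattern is a simplification (it also counts possessives such as "Sarah's"), and the contraction rate is assumed to be contractions per word.

```python
import re
from dataclasses import dataclass

# Assumed contraction pattern: "n't" or an apostrophe plus a common suffix.
# Simplification: possessives like "Sarah's" are counted as contractions too.
CONTRACTION = re.compile(r"\b\w+(?:n't|'(?:s|re|ve|ll|d|m))\b", re.IGNORECASE)
WORD = re.compile(r"[a-z']+")


@dataclass
class VoiceProfile:
    avg_sentence_length: float  # mean words per sentence
    contraction_rate: float     # contractions / total words (assumed definition)
    question_rate: float        # sentences ending in "?" / total sentences
    vocabulary: frozenset       # distinct lowercased words


def build_profile(sentences):
    """Build a voice profile from one character's dialogue sentences."""
    if not sentences:
        raise ValueError("need at least one sentence")
    words = [w for s in sentences for w in WORD.findall(s.lower())]
    contractions = sum(len(CONTRACTION.findall(s)) for s in sentences)
    questions = sum(1 for s in sentences if s.rstrip().endswith("?"))
    return VoiceProfile(
        avg_sentence_length=len(words) / len(sentences),
        contraction_rate=contractions / len(words) if words else 0.0,
        question_rate=questions / len(sentences),
        vocabulary=frozenset(words),
    )


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Two characters with a high `jaccard` score on their vocabularies and close values on the other three fields would be candidates for flagging.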

Why It Matters

Every character should sound different. A teenager and a professor shouldn't use the same sentence length, vocabulary, and speech patterns. Distinct voices let readers identify speakers even without dialogue tags. When two characters "sound" identical, readability and characterization both suffer.

What Gets Flagged

Near-Identical Voice Profiles

Severity: Information

Example (flagged):

Voice fingerprint: Sarah and Marcus have near-identical dialogue patterns (avg length: 8 vs 9, contraction rate: 12% vs 14%, question rate: 20% vs 18%) — consider differentiating their voices

Why: Both characters use similar sentence lengths, contraction rates, and question frequencies. A reader wouldn't be able to tell who's speaking without tags.

How to differentiate:

  • Give one character longer, more formal sentences
  • Have one character use contractions while the other avoids them
  • Let one character ask more questions while the other makes declarations
  • Use distinct vocabulary or speech patterns (slang, technical jargon, etc.)

Requirements

  • At least 2 characters with 5+ attributed dialogue sentences each
  • Character attribution is detected via dialogue tag patterns ("said Sarah", "Marcus asked", etc.)
  • Pronouns (he, she, they, etc.) are never treated as character names, so dialogue tagged only with a pronoun is not attributed to anyone
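
The requirements above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's real pattern: the verb list in `SPEECH_VERBS`, the pronoun set, and the function names `attribute_dialogue` and `eligible_speakers` are all hypothetical, and only straight double quotes with single-word names are handled.

```python
import re
from collections import defaultdict

PRONOUNS = {"he", "she", "they", "it", "i", "we", "you"}  # assumed list
SPEECH_VERBS = r"(?:said|asked|replied|whispered|shouted)"  # assumed list

# Matches either  "..." said Sarah  or  "..." Marcus asked
TAG = re.compile(
    r'"(?P<quote>[^"]+)"\s*,?\s*'
    r"(?:" + SPEECH_VERBS + r"\s+(?P<after>[A-Za-z]+)"
    r"|(?P<before>[A-Za-z]+)\s+" + SPEECH_VERBS + r")"
)


def attribute_dialogue(text):
    """Map each named speaker to the list of sentences attributed to them."""
    by_speaker = defaultdict(list)
    for match in TAG.finditer(text):
        name = match.group("after") or match.group("before")
        if name.lower() in PRONOUNS:
            continue  # pronouns are not character names
        quote = match.group("quote").strip()
        # Split the quoted text into sentences at ., ! or ? followed by space.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", quote) if s]
        by_speaker[name].extend(sentences)
    return dict(by_speaker)


def eligible_speakers(by_speaker, min_sentences=5):
    """Keep speakers with enough dialogue; require at least two of them."""
    kept = {n: s for n, s in by_speaker.items() if len(s) >= min_sentences}
    return kept if len(kept) >= 2 else {}
```

A simple pattern like this can misalign on untagged quotes, which is one reason a real implementation would likely need more careful quote pairing.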

Configuration

No configuration options.

Technical Details

  • Source: prose-craft
  • Scope: Document-level (compares all character pairs)
  • Method: Dialogue extraction via regex; profile comparison using a weighted distance metric over sentence length, contraction rate, question rate, and vocabulary Jaccard index
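
One way such a weighted distance could work is sketched below in Python. The weights, the threshold, the normalization of sentence length, and the dictionary keys are all assumptions made for illustration; this page does not state the values or formula the tool uses.

```python
from itertools import combinations

# Hypothetical weights and threshold; the real values are not documented here.
WEIGHTS = {"length": 0.3, "contraction": 0.2, "question": 0.2, "vocabulary": 0.3}
THRESHOLD = 0.15


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def voice_distance(p, q):
    """Weighted distance in [0, 1]; 0 means the two profiles are identical."""
    longest = max(p["avg_len"], q["avg_len"])
    # Relative difference keeps sentence length on the same 0..1 scale
    # as the rate differences.
    length_diff = abs(p["avg_len"] - q["avg_len"]) / longest if longest else 0.0
    contraction_diff = abs(p["contraction_rate"] - q["contraction_rate"])
    question_diff = abs(p["question_rate"] - q["question_rate"])
    vocab_diff = 1.0 - jaccard(p["vocab"], q["vocab"])
    return (
        WEIGHTS["length"] * length_diff
        + WEIGHTS["contraction"] * contraction_diff
        + WEIGHTS["question"] * question_diff
        + WEIGHTS["vocabulary"] * vocab_diff
    )


def flag_similar_pairs(profiles, threshold=THRESHOLD):
    """Compare every character pair and return those below the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if voice_distance(profiles[a], profiles[b]) < threshold
    ]
```

Because every pair is compared, the number of comparisons grows quadratically with the number of eligible characters, which is consistent with the document-level scope noted above.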