Voice Fingerprinting

Flag character pairs whose dialogue patterns are nearly identical — they "sound" the same.

What It Does

Extracts dialogue attributed to named characters and builds a voice profile for each, measuring:

Average sentence length
Contraction rate (e.g., "can't" vs. "cannot")
Question frequency
Vocabulary overlap (Jaccard similarity)

Character pairs with near-identical profiles are flagged — they may be indistinguishable to readers.
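As a rough illustration of the four measurements, the sketch below builds a profile from a list of dialogue sentences. It is not the tool's actual implementation: the names `VoiceProfile`, `build_profile`, and `jaccard` are hypothetical, sentence length is assumed to be measured in words, and the contraction rate is assumed to be contractions per word. The apostrophe test is deliberately crude and also counts possessives such as "Sarah's".

```python
import re
from dataclasses import dataclass

# Words are runs of letters with an optional apostrophe part ("can't").
# Only straight apostrophes are handled in this sketch.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


@dataclass(frozen=True)
class VoiceProfile:
    avg_sentence_length: float  # words per sentence (assumed unit)
    contraction_rate: float     # contractions per word (assumed definition)
    question_rate: float        # share of sentences ending in "?"
    vocabulary: frozenset       # distinct lowercase words


def build_profile(sentences):
    """Summarize one character's dialogue sentences as a VoiceProfile."""
    tokenized = [WORD.findall(s.lower()) for s in sentences]
    words = [w for tokens in tokenized for w in tokens]
    n_sentences = max(len(sentences), 1)  # avoid division by zero
    n_words = max(len(words), 1)
    contractions = sum(1 for w in words if "'" in w)
    questions = sum(1 for s in sentences if s.rstrip(" \"'").endswith("?"))
    return VoiceProfile(
        avg_sentence_length=len(words) / n_sentences,
        contraction_rate=contractions / n_words,
        question_rate=questions / n_sentences,
        vocabulary=frozenset(words),
    )


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Two characters who share every word score a Jaccard similarity of 1.0, and two with no words in common score 0.0.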

Why It Matters

Every character should sound different. A teenager and a professor shouldn't use the same sentence length, vocabulary, and speech patterns. Distinct voices let readers identify speakers even without dialogue tags. When two characters "sound" identical, readability and characterization both suffer.

What Gets Flagged

Near-Identical Voice Profiles

Severity: Information

Example (flagged):

Voice fingerprint: Sarah and Marcus have near-identical dialogue patterns (avg length: 8 vs 9, contraction rate: 12% vs 14%, question rate: 20% vs 18%) — consider differentiating their voices

Why: Both characters use similar sentence lengths, contraction rates, and question frequencies. A reader wouldn't be able to tell who's speaking without tags.

How to differentiate:

Give one character longer, more formal sentences
Have one character use contractions while the other avoids them
Let one character ask more questions while the other makes declarations
Use distinct vocabulary or speech patterns (slang, technical jargon, etc.)

Requirements

At least 2 characters with 5+ attributed dialogue sentences each
Character attribution is detected via dialogue tag patterns ("said Sarah", "Marcus asked", etc.)
Pronouns (he, she, they, etc.) are never treated as character names
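The requirements above could be approximated as follows. This is a minimal sketch, not the rule's real patterns: the verb list, the pronoun list, the regular expressions, and the function names `attribute_dialogue` and `eligible_speakers` are all assumptions made for illustration. It only recognizes straight double quotes followed directly by a tag.

```python
import re
from collections import defaultdict

# Assumed lists; the rule's real verb and pronoun lists are not documented here.
PRONOUNS = {"he", "she", "they", "it", "i", "we", "you"}
VERBS = r"(?:said|asked|replied)"
NAME = r"([A-Z][a-z]+)"

# Matches a quote followed by 'said Sarah' or 'Marcus asked'.
TAGGED_QUOTE = re.compile(
    r'"([^"]+)",?\s+(?:' + VERBS + r"\s+" + NAME + "|" + NAME + r"\s+" + VERBS + ")"
)
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def attribute_dialogue(text):
    """Map each named speaker to the dialogue sentences attributed to them."""
    by_speaker = defaultdict(list)
    for match in TAGGED_QUOTE.finditer(text):
        quote = match.group(1)
        name = match.group(2) or match.group(3)
        if name.lower() in PRONOUNS:
            continue  # pronouns are not character names
        sentences = SENTENCE_SPLIT.split(quote.strip())
        by_speaker[name].extend(s for s in sentences if s)
    return dict(by_speaker)


def eligible_speakers(by_speaker, min_sentences=5):
    """Keep only speakers with enough attributed sentences to profile."""
    return {n: s for n, s in by_speaker.items() if len(s) >= min_sentences}
```

With this sketch, a quote tagged only by a lowercase pronoun ("she said") is left unattributed, and a character with fewer than five attributed sentences is excluded from comparison.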

Configuration

No configuration options.

Technical Details

Source: prose-craft
Scope: Document-level (compares all character pairs)
Method: Dialogue extraction via regex, then pairwise profile comparison using a weighted distance metric over sentence length, contraction rate, question rate, and vocabulary Jaccard index
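One way such a weighted distance might look is sketched below. The weights, the flagging threshold, the sentence-length normalizer, and the names `voice_distance` and `similar_pairs` are all hypothetical; the actual values used by the rule are not documented here. Profiles are plain dictionaries to keep the example self-contained.

```python
from itertools import combinations

# Hypothetical values chosen for illustration only.
WEIGHTS = {"length": 0.25, "contraction": 0.25, "question": 0.25, "vocabulary": 0.25}
THRESHOLD = 0.15   # pairs at or below this distance are flagged
MAX_LENGTH = 40.0  # scales the sentence-length gap into the range 0..1


def voice_distance(a, b):
    """Weighted distance between two profiles; 0.0 means identical voices."""
    length_gap = min(abs(a["avg_length"] - b["avg_length"]) / MAX_LENGTH, 1.0)
    contraction_gap = abs(a["contraction_rate"] - b["contraction_rate"])
    question_gap = abs(a["question_rate"] - b["question_rate"])
    union = a["vocabulary"] | b["vocabulary"]
    overlap = len(a["vocabulary"] & b["vocabulary"]) / len(union) if union else 0.0
    vocabulary_gap = 1.0 - overlap  # high Jaccard overlap means a small gap
    return (
        WEIGHTS["length"] * length_gap
        + WEIGHTS["contraction"] * contraction_gap
        + WEIGHTS["question"] * question_gap
        + WEIGHTS["vocabulary"] * vocabulary_gap
    )


def similar_pairs(profiles, threshold=THRESHOLD):
    """Compare every pair of characters and return the pairs to flag."""
    return [
        (x, y)
        for x, y in combinations(sorted(profiles), 2)
        if voice_distance(profiles[x], profiles[y]) <= threshold
    ]
```

Because every pair is compared, the number of comparisons grows quadratically with the number of eligible characters, which is consistent with the document-level scope noted above.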