Methodology · Oct 2024 · 5 min read

AI voice vs human voice. What to listen for.

The acoustic tells that still give synthetic voices away, even at the current state of the art. Useful when you do not have time to run a detector pass.

By the team · London · Field guide

No single tell is conclusive. Several tells together are a strong signal that the audio is worth verifying.

Prosody

Real speech has irregular cadence. Speakers stress different words across similar sentences. Synthesized voices often hit the same rhythm for similar phrases, especially in longer outputs. Listen to two consecutive sentences with similar structure. If the rhythm is identical, that is a tell.
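To put a rough number on "identical rhythm", one option is to compare the relative timing of the words in the two sentences. The sketch below assumes you already have word onset times in seconds for each sentence (for example from a forced aligner). The function names, the sample timings, and the 0.02 threshold are illustrative assumptions, not calibrated values.

```python
def relative_intervals(onsets):
    """Turn word onset times (seconds) into gaps that sum to 1."""
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    if not gaps or min(gaps) <= 0:
        raise ValueError("need at least two strictly increasing onsets")
    total = sum(gaps)
    return [g / total for g in gaps]


def rhythm_distance(onsets_a, onsets_b):
    """Mean absolute difference between two relative-interval patterns.

    Both sentences must have the same number of words. A value near 0.0
    means the rhythm matches once overall speed is factored out.
    """
    a = relative_intervals(onsets_a)
    b = relative_intervals(onsets_b)
    if len(a) != len(b):
        raise ValueError("sentences must have the same number of words")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical onset times for two pairs of five-word sentences.
human_a = [0.00, 0.31, 0.52, 1.05, 1.30]
human_b = [0.00, 0.22, 0.70, 0.95, 1.48]
synth_a = [0.00, 0.30, 0.60, 0.90, 1.20]
synth_b = [0.00, 0.33, 0.66, 0.99, 1.32]  # same pattern, slightly slower

SUSPICIOUS_BELOW = 0.02  # assumed threshold, tune on your own material

print("human pair suspicious:", rhythm_distance(human_a, human_b) < SUSPICIOUS_BELOW)
print("synth pair suspicious:", rhythm_distance(synth_a, synth_b) < SUSPICIOUS_BELOW)
```

Normalizing by total duration means a sentence spoken slightly faster or slower with the same pattern still scores as a match, which is the case the ear tends to notice.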

Breath and pauses

Humans breathe. The intake is usually audible at the start of phrases. Synthesized voices in 2024 still skip breaths or fake them in ways that sound mechanical when you listen closely. Pauses for thought are also a tell: real speakers pause unevenly. Synthesized voices tend to pause as if on a metronome.
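The "metronome" impression can be approximated with the coefficient of variation of pause lengths: standard deviation divided by mean. This is a minimal sketch that assumes you have already measured the pauses in seconds. The sample values and the 0.10 cutoff are assumptions for illustration only.

```python
import statistics


def pause_regularity(pauses):
    """Coefficient of variation (stdev / mean) of pause durations in seconds.

    Lower values mean more evenly spaced pauses.
    """
    if len(pauses) < 2:
        raise ValueError("need at least two pauses")
    mean = statistics.mean(pauses)
    if mean <= 0:
        raise ValueError("pause durations must be positive")
    return statistics.stdev(pauses) / mean


# Hypothetical pause durations measured by hand or with an editor.
human_pauses = [0.18, 0.62, 0.25, 1.10, 0.33]
synthetic_pauses = [0.40, 0.41, 0.39, 0.40, 0.40]

METRONOMIC_BELOW = 0.10  # assumed cutoff, not a calibrated value

for name, pauses in (("human", human_pauses), ("synthetic", synthetic_pauses)):
    cv = pause_regularity(pauses)
    print(f"{name}: cv={cv:.2f} metronomic={cv < METRONOMIC_BELOW}")
```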

Room tone

Real recordings carry the room. Even a quiet office has HVAC, distant traffic, a refrigerator. Synthesized audio is often suspiciously clean. If the background is completely silent, that is a tell.
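If you can isolate a gap between phrases, its level gives a quick read on whether any room tone is present. The sketch below assumes the gap is available as float samples in the range -1.0 to 1.0. The simulated noise and the -80 dBFS threshold are assumptions chosen for illustration.

```python
import math
import random


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence: every sample is exactly zero
    return 10 * math.log10(mean_square)


# Simulated gaps: faint random noise standing in for room tone, and pure zeros.
rng = random.Random(0)
room_gap = [rng.uniform(-0.003, 0.003) for _ in range(4800)]
silent_gap = [0.0] * 4800

TOO_CLEAN_BELOW_DBFS = -80.0  # assumed threshold, tune on your own material

for name, gap in (("room", room_gap), ("silent", silent_gap)):
    level = rms_dbfs(gap)
    print(name, level, "too clean:", level < TOO_CLEAN_BELOW_DBFS)
```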

Dynamic range

Real voices get louder and softer through a sentence. Synthesized voices often stay in a narrow band. Listen for emotional variation: a real angry sentence tends to be loud; a real worried sentence tends to be quiet. Synthesized "angry" is often the same volume as synthesized "calm".
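One rough way to see the "narrow band" is to measure the level of short frames and take the spread between the loudest and the quietest. This sketch assumes the clip has already been trimmed to speech only (pauses would inflate the spread) and uses synthetic sine tones as stand-in test signals. The helper names, the frame size, and the amplitude values are all illustrative assumptions.

```python
import math


def frame_levels_db(samples, frame_size):
    """RMS level in dBFS of each full, non-silent frame."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mean_square = sum(s * s for s in frame) / frame_size
        if mean_square > 0:
            levels.append(10 * math.log10(mean_square))
    return levels


def level_spread_db(samples, frame_size=400):
    """Difference in dB between the loudest and quietest frames."""
    levels = frame_levels_db(samples, frame_size)
    if len(levels) < 2:
        raise ValueError("need at least two non-silent frames")
    return max(levels) - min(levels)


def tone(amplitudes, sample_rate=16000, freq=200.0, frame_size=400):
    """Hypothetical test signal: one frame of sine wave per amplitude value."""
    out = []
    for i, amp in enumerate(amplitudes):
        for n in range(frame_size):
            t = (i * frame_size + n) / sample_rate
            out.append(amp * math.sin(2 * math.pi * freq * t))
    return out


expressive = tone([0.05, 0.1, 0.3, 0.6, 0.8, 0.4, 0.15, 0.05, 0.2, 0.7])
flat = tone([0.30, 0.32, 0.31, 0.30, 0.33, 0.31, 0.30, 0.32, 0.31, 0.30])

print(f"expressive spread: {level_spread_db(expressive):.1f} dB")
print(f"flat spread: {level_spread_db(flat):.1f} dB")
```

Comparing the spread of a clip that is meant to sound angry with one that is meant to sound calm, from the same voice, is more informative than any single number.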

Sibilance

The "s" sounds. Synthesized voices in 2024 can produce sibilance with a slightly metallic edge, especially when the underlying model was trained on compressed audio. Once you hear it, you cannot unhear it.

A practical checklist

  1. Play the clip twice through headphones (laptop speakers will mask the tells).
  2. Check prosody on two similar sentences.
  3. Listen for breath at phrase boundaries.
  4. Check the silence between phrases for room tone.
  5. Notice the "s" sounds.
  6. If two or more tells fire, run the detector.
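The decision rule in the last step can be written down in a few lines. This is a sketch only: the `Observations` fields are hypothetical names for the tells in the checklist, and the default threshold of two comes from the checklist itself.

```python
from dataclasses import astuple, dataclass


@dataclass
class Observations:
    """What you noticed while listening. Field names are illustrative."""
    identical_rhythm: bool = False
    missing_breath: bool = False
    dead_silence: bool = False
    metallic_sibilance: bool = False


def tells_fired(obs):
    """Count how many tells were observed."""
    return sum(astuple(obs))


def should_run_detector(obs, threshold=2):
    """True when at least `threshold` tells fired (two, per the checklist)."""
    return tells_fired(obs) >= threshold


one_tell = Observations(dead_silence=True)
two_tells = Observations(dead_silence=True, metallic_sibilance=True)

print(should_run_detector(one_tell))
print(should_run_detector(two_tells))
```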

The acoustic tells will not last. The detector exists because the human ear stops being reliable somewhere between this year and next.