Methodology · Apr 2025 · 14 min read

How to detect AI voices. Voice cloning and audio deepfakes.

A step-by-step guide for verifying suspicious audio. Written for journalists, investigators, compliance teams, and anyone who has been forwarded a voicemail that "felt off."

By the team · London · Methodology v3.2

What is an AI voice?

An AI voice is audio of a human voice generated by a machine learning model rather than recorded from a human. The two common categories are voice cloning, which produces speech in the voice of a specific person, and generic synthesis, which produces speech in a stock voice that is not meant to imitate anyone in particular. Both are increasingly hard to distinguish from real audio by ear alone.

The current generation of models (ElevenLabs v2, Resemble v3, PlayHT, OpenAI's TTS, and others) can clone a voice from as little as three seconds of source audio. That is the world we live in now.

Why AI voice scams are rising

Three reasons:

  • The bar to clone a voice has collapsed. A few years ago, voice cloning required hours of clean studio audio and serious compute. Today it takes seconds of any audio (a TikTok, a voicemail) and a free trial of a hosted service.
  • The attack surface is huge. Anyone who answers the phone is a target. CEO fraud, family ransom scams, election robocalls, and customer-service impersonation all use the same underlying technique.
  • Detection has lagged. Until recently, voice detection was research-grade only. There was no consumer-accessible way to get a verdict on a suspicious clip.

Fast checklist: signs a voice may be AI

If you have 30 seconds and need to make a snap judgment, listen for these:

  • Unnatural pauses. Real speech has irregular pauses for breath and thought. Synthesized speech often paces too evenly.
  • Missing room tone. Real recordings carry background noise (HVAC, traffic, a TV). Synthesized audio is often too clean.
  • Identical prosody across sentences. Real speakers vary cadence and pitch. Synthesized voices often hit the same rhythm twice in a row.
  • Compressed dynamic range. Real voices get louder and softer. Synthesized voices often stay in a narrow band.
  • "Plastic" sibilance. The "s" sounds in synthesized speech can have a metallic edge that real voices do not.
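Three of these cues (pause regularity, room tone, and dynamic range) can be roughed out numerically. The sketch below is an illustration only and is not how the detector works: the function name `listening_cues`, the 20 ms frame size, the -40 dB silence threshold, and the percentile choices are all assumptions made for this example. It expects a mono signal as a NumPy array plus its sample rate.

```python
import numpy as np


def listening_cues(signal, sample_rate, frame_ms=20, silence_db=-40.0):
    """Rough numeric stand-ins for three checklist cues (not a detector)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len if frame_len > 0 else 0
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    # Split into fixed-length frames and measure the loudness of each one.
    samples = np.asarray(signal, dtype=np.float64)[: n_frames * frame_len]
    rms = np.sqrt(np.mean(samples.reshape(n_frames, frame_len) ** 2, axis=1))
    peak = rms.max()
    if peak == 0:
        raise ValueError("signal is completely silent")
    # Frame level in dB relative to the loudest frame (0 dB = loudest).
    level_db = 20 * np.log10(np.maximum(rms, 1e-10) / peak)

    # A pause is a run of consecutive frames below the silence threshold.
    silent = level_db < silence_db
    edges = np.diff(np.concatenate(([0], silent.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    pause_s = (ends - starts) * frame_ms / 1000.0

    voiced = level_db[~silent]
    return {
        # Low value: pauses are suspiciously similar in length.
        "pause_cv": float(np.std(pause_s) / np.mean(pause_s))
        if len(pause_s) >= 2
        else None,
        # Very low value: the quiet parts are digitally clean (no room tone).
        "noise_floor_db": float(np.percentile(level_db, 5)),
        # Low value: loudness stays in a narrow band while speaking.
        "dynamic_range_db": float(
            np.percentile(voiced, 95) - np.percentile(voiced, 5)
        ),
    }
```

These numbers only describe the clip. Any thresholds for what counts as "too even" or "too clean" would have to be calibrated against known-real recordings made on similar equipment.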

None of these alone is conclusive. All five together are a strong tell. The next section shows you how to verify quickly.

Step by step: how to verify suspicious audio

1. Save the audio file

If the audio came through a messaging app, download the file. Do not screen-record. The compression added by screen recording destroys the acoustic signature we rely on.

2. Run it through the detector

Drop the file at aivoicedetector.com/is-this-ai. The verdict appears in about half a second. You will see a probability, a model attribution (when we can identify the specific generator), and a confidence number.

3. Read the verdict carefully

A high confidence number means we are very sure. A confidence between 40% and 60% means we genuinely cannot tell, usually because the audio is heavily compressed or very short. In those cases, find a longer or cleaner sample.
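If you are triaging many clips, the 40% to 60% rule can be written down as a small helper. This is a minimal sketch: the function name `next_step` and its return strings are hypothetical, and the only threshold taken from this guide is the 40 to 60 band. How to grade values outside that band is not defined here, so the sketch simply treats them as usable.

```python
def next_step(confidence_pct: float) -> str:
    """Map a confidence number (0 to 100) to the action described in step 3."""
    if not 0.0 <= confidence_pct <= 100.0:
        raise ValueError("confidence must be between 0 and 100")
    # The band where the guide says the detector genuinely cannot tell.
    if 40.0 <= confidence_pct <= 60.0:
        return "inconclusive: find a longer or cleaner sample"
    # Outside the band the verdict is usable, but it is still only evidence.
    return "usable: cross-check before relying on it"
```

For example, `next_step(52)` returns the "inconclusive" message, while `next_step(97)` returns the "usable" message.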

4. Cross-check

If the audio claims to be a specific person saying something specific, call that person on a verified number and ask. Detector verdicts are evidence, not proof. They join other evidence in a chain.

5. File the verdict if it matters

If you are publishing a story or making a legal decision, sign in and save the verdict to your dossier. Every saved verdict gets a permanent URL with a citation in APA format that holds up in court or an editor's office.

Tools we recommend

  • The web detector for one-off verification. Free, no account.
  • The Chrome extension for in-browser audio (WhatsApp Web, YouTube, podcasts).
  • The API for newsrooms and call centers processing audio at scale.

A note on real cases

We have flagged synthetic audio in the 2024 election robocall campaigns and in the Hong Kong $25 million CFO deepfake fraud. We have also cleared dozens of journalist-submitted clips that turned out to be authentic recordings of real people saying real things. The detector is right roughly 99% of the time on clean audio. We tell you when we are not sure.

The detector returns a probability. It is not a verdict on the speaker. It is a verdict on the audio. Read it that way.