Methodology · Apr 2025 · 17 min read

How to detect AI voices. Voice cloning and audio deepfakes.

A step-by-step guide for verifying suspicious audio. Written for journalists, investigators, compliance teams, and anyone who has been forwarded a voicemail that "felt off."

By the AI Voice Detector Editorial Team · London · Methodology v3.2 · Updated July 2026

To detect an AI voice, download the original audio file and run it through a detector such as the one on this site, then read the probability and confidence it returns. By ear, listen for pacing that is too even, missing room tone, and identical prosody across sentences. No single tell is proof, so pair a detector verdict with a call-back on a trusted number.

What is an AI voice?

An AI voice is audio of a human voice generated by a machine learning model rather than recorded from a human. The two common categories are voice cloning, which produces speech in the voice of a specific person, and generic synthesis, which produces speech in a stock voice that is not modelled on any particular person. Both are increasingly hard to distinguish from real audio by ear alone.

The current generation of models (ElevenLabs v2, Resemble v3, PlayHT, OpenAI's TTS, and others) can clone a voice from as little as three seconds of source audio. That is the world we live in now.

The detector reads spectral, prosodic, and timing traces the ear misses

Signal	What to listen for	Why it appears
Prosody	Repeated rhythm across similar sentences	Models reuse intonation contours
Breath	Missing or metronomic breaths	Synthesis does not breathe like a body
Room tone	Suspiciously clean or looped background	No real room was recorded
Sibilance	A metallic edge on s and sh	Artifacts from compressed training audio

Why AI voice scams are rising

Three reasons:

The bar to clone a voice has collapsed. A few years ago, voice cloning required hours of clean studio audio and serious compute. Today it takes seconds of any audio (a TikTok, a voicemail) and a free trial of a hosted service.
The attack surface is huge. Anyone who answers the phone is a target. CEO fraud, family ransom scams, election robocalls, and customer-service impersonation all use the same underlying technique.
Detection has lagged. Until recently, voice detection was research-grade only. There was no consumer-accessible way to run a verdict on a suspicious clip.

How I actually listen: headphones, twice. Once for the words, once only for the sound. Laptop speakers hide the exact cues that matter.

I want to be honest about the ceiling here. Listening is a useful first filter, and I use it, but on clean current-generation audio I know my ear performs close to chance, and so does yours. The tells I trust most are the ones that survive a bad connection: prosody that repeats, breaths on a metronome, a background that never changes. When two or more of those fire together I stop guessing and run the clip, because the model reads the acoustic fingerprint underneath all of it, which is the part an attacker cannot easily launder.

Fast checklist: signs a voice may be AI

If you have 30 seconds and need to make a snap judgment, listen for these:

Unnatural pauses. Real speech has irregular pauses for breath and thought. Synthesized speech often paces too evenly.
Missing room tone. Real recordings carry background noise (HVAC, traffic, a TV). Synthesized audio is often too clean.
Identical prosody across sentences. Real speakers vary cadence and pitch. Synthesized voices often hit the same rhythm twice in a row.
Compressed dynamic range. Real voices get louder and softer. Synthesized voices often stay in a narrow band.
"Plastic" sibilance. The "s" sounds in synthesized speech can have a metallic edge that real voices do not.

None of these alone is conclusive. All five together are a strong tell. The next section shows you how to verify quickly.
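Two of the checklist items, even pacing and compressed dynamic range, can be put into numbers once you have measurements to work with. The sketch below is illustrative only: it assumes you have already extracted pause lengths (in seconds) and per-frame loudness (RMS) with some audio tool, the function names are hypothetical, and it is not how the detector itself works. This guide gives no numeric cut-offs, so the code only compares clips against each other.

```python
import math
import statistics


def pause_variability(pause_seconds):
    """Coefficient of variation (stdev / mean) of pause lengths.

    Lower values mean more evenly spaced pauses.
    """
    if len(pause_seconds) < 2:
        raise ValueError("need at least two pauses")
    mean = statistics.fmean(pause_seconds)
    if mean <= 0:
        raise ValueError("pause lengths must be positive")
    return statistics.stdev(pause_seconds) / mean


def dynamic_range_db(frame_rms):
    """Spread in decibels between the loudest and quietest non-silent frames."""
    voiced = [r for r in frame_rms if r > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two non-silent frames")
    return 20.0 * math.log10(max(voiced) / min(voiced))


# Hypothetical measurements, invented for illustration.
even_pauses = [0.30, 0.31, 0.30, 0.29]
irregular_pauses = [0.15, 0.80, 0.32, 1.10]

print(pause_variability(even_pauses) < pause_variability(irregular_pauses))
```

A low pause variability or a narrow dynamic range is a reason to run the clip through the detector, not a verdict in itself.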

Step by step: how to verify suspicious audio

1. Save the audio file

If the audio came through a messaging app, download the file. Do not screen-record. The compression added by screen recording destroys the acoustic signature we rely on.

2. Run it through the detector

Drop the file at aivoicedetector.com/is-this-ai. The verdict appears in about half a second. You will see a probability, a model attribution (when we can identify the specific generator, for example ElevenLabs), and a confidence number.

3. Read the verdict carefully

A high confidence number means we are very sure. A confidence between 40% and 60% means we genuinely cannot tell, usually because the audio is heavily compressed or very short. In those cases, find a longer or cleaner sample.
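The reading rule above can be written down as a small function. This is a sketch under stated assumptions, not the detector's actual output format: the Verdict fields and the read_verdict name are hypothetical, values are expressed as fractions between 0 and 1, confidence below 40% is treated as inconclusive too (the text does not say), and the 0.5 split on probability is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    probability_ai: float  # hypothetical field: 0.0 (human) to 1.0 (AI)
    confidence: float      # hypothetical field: 0.0 to 1.0


def read_verdict(verdict, inconclusive_ceiling=0.60):
    """Turn a probability and confidence into a plain-language reading."""
    for value in (verdict.probability_ai, verdict.confidence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("values must be between 0 and 1")
    # 40% to 60% confidence means "cannot tell" per the guide; anything
    # lower is treated the same way here, which is an assumption.
    if verdict.confidence <= inconclusive_ceiling:
        return "inconclusive"
    # The 0.5 split is an assumption, not a documented threshold.
    if verdict.probability_ai >= 0.5:
        return "likely AI-generated"
    return "likely human"


print(read_verdict(Verdict(probability_ai=0.97, confidence=0.95)))
print(read_verdict(Verdict(probability_ai=0.90, confidence=0.50)))
```

The second call returns "inconclusive" even though the probability is high, which is the point of step 3: a probability without enough confidence behind it is a prompt to find a longer or cleaner sample.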

4. Cross-check

If the audio claims to be a specific person saying something specific, call that person on a verified number and ask. Detector verdicts are evidence, not proof. They join other evidence in a chain.
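A short worked calculation shows why a flag from the detector is evidence rather than proof. The sketch applies Bayes' rule and assumes that the "roughly 99%" accuracy quoted elsewhere in this guide holds equally for synthetic and real clips (sensitivity and specificity), which the guide does not state. The prior, meaning the share of clips like yours that are synthetic, is a number you have to estimate yourself. The function name is hypothetical.

```python
def posterior_ai(prior_ai, sensitivity=0.99, specificity=0.99):
    """Probability a clip is AI-generated given that the detector flagged it.

    prior_ai: your estimate of how often clips like this one are synthetic.
    sensitivity / specificity: assumed to both be 0.99 for illustration.
    """
    if not 0.0 < prior_ai < 1.0:
        raise ValueError("prior_ai must be strictly between 0 and 1")
    true_positive = sensitivity * prior_ai
    false_positive = (1.0 - specificity) * (1.0 - prior_ai)
    return true_positive / (true_positive + false_positive)


# If only 1 in 100 comparable clips is synthetic, a flag leaves it near 50/50.
print(round(posterior_ai(0.01), 2))
# If half of comparable clips are synthetic, the same flag is far stronger.
print(round(posterior_ai(0.50), 2))
```

Under these assumptions the first case comes out at about 0.5 and the second at about 0.99. The call-back on a verified number is what settles the first case.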

5. File the verdict if it matters

If you are publishing a story or making a legal decision, sign in and save the verdict to your dossier. Every saved verdict gets a permanent URL with a citation in APA format that holds up in court or an editor's office.
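If a verdict may be cited later, it helps to be able to show that the file you kept is the file you tested. One common way to do that is to record a cryptographic hash of the original download next to the saved verdict. The sketch below is an addition for illustration, not a description of what the dossier feature does: the function names and the note's field names are hypothetical, and the verdict URL is whatever permanent link you were given.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_note(path, verdict_url):
    """Bundle the facts that tie a saved verdict to one specific file."""
    file_path = Path(path)
    return {
        "file_name": file_path.name,
        "size_bytes": file_path.stat().st_size,
        "sha256": sha256_of_file(file_path),
        "verdict_url": verdict_url,  # supplied by you, not generated here
    }
```

Calling evidence_note on the downloaded file and keeping the result with your notes lets anyone re-hash the file later and confirm it has not been altered or re-encoded since the verdict was issued.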

One more thing I have learned the hard way: the goal is not to win an argument about whether a clip sounds fake. It is to produce an answer you can defend. I would rather hand someone a probability with a stated confidence and a published method behind it than insist I can hear the seam, because the moment the stakes are real, my ear is not evidence and neither is yours. The detector exists precisely for that handoff, from a gut feeling to something checkable.

Tools we recommend

The web detector for one-off verification. Free, no account.
The Chrome extension for in-browser audio (WhatsApp Web, YouTube, podcasts).
The API for newsrooms and call centers processing audio at scale.

A note on real cases

AI voice fraud is no longer hypothetical. It has surfaced in election robocalls, in executive-impersonation scams that moved real money, and in a steady stream of clips forwarded to newsrooms that turn out, on inspection, to be synthetic rather than recordings of real people. The pattern is always the same: a convincing voice, an urgent request, and no easy way to check. The detector is right roughly 99% of the time on clean audio, and it tells you plainly when it is not sure rather than guessing.

Frequently asked questions

Can AI voices be detected?

Yes. A detector reads the acoustic and statistical signature that synthesis leaves behind, which is present even when a clip sounds convincing to the ear. Accuracy is highest on clean audio and drops on short or heavily compressed clips, where a good detector lowers its stated confidence rather than guessing.

Are AI voice detectors accurate?

On clean audio, the detector here is right about 99% of the time. Accuracy falls as audio quality falls, so the honest number is the confidence returned with each verdict. Treat a detector result as strong evidence to combine with a call-back, not as standalone proof.

How can I detect an AI voice for free?

Download the original audio file and drop it into the detector. A single verdict is free with no account, and it returns a probability, a confidence level, and the recognized model where we can name it.

How do I know if a voicemail is AI-generated?

Save the voicemail file rather than screen-recording it, run it through the detector, and listen for the tells in the checklist above. If it claims to be someone you know asking for money or access, call that person back on a number you already trust.

Does detection work on phone-quality audio?

Often, but compression from phone lines and messaging apps lowers confidence. The detector reports that confidence, so a low-quality clip is never mistaken for a certain verdict. A longer or cleaner sample improves the result.

The detector returns a probability. It is not a verdict on the speaker. It is a verdict on the audio. Read it that way.

Have an audio clip you want to verify? The detector is free for a single verdict and takes about half a second.
