Voice Deepfakes Are Fooling Call Center Agents: Here’s How to Stop Them

Voice cloning has made contact centers a primary fraud target: as many as two-thirds of fraud losses can be traced to the channel, and synthetic audio can now defeat both knowledge-based authentication and agent intuition. Passive voice biometrics, combined with real-time synthetic speech detection and continuous authentication, shift contact center security from what callers know to who they are, eliminating the trade-off between security and customer experience.

For decades, contact centers operated under a simple assumption: the person on the line was who they said they were, and a few well-chosen security questions would confirm it. That assumption has not aged well. Voice cloning technology has matured to a point where synthetic audio can replicate a person’s accent, cadence, emotional register, and breathing patterns with unsettling accuracy. The tools required to do this are widely available, often free, and require no specialized technical knowledge. What was once a capability reserved for well-resourced nation-state actors is now accessible to organized fraud rings, and increasingly, to individual bad actors operating at scale.

The contact center, long positioned as a customer service function, has become one of the most actively targeted fraud surfaces in financial services.

The numbers reflect this reality. As many as two-thirds of all fraud losses can be traced back to contact centers. For organizations processing millions of customer calls annually, that concentration of fraud exposure represents a material risk, one that cannot be addressed through agent training programs or tighter knowledge-based authentication policies.

Why the Contact Center Became a Preferred Target

The contact center’s vulnerability is structural: the entire model is optimized for accessibility. Customers call when something has gone wrong, when they are frustrated, when they need resolution quickly. Agents are trained to resolve problems efficiently. Security measures that slow down that resolution introduce friction that customers experience negatively and that businesses measure as a cost. The result, over many years, has been a contact center environment where security has been subordinated to speed.

Knowledge-based authentication (KBA), the standard-bearer of contact center security, depends on information that the customer can recall: account numbers, PINs, the last four digits of a Social Security number, the name of a first pet. The problem is that this same information circulates freely across data broker databases and dark web marketplaces. Fraudsters who want to impersonate a customer don’t need to crack sophisticated encryption. They need to buy a data file and make a phone call.

Voice deepfakes change the risk calculus further. Even when organizations layer call-back verification or agent intuition on top of KBA, they are relying on human perception to detect threats. A well-constructed voice clone does not sound artificial to an untrained ear, and in many cases, it does not sound artificial to a trained one.

What Agents Are Actually Hearing

Voice deepfakes targeting contact centers generally take one of two forms. The first involves pre-recorded synthetic audio, where a cloned voice is scripted in advance and injected into the call stream in response to agent prompts. The second involves real-time voice conversion, where a live fraudster’s voice is transformed in real time to match the acoustic profile of the account holder. The caller sounds like the customer because, at the audio level, they effectively are.

Agents operating under these conditions face an impossible task. They are managing high call volumes, working against handle time targets, and dealing with customers who are often already frustrated. Asking them to simultaneously perform audio forensics is neither realistic nor fair. The lesson learned in other deepfake contexts applies equally here: manual detection has been outpaced by the technology it is meant to catch.

Fraud rings understand this. Successful attack patterns get shared, refined, and scaled. When a voice deepfake clears KBA at one institution, that method becomes a template. Organizations relying on agent vigilance as a primary control are setting their defenses up for failure.

The Authentication Gap

The deeper issue is that most contact center authentication still relies on what a customer knows rather than who they are. Knowledge factors were never ideal identity proof. They degrade over time as data circulates, they create friction for legitimate customers who forget them, and they provide no resistance whatsoever against an adversary who has obtained the relevant data in advance. In an environment defined by large-scale data breaches, KBA’s limitations are an expected outcome.

Voice biometrics offer a structurally different approach. Rather than asking a caller to recite information that can be stolen, a voice biometric system authenticates the caller’s identity by analyzing the acoustic characteristics of their voice itself, comparing a live voiceprint against a template established during enrollment. The voice, unlike a PIN, cannot be purchased from a data broker. It cannot be phished in an email or extracted from a database. It is inherent to the individual.
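The comparison of a live voiceprint against an enrolled template can be illustrated with a minimal sketch. It assumes voiceprints are represented as fixed-length numeric vectors and that similarity is measured with cosine similarity; both are common choices but are assumptions here, not a description of any particular vendor's method. The function names and the 0.8 threshold are hypothetical.

```python
import math

# Hypothetical decision threshold; a real system would tune this on labelled calls.
MATCH_THRESHOLD = 0.8


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def matches_enrollment(enrolled, live, threshold=MATCH_THRESHOLD):
    """True when the live voiceprint is close enough to the enrolled template."""
    return cosine_similarity(enrolled, live) >= threshold
```

In this sketch the only stored secret is the enrolled vector itself; nothing the caller recites is compared, which is the structural difference from KBA described above.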

Text-independent passive voice authentication goes a step further. The caller simply begins speaking. The authentication occurs in the background of the conversation, invisible and frictionless, requiring no change in caller behavior. There is no passphrase to recite or security script for the agent to administer. The system returns a match result before the caller has finished explaining why they called.

This matters for the customer experience as much as it matters for security. Contact centers have long operated on the premise that more security means more friction. Voice biometrics dissolve that trade-off.

A Layered Defense Against AI-Driven Threats

Passive voice matching is a necessary foundation. On its own, it is not sufficient. A voice biometric system that cannot distinguish a live human voice from a high-quality synthetic reproduction is a defense built for last decade’s threat.

Effective contact center security against voice deepfakes requires multiple overlapping controls. Liveness detection must be present to flag pre-recorded audio. Continuous authentication must be active throughout the call, not only at the initial greeting, to counter handoff fraud, in which a fraudster passes the call to a different speaker after clearing initial verification. And real-time synthetic speech detection must operate as a parallel layer, analyzing audio characteristics to determine whether the source is human or machine-generated, even in noisy call environments.
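One way to picture how these controls overlap is to score every segment of a call on all three checks and stop at the first failure. The sketch below assumes that upstream models have already produced per-segment scores between 0 and 1; the `SegmentScores` fields, the verdict labels, and the thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SegmentScores:
    voice_match: float     # similarity to the enrolled voiceprint (assumed 0..1)
    synthetic_prob: float  # estimated probability the audio is machine-generated
    replay_prob: float     # estimated probability the audio is a recording


def evaluate_call(segments, match_min=0.8, synthetic_max=0.5, replay_max=0.5):
    """Check every segment, not just the first, and report the first failure.

    Returns a (verdict, segment_index) pair; the index is None when no
    segment failed or when there was no audio to check.
    """
    if not segments:
        return ("no_audio", None)
    for i, s in enumerate(segments):
        if s.replay_prob > replay_max:
            return ("replay_suspected", i)
        if s.synthetic_prob > synthetic_max:
            return ("synthetic_suspected", i)
        if s.voice_match < match_min:
            return ("speaker_mismatch", i)
    return ("ok", None)
```

Because the loop covers the whole call, a handoff after the greeting shows up as a `speaker_mismatch` on a later segment rather than passing unnoticed.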

Step-up authentication adds a final layer of risk-calibrated control. When a caller requests a sensitive action, such as a password reset, a new payee, or a high-value transfer, the system triggers secondary biometric verification through a mobile app. The caller’s identity is confirmed across two channels simultaneously: the voice on the phone and the biometric factor on the registered device. For high-risk transactions, this architecture provides a level of assurance that no knowledge-based system can approach.
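A step-up policy of this kind can be sketched as a small decision function. The action names, the transfer limit, and the function signatures below are hypothetical; an actual deployment would take these from its own risk policy.

```python
# Hypothetical policy values, for illustration only.
SENSITIVE_ACTIONS = {"password_reset", "add_payee"}
HIGH_VALUE_LIMIT = 10_000


def requires_step_up(action, amount=0):
    """Decide whether a request needs a second, app-based biometric check."""
    if action in SENSITIVE_ACTIONS:
        return True
    return action == "transfer" and amount >= HIGH_VALUE_LIMIT


def authorize(action, voice_verified, app_biometric_verified, amount=0):
    """Allow routine requests on voice alone; sensitive ones need both channels."""
    if not voice_verified:
        return False
    if requires_step_up(action, amount):
        return app_biometric_verified
    return True
```

Routine requests pass on the voice match alone, so the extra check adds friction only where the risk justifies it.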

Critically, this defense is only as strong as the identity data it draws on. A contact center authentication platform that operates on a fragmented view of the customer, one record for the phone channel and another for the app, creates gaps that adversaries can find and exploit. Authentication must be grounded in a single, centralized customer identity record that reflects every channel interaction, so that anomalies, like a voice that doesn’t match the enrolled template, or a device that has never appeared in the customer’s history, generate signals that can be acted on in real time.
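The value of a single identity record can be shown with a short sketch in which voice and device checks read from the same object. The `CustomerIdentity` fields, the signal names, and the 0.8 threshold are hypothetical and stand in for whatever a real identity platform stores.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerIdentity:
    """One record per customer, shared by the phone and app channels."""
    customer_id: str
    known_devices: set = field(default_factory=set)
    voice_enrolled: bool = False


def risk_signals(record, voice_match, device_id, match_min=0.8):
    """Return the anomalies visible when both channels share one record."""
    signals = []
    if record.voice_enrolled and voice_match < match_min:
        signals.append("voice_mismatch")
    if device_id not in record.known_devices:
        signals.append("unknown_device")
    return signals
```

With separate per-channel records, neither check could see the other's history; here a mismatched voice and a never-seen device surface together and can be acted on in the same call.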

How Daon Addresses the Threat

Daon’s contact center authentication solutions were built specifically for this threat environment. xVoice supports both active and passive voice authentication models, operates across every major voice channel, and uses a language-independent universal voice model that requires no dialect-specific tuning for global deployment. Continuous voice authentication ensures that the person who cleared initial verification is the same person speaking at the end of the call.

xDeTECH, Daon’s dedicated synthetic speech detection engine, operates as a parallel layer, analyzing audio in real time to determine whether the source is human or artificially generated. The algorithms are trained to identify AI-manipulated audio even under adverse acoustic conditions, and they are continuously updated as new attack patterns emerge. This is not detection calibrated to yesterday’s voice cloning tools. It is detection designed to keep pace with the tools fraudsters are deploying today.

Step-up authentication, triggered by agents or within the IVR for high-risk requests, routes callers to app-based secondary biometric verification, bringing the full security of multi-modal biometrics to the contact center channel without adding friction to routine calls. The platform integrates with existing contact center infrastructure via industry-standard protocols, and fully hosted deployment models can be operational in days.

Organizations deploying voice biometrics in the contact center report significant annual savings across two dimensions: fraud reduction and call handle time. These savings compound quickly at enterprise scale. Together they represent the combined financial impact of stronger security and a better customer experience, two outcomes that, in this case, point in exactly the same direction.

The Strategic Imperative

Contact center fraud is not a problem that will resolve itself as generative AI matures. The same advances that are making voice cloning more accessible are making it more convincing. The gap between what synthetic audio sounds like and what a live human voice sounds like will continue to close. Organizations that treat this as a future concern are already behind.

The contact center exists to serve customers. That mission is unchanged. But serving customers well now requires protecting them from adversaries who are specifically targeting the accessibility of the channel. Organizations that equip their contact centers with AI-powered voice biometrics and real-time synthetic speech detection are building the authentication infrastructure that high-trust customer relationships require. In a threat environment that evolves at machine speed, the decision to delay is a decision to absorb losses that better infrastructure would have prevented.

To learn how Daon’s voice biometric authentication and synthetic speech detection can secure your contact center, contact us today.