What if real-time Voice AI could detect the deepfake before the damage is done?

In this episode of the An Hour of Innovation podcast, Vit Lyoshin sits down with Carter Huffman, CTO and co-founder of Modulate AI, to explore how artificial intelligence is transforming cybersecurity through advanced voice AI detection systems that can stop fraud, harassment, and social engineering attacks in real time.

Throughout the conversation, they unpack how voice AI differs from text-based AI, why detecting tone and context is far more complex than keyword filtering, and how ensemble AI models help balance cost, accuracy, and scalability. They also explore real-world deployments in gaming moderation, healthcare security, and call center fraud prevention, showing how AI can escalate suspicious calls, detect synthetic voices, and even lock accounts before breaches occur.

Carter Huffman is the CTO and Co-Founder of Modulate AI, a leader in voice AI and conversational AI security. With a background in physics and audio signal processing, he has spent over a decade advancing audio machine learning systems that understand emotion, intent, and context in human speech. His work powers AI moderation systems in major gaming platforms and strengthens AI security in call centers and hospitals. In this episode, he offers rare insight into how AI voice detection works under the hood and where the future of deepfake defense is headed.

Takeaways
* Voice AI can detect a deepfake voice within the first two seconds of a phone call.
* Toxicity detection isn’t about keywords: sarcasm, tone, and context completely change meaning.
* A single toxic voice interaction can drive gamers away permanently, creating massive churn.
* Real-time AI fraud prevention must operate at low latency and high accuracy simultaneously.
* Ensemble AI models (many small specialized models) outperform one large general model in cost and precision.
* Audio AI systems often fail when microphone setups or recording environments change even slightly.
* Social engineering attacks rely on emotional pressure, which AI can detect through conversational patterns.
* AI can escalate suspicious calls to supervisors or automatically lock accounts before fraud succeeds.
* Speaker identification allows AI to track participants within a meeting, without tracking them across calls.
* Synthetic voice detection doesn’t automatically mean malicious intent; assistive tech must be considered.
* AI moderation systems must include human review and appeals to remain ethical and compliant.
* The same Voice AI technology that prevents fraud could be misused for censorship if deployed unethically.

Timestamps
00:00 Introduction
01:30 What Is Modulate AI?
03:09 Why Voice AI Is Harder Than Text AI
06:54 The Evolution of Voice AI in Gaming
10:26 How Modulate AI Works
19:05 Voice AI in Various Industries
26:31 Ethical Considerations in Voice AI Technology
32:40 Ethics in AI: Balancing Good and Bad Uses
34:32 Audio Machine Learning Challenges
41:09 The Future of Voice AI
45:45 Connect with Carter
46:31 Innovation Q&A

Connect with Carter
* Website: https://www.modulate.ai/
* LinkedIn: https://www.linkedin.com/in/carter-huffman-a9aba05b/

This Episode Is Supported By
* Google Workspace: Collaborative way of working in the cloud, from anywhere, on any device - https://referworkspace.app.goo.gl/A7wH
* Webflow: Create custom, responsive websites without coding - https://try.webflow.com/0lse98neclhe
* Monkey Digital: Unbeatable SEO. Outrank your competitors - https://www.monkeydigital.org?ref=110260

For inquiries about sponsoring An Hour of Innovation, email iris@anhourofinnovation.com

Connect with Vit
* LinkedIn: https://www.linkedin.com/in/vit-lyoshin/
* Substack: https://anhourofinnovation.substack.com/
* X: https://x.com/vitlyoshin
* Website: https://vitlyoshin.com/contact/
* Podcast: https://www.anhourofinnovation.com/