Voice

Voice is built as a first-class channel, not an afterthought. The streaming pipeline overlaps STT, LLM generation, and TTS, so the first audio plays before the LLM finishes generating, targeting a time to first audio (TTFA) under 800ms on WebRTC. DTMF suppression during payment capture means card data never touches the AI pipeline.
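
The overlap can be sketched in Python with asyncio. This is a minimal illustration under stated assumptions, not the product's implementation. `fake_llm_tokens` and `fake_tts` are hypothetical stand-ins for the real LLM and TTS streams, and flushing text to TTS at sentence boundaries is an assumed chunking policy. A producer task buffers LLM tokens and hands each finished sentence to a queue. A consumer task synthesizes audio from that queue, so the first sentence is spoken while later tokens are still arriving.

```python
import asyncio
import time
from typing import AsyncIterator, Dict, List, Optional, Tuple

SENTENCE_END = (".", "?", "!")

async def fake_llm_tokens(text: str, delay: float = 0.05) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM: yields one word every `delay` seconds."""
    for word in text.split():
        await asyncio.sleep(delay)
        yield word + " "

async def fake_tts(chunk: str) -> bytes:
    """Hypothetical stand-in for a streaming TTS request; returns placeholder bytes."""
    await asyncio.sleep(0.02)
    return chunk.encode("utf-8")

async def run_pipeline(text: str) -> Tuple[List[bytes], Dict[str, float]]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    start = time.monotonic()
    timings: Dict[str, float] = {}

    async def producer() -> None:
        # Buffer LLM tokens and release each complete sentence to TTS immediately.
        buffer = ""
        async for token in fake_llm_tokens(text):
            buffer += token
            if buffer.rstrip().endswith(SENTENCE_END):
                await queue.put(buffer.strip())
                buffer = ""
        if buffer.strip():
            await queue.put(buffer.strip())  # flush any trailing partial sentence
        timings["llm_done"] = time.monotonic() - start
        await queue.put(None)  # sentinel: generation finished

    async def consumer() -> List[bytes]:
        # Synthesize sentences as they arrive, concurrently with generation.
        audio: List[bytes] = []
        while (chunk := await queue.get()) is not None:
            audio.append(await fake_tts(chunk))
            timings.setdefault("first_audio", time.monotonic() - start)
        return audio

    _, audio = await asyncio.gather(producer(), consumer())
    return audio, timings
```

With these simulated delays, `asyncio.run(run_pipeline("Hello there. Your order has shipped and should arrive on Friday."))` produces first audio after roughly 0.12s. Generation does not finish until roughly 0.55s.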

  • Sub-800ms time to first audio (TTFA) on WebRTC, indistinguishable from a fast human agent
  • Deepgram Nova-3 STT + ElevenLabs Flash v2.5 TTS with streaming overlap
  • DTMF suppression: payment card data never transcribed, stored, or logged
  • Audio emotion detection: anger/frustration/distress triggers automatic escalation

Voice Pipeline — Live Call

STT engine: Deepgram Nova-3
TTS engine: ElevenLabs Flash v2.5
End-of-turn detection: Deepgram Flux (~260ms)
First audio (TTFA): 612ms
Emotion: Neutral (0.12)
Deepfake score: 0.04 (human)

TTFA Distribution (last 100 calls)

[Chart: TTFA per call, axis 400ms to 1200ms; p95: 748ms]
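
The panel does not state how p95 is computed. The sketch below assumes the nearest-rank method over raw per-call TTFA samples. `percentile_nearest_rank` and `meets_ttfa_target` are hypothetical helper names. The 800ms default comes from the sub-800ms target stated for WebRTC.

```python
from typing import Sequence

def percentile_nearest_rank(samples_ms: Sequence[float], p: int) -> float:
    """Return the p-th percentile of the samples using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(p * n / 100), computed in integer arithmetic to stay exact.
    rank = -(-p * len(ordered) // 100)
    return ordered[rank - 1]

def meets_ttfa_target(samples_ms: Sequence[float], target_ms: float = 800.0, p: int = 95) -> bool:
    """True if the p-th percentile TTFA is strictly below the target."""
    return percentile_nearest_rank(samples_ms, p) < target_ms
```

For exactly 100 calls, nearest-rank p95 is simply the 95th-fastest TTFA. That makes the figure easy to audit against the raw call log.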

DTMF Protection

STT paused during capture
Card data → sent directly to Telnyx
Zero AI pipeline exposure
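
One way to enforce these rules is a gate that sits in front of the AI pipeline and routes traffic by capture state. The sketch below assumes a hypothetical `CaptureGate` with two injected callbacks. `send_to_stt` is the normal transcription path. `send_to_payment` stands in for the direct carrier path to Telnyx; no Telnyx API is modeled here. While capture is active, audio frames are dropped instead of being forwarded to STT, and keypad digits go only to the payment callback, never to a transcript or log.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CaptureGate:
    """Hypothetical gate between the telephony leg and the AI pipeline."""
    send_to_stt: Callable[[bytes], None]
    send_to_payment: Callable[[str], None]  # stand-in for the direct carrier path
    capturing: bool = False
    dropped_frames: int = 0  # a count only; frame contents are never kept

    def start_capture(self) -> None:
        self.capturing = True

    def end_capture(self) -> None:
        self.capturing = False

    def on_audio(self, frame: bytes) -> None:
        if self.capturing:
            # STT is paused: the frame never reaches transcription.
            self.dropped_frames += 1
            return
        self.send_to_stt(frame)

    def on_dtmf(self, digit: str) -> None:
        if self.capturing:
            # Digits go only to the payment path; nothing is transcribed or logged.
            self.send_to_payment(digit)
        # Outside capture, DTMF handling (e.g. IVR menus) is out of scope for this sketch.
```

Keeping the gate outside the AI pipeline means the guarantee does not depend on the STT or LLM components behaving correctly.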

Emotion Escalation

Anger > 0.70 → Tier 2
Distress > 0.80 → Tier 3
Triggered today: 14 times
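
These thresholds can be expressed as a small rule table. The sketch makes four assumptions:

  • Emotion scores fall between 0 and 1.
  • Comparisons are strict `>`, as shown in the panel.
  • When several rules fire, the higher tier wins.
  • Frustration is omitted, because no threshold for it is given.

`ESCALATION_RULES` and `escalation_tier` are hypothetical names.

```python
from typing import List, Mapping, Optional, Tuple

# (emotion, threshold, tier) taken from the panel; comparison is strictly greater-than.
ESCALATION_RULES: List[Tuple[str, float, int]] = [
    ("anger", 0.70, 2),
    ("distress", 0.80, 3),
]

def escalation_tier(scores: Mapping[str, float]) -> Optional[int]:
    """Return the highest tier whose threshold is exceeded, or None if no rule fires."""
    matched = [
        tier
        for emotion, threshold, tier in ESCALATION_RULES
        if scores.get(emotion, 0.0) > threshold
    ]
    return max(matched, default=None)
```

A score exactly at a threshold (for example, anger at 0.70) does not escalate under this reading of the panel.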