Voice
Voice is built as a first-class channel, not an afterthought. The streaming pipeline overlaps STT and TTS with LLM generation: TTS starts on partial LLM output, so the caller hears the first audio before the LLM finishes generating. The target is a time to first audio (TTFA) under 800 ms on WebRTC. During payment capture, DTMF suppression keeps card data out of the AI pipeline entirely.
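The overlap can be illustrated with a minimal asyncio sketch. It assumes that LLM tokens are flushed to TTS at sentence boundaries, and it uses hypothetical stand-ins (`fake_llm_tokens`, `fake_tts`) that only sleep. They are not Deepgram or ElevenLabs calls. The point is only that the first audio chunk is produced while the producer is still generating tokens.

```python
import asyncio
import time

# Hypothetical stand-ins: a real system would call a streaming LLM API and a
# streaming TTS engine. Here both are simulated with sleeps.
async def fake_llm_tokens(text: str, delay: float = 0.01):
    """Yield whitespace-separated tokens one at a time, like a streaming LLM."""
    for token in text.split():
        await asyncio.sleep(delay)
        yield token + " "

async def fake_tts(chunk: str) -> bytes:
    """Pretend to synthesize audio for one text chunk."""
    await asyncio.sleep(0.005)
    return chunk.encode()

async def speak_streaming(text: str) -> dict:
    start = time.perf_counter()
    queue: asyncio.Queue = asyncio.Queue()
    marks: dict = {}

    async def producer():
        buf = ""
        async for tok in fake_llm_tokens(text):
            buf += tok
            # Assumption: flush at sentence ends so TTS can start on partial output.
            if buf.rstrip().endswith((".", "?", "!")):
                await queue.put(buf.strip())
                buf = ""
        if buf.strip():
            await queue.put(buf.strip())
        marks["llm_done"] = time.perf_counter() - start
        await queue.put(None)  # sentinel: generation finished

    async def consumer():
        audio = []
        while (chunk := await queue.get()) is not None:
            audio.append(await fake_tts(chunk))
            # Only the first synthesized chunk sets the time-to-first-audio mark.
            marks.setdefault("first_audio", time.perf_counter() - start)
        marks["audio"] = audio

    await asyncio.gather(producer(), consumer())
    return marks

result = asyncio.run(speak_streaming(
    "Sure, I can help. Let me pull up your account now. One moment please."))
print(f"first audio at {result['first_audio'] * 1000:.0f} ms, "
      f"LLM done at {result['llm_done'] * 1000:.0f} ms")
```

Flushing on sentence boundaries is one common choice because it gives TTS enough context for natural prosody. A real pipeline may flush on smaller units.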
- Sub-800 ms first-response latency on WebRTC, indistinguishable from a fast human agent
- Deepgram Nova-3 STT + ElevenLabs Flash v2.5 TTS with streaming overlap
- DTMF suppression: payment card data is never transcribed, stored, or logged
- Audio emotion detection: anger, frustration, or distress triggers automatic escalation
Voice Pipeline: Live Call
- STT Engine: Deepgram Nova-3
- TTS Engine: ElevenLabs Flash v2.5
- End-of-Turn: Deepgram Flux, ~260 ms
- First Audio (TTFA): 612 ms
- Emotion: Neutral (0.12)
- Deepfake Score: 0.04 (Human)
TTFA Distribution (last 100 calls)
- p95: 748 ms
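A p95 over the last 100 calls can be checked against the sub-800 ms target as sketched below. The dashboard does not say which percentile method it uses. This sketch assumes the nearest-rank definition, under which the p95 of 100 samples is simply the 95th smallest value. `percentile_nearest_rank` and `p95_within_target` are hypothetical helper names.

```python
import math
from typing import Sequence

TTFA_TARGET_MS = 800  # sub-800 ms WebRTC target stated in the text

def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs give an exact rank.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def p95_within_target(ttfa_ms: Sequence[float]) -> bool:
    """True if the p95 time to first audio is under the 800 ms target."""
    return percentile_nearest_rank(ttfa_ms, 95) < TTFA_TARGET_MS
```

Nearest-rank always returns an observed value, which keeps the reported p95 traceable to a real call. Interpolating methods such as NumPy's default would report a value between two samples.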
DTMF Protection
- STT paused during capture
- Card data sent directly to Telnyx
- Zero AI pipeline exposure
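The routing rule behind DTMF protection can be sketched as a small gate. While capture is active, audio frames are dropped before they reach STT, and keypad digits go only to a payment sink. `CallAudioRouter` and `payment_sink` are hypothetical. The sink stands in for the carrier-side capture the widget attributes to Telnyx and does not model any real Telnyx API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CallAudioRouter:
    """Hypothetical router deciding where audio frames and DTMF digits go."""
    payment_sink: Callable[[str], None]  # stand-in for carrier-side capture, not a real API
    stt_frames: List[bytes] = field(default_factory=list)  # what the AI pipeline would see
    transcript: List[str] = field(default_factory=list)    # what would be stored or logged
    capturing: bool = False

    def start_payment_capture(self) -> None:
        self.capturing = True  # STT is paused from here on

    def end_payment_capture(self) -> None:
        self.capturing = False

    def on_audio_frame(self, frame: bytes) -> None:
        if self.capturing:
            return  # drop: keypad tones and speech during capture never reach STT
        self.stt_frames.append(frame)

    def on_dtmf(self, digit: str) -> None:
        if self.capturing:
            self.payment_sink(digit)  # card digits bypass the AI pipeline entirely
        else:
            self.transcript.append(f"<dtmf:{digit}>")  # e.g. ordinary menu choices

# Usage: one menu keypress, then a card number entered during capture.
captured: List[str] = []
router = CallAudioRouter(payment_sink=captured.append)
router.on_audio_frame(b"hello")
router.on_dtmf("1")
router.start_payment_capture()
for d in "4111111111111111":  # a well-known test card number
    router.on_audio_frame(b"tone")
    router.on_dtmf(d)
router.end_payment_capture()
```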
Emotion Escalation
- Anger > 0.70 → Tier 2
- Distress > 0.80 → Tier 3
- Triggered today: 14 times
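The two escalation rules can be expressed as a threshold table. The sketch assumes emotion scores in [0, 1] and strict `>` comparisons as displayed. It also assumes that distress (Tier 3) outranks anger (Tier 2) when both fire. No frustration threshold is listed, so none is modeled. `escalation_tier` is a hypothetical name.

```python
from typing import Mapping, Optional

# Thresholds from the dashboard. Ordered most severe first so the highest
# tier wins when several rules fire (an assumption, not stated in the source).
ESCALATION_RULES = [
    ("distress", 0.80, 3),
    ("anger", 0.70, 2),
]

def escalation_tier(scores: Mapping[str, float]) -> Optional[int]:
    """Return the tier to escalate to, or None if no rule fires."""
    for emotion, threshold, tier in ESCALATION_RULES:
        if scores.get(emotion, 0.0) > threshold:
            return tier
    return None

# Counting triggers over a batch of per-call scores, as a daily counter might.
calls = [{"anger": 0.12}, {"anger": 0.74}, {"distress": 0.91, "anger": 0.75}]
triggered = sum(escalation_tier(s) is not None for s in calls)
```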