Voice

Voice is built as a first-class channel, not an afterthought. The streaming pipeline overlaps speech-to-text (STT) and text-to-speech (TTS), so the first audio plays before the LLM finishes generating. The target is a time to first audio (TTFA) under 800ms on WebRTC. DTMF suppression during payment capture means card data never touches the AI pipeline.

✓ Sub-800ms first-response latency on WebRTC, indistinguishable from a fast agent
✓ Deepgram Nova-3 STT + ElevenLabs Flash v2.5 TTS with streaming overlap
✓ DTMF suppression: payment card data is never transcribed, stored, or logged
✓ Audio emotion detection: anger, frustration, or distress triggers automatic escalation
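
One plausible way to get first audio out before the LLM finishes is to hand text to TTS at sentence boundaries while tokens are still arriving. The sketch below illustrates that idea only; it is an assumption, not the product's documented design. `fake_llm_tokens`, `sentences_from_tokens`, and `speak_streaming` are hypothetical names, and no real Deepgram or ElevenLabs call is made.

```python
import asyncio
import re

# A buffer is considered a complete sentence once it ends in ., ! or ?
SENTENCE_END = re.compile(r"[.!?]\s*$")


async def fake_llm_tokens(text, delay=0.0):
    """Hypothetical stand-in for a streaming LLM: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(delay)
        yield word + " "


async def sentences_from_tokens(tokens):
    """Group streamed tokens into sentences so synthesis can start early."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush any trailing text that never reached a sentence boundary.
        yield buffer.strip()


async def speak_streaming(text):
    """Return the ordered event log, showing that TTS overlaps generation."""
    events = []

    async def traced_tokens():
        async for token in fake_llm_tokens(text):
            events.append(("llm", token.strip()))
            yield token

    async for sentence in sentences_from_tokens(traced_tokens()):
        # A real system would send `sentence` to the TTS engine here.
        events.append(("tts", sentence))
    return events


if __name__ == "__main__":
    for event in asyncio.run(speak_streaming("Hello there. How can I help?")):
        print(event)
```

In the event log, the first `tts` entry appears while `llm` entries are still being produced, which is the overlap that shortens TTFA.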

Voice Pipeline — Live Call

STT Engine: Deepgram Nova-3
TTS Engine: ElevenLabs Flash v2.5
End-of-Turn: Deepgram Flux, ~260ms
First Audio (TTFA): 612ms
Emotion: Neutral (0.12)
Deepfake Score: 0.04 (Human)

TTFA Distribution (last 100 calls)

p95: 748ms (chart axis: 400ms to 1200ms)
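
A p95 over the last 100 calls can be tracked with a fixed-size window. The following is a minimal sketch, assuming a nearest-rank percentile; the page does not say which percentile method the dashboard uses, and `TtfaWindow` is a hypothetical name. The sample values in the usage lines are synthetic.

```python
import math
from collections import deque


class TtfaWindow:
    """Rolling window of recent TTFA samples, in milliseconds."""

    def __init__(self, size=100):
        # deque(maxlen=...) silently discards the oldest sample when full.
        self.samples = deque(maxlen=size)

    def record(self, ttfa_ms):
        self.samples.append(ttfa_ms)

    def percentile(self, p):
        """Nearest-rank percentile for an integer p between 1 and 100."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = math.ceil(p * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    window = TtfaWindow(size=100)
    for ttfa_ms in range(500, 700):  # synthetic values, not real call data
        window.record(ttfa_ms)
    print(window.percentile(95) < 800)  # compare against the 800ms target
```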

DTMF Protection

● STT paused during capture

● Card data → Telnyx direct

● Zero AI pipeline exposure
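
The "STT paused during capture" behaviour can be pictured as a gate in front of the transcriber. This is a minimal sketch under that assumption: `SttGate` and its methods are hypothetical, and the sketch does not model how the carrier collects the digits.

```python
class SttGate:
    """Forwards audio frames to STT only while payment capture is inactive."""

    def __init__(self, send_to_stt):
        self._send_to_stt = send_to_stt  # callable that accepts one audio frame
        self._capturing = False
        self.dropped_frames = 0

    def begin_capture(self):
        """Call when the caller is about to key in card digits."""
        self._capturing = True

    def end_capture(self):
        """Call once payment capture has finished."""
        self._capturing = False

    def on_audio_frame(self, frame):
        """Return True if the frame reached STT, False if it was withheld."""
        if self._capturing:
            # The frame is discarded here: not transcribed, stored, or logged.
            self.dropped_frames += 1
            return False
        self._send_to_stt(frame)
        return True


if __name__ == "__main__":
    received = []
    gate = SttGate(received.append)
    gate.on_audio_frame(b"hello")
    gate.begin_capture()
    gate.on_audio_frame(b"card-digits")
    gate.end_capture()
    gate.on_audio_frame(b"thanks")
    print(received, gate.dropped_frames)
```

Counting dropped frames instead of keeping them gives an audit signal without retaining any audio from the capture window.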

Emotion Escalation

Anger > 0.70 → Tier 2

Distress > 0.80 → Tier 3

Triggered today: 14 times
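
The two thresholds above can be expressed as a small routing function. This is a sketch only: the `scores` dictionary shape and the `escalation_tier` name are hypothetical, and the rule that the higher tier wins when both thresholds are crossed is an assumption. Frustration is left out because no threshold is given for it.

```python
ANGER_THRESHOLD = 0.70     # Anger > 0.70 escalates to Tier 2
DISTRESS_THRESHOLD = 0.80  # Distress > 0.80 escalates to Tier 3


def escalation_tier(scores):
    """Return 3, 2, or None for a dict of emotion scores between 0 and 1."""
    # Assumption: distress is checked first so the higher tier takes priority.
    if scores.get("distress", 0.0) > DISTRESS_THRESHOLD:
        return 3
    if scores.get("anger", 0.0) > ANGER_THRESHOLD:
        return 2
    return None


if __name__ == "__main__":
    print(escalation_tier({"anger": 0.12, "distress": 0.05}))
    print(escalation_tier({"anger": 0.75}))
    print(escalation_tier({"anger": 0.75, "distress": 0.90}))
```

Both comparisons are strict, so a score of exactly 0.70 or 0.80 does not escalate.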