About the role
Delfino automates the repetitive phone calls providers make to payors. The hardest, most differentiated part of that is the voice agent itself: it has to navigate an IVR, sit on hold, then hold a genuine back-and-forth with a human agent about eligibility, prior authorization, or a claim — all in real time, without awkward pauses.
As our Voice AI Engineer, you own that experience end to end. You will work across speech recognition, text-to-speech, turn-taking, and the language models that decide what to say next, and you will tune the whole loop for the one metric that matters on a phone call: latency. This is a deep, mostly greenfield problem with very few off-the-shelf answers.
What you'll do
- Build the real-time conversation loop — streaming ASR, LLM reasoning, and TTS wired together to respond in well under a second.
- Solve turn-taking and barge-in so the agent knows when the rep is done talking, when to interrupt, and when to wait through hold music.
- Teach the agent to navigate IVRs — DTMF tones, menu prompts, and the endless "please listen carefully as our options have changed."
- Ground the dialog in real data so answers about eligibility, auth, and claims are accurate and never invented.
- Own quality and latency budgets end to end, and instrument everything so you can see exactly where a call went wrong.
- Turn failed calls into fixes — listen to recordings, find the breakdown, and ship the improvement.
What we're looking for
- Strong engineering fundamentals and comfort building latency-sensitive systems in Python, Go, Rust, or a similar language.
- Hands-on experience with LLMs, speech (ASR/TTS), or real-time audio/streaming pipelines.
- A feel for latency: you profile before you guess, and you know where a real-time system spends its time.
- Product taste for conversation — you can tell a natural exchange from a robotic one and know how to close the gap.
- Comfort with ambiguity and ownership; you are happy being the person the voice stack depends on.
Nice to have
- Experience with telephony or SIP/VoIP (Twilio, LiveKit, Asterisk, or similar).
- Prior work on voice assistants, IVR systems, or contact-center automation.
- Familiarity with healthcare data (eligibility, prior auth, claims) or other regulated domains.