About the role
Testing a non-deterministic voice agent is a genuinely hard problem. The same call can go a hundred ways; a change that helps one payor can quietly break another; and "correct" means the answer a nurse acts on is actually right. We need someone who treats that as an engineering challenge, not a checklist.
As our QA Engineer, you will own how we know the product works. You will build the evaluation harnesses, the simulated calls, and the regression suites that catch problems before a real payor ever does, and you will make quality a number the whole team can see and trust.
What you'll do
- Build eval frameworks that score the voice agent on accuracy, completeness, and behavior across many call scenarios.
- Create call simulators — realistic payor IVRs and agent personas — so we can test hard paths without dialing a real line.
- Own the regression suite so a change that improves one call type can't silently break another.
- Turn real call failures into test cases, building a growing library of the tricky situations that actually happen.
- Automate quality checks in CI so every release is measured against a clear, honest quality bar.
- Partner with the voice team to define what "good" means and to catch regressions early.
What we're looking for
- Solid engineering skills — you write real code (Python or similar) to build tooling, not just click through test plans.
- A rigorous, adversarial mindset: you instinctively look for the input that breaks things.
- Experience with test automation, and comfort testing systems whose output is non-deterministic.
- Sharp attention to detail and a low tolerance for "probably fine."
- Ability to define and measure quality where the right answer isn't always obvious.
Nice to have
- Experience evaluating ML/LLM systems or other probabilistic pipelines.
- Familiarity with audio, speech, or telephony testing.
- Background in healthcare or another domain where correctness is regulated and non-negotiable.