About the role
Every Delfino call ties up real, scarce resources: a phone line, a voice pipeline, model capacity, and a slice of compute, all held live for the length of the call. As we grow from hundreds to many thousands of concurrent calls, the interesting problem stops being "can it place a call" and becomes "can it place ten thousand, balanced across carriers and regions, without a single one dropping."
That is your problem. You will own capacity and concurrency: how calls are queued, scheduled, and load-balanced across telephony providers and workers; how we stay inside carrier rate limits; how we scale up for a morning rush and down after; and how the whole fleet stays reliable and observable when something upstream fails.
What you'll do
- Own call concurrency and scheduling: the queues and schedulers that decide which call goes out, when, and on which line.
- Balance load across carriers and workers so no provider, region, or node becomes the bottleneck or the single point of failure.
- Manage telephony capacity: respect and negotiate carrier limits, handle failover between providers, and plan headroom ahead of demand.
- Scale the fleet elastically to match demand through the day while keeping cost per call honest.
- Build the observability: the dashboards and alerts that show fleet health, concurrency, and where calls are backing up, in real time.
- Own reliability and on-call for the call platform: graceful degradation, backpressure, and blameless post-mortems when things break.
What we're looking for
- Strong backend or infrastructure background running distributed systems at scale in production.
- Real experience with concurrency, queueing, rate limiting, and load balancing; you think in throughput, tail latency, and failure modes.
- Fluency with cloud infra and orchestration (containers, autoscaling, and infra-as-code).
- An operational instinct: you build the metrics and alerts before the incident, not after.
- Calm under pressure and a habit of designing for the failure, not just the happy path.
Nice to have
- Hands-on experience with telephony / VoIP / SIP infrastructure (Twilio, SignalWire, LiveKit, Asterisk, or carrier integrations).
- Built systems that manage pools of long-lived, stateful connections.
- Cost-optimization experience for compute- or usage-metered workloads.