Scaling & Load Balancing Engineer

About the role

Every Delfino call ties up real, scarce resources: a phone line, a voice pipeline, model capacity, and a slice of compute — all held live for the length of the call. As we grow from hundreds to many thousands of concurrent calls, the interesting problem stops being "can it place a call" and becomes "can it place ten thousand, balanced across carriers and regions, without a single one dropping."

That is your problem. You will own capacity and concurrency: how calls are queued, scheduled, and load-balanced across telephony providers and workers; how we stay inside carrier rate limits; how we scale up for a morning rush and down after; and how the whole fleet stays reliable and observable when something upstream fails.

What you'll do

Own call concurrency and scheduling — the queues and schedulers that decide which call goes out, when, and on which line.
Balance load across carriers and workers so no provider, region, or node becomes the bottleneck or the single point of failure.
Manage telephony capacity — respect and negotiate carrier limits, handle failover between providers, and plan headroom ahead of demand.
Scale the fleet elastically to match demand through the day while keeping cost per call honest.
Build the observability — the dashboards and alerts that show fleet health, concurrency, and where calls are backing up, in real time.
Own reliability and on-call for the call platform: graceful degradation, backpressure, and blameless post-mortems when things break.

What we're looking for

Strong backend or infrastructure background running distributed systems at scale in production.
Real experience with concurrency, queueing, rate limiting, and load balancing — you think in throughput, tail latency, and failure modes.
Fluency with cloud infra and orchestration (containers, autoscaling, and infra-as-code).
An operational instinct: you build the metrics and alerts before the incident, not after.
Calm under pressure and a habit of designing for the failure, not just the happy path.

Nice to have

Hands-on experience with telephony / VoIP / SIP infrastructure (Twilio, SignalWire, LiveKit, Asterisk, or carrier integrations).
Experience building systems that manage pools of long-lived, stateful connections.
Cost-optimization experience for compute- or usage-metered workloads.
