Job Description: Voice AI Engineer

Company: VoxyHealth (VoxyAI)

Location: Chennai, India (On-site)

Level: Mid-to-Senior Individual Contributor

About VoxyHealth

VoxyHealth builds AI-powered voice and workflow automation for healthcare. Our platform connects EHR systems, automates front-desk workflows, and enables ambient clinical intelligence — helping healthcare providers spend less time on paperwork and more time on patients. We work with clients ranging from specialty clinics to national health systems.

The Role

We’re hiring a voice AI engineer to build and scale the voice intelligence stack at the heart of our product—the real-time pipelines that listen, understand, and respond on behalf of healthcare providers. You’ll work on the production voice agent: STT, LLM orchestration, TTS, telephony integration, latency tuning, and the hard edge cases that come with handling clinical conversations.

This is a hands-on engineering role for someone who’s already shipped voice AI in production and wants to go deeper—not learn it for the first time.

What You’ll Own

– Voice agent pipeline — end-to-end design and tuning of STT → LLM → TTS flows, including barge-in, turn-taking, and latency budgets

– Telephony integration — SIP/WebRTC, call routing, audio streaming, jitter/packet-loss handling

– Model integration — work with hosted and self-hosted models (Whisper, Deepgram, ElevenLabs, OpenAI Realtime, Claude, etc.); evaluate, swap, and tune

– Real-time performance — drive down end-to-end latency; profile and fix the slow path

– Conversation quality — prompt engineering, function/tool calling, dialog state, fallback handling

– Evaluation and observability — build the test harness and metrics that tell us when the agent regresses

– Healthcare-grade reliability — handle edge cases (silence, noise, accents, code-switching, interruptions) without the agent falling apart

What We’re Looking For

Must-Have

– 5–7 years of software engineering experience overall, with 1–2 years specifically in voice AI (production, not side projects)

– Hands-on with at least one full voice pipeline: STT, TTS, and an LLM in the loop — you’ve debugged latency, barge-in, and turn-taking issues in production

– Telephony experience with SIP, WebRTC, or comparable real-time audio stacks, including a working understanding of codecs, jitter buffers, and streaming audio

– Python backend proficiency; comfortable with Node.js as well

– Solid grasp of LLM orchestration — function/tool calling, streaming, prompt design, context management

– Comfortable in production: Docker, cloud (AWS or Azure), logs, traces, and metrics

– Understands the difference between a demo that works and a system that holds up under real call volume

Strong Plus

– Experience with Whisper, Deepgram, AssemblyAI, ElevenLabs, Cartesia, OpenAI Realtime API, LiveKit, Pipecat, or similar voice frameworks

– Healthcare domain knowledge — HIPAA, clinical workflows, EHR integrations

– Worked on multi-lingual or accent-robust voice systems

– ML/DL background — fine-tuning, evaluation harnesses, or custom model deployment

– Has shipped a voice agent that handled more than 1,000 concurrent calls or other non-trivial production traffic

– Familiarity with Claude Code or similar AI-assisted development tools—we mandate their use across the engineering team

Mindset

– Obsessed with latency and conversation quality: “It works on my laptop” is not a finish line

– Ships and measures—every change is backed by an evaluation, not vibes

– Comfortable with messy real-world audio and the long tail of edge cases that come with it

– Bias toward production—you’d rather ship something behind a flag than perfect it offline

– Direct communicator; surfaces tradeoffs early

What You’ll Work On (Examples)

– Cutting end-to-end agent response latency from 1.8 s to under 800 ms

– Building a barge-in handler that doesn’t cut the user off mid-sentence

– Designing the eval harness that catches regressions before they hit a clinic

– Integrating a new TTS provider behind a feature flag and A/B testing voice quality

– Debugging why the agent loses context on long calls with accented speakers

Compensation

– Competitive base salary (commensurate with experience)

– Equity

– Health benefits

– On-site, Chennai

How We Hire

1. Intro call with the founder

2. Technical conversation — past voice AI work + system design (60 min)

3. Hands-on round — voice pipeline debugging or design exercise

4. Team / culture interview

5. Offer

Employment Type: Full Time
