Job Description: Voice AI Engineer
Company: VoxyHealth (VoxyAI)
Location: Chennai, India (On-site)
Level: Mid-to-Senior Individual Contributor
About VoxyHealth
VoxyHealth builds AI-powered voice and workflow automation for healthcare. Our platform connects EHR systems, automates front-desk workflows, and enables ambient clinical intelligence — helping healthcare providers spend less time on paperwork and more time on patients. We work with clients ranging from specialty clinics to national health systems.
The Role
We’re hiring a voice AI engineer to build and scale the voice intelligence stack at the heart of our product—the real-time pipelines that listen, understand, and respond on behalf of healthcare providers. You’ll work on the production voice agent: STT, LLM orchestration, TTS, telephony integration, latency tuning, and the hard edge cases that come with handling clinical conversations.
This is a hands-on engineering role for someone who’s already shipped voice AI in production and wants to go deeper—not learn it for the first time.
What You’ll Own
– Voice agent pipeline — end-to-end design and tuning of STT → LLM → TTS flows, including barge-in, turn-taking, and latency budgets
– Telephony integration — SIP/WebRTC, call routing, audio streaming, jitter/packet-loss handling
– Model integration — work with hosted and self-hosted models (Whisper, Deepgram, ElevenLabs, OpenAI Realtime, Claude, etc.); evaluate, swap, and tune
– Real-time performance — drive down end-to-end latency; profile and fix the slow path
– Conversation quality — prompt engineering, function/tool calling, dialog state, fallback handling
– Evaluation and observability — build the test harness and metrics that tell us when the agent regresses
– Healthcare-grade reliability — handle edge cases (silence, noise, accents, code-switching, interruptions) without the agent falling apart
What We’re Looking For
Must-Have
– 5–7 years of software engineering experience overall, with 1–2 years specifically in voice AI (production, not side projects)
– Hands-on with at least one full voice pipeline: STT, TTS, and an LLM in the loop — you’ve debugged latency, barge-in, and turn-taking issues in production
– Telephony experience — SIP, WebRTC, or comparable real-time audio stacks; understands codecs, jitter buffers, and streaming audio
– Python backend proficiency; comfortable with Node.js as well
– Solid grasp of LLM orchestration — function/tool calling, streaming, prompt design, context management
– Comfortable in production: Docker, cloud (AWS or Azure), logs, traces, and metrics
– Understands the difference between a demo that works and a system that holds up under real call volume
Strong Plus
– Experience with Whisper, Deepgram, AssemblyAI, ElevenLabs, Cartesia, OpenAI Realtime API, LiveKit, Pipecat, or similar voice frameworks
– Healthcare domain knowledge — HIPAA, clinical workflows, EHR integrations
– Worked on multilingual or accent-robust voice systems
– ML/DL background — fine-tuning, evaluation harnesses, or custom model deployment
– Has shipped a voice agent that handled >1k concurrent calls or non-trivial production traffic
– Familiarity with Claude Code or similar AI-assisted development tools—we mandate their use across the engineering team
Mindset
– Obsessed with latency and conversation quality: “It works on my laptop” is not a finish line
– Ships and measures—every change is backed by an evaluation, not vibes
– Comfortable with messy real-world audio and the long tail of edge cases that come with it
– Bias toward production—you’d rather ship something behind a flag than perfect it offline
– Direct communicator; surfaces tradeoffs early
What You’ll Work On (Examples)
– Cutting end-to-end agent response latency from 1.8s to under 800ms
– Building a barge-in handler that doesn’t cut the user off mid-sentence
– Designing the eval harness that catches regressions before they hit a clinic
– Integrating a new TTS provider behind a feature flag and A/B testing voice quality
– Debugging why the agent loses context on long calls with accented speakers
Compensation
– Competitive base salary (commensurate with experience)
– Equity
– Health benefits
– On-site, Chennai
How We Hire
1. Intro call with Founder
2. Technical conversation — past voice AI work + system design (60 min)
3. Hands-on round — voice pipeline debugging or design exercise
4. Team / culture interview
5. Offer