We’re building agentic AI systems that autonomously reason, plan, and act across complex workflows. You’ll be a core contributor designing and shipping production-grade AI agents, integrating with LLM providers, orchestrating multi-step workflows, and building the infrastructure that makes agents reliable, observable, and scalable. This is a deeply technical role — you’ll work at the intersection of LLM engineering, backend systems, and agent architecture.

Job Title – Senior AI & Agentic System Engineer

Location – Gurgaon, India (this role is fully remote)

Level – Senior (5+ years)

Team – AI Platform / Engineering

WHAT YOU’LL DO

• Design and build AI agents — autonomous systems that use tools, browse the web, call APIs, manage memory, and complete multi-step tasks with minimal human intervention

• Architect agentic workflows — orchestrate multi-agent pipelines (planner → executor → verifier patterns), handle tool-use loops, HITL checkpoints, retries, and failure recovery

• Integrate LLM providers — work with Anthropic Claude, OpenAI, Gemini; implement prompt caching, structured outputs, and context management at scale; select the right model for the right task

• Build knowledge pipelines — design RAG systems, hybrid retrieval, vector stores, embedding pipelines, and memory layers that give agents contextual awareness

• Own backend reliability — REST/GraphQL APIs, async job queues, observability, latency optimization, cost tracking for token usage

• Drive system design decisions — lead architecture discussions, define component boundaries, evaluate build vs. buy tradeoffs, and document decisions (ADRs, diagrams)

• Maintain CI/CD pipelines — keep deployments fast, safe, and automated across dev/staging/prod environments

• Evaluate agent behavior — build LLM tracing, regression frameworks, and evals pipelines to detect behavioral drift, intent failures, and quality degradation across model updates

• Stay ahead of the curve — evaluate emerging standards (MCP, A2A, reasoning models, new frameworks) and make adoption recommendations

WHAT WE’RE LOOKING FOR

• Deep experience with at least one major LLM provider SDK (Anthropic, OpenAI, Gemini)

• Prompt engineering depth: few-shot design, chain-of-thought, structured outputs, tool/function calling, multi-turn context management, prompt caching, token budgeting

• Built real production agents — not just chatbots; systems that take actions, use tools, and operate autonomously

• Understands agent patterns: ReAct, plan-and-execute, multi-agent orchestration, reflection loops, supervisor/worker hierarchies

• Familiar with MCP (Model Context Protocol), A2A, and agent frameworks (LangGraph, CrewAI, AutoGen) — and when NOT to use them

• Knows how to handle non-determinism: retry strategies, output validation, graceful degradation

Emerging AI Landscape

• Reasoning models — knows when to use o1/o3, Claude Extended Thinking vs. standard models; treats model selection as an architectural decision, not an afterthought

• Model routing & cascading — routes queries to right-sized models (Haiku/Flash for simple, Opus/GPT-4o for complex) based on cost/capability tradeoffs; doesn’t default to the largest model for everything

• Compound AI systems — combine multiple specialized models and tools in pipelines rather than relying on a single general-purpose model

• Human-in-the-loop (HITL) — designs checkpointing for long-running agents, approval workflows, interrupt/resume patterns; knows when and where agents need human gates

• Persistent agent state — durable execution patterns (LangGraph checkpointers, Temporal-style workflows); agents that survive restarts and partial failures

• Prompt injection & AI security — understands attack surfaces unique to tool-using agents; builds with guardrails in mind (NeMo Guardrails, Llama Guard, output schemas)

• Semantic caching & batch inference — reduces LLM costs through response-level semantic caching and async batch APIs; thinks about cost at 10x before it becomes a problem

LLMOps & Evaluation

• LLM-specific observability: Langfuse, LangSmith, Helicone, or Arize Phoenix — prompt/response tracing, token cost per flow, per-step latency, session replay

• Evals methodology: RAGAS, DeepEval, Promptfoo, or Braintrust — systematic evaluation of LLM outputs, not vibes-based testing

• Treats prompts as code: versioning, A/B testing, rollback; understands how prompt changes ripple into downstream agent behavior

• Knows how to run AI/non-deterministic systems in CI: mock vs. live modes, cost controls, behavioral regression (not just output format matching)

• Cost tracking and optimization: token usage per workflow, model spend dashboards, alerting on cost regressions

Knowledge & Retrieval

• RAG pipeline depth: document ingestion, chunking strategies (fixed, semantic, agentic), embedding, retrieval, reranking

• Hybrid search — dense (vector) + sparse (BM25/keyword) retrieval; understands where pure vector search underperforms and how to combine approaches

• Awareness of GraphRAG and contextual retrieval (prepending context summaries to chunks) for structured knowledge domains

• Worked with vector databases: Pinecone, Weaviate, pgvector, or similar

• Understands knowledge graph concepts and when structured retrieval outperforms embedding-based approaches

System Design & Architecture

• Can design distributed systems end-to-end: from data flow diagrams to API contracts to deployment topology

• Experience designing for agentic-specific challenges: long-running tasks, partial failures, agent state persistence, tool result caching, idempotent retries

• Knows when to use async vs. sync, event-driven vs. request-response, monolith vs. microservices — and can justify the choice

• Comfortable with architecture patterns: event sourcing, CQRS, saga pattern for multi-step workflows, circuit breakers

• Documents decisions clearly (ADRs, architecture diagrams, onboarding docs) — not just builds, but communicates design

• Has opinions on API versioning, backward compatibility, and service contracts

• Thinks about scalability and cost together — not just ‘will it scale’ but ‘what does it cost at 10x’

Backend Engineering

• Strong in Python and/or Node.js/TypeScript for backend services

• API design (REST, GraphQL, webhooks), async patterns, background job processing

• Database proficiency: PostgreSQL, DynamoDB, or equivalent

• Strong on observability: structured logging, distributed tracing, metrics

CI/CD & Developer Tooling

• Proficient with GitHub Actions — writing workflows from scratch, not just copy-pasting templates

• Understands CI/CD pipeline stages: lint → test → build → deploy with proper gate logic

• Can set up environment-specific deployments (dev/staging/prod) with appropriate safeguards

• Familiar with Docker and container-based build pipelines

• Knows how to handle secrets management in pipelines (AWS Secrets Manager, GitHub Secrets)

• Experience with deployment strategies: blue/green, canary, rolling updates

• Understands how to test AI/non-deterministic systems in CI — mock vs. live modes, cost controls, timeout handling

AWS (Core Services)

• Compute: Lambda (serverless functions, cold start optimization, layer management), ECS/Fargate (containerized long-running agents and services)

• API & Routing: API Gateway for REST endpoints, ALB for container-based routing

• Storage: S3 for artifacts/reports/model outputs, DynamoDB or RDS/PostgreSQL for structured data

• Async & Queuing: SQS for decoupled task queues, SNS for fan-out, EventBridge for scheduled or event-driven triggers

• IAM: Writing least-privilege policies, understanding role assumption, cross-service permissions — not just using admin credentials

• Observability: CloudWatch Logs and Metrics for baseline monitoring; experience connecting to Datadog or similar APM preferred

• Infrastructure as Code: Proficient with Terraform — writing modules, managing state (remote backends, workspaces), handling drift, and deploying infra changes safely without manual console work

Mindset

• Comfortable with high ambiguity; AI system behavior isn’t always predictable

• Ships iteratively; knows when a prototype is good enough vs. when to harden

• Opinionated about quality but pragmatic about tradeoffs

NICE TO HAVE

• Experience with browser automation in agentic contexts (Playwright, Puppeteer, Anthropic computer use API)

• Multi-modal agent experience — vision inputs, document parsing, image understanding in agentic loops

• Familiarity with agent evaluation frameworks and behavioral regression testing

• Contributions to open-source AI tooling

• Background in ML/data science alongside software engineering

TECH STACK WE USE

• LLMs: Anthropic Claude, OpenAI

• Agent infra: Custom orchestration, MCP servers

• Backend: Node.js / Python, AWS Lambda + ECS/Fargate

• Browser automation: Playwright

• CI/CD: GitHub Actions

• Observability: Datadog, CloudWatch, Langfuse

• Data: PostgreSQL, S3, SQS, DynamoDB

• IaC: Terraform

WHY THIS ROLE

• You’ll work on AI problems that are genuinely unsolved — not CRUD apps with an LLM wrapper

• Greenfield architecture decisions — your opinions shape the system

• Direct impact on product; small team, high ownership

• Exposure to the full agentic stack from infra to eval to UX

WHAT WE DON’T EXPECT

• A research background or ML theory depth — this is an engineering role

• Experience with every framework listed — curiosity and speed of learning matter more than coverage

• Encyclopedic knowledge of the AI market, though you should have strong opinions about where it’s heading


Senior AI & Agentic System Engineer Full Time

Alternative Path
