AI Insights
NVIDIA

Senior LLM Agents Architect

NVIDIA · Santa Clara, California, US
Full-time · Senior (7–15 yrs) · Posted 22d ago
Software Engineering · IC4 · IC + Management · Hybrid (3d) · Visa Sponsored · Relocation
Stack: Python · C++ · Rust · PyTorch · TensorFlow · CUDA · RAG pipelines · LLM orchestration · Agentic systems · GPU programming · Prompt engineering · Observability tooling · Evaluation frameworks · Tool use / function calling · Model adaptation / fine-tuning · Telemetry / monitoring

Summary

NVIDIA is seeking a Senior LLM Agents Architect to design and deploy production-grade agentic AI systems that integrate LLMs with domain tools to accelerate hardware and software engineering workflows, including GPU kernel optimization and HW simulation analysis.

About the role

Our team advances generative AI by building and deploying agentic systems that combine state-of-the-art LLMs with domain tools to accelerate HW and SW engineering workflows at scale. We are looking for a top-tier AI Agents Solution Architect to work closely with hardware architects, verification engineers, GPU performance experts, and software developers, building end-to-end agent flows that deliver significant improvements in simulation analysis, kernel optimization, and developer efficiency.

What you'll be doing:

  • Develop innovative AI agent flows that improve hardware and software workflows, in collaboration with engineers across hardware, verification, and software roles.

  • Facilitate co-creation workshops to translate SME rules of thumb into concrete agent tasks, tools, prompts, and guardrails. Define success metrics, evaluation data, and feedback loops so agents make a measurable difference in NVIDIA's HW simulation analysis and GPU-kernel optimization workflows.

  • Prototype rapidly and productize thoughtfully: integrate with internal services, use GPU capabilities, remove bottlenecks, and ship right-sized solutions.

  • Set up an evaluation backbone using offline golden sets and online telemetry, enabling confident iteration, cost control, and safe improvements.

  • Mentor and uplevel teams with expertise in agent orchestration, prompting, RAG, and observability, crafting documentation and playbooks for NVIDIA's teams.
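The "evaluation backbone using offline golden sets" responsibility above can be illustrated with a minimal sketch: score any agent (a callable from prompt to answer) against a fixed golden set and report a pass rate. All names here (`GoldenCase`, `evaluate`, the stub agent) are hypothetical, and the exact-match check stands in for a task-specific scorer.

```python
from dataclasses import dataclass

@dataclass
class GoldenCase:
    """One offline evaluation case: a prompt and its expected answer."""
    prompt: str
    expected: str

def evaluate(agent, golden_set):
    """Run the agent over a golden set and return the fraction of cases passed.

    `agent` is any callable prompt -> answer. Exact-match scoring is a
    placeholder for a task-specific metric (e.g., a rubric or unit test).
    """
    passed = sum(1 for case in golden_set
                 if agent(case.prompt).strip() == case.expected)
    return passed / len(golden_set)

# Usage: a stub agent evaluated against a two-case golden set.
golden = [GoldenCase("2+2?", "4"), GoldenCase("capital of France?", "Paris")]
stub_agent = lambda p: "4" if "2+2" in p else "Paris"
print(evaluate(stub_agent, golden))  # -> 1.0
```

Running the same harness before and after each prompt or tool change is what makes "safe improvements" measurable rather than anecdotal.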

What we need to see:

  • 7+ years in applied ML/AI or large-scale systems, with 3+ years crafting agentic or LLM-powered applications in production environments.

  • B.Sc. in Computer Science or Electrical Engineering.

  • Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production.

  • Strong software engineering skills in Python and one systems language (C++ or Rust preferred); experience integrating with GPUs, CUDA, or performance-critical services.

  • Proficient in PyTorch or TensorFlow; skilled in tool use, RAG pipelines, and model adaptation.

  • Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.

  • Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.

  • Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.

  • Proactive and independent, with strong analytical and problem-solving abilities; comfortable with ambiguity and able to deliver practical, incremental value.
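The requirement above about translating domain heuristics into deterministic tools can be sketched as follows: a rule of thumb is encoded as a plain function, registered with a parameter schema, and invoked through a guardrail that rejects malformed arguments. The tool name, the 65,536-register budget, and all helper names are hypothetical illustrations, not NVIDIA internals.

```python
TOOLS = {}

def register_tool(registry, name, schema):
    """Decorator that records a tool plus its parameter schema."""
    def wrap(fn):
        registry[name] = {"fn": fn, "schema": schema}
        return fn
    return wrap

@register_tool(TOOLS, "occupancy_check",
               schema={"threads_per_block": "int", "registers_per_thread": "int"})
def occupancy_check(threads_per_block: int, registers_per_thread: int) -> str:
    """Deterministic encoding of a (hypothetical) SME rule of thumb:
    flag kernels whose register pressure likely limits occupancy."""
    if threads_per_block * registers_per_thread > 65536:
        return "flag: register pressure likely limits occupancy"
    return "ok"

def call_tool(name, args):
    """Guardrail: reject calls whose arguments do not match the schema."""
    tool = TOOLS[name]
    if set(args) != set(tool["schema"]):
        raise ValueError(f"arguments {sorted(args)} do not match schema")
    return tool["fn"](**args)

print(call_tool("occupancy_check",
                {"threads_per_block": 1024, "registers_per_thread": 128}))
```

Deterministic tools like this give an agent reproducible, unit-testable behavior, which is exactly what makes them evaluable against offline golden sets.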

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer you and your family at www.nvidiabenefits.com.

#LI-Hybrid 

What you'll do

1. Develop innovative AI agent flows to improve hardware and software workflows in collaboration with hardware architects, verification engineers, and GPU performance experts
2. Facilitate co-creation workshops to transform domain expert heuristics into agent tasks, tools, prompts, and guardrails
3. Define success measures, evaluation datasets, and feedback loops for agents targeting HW simulation analysis and GPU kernel optimization
4. Rapidly prototype and productize agent solutions; integrate with internal services, leverage GPU capabilities, and remove performance bottlenecks
5. Set up an evaluation backbone using offline golden sets and online telemetry for cost control and safe iterative improvements
6. Mentor engineering teams on agent orchestration, prompting, RAG, and observability, and document playbooks for NVIDIA-wide adoption

Requirements

7+ years in applied ML/AI or large-scale systems with 3+ years building agentic or LLM-powered applications in production
Proven end-to-end ownership of at least one agentic system covering architecture, implementation, evaluation, and production hardening
Strong Python skills plus proficiency in a systems language (C++ or Rust); experience with GPU/CUDA and performance-critical integrations
Proficiency in PyTorch or TensorFlow; skilled in RAG pipelines, tool use, and model adaptation
Demonstrated ability to build AI observability: dataset/version management, offline test suites, online telemetry, guardrails, and rollback plans

Nice to have

Rust
C++
CUDA
GPU kernel optimization
HW simulation
LangChain
LlamaIndex
Evaluation dataset management
Safety/guardrails design
Technical documentation and playbook writing

Role overview

Role family
Software Engineering
Level
IC4 — AI
Experience
7–15 years
Type
Hybrid (IC + Management)
Remote policy
Hybrid (3 days)
Visa sponsorship
Available

Tech stack analysis

LANGUAGES
Python · C++ · Rust
FRAMEWORKS
PyTorch · TensorFlow · LangChain (inferred) · LlamaIndex (inferred)
DATABASES
Vector databases (inferred for RAG — e.g., FAISS, Pinecone, Weaviate)
INFRASTRUCTURE
NVIDIA GPU compute · CUDA · Internal NVIDIA services/APIs · Telemetry/monitoring pipelines
TOOLS
RAG pipelines · Evaluation frameworks · Dataset versioning tools (e.g., DVC, W&B) · Observability/telemetry tooling · Offline test suites / golden sets

Salary estimate

$220K – $320K
AI-estimated salary range
Confidence
82%
Reasoning

Salary not explicitly stated. Based on NVIDIA's known compensation structure for senior AI/ML architect roles in Santa Clara, CA (one of the highest-paying tech markets), combined with 7+ years experience requirement, production LLM/agentic systems expertise, and CUDA/GPU specialization premium. NVIDIA is consistently ranked among the top-paying employers globally. Total comp including RSUs is likely $350K–$600K+, but base salary estimate is $220K–$320K. Confidence is high given NVIDIA's publicly known pay bands.


Green flags

5 items
Role sits at the cutting edge of LLM agentic systems applied to semiconductor EDA — a rare and high-impact intersection with significant career upside. (growth)


Benefits breakdown

HEALTH & WELLNESS
Comprehensive health benefits package (details at nvidiabenefits.com)
Family coverage benefits


Hiring insights

JD quality
8/10
Urgency
medium
Autonomy
high
Team size
medium (5-15)


Red flags

PRO · 4 items
Requires both deep LLM/agent expertise AND GPU/CUDA/systems programming — a very rare skill combination that may deter qualified candidates unnecessarily. (requirements)


Interview insights

PRO
Rounds
5
Duration
5 wks
Difficulty
very hard
Take-home
Yes


Career path

PRO
Next roles
Staff AI/ML Architect · Principal LLM Systems Engineer · AI Engineering Manager

About the company

NVIDIA is the world's leading designer of GPUs and AI computing platforms. Its chips power everything from gaming and data centers to autonomous vehicles and scientific research. With a market cap exceeding $2 trillion, NVIDIA's CUDA platform and AI accelerators have become the backbone of the global AI revolution.

HQ
Santa Clara, CA, USA
Interview difficulty
very hard
Build vs Maintain
build
Cross-functional
Yes