Job Title
Senior LLM Agents Architect
Role Summary
Hands-on architect and builder of agentic LLM systems that generate, analyze, and optimize GPU compute kernels and support hardware/software co-design. Work closely with GPU architects, verification and performance engineers, and software teams to create end-to-end agent flows for kernel optimization, architectural exploration, and automated performance forensics.
Deliver production-grade agentic workflows integrated with internal services, evaluation backbones, and observability to enable rapid iteration and safe deployment.
Experience Level
Senior: requires 8+ years in applied ML/AI or large-scale systems, with 2+ years building agentic or LLM-powered applications in production environments.
Responsibilities
Design, implement, and productize agentic systems that improve GPU kernel performance and support architectural studies.
- Design and build agent workflows that generate, analyze, and optimize GPU kernels for peak performance on NVIDIA hardware.
- Encode domain expertise (memory hierarchy trade-offs, occupancy tuning, instruction-level reasoning) into agent orchestration and decision logic.
- Develop automated performance forensics agents that ingest simulation traces and profiler data (e.g., Nsight), identify bottlenecks, and recommend mitigations.
- Partner with hardware architects to enable rapid what-if analyses across micro-architecture configurations (cache sizing, memory controller, compute unit scaling).
- Prototype and productize solutions; integrate with internal services, optimize pipelines, and remove system bottlenecks.
- Establish evaluation infrastructure using offline golden sets and online telemetry; implement guardrails, cost control, and rollback plans.
- Mentor teams on agent orchestration, prompting, retrieval-augmented generation (RAG), observability, and operational playbooks.
Requirements
Must-have technical skills, production experience, and collaboration abilities.
- Solid grounding in computer architecture: memory hierarchies, parallelism, pipelining, cache behavior; familiarity with NVIDIA GPU concepts (streaming multiprocessors, warp scheduling, shared/global memory model, occupancy reasoning).
- Hands-on CUDA programming: writing, profiling, and optimizing GPU kernels; experience with profiler workflows such as Nsight Compute or Nsight Systems.
- Proven ownership of at least one end-to-end agentic system or LLM application in production (requirements, architecture, implementation, evaluation, hardening).
- Strong software engineering skills in Python and one systems language (C++ preferred).
- Proficiency in tool orchestration, RAG pipelines, model adaptation techniques, and agentic system design.
- Experience building observability for AI systems: dataset/version management, offline test suites, online telemetry, safety checks, and rollback plans.
- Excellent communication and facilitation skills; able to align diverse collaborators and document decisions and assumptions.
Nice-to-have:
- Experience with PyTorch compilation/lowering (torch.compile, TorchDynamo, TorchInductor), Triton, PTX, kernel fusion, or auto-tuning frameworks.
- Background in performance engineering for HPC or GPU workloads, performance modeling, or hardware simulators.
- Familiarity with distributed multi-GPU workloads and networking (NVLink, InfiniBand).
- Experience building domain-specific coding agents or using frontier agentic tools and lower-level agent frameworks (e.g., LangChain).
Education Requirements
B.Sc. in Computer Science or Electrical Engineering is required.
About the Company
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-05-21