Senior Deep Learning Performance Architect

NVIDIA
May 07, 2026
Full-time
On-site
Santa Clara, California, United States
$184,000 - $356,500 USD yearly
SoC Architecture Jobs, Level - Senior

Role Summary

Design and evaluate hardware and system-level architectures to accelerate deep learning and high-performance computing workloads. The role sits on the Deep Learning Architecture team and partners with software, systems, and product teams to align hardware capabilities with real-world workload requirements.

Experience Level

Senior: 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.

Responsibilities

Own analysis, modeling, and evaluation of production AI workloads to drive architecture and product decisions.

  • Design and evaluate hardware architectures to improve performance, efficiency, and scalability for production AI workloads.
  • Analyze and optimize large-scale deep learning workloads, including LLM inference and training in real-world deployments.
  • Build and use performance and power models (Python/C++) to inform architecture trade-offs.
  • Identify and resolve bottlenecks across compute, memory, and interconnect subsystems.
  • Evaluate PPA (performance, power, area) trade-offs and guide feature prioritization for next-generation GPU/ASIC designs.
  • Collaborate closely with software, systems, and product teams to ensure hardware meets workload needs.

Requirements

Must-have technical skills and experience.

  • 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.
  • Experience with deep learning workloads in production (training and/or inference).
  • Proficiency in Python and C++ for building performance models, simulators, or analysis tools.
  • Solid understanding of system architecture: memory hierarchy, data movement, and scalability.
  • Experience debugging, profiling, and performance tuning on real systems.
  • Proven ability to work across teams and drive technical decisions in fast-paced product environments.

Nice-to-have

  • Experience translating workload behavior into concrete hardware or system-level improvements.
  • Practical experience with LLM inference optimization (batching, disaggregation, KV-cache management, latency/throughput tuning).
  • Familiarity with production inference systems (scheduling, multi-node scaling, resource utilization).

Education Requirements

MS or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related technical field, or equivalent practical experience.


About the Company

Company: NVIDIA

Headquarters: Santa Clara, California, USA

NVIDIA is a global leader in accelerated computing, known for AI and digital-twin solutions that transform diverse industries. Its networking portfolio provides end-to-end InfiniBand and Ethernet solutions for servers and storage, optimizing performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, and continually reinvents its products and services to stay ahead in the market.

Date Posted: 2026-05-06