Job Title
Senior Deep Learning Performance Architect
Role Summary
Design and evaluate hardware and system-level architectures to accelerate deep learning and high-performance computing workloads. The role sits on the Deep Learning Architecture team and partners with software, systems, and product teams to align hardware capabilities with real-world workload requirements.
Experience Level
Senior: 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.
Responsibilities
Own analysis, modeling, and evaluation of production AI workloads to drive architecture and product decisions.
- Design and evaluate hardware architectures to improve performance, efficiency, and scalability for production AI workloads.
- Analyze and optimize large-scale deep learning workloads, including LLM inference and training in real-world deployments.
- Build and use performance and power models (Python/C++) to inform architecture trade-offs; an illustrative sketch of this kind of modeling follows this list.
- Identify and resolve bottlenecks across compute, memory, and interconnect subsystems.
- Evaluate PPA (performance, power, area) trade-offs and guide feature prioritization for next-generation GPU/ASIC designs.
- Collaborate closely with software, systems, and product teams to ensure hardware meets workload needs.
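To give candidates a concrete sense of the modeling work referenced above, here is a minimal, illustrative roofline-style sketch in Python. It is not an NVIDIA tool or methodology; the accelerator specs and kernel parameters are hypothetical placeholders chosen only to show the flavor of compute-vs-memory trade-off analysis.

```python
"""Illustrative roofline-style estimate: hypothetical numbers, not NVIDIA data."""
from dataclasses import dataclass


@dataclass
class Accelerator:
    peak_flops: float      # peak throughput, FLOP/s
    mem_bandwidth: float   # DRAM bandwidth, bytes/s


@dataclass
class Kernel:
    flops: float           # total floating-point operations
    bytes_moved: float     # total DRAM traffic, bytes


def roofline_time(hw: Accelerator, k: Kernel) -> float:
    """Lower-bound runtime: limited by either compute or memory traffic."""
    compute_time = k.flops / hw.peak_flops
    memory_time = k.bytes_moved / hw.mem_bandwidth
    return max(compute_time, memory_time)


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak, 2 TB/s memory bandwidth.
    hw = Accelerator(peak_flops=100e12, mem_bandwidth=2e12)
    # Hypothetical GEMM-like kernel: 1 TFLOP of work, 10 GB of DRAM traffic.
    k = Kernel(flops=1e12, bytes_moved=10e9)
    t = roofline_time(hw, k)
    bound = "memory" if k.bytes_moved / hw.mem_bandwidth > k.flops / hw.peak_flops else "compute"
    print(f"Estimated lower-bound runtime: {t * 1e3:.2f} ms ({bound}-bound)")
```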
Requirements
Must-have technical skills and experience.
- 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.
- Experience with deep learning workloads in production (training and/or inference).
- Proficiency in Python and C++ for building performance models, simulators, or analysis tools.
- Solid understanding of system architecture: memory hierarchy, data movement, and scalability.
- Experience debugging, profiling, and performance tuning on real systems.
- Proven ability to work across teams and drive technical decisions in fast-paced product environments.
Nice-to-have:
- Experience translating workload behavior into concrete hardware or system-level improvements.
- Practical experience with LLM inference optimization (batching, disaggregation, KV-cache management, latency/throughput tuning).
- Familiarity with production inference systems (scheduling, multi-node scaling, resource utilization).
Education Requirements
MS or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related technical field, or equivalent practical experience.
About the Company
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, known for AI and digital-twin platforms used across diverse industries. Its networking portfolio provides end-to-end InfiniBand and Ethernet solutions for servers and storage, optimizing performance and scalability at data-center scale. NVIDIA serves high-performance computing, enterprise data center, and cloud computing markets, and continually evolves its products and services.

Date Posted: 2026-05-06