Deep Learning Performance Architect
Join the inference architecture team to model, analyze, and optimize deep learning inference performance on current and next-generation NVIDIA GPUs. The role emphasizes performance prototyping, kernel development, and providing data-driven guidance to hardware and software teams.
Work with architecture, software, and product teams to influence design and implementation for inference products.
Senior-level role: 5+ years of relevant industry experience preferred.
Principal responsibilities include performance modeling, analysis, and optimization of deep learning inference on current and next-generation NVIDIA GPUs.
Required technical skills and experience are distinguished from nice-to-have qualifications, which are listed separately.
BS, MS, or PhD in Computer Science, Electrical Engineering, Mathematics, or a related technical field, or equivalent practical experience.
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, known for innovative solutions in AI and digital twins that transform diverse industries. The company also provides end-to-end InfiniBand and Ethernet networking solutions for servers and storage, optimizing performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, continually reinventing its products and services to stay ahead in the market.
