Deep Learning Performance Architect, CUTLASS DSL

NVIDIA

June 02, 2026

Full-time

On-site

Shanghai, China

Role Summary

Design and implement a Python-native domain-specific language (CUTLASS DSL) and its compiler infrastructure to generate high-performance GPU kernels. Work on MLIR dialects, lowering passes, and code generation to enable fast, optimized kernel compilation for AI workloads.

Collaborate with architecture, research, and product teams to integrate compiler optimizations into production GPU software stacks.

Experience Level

Senior level, with roughly two or more years of relevant experience.

Responsibilities

Primary responsibilities include language and compiler development, performance optimization, and cross-team collaboration:

Design, develop, and optimize CUTLASS DSL for high-performance GPU kernel development.
Implement and advance MLIR dialects, lowering passes, and code-generation flows for the DSL.
Improve kernel compilation speed while preserving performance comparable to CUTLASS C++.
Collaborate with GPU architects, researchers, software product teams, and the open-source community to integrate optimizations.

Requirements

Must-have:

Excellent programming skills in Python and strong proficiency in C++.
Hands-on experience with DSLs, compilers, or code-generation systems.
Strong command of the MLIR/LLVM stack, including IR design and pass optimization.
Strong communication and collaboration skills for cross-functional work.

Nice-to-have:

Deep understanding of the CUDA GPU programming model, GPU microarchitecture, and performance analysis techniques.
Familiarity with high-performance computing abstractions such as layout, tiling, MMA, and TMA within the CuTe ecosystem.

Education Requirements

MS or PhD in Computer Science, Software Engineering, or a related field, or equivalent experience.

About the Company

Company: NVIDIA

Headquarters: Santa Clara, California, USA

NVIDIA is a global leader in accelerated computing, known for AI and digital-twin solutions used across many industries. Its networking portfolio includes end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing.

Date Posted: 2026-06-03
