Deep Learning Performance Architect, CUTLASS DSL
Design and implement a Python-native domain-specific language (CUTLASS DSL) and its compiler infrastructure to generate high-performance GPU kernels. Work on MLIR dialects, lowering passes, and code generation to enable fast, optimized kernel compilation for AI workloads.
Collaborate with architecture, research, and product teams to integrate compiler optimizations into production GPU software stacks.
Senior level. The posting requests 2+ years of relevant experience.
Primary responsibilities include language and compiler development, performance optimization, and cross-team collaboration.
Must-have technical skills and experience.
Nice-to-have:
MS or PhD in Computer Science, Software Engineering, or a related field, or equivalent experience.
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.
