Job Title
Senior Deep Learning Performance Architect
Role Summary
Design and evaluate hardware and system-level architectures to accelerate deep learning and high-performance computing workloads. The role sits on the Deep Learning Architecture team and partners with software, systems, and product teams to align hardware capabilities with real-world workload requirements.
Experience Level
Senior: 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.
Responsibilities
Own analysis, modeling, and evaluation of production AI workloads to drive architecture and product decisions.
- Design and evaluate hardware architectures to improve performance, efficiency, and scalability for production AI workloads.
- Analyze and optimize large-scale deep learning workloads, including LLM inference and training in real-world deployments.
- Build and use performance and power models (Python/C++) to inform architecture trade-offs; an illustrative sketch of this kind of modeling follows this list.
- Identify and resolve bottlenecks across compute, memory, and interconnect subsystems.
- Evaluate PPA (performance, power, area) trade-offs and guide feature prioritization for next-generation GPU/ASIC designs.
- Collaborate closely with software, systems, and product teams to ensure hardware meets workload needs.
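To give candidates a concrete sense of the modeling work referenced above, here is a minimal, illustrative roofline-style sketch in Python. It is not an NVIDIA tool or methodology; the accelerator specs and kernel parameters are hypothetical placeholders chosen only to show the flavor of compute-vs-memory trade-off analysis.

```python
"""Illustrative roofline-style estimate: hypothetical numbers, not NVIDIA data."""
from dataclasses import dataclass


@dataclass
class Accelerator:
    peak_flops: float      # peak throughput, FLOP/s
    mem_bandwidth: float   # DRAM bandwidth, bytes/s


@dataclass
class Kernel:
    flops: float           # total floating-point operations
    bytes_moved: float     # total DRAM traffic, bytes


def roofline_time(hw: Accelerator, k: Kernel) -> float:
    """Lower-bound runtime: limited by either compute or memory traffic."""
    compute_time = k.flops / hw.peak_flops
    memory_time = k.bytes_moved / hw.mem_bandwidth
    return max(compute_time, memory_time)


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak, 2 TB/s memory bandwidth.
    hw = Accelerator(peak_flops=100e12, mem_bandwidth=2e12)
    # Hypothetical GEMM-like kernel: 1 TFLOP of work, 10 GB of DRAM traffic.
    k = Kernel(flops=1e12, bytes_moved=10e9)
    t = roofline_time(hw, k)
    bound = "memory" if k.bytes_moved / hw.mem_bandwidth > k.flops / hw.peak_flops else "compute"
    print(f"Estimated lower-bound runtime: {t * 1e3:.2f} ms ({bound}-bound)")
```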
Requirements
Must-have technical skills and experience.
- 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.
- Experience with deep learning workloads in production (training and/or inference).
- Proficiency in Python and C++ for building performance models, simulators, or analysis tools.
- Solid understanding of system architecture: memory hierarchy, data movement, and scalability.
- Experience debugging, profiling, and performance tuning on real systems.
- Proven ability to work across teams and drive technical decisions in fast-paced product environments.
Nice-to-have:
- Experience translating workload behavior into concrete hardware or system-level improvements.
- Practical experience with LLM inference optimization (batching, disaggregation, KV-cache management, latency/throughput tuning).
- Familiarity with production inference systems (scheduling, multi-node scaling, resource utilization).
Education Requirements
MS or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related technical field, or equivalent practical experience.
About the Company
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, known for AI and digital-twin platforms used across diverse industries. Its networking portfolio provides end-to-end InfiniBand and Ethernet solutions for servers and storage, optimizing performance and scalability at data-center scale. NVIDIA serves high-performance computing, enterprise data center, and cloud computing markets, and continually evolves its products and services.

Date Posted: 2026-05-06