Job Title
Senior Software Engineer, Data Center Workloads – Infrastructure
Role Summary
Responsible for developing and executing software-driven characterization workflows on NVIDIA rack-scale systems to analyze and optimize power, performance, and drive behavior at the system level.
The role works at the intersection of software, infrastructure, silicon, and large-scale AI platforms and collaborates with hardware, firmware, driver, system software, performance, and validation teams to validate and optimize rack-scale AI systems.
Experience Level
Senior-level. Requires 5+ years of software engineering experience in system software, infrastructure, validation, or performance-focused environments.
Responsibilities
Primary responsibilities include running and automating system-level workloads, collecting and analyzing telemetry, and supporting platform bring-up and validation.
- Develop and run tools, automation, and workloads to characterize power, performance, and drive behavior across rack-scale systems.
- Execute AI and system-level workloads to stress-test stacks including GPUs, CPUs, networking, storage, firmware, drivers, and system software.
- Build automated frameworks for data collection, telemetry, validation, correlation, and analysis of characterization results.
- Investigate system behavior to identify bottlenecks, anomalies, and optimization opportunities.
- Collaborate with hardware, firmware, driver, system software, performance, and validation teams to define methodologies and debug cross-stack issues.
- Support bring-up, validation, and readiness activities for new rack-scale platforms and AI infrastructure.
- Create documentation, repeatable test flows, and processes to improve coverage, efficiency, and reproducibility.
Requirements
Core technical skills and experience, listed as must-haves and nice-to-haves.
- Must-have: Strong programming skills in Python and experience in at least one system-level language (C/C++).
- Must-have: Experience developing automation and test infrastructure for complex hardware/software systems.
- Must-have: Hands-on experience running, debugging, or optimizing AI, HPC, or large-scale system workloads.
- Must-have: Good understanding of system-level architecture and interactions across hardware, firmware, drivers, operating systems, and applications.
- Must-have: Experience working in Linux environments with scripting, telemetry, logging, and data analysis tools.
- Must-have: Strong debugging and cross-disciplinary problem-solving skills and effective communication in fast-paced, cross-functional settings.
- Nice-to-have: Experience with NVIDIA platforms, GPU systems, rack-scale AI infrastructure, power/thermal/performance/storage characterization, workload automation, cluster orchestration, lab infrastructure, AI benchmarks, or post-silicon validation and system bring-up.
Education Requirements
B.Sc. or M.Sc. in Computer Science, Electrical Engineering, or a related field.
About the Company
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-05-01