Job Title
Data Center Power Test Architect
Role Summary
Technical lead responsible for designing and executing validation and test architectures for data center power features across NVIDIA platforms. The role focuses on test strategy, scalable automation, lab infrastructure, and ensuring firmware/software readiness for production datacenter systems.
Experience Level
Senior level. The posting requests 10+ years of relevant experience in data center power enablement, software/firmware testing, telemetry, and power efficiency.
Responsibilities
Primary responsibilities include defining test strategy, building automation, and driving reliability and product readiness for data center power systems.
- Define end-to-end validation strategy for power features from pre-silicon simulation to post-silicon bring-up and production readiness.
- Architect and implement modular, reusable test frameworks and automation harnesses for functional, integration, stress, regression, power, security, and performance testing.
- Design test infrastructure that scales across hundreds of systems in parallel and integrate CI/CD for continuous testing.
- Establish power quality KPIs, dashboards, and reporting to measure coverage, uptime, bug escape rate, and validation completeness.
- Lead root-cause analysis across firmware, software, and hardware; develop debug methodologies and tools.
- Validate real-world customer configurations and contribute to release gates and sign-off criteria for production deployment.
- Drive lab automation, HW-in-the-loop testing, and simulation to improve test throughput and repeatability.
- Mentor and coach engineers and junior test developers; promote adoption of new tools and industry standards.
- Leverage AI-assisted tools to accelerate test development, automate repetitive workflows, and streamline debugging.
Requirements
Must-have technical skills and experience required to perform the role, followed by preferred qualifications.
Must-have
- 10+ years of experience in data center power enablement related to software/firmware testing, with a focus on telemetry and power efficiency across systems.
- Strong knowledge of system architecture, power shelf, BMC/baseboard management, hardware/software power features, industry power standards, and embedded controllers.
- Proven experience designing test frameworks and infrastructure using Python, C/C++, or similar languages; strong scripting ability in Python.
- Experience with platform telemetry, datacenter node lifecycle management, and debugging node-level issues (CPU/GPU workloads).
- Experience administering and configuring Kubernetes and Envoy; familiarity with CI/CD tools such as GitLab and Jenkins and GitOps practices.
- Hands-on experience with lab automation, HW-in-the-loop testing, simulation, and CI/CD pipelines.
- Strong debugging, problem-solving, and analytical skills, and the ability to work with cross-functional teams globally.
Nice-to-have
- Experience with NVIDIA platforms (DGX, HGX, Grace Hopper systems) or large-scale datacenter platforms.
- Exposure to security validation/compliance (e.g., FIPS, BMC security) or thermal/power validation, or a prior role as a test architect or technical lead.
- Contributions to open-source testing tools or frameworks and experience with infrastructure automation or virtualization.
- Experience applying AI tools to create agents, design test plans, identify test gaps, and automate failure analysis.
Education Requirements
B.S./M.S./PhD in Electrical Engineering, Computer Engineering, Computer Science, or a related technical field, or equivalent practical experience.
About the Company
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-06-22