Senior Reliability Engineer

NVIDIA
June 12, 2026
Full-time
On-site
Santa Clara, California, United States
$116,000 - $184,000 USD yearly
Test Engineering Jobs, Level - Senior

Job Title

Senior Reliability Engineer

Role Summary

Senior Reliability Engineer based in NVIDIA's Santa Clara lab, responsible for designing and operating HTOL (High Temperature Operating Life) test systems and burn-in hardware to validate silicon reliability. The role combines hands-on hardware development, thermal management, test automation, and data analysis.

The engineer works cross-functionally with lab technicians, build engineers, reliability engineers, and vendors to develop HTOL boards, run ovens, and improve test processes and data quality.

Experience Level

Senior: typically 5+ years of experience in HTOL test system operation and reliability data analysis for semiconductor devices.

Responsibilities

Primary responsibilities include:

  • Develop, implement, and optimize HTOL test programs consistent with JEDEC standards.
  • Operate, maintain, and perform preventative maintenance and repairs on HTOL ovens and chambers.
  • Design, build, and debug burn-in boards; resolve signal-integrity and thermal issues.
  • Apply advanced thermal management techniques to control temperature and mitigate thermal stress during HTOL testing.
  • Collect, validate, and analyze test data using oscilloscopes, current probes, and other acquisition tools.
  • Develop and modify test scripts and perform vector debugging; support ATE when applicable.
  • Maintain and improve the reliability database and report findings that drive process or design changes.
  • Collaborate with vendors to qualify and improve burn-in boards, thermal interface materials, and HTOL systems.

Requirements

Key technical and professional requirements (must-have vs nice-to-have):

  • Must-have: Deep expertise in HTOL stress testing and JEDEC/environmental stress tests (Temperature Cycling, Reflow, Thermal Shock, HAST).
  • Must-have: Hands-on experience with MCC HTOL chamber operation, repairs, and preventative maintenance.
  • Must-have: Proficiency with oscilloscopes, current probes, data acquisition equipment, and reliability data analysis.
  • Must-have: Experience developing/modifying test scripts, vector debugging, and working knowledge of ATE concepts.
  • Must-have: Programming experience in Python or MATLAB for automation and data analysis; strong data-handling skills.
  • Must-have: Strong communication, teamwork, problem-solving skills, and attention to detail.

Nice-to-have:

  • Experience with dual-die/multi-die thermal challenges and high-power GPU or SoC burn-in board design.
  • Familiarity with reliability analytics platforms (e.g., JMP) and statistical lifetime modeling (Weibull, Arrhenius).
  • Track record driving vendor qualification and component selection for reliability test hardware.
  • Exposure to AI/ML methods for reliability data analysis or predictive failure modeling.

Education Requirements

Bachelor's or Master's degree in Electrical Engineering or a related technical field, or equivalent practical experience.


About the Company

Company: NVIDIA

Headquarters: Santa Clara, California, USA

NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-06-12