Job Title
Senior Reliability Engineer
Role Summary
Senior Reliability Engineer based in NVIDIA's Santa Clara lab responsible for designing and operating HTOL (High Temperature Operating Life) test systems and burn-in hardware to validate silicon reliability. The role combines hands-on hardware development, thermal management, test automation, and data analysis.
Works cross-functionally with lab technicians, build engineers, reliability engineers, and vendors to develop HTOL boards, run ovens, and improve test processes and data quality.
Experience Level
Senior: typically 5+ years of experience in HTOL test system operation and reliability data analysis for semiconductor devices.
Responsibilities
Primary responsibilities include:
- Develop, implement, and optimize HTOL test programs consistent with JEDEC standards.
- Operate, maintain, and perform preventative maintenance and repairs on HTOL ovens and chambers.
- Design, build, and debug burn-in boards; resolve signal-integrity and thermal issues.
- Apply advanced thermal management techniques to control temperature and mitigate thermal stress during HTOL testing.
- Collect, validate, and analyze test data using oscilloscopes, current probes, and other acquisition tools.
- Develop and modify test scripts and perform vector debugging; support ATE when applicable.
- Maintain and improve the reliability database and report findings that drive process or design changes.
- Collaborate with vendors to qualify and improve burn-in boards, thermal interface materials, and HTOL systems.
Requirements
Key technical and professional requirements (must-have vs nice-to-have):
- Must-have: Deep expertise in HTOL stress testing and JEDEC/environmental stress tests (Temperature Cycling, Reflow, Thermal Shock, HAST).
- Must-have: Hands-on experience with MCC HTOL chamber operation, repairs, and preventative maintenance.
- Must-have: Proficiency with oscilloscopes, current probes, data acquisition equipment, and reliability data analysis.
- Must-have: Experience developing/modifying test scripts, vector debugging, and working knowledge of ATE concepts.
- Must-have: Programming experience in Python or MATLAB for automation and data analysis; strong data-handling skills.
- Must-have: Strong communication, teamwork, problem-solving skills, and attention to detail.
Nice-to-have:
- Experience with dual-die/multi-die thermal challenges and high-power GPU or SoC burn-in board design.
- Familiarity with reliability analytics platforms (e.g., JMP) and statistical lifetime modeling (Weibull, Arrhenius).
- Track record driving vendor qualification and component selection for reliability test hardware.
- Exposure to AI/ML methods for reliability data analysis or predictive failure modeling.
Education Requirements
Bachelor's or Master's degree in Electrical Engineering or a related technical field, or equivalent practical experience.
About the Company
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-06-12