Job Title
Senior Resiliency and Safety Architect
Role Summary
This role designs and validates hardware and software resiliency and functional safety features for NVIDIA GPUs and Tegra SoCs. The position works across hardware and software teams to define architecture, diagnostics, simulations, and standards compliance for products used in graphics, AI, and automotive applications.
You will be part of the Accelerated and Resilient Compute Systems team and will influence system robustness, performance, and safety across product lines.
Experience Level
Senior: typically requires 5+ years of relevant experience.
Responsibilities
Primary responsibilities include architecting resiliency and safety features, performing reliability and safety analyses, developing diagnostics, and ensuring compliance with automotive functional safety processes.
- Collaborate with software and hardware teams to architect and guide development of safety and resiliency features.
- Optimize hardware and software features to improve system robustness, performance, and security.
- Model and analyze RAS metrics (e.g., Failures in Time, Availability) and safety metrics (e.g., Diagnostic Coverage, PMHF).
- Run simulations to analyze Architectural Vulnerability Factor and liveness of on-die memory.
- Develop diagnostics software components for resiliency and safety to run on NVIDIA GPUs.
- Participate in testing and validation of new and existing resiliency and safety features.
- Define requirements, architecture, and design with traceability and perform safety analyses (FMEA, DFA, FTA).
- Work on compliance with functional safety standards such as ISO 26262 and Automotive SPICE (ASPICE); ensure software conforms to MISRA and CERT C where applicable.
Requirements
Must-have skills and experience:
- 5+ years of relevant experience in hardware/software resiliency or functional safety domains.
- Proficiency in C/C++.
- Scripting and automation experience with Python or similar tools.
- Familiarity with computer system architecture and microprocessor/microcontroller fundamentals (caches, buses, DMA, etc.).
- Understanding of the software development lifecycle from requirements through testing and maintenance.
- Experience with resiliency and/or functional safety practices and analyses.
- Strong debugging and analytical skills; excellent interpersonal and collaboration skills.
- Self-driven and results-oriented.
Nice-to-have:
- Verilog RTL coding and simulation/debug experience.
- GPU and SoC architecture knowledge; CUDA programming experience.
- Embedded software development experience and familiarity with machine learning/deep learning concepts.
Education Requirements
Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a closely related field, or equivalent practical experience.
About the Company
Company: NVIDIA
Headquarters: Santa Clara, California, USA
NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-05-08