Senior Resiliency and Safety Architect

NVIDIA
May 08, 2026
Full-time
On-site
Santa Clara, California, United States
$184,000 - $356,500 USD yearly
SoC Architecture Jobs, Level - Senior

Role Summary

This role designs and validates hardware and software resiliency and functional safety features for NVIDIA GPUs and Tegra SoCs. The position works across hardware and software teams to define architecture, diagnostics, simulations, and standards compliance for products used in graphics, AI, and automotive applications.

You will be part of the Accelerated and Resilient Compute Systems team and will influence system robustness, performance, and safety across product lines.

Experience Level

Senior. Typically requires at least 5 years of relevant experience.

Responsibilities

Primary responsibilities include architecting resiliency and safety features, modeling and analyzing reliability and safety metrics, developing diagnostics, and ensuring compliance with automotive functional safety processes.

  • Collaborate with software and hardware teams to architect and guide development of safety and resiliency features.
  • Optimize hardware and software features to improve system robustness, performance, and security.
  • Model and analyze RAS metrics (e.g., Failures in Time, Availability) and safety metrics (e.g., Diagnostic Coverage, PMHF).
  • Run simulations to analyze Architectural Vulnerability Factor and liveness of on-die memory.
  • Develop diagnostics software components for resiliency and safety to run on NVIDIA GPUs.
  • Participate in testing and validation of new and existing resiliency and safety features.
  • Define requirements, architecture, and design with traceability and perform safety analyses (FMEA, DFA, FTA).
  • Support compliance with functional safety standards such as ISO 26262 and Automotive SPICE (ASPICE); ensure software conforms to MISRA and CERT C where applicable.

Requirements

Must-have skills and experience:

  • At least 5 years of relevant experience in hardware/software resiliency or functional safety domains.
  • Proficiency in C/C++.
  • Scripting and automation experience with Python or similar tools.
  • Familiarity with computer system architecture and microprocessor/microcontroller fundamentals (caches, buses, DMA, etc.).
  • Understanding of the software development lifecycle from requirements through testing and maintenance.
  • Experience with resiliency and/or functional safety practices and analyses.
  • Strong debugging and analytical skills; excellent interpersonal and collaboration skills.
  • Self-driven and results-oriented.

Nice-to-have:

  • Verilog RTL coding and simulation/debug experience.
  • GPU and SoC architecture knowledge; CUDA programming experience.
  • Embedded software development experience and familiarity with machine learning/deep learning concepts.

Education Requirements

Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a closely related field, or equivalent practical experience.


About the Company

Company: NVIDIA

Headquarters: Santa Clara, California, USA

NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-05-08