Production Systems Engineer, AI Systems

Meta Platforms
July 01, 2026
Full-time
On-site
Menlo Park, California, United States
$173,000 - $245,000 USD yearly
Test Engineering Jobs, Level - Senior

Job Title

Production Systems Engineer, AI Systems

Role Summary

Join Meta's Release to Production (RTP) team to drive the end-to-end hardware lifecycle for AI server systems, from prototyping and pre-production through production monitoring, provisioning, and remediation. The role is cross-functional and hands-on, coordinating with hardware designers, vendors, manufacturers, production engineering, and data center teams to enable large-scale deployments.

Experience Level

Senior: requires 8+ years of hands-on software, firmware, or hardware engineering experience with AI silicon, GPUs, AI servers, or comparable products.

Responsibilities

Primary duties include:

  • Define and execute end-to-end system validation strategies (hardware and software) for AI/HPC datacenter systems.
  • Lead bring-up, validation, and deployment of hardware systems with active hands-on participation.
  • Develop and update test methodologies and test cases for new product introduction (NPI).
  • Investigate, triage, and troubleshoot complex hardware-related failures; perform root-cause analysis.
  • Drive automation and data analysis for validation and NPI projects.
  • Interface with external vendors and internal teams (mechanical, power, thermal, manufacturing, software) to understand system architecture.
  • Implement monitoring, visualization, and systemic solutions to hardware health issues.
  • Create experiments and tooling to detect and diagnose hardware/firmware/software health issues.
  • Communicate project status and technical assessments to stakeholders.

Requirements

Key qualifications and skills:

  • Must-have: 8+ years of hands-on engineering experience in software, firmware, or hardware for AI silicon, GPUs, TPUs, AI servers, autonomous systems, or similar products.
  • Experience in one or more domains: ASIC development (silicon design, bring-up, characterization, validation), board-level debug, firmware validation, or system validation.
  • Development or debug experience in hardware fault management, error reporting, and error handling on hardware products.
  • Knowledge of architecture and components for server/PC/laptop systems.
  • Strong troubleshooting, root-cause analysis, and cross-functional collaboration skills.
  • Nice-to-have: networking experience (switches, NICs, DPUs), familiarity with TCP/IP and tools like iperf/uperf, RDMA/RoCE experience, and prior work with AI server systems and large-scale deployments.

Education Requirements

Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.


About the Company

Company: Meta Platforms

Headquarters: Menlo Park, California, United States

American technology company that develops social networking products (Facebook, Instagram, WhatsApp) and invests in virtual/augmented reality hardware and software through Reality Labs, focusing on connectivity, advertising, and immersive computing experiences.

Date Posted: 2026-06-30