Job Title
Production Systems Engineer, AI Systems
Role Summary
Join Meta's Release to Production (RTP) team to drive the end-to-end hardware lifecycle for AI server systems, from prototyping and pre-production through production monitoring, provisioning, and remediation. The role is cross-functional and hands-on, coordinating with hardware designers, vendors, manufacturers, production engineering, and data center teams to enable large-scale deployments.
Experience Level
Senior: requires 8+ years of hands-on software, firmware, or hardware engineering experience with AI silicon, GPUs, AI servers, or comparable products.
Responsibilities
Primary duties include:
- Define and execute end-to-end system validation strategies (hardware and software) for AI/HPC datacenter systems.
- Lead bring-up, validation, and deployment of hardware systems with active hands-on participation.
- Develop and update test methodologies and test cases for new product introduction (NPI).
- Investigate, triage, and troubleshoot complex hardware-related failures; perform root-cause analysis.
- Drive automation and data analysis for validation and NPI projects.
- Interface with external vendors and internal teams (mechanical, power, thermal, manufacturing, software) to understand system architecture.
- Implement monitoring, visualization, and systemic solutions to hardware health issues.
- Create experiments and tooling to detect and diagnose hardware/firmware/software health issues.
- Communicate project status and technical assessments to stakeholders.
Requirements
Key qualifications and skills.
- Must-have: 8+ years of hands-on engineering experience in software, firmware, or hardware for AI silicon, GPUs, TPUs, AI servers, autonomous systems, or similar products.
- Experience in one or more domains: ASIC development (silicon design, bring-up, characterization, validation), board-level debug, firmware validation, or system validation.
- Development or debug experience in hardware fault management, error reporting, and error handling on hardware products.
- Knowledge of architecture and components for server/PC/laptop systems.
- Strong troubleshooting, root-cause analysis, and cross-functional collaboration skills.
- Nice-to-have: networking experience (switches, NICs, DPU), familiarity with TCP/IP and tools like iperf/uperf, RDMA/RoCE experience, and prior work with AI server systems and large-scale deployments.
Education Requirements
Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
About the Company
Company: Meta Platforms
Headquarters: Menlo Park, California, United States
American technology company that develops social networking products (Facebook, Instagram, WhatsApp) and invests in virtual/augmented reality hardware and software through Reality Labs, focusing on connectivity, advertising, and immersive computing experiences.

Date Posted: 2026-06-30