Job Title
AI/ML Staff Software Engineer
Role Summary
Senior individual contributor responsible for defining workload-driven architecture strategy across hardware and software boundaries for AI/ML systems. The role drives workload characterization, performance analysis, and HW/SW co-optimization for current and next-generation SoC products.
Experience Level
Senior: requires technical leadership experience; posting requests 5+ years in systems engineering, hardware architecture, ML systems, or performance engineering.
Responsibilities
Lead technical work to measure, model, and optimize AI/ML workloads and translate findings into architecture and product decisions.
- Own workload characterization and hardware performance analysis for AI/ML systems; select representative workloads and define measurement methodology.
- Project system-level KPIs and translate results into SoC architecture, memory subsystem, and HW/SW co-optimization recommendations.
- Define and implement software frameworks and metrics for portfolio-wide performance analysis; leverage MLIR, IREE, and related infrastructure.
- Represent software in cross-functional architecture discussions with CPU, SoC, memory, interconnect, compiler, runtime, and ML framework teams.
- Identify critical bottlenecks (compute, memory bandwidth, on-chip memory, data movement, software overhead) and build cases for architectural changes.
- Prepare concise recommendations and detailed technical memos; present findings to senior engineering leadership and product stakeholders.
- Mentor and guide junior engineers on methodology and best practices.
- Follow Environmental, Health, Safety & Security requirements in all activities.
Requirements
Key technical and professional requirements. Must-haves are listed first; preferred skills follow.
- 5+ years of experience in systems engineering, hardware architecture, ML systems, or performance engineering with demonstrated technical leadership.
- Deep expertise in CPU and SoC architecture (memory hierarchies, out-of-order execution, vector/SIMD pipelines, and power management), plus the ability to reason quantitatively about memory-bound vs. compute-bound workloads.
- Strong understanding of system-level DRAM/LPDDR bandwidth, channel configuration, and utilization efficiency.
- Experience with AI/ML acceleration on edge devices (NPUs, inference accelerators, DSP pipelines) and HW/SW co-design challenges.
- Familiarity with AI compiler infrastructure (MLIR-based toolchains, IREE, TVM, TFLite) and how graph representations are lowered to hardware.
- Effective cross-functional collaborator who can drive consensus, communicate clearly to varied audiences, and influence without direct authority.
- English fluency (written and verbal); willingness to travel up to 10%; US work authorization.
- Role is 100% in-office at one of the company's sites (Dallas/Richardson, Austin, or San Jose).
Nice-to-have:
- Prior implementation of CPU features (AVX, NEON, RVV) or matrix extensions (AMX, SME).
- Experience defining SoC architecture requirements from workload analysis.
- Contributions to graph lowering in MLIR/IREE or similar compiler infrastructure; publications or standards contributions.
- Knowledge of RISC-V and vector/matrix extensions; experience mentoring junior systems engineers.
Education Requirements
BS required (MS preferred) in Electrical Engineering, Computer Engineering, Computer Science, or a related technical field, or equivalent practical experience.
About the Company
Company: GlobalFoundries
Headquarters: Saratoga Springs, New York, USA
GlobalFoundries is a leading contract manufacturer for the global semiconductor industry, with facilities in multiple countries, including the USA. The company develops a broad portfolio of semiconductor technologies and employs around 13,000 people worldwide. GlobalFoundries focuses on enhancing competitiveness in specialized application solutions and fostering innovation in mobile communications, consumer electronics, and automotive applications.

Date Posted: 2026-06-12