
AI Chip Design Engineer Jobs: Find ML Accelerator Roles

September 19, 2025


A growing share of the silicon behind large-scale AI training isn't an adapted CPU or GPU architecture; it's built specifically around how neural networks move and multiply data. AI chip design engineers are the ones doing that work, and there's a genuine shortage of people who've actually shipped it before.

The core of the job is matrix-multiply hardware: systolic arrays, sparse compute engines, and SRAM hierarchies large enough to keep weights and activations close to the multipliers without constant trips to DRAM. You also spec HBM interfaces, design tile-to-tile interconnect, and build the scheduler that coordinates compute across a chiplet array. Most advanced shops are taping out at 5nm or 3nm. Solid digital IC skills are the floor, but the job gets much harder without some intuition for neural network memory access patterns, because that's usually where the architectural assumptions break.
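A back-of-the-envelope model shows why on-chip SRAM capacity matters so much. The sketch below assumes a simple blocked matrix multiply C = A @ B in which square T x T tiles of A, B, and C sit in SRAM while the matching tiles of A and B are streamed along K. The function names, the uniform 2-byte (BF16-style) element width, and the traffic model itself are illustrative assumptions, not any particular chip's scheduler. Under this model, operand traffic from DRAM falls roughly as 1/T, so a tile four times larger cuts that traffic by about 4x at the cost of 16x the SRAM.

```python
import math


def dram_traffic_bytes(m: int, k: int, n: int, tile: int, bytes_per_elem: int = 2) -> int:
    """Estimate DRAM traffic for C = A @ B under a simple blocked model.

    Assumed model (illustrative only): each T x T tile of C stays on chip
    while the matching T x T tiles of A and B are streamed in along K.
    """
    tiles_m = math.ceil(m / tile)
    tiles_n = math.ceil(n / tile)
    tiles_k = math.ceil(k / tile)
    # Every (i, j, p) tile step loads one A tile and one B tile from DRAM.
    loads = tiles_m * tiles_n * tiles_k * 2 * tile * tile
    # Each finished C tile is written back to DRAM exactly once.
    stores = tiles_m * tiles_n * tile * tile
    return (loads + stores) * bytes_per_elem


def sram_bytes_needed(tile: int, bytes_per_elem: int = 2) -> int:
    """One A tile, one B tile, and one C tile resident at once.

    Assumes all three use the same element width; real designs often
    keep wider (e.g. FP32) accumulators for C.
    """
    return 3 * tile * tile * bytes_per_elem


if __name__ == "__main__":
    for t in (128, 512):
        print(t, dram_traffic_bytes(4096, 4096, 4096, t), sram_bytes_needed(t))
```

For a 4096 x 4096 x 4096 multiply, moving from 128 to 512 tiles drops modeled traffic from about 2.18 GB to about 0.57 GB, while SRAM for the three resident tiles grows from 96 KiB to 1.5 MiB.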

Hiring splits roughly into two tracks that feel pretty different to work in. AI chip startups (Groq, Cerebras, SambaNova, d-Matrix, and Etched among them) run small teams where one engineer can own a large chunk of the architecture end-to-end and see a full tapeout from spec to silicon. The hyperscaler in-house programs (Google's TPU team, Amazon Trainium, Microsoft Maia, and Meta MTIA) work at larger scale, with more defined processes and broader comp packages. Nvidia is also hiring, both for its GPU line and for dedicated inference silicon.

If you have a background in GPU or CPU design, the transition to AI accelerator work is worth considering. GPU designers already know wide parallel compute, and CPU designers know memory hierarchy trade-offs; both are close enough to accelerator problems that experienced engineers from either discipline can move across without starting over.

Comp at staff and principal level is strong: hyperscalers commonly offer $250K-$350K in total compensation for experienced accelerator designers, while startup packages front-load equity with a lower base. The gap between those two tracks varies a lot depending on company stage and location. The semiconductor salary guide on semidesignjobs.com has current data broken down by level and geography.

Save a search on semidesignjobs.com filtered to AI or data center roles. Good roles at well-funded startups tend to fill before they get much wider distribution.

FAQ

What makes AI chip design different from traditional CPU or GPU design?

AI accelerators are built for a narrow set of operations, primarily dense and sparse matrix multiply, rather than general-purpose instruction execution. That narrowness is the whole point: you trade flexibility for compute density and energy efficiency. The design constraints shift accordingly. Instead of optimizing for single-thread latency, you're tuning for sustained throughput and memory bandwidth at scale.
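The throughput-versus-bandwidth trade-off is often reasoned about with a roofline model: attainable throughput is the lower of peak compute and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). The sketch below uses made-up figures for a hypothetical accelerator (1000 TFLOP/s peak, 3 TB/s of memory bandwidth) and assumes each operand crosses memory exactly once, which is an idealization; the function and constant names are likewise illustrative.

```python
def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C = A @ B, assuming each operand crosses memory once."""
    flops = 2 * m * k * n  # one multiply and one add per multiply-accumulate
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved


def attainable_tflops(peak_tflops: float, mem_bw_tb_per_s: float, intensity: float) -> float:
    """Roofline bound: compute-limited or bandwidth-limited, whichever is lower."""
    # Units: TB/s * FLOP/byte = TFLOP/s
    return min(peak_tflops, mem_bw_tb_per_s * intensity)


# Hypothetical accelerator figures, for illustration only.
PEAK_TFLOPS = 1000.0
HBM_TB_PER_S = 3.0

if __name__ == "__main__":
    for shape in [(4096, 4096, 4096), (4096, 4096, 1)]:
        ai = matmul_intensity(*shape)
        print(shape, round(ai, 1), attainable_tflops(PEAK_TFLOPS, HBM_TB_PER_S, ai))
```

Under these assumptions the large square multiply is compute-bound at the 1000 TFLOP/s peak, while the matrix-vector case (N = 1) has an intensity of about 1 FLOP/byte and is capped near 3 TFLOP/s by memory bandwidth.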

What ML knowledge is useful for AI chip design engineer jobs?

Knowing neural network architectures (transformers, CNNs), quantization formats (INT8, FP8, BF16), and how different layer types hit the memory hierarchy gives you a real edge in architectural decisions. Familiarity with PyTorch or TensorFlow from a hardware perspective, specifically how frameworks dispatch operations and stage data, is something most teams will ask about in interviews.
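As a small example of the format-level detail that comes up, BF16 keeps FP32's sign bit and 8-bit exponent but only the top 7 mantissa bits, so converting from FP32 is mostly a matter of dropping the low 16 bits with rounding. The sketch below uses Python's `struct` module to apply round-to-nearest-even on the raw bits. The function names are hypothetical, and NaN and overflow handling are deliberately omitted, so treat it as an illustration rather than a production converter.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (round-to-nearest-even; NaN handling omitted)."""
    # Reinterpret x as a 32-bit IEEE 754 single; struct raises OverflowError
    # if x is too large for float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias is 0x7FFF, plus 1 when the kept LSB is odd, so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a BF16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


if __name__ == "__main__":
    print(bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159265)))
```

For example, pi rounds to 3.140625 in BF16, which shows how few significant digits the format keeps; what it buys in exchange is FP32's full exponent range.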

What is a systolic array in AI chip design?

A systolic array is a grid of processing elements (PEs) that pass operands and partial sums between neighbors, so matrix multiplications run at high throughput with minimal DRAM traffic. Google's TPU popularized the approach for neural network workloads, and variants of it show up in many major AI chip architectures. The efficiency comes from local data reuse: each value fetched from memory is handed between adjacent PEs and used many times, rather than being re-fetched over a shared bus.
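To make the dataflow concrete, here is a small cycle-level simulation of an output-stationary systolic array, one of several common dataflows (Google's first TPU, for instance, used a weight-stationary design). Rows of A enter from the left and columns of B from the top, each skewed by one cycle per row or column; every PE multiplies the pair it receives, adds the product to a local accumulator, and forwards the operands to its right and lower neighbors. The function name and the pure-Python lists are illustrative choices, not a hardware description; the point is that each input element enters the array once and is then reused across a whole row or column of PEs.

```python
def systolic_matmul(a, b):
    """Simulate an M x N output-stationary systolic array computing a @ b.

    Returns (result, cycles). PE (i, j) sees a[i][p] and b[p][j] together at
    cycle t = i + j + p thanks to the input skew.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    acc = [[0] * n for _ in range(m)]    # stationary partial sums, one per PE
    a_reg = [[0] * n for _ in range(m)]  # operand each PE forwards to the right
    b_reg = [[0] * n for _ in range(m)]  # operand each PE forwards downward
    cycles = m + n + k - 2               # fill + compute + drain
    for t in range(cycles):
        new_a = [[0] * n for _ in range(m)]
        new_b = [[0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                # Left edge: row i of A, delayed by i cycles; else from left neighbor.
                if j == 0:
                    p = t - i
                    new_a[i][j] = a[i][p] if 0 <= p < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                # Top edge: column j of B, delayed by j cycles; else from upper neighbor.
                if i == 0:
                    p = t - j
                    new_b[i][j] = b[p][j] if 0 <= p < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b  # all PEs latch their outputs on the same clock edge
    return acc, cycles


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

For an M x K by K x N multiply the result is complete after M + N + K - 2 cycles; the M + N - 2 cycles beyond K are the pipeline fill-and-drain cost of the skew.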
