Job Title
Senior Performance Architect
Role Summary
Lead performance analysis and optimization across Quadric's hardware/software stack, identifying bottlenecks from high-level C++/Python down to generated assembly and hardware execution. Prototype fixes and coordinate with compiler and hardware teams to validate and drive product improvements.
This is a hybrid role based in the Burlingame, CA office with a regular onsite requirement (minimum 2–3 days per week); candidates must be able to commute to the office.
Experience Level
Senior; the posting specifies 5+ years of performance analysis experience.
Responsibilities
Primary responsibilities include hands-on analysis, prototyping, and cross-team collaboration:
- Analyze application performance across the full stack: C++/Python source, compiler output, assembly, and hardware execution.
- Identify and localize performance bottlenecks to code regions, assembly sequences, or architectural limitations.
- Implement proof-of-concept fixes and optimizations to validate solutions prior to product handoff.
- Develop and maintain profiling infrastructure, benchmarks, and performance regression tests.
- Collaborate with compiler engineers to improve code generation and optimization passes.
- Work with hardware architects to identify microarchitectural improvements and validate performance models.
- Create performance models that predict workload behavior and guide optimization priorities.
- Document findings and communicate performance insights to both technical and non-technical stakeholders.
- Support customer engagements by analyzing customer workloads and recommending optimizations.
Requirements
Must-have technical skills and experience:
- 5+ years of performance analysis experience.
- Strong proficiency in C++ and Python; ability to read, reason about, and write optimized code at the assembly level.
- Hands-on mentality: comfortable implementing prototypes, modifying compiler passes, or building proof-of-concept implementations.
- Deep understanding of computer architecture: pipelines, caches, memory hierarchies, SIMD/vector execution.
- Experience with profiling tools (perf, VTune, custom trace analysis) and performance debugging methodologies.
- Ability to trace performance issues from application behavior down to microarchitectural root causes.
- Strong analytical and problem-solving skills, with the ability to explain complex issues clearly to diverse audiences.
- Experience working cross-functionally with compiler, runtime, and hardware teams.
- Able to commute to Burlingame, CA and work onsite a minimum of 2–3 days per week.
Nice-to-have:
- Experience with ML/AI workloads and frameworks (PyTorch, TensorFlow, ONNX).
- Background in compiler development or code generation.
- Experience with GPU, DSP, or custom accelerator architectures.
- Familiarity with cycle-accurate simulation and performance modeling tools.
Education Requirements
Bachelor's or Master's degree in Computer Science, Computer Engineering, or Electrical Engineering. The posting also specifies 5+ years of relevant performance analysis experience. No certifications are listed, and the posting does not include "or equivalent experience" language.
About the Company
Company: Quadric
Headquarters: Burlingame, California, United States
Quadric is building the world’s first supercomputer designed for the real-time needs of edge devices. Founded in 2016, the company empowers developers across industries with an innovative general-purpose neural processing unit (GPNPU) architecture for neural network workloads. Co-founded by technologists from MIT and Carnegie Mellon, Quadric aims to enable groundbreaking technology development.

Date Posted: 2026-06-22