Job Title
Senior MLOps & AI Infrastructure Engineer
Role Summary
The Senior MLOps & AI Infrastructure Engineer will design, build, and operate production machine-learning systems and infrastructure across cloud and on-prem HPC environments. This role partners with software, data science, and infrastructure teams to deliver end-to-end ML pipelines, automate model lifecycle management, and enable AI capabilities for EDA, HPC, and cloud workflows.
Experience Level
Senior. The requirements call for roughly 8–10+ years of relevant industry experience across ML engineering, MLOps, and cloud/HPC operations.
Responsibilities
Deliver and maintain scalable ML infrastructure, pipelines, and operational practices to move models from research to production.
- Design, build, and maintain ML pipelines for training, evaluation, and deployment across cloud and on-prem HPC.
- Implement and operate MLOps components: experiment tracking, model registry, feature stores, automated retraining, and data versioning and lineage.
- Create CI/CD/CT (continuous integration, delivery, and training) pipelines for models using tools such as Kubeflow, MLflow, or Airflow.
- Containerize and orchestrate ML workloads with Docker and Kubernetes; manage GPU node pools.
- Develop and optimize large-scale models (LLMs, GNNs, RL agents) and apply techniques such as quantization, pruning, distillation, and transfer learning.
- Build data pipelines and feature engineering systems for large structured and unstructured datasets, including data versioning and lineage.
- Manage cloud ML infrastructure (AWS/Azure/GCP), automate provisioning (Terraform/CloudFormation), and optimize cost/performance.
- Implement monitoring, alerting, and observability for model performance, drift, data quality, and system health.
- Support HPC schedulers (LSF, Slurm) for distributed training and collaborate with research teams to productionize models.
Requirements
The items below are must-have technical skills and hands-on experience, except where marked as nice-to-have.
- 10+ years of experience across ML engineering, data science, and MLOps, including production model deployment at scale.
- 10+ years of Python programming experience, including Python for production ML systems.
- 8+ years with distributed training techniques and parallelism strategies (FSDP, DeepSpeed, data/model parallelism).
- 8+ years operating cloud ML platforms and container orchestration: AWS SageMaker / GCP Vertex AI / Azure ML, Docker, Kubernetes, CI/CD pipelines.
- 5+ years with experiment tracking and reproducibility tools (MLflow, Weights & Biases, Neptune).
- Experience with ML frameworks and libraries: PyTorch, TensorFlow, JAX, Hugging Face, scikit-learn, XGBoost.
- Familiarity with data versioning tools and feature stores: DVC, Delta Lake, Feast.
- Experience building monitoring and observability stacks (Prometheus, Grafana, ELK, Evidently, Arize).
- Hands-on experience with infrastructure-as-code (Terraform/CloudFormation) and GPU cluster management.
- Strong ownership, automation-first mindset, and ability to translate research into production-grade systems.
- Nice-to-have: experience applying ML to semiconductor/EDA domains, LLM fine-tuning and RAG, GNNs, reinforcement learning, DevSecOps for ML, familiarity with Synopsys/Cadence toolchains, and published research or open-source contributions.
Education Requirements
Required: Bachelor’s or Master’s degree in Computer Science, Machine Learning, Statistics, or a related field. Preferred: PhD in Computer Science, Machine Learning, Statistics, or a related field. (The posting pairs degree requirements with senior experience expectations; no explicit "equivalent experience" language was provided.)
About the Company
Company: Altera
Headquarters: Bengaluru, Karnataka, India
Altera provides leading programmable solutions for applications ranging from cloud to edge, opening up new AI possibilities. Its product portfolio includes FPGAs, CPLDs, intellectual property (IP), development tools, and systems on modules, all aimed at accelerating innovation across a range of fields.

Date Posted: 2026-05-16