Job Title
Senior MLOps & AI Infrastructure Engineer
Role Summary
Architect, build, and operate production-grade ML infrastructure and pipelines to support large-scale model training, evaluation, and deployment across cloud and on-prem HPC environments. Partner with software, data, and research teams to productionize models for EDA, HPC, and cloud use cases.
Experience Level
Senior: requires extensive industry experience (10+ years overall; specific years noted in Requirements).
Responsibilities
Deliver end-to-end MLOps solutions, optimize model lifecycle, and maintain robust infrastructure for scalable ML workloads.
- Design, build, and maintain scalable training/evaluation/deployment pipelines across cloud and on-prem HPC.
- Implement and operate experiment tracking, model registry, feature stores, and automated retraining workflows.
- Develop CI/CD/CT pipelines for models (e.g., Kubeflow, MLflow, Airflow) and containerized deployments on Kubernetes with GPU node pools.
- Fine-tune and deploy large models (LLMs, GNNs, RL agents) and apply efficiency and alignment techniques (quantization, pruning, distillation, RLHF).
- Build data pipelines, feature engineering systems, and data versioning/lineage for terabyte-scale datasets.
- Manage cloud ML resources (AWS SageMaker, Azure ML, GCP Vertex AI) and optimize cost/performance.
- Automate infrastructure provisioning (Terraform/CloudFormation) and integrate with HPC schedulers (Slurm, LSF) for distributed training.
- Implement monitoring, alerting, and observability for model performance, data quality, and system health.
- Mentor engineers, collaborate with research scientists, and drive adoption of ML engineering best practices.
Requirements
Must-have technical skills and hands-on experience.
- 10+ years of experience in ML engineering, data science, and MLOps, including production deployment of models at scale.
- Proven expertise with ML frameworks: PyTorch, TensorFlow, JAX, Hugging Face, scikit-learn, XGBoost.
- Experience with parallelism strategies and large-model training (FSDP, DeepSpeed, data/model parallelism).
- Strong Python proficiency (10+ years) and experience with Bash and SQL; Go is a plus.
- 8+ years working with cloud ML platforms, Docker, Kubernetes, and CI/CD pipelines.
- 5+ years using experiment tracking and reproducibility tools (MLflow, Weights & Biases, Neptune) and data versioning tools (DVC, Delta Lake).
- Experience optimizing inference on GPU/TPU clusters and benchmarking model performance.
- Familiarity with monitoring/observability stacks (Prometheus, Grafana, ELK, Evidently, Arize) and security/DevSecOps practices for ML systems.
Education Requirements
Bachelor's or Master's degree in Computer Science, Machine Learning, Statistics, or a related technical field is required; a PhD in a related field is preferred. The degree requirement applies in addition to the senior-level experience listed under Requirements.
About the Company
Company: Altera
Headquarters: Bengaluru, Karnataka, India
Altera provides leadership programmable solutions for applications ranging from cloud to edge, unveiling limitless AI possibilities. Its extensive product portfolio includes FPGAs, CPLDs, Intellectual Property, development tools, and System on Modules aimed at accelerating innovation in various fields.

Date Posted: 2026-07-02