DevOps Engineer, HPC and LSF

NVIDIA
June 02, 2026
Full-time
Remote friendly (Bengaluru, Karnataka, India)
Worldwide
Other Semiconductor Jobs, Level - Mid-Career

Job Title

DevOps Engineer, HPC and LSF

Role Summary

As a member of the Hardware Infrastructure Farm team, provide engineering and operational leadership to build and operate large-scale compute clusters that support silicon development. Focus on reliability, performance, automation, and improving engineering productivity.

Work includes system-level troubleshooting, automation of deployments and configuration, and collaborating with chip development teams to optimize infrastructure usage.

Experience Level

Mid-level: requires 3+ years of experience in large, distributed Linux environments.

Responsibilities

Primary responsibilities include operating and improving HPC infrastructure and schedulers:

  • Manage and support workload and resource schedulers (e.g., IBM Spectrum LSF or SLURM) in large-scale HPC clusters.
  • Develop automation for deployment, configuration management, and operational monitoring.
  • Collect and analyze grid and cluster performance metrics for troubleshooting and optimization.
  • Troubleshoot issues across the stack from bare metal to application level.
  • Define and document standard methodologies, runbooks, and best practices for internal teams.
  • Collaborate with domain experts to improve how chip development uses infrastructure.
  • Contribute to reliability improvements and reduce time to market for hardware projects.

Requirements

Must-have technical skills and experience:

  • Extensive experience administering job schedulers such as IBM Spectrum LSF or SLURM.
  • Proficient with CentOS/RHEL Linux administration.
  • Hands-on experience with container technologies (Docker).
  • Proficient in UNIX shell scripting and Python.
  • 3+ years operating in a large, distributed Linux environment.
  • Strong problem-solving, communication, and teamwork skills.

Nice-to-have:

  • Experience analyzing and tuning performance for HPC or EDA workloads.
  • Familiarity with configuration management tools such as Ansible.
  • Experience with Perl for maintaining legacy automation scripts.
  • Deep understanding of distributed systems principles.

Education Requirements

BS in Computer Science or a similar degree, or equivalent practical experience.


About the Company

Company: NVIDIA

Headquarters: Santa Clara, California, USA

NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-06-03