DevOps Engineer, HPC and LSF

NVIDIA

June 02, 2026

Full-time

Remote friendly (Bengaluru, Karnataka, India)

Worldwide

Other Semiconductor Jobs, Level - Mid-Career

Role Summary

As a member of the Hardware Infrastructure Farm team, provide engineering and operational leadership to build and operate large-scale compute clusters that support silicon development. Focus on reliability, performance, automation, and improving engineering productivity.

Work includes system-level troubleshooting, automation of deployments and configuration, and collaborating with chip development teams to optimize infrastructure usage.

Experience Level

Mid-level: requires 3+ years of experience in large, distributed Linux environments.

Responsibilities

Primary responsibilities include operating and improving HPC infrastructure and schedulers:

Manage and support workload and resource schedulers (e.g., IBM Spectrum LSF or SLURM) in large-scale HPC clusters.
Develop automation for deployment, configuration management, and operational monitoring.
Collect and analyze grid and cluster performance metrics for troubleshooting and optimization.
Troubleshoot issues across the stack from bare metal to application level.
Define and document standard methodologies, runbooks, and best practices for internal teams.
Collaborate with domain experts to improve how chip development uses infrastructure.
Contribute to reliability improvements and reduce time to market for hardware projects.

Requirements

Must-have technical skills and experience:

Extensive experience administering job schedulers such as IBM Spectrum LSF or SLURM.
Proficient with CentOS/RHEL Linux administration.
Hands-on experience with container technologies (Docker).
Proficient in UNIX shell scripting and Python.
3+ years operating in a large, distributed Linux environment.
Strong problem-solving, communication, and teamwork skills.

Nice-to-have:

Experience analyzing and tuning performance for HPC or EDA workloads.
Familiarity with configuration management tools such as Ansible.
Experience with Perl for maintaining legacy automation scripts.
Deep understanding of distributed systems principles.

Education Requirements

BS in Computer Science or a similar degree, or equivalent practical experience.

About the Company

Company: NVIDIA

Headquarters: Santa Clara, California, USA

NVIDIA is a global leader in accelerated computing, renowned for its innovative solutions in AI and digital twins that transform diverse industries. The company specializes in networking technologies, providing end-to-end InfiniBand and Ethernet solutions for servers and storage that optimize performance and scalability. NVIDIA serves sectors such as high-performance computing, enterprise data centers, and cloud computing, constantly reinventing its products and services to stay ahead in the market.

Date Posted: 2026-06-03
