HPC (High-Performance Computing) Engineer
Detailed Job Description for HPC (High-Performance Computing) Engineer:
We are looking for an HPC (High-Performance Computing) Engineer with expertise in Linux, Ansible, and Terraform who specializes in building, deploying, and managing scalable high-performance computing environments using Infrastructure as Code (IaC) principles.
Core Competencies & Responsibilities:
Linux System Administration: Deep knowledge of RHEL/CentOS/Rocky Linux, kernel tuning, and network troubleshooting (TCP/IP, InfiniBand).
Infrastructure as Code (Terraform): Provisioning compute nodes, storage, and networking resources in the cloud (AWS/Azure) or on-premises (VMware/Proxmox).
Configuration Management (Ansible): Agentless configuration of clusters, installing software stacks, managing user accounts, and automating repetitive tasks.
HPC Specifics: Installing and configuring cluster managers and job schedulers such as Slurm or OpenGridEngine (OGE).
Filesystem & Storage: Managing parallel file systems such as GPFS/Spectrum Scale or Lustre.