Platform Design & Deployment
- Work closely with the CEP customer to gather technical requirements.
- Architect and implement production-grade OpenShift clusters on OpenStack, including control plane, compute nodes, storage integrations, and networking.
- Adapt typical OpenShift and OpenStack designs to fit government security and governance compliance requirements.
- Provide deep technical advice and the rationale for design decisions to internal and external stakeholders.
- Define and automate infrastructure provisioning (IaC) using tools such as Terraform, Ansible, or Red Hat Ansible Tower.

Operational Excellence
- Develop and maintain monitoring, alerting, and logging pipelines (Prometheus, Grafana, EFK/ELK, Alertmanager).
- Lead capacity planning, performance tuning, and day-to-day cluster health management.
- Implement robust backup, disaster recovery, and upgrade strategies.

Automation & CI/CD
- Build and manage CI/CD pipelines (Jenkins, GitLab CI, Argo CD) for platform updates, operator deployments, and application rollouts.
- Author scripts and operators to automate routine maintenance, scaling, and self-healing tasks.

Security & Compliance
- Enforce security best practices: RBAC, network policies, SELinux, secrets management (Vault, OpenShift Secrets).
- Collaborate with security teams to implement vulnerability scanning, baseline hardening, and compliance audits.

Collaboration & Documentation
- Partner with development, QA, and networking teams to onboard new applications and troubleshoot platform issues.
- Produce runbooks, run-charts, design docs, and knowledge-base articles.

Experience
- 5+ years in Linux system administration (RHEL) and virtualization (KVM/QEMU). Experience with VMware is an added advantage.
- 3+ years deploying and operating OpenShift in production environments.
- Strong understanding of network and storage virtualisation.
- Hands-on experience with OpenStack (Ansible-based or OpenStack SDK): Nova, Neutron, Cinder, Keystone, Glance.
- An understanding of basic government infrastructure security and policies is an added advantage.

Technical Skills
- Infrastructure as Code: Terraform, Ansible, or equivalent.
- Physical, virtual, and container-based networking and storage: Calico, OVN, Ceph, Portworx.
- Monitoring/Logging: Prometheus, Grafana, ELK/EFK stacks.
- Scripting: Bash, Python, or Go.
- Networking fundamentals: VLANs, SDN, L3 routing, load balancing (HAProxy, OVN LB).

Soft Skills
- Strong problem-solving and troubleshooting aptitude in complex distributed systems.
- Excellent.