SRE-Production Support Engineering Manager

📁 Architect (Level: Manager)

Requisition #: CREQ221485
This position has been closed.

Please see below:

 

  • We need a strong profile with solid experience in stakeholder and SRE team management.
  • A good understanding of production engineering and production support projects is a must, including handling teams that work in a 24/7 model.
  • Incident, change, and service request management is a daily routine, so the candidate should know how to manage the workload and rotate FTEs as and when required.
  • Awareness of managing ad hoc activities such as vulnerability fixes and patching is required.
  • Should be able to lead BAU governance activities on a daily, weekly, and monthly cadence with the necessary reporting data.
  • GCP cloud infrastructure management knowledge, basic Postgres DB knowledge, and banking domain experience are a big advantage for the role.

 

==================================================================================================

Job Description:

  • Experience in SRE (not traditional production support) covering integration platforms on cloud-based deployments is mandatory.
  • Knowledge of applying SRE practices to daily operations is key.
  • The ability to manage teams working in shifts from the office is mandatory; this is a 24x7 on-desk operation.
  • Computer Science and/or Engineering degrees are preferred.
  • Domain experience in banking will be a great advantage.

Working Experience/ Awareness:

  • 24x7 operations support model for mission critical applications and infrastructure using ServiceNow as the ITSM ticketing tool.
  • GCP and private-cloud operational support and administration activities such as provisioning, capacity management, reliability management, monitoring, restoration, etc.
  • Working knowledge of AppDynamics and Splunk for monitoring and setting up observability is key.
  • CI/CD tool chains: setting up and running deployment pipelines and propagating changes across environments.
  • Maintaining middleware such as Kafka (open source) and MQ, as well as application servers (Tomcat).
  • Maintaining Hazelcast data storage platform clusters and Control-M job schedulers.
  • Kubernetes cluster management, monitoring, and remediation. Knowledge of Docker is important.
  • Automating deployments and scripting self-healing workflows based on telemetry.
  • Work closely with the team to define SLIs and configure SLOs, respond to threshold alerts and optimize monitoring capability.
  • Work closely with the team to understand the code as well as configuration artifacts to debug and fix issues that may arise.
  • Must be inclined to work on proof-of-concept solutions that optimize reliability, such as those incorporating AI models for event correlation and assisted triaging.
  • Able to lead and drive the SRE team to work in parallel on service or change requests, the defect management board, and backlog management in an agile manner.

Good to have:

  • SRE Foundation certification by DevOps Institute or any other equivalent certification on SRE by a recognized body is mandatory.
  • CKA certification.
  • GCP Cloud Digital Leader certification at a minimum is mandatory; Cloud Engineer level is a bonus.
  • Hazelcast Platform Operations certification badge.
