Architect (Level: Manager)
Requisition #: CREQ255093

We need a strong profile with solid experience in stakeholder and SRE team management.
A good understanding of production engineering and production support projects is a must, including handling teams working in a 24/7 model.
Incident, change, and service request management is a daily routine, so the candidate should know how to manage the workload and rotate FTEs as and when required.
Awareness of managing ad hoc activities such as vulnerability fixes and patching is required.
Should be able to lead BAU governance activities (daily, weekly, and monthly cadence) with the necessary reporting data.

Working Experience/Awareness:

24x7 operations support model for mission-critical applications and infrastructure, using ServiceNow as the ITSM ticketing tool.
GCP and private-cloud operational support and administration activities such as provisioning, capacity management, reliability management, monitoring, and restoration.
Working knowledge of AppDynamics and Splunk for monitoring and setting up observability is key.
CI/CD toolchains: setting up and running deployment pipelines and propagating changes across environments.
Maintaining middleware such as Kafka (open source) and MQ, as well as application servers (Tomcat).
Maintaining Hazelcast data storage platform clusters and Control-M job schedulers.
Kubernetes cluster management, monitoring, and remediation. Knowledge of Docker is important.
Automating deployments and scripting self-healing workflows based on telemetry.
Work closely with the team to define SLIs and configure SLOs, respond to threshold alerts, and optimize monitoring capability.
Work closely with the team to understand the code and configuration artifacts in order to debug and fix issues that arise.
Must be inclined to work on proof-of-concept solutions to optimize reliability, such as those incorporating AI models for event correlation and assisted triaging.
Able to lead and drive the SRE team to work in parallel on service or change requests, the defect management board, and backlog management in an agile manner.

