SRE Observability Leader

Senior Program Director (Level: Senior Director)
Requisition #155723
SRE Director – Job Description
We are looking for a leader for our Site Reliability Engineering (SRE) and Observability team. As the leader of SRE/Observability, you will create compelling offerings in SRE, Observability, and Resiliency for customers and contribute to business growth. You will deliver solutions to our customers while maintaining the highest standards, and you will develop and lead the Observability and SRE team and offerings for Virtusa.
• Be a strong thought leader in Site Reliability Engineering, Observability, Operational Excellence, and DevOps principles.
• Strong technical acumen in Cloud Architecture, Observability, Performance Benchmarking, Capacity planning and Reliability tools.
• Experience in Observability platforms, application monitoring tools and performance analysis techniques.
• Experience managing & growing technical leaders and teams.
• Be responsible for building and mentoring a new team of SRE and Observability specialists.
KEY QUALIFICATION & EXPERIENCES:
• 15+ years of IT experience, with a minimum of 5 years of experience in SRE/Observability/Monitoring tools.
• Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
• Expert-level experience in monitoring and logging technologies, both open source and closed source (e.g. AppDynamics, New Relic, Datadog, Prometheus, Grafana, LogicMonitor, Sumo Logic, ELK).
• Experience in implementing metrics, logs, and tracing for end-to-end (E2E) observability.
• Working knowledge of systems and tooling such as Terraform, Ansible, Chef, Puppet, and Jenkins, including designing and implementing CI/CD pipelines and infrastructure provisioning and management.
• Ability to communicate and coordinate with cross-functional engineering teams across multiple geographic regions.
• Experience with AIOps and machine learning is highly desirable.
• Experience with other monitoring tools like Prometheus, Grafana, etc.
• Experience with Observability solutions like Dynatrace, Datadog, Instana, etc. is highly desirable.
• Excellent problem-solving and analytical skills.
• Strong communication and collaboration skills.
• Ability to work independently and manage multiple projects simultaneously.
• Knowledge of IT operations concepts and processes, such as monitoring, incident management, root cause analysis, and remediation.
