Lead SRE Engineer Platforms
BT's Platform Engineering team is responsible for building integrated, scalable, and robust enterprise journeys. We are seeking an experienced Lead SRE Engineer with a proven track record of designing, architecting, and implementing resilient, durable applications, particularly in the cloud and at scale.
About the Role
In this role, you will oversee operations support across the entire platform, ensuring stability, scalability, and performance. As the central point of contact for platform operations, you will manage cross-functional teams, align processes, and ensure effective incident management. You will work closely with engineering, DevOps, and SRE teams to provide a seamless experience for developers and product owners.
Key Responsibilities
Operational Leadership
Oversee platform-wide operations, ensuring SLAs and reliability targets are consistently met.
Coordinate across teams to streamline processes and establish best practices for operational excellence.
Define and manage on-call rotations, ensuring adequate coverage and efficiency.
Incident Management
Develop and maintain comprehensive runbooks and playbooks for platform operations.
Lead incident response and resolution, conducting post-mortems and implementing improvements.
Proactively monitor platform health to detect and prevent issues before they escalate.
Process and Automation
Identify opportunities to enhance operational workflows and eliminate inefficiencies.
Drive automation initiatives to reduce manual tasks and improve platform scalability.
Collaborate with SRE and engineering teams to implement robust monitoring and alerting systems.
Stakeholder Collaboration
Serve as the liaison between operations and development teams, aligning on priorities and requirements.
Ensure platform performance meets the needs of developers, product owners, and end users.
Qualifications
5+ years of experience in platform or operations support, with a focus on large-scale systems.
Strong understanding of platform engineering concepts, including Kubernetes (K8s) and AWS.
Proficiency in monitoring and observability tools such as Dynatrace and Prometheus.
Experience with CI/CD pipelines and automation tools such as GitLab and Pulumi.
Exceptional organizational and communication skills, with the ability to manage cross-functional initiatives.
Familiarity with incident response processes and post-mortem analysis.
Leadership skills, including mentoring and coordinating operational teams.
Why Join Us
Opportunity to work with cutting-edge cloud and DevOps technologies.
A collaborative team environment that encourages innovation and growth.
Hands-on exposure to scalable architectures and automation practices.
Support for professional development and certifications in AWS, Kubernetes, and other modern technologies.