Lead Database Reliability Engineer
About the Role:
The Database Reliability Engineer (DBRE) role extends the SRE (Site Reliability Engineering) model. It specializes in database technologies but follows the same underlying DevOps principles. The DBRE will be a lead strategic partner in building and maintaining a Database as a Service platform. This platform helps software engineers build, deploy, and monitor applications, with an emphasis on automation. It is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
The DBRE is responsible for the availability and reliability of our most critical database platform services and ensures they meet the requirements of our internal and external users. The hosting platforms include on-premises servers as well as public clouds such as AWS and Azure.
How you will make an impact:
Drive technology initiatives by taking the lead and providing guidance to team members.
Design, build, and maintain enterprise-scale production relational back ends using Microsoft SQL Server, MySQL, or Oracle, both on-premises and in the cloud, with a particular emphasis on Amazon Relational Database Service (RDS).
Be involved in designing, building, maintaining, and monitoring CI/CD pipelines and all deployments up to production.
Handle performance tuning, backup, and recovery tasks.
Create automated processes for recurring database tasks and deployments (such as migrations, replication, restoring backups, and spinning up new clusters).
Develop and automate best practices and repeatable procedures for deploying and scaling databases.
Provide production and lower-environment support for the back-end databases of assigned applications.
Design, implement, and maintain High Availability (HA) and Disaster Recovery (DR) solutions for complex, mission-critical environments.
Assist with the design and implementation of infrastructure assets using cloud services.
Identify improvement opportunities on existing systems, build plans, and execute improvements.
Research automation-related technologies.
Diagnose and troubleshoot database errors. This includes participating in an on-call rotation and being available for on-call support as needed, including on weekends when required.
We are looking for people who:
Have 5+ years of experience in either PowerShell/Windows command-line scripting or Linux scripting such as Bash, especially for troubleshooting production systems.
Have 5+ years of experience in building, configuring, and managing database environments.
Have experience with at least two relational and non-relational databases, such as Microsoft SQL Server, MySQL, Oracle, PostgreSQL, MongoDB, and CouchDB.
Have experience analyzing requirements and proposing database solutions.
Have hands-on experience building, managing, and troubleshooting high-availability features such as clustering, log shipping, and mirroring.
Have 2-4 years of experience using cloud database services such as Amazon RDS.
Have experience with DevOps configuration management and automation tools such as Terraform, Ansible, CloudFormation, and Chef.
Have hands-on experience with Continuous Integration/Continuous Delivery and Deployment (CI/CD) techniques and tools such as Jenkins and GitHub.
Have exposure to containerization (Docker) and a container orchestration system (ECS/Kubernetes).
Have a good understanding of disciplines related to database reliability engineering, such as systems management, security, and release management.
Have experience managing projects and initiatives with minimal supervision.
Have effective verbal and written communication skills.
Can document the processes and procedures they work with.