Grafana & Prometheus Specialist

Senior Lead Software Engineer (Level: Lead Consultant)
Requisition #: CREQ220247

Experience

Minimum 5 years of relevant work experience with observability, particularly Grafana and Prometheus setup in critical production environments, including writing custom exporters and integrations. Experience working with OpenShift private cloud infrastructure and hosted applications. Experience with Managed Grafana on public cloud environments is beneficial. Experience with multitenancy setup and data segregation on the observability and AIOps stack, and with defining SLIs and setting up SLOs for multitenant solutions.

  • Experience implementing container, network, APM, RUM, log analytics, end-to-end tracing, and custom alerts with Grafana, Prometheus, and Grafana Loki (alternatively Logstash or Fluent Bit).
  • OpenShift proficiency with containers and multitenancy setup for the observability solution is critical.
  • Ability to configure custom alerts, monitors and build AIOps workflows based on telemetry.
  • Good understanding of setting up integrations with other systems via APIs, consuming external APIs for IAM, and ingesting metric-based telemetry via collectors.
  • Ability to build custom Grafana dashboards.
  • Setting up synthetic monitoring and test automation, and integrating their telemetry into the observability stack.
  • Tenant and data segregation. Ability to code is mandatory; Python, Java, and Ansible scripting preferred.
  • Official Grafana or Prometheus certifications, or alternative certifications from Udemy, Coursera, or other platforms.
  • Cloud certifications, particularly AWS or OpenShift related, are required.
  • Any recognized System Architecture qualifications, e.g. TOGAF, are a bonus.

Role and Responsibilities

  • Architect, design, and ensure implementation of the entire observability solution, to be packaged as a module in a multitenant private cloud solution.
  • Implement an observability solution that applies the same monitoring feature set across all tenants, and that monitors and acts upon telemetry from tenants while serving as a hypervisor.
  • Design and implement integrations as well as externalize APIs. Set up authentication and authorization controls by integrating with an IAM layer.
  • Work with UI/UX teams to design dashboards for the Observability & Maintenance platform for both the tenants and the host.
  • Design and set up an AIOps module responsible for automated remediation workflows such as capacity scaling, container restarts, anomaly detection, etc.
  • Build proof-of-concept solutions to view end-to-end "tube map" service flows for the respective tenants' services.
  • Define and set up a CMDB to serve as a source for the infrastructure and application telemetry.
  • Work with other teams to ensure the system is well tested and scalable, meeting the demands of the tenants. Define SLIs and set SLOs for core services and journeys.
