We're Hiring

Explore Our Job Openings

Cloud Observability

This role is responsible for architecting, designing, and implementing our comprehensive observability stack, including tracing, telemetry, logging, health monitoring, visualization, and dashboards. You will play a key role in ensuring the reliability, performance, and operational efficiency of our services.


Responsibilities

1. Design and implement a robust observability framework using technologies like Prometheus, Grafana, OpenTelemetry, ELK Stack, Zabbix, and Jaeger.
2. Develop and maintain health monitoring and alerting systems for our OpenStack and Kubernetes-based platforms, with a focus on GPU-supported environments.
3. Create and manage visualization dashboards to monitor system performance, resource utilization, and operational health.
4. Implement scalable, distributed logging and tracing solutions to diagnose, troubleshoot, and resolve system issues effectively.
5. Collaborate with development and operations teams to integrate observability practices into the development lifecycle.
6. Conduct performance analysis and optimization to ensure system reliability and efficiency.
7. Stay updated with the latest trends and technologies in observability and performance monitoring.

Qualifications

1. Bachelor's degree in Computer Science, Engineering, or a related field.
2. Proven experience in observability, monitoring, and system performance analysis, particularly in a cloud or data center environment.
3. Expertise in implementing and managing observability tools such as Prometheus, Grafana, OpenTelemetry, ELK Stack, Zabbix, and Jaeger.
4. Strong understanding of container orchestration using Kubernetes, and familiarity with OpenStack and GPU computing.
5. Proficiency in scripting and automation using languages such as Python, Shell, or Go.
6. Excellent problem-solving skills and the ability to work independently or as part of a team.
7. Strong communication skills and the ability to work in a fast-paced, dynamic environment.


Apply Here