IT Svit internal monitoring system
The IT Svit cloud infrastructure is diverse, so when something stops working, it is hard to identify the issue and react at once. This is why we decided to implement internal monitoring for our services with the following components:
- Prometheus operator
We wanted our system to provide the following results:
- Automatic monitoring of various cloud infrastructure parameters like CPU usage, bandwidth usage, disk volume usage, etc.
- Timely alerts if any issues occur
- Informative notifications with screenshots to simplify troubleshooting
With the monitoring stack of Zabbix + Prometheus + Grafana installed, we receive alert notifications in Telegram whenever something goes wrong with one of our Linux servers or Kubernetes clusters. This lets us respond right away and resolve issues quickly. It also saves time, because each alert tells us both what the issue is and what caused it.
Partnership period: 2005-ongoing
Team size: 2 – 4 people
Team location: Kharkiv, Ukraine
Services: Cloud architecture, cloud infrastructure management, cloud monitoring solutions
Expertise delivered: AWS cloud administration, DevOps services, cloud infrastructure management, monitoring solutions configuration
Technologies: Zabbix, Kubernetes, Prometheus, Grafana
This was an internal project aimed at improving the versatility and performance of the IT Svit DevOps team. We wanted to be better informed about the various processes within our IT infrastructure so that we could identify and solve issues faster. The requirements were:
- Automatic monitoring of the infrastructure must be performed by Zabbix
- Timely alerts sent to Telegram
- Informative screenshots with the trouble description to simplify troubleshooting
Project implementation and challenges resolved
To achieve these goals, we made the following decisions:
- We used Zabbix and Prometheus as data sources for Grafana. This gave us detailed, on-point issue reporting.
- Zabbix agents are running in every container or instance within our infrastructure and they report to a Zabbix server should any issue arise.
- A Python script sends all the information about the incident to a Telegram chat, so the admins are notified as soon as something happens and know at once exactly what happened.
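The notification step can be sketched roughly as follows. This is a minimal illustration, not the actual IT Svit script: it assumes the alert arrives as a subject and a message body, and the function names (format_alert, build_request, send_alert) are hypothetical. The only external interface it relies on is the Telegram Bot API sendMessage method.

```python
import json
import urllib.parse
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def format_alert(subject: str, message: str) -> str:
    """Combine the alert subject and body into a single chat message."""
    return f"{subject.strip()}\n\n{message.strip()}"


def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"{TELEGRAM_API}/bot{token}/sendMessage"
    payload = urllib.parse.urlencode({"chat_id": chat_id, "text": text})
    return urllib.request.Request(url, data=payload.encode("utf-8"), method="POST")


def send_alert(token: str, chat_id: str, subject: str, message: str) -> bool:
    """Send the alert and report whether Telegram accepted it.

    Assumption: the bot token and chat ID are supplied by the caller,
    for example from the monitoring server's alert configuration.
    """
    request = build_request(token, chat_id, format_alert(subject, message))
    with urllib.request.urlopen(request, timeout=10) as response:
        return bool(json.load(response).get("ok", False))
```

A monitoring server could call send_alert(token, chat_id, subject, message) from its alert hook whenever a trigger fires. Attaching screenshots would need the Bot API sendPhoto method, which this sketch leaves out.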
Combining Zabbix + Prometheus + Grafana allows us to monitor the IT infrastructure cost-effectively and almost effortlessly, while responding to the issues immediately.