ELK vs. Loki: how to gather logs from a Kubernetes cluster and navigate through them effectively
It is crucial to monitor cluster activity constantly, detect bugs, and debug them as quickly as possible. Logs help you follow what is going on inside any cluster. If you want to monitor the cluster and get to the logs easily when anything happens, you need cluster-level logging, which requires a separate backend to store, analyze, and query logs. Unfortunately, Kubernetes doesn’t provide a native storage solution for log data. You can, however, integrate a logging solution into your Kubernetes cluster. Here we will discuss logging in Kubernetes: how to gather logs from a Kubernetes cluster and navigate through them efficiently.
What is a Logging system? Why do you need it?
In simple terms, a log management system determines what you need to log, how it is logged, and how long the data should be retained. The general requirements for logging systems are the following:
- they are CPU/RAM efficient. Ideally, a logging system consumes very little CPU and RAM;
- they are storage efficient. The amount of disk space needed depends heavily on the volume of logs produced, but logs can be compressed, which saves disk space at the level of the logging system itself;
- they allow flexible parsing. A good logging system supports log parsing so that only the needed information is extracted;
- logs are easy to read and filter. The UI allows any developer to get the logs of an application and filter entries;
- they can handle large volumes of data. There can be TiB of logs, and the system has to process user queries and select logs in an acceptable time.
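To give a rough feel for the storage-efficiency point, here is a minimal sketch that compresses a batch of synthetic log lines with gzip from the Python standard library. The log format and the number of lines are made up for illustration; real ratios depend entirely on how repetitive your logs are and on the compression a given logging system actually uses.

```python
import gzip

# Hypothetical, repetitive application log lines (made up for this illustration).
lines = [
    f"2024-01-01T00:00:{i % 60:02d}Z INFO request_id={i} path=/api/items status=200\n"
    for i in range(10_000)
]
raw = "".join(lines).encode("utf-8")

# Compress the whole batch at once, as a storage layer might do per chunk.
compressed = gzip.compress(raw)

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.1f}x")
```

Log lines tend to share timestamps, field names, and paths, which is why general-purpose compression usually works well on them.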
Kubernetes native logging
Kubernetes gathers logs out of the box, and you can read the logs of any pod using kubectl. We think this is not enough: logs are rotated by logrotate and the default size limit per pod is 10 MiB, so without additional configuration we won’t be able to store logs for, say, a month.
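A quick back-of-the-envelope calculation shows why a 10 MiB limit falls short of a month. The sketch below is a simplification: it assumes a constant log rate (the 1 KiB/s figure is hypothetical) and ignores any additional rotated files that may be kept.

```python
MIB = 1024 * 1024

def retention_hours(limit_bytes: int, bytes_per_second: float) -> float:
    """Hours of logs that fit under the size limit at a constant log rate."""
    return limit_bytes / bytes_per_second / 3600

# 10 MiB is the limit mentioned above; 1 KiB/s is a hypothetical log rate.
hours = retention_hours(10 * MIB, 1024)
print(f"{hours:.2f} hours")  # prints "2.84 hours"
```

Even a modest service writing 1 KiB of logs per second would keep less than three hours of history under these assumptions.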
ElasticSearch + Kibana + ? (ELK) solution
ElasticSearch is a powerful system that, given enough resources, can store and process TiB of log entries. It is a well-known solution that is often used as a text search engine. Kibana is its native UI, used for index management, log selection, and much more. Here is a full list of ElasticSearch+Kibana features. However, ElasticSearch does not gather log entries itself. To do that, you can use:
- Logstash, which is designed to process large volumes of logs;
- Filebeat, a more lightweight solution. Filebeat, developed by elastic.co, can also forward logs to Logstash but does not perform complex data processing itself. At the same time, it allows you to declare ingest pipelines, which are executed by the ElasticSearch nodes themselves;
- Fluentd, a well-documented, relatively new open-source solution that already has many plugins supporting common software. You can use plugins to parse logs without any advanced parsing configuration;
- Fluentbit, another open-source solution, which has as many plugins as Fluentd but is more resource-efficient.
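Whichever collector you choose, its parsing step boils down to turning a raw line into named fields. The following sketch shows the idea in plain Python. The line format, the field names, and the `parse_line` helper are all hypothetical and are not the configuration syntax of any of the tools listed above.

```python
import re
from typing import Optional

# Hypothetical line format: "<timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")

def parse_line(line: str) -> Optional[dict]:
    """Return the extracted fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

sample = "2024-01-01T12:00:00Z ERROR payment failed: timeout"
print(parse_line(sample))
# {'ts': '2024-01-01T12:00:00Z', 'level': 'ERROR', 'message': 'payment failed: timeout'}
```

Once a line is split into fields like these, the backend can filter by level or time range instead of searching the raw text.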
Loki + Promtail + Grafana + Prometheus (PLG) solution
Grafana + Prometheus is a well-known monitoring solution. Grafana Labs started developing Loki and Promtail, which are scalable and simple solutions for log gathering. Here you can find an in-depth comparison of Loki with other logging systems.
E?K & PLG Comparison
| Criterion | E?K | PLG |
|---|---|---|
| Data processing | Logs content processing | Metadata processing |
| Search | Can handle complex queries and filter | Can process only simple search queries, cannot filter |
| Scalability | Highly configurable | Configurable but not well-documented yet |
| Access | Has authorization only in paid subscription | Grafana allows managing users and restricts access to logs exploration out of the box |
| Log-based alerting | Complex alerts configuration using third-party tools | Promtail can produce log-based metrics, but it cannot be configured through UI |
| Performance | Can make complex selections | It is not recommended to select more than 5-10k entries |
| Dashboards | Kibana can be used to build rich dashboards, even with IP geolocation | Only simple tables with log data |
E?K is a very powerful solution that allows us to build complex logging systems and dashboards and to process big queries. The PLG stack, on the other hand, gives us a small, efficient, and simple logging system that allows us to explore logs without complex queries. E?K consumes more resources, while PLG does not provide advanced querying capabilities.
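The "logs content processing" versus "metadata processing" distinction can be illustrated with a toy model. The sketch below is not how ElasticSearch or Loki are implemented; the entries, the labels, and both helper functions are hypothetical and only show the trade-off between indexing every word and indexing labels alone.

```python
from collections import defaultdict

# Hypothetical log entries: (labels, line). Both are made up for this illustration.
entries = [
    ({"app": "api", "env": "prod"}, "user login ok"),
    ({"app": "api", "env": "prod"}, "payment failed timeout"),
    ({"app": "worker", "env": "prod"}, "job failed retry"),
]

# Content indexing: every word points to the entries that contain it.
content_index = defaultdict(set)
# Metadata indexing: only label pairs are indexed; lines are stored as-is.
label_index = defaultdict(set)

for i, (labels, line) in enumerate(entries):
    for word in line.split():
        content_index[word].add(i)
    for pair in labels.items():
        label_index[pair].add(i)

def search_content(word):
    """Direct index lookup, no scanning of stored lines."""
    return sorted(content_index.get(word, set()))

def search_labels(label, value, needle):
    """Pick entries by label, then scan their lines for the substring."""
    candidates = sorted(label_index.get((label, value), set()))
    return [i for i in candidates if needle in entries[i][1]]

print(search_content("failed"))               # [1, 2]
print(search_labels("app", "api", "failed"))  # [1]
```

In this toy model the content index grows with every distinct word, which costs resources but answers word queries directly. The label index stays small, but each query has to scan the selected lines.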
Let’s look at two hypothetical situations. In the first, you have a start-up built on top of a small Kubernetes cluster on GCP, and you want to be able to find new errors that appear in one of two environments. No alerts or complicated dashboards are needed here, so we would suggest the Prometheus + Grafana + Loki solution. These tools allow you to gather logs and run simple selection queries to find the latest errors. The same stack can (and should) also be used for cluster monitoring: Prometheus + Grafana will provide monitoring dashboards and alerts.
In the second, you provide enterprise solutions and have several QA, UAT, and production environments across many Kubernetes clusters. Services send a lot of logs, and it would be great to minimize RPO/RTO. You want to extract metrics from the log stream and receive alerts when an exception occurs. The solution here would be ElasticSearch + Fluentd + Kibana. It is an excellent chain of tools to gather, process, store, and query log entries. There is a downside: building alerts based on the log stream is not very convenient, although it can be done. In any case, there is no better solution for gathering a vast amount of logs.
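The second scenario mentions extracting metrics from the log stream and alerting on exceptions. As a minimal sketch of that idea, independent of any particular stack, the class below counts lines containing a marker within a sliding time window. The `ExceptionRateAlert` name, the threshold, the window, and the sample stream are all hypothetical; a real setup would rely on the alerting tools of the chosen stack.

```python
from collections import deque

class ExceptionRateAlert:
    """Fire when more than `threshold` matching lines arrive within `window` seconds."""

    def __init__(self, threshold: int, window: float, marker: str = "Exception"):
        self.threshold = threshold
        self.window = window
        self.marker = marker
        self.hits = deque()  # timestamps of matching lines

    def observe(self, timestamp: float, line: str) -> bool:
        if self.marker in line:
            self.hits.append(timestamp)
        # Drop hits that fell out of the sliding window.
        while self.hits and self.hits[0] <= timestamp - self.window:
            self.hits.popleft()
        return len(self.hits) > self.threshold

# Hypothetical stream of (timestamp in seconds, log line).
alert = ExceptionRateAlert(threshold=2, window=60)
stream = [
    (0, "INFO started"),
    (10, "java.lang.NullPointerException at Foo.bar"),
    (20, "java.lang.IllegalStateException at Foo.baz"),
    (30, "java.io.IOException: read failed"),
    (120, "INFO recovered"),
]
fired = [ts for ts, line in stream if alert.observe(ts, line)]
print(fired)  # [30]
```

The alert fires at the third exception within one minute and clears once the old hits leave the window.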