Overview of Managed Airflow by Google Cloud Composer
The widespread adoption of cloud computing has made it possible to build complex systems that span multiple cloud platforms and on-prem facilities. However, setting up the workflows such systems depend on takes a great deal of scripting and troubleshooting to make sure all the dependencies work as intended. Apache Airflow is a popular open-source platform for building, scheduling and monitoring distributed workflows. Cloud Composer is a managed Airflow service from Google Cloud Platform, and today we briefly review its key features, pros and cons.
First of all, what is Google Cloud Composer? It is a fully managed service for designing, running, scheduling and troubleshooting distributed workflows for hybrid cloud and multi-cloud systems. Because it is managed, developers and system administrators can concentrate on designing and running their workflows instead of handling scaling, backups, load balancing and other DevOps aspects of such operations.
Built atop the open-source Apache Airflow project, which is written in Python, Cloud Composer can be integrated with a wide variety of Google services and open-source products via APIs. This allows a business to build complex infrastructures and workflows with Google in a simple and understandable way and to use them elsewhere later on. With quite affordable pricing, Cloud Composer can become a one-stop shop for many businesses that need a reliable workflow management solution. Below we list the main features of Cloud Composer.
Google Cloud Composer main features
Google products tend to share the same paradigm: a well-thought-out system architecture, deep integration with the rest of Google's services and the open-source ecosystem, and attention to detail with a positive customer experience at the core. Google Cloud Composer shares these characteristics through the following features:
- Portability. Because it is built atop Apache Airflow, your Composer project can be moved to any other platform and run there once you adjust the underlying infrastructure.
- Multi-cloud functionality. Built-in connectors allow replacing Google Cloud services with other cloud products should the need arise, avoiding vendor lock-in.
- Hybrid cloud operations. Cloud Composer supports the safe transfer and processing of data stored in your on-prem data warehouses, allowing businesses to combine the computational scalability of the cloud with the security of on-prem operations.
- Python. As one of the most popular programming languages for Big Data operations nowadays, Python was a natural choice for building Apache Airflow. Thanks to Python, developers can quickly design, troubleshoot and launch workflows and pipelines for their projects without having to worry about the DevOps side of things.
- Integration. Cloud Composer interacts with other services via APIs and has native support for Google products like BigQuery, Cloud Datastore, Dataflow, Dataproc, AI Platform, Cloud Pub/Sub and Cloud Storage. However, any of these components can be replaced with AWS or Azure analogs, should you need this, thanks to the variety of connectors, plugins and extensions available.
- Resilience. Built atop Google infrastructure, Cloud Composer is a fault-tolerant system that supports the reliability of your operations and provides convenient dashboards for performance monitoring and root-cause troubleshooting.
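Airflow describes each workflow as Python code that declares tasks and the dependencies between them. The following is a minimal sketch of that idea only: it uses the standard library's graphlib module (Python 3.9 or later) rather than Airflow's own API, and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"validate", "transform"},
}


def run_order(dependencies):
    """Return one valid execution order, or raise ValueError on a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The second element of args holds the tasks that form the cycle.
        raise ValueError(f"workflow is not acyclic: {exc.args[1]}") from exc


order = run_order(PIPELINE)
print(order)  # "extract" always comes first and "load" always comes last
```

Airflow itself adds scheduling, retries and monitoring on top of this kind of dependency ordering.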
Let’s take a quick look at the pros and cons of working with Cloud Composer.
Cloud Composer benefits and downsides
Many Big Data architects and developers of data-driven software products look for a hosted solution that lets them build the systems they need without having to configure the underlying infrastructure. Cloud Composer provides the following advantages in this regard:
- Speed and ease of configuration. Once you register a Google Cloud account, configuring Composer is a couple of clicks away. During the 20 minutes or so needed to launch your Composer project, you can select the Python libraries you are going to use from a detailed PyPI list, configure the needed environment variables and so on, and voilà, you are good to go.
- Simplicity of deployment. Composer projects are built using DAGs (Directed Acyclic Graphs), which are stored in a dedicated folder in your Google Cloud Storage. You compose a DAG from a variety of available components and simply drag and drop the resulting file into this folder from your dashboard. The service does all the remaining configuration itself, and the working data pipeline appears in the UI of your Airflow webserver. Should you prefer CLI operations to drag and drop, no problem: this can be done through gcloud.
- Clean UI. Cloud Composer is a managed service, meaning most of the configuration happens behind the scenes and your dashboard is not cluttered with checkboxes. Your Google Cloud dashboard connects to the DAG folder and to the Airflow webserver, so you can easily troubleshoot your pipelines in real time.
- Support for recent Python versions. Upon release, Cloud Composer worked with Python 2.7 only, but as of this writing it supports Python 3.6, and the team keeps working to support the latest Python features.
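The CLI route mentioned in the deployment point can be sketched as a small helper that assembles the gcloud command for uploading a DAG file. This is an illustration under stated assumptions: the environment name, region and file name are hypothetical placeholders, and the sketch only builds and prints the command rather than running it.

```python
import shlex
from typing import List


def build_dag_import_command(environment: str, location: str, dag_file: str) -> List[str]:
    """Build the gcloud argument list that uploads one DAG file to an environment."""
    return [
        "gcloud", "composer", "environments", "storage", "dags", "import",
        f"--environment={environment}",
        f"--location={location}",
        f"--source={dag_file}",
    ]


# Hypothetical values; replace them with your own environment details.
command = build_dag_import_command("example-environment", "us-central1", "example_dag.py")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, provided gcloud is installed and authenticated.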
The main downside of Google Cloud Composer is that it is a managed service, meaning you pay a bit more for a ready-made solution than you would for configuring the infrastructure yourself. $250/mo. is not a huge price for a single Composer project in a business setting, though it might be a bit too much for pet projects.
In addition, as a managed service, Cloud Composer has limits on the number of services and integrations it supports. In some cases, especially when dealing with legacy on-prem infrastructures, you might need to build custom connectors or APIs that are not readily available from Google. Troubleshooting DAG connectors also requires in-depth expertise with Google Cloud Platform operations.
Conclusions: Cloud Composer is a solid choice
That said, Google Cloud Composer is still a solid choice for any business or entrepreneur who needs to build a reliable data processing ecosystem. It is a polished solution with extensive documentation, allowing any customer to learn the basics in a short time. However, you are better off having access to solid GCP expertise, both to speed up the initial troubleshooting while building your data pipelines and to build or configure custom connectors and APIs when working with legacy infrastructure.
IT Svit has this expertise, based on a large number of GCP-based projects we have successfully completed for our customers. We are ready to help you use Google Cloud Composer with maximum cost-efficiency, so feel free to contact us with any questions regarding managed Airflow operations. We would be glad to answer!