What is Google Cloud Composer: all you need to know
Many businesses eventually need to build hybrid or multi-cloud infrastructure, for data security or other reasons. Doing this from scratch can be a system architect’s nightmare, because every interdependency between the infrastructure components has to be configured and monitored correctly. Meet Google Cloud Composer, a managed service for building, scheduling and running workflow orchestration pipelines across hybrid and multi-cloud environments. It is built on open-source Apache Airflow and Python to avoid vendor lock-in, and it has steadily grown in popularity since its release in 2018. Why is it so good and what can it do for your business? Let’s take a closer look!
Cloud Composer uses DAGs (directed acyclic graphs) to visually represent the pipelines you create. Each node on the graph is a task, and each edge is a dependency that determines the order in which the tasks run. This is a very simple way to design, monitor and troubleshoot infrastructure spanning a variety of platforms. Every module on the graph comes complete with webhooks, APIs, pre-configured connectors and all other network dependencies, so you can simply connect one module to another on the graph, and the required configuration happens under the hood.
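To make the idea of a directed acyclic graph concrete, here is a minimal sketch in plain Python. It deliberately does not use the Airflow API: the task names (extract, transform, load, notify) and the execution_order helper are hypothetical, and the standard-library graphlib module stands in for the ordering a scheduler would work out from the dependencies.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(dag):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle.
        raise ValueError(f"graph is not acyclic: {err.args[1]}") from err


order = execution_order(pipeline)
print(order)  # → ['extract', 'transform', 'load', 'notify']
```

The "acyclic" part matters: if two tasks depended on each other, no valid run order would exist, which is why the helper rejects such a graph instead of guessing.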
This is very convenient for startups and small-to-medium enterprises alike, as it allows them to build top-notch hybrid and multi-cloud infrastructures and workflows without in-depth technical expertise and without paying a fortune for DevOps support from Google or its affiliated partners. In addition, Cloud Composer is deeply integrated with other Google Cloud services such as Dataflow, Dataproc, BigQuery, Google Kubernetes Engine and Cloud Storage, so building even complex multicomponent infrastructures is much easier than before.
Even more importantly, Cloud Composer alerts you to issues in the underlying components, which makes monitoring and troubleshooting your infrastructure easier still. Below we take a look at Google Cloud Composer’s main concepts, benefits and use cases.
Google Cloud Composer main concepts
Google built Composer on Apache Airflow, an open-source project for building modular architectures and workflows, and is now one of the biggest contributors to the ongoing development of Airflow. However, this has led to certain limitations. For example, because Composer follows Airflow logic, it inherits Airflow’s fairly rigid system of key components and the dependencies between them.
This is somewhat similar to on-prem deployments, where you can use only the resources available on a specific machine and must perform operations in a specific order for them to succeed. Therefore, Cloud Composer users have to follow several important rules, described in detail in the corresponding Composer FAQ documentation.
For starters, Airflow has a microservice architecture, so to deploy it successfully, several Google Cloud components must be provisioned that together form a Cloud Composer environment. You can have as many such environments as you need, grouped into Google Cloud projects. Every environment runs on Google Kubernetes Engine, interacts with the required Google services via built-in connectors and is completely self-contained.
Not all Google Cloud regions support Composer, and each environment has to run within a Compute Engine zone. However, both simple projects with one Composer environment per region and complex ones with multiple environments spanning multiple regions or on-prem datacenters can be configured, depending on your project needs. All Airflow components communicate with Google Cloud products via open APIs.
Your Composer environment includes two major parts: a Google-run tenant project and your customer project. The tenant project runs key system components like Cloud SQL and App Engine. This provides an additional layer of security, access control and identity management.
Cloud SQL stores the Airflow metadata and protects the sensitive details of connection and workflow configuration. To minimize the risk of mishandling this data, Composer limits database access to a custom service account, a non-human identity that a VM uses to make API calls, so no individual person has access to it. In addition, all Airflow data is backed up automatically at regular intervals to minimize the potential impact of data loss.
App Engine runs the Airflow web server. It comes with an IAM policy embedded, so you have granular control over who can access your Composer resources. For ease of configuration, the Airflow web server can come preconfigured by Google.
The customer project hosts the Google Kubernetes Engine, Cloud Storage, Logging and Monitoring resources.
Google Kubernetes Engine provides the infrastructure where key Composer components such as the Airflow scheduler, CeleryExecutor and worker nodes run. CeleryExecutor uses a Redis message broker to keep workflows consistent across infrastructures whose components might restart independently.
Cloud Storage provides a bucket for storing data, logs, DAGs, data dependencies and various plugins. Simply placing a DAG file in this bucket is enough for Composer to pick it up and configure all the required dependencies automatically.
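As a rough illustration of placing a DAG in the environment's storage, the sketch below assembles the gcloud command that imports a local DAG file into an environment's bucket. The environment name, region and file name are placeholders rather than values from this article, and the dag_import_command helper is hypothetical. The code only builds and prints the command, so it runs without the Google Cloud SDK installed.

```python
import shlex


def dag_import_command(environment, location, dag_file):
    """Build the gcloud command that copies a local DAG file into the
    DAGs folder of a Cloud Composer environment's bucket."""
    return [
        "gcloud", "composer", "environments", "storage", "dags", "import",
        "--environment", environment,
        "--location", location,
        "--source", dag_file,
    ]


# Placeholder values: substitute your own environment, region and DAG file.
cmd = dag_import_command("example-environment", "us-central1", "example_dag.py")
print(shlex.join(cmd))
```

On a machine where the Google Cloud SDK is installed and authenticated, the same list could be passed to subprocess.run(cmd, check=True) to perform the upload.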
Cloud Logging and Monitoring integrate with Composer by default, providing you with a centralized dashboard to view all the logs and metrics for your project. Because these logs and metrics arrive as a continuous stream of events, you can configure your systems to consume them in near real time and gain useful insights into your system-level data and dependencies on the fly.
To wrap it up, Cloud Composer is a fully managed workflow orchestration service that is quite easy to configure and starts providing value from day one.
Cloud Composer key benefits
We briefly list the reasons why Cloud Composer is a worthy tool for your IT projects:
- It allows building a multi-cloud environment in a simple way to combine all your data, workflows and services into a holistic system.
- It is portable and flexible, as it is built upon the open-source Apache Airflow project and saves you from vendor lock-in.
- If you need a hybrid cloud solution to meet data security requirements — Composer is the best way to go to ensure safe and transparent data transfer between on-prem and cloud-based components.
- It is integrated with BigQuery, Dataproc, Dataflow, AI Platform, Cloud Pub/Sub and Cloud Storage, giving you the ability to use the latest Google tools for Big Data processing with ease.
- Workflows are written in Python, so the learning curve is gentle and you can master this service in no time.
- DAGs and convenient dashboards ease the creation and configuration of your workflows, and help you trace any issue back to its root cause.
- Because it is a fully managed service, Composer lets you focus on designing, running and managing workflows for your projects: all infrastructure configuration is done once and is fully automated from then on.
That said, Google Cloud Composer is a great solution for startups that want to leverage the full potential of Google Cloud for their projects. Thanks to extensive documentation and well-thought-out operational practices, Python skills are nearly all you need to create modular, resilient and cost-efficient infrastructures for your projects, wherever they run.
Should you have any more questions regarding Google Cloud Composer, IT Svit would be glad to answer them. Over the course of the next few articles, we will discuss the scenarios where Composer fits best, possible alternatives and use cases. If you need help configuring or optimizing Cloud Composer environments, let us know. We are ready to assist!