What do DevOps engineers dream about?
Many companies want to implement DevOps best practices to ensure the cost-efficiency and stability of their IT operations. However, DevOps is a double-edged sword: to be efficient, a DevOps team needs a high degree of self-management, and to be useful, it needs the final say in most processes. Otherwise, all the DevOps benefits go to waste due to managerial interruptions and overhead. So let’s discuss what DevOps engineers dream about!
The key benefits of DevOps include the removal of manual routine tasks, like configuring testing environments, and an increase in the overall efficiency of software development. This goes hand in hand with automating infrastructure management, monitoring, logging, alerting and restoration, so that DevOps engineers can stop putting out fires every shift and concentrate on implementing innovative solutions to further boost business performance (and reduce their own workloads).
Because let’s face it, the dream of a DevOps engineer is to create a CI/CD pipeline where the output of every operation automatically becomes the input of the next one, and the engineer does not have to monitor and control every detail every step of the way. Once such a pipeline is built, a DevOps engineer literally has nothing to do, from the manager’s point of view! The truth, however, is that it is the managers who have nothing to do in such a scenario, and this scares them greatly.
A conventional Git-based workflow
Let’s take a look at what a usual day of an Ops engineer looks like.
He probably needs to start by checking the to-do tickets left by the previous shift or finishing the open tasks from his own previous shift. He might close a couple of them quickly and decide to have a cup of coffee or tea or have a smoke, and then continue working on that deployment script he has to finish by the end of the sprint. While he is away, a developer finishes a new feature and creates a request for a QA engineer to test it. The QA engineer, in turn, requests a testing environment to run the tests.
We know that the majority of companies have automated this process to a certain degree; at the very least, they run their testing server farms 24/7 so that QA engineers can access them whenever needed. However, this is not always the case, as it can be quite resource-intensive, and there might be a need to spin up a new testing environment for this request.
Meanwhile, another urgent alert arrives from the monitoring system, and the Ops engineer has to postpone work on the script and start putting out the new fire. He might have seen the request from the QA or he might have missed it completely, as a high-priority issue must be dealt with first. Once the issue is resolved, he might start configuring the test environment, or forget about it completely (we are all human, after all) and work on his script, or get another bunch of hot issues to fix.
Meanwhile, the developer works on another batch of code and the QA works on another testing task, but once it is done, the QA finds out that the four-hour-old request for a testing environment has not been completed yet. In the best-case scenario, it is done and the QA can start testing the new batch of code, but this is not always what happens.
Now, let’s assume some bugs are found when the code is tested. The QA engineer prepares a report on them and sends it back to the Dev for fixing and issues a request to shut down this test environment — as he has received several new testing requests from the Devs and has issued several more requests for testing environments to the Ops engineers. He might be able to test a couple more batches of code if the environments for them were configured in time. But this is not always the case, as Ops engineers have to deal with an endless influx of alerts from the production environment.
Rinse, repeat. This is an endless kaleidoscope of repetitive actions that blur and form a software development bog your IT department has to wade through. Mind you, we have not even covered the other variant, where the new batch of code passes the tests and has to be pushed to the staging environment, where a new product version is extensively tested before release. Unfortunately, some bugs might still make it to production despite the best efforts of your QA and Ops teams, so post-update crashes are still a reality, though not as widespread as they were a decade ago.
Another important issue is the misalignment of goals. The Devs have to roll out as many story points per sprint as they can, the QA engineers have to find as many bugs as they can, and Ops engineers have to minimize the number of issues in production. In other words, the goals of the Devs and Ops are misaligned and even contradictory, as the Devs want to maximize change, and Ops want to minimize it to preserve the stability of operations.
This often results in multi-stage approval processes for every release, so there is always someone to put the blame on if anything goes awry. As the developers don’t want to get the blame, they try to avoid experimenting, which significantly reduces the pace of innovation in the company.
As the cherry on top, there are always repetitive infrastructure management operations like database backups and restoration, along with other activities that should be performed via scripts (the ones the poor Ops never have time to write), and all this makes new product releases infrequent and quite risky.
How do you deal with risk? You spread it across many decision-makers, don’t you? When an Ops engineer requests the purchase of a new dedicated server (or a whole server farm), he must provide a business justification for it, which must be approved by the Team Lead, the Project Manager and the rest of the managerial chain up to the CTO, as this is quite a significant investment. Each approval might require some convincing and cause delays, so the risk of missing the opportunity that demanded this server farm can be quite high. This is the unfortunate reality of enterprise-grade software development.
DevOps approach: culture, flexibility, waste removal
DevOps culture, as the practical implementation of the Agile software development methodology with the addition of Lean principles, concentrates mostly on three aspects:
- Waste removal. All repetitive actions, like manual environment configuration, equipment acquisition and provisioning approvals, data backups and restoration, are automated to make them cost-efficient and far less error-prone.
- Culture of collaboration. In contrast to the enterprise habit of “throwing the code over the wall” between developers and Ops engineers, the DevOps culture of communication and collaboration aligns the goals of Devs and Ops, so both teams are interested in delivering value as quickly and consistently as possible.
- Innovative flexibility. Due to using virtualized cloud resources, the cost of provisioning the environments needed for experimentation is almost negligible. This facilitates experimentation and greatly increases the pace of innovation, as the cost of error is minimal.
Due to this approach, the developers can gather customer feedback and stakeholder input, transform it into a plan for new features, code, build and test them seamlessly, so that DevOps engineers can release new product versions automatically, operate and monitor them and continuously gather feedback for new planning stages.
This all might sound fine and good, but how is it actually done? There are three key components of DevOps workflows: IaC, CI and CD.
- Infrastructure as Code or IaC. This is a software development and infrastructure management practice where all environment and operational parameters are described in simple text files stored in your VCS. Executing these files creates the infrastructure, and changing values in them, versioned like any other code, creates new infrastructure states. These files are called “manifests” and are used by DevOps infrastructure configuration tools like Terraform, Kubernetes and Pulumi.
This approach allows building immutable infrastructures, where it is easier to restart a component from its manifest than to try fixing the error in place. This saves a great deal of time and reduces the complexity of all IT operations, streamlining your business processes and making them more productive.
- Continuous Integration or CI. This is a practice of configuring the software development process in such a way that developers can build the code in small, clean batches and test it against previously prepared automated unit and integration tests. A valuable aspect of this approach is the automated provisioning of the environments needed for testing and the preparation of all the required artifacts and dependencies by a previously configured manifest. This way, a repetitive, tedious and error-prone process becomes a simple operation any developer can perform in one click.
Most importantly, though, the results of all stages are reported automatically: if the tests pass, the developers are informed at once, and if anything fails, they receive a report in Slack with all the information needed to fix the issue and re-run the tests in minutes, not hours. This results in clean and efficient code.
- Continuous Delivery or CD. This is a practice of configuring all the stages of the SDLC in such a way that the output of the previous stage automatically becomes the input for the next stage. This works both for developing new product features and for infrastructure management in production, and it minimizes the risk of human error in your IT operations. CI/CD processes are called “pipelines” and are enabled by tools like Jenkins, CircleCI, Ansible, etc.
Now that you know how DevOps engineers work in theory, let’s take a look at what a perfect DevOps day looks like:
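To make the IaC idea a bit more tangible, here is a minimal sketch of how a tool might compare the current infrastructure state with the manifest stored in your VCS and work out what has to change. The manifest format, the `plan` and `apply` functions and the resource names are all hypothetical and invented for illustration; real tools like Terraform or Pulumi use their own, much richer formats and logic.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest format: resource name -> settings.
# Real IaC tools define their own formats; this is only an illustration.
Manifest = Dict[str, Dict[str, object]]


def plan(current: Manifest, desired: Manifest) -> List[Tuple[str, str]]:
    """Return the (action, resource) steps that turn `current` into `desired`."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            steps.append(("create", name))
        elif current[name] != desired[name]:
            # Changed settings: recreate the resource instead of patching it.
            steps.append(("replace", name))
    for name in sorted(current):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply(current: Manifest, desired: Manifest) -> Manifest:
    """Pretend to execute the plan and return the new infrastructure state."""
    for action, name in plan(current, desired):
        print(f"{action}: {name}")
    # After a successful run the state matches the manifest.
    return {name: dict(settings) for name, settings in desired.items()}


if __name__ == "__main__":
    # Two versions of the same manifest, as they might appear in two commits.
    v1: Manifest = {"web": {"size": "small", "count": 2}}
    v2: Manifest = {
        "web": {"size": "small", "count": 4},
        "cache": {"size": "small"},
    }
    state = apply({}, v1)     # the first commit creates the infrastructure
    state = apply(state, v2)  # the next commit produces a new state
```

Running the same manifest twice yields an empty plan, which is what makes such files safe to re-execute from a pipeline.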
- A needed code repository is set up using the code writing best practices and with all the required permissions and credentials for contributing the code
- The required CI/CD pipeline is automatically deployed with all the permissions needed to use it
- The team uses a modern secure code project framework able to toggle features on and off for testing. This framework provides reusable libraries and is configured using DevOps best practices.
- The team has access to an end-to-end dashboard for monitoring all the important project aspects.
- A developer submits an environment request and clicks a button.
- A preconfigured manifest is used to provision the pipeline and configure the required cloud environment.
- If the unit and integration testing are successful, the pipeline automatically compiles the new batch of code, builds a new app version, creates 2 temporary cloud staging environments in different regions and deploys the project there, enabling auto-scaling and failover between these virtual machines, testing the deployments on staging.
- The pipeline notifies the developer of successfully passing the tests (or informs of the bugs encountered and stops). If the staging tests are successful, the CI/CD pipeline requests approval for releasing the new app version to production from the release manager and securely logs the approval.
- The pipeline takes an instant snapshot of the working production environment to enable secure backup and restore upon request.
- The pipeline performs an always-on secure release of the new product version to production (Canary, Blue-Green, etc.) with the ability to provide a granular roll-out to the whole user pool with instant rollback if need be.
- The deployment pipeline triggers the monitoring pipeline that generates detailed logs for monitoring, alerting and debugging.
- The monitoring pipeline produces and sends to the dashboard real-time metrics that shape the team’s Definition of Done and confirm successful value delivery to product end users.
- Team members are able to generate various reports in one click to ensure system performance, compliance and ease of auditing
- The monitoring pipeline automatically alerts the team to issues and is coupled with a prescriptive analytics Machine Learning model that fixes the issues automatically, ensuring self-healing capabilities and stable uptime of your infrastructure
- All aspects of all environments and pipelines are easily versioned through the manifests
- The team receives automated alerts if some open-source system modules and libraries can be updated to the latest stable versions, ensuring maximum security of operations
- Project stakeholders can receive any needed data, metrics and reports themselves via the dashboard or external webhooks.
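The core of the day described above is a chain of stages where each output becomes the next input, and the chain stops with a notification as soon as something fails. Below is a rough sketch of that idea, not a real pipeline definition. The stage functions, the dictionary passed between them and names like `approved_by` are assumptions made up for this example; in practice, you would express the same flow in the configuration language of your CI/CD tool.

```python
from typing import Callable, List, Tuple

# A stage takes the previous stage's output and returns its own output.
Stage = Tuple[str, Callable[[dict], dict]]


class StageFailed(Exception):
    """Raised by a stage to stop the pipeline."""


def run_pipeline(
    stages: List[Stage], artifact: dict, notify: Callable[[str], None]
) -> dict:
    """Feed each stage's output into the next; stop on the first failure."""
    for name, stage in stages:
        try:
            artifact = stage(artifact)
        except StageFailed as err:
            notify(f"{name} failed: {err}")
            raise
        notify(f"{name} passed")
    return artifact


# Hypothetical stages that mirror the steps in the list above.
def run_tests(a: dict) -> dict:
    if a.get("failing_tests"):
        raise StageFailed(f"{a['failing_tests']} test(s) failed")
    return {**a, "tested": True}


def build(a: dict) -> dict:
    return {**a, "version": f"build-{a['commit']}"}


def deploy_to_staging(a: dict) -> dict:
    return {**a, "staging_regions": ["region-a", "region-b"]}


def request_approval(a: dict) -> dict:
    # The single manual gate: the release manager's approval.
    if not a.get("approved_by"):
        raise StageFailed("release manager has not approved")
    return a


def snapshot_production(a: dict) -> dict:
    return {**a, "snapshot": f"pre-{a['version']}"}


def release(a: dict) -> dict:
    return {**a, "released": True}


STAGES: List[Stage] = [
    ("tests", run_tests),
    ("build", build),
    ("staging", deploy_to_staging),
    ("approval", request_approval),
    ("snapshot", snapshot_production),
    ("release", release),
]

if __name__ == "__main__":
    # `print` stands in for a chat or dashboard notification.
    result = run_pipeline(
        STAGES, {"commit": "abc123", "approved_by": "release-manager"}, print
    )
    print(result)
```

Note that nobody hands anything over manually between the stages; the only human decision left in this sketch is the release approval.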
And do you know what the DevOps engineer is doing while all of this is happening? He drinks coffee in peace and plans the next infrastructure to architect and configure.
As you can see, this differs quite a lot from the nightmare of manual code development and infrastructure management. This DevOps workflow is made possible by the three practices listed above, the DevOps tools we mentioned and the DevOps culture of communication and collaboration.
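The granular roll-out with instant rollback from the perfect day above can be pictured in a few lines as well. This is a simplified sketch under stated assumptions: `canary_rollout`, the traffic steps and the `error_rate_at` callback are hypothetical names, and the callback stands in for whatever metrics your monitoring pipeline actually provides.

```python
from typing import Callable, List, Sequence, Tuple


def canary_rollout(
    steps: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> Tuple[int, List[str]]:
    """Shift traffic to the new version step by step.

    Returns the final traffic percentage (0 after a rollback) and a log.
    """
    log: List[str] = []
    for percent in steps:
        # Hypothetical hook: ask monitoring for the error rate at this step.
        rate = error_rate_at(percent)
        if rate > max_error_rate:
            log.append(f"{percent}%: error rate {rate:.2%} too high, rolling back")
            return 0, log
        log.append(f"{percent}%: healthy")
    return (steps[-1] if steps else 0), log


if __name__ == "__main__":
    # A release that starts failing once it receives 25% of the traffic.
    final, history = canary_rollout(
        [5, 25, 100], lambda percent: 0.05 if percent >= 25 else 0.0
    )
    print(final)
    for line in history:
        print(line)
```

The point of the design is that the rollback decision is made by the pipeline from monitoring data, not by an engineer watching a dashboard.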
With this approach, the DevOps engineers and developers discuss the architecture of the future product and plan the needed infrastructure, and the DevOps specialists prepare all the pipelines and manifests, along with step-by-step guides, JUST ONCE. After that, the developers can use and version them the way they need without ever bothering the DevOps engineers, who can now do the same for the next project or work on improving the infrastructure used by the organization and removing its performance bottlenecks.
Conclusions: what do DevOps engineers dream about?
Keep in mind that the pipeline described above requires only a DevOps engineer to configure it, a developer to launch it and a single Release Manager to approve the release. It removes the need for endless approvals from various intermediaries, which is very beneficial, and very dangerous for enterprises, because once DevOps pipelines are implemented, it turns out many middlemen are not needed anymore. However, the cost savings are so large that many enterprises embrace this digital transformation to become competitive and outperform the rest of the market players.
How to achieve these results? The best way for a company that has no in-house DevOps expertise is to contract an external Managed DevOps Services Provider, who will build the needed pipelines and help your teams become much more flexible and productive. As a result, you will see what the DevOps are dreaming about — making systems that work with minimal observation and intervention, instead of constantly putting out the fires manually. Rinse, repeat for the next project.