The cost of all that scalability, speed and agility? Complexity. Bare metal servers with well-known host names were once clearly traceable to a login service, a customer database, an email server or an e-commerce pipeline. Now, there are abstracted layers of infrastructure, hosts, containers, applications and databases with poorly defined, often transient relationships. Changes to those relationships, configurations, versions, clusters, roles and content occur constantly, executed manually and through automation, using a variety of methods. There is nothing to hold that complexity back. Software and infrastructure engineering teams optimize within their silo and ensure they adapt to new inputs and deliver expected outputs.
IT operations teams now face an existential challenge.
These inputs consist of the events and alerts coming from monitoring processes that were configured before, during and after multiple waves of digital transformation. Those events, from client, application, infrastructure, network, cloud, database and data center monitoring services, all funnel to those IT Ops teams. IT Ops must then transform these inputs into a picture of service health, and identify when and where problems are occurring, and react with speed, decisiveness and clarity.
However, the complexity of modern services has spread the understanding of how things work, and of how to manage the risk behind events and alerts, across too many people for an advisory board to communicate effectively and decide where the risks are and how to mitigate them. Those choices are decentralized, and safe changes ultimately depend on good choices, good architecture and good safeguards.
Now the IT Ops team has largely lost its defining trait: centralized situational awareness. They have been heads-down, trying to make sense of the increasing volume of events and alerts as their peers in the IT department have gained all this new capability. They have been trying to figure out ways to maintain the old service stack on bare metal, while transitioning to the new one in the cloud. They’ve been moving individuals to the right security groups, granting access permissions in a sensible way, maintaining certificates across hosts they are no longer sure about, decommissioning old servers and ensuring the new ones are up to the latest approved security version. In prioritizing and managing an event stream with entirely new volumes of alerts, IT Ops has sacrificed clear signal.
Digital transformation, it seems, ignored IT operations.
Continuous delivery doesn’t provide a holistic understanding of IT changes.
What used to be manageable tribal knowledge gained over a period of months during onboarding has been multiplied and fragmented. And it’s now difficult to sort out this knowledge. To know if the 200 alerts that fired in the past four minutes are related to application change A, marketing push B, planned host migration C, security update D or undefined incident E, IT Ops teams must trace the alerts and determine known relationships and the context of the alerts.
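As a rough illustration of that tracing step, the sketch below attributes each alert to the most recent change that touched the same host within a time window, and counts everything else as unexplained. The `Change` and `Alert` types, the 30-minute window and the host-matching rule are all assumptions made for this example, not a description of any particular product.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """Hypothetical record of a change (deploy, migration, security update)."""
    name: str
    hosts: frozenset
    at: datetime


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert, reduced to the host it fired on and when."""
    host: str
    at: datetime


def attribute(alerts, changes, window=timedelta(minutes=30)):
    """Count alerts per change name.

    An alert is attributed to the most recent change that touched its host
    no more than `window` before the alert fired. Alerts with no such change
    are counted under "unexplained" (assumed label).
    """
    counts = Counter()
    for alert in alerts:
        candidates = [
            change
            for change in changes
            if alert.host in change.hosts
            and timedelta(0) <= alert.at - change.at <= window
        ]
        if candidates:
            latest = max(candidates, key=lambda change: change.at)
            counts[latest.name] += 1
        else:
            counts["unexplained"] += 1
    return counts
```

A real environment would need far richer matching (service dependencies, change scope, alert type), but even this simple join shows why change data and alert data have to be visible in the same place.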
That sort of holistic understanding requires more than increased scalability and speed of clouds and containers. New DevOps CI/CD pipelines cannot provide this change-related insight. Instead, IT Ops teams require a system-wide map of the new digital terrain as a backdrop on which they can plot the incoming events and changes with all the required attributes. That does not happen in silos.
But for most organizations, digital transformation occurred in pockets, piece by piece, over a period of years. And the incidents did not stop during that transformation; they increased in volume. The traditional ITIL model did not provide much guidance to IT Ops beyond: “Do the same things, just a little faster and with more precision.”
That is where IT Ops teams have been stuck. But the tide is beginning to turn. The underpinnings of their transformation are starting to take shape. Monitoring, deduplication and filtering were the start. Then came multisource topological enrichment of barebones event payloads. Rules-based and then ML-based event correlation to group related alerts came next, and was later extended to correlate changes as well. API-based tool integration with operational enrichment provides fully contextualized incidents across collaboration, ticketing and chat systems, restoring situational awareness.
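The first stages of that progression can be sketched in a few lines. The example below deduplicates raw events, enriches each one with the service its host supports, and then groups events by service using a simple rule. The `TOPOLOGY` map, the event fields (`host`, `check`) and the group-by-service rule are hypothetical placeholders; real tooling would draw topology from multiple sources and apply much more sophisticated correlation.

```python
from collections import defaultdict

# Hypothetical topology map: host name -> service that host supports.
TOPOLOGY = {
    "web-1": "checkout",
    "web-2": "checkout",
    "db-1": "customer-db",
}


def deduplicate(events):
    """Collapse repeated (host, check) events into one event with a count."""
    seen = {}
    for event in events:
        key = (event["host"], event["check"])
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = {**event, "count": 1}
    return list(seen.values())


def enrich(event, topology=TOPOLOGY):
    """Attach the owning service to a barebones event payload."""
    return {**event, "service": topology.get(event["host"], "unknown")}


def correlate(events):
    """Rules-based grouping: events on the same service form one group."""
    groups = defaultdict(list)
    for event in events:
        groups[event["service"]].append(event)
    return dict(groups)


def build_incidents(raw_events):
    """Run the stages in order: deduplicate, enrich, then correlate."""
    return correlate([enrich(event) for event in deduplicate(raw_events)])
```

Passing in four raw events, two of them identical CPU alerts on `web-1`, one latency alert on `web-2` and one disk alert on `db-1`, would yield two groups: one for `checkout` containing two distinct events, and one for `customer-db`.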
Automating and integrating manual bureaucratic tasks enabled IT Ops to drive straight from awareness to contextualization to understanding and then execution. The challenge now is to unify these new capabilities into a cohesive strategy. Such a strategy must deliver clarity and actionability to IT operations teams, so they can fulfill their mission of maintaining service availability and serve as the central point for that function within an organization.
Modern IT operations must return to its roots.
The operations transformation doesn’t have a buzzword yet or a Google-authored reference manual for operations engineers to quote. Although teams are leveraging automation in multiple places in the workflow and data management, the process isn’t “autonomous operations” because humans are still required; they just have new capabilities. Similarly, although AI and ML are used, “AIOps” is too limiting. The IT operations transformation is made up of multiple new capabilities that are married to a renewed focus on what IT operations has always been there to do: maintain situational awareness and act quickly and effectively to minimize any service disruptions.
Cubility are the trusted advisor to some of Australia’s largest oil and gas, mining, utilities and public companies. We help ensure your company is operationally ready and business effective through modern technology strategies, program management and IT support.