With modern information technology constructs such as microservices, containers and serverless environments proliferating, the task of ensuring that applications are available and running smoothly has never been more challenging.
Traditional application performance management tools aren’t up to the task, so a new discipline called “observability” has emerged that encompasses the myriad factors that make up modern cloud-native software. But achieving full observability into hardware, software and networking stacks is something many organizations are still struggling to do, according to a survey being released today by New Relic Inc.
The poll of more than 1,600 practitioners and IT executives across 14 countries found that only 27% have achieved what New Relic defines as full-stack observability, or the ability to see everything in the tech stack that could affect the customer experience. Just 5% had achieved what the company defines as a mature observability practice.
Perhaps the most alarming finding is that more than half of respondents said they’re experiencing “high-business-impact outages” at least once per week and 29% said it typically takes more than an hour to recover from them. Peter Pezaris, senior vice president of strategy and user experience at New Relic, said the frequency of incidents was surprising. “To have to deal with an outage every week is crazy, but that’s how complex software has become,” he said.
Cloud-native applications are composed of loosely coupled microservices that connect to both traditional stacks and modern cloud services. That gives organizations flexibility in development and deployment, but at the cost of added complexity.
For example, a single transaction in a hybrid cloud may combine function-as-a-service, Kubernetes container orchestration, load balancers and database services, both in the cloud and on-premises. Complex applications can have millions of moving parts spread across multiple public and private clouds, making it almost impossible to pinpoint the cause of a problem.
At this point, most organizations appear to be struggling simply to know when problems occur. One-third of respondents to the survey said they mainly rely on complaints to detect outages. Only 3% said they have in place all 17 major observability capabilities that New Relic identified. These include alerts, browser monitoring, database monitoring, distributed tracing, log management, mobile monitoring and network performance monitoring.
Legacy application performance management tools are mostly focused on a subset of monitoring capabilities and don’t integrate well with each other, Pezaris said. “Where the industry has fallen short is that today’s customer has picked up point solutions that give them different observability from different vendors,” he said, noting that more than 80% of survey respondents said they use four or more such tools.
The most commonly deployed observability practices include network, security and database monitoring, alerts, infrastructure monitoring and log management, each of which is in use by between 50% and 57% of respondents. The least used are Kubernetes monitoring, machine learning model performance monitoring, synthetic monitoring, distributed tracing and serverless monitoring. Fewer than 40% of respondents were using them when the survey was fielded last spring.
Only 7% of respondents said their telemetry data is unified in one place and only 13% said visualization or dashboarding is unified. “The reason it takes so long [to diagnose the problem] is the complexity of the software subsystems and the skill levels of the teams involved,” Pezaris said. “You have people from infrastructure, development and network engineering involved and everyone is seeing something different.”
Solutions in hand?
The good news is that survey respondents are confident they have the problem in hand. By 2025, most indicated, they would have between 88% and 97% of the 17 observability capabilities deployed. More than 70% expect to maintain or increase observability budgets next year.
Respondents also expect those investments to pay off: more than one-third said observability will increase their productivity, 30% expect it to improve cross-team collaboration, and 28% think it will make them more innovative.
However, technology doesn’t stand still, and the growing adoption of artificial intelligence, the “internet of things,” edge computing and blockchain will create new complexity. “With newer technologies, you don’t have the benefit of a 20-year history to work from,” Pezaris said.