In the wake of the economic downturn brought on by the COVID-19 pandemic, investments in digital business transformation have accelerated. The applications that drive those processes are highly distributed and operate at a scale that no IT team can manage with legacy approaches. It can take weeks to discover the root cause of an issue.
Enter AIOps. Machine learning algorithms not only reduce the time it takes to resolve an issue, they also enable IT teams to continuously optimize IT environments at any scale. Many aspects of AIOps, however, are still mostly unexplored. Rather than opting for proven platforms, some IT teams are building custom solutions in-house.
I have seen firsthand several large enterprises embark on this risky journey, including one Fortune 500 company that asked a partner to help develop a solution and ultimately deployed a commercial product. Enterprise IT teams go down this road to solve a specific tactical problem, such as alert noise reduction. In many cases, all the time and effort put into a do-it-yourself (DIY) project simply winds up being wasted.
Why DIY AIOps usually fails
Given the prevalence of open-source AI tools and frameworks such as TensorFlow, Theano, or the Microsoft Cognitive Toolkit (CNTK), it can be tempting to build your own custom AIOps platform. However, it takes considerable expertise to build, integrate, and maintain an AIOps platform. Here are the leading reasons why in-house developed AI projects are risky:
- You’ll need a properly constructed data lake: AIOps platforms require real-time access to data residing in multiple technology silos. IT teams that build their own AIOps platforms need to gather all the right log data, metrics, and traces alongside data collected from IT service and incident management platforms. These comprehensive data sets are necessary to train whatever machine learning framework is in place, often selected at random. Invariably, that means building or buying a costly big data platform to create a data lake to store all that data. A poorly constructed AIOps platform will be worse than the proverbial disease it is meant to cure because its insights won’t accurately reflect what’s occurring in the IT environment. Do you have the funds for this and experienced data scientists on board to get it right?
- Designing AI-enhanced workflows is unlike other workflows: Getting the data is just the beginning. Determining how the system behaves and affects existing workflows is the next step. IT teams must decide to what degree they want the AIOps platform to passively surface recommendations based on what it observes versus automatically resolving issues based on defined parameters.
- Deployment is complex: After developing a few AIOps algorithms that produce meaningful results, the next step is to determine how to deploy them in a resilient and performant architecture. What other systems do they need to integrate with, and how will results be monitored and viewed?
- Monitoring user impact is critical: How will end-users interact with the algorithm, and what is the ideal UI/UX and workflow? How will end-users provide feedback for improvement and adoption success?
- AIOps support and maintenance requires a team, not a project: Ultimately, an internal IT team would need to build the equivalent of a product that needs ongoing maintenance and support. The custom platform’s total cost starts to rise as the IT team could wind up spending most of its time managing the AIOps platform instead of making continual improvements. Even if the IT team has the expertise required to build an AIOps platform, there’s no guarantee those individuals will always be available to maintain and update it. Very few IT professionals spend their entire career at one organization.
- Keeping pace with marketplace innovation: Finally, AIOps as a field is still relatively nascent, and the startup community has hundreds of millions of dollars in VC backing to support R&D. Advances are being made at a rate most internal IT teams can’t keep up with, let alone evaluate and vet on their own.
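To make the workflow decision above more concrete, here is a minimal sketch of how a team might encode the choice between passively surfacing a recommendation and automatically resolving an issue based on defined parameters. Everything in it is an assumption for illustration: the `Policy` fields, the confidence threshold, and the action names are hypothetical and do not reflect any particular AIOps product.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    RECOMMEND = "recommend"        # surface a suggestion for a human to review
    AUTO_RESOLVE = "auto_resolve"  # act without waiting for approval


@dataclass(frozen=True)
class Policy:
    """Hypothetical 'defined parameters' for automatic resolution."""
    min_confidence: float    # how sure the model must be before acting
    auto_actions: frozenset  # actions the team has approved for automation


def decide(action: str, confidence: float, policy: Policy) -> Mode:
    """Auto-resolve only approved actions with high enough confidence."""
    if action in policy.auto_actions and confidence >= policy.min_confidence:
        return Mode.AUTO_RESOLVE
    # Everything else is only surfaced as a recommendation.
    return Mode.RECOMMEND


# Illustrative values only.
policy = Policy(min_confidence=0.9, auto_actions=frozenset({"restart_service"}))

for action, confidence in [
    ("restart_service", 0.95),
    ("restart_service", 0.50),
    ("failover_database", 0.99),
]:
    print(action, confidence, decide(action, confidence, policy).value)
```

Starting with a short list of approved actions and a high threshold keeps the platform mostly in recommendation mode, and the team can widen either parameter as trust in the results grows.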
How to safely encourage AIOps exploration
There’s no substitute for knowing where an organization needs to go and how to get there. A commercial AIOps platform incorporates all the best practices that have been defined by legions of IT experts, along with these benefits:
- Faster time to value: You can embark on the AIOps journey much sooner. A commercial AIOps platform will begin surfacing insights in a matter of weeks. It will take an internal IT team months to build an equivalent platform with no guarantee of success. Time is better spent on user adoption and adding and refining use cases for business benefit.
- Seasoned experts: A commercial platform provides immediate access to not only a proven framework but also AIOps experts who can troubleshoot and optimize issues quickly. There’s almost no AIOps challenge they haven’t seen before.
How to measure ROI from AIOps
Savvy organizations that invest in AIOps are primarily betting on a better way to manage IT that will enable them to accomplish more as a business. The real value proposition of any AIOps platform is that it allows an existing IT team to do more by eliminating rote tasks and making it possible to deploy more applications reliably without adding IT staff. It’s worth remembering that the cost of labor continues to be the single most significant IT expense.
The return on investment from an AIOps platform can be easily calculated by measuring:
- The number of incidents resolved in a given period;
- The size of the IT operations/incident management staff before and after an AIOps platform is deployed.
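As a rough illustration, the two measurements above can be combined into a simple labor-based ROI figure. The sketch below is one possible way to do the arithmetic, not a standard formula: it assumes the staff "avoided" equals the headcount that would have been needed to resolve the new incident volume at the old per-person rate. All figures, including the cost per person and the platform cost, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Figures for one measurement period (hypothetical values below)."""
    incidents_resolved: int
    ops_staff: int


def incidents_per_person(stats: PeriodStats) -> float:
    """Incidents resolved per staff member in the period."""
    return stats.incidents_resolved / stats.ops_staff


def simple_roi(before: PeriodStats, after: PeriodStats,
               cost_per_person: float, platform_cost: float) -> float:
    """Labor savings net of platform cost, as a fraction of platform cost."""
    # Staff that would be needed for the 'after' workload at the 'before' rate.
    staff_needed_without = after.incidents_resolved / incidents_per_person(before)
    staff_avoided = staff_needed_without - after.ops_staff
    savings = staff_avoided * cost_per_person
    return (savings - platform_cost) / platform_cost


# Illustrative numbers only: same team size, more incidents resolved.
before = PeriodStats(incidents_resolved=1200, ops_staff=10)
after = PeriodStats(incidents_resolved=1800, ops_staff=10)

roi = simple_roi(before, after, cost_per_person=100_000, platform_cost=200_000)
print(f"Incidents per person: {incidents_per_person(before):.0f} -> "
      f"{incidents_per_person(after):.0f}")
print(f"ROI: {roi:.0%}")
```

Both periods need to be the same length and cover comparable workloads for the comparison to be meaningful.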
Arguably, those savings pale in comparison to the opportunity cost of not using advanced technologies, which can increase operational intelligence and deliver new business value in the form of excellent user experiences and high-performing digital services. IT organizations that don’t embrace AIOps will soon find themselves unable to compete with faster, more nimble rivals that have modernized their IT processes. The easiest, most affordable, and least risky way to go about this is to deploy a solution that is already proven in the market.
Ciaran Byrne, VP of product management at OpsRamp, authored this article.
The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends. Image credit: iStockphoto/Orla