Observations on the Atlassian Outage

Image credit: iStockphoto/Dmitry Belyaev

One of our less pleasant responsibilities here at Forrester is commenting on serious business, security, or technical failures in the digital and IT industry. Due to its duration and the implications for a subset of the user base, the current Atlassian outage rises to that level.

Atlassian is staking its future on being a cloud provider. It is transforming all of its products into SaaS offerings and sunsetting most of its traditional support for on-premises deployments. Despite the reportedly small number of customers affected, the outage puts intense scrutiny on Atlassian’s ability to execute, win, and maintain customer trust. Atlassian’s own figures put the number at around 0.2% of its cloud customer base, and the company says it has restored service to ~45% of affected users. Even so, the length of the restoration effort makes this an unusually long SaaS outage.

For those unfamiliar with the incident: roughly 400 customers lost service on Jira, Jira Service Management, Jira Work Management, Confluence, Opsgenie, Statuspage, and Atlassian Access for a week. For some, the outage is expected to last at least two more weeks.

This outage was particularly ill-timed, occurring during Atlassian’s annual Team ’22 customer conference. Even before the outage, analyst and market reception to Atlassian’s business strategy was mixed, and Forrester had also received customer complaints about being forced to shift to the cloud. While customers gain natural benefits from moving to a SaaS model (such as reduced admin work), the reputational damage to Atlassian’s cloud capabilities is occurring at a particularly contentious time. It seems likely that Atlassian’s cloud migration timelines will be adjusted.

What Can Customers Do?

In the interim, Atlassian customers should take a few steps in response to this outage:

  1. Verify whether you are affected across all of your Atlassian products and instances. You may have an Atlassian product being run independently in your organization, outside standard IT channels. This inventory may also prove useful for bundling instances or centralizing management in future negotiations.
  2. If you have not yet migrated off Atlassian’s Server offering, speak with Atlassian migration reps about ongoing risk to see whether there are architectural strategies you can employ to avoid a similar outage.
  3. If you have migrated to the cloud (or started on the cloud), speak with your representative about the outage. Explore whether there are additional assurances your organization can leverage, whether an advanced SLA level (e.g., Atlassian’s 99.9% and 99.95% uptime options) or architectural strategies to avoid similar impacts.
  4. Watch how Atlassian reacts to the outage:
    1. Atlassian has just concluded its first blameless mid-incident assessment (not yet a full postmortem) and posted it publicly. The assessment focuses on communicating what went wrong and, rightly, avoids reducing the incident to a single point of failure or a single individual. Once the incident concludes, it should be followed by additional material outlining the actions Atlassian will take to ensure this can’t happen again. The initial assessment avoided blaming an individual; if Atlassian later pins responsibility on a specific group and obscures organizational culpability, that will be a red flag. This seems unlikely, but it is worth watching.
    2. Look for customer compensation beyond what the SLA requires. How does Atlassian make it right? Does it go above and beyond to repair customer trust, or does it meet only the contractual minimums? Just meeting minimums should generate skepticism.
    3. Look at how it executes on its findings and how it acts to prevent a recurrence. Does it invest significantly in resiliency? Does it hire resiliency experts? Or does it routinely downplay the probability of such a failure happening again? The latter should concern existing and potential customers alike.
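To put the uptime tiers mentioned in item 3 in perspective, the short Python sketch below converts an uptime percentage into a monthly downtime allowance. It assumes a 30-day month with no excluded maintenance windows; real SLAs define their own measurement periods, exclusions, and product scope, so treat these figures as illustrative and check your contract. The helper name `downtime_budget_minutes` is hypothetical.

```python
# Hypothetical helper: converts an uptime target into a downtime allowance.
# Assumption: a 30-day month with no excluded maintenance windows; actual
# SLA terms (measurement period, exclusions, product scope) vary by contract.

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime_pct: float, days: int = 30) -> float:
    """Return the maximum downtime (in minutes) permitted by an uptime target."""
    total_minutes = days * MINUTES_PER_DAY
    return total_minutes * (1 - uptime_pct / 100)


if __name__ == "__main__":
    # The two higher uptime tiers named in the article.
    for tier in (99.9, 99.95):
        print(f"{tier}% uptime -> {downtime_budget_minutes(tier):.1f} min/month")

    # A one-week outage expressed in the same unit, for comparison.
    week_minutes = 7 * MINUTES_PER_DAY
    print(f"One-week outage -> {week_minutes} min")
    print(f"Ratio to 99.9% budget: {week_minutes / downtime_budget_minutes(99.9):.0f}x")
```

Under these assumptions, 99.9% uptime allows about 43.2 minutes of downtime per month and 99.95% about 21.6 minutes, while a single week of outage is 10,080 minutes, more than 200 times the 99.9% allowance. That gap illustrates how far an outage of this length falls outside any standard uptime tier.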

Can You Use Other Tools?

Some will undoubtedly consider alternatives to Atlassian. The challenge is that Atlassian offers an increasingly broad, integrated, cross-functional suite (as the list of affected products above shows). The recent product announcements around Atlas, Compass, and enhancements to underlying architecture (such as the Atlassian Data Lake and Atlassian Analytics) indicate a smart emphasis on this strategy. Acquisitions (especially Opsgenie) are not remaining standalone, decoupled products. Quite the opposite: they are being integrated into the whole. Atlassian increasingly finds itself in the company of vendors such as SAP or Salesforce, whose cross-functional breadth makes them difficult to replace.

For those looking for resources to compare offerings, however, we have Forrester Wave™ evaluations in the following categories:

What Is Going To Happen Next?

Of course, all is not lost for Atlassian. High-profile outages are, unfortunately, common. One of us worked for a major US bank that suffered a high-profile mainframe outage. Customers could not access their funds, and some business customers missed payroll for their employees. That bank still exists, and there is little or no residual impact from that outage.

As our friends in the resilience engineering community are fond of pointing out, it’s a miracle that complex systems work at all. A clear-eyed examination of their operational history reveals a sobering, ongoing series of near misses, and that examination is critical to building more resilient systems.

But Atlassian will not escape unscathed. It competes in multiple challenging markets against formidable competitors. Customers are going to use this opportunity to demand additional discounts and deployment flexibility. Dramatic commitments to resilience will be required to reestablish trust so that Atlassian can become the center of work it seeks to be. Cloud improvements have already been top of mind for Atlassian’s teams (its leaders discussed performance improvements on the main stage at Team ’22), but further improvements to resilience must come immediately, along with financial commitments.

In conclusion, we expect Atlassian to survive the situation: the majority of Atlassian’s pragmatic customers will say “I wasn’t affected.” We do, however, expect increased resistance to cloud migration across the market, as well as further fallout depending on how Atlassian responds. While this unfortunate situation involved SaaS offerings, that alone is no reason to abandon SaaS; cloud services have generally proven dependable. It is a wake-up call, however: placing something in a cloud service does not justify blind trust in that service. Perform your own diligence regardless of where “it” is.

In the meantime, customers have few options for increasing their own Atlassian resiliency beyond the basic steps outlined above. You can develop alternatives and contingency options to mitigate risk, such as those Brent Ellis and Naveen Chhabra outline here. Some reports to refine your strategy include:

Forrester analyst William McKeon-White, principal analysts Charles Betz and Christopher Condo, and senior analysts Naveen Chhabra and Brent Ellis wrote this article. Glenn O’Donnell, Julie Mohr, David Mooter, Janet Worthington, Margo Visitacion, and Tracy Woo also contributed. The original article is here.

The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends.