How To Avoid the 10 Big Data Analytics Blunders

Image credit: iStockphoto/AaronAmat

Leading organizations are leveraging an analytics-driven approach — fueled and informed by data — to achieve marketplace advantages and create entirely new business models. However, even the savviest companies are repeating common missteps.

Here are the top 10 blunders we see in working with our customers — plus insights into how you can work to overcome them.

Blunder #1: Not moving to the cloud

If your organization isn’t planning to become cloud-exclusive, you could be backing losing technology. The cloud is more elastic than your in-house solution and more cost-effective in the long run.

The cloud will save your organization a raft of money, allow your business to take advantage of new technologies with elastic compute, and open your organization to new geographies. So take action now and look into what the cloud offers.

Blunder #2: Not planning for AI/ML to be disruptive

Make no mistake: AI will displace some of your workers and has the potential to upend how you handle your operations. But the choice is simple: you can be a disruptor, or you can be disrupted.

If you want to lead, you must be willing to pay for talent and act quickly because the best talent is being snapped up fast. Of course, HR won’t like what you need to pay for machine learning (ML) experts, but spending money now on experts nets you a much greater return in the long run. And don’t make the mistake of contracting this essential skill out.

Blunder #3: Not solving your real data science problem: dirty data

You’ve hired data scientists, so you think you’ve got big data analytics covered. First, however, it’s crucial to look at how they are spending their time. Unfortunately, most of their time is spent analyzing and cleaning data and integrating it with other sources. 

A machine learning expert at iRobot told me that she spent 90% of her time working on data discovery, integration, and cleaning. Of the remaining 10%, she spent 90% fixing data cleaning errors, which left about 1% of her time for the job she was hired to do.
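The arithmetic behind that 1% figure can be checked in a few lines. This is only a sketch of the calculation; the two 90% shares come from the anecdote, and the variable names are made up for illustration.

```python
# Shares taken from the anecdote; variable names are illustrative.
prep_share = 0.90    # discovery, integration, and cleaning
rework_share = 0.90  # share of the remaining time spent fixing cleaning errors

remaining = 1.0 - prep_share                       # about 10% of total time
modeling_share = remaining * (1.0 - rework_share)  # 10% of that 10%

print(f"Time left for the actual job: {modeling_share:.0%}")
```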

Without clean data, your data science is worthless. So, have a clear strategy for dealing with data cleaning and integration, and have a Chief Data Officer on staff.

Blunder #4: Believing that traditional data integration techniques will solve Blunder #3

Clean, integrated data at scale has become nearly impossible to achieve with traditional techniques and technologies. Extract, transform, load (ETL) processes require intensive human effort and take a lot of time. In addition, each new data source leaves the people doing the work with less capacity to manage the additional information. I have never seen this human-first technique work with more than 20 data sources, and most enterprises need to integrate far more than that.

Additionally, once you’ve run ETL processes, you need to match records to determine which ones go together and remove duplicates, traditionally using rules-based Master Data Management (MDM) systems, which also don’t scale. Rules can work to generate training data, but they don’t work for solving big problems.
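To make the idea of rules-based matching concrete, here is a minimal sketch. It is not how any particular MDM product works; the records, the field names, and the single hand-written rule are all hypothetical. The rule labels each pair of records as a match or non-match, which is the kind of output that could serve as training data.

```python
from itertools import combinations

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())

def rule_match(a, b):
    """Hypothetical rule: same email, or same name and same postal code."""
    if normalize(a["email"]) == normalize(b["email"]):
        return True
    return normalize(a["name"]) == normalize(b["name"]) and a["zip"] == b["zip"]

# Made-up customer records for illustration only.
records = [
    {"id": 1, "name": "Jane  Doe", "email": "JDoe@example.com", "zip": "02139"},
    {"id": 2, "name": "jane doe", "email": "jane.doe@example.com", "zip": "02139"},
    {"id": 3, "name": "John Roe", "email": "jroe@example.com", "zip": "10001"},
]

# Compare every pair of records and record the rule's verdict.
labeled_pairs = [
    (a["id"], b["id"], rule_match(a, b)) for a, b in combinations(records, 2)
]
print(labeled_pairs)
```

Even in this toy case, the rule has to be written by hand and every pair has to be compared. The number of pairs grows as n(n-1)/2, and each new source tends to bring new formats that need new rules.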

Blunder #5: Believing that data warehouses will solve all your problems

Data warehouses are great for structured data from around 10 data sources, but they don’t work for things like text, images, and video. Many companies have bought into traditional data warehouse technology that costs up to seven figures a year, but it is only useful in a limited way. If you have a data warehouse, don’t try to shoehorn unstructured data into it.

Blunder #6: Believing that Hadoop/Spark will solve all your problems

Many companies have invested in Hadoop, the open-source software collection from Apache, or Spark, Apache’s analytics engine for big data processing. They have their place, but they are not the answer to everything. Would you use a “lowest common denominator” solution for your company’s “secret sauce,” or the best the industry has to offer?

Also, keep in mind that Hadoop and Spark won’t solve your data integration problems, which is where data scientists spend the bulk of their time (see Blunder #3).

Blunder #7: Believing that data lakes will solve all your problems

Many people assume that if a company loads all its data into a data lake — a centralized repository for all data — they’ll be able to correlate all their data sets. But they often end up with data swamps, not data lakes.

It’s a problem of garbage in, garbage out. For example, let’s say HR databases need to account for employees who work in two different locations. If the records from the two locations are simply added together, staff will be over-counted by the number of duplicates. The net result is that your analytics will be garbage, and your machine learning models will fail. Companies need to clean their lake data with a data curation system that will solve these problems.
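The over-counting in the HR example can be shown with a small sketch. The two extracts and the shared `employee_id` key are assumptions made for illustration; real sources rarely share such a clean key, which is why curation is needed in the first place.

```python
# Hypothetical HR extracts from two locations; employee E2 works at both.
boston = [
    {"employee_id": "E1", "name": "Ana Ruiz"},
    {"employee_id": "E2", "name": "Ben Cho"},
]
london = [
    {"employee_id": "E2", "name": "Ben Cho"},
    {"employee_id": "E3", "name": "Chloe Park"},
]

# Naive approach: add the two record counts together (counts E2 twice).
naive_headcount = len(boston) + len(london)

# Deduplicated approach: count distinct employee IDs across both sources.
unique_ids = {row["employee_id"] for row in boston + london}
deduped_headcount = len(unique_ids)

print(naive_headcount, deduped_headcount)
```

The gap between the two numbers equals the number of duplicated employees, and any analytics built on the naive figure inherits that error.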

Blunder #8: Outsourcing your new stuff to big data analytics services firms

Typical enterprises spend about 95% of the IT budget on running legacy code, and they often have their best people doing things like maintenance. The most exciting stuff gets outsourced, usually because there is no appropriate talent internally or because the best people are stuck keeping existing systems running.

This is a losing strategy. The “new stuff” will propel the business forward and keep your best, most creative people engaged. Instead, companies should outsource mundane things like maintenance, email systems, and such, not the promising new technologies.

Blunder #9: Succumbing to the Innovator’s Dilemma

In his classic book, The Innovator’s Dilemma, Harvard Business School professor Clayton Christensen suggests that when technology changes and you are a vendor selling the “old stuff,” it is tough to pivot to the new stuff without losing significant market share in the process.

As a business, you have to be willing to change and evolve when it is needed. It’s possible, and even likely, that a reinvention will hurt your business in the short term, but it is essential if you want to stay in business for the long run. There are plenty of examples of this in practice. Most people are familiar with the emergence of ridesharing companies like Lyft and Uber and the negative consequences for legacy taxi companies. The price of a taxi license in the City of Cambridge has dropped from USD 700K to USD 10K.

Blunder #10: Not paying up for a few “rocket scientists”

Companies need to invest in a few highly skilled employees to address all of the above issues and the hundreds of others they will inevitably face. The new hires are not going to wear suits, but they will be your guiding lights.

(Bonus) Blunder #11: Working for a company that is not trying to do something about the “sins of the past”

If you work for a company that’s falling into any of the above blunders, figure out how to fix it — or start looking for a new job.

Michael Stonebraker, chief technology officer and co-founder at Tamr, wrote this article. You can find the original article here.

The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends.