Why Humans Are Not Optional in a Data-Driven World

Businesses today are putting more emphasis on establishing a data culture than they ever did in the past, with spending on big data and business analytics projected to reach USD448 billion by 2027. Beyond traditional analysis and business intelligence, organizations are also leveraging AI technology to glean fresh insights from their data.

Tools such as TensorFlow and PyTorch have dramatically lowered the barrier to implementing machine learning (ML). This means that even a mid-sized organization or tech start-up can now turn to AI to improve customer experience or better understand consumer trends.

And the momentum behind AI is growing swiftly. Just last week, we reported that Meta is building the world’s fastest AI supercomputer with some 22,000 Nvidia Tensor Core GPUs for 20 times the performance of its current systems. In China, SenseTime recently launched its Artificial Intelligence Data Center (AIDC) to more than double its global computing capacity and commoditize AI.

How machine learning works

Have you ever wondered how ML models work? While most of us will probably never fully understand the math and algorithms highlighted in AI research papers, the supervised ML commonly deployed in businesses today is relatively straightforward.

At its heart, it is purely mathematical and works by analyzing an increasing amount of data with a suitable algorithm. The algorithm doesn’t change throughout the training process, though the various internal weights and biases that influence its outputs do.

Once a model is trained, it can provide an expected response for any given input. An ML model trained on meteorological data from the last 20 years could hence predict the chance of rain tomorrow from inputs such as today’s temperature and other parameters.
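To make the idea concrete, here is a minimal sketch in plain Python of a model whose update rule stays fixed while its weight and bias change during training. The toy data, the learning rate, the number of passes, and the function names `train` and `predict` are all assumptions chosen for illustration; this is not how any production framework is implemented.

```python
# Toy data invented for illustration: each y is exactly 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0  # the only things that change during training
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every example under the current parameters.
        errors = [(weight * x + bias) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_weight = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_bias = 2 * sum(errors) / n
        # The same update rule is applied on every pass.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


def predict(weight, bias, x):
    """Return the trained model's response for a new input."""
    return weight * x + bias


weight, bias = train(xs, ys)
print(round(weight, 3), round(bias, 3))
print(round(predict(weight, bias, 5.0), 3))
```

A real weather model would have many more inputs and parameters, but the division of labor is the same: the procedure is fixed, and only the numbers inside the model move.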

At a commercial level, businesses have used ML to improve efficiency and business operations. Google, for instance, connected its data centers to ML models that manage hundreds of systems, from air exchangers to chillers, based on a plethora of parameters that include external conditions such as the weather.

Some of the recommendations even came across as counterintuitive to seasoned experts, the top executive in charge of its data centers told me in an interview some years back. But the results speak for themselves, and Google saved millions of dollars annually through improved energy efficiency.

The problem of missing data

But ML isn’t magic, and missteps are possible. Take Zillow Offers, which for a brief eight months purchased homes in the U.S. at price points recommended by an AI engine. It didn’t pan out, and Zillow took a USD304 million inventory write-down, with as many as 2,000 jobs lost.

As I noted earlier in “At the Limits of Data”, the failure stemmed from an inability to accurately forecast the future price of houses up to six months out, in a tumultuous, pandemic-riven market with no precedent in recent history. In a nutshell, crucial data was missing.

This is why data scientists spend most of their time on data munging, a process of transforming data from erroneous or unusable forms into useful ones. For missing values, this might entail filling the gaps with a median or average value, or using more advanced algorithms to predict what the missing values should be.
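As a rough illustration of the simplest option, the sketch below fills missing entries with the median or the mean of the values that are present. The `impute` function, the `floor_areas` column, and its numbers are hypothetical and exist only for this example; real projects typically lean on a data library rather than hand-rolled code.

```python
from statistics import mean, median


def impute(values, strategy="median"):
    """Replace None entries with the median or mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute a column with no known values")
    if strategy == "median":
        fill = median(known)
    elif strategy == "mean":
        fill = mean(known)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical floor areas in square meters; None marks a missing field.
floor_areas = [70.0, None, 95.0, 120.0, None, 400.0]

filled_with_median = impute(floor_areas, "median")
filled_with_mean = impute(floor_areas, "mean")
print(filled_with_median)
print(filled_with_mean)
```

The single large value pulls the mean well above the median here, which is one reason the median is often the safer default when a column contains extreme values.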

When it comes to records with missing data fields and outlier data points, sometimes the easiest thing to do is to get rid of them entirely, explained the professor at a Python data science class I attended at the National University of Singapore (NUS) last year.

But what if key metrics are missing entirely? For instance, hidden problems such as structural defects can dramatically skew the price of a home. But an AI working solely on the data it is given can’t possibly “see” them. And this is why we need humans to step in.
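Dropping problem records can be sketched in a few lines. The example below first discards any record with a missing field, then discards records whose price sits too far from the median. The outlier rule used here (more than five median absolute deviations from the median) is my own assumption for illustration, not the method taught in that class, and the records and the `drop_incomplete` and `drop_outliers` helpers are likewise hypothetical.

```python
from statistics import median


def drop_incomplete(records):
    """Keep only records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def drop_outliers(records, field, k=5.0):
    """Drop records whose field is more than k median absolute deviations
    away from the median. k = 5.0 is an arbitrary illustrative threshold."""
    values = [r[field] for r in records]
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return list(records)  # no spread to measure against; keep everything
    return [r for r in records if abs(r[field] - center) <= k * mad]


# Hypothetical home records; None marks a missing field.
records = [
    {"price": 500_000, "rooms": 3},
    {"price": 520_000, "rooms": None},
    {"price": 480_000, "rooms": 2},
    {"price": 510_000, "rooms": 3},
    {"price": 5_000_000, "rooms": 4},
    {"price": 495_000, "rooms": 3},
]

cleaned = drop_outliers(drop_incomplete(records), "price")
print(cleaned)
```

The trade-off is that every dropped record is information thrown away, so this approach works best when the problem rows are few and not systematically different from the rest.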

Bring in the humans

An article in Fortune summed up the current state of AI. As observed by Aleksandar Tomic, associate dean for strategy, innovation, and technology at Boston College: “Data models, especially data science and the training models, are not good at things that have not happened before. A computer will do whatever you ask it to do, but [the outcome] depends on what you ask.”

In the same report, Oliver Yao, a professor and associate dean for graduate programs at Lehigh University’s College of Business, cautioned that technology has limits.

“I do think that people sometimes rely on big data, on technology, too much. The time has come, and they don’t have to think, they don’t have to do much. They just have to rely on what data tells us to do,” he said. According to him, data is “absolutely useful”, but we cannot “100 percent rely on them”.

As business executives increasingly turn to data for insights, they must focus not only on how to use AI and data analytics but also on why and when to use them. And they must bear in mind that they will still need human employees around to ask the right questions and fill in the inevitable gaps.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].

Image credit: iStockphoto/monsitj