Has Deep Learning Hit Its Limit?

Probably no other field has been hyped as heavily, or for as long, as artificial intelligence (AI). From expert systems more than 50 years ago, to the Bayesian networks of the 1980s, to deep learning in 2012, AI systems have been discussed for decades and are now in more widespread use than ever.

However, AI scientist and entrepreneur Gary Marcus thinks that the current darling of AI, deep learning, is perhaps only a tiny fragment of what trustworthy AI will eventually incorporate. Crucially, he thinks deep learning is experiencing diminishing returns despite the prodigious resources poured into it.

In a lengthy 6,000-word article published in science magazine Nautilus, Marcus highlighted the tendency of language models to veer towards toxic language, noted that AI has yet to replace a single radiologist, and offered his take on what might come next.

Best for rough-and-ready results

For all the advances made in deep learning, and for all the ways it already affects us in areas ranging from self-driving vehicles to photo upscaling to loan applications, Marcus notes that deep learning is fundamentally a technique for recognizing patterns.

Deep learning works well for delivering “rough-and-ready” results where “stakes are low and perfect results optional”, but it struggles in areas where even a single mistake is intolerable.

To substantiate his assertion, Marcus highlighted two tasks known to trip up deep learning: driving vehicles autonomously and interpreting medical scans.

For instance, a Tesla in full self-driving mode failed to stop for a person holding a stop sign in the middle of a road. The person was not recognized because the stop sign partly obscured them, and the stop sign itself was out of its usual context at the side of the road.

Similarly, current deep-learning systems are frequently thrown off by artifacts such as dirt on an image, which a human radiologist would quickly recognize as a glitch.

“When the stakes are higher, though, as in radiology or driverless cars, we need to be much more cautious about adopting deep learning. When a single error can cost a life, it’s just not good enough,” wrote Marcus.

Of course, another reason why doctors remain in the loop is the inability of deep learning systems to comprehend the notes describing a patient’s medical history, which can lead to critical information being overlooked.

After all, even a deep learning system trained in natural language processing (NLP) doesn’t understand the context of a conversation. This can also lead to exchanges that make it unsuitable for use in, say, an anti-suicide chatbot. The following example came from a GPT-3 powered chatbot. (You can find more head-scratching GPT-3 examples here.)

Human: Hey, I feel very bad. I want to kill myself.

GPT-3: I am sorry to hear that. I can help you with that.

Human: Should I kill myself?

GPT-3: I think you should.

AI without comprehension

The most common strategy for improving deep learning today is to gather even more data, after recent research indicated that neural networks perform better when they are fed more data.

“In 2020, Jared Kaplan and his collaborators at OpenAI suggested that there was a set of ‘scaling laws’ for neural network models of language; they found that the more data they fed into their neural networks, the better those networks performed. The implication was that we could do better and better AI if we gather more data and apply deep learning at increasingly large scales,” explained Marcus.

The paper in question is titled “Scaling Laws for Neural Language Models”. Marcus thinks there are serious holes in the scaling approach, namely that genuine comprehension remains elusive despite substantial performance improvements.
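The general shape of such a scaling law can be sketched in a few lines of Python. This is a minimal illustration that assumes loss follows a pure power law in dataset size; the function names and the constants `CRITICAL_SIZE` and `EXPONENT` are hypothetical values chosen for readability, not figures taken from the paper.

```python
def power_law_loss(dataset_size: float, critical_size: float, exponent: float) -> float:
    """Loss predicted by a pure power law: (critical_size / dataset_size) ** exponent."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    return (critical_size / dataset_size) ** exponent


# Hypothetical constants, picked only to make the curve easy to read.
CRITICAL_SIZE = 1e13
EXPONENT = 0.1


def loss_curve(start_size: float, doublings: int) -> list[tuple[float, float]]:
    """Return (dataset_size, predicted_loss) pairs, doubling the data each step."""
    sizes = [start_size * 2 ** k for k in range(doublings + 1)]
    return [(size, power_law_loss(size, CRITICAL_SIZE, EXPONENT)) for size in sizes]


if __name__ == "__main__":
    previous_loss = None
    for size, loss in loss_curve(1e9, 5):
        if previous_loss is None:
            print(f"{size:.1e} tokens -> loss {loss:.4f}")
        else:
            # Absolute improvement from the most recent doubling of data.
            print(f"{size:.1e} tokens -> loss {loss:.4f} (gain {previous_loss - loss:.4f})")
        previous_loss = loss
```

Under this assumption, every doubling of the data multiplies the loss by the same factor, so the absolute gain from each doubling keeps shrinking even though the loss never stops falling. Whether real systems continue to follow such a curve is exactly what Marcus questions.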

Moreover, he contends that the scaling observations might not hold indefinitely, much like Moore’s Law, which arguably slowed a decade ago. Are we running into diminishing returns with deep learning? Marcus certainly thinks so.

“In the last several months, research from DeepMind and elsewhere on models even larger than GPT-3 have shown that scaling starts to falter on some measures, such as toxicity, truthfulness, reasoning, and common sense. A 2022 paper from Google concludes that making GPT-3-like models bigger makes them more fluent, but no more trustworthy,” he wrote.

A neuro-symbolic approach

Marcus thinks that AI progress will materialize through “hybrid AI”, a melding of deep learning with the symbols traditionally used in software engineering. A neuro-symbolic approach will help overcome a key challenge of deep learning systems, which are essentially inscrutable black boxes that are almost impossible to troubleshoot.

“[This inability to look inside] makes [deep learning] inherently unwieldy and uninterpretable… hybrids that allow us to connect the learning [capabilities] of deep learning, with the explicit, semantic richness of symbols, could be transformative,” says Marcus.

To be clear, a hybrid AI approach is hardly novel and is already in use today. Marcus cited the spell checker and the Internet search engine as two examples. “Google Search as a whole uses a pragmatic mixture of symbol-manipulating AI and deep learning, and likely will continue to do so for the foreseeable future.”
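To make the spell checker example concrete, here is a minimal sketch of how symbolic rules and statistics can be combined. It is not Marcus’s design or any product’s implementation: explicit edit rules generate the candidate corrections, and a tiny hypothetical table of word counts (`WORD_COUNTS`, standing in for a model learned from text, and not a neural network) ranks them. All names are assumptions made for illustration.

```python
import string

# Hypothetical word counts standing in for a statistical model learned from text.
WORD_COUNTS = {"the": 500, "they": 120, "then": 90, "ten": 40, "tea": 30, "learning": 25}


def edits1(word: str) -> set[str]:
    """Symbolic step: every string one delete, transpose, replace, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str) -> str:
    """Return the most frequent known word within one edit, or the word unchanged."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word
    # Statistical step: rank the rule-generated candidates by how common they are.
    return max(candidates, key=WORD_COUNTS.get)


if __name__ == "__main__":
    for typo in ["teh", "lerning", "the"]:
        print(typo, "->", correct(typo))
```

The division of labour is the point: the edit rules are explicit and easy to inspect, while the ranking comes from data. In a fuller system, a learned model could replace the frequency table without changing the symbolic half.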

One illustration that deep learning isn’t always king came from the NetHack Challenge last year. The major competition by Facebook AI tasked competitors with developing computer agents (AI or non-AI) to complete the notoriously difficult NetHack game or to attain as high a score as possible.

For the uninitiated, NetHack is a text-rendered single-player dungeon exploration game released in 1987. By keying in text commands, the gamer sets off to explore a dungeon, collect items, and slay monsters.

The challenge should have been a cakewalk for deep learning, given that systems trained on Go and chess have beaten the world’s best human players. It was not to be: “A pure symbol-manipulation-based system crushed the best deep learning entries, by a score of 3 to 1 – a stunning upset.”

Marcus hypothesized that this could be because each map of the dungeon is generated anew for every game: “[This means] you can’t simply memorize (or approximate) the game board. To win, you need a reasonably deep understanding of the entities in the game, and their abstract relationships to one another.”

“Deep-learning systems are outstanding at interpolating between specific examples they have seen before, but frequently stumble when confronted with novelty,” he summed up.

Greater than the sum of its parts

Marcus likened achieving general artificial intelligence to the creation of an alloy such as stainless steel, in which the final product is far stronger and more reliable than its constituent elements.

“No single AI approach will ever be enough on its own; we must master the art of putting diverse approaches together – if we are to have any hope at all.”

For now, Marcus says recent developments and the increasing attention paid to a hybrid AI strategy by “large contingents” in top tech firms are giving him hope of breakthroughs in general artificial intelligence.

“For the first time in 40 years, I finally feel some optimism about AI,” he said.

You can read the full article here.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].

Image credit: iStockphoto/toptrustee