The State of AI 2021: Enterprise Data Sector Booming

The annual State of AI Report 2021 is out, and AI investors Nathan Benaich and Ian Hogarth have outdone themselves once again with almost 200 pages that look at technology breakthroughs and capabilities in AI, commercial applications, regulation, and predictions about the road ahead.

Bad data is a major problem

The report called attention to how the ML community has had minimal positive impact against COVID-19. Specifically, efforts to diagnose the coronavirus from chest X-rays or CT scans using computer vision have been a universal failure. The problem? Methodological, dataset, and bias issues crept in as researchers put the cart before the horse, jumping in without adequately validating the data.

We wrote about this in August, when we noted that a report from the Alan Turing Institute, the UK's national institute for data science and AI, and a report in MIT Technology Review both came to this conclusion. Indeed, not only did AI tools fail to make a real difference in the fight against the pandemic, but some were even potentially harmful, noted the latter. Poor-quality data and incorrect assumptions were the cause.

Training data becoming more important

Expect training data quality and quantity to become more important than ever as the use of automated machine learning (AutoML) increases and new AI architectures offer diminishing gains. “With automated labeling and plateauing architecture performance, training data quantity and quality [are] becoming the competitive metric for AI-first startups,” asserted the authors.

AutoML automates the process of applying machine learning to real-world problems, with the stated aim of reducing or eliminating the need for skilled data scientists. The authors credit AutoML with making model-in-the-loop training data more common. According to a chart in their report, the share of computer vision annotations made by AI was up 36% in Q2 2021 compared with the same quarter in 2020.

As a result, training datasets are no longer a fixed object but a continuously growing corpus of knowledge, and ML teams are launching more projects as their confidence in tooling and data quality grows.

Enterprise data, automation sector ‘on fire’

AI can’t exist in a vacuum: it requires access to data for training ML models and for drawing insights. It is therefore unsurprising that the enterprise data and automation sectors are red hot right now, according to the report, which pointed to the rapid growth of UiPath, Snowflake, and Confluent.

Together, the three firms account for USD138 billion in newly created public market value this year alone, in robotic process automation (RPA), cloud data platforms, and data streaming, respectively.

Databricks, which launched its original data platform built on Apache Spark in 2015, has also grown into a popular platform for unmanaged data, automated ETL (extract, transform, load), collaborative data science, and full-stack machine learning. The cloud giants have taken notice too: Amazon, Google, and Microsoft all invested in Databricks in February this year.

Talent, and the need to study the impact of AI

Finally, there has been significant growth in AI talent in both India and China, though India’s AI research is the most diverse. “Almost 30% of scientific research papers from India include women authors compared to an average of 15% in the US and UK, and far greater than 4% in China,” the authors wrote.

The demand for AI talent is global, however. Brazil and India are now hiring more than three times as much AI talent as they were in 2017, matching or surpassing the hiring growth of Canada and the US. And the institution publishing the largest volume of high-quality AI research papers (those ranked in the top 25% for quality) is the Chinese Academy of Sciences.

Much more work is needed to understand the implications of AI and its safety. “While AI’s growing impact on society and the economy is now evident, our report highlights that research into AI safety and the impact of AI still lags behind its rapid commercial, civil, and military deployment,” Benaich and Hogarth summed up in their blog post announcing their latest report.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
