Data Science Predictions for 2023
- By Paul Mah
- February 01, 2023
Data has never been more relevant, with more businesses than ever taking the plunge into data-driven decision-making and machine learning. As the Lunar New Year celebrations wind down and we start 2023 in earnest, what are some possible developments in data science that we can expect this year?
I outline three predictions below.
Interoperable data solutions will grow in importance
There are more data-centric tools than before. But while choice is a good thing, the large and rapidly growing data ecosystem does mean that modern data stacks are more fragmented than ever. This makes interoperability a vital consideration.
Certainly, a team with sufficient engineering know-how can probably get disparate solutions to work - eventually. But even when this succeeds, the resulting smorgasbord often leaves the organization without comprehensive data governance, data profiling or data lineage.
Having the right tools can save a tremendous amount of time. When I spoke last year with Daniel Hein, who heads cloud sales and product specialists at Informatica, he described the enormous effort a retailer in Australia took to retrieve weather updates from the Bureau of Meteorology.
“It took them two weeks of work to code the scripts to extract this data. With a cloud-native solution supported by a wizard-based interface, they were able to address the same use case in less than five minutes,” said Hein. Though he would have you buy Informatica, I would argue that any suitable tool could have saved the team a tremendous amount of time.
Python goes mainstream
According to Torsten Grabs, a director of product management at Snowflake who oversees data engineering and data science workloads, the use of Python will grow beyond data professionals to citizen developers in 2023.
“Business people can’t afford to wait for data scientists to provide the analytics they need, so they’re taking matters into their own hands. Python has become more approachable for non-professionals with the availability of preconfigured cloud runtimes and accessible tools like NumPy for numerical data, Prophet for forecasting and H3 for geospatial data,” he wrote in a contributed article on TechCrunch.
And of course, many modern cloud platforms offer support for Python with integrated security and governance features that can give novice users a running start. For this reason, Grabs predicts Python use will expand beyond data professionals and into the hands of business analysts and other less technical users.
“Just as self-service business intelligence tools went mainstream 15 to 20 years ago, Python is now starting to put even more powerful analytics capabilities into the hands of business users,” he wrote.
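To give a sense of how approachable this can be, here is a minimal sketch of the kind of analysis a business analyst might run with NumPy. The sales figures and variable names are hypothetical and chosen purely for illustration; they do not come from Grabs or Snowflake.

```python
import numpy as np

# Hypothetical monthly sales figures (in thousands); illustrative only.
monthly_sales = np.array([120.0, 135.0, 128.0, 150.0, 162.0, 158.0])

# Month-over-month growth as a percentage of the previous month.
growth_pct = np.diff(monthly_sales) / monthly_sales[:-1] * 100

# Three-month moving average to smooth out short-term swings.
window = 3
moving_avg = np.convolve(monthly_sales, np.ones(window) / window, mode="valid")

print("Growth (%):", np.round(growth_pct, 1))
print("3-month moving average:", np.round(moving_avg, 1))
```

A few lines like these cover what would otherwise be a spreadsheet of helper columns, which is arguably part of Python's appeal to less technical users.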
AutoML to lower barriers in AI
Finally, there is no question that ChatGPT has generated a global surge of interest in the otherwise arcane field of AI. Businesses are taking another look at how they can leverage machine learning (ML) and build models to discern trends and solve problems. (See also “Deciphering Data Science and Machine Learning”.)
But ML is hardly trivial and requires data scientists with the expertise and time to perform multiple steps such as data collection, data preparation, model training, model evaluation, and deployment.
One way to speed up that process is through AutoML, or automated machine learning. AutoML is the process of automating the tasks of applying machine learning to real-world problems. By automating a range of time-consuming tasks such as feature engineering, algorithm selection, hyperparameter tuning, model selection and training, even non-experts can potentially access ML capabilities.
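The core idea of automated algorithm selection and hyperparameter tuning can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how any particular AutoML product works: the synthetic data, the three model families, and the helper names (`make_data`, `fit_knn`, `auto_select`) are all hypothetical.

```python
import random
import statistics

def make_data(n=60, seed=0):
    """Synthetic, roughly linear data (an assumption for this demo)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    """Baseline: always predict the average of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Ordinary least-squares fit of a straight line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_knn(k):
    """k-nearest neighbours; k is the hyperparameter being tuned."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return statistics.fmean(y for _, y in nearest)
        return predict
    return fit

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def auto_select(xs, ys, candidates, holdout=0.25):
    """Train every candidate, score it on held-out data, return the best."""
    split = int(len(xs) * (1 - holdout))
    train_x, train_y = xs[:split], ys[:split]
    valid_x, valid_y = xs[split:], ys[split:]
    scores = {name: mse(fit(train_x, train_y), valid_x, valid_y)
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

candidates = {
    "mean baseline": fit_mean,
    "linear regression": fit_linear,
    "kNN (k=1)": fit_knn(1),
    "kNN (k=5)": fit_knn(5),
}

xs, ys = make_data()
best_name, scores = auto_select(xs, ys, candidates)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: validation MSE = {score:.2f}")
print("Selected:", best_name)
```

Real AutoML systems search far larger spaces and also automate feature engineering, but the loop is the same in spirit: try many candidates, score each on data it has not seen, and keep the winner.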
As noted by Christophe Atten, a data scientist and writer, AutoML can produce solutions with explainable and reproducible results and automate a manual and tedious process of repetitive tasks. Crucially, it allows domain experts to become citizen data scientists.
For businesses mulling the use of AI this year, I really liked the advice Atten offered. “The future of AI is not something that will happen in the distant future; it’s happening right now. The best way to prepare for the future is to take action today,” he wrote.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/AndreyPopov