Understanding the Modern Data Stack
- By Paul Mah
- January 16, 2022
The power of data is no longer in dispute, with some even pointing to how data drives disruption in today’s digital economy. From advanced predictive analytics to identifying profitable customers, businesses are turning to data-driven decisions to pull ahead of their competitors.
But successfully leveraging data goes beyond hiring a data scientist or convincing a couple of employees to enroll in some data science courses.
After all, even a talented chef would be hard-pressed to whip up fantastic meals, day after day, without a proper kitchen stocked with the right ingredients and equipped with the right tools. By the same token, organizations cannot expect to advance their business and harness the power of data at scale without first putting the requisite systems and capabilities in place.
The modern data stack
The way forward is to deploy a data stack, a term derived from “technology stack,” or tech stack. A tech stack is the set of technologies and software that powers an organization’s digital systems; a data stack does the same thing for data.
But in contrast to the broad remit of generic tech stacks, the data stack is focused on a relatively narrow range of data-centric tasks. This typically includes funneling data into the organization, manipulating it, analyzing it, and transforming it into actionable insights.
While the description of a modern data stack will invariably differ depending on who you ask, there is a growing consensus about its core characteristics. For a start, it is increasingly clear that the future of the data stack lies in the cloud.
Unlike an on-premises data warehouse where data is compartmentalized and difficult to access, a cloud data warehouse tends to be more flexible and offer greater ease of self-service access. Performance can also be superior, while flexible storage and elasticity can potentially reduce costs.
Indeed, some of the biggest names in data warehousing such as Snowflake and Databricks are in the cloud.
Where enterprises were previously forced to choose among monolithic solutions that attempt to do everything, the industry is increasingly leaning towards modular, interchangeable software. Businesses can now opt for best-of-breed solutions to customize their data stack for the precise capabilities they need.
Finally, don’t get distracted by the glitter of advanced analytics or AI technology. Speaking about the state of data platforms today, Bob Muglia, former CEO of Snowflake, talked of a near future where analytic systems will merge with data platforms, and alluded to how SQL will continue to play a crucial role – for now.
“I think you’ll see analytic systems merging into the data platforms… You’ll see a very complete stack that will have both analytics and advanced analytics and machine learning systems, together with SQL-based data management systems,” said Muglia.
Moving into the future
As we move into the future, expect data pipelines to evolve, too. Traditionally, organizations relied on an ETL process to Extract, Transform, and Load new data into databases, work typically done by a small team of data specialists. As the number of users working with data grows, the volume of work is outstripping what a small team can handle.
An ELT (Extract, Load, Transform) process thus makes sense: data is loaded into the data warehouse first, and business users transform it themselves. Aside from being more efficient, this approach reduces time to access, improves usability, and leads to better outcomes as business users take ownership of business logic.
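To make the ELT ordering concrete, here is a minimal sketch in Python. It is an illustration only: an in-memory SQLite database stands in for a cloud data warehouse, and the table name, view name, and sample records are all hypothetical. The point is the sequence: raw data is loaded untouched, and the business logic is applied afterwards, inside the warehouse, as SQL.

```python
import sqlite3

# Hypothetical raw order records, as they might arrive from a source system.
# Amounts are deliberately left as text to show that nothing is cleaned before loading.
RAW_ORDERS = [
    ("o-1", "alice", "19.99", "2022-01-03"),
    ("o-2", "bob", "5.00", "2022-01-03"),
    ("o-3", "alice", "12.50", "2022-01-04"),
]


def load_raw(conn, rows):
    """Load step: copy source rows into the warehouse unchanged."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def transform_in_warehouse(conn):
    """Transform step: business logic written as SQL and run inside the warehouse."""
    conn.execute(
        """
        CREATE VIEW customer_revenue AS
        SELECT customer,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )


def customer_revenue(conn):
    """Query the transformed view and return a {customer: revenue} mapping."""
    rows = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection
    load_raw(conn, RAW_ORDERS)          # E and L: data lands first, as-is
    transform_in_warehouse(conn)        # T: applied later, by whoever owns the logic
    print(customer_revenue(conn))
```

Because the raw table is preserved, a different team could define its own view over the same data without asking a central team to rebuild the pipeline, which is the ownership shift described above.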
As you might imagine, this makes the democratization of data and self-service analytics more vital than ever. When fully deployed, it is a game-changer where users can freely ask new questions about their data – and get the answers quickly, if not immediately.
Ultimately, it is worth noting that the modern data stack itself is relatively nascent. For organizations already on this journey, it means there are ample opportunities to solve even more problems or to improve existing solutions through the power of data.
And for those only just getting their toes wet, well, don’t fret. It is not too late yet – if they get started today.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/thesomegirl