Databricks Acquires Tabular With Eye Towards Lakehouse Compatibility
- By Paul Mah
- June 05, 2024
Databricks today announced that it has agreed to acquire Tabular, a data management company. Tabular was founded by the original creators of Apache Iceberg, with which Databricks hopes to build data compatibility over time.
With the move, Databricks intends to work closely with the Delta Lake and Iceberg communities to bring format compatibility to the lakehouse. In the short term, this will be done within its own Delta Lake UniForm feature. In the long term, it hopes to work with Tabular to evolve toward a single, open, and common standard of interoperability for an open lakehouse.
The Lakehouse architecture
Databricks pioneered the lakehouse architecture in 2020 to enable the integration of traditional data warehousing workloads with AI workloads on a single, governed copy of data. We previously wrote about the lakehouse architecture, which seeks to maximize enterprise productivity by democratizing access to data.
The idea is to offer an open format so different workloads, applications, and engines can access the same data, unlike proprietary data warehouses where only a proprietary SQL engine can read, write, or share the data. For the latter, data must typically be copied and exported to be used by other applications, creating a high degree of vendor lock-in.
The foundation of the lakehouse is open-source data formats that enable ACID transactions on data stored in object storage. These formats have dramatically improved the reliability and performance of data operations on the data lake and were specifically designed for open-source engines such as Apache Spark, Trino, and Presto.
Databricks says that 74% of enterprises have now deployed a lakehouse architecture, four years after its launch. Databricks worked with the Linux Foundation to create the Delta Lake project. Since its inception, Delta Lake has had over 500 code contributors from a diverse set of organizations, and over 10,000 companies globally use Delta Lake today.
Towards interoperability
Ryan Blue and Daniel Weeks of Tabular originally developed the Iceberg project at Netflix and donated it to the Apache Software Foundation. Since then, Delta Lake and Iceberg have emerged as the two leading open-source standards for lakehouse formats.
However, even though both of these formats are based on Apache Parquet, they have become incompatible due to independent development. Databricks says it will work closely with the Delta Lake and Iceberg communities to bring interoperability to the formats over time.
“Databricks pioneered the lakehouse and over the past four years, the world has embraced the lakehouse architecture, combining the best of data warehouses and data lakes to help customers decrease TCO, embrace openness, and deliver on AI projects faster,” said Ali Ghodsi, co-founder and CEO of Databricks.
“Unfortunately, the lakehouse paradigm has been split between the two most popular formats: Delta Lake and Iceberg. Databricks and Tabular will work with the open-source community to bring the two formats closer to each other over time, increasing openness, and reducing silos and friction for customers,” said Ghodsi.
Interoperability will allow companies to fully realize the benefits of the lakehouse, according to Databricks.
The proposed acquisition is subject to customary closing conditions and is expected to close in Databricks’ second fiscal quarter.
Image credit: iStock/popba
Paul Mah
Paul Mah is the editor of DSAITrends, where he reports on the latest developments in data science and AI. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.