Google Cloud Unveils BigLake Data Lakehouse Preview

At the Google Cloud Data Summit digital event last week, Google announced the preview launch of BigLake, a new data lake storage engine designed to unify data lakes and data warehouses amid the rapid proliferation of data.

Google BigLake

According to Google, BigLake lets teams analyze data without worrying about the underlying storage format or system, addressing the twin problems of fast-growing data volumes and data spread across different cloud platforms.

Crucially, BigLake eliminates the need to duplicate or move data across platforms and services, cutting cost, risk, and inefficiency while avoiding the data silos that come with maintaining disparate data environments.

Enterprises can therefore maintain a single copy of their data and enforce consistent access controls across the analytics engines of their choice, from Google Cloud services to open-source technologies such as Spark, Presto, Trino, and TensorFlow.
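To make that "single copy, many engines" idea concrete, here is a minimal sketch of how one BigLake table might be read both from BigQuery (via the google-cloud-bigquery Python client) and from Spark (via the open-source spark-bigquery-connector). The project, dataset, table, and column names are hypothetical placeholders, and the table is assumed to already exist.

```python
# Minimal sketch: reading one BigLake table from two engines without copying data.
# Project, dataset, table, and column names below are hypothetical examples.
from google.cloud import bigquery
from pyspark.sql import SparkSession

TABLE = "my-project.sales.orders_biglake"  # hypothetical BigLake table

# 1) Query the table through BigQuery using the Python client library.
bq_client = bigquery.Client(project="my-project")
rows = bq_client.query(f"SELECT order_id, region FROM `{TABLE}` LIMIT 10").result()
for row in rows:
    print(row.order_id, row.region)

# 2) Read the same table from Spark via the open-source spark-bigquery-connector
#    (the connector JAR must be on the Spark classpath). No second copy of the
#    data is created, and the same access controls apply.
spark = SparkSession.builder.appName("biglake-demo").getOrCreate()
df = spark.read.format("bigquery").option("table", TABLE).load()
df.groupBy("region").count().show()
```

Both paths read the same underlying data, which is the single-copy model the announcement describes.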

Organizations also gain unified governance and management at scale through seamless integration with Dataplex – a Google Cloud service that provides an intelligent data fabric for automating data management.

“With BigLake, customers gain fine-grained access controls, with an API interface spanning Google Cloud and open file formats like Parquet, along with open-source processing engines like Apache Spark,” said Gerrit Kazmaier, Google vice president and general manager of Database, Data Analytics, and Looker.

“These capabilities extend a decade’s worth of innovations with BigQuery to data lakes on Google Cloud Storage to enable a flexible and cost-effective open lake house architecture,” he said.
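As a rough illustration of the BigQuery-plus-open-formats combination Kazmaier describes, the sketch below defines a BigLake table over Parquet files in a Cloud Storage bucket using BigQuery's CREATE EXTERNAL TABLE ... WITH CONNECTION DDL, submitted through the Python client. The project, dataset, connection, and bucket names are placeholders, and a BigQuery Cloud resource connection with access to the bucket is assumed to already exist.

```python
# Sketch only: create a BigLake table over Parquet files in Cloud Storage.
# All identifiers (project, dataset, connection, bucket) are hypothetical and
# assume a BigQuery Cloud resource connection has already been set up.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

ddl = """
CREATE EXTERNAL TABLE `my-project.sales.orders_biglake`
WITH CONNECTION `my-project.us.biglake-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/orders/*.parquet']
)
"""

client.query(ddl).result()  # runs the DDL; the table's data stays in Cloud Storage
```

The files themselves remain in open Parquet format in the data lake, while access to the table can be governed from BigQuery.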

BigQuery is Google’s fully managed SQL data warehouse service for business intelligence and analytics.

In a post on the Google Cloud blog, software engineer Justin Levandoski and product manager Gaurav Saxena explained that BigLake offers fine-grained access control and performance acceleration across BigQuery as well as multicloud data lakes hosted on rival platforms AWS and Azure.

“BigLake extends a decade of innovations with BigQuery to data lakes on multicloud storage, with open formats to ensure a unified, flexible, and cost-effective lakehouse architecture,” wrote Levandoski and Saxena.

Google isn’t the first to moot the lakehouse concept. Databricks has long championed the data lakehouse, claiming that its architecture, built on open data lakes, delivers better data-warehousing performance than traditional warehouses built on proprietary data formats.

Image credit: iStockphoto/Казаков Анатолий Павлович