Databricks Introduces LakeFlow for Data Engineering
- By Paul Mah
- June 19, 2024
Databricks last week took the wraps off Databricks LakeFlow, which it says is designed to unify all aspects of data engineering, from data ingestion and transformation to orchestration.
According to Databricks, LakeFlow makes building and operating production-grade data pipelines simple and efficient while addressing complex data engineering use cases, allowing busy data teams to meet growing demands for reliable data and AI.
Data engineering today
Essential for democratizing data and AI within businesses, data engineering remains a challenging and complex field. For a start, data teams must ingest data from siloed systems such as databases and enterprise applications, often through complex and fragile connectors. Moreover, data preparation involves intricate logic, while failures or even latency spikes can lead to operational disruptions and unhappy customers.
Finally, deploying pipelines and monitoring data quality typically requires additional tools that are fragmented and incomplete. The result is low data quality, reliability issues, high costs, and a growing backlog of work.
LakeFlow addresses these challenges by simplifying all aspects of data engineering via a single, unified experience. With LakeFlow, data teams can easily ingest data at scale from traditional databases such as MySQL, Postgres, and Oracle, as well as enterprise applications such as Salesforce, Dynamics, SharePoint, Workday, NetSuite, and Google Analytics.
In addition, LakeFlow automates deploying, operating, and monitoring pipelines in production with built-in support for CI/CD, as well as advanced workflows that support triggering, branching, and conditional execution.
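LakeFlow itself was not yet publicly available at the time of writing, but the orchestration it extends already exists in Databricks Workflows. As a rough, hypothetical sketch of what deploying and scheduling a pipeline as code can look like today, the snippet below uses the Databricks SDK for Python to create a two-task job with a dependency and a cron trigger; the job name and notebook paths are made up for illustration.

```python
# Hypothetical sketch: create and schedule a two-task job with the Databricks
# SDK for Python (the Workflows/Jobs API that LakeFlow Jobs builds on).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up workspace credentials from the environment

job = w.jobs.create(
    name="nightly-orders-etl",  # illustrative name
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/ingest_orders"),
        ),
        jobs.Task(
            task_key="transform",
            # Runs only after the ingest task completes successfully
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/transform_orders"),
        ),
    ],
    # Simple time-based trigger; branching and conditional execution are layered
    # on top of the same task graph
    schedule=jobs.CronSchedule(quartz_cron_expression="0 0 2 * * ?", timezone_id="UTC"),
)
print(f"Created job {job.job_id}")
```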
Key features of LakeFlow
LakeFlow comprises three key components: LakeFlow Connect, LakeFlow Pipelines, and LakeFlow Jobs.
- LakeFlow Connect: As its name suggests, LakeFlow Connect incorporates the capabilities of Arcion, which Databricks acquired last year, to offer simple and scalable data ingestion. LakeFlow Connect also provides various native connectors integrated with Unity Catalog for data governance.
- LakeFlow Pipelines: Built on Databricks’ highly scalable Delta Live Tables technology, LakeFlow Pipelines allows data teams to implement data transformation and ETL in SQL or Python for automated, real-time data pipelines (a brief sketch follows this list).
- LakeFlow Jobs: LakeFlow Jobs provides automated orchestration, data health monitoring, and data delivery, spanning everything from scheduling notebooks and SQL queries to ML training and automatic dashboard updates.
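Databricks has not published LakeFlow Pipelines syntax in detail, but since it builds on Delta Live Tables, the existing DLT Python API gives a reasonable feel for the declarative style. The sketch below is hypothetical: the table names, storage path, and quality expectation are illustrative only, and the code is meant to run inside a DLT pipeline notebook, where `spark` is predefined.

```python
# Hypothetical sketch of a declarative pipeline using the existing Delta Live
# Tables Python API, which LakeFlow Pipelines is built on.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders incrementally loaded from cloud storage")
def raw_orders():
    return (
        spark.readStream.format("cloudFiles")           # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/sales/landing/orders")      # illustrative path
    )

@dlt.table(comment="Orders with basic data-quality checks applied")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows
def clean_orders():
    return dlt.read_stream("raw_orders").where(col("amount") > 0)
```

Because the tables are declared rather than wired together by hand, the engine handles dependency ordering, incremental processing, and retries, which is the automated, real-time aspect Databricks highlights.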
LakeFlow is entering preview soon, starting with LakeFlow Connect. Customers can register to join a waitlist today.
Image credit: iStock/iosebi meladze
Paul Mah
Paul Mah is the editor of DSAITrends, where he reports on the latest developments in data science and AI. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.