Databricks Inc. today announced the general availability of Delta Live Tables, which it claims is the first extract/transform/load, or ETL, framework to use a declarative approach to building data pipelines and managing data infrastructure at scale.
In declarative programming, the developer writes code that describes what the program should do rather than how it should do it; the details of execution are left to the underlying system.
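To make the distinction concrete, here is a small illustrative Python comparison (not Databricks code): the imperative version spells out each step of the computation, while the declarative version states the desired result and leaves the iteration to the language runtime.

```python
# Imperative: spell out HOW to compute the result, step by step.
def doubled_evens_imperative(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * 2)
    return result

# Declarative: state WHAT the result should be; the runtime handles the rest.
def doubled_evens_declarative(numbers):
    return [n * 2 for n in numbers if n % 2 == 0]

print(doubled_evens_imperative([1, 2, 3, 4]))   # [4, 8]
print(doubled_evens_declarative([1, 2, 3, 4]))  # [4, 8]
```

Both produce the same output; the difference is where the "how" lives.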
Delta Live Tables was announced last spring alongside Delta Sharing, a Databricks-built open-source project that provides an open protocol for securely sharing data across organizations in real time, regardless of the platform on which the data resides.
The company said it’s aiming to simplify the often tedious and complicated task of turning SQL queries into production ETL pipelines by automating the most time-consuming parts of data engineering. In most organizations, building data pipelines is a largely manual process of specifying granular instructions that define how data should be manipulated and tested.
It’s also hard to build reliable data pipelines without operational rigor to keep them up and running. Even at a small scale, the majority of a data practitioner’s time is spent on tooling and managing infrastructure to make sure data pipelines don’t break, the company said.
Databricks said Delta Live Tables is the first ETL framework to combine modern engineering practices with automatic infrastructure management. Engineers describe the desired outcomes of data transformations and Delta Live Tables automates virtually all of the manual complexity, the company said.
It also enables data engineers to treat their data as code and apply software engineering best practices such as testing, error handling, monitoring and documentation to deploy pipelines at scale. Delta Live Tables supports both Python and SQL on both streaming and batch workloads.
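As a rough illustration of the declarative pattern the company describes, a framework of this kind can let engineers register transformations as named tables and let the framework resolve execution itself. The sketch below is a toy analogy with hypothetical names, not the actual Delta Live Tables API.

```python
# Toy sketch of a declarative pipeline framework (hypothetical names,
# not the real Delta Live Tables API): engineers declare tables, and the
# framework materializes them on demand from the dependencies they read.

_tables = {}  # table name -> builder function
_cache = {}   # table name -> materialized rows

def table(fn):
    """Register a table definition under the function's name."""
    _tables[fn.__name__] = fn
    return fn

def read(name):
    """Materialize a declared table on demand, memoizing the result."""
    if name not in _cache:
        _cache[name] = _tables[name]()
    return _cache[name]

@table
def raw_orders():
    # Stand-in for a source; a real pipeline would read from storage.
    return [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}]

@table
def clean_orders():
    # Declare the desired outcome: only orders with a positive amount.
    return [row for row in read("raw_orders") if row["amount"] > 0]

print(read("clean_orders"))  # [{'id': 1, 'amount': 50}]
```

In this style, the engineer states what each table should contain; deciding when and how to build each one, and in what order, is the framework's job.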