Dremio upgrades its data lakehouse to streamline analytics projects

Dremio Corp. today introduced a new version of its data lakehouse that aims to ease several common analytics tasks for customers.

Dremio is a Santa Clara, California-based startup backed by more than $410 million in funding from Cisco Systems Inc., Insight Partners and other prominent investors. Its namesake data lakehouse enables companies to analyze large quantities of business information for useful insights. It also eases related tasks, such as creating data visualizations.

“It’s been great to see the incredible growth of our product driving value for our customers,” said co-founder and Chief Product Officer Tomer Shiran. “Every company we speak to is struggling to help their businesses move faster and self-serve while maintaining security and governance. Data meshes solve for these competing priorities, and we’ve been innovating to make it easier and easier for companies to create and operate data mesh architectures.”

The most significant new features that Dremio debuted today focus on an open-source technology called Apache Iceberg. The technology, which the startup has implemented in its platform, is used to streamline certain common data management chores.

Iceberg allows companies to organize up to tens of petabytes of information in a single database table. Keeping information in a single table eases tasks that are usually difficult to perform with large datasets, such as modifying specific records and adding new ones. Iceberg also includes features that help administrators remove erroneous items. 

Dremio’s data lakehouse enables organizations to run analyses on information stored in Iceberg tables. According to the startup, the new release of its platform that debuted today will be capable of performing the task more efficiently.

When Dremio’s data lakehouse adds, removes or updates records in an Iceberg table, it generates secondary files to keep track of what changes were made. Those secondary files can quickly add up. They not only take up storage capacity but can potentially also slow queries if not managed correctly.

The company is now rolling out a feature that can automatically optimize the secondary files generated during data operations to reduce their storage footprint. The result, it says, is an increase in hardware efficiency. That means queries can potentially be completed faster and at a lower cost.
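To give a rough sense of what this kind of file optimization can involve, the Python sketch below groups many small files into fewer, larger ones. It is a toy illustration only: the function name `plan_compaction`, the greedy grouping strategy and the file sizes are assumptions made for this example, not Dremio's or Apache Iceberg's actual implementation.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Greedily group consecutive files so each group stays within target_mb.

    Hypothetical helper for illustration; each returned group stands for
    one merged output file. A file larger than target_mb gets its own group.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in file_sizes_mb:
        # Close the current group if adding this file would exceed the target.
        if current and current_total + size > target_mb:
            groups.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    small_files = [10, 20, 30, 40, 50]  # made-up sizes in MB
    plan = plan_compaction(small_files, target_mb=64)
    print(f"{len(small_files)} files -> {len(plan)} files: {plan}")
```

In this simplified model, fewer files means fewer objects for a query engine to open and track, which is the general reason this sort of housekeeping can speed up queries.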

There are cases where erroneous information may find its way into an Iceberg-powered database table. For such situations, the new release of Dremio’s platform includes a table rollback feature. The feature enables customers to restore an earlier version of a dataset if it’s found to contain incorrect information.
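Rollback of this kind is possible because Iceberg keeps track of a table's state as a series of snapshots. The Python sketch below models that idea in miniature: each commit records a new snapshot, and a rollback simply points the table back at an earlier one. The class and method names (`ToyTable`, `commit`, `rollback`) are hypothetical and chosen for illustration; this is not Dremio's SQL syntax or Iceberg's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Row = Tuple[str, int]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    rows: Tuple[Row, ...]  # immutable copy of the table contents at commit time


@dataclass
class ToyTable:
    """Toy stand-in for a snapshot-based table (illustrative assumption)."""
    snapshots: List[Snapshot] = field(default_factory=list)
    current_id: Optional[int] = None

    def commit(self, rows: List[Row]) -> int:
        # Every write produces a new snapshot; earlier ones are kept.
        snap = Snapshot(len(self.snapshots) + 1, tuple(rows))
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id
        return snap.snapshot_id

    def current_rows(self) -> List[Row]:
        if self.current_id is None:
            return []
        return list(self.snapshots[self.current_id - 1].rows)

    def rollback(self, snapshot_id: int) -> None:
        # Rolling back only moves the "current" pointer; no data is rewritten.
        if not 1 <= snapshot_id <= len(self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current_id = snapshot_id


if __name__ == "__main__":
    table = ToyTable()
    good = table.commit([("alice", 100)])
    table.commit([("alice", 100), ("bob", -999)])  # erroneous record slips in
    table.rollback(good)
    print(table.current_rows())
```

Because the earlier snapshot is still on record, restoring it in this model is a metadata change rather than a rewrite of the data itself.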

In conjunction, the company is making it easier to analyze data from external sources.

Turning data into an Iceberg table simplifies the process of analyzing it with Dremio. The company has added a tool that automates the process of creating Iceberg tables from CSV and JSON files stored in Amazon S3, Azure Data Lake Storage and HDFS. It’s also adding connectors that will allow customers to run analyses on records stored in Snowflake and IBM Corp.’s Db2 relational database.

Using a Dremio data lakehouse to analyze information stored in other deployments of the startup’s platform is set to become simpler as well. According to the company, today’s update includes features that significantly reduce the amount of manual work involved in the task. It says the features work even when a company’s data lakehouse environments are scattered across cloud and on-premises infrastructure.

Image: Dremio
