CelerData Inc., maker of a real-time analytics platform based on the StarRocks open-source massively parallel database, today announced version 3 of its enterprise product with enhanced support for the hybrid data warehouse/data lake repositories known as data lakehouses.
CelerData, which renamed itself from StarRocks Inc. last year, is the principal developer of StarRocks, a fork of Apache Doris. The StarRocks project was recently donated to the Linux Foundation.
The company said most query engines are not well-tuned for real-time analytics. They struggle with ad-hoc queries and bog down under a large number of concurrent users. “They may accept streaming data sources but they don’t support real-time,” said Li Kang, CelerData’s vice president of strategy. As a result, he said, “enterprises will often build two pipelines: one for batch processing in the data lake or data warehouse and a separate real-time pipeline.”
The new release is built on a cloud-native architecture to enable better workload and resource isolation so that different warehouses can be created for different use cases. It gives lakehouse users the option to run high-performance analytics without ingesting data into a central data warehouse. CelerData claims its query engine can support thousands of concurrent users at 10,000 queries per second and is three times faster than competing query engines.
Batch and streaming
Users can query both streaming and historical data in real time without having to wait for streaming data to be batched for analysis. The company’s approach differs from the quasi-real-time processing technique called micro-batching by splitting data into different partitions called tablets. “Each time we get a new record we read it from our reader,” Kang said. “It’s not micro-batching but you can think of it that way and combine that data with other tables.”
This release also adds integration with common storage formats such as Apache Iceberg and Apache Hudi. Previously the software was limited to local storage on a virtual machine or server and only supported one direct-attached storage type. “Data can now be stored in S3 or our local storage,” Kang said, referring to Amazon Web Services Inc.’s object-storage service.
Performance can be further improved using a local caching layer for remote input/output operations and multi-table materialized views that are built from multiple joined base tables.
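To illustrate the general idea behind a local caching layer for remote I/O, here is a minimal Python sketch of a read-through cache with least-recently-used eviction. It is not CelerData's implementation: the `ReadThroughCache` class, the `fake_remote_fetch` function and the block-keyed layout are all hypothetical names chosen for the example, and the "remote" store is simulated in memory rather than backed by S3.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical LRU read-through cache keyed by (object key, block index)."""

    def __init__(self, fetch_remote: Callable[[str, int], bytes], capacity: int = 128):
        self._fetch_remote = fetch_remote  # called only on a cache miss
        self._capacity = capacity          # maximum number of cached blocks
        self._blocks = OrderedDict()       # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def read_block(self, key: str, block: int) -> bytes:
        cache_key = (key, block)
        if cache_key in self._blocks:
            # Hit: mark the block as most recently used and serve it locally.
            self._blocks.move_to_end(cache_key)
            self.hits += 1
            return self._blocks[cache_key]
        # Miss: go to remote storage, then keep a local copy.
        data = self._fetch_remote(key, block)
        self.misses += 1
        self._blocks[cache_key] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return data


def fake_remote_fetch(key: str, block: int) -> bytes:
    """Stand-in for a remote object-store read (assumption for the demo)."""
    return f"{key}:{block}".encode()


cache = ReadThroughCache(fake_remote_fetch, capacity=2)
cache.read_block("orders.parquet", 0)  # miss: fetched from the simulated remote
cache.read_block("orders.parquet", 0)  # hit: served from the local cache
print(cache.hits, cache.misses)
```

Repeated reads of the same block skip the remote round trip, which is where a cache of this kind would save time when query data lives in object storage.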
CelerData Version 3 will be generally available in early April 2023. The company also operates a fully managed cloud service.