Databricks exposes a serverless machine learning inferencing engine via an API

Data analytics developer Databricks Inc. today announced the general availability of Databricks Model Serving, a serverless real-time inferencing service that deploys machine learning models natively within the Databricks Lakehouse Platform and exposes them via a Representational State Transfer, or REST, application programming interface.

The company said the service enables developers to build applications such as personalized recommendations, customer service chatbots and fraud detection algorithms without the need to manage infrastructure. The company said it’s addressing common pain points in the deployment of machine learning, a branch of artificial intelligence that uses data and algorithms to imitate the way humans learn.

Such applications require fast, scalable serving infrastructure along with supporting functions such as feature lookups, monitoring, automated deployment and model retraining. That often means teams must stitch together disparate tools, increasing operational complexity and creating maintenance overhead, the company said. As a result, businesses often spend more time and resources maintaining infrastructure than integrating machine learning into their processes.

“One thing we hear from customers is that managing infrastructure is a real challenge,” said Craig Wiley, senior director of product management. “Our goal is to simplify workflows.”

Flexible capacity

As a serverless deployment, Databricks’ infrastructure expands and contracts according to the needs of the machine learning model. “We’ll run an inference for as long and for as many of them as needed and then we’ll spin it back down when traffic to the model slows,” Wiley said. “We believe that can be transformational to how a team works.”

Exposing the inferencing engine as an API makes it possible for developers to call the service only when an inference is needed. “Customers can quickly and easily put a model behind an endpoint,” Wiley said. “They can call this service with an API anytime they need to and we return to them an inference based on the data they submitted to us.” Databricks is working on an enhancement that will publish all inferences to a table so developers can monitor and track the performance of their model over time.
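In practice, calling such an endpoint amounts to a single authenticated HTTP POST carrying the input data as JSON. Here is a minimal sketch in Python, assuming a hypothetical workspace URL, endpoint name and access token; the `/serving-endpoints/<name>/invocations` path and `dataframe_records` payload shape follow Databricks' documented pattern, but check the current API reference before relying on them:

```python
import json
from urllib import request

# Hypothetical workspace URL and endpoint name -- substitute your own.
WORKSPACE_URL = "https://example.cloud.databricks.com"
ENDPOINT_NAME = "fraud-detector"

def build_scoring_request(records, token):
    """Build (but do not send) a REST scoring request for a serving endpoint."""
    url = f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# One record to score; the endpoint would return an inference as JSON.
req = build_scoring_request([{"amount": 129.99, "country": "US"}], token="dapi-...")
# To actually score: response = request.urlopen(req); json.load(response)
```

Because the model sits behind a plain REST endpoint, any language or tool that can issue an HTTP request can consume inferences without Databricks-specific client libraries.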

The only requirement is that a model is built using MLflow, which is an open-source platform for managing the machine learning lifecycle. “It can be a TensorFlow model or a scikit model,” Wiley said. “As long as it’s wrapped in MLflow, we’re more than happy to support it.”
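What makes this framework-agnostic support possible is MLflow's "pyfunc" flavor, which gives every wrapped model the same `predict` interface regardless of the underlying library. The following is a rough, dependency-free sketch of that contract; the real base class is `mlflow.pyfunc.PythonModel`, and the threshold model here is invented purely for illustration:

```python
class PyfuncStyleModel:
    """Mimics the mlflow.pyfunc contract: predict(context, model_input).

    Any framework's model (TensorFlow, scikit-learn, ...) can be wrapped
    behind this shape, so the serving layer only ever sees one interface.
    """

    def __init__(self, threshold):
        # Stand-in for a trained model artifact; purely illustrative.
        self.threshold = threshold

    def predict(self, context, model_input):
        # In real MLflow, `context` carries loaded artifacts; unused here.
        return [1 if row["score"] > self.threshold else 0 for row in model_input]

model = PyfuncStyleModel(threshold=0.5)
print(model.predict(None, [{"score": 0.9}, {"score": 0.2}]))  # -> [1, 0]
```

The serving infrastructure never needs to know whether the wrapped object is a neural network or a decision tree; it just calls `predict` on whatever MLflow loads.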

Unified platform

Databricks said Databricks Model Serving is the first production-grade model serving solution developed on a unified data and AI platform based on the company’s “lakehouse” data warehouse/data lake hybrid.

The service integrates with other lakehouse services, including Databricks Feature Store for automated online lookups, MLflow Model Registry for model deployment, Unity Catalog for unified governance, and the platform’s quality and diagnostics tools. As part of the launch, the company is also introducing serving endpoints, which decouple the model registry from the scoring uniform resource identifier, or URI. This gives developers a way to deploy multiple models behind a single endpoint and distribute traffic as needed among them.
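A traffic split of that kind is typically declared in the endpoint configuration rather than in application code. The sketch below shows what such a payload might look like, with 90% of requests routed to one model version and 10% to a newer one for canary testing; the field names follow Databricks' serving-endpoints REST API as documented at the time of writing, but treat the exact schema as an assumption and verify it against the current API reference:

```python
# Illustrative payload for creating an endpoint that splits traffic between
# two registered versions of the same model.
endpoint_config = {
    "name": "recommender",
    "config": {
        "served_models": [
            {"name": "recommender-v1", "model_name": "recommender",
             "model_version": "1", "workload_size": "Small",
             "scale_to_zero_enabled": True},
            {"name": "recommender-v2", "model_name": "recommender",
             "model_version": "2", "workload_size": "Small",
             "scale_to_zero_enabled": True},
        ],
        "traffic_config": {
            "routes": [
                {"served_model_name": "recommender-v1", "traffic_percentage": 90},
                {"served_model_name": "recommender-v2", "traffic_percentage": 10},
            ]
        },
    },
}

# Sanity check: traffic percentages across routes should sum to 100.
total = sum(r["traffic_percentage"]
            for r in endpoint_config["config"]["traffic_config"]["routes"])
assert total == 100
```

Because callers only ever see the single endpoint URI, the split can be adjusted, or the new version promoted to 100% of traffic, without any change on the client side.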

Pricing is based on the amount of capacity and the number of concurrent requests customers require.

Image: Pixabay
