Chroma Inc., a database startup led by a prominent former Google LLC executive, today announced that it has raised $18 million in seed funding.
The investment was led by Quiet Capital. Executives from publicly traded database maker MongoDB Inc., Hugging Face Inc. and more than a half dozen other tech companies contributed as well. Chroma previously closed a pre-seed round of undisclosed size last year.
San Francisco-based Chroma is led by co-founder and Chief Executive Officer Jeff Huber, who previously established venture capital firm Triatomic Capital. He earlier spent more than a decade at Google in various executive roles. During his stint at the search giant, Huber played a key role in the development of Google Ads and Google Maps.
Chroma develops an open-source database that is specifically designed to power artificial intelligence applications. Since its release less than two months ago, the database has been downloaded more than 35,000 times.
Many AI applications rely on a data bank that they draw on to make decisions. A shopping recommendation model, for example, may maintain a database of the latest product listings. Neural networks built for cybersecurity tasks store information about hacker activity.
AI models don’t store their data in its raw form but rather as abstract mathematical structures called vectors. A collection of vectors is known as an embedding. Chroma’s open-source database, which is also called Chroma, is specifically built to store AI models’ embeddings.
“Developers use Chroma to give LLMs pluggable knowledge about their data, facts, tools, and prevent hallucinations,” Huber and Chroma co-founder Anton Troynikov wrote in a blog post today. “Many developers have said they want ‘ChatGPT but for my data’ – and Chroma provides the ‘for my data’ bridge through embedding-based document retrieval.”
One of the most important features of embeddings is their ability to highlight similarities between the data points they store. An embedding that contains vectors representing handsets, for example, can highlight handsets that have a similar price. The opposite is also true: it’s possible to point out handsets that have a significant price difference.
Embeddings’ ability to highlight similarity between data points is essential to the functioning of AI models. Recommendation models, for example, generate shopping suggestions by analyzing what items a user has bought in the past and finding similar merchandise. Neural networks that detect malware look for network activity that resembles known hacking tactics.
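As a rough illustration of how such a similarity lookup can work, the sketch below ranks a few catalog items by cosine similarity to an item a user already bought. The three-dimensional vectors, the item names and the helper functions are made up for this example; they are not taken from Chroma, and real embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, items, k=2):
    """Return the names of the k items whose vectors are most similar to the query."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical vectors, invented for illustration only.
catalog = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}
purchased = [1.0, 0.0, 0.0]  # vector for an item the user already bought

print(most_similar(purchased, catalog))  # ['running shoes', 'trail sneakers']
```

A production database would not compare the query against every stored vector one by one. It would typically use an index to narrow the search, but the ranking idea is the same.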
Embeddings also make it possible to detect when two items are dissimilar, which is likewise useful for AI applications. In the cybersecurity market, some AI-powered breach prevention tools work by mapping out how customers typically interact with an application. Such tools then look for activity that is dissimilar to a customer’s usual access patterns.
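The dissimilarity check described above can be sketched in a few lines. The example averages a customer's past activity vectors and flags any new event that lies farther than a chosen distance from that average. The two-dimensional vectors, the threshold value and the function names are assumptions made for illustration, not a description of any specific security product.

```python
import math

def centroid(vectors):
    """Average the vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Return the straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_unusual(event, history, threshold):
    """Flag an event whose vector is farther than `threshold` from typical activity."""
    return euclidean(event, centroid(history)) > threshold

# Hypothetical activity vectors for one customer's usual sessions.
usual_sessions = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15]]

print(is_unusual([0.12, 0.18], usual_sessions, threshold=0.5))  # False
print(is_unusual([0.9, 0.95], usual_sessions, threshold=0.5))   # True
```

In practice the threshold would have to be tuned per application: too low and normal sessions trigger alerts, too high and real anomalies slip through.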
Many traditional databases are not specifically designed to store AI model embeddings, which complicates the work of developers. Chroma’s database is designed to address that challenge. According to the startup, its platform is specifically optimized to store AI embeddings and can consequently provide a relatively simple developer experience.
Specialized algorithms turn the data that an AI model ingests into embeddings it can use for processing. According to Chroma, its database provides features that make it easier to use such algorithms. The result is a reduction in manual work for software teams.
Chroma supports several open-source embedding generation algorithms. It can also make it easier to use a number of commercial tools in the category, including OpenAI LLC’s cloud-based service for creating embeddings. Developers with more advanced requirements can deploy their own custom algorithms.
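One way to picture this kind of pluggable design is a store that accepts any function capable of turning text into a vector. The sketch below is a toy under stated assumptions: `TinyVectorStore` and `keyword_embedding` are hypothetical names invented for this example and are not Chroma's API, and the keyword-counting "embedding" merely stands in for a real open-source or commercial model.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]

def keyword_embedding(text: str) -> Vector:
    """Toy embedding: count occurrences of a few fixed keywords."""
    vocabulary = ["phone", "camera", "coffee", "laptop"]
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

class TinyVectorStore:
    """Hypothetical store for illustration only; this is not Chroma's API."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed  # any text-to-vector function can be plugged in
        self.vectors: Dict[str, Vector] = {}

    def add(self, doc_id: str, text: str) -> None:
        """Embed the text and keep the resulting vector under doc_id."""
        self.vectors[doc_id] = self.embed(text)

    def query(self, text: str) -> str:
        """Return the id of the stored document closest to the query text."""
        q = self.embed(text)
        return min(self.vectors, key=lambda doc_id: math.dist(q, self.vectors[doc_id]))

store = TinyVectorStore(embed=keyword_embedding)
store.add("a", "phone with a great camera")
store.add("b", "coffee grinder and coffee maker")

print(store.query("best camera phone"))  # a
```

Swapping in a different embedding algorithm means passing a different function to the constructor. The storage and query code stays unchanged, which is the reduction in manual work the pluggable approach aims for.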
To speed up queries, Chroma offers an in-memory mode. Databases usually store information on disk or flash storage and bring it into memory only when it’s actively used. An in-memory system keeps information in RAM from the outset, which skips the process of retrieving data from storage and thereby speeds up computations.
Chroma says that it will use its newly announced funding round to build new features. The startup is planning, among other additions, a capability that will allow developers to determine if information retrieved by the database is relevant to a given query. Chroma is also developing a commercial, managed version of its database that is set to launch in the third quarter.