Artificial intelligence-powered search startup Nuclia said today it has raised $5.4 million in a seed funding round led by Crane Venture Partners and Elaia.
The round will enable the company to build upon the launch of its open-source and cloud-native database NucliaDB. Officially called Bosutech S.L., Nuclia also announced the public availability of its application programming interface, which allows developers to integrate extremely potent, AI-powered search into any application, service or website.
What Nuclia has built is an AI-powered solution for searching unstructured data, and it looks to be a big deal. These days, every company has built up large troves of data, but the vast majority of that information, around 80% to 90% according to Nuclia, is unstructured: text documents such as PDFs that are not easily machine-readable, or video or audio files. Until now, searching through this kind of data accurately has been a big challenge.
The problem is that most companies simply don’t have the capability to ingest, process and index all of this unstructured data. As Nuclia points out, meeting even a small part of this challenge has taken hundreds of engineers and millions of machines’ worth of computation.
With the launch of Nuclia’s API and NucliaDB, the startup claims that this kind of power is now available to anyone. NucliaDB, which can be found on GitHub, is the foundation of Nuclia’s capabilities. The company claims it’s the first vector database that’s specifically designed for unstructured data.
Vector databases are purpose-built to handle the unique structure of vector embeddings. They index unstructured data as vectors that can easily be searched and retrieved by comparing values and finding those that are most similar to one another.
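To make the idea of similarity search over vectors concrete, here is a minimal sketch in Python. It is not NucliaDB's implementation: the file names and three-dimensional vectors are made up for illustration, and the brute-force scan stands in for the specialized indexes a real vector database would use. It assumes cosine similarity as the comparison measure, which is one common choice.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical embeddings; real ones have hundreds of dimensions
# and come from a trained model, not hand-picked numbers.
index = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "podcast.mp3": [0.1, 0.8, 0.3],
    "keynote.mp4": [0.2, 0.7, 0.6],
}
query = [0.0, 0.9, 0.4]

print(nearest(query, index))  # ['podcast.mp3', 'keynote.mp4']
```

The query never has to share a keyword with the stored items. It only has to land near them in the vector space, which is what lets this style of search work across formats and languages.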
Companies can tap NucliaDB through the Nuclia API, which can quickly connect to any data source and automatically index its content regardless of what format or even language it is in. Eudald Camprubí, founder and chief executive of Nuclia, told SiliconANGLE that the Nuclia API enables users to perform multilingual semantic searches on their entire unstructured data set, transforming that information into knowledge.
“Nuclia’s API enables developers to integrate AI-powered search by normalizing unstructured data,” Camprubí explained, adding that this includes transcribing video and audio, extracting all content from images, documents and other text-based information. “It vectorizes all of this data and creates an index that can be searched.”
Once all of this unstructured data has been indexed, developers can use the Nuclia API to discover semantic results, specific paragraphs in text and relationships between data. These capabilities can be integrated into any application with ease, Camprubí said.
Crane Venture Partner Aneel Lakhani was full of praise for the company, saying Nuclia has built “something incredible” that will enable users to be taken to the exact moment they’re searching for in a video or podcast, or the exact block of text they’re looking for in a PDF or PowerPoint presentation.
Constellation Research Inc. analyst Andy Thurai was more practical in his assessment. He told SiliconANGLE Nuclia is solving a problem many companies have, which is that they possess far more unstructured data than they really know what to do with. Because that information is not easily readable, they tend to just store it indefinitely, in the hope that one day they’ll be able to figure out what to do with it, he said. Although Nuclia is in many ways just another entrant into a very crowded field of startups all trying to solve this problem, Thurai said it has a few capabilities that may be useful to enterprises.
“First, their search can be completely API-based. Once the proper data sources are connected, any app can use Nuclia’s publicly available API to perform search on unstructured data,” Thurai said. “Second, they can do multilingual search using a text search box, so in that way it’s aiming to be the Google of unstructured data search.”
Thurai said Nuclia’s claim to be able to detect images within unstructured datasets is also somewhat unique. For example, it says it can find an embedded image within a scanned document. Further, the company also claims the capability to perform a “fuzzy search” on unstructured data. Thurai explained that this means it can search for sentences that closely match what the user is searching for, as opposed to finding an exact match.
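As a rough illustration of what a fuzzy search does, the sketch below ranks sentences by how closely they resemble a query rather than requiring an exact match. It is an assumption-laden stand-in, not Nuclia's method: it uses character-level matching from Python's standard `difflib` module, whereas a semantic search product would more likely compare vector embeddings. The `fuzzy_search` function, the 0.6 threshold and the sample sentences are all hypothetical.

```python
from difflib import SequenceMatcher

def fuzzy_search(query, sentences, threshold=0.6):
    """Return sentences whose similarity to the query meets the threshold,
    best match first. Similarity is difflib's ratio, between 0.0 and 1.0."""
    results = []
    for sentence in sentences:
        ratio = SequenceMatcher(None, query.lower(), sentence.lower()).ratio()
        if ratio >= threshold:
            results.append((ratio, sentence))
    results.sort(reverse=True)  # highest similarity first
    return [sentence for _, sentence in results]

# Hypothetical sentences extracted from indexed documents.
sentences = [
    "The quarterly report is due on Friday.",
    "Lunch will be served at noon.",
]

# The query is not an exact substring of either sentence.
print(fuzzy_search("quarterly report due friday", sentences))
```

An exact-match search would return nothing for this query, because the words "is" and "on" break up the phrase. The fuzzy version still surfaces the first sentence and filters out the unrelated one.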
“This last capability is unique in the area of unstructured data search,” Thurai said. “It can potentially lead to new image classification and video classification techniques, so we could say one video is X% similar to another one, and so on. It could also be very useful in defending copyright to music, imagery, music videos and more.”