OctoML Inc., a startup that focuses on artificial intelligence optimization, today announced the launch of OctoAI Image Gen, an architecture for customizing image generation on popular models such as Stable Diffusion, that will allow developers to apply modifications to thousands of assets at a time.
“Image generation applications have quickly gone from fad to real business, with many e-commerce, entertainment and creative organizations looking to differentiate their service with AI,” said OctoML Chief Executive Luis Ceze. “But building these custom experiences with Stable Diffusion today is an extensive engineering effort that simply doesn’t scale.”
OctoAI launched in June to help developers build and scale their AI models. Now it’s being expanded with this new offering, which provides an application programming interface endpoint and lets developers fine-tune models at scale using their own assets.
At the core of the release is a new “Asset Orchestrator” that will allow developers to fine-tune their models with assets such as Low-Rank Adaptations, or LoRAs. A LoRA is a small set of additional weights, produced by a lightweight fine-tuning technique, that lets a user quickly train Stable Diffusion on different concepts, such as a particular character or style.
LoRAs are useful because they produce small files that are far more portable than standard image-generating models, whose size can make them unwieldy. They’re also much faster and easier to train because they require less computing power.
Once trained, a LoRA can augment a Stable Diffusion model to get it to produce an image that features that character or style. As a result, LoRAs represent a reasonable tradeoff in size, time and computing power for fine-tuning a model.
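To make the size tradeoff concrete, here is a minimal sketch of the general low-rank idea behind LoRA: instead of retraining a full weight matrix W, training learns two thin matrices B and A, and the product B·A is added to W. The layer dimensions, the rank and the function names below are illustrative assumptions, not details of OctoML's service or of any particular Stable Diffusion checkpoint.

```python
# Illustrative sketch only: the sizes and names here are hypothetical,
# chosen to show why a low-rank update is small. Not OctoML's implementation.

def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) parameter counts for one weight matrix."""
    full = d_out * d_in            # retraining the whole matrix
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return full, lora


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def apply_lora(w, b, a, scale=1.0):
    """Return W + scale * (B @ A) without modifying the base weights."""
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    full, lora = lora_param_counts(768, 768, 4)
    print(f"full matrix: {full} values, rank-4 update: {lora} values")

    base = [[1.0, 0.0], [0.0, 1.0]]   # toy 2x2 "model" weights
    b = [[1.0], [2.0]]                # 2x1
    a = [[3.0, 4.0]]                  # 1x2
    print(apply_lora(base, b, a, scale=0.5))
```

Because the base weights are left untouched, the same base model can in principle be paired with many small adapters, which is what makes managing them as separate assets practical.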
For example, if users wanted to have a Stable Diffusion model produce an image of a video game or comic book character, they would prompt it with text. However, the results would probably be finicky and inconsistent, and getting the model to produce the image they want would likely involve a lot of experimenting with the wording. This is known as prompt engineering.
A LoRA trained on images of that specific character in the styles the user wants, such as those of a particular video game or a particular comic book era, would align the model’s output more closely with the desired result. It would take far less prompt engineering to get a reasonably good customized image out of the model.
According to OctoML, the new service can greatly speed up image production with the photorealistic model Stable Diffusion XL, generating images in 2.8 seconds on average. The asset management feature can pull models and data from popular sources such as CivitAI, an online community where users share Stable Diffusion models and the AI artwork produced with them, and from the open-source AI model repository Hugging Face Inc.
Several customers have already put the OctoAI image generation solution to use in their business applications, including Storytime AI, which makes an app that uses AI to produce kids’ stories, and the AI art generator website and community NightCafe Studio Pty Ltd.
“Our top priority is to deliver kid-safe, consistent, engaging images for our custom children’s stories,” said Storytime AI CEO Brian Carlson. “Previously, this process relied on heavy-handed prompt engineering. But OctoAI helped us stand up a whole new image gen architecture utilizing assets like LoRAs to create consistent visuals without the added complexity of prompt engineering.”