Meta is building better AI-driven audio for virtual reality


When it comes to virtual reality, creating immersive worlds takes more than generating visually perfect environments: the way sound works can make or break an experience. To tackle audio, researchers at Meta Platforms Inc. today open-sourced three artificial-intelligence models that take sound in the metaverse to a new level.

“Getting spatial audio right is key to delivering a realistic sense of presence in the metaverse,” said Mark Zuckerberg, founder and chief executive of Meta. “If you’re at a concert, or just talking with friends around a virtual table, a realistic sense of where sound is coming from makes you feel like you’re actually there.”

Things sound different in different environments. Everyone knows the experience of singing in an enclosed space such as the shower versus talking in a park: the two are entirely different. There's also the way friends' voices reflect off the walls of a living room, or the low murmur of a restaurant.

This is the essence of the first model, called the Visual Acoustic Matching model, which uses an image of the space to adjust sounds so that they match the target environment. For example, it could take an audio clip of a person speaking in an open field and match it to someplace cozy and intimate, making the voice sound closer and echo off nearby walls.

“Human listeners, without us even realizing it, are expecting to hear sounds in a certain way depending on the physical environment that we’re in,” said Kristen Garuman, research director at Meta AI. “That’s because audio is shaped by the environment we’re in.”

This could be useful for meetings with friends in the metaverse: when we don VR headsets, we might be whisked away to a forest campsite to talk with our friends, but we don't actually leave our living room or home office. Recordings of our voices still carry the acoustics of the spaces we're physically in, so the AI model can reshape that sound to match the gloaming-lit virtual forest and make it that much more immersive.

The next model does the opposite: it takes knowledge of the environment and removes the echoes generated when sound bounces off surfaces (called reverberation) to create cleaner, crisper audio. The Visually-Informed Dereverberation model could take a violinist's performance in a massive train station and make it sound as if it had been played in a studio.

The result of this is potentially better audio in general for recording from headsets worn in homes and home offices for speech enhancement, speaker identification and speech recognition purposes. With less echo sneaking into the audio, smart agents – and even people listening on the other end – would have a better time understanding speech.

Finally, in the metaverse things will probably get a little bit noisy when lots of people are talking nearby, and potentially talking over one another. VisualVoice (white paper) takes a page from humans, who can listen with more than just their ears – they also use their eyes for clues in mouth movements and facial expressions.

The objective of VisualVoice is to disentangle individual voices from background noise and from other voices speaking at the same time, and to identify individual speakers. The result could be better accessibility, and potentially even subtitles that attach to specific speakers. It could also let smart agents focus on and identify individuals in crowds.
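Separation systems like this typically work by estimating a mask over a time-frequency representation of the mixture: keep the bins belonging to the target voice, suppress the rest. VisualVoice learns that mask from faces and lip motion; the toy sketch below uses a fixed frequency-band mask on two artificial "speakers" purely to illustrate the masking idea (all signals and band limits are invented for the example):

```python
import numpy as np

def separate_by_mask(mixture, sr, band_hz):
    """Isolate one source with a frequency-domain mask. A learned model
    would predict the mask; here it is a fixed frequency band."""
    spectrum = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(len(mixture), 1 / sr)
    mask = (freqs >= band_hz[0]) & (freqs < band_hz[1])
    return np.fft.irfft(spectrum * mask, len(mixture))

sr = 16000
t = np.arange(sr) / sr
voice_a = np.sin(2 * np.pi * 220 * t)   # low-pitched "speaker"
voice_b = np.sin(2 * np.pi * 1200 * t)  # higher-pitched "speaker"
mixture = voice_a + voice_b

recovered_a = separate_by_mask(mixture, sr, (100, 600))
recovered_b = separate_by_mask(mixture, sr, (600, 2000))
```

Real voices overlap heavily in frequency, so a static band won't separate them; that is exactly where the visual cues, which ears alone don't have, give the learned mask its edge.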

With these new AI models, Meta hopes to supply superior audio for immersive AR and VR experiences in the future. Virtual reality already delivers profound visual representations of spaces; it's important that the quality of the sound keeps up.

Garuman sees a future where this AI audio research will provide truly unique experiences for people in the metaverse, such as visiting a concert.

“As soon as you put on your headset the sounds from your home would fade away and the audio would adjust realistically as you move from the hallway into the concert hall and closer to the stage,” she said. “And, if you wanted, AI could enhance the experience so that you could enjoy the experience and still hear your friend next to you.”

Image: Meta


