Technology
Can you hear me now? AI-coustics combats noisy audio with generative AI
Noisy recordings of interviews and speeches are the nightmare of sound engineers. But one German startup hopes to solve this problem with a unique technical approach that uses generative AI to improve the clarity of voices in video.
Today, AI-coustics emerged from stealth with €1.9 million in funding. According to co-founder and CEO Fabian Seipel, AI-coustics' technology goes beyond standard noise suppression and works with any device and speaker.
“Our core mission is to ensure that every digital interaction, whether on a conference call, a consumer device, or a regular video on social media, is as clear as a professional studio broadcast,” Seipel told TechCrunch in an interview.
Seipel, an audio engineer by training, co-founded AI-coustics in 2021 with Corvin Jaedicke, a lecturer in machine learning at the Technical University of Berlin. Seipel and Jaedicke met while studying audio technology at TU Berlin, where they frequently encountered poor sound quality in the online courses and tutorials they had to take.
“We are driven by a personal mission to address the pervasive challenge of poor audio quality in digital communications,” said Seipel. “With my hearing somewhat impaired from music production in my early 20s, I have always struggled with online content and lectures, which led us to work primarily on speech quality and intelligibility.”
The market for AI-powered noise suppression and voice-enhancement software is already robust. AI-coustics’ rivals include Insoundz, which uses generative AI to enhance streamed and pre-recorded speech clips, and Veed.io, a video editing suite with tools to remove background noise from clips.
But Seipel says AI-coustics takes a unique approach to developing the AI mechanisms that do the actual noise reduction.
The startup uses a model trained on speech samples recorded in its studio in Berlin, AI-coustics’ hometown. People are paid to record samples (Seipel wouldn’t say how many), which are then added to a data set used to train AI-coustics’ noise-reduction model.
“We have developed a unique approach to simulating audio artifacts and issues – e.g. noise, reverberation, compression, band-limited microphones, distortion, clipping, etc. – during the training process,” Seipel said.
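AI-coustics hasn’t published its training pipeline, but the artifact simulation Seipel describes resembles the data augmentation commonly used to train speech-enhancement models: clean studio recordings are degraded on the fly to produce (noisy, clean) training pairs. Below is a minimal illustrative sketch in Python; every function name and parameter is an assumption made for illustration, not AI-coustics’ actual code.

```python
# Illustrative sketch of artifact simulation for speech-enhancement training.
# Clean speech is degraded with randomized noise, reverb and clipping so a
# model can learn to map the degraded input back to the clean target.
# All names and values here are hypothetical, not AI-coustics' pipeline.
import numpy as np

def add_noise(speech: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix in white noise at a target signal-to-noise ratio (in dB)."""
    noise = np.random.randn(len(speech))
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def add_reverb(speech: np.ndarray, decay: float = 0.3, delay: int = 800) -> np.ndarray:
    """Crude reverberation: convolve with an exponentially decaying impulse response."""
    impulse = np.zeros(delay * 4)
    impulse[0] = 1.0
    impulse[delay::delay] = decay ** np.arange(1, 4)  # three decaying echoes
    return np.convolve(speech, impulse)[: len(speech)]

def clip(speech: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hard-clip the waveform, as a cheap microphone or hot gain stage would."""
    return np.clip(speech, -threshold, threshold)

def degrade(clean: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Produce a (degraded, clean) training pair with randomized artifacts."""
    x = add_noise(clean, snr_db=np.random.uniform(0, 20))
    if np.random.rand() < 0.5:
        x = add_reverb(x)
    if np.random.rand() < 0.3:
        x = clip(x)
    return x, clean  # model input, training target
```

Randomizing which artifacts apply, and at what severity, is what would expose such a model to the range of degradations found in real-world recordings.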
Depending on how those contributors are paid, some may take issue with AI-coustics’ one-time compensation scheme, given that the model the startup is training could prove quite lucrative in the long run. (There is a healthy debate about whether the creators of training data for AI models deserve to be compensated for their contributions.) But perhaps the bigger and more immediate problem is bias.
It is well-established that speech recognition algorithms can exhibit biases, and those biases ultimately harm users. One study published in The Proceedings of the National Academy of Sciences found that speech recognition from leading firms was twice as likely to incorrectly transcribe audio from Black speakers as from white speakers.
To combat this, Seipel says, AI-coustics focuses on recruiting “diverse” speech-sample contributors. He added: “Size and diversity are key to eliminating bias and ensuring the technology works across languages, speaker identities, ages, accents and genders.”
It wasn’t the most scientific test, but I submitted three video clips (an interview with an 18th-century farmer, a car driving demonstration and a protest related to the Israeli-Palestinian conflict) to the AI-coustics platform to see how well it handled each. AI-coustics did indeed deliver on its promise of improved clarity; to my ears, the processed clips had significantly less ambient noise drowning out the speakers.
Here’s the clip of the 18th-century farmer before processing:
And after:
Seipel sees AI-coustics’ technology being used to improve real-time and recorded speech, and perhaps even being built into devices such as soundbars, smartphones and headphones to automatically increase voice clarity. Currently, AI-coustics offers a web application and API for audio and video post-processing, as well as an SDK that allows the AI-coustics platform to be integrated into existing workflows, applications and hardware.
Seipel says AI-coustics, which makes money through a mix of subscriptions, on-demand pricing and licensing, currently has five enterprise customers and 20,000 users (though not all of them are paying). The plan for the next few months includes expanding the company’s four-person team and refining its core speech-enhancement model.
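AI-coustics’ actual API isn’t documented in this piece, so the following is a hypothetical sketch of what a post-processing integration might look like: upload a noisy recording, receive the enhanced audio back. The endpoint URL, parameter names and response format are all invented for illustration and should not be taken as the real interface.

```python
# Hypothetical sketch of integrating a speech-enhancement web API into a
# post-processing workflow. The endpoint, auth scheme and field names are
# placeholders, not AI-coustics' documented API.
import requests

API_URL = "https://api.example-enhancer.com/v1/enhance"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

def enhance_audio(input_path: str, output_path: str) -> None:
    """Upload a noisy recording and save the enhanced audio the service returns."""
    with open(input_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"audio": f},
        )
    response.raise_for_status()  # surface HTTP errors instead of writing garbage
    with open(output_path, "wb") as out:
        out.write(response.content)

# Example usage (requires a real endpoint, key and input file):
# enhance_audio("noisy_interview.wav", "enhanced_interview.wav")
```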
“Prior to our initial investment, AI-coustics was operating quite leanly at a low burn rate to weather the headwinds in the VC investment market,” Seipel said. “AI-coustics now has a significant network of investors and mentors in Germany and the UK who provide advice. A strong technology base and the ability to address different markets with the same database and core technology gives the company flexibility and the possibility for smaller pivots.”
When asked whether audio-mastering technologies such as AI-coustics’ could eliminate jobs, as some experts fear, Seipel pointed to the potential of AI to speed up time-consuming tasks that currently fall to audio engineers.
“A content creation studio or broadcast manager can save time and money by automating parts of the audio production process with AI while maintaining the highest speech quality,” he said. “Speech quality and intelligibility remain a vexing issue for nearly every consumer or professional device, as well as in content creation and consumption. Any application that records, processes or transmits speech can potentially benefit from our technology.”
The funding came in the form of an equity and debt tranche from Connect Ventures, Inovia Capital, FOV Ventures and Ableton CFO Jan Bohl.