Soul's AI Voice Model Innovates Emotional Intelligence in Tech

Soul CEO Zhang Lu and her team have been in the news lately for the significant strides they have made toward infusing emotional intelligence into artificial intelligence models and upgrading existing models with multimodal capabilities.

The most recent in this string of developments is the upgrade that Soul Zhang Lu’s team made to their voice model. The new addition is a self-developed end-to-end full-duplex voice call model that brings some extraordinary features to the table. Among its most impressive attributes are ultra-low interaction latency, rapid automatic interruption, highly realistic voice expression, and emotion perception capabilities.
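The article does not disclose Soul's internal design, but the behavior it names, full-duplex audio with rapid automatic interruption (often called "barge-in" in speech systems), can be pictured as a loop that listens and plays sound at the same time. The following Python sketch is purely illustrative: the FullDuplexSession class and the vad, model, and player interfaces are assumptions, not Soul's actual components.

```python
import queue
import threading

# Hypothetical sketch of a full-duplex voice loop with automatic
# interruption ("barge-in"). All interfaces here are illustrative
# assumptions, not Soul's actual code.

class FullDuplexSession:
    """Listens and speaks concurrently; stops speaking when the user talks."""

    def __init__(self, vad, model, player):
        self.vad = vad          # voice-activity detector (assumed interface)
        self.model = model      # end-to-end speech model (assumed interface)
        self.player = player    # audio output device (assumed interface)
        self.mic_frames = queue.Queue()

    def run(self):
        # Capture and playback run at the same time, which is what makes
        # the loop full-duplex rather than strict turn-taking.
        threading.Thread(target=self._listen, daemon=True).start()
        self._respond()

    def _listen(self):
        for frame in self.vad.stream_microphone():
            if self.vad.is_speech(frame) and self.player.is_playing():
                # Rapid automatic interruption: the user started talking
                # over the model, so cut its audio off immediately.
                self.player.stop()
            self.mic_frames.put(frame)

    def _respond(self):
        while True:
            frame = self.mic_frames.get()
            # Audio in, audio out: no intermediate text transcript.
            for audio_chunk in self.model.stream_reply(frame):
                self.player.play(audio_chunk)
```

In a design like this, low latency and fast interruption come from the same choice: the listening thread never waits for playback to finish before acting on new speech.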

Together, these capabilities allow the model to understand the nuanced world of sound and to speak in human-like, multi-style language. The result is strikingly natural conversation that closely mirrors how people actually communicate, so applications powered by the model can provide an emotional companionship experience approaching what an interaction with a real person offers.

The self-developed end-to-end voice call model is currently in beta testing, but it will soon be integrated into AI companionship and AI interaction scenarios, such as the popular in-app assistant and chatbot AI Goudan.

Since the platform’s launch in 2016, Soul Zhang Lu has been committed to expanding social experiences through innovative technological solutions and product design. In 2020, the team behind the extremely popular social media app began research and development in AI-generated content (AIGC). This led to systematic advancements in key technologies such as intelligent conversation, voice technology, and virtual humans. In turn, the developments allowed the team to facilitate the deep integration of AI capabilities in social settings via various platform features.

This was, and continues to be, in line with Soul Zhang Lu’s core focus: upgrading social interactions through AI-powered applications. The central idea is to pursue advancements that allow artificial intelligence to offer users a realistic, emotionally resonant companionship experience.

Naturally, voice plays a key role when it comes to realizing the goal of realistic, human-like interactions between AI characters and the platform’s users. Verbal communication is vital for conveying information and emotions. Of course, a lot can also be accomplished through the written word, but giving those thoughts, words, and ideas a voice is what creates a truly human-like experience.

Voice is also the most effective channel for providing “emotional warmth” and a “sense of companionship” in communication. Especially in AI-driven social scenarios, voice can be a game changer. In fact, emotionally expressive, low-latency, multi-style, and highly realistic voice capabilities can break the “dimensional wall,” allowing conversation to flow naturally.

In turn, this can lead to more immersive interactions between humans and machines, simulating real-life conversational experiences in online social settings. Because Soul Zhang Lu’s aim has always been to offer rich experiences to the platform’s users, emotion perception and latency have been consistent focal points for Soul’s technical team.

To meet this goal, Soul Zhang Lu’s team has launched several self-developed voice models, including voice generation models, voice recognition models, voice dialogue models, and music generation models. These models support realistic voice generation, DIY voice creation, multilingual switching, and emotionally rich real-time conversations.

Under Soul Zhang Lu’s leadership, the platform’s engineers consistently keep pace with cutting-edge international technological developments. Over the last few years, Soul has continued to build on its accumulated large language model and voice model capabilities, allowing the platform to offer new and improved AI social application experiences to its users.

Moreover, in July of this year, Soul Zhang Lu’s team brought home the trophy for their submission at the second Multimodal Emotion Recognition Challenge (MER24) held by the International Joint Conference on Artificial Intelligence (IJCAI). Soul clinched its win in the SEMI (Semi-Supervised Learning) category, which centered on optimal model training strategies and techniques.

With its submission at MER24, Soul Zhang Lu’s team had already demonstrated an emotionally intelligent model with multimodal capabilities. The launch of this self-developed end-to-end voice call model builds directly on that accomplishment.

What makes this end-to-end voice-to-voice model truly unique is its shift away from the traditional cascading architecture. Unlike run-of-the-mill voice interaction systems, Soul’s upgraded model eliminates the need for separate stages of speech recognition, natural language understanding, and voice generation. Because the end-to-end model takes voice directly as input and produces voice directly as output, it enables near-lossless information transmission and reduces response latency.
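To make the contrast concrete, here is a minimal, purely illustrative sketch of the two approaches. Every component name in it (asr, nlu, generate, tts, e2e_model) is a hypothetical placeholder rather than one of Soul's actual modules.

```python
# Purely illustrative contrast between the two architectures described
# above; all component names are hypothetical placeholders.

def cascaded_reply(audio_in, asr, nlu, generate, tts):
    """Traditional cascading pipeline: each stage hands text to the next.

    Tone, pauses, and emotion are largely lost the moment speech is
    flattened into a transcript, and every stage adds its own delay.
    """
    transcript = asr(audio_in)      # stage 1: speech recognition
    intent = nlu(transcript)        # stage 2: natural language understanding
    reply_text = generate(intent)   # stage 3: text response generation
    return tts(reply_text)          # stage 4: voice generation

def end_to_end_reply(audio_in, e2e_model):
    """End-to-end model: voice in, voice out, a single model.

    Because nothing passes through a lossy text bottleneck, prosodic and
    emotional cues can survive, and fusing the stages lowers latency.
    """
    return e2e_model(audio_in)
```

The practical difference is where information lives between steps: text in the cascaded case, raw audio in the end-to-end case.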

In fact, in real-world application tests, the model’s latency came in below the industry average. Simply put, it achieved truly instant AI communication and companionship.

Soul’s engineers are hard at work trying to bring the latest and most groundbreaking AI technology advancements to the platform. So, users won’t have to wait too long for even more emotionally resonant experiences from Soul’s many features.

Published By: Aize Perez


This article features branded content from a third party. Opinions in this article do not reflect the opinions and beliefs of New York Weekly.