New AI Voice Model Sparks Fascination and Unease Among Users

A breakthrough in voice technology has showcased the potential of AI-assisted communication, leaving users both intrigued and apprehensive. Sesame AI, a startup founded by Brendan Iribe, Ankit Kumar, and Ryan Brown, recently unveiled its Conversational Speech Model (CSM), which brings synthesized speech markedly closer to the sound of a real human voice.
Launched in late February, the CSM lets users talk with AI voice assistants named ‘Miles’ and ‘Maya.’ Feedback from early testers has been overwhelmingly positive, with some, including one Hacker News user, expressing surprise at how human-like the interactions felt. ‘I tried the demo, and it was genuinely startling how human it felt,’ the user wrote, voicing concern about developing an emotional attachment to such a realistic AI.
During a personal evaluation, the male voice ‘Miles’ proved highly expressive, mimicking breath sounds and interruptions to create a natural conversational flow. These features are intentional; the company says it aims to achieve ‘voice presence,’ the quality that makes conversations feel authentic and meaningful.
For all its technical achievement, CSM has drawn criticism from users who report feeling unsettled by the AI’s human-like qualities. Mark Hachman, a senior editor at PCWorld, recounted his experience, writing, ‘Fifteen minutes after “hanging up” with Sesame’s new “lifelike” AI, and I’m still freaked out.’ He said the AI’s conversational style too closely resembled that of an old friend.
The emotional impact of AI interactions has significant implications. While some individuals express caution, many users share astonishment at the AI’s realism. ‘I’ve been into AI since I was a child, but this is the first time I’ve experienced something that made me definitively feel like we had arrived,’ wrote one Reddit user.
Gavin Purcell, co-host of a podcast, demonstrated the AI’s potential through a roleplay interaction so authentic that it blurred the line between human and AI. Under the hood, CSM combines two AI models, a backbone and a decoder, that work together to process text and audio in a single step.
Sesame has tested the model’s output against recordings of human speech. In blind evaluations, listeners showed no clear preference between CSM-generated speech and real human recordings when samples were heard in isolation. In more nuanced conversational settings, however, evaluators still preferred the human audio, showing where the AI has room to improve.
Brendan Iribe acknowledged the current limitations in a Hacker News comment, describing the AI’s eagerness and imprecision in tone and pacing. ‘Today, we’re firmly in the valley, but we’re optimistic we can climb out,’ he stated.
As voice AI technology improves, however, so do the risks, particularly the risk of deception. A surge in scams that use lifelike synthetic speech is already raising alarm. ‘Next-generation voice AI’ threatens to erase the common cues that reveal a voice as artificial, posing significant challenges for personal and public safety.
While Sesame’s current model does not clone individual voices, future technologies based on its framework could pave the way for malicious exploitation by impersonating trusted figures in social engineering schemes. In response to concerns, some have sought to implement family verification procedures that utilize AI to confirm identities.
Despite the ethical dilemmas such advances pose, Sesame has committed to open-sourcing key components of its technology in the hope of inviting community collaboration. The company plans to expand support to more than 20 languages, enlarge its training datasets, and develop fully duplex models that can better handle the back-and-forth of real conversation.
As voice AI technology continues to evolve at a rapid pace, its integration into everyday life is on the horizon, creating both opportunities and challenges that require careful consideration.