Apple may be leaning on Google’s Gemini in some areas to bridge gaps in its current AI lineup, but the company is still working hard behind the scenes to make Siri faster, smarter, and more natural to talk to. A new research paper from Apple outlines an approach that could significantly reduce Siri’s response delay while also improving how its speech sounds.
Today’s AI voice systems usually build speech by assembling tiny pieces of sound, often generated as “tokens” that represent very short phonetic snippets measured in milliseconds. To produce a sentence, the model typically uses an autoregressive process—choosing the next speech token step by step based on what came before. While this method works, it can slow things down because the system has to make a huge number of small decisions before you hear a complete response. It can also lead to the occasional awkward pronunciation, especially if the training data didn’t contain enough variety for certain sounds or transitions.
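The token-by-token process described above can be sketched in a few lines. Everything here is a toy stand-in (the vocabulary size, the fake probability model, greedy decoding) chosen only to show why latency stacks up: each token choice must wait for the previous one.

```python
import random

# Hypothetical vocabulary of speech tokens, each covering a few
# milliseconds of audio; real systems use thousands of learned tokens.
VOCAB_SIZE = 1024

def next_token_probs(context):
    """Stand-in for a trained model: returns a probability
    distribution over all speech tokens given the tokens so far."""
    random.seed(len(context))  # deterministic toy distribution
    weights = [random.random() for _ in range(VOCAB_SIZE)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_autoregressive(num_tokens):
    """Pick one token at a time; each choice conditions on all
    previous choices, so the decisions are inherently sequential."""
    tokens = []
    for _ in range(num_tokens):
        probs = next_token_probs(tokens)
        # Greedy decoding: take the single most likely token.
        tokens.append(max(range(VOCAB_SIZE), key=lambda i: probs[i]))
    return tokens

tokens = generate_autoregressive(10)
print(len(tokens))  # 10 sequential decisions for a fraction of a second of audio
```

Because every step scans the full vocabulary and depends on the step before it, a full sentence means thousands of such decisions before any audio plays.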
In Apple’s latest study, researchers suggest that Siri could respond more quickly—and with more natural-sounding speech—by moving away from the standard token-by-token matching approach and instead using something called Acoustic Similarity Groups, or ASGs.
The idea behind ASGs is straightforward: instead of treating every possible speech token as a completely separate choice, the system groups tokens together based on how similar they sound to human ears. Because many speech sounds are perceptually close, some overlap between groups is expected and even useful. Once the model has these groupings, it can perform a probabilistic search within the most relevant ASGs and then apply autoregression at that group level. By narrowing the search space and making the selection process more efficient, the model can arrive at a suitable speech token sooner—helping reduce the time between your request and Siri’s spoken response.
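The grouping-then-search idea can be illustrated with a simple two-stage lookup. This is not Apple's implementation; the embeddings, the k-means-style clustering, and the nearest-centroid selection are all assumptions standing in for whatever the paper actually uses. The point is the shape of the computation: match against a handful of group centroids first, then search only within the chosen group.

```python
import math
import random

random.seed(0)

# Hypothetical acoustic embeddings: each speech token gets a small
# vector describing how it sounds (assumed, for illustration).
NUM_TOKENS, DIM, NUM_GROUPS = 256, 4, 8
embeddings = [[random.gauss(0, 1) for _ in range(DIM)]
              for _ in range(NUM_TOKENS)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Stage 1: cluster tokens into acoustic similarity groups
# (a bare-bones k-means sketch; perceptually close tokens land together).
centroids = [embeddings[i][:] for i in range(NUM_GROUPS)]
for _ in range(10):
    groups = [[] for _ in range(NUM_GROUPS)]
    for idx, emb in enumerate(embeddings):
        g = min(range(NUM_GROUPS), key=lambda k: dist(emb, centroids[k]))
        groups[g].append(idx)
    for k, members in enumerate(groups):
        if members:
            centroids[k] = [sum(embeddings[i][d] for i in members) / len(members)
                            for d in range(DIM)]

# Stage 2: select a token by first picking the most relevant group,
# then searching only inside it -- a much smaller search space than
# comparing against every token individually.
def select_token(target):
    g = min(range(NUM_GROUPS), key=lambda k: dist(target, centroids[k]))
    return min(groups[g], key=lambda i: dist(target, embeddings[i]))

token = select_token([0.5] * DIM)
print(0 <= token < NUM_TOKENS)  # prints True
```

With 8 groups of roughly 32 tokens each, stage 2 compares against about 40 candidates instead of all 256, which is where the latency savings in this kind of scheme come from.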
While the research isn’t presented as a dramatic breakthrough on its own, it does underline Apple’s continued investment in improving its AI and machine learning foundation. It also reflects a broader goal: developing more of its own end-to-end AI technology over time, rather than relying heavily on outside solutions.
If Apple can translate this kind of research into real Siri upgrades, users could eventually see the two improvements they care about most: quicker responses and speech that sounds less robotic and more conversational.