How Does Siri Really Work?
Those of us who are iPhone users have been using and probably enjoying Siri for quite some time now. But have you ever wondered how Siri really works? Here’s a summary from SmartPlanet on how it works from the moment you speak to your phone to the moment it gives you results.
The sounds of your speech were immediately encoded into a compact digital form that preserves their information.
The signal from your connected phone was relayed wirelessly through a nearby cell tower and through a series of land lines back to your Internet Service Provider where it then communicated with a server in the cloud, loaded with a series of models honed to comprehend language.
Simultaneously, your speech was evaluated locally, on your device. A recognizer installed on your phone communicates with that server in the cloud to gauge whether the command can best be handled locally (for example, if you had asked it to play a song on your phone) or whether it must connect to the network for further assistance. (If the local recognizer deems its model sufficient to process your speech, it tells the server in the cloud that it is no longer needed: “Thanks very much, we’re OK here.”)
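The local-versus-cloud decision can be sketched roughly as follows. This is only an illustration of the idea, not how Siri is actually built: the command list, the threshold value, and the `route_request` function are all hypothetical names made up for this example.

```python
# Hypothetical sketch: decide whether a request can be handled on the device.
# The command set and the threshold are assumptions for illustration only.
LOCAL_COMMANDS = {"play", "pause", "call"}
LOCAL_CONFIDENCE_THRESHOLD = 0.8


def route_request(first_word: str, local_confidence: float) -> str:
    """Return "local" if the on-device recognizer can handle it, else "server"."""
    is_local_command = first_word.lower() in LOCAL_COMMANDS
    if is_local_command and local_confidence >= LOCAL_CONFIDENCE_THRESHOLD:
        # The local model is sufficient: "Thanks very much, we're OK here."
        return "local"
    # Otherwise, let the server in the cloud finish the job.
    return "server"
```

Calling `route_request("play", 0.9)` in this sketch keeps the request on the phone, while an unfamiliar or low-confidence command falls through to the server.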
The server compares your speech against a statistical model to estimate, based on the sounds you spoke and the order in which you spoke them, what letters might constitute it. (At the same time, the local recognizer compares your speech to an abridged version of that statistical model.) For both, the highest-probability estimates get the go-ahead.
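To make "the highest-probability estimates get the go-ahead" concrete, here is a deliberately simplified sketch. It assumes the speech has already been split into short sound segments and that each segment comes with a made-up table of probabilities for the letters it might represent. Real recognizers score whole sequences jointly, so treat `best_guesses` and the numbers below as hypothetical.

```python
def best_guesses(segment_scores):
    """For each sound segment, keep the symbol with the highest probability."""
    return [max(scores, key=scores.get) for scores in segment_scores]


# Invented probabilities for three consecutive sound segments.
segments = [
    {"t": 0.7, "d": 0.3},
    {"e": 0.6, "a": 0.4},
    {"k": 0.2, "x": 0.8},
]

print(best_guesses(segments))
```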
Based on these estimates, your speech, now understood as a series of vowels and consonants, is then run through a language model, which estimates the words that make up your speech. Given a sufficient level of confidence, the computer then creates a candidate list of interpretations for what the sequence of words in your speech might mean.
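A language model of this kind can be illustrated with a toy example. The sketch below ranks candidate word sequences using a tiny bigram table, that is, the probability of each word given the one before it. The probabilities, the `UNSEEN_PROB` fallback, and the function names are all invented for illustration; the source does not say which kind of language model is actually used.

```python
import math

# Made-up bigram probabilities, P(word | previous word); illustrative only.
# "<s>" marks the start of the utterance.
BIGRAM_PROB = {
    ("<s>", "send"): 0.4,
    ("<s>", "sand"): 0.05,
    ("send", "a"): 0.5,
    ("sand", "a"): 0.1,
    ("a", "text"): 0.3,
    ("a", "test"): 0.2,
}
UNSEEN_PROB = 1e-4  # assumed fallback for word pairs missing from the table


def sequence_log_prob(words):
    """Sum the log-probabilities of each word given the previous one."""
    score = 0.0
    previous = "<s>"
    for word in words:
        score += math.log(BIGRAM_PROB.get((previous, word), UNSEEN_PROB))
        previous = word
    return score


def rank_candidates(candidates):
    """Order candidate word sequences from most to least likely."""
    return sorted(candidates, key=sequence_log_prob, reverse=True)
```

Given the candidates "sand a test", "send a text", and "send a test", this toy model ranks "send a text" first, because each of its word pairs is the most probable one in the table.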
If there is enough confidence in this result, and in this example there is (the computer determines that your intent is to send an SMS, that Erica Olssen is your addressee, so her contact information should be pulled from your phone’s contact list, and that the rest is your actual note to her), your text message magically appears on screen, no hands necessary. If your speech is too ambiguous at any point during the process, the computers will defer to you, the user: did you mean Erica Olssen, or Erica Schmidt?
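The final "act or ask" step can be sketched like this. It is a minimal illustration under stated assumptions: the threshold value, the simple prefix match against contact names, and the `resolve_recipient` function are hypothetical and not taken from the source.

```python
CONFIDENCE_THRESHOLD = 0.75  # assumed cut-off; the real value is not given


def resolve_recipient(spoken_name, contacts, confidence):
    """Return ("send", contact) when unambiguous, else ("ask_user", matches)."""
    # Naive matching: any contact whose name starts with what was spoken.
    matches = [c for c in contacts if c.lower().startswith(spoken_name.lower())]
    if confidence < CONFIDENCE_THRESHOLD or len(matches) != 1:
        # Too ambiguous: defer to the user and offer the possible matches.
        return ("ask_user", matches)
    return ("send", matches[0])


contacts = ["Erica Olssen", "Erica Schmidt", "Bob Lee"]
print(resolve_recipient("Erica", contacts, 0.9))
print(resolve_recipient("Erica Olssen", contacts, 0.9))
```

In this sketch, saying only "Erica" matches two contacts, so the phone asks which one you meant, while the full name goes straight through.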