IVR: The History And Future Of Speech Recognition
October 11, 2017
By the mid-to-late ’90s, telephony had advanced enough for callers to interact with an IVR through both speech and the telephone keypad. At the time, industry experts fueled rumors about the new voice response technology, and promises of more sophisticated applications circulated widely. Yet looking at today’s speech IVRs, it’s painfully clear that very little real progress has occurred. Today’s models are sharply limited in scope and are frequently implemented incorrectly. We’ll talk more about these points in a moment, but for now I’d like to highlight important characteristics of the newer technologies.
There are three popular tiers of speech IVR technology, all provided by a single well-known technology supplier. The tiers, which consist of keyword, key phrase, and natural recognition, are each designed with a specific degree of sophistication to accommodate as many business settings as possible.
The least sophisticated tier, keyword recognition, works by having the IVR guide or direct caller requests. For example, the IVR might say something like “please say or press 1”. If the caller says something that does not match the IVR’s script, the caller may be sent back to the beginning of the menu, or even disconnected from the call entirely.
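To make the keyword tier concrete, here is a minimal sketch in Python of how such a menu might behave. The menu entries, the retry limit, and the function name are all assumptions made up for illustration; they do not come from any particular IVR product.

```python
# Hypothetical menu: a spoken keyword or keypad digit maps to a destination.
MENU = {
    "1": "billing",
    "one": "billing",
    "2": "support",
    "two": "support",
}

MAX_ATTEMPTS = 3  # assumed retry limit before the call is dropped


def handle_keyword(utterance: str, attempt: int) -> str:
    """Return a destination, "replay_menu", or "disconnect"."""
    token = utterance.strip().lower()
    if token in MENU:
        # The caller said exactly what the script expected.
        return MENU[token]
    if attempt >= MAX_ATTEMPTS:
        # Too many unmatched responses: the caller is disconnected.
        return "disconnect"
    # Anything off-script sends the caller back to the start of the menu.
    return "replay_menu"
```

Under these assumptions, `handle_keyword("One", 1)` routes to billing, while a perfectly reasonable request such as "I need help with my bill" only replays the menu, which is exactly the rigidity described above.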
The next step up, key phrase recognition, works much like a search engine. The IVR listens for a predefined set of terms in what the caller says and answers based on the ones it finds. The caller might say something like “please give me the number to the bank branch near Disney”, and the IVR would provide information about the key phrases “bank” and “Disney”. Because the caller does not know in advance what the key phrases are, communication is prone to breakdown.
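A rough sketch of key phrase spotting, again in Python, might look like the following. The phrase list and the function name are hypothetical, and a real system would work on recognized speech rather than typed text.

```python
import re

# Hypothetical set of key phrases the IVR has been configured to listen for.
KEY_PHRASES = {"bank", "disney", "atm"}


def spot_key_phrases(utterance: str) -> list:
    """Return the configured key phrases found in the utterance, in order."""
    words = re.findall(r"[a-z']+", utterance.lower())
    found = []
    for word in words:
        if word in KEY_PHRASES and word not in found:
            found.append(word)
    return found
```

With this list, the example request above yields "bank" and "disney", and everything else the caller said is ignored. A caller who asks "where can I deposit a check" matches nothing at all, which illustrates how the conversation breaks down when the caller cannot guess the key phrases.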
Finally, the most sophisticated tier is dramatically different from the previous two. With natural recognition, the IVR is designed (hypothetically) to fully understand the caller’s real, or natural, language. In this case, the caller may say something like “I’ve already tried fixing my computer by restarting, turning off power, unplugging, now what do I do?” For this type of technology to work, three things must first happen:
- The IVR needs to be programmed with an extensive vocabulary, and it must also know how frequently each vocabulary item appears.
- The IVR needs to be able to understand complete sentences, so grammar must also be programmed.
- The IVR needs to be able to extract the caller’s actual intent, and not just rely on keywords or phrases.
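As a toy illustration of the first and third requirements, the Python sketch below counts how often each word appears per intent and then scores a new utterance against those counts. The training sentences, intent names, and function names are invented for the example, and the scoring is a simple naive Bayes style calculation chosen for brevity, not the method of any real IVR. Grammar is not modeled at all here.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled examples; a real system would need vastly more data.
TRAINING = [
    ("my computer will not turn on", "tech_support"),
    ("i restarted my computer and it still fails", "tech_support"),
    ("restarting and unplugging the power did not help", "tech_support"),
    ("i want to pay my bill", "billing"),
    ("tell me the balance on my bill", "billing"),
    ("can you send me a copy of my bill", "billing"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Build a word-frequency table (vocabulary plus counts) per intent."""
    counts = defaultdict(Counter)
    for text, intent in examples:
        counts[intent].update(tokenize(text))
    return counts


def guess_intent(utterance, counts):
    """Pick the intent whose word frequencies best explain the utterance."""
    vocab = {word for table in counts.values() for word in table}
    best_intent, best_score = None, -math.inf
    for intent, table in counts.items():
        total = sum(table.values())
        # Add-one smoothing so unseen words do not zero out the score.
        score = sum(
            math.log((table[word] + 1) / (total + len(vocab)))
            for word in tokenize(utterance)
        )
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent
```

Even this small sketch hints at why the approach gets expensive: the quality of the guess depends entirely on how much labeled caller speech is available, and understanding complete sentences would require far more than word counts.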
It should be apparent to the reader that natural recognition is the preferred level of speech technology. However, at present these IVRs are restricted to the simplest call routing applications. More sophisticated applications are not being used because the cost of IVRs capable of extracting intent is astronomical! Businesses need not choose the lesser of three evils, though: there is yet another IVR technology on the market, IVRs powered by artificial intelligence (AI).
These speech IVRs are essentially still stuck in the ’90s and have failed to deliver on the promises made at their inception. Modern IVRs that embrace technology from the current era have moved beyond promises of a better IVR and on to doing what their predecessors only dreamed of.