Voice Processing

An NSS Group White Paper

Table of Contents

Introduction
Computer Telephony Integration (CTI)
PBX Standards
CTI Applications
Voice Mail
Fax Mail
Automated Call Distribution
Fax On Demand
Interactive Voice Response (IVR)
Call Queuing
Desktop Control
Audiotex
Speech Recognition
Open Standards in the Telephony World
Microsoft
Novell
TAPI vs. TSAPI
What Does the Future Hold?

INTRODUCTION

Effective communication is one of the most important elements of a successful business, and communication modes have broadened in recent years to include not just voice, but fax and electronic data too. The use of the LAN and WAN, and the more widespread acceptance of newer technologies such as electronic mail, has helped to increase efficiency of communications in many companies throughout the world. E-mail has been able to make this impact due largely to the relatively low cost of PC-based systems, allowing us at last to have a reasonably powerful computing resource on everyone's desk.

Text-based messaging systems have become increasingly sophisticated, and with the Internet at our disposal and the adoption of enterprise-wide messaging strategies by many large corporates, world-wide e-mail communication has become a reality.

When it comes to voice communications, however, things have not progressed to the same extent. We still have the daily problems of being put through to the wrong extension, extensions ringing unanswered, or being answered by a complete stranger who has little incentive to ensure your important message reaches the intended recipient.

The telephone nevertheless remains the most common communications medium, in spite of some damning statistics (source: International Resource Development, Inc.):

  • 70% of all business calls are not completed at the first attempt
  • 76% of calls do not require an immediate response
  • 65% of calls are for one-way transfer of information
  • 67% of calls are considered less important than the work they interrupt

Many of the above statistics imply that we should be using some form of indirect electronic communication medium, but the fact remains that real-time interaction - best achieved by use of a telephone - is still required in many circumstances. The obvious conclusion, then, is to integrate the two.

Computer Telephony Integration (CTI)

It may come as a surprise to many, but the integration of computers and telephones - known, unsurprisingly, as Computer Telephony Integration (CTI) - is not a new concept. In fact it has been around for a number of years in the mini and mainframe environment, where it has been restricted mainly to large call-centre installations because of the high costs involved.

CTI allows telephone functionality to be accessed and controlled from a graphical interface at a computer terminal or PC. Providing CTI in a LAN environment brings the technology into the mainstream.

CTI is an ideal example of client-server technology at work. To provide voice processing and PBX integration for every PC which needs it would require a tremendous capital investment. Instead, all of the voice processing hardware is installed in a dedicated central PC. In some systems, this PC is capable of acting as a stand alone voice-mail and telephony server, storing all the messages on its local hard disk and performing the necessary call handling by virtue of the fact that it is attached directly to the PBX. But by also attaching that telephony server to the LAN, we provide a logical link between every telephone and every PC on the network.

The use of PCs and LANs allows expensive resources - like a fully populated telephony server - to be centralised and shared by every authorised network user. Add to this the relatively low cost of even the most powerful PC and the readily available off-the-shelf voice processing cards, and the result is an overall lowering of hardware costs to the point where CTI applications are now within the reach of most sites.

PBX Standards

One of the biggest obstacles currently facing the CTI vendor is the lack of standards when it comes to integrating computers and PBXs. Even the name of the equipment can cause confusion, since it may be known simply as a "switch", as a PBX (Private Branch eXchange) or as a PABX (Private Automatic Branch eXchange) - all different ways of referring to the common telephone switchboard!

Most PBXs provide for integration over the phone lines between the PBX and the ports on the voice processing cards using the standard DTMF signals (the musical tones you hear when pressing your telephone keypad) - this is known as "inband signalling". Other PBXs provide a more direct form of integration, known as "outband signalling", which requires a serial link between the telephony server and a dedicated port on the PBX.
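
As a rough illustration of inband signalling, each keypad symbol is sent as two simultaneous tones, one from a low-frequency group (the keypad row) and one from a high-frequency group (the keypad column). The Python sketch below uses the standard DTMF frequency pairs; the function names, the 8 kHz sample rate and the tone duration are assumptions chosen for illustration and do not reflect any particular voice processing card.

```python
import math

# Standard DTMF frequencies (Hz): rows are the low group, columns the high group.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair for one keypad symbol."""
    if len(key) == 1:
        for row, symbols in enumerate(KEYPAD):
            col = symbols.find(key)
            if col != -1:
                return ROW_FREQS[row], COL_FREQS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration=0.1, rate=8000):
    """Generate audio samples (floats in [-1, 1]) for one key press.

    The duration and sample rate are assumed values for illustration.
    """
    low, high = dtmf_pair(key)
    count = round(duration * rate)
    return [
        0.5 * math.sin(2 * math.pi * low * n / rate)
        + 0.5 * math.sin(2 * math.pi * high * n / rate)
        for n in range(count)
    ]
```

A voice processing card listening on the line performs the reverse operation, detecting which pair of frequencies is present in order to recover the key that was pressed.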

Simplified Message Desk Interface (SMDI) is a simple protocol for such links, originally developed by AT&T as a standard for integration of voice mail equipment to Centrex switches (a "virtual PBX" service, performing the functions of a PBX centrally without the need for equipment on the client's premises). A variety of switches provide SMDI output, whilst others provide similar functionality but using a proprietary protocol. For the latter, a protocol converter can usually be purchased to change the proprietary protocol into SMDI.

CTI Applications

Voice Mail

Voice mail (or v-mail) - like e-mail before it - is set to sweep the corporate world. It is not, however, destined to replace e-mail, but rather to complement it. And unlike e-mail, it has applications in even the smallest business, especially when combined with basic call-handling capabilities.

The problem with e-mail, of course, is that it requires that a PC be available for everyone who wishes to use it. This is not always possible, or desirable, and no matter how popular the e-mail system is, unless everyone in the company is using it, its value is greatly diminished.

On the other hand, there must be few companies around these days where a telephone is not available to every employee - and that is all you need to access a v-mail system. Simply dial an extension, and if it is busy or there is no answer, you can leave a voice mail message instead - the call transfer is performed automatically by a process known usually as the "automated attendant", a program whose sole job it is to route calls according to a predefined set of rules.

Perhaps you don't actually want to speak to anybody, but just wish to retrieve your own v-mail messages. This can also be done at the touch of a couple of buttons on your phone, at which point you can also save or delete your messages, reply to them or forward them to another mailbox. A further huge benefit is the straightforward access provided to remote users of your system - home workers or sales people requiring access from a hotel room. Once again, all that is required is to dial in to the system in order to access your messages, and leave some of your own.

Voice mail systems which have sophisticated replay options (such as fast forward, rewind, pause, speed and volume control) - especially those which can be controlled via a foot pedal - can also be utilised as advanced dictation and transcription systems.

Fax Mail

Where fax cards are installed in the telephony server, incoming faxes can be stored centrally, or routed to individual fax mail boxes. These can be viewed and printed locally at leisure, and remote users can dial in and have their stored faxes redirected to wherever they happen to be at that moment - even their e-mail messages can be picked up from the system at the office and faxed to them.

Where a fax is not available to the remote user, text to speech facilities available on even the basic voice processing cards allow e-mail messages to be spoken to the recipient over the phone line. The next step (possibly not too far away) is fax to speech, allowing the text portion of stored faxes to be spoken to the caller.

Although this may sound very similar to the text to speech facility just mentioned, it should be remembered that text in e-mail messages is already in a computer-readable form. Text on a fax must first go through some form of reliable Optical Character Recognition (OCR) process to convert it to the same format as the e-mail message.

Automated Call Distribution

Automated Call Distribution - or ACD - is a function of the auto-attendant feature already mentioned. As the name implies, external and internal calls can be routed between extensions, voice and fax mail boxes - without operator intervention - based on a set of user-programmable rules.

If the automated attendant is well programmed, communications between your company and your potential customers could be improved dramatically. For instance, your standard greeting message could include a menu of options which encourage the caller to press a digit on their telephone keypad depending on which department they wish to speak to. The call is then routed automatically to the correct extension or hunt group. Should there be no answer within a predetermined number of rings, the call can then be re-routed to an "overflow" extension, or directly to a voice mailbox.
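
The routing rules just described can be sketched in a few lines of Python. This is only a minimal illustration: the menu contents, the extension and mailbox names, the default ring count and the fallback to a human operator are all hypothetical, and a real automated attendant would be configured through the vendor's own tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """One menu option: where to ring first and where to go on no answer."""
    extension: str
    max_rings: int = 5               # assumed default, not from any real product
    overflow: Optional[str] = None   # overflow extension, if any
    mailbox: Optional[str] = None    # voice mailbox used as the last resort


# Hypothetical menu: digit pressed by the caller -> routing rule.
MENU = {
    "1": Route("sales-hunt-group", overflow="2001", mailbox="sales-mailbox"),
    "2": Route("support-hunt-group", mailbox="support-mailbox"),
}


def route_call(digit, rings_unanswered, menu=MENU):
    """Return the destination for a call given the caller's menu choice."""
    rule = menu.get(digit)
    if rule is None:
        return "operator"  # assumption: unknown digits fall back to an operator
    if rings_unanswered < rule.max_rings:
        return rule.extension
    # No answer within the allowed rings: try overflow, then the mailbox.
    return rule.overflow or rule.mailbox or "operator"
```

For example, `route_call("1", 0)` keeps ringing the sales hunt group, whereas `route_call("2", 7)` gives up and sends the caller to the support mailbox.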

Your customers need no longer be left hanging on the end of an eternally ringing extension - the automated attendant will attempt to re-route the call elsewhere or will allow them to leave a message. If your customers know the extension of the person they wish to speak to, they can dial directly, and once again leave a message if unavailable - an end to the frustration of being passed from person to person, each one asking their name and number before finally taking a message which is ultimately mislaid.

Fax On Demand

We have already touched on the fact that many sites may want their telephony server to include integrated fax facilities to provide a fax mail capability. A further use for fax hardware is to provide an automated Fax On Demand service. With such a system, extensive product literature and reference material can be placed on your server, each document identified by a unique number, and listed in a fax "catalogue".

Customers could then dial in and request any of this information, entering only the required document number(s) and their own fax number. Minutes later the requested literature arrives on their own fax machine.
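
At its heart, a Fax On Demand request is a lookup of document numbers in the catalogue followed by the queuing of a fax job. The sketch below assumes a hypothetical catalogue and file names; the actual transmission would be handled by the fax hardware and is not shown.

```python
# Hypothetical catalogue: document number -> stored file name.
CATALOGUE = {
    "100": "catalogue.tif",
    "101": "product-overview.tif",
    "102": "price-list.tif",
}


def build_fax_job(document_numbers, fax_number, catalogue=CATALOGUE):
    """Turn a caller's keypad input into a fax job.

    Returns (job, unknown): job lists the files to send to fax_number,
    and unknown lists any document numbers not found in the catalogue.
    """
    files = [catalogue[n] for n in document_numbers if n in catalogue]
    unknown = [n for n in document_numbers if n not in catalogue]
    job = {"to": fax_number, "files": files}
    return job, unknown
```

Returning the unknown numbers separately allows the system to tell the caller which requests could not be fulfilled rather than silently ignoring them.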

Interactive Voice Response (IVR)

Interactive Voice Response (IVR) is a facility which many will have had the chance to experience first hand, thanks to the provision by several high street banks of a home banking service.

"Voice Forms" prompt users through a series of questions, which can be answered in a variety of ways: by speech, where the answer is simply recorded for later playback (to capture name and address, for example); by speech, where the caller’s response is subjected to voice recognition in order to extract computer-usable information directly; or by entering numerical information via the telephone keypad.

IVR can be used to take support calls or enquiries if all operators are busy, or even to prompt your customers through an automated sales order entry program, allowing them to place telephone orders, request information on their account activity or confirm the status of a current order - all without you having to invest in a host of tele-sales personnel.
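
A voice form of the kind described above can be modelled as a list of questions, each tagged with one of the three answer modes. The following sketch is illustrative only: the form contents are invented, and `get_answer` is a hypothetical stand-in for whatever the telephony hardware returns (a recording reference, recognised text or keypad digits).

```python
from dataclasses import dataclass
from enum import Enum


class AnswerMode(Enum):
    RECORD = "record"        # speech stored for later playback
    RECOGNISE = "recognise"  # speech passed through voice recognition
    KEYPAD = "keypad"        # digits entered on the telephone keypad


@dataclass
class Question:
    prompt: str
    mode: AnswerMode


# Hypothetical order-entry form.
ORDER_FORM = [
    Question("Please say your name and address.", AnswerMode.RECORD),
    Question("Is this a new order? Please say yes or no.", AnswerMode.RECOGNISE),
    Question("Please enter your account number.", AnswerMode.KEYPAD),
]


def run_form(form, get_answer):
    """Ask each question in turn and collect (prompt, mode, answer) tuples.

    Keypad answers are rejected unless they consist only of digits.
    """
    answers = []
    for question in form:
        answer = get_answer(question)
        if question.mode is AnswerMode.KEYPAD and not answer.isdigit():
            raise ValueError(f"expected digits, got {answer!r}")
        answers.append((question.prompt, question.mode, answer))
    return answers
```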

Call Queuing

If you operate any form of technical support or information centre, CTI could certainly ease the frustration for your customers when trying to get through to operators who always seem to be engaged.

The automated attendant portion of the telephony system would attempt to put the call through, find all the lines busy and inform the customer of that fact. Should the customer wish to hold, the system could prompt for the input of, say, the product serial number or the technical support agreement number, which could be entered by means of the telephone keypad. The call would then be placed in a queue, with the customer informed at regular intervals of his position in that queue, and asked if he wishes to continue holding.

Meanwhile, the customer's details have been retrieved from the central database and his name is displayed in the queue information at the operator's screen, together with an indication of the length of time he has been holding. By clicking on the customer name, the operator can view all available information in a window, probably with a complete history of previous calls made. This helps the operators to optimise their time, spending longer on individual calls when there is no queue.
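
The queue itself is a simple first-come, first-served structure. The sketch below shows one way the caller's position and the operator's view of holding times might be derived; the class and method names are hypothetical, and times are plain seconds supplied by the caller of each method so that the example stays self-contained.

```python
from collections import deque


class CallQueue:
    """First-come, first-served queue of callers waiting for an operator."""

    def __init__(self):
        self._waiting = deque()  # entries are (caller_id, time_joined)

    def join(self, caller_id, now):
        self._waiting.append((caller_id, now))

    def position(self, caller_id):
        """1-based position announced to the caller, or None if not queued."""
        for index, (queued_id, _) in enumerate(self._waiting, start=1):
            if queued_id == caller_id:
                return index
        return None

    def leave(self, caller_id):
        """Caller hangs up or chooses not to keep holding."""
        self._waiting = deque(e for e in self._waiting if e[0] != caller_id)

    def operator_view(self, now):
        """What the operator's screen shows: (caller_id, seconds holding)."""
        return [(cid, now - joined) for cid, joined in self._waiting]

    def next_call(self):
        """Hand the longest-waiting caller to a free operator."""
        return self._waiting.popleft()[0] if self._waiting else None
```

In a real system the caller identifier would be the serial number or support agreement number keyed in by the customer, which is then used to retrieve the customer's record from the central database.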

Desktop Control

The Call Queuing example talks of obtaining some form of identifier from the user in order to perform database look-ups. Now available in the UK (over some networks) is Calling Line Identification (CLI) - a system which provides the telephone number of the caller to the called party as they are connected.

Using special equipment, anybody will be able to see that number displayed on an LCD screen at the side of (or built into) the phone. If the phone number is known to you, you can then decide whether or not to take the call.

For CTI applications, of course, the implications are obvious. Whenever a call is received at your PBX, it is dealt with by the telephony server's auto attendant feature. Auto attendant passes the call through to your extension, and at the same time passes details of the call to the network file server. Those details are then transmitted across the LAN and picked up by a program running on your PC, where the incoming number could be matched up with a central database, allowing the caller’s details to be presented to you on screen as the phone on your desk begins to ring.
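
The matching of an incoming number against a central database might look something like the following sketch. The customer records and field names are invented for illustration, and the in-memory dictionary merely stands in for a real database query.

```python
# Hypothetical customer records keyed by telephone number (digits only).
CUSTOMERS = {
    "01234567890": {"name": "A. Example", "company": "Example Ltd"},
}


def normalise(number):
    """Keep only the digits so '01234 567-890' matches '01234567890'."""
    return "".join(ch for ch in number if ch.isdigit())


def screen_pop(calling_number, customers=CUSTOMERS):
    """Return the details to display as the phone starts to ring."""
    if not calling_number:
        # CLI is not delivered on every call or over every network.
        return {"name": "Number withheld or unavailable"}
    record = customers.get(normalise(calling_number))
    if record is None:
        return {"name": "Unknown caller", "number": calling_number}
    return {**record, "number": calling_number}
```

The same lookup result could equally drive the call-filtering rules, since the identity of the caller is known before the call is put through.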

At the click of a button on the toolbar, you can take the call; ask the caller to hold; take a message (which is directed to your voice mailbox); ask for the caller's name; or route a call to a colleague. If you elect to take a message, you can select from a number of personal greetings to play to your caller (e.g. "in a meeting - will call back later", or "unavailable for a few days") when prompting them to leave a message.

More than that, the automated attendant facility could be programmed to perform selective call-barring or call-filtering on a user by user basis, diverting certain calls directly to your mailbox (or someone else’s) or allowing them through to your extension depending on the identity and importance of the calling party. When returning calls, many systems will offer out-dialling facilities, which will automatically dial the number and wait for an answer before ringing the extension at your desktop and connecting you. The whole concept is geared towards making the most efficient use of your time, and improving communications in the process.

Audiotex

Although home banking is on the increase, Audiotex currently remains the best-known application of CTI technology. Anyone who has ever rung an 0898 number will have been exposed to the services on offer from an audiotex bureau.

Using the sophisticated call handling, IVR and user interaction facilities inherent in many of today’s CTI applications, together with high-level scripting or programming languages, complete audiotex applications can be up and running on your telephony server within days. Different scripts can be run on the same line depending on the time of day thus allowing you to offer different services depending on your likely audience.
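
Selecting a script according to the time of day amounts to a lookup in a schedule. The sketch below assumes a hypothetical two-entry schedule for a single line; the script names are invented, and time ranges are treated as half-open intervals which may wrap past midnight.

```python
from datetime import time

# Hypothetical schedule for one line: (start, end, script name).
# Ranges are half-open [start, end) and may wrap past midnight.
SCHEDULE = [
    (time(6, 0), time(18, 0), "daytime_information"),
    (time(18, 0), time(6, 0), "evening_entertainment"),
]


def script_for(now, schedule=SCHEDULE, default="default_script"):
    """Pick the script to run for a call arriving at time-of-day `now`."""
    for start, end, script in schedule:
        if start <= end:
            active = start <= now < end
        else:  # range wraps past midnight, e.g. 18:00 to 06:00
            active = now >= start or now < end
        if active:
            return script
    return default
```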

But audiotex also has applications in a purely corporate environment. Automated response to direct response advertising campaigns, or to advertisements for job vacancies, for example, are both areas which would benefit from audiotex.

An enormous number of responses can be collected and processed without having to employ teams of telephone operators. Don’t forget that audiotex provides a sophisticated interface to your computer systems for external users who do not have access to a PC - all the necessary details can be recorded over the phone or entered via the telephone keypad.

Speech Recognition

Until very recently, speech input into computers has been confined mainly to voice annotation, which is actually little more than recording rather than speech recognition. This is analogous to telephone answering machines, which today are commonplace, but add little intelligence to human communication. They simply record a stream of speech for later playback.

However, voice capabilities are starting to be transformed from mere annotation into more intelligent communications. One of the earliest and most practical applications of speech recognition occurs in an area where the mouse and GUIs first excelled - system commands to control your PC. When used for such applications, speech recognition technology should be employed not to do everything with voice, but to do what is best done with voice.

Just as even dedicated GUI users recognise that the keyboard is better than the mouse for certain tasks, so the keyboard and mouse are better than speech for many things. For example, saying "left, left, left, left, left" into a microphone in order to move the cursor in small increments is not very efficient compared with picking up a mouse and dragging the cursor to the exact spot you had in mind. But being able to say to the computer "DIAL BOB" or "REPLAY VOICE MAIL" or "DO NOT DISTURB" is far more efficient than performing all the necessary steps to launch the appropriate application, and then accessing one or more pull-down menus to locate the correct file and commands.

Application developers are actively working on applications that are truly voice-centric. While a voice-aware application senses the presence of voice system capabilities and takes advantage of them, a voice-centric application is totally oriented toward the voice function which holds most of its intrinsic use and value. The voice-centric applications of the future will be those which enable voice to eventually become the dominant way of interacting with a computer, a common theme in many science-fiction films today.

Speech recognition technologies today are characterised by a combination of four factors:

  • Whether the system can handle continuous or discrete (i.e. one word at a time) speech
  • Whether the system is speaker-dependent, requiring "training" to recognise an individual voice, or speaker-independent
  • The size of the vocabulary the system can recognise, measured not only by the total number of words, but also by the number which can be made active simultaneously (thus allowing more complex sentence structures)
  • Whether the system can recognise speech over the telephone, with its narrow frequency bandwidth and other distortions

The most difficult application is, of course, large vocabulary, continuous, speaker-independent voice carried over the telephone. Today, such systems represent the high end of speech recognition, running on expensive hardware and generally used for telephone industry-type applications, where system speed and overall throughput are at a premium. With a few exceptions, most of today's speech recognition systems are speaker-dependent, discrete word systems, but this is based as much on the amount of desktop processing power that is available for the typical user as it is on the state of the technology.

Open Standards in the Telephony World

As we mentioned earlier, CTI is not a new technology - just new to the PC world. Like so many computer technologies before it, it began life in the world of the mini and the mainframe and the "closed architecture".

Call processing systems have, in the past, been proprietary systems implemented on proprietary hardware, and requiring considerable amounts of processing power in order to run them effectively. With the advent of the PC, computing power has risen as costs have fallen. The whole premise of the PC industry is that of the "open architecture", however, and this is somewhat at odds with the current status of the switch industry.

The problem is that the switch manufacturers are reluctant to open things up to the extent that PBXs become just another commodity product. This makes life extremely difficult for those developers attempting to write call processing applications, since implementing the same application on a variety of switches (a necessity if one is to achieve a reasonable market share) requires significant portions of the program code to be rewritten for each new switch environment.

What is required is a piece of software sitting in-between the switch and the application, communicating easily with all the various types of switches on the market, yet presenting a single Applications Programming Interface (API) to the developer.
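
The idea of a single API sitting above switch-specific code can be sketched as follows. This is not TAPI, TSAPI or any real product's interface: the class names, method names and command strings are all hypothetical, and the "driver" shown simply records commands rather than talking to a PBX.

```python
from abc import ABC, abstractmethod


class SwitchDriver(ABC):
    """Switch-specific layer: one subclass per PBX model (hypothetical)."""

    @abstractmethod
    def send(self, command, **params):
        """Translate a generic command into the switch's own protocol."""


class LoggingDriver(SwitchDriver):
    """Stand-in driver that records commands instead of talking to a PBX."""

    def __init__(self):
        self.sent = []

    def send(self, command, **params):
        self.sent.append((command, params))


class Telephony:
    """The single API seen by application developers."""

    def __init__(self, driver):
        self._driver = driver

    def make_call(self, from_ext, to_number):
        self._driver.send("make_call", origin=from_ext, destination=to_number)

    def transfer(self, call_id, to_ext):
        self._driver.send("transfer", call=call_id, destination=to_ext)

    def hang_up(self, call_id):
        self._driver.send("hang_up", call=call_id)
```

An application written against `Telephony` need not change when the switch does; only a new `SwitchDriver` subclass has to be supplied.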

Microsoft

Microsoft's attempt at this is called the Windows Telephony Application Programming Interface - TAPI for short. TAPI operates independently of the underlying telephone network and equipment, yet allows application programs to take control of telephony functions.

This includes such basic functions as establishing, answering and terminating a call, as well as supplementary functions such as hold, transfer, conference and call park found in PBXs and other phone systems.

The API also provides access to features that are specific to certain service providers, with built-in extensibility to accommodate future telephony features and networks as they become available.

Novell

Novell's stab at the problem is called TSAPI (Telephony Services Application Programming Interface), which is based on the Computer Supported Telecommunications Applications (CSTA) standard, widely referred to in Europe as CTI (Computer Telephony Integration).

Certainly the support is there for the Novell initiative, backed by a new consortium of telecommunications companies designed to drive forward CTI. The first members of the Novell Open Telephony Association (NOTA) are Alcatel, AT&T, Dialogic, Ericsson, Global Communications, GPT, Interconnect, Mitel, Philips, ROLM and SDX, with all these companies committed to supporting TSAPI in their own products due to ship later this year.

TAPI vs. TSAPI

As usual when Novell and Microsoft are involved, the two resulting "standards" are anything but complementary. But this time, just for a change, they are not actually fighting against one another, with TAPI better suited to small installations and TSAPI to larger ones. There are three main differences:

  • Microsoft's is a client-oriented approach (requiring telephony hardware in each PC) whilst Novell's is a server-based solution (requiring a single voice processing card in a telephony server)
  • TAPI offers only 1st party call control (giving you control only over your own call), whilst TSAPI provides both 1st and 3rd party call control (3rd party giving you the power to control any telephone call from your own terminal)
  • Microsoft's TAPI comes closer to the ideal of providing a single API to cover all switches (at the expense of some functionality), whilst Novell's TSAPI provides a common API to the developer, but requires a different driver to be written for each switch
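
The distinction between 1st and 3rd party call control can be expressed as a simple permission check. The function below is a hypothetical illustration of the concept only and does not correspond to any call in either API.

```python
def may_control(controller_ext, call_parties, third_party_enabled):
    """Decide whether a terminal may issue commands for a given call.

    1st party control: only calls in which the controller's own
    extension takes part. 3rd party control: any call on the switch.
    """
    if third_party_enabled:
        return True
    return controller_ext in call_parties
```

With 1st party control only, `may_control("201", {"202", "203"}, False)` is false because extension 201 is not a party to the call; with 3rd party control enabled, the same request is allowed.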

What Does the Future Hold?

As with all new technologies, voice processing applications are not without their problems - though these are paltry beside the potential advantages offered.

The first is the familiar problem of multiple directories. Sites installing a host operating system like NetWare 4 or Banyan Vines, for example, which are both designed to reduce the amount of system maintenance required, could still find themselves having to create a user in the host directory services database, the e-mail system and the voice mail system. Hopefully, the future will bring closer integration between e-mail, voice mail and Network Operating System (NOS) directory services, meaning that an administrator will truly only have to create a user once to provide access to all the services offered across the network.

Another possible future direction is the wider use of Windows multimedia facilities to record and play back messages where required. This would be particularly beneficial when recording system prompts, since even low cost PC sound cards are capable of recording CD quality sound - far better than can be achieved using a telephone handset, for example.

In some sites, the multimedia PC approach would be a better one even for the recording and playback of v-mail messages, despite the potential distraction this could cause. It should be remembered, however, that the client-server approach of CTI means that the expensive voice processing and PBX linking hardware need only be installed in a single machine, instead of every machine which requires access to the system, resulting in a very cost-effective solution.

For those users who already have multimedia-capable PCs, however, it is worth noting that some suppliers are already using the internal sound capabilities for playback, whilst others are actively working towards doing so.

Finally, switch manufacturers too are looking to the future, producing PBXs with an integrated network interface card. Using these, it will be possible to connect the switch directly to the LAN, thus doing away with the telephony server. The one thing the whole voice processing industry is actively campaigning for is an industry standard covering software control of switches. At the lower end of the market, manufacturers of voice processing cards are currently working with operating system vendors to develop NOS drivers to allow their products to be installed directly in the central file server.

All such moves are to be welcomed, since they will reduce the cost of installation and, ultimately, make the product easier to install and maintain - the days of the shrink-wrapped CTI product may not be too far away.

Copyright © 1991-2006 The NSS Group Ltd.
All rights reserved.