August 2012

ISSN 1932-8214
AT&T makes its speech technology available to developers
Network-based speech recognition accessed through an API
AT&T Research (formerly Bell Labs) has been
involved in speech technology research for many
decades, for example, developing a continuous digit
recognizer in the 50s. (See the interview with Mazin
Gilbert, AVP of the Intelligent Systems
Organization, AT&T Research, SSN, May 2012, p.
15.) The company’s speech recognition technology
has found a home in some deployed applications,
including the Vlingo voice assistant that is part of
the new Samsung Galaxy S III, with the assistant
called S-Voice (SSN, July 2012, p. 1). (Vlingo is
now part of Nuance, so the technology used in the
Samsung phone lines may evolve eventually to
Nuance technology.) Among other applications,
AT&T speech technology has been used within
AT&T for IVR customers for over 20 years.
The AT&T Watson speech technology has now
been made available to developers as a network-based service accessed through an Application
Programming Interface (API) that AT&T recently
released. Gilbert summarized in a note to Speech
Strategy News: “By exposing the speech APIs, we
are lowering the barrier to entry for developers to
empower their applications with speech. The
responses we have received so far have been
overwhelming. Our plans don't stop here. We will
continue to expose additional APIs and innovations
to enable developers to create more advanced and
personalized mobile applications ranging from
virtual assistants to interactive gaming. Stay tuned!”
Continued on page 17
Nuance to release Dragon NaturallySpeaking 12 desktop dictation software
Improved accuracy and more ease-of-use features
On July 26, Nuance Communications announced
the latest version of their speech-to-text dictation
software for Windows personal computers, Dragon
NaturallySpeaking 12. While Nuance says there are
more than 100 new features and enhancements,
perhaps the most important is a 20% drop in error
rate on average in the core speech recognition
technology out-of-the-box plus faster response,
according to Erica Hill, a product marketing manager
at Nuance. The faster speed of transcription is
achieved in part by taking advantage of multi-core
processors and more memory if available, Hill said.
Dragon is the core technology in Dragon Dictation,
the Dragon Go! mobile personal assistant, Dragon
TV, Dragon Drive! (the automobile version), and
Dragon ID (biometric speaker authentication).
Continued on page 18
Google Voice Search in Android 4.1 Jelly Bean
More direct answers with Google Now and “all business”
Google made announcements on June 27 on
improved voice search in the next version of
Android, which allows queries in natural language
(SSN, July 2012, p. 7). Google’s Knowledge
Graph displays search results that attempt to
provide a more direct answer to your inquiry next to
the classical search results (SSN, June 2012, p. 1),
and the voice search can also respond by voice.
Google also announced Google Now, which adds
to user-initiated searches in that it attempts to
anticipate a search using location and time of day
(e.g., weather), and displays multiple possible panes
of such information. It can automatically notice you
have an appointment upcoming, calculate travel
time, post a map in the pane, and even warn you
when you should leave. In the past, Google has
displayed what it calls “one boxes,” showing
relevant information in one pane next to the search
web sites, such as a map showing locations of
restaurants in response to a search for “Greek
restaurants.” Google Now also formats the results in
a way more suited to a mobile device. Google also
has a translation app within Google Now.
Continued on page 19
Interviews with Nik Stanbridge, VoiceVault, p. 13; Chih-Chung Kuo, ITRI, p. 15
Speech Strategy News
August 2012
2
Table of Contents
AT&T makes its speech technology available to developers ..... 1
    Network-based speech recognition accessed through an API
Nuance to release Dragon NaturallySpeaking 12 desktop dictation software ..... 1
    Improved accuracy and more ease-of-use features
Google Voice Search in Android 4.1 Jelly Bean ..... 1
    More direct answers with Google Now and “all business”
Editor’s Notes ..... 5
    Adding value rather than adding apps
    Bill Meisel, Publisher & Editor
Commentary: Follow-up on software patents ..... 6
    Bill Meisel, Publisher & Editor
ICSI and Microsoft to collaborate on conversational human-machine interaction ..... 7
    International Computer Science Institute will explore speech and other modalities
US Department of Justice licenses Nexidia audio discovery ..... 7
    Criminal Division installs Nexidia audio technology to streamline investigations
iSpeech Cloud mobile speech platform claims 13,000 developers ..... 7
    Improvements in speech recognition and text-to-speech and a home automation solution announced
Empirix launches automated contact center regression testing as a service ..... 8
    Checks how existing services are impacted when changes are introduced
Voxeo launches “Zombie IVR” campaign ..... 9
    “Talking-Dead” self-service platforms continue to suck the life out of customer satisfaction
Calabrio integrates speech analytics and workforce optimization ..... 10
    More efficient review of voice transactions
StrikeForce and TradeHarbor partner to offer three-factor voice verification ..... 10
    Out-of-band authentication adds security for mobile transactions
Active Endpoints allows use of enterprise software from a mobile phone ..... 11
    Helps iPhone and Android smartphone users to visualize, create, and modify their own wizards
Polish telecom deploys Nuance VocalPassword for use by its employees ..... 11
    Subsidiary of Deutsche Telekom uses voice verification for automatic resetting of passwords
Nuance Dragon Drive! messaging in 2012 BMW 7 and BMW 3 Series ..... 12
    Speak emails and text messages with cloud-based transcription
AVIOS Speech Conference in Israel draws both academia and industry ..... 12
    Speakers from around the world addressed both academic and business issues
Trapit uses natural language text processing to deliver content ..... 12
    Highly personalized content based on user-specified interest and user-specific adaptation
Interview with Nik Stanbridge, VoiceVault ..... 13
    Biometric identity verification with text-dependent voiceprints
Interview with Chih-Chung Kuo, Industrial Technology Research Institute ..... 15
    ITRI’s speech research includes speech recognition, speaker recognition, and speech synthesis
News briefs ..... 19
    VoiceVault releases new generation of its speaker verification ..... 19
    Samsung reduces search capability in its Galaxy S III smartphone, apparently in response to Apple patent suit ..... 20
    Nuance hints that its personal assistant aimed at corporations will be called “Nina” ..... 20
    Voxeo Labs announces strategic partnership with Deutsche Telekom ..... 20
    Pronexus adds web sites to support IVR developers ..... 20
    Spoken Communications partners with Varolii to combine customer interaction applications with Spoken’s inbound capabilities ..... 20
    Chinese search giant Baidu opens tech lab in Singapore, with a partial focus on speech applications ..... 20
    SRI reveals voice assistant for Spanish banking group BBVA ..... 20
    Northwest Multiple Listing Service selects Interactive Intelligence’s IP communications software suite ..... 21
    CallMiner analytics solution adds new personalization features ..... 21
    Hungarian telecom operator introduces voice identification in customer service operations using Nuance voice biometric solution ..... 21
    Sandata Technologies Electronic Visit Verification technology to improve visibility and oversight of home care delivery for Louisiana Department of Health and Hospitals ..... 21
    Easy Voice Biometrics allows finding closest match to individuals when comparing voice files ..... 21
    Google adds voice search to Google+ Local on iOS ..... 22
    VoiZapp app uses Android speech recognition to post to a Facebook news feed ..... 22
    Mossberg review of Android Jelly Bean criticizes the voice assistant ..... 22
    Wolfram|Alpha provides answers to Samsung’s S Voice as well as Apple’s Siri ..... 22
    BMW to incorporate Nuance voice command in its dashboards ..... 22
    Horizon Private Cloud provides outsourced services to Voice Automated, a Nuance Dragon reseller ..... 22
    Motorola’s new ATRIX HD phone for AT&T automatically goes into car mode when docked ..... 23
    Dictation features in new Mac OS are done in the network, some personal data is used ..... 23
    Mercedes-Benz adds connection to Apple Siri to its COMAND navigation system ..... 23
    It will be legal to text while driving in California if you use speech recognition in the new year ..... 23
    SoundGecko web application and mobile phone app is a text-to-speech service ..... 23
    Veveo provides predictive search on Android phones, including personal info on phone ..... 23
    Nuance Dragon Dictation and Dragon Search apps now available in Vietnam ..... 24
    Samsung TV has voice recognition ..... 24
    HondaLink allows communicating with your car using your smartphone ..... 24
    TalkTalk chooses Nexidia Advanced Interaction Analytics for its phone services ..... 24
    United Hospital System selects M*Modal clinical documentation system with speech recognition ..... 24
    Providence Health & Services deploys Nuance Dragon Medical 360 Network Edition ..... 25
    Terra Nova provides Health Sciences North with transcription and speech recognition editing services ..... 25
    4medica’s cloud-based Electronic Health Record adds medical speech recognition from Nuance ..... 25
    me2me releases a new version of its digital dictation app for iOS and BlackBerry devices, targeted at the healthcare market through M*Modal partnership ..... 25
    Leon Medical Centers selects IDS for speech recognition, mobile dictation, and workflow solutions ..... 26
    Google search box in Chrome web browser displays calculator when calculation is entered, allows voicing equation ..... 26
    SpeakGlobal adds text-to-speech to its English language learning site for Japanese learners ..... 26
    Raytheon BBN awarded DoD contract to develop a foreign-document translation system ..... 26
    Carnegie Speech provides English language training with speech recognition for training institute in Dubai ..... 26
    Goya Foods chooses Wavelink for voice-enabled warehouse picking solution ..... 27
    W3C Multimodal Interaction Working Group publishes “Registration & Discovery of Multimodal Modality Components in Multimodal Systems: Use Cases and Requirements” ..... 27
    Shanghai Zhi Zhen Internet Technology sues Apple in China over Siri ..... 27
    International Research Consortium (U-STAR) launches translation app ..... 27
    Voxbone provides phone network for Lexifone speech-recognition-based realtime translation service ..... 28
    Microsoft touch keyboard in Windows 8 corrects some touch mistakes ..... 28
    Siri knows which is the best phone now ..... 29
    National Federation of the Blind sues over US State Department’s purchase of Amazon Kindles, citing limitations of the text-to-speech feature ..... 29
    Accessible Media service adds text-to-speech ..... 29
    Microsoft improves accessibility TTS function in Windows 8 ..... 29
    Proloquo2Go assistive software offers children with speaking disabilities artificial speech ..... 30
    Google researches computing methods using simulated neural networks ..... 30
    James and Janet Baker still pursuing compensation for their Dragon speech recognition technology ..... 30
    Robots don’t just beep to warn you of movement, they now talk ..... 30
    Analyst compares Siri speech recognition search to Google text search ..... 30
    Loading the dishwasher is still a job! ..... 31
    Taiwan's National Cheng Kung University files a patent lawsuit against Apple over Siri features ..... 31
Statistics and Surveys ..... 31
    Smartphones in use worldwide to exceed 2.4 billion in 2016 ..... 31
    Smartphone shipments to grow 38.8% this year to 686 million units ..... 31
    Approximately three quarters of the world’s population now has access to a mobile phone ..... 31
    325 million Android phones expected to be sold worldwide in 2012 ..... 31
    Samsung Galaxy S3 hits 10 million units in sales within two months ..... 32
    Android has 77% share of China’s smartphone market ..... 32
    Biometric security to become a “must have” on all smart mobile devices, market research firm claims ..... 32
    Apple iPhone maintains consumer interest over Android ..... 32
    If you are under 34, you most likely use your mobile phone as your primary phone ..... 32
    Voice search from Google on top 10 list of downloaded apps ..... 32
    The mobile ad market could reach $18.3 billion by 2015 ..... 32
    Consumers show mixed interest in mobile coupons ..... 32
    Global mobile app store revenue to exceed $34 billion in 2016 ..... 33
    Hispanic community increasingly using mobile devices as a primary means of Internet access ..... 33
    Nearly six out of 10 parents of children aged 8-12 have provided their children with cell phones ..... 33
    Vocalabs finds that making it hard for a customer to reach an agent serves no purpose ..... 33
    Contact center campaign survey concludes that the phone remains the most popular communications channel ..... 33
    A variety of issues flagged in a survey of contact center professionals ..... 33
    600 million smartphones projected to support gesture recognition in 2017 ..... 33
Financial Notes ..... 34
    Nuance reports Vlingo financials ..... 34
    M*Modal to be acquired for approximately $1.1 Billion by One Equity Partners ..... 34
    Agero expands cloud-based content delivery to vehicles with investment in M-Way Solutions of Germany ..... 34
    Samsung delivers higher profits due to smartphone sales surge ..... 35
    West Corporation reports increased revenue and profits for its second quarter ..... 35
    Spoken Communications acquires HyperQuality, provider of quality assurance and business intelligence for contact centers ..... 35
    Interactive Intelligence announces preliminary Q2 results ..... 35
    Apple acquires fingerprint scanner firm AuthenTec ..... 35
People ..... 36
    Thomas B. Sabol named Chief Financial Officer of Comverse, Inc. ..... 36
    Bill Robinson named Executive Vice President of Worldwide Sales at inContact ..... 36
    Eliza names Lee Horner Senior Vice President of Sales ..... 36
    Lyle Ball named Chief Operating Officer at translation company MultiLing ..... 36
    Cyara Solutions names Laurence Webb general manager of sales for Australia and New Zealand ..... 36
For Further Information on Products Mentioned in this Issue ..... 37
Meisel-on-Mobile (www.meisel-on-mobile.com) ..... 43
Editor’s Notes
Adding value rather than adding apps
Bill Meisel, Publisher & Editor
Modularity is an important principle. Today’s complex software couldn’t be written without subroutines or
layers of software such as the operating system, device drivers, etc. Adam Smith, in An Inquiry into the
Nature and Causes of the Wealth of Nations, talked about the “division of labor” creating efficiencies by
breaking up a complex task into separate steps (each requiring less skill and training than the whole), using a
pin factory as an example. The Web can be considered a set of modules (web sites) with different information
and services.
Today’s mobile apps might be thought of as applying the “principle of modularity” to the user interface of
a mobile device. If a developer can think of something you might want to do on a mobile device, today they
can create a module for just doing that, and it can be downloaded to your device easily and extend the
function of that device. You can assemble pieces to match your needs.
But modularity works best when it contributes to the whole. The subroutines in a well-designed application
contribute to overall effective program behavior, delivering a consistent and unified experience. When that
integration fails, the software will fail as a product. A factory production line has to create a total product
with features that customers will buy—the head of a pin is not of much use without the rest of the pin.
Without search engines to unify all the variety on the Web, the Web wouldn’t be the asset it is today.
Integration can be an issue for mobile apps. One can tolerate learning, navigating to, and launching an app
that is used often. A frequently used application can be placed in a prominent position, and frequent use
means you won’t have to think much about how to use it. But, as the number of apps grows, usability drops.
The value of the hundreds of thousands of apps available is that you can assemble a set of capabilities
tuned to your every interest and whim. But once assembled, are they a whole? Or to use most of them, do you
find yourself looking for a particular app and then having to remember how to use it? Is your “user interface”
over-burdened—a series of disconnected modules? Has modularity been overdone?
In a mobile device, or even a PC/laptop, this integration will become increasingly critical over time. The
current methods, such as pages full of application icons on a mobile device or a list of Web pages delivered as
the result of a search, are already becoming over-burdened. Even the ubiquitous pull-down menus on PC
applications are getting hard to use as features and sub-menus proliferate. If every company develops an
interactive mobile app (voice-enabled or otherwise) that is used only occasionally by a consumer, the number
of mobile apps will become like the number of web sites, requiring a unifying force.
Adding features to add value has its limits. Most big breakthroughs have been means of allowing
modularity to have its impact while integrating that modularity into a whole. The Graphical User Interface
was such an innovation. Personal assistants on mobile devices are another integration innovation, although
perhaps one still in its infancy. Perhaps a search feature that includes apps (initiated by either voice or typing,
and, most likely including some natural language handling capability) will be the unifying force. This feature
is available on some phones through personal assistants or search for at least the apps delivered with the
phone.
If apps are included in a search function or a request to a personal assistant, there are two aspects that seem
necessary for a successful integration of separate modules. First, a new app/module should automatically
become accessible to that integration engine. For full integration, this might require an industry-standard way
for an app to report its name, what it does, and what requests it might be able to address. Web sites are
collections of text in an industry-standard format (HTML) that can be searched or tagged, allowing search
engines to do what they do. It would be ideal if apps had a consistent way to describe what they do.
Second, a request may include parameters that an app uses to perform its function, such as an address for a
navigation application or a restaurant name for a review or reservation. It would be frustrating and inefficient
if the user had to repeat the information once the app is launched after including it in the original request
(e.g., “Italian restaurant in Beverly Hills”). Thus, the app, when it is registered with the integration function,
should report the parameters it uses, such as “business name” and “location” (ideally with supplementary
information that can be used in natural language processing—e.g., “restaurant,” “diner,” “café”—to identify
those parameters). An industry standard should include parameter reporting, at least as an option.
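
To make the two requirements concrete, here is a minimal sketch in Python of what such self-reporting might look like. No such standard exists; the class names (AppManifest, Parameter, IntegrationEngine), the fields, and the word-matching rule are all hypothetical and chosen only for illustration. An app registers a description of itself and its parameters, and the integration function picks the app whose hint words best match the request, passing along the parameters it detected so the user does not have to repeat them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Parameter:
    """One parameter an app accepts, with hint words that signal it in a request."""
    name: str
    hints: list[str] = field(default_factory=list)


@dataclass
class AppManifest:
    """Hypothetical self-description an app reports when it is installed."""
    name: str
    description: str
    parameters: list[Parameter] = field(default_factory=list)


class IntegrationEngine:
    """Toy stand-in for a personal assistant or search function (an assumption)."""

    def __init__(self) -> None:
        self.apps: list[AppManifest] = []

    def register(self, manifest: AppManifest) -> None:
        # A newly installed app becomes reachable as soon as it reports itself.
        self.apps.append(manifest)

    def route(self, request: str) -> Optional[tuple[str, dict[str, str]]]:
        """Return the best-matching app and the parameters detected in the request.

        The result maps each detected parameter name to the hint word that
        signalled it. Returns None when no registered app matches.
        """
        words = request.lower().split()
        best = None
        best_score = 0
        for app in self.apps:
            detected = {}
            for param in app.parameters:
                for hint in param.hints:
                    if hint in words:
                        detected[param.name] = hint
                        break
            if len(detected) > best_score:
                best, best_score = (app.name, detected), len(detected)
        return best


engine = IntegrationEngine()
engine.register(AppManifest(
    name="Reviews",
    description="restaurant reviews and reservations",
    parameters=[
        Parameter("business name", ["restaurant", "diner", "café"]),
        Parameter("location", ["in", "near"]),
    ],
))
engine.register(AppManifest(
    name="Navigator",
    description="driving directions",
    parameters=[Parameter("address", ["street", "avenue"])],
))
print(engine.route("Italian restaurant in Beverly Hills"))
```

A real integration function would need natural language processing rather than simple word matching, and would extract the parameter values themselves (here, the cuisine and the city), not just the words that reveal which parameters are present.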
The commercial success of such implementations is likely and would drive acceptance of a standard way
for applications to describe themselves. Since a standard takes time, the most likely first efforts will be
consortiums, informal agreements, or a format driven by a successful integration platform such as personal
assistant applications or search engines. The de facto standards can be driven by firms with the power to do
so, with Apple, Google, and Microsoft, perhaps even Nuance, being prime suspects. Those firms could
simply have a reporting mechanism for a new app that accepted a particular format. Other integrators could
use that information if an app delivered it, making the de facto standard available to all. Either an informal or
formal standard would help the user and the software industry.
Commentary: Follow-up on software patents
Bill Meisel, Publisher & Editor
In last month’s editorial, I expressed my concern over the impact of the current patent system on
innovation, using Apple patents cited in a suit against HTC (really against the Android operating system) as
an example. I suggested that patents, particularly when covering an element of a user interface, were difficult
to evaluate. One short-term remedy I suggested was that judges refuse to issue injunctions based on an
element of a product design that wasn’t a core aspect of the product. If a patent violation was upheld at the
end of the trial, the court could then assess financial damages based on the importance of the feature, but
allow the product to remain on the market. Such actions help consumers by maintaining competitive products
and consistent user-interface features across products.
Richard Posner, a well-known jurist who sits on the 7th U.S. Circuit Court of Appeals in Chicago, teaches
at the University of Chicago, and has written books on intellectual property and the impact of law on
economics, expressed similar views. Posner presided over Apple’s lawsuit against Motorola Mobility, soon
to be part of Google. He canceled a trial between the two and rejected Apple’s request for an injunction
barring the sale of Motorola products claimed to be using Apple’s patented technology. In his ruling, Posner
said an injunction barring the sale of Motorola phones would harm consumers. He further rejected the idea of
trying to ban an entire phone based on patents that cover individual features like the smooth operation of
streaming video. Apple’s patent, Posner wrote, “is not a claim to a monopoly of streaming video!”
Posner told Reuters in an interview in July that some industries, like pharmaceuticals, had a better claim to
intellectual property protection because of the enormous investment it takes to create a successful drug.
Advances in software and other industries cost much less, he said, and the companies benefit tremendously
from being first in the market—a benefit they would still get if there were no software patents. “It's not clear
that we really need patents in most industries,” he said. Posner’s views were also reported in the Wall Street
Journal in July, and hopefully will have some impact on other jurists.
Posner also noted that devices like smartphones have thousands of component features, and they can all
receive legal protection individually. He commented, “You just have this proliferation of patents. It's a
problem.”
In a blog posting, Are Patents on the Mobile User Experience in the Public Interest?, last October, I cited a
patent suit Apple filed against HTC that included a patent on the slide-to-unlock feature on mobile phones,
one that could be considered an image of a slide switch on the screen that works like a slide switch. In another
example of a careful ruling, a judge in the UK in July, in the Apple suit against HTC, called that patent and
two other user-interface patents (one on multi-touch and the other on a multilingual keyboard) invalid. The
judge said the slide-to-unlock was an obvious development, citing the presence of a similar feature on a 2004
Swedish phone.
In July, Samsung issued a software update for its flagship Galaxy S III smartphone that was characterized as
a security update. The update, however, removed the feature in the Google search bar that was used for a
search for local content on the phone as well as web search (p. 20). Apple had successfully obtained a ruling
that this feature infringed an Apple patent. This is an example where a useful feature for users is being denied
Samsung buyers by a patent war.
ICSI and Microsoft to collaborate on conversational human-machine interaction
International Computer Science Institute will explore speech and other modalities
Researchers at the International Computer
Science Institute (ICSI) in Berkeley, California, will
work with Microsoft to advance the state of the art
in human-computer interaction relying on speech and
other modalities, the organizations announced. The
collaboration takes advantage of ICSI’s history in
speech processing research (SSN, July 2012, p. 22)
and Microsoft’s experience in deploying natural
speech interfaces in its services and applications.
Roberto Pieraccini, director of ICSI, said that this
work is “particularly important now, as the
popularity of devices that understand and produce
speech grows more quickly than ever before.” Senior
ICSI and Microsoft researchers, as well as
postdoctoral researchers and students at ICSI, will
conduct the research.
Elizabeth Shriberg and Andreas Stolcke, Principal
Scientists with the Conversational Systems
Laboratory at Microsoft and ICSI External Fellows,
will lead the effort. The Conversational Systems Lab
(CSL) is an applied research group within
Microsoft’s Online Services Division based at the
Microsoft Silicon Valley campus in Sunnyvale,
California. CSL is exploring novel ways to interact
naturally with computer systems and services using
speech, natural language text, and gesture. Its aim is
to enable conversational understanding of users’
inputs and intentions across a range of devices, from
mobile phones to Xbox consoles in the living room.
In one of the first projects under this collaboration,
researchers will use information conveyed by speech
prosody (the melody and rhythm of speech) to
improve automatic speech understanding.
Shriberg noted that patterns of timing and
intonation in spoken language encode information
far beyond that conveyed by words alone. “This
information is important for achieving natural and
efficient conversational interactions with machines,”
she said. “We expect to accelerate progress on
human-computer dialog systems that better
understand and use cues in human-human spoken
communication that we often take for granted.”
US Department of Justice licenses Nexidia audio discovery
Criminal Division installs Nexidia audio technology to streamline investigations
Nexidia announced that the Criminal Division of
the United States Department of Justice (DOJ) has
licensed Nexidia’s Audio Discovery software, which
can find content in audio files containing specific
phrases (SSN, July 2012, p. 11). The DOJ will use
Nexidia for reviewing audio content produced in its
investigations. Other government agencies already
using Nexidia solutions include the United States
Securities and Exchange Commission (SEC), the
Commodity Futures Trading Commission
(CFTC), the Federal Energy Regulatory
Commission (FERC), and the Federal Trade
Commission (FTC), as well as OfCom in the UK
(the independent regulator and competition authority
for UK communications industries).
Jeff Schlueter, Vice President & General Manager
of the Legal Market business unit for Nexidia, said,
“The DOJ decision to license our software is further
proof that it is the de-facto standard for reviewing
audio in a timely and cost effective manner.”
iSpeech Cloud mobile speech platform claims 13,000 developers
Improvements in speech recognition and text-to-speech and a home automation solution announced
iSpeech provides internally developed, cloud-based speech recognition and text-to-speech as well
as mobile apps, including DriveSafe.ly and iSpeech
Translator (SSN, September 2011, p. 11). In July, the
company announced that the iSpeech development
platform has been used over 1.6 billion times in
mobile apps made by over 13,000 developers, which
the company claims makes iSpeech the largest
mobile speech development platform in the world.
The technology can be used without cost in a
standard version that credits iSpeech.
iSpeech also released updates to its Web API,
iPhone, Android, and BlackBerry Software
Development Kits (SDKs) that provide faster
performance and optimized speech recognition for
Siri-like personal assistant applications and other
popular speech recognition use cases, the company
said. Heath Ahrens, Founder and CEO of iSpeech,
said the free version is used through the SDK, which
uses an Application Programming Interface (API) to
handle most of the interaction with the cloud-based
speech technology. The speech recognition and text-to-speech
software resides on servers owned by
iSpeech. The SDK assures that the requirements for
the free version are followed, including displaying
the source of the speech technology within an app
using it. The SDK also allows iSpeech to know the
app, the type of operating system and similar
information, giving it some visibility into where the
technology is being used and how it is used.
If a company wishes to use the API directly,
without the constraints imposed by the SDK, they
can pay a fee based on usage and/or number of
downloads, Ahrens said. The company also provides
professional services to, for example, create a
specialized statistical language model for a company
with an application that doesn’t fit the current
contexts available. The company can also create
custom TTS voices for customers. There are also
specialized contexts already developed for common
use cases such as virtual assistants, translation apps,
navigation, e-learning, and dictation. The company is
also considering licensing the core technology as
software, with no formal plans as yet.
A calorie counter app from About.com apparently
uses iSpeech speech recognition with a custom
language model. The app lets users say what they are
eating and get a calorie count. About.com says it
has 250,000 foods in its database, presumably fodder
for a statistical language model. iSpeech speech
recognition and text-to-speech are available in over 25
language and accent combinations.
The iSpeech platform, launched less than 10
months ago, is used by apps in lifestyle, food, travel,
retail, finance, gaming, messaging, dictation,
translation, and social services. Companies listed by
iSpeech in an announcement were Hearst (the media
and information company), Telenav (navigation
services), SpeaktoIt (a personal assistant mobile
app, SSN, November 2011, p. 28), and Vocre
(speech translation).
Ahrens said that the company has the server
capacity and technology to provide low-latency
response (“as fast as anybody”) and reliable
availability (“100% uptime so far”). The company is
cash-flow positive, he said, in part thanks to its
successful DriveSafe.ly app.
The company also announced iSpeech Home, a
platform for developers to use for connected devices
in the home and home networks. iSpeech Home is
intended to allow consumers to control their
televisions, home entertainment systems, lighting,
heating, ventilation, irrigation, security systems,
refrigerators, washers and dryers and other
household appliances through voice and natural
language commands. The system combines
embedded speech recognition for quick local action
with network-based speech recognition for more
complex queries. It also uses iSpeech text-to-speech
for voice feedback.
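The hybrid arrangement described above (a small embedded recognizer for quick local actions, with network-based recognition for anything more complex) might be sketched as follows. This is only an illustration, not iSpeech's implementation: the command table, the function names, and the use of already-transcribed text in place of audio are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple

# Hypothetical embedded grammar: a few fixed phrases mapped to device actions.
LOCAL_COMMANDS = {
    "lights on": "lighting.on",
    "lights off": "lighting.off",
    "tv on": "television.on",
}


def recognize_locally(utterance: str) -> Optional[str]:
    """Return a device action if the utterance matches the small local grammar."""
    return LOCAL_COMMANDS.get(utterance.strip().lower())


def handle_utterance(
    utterance: str, network_recognizer: Callable[[str], str]
) -> Tuple[str, str]:
    """Try the embedded grammar first; fall back to the network recognizer."""
    action = recognize_locally(utterance)
    if action is not None:
        return ("local", action)
    return ("network", network_recognizer(utterance))


def fake_network_recognizer(utterance: str) -> str:
    """Stand-in for a network service call (assumption: returns a query string)."""
    return "query:" + utterance.lower()
```

Calling `handle_utterance("Lights ON", fake_network_recognizer)` is answered by the local grammar, while a longer request such as "Water the lawn for ten minutes" falls through to the network stand-in.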
The company recently hired Qiru Zhou as Chief
R&D Scientist. Qiru, an expert on speech and
language processing, was with Bell Labs (now part
of Alcatel-Lucent after the split-up of AT&T) as a
member of its technical staff from 1992 to 2011. He
contributed to and led various Bell Labs major
research projects on robust, real-time speech
recognition, large vocabulary speech recognition,
natural language call routing, and spoken language
human-machine dialogue interface and architecture.
Empirix launches automated contact center regression testing as a service
Checks how existing services are impacted when changes are introduced
Empirix provides testing, monitoring, and
analytics solutions for service providers, mobile
operators, and contact centers, including the
simulation of calls using speech recognition to react
to prompts. The company announced the availability
of Empirix Regression Testing as a Service (Empirix
RTaaS), a new quality assurance solution for
ensuring that existing services are not negatively
impacted when changes are introduced into complex
contact center environments. It combines Empirix
Hammer Test technology with customizable services
for auditing contact center operations, assessing
customer experience, and designing test plans.
Empirix RTaaS measures the impact of changes on
switching, routing, Interactive Voice Response
(IVR), and agent desktop solutions prior to their
deployment.
Businesses are continuously updating their contact
center systems in response to events such as product
launches, cost-cutting programs, or mergers and
acquisitions.
Therefore, comprehensive
understanding of all contact center systems can be
difficult to obtain, especially for companies that have
lost legacy expertise over time. Empirix RTaaS
provides organizations with detailed knowledge of
these systems to identify unused or underutilized
resources, as well as the thousands of routes that
calls travel throughout the contact center. Armed
with this information, businesses can leverage the
Empirix RTaaS solution to automate all their test
functions, including test script creation, execution,
monitoring, reporting, and documentation.
Companies can then perform repeated regression
testing on an ongoing basis as changes are
introduced.
Tim Moynihan, vice president of marketing,
Empirix, said, “As organizations continually update
their contact center systems, they must not only
ensure that new features function properly prior to
their deployment, but also that existing capabilities
function at expected levels.”
In recent service engagements, Empirix said
companies saved between 60% and 70% when they
automated processes that were previously handled
manually. They were able to reduce the time needed
to test new solutions and gain actionable intelligence
for correcting any issues detected.
Voxeo launches "Zombie IVR" campaign
“Talking-Dead” self-service platforms continue to suck the life out of customer satisfaction
A frustration for those of us involved in speech
technology for many years has been the reaction of
new acquaintances when we tell them of our
involvement with speech recognition, which is often
something like, “Oh, you’re responsible
for those awful customer service systems that don’t
let me get to an agent.” Thank you, Apple, for
making speech recognition fun.
The frustration goes beyond the reaction of
acquaintances. Most of us believe that the speech
technology isn’t the limiting factor driving
dissatisfaction. Instead, it is often attitudes of call
center managers that tend to maintain the same
structure of interaction that they had with touch-tone
menus, without realizing that the decision tree forced
by touch-tone technology might not be natural to the
caller. If anything, many call centers have retreated
from the use of more advanced speech technology
and good design in call centers in the name of saving
money during a recession (and perhaps the lower
cost of agents outsourced to developing countries).
Part of the problem is older equipment that has
minimal flexibility.
Voxeo is fighting back with a “Zombie IVR”
campaign, emphasizing frustration levels with the
thousands of out-of-date end-of-life systems that trap
callers in “IVR hell.” To fight off these “talking
dead” IVR systems, Voxeo is launching a campaign
to showcase how fast, easy and cost-effective it can
be to migrate away from outdated IVR systems to a
flexible, standards-based solution with the ability to
adapt to customers’ heightened expectations and
changing preferences, including the demand for
mobile and social media interactions.
Voxeo’s architecture is based on open standards
such as VoiceXML 2.0 and 2.1, CCXML, SIP,
MRCP and SSML. Voxeo offers deployment
flexibility, delivering both hosted cloud and on-premise options with the ability to easily move from
one to the other or leverage a hybrid combination of
the models.
“IVR systems today need to keep up with
changing customer preferences and growing
expectations,” said Kim Martin, director of
marketing at Voxeo. “It’s not just about upgrading
hardware and software, but about upgrading the total
customer experience with the ability to provide
multi-channel interactions, personalization, location
intelligence and more. Companies that find
themselves locked into old technology are now
realizing how important it is to build in a completely
standards-based environment like Voxeo, that is
unlocked at every layer and provides the ability to
integrate cross-channel, actionable analytics to easily
tune and refine applications to meet customer
expectations. It’s ultimately about empowering
companies with the right functionality and tools,
even down to the flexibility of leveraging cloud
hosting, so they can better focus on their customer
experience and not the underlying infrastructure of
their IVR.”
Zombie IVR systems, Voxeo says, have off-putting features, such as greeting customers with
unhelpful, one-size-fits-all menu options and insisting
that their options “have recently changed” when they
actually haven't been updated in years. For example,
such systems might require customers to enter their
account numbers, only to be asked again when the
customers give up and transfer to an agent for
help. These Zombie IVRs are unable to understand
or adapt to the customer's needs and merely drone on
like the “talking dead.” In summary, most of these
legacy systems were an attempt to reduce calls to
agents at the expense of customer satisfaction.
While the total cost of ownership of aging legacy
Zombie IVR systems continues to rise, Voxeo says
its VoiceObjects has been proven to save customers
up to 80% in maintenance and lifecycle management
costs. To speed up the migration process and keep
costs down, Voxeo and its partners have a variety of
tools that ease conversion from common platforms;
the company says it has been able to automate the
conversion of up to 95% of old code to Voxeo
VoiceObjects.
Calabrio integrates speech analytics and workforce optimization
More efficient review of voice transactions
Calabrio, Inc. provides contact center workforce
optimization and analytics software. In July, the
company announced a speech analytics application
integrated within a workforce optimization
framework.
The company’s Calabrio ONE workforce
optimization software includes call recording, quality
assurance, workforce management, performance-based dashboards, reporting, and now speech
analytics. Calabrio ONE is built on a Web 2.0-based
architecture that allows the contact center to
integrate new applications more easily, as well as
personalize and optimize the desktop toolset for each
user—agents, supervisors, managers, knowledge
workers, and executives.
Calabrio Speech Analytics turns recorded phone
transactions into meaningful data. Calabrio Speech
Analytics automates search of voice transactions, so
quality and compliance teams spend substantially
less time on review.
Tom Goodmanson, president and CEO of
Calabrio, said, “Calabrio’s goal is to drive Speech
Analytics into organizations in a powerful yet
flexible way and bring structure to the most
unstructured data, which is voice.”
The latest Calabrio ONE suite also includes
several enhancements to Calabrio Workforce
Management and Calabrio Quality Management
applications, including more language options and
serviceability enhancements:
§ A dynamic dashboard capability, which includes
the ability to drill down on the detail within one
analytics widget to change the scope of all
related widgets within the dashboard, and
ultimately drill into root cause data for further
analysis and action;
§ A real-time recording monitoring application,
which monitors recording states and alerts in the
event of an outage;
§ User level localization for English, French,
Spanish, and Portuguese.
Calabrio ONE is available immediately through
Calabrio and its partner network.
StrikeForce and TradeHarbor partner to offer three-factor voice verification
Out-of-band authentication adds security for mobile transactions
StrikeForce Technologies and TradeHarbor are
partnering to offer a “three-factor” voice verification
solution for mobile devices. TradeHarbor provides
voice verification software (SSN, December 2011, p.
15), and StrikeForce provides multi-factor out-of-band authentication, which can include biometric
authentication methods. The new multi-factor
solution combines three critical factors—who you
are, what you have, and what you know—over a
mobile device. Each verification interaction produces
a legally binding voice signature combined with out-of-band authentication, which includes an audit trail
to mitigate repudiation by the person being
authenticated.
Malware on a PC or mobile device can hijack a
web-based interaction and use the consumer’s
session to do things such as create a wire transfer
without the consumer realizing it. Out-of-band
authentication adds authentication through a separate
channel to avoid such attacks—thus, “out-of-band
authentication.” The telephone network is an ideal
out-of-band channel for authentication.
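To illustrate the idea of confirming a transaction over a separate channel, here is a minimal sketch in which a one-time code is issued for a pending transaction and must come back through the second channel before the transaction is approved. It is not ProtectID's design: the class name, the six-digit code, and the single-use rule are assumptions for the example, and delivering the code by phone is left outside the sketch.

```python
import hmac
import secrets


class OutOfBandAuthenticator:
    """Holds one-time codes for transactions awaiting second-channel confirmation."""

    def __init__(self) -> None:
        self._pending = {}  # transaction id -> one-time code

    def start(self, transaction_id: str) -> str:
        """Create a six-digit code to be delivered over the second channel."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[transaction_id] = code
        return code

    def confirm(self, transaction_id: str, code_from_second_channel: str) -> bool:
        """Approve only if the code matches; each code can be tried once."""
        expected = self._pending.pop(transaction_id, None)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        return hmac.compare_digest(expected, code_from_second_channel)
```

Because the code travels by a different route than the web session, malware that controls only the browser session cannot complete the confirmation step on its own.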
StrikeForce’s ProtectID out-of-band authentication
technology offers eight different out-of-band
methods, including phone, voice, instant messaging,
hard tokens, and desktop/mobile tokens. ProtectID
can be installed and managed on premise or with
StrikeForce’s hosted service offering.
TradeHarbor’s Voice Signature Service deploys
voice authentication technology in a scalable Web
Service. It provides the ability to obtain legally
binding document signatures over the telephone and
in mobile and Web transactions.
Active Endpoints allows use of enterprise software from a mobile phone
Helps iPhone and Android smartphone users to visualize, create, and modify their own wizards
Customer Relationship Management (CRM)
software such as Salesforce CRM from
Salesforce.com is used by professionals to keep
track of prospects, appointments, and other aspects
of selling a company’s products or services. It’s an
example of enterprise software applications that
companies use to organize and report activities.
Active Endpoints, Inc. announced Cloud Extend
Mobile in July, a product aimed at letting mobile
workers access such enterprise applications, debuting
first on Cloud Extend for Salesforce. The company
already has a product, Cloud Extend, that can work
through Web browsers. The company’s software
uses features of Salesforce CRM for the back-end
integration.
The mobile phone software supports speech-to-text and touchscreen input with end-user
customization. Cloud Extend Mobile allows iPhone
and Android smartphone users to visualize, create,
and modify their own wizards, without IT skills or
training. The dual input method allows free-form
dictation of meeting notes, but also supports data
entry that could be accomplished more quickly by
tapping on-screen options, such as icons, pick lists,
and check boxes.
Users can get a quick start using a library of pre-built tools. One is a free “Meeting Follow-Up
Wizard.” Users speak or tap info on the handset, and
their company’s enterprise app is automatically
updated. The wizards can be set up to ask for
information mapped to software databases for a
particular task.
Mark Taber, CEO, Active Endpoints, said,
“Cloud-based enterprise apps pump the information
lifeblood for companies around the world; however,
it’s nearly impossible for business users to utilize
those apps on the go, on a three- or four-inch screen.
Cloud Extend Mobile is going to spark a paradigm
shift for business smartphone users, letting them take
full advantage of the incredible computing power
that’s available in iPhone and Android
devices…This is the future for smartphones in
business.”
Polish telecom deploys Nuance VocalPassword for use by its employees
Subsidiary of Deutsche Telekom uses voice verification for automatic resetting of passwords
Nuance Communications announced that Polska
Telefonia Cyfrowa, a subsidiary of Deutsche
Telekom, has deployed Nuance VocalPassword for
use by its more than 4,500 employees. Nuance
VocalPassword enables employees to automatically
reset their network and desktop access passwords
simply by speaking. The system uses Nuance voice
biometrics (speaker verification) to confirm their
identity and Nuance speech recognition to implement
the password reset.
Nuance indicated that its voice biometric solution
has processed more than 20 million voiceprints. The
company said that organizations are using the
technology in financial services, customer care,
government, and consumer devices, among others.
Robert Weideman, executive vice president and
general manager, enterprise, of Nuance
Communications, noted, “Given that human voices
are as individually unique as fingerprints and retinas,
they are an ideal way for companies to authenticate
employees and customers.”
Maciej Zawada, platforms and systems
development bureau director of Polska Telefonia
Cyfrowa, said, “Nuance VocalPassword has
positively impacted our employees, giving them the
ability to easily and efficiently reset their passwords
24 hours a day, seven days a week. As a result, not
only have we been able to eliminate the need to ask
them a series of detailed questions to verify that they
are indeed who they say they are; more importantly,
we have been able to reduce the time it takes to
verify an employee’s identification to just 20
seconds, freeing up our IT staff to handle more
pressing issues. Given our positive experience with
VocalPassword, we are exploring how we can now
roll this service out to our customer service contact
center.”
The Nuance Voice Biometrics portfolio includes
VocalPassword; FreeSpeech, which automatically
identifies speakers passively during a live
conversation with a customer service agent;
DragonID, which provides authentication and
identification capabilities embedded into hardware
devices, such as mobile phones; and Loquendo
Public Security Solutions for government agencies,
such as law enforcement, military, and intelligence
services.
Nuance Dragon Drive! messaging in 2012 BMW 7 and BMW 3 Series
Speak emails and text messages with cloud-based transcription
In May, Nuance Communications introduced
Dragon Drive!, its cloud-based natural-language
voice platform designed specifically for the
connected car. Dragon Drive! Messaging (DDM)—a
mobile assistant that lets users speak, listen, and
respond to text messages and emails—was the first
service offered by the Dragon Drive! platform. The
speech recognition uses the same core technology as
Nuance’s Dragon Dictation app. Nuance provides a
hybrid automotive platform, with speech technology
local to the vehicle as well as in the cloud.
In July, Nuance announced that BMW was the
first manufacturer to integrate DDM. The solution
gives drivers the ability to dictate emails and text
messages to their contacts simply by speaking, and is
fully integrated as part of the BMW ConnectedDrive
Navigation system Professional in the new 2012
BMW 7 Series, BMW 3 Series Touring, and BMW 3
Series ActiveHybrid vehicles, with additional models
to follow.
DDM delivers a fully integrated mobile assistant
messaging experience that lets drivers speak, listen,
edit, and respond to text messages and emails while
keeping their hands on the wheel and eyes on the
road. Drivers can speak simple commands to format
e-mails by adding new lines, paragraphs, and
speaking punctuation and other format commands.
Arnd Weil, vice president and general manager,
automotive, Nuance, said, “People want to connect
with family, friends, and colleagues while they’re on
the road, but without the dangerous distractions
posed by manually engaging handheld devices.”
In addition to the dictation functionality, the new
BMW Navigation system Professional also features
local voice command and control with Nuance
technology. Drivers can speak one-shot commands
for phone calls and navigation, such as “Call John
Miller on Mobile” or “Navigate to 100 Boylston
Street in Boston, Massachusetts.”
DDM will be available in BMW vehicles starting
in July 2012 in six different languages, including US
and UK English, French, Italian, German, and
Spanish. BMW buyers can test DDM free for 60
days. Once the trial period expires, DDM will be
available as a Nuance service with an annual renewal
option.
AVIOS Speech Conference in Israel draws both academia and industry
Speakers from around the world addressed both academic and business issues
The Applied Voice Input Output Society
(AVIOS), the speech industry’s non-profit
organization, organizes a number of conferences and
local chapter meetings to serve the needs of its
members and the general speech community,
including the Mobile Voice Conference (the fourth
to be held April 15-16, 2013 in San Francisco).
The 2012 Afeka-AVIOS Speech Processing
Conference, held June 19-20 in Tel-Aviv, was
organized by the Afeka Center for Language
Processing (ACLP) and AVIOS Israel, the AVIOS
local chapter in Israel. The 2012 conference had
representatives from both the academic and
industrial speech communities. International
speakers included Prof. Lawrence Rabiner, Rutgers
University; Dr. James Larson, VP, Larson
Technical Services; Prof. Sadaoki Furui, Tokyo
Institute of Technology; and Peter Mahoney, Chief
Marketing Officer, Nuance.
Dr. Nava Shaked, Chairman of AVIOS Israel, had
a major role in organizing the conference. She said
that it was very successful, with strong content and
enthusiastic networking.
Dr. K. W. “Bill” Scholz, AVIOS President,
commented, “We have been encouraging the growth
of AVIOS local chapters across North America,
Europe, the Middle East, and Australia to reach
across many geographies in an effort to raise
awareness of speech technology as a tool that
helps the general public.”
Trapit uses natural language text processing to deliver content
Highly personalized content based on user-specified interest and user-specific adaptation
Trapit, founded in 2009, is backed in part by SRI
International, a source of speech and natural
language research that has been the basis of other
companies, most notably Siri before its purchase by
Apple. Trapit, which has a Web version of its
content selection service, announced the
launch of its first iPad app in July. The free app is
described in a press release as “built from the same
AI technology that powers Siri.” Unlike Siri, which
uses speech input, Trapit is text-based.
The app delivers articles, videos, features, and
blogs on user-defined topics. Trapit isn’t limited to
pre-set categories or broad topics; Trapit explores the
entire Web and delivers “high-quality” content on
specific interests and hobbies defined by the
individual user. Gary Griffiths, CEO and co-founder
at Trapit, said, “Our iPad app represents an entirely
new approach to content discovery and consumption
on the iPad. Most people are tired of seeing the same
articles they already saw on social networks; they’re
looking for fresh, high-fidelity content from new
sources, on topics they actually care about.”
Trapit’s iPad app utilizes the same underlying
platform as the Web app, using natural language
processing, semantic analysis, and user feedback to
help select articles based on user-specified interests.
Users create focuses on any topic, from general
topics like “US politics” to more niche topics like
“cooking with avocados.” Trapit learns more about
the topic and individual preferences through both
implicit engagement and explicit feedback. Trapit
says it focuses on giving each user an experience that
is 100% unique to them, so that two users with
“traps” on the same topic, like “80’s Rock Bands,”
won’t be delivered exactly the same content. Trapit
uses over 100,000 “carefully vetted” sources, the
company said. The app also features a “save to
reading list” function and one-click sharing to
Twitter, Facebook, or email.
Interview with Nik Stanbridge, VoiceVault
Biometric identity verification with text-dependent voiceprints
Nik Stanbridge, VP of Product Marketing, VoiceVault, was interviewed by Bill Meisel in late July. Nik is
responsible for all aspects of Product Management, Marketing and social media integration at VoiceVault.
He is an experienced Product Manager with over 20 years of experience in global B2B and B2C market sectors.
Prior to his current position, Nik held a variety of Product Management roles at technology companies,
covering PDF aggregation software for regulatory submissions in the pharmaceutical industry; software for
PDF document creation, manipulation and conversion; and industrial inkjet print heads and ink systems.
Please describe how your basic voice verification technology works. Was it developed at
VoiceVault?
VoiceVault technology is 100% in-house developed and proprietary to VoiceVault. It is used to verify
someone’s claimed identity. That is, we can verify that someone is who they claim to be—it's about identity
verification rather than identification.
Our technology has undergone extensive user experience testing that has enabled us to define the optimal
number of words to use in the enrollment and verification processes. It has also applied many years of
research effort into making the voice biometric accuracy for that user experience appropriate for high security
applications.
To enroll a user, we prompt for speech that contains specific words that are then used to create that user's
voiceprint, which is stored against their claimed identity (an account number, for example). To verify that a
person is who they claim to be, they have to speak some or all of the words that they used when they
registered their voice. This speech is then compared to the voiceprint associated with their claimed identity to
assess the probability that they came from the same person. Our technology is able to work with very small
amounts of speech: enrollment typically requires 10 seconds of speech and verification less than 5 seconds.
When used with our adaptation technology, which enables a voiceprint to be updated with verification
speech from one or more unsuccessful verification attempts, we are able to deliver very high levels of
accuracy [see below] in a wide range of speaker environments and channels. This provides a flexible
deployment approach that in turn delivers an excellent user experience with high levels of security.
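The enroll, verify, and adapt steps described in this answer can be pictured with a small sketch. It is not VoiceVault's proprietary method: representing speech as fixed-length feature vectors, averaging them into a voiceprint, scoring with cosine similarity, and the threshold and weight values are all assumptions chosen to keep the example short.

```python
import math
from typing import List, Sequence

Vector = List[float]


def enroll(samples: Sequence[Vector]) -> Vector:
    """Average the enrollment feature vectors into a single voiceprint."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(voiceprint: Vector, sample: Vector, threshold: float = 0.8) -> bool:
    """Accept the claimed identity if the sample scores at or above the threshold."""
    return cosine_similarity(voiceprint, sample) >= threshold


def adapt(voiceprint: Vector, sample: Vector, weight: float = 0.1) -> Vector:
    """Blend new verification speech into the stored voiceprint."""
    return [(1.0 - weight) * v + weight * s for v, s in zip(voiceprint, sample)]
```

For instance, `enroll([[1.0, 0.0], [0.8, 0.2]])` yields a voiceprint close to `[0.9, 0.1]`; a later sample such as `[1.0, 0.1]` scores well above the assumed threshold, while `[0.0, 1.0]` does not.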
What market segments do you see as early adopters of voice verification, and what is their
motivation?
Our strategic focus is to be the supplier of choice for Financial Services and Healthcare enterprises for
voice biometric solutions. Over the next 2-3 years we will build on the client base we already have to be the
foremost supplier of smart device and telephony-based voice biometric solutions as measured by the number
of Fortune 500 companies we will have as clients.
While we continue to have success in deploying voice e-signature solutions to the Healthcare market, we
are also seeing a significant uptake of identity verification solutions in the Financial Services market. Current
indications are that the Financial Services vertical is poised for rapid growth of text-dependent solutions on
smart device apps, where text-dependent voice biometrics is ideally suited.
Password Reset and One-time Password continue to be the ideal applications for how organizations take
their first steps in learning and understanding voice biometrics. Authentication and transaction authorization
will be the biggest growth area. Traditional call center applications will continue to be important including
their use of voice e-signatures and caller authentication.
We expect smart device solutions to be a significant growth area and we believe that our short-utterance
text-dependent solutions are very well placed to benefit from this rapidly growing market.
VoiceVault has had some successes internationally. Please outline these and the reason for
international growth.
Our current voiceprint distribution is 80% US and 20% non-US. By vertical this represents 50%
Healthcare, 40% Financial Services, and 10% other.
International growth is important to us and we acknowledge and recognize that while we have a strategic
US focus, many large institutions have technology and innovation centers outside the US. Initiatives in our
target markets can come from anywhere, and we look carefully at each one. As the number of large-scale
voice biometric solution deployments is still growing, it’s important for us to promote and encourage
adoption so that we can be seen as leading the way in voice biometrics—and this involves thinking and
operating internationally.
What do you see as distinguishing characteristics of voice verification technology from different
vendors?
Our key differentiators are:
§ Our technology can be optimized to deliver a false accept rate of 0.01% with a false reject rate of less
than 5% for high security applications;
§ It can be optimized to deliver a false reject rate of 0.05% with a false accept rate of less than 1% for
cost-reduction applications;
§ Extensive and on-going user experience testing has resulted in a highly engaging but non-intrusive user
experience design for authentication and authorization solutions;
§ Accuracy levels can be achieved with 10 seconds of enrollment audio and less than 5 seconds of
verification audio;
§ It is a software-based solution designed for rapid and simple deployment. It can either be vendor-hosted
or on-premise, with no specialized server requirements; commodity hardware and virtualization are all
supported;
§ Web services APIs are extremely easy and quick to integrate with; partners and clients have developed
proof-of-concept applications in a matter of hours;
§ The same deployment can be used for all channels and all applications; all you need is the ability to
record speech and a network connection to submit it;
§ VoiceVault is exclusively Voice Biometrics. We are 100% focused on being the best provider of voice
biometric authentication and authorization solutions.
Looking at voice biometric vendors in general, the key characteristics to look at and consider are:
§ The type and amount of speech required for the enrollment of a person's voice and for subsequent
verification attempts;
§ The level of accuracy that can be achieved using this amount of speech (the false accept rate / false reject
rate);
§ The suitability of the enrollment / verification processes and user experience for a given use case;
§ The suitability of the obtainable accuracy level to the business case;
§ The scenarios and use cases that this amount of speech / accuracy can be used in (text-independent
conversational speech, for example, isn't suitable for a smart device app);
§ How easy is it to develop an application and how straightforward is the API integration?
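The false accept rate and false reject rate mentioned in both lists above trade off against each other as the decision threshold moves. A minimal sketch of how the two rates could be computed from test scores follows; the function name and the score values are invented for illustration and are not VoiceVault data.

```python
from typing import Sequence, Tuple


def error_rates(
    genuine_scores: Sequence[float],
    impostor_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) at a given threshold.

    A score at or above the threshold counts as an accepted attempt.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )


# Made-up scores: true speakers tend to score high, impostors low.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.3, 0.5, 0.75]
```

With these made-up scores, a threshold of 0.6 gives a false accept rate of 0.25 and a false reject rate of 0.25; raising the threshold to 0.8 drives false accepts to 0.0 but raises false rejects to 0.5, which is the kind of tuning for security versus convenience that the first two differentiators describe.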
Any final comments?
There is “no one size fits all” in voice biometric deployments. Every voice biometric solution is designed
to meet a specific use case, so understanding the use case / business case in which the solution will be
integrated and deployed is key to the successful use of the technology. Taking time to understand what the
technology is going to be used for and what the deployment success criteria are is essential.
Interview with Chih-Chung Kuo, Industrial Technology Research Institute
ITRI’s speech research includes speech recognition, speaker recognition, and speech synthesis
Chih-Chung Kuo, Technical Director, Division for Computational Intelligence, Information and
Communications Research Laboratories (ICL), Industrial Technology Research Institute (ITRI),
was interviewed by Bill Meisel in mid-July. ITRI is a nonprofit R&D organization in Taiwan
engaging in applied research and technical services. The organization has offices in Taiwan, San
Jose (California), Tokyo, Germany, and Russia. Dr. Kuo is a senior researcher at ICL in ITRI. As
the Technical Director of the Division for Computational Intelligence & HCI Technology, he leads a
team to develop state-of-the-art technologies and to play the leading role in providing solutions for
Taiwan industry in the field of speech and intelligent user interfaces. Dr. Kuo holds a Ph.D. in EE
from National Tsing Hua University, Taiwan.
Please briefly state the overall goal of ITRI.
Founded in 1973, Industrial Technology Research Institute (ITRI) is Taiwan’s largest and one of the
world’s leading high-tech R&D institutions. Well-positioned to be a pioneer of industry with brand new ideas
and innovation, the goal of ITRI is to promote the advancement of Taiwan’s diverse industries by:
§ Expediting the development of new industrial technologies;
§ Aiding in the process of upgrading industrial technologies; and
§ Shaping the future of industrial technologies for greater efficiency and sustainability.
Being a multidisciplinary research center, ITRI focuses on six technical fields that include Information and
Communications; Green Energy and Environment; Medical Devices and Biomedical; Electronics and
Optoelectronics; Material, Chemical and Nanotechnology; and Mechanical and related systems. ITRI has
aggressively researched and developed countless next-generation technologies including green energy,
mobile digital life, cloud computing, flexible displays, 3-D ICs, RFID, light electric vehicles and tele-care
technologies. For five consecutive years ITRI has received prestigious international awards for outstanding
technology innovation, such as the Wall Street Journal Technology Innovation Award, R&D 100 Awards, the
iF Design Award, and the Red Dot Design Award, to name a few.
ITRI makes a concerted effort to collaborate with international partners to enhance and facilitate
technology innovation and commercialization, aiming to transform Taiwan’s research capability from a
“follower” to a “frontrunner,” so as to provide leading edge opportunities for domestic industries. For more
details please refer to www.itri.org.tw/eng.
What is the main focus of ITRI’s activities in information and communication research in
particular?
ITRI’s activities in information and communications research are conducted by Information and
Communication Research Laboratories (ICL), one of ITRI’s six core laboratories. Dedicated to the vision of
enabling a Green, Intelligent, and Healthy Society, ICL is executing its industry-enabling strategies by
developing software-centric, service-oriented circuits, information, and communications technologies, in
addition to stressing system integration, in the following focused areas: Smart endpoints, mobile enabled
cloud services, intelligent vehicles and transportation systems, green energy and health care. (See figure.)
Main Focus of ITRI’s Information & Communication Research
Please describe what ITRI is doing in speech and natural language research.
ITRI’s speech research includes speech recognition, speaker recognition, and speech synthesis. Speech
recognition technologies range from small-footprint voice command recognition optimized for IC and
embedded systems to large vocabulary continuous speech recognition run on servers. Nuvoton N572F064
and Grain Media ET11A5 are two speech ICs with speech recognition technology transferred from ITRI.
ITRI’s natural language research involves two aspects: one is for spoken dialog systems like natural
language understanding and generation as well as dialog management; the other is about how to extract
information from unstructured text content retrieved mostly from the web. ITRI’s speech and natural language
research focuses on Mandarin Chinese for accents of both Taiwan and China. Since English is frequently
used in modern Taiwan society in practice, mixed-Chinese-English processing is also an emphasis of our
research. For example, an ITRI polyglot TTS system with a unified
model of Mandarin and English can produce fluent synthetic speech
for mixed-Chinese-English sentences, which should be the leader in
this kind of TTS system.
An image-based avatar technology integrated with our TTS engine can produce a text-driven “talking
head”. The synthesis of both image and voice of a true person makes the avatar look just like real video
captured from the person. We believe that this is quite a unique technology and system. Please visit our demo
site at www.ecsr.itri.org.tw/ttsdemo/vttsdemo.php, where you may enter any Chinese text and watch the
synthetic video. For more details and a demonstration please click (in Chinese only)
http://atc.ccl.itri.org.tw/speech or visit a Chinese language learning web site (in English) for foreign learners,
which has integrated almost all of our technologies at www.cola.itri.org.tw/index.php.
Are ITRI technologies available for external license?
The Taiwan government funds the R&D activities of ITRI, which in turn transfers R&D results to local
enterprises. In addition to technology transfer, ITRI offers a range of technical services to assist industries to
enhance their competitiveness, including products and process development, pilot production,
test/certification, and IP licensing. Take the year 2010 as an example; 423 technologies were transferred to
491 companies, 690 investment deals were reached and a number of start-ups were established. Technology
transfer is conducted based on the principle of fairness, openness, and efficiency, with the priority given to
domestic enterprises. Companies in jurisdictions outside Taiwan need special approval from the Ministry of
Economic Affairs. See www.itri.org.tw/eng/econtent/business/business03.
AT&T speech technology (cont.)
Continued from page 1
AT&T Watson takes input, analyzes it, performs one or more services, and returns a result in real time.
Input can be audio files, speech, gestures, face recognition, and text. (Source: AT&T)
There is a registration charge of $99 for
developers, Gilbert indicated, which will allow
developers to use all AT&T APIs, including speech,
as they become available, without a per transaction
charge through 2012. Gilbert said that AT&T is
working on pricing beyond 2012, and current
projections have pricing at about one cent for most
“small transcriptions.” He said AT&T will review
pricing as we get closer to 2013, but he does not
anticipate pricing “going anywhere but down.”
(More detailed pricing information is available
online. AT&T also sent out an eblast with a discount
code that allows getting the API with the $99 fee
waived through August.)
AT&T Watson is a network-based engine that
integrates a variety of speech capabilities, including
speaker-independent speech recognition, AT&T
Natural Voices text-to-speech, speaker verification,
natural language understanding, LLAMA-based
machine learning, search, translation, and dialog
management. AT&T says that the Watson speech
engine continuously improves accuracy by learning
different accents and speech patterns. Watson can
combine speech with other modalities, such as a
touch-screen tap (“show me the closest coffee shop
to here”) or other gesture (see figure). AT&T said in
advertising material that AT&T has accumulated
more than 600 patents on the AT&T Watson
technology.
Watson uses a plugin architecture where each
subtask is contained in its own plugin. Depending on
the task to be performed, Watson selects the right
plugins at run time, assembles them into a working
engine, and coordinates the information exchange
between the plugins. It also handles communication
with the end device.
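The plugin idea described above can be pictured with a small sketch. This is only an illustration of run-time plugin selection in general; the plugin names, the task table, and the stand-in "recognition" logic are all hypothetical and are not taken from AT&T's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical plugin registry: names and task recipes are illustrative only,
# not AT&T's actual Watson components.
Plugin = Callable[[dict], dict]
PLUGINS: Dict[str, Plugin] = {}


def plugin(name: str):
    """Register a function as a named plugin."""
    def register(fn: Plugin) -> Plugin:
        PLUGINS[name] = fn
        return fn
    return register


@plugin("recognize")
def recognize(data: dict) -> dict:
    # Stand-in for speech recognition: pretend the audio bytes decode to text.
    return {**data, "text": data["audio"].decode("utf-8")}


@plugin("understand")
def understand(data: dict) -> dict:
    # Stand-in for natural language understanding: a crude intent guess.
    intent = "search" if "closest" in data["text"] else "dictation"
    return {**data, "intent": intent}


# Which plugins each task needs, in order (assumed for illustration).
TASKS: Dict[str, List[str]] = {
    "transcribe": ["recognize"],
    "assistant": ["recognize", "understand"],
}


def build_engine(task: str) -> Plugin:
    """Select the plugins for a task at run time and chain them together."""
    stages = [PLUGINS[name] for name in TASKS[task]]

    def engine(data: dict) -> dict:
        for stage in stages:
            data = stage(data)  # each plugin hands its result to the next
        return data

    return engine


if __name__ == "__main__":
    result = build_engine("assistant")({"audio": b"show me the closest coffee shop"})
    print(result["intent"])
```

The point of the design is that a transcription-only request assembles a shorter chain than an assistant-style request, without either task knowing about plugins it does not use.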
However, only speech recognition (speech-to-text
transcription with Statistical Language Models that
are tuned for specific “contexts”) is available
initially with the current API. The API allows
sending audio and receiving back text. AT&T
indicated that native and HTML5-based Software
Development Kits (SDKs) would be available
“soon.”
The contexts make the speech recognition more
accurate and also support specialized vocabularies,
including:
§ Generic speech-to-text (general dictation,
automatically detects English or Spanish, and
returns the appropriate text transcription);
§ Web Search speech-to-text;
§ Local Business Search speech-to-text;
§ Voicemail-to-text;
§ SMS (text message) speech-to-text;
§ Question Transcription (converts questions to
text); and
§ TV Speech to Text (AT&T’s U-verse video
programming guide).
The contexts are language models built, maintained,
and tuned by AT&T.
AT&T is also offering the AT&T Application
Resource Optimizer (ARO) as open source code.
ARO is a free diagnostic tool that helps to optimize a
mobile app’s performance, speed, network impact,
and battery utilization.
Nuance Dragon version (cont.)
Continued from page 1
A new interactive tutorial is available to walk
people through exercises that demonstrate best
practices for dictating, editing and formatting to get
up and running quickly.
Dragon’s adaptive features that personalize the
speech recognition, vocabulary, and other aspects
specific to a user have been further enhanced.
Dragon 12 adds Smart Format Rules, a new
technology that adapts to the way the user prefers to
format their words. For example, if you work at
Nuance (or write a speech newsletter), you probably
want “dragon” to appear capitalized when you
dictate it. Dragon automatically detects word, phrase,
and format corrections, including abbreviations,
numbers and more, so dictated letters, emails and
documents reflect a person’s own writing style. (The
software asks if you want the replacement to always
be made.) Dragon also offers more, and more likely,
alternate word choices in its correction list. For
example, if one dictates “Eric,” the correction list
includes seven alternative spellings. Dragon 12
reminds users to use the feature that scans documents
and emails they choose to find vocabulary and usage
data for the program’s language model.
Dragon 12’s use within other programs has been
improved. If Gmail or Hotmail is used through
Internet Explorer 9, Mozilla Firefox 12 or higher,
or Google Chrome 16 or higher, Dragon 12 offers
full text control and adds commands for the most
frequent actions.
Dragon 12 adds support for the Dragon Remote
Microphone App in Android phones, previously
available only for iOS. The feature lets one use a
mobile phone as a wireless microphone over a Wi-Fi
network using the free Dragon Remote Microphone
App. Dragon 12 also supports wideband 16 kHz
Bluetooth wireless headset microphones, providing
increased accuracy through a higher-quality audio
signal.
Some people find catching errors or poorly stated
points in a document easier if they hear it read.
Dragon 12’s text-to-speech reads text now with more
control—fast-forward, rewind, speed, and volume
control. Hill indicated that the TTS itself is more
natural-sounding in this version.
An improved help menu provides access to many
resources, including the Accuracy Center, the
Performance Assistant, Dragon’s Help, the Tip of the
Day, the Sidebar, Tutorial and Interactive Tutorial, a
link to printable documentation, and links to Web
resources. A user can get help at any time by saying
“Give me help.”
When dictating into Dragon’s native dictation box
and some other applications, all of Dragon’s features
for text control are available. In the new release,
Dragon automatically displays the dictation box
when you dictate into a text field for which it does
not have full text control. After you finish dictating,
you can transfer the text from the dictation box to the
desired application quickly by voice. (This option
can be turned on or off based on your preference.)
Dragon 12 lets you specify preferences for
commands within Dragon. By giving you the option
to disable certain commands, Dragon can boost
performance, as well as avoid an unintended
command. In order to avoid unintended actions,
Dragon now, by default, requires you to say “Click”
before the name of a menu, button, check box, other
interface control, or hyperlink. You can now turn
this requirement on or off for menus separately from
other controls.
The new command “Open Top Website for
<keywords>” directly opens the top-ranked web
page for the keywords you include when you dictate
the command. You can say this command at any
time, whether or not a Web browser is currently
open. In particular, this is a convenient way to
quickly open the website of a company or institution.
Professional and Legal Editions have added
features for administrators, including a recognition
log file for each end user that provides usage information,
which can be used to give users targeted advice and to measure
return on investment. Dragon's Auto-Transcribe
Folder Agent (ATFA) manages the flow of
transcribed text and synchronized audio of digital
voice recordings to streamline third-party review and
correction.
Peter Mahoney, Chief Marketing Officer for
Nuance and Senior Vice President, General Manager,
Dragon, said that, with the new improvements, “The
technology simply disappears and your ideas flow
onto the screen in front of you.”
Dragon NaturallySpeaking 12 is available for pre-order immediately starting at $99.99, with
availability as a download on August 3.
Google voice search (cont.)
Continued from page 1
To access Google Now, one can swipe up from
any screen. Once in Google Now, one can say,
“Google” to initiate voice search.
The multiple-pane results can be displayed in
response to a search, providing additional
information that is closer to the answer to a search
than the list of web sites provides. Google says it has
improved voice search so that it can display answers
to spoken questions from sources including
Wikipedia, the CIA World Factbook, and Freebase, a
community-run knowledge database.
For most text or voice search queries, the context
is detected from the phrase, e.g., the name of a sports
team (to get news on that team) or an airline flight
(to get flight status). There are some less obvious
alternatives, e.g., “area code 215” will give the
location of that area code; “Translate to Spanish,
Where is the Palace Hotel?” will provide the
translated phrase; and “pictures of…” or “images
of…” will launch an image search.
Google has also updated their search results to
include a new scientific calculator. Formerly, the
results would just show the calculated results if you
were to type, for example, 5+5, but now the result
will pop up on a full calculator on-screen with 34
buttons including logarithmic functions.
Google is providing search capabilities that make
search similar to a voice assistant, like Apple’s Siri.
But in contrast to Siri, Google seems to be avoiding
treating the search capability as a single personal
assistant. Some insight into the philosophy behind
this was provided in an interview with Hugo Barra,
Android’s director of product management,
published in July in Wired.
Barra told Wired: “It’s very deliberately not
making jokes with you. Google is a neutral party;
it’s not your friend, secretary or sister…It is an
information retrieval entity…And it’s very important
that this entity be impartial, and adding jokes and
other mannerisms to the voice would take away from
that.” Barra said that the name of the function is
simply Google Voice Search.
Barra also emphasized that Google’s text-to-speech
voice is something special. He said that the
solution can speak in the same voice whether using a
TTS engine on the device or in the network. The
network-based solution, he said, uses a lot of speech
data to give a natural feel. He said that, in contrast to
TTS voices created for telephone applications, the
voice was created for voice search; he said the voice
was the “first conversational voice” in speech
synthesis.
Barra said that some of the things that Google did
in Jelly Bean are representative of where the
company thinks the industry should go in the mobile
space. One was the home screen experience, where
“stuff appears and actions can be invoked, without
having to dive into an application.”
The second thing he mentioned is more efficient
task switching. He gave the example of calling
someone back. That function should not be three
clicks away; it should be one click away. Google
will be trying to make easier access to all the
specialized applications evolving.
Barra also addressed the objective of the Nexus
pad computer. He said it is focused on delivering
digital content: movies, books, magazines, and
gaming. He emphasized the suitability for
high-performance gaming as a distinguishing factor, with
the device containing a gyroscope and a powerful
Graphical Processing Unit, and Google Play as an
integrated resource for content and games.
News briefs
VoiceVault releases new generation of its speaker verification
VoiceVault announced the release of the next generation of its voice biometric speaker verification
engine on July 30 (see interview, p. 13). The new voice biometric engine delivers a false accept rate of 0.01%
at false reject rates of less than 5%, the company indicated. VoiceVault says it has a verifiable equal error rate
(EER) of only 0.1%, compared to a typical EER of around 2% in other voice biometric deployments, based
on a “real-world financial services application,” used for authorizing high-value financial transactions on a
smartphone, where voice biometrics is part of a 4-factor security cocktail.
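For readers less familiar with these metrics, the sketch below shows how a false accept rate (FAR), false reject rate (FRR), and an approximate equal error rate (EER) can be computed from verification scores. The scores are made up for illustration and have nothing to do with VoiceVault's data; the function names and the convention that a score at or above the threshold means "accept" are also assumptions.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) at a given threshold.

    Assumes a score >= threshold means the speaker is accepted.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> float:
    """Approximate the EER by trying every observed score as a threshold
    and taking the operating point where FAR and FRR are closest."""
    best_gap = None
    best_rate = 0.0
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


# Toy scores (higher = more likely the claimed speaker); purely illustrative.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5, 0.7]

if __name__ == "__main__":
    far, frr = far_frr(genuine, impostor, threshold=0.5)
    print(f"FAR={far:.2f} FRR={frr:.2f} EER={equal_error_rate(genuine, impostor):.2f}")
```

Raising the threshold lowers the FAR at the cost of a higher FRR, which is why vendors often quote a FAR "at" a stated FRR rather than a single number.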
Samsung reduces search capability in its Galaxy S III smartphone, apparently in response to
Apple patent suit
When users update the software in Samsung’s Galaxy S III smartphone, responding to what seems to be a
maintenance release, they will find they can no longer use the search function for device-local information
such as contacts, apps, and other on-device material using software developed by Google as part of Android.
This is apparently a response to Apple’s patent lawsuit against the device, and is an example of how patents
on user interface features are not in the interest of consumers.
Nuance hints that its personal assistant aimed at corporations will be called “Nina”
In a brief announcement, Nuance indicated that it would release a corporate voice assistant branded Nina
this summer, aimed at supporting customer service on mobile phones.
Voxeo Labs announces strategic partnership with Deutsche Telekom
Voxeo Labs, part of Voxeo, announced their strategic partnership with Deutsche Telekom AG in
Europe. The new partnership introduces the Tropo API as an addition to Deutsche Telekom’s Developer
Garden. The Tropo API by Voxeo Labs enables developers to make and receive phone calls and text
messages from any web or mobile application, using a web-based API and pay-as-you-go pricing. Tropo
offers many advanced features including speech recognition, text-to-speech, conference calling, and call
recording, all using web technologies and programming languages developers already know.
Pronexus adds web sites to support IVR developers
Pronexus announced that it has launched two new websites: VBVoice.com, for the free toolkit that
Pronexus offers developers for building their own Interactive Voice Response (IVR) solutions, and Pronexus.com.
Gary T. Hannah, President and CEO, Pronexus, said, “The VBVoice toolkit is widely used and continues to
win in verticals like healthcare, government, financial, and consumer.”
Spoken Communications partners with Varolii to combine customer interaction applications
with Spoken’s inbound capabilities
Spoken Communications, provider of a cloud platform for contact centers (see interview, SSN, May
2012, p. 17) announced a new partnership with Varolii. Varolii’s cloud-based communication services help
organizations effectively interact with large numbers of customers and employees through voice, text
messages, smartphone applications, and email. The new partnership supports Varolii’s customer interaction
applications with Spoken’s inbound capabilities. Spoken’s enterprise cloud features the full suite of inbound
contact center functionalities, including call switching, recording, monitoring and analytics as well as the
company’s speech recognition IVR.
Chinese search giant Baidu opens tech lab in Singapore, with a partial focus on speech
applications
Baidu, which already has offices in China and Silicon Valley, opened its first R&D lab in Singapore, the
Baidu-I²R Research Centre (BIRC). The lab is a joint venture with Singapore’s Agency for Science,
Technology and Research (A*STAR). The objective is to create new technologies for the Southeast Asian
region. The research group has reportedly already developed speaker authentication and other speech
technology for the Vietnamese and Thai languages, apparently through a technology agreement with A*STAR
(not yet a commercial product, but intended for mobile devices).
The lab projects include natural language processing, information retrieval and information extraction,
and speech processing systems. These should eventually find their way into Baidu’s Box Computing and
Baidu Cloud mobile platforms.
SRI reveals voice assistant for Spanish banking group BBVA
According to a report from TechCrunch, SRI International is working on a new project for Spanish
banking group BBVA. The browser-based system “Lola” is designed to help users with online banking,
imitating the style and manner of a human bank teller. The system operates by chat or speech, at least as
demonstrated to TechCrunch. This is the sort of company-specific personal agent that I’ve predicted every
company of any size will need someday.
Northwest Multiple Listing Service selects Interactive Intelligence’s IP communications
software suite
Northwest Multiple Listing Service (Northwest MLS), a consortium of real estate brokers, has
selected Interactive Intelligence Group’s all-in-one IP communications software suite, Customer Interaction
Center (CIC). The real estate listing service is replacing a Nortel telephony system with CIC, which will
support all employees at its Kirkland, Washington, headquarters, and at the company’s 16 satellite locations.
Interactive Intelligence reseller, KRP Communications, will provide CIC deployment services and ongoing
maintenance for Northwest MLS.
Northwest MLS president and CEO, Tom Hurdelbrink, said, “When call volume is heavy in one
location, we’ll use CIC to automatically route calls to another office. Similarly, when a particular area is
affected by weather-related interruptions, calls can be routed to a different location or to employees working
from home. This will enable us to respond more quickly and consistently to our members.”
CallMiner analytics solution adds new personalization features
CallMiner, which provides customer analytics solutions for contact centers (SSN, March 2012, p. 9), has
today announced availability of Version 9.0 of its flagship Eureka! solution. Version 9.0 carries with it a new
set of features called “myEureka,” which enable personalized portal functionality to be delivered directly to
users. Scott Kendrick, VP of Product at CallMiner, said, “Until now, contact center analytics has been the
preserve of dedicated analysts. myEureka…pushes actionable business intelligence insights directly into the
workplace, to the people who need and can act on it in real-time. MyEureka delivers relevant data to
stakeholders at every level: the VP who manages contact centers and/or BPOs, the Supervisor who manages a
team of agents, and to agents themselves, to provide direct feedback on performance and where improvement
is needed.”
Hungarian telecom operator introduces voice identification in customer service operations
using Nuance voice biometric solution
Magyar Telekom provides fixed line and mobile communications services for residential (T-Home and
T-Mobile brands) and SME customers (Telekom brand) in Hungary. Magyar Telekom is the first in Hungary
to introduce voice-based identification to facilitate safer and more convenient customer service solutions.
Powered by Nuance voice biometrics, the system currently identifies 20 million customers.
Sandata Technologies Electronic Visit Verification technology to improve visibility and
oversight of home care delivery for Louisiana Department of Health and Hospitals
Sandata Technologies, a national provider of information technology solutions for the home care
industry (SSN, May 2012, p. 14), announced that the Louisiana Department of Health and Hospitals has
licensed its Electronic Visit Verification solution, Santrax Payor Management (SPM). Sandata’s partner,
CNSI, Inc., was awarded the Medicaid Management Information System Replacement and Fiscal
Intermediary Services contract for Louisiana’s Medicaid program. Through a subcontract with CNSI, Sandata
will provide the SPM solution to meet the visit verification requirements. Santrax Electronic Visit
Verification includes voice biometrics to perform speaker verification.
Easy Voice Biometrics allows finding closest match to individuals when comparing voice files
Easy Voice Biometrics is a partnership between several companies to provide forensics voice products,
with advanced support. The organization offers a product of the same name, Easy Voice Biometrics, that
allows technicians to find the closest match when comparing voice files. The product is intended
for professional audio forensics specialists to allow for quick comparison and identification of voice files.
Mathematical voice ID methods are used along with other methods including voiceprint, pitch, and formants
analysis, linguistic and auditory analysis.
The Easy Voice Biometrics product is designed for law enforcement agencies, state and private forensic
audio investigators, detectives, and lawyers to perform the following tasks:
§ Facilitate voice expert identification analysis in the performance of multi-target forensic audio
investigation by eliminating imposters and ranking the top-in-the-list speakers according to the biometric
traits likelihood probability.
§ Express attribution of the investigated speakers' voices by the proximity degree.
Google adds voice search to Google+ Local on iOS
Google modified its Google Places service in May, adding data from its purchase of Zagat and renaming
it Google+ Local. The company has now similarly revamped its iOS app (which is also now known as
Google+ Local). Voice search is now included. Zagat scores are now included alongside Google user reviews,
and one can rate businesses and locations in the app, making the feature somewhat of a competitor with Yelp.
VoiZapp app uses Android speech recognition to post to a Facebook news feed
VoiZapp Inc. launched the Android app “Friends Aloud.” The application allows Facebook participants
to access, listen to, and post status updates and comments by voice to their Facebook news feed. It uses
Android’s built-in cloud speech recognition from Google. It can also use text-to-speech capability to read
aloud Facebook news feed posts and their associated comments.
Mossberg review of Android Jelly Bean criticizes the voice assistant
Walter Mossberg, in the July 11 issue of the Wall Street Journal, gave a generally good review of
the new Google Nexus pad computer and also discussed the latest Android release, “Jelly Bean.” He
commented briefly that the voice assistant function didn’t seem to measure up to Apple’s Siri (without
providing much discussion of how he arrived at that conclusion).
Wolfram|Alpha provides answers to Samsung’s S Voice as well as Apple’s Siri
Wolfram|Alpha, which attempts to answer a broad range of questions over a long list of subjects and
contributes answers to Apple’s Siri, announced that it is also providing data to Samsung’s S Voice. The
Samsung Galaxy S III, as well as the Galaxy Note, will now include the Wolfram|Alpha knowledge base with
S Voice and the productivity app S Note.
Users will be able to get answers to factual questions. Users can ask questions such as “How high is
Mount Everest?,” “Who is Barack Obama?,” or “What is the weather like today?,” and Wolfram|Alpha will
give the correct answer.
BMW to incorporate Nuance voice command in its dashboards
BMW will be the first automaker to incorporate Nuance Communications’ Dragon Drive voice
messaging technology in its BMW 7 Series flagship luxury sedans as well as the BMW 3 Series Touring and
ActiveHybrid. Nuance is starting small. The first Dragon Drive application will be an SMS service, allowing
drivers to send a text message to a number or contact in their address books as well as dictate the message
itself. That service will start appearing in vehicles on dealer lots this summer. But soon, Nuance is expected
to start layering on more functions. BMW is implementing Dragon Drive’s initial service, messaging, which
allows drivers to listen to speech-transcribed e-mail and SMS, as well as dictate, edit, format, and send
messages via voice command.
Horizon Private Cloud provides outsourced services to Voice Automated, a Nuance Dragon
reseller
Horizon Private Cloud (HPC) announced a cloud desktop services agreement with Voice Automated, a
distributor of speech recognition applications for the healthcare, medical, and legal industries and a reseller of
Nuance Dragon products. Under terms of the services agreement, Horizon Private Cloud is providing Voice
Automated with cloud desktop hosting, application virtualization, and data protection and storage from
HPC’s data center in Irvine, CA. Future plans call for additional healthcare and legal software hosting to
serve customers throughout North America.
Robert Christiansen, General Manager of HPC, said, “We provide a unified cloud solution, meaning all
your apps and data, are available anywhere, on any device…Companies don’t want to buy infrastructure,
licensing, and employ IT staff. They simply want the service (their apps and data) delivered to them
seamlessly while only paying for what they use. We see a perfect fit with Voice Automated and the services
they provide for their healthcare and legal customers.”
Motorola’s new ATRIX HD phone for AT&T automatically goes into car mode when docked
The new Motorola ATRIX HD Android-based smartphone is available from AT&T for $99.99 with a
two-year agreement. The phone comes pre-loaded with SMARTACTIONS, a free app from Motorola that
suggests ways to automatically change the phone’s settings throughout the day. For instance, when you place
the Motorola ATRIX HD in the Vehicle Navigation Dock accessory and enable Drive Smart, it will set your
phone to vehicle mode, read your text messages aloud, and auto-reply to incoming calls and texts, as well as
provide turn-by-turn navigation.
Dictation in new Mac OS is done in the network; some personal data is used
The AppleInsider web site reviewed the speech recognition (dictation) features in the recently announced
Apple Macintosh operating system, OS X 10.8 Mountain Lion (SSN, July 2012, p. 1). The speech-to-text
works everywhere that one can type—one simply clicks on a microphone icon to dictate. The audio is not
converted to text on the Mac; the audio is sent to Apple’s servers, thus requiring an Internet connection.
AppleInsider emphasized that Apple is careful about making sure users understand the privacy issues.
Dictation is turned off by default, for example, and users are warned that the audio leaves their computer.
Apple says that it uses the data to improve the speech recognition accuracy. It also collects other data;
Apple’s warning includes:
“Your computer will also send Apple other information, such as your first name and nickname; and the
names, nicknames, and relationship with you (for example, “my dad”) of your address book contacts. All of
this data is used to help the dictation feature understand you better and recognize what you say. Your User
Data is not linked to other data that Apple may have from your use of other Apple services.”
Mercedes-Benz adds connection to Apple Siri to its COMAND navigation system
Apple is working with some car manufacturers to integrate the Siri speech recognition system used in its
iPhone to enhance the in-car infotainment options available. Working with voice control buttons that already
exist in many Bluetooth-enabled cars, one will soon be able to access a range of Siri functions like
selecting and playing music, hearing and composing text messages, using maps and getting directions, and
getting calendar information and reminders. Input will be via the car’s built-in microphone and output via the
vehicle speakers. Mercedes-Benz will apparently be first to market, having launched their new A-Class with
a specific module on the COMAND navigation system.
It will be legal to text while driving in California in the new year if you use speech recognition
California lawmakers have made it illegal for you to type text while driving, but if you have speech
recognition on your phone, it will be OK to speak your message through Apple’s Siri or a similar voice
assistant. Gov. Jerry Brown just signed a bill that clarifies the state’s texting laws. Sending and receiving text
messages through hands-free speech recognition and speech synthesis is legal.
SoundGecko web application and mobile phone app is a text-to-speech service
SoundGecko is a web application that’s essentially a text-to-speech conversion service. Drop a URL
into SoundGecko and it converts the article at the URL into speech. The web app also integrates with cloud
services and an iPhone app. The simplest way to use SoundGecko is to have it send an email with the file, but
it can be integrated with Dropbox or Google Drive for immediate syncing. The iPhone app will also directly
sync up articles you've converted, and can use a Google Chrome web browser extension to add articles on the
fly.
Veveo provides predictive search on Android phones, including personal info on phone
Veveo announced vtap QuickSearch, a predictive “search as you type” universal search application that is
context-aware and personalized for Android smartphone users. vtap QuickSearch works on all Android
devices (running version 2.1 or later) including the Samsung Galaxy III and Galaxy Nexus, which recently
lost its local-device search functionality in the latest software upgrade, possibly as a response to Apple’s
patent suit.
vtap QuickSearch searches across device content including Contacts, Calendar, Music, Text Messages,
Device Settings, and others, to seamlessly merge with online results from various Android app stores,
Wikipedia, Wiktionary, movies, local business listings and places of interest. vtap QuickSearch then
prioritizes the results based on learned user preferences. Using Veveo’s predictive search, the results are
displayed as a user types, thereby providing instant search results that update with each additional keystroke.
The application is available for download on the Google Play Android store.
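The “search as you type” behavior described above can be illustrated with a minimal sketch. It is not Veveo’s implementation: the class name `QuickIndex`, the prefix-matching rule, and the idea of ranking by how often the user has picked each content source are all assumptions made for illustration.

```python
from collections import Counter


class QuickIndex:
    """Toy search-as-you-type index (hypothetical; not Veveo's design)."""

    def __init__(self, items):
        # items: (label, source) pairs, e.g. ("Mary Smith", "Contacts")
        self.items = list(items)
        # Assumed stand-in for "learned user preferences":
        # a count of how often the user picked a result from each source.
        self.source_picks = Counter()

    def record_choice(self, source):
        """Remember that the user selected a result from this source."""
        self.source_picks[source] += 1

    def search(self, typed, limit=5):
        """Return matches for the text typed so far, best first."""
        prefix = typed.lower()
        if not prefix:
            return []
        # A label matches if any of its words starts with the typed text.
        hits = [
            (label, source)
            for label, source in self.items
            if any(word.startswith(prefix) for word in label.lower().split())
        ]
        # Sources the user picks more often come first; ties are alphabetical.
        hits.sort(key=lambda hit: (-self.source_picks[hit[1]], hit[0].lower()))
        return hits[:limit]
```

Calling `search` again after each keystroke gives the updating result list; calling `record_choice` when the user taps a result shifts later rankings toward that source.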
Veveo’s QuickSearch is an OEM application for smartphone manufacturers to embed universal search
and personalization capabilities directly on their devices. With multi-lingual capabilities for more than 50
languages in QuickSearch, the localized versions of vtap QuickSearch will soon be available for other
international stores.
Nuance Dragon Dictation and Dragon Search apps now available in Vietnam
Nuance announced that its Dragon Dictation and Dragon Search applications for the iPhone, iPod touch,
and iPad are available free in the Vietnam App Store. Supporting the Vietnamese language across regional
variations in pronunciation, the launch offers Vietnamese consumers a fast and convenient way to dictate
SMS text messages, emails, social media updates, mobile Web searches and more.
Michael Thompson, executive vice president and general manager, Nuance Mobile, said, “Dragon
Dictation and Dragon Search have already demonstrated incredible success across Asia, and we are thrilled to
expand the availability even further with the debut in Vietnam. The rapid worldwide adoption of the Dragon
apps demonstrates the strong consumer demand for voice-enabled mobile interfaces on a variety of iOS
devices.”
Samsung TV has voice recognition
Samsung’s new 75-inch ES9000 series TV has voice control in addition to many other features, such as
2D and 3D compatibility (four pairs of glasses included), and extensive Smart TV features (including Internet
streaming). The ES9000 also has a built-in Skype-compatible camera for videophone calling, as well as
support for gesture and face recognition control and Wireless Bluetooth streaming from compatible portable
devices.
HondaLink allows communicating with your car using your smartphone
The new Honda Fit EV electric car includes the company’s HondaLink system as standard equipment. If
an owner downloads the Fit EV application, he or she can communicate with the vehicle from a smartphone
running iOS or Android, a personal computer, or the interactive remote. HondaLink allows you to monitor the
Fit EV’s state of charge (and estimated range), begin charging, or see how long it will be before the car is
fully charged. To help reduce the cost of charging, the system allows you to set the charge timer to take
advantage of off-peak charging rates, as well as to pre-cool the cabin using electricity from the utility rather
than the car’s battery. A navigation system includes the location of both 120-volt and 240-volt public
charging stations.
HondaLink also allows one to stream various smartphone applications through the stereo by voice
commands, touch-screen commands, or steering-wheel buttons. It will connect drivers to cloud-based news,
information, and media feeds. HondaLink can announce the latest messages on a Facebook wall or Twitter
feed. HondaLink can read a downloaded book from your smartphone to you on the morning commute, or
announce calendar reminders verbally.
The system is also set to debut in the fall on the 2013 Honda Accord.
TalkTalk chooses Nexidia Advanced Interaction Analytics for its phone services
Nexidia announced that TalkTalk, a UK provider of home phone, broadband and mobile services to
consumers, chose Nexidia Advanced Interaction Analytics to ensure quality, compliance, and performance
consistency across its internal call centers and outsourcer network. The license agreement includes Nexidia’s
OnDemand hosted services and Managed Analytics and Business Services.
United Hospital System selects M*Modal clinical documentation system with speech
recognition
M*Modal, which is to be acquired for approximately $1.1 billion by One Equity Partners (p. 34),
announced United Hospital System has selected the M*Modal Fluency family of cloud-based solutions for
clinical documentation. As part of the agreement, United Hospital System will roll out M*Modal Fluency
Direct, M*Modal Fluency for Transcription, and M*Modal Fluency for Imaging to its facilities across
Wisconsin and northern Illinois. The selection will include speech recognition and understanding to improve
the productivity of in-house transcriptionists while boosting physician adoption and Electronic Health Record
usability.
Toni Kuehl, Director, United Hospital System, indicated that the speech recognition feature was an
essential determinant of the choice: “M*Modal Fluency Direct voice-enables our electronic health record
improving the accuracy of clinical outcomes documentation using the power of the physician’s spoken word.
Because of its usability, the technology is able to complement physician workflow and give doctors time back
in their day to focus on patients. The M*Modal Fluency Direct solution will assist our physicians in creating a
higher quality clinical document with instant turnaround time.” Physician dictation is transformed into
electronic documents that are structured, clinically encoded, searchable, and shareable.
Providence Health & Services deploys Nuance Dragon Medical 360 Network Edition
Nuance Communications announced that Providence Health & Services, ranked by Thomson Reuters
among the top 20% of best-performing health systems in the country, is deploying Dragon Medical 360 |
Network Edition across its healthcare enterprise, making medical speech recognition available at 27 hospitals
and 250 clinics. The organization-wide deployment of Dragon Medical will support Providence’s rollout of
the Epic Electronic Health Record (EHR) system by empowering clinical staff to document and navigate the
EHR by speaking.
Over the next year, Dragon Medical will be seamlessly integrated with Epic for approximately 8,000
Providence clinicians. Once fully deployed, clinicians will be able to interact with, document and navigate
through the EHR simply by using their voice—a workflow that Nuance indicates is proven to be more
efficient and natural than typing alone, leading to faster EHR system adoption and improved physician
satisfaction with EHR use. With a voice-enabled EHR, documentation can be done by speaking in freeform or
to trigger various clinical templates and medical record review, and sign-off can occur in real-time—
eliminating the time lag and costs associated with medical transcription.
Janet Dillione, executive vice president and general manager, Nuance Healthcare, noted that the
company’s voice-driven clinical documentation solutions are being used by more than 450,000 clinicians
across 10,000 healthcare facilities.
Terra Nova provides Health Sciences North with transcription and speech recognition editing
services
Health Sciences North (HSN), a major healthcare provider based in Sudbury, Ontario, Canada, has
selected Terra Nova as their outsourced clinical documentation partner. Terra Nova provides clinical
documentation services to hospital and clinic facilities in Canada and the United States. Terra Nova said it
achieves an accuracy rate of more than 99%.
4medica’s cloud-based Electronic Health Record adds medical speech recognition from
Nuance
4medica announced that its cloud-based Integrated Electronic Health Record (4medica iEHR) will
include medical speech recognition using Nuance Healthcare's 360 | Development Platform. Completely web-based, the combined solution will allow physicians to document care via voice anytime or anywhere they can
connect to the Internet. Oleg Bess, M.D., 4medica CEO, said, “It's an easy, natural way for physicians to take
advantage of mobile technology, and, as a result, increase their productivity, enhance care delivery, and
improve the accuracy and timeliness of clinical documentation.”
4medica's iEHR enables hospitals, physicians, labs and health information exchanges (HIEs) to aggregate
laboratory, imaging, pathology, e-prescribing and inpatient data from multiple sources into a single patient-centric record. With the new integrated speech recognition feature, clinicians can speak into the 4medica
iEHR Note Writer and add their own notes to the template. Alternatively, they can populate open fields of the
clinical note by voicing their desired selection. They can also view the narrative note in real-time on the
screen and make corrections by manually editing, deleting, or adding to the dictated note.
me2me releases a new version of its digital dictation app for iOS and BlackBerry devices,
targeted at the healthcare market through M*Modal partnership
Me2me Corp., a software company delivering mobile dictation, transcription and speech recognition
solutions (SSN, June 2012, p. 10), has launched a new version of its Frisbee Smart App for iOS and
BlackBerry. It communicates with the Frisbee Enterprise Server solution, enabling users to send dictation that
is immediately available for the transcription/editing staff. Thanks to a recent partnership with M*Modal (p.
34), healthcare professionals can also now make full use of the Frisbee Enterprise iOS or BlackBerry App to
record with high quality audio totally suitable for use with speech recognition in the cloud.
Leon Medical Centers selects IDS for speech recognition, mobile dictation, and workflow
solutions
Leon Medical Centers, a multi-specialty healthcare services provider for more than 38,000 Medicare
patients in Miami-Dade County, has chosen Integrated Document Solutions (IDS) to streamline its
radiology reporting using IDS’s cloud computing workflow portal, mobile applications, and speech
recognition technology. Leon’s radiologists began using IDS’s speech recognition application Voice2Dox
earlier this year to document diagnostic imaging studies, reducing turnaround times and transcription costs.
By integrating IDS’s AbbaDox ecosystem with Leon’s patient scheduling and Picture Archiving and
Communications System (PACS), providers are able to unify worklists and standardize reports across Leon’s
locations and facilities throughout the county.
Maureen Desoria, RN, BSN, JD, Director of Clinical Services, Leon Medical, said, “Because we serve a
predominantly geriatric population, timely and accurate diagnosis is critical. The AbbaDox system interfaces
with our existing technologies and Electronic Medical Record, which allows radiologists and physicians to
provide treatment and secure our patients’ health.”
Google search box in Chrome web browser displays calculator when calculation is entered,
allows voicing equation
Google launched a new feature in its search engine that displays a scientific calculator as well as the
results of a calculation. Voice search in mobile devices or the Chrome web browser can be used to input
calculations without touching the keyboard. The calculator has a full complement of scientific functions,
including sin, cos, tan, log, exponential, and square roots.
SpeakGlobal adds text-to-speech to its English language learning site for Japanese learners
SpeakGlobal in Japan, which provides web-based “chat robots” with speech recognition for English
language learning, announced the addition of text-to-speech (TTS) to its SG World global chat site. Now, site
visitors will find a text-to-speech voice in designated chat rooms. Visitors simply type a text message on the
screen. The written text will immediately appear above their avatar, and simultaneous audio of the text is
heard automatically. The TTS speech features both male and female voices with natural, standard American
English pronunciation.
Raytheon BBN awarded DoD contract to develop a foreign-document translation system
The Defense Advanced Research Projects Agency (DARPA) has awarded Raytheon BBN
Technologies, a wholly owned subsidiary of Raytheon, an additional $5.9 million in funding under the
Multilingual Automatic Document Classification, Analysis, and Translation (MADCAT) program. This
award follows Raytheon BBN’s participation in the first four years of the MADCAT program.
The Raytheon BBN team’s goal is to create a prototype system that provides accurate, relevant, distilled,
actionable information to military commands and personnel. If successful, the system will automatically
convert foreign language text images, such as handwritten notes and machine-printed documents, into English
transcripts without the use of linguists and analysts. When human analysis is necessary, linguists and analysts
would be able to use the technology to more effectively and efficiently explore the content of documents of
interest. Under the contract, BBN is tasked to advance previous work under the MADCAT program to refine
a laptop-deployable prototype translation system, integrate optical character recognition with Raytheon
BBN’s translation and distillation techniques, and develop novel methods to process handwritten text.
Carnegie Speech provides English language training with speech recognition for training
institute in Dubai
Carnegie Speech, which provides English assessment and instruction software, announced its English
language learning technology was selected by ITEC, an academic/training institute in Knowledge Village,
Dubai, UAE, to help meet the growing demands of spoken and aural English skills. ITEC will use Carnegie
Speech’s NativeAccent English speech training software to improve student spoken-English skills.
NativeAccent software combines advanced speech recognition and intelligent tutoring technologies to
improve English speaking and listening skills for personnel at multinational corporations and students at
international academic institutions. Featuring personalized learning paths based on student mother-tongue,
gender and English proficiency, as well as real-time analysis pinpointing student English errors and
delivering immediate remedial instruction, NativeAccent minimizes English accents among multinational
English speakers to improve communications, reduce errors and increase business efficiencies.
Goya Foods chooses Wavelink for voice-enabled warehouse picking solution
Goya Foods, which makes and distributes more than 2,000 products globally and has a total of 13
distribution centers, has chosen Wavelink’s Speakeasy voice-enabled stock picking in its new warehouse
management system. Luis Ramos, general manager at Goya Foods, explained, “We needed to be more
efficient across the board. Speakeasy's voice enabling technology gives picking greater efficiency, more
accuracy, and is safer than traditional picking. We went from 40 - 60 mispicks a night to almost non-existent
mispicks a night.” He added, “Our employees embraced the change. They immediately saw the safety and
efficiency benefits of voice picking and understood that it ultimately gave them more quality time off the
job.”
Stephen Bemis, vice president of worldwide sales at Wavelink, expanded on the benefits: “Speakeasy's
text-to-speech and speech-to-text technology gives companies exactly what they need in voice software with
tremendous flexibility. Also, the flexibility of the product doesn't require a user to be assigned to a specific
device and can be used across multiple shifts. Any individual can pick up a device, no matter their native
language and begin working immediately. This was important for Goya Foods where both English and
Spanish are spoken.”
Wavelink was recently acquired by LANDesk Software.
W3C Multimodal Interaction Working Group publishes “Registration & Discovery of Multimodal
Modality Components in Multimodal Systems: Use Cases and Requirements”
The Multimodal Interaction Working Group has published the First Public Working Group Note. The
latest published version can be found at www.w3.org/TR/mmi-discovery.
The background and objectives of this WG Note are as follows:
§ The users of mobile phones, personal computers, tablets or other electronic devices are increasingly
interacting with their devices in a variety of ways: touch screen, voice, stylus, keypads, etc.
§ Today, users, vendors, operators and broadcasters can produce and use all kinds of different media and
devices that are capable of supporting multiple modes of input or output. Tools for authoring, edition or
distribution of Media for Application developers are well documented. But there is a lack of powerful
tools or practices for a richer integration and semantic synchronization of all these media.
§ To the best of our knowledge, there is no standardized way to build a Web application that can
dynamically combine and control discovered modalities by querying a registry based on user-experience
data and modality states. This document describes design requirements that the Multimodal Architecture
and Interfaces specification needs to cover in order to address this problem.
Shanghai Zhi Zhen Internet Technology sues Apple in China over Siri
Shanghai Zhi Zhen Internet Technology, a Shanghai-based company with personal assistant software,
has sued Apple over its Siri technology, claiming patent infringement. The company is the developer of
software called “Xiao i Robot” that communicates through voice, and can answer users’ questions while also
holding simple conversations. In 2004, the company applied for a patent in China covering the technology,
and was granted it in 2006. Apple’s Siri became available in China early this year, when the
iPhone 4S was officially launched. Last month, Apple said it had incorporated Chinese Mandarin and
Cantonese languages into Siri.
International Research Consortium (U-STAR) launches translation app
International Research Consortium (U-STAR, an organization with members from 23 countries)
announced a network-based speech-to-speech translation application, “VoiceTra4U-M.” Singapore’s
Institute for Infocomm Research (I2R), an institute of the Agency for Science, Technology and Research
(A*STAR), was a founding member of U-STAR. The application allows speech translation in 23 languages
and will allow up to 5 users to chat simultaneously.
U-STAR, currently comprised of 26 institutes from 23 countries, has been conducting ongoing research
on speech translation. U-STAR and its members have collaboratively developed the multilingual speech
translation system to provide translation services via a publicly released client application, by connecting the
servers of U-STAR member institutes. More languages will be available when other research entities
participate by plugging in the U-STAR speech translation communication protocol libraries. U-STAR also
seeks to utilize the log data of speech translation collected from field experiments, helping each research
organization raise their accuracies in speech translation technology, as well as encouraging business
opportunities for the speech translation service to be cultivated in various markets.
Voxbone provides phone network for Lexifone speech-recognition-based realtime translation
service
Voxbone announced it is enabling Lexifone to launch a real-time language-translation service that
combines Lexifone’s phone-interpreter technology (SSN, March 2011, p. 6) with Voxbone’s IP voice
services. The new Lexifone service, aimed at consumers and businesses, allows each party in a telephone call
to speak and be heard in his or her chosen language. It is being launched worldwide after availability to
selected users in closed beta trials for the past five months.
Lexifone is a service, not an app, that may be used on any phone—landline, mobile or VoIP—without an
Internet connection or software download. To reach Lexifone’s translation bridge, users simply dial local
telephone numbers that Voxbone will make available in more than 50 countries, then select the languages and
the numbers of the people they want to call in 120 countries. Lexifone translates the caller’s spoken language,
such as French, into the language the called party can understand, such as German, and vice versa. The
Lexifone automated phone-interpreter service, which currently accommodates translations into seven
languages and 15 dialects, relies on Voxbone’s access numbers and all-IP network.
Dr. Ike Sagie, Lexifone CEO and founder, said, “The caller’s connection to our language-translation
platform must be crystal-clear because sound quality is crucial for accurate voice recognition.”
Microsoft touch keyboard in Windows 8 corrects some touch mistakes
In a company blog, Kip Knox of Microsoft discussed the touch keyboard in Windows 8. In Windows 8,
Microsoft set out to improve on text input support. The result was a standard touch QWERTY layout in
English. Knox said that exploring other options led back to the keyboard.
Microsoft researchers conducted an in-depth study in which they observed people “living with” tablets
over a period of time. Microsoft found that, when typing on a tablet, most people either set it on their lap or a
table and multi-finger type, or hold it in their hands and type with their thumbs, or hold it with one hand and
“hunt and peck.”
The standard touch keyboard layout is optimized for laying the tablet down and multi-finger typing, and
also works well for typing with one hand. Microsoft also introduced a new layout it calls the thumb keyboard
(first shown at the very first preview of Windows 8 about a year ago), which is
designed for holding the tablet with two hands and typing with your thumbs. This keyboard is adjustable in
size, to accommodate different hand sizes. An interesting observation from Microsoft’s posture research is that people
frequently switch postures, and that a posture switch is often seen as a positive thing, as people move about to
remain comfortable. So the keyboard layouts also take into account what it would be like to type for a period
of time (say, an email to your mom) and switch postures while you do it. You might start by typing with the
tablet lying on the coffee table, for example, but then you might tire of that posture and pick up the tablet, lie
back on the couch, and interact with two thumbs. Microsoft added a version of the keyboard layout that made
the thumb version easier, e.g., adjusting the size of the keyboard to a hand size.
If you lay down a piece of glass and type on it, Microsoft noted, you get no feedback, as you do on a
physical keyboard; there is no indication for where to position your hands, and there is no indication of
whether you’ve hit a target or not. But they also observed that a touch keyboard can do things that a physical
keyboard can’t. The keys change color when you touch them, and they trigger a subtle sound, which
Microsoft tried to make minimal to avoid irritation. Knox indicated that Microsoft explored haptic feedback
(a vibration of the device based on input), but most people find the current state-of-the-art haptics somewhat
irritating when typing pieces of any length.
The Windows 8 touch keyboard compensates for some typing errors with what Microsoft calls the “touch
model.” When a user taps a key on the touch keyboard, the software detects the coordinates of the touch and
maps it to the geometry of the keys. But the press sometimes migrates outside the boundaries of the intended key. The key press is
first compared against a model that assesses the likelihood that you intended to strike that key or a key near it
and uses a model of typical errors along with lexical information to correct the touch.
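To make the idea of a touch model concrete, here is a minimal sketch that combines where the finger landed with what word is likely being typed. It is not Microsoft’s algorithm: the key coordinates, the Gaussian spread `sigma`, the tiny word list, and the `floor` weight are all assumptions chosen for illustration.

```python
import math

# Hypothetical key centers (x, y), measured in key widths, for part of one row.
KEY_CENTERS = {"f": (3.0, 1.0), "g": (4.0, 1.0), "h": (5.0, 1.0), "j": (6.0, 1.0)}

# Hypothetical lexicon standing in for "lexical information".
WORDS = ["the", "then", "there", "tree", "two"]


def spatial_likelihood(touch, center, sigma=0.5):
    """Gaussian-shaped score: highest at the key center, falling off with distance."""
    dx = touch[0] - center[0]
    dy = touch[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))


def lexical_weight(prefix, key, words=WORDS, floor=0.2):
    """Weight a key by how many known words continue prefix + key.

    The floor keeps keys that match no word selectable, so a clean hit
    on a key is not overridden by the lexicon.
    """
    candidates = [w for w in words if w.startswith(prefix)]
    if not candidates:
        return floor
    matches = sum(1 for w in candidates if w.startswith(prefix + key))
    return floor + matches / len(candidates)


def resolve_touch(touch, prefix):
    """Pick the key with the best combined spatial and lexical score."""
    scores = {
        key: spatial_likelihood(touch, center) * lexical_weight(prefix, key)
        for key, center in KEY_CENTERS.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, a touch at (4.4, 1.0) lands slightly closer to “g” than to “h”. With nothing typed yet, `resolve_touch` returns “g” on geometry alone; after the user has typed “t”, it returns “h”, because several words in the list begin with “th” and none begin with “tg”.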
Siri knows which is the best phone now
There was a well-publicized “oops” when Siri answered the question, “What is the best smartphone ever?”
with a Windows Mobile phone. She’s been fixed. Here’s how a conversation currently goes:
Q: What is the best smartphone ever?
Siri: You're kidding, right?
Q: What is the best smartphone ever?
Siri: Wait, there are other smartphones?
Q: What is the best smartphone other than iPhone?
Siri: Would you like to search on the Web for "What is the best smartphone other than iPhone?"
Q: No, what is the best smartphone ever?
Siri: The one you’re holding now.
National Federation of the Blind sues over US State Department’s purchase of Amazon
Kindles, citing limitations of the text-to-speech feature
The National Federation of the Blind filed a complaint with the Office of Civil Rights for the United
States Department of State, alleging that the State Department’s plan to purchase and deploy 35,000 Amazon
Kindles throughout the world violates federal law because blind people cannot independently access and use
the devices or their content. The State Department has announced plans to purchase 35,000 of Amazon's
dedicated e-reading devices under a sole-source contract, at a cost of $16.5 million, as part of an international
learning program being referred to as the Kindle Mobile Learning Initiative. The aim of this program is to
create a global e-reader program that introduces aspects of U.S. society and culture directly to young people,
students, and international audiences and to expand English-language learning opportunities abroad. The plan
will involve deploying the Kindles to embassies, libraries, and other entities around the world. The complaint
also alleges that a previous deployment of six thousand Kindles to State Department facilities throughout the
world violates the law.
Of the Kindles currently available, not all are capable of speaking the content of books. While the State
Department proposal specifically calls for the inclusion of this feature, the contract makes no reference to the
department’s obligation to purchase accessible technology under Section 508 of the Rehabilitation Act or
otherwise require that the devices procured be accessible to the blind. Blind readers cannot independently
access the text-to-speech reading and voice-guided menu features of the Kindle, the complaint alleges, and
cannot independently navigate within a book once it is opened, meaning that they must simply read it from
beginning to end.
Accessible Media service adds text-to-speech
Online technology that can read the text on a website aloud to its visitors has been installed at
www.ami.ca, the website of the Accessible Media Inc. specialty service. AMI operates an audio and TV
service in addition to its website. The not-for-profit multimedia organization serves Canadians who require
online reading support; the audience includes people with vision loss, people with dyslexia and other
perception difficulties, as well as people learning English or French as a second language. The text-to-speech
audio software, called BrowseAloud, features a selection of high-quality, natural-sounding voices in both
official languages.
A new Described Video (DV) Guide provides a list of described television programming across Canada.
It was developed in conjunction with the Canadian Radio-Television and Telecommunications
Commission’s (CRTC) Described Video Working Group and the Canadian Association of Broadcasters
(CAB), and designed to build awareness of described video programming and enable blind or low vision
customers to plan their television viewing.
Microsoft improves accessibility TTS function in Windows 8
Microsoft has made some changes to the text-to-speech tool Narrator, one of its new
Windows 8 accessibility tools, in the Consumer Preview. Most of them concern the new touch features, which let users move a finger
across the screen to hear the icons or content read aloud, and then tap to select. The tools were meant to make touch
screens easier to navigate for the visually impaired. To make the connection between the touch and the audio
more obvious, Microsoft has added quick audio cues to provide feedback for actions, and it has streamlined the
gestures people use to navigate.
Proloquo2Go assistive software offers children with speaking disabilities artificial speech
Proloquo2Go from AssistiveWare is an Augmentative and Alternative Communication (AAC) solution
for iPad, iPhone, and iPod touch for people who have difficulty speaking or cannot speak at all. Speech can
be generated by tapping buttons with symbols or typing using the on-screen keyboard with word prediction.
The product is often used by adults and children diagnosed with autism, cerebral palsy, or Down syndrome,
as well as stroke victims, but previously the only voice options for children were adult voices or those
electronically altered to sound like a child’s voice. Now the app offers real children’s voices.
The American children’s voices, called Josh and Ella, were recorded by actual children over the course of
several days and include recordings for 14,000 words to match the preloaded images. (Two British children’s
voices, Harry and Rosie, are also available.)
Google researches computing methods using simulated neural networks
Google’s X Lab, headed by co-founder Sergey Brin, is the lab that produced the glasses with the
holographic display that received so much attention. Google fellow Jeff Dean and visiting faculty Andrew Ng
(from the Stanford Artificial Intelligence Lab) reported other research in a blog post. The researchers reported
on a pattern recognition approach using learning algorithms that adjust the parameters of a simulated neural
network. The “artificial neural network” was simulated with 16,000 processors and a billion connections in
Google data centers. The network was shown YouTube images for a week to see what it would learn—an
“unsupervised learning” or “clustering” approach. Apparently, it focused in on cats and learned to recognize
them in videos. (What does this say about YouTube videos?) Some of the research has been published.
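For readers unfamiliar with the term, the idea of “unsupervised learning” or “clustering” can be illustrated with a toy example. The sketch below is not Google’s method (which used a large simulated neural network); it swaps in k-means, a much simpler clustering technique, only to show that groups can emerge from data that carries no labels. The function name, the sample points, and the choice of starting centers are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, k, iterations=10):
    """Group unlabeled points into k clusters (toy k-means, illustration only)."""
    # Assumption: use the first k points as starting centers so the run is deterministic.
    centers = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 2: move each center to the mean of the points assigned to it.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical data: two loose groups of points, with no labels attached.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, k=2)
print(clusters[0])
print(clusters[1])
```

No one tells the program which points belong together; the grouping falls out of the distances alone. That is the sense in which the Google network was left to “see what it would learn.”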
James and Janet Baker still pursuing compensation for their Dragon speech recognition
technology
Old-timers like your editor remember the raw deal speech recognition pioneers Jim and Janet Baker got,
when shortly after selling their company, Dragon Systems, to Lernout & Hauspie for stock in L&H, it was
revealed that the founders of L&H, who eventually went to jail, were cooking the books. L&H filed for
bankruptcy, and the Bakers were left with nothing. ScanSoft (now Nuance Communications) bought the
assets of L&H—that’s how they got into the speech recognition business—and inherited the Dragon
technology. The Dragon brand is widely used by Nuance today.
The New York Times reported the full story, and suggested the Bakers might finally get some
compensation. The Bakers are suing Goldman Sachs, the investment firm that handled the sale (and which
did get its commission), and a resolution may be near.
Robots don’t just beep to warn you of movement, they now talk
RMT Robotics Ltd. introduced ADAM RAP, a programmable sound system for the ADAM mobile
robot that uses interactive voice messages and a mobile “vehicle in motion” jukebox. ADAM promotes lean
manufacturing efficiency in tire manufacturing facilities by automating component handling and orchestrating
work-in-process (WIP) logistics, delivering what is needed in the exact time and quantity required. A
reactive audio playback application plays various sound bites or text-to-speech audio based on the specific
function the robot is undertaking. Although all robots have a standard beeper-based “vehicle in motion” alert
system mandated by international safety standards, noise proliferation combined with monotonous beeps
diminishes worker alertness. ADAM RAP’s design promotes a safer work environment and enhances worker-robot interaction.
Analyst compares Siri speech recognition search to Google text search
It was widely reported that Piper Jaffray analyst Gene Munster compared Apple’s Siri to Google and
found Siri far inferior in answering 1600 questions. A note to the firm’s clients reported:
§ Google understands 100% of the questions (meaningless, since they are typed in for the Google case—
wouldn’t it have been a fairer comparison to use Google’s Voice Actions, the closest it has to a Siri
equivalent?).
§ Google replies accurately 86% of the time (What is “accurate” if the results are a list of web sites? Does
this mean that the user could filter the web sites to find the answer, adding human intelligence to the
mix?).
§ Siri comprehends 83% of queries in noisy conditions, 89% in a quiet room (presumably meaning the
speech recognition transcribed the speech accurately, although an exact transcription might not be
required to get the correct answer. Were the errors of consequence, or “the” replaced by “a”?).
§ Siri answers accurately 62% of the time on the street and 68% in a quiet room (This is loaded if 17% and
11% of the questions respectively are the wrong question, as reported, leaving no chance of a correct answer.
And Siri attempts to get the answer directly, unlike Google at this point, and, when that isn’t possible,
does a Google search!)
For this newsletter editor, the results were an extreme example of comparing Apples and oranges. I can draw
no conclusions from what I’ve seen of this report, although in fairness the full report might be more
informative.
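To put a rough number on the objection that misrecognized questions have no chance of a correct answer, one can ask how often Siri answered accurately when it did understand the question. The short calculation below is a back-of-the-envelope sketch, not something from the analyst’s note. It assumes every accurate answer came from a correctly understood question, and the function name is made up for illustration.

```python
def accuracy_when_understood(answered_pct, understood_pct):
    """Percent of correctly understood questions that were answered accurately.

    Assumption: a misunderstood question never yields an accurate answer.
    """
    return 100.0 * answered_pct / understood_pct


# Figures as reported: 62% accurate / 83% understood on the street,
# 68% accurate / 89% understood in a quiet room.
street = accuracy_when_understood(62, 83)
quiet = accuracy_when_understood(68, 89)
print(f"street: {street:.1f}%, quiet room: {quiet:.1f}%")
# prints: street: 74.7%, quiet room: 76.4%
```

Under that assumption, Siri’s answer accuracy on questions it actually heard correctly would be in the mid-70s, which is still below the 86% reported for typed Google queries but a narrower gap than the headline figures suggest.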
Loading the dishwasher is still a job!
In a wonderful analogy in an interview with the MassDevice web site, Dr. Nick van Terheyden, chief
medical information officer for Nuance Communications, points out the problem with high expectations for
Electronic Medical Records (EMR). While noting the need for and potential power of good, accessible
medical content for doctors, van Terheyden pointed out that no one wanted to wash dishes until the
dishwasher was invented, and now no one wants to load the dishwasher. The EMR has a similar problem—no
one wants to load it with some of the most valuable data. Of course, Nuance is working on a solution with
speech recognition and natural language processing.
Taiwan's National Cheng Kung University files a patent lawsuit against Apple over Siri features
Taiwan’s National Cheng Kung University has filed a lawsuit in the US against Apple claiming that the
company has infringed two patents it holds on speech recognition that it believes are related to Apple’s Siri
voice assistant.
Statistics and Surveys
Smartphones in use worldwide to exceed 2.4 billion in 2016
Yankee Group forecast in July that smartphones in use worldwide will exceed 2.4 billion by 2016, rising
almost linearly from 1.12 billion in 2012.
Smartphone shipments to grow 38.8% this year to 686 million units
Research firm IDC expects global smartphone shipments to grow 38.8% this year to 686 million units.
Approximately three quarters of the world’s population now has access to a mobile phone
Approximately three quarters of the world’s population now has access to a mobile phone, according to a
new study from the World Bank. Fewer than 1 billion mobile subscriptions were active in 2000, while there
are six billion subscriptions active today. Last year alone, mobile users downloaded more than 30 billion
apps, the study estimated. The majority of today’s mobile subscriptions (5 billion) are in developing
countries. World Bank Vice President for Sustainable Development Rachel Kyte said, “Mobile
communications offer major opportunities to advance human and economic development—from providing
basic access to health information to making cash payments, spurring job creation, and stimulating citizen
involvement in democratic processes.”
325 million Android phones expected to be sold worldwide in 2012
A June Yankee Group forecast predicted that 325 million Android devices and 163 million iPhones
would be sold worldwide in 2012. Results from a recent survey by Kantar Worldpanel ComTech show
Android’s share rising to 84.1% in Spain, and it has at least half of the smartphone sales in Great Britain,
Germany, France, and Italy.
Samsung Galaxy S3 hits 10 million units in sales within two months
Samsung’s Galaxy SIII has a voice assistant that is a challenger to Apple’s Siri (SSN, July 2012, p. 1).
Perhaps indicating the value of the voice assistant in marketing, the model achieved the company’s stated
goal of 10 million sales within the first two months. The success of the SIII was highlighted in Samsung's
recent quarterly earnings statement, in which it reported a Q2 2012 operating profit of $5.9 billion.
Android has 77% share of China’s smartphone market
Android took almost 77% of sales in the first quarter of 2012, according to Beijing-based Analysis
International, which specializes in the Chinese market. A year ago, in the first quarter of 2011, Nokia’s
Symbian operating system was the market leader with 42.5% of the market against Android's 33.6%.
Biometric security to become a “must have” on all smart mobile devices, market research firm
claims
Goode Intelligence predicts that mobile biometric security will move from “an interesting concept” to a
“must-have” feature for all smart mobile devices (SMDs). Alan Goode, founder and Managing Director of
Goode Intelligence, said, “Last year, we forecasted that the mobile biometric security market would grow to
39 million users by 2015. This was based on the expectation that initial growth would come from two
biometric modalities; embedded fingerprint sensors and voice biometrics.”
Goode Intelligence predicts that mobile biometric security “will become a standard feature in SMDs as
these devices become the prime computer in both our personal and business life. Whether it is for protecting
the physical device or for providing strong authentication and identity verification for a remote service, such
as NFC-based mobile payments, mobile phone-based biometrics can offer a wide variety of solutions—the
third factor in the palm of your hand.”
Apple iPhone maintains consumer interest over Android
Despite increasing sales numbers for Android, a Yankee Group survey found that in May 2012 more
people said they intended to buy an Apple iPhone than an Android phone by a margin of 4%.
If you are under 34, you most likely use your mobile phone as your primary phone
According to Yankee Group, more than 60% of survey respondents from 18-34 said their primary phone
was their mobile phone. In the age group 35-44, the percentage dropped to 47.6% and was below 30% in the
higher age groups.
Voice search from Google on top 10 list of downloaded apps
Voice Search from Google (“Voice Actions”) was ranked No. 6 among free Android apps recently,
according to research from Google Play. The app has ranked high for many weeks in a row. Google’s
summary for the app: “Search the web and your phone by voice and control your phone with Voice Actions.
Quickly search your phone, the web, and nearby locations by speaking, instead of typing. Call your contacts,
get directions, and control your phone with Voice Actions.”
The mobile ad market could reach $18.3 billion by 2015
Even as smartphones account for 10% of the time spent consuming media, they draw only 1% of
advertising spending in the U.S., according to eMarketer. Bloomberg News reported in July that the picture
is changing as more technology companies, including the social media powerhouses, create mobile ad
products and woo big brands such as Target, American Express, and Coca-Cola. Bank of America Merrill
Lynch predicts the mobile advertising market will surge to $18.3 billion in 2015, from $3.6 billion last year.
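A forecast like this is easier to compare with others when expressed as a compound annual growth rate. The sketch below shows the arithmetic. It assumes “last year” means 2011, so that the forecast spans four years, and the function name is an illustrative choice rather than anything from the report.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value over the given years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Mobile ad market in billions of dollars: $3.6 billion (assumed 2011) to $18.3 billion (2015).
ad_growth = compound_annual_growth_rate(3.6, 18.3, years=4)
print(f"implied growth: {ad_growth:.1%} per year")
# prints: implied growth: 50.2% per year
```

In other words, the forecast implies the market growing by roughly half again each year for four years running.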
Consumers show mixed interest in mobile coupons
With advertisers struggling to find the right way to reach consumers using mobile phones, an April 2012
survey by Yankee Group will interest them. The survey asked, “Thinking about mobile couponing, please
rate your experience or interest in the following activities on your mobile phone.” About two-thirds of
respondents said they would be interested in getting coupons on mobile phones, but only if “it were free.”
Global mobile app store revenue to exceed $34 billion in 2016
Yankee Group estimated that global revenue from app stores would rise from $13.2 billion in 2012 to $34
billion in 2016.
Hispanic community increasingly using mobile devices as a primary means of Internet access
In July, the Hispanic Institute and Mobile Future published a report revealing that Hispanics are
increasingly turning to mobile devices as their primary means of accessing the Internet. The report concludes
that policymakers must consider Hispanics’ reliance on mobile devices as they implement a national
broadband policy by making more wireless spectrum available, ending regressive taxes on broadband users,
and continuing to support the Lifeline/Link-Up programs (which offer discounts to qualified, low-income
wireless customers).
Nearly six out of 10 parents of children aged 8-12 have provided their children with cell
phones
Fifty-six percent of parents of children aged 8-12 have provided their children with cell phones,
according to a new survey conducted by ORC International for the National Consumers League (NCL).
Of those parents, roughly a quarter say they are facing higher bills than they had expected to pay in order for
their child to have a cell phone. The top three reasons parents buy cell phones for tweeners are safety (84%);
tracking the child's after-school activities (73%); and “child asked for one” (16%).
Vocalabs finds that making it hard for a customer to reach an agent serves no purpose
In a free July report, Vocal Laboratories (Vocalabs) surveyed over 8,000 customers immediately after a
customer service call. Among other findings, the report found that making it hard for a customer to leave an
automated system to reach an agent served no purpose and was counter-productive. For example, among
customers who reported that an automated system made it hard to reach a person or find the right option, only
2% successfully used self-service. The majority did eventually reach a person, and the rest hung up without
getting what they needed. The survey also concluded that customers are much better than the automated
system at deciding when they need to talk to a person and when they can use self-service.
Contact center campaign survey concludes that the phone remains the most popular
communications channel
Infinity CCS, a global provider of contact center technology solutions, has announced the results of its
2012 Contact Center Campaign Survey designed to reveal how effective contact centers are in setting up new
customer campaigns or services. The key results:
§ 62% of contact center campaigns are set up in under 3 weeks.
§ Phone is still the most frequently used communication channel for both inbound and outbound contact
(used in over 70% of both inbound and outbound projects). Other channels included email, web contact
forms, post, and social media.
§ 85% say campaign development software makes it ‘easier’ to launch new projects or services.
A variety of issues flagged in a survey of contact center professionals
A recent study, conducted by the International Customer Management Institute (ICMI) and sponsored by
inContact, surveyed more than 500 contact center professionals from more than 20 countries. Nearly 70% of
respondents identified “meeting service level agreements” as a measure of success in a contact center today.
More than 40% of contact centers indicated they are experiencing agent attrition, and 25.3% indicated that
they were experiencing customer attrition. One major challenge, identified by 69.8 percent of contact
centers, was increased complexity due to the proliferation of new channels and increasingly multichannel
customers. Less than half of those respondents (32.7 percent), however, had upgraded their contact
centers to deal with channel proliferation.
600 million smartphones projected to support gesture recognition in 2017
A new study from ABI Research forecasts 600 million smartphones will be shipped with vision-based
gesture recognition features in 2017.
Financial Notes
Nuance reports Vlingo financials
On July 23, Nuance Communications (Nasdaq: NUAN) submitted a report to the SEC providing audited
financial results for the recently acquired Vlingo. Revenue for the quarter ended March 31, 2012 was reported
at $1.3 million with a net loss of $6.3 million. The report listed, as of March 31, 2012, total assets (mostly
cash and cash equivalents) of $48.7 million, total liabilities of $54.4 million, and redeemable preferred stock
valued at $79.7 million.
M*Modal to be acquired for approximately $1.1 Billion by One Equity Partners
M*Modal (NASDAQ/GS: MODL), a provider of integrated clinical documentation solutions for the U.S.
healthcare industry (SSN, June 2012, p. 45), announced its financial results for the three months ended March
31, 2012. Net revenues increased 5.5% to $117.4 million for the first quarter of 2012 compared with
$111.2 million for the first quarter of 2011. Adjusted EBITDA for the first quarter of 2012 was $26.6 million,
or 22.6% of net revenues, compared with $26.7 million, or 24.0% of net revenues, for the first quarter of
2011. Net loss for the first quarter of 2012 was $(2.9 million), or $(0.05) per fully diluted share.
On July 2, M*Modal and One Equity Partners announced that they have entered into a definitive
agreement pursuant to which One Equity Partners, the private investment arm of JP Morgan Chase & Co.,
will acquire all of the outstanding shares of M*Modal for $14.00 per share in an all-cash transaction. The
transaction is valued at approximately $1.1 billion.
Under the terms of the agreement, which was unanimously approved by M*Modal’s Board of Directors,
M*Modal shareholders will receive $14.00 in cash for each outstanding share of M*Modal common stock
they own, representing an 8.3% premium over the closing price on July 2, 2012.
On July 11, Glancy Binkow & Goldberg LLP announced that it is investigating potential claims against
the Board of Directors of M*Modal related to the proposed acquisition by One Equity Partners. This
investigation concerns whether the Board of Directors of M*Modal breached their fiduciary duties to
stockholders by failing to adequately shop the company before agreeing to enter into the proposed
transaction, and whether the Company has disclosed all material information to shareholders about the
transaction.
Agero expands cloud-based content delivery to vehicles with investment in M-Way Solutions of
Germany
On July 19, Agero Connected Services, a subsidiary of Agero, a provider of vehicle connectivity
solutions (SSN, June 2012, p. 8), announced an investment in M-Way Solutions GmbH, of Stuttgart,
Germany, a provider of mobile enterprise software and mobile services. The partnership will enhance Agero’s
current capability to provide global automakers with cloud-based solutions by adding a market-proven
platform for delivering tailored Web-based content into connected vehicles. Agero plans to couple M-Way's
platform and technology with Agero's third-generation telematics infrastructure, which integrates diverse
functions within the vehicle’s electronic architecture, resulting in services tailored to drivers through multiple
in-vehicle and off-board human-machine interfaces. The two companies envision content delivered to drivers
as part of an aggregated service that includes personalization and mobile CRM processes, pre-sales and
after-sales services, while conforming to human-machine interface requirements that meet the safety demands
within the vehicle.
Through its Mobile Enterprise Application Platform, mCAP, M-Way enables businesses to implement
mobile enterprise services, including enterprise app distribution and mobile device management, workflows,
mobile customer relationship management (CRM) services, and mobile commerce solutions for all mobile
devices such as iOS, Android, BlackBerry, smartphones, and tablets. The company also has been developing
and providing production systems and platforms for both premium- and mass-market automotive clients. The
maturity of the mCAP platform, which has served clients in the European mobile enterprise market for the
past several years, led to Agero's investment in repurposing the platform for in-vehicle content delivery, pre-sales, marketing, mobility, and after-sales, as well as dealer- and customer-CRM services.
Agero foresees the next wave of connected vehicle services to be more complex than simply enabling
drivers to access mobile apps via their dashboard, according to Frank Hirschenberger, Agero’s director of
Product Innovation. “The M-Way partnership will enable Agero to effectively and quickly implement several
critical system aspects to meet this challenge,” he said.
Samsung delivers higher profits due to smartphone sales surge
Samsung is the best-selling phone manufacturer in the world right now, according to Gartner Group,
and the company reported hefty profit gains in the second quarter, up 79% year over year.
West Corporation reports increased revenue and profits for its second quarter
On July 18, 2012, West Corporation, a provider of technology-driven communication services,
announced its second quarter 2012 results. Revenue was $661.9 million, compared to $622.8 million for the
same quarter last year, an increase of 6.3%. The Unified Communications segment had revenue of $369.5
million in the second quarter of 2012, an increase of 6.5% over the same quarter last year. The
Communication Services segment had revenue of $295.2 million in the second quarter of 2012, 6.1% higher
than the second quarter of 2011. The Company’s platform-based businesses had revenue of $485.2 million in
the second quarter of 2012, an increase of 8.5% over the previous year.
Adjusted EBITDA for the second quarter of 2012 was $179.5 million, or 27.1% of revenue, compared to
$170.1 million, or 27.3% of revenue, for the second quarter of 2011. At June 30, 2012, West Corporation had
cash and cash equivalents totaling $84.9 million and working capital of $260.0 million.
Spoken Communications acquires HyperQuality, provider of quality assurance and business
intelligence for contact centers
On July 2, Spoken Communications (see interview, SSN, May 2012, p. 17) announced its acquisition of
HyperQuality, a provider of third-party quality assurance and business intelligence for contact
centers. Howard Lee, CEO, Spoken Communications, was the founder of HyperQuality.
HyperQuality’s suite will be integrated with Spoken Communications’ cloud-based contact center platform.
Interactive Intelligence announces preliminary Q2 results
On July 17, Interactive Intelligence Group Inc. (Nasdaq: ININ; see p. 21) announced preliminary results
for its second quarter ended June 30, 2012. Interactive Intelligence said it expects to report total revenues for
the second quarter of 2012 in the range of $54.0 million to $55.0 million, up approximately 4 to 6 percent
year-over-year, below the company’s guidance of $58.0 million to $61.0 million due, primarily, to a greater
than expected level of second quarter product orders that will be recognized as revenue in future quarters.
GAAP net loss in the second quarter of 2012 is expected to be in the range of $0.5 million to $1.5 million, or
a loss of $0.02 to $0.07 per fully diluted share. Non-GAAP net income is expected to be in the range of
breakeven to $1.0 million, or earnings per share (EPS) from $0.00 to $0.05.
Growth in total orders for the second quarter of 2012 was 26% compared to the second quarter of 2011.
Cloud-based orders increased 88% over the prior year’s second quarter and represented 24% of total orders
received in the quarter.
Apple acquires fingerprint scanner firm AuthenTec
According to an SEC filing on July 26, Apple has agreed to acquire AuthenTec, a company that specializes in
security systems such as fingerprint scanners. AuthenTec will become a wholly owned subsidiary of Apple, at
a price of $8 per share, pending regulatory approval. Under the agreement, Apple has the right to acquire
non-exclusive licenses and other rights on AuthenTec hardware, software, and patents. For that, Apple will
hand over $20 million, after which it has 270 days to license certain technologies for up to $115 million. There
also is a development agreement, which says that AuthenTec will perform certain non-recurring engineering
services for Apple for product development and will receive payment of a total of up to $7.5 million for
performance of the development work.
People
Thomas B. Sabol named Chief Financial Officer of Comverse, Inc.
Comverse Technology, Inc. (CTI), announced that, effective July 24, 2012, Thomas B. Sabol will
become Chief Financial Officer of CTI’s wholly-owned subsidiary Comverse, Inc. (CNS), which provides
business support solutions (BSS), mobile Internet, and value-added services (VAS). As previously
announced, CTI plans to spin off CNS as an independent public company in a transaction that is expected to
become effective in the third quarter of this fiscal year. Mr. Sabol joins Comverse following two years as
Chief Financial Officer of Hypercom, a publicly traded company in high security, end-to-end electronic
payment products and services.
Bill Robinson named Executive Vice President of Worldwide Sales at inContact
inContact, a provider of cloud contact center software (SSN, February 2012, p. 24), announced the
appointment of Bill Robinson as Executive Vice President of Worldwide Sales. Robinson will be responsible
for growing the company’s global cloud footprint through direct sales as well as indirect channels, including
Siemens Enterprise Communications and Verizon Business. Among other positions, Robinson was Senior
Vice President of Worldwide Field Operations for Witness Systems, where he led a team of more than 200
and approximately tripled sales in 3 years, positioning the company for a $1 billion merger with Verint
Systems.
Eliza names Lee Horner Senior Vice President of Sales
Eliza Corporation announced the appointment of Lee Horner as Senior Vice President of Sales. Horner
will support Eliza's continued growth and leadership in the “Health Engagement Management” segment,
according to a company announcement. (See interview with Lucas Merrow, founder and CEO of Eliza, SSN,
July 2012, p. 23.) Previously Senior Vice President at Vitera Healthcare Solutions (formerly Sage Software),
Horner was responsible for the strategic direction and execution of all sales, delivery, and marketing
throughout North America. John Shagoury, President of Eliza Corporation, said that Lee “will help our
customers make more informed decisions about improving and modernizing their end-to-end engagement
strategies with solutions that positively affect health outcomes, care and costs—today and in the future.”
Lyle Ball named Chief Operating Officer at translation company MultiLing
MultiLing, a translation services provider specializing in intellectual property (IP) and technical
materials translations for multinational enterprises, announced the appointment of Lyle Ball as chief operating
officer. With nearly 20 years of experience managing or consulting high-tech and clean-energy companies,
Ball spent the past six months strategically advising MultiLing on high-growth strategies related to its market
shift to IP translations. As chief operating officer, Ball will be responsible for more than 200 employees in
seven country offices and more than 1,000 highly skilled contractors across more than 80 languages.
Cyara Solutions names Laurence Webb general manager of sales for Australia and New
Zealand
Cyara Solutions, which provides premise and cloud solutions for testing, monitoring, and simulation of
IVRs and contact center systems and applications (SSN, October 2011, p. 26), announced the hiring of
Laurence Webb, a thirty-year IT veteran and former director of sales for Telstra’s outsourcing business, as
Cyara’s general manager of sales for Australia and New Zealand.
For Further Information on Products Mentioned in this Issue
Company | Location | Product Mentioned | Contact info
4medica | Culver City, CA | Electronic Health Record | (310)695-3300; www.4medica.com
ABI Research | Oyster Bay, NY | Market research | (516)624-2500; www.abiresearch.com
About.com (part of the NY Times Co.) | New York, NY | Calorie counter app | www.about.com
Accessible Media Inc. (AMI) | Toronto, Canada | Web site with accessibility features | (416)422-4222; www.ami.ca
Active Endpoints, Inc. | Waltham, MA | Process automation products | (781)547-2900; www.cloudextend.com
Afeka Center for Language Processing (ACLP) | Tel Aviv, Israel | Research organization | +972-3-768-8757; www.aclp.co.il
Agency for Science, Technology and Research (A*STAR) | Singapore | Agency for supporting innovation | www.a-star.edu.sg
Agero | Medford, MA | Driver assistance and connected vehicle services | (781)393-9300; www.agero.com
Amazon | Seattle, WA | Product sales on the Web | www.amazon.com
Analysis International | Beijing, China | Market research in China | http://english.analysys.com.cn
Apple | Cupertino, CA | Personal computers, music players, wireless phones | www.apple.com
Applied Voice Input Output Society (AVIOS) | San Jose, CA | Non-profit organization supporting quality speech application development | (408)323-1783; www.avios.com
AssistiveWare | Amsterdam, The Netherlands | Assistive software | www.assistiveware.com
AT&T | San Antonio, TX | Telecommunications services | www.att.com; www.wireless.att.com; www.synaptic.att.com
AuthenTec | Melbourne, FL | Fingerprint authentication solution | (321)308-1300; www.authentec.com
Baidu, Inc. | Beijing, China | Search in Chinese | http://ir.baidu.com
BBVA | Spain | Banking group | www.bbva.com
BMW | Westwood, NJ | Automobiles | (201)307-4000; www.bmw.com
Calabrio | Minneapolis, MN | Contact center suite | (763)592-4600; www.calabrio.com
CallMiner | Fort Myers, FL | Speech analytics | (239)689-6463; www.callminer.com
Carnegie Speech | Pittsburgh, PA | Reading training using speech recognition | (412)622-2181; www.carnegiespeech.com
CNSI | Gaithersburg, MD | IT and business process outsourcing solutions | www.cns-inc.com
Comverse Technology | New York, NY | Network-based communication services | (212)739-1000; www.cmvt.com
Comverse, Inc. (subs of Comverse Technology) | Wakefield, MA | Communications solutions | (781)246-9000; www.comverse.com
Companies mentioned in this issue
Company | Location | Product Mentioned | Contact info
Cyara Solutions | Melbourne, Australia | Contact center solutions | +61 3 9607 8304; www.cyarasolutions.com
DARPA (Defense Advanced Research Projects Agency) | Arlington, VA | Research support | www.darpa.mil
Deutsche Telekom | Germany | Telecommunications services | www.telekom.com
Easy Voice Biometrics | Windsor, PA | Speech biometrics | (717)764-9240; www.easyvoicebiometrics.com
Eliza Corporation | Danvers, MA | Speech-enabled programs for healthcare | (978)921-2700; www.elizacorporation.com
eMarketer | New York, NY | Market research | (212)763-6010; www.emarketer.com
Empirix | Bedford, MA | Hammer telephone application testing | (781)266-3200; www.empirix.com
Gartner Group | Stamford, CT | Information technology reports and consulting | (203)316-1111; www.gartner.com
Glancy Binkow & Goldberg LLP | Los Angeles, CA | Law firm | (310) 201-9150; www.glancylaw.com
Goode Intelligence | London, UK | Market research | +44 20 3356 4886; www.goodeintelligence.com
Google | Mountain View, CA, and Cambridge, MA | Voice and directory search | (650)253-0000; www.google.com; www.google.com/mobile; www.grandcentral.com
Goya Foods | Secaucus, NJ | Food company | (201)348-4900; www.goya.com
Grain Media | Hsinchu City, Taiwan | Integrated circuits | +886.3.564.5533; www.grainmedia.com
Health Sciences North | Sudbury, Canada | Healthcare organization | (705)523-7100; www.hsnsudbury.ca
Hearst Corporation | New York, NY | Media and information | www.hearst.com
Honda | Japan | Automobiles | www.honda.com
Horizon Private Cloud | Lake Forest, CA | Cloud desktop and application virtualization solutions | (888)652-2948; www.horizonprivatecloud.com
HTC | Taiwan, R.O.C. | Smartphone and PDA Phone devices | +886-3-3753252; www.htc.com
HyperQuality | Seattle, WA | Quality assurance and business intelligence for contact centers | (206)283-7119; www.hyperquality.com
ICMI | Colorado Springs, CO | Contact center services | (719)268-0328; www.icmi.com
IDC | Framingham, MA | Market research | (508)988-7988; www.idc.com
inContact, Inc. (formerly UCN) | Salt Lake City, UT | On-demand contact center services | (801)320-3200; www.inContact.com
Industrial Technology Research Institute (ITRI) | Chutung, Hsinchu, Taiwan | Research organization | +886-3-582-0100; http://www.itri.org.tw
Infinity CCS | Birmingham, UK | Contact Manager platform | +44 121 450 7830; www.infinityccs.com
Company | Location | Product Mentioned | Contact info
Integrated Document Solutions (IDS) | Ft. Lauderdale, FL | Medical dictation service | (954)484-0969; www.idssite.com
Interactive Intelligence Group Inc. | Indianapolis, IN | Unified Communications and IVR | (317)872-3000; www.ININ.com
International Computer Science Institute (ICSI) | Berkeley, CA | Research institute | www.icsi.berkeley.edu
International Research Consortium (USTAR) | London, UK | Research organization | http://ustar-consortium.com
iSpeech | Newark, NJ | Application developer toolkit | (877)447-7332; www.iSpeech.org
ITEC | Dubai, UAE | Academic/training institute | www.itec.ae
Kantar Worldpanel ComTech | -- | Consumer panels | www.kantarworldpanel.com
KRP Communications | Burnaby, BC, Canada | Unified communications integrator | (604)433-1530; www.krpcomm.com
LANDesk Software | South Jordan, UT | IT software | (801)208-1500; www.landesk.com
Lexifone | Haifa, Israel | Voice translation by phone | www.lexifone.com
Louisiana Department of Health and Hospitals | Baton Rouge, LA | Health organization | (225)342-9500; http://new.dhh.louisiana.gov
M-Way Solutions | Stuttgart, Germany | Mobile solutions for the automotive industry | +49 711 49066 - 460; www.mwaysolutions.com
M*Modal (MModal, was MedQuist) | Pittsburgh, PA | Speech recognition technology for healthcare transcription | (412)422-2002; www.mmodal.com
Magyar Telekom | Budapest, Hungary | Telephone service provider | +36 1 458 0000; www.telekom.hu
me2me | Zurich, Switzerland | Storing and retrieving personal information by phone | www.me2me.com
Mercedes-Benz | Germany | Automobiles | www.mercedes-benz.com
Mercedes-Benz USA | Montvale, NJ | Automobiles | (201)573-0600; www.mbusa.com
Microsoft Corporation | Redmond, WA | Various applications, products, and services | (206)454-2030; www.microsoft.com/speech
Mobile Future | Washington, DC | Coalition advocating innovations in wireless technology and services | (866)459-5998; www.mobilefuture.org
Motorola Mobility (acquired by Google) | Downers Grove, IL | Mobile phones, portable devices | (630)353-8000; www.motorola.com
MultiLing | Provo, UT | Translation services | (801)377-2000; www.multiling.com
National Cheng Kung University | Taiwan | University | http://english.web.ncku.edu.tw
National Consumers League | Washington, D.C. | Consumers organization | (202)835-3323; www.nclnet.org
National Federation of the Blind | Baltimore, MD | TTS for visually impaired | (410)659-9314; www.nfb.org
Company | Location | Product Mentioned | Contact info
Nexidia | Atlanta, GA | Audio content search | (404)495-7220; www.nexidia.com
Northwest Multiple Listing Service (NWMLS) | Kirkland, WA | Real Estate broker consortium | www.nwrealestate.com
Nuance Communications | Burlington, MA | Speech technology, applications, and services | (617)428-4444; www.nuance.com
Nuvoton Technology Corp. | Hsinchu Science Park, Taiwan | Integrated circuits | +886-3-5770066; www.nuvoton.com
OfCom | London, UK | Independent regulator and competition authority for UK communications industries | +44 300 123 3000; www.ofcom.org.uk
One Equity Partners | New York, NY | Investment firm | www.oneequitypartners.com
ORC International | Princeton, NJ | Surveys | (800)444-4672; www.orcinternational.com
Piper Jaffray | Minneapolis, MN | Market research | (612)303-6000; www.piperjaffray.com
Polska Telefonia Cyfrowa (subs. of Deutsche Telekom) | Warsaw, Poland | Telephone services | www.t-mobile.pl
Pronexus | Ottawa, Ontario, Canada | Speech recognition computer telephony | (613)271-8989; www.pronexus.com
Providence Health & Services | Renton, WA | Hospital and health services | www2.providence.org
Raytheon | Waltham, MA | Aerospace and defense company | (781)522-3000; www.raytheon.com
RMT Robotics Ltd. | Grimsby, ON, Canada | Industrial robots and accessories | (905)643-9700; www.rmtrobotics.com
Salesforce.com | San Francisco, CA | Hosted CRM software | (415)901-7000; www.salesforce.com
Samsung Electronics | Seoul, South Korea | Wireless telephones and TVs | www.samsung.com
Sandata Technologies | Port Washington, NY | IT solutions for home healthcare and social services | (516)484-4400; www.sandata.com
Siemens Enterprise Communications (SEN) | Munich, Germany | Enterprise business process software | +49 69 2222 7846; www.enterprisecommunications.siemens.com
SoundGecko | -- | App converting articles to speech | http://soundgecko.com
SpeakGlobal, Ltd. | Kobe, Japan | English as a Foreign Language training | www.speakglobal.co.jp
Speaktoit plc | Newark, DE | Personal assistant mobile app | www.speaktoit.com
Spoken Communications | Bellevue, WA | Call center and service provider solutions | (206)428-6044; www.spoken.com
SRI International | Menlo Park, CA | Speech recognition and language R&D | (650)859-2000; www.sri.com
Company | Location | Product Mentioned | Contact info
Strikeforce Technologies | Edison, N.J. | Two-factor authentication for web sites | (732)661-9641; www.strikeforcetech.com
TalkTalk Group | London, UK | Fixed line broadband, voice telephony, and mobile services | +44 20 3417 1000; www.talktalkgroup.com
TeleNav | Santa Clara, CA | Navigation services | (408)245-3800; www.telenav.com
Terra Nova | St. John's, Canada | Clinical documentation solutions | (888)600-4178; http://terranovatrans.com
The Hispanic Institute | Washington, DC | Non-profit education forum | www.thehispanicinstitute.org
TradeHarbor | St. Louis, MO | Speaker verification applications | (314)878-1200; www.tradeharbor.com
Trapit | Palo Alto, CA | Personal assistant app | http://trap.it
United Hospital System | Wisconsin and northern Illinois | Regional healthcare system | www.uhsi.org
University of Kansas | Lawrence, KS | University | (785)864-2700; www.ku.edu
Verint Systems | Melville, NY | Call center and security solutions | (631)962-9600; www.verint.com
Verizon Business | Los Angeles, CA | Enterprise telephone solutions | (213)625-1005; www.verizonbusiness.com
Veveo | Andover, MA | Usability solutions for connected smart devices | (978)687-8240; http://corporate.veveo.net
Vlingo (acquired by Nuance) | Cambridge, MA | Voice-powered interface for mobile phones | (617)871-2987; www.vlingo.com; www.vlingomobile.com
Vocal Laboratories (Vocalabs) | Golden Valley, MN | Usability testing | (952)941-6580; www.vocalabs.com
Vocre | -- | Translation app | www.vocre.com
Voice Assist | Lake Forest, CA | Hosted speech services | (949)257-0923; www.voiceassist.com
Voice Automated | Lake Forest, CA | Voice to text workflows for vertical markets | (714)969-7632; http://store.voiceautomated.com
VoiceVault | Dublin, Ireland | Voice verification technology and service | +353 1 603 9500; www.voicevault.com
VoiZapp Inc. | Austin, TX | Tweet-to-voice and Facebook-to-voice | (512)850-5803; www.voizapp.com
Voxbone | Brussels, Belgium | Inbound VoIP provider | +32 28 08 00 00; www.voxbone.com
Voxeo | Orlando, FL | Voice hosting and contact center solutions | (407)418-1800; www.voxeo.com
Voxeo Labs (part of Voxeo) | San Francisco, CA | Phone platform research | www.voxeolabs.com
W3C Multimodal Interaction working group | -- | Standards effort | www.w3.org/2002/mmi
Wavelink Corporation | Midvale, UT | Mobile application development and mobile infrastructure management software | (888)697-9283; www.wavelink.com
Company | Location | Product Mentioned | Contact info
West Corporation | Omaha, NE | Communication solutions | www.west.com
West Interactive (unit of West Corp.) | Omaha, NE | Out-sourcing of customer contact solutions | (402)963-1300; www.westinteractive.com
Wolfram|Alpha | Cambridge, MA | Knowledge search web site | (217)398-0700; www.wolframalpha.com
World Bank | Washington, DC | International bank | www.worldbank.org
Yankee Group | Boston, MA | Market research | (617)598-7200; www.yankeegroup.com
Yelp | San Francisco, CA | Web review service | www.yelp.com
Free Blog (with a chance to comment!)
Meisel-on-Mobile (www.meisel-on-mobile.com)
Will this make you watch TV advertisements?
Siri gets smarter
The ultimate mobile user interface: brain implants!?
Mobile marketing: Engaging your customer on a mobile device
Major themes at the Mobile Voice Conference
What does it take for mobile personal assistants to “understand” us?
Voice control of your TV: Is it listening to everything you say?
I wish to subscribe to Speech Strategy News for one year (12 issues), payable in US$ on US bank—
Subscription | Format | Term | Price
Individual* | PDF | 6 monthly issues | $215
Corporate* | PDF | 6 monthly issues | $750*
Individual* | PDF | 12 monthly issues | $425
Corporate* | PDF | 12 monthly issues | $1,495*
* Corporate subscriptions: Unlimited users within a corporation for PDF version with Web access through corporate
password. Individual subscriptions cannot be shared (neither passwords nor electronic copies).
Please send information on your consulting.
Name:
Company:
Address:
Check enclosed, payable to TMA Associates
(in U.S. $ on a U.S. bank).
Invoice me.
Charge my—
Visa MasterCard American Express
City, State
ZIP/Postal code
Card #
Country
Expiration date:
Email (required for email alerts or a Web subscription):
Signature:
_______________________________________________
Phone:
Copyright TMA Associates 2012; All rights reserved. TMA Associates, P.O. Box 570308, Tarzana, CA 91357-
0308 USA. Tel: (818) 708-0962. Fax: (818) 232-0368, or go to www.tmaa.com/subscribetossn. 230
Speech Strategy News is published twelve times per year by TMA Associates, Editor: William S. Meisel. Trademarks mentioned in this publication
are the property of the companies mentioned; they are used editorially. The material herein is based on data from sources believed to be reliable,
but is not guaranteed as to accuracy and does not purport to be complete. From time to time, the author or TMA Associates may have consulting
assignments, advisory positions, own stock, or have other business relations with organizations in speech recognition and associated areas,
including companies discussed in this newsletter. Speech Strategy News is a trademark of TMA Associates.