Virtual voice assistant for mobile phones

FACULTY OF COMPUTER SCIENCE, ELECTRONICS AND TELECOMMUNICATIONS
DEPARTMENT OF TELECOMMUNICATIONS
ENGINEERING PROJECT
Virtual voice assistant for mobile phones
Asystent głosowy na telefony komórkowe
Authors: Michał Czerwień, Wojciech Szymczyk
Field of study: Electronics and telecommunications
Supervisor: dr inż. Bartosz Ziółko
Kraków, 2016
Aware of criminal liability under Art. 115(1) and (2) of the Act of 4 February 1994 on Copyright and Related Rights (consolidated text: Journal of Laws of 2006, No. 90, item 631, as amended): "Whoever appropriates the authorship, or misleads as to the authorship, of all or part of another person's work or artistic performance shall be subject to a fine, restriction of liberty or imprisonment for up to 3 years. The same penalty applies to whoever disseminates, without giving the name or pseudonym of the author, another person's work in its original version or in the form of an adaptation, or an artistic performance, or publicly distorts such a work, artistic performance, phonogram, videogram or broadcast", and also aware of disciplinary liability under Art. 211(1) of the Act of 27 July 2005, Law on Higher Education (consolidated text: Journal of Laws of 2012, item 572, as amended): "For violating the regulations in force at the university and for acts offending the dignity of a student, a student bears disciplinary liability before a disciplinary committee or before the peer court of the student government, hereinafter referred to as the 'peer court'", I declare that I have prepared this diploma thesis personally and independently and that I have not used sources other than those listed in the thesis.
Table of contents
1. Introduction
2. Aim of the project
2.1 Segregation of duties
2.2 Platform description
2.3 Device specification
2.4 Programming environment
3. Application source code
3.1 Main Activity class
3.2 Methods class
3.3 Android manifest
3.4 Minimalized application
4. Application tests
4.1 Methodology
4.2 "Zadzwoń do" - "Call to" request
4.3 "Wiadomość" - "Message" request
4.4 "Szukaj" - "Search" request
4.5 "Ustaw Alarm" - "Set an alarm" request
4.6 "Znajdź na mapie" - "Find on the map" request
4.7 Sample sentences – additional system tests
4.8 Test round up
5. Discussion
6. Bibliography
1. Introduction
The idea for this project arose in 2014 during work on a similar project, which used voice commands to execute basic computer functions such as launching an internet browser. The trend in modern technology is clearly to make every function simple for the end user to execute. Since mobile technology with internet access is in many cases essential nowadays, this project combines those two aspects to provide a simple application that assists the user in operating a mobile phone. Accordingly, the project links the phone's functions with quick, yet reliable and simple, user commands. The main focus during development was to make things easier for disabled users, for users who find some functions of their mobile phones hard to use, and for children and older users.
Programming internal methods is one thing, but voice understanding and recognition is the fundamental aspect of any application of this kind. Using an external speech recognition engine developed by a big company is beneficial, because such engines will likely become more and more powerful in understanding human speech, and most probably human emotions too. The engine used by this project is therefore updated automatically, and its solutions are maintained without changing the application itself. If that is the case, the project's developers can focus on customizations made in response to end-user requests.
To meet user expectations, applications from other brands, such as Google Now and Windows Cortana, were closely analysed. This analysis provided information about possible improvements and about functionalities that could be this project's advantage. The project is very likely to be scaled and developed further, as this branch of technology is expanding rapidly. The thesis is divided into two parts. The first part covers mostly theoretical aspects, such as the programming environment and platform specifications, while the second part covers mostly voice recognition testing, end-user experience and application efficiency on the different devices used.
For now the application is still in beta. The newest version is 0.5, codenamed "Chewie" (it is also referred to as the Virtual Personal Assistant, or VPA).
2. Aim of the project
This project aims to provide a simple way to access mobile phone functions with voice commands. After a voice command is passed to the phone, the application matches the recognized speech with the appropriate function in the program. When a command is not recognized, the application either displays what was said and takes no other action (a feature intended mostly for debugging) or reports that it could not understand what was said and asks the user to repeat the request. To run, the application requires an internet connection for the recognition engine, but this requirement can be avoided by downloading the appropriate speech recognition package from Google servers, which lets the application work offline as well (though with limited functionality). A speech sample is sent to the servers, where the engine recognizes the phonemes and then associates them with words, numbers and even phrases. Although there were initial attempts to develop an own recognition system using the HTK tools, due to lack of time and resources the recognition engine developed by Google was used in the end product. It is freely available to developers within the Android operating system. Since fast and fairly cheap internet access is now common, this choice seems obvious and reasonable, as does the concept of an application that works mainly online.
Img. 1. Mobile phone internet user penetration worldwide from 2014 to 2019. Source: http://www.statista.com
Due to its simplicity and wide range of functions, the application can be of use to those who want to do things more simply and quickly from a single interface, or who find it hard to type on a small smartphone keyboard. It can also help older people and children use the device: they just say a command and get the result, instead of launching other applications or searching through the contact list for a specific name when they want to make a call. Since the application can operate offline, it can also be a good default launcher for parents who often lend their phones to their children: with no internet connection required, no harm can be done while a child is playing with the phone, yet the phone can still be used to "call mom" or "text mom" when needed. This is one of the advantages the application has over similar ones written by big corporations.
The voice assistant requires the smartphone to be equipped with a microphone (which can be found in nearly all models nowadays). For some functionality, such as map searches, it is advisable to enable location detection so that searches are narrowed to nearby areas. The application does not collect user data, browser history, online activity or any other data regarded as private or confidential. Moreover, no data is stored within the application, and it cannot be accessed from any device other than the one it is installed on, which greatly reduces the chances of data manipulation.
2.1 Segregation of duties
In order to maintain efficiency and reliability, the work was split between two people. For the development lifecycle the Scrum method [1] was used. Scrum is a development methodology whose main assumptions are being agile and flexible in the face of upcoming changes, and it is said to be the best solution for developing a new idea from scratch. The methodology focuses on weekly meetings and short tasks divided among the Scrum members. The tasks in this project were assigned so that each would take about a week to develop. During the weekly meetings, the currently assigned tasks were reviewed, future tasks were discussed, and there was a short talk about the project's direction and possible obstacles. Then each Scrum member had a few minutes to present the work done during the period, so that every member knew the state of the whole project.
To keep track of task assignment and each member's progress, an online task organizer was used. In this case the organizer was "Trello" [5], which supports dividing tasks and assigning them to enrolled Scrum members. This tool provides a powerful service for monitoring all development work, together with a history of each individual's progress. With its use the development history is archived and kept clean and transparent.
To keep a clear view and minimize the number of errors, the separate pieces of work were not merged until the final steps. For this purpose a version control system was introduced. The tool used was GitHub, an online version control service. The final version, dated 09.12.2015, can be accessed on the GitHub website under THenry14/Virtual-Assistant-UE [6]. Despite the methods GitHub offers for dealing with differences, changes such as renaming similar variables and standardizing variable names were made manually.
The segregation of duties was as follows:
Wojciech Szymczyk:
• Backbone of application
• Design of architecture and layout
• Read Contacts intent
• Dialing intent
• Exceptions handling
• Additional features (menu, exit option, scenarios option etc.)
• User interface
• Merging of application
• Tests
• Operating in the background (not finished)
Michał Czerwień:
• Standardization on Polish characters
• Sequential listening
• Text messages
• Google maps
• Alarm clock (not finished)
• Internet browser
• Splitting methods into classes
2.2 Platform description
A mobile platform, or mobile OS, is the operating system used on a device. It defines the device's structure, available functions and responses to user actions. While all available platforms support basic actions such as phone calls, text messages and internet browsing, they differ in how these actions are executed and in the degree of user customization. The development process, programming language and specifications change with the platform selected. When designing for a given use, it is best to choose the platform that offers the best solutions for the required functions. For mobile applications there are three distinguishable platforms that together cover over 95% of use:
• Android
• Windows
• iOS
For this project, the Android operating system was chosen because it offers the most customization and because it was the most used operating system on mobile devices in 2015, as well as in predictions for future years.
Img. 2. Worldwide Smartphone OS Market Share (Share in Unit Shipments). Source: http://www.idc.com/prodserv/smartphone-os-market-share.jsp
There is also large growth in mobile devices other than mobile phones, such as tablets. Based on the graph below, it is reasonable to assume that Android is the preferred operating system among these devices as well.
Img. 3. Operating system share among all mobile device users. Source: https://www.netmarketshare.com/operating-system-market-share.aspx
The Android operating system is also easily accessible to programmers. It is built on the Linux kernel, which is developed under the GNU GPL license [10]. Android source code is released as open source, meaning anyone can look closely at how each part works, which helps programmers understand the platform well. A huge community of developers actively works on new features and shares ideas with anyone willing to cooperate, so the platform is thriving with opportunities for enthusiasts.
2.3 Device specification
The main requirements are that the device runs Android 4.1 Jelly Bean or above, has an internet connection and has a microphone. The application was tested on Samsung Galaxy S4 and Samsung Galaxy S4 mini phones (both Android 4.1), on an emulator of Google Nexus 5 (running Android 6.0) and on Google Nexus 4 (Android 5.1). In order to stay compatible with all OS versions between 4.1 and 6.0, some functions are not developed to their full potential, and the application does not entirely follow Google's newest Material Design layout guidelines. That is the price that had to be paid to support the largest possible number of devices.
Img. 4. Nexus 5 (on the left) and Nexus 4 (on the right), both emulated on a virtual machine and running the Virtual
Personal Assistant v 0.5 (with codename Chewie) Source: own.
2.4 Programming environment
During the development stage Android Studio [2] was used to design and write the application. It is a very good programming environment that offers many tools and amenities for both beginners and expert programmers. Like most applications written for Android, the virtual assistant was written in Java [4] using the Android API [3], which, since Android is open source, is described in detail on numerous Android-related sites, books and forums. The sites used to look up tips and exact syntax while developing the app were the official Android for Developers site [7] and Tutorials Point [8].
During design, and to follow best programming practices, the Stack Overflow Q&A site [9] was also used, because it covers many common problems along with tips on how to deal with them.
Img. 5. Welcome screen of Android Studio, the recommended tool for Android programming. Source: own.
3. Source code
The source code of this project is released on GitHub under the GNU General Public License Version 2, June 1991 [10], meaning it supports the four basic freedoms of free software, which are:
• Freedom to run the program for any purpose
• Freedom to analyse how the program works and adapt it to the user's needs
• Freedom to distribute unmodified copies of the program
• Freedom to improve the program and publish the improvements for anyone to use
According to the FSF, a program that meets all of these criteria can be called free software. Such programs are commonly also described as open source.
3.1 Main Activity class
MainActivity is the main class of the application, its backbone. It contains the most important fragments of the application's code.
public class MainActivity extends AppCompatActivity {
    protected static final int RESULT_SPEECH = 1;
    private ImageButton ButtonToRecord;
    private Button ButtonToMinimalize;
    private TextView RecognisedText;
    Methods method = new Methods();
    About about = new About();
    String phone = new String();
    String name = new String();
    String id = new String();
    String number = new String();
    String nameOfContact = new String();
    boolean minimalized = false;
    boolean isCalling;
    String choiceMSG = "wiadomość";
    String choiceBrowser = "Szukaj";
    String choiceAlarm = "budzik";
    String choiceMap = "Znajdź na mapie";
    String choiceCall = "Zadzwoń";
    String test = "test";
The code above presents the main class of the application, which extends AppCompatActivity, the base class for handling activities in Android. The following lines are declarations of variables that are critical for the operation of the application or were used for various tests.
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        RecognisedText = (TextView) findViewById(R.id.Text);
        about.AboutText = (TextView) findViewById(R.id.Text);
        ButtonToRecord = (ImageButton) findViewById(R.id.ButtonToRecord);
        ButtonToMinimalize = (Button) findViewById(R.id.Minimalize);
The next step is to override the onCreate method, which is essential for obtaining the wanted behaviour when the application starts. Here, after the application starts, it refers to the .xml file called activity_main in order to load the main layout of the application (spacing, fonts, icons etc.). The items named in activity_main then have to be bound to variables in the program.
        ButtonToRecord.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
                intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "pl-PL");
                try {
                    startActivityForResult(intent, RESULT_SPEECH);
                    RecognisedText.setText("");
                } catch (ActivityNotFoundException a) {
                    Toast t = Toast.makeText(getApplicationContext(),
                            "Device doesn't support speech recognition",
                            Toast.LENGTH_SHORT);
                    t.show();
                }
            }
        });
Android allows a click listener to be attached to a button (in this case ButtonToRecord, which is responsible for recording speech), so that an action is triggered when a given event occurs (here, when the button is clicked). The listener overrides another built-in Android method, onClick, which determines the actions taken after the button is pressed. Here a new intent with the RecognizerIntent.ACTION_RECOGNIZE_SPEECH action is started, which tells the system to record a sample and try to convert it to text using Google Speech Recognition. An extra option is added to the recognizer to select Polish as the recognition language. Depending on whether the phone meets the minimal requirements (see 2.3 Device specification), either the speech is processed by the intent or an ActivityNotFoundException is caught and a toast informs the user that "Device doesn't support speech recognition".
        ButtonToMinimalize.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                if (minimalized) {
                    stopService(new Intent(getApplication(), MinimalizedApplication.class));
                    minimalized = false;
                } else {
                    startService(new Intent(getApplication(), MinimalizedApplication.class));
                    minimalized = true;
                    startActivity(method.hideApp());
                }
            }
        });
    }
The onCreate method concludes with another listener, this time on ButtonToMinimalize, a button that minimizes the application so that it can work in the background. Again onClick has to be overridden, this time to check the state of the field minimalized. It is initialized to "false", since the application starts in full-screen mode. When the "HAZE" button is clicked, the value changes to "true" and startService is called with an intent for the MinimalizedApplication class, which results in minimizing the app. Going back to full-screen mode and pressing the "HAZE" button once again stops the service, which removes the dim, and changes the variable back to "false", leaving the application running in the foreground. This closes onCreate.
    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        getMenuInflater().inflate(R.menu.menu_main, menu);
        return true;
    }
This block creates a menu which, depending on the version of the OS, is opened either by pressing a hardware button on the phone or by pressing a software button placed in the right corner of the screen.
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        // In our case speech recognition is chosen all the time since RESULT_SPEECH = 1
        switch (requestCode) {
            case RESULT_SPEECH: {
                if (resultCode == RESULT_OK && null != data) {
                    ArrayList<String> text =
                            data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                    RecognisedText.setText("");
                    RecognisedText.setText(text.get(0));
                    if (RecognisedText.getText().toString().startsWith(test)) {
                        Intent intent = new Intent("android.intent.action.MAIN");
                        intent.setComponent(new ComponentName("com.android.mms",
                                "com.android.mms.ui.ConversationList"));
                        startActivity(intent);
                    }
                    if (RecognisedText.getText().toString().startsWith(choiceCall)) {
                        isCalling = true;
                        readContacts(text);
                        makeCall(number);
                    }
                    if (RecognisedText.getText().toString().startsWith(choiceMSG)) {
                        isCalling = false;
                        readContacts(text);
                        String msg = text.get(0);
                        msg = method.checkPolish(msg);
                        startActivity(method.sendSMS(msg, number));
                    }
                    if (RecognisedText.getText().toString().startsWith(choiceBrowser)) {
                        String page = text.get(0);
                        startActivity(method.browse(page));
                    }
                    if (RecognisedText.getText().toString().contains(choiceAlarm)) {
                        int hour = 7;
                        int minute = 15;
                        startActivity(method.alarm(hour, minute));
                    }
                    if (RecognisedText.getText().toString().startsWith(choiceMap)) {
                        String addr = text.get(0);
                        startActivity(method.maps(addr));
                    }
                }
            }
            break;
        }
    }
Next, onActivityResult is overridden, which ensures that the wanted action is taken after an event occurs, in this case the successful recognition of words. If the speech is recognised correctly and is not considered silence (the returned data is not null), the words converted to plain text are stored in an array list of strings and a decision tree is triggered based on the content of this list. When the program finds one of the keywords there, the respective block of code is executed and the action described in it is taken (e.g. making a call).
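The keyword matching described above can be illustrated outside Android with a minimal sketch in plain Java. The class CommandMatcher and its Command enum are hypothetical names that do not appear in the project. The keywords are the ones declared in MainActivity, written as Unicode escapes so that the sketch does not depend on the file encoding. Unlike the original chain of independent if statements, the sketch is case-insensitive, returns only the first matching command, and reports an explicit UNKNOWN value for unrecognised requests.

```java
import java.util.Locale;

/** Hypothetical helper: maps recognised text to one of the assistant's commands. */
final class CommandMatcher {

    /** Commands of the decision tree with the (lower-cased) keywords from MainActivity. */
    enum Command {
        CALL("zadzwo\u0144"),
        MESSAGE("wiadomo\u015b\u0107"),
        SEARCH("szukaj"),
        ALARM("budzik"),
        MAP("znajd\u017a na mapie"),
        UNKNOWN("");

        final String keyword;

        Command(String keyword) {
            this.keyword = keyword;
        }
    }

    private static final Locale POLISH = Locale.forLanguageTag("pl-PL");

    /** Returns the first command whose keyword matches, or UNKNOWN. */
    static Command match(String recognised) {
        if (recognised == null) {
            return Command.UNKNOWN;
        }
        String text = recognised.trim().toLowerCase(POLISH);
        for (Command c : Command.values()) {
            if (c == Command.UNKNOWN) {
                continue;
            }
            // As in MainActivity, the alarm keyword may appear anywhere in the
            // phrase, while every other keyword has to start the phrase.
            boolean hit = (c == Command.ALARM)
                    ? text.contains(c.keyword)
                    : text.startsWith(c.keyword);
            if (hit) {
                return c;
            }
        }
        return Command.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(match("Zadzwo\u0144 do Anna"));
        System.out.println(match("jaka jest pogoda"));
    }
}
```

Returning UNKNOWN gives the caller a single place to show the "could not understand" prompt mentioned in section 2.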
    public String readContacts(ArrayList<String> text) {
        Map<String, String> book = new HashMap<String, String>();
        nameOfContact = text.get(0);
        text.clear();
        text.add(nameOfContact);
        nameOfContact = TextUtils.join(" ", text);
        // Cut off the voice command so that only the contact name remains
        if (isCalling) {
            nameOfContact = nameOfContact.substring(11);
        } else {
            nameOfContact = nameOfContact.substring(9);
        }
        ContentResolver cr = getContentResolver();
        Cursor cur = cr.query(ContactsContract.Contacts.CONTENT_URI, null, null, null, null);
        if (cur.getCount() > 0) {
            while (cur.moveToNext()) {
                id = cur.getString(cur.getColumnIndex(ContactsContract.Contacts._ID));
                name = cur.getString(
                        cur.getColumnIndex(ContactsContract.Contacts.DISPLAY_NAME));
                if (Integer.parseInt(cur.getString(cur.getColumnIndex(
                        ContactsContract.Contacts.HAS_PHONE_NUMBER))) > 0) {
                    Cursor pCur = cr.query(
                            ContactsContract.CommonDataKinds.Phone.CONTENT_URI, null,
                            ContactsContract.CommonDataKinds.Phone.CONTACT_ID + " = ?",
                            new String[]{id}, null);
                    while (pCur.moveToNext()) {
                        phone = pCur.getString(pCur.getColumnIndex(
                                ContactsContract.CommonDataKinds.Phone.NUMBER));
                        book.put(name, phone);
                    }
                    pCur.close();
                }
            }
        }
        cur.close(); // release the outer cursor as well
        number = book.get(nameOfContact);
        // RecognisedText.setText(number + " view from readContacts field" + "\n"); // FOR LOGGING
        return number;
    }
The readContacts method is responsible for reading contacts from the phone's or SIM card's memory so that they can be used later in different areas of the program. It takes the recognised text, 'cuts' it to get the name of the contact, and then uses cursors to go through all the IDs, names and phone numbers in memory. The names and numbers are stored in a map of strings (similar to Python's dictionary), from which the number of the requested contact is read. It was necessary to put this method in the main class and not in Methods, because it uses parts of code not accessible from the outside.
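The lookup step can be sketched in plain Java without the Android contacts provider. ContactBook is a hypothetical stand-in for the map filled by readContacts, and the phone number in the example is made up. As assumptions that differ from the original code, the sketch takes the command prefix as a parameter instead of the fixed offsets 11 and 9, compares names case-insensitively, and returns null when nothing follows the command.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the name-to-number map built in readContacts. */
final class ContactBook {

    // Case-insensitive keys, so "anna kowalska" finds "Anna Kowalska".
    private final Map<String, String> book =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void put(String name, String phone) {
        book.put(name.trim(), phone);
    }

    /**
     * Strips the command prefix from the recognised phrase and returns the
     * stored number, or null when the phrase does not start with the prefix
     * or there is no such contact.
     */
    String findNumber(String recognised, String prefix) {
        if (recognised == null || recognised.length() <= prefix.length()) {
            return null; // nothing was said after the command itself
        }
        if (!recognised.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return null; // the phrase is a different command
        }
        String name = recognised.substring(prefix.length()).trim();
        return book.get(name);
    }

    public static void main(String[] args) {
        ContactBook contacts = new ContactBook();
        contacts.put("Anna Kowalska", "600100200"); // made-up example entry
        String prefix = "Zadzwo\u0144 do ";
        System.out.println(contacts.findNumber("Zadzwo\u0144 do anna kowalska", prefix));
    }
}
```

Deriving the cut position from the prefix keeps the lookup correct if the command wording changes, which fixed offsets would not.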
    public void makeCall(String number) {
        if (number != null) {
            String numberToCall = "tel:" + number.trim();
            Uri Call = Uri.parse(numberToCall);
            Intent callIntent = new Intent(Intent.ACTION_CALL, Call);
            startActivity(callIntent);
        } else {
            Toast t = Toast.makeText(getApplicationContext(),
                    "No such contact in your phonebook", Toast.LENGTH_SHORT);
            t.show();
        }
    }
makeCall is used to dial the phone number found by the readContacts method (it takes the number as an argument). If the number is not null (so the wanted contact indeed exists in the phonebook), the number is trimmed (to remove leading and trailing whitespace) and then passed as a "tel:" URI to the intent responsible for making phone calls. When the contact is not present in the phonebook, a toast message informs the user about it (a later plan is to list similar contacts found and ask whether to call one of them).
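The planned suggestion of similar contacts is not implemented in the project. One possible approach, shown here purely as an assumption, is to rank phonebook names by Levenshtein edit distance to the recognised name. The class SimilarContacts, its methods and the distance threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: finds phonebook names close to a misrecognised name. */
final class SimilarContacts {

    /** Levenshtein distance computed with two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the names whose distance to the spoken name is at most maxDistance. */
    static List<String> suggest(String spoken, Collection<String> names, int maxDistance) {
        List<String> result = new ArrayList<>();
        String wanted = spoken.toLowerCase(Locale.ROOT);
        for (String name : names) {
            if (distance(wanted, name.toLowerCase(Locale.ROOT)) <= maxDistance) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(suggest("Ana", Arrays.asList("Anna", "Jan", "Adam"), 1));
    }
}
```

A small threshold such as 1 or 2 keeps the list short enough to be read out or shown in a dialog. The right value would have to be found in tests.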
    @Override
    public boolean onOptionsItemSelected(MenuItem item) {
        switch (item.getItemId()) {
            case R.id.action_scenarios:
                about.AboutScenarios();
                return true;
            case R.id.action_settings:
                // later on
                return true;
            case R.id.action_about:
                about.AboutText();
                return true;
            case R.id.action_exit:
                startActivity(method.hideApp());
                return true;
            default:
                return super.onOptionsItemSelected(item);
        }
    }
The last fragment of MainActivity relates to the menu created previously. It lists all the items available in this menu and the particular actions they trigger (exiting the application, displaying possible scenarios etc.).
3.2 Methods class
The Methods class is located in the main package, in speechtotext/Methods.java. It contains most of the methods used in the application. The MainActivity class creates an object of this class in order to access its functions. Starting from the top, there are the imports necessary to use all the Android-specific functions in the code, followed by the declaration of the public class Methods.
package nowim.speechtotext;
import android.net.Uri;
import android.provider.AlarmClock;
import android.support.v7.app.AppCompatActivity;
import android.content.Intent;
public class Methods extends AppCompatActivity {
The hideApp method takes no arguments and creates an Intent, which is an Android object describing an action to perform. The home category is then added to this intent. Combined with the FLAG_ACTIVITY_NEW_TASK flag, starting the intent brings up the home screen, so the application is hidden.
    public Intent hideApp() {
        Intent hideIntent = new Intent(Intent.ACTION_MAIN);
        hideIntent.addCategory(Intent.CATEGORY_HOME);
        hideIntent.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
        return hideIntent;
    }
The sendSMS method takes two arguments: the message (msg) and the recipient (caller). The method passes the content of the message, which in this case is the recognized speech sample, to the recipient, who is also taken from the recognized sample. This is done with the Android method putExtra, which adds content to the intent's data structure. The msg variable is shortened by the length of the voice command needed to execute the task. Setting the type helps Android recognize the type of request that the program makes.
    public Intent sendSMS(String msg, String caller) {
        // readContacts() -> caller
        msg = msg.substring(9); /* choiceMSG.length() */
        Intent sendIntent = new Intent(Intent.ACTION_VIEW);
        sendIntent.putExtra("sms_body", msg);
        sendIntent.putExtra("address", caller);
        sendIntent.setType("vnd.android-dir/mms-sms");
        // startActivity(sendIntent);
        return sendIntent;
    }
The browse method takes one argument, the search phrase (page). The page variable is shortened by the length of the voice command used to launch the action. The remaining text, which is the recognized speech, is appended to a Google search URL, and the application opens the default browser with this URL, which automatically starts the search through the Google search engine.
    public Intent browse(String page) {
        page = page.substring(6); /* choiceBrowser.length() */
        String url = "https://www.google.pl/search?q=" + page;
        Intent browseNET = new Intent(Intent.ACTION_VIEW);
        browseNET.setData(Uri.parse(url));
        return browseNET;
    }
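The browse method above appends the recognised phrase to the URL unchanged, including the space left after the keyword and any Polish characters. A minimal plain-Java sketch of building the same URL with a percent-encoded query could look as follows. The class SearchUrl is hypothetical, and the sketch assumes the caller has already checked that the phrase starts with the keyword.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds the Google search URL used by browse. */
final class SearchUrl {

    private static final String BASE = "https://www.google.pl/search?q=";

    /** Drops the command keyword and encodes the rest as a URL query value. */
    static String build(String recognised, String keyword) {
        // Assumes recognised starts with keyword; trim removes the space after it.
        String query = recognised.substring(keyword.length()).trim();
        try {
            // URLEncoder writes spaces as '+' and non-ASCII letters as %XX bytes.
            return BASE + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Szukaj pogoda Krak\u00f3w", "Szukaj"));
    }
}
```

On Android the same effect is usually obtained with Uri.encode or Uri.Builder. The JDK class is used here only so that the sketch runs without the Android SDK.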
The alarm method takes two arguments, hour and minute. When this action is taken, the program sets an alarm for the next occurrence of the given hour. Both variables are passed from the MainActivity class and correspond to the hour and minute at which the alarm will go off. Due to many recognition errors, setting the hour by voice was dropped. Currently the hour is not taken from the recognized speech (MainActivity passes the fixed values 7 and 15) and has to be adjusted by the user manually. The errors included:
• The word "siódma" recognized instead of the number "7"
• The number "21" not recognized in any case
• Attempts to set the alarm time outside the 24-hour scale
    public Intent alarm(int hour, int minute) {
        Intent Alarm = new Intent(AlarmClock.ACTION_SET_ALARM);
        Alarm.putExtra(AlarmClock.EXTRA_HOUR, hour);
        Alarm.putExtra(AlarmClock.EXTRA_MINUTES, minute);
        // Alarm.putExtra(AlarmClock.EXTRA_DAYS, "MONDAY");
        return Alarm;
    }
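Two of the errors listed above, a spoken ordinal such as "siódma" arriving instead of a digit and hours outside the 24-hour scale, could be handled before the alarm intent is built. The sketch below is one hypothetical way to do it in plain Java and is not part of the project. HourParser and parseHour are assumed names, and only the feminine ordinals for hours 1 to 12 are covered. It cannot help with the second error, where the recognizer does not return "21" at all.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that turns a recognised token into an hour of the day. */
final class HourParser {

    private static final Map<String, Integer> WORDS = new HashMap<>();

    static {
        // Polish feminine ordinals for hours 1..12, as Unicode escapes.
        String[] ordinals = {
            "pierwsza", "druga", "trzecia", "czwarta", "pi\u0105ta", "sz\u00f3sta",
            "si\u00f3dma", "\u00f3sma", "dziewi\u0105ta", "dziesi\u0105ta",
            "jedenasta", "dwunasta"
        };
        for (int i = 0; i < ordinals.length; i++) {
            WORDS.put(ordinals[i], i + 1);
        }
    }

    /** Returns the hour in the range 0..23, or -1 when the token is not a valid hour. */
    static int parseHour(String token) {
        if (token == null) {
            return -1;
        }
        String t = token.trim().toLowerCase(Locale.forLanguageTag("pl-PL"));
        Integer fromWord = WORDS.get(t);
        if (fromWord != null) {
            return fromWord;
        }
        try {
            int hour = Integer.parseInt(t);
            return (hour >= 0 && hour <= 23) ? hour : -1; // reject e.g. 25
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseHour("si\u00f3dma"));
        System.out.println(parseHour("25"));
    }
}
```

A caller would pass the result to alarm(hour, minute) only when it is not -1 and otherwise ask the user to repeat the time.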
The maps method takes one argument, the address. The address variable is shortened by the length of the voice command that runs it. The address is passed to the maps search as the query of a geo: URI, and the intent is directed to the Google Maps package. Thanks to the high integration with Google products, the search for the location recognized from the speech sample starts as close to the current location as possible. That means a search for a specific street first covers the area close to the current location, so the user does not get a street of the same name in a different country.
    public Intent maps(String address) {
        address = address.substring(16); /* mapa.length() */
        Uri gmmIntentUri = Uri.parse("geo:0,0?q=" + address);
        Intent mapIntent = new Intent(Intent.ACTION_VIEW, gmmIntentUri);
        mapIntent.setPackage("com.google.android.apps.maps");
        return mapIntent;
    }
The checkPolish method takes one argument, the message. Currently it is used for text messages. If the project is developed further, the idea is to expose it as a checkbox in the application. The method replaces Polish characters in messages: a text message that contains Polish diacritics is sent in a 16-bit encoding, which limits a single SMS to 70 characters instead of 160, so every Polish-specific character is switched to its closest counterpart in the English alphabet.
    public String checkPolish(String msg) {
        msg = msg.replace("ć", "c");
        msg = msg.replace("ś", "s");
        msg = msg.replace("ą", "a");
        msg = msg.replace("ę", "e");
        msg = msg.replace("ó", "o");
        msg = msg.replace("ż", "z");
        msg = msg.replace("ź", "z");
        msg = msg.replace("ł", "l");
        msg = msg.replace("ń", "n");
        return msg;
    }
}
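checkPolish above handles only lower-case letters. As a hedged alternative, the same replacement can be sketched with the standard java.text.Normalizer, which also covers capital letters. The class PolishText and the method stripDiacritics are hypothetical names. The letter ł has no canonical decomposition, so the sketch replaces it explicitly.

```java
import java.text.Normalizer;

/** Hypothetical alternative to checkPolish that also handles capital letters. */
final class PolishText {

    static String stripDiacritics(String msg) {
        // NFD splits a letter such as c-acute into a base letter and a combining mark.
        String decomposed = Normalizer.normalize(msg, Normalizer.Form.NFD);
        // Remove all combining marks, leaving only the base letters.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // l-stroke does not decompose, so it is mapped by hand.
        return withoutMarks.replace('\u0142', 'l').replace('\u0141', 'L');
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144"));
    }
}
```

Unlike the explicit replacement list, this version also strips accents from non-Polish letters, which may or may not be wanted.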
3.3 Android manifest
The Android manifest file is a must-have in any application. It presents the most important information about the application to the Android system, information that is essential in order to run the code. First of all, it contains the name of the package used within the application. The manifest also declares all the permissions the application uses, which for this project are:
• Read contacts
• Call phone
• System alert window
• Read calendar
• Send SMS
• Read SMS
• Wake lock
• Set alarm
Moreover, it defines information about the application version, such as the minimum Android API level a device must provide. Finally, it declares the libraries that must be linked with the application. Below is the manifest file with the permissions mentioned above.
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="nowim.speechtotext" >
    <uses-permission android:name="android.permission.READ_CONTACTS" />
    <uses-permission android:name="android.permission.CALL_PHONE" />
    <uses-permission android:name="android.permission.SYSTEM_ALERT_WINDOW"/>
    <uses-permission android:name="android.permission.READ_CALENDAR" />
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.READ_SMS"/>
    <uses-permission android:name="android.permission.WAKE_LOCK"/>
    <uses-permission android:name="com.android.alarm.permission.SET_ALARM"/>
    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme" >
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
        <service android:name="nowim.speechtotext.MinimalizedApplication" >
        </service>
    </application>
</manifest>
3.4 Minimalized application
This class lets the application run in the background. It is not finished: for now it
only minimises the application and displays a clickable icon, referred to here as the dim,
which can be moved freely around the screen. The dim is 'always on top' because it works
as a service. The Stack Overflow programming forum [9] served as inspiration and as help
with the critical parts of the code.
Img. 6. Screenshots from a Samsung Galaxy S4 running the application. On the left, the dim at its initial coordinates on the
screen, drawn over the Message app. On the right, the dim drawn over a settings screen, after being moved by the user.
Source: own.
public class MinimalizedApplication extends Service {
    private WindowManager windowManager;
    private ImageView appHead;
    WindowManager.LayoutParams parameters;

    @Override
    public void onCreate() {
        super.onCreate();
        windowManager = (WindowManager) getSystemService(WINDOW_SERVICE);
        appHead = new ImageView(this);
        appHead.setImageResource(R.mipmap.ic_minimalized_head);
As mentioned, the dim works as a service. That is why the MinimalizedApplication class,
which implements the whole idea behind the dim, extends Service and additionally needs a
private WindowManager to be able to 'draw' itself over other applications. appHead holds
the icon of the dim, and parameters is used to position the dim. Since the dim can be
treated as something like a separate application, it has its own overridden onCreate.
        parameters = new WindowManager.LayoutParams(
                WindowManager.LayoutParams.WRAP_CONTENT,
                WindowManager.LayoutParams.WRAP_CONTENT,
                WindowManager.LayoutParams.TYPE_PHONE,
                WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE,
                PixelFormat.TRANSLUCENT);
        // Initial position of the dim: top-left corner, 100 px down
        parameters.gravity = Gravity.TOP | Gravity.LEFT;
        parameters.x = 0;
        parameters.y = 100;
        appHead.setOnTouchListener(new View.OnTouchListener() {
            private int initialX;
            private int initialY;
            private float initialTouchX;
            private float initialTouchY;

            @Override
            public boolean onTouch(View v, MotionEvent event) {
                switch (event.getAction()) {
                    case MotionEvent.ACTION_DOWN:
                        // Remember where the dim and the finger were when the touch began
                        initialX = parameters.x;
                        initialY = parameters.y;
                        initialTouchX = event.getRawX();
                        initialTouchY = event.getRawY();
                        return true;
                    case MotionEvent.ACTION_UP:
                        // A release within 5 px of the touch-down point counts as a click
                        if ((Math.abs(initialTouchX - event.getRawX()) < 5)
                                && (Math.abs(initialTouchY - event.getRawY()) < 5)) {
                            Toast t = Toast.makeText(getApplicationContext(),
                                    "Clicked", Toast.LENGTH_SHORT);
                            t.show();
                        }
                        return true;
                    case MotionEvent.ACTION_MOVE:
                        // Move the dim by the distance the finger has travelled
                        parameters.x = initialX
                                + (int) (event.getRawX() - initialTouchX);
                        parameters.y = initialY
                                + (int) (event.getRawY() - initialTouchY);
                        windowManager.updateViewLayout(appHead, parameters);
                        return true;
                }
                return false;
            }
        });
        windowManager.addView(appHead, parameters);
    }
The initial coordinates are set in this block of code, and onTouch, which is responsible for
the actions after touching the dim, is overridden so that it describes the possible cases:
moving the dim up, down, left or right, and clicking it.
    @Override
    public void onDestroy() {
        super.onDestroy();
        if (appHead != null) {
            windowManager.removeView(appHead);
        }
    }

    @Override
    public IBinder onBind(Intent intent) {
        //
        return null;
    }
}
The MinimalizedApplication class ends with the instructions on how to 'destroy' the dim.
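The click-versus-drag decision made in onTouch can be isolated from the Android classes. Below is a minimal sketch in plain Java, with the hypothetical class TapOrDrag and the 5-pixel threshold taken from the listing above. It only illustrates the arithmetic and is not part of the application.

```java
/** Hypothetical helper mirroring the arithmetic of onTouch, without Android classes. */
final class TapOrDrag {
    static final float TAP_SLOP_PX = 5f; // same threshold as in onTouch

    private final int startX;   // window x at ACTION_DOWN
    private final int startY;   // window y at ACTION_DOWN
    private final float downX;  // raw touch x at ACTION_DOWN
    private final float downY;  // raw touch y at ACTION_DOWN

    TapOrDrag(int startX, int startY, float downX, float downY) {
        this.startX = startX;
        this.startY = startY;
        this.downX = downX;
        this.downY = downY;
    }

    /** True when the finger was lifted less than TAP_SLOP_PX from where it went down. */
    boolean isTap(float upX, float upY) {
        return Math.abs(downX - upX) < TAP_SLOP_PX
                && Math.abs(downY - upY) < TAP_SLOP_PX;
    }

    /** New window position {x, y} for a finger currently at (rawX, rawY). */
    int[] positionFor(float rawX, float rawY) {
        return new int[] {
            startX + (int) (rawX - downX),
            startY + (int) (rawY - downY)
        };
    }
}
```

Keeping this logic in a separate class would make it possible to unit-test the threshold without a device.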
4. Application tests
The application was tested during the main development stage and right after it by a
group consisting of:
• A 22-year-old Polish male, also a native speaker of English
• A 26-year-old Polish female, also a native speaker of Spanish
• A 6-year-old Polish female child with a slight speech defect (mixing „r” with „l”)
• A 51-year-old Polish male
They were all asked to say some sample commands first, and later simply to use the
application for everyday tasks such as searching for information on the internet.
Most of the exact sentences are listed in Table 1, with an indication of whether the command
was recognised properly or not. Since the application was designed to recognise Polish
commands and sentences, only native speakers of Polish performed the tests.
However, the two testers who are also native speakers of a foreign language were asked to
check whether some non-Polish words would be recognised correctly as well. The main
commands to use were:
• „Zadzwoń do <content>” (call)
• „Wiadomość <content>” (message)
• „Szukaj <content>” (search)
• „Ustaw alarm <content>” (set an alarm)
• „Znajdź na mapie <content>” (find on the map)
Additionally, in order to calculate the recognition efficiency, some of the testers were
asked to record some sentences.
Table 1. Commands and the content that was put into them, marked as passed [OK] or not passed [NOK].

Command            Content
Zadzwoń do         1. Mario [OK]
                   2. Alberta [OK]
                   3. Szymona [OK]
                   4. Szymon [OK]
                   5. Mariola [OK]
                   6. Marioli [OK]
                   7. Mariusz [OK]
                   8. Mariusza [OK]
                   9. Ady [OK]

Wiadomość          1. Do Marysi [OK]
                   2. Do Marysi [NOK]
                   3. Do Krystiana [NOK]
                   4. Do Marysi [OK]
                   5. Do Krystiana [NOK => Do kryształowej kocham Cię]
                   6. Do Krystiana [OK]
                   7. Do Alberta kocham Cię [OK]
                   8. Do Maria [OK]
                   9. Do Maria kocham Cię [NOK]
                   10. Do Szymona [OK]
                   11. Do Szymona kocham Cię [OK]
                   12. Mamo kup jajka [NOK => Na Zakopiankę]
                   13. Tato kup jajka na Zakopiance [NOK]
                   14. Gotuj telefon na zupę [OK]
                   15. Zakop telefon [NOK => zakup telefon]
                   16. Ugotuj telefon za łaływankę (child's babbling) [Ugotuj telefon na małżonkę]
                   17. Alarmowa na pomoc [OK]
                   18. Do Mario, Mario wpadnij [OK]
                   19. Do Marysi, one, two, three [do Marysi łan tu fri]
                   20. Do Alberta, kup dwa ptaszki [NOK => wiadomości Alberta Kolbuszowa ptak]
                   21. Dla Alberta, kup dwa ptaszki [NOK => na FB tak dwa ptaszki]

Szukaj             1. Pet shop [NOK => potrzeba]
                   2. [OK]
                   3. [OK]
                   4. Ticiro [NOK => Tryczyna]
                   5. Arsenal London Football Club [OK]
                   6. Jak dojechać do Hamburga z Hamburga [NOK]
                   7. Słuchaj [NOK]
                   8. Wyniki lotto [NOK (very loud)]
                   9. Taxi w Olkuszu [OK]
                   10. Youtube Moda na sukces [OK]
                   11. Google translate [OK]
                   12. Adams.pl [OK]
                   13. Premier League [OK]
                   14. Serial (pause) El internado Laguna Negra [NOK => serial]
                   15. Hotel Mercure Wrocław [OK]
                   16. Emma Watson [OK]
                   17. Sherlock Holmes studium w różu [OK]
                   18. Sherlock Holmes study in pink [OK]
                   19. Jak żyć [OK]
                   20. Kiedy Arctic Monkeys przyjadą do Polski [OK]

Ustaw alarm        1. Na 17 [NOK => 14]
                   2. Na 12 [OK]
                   3. Na 5 [OK]
                   4. Na 20:00 [NOK => na 20 000]
                   5. Na 17 [OK]
                   6. Na 21:00 [NOK => na 25]
                   7. Na 21 [NOK => na 20 1]
                   8. Na 20:02 [NOK => na 22]
                   9. Na 20:03 [NOK => na 23]
                   10. Na północ [OK]
                   11. Na 00 [OK]
                   12. Na 15:02 [OK]
                   13. Na 3 rano [OK]
                   14. Na 3 po południu [NOK => 3 (in the morning)]
                   15. 11:23 [OK]
                   16. Na 20:07 [NOK => na 2]

Znajdź na mapie    1. //error -> application was stopped//
                   2. Hotel spa [OK]
                   3. Hotel spa w Zakopanem [OK]
                   4. Hotel spa w Zakopanym [OK]
                   5. Szkoła Podstawowa numer 2 w Olkuszu [OK]
                   6. Dworzec Autobusowy Paryż [NOK]
                   7. Dworzec Autobusowy w Paryżu [OK]
                   8. Dworzec MDA w Krakowie [OK]
                   9. Dworzec autobusowy Madryt [OK]
                   10. Dworzec w Olkuszu [OK]
                   11. Sklep spożywczy w Bobrownikach [NOK => sklep spożywczy koniczynka]
                   12. Bobrowniki koło Włoszczowy [OK]
                   13. Remiza Strażacka Pilczyca [NOK => remiza strażacka w fizyce]
                   14. Cafe Vinyl [NOK => cafe winy]
                   15. Green Day Wrocław [OK]
                   16. Złote Tarasy w Warszawie [OK]
                   17. Stadion w Gdańsku [OK]
                   18. Azory Kraków [OK]
                   19. Stadion Madryt [OK]
                   20. Londyn, Ashburton Grove [OK]

Some sample        1. Yyyy Olkusz [NOK => a arkusz]
sentences          2. Kapusta [NOK => usta]
                   3. Witam państwa [OK]
                   4. Potężna wichura łamiąc duże drzewa trzciną zaledwie tylko kołysze [OK]
                   5. Cześć [OK]
                   6. Nowim [OK]
                   7. Cojes [OK]
                   8. Trzy ciufcie [NOK]
                   9. Mikrofon [OK]
                   10. Przez stół nie da rady [OK]
                   11. Nie rozumiem Chewbacci [OK]
                   12. Słowa, zdania, cokolwiek [OK]
                   13. Stół z powyłamywanymi nogami [OK]
                   14. Stół bez nóg [OK]
                   15. Król Karol kupił Królowej Karolinie korale kolory koralowego [OK]

With [OK], the application did exactly what it should. With [NOK] there was some
trouble: either the application did not understand what was said and asked the tester to
repeat the sentence, or it understood exactly what was said but did not perform the wanted
action.
4.1 Methodology
I. Positive try
When the application meets the expectations for the given command:
• Zadzwoń do – the application dials the wanted person if his/her number is in the
  phone book, or displays a toast saying that there is no such contact in the phone
  book if this is the case
• Wiadomość – the application creates a new text message (for now only the message
  text, without the receiver's details)
• Szukaj – the application connects to the internet, opens the default browser and,
  using google.com, shows the results of the request (or of searches similar to the
  requested one)
• Ustaw alarm – the application sets the alarm and shows (in a toast message)
  information about when the alarm will be triggered
• Znajdź na mapie – the application connects to the internet and shows the requested
  location using Google Maps
• Sample sentences – the application displays the exact words that were said, with no
  mistakes
II. Negative try
When the outcome differs from the one described in point I (Positive try), or the wanted
action is not triggered:
• Zadzwoń do – the application twists the words (for example, it gets „Marian”
  instead of „Mario”)
• Wiadomość – the application twists the words or does not trigger the message
  intent
• Szukaj – the application cannot connect to the internet or returns results other
  than the wanted ones
• Ustaw alarm – the application sets the alarm for an hour that differs from the
  wanted one
• Znajdź na mapie – the application wrongly determines the wanted location
• Sample sentences – the application twists at least one word
4.2 „Zadzwoń do” – „call to” request
• Conditions:
  o 30/40 tries in very bad conditions (8 people in a room + TV set turned on)
  o 10/40 tries in good conditions (only the tester in the room, no external source of
    sound/noise)
• Testers:
  o 6-year-old female child – 19 tries
  o 26-year-old female – 8 tries
  o 51-year-old male – 3 tries
  o 22-year-old male – 10 tries
• Possible contacts:
  o Mario
  o Mariusz
  o Alberto
  o Mariola
  o No contact in the phone book
• Results:
  o 34 positive tries
  o 6 negative tries (2 times „don't understand, can you repeat please?” + 4 times
    twisted names)
• Comments:
  o 4 times the name „Alberto” was pronounced with a mistake; 3 times the application
    guessed the name correctly, and once it tried to call „Agata”
  o Once the child hesitated when saying „Zadzwoń do (pause) Mario” and the
    application guessed she had said „Zadzwoń do Marysi”
  o Twice it confused „Mariusz” with „Mario”, both times when testing in bad
    conditions
  o The 2 tries in which it did not understand a word took place once in good and
    once in bad conditions
  o The twisting could be due to the fact that both Polish and non-Polish names were
    included in the tests and that the largest share of the tests (19 of 40) was
    performed by a child
4.3 „Wiadomość” – „Message” request
• Conditions:
  o 40/50 tries in very good conditions (only the tester in the room, no external
    source of sound/noise)
  o 10/50 tries in bad conditions (4 people in a room + TV set)
• Testers:
  o 6-year-old female child – 30 tries
  o 26-year-old female – 10 tries
  o 22-year-old male – 10 tries
• Results:
  o 38 positive tries
  o 12 negative tries
• Comments:
  o Most of the negative tries (8) were caused by broadly understood noise
    (babbling, laughter, external sound/noise sources, people talking)
  o 2 negative tries were due to a recognition system failure („don't understand,
    can you repeat please?”)
  o Sometimes, though, the application understood babbling perfectly (for example,
    „Albelto” was correctly understood as „Alberto”)
  o At times English words were understood by the application but displayed
    phonetically
  o The remaining 2 negative tries occurred when the application understood
    everything correctly but did not trigger the message intent
4.4 „Szukaj” – „search” request
• Conditions:
  o 30/50 tries in very bad conditions (8 people in a room + TV set turned on)
  o 20/50 tries in good conditions (only the tester in the room, no external source of
    sound/noise)
• Testers:
  o 6-year-old female child – 16 tries
  o 26-year-old female – 20 tries
  o 22-year-old male – 10 tries
  o 51-year-old male – 4 tries
• Results:
  o 43 positive tries
  o 7 negative tries
• Comments:
  o This request deals very well with non-Polish words
  o Some of the negative tries occurred when two languages were mixed in one
    sentence
  o Some of the negative tries also resulted in „don't understand ...”, which was
    due to the loud noise around
  o One disturbing thing happened, although only once: after 30 consecutive tries the
    application simply crashed. Unfortunately, even after debugging it was hard to
    tell whether this was strictly the application's fault or the recognition
    system's fault. The most likely scenario is a temporary loss of the internet
    connection
4.5 „Ustaw alarm” – „set an alarm” request
Disclaimer – this command is not quite finished, which was the reason for some of the
errors the testers encountered. That is why, from this section, only the speech
recognition results were taken into account in the summary of the tests.
• Conditions:
  o 20/20 tries in neutral conditions (2 people in a room, no additional source
    of sound/noise)
• Testers:
  o 6-year-old female child – 5 tries
  o 26-year-old female – 5 tries
  o 22-year-old male – 10 tries
• Results:
  o 10 positive tries
  o 10 negative tries
• Comments:
  o For now the command is somewhat „hardcoded” for one time of day, since there
    were big problems to be solved during development and too little time
  o The application does not read some of the more problematic hours properly (for
    example, it reads 25 instead of 21), though this is purely the recognition
    system's fault
  o Distinguishing between am and pm is problematic to implement
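One possible way to approach the am/pm distinction mentioned above is to look for the Polish phrases „rano” and „po południu” next to the recognised hour. The sketch below is plain Java and is not taken from the application. The class AlarmTimeParser, its regular expression and its handling of „północ” are assumptions made for illustration only.

```java
import java.time.LocalTime;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the content of the "Ustaw alarm" command. */
final class AlarmTimeParser {
    // One or two digits, optionally followed by ":mm".
    private static final Pattern TIME =
            Pattern.compile("\\b(\\d{1,2})(?::(\\d{2}))?\\b");

    static Optional<LocalTime> parse(String recognised) {
        String text = recognised.toLowerCase(Locale.ROOT);
        // "polnoc" (written with o-acute and l-stroke) means midnight.
        if (text.contains("p\u00f3\u0142noc")) {
            return Optional.of(LocalTime.MIDNIGHT);
        }
        Matcher m = TIME.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        int hour = Integer.parseInt(m.group(1));
        int minute = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        // "po poludniu" means in the afternoon: shift 1..11 to 13..23.
        if (text.contains("po po\u0142udniu") && hour >= 1 && hour < 12) {
            hour += 12;
        }
        // Reject values such as 25, which the recognition system sometimes returns.
        if (hour > 23 || minute > 59) {
            return Optional.empty();
        }
        return Optional.of(LocalTime.of(hour, minute));
    }
}
```

An empty result could be used to ask the user to repeat the hour instead of setting a wrong alarm.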
4.6 „Znajdź na mapie” – „find on the map” request
• Conditions:
  o 40/50 tries in very bad conditions (8 people in a room + TV set turned on)
  o 10/50 tries in good conditions (only the tester in the room, no external source of
    sound/noise)
• Testers:
  o 6-year-old female child – 4 tries
  o 26-year-old female – 30 tries
  o 22-year-old male – 10 tries
  o 51-year-old male – 6 tries
• Results:
  o 34 positive tries
  o 16 negative tries
• Comments:
  o The application handles the requests very well, no matter whether the request
    was to find e.g. „Szkoła podstawowa” or „Podstawówka”
  o The application also makes almost no mistakes when showing the exact address of
    wanted places, such as stadiums
  o The application shows the nearest possible results, so someone who is in Warsaw
    and looks for „Włochy” will most likely get the Warsaw neighbourhood of that
    name and not the country „Włochy” (Italy)
  o The application struggled when the request was spoken with hesitation, which
    can be read both negatively and positively. On the one hand, it does not read
    the request exactly (it waits through the hesitation and reads what comes after
    it). On the other hand, it tries to fit in what it read before the hesitation
    and, using the location of the phone, shows the result most similar in content
    to the requested one, within a fairly short time
  o Some of the negative tries were probably the result of bad environmental
    conditions
4.7 Sample sentences – additional system tests
• Conditions:
  o 40/50 tries in very bad conditions (8 people in a room + TV set turned on)
  o 10/50 tries in good conditions (only the tester in the room, no external source of
    sound/noise)
• Testers:
  o 6-year-old female child – 3 tries
  o 26-year-old female – 5 tries
  o 22-year-old male – 38 tries
  o 51-year-old male – 4 tries
• Results:
  o 41 positive tries
  o 9 negative tries
• Comments:
  o As could be predicted, the majority of the negative tries (8) occurred in bad
    conditions
  o Most of the time (7) the application asked the tester to repeat the sentence
  o In good conditions the application worked nearly perfectly, failing to
    understand only once
  o The sentences were in Polish, English and Spanish
  o Oddly, the application did not have much trouble with writing down phonetically
    words which do not officially exist in the Polish language (such as „nowim”,
    which is a regionalism) or English and Spanish words. The phonetic spelling
    appeared only when the foreign words were part of a sentence with Polish words.
    When the whole request consisted of foreign words from one language, the
    application detected the language just fine. This is due to the priority given
    to the Polish language, which was implemented in the code
4.8 Tests round up
Table 2. Tests round up.

Name                                                   Number/percentage
Total requests/sentences                               260
Positive tries (when everything was fine)              200
Negative tries (at least one thing went wrong)         60
Positive recognition tries (with alarm)                76.92%
Negative recognition tries (with alarm)                23.08%
Positive application response (excluding alarm)        79.17%
Negative application response (excluding alarm)        20.83%

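The percentages in Table 2 can be reproduced from the per-command results in sections 4.2 to 4.7. The sketch below makes two assumptions: the map request had 16 negative tries (the value implied by its 50 tries and by the totals in the table), and the application-response rows leave out the unfinished alarm command. The class TestSummary and its method names are illustrative only.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes the figures of Table 2. */
final class TestSummary {
    // {positive, negative} tries for sections 4.2 to 4.7, in that order.
    // The value 16 for the map request is an assumption derived from the totals.
    static final int[][] TRIES = {
        {34, 6}, {38, 12}, {43, 7}, {10, 10}, {34, 16}, {41, 9}
    };
    static final int ALARM = 3; // index of the "Ustaw alarm" command

    /** Sum of one column (0 = positive, 1 = negative), with or without the alarm row. */
    static int sum(int column, boolean includeAlarm) {
        int total = 0;
        for (int i = 0; i < TRIES.length; i++) {
            if (includeAlarm || i != ALARM) {
                total += TRIES[i][column];
            }
        }
        return total;
    }

    /** Share of part in total as a percentage with two decimals. */
    static String percent(int part, int total) {
        return String.format(Locale.ROOT, "%.2f", 100.0 * part / total);
    }

    /** Percentage of positive tries, with or without the alarm command. */
    static String positiveShare(boolean includeAlarm) {
        int positive = sum(0, includeAlarm);
        int negative = sum(1, includeAlarm);
        return percent(positive, positive + negative);
    }
}
```

With these inputs, 200 of 260 tries give the recognition figure and 190 of 240 tries give the application-response figure.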
5. Discussion
What was done:
A simple graphical layout, retrieval of the contact list, handling of calls, text messages,
internet browsing and map search, a partial alarm clock, running the application in the
background, simple application settings with an "about application" section, and an
application icon. Apart from those, extensive research on the topic was carried out, during
which voice recognition was tested, analysed and investigated. The research also gave
further insight into the voice sample recognition engine, as well as into users' reactions
to and impressions of the application. Working on the project improved the authors'
object-oriented skills, knowledge of application structure management and overall Java
programming abilities. Owing to the scale of the project, some task management was also
involved, including splitting the work into blocks, assigning work and monitoring
progress. That helped in understanding the complexity and workflow of developing a full
application with all its components.
What could be improved:
In any further releases, versions or patches, every unreliable function should be
improved and every partially implemented method should be finished. First of all, a
proper implementation of the alarm clock is needed. Sending text messages and making
phone calls could be enhanced by improving the matching against the contacts found.
Moreover, the application working in the background could by default be brought up by
clicking its icon box. Furthermore, speech samples could be saved as notes for future
use, either under the current date in the calendar or in a separate file in a specific
folder.
Finally, two more features proved hard to implement in this code structure because of the
activity the system uses and the organisation of the code: reminders and full calendar
synchronisation. As said, it was decided not to include them in the final project.
Reminders caused errors because they failed to save, and full calendar integration
failed mostly because of the different Android versions on different phones. For example,
the calendar is not visible at all on a Samsung Galaxy S4 running Android 4.1 (it also
delays the start of the application by about 15 seconds instead of letting it start
almost immediately, which is not acceptable), while it works without any problems on a
Google Nexus 5 running Android 6.
Conclusions from tests:
The application reacts properly to requests in almost 80% of cases. It correctly reads
sentences and commands in over 75% of cases, also in very bad conditions and whether they
are spoken by adults or by small children. For comparison, Google reports a word error
rate of only 8% for its speech recognition, that is, about 92% of words recognised
correctly [11]. This is a positive sign given the range of potential users. The VPA may
make mobile phones easier to use for children, the elderly and some disabled people
(e.g. with impaired sight). Easy, intuitive commands that the application understands are
the core of this product, which was highlighted by the majority of testers. The
advantages of this application are that it is open source, does not collect or send any
user data anywhere, and can easily be scaled or tailored to a specific user by any
programmer with at least basic programming skills. Although the application is not very
advanced for now and probably lacks some commands, it still meets the expectations of the
testers asked.
6. Bibliography
[1] Kenneth S. Rubin. Essential Scrum: A Practical Guide to the Most Popular Agile
Process, Michigan, First printing, July 2012
[2] Paul Deitel, Harvey Deitel, Abbey Deitel. Android for programmers an app-driven
approach. Crawfordsville, Indiana, December 2013
[3] Reto Meier. Professional Android Application Development, Indianapolis 2009
[4] Bruce Eckel. Thinking in Java. Fourth Edition, Helion, Gliwice 2006
[5] Online event organiser "Trello" http://www.trello.com
[6] Direct link to application's Version Control System on GitHub
http://www.github.com/THenry14/Virtual-Assistant-UE
[7] Android for Developers http://developer.android.com/index.html
[8] Android tutorials and tips about programming http://www.tutorialspoint.com/android/
[9] Stack Overflow Question/Answer forum for programmers http://stackoverflow.com/
[10] Official GNU website http://www.gnu.org/licenses/gpl-3.0.en.html
[11] Jordan Novet. Google says its speech recognition technology now has only an 8%
word error rate. Access: December 23rd, 2015.
http://venturebeat.com/2015/05/28/google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/