Virtual voice assistant for mobile phones
FACULTY OF COMPUTER SCIENCE, ELECTRONICS AND TELECOMMUNICATIONS
DEPARTMENT OF TELECOMMUNICATIONS

ENGINEERING PROJECT

Virtual voice assistant for mobile phones
(Asystent głosowy na telefony komórkowe)

Authors: Michał Czerwień, Wojciech Szymczyk
Field of study: Electronics and Telecommunications
Supervisor: dr inż. Bartosz Ziółko

Kraków, 2016

Having been advised of criminal liability under Art. 115(1) and (2) of the Act of 4 February 1994 on Copyright and Related Rights (consolidated text: Dz.U. of 2006, No. 90, item 631, as amended): "Whoever appropriates the authorship of, or misleads others as to the authorship of, all or part of another person's work or artistic performance, is liable to a fine, restriction of liberty or imprisonment for up to 3 years. The same penalty applies to anyone who disseminates another person's work in its original version or as an adaptation, or an artistic performance, without naming the creator or their pseudonym, or who publicly distorts such a work, artistic performance, phonogram, videogram or broadcast.", and having been advised of disciplinary liability under Art. 211(1) of the Act of 27 July 2005, Law on Higher Education (consolidated text: Dz.U. of 2012, item 572, as amended): "For violating the regulations in force at the university and for acts compromising the dignity of a student, the student bears disciplinary liability before a disciplinary committee or before a peer court of the student government, hereinafter referred to as the 'peer court'", I declare that I have prepared this diploma project personally and independently, and that I have not used sources other than those listed in the work.

Table of contents
1. Introduction
2. Aim of the project
2.1 Segregation of duties
2.2 Platform description
2.3 Device specification
2.4 Programming environment
3. Application source code
3.1 Main Activity class
3.2 Methods class
3.3 Android manifest
3.4 Minimalized application
4. Application tests
4.1 Methodology
4.2 "Zadzwoń do" – "Call to" request
4.3 "Wiadomość" – "Message" request
4.4 "Szukaj" – "Search" request
4.5 "Ustaw alarm" – "Set an alarm" request
4.6 "Znajdź na mapie" – "Find on the map" request
4.7 Sample sentences – additional system tests
4.8 Test round-up
5. Discussion
6. Bibliography
1. Introduction

The idea for this project first appeared while working on a similar project back in 2014, which allowed voice commands to be used to execute basic computer functions, such as launching an internet browser. The clear trend in modern technology is to make all functions simple for the end user to run and operate. Since mobile technology with internet access is nowadays essential in many situations, this project combines these two aspects and provides an application that assists the user in everyday mobile phone tasks. Accordingly, the project links mobile phone functions with quick, yet reliable and simple voice commands. The main focus during development was to make the phone easier to use for disabled users, for users who find some of their phone's functions difficult, and for children and older users.

Programming the internal methods is one thing, but speech understanding and recognition is the fundamental aspect of any application of this kind. Using an external speech recognition engine developed by a large company is beneficial: such engines will likely become more and more capable of understanding human speech (and most probably human emotions too), so the engine used by this project will be updated automatically and its recognition quality maintained without changing the application itself. If that is the case, it will also be possible to keep the customizations made by this project's developers for end-user requests.

To meet user expectations, applications from other vendors, such as Google Now and Microsoft's Cortana, were closely analysed. The analysis provided information about possible improvements and additional functionalities that could become this project's advantage. The project is very likely to be scalable and further developed in the future, as this branch of technology is expanding rapidly.

The thesis is divided into two parts. The first part covers the theoretical aspects, such as the programming environment and platform specification, while the second part contains mostly voice recognition testing, end-user experience and application efficiency on different devices. For now the application is still in beta, with the newest version being 0.5, codename "Chewie" (it can also be referred to as the Virtual Personal Assistant, or VPA).

2. Aim of the project

This project aims to provide a simple way to access mobile phone functions using voice commands. After a voice command is passed to the phone, the application matches the recognized speech with the appropriate function in the program. When a command is not recognized, the application either displays what was said and takes no other action (a feature used mostly for debugging) or informs the user that it could not understand what was said and asks them to repeat the inquiry. To be up and running, the application requires an internet connection for the recognition engine; however, this requirement can be lifted by downloading the appropriate speech recognition package from Google's servers, which lets the application work offline as well (though its functions are then limited).
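As a minimal sketch of how such an offline preference could be expressed (this fragment is not part of the released application; it assumes Android 6.0 or newer and a downloaded Polish language pack, and reuses the RESULT_SPEECH request code defined later in MainActivity), the recognizer intent could be extended like this:

// Hypothetical extension: prefer the on-device recognizer when available.
// Requires android.os.Build and android.speech.RecognizerIntent.
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "pl-PL");
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
    // EXTRA_PREFER_OFFLINE exists from API 23 (Android 6.0) upwards
    intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true);
}
startActivityForResult(intent, RESULT_SPEECH);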
The speech sample is sent to the servers, where the engine recognizes the phonemes and then associates them with words, numbers and even whole phrases. Although there were initial attempts to develop our own recognition system using the HTK tools, due to lack of time and resources the recognition engine developed by Google was used in the end product. It is freely available to developers within the Android operating system. Since fast and relatively cheap internet access is now common, this choice, together with the concept of an application working mainly online, seems obvious and reasonable.

Img. 1. Mobile phone internet user penetration worldwide from 2014 to 2019. Source: http://www.statista.com

Thanks to its simplicity and wide range of functions, the application can be useful for those who want to do things faster and from a single interface, or who find it hard to type on a small smartphone keyboard. It can also help older people and children use the device: instead of launching other applications or searching through the contacts list for a specific name, they simply say the command and get the result. Since the application can operate offline, it can also be a good default launcher for parents who often lend their phones to their children – with no internet connection required, no harm can be done while a child is playing with the phone, yet it can still be used to "call mom" or "text mom" when needed. This is one of the advantages this application has over similar ones written by big corporations.

The voice assistant requires the smartphone to be equipped with a microphone (which can be found in nearly all models nowadays). For some functionality, such as map searching, it is advisable to enable location detection in order to narrow the searches to nearby areas. The application does not collect user data, browser history, online activity or any other data considered private or confidential. No data is stored within the application, and it cannot be accessed from any device other than the one it is installed on, which dramatically reduces the chances of data manipulation.

2.1 Segregation of duties

In order to maintain efficiency and reliability, the work was split between two people. For the development lifecycle it was best to use the scrum method [1]. Scrum is a development methodology whose main assumption is being agile and flexible towards upcoming changes; it is said to be the best solution when developing a new idea from scratch. The methodology is based on weekly meetings and short tasks divided between scrum members. This project's tasks were assigned so that each would require about a week to develop. During the weekly meetings, the currently assigned tasks were reviewed, future tasks were discussed, and there was a short talk about the project's direction and possible obstacles. Each scrum member then had a few minutes to present the work done during the period, so that every member stayed up to date with the whole project. To keep track of task assignment and each member's progress, an online organizer was used – in this case "Trello" [5], which offers task division and assignment to the enrolled scrum members. The tool provides a convenient way to monitor all ongoing development as well as the history of each individual's progress. With its use the development history is archived and kept clean and transparent.
To keep a clear view and minimise errors, the separate pieces of work were not merged until the final stages. For this purpose a version control system was introduced. The tool used was GitHub, an online version control service. The final version, dated 09.12.2015, can be accessed on the GitHub website under the user THenry14/Virtual-Assistant-UE [6]. Despite the mechanisms GitHub provides for dealing with differences, changes such as renaming similar variables and standardizing variable names had to be made manually.

The segregation of duties was as follows:

Wojciech Szymczyk:
- backbone of the application
- design of the architecture and layout
- Read Contacts intent
- dialing intent
- exception handling
- additional features (menu, exit option, scenarios option etc.)
- user interface
- merging of the application
- tests
- operating in the background (not finished)

Michał Czerwień:
- standardization of Polish characters
- sequential listening
- text messages
- Google Maps
- alarm clock (not finished)
- internet browser
- splitting methods into classes

2.2 Platform description

A mobile platform, or mobile OS, is the operating system used on the device. It defines the mobile device, its structure, the available functions and the responses to user actions. While all available platforms support basic actions such as phone calls, text messages and internet browsing, they differ in how these actions are executed and in user customization. The development process, programming language and specification change depending on the selected platform, so when designing for a particular use it is best to choose the platform that offers the best solutions for the required functions. For mobile applications there are three major platforms that together cover over 95% of usage:
- Android
- Windows
- iOS

For this project's purposes the Android operating system was chosen, because it offers the most customization and because it was the most used mobile operating system in 2015, with predictions showing the same for the following years.

Img. 2. Worldwide Smartphone OS Market Share (Share in Unit Shipments). Source: http://www.idc.com/prodserv/smartphone-os-market-share.jsp

There is also large growth in mobile devices other than phones, such as tablets, and based on the graph below it is fair to assume that Android is the preferred operating system among these devices as well.

Img. 3. Operating system share among all mobile device users. Source: https://www.netmarketshare.com/operatingsystem-market-share.aspx

The Android operating system is also easily accessible to programmers. Being derived from Linux, it is developed under the GNU GPL license [10], and the Android source code is released as open source, meaning anyone can look closely at how each part works, which is beneficial for programmers because the platform can be well understood. With a huge community of developers actively working on new features and sharing their ideas, the ecosystem is thriving with opportunities for anyone willing to cooperate and expand it.

2.3 Device specification

The main requirement is that the device runs Android 4.1 Jelly Bean or above, has an internet connection and a microphone. The application itself was tested on Samsung Galaxy S4 and Samsung Galaxy S4 mini phones (both Android 4.1), and on emulators of the Google Nexus 5 (running Android 6.0) and Google Nexus 4 (Android 5.1).

In order to deliver full compatibility with all OS versions between 4.1 and 6.0, some functions are not developed to their full potential, and the application does not entirely follow Google's newest Material Design guidelines. That is the price that had to be paid to ensure support for the largest possible number of devices.

Img. 4. Nexus 5 (on the left) and Nexus 4 (on the right), both emulated on a virtual machine and running the Virtual Personal Assistant v 0.5 (codename Chewie). Source: own.
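Where a feature is only available on newer Android versions, one common approach – shown here as a hypothetical sketch, not code taken from the project – is to guard it with a runtime version check so that the same APK still runs on Android 4.1:

// Hypothetical sketch: gate an API 19+ intent behind a version check.
// AlarmClock.ACTION_SHOW_ALARMS is only available from Android 4.4 (API 19).
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.KITKAT) {
    startActivity(new Intent(AlarmClock.ACTION_SHOW_ALARMS));
} else {
    // fall back to behaviour that exists on Android 4.1 (API 16),
    // e.g. the plain alarm-setting intent described in section 3.2
    startActivity(new Intent(AlarmClock.ACTION_SET_ALARM));
}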
2.4 Programming environment

During the development stage Android Studio was used to design and write the application [2]. It is a very good programming environment, offering many tools and conveniences for both beginners and experts. Like most applications written for Android, the virtual assistant was written in Java [4] using the Android API [3], which, since Android is open source, is thoroughly described on numerous Android-related sites, in books and on forums. The main sources of tips and exact syntax during development were the official Android for Developers site [7] and Tutorials Point [8]. During the design stage, and to ensure good programming practices, the Stack Overflow Q&A site [9] was also used, because it collects many common problems along with tips on how to solve them well.

Img. 5. Android Studio, the recommended tool for Android programming – its welcome screen. Source: own.

3. Application source code

The source code of this project is released on GitHub under the GNU General Public License Version 2, June 1991 [10], which means it respects the four basic freedoms of open-source software:
- the freedom to run the program for any purpose,
- the freedom to study how the program works and adapt it to the user's needs,
- the freedom to redistribute unmodified copies of the program,
- the freedom to enhance the program and publish one's own updates for anyone to use.

When all these criteria are met, then, according to the FSF, such a program can be called open source.

3.1 Main Activity class

MainActivity is the main class of the application – its backbone. It contains the most important fragments of the application's code.

public class MainActivity extends AppCompatActivity {

    protected static final int RESULT_SPEECH = 1;

    private ImageButton ButtonToRecord;
    private Button ButtonToMinimalize;
    private TextView RecognisedText;

    Methods method = new Methods();
    About about = new About();

    String phone = new String();
    String name = new String();
    String id = new String();
    String number = new String();
    String nameOfContact = new String();

    boolean minimalized = false;
    boolean isCalling;

    String choiceMSG = "wiadomość";
    String choiceBrowser = "Szukaj";
    String choiceAlarm = "budzik";
    String choiceMap = "Znajdź na mapie";
    String choiceCall = "Zadzwoń";
    String test = "test";

The code above presents the main class of the application, which extends AppCompatActivity, responsible for handling activities in Android. The following lines are declarations of variables that are critical for the operation of the application or were used for various tests; in particular, the choice* strings hold the Polish keywords that trigger each command.
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    RecognisedText = (TextView) findViewById(R.id.Text);
    about.AboutText = (TextView) findViewById(R.id.Text);
    ButtonToRecord = (ImageButton) findViewById(R.id.ButtonToRecord);
    ButtonToMinimalize = (Button) findViewById(R.id.Minimalize);

The next step was to override the onCreate method – this is essential to obtain the wanted behaviour when the application starts. After start-up the application refers to the .xml file called activity_main in order to parse the information about the main layout – spacing, fonts, icons and so on. Naturally, the items named in activity_main have to be initialized in the program.

    ButtonToRecord.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "pl-PL");
            try {
                startActivityForResult(intent, RESULT_SPEECH);
                RecognisedText.setText("");
            } catch (ActivityNotFoundException a) {
                Toast t = Toast.makeText(getApplicationContext(),
                        "Device doesn't support speech recognition",
                        Toast.LENGTH_SHORT);
                t.show();
            }
        }
    });

Java allows an action listener to be attached to a button (in this case ButtonToRecord, which is responsible for recording speech); a listener triggers actions after a given event occurs, i.e. when the button is clicked. It overrides another built-in Android method – onClick – which determines the actions taken after the button is pressed. Here a new RecognizerIntent is started, which tells the system to take the recorded sample and, using Google speech recognition, try to convert it to text. An extra option is added to the recognizer to make Polish the default language. If the phone does not meet the minimal requirements (see 2.3 Device specification), the speech is not processed; instead an exception is thrown and the message "Device doesn't support speech recognition" is shown.

    ButtonToMinimalize.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            if (minimalized) {
                stopService(new Intent(getApplication(), MinimalizedApplication.class));
                minimalized = false;
            } else {
                startService(new Intent(getApplication(), MinimalizedApplication.class));
                minimalized = true;
                startActivity(method.hideApp());
            }
        }
    });
}

The onCreate method is concluded with another listener, this time on ButtonToMinimalize – the button that minimizes the application so it can work in the background. Here, too, onClick is overridden so that it checks the state of the global variable minimalized. It is initialized with the value false, since after start-up the application works in full-screen mode. When the "HAZE" button is clicked, the value changes to true, which calls startService on the MinimalizedApplication intent from another class and results in minimizing the app. Going back to full-screen mode and pressing the "HAZE" button again kills the dim and changes the variable back to false, leaving the application running in the foreground. This closes onCreate.
@Override
public boolean onCreateOptionsMenu(Menu menu) {
    getMenuInflater().inflate(R.menu.menu_main, menu);
    return true;
}

This block creates a menu, which can be opened – depending on the OS version – by pressing a hardware button on the phone or a software one placed in the right corner of the screen.

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    switch (requestCode) {
        // in our case speech recognition is chosen all the time since RESULT_SPEECH = 1
        case RESULT_SPEECH: {
            if (resultCode == RESULT_OK && null != data) {
                ArrayList<String> text = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                RecognisedText.setText("");
                RecognisedText.setText(text.get(0));

                if (RecognisedText.getText().toString().startsWith(test)) {
                    Intent intent = new Intent("android.intent.action.MAIN");
                    intent.setComponent(new ComponentName("com.android.mms",
                            "com.android.mms.ui.ConversationList"));
                    startActivity(intent);
                }
                if (RecognisedText.getText().toString().startsWith(choiceCall)) {
                    isCalling = true;
                    readContacts(text);
                    makeCall(number);
                }
                if (RecognisedText.getText().toString().startsWith(choiceMSG)) {
                    isCalling = false;
                    readContacts(text);
                    String msg = text.get(0);
                    msg = method.checkPolish(msg);
                    startActivity(method.sendSMS(msg, number));
                }
                if (RecognisedText.getText().toString().startsWith(choiceBrowser)) {
                    String page = text.get(0);
                    startActivity(method.browse(page));
                }
                if (RecognisedText.getText().toString().contains(choiceAlarm)) {
                    int hour = 7;
                    int minute = 15;
                    startActivity(method.alarm(hour, minute));
                }
                if (RecognisedText.getText().toString().startsWith(choiceMap)) {
                    String addr = text.get(0);
                    startActivity(method.maps(addr));
                }
            }
        }
        break;
    }
}

Next, onActivityResult is overridden, which ensures that the wanted action is taken after a given event – in this case the successful recognition of words. If the passed speech is recognized correctly and is not considered silence (NULL), the words converted to plain text are stored in an array list of strings and a decision tree is evaluated based on what is stored in this list. When the program finds one of the keywords there, the respective block of code is executed and the described action is taken (e.g. making a call).
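As the number of commands grows, the chain of if statements above could also be expressed as a table of keyword-to-handler mappings. The fragment below is only a hypothetical refactoring sketch – it is not part of the project's published code – and assumes java.util.Map and java.util.LinkedHashMap are imported; each registered prefix points at a handler whose body would contain the same code as the corresponding branch above.

// Hypothetical dispatch table instead of the if chain (sketch only).
interface CommandHandler {
    void handle(String spokenText);
}

private final Map<String, CommandHandler> handlers = new LinkedHashMap<String, CommandHandler>();

private void registerHandlers() {
    handlers.put(choiceCall.toLowerCase(), new CommandHandler() {
        @Override
        public void handle(String spoken) {
            // same body as the startsWith(choiceCall) branch above
        }
    });
    // ... the remaining commands would be registered in the same way
}

private void dispatch(String spoken) {
    for (Map.Entry<String, CommandHandler> entry : handlers.entrySet()) {
        if (spoken.toLowerCase().startsWith(entry.getKey())) {
            entry.getValue().handle(spoken);
            return;
        }
    }
    // no keyword matched: only display the recognised text, as the current version does
}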
public String readContacts(ArrayList text) {
    Map<String, String> book = new HashMap<String, String>();
    nameOfContact = (String) text.get(0);
    text.clear();
    text.add(nameOfContact);
    nameOfContact = TextUtils.join(" ", text);
    if (isCalling) {
        nameOfContact = nameOfContact.substring(11);
    } else {
        nameOfContact = nameOfContact.substring(9);
    }
    ContentResolver cr = getContentResolver();
    Cursor cur = cr.query(ContactsContract.Contacts.CONTENT_URI, null, null, null, null);
    if (cur.getCount() > 0) {
        while (cur.moveToNext()) {
            id = cur.getString(cur.getColumnIndex(ContactsContract.Contacts._ID));
            name = cur.getString(cur.getColumnIndex(ContactsContract.Contacts.DISPLAY_NAME));
            if (Integer.parseInt(cur.getString(cur.getColumnIndex(
                    ContactsContract.Contacts.HAS_PHONE_NUMBER))) > 0) {
                Cursor pCur = cr.query(ContactsContract.CommonDataKinds.Phone.CONTENT_URI,
                        null,
                        ContactsContract.CommonDataKinds.Phone.CONTACT_ID + " = ?",
                        new String[]{id}, null);
                while (pCur.moveToNext()) {
                    phone = pCur.getString(pCur.getColumnIndex(
                            ContactsContract.CommonDataKinds.Phone.NUMBER));
                    book.put(name, phone);
                }
                pCur.close();
            }
        }
    }
    number = book.get(nameOfContact);
    //RecognisedText.setText(number + " view from readContacts field" + "\n"); //FOR LOGGING
    return number;
}

The readContacts method is responsible for reading contacts from the phone's or SIM card's memory so that they can later be used in other parts of the program. It takes the input from the microphone, 'cuts' it to obtain the name of the contact, and then, using cursors, checks all the IDs, names and phone numbers in memory to find the one being looked for. It uses a map of strings (similar to Python's dictionary) to store the name of the contact together with its number. It was necessary to place this method in the main class rather than in Methods, because it uses parts of the API that are not accessible from outside an activity.

public void makeCall(String number) {
    if (number != null) {
        String numberToCall = "tel:" + number.trim();
        Uri Call = Uri.parse(numberToCall);
        Intent callIntent = new Intent(Intent.ACTION_CALL, Call);
        startActivity(callIntent);
    } else {
        Toast t = Toast.makeText(getApplicationContext(),
                "No such contact in your phonebook", Toast.LENGTH_SHORT);
        t.show();
    }
}

makeCall is used to dial the phone number found earlier in readContacts (it takes the number as an argument). If the number is not null (so the wanted contact indeed exists in the phonebook), the number is trimmed (to get the plain number without any extra characters) and then parsed into the intent responsible for making phone calls. When the contact is not present in the phonebook, a toast message informs the user about it (a planned improvement is to ask about similar contacts found and whether to call them or not).

@Override
public boolean onOptionsItemSelected(MenuItem item) {
    switch (item.getItemId()) {
        case R.id.action_scenarios:
            about.AboutScenarios();
            return true;
        case R.id.action_settings:
            // later on
            return true;
        case R.id.action_about:
            about.AboutText();
            return true;
        case R.id.action_exit:
            startActivity(method.hideApp());
            return true;
        default:
            return super.onOptionsItemSelected(item);
    }
}

The last fragment of MainActivity relates to the menu created previously. It lists all the items available in this menu and the particular actions they trigger (exiting the application, displaying the possible scenarios, and so on).
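Related to the planned handling of similar contacts mentioned above, one possible improvement would be a more forgiving phonebook lookup than the exact book.get(nameOfContact) call in readContacts. The helper below is only a hypothetical sketch (the names findNumber and normalize are not part of the project's code) and assumes imports of java.text.Normalizer, java.util.Locale and java.util.Map; it compares names case-insensitively and ignores Polish diacritics.

// Hypothetical helper: look up a spoken name while ignoring case and diacritics.
static String findNumber(Map<String, String> book, String spokenName) {
    String wanted = normalize(spokenName);
    for (Map.Entry<String, String> entry : book.entrySet()) {
        if (normalize(entry.getKey()).equals(wanted)) {
            return entry.getValue();
        }
    }
    return null; // the caller shows the "No such contact" toast, exactly as makeCall does now
}

static String normalize(String s) {
    String lower = s.trim().toLowerCase(new Locale("pl", "PL"));
    // strip combining accents so e.g. "Michał" and "michal" compare equal;
    // "ł" does not decompose under NFD, so it is replaced explicitly
    String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
    return decomposed.replaceAll("\\p{M}", "").replace("ł", "l");
}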
3.2 Methods class

The Methods class is located in the main package as speechtotext/Methods.java and contains most of the helper methods used in the application. The MainActivity class creates an object of this class in order to access these functions. Starting from the top, there are the imports necessary to use the Android-specific functions in the code, followed by the declaration of the public class Methods.

package nowim.speechtotext;

import android.net.Uri;
import android.provider.AlarmClock;
import android.support.v7.app.AppCompatActivity;
import android.content.Intent;

public class Methods extends AppCompatActivity {

The hideApp method takes no arguments and creates an Intent, which is an Android data type. A category is then added to this intent – the home category – and by setting the activity FLAG the application is hidden or closed.

public Intent hideApp() {
    Intent hideIntent = new Intent(Intent.ACTION_MAIN);
    hideIntent.addCategory(Intent.CATEGORY_HOME);
    hideIntent.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
    return hideIntent;
}

The sendSMS method takes two arguments: the message (msg) and the recipient (caller). The program passes the content of the message, which in this case is the recognized speech sample, to the recipient, who is also part of the recognized sample. This is realized by the Android method putExtra, which puts the content into the intent data structure. The msg variable is shortened by the length of the voice command needed to trigger the task, and setting the type helps Android recognize the kind of request the program uses.

public Intent sendSMS(String msg, String caller) { // readContacts() -> caller
    msg = msg.substring(9); /* choiceMSG.length() */
    Intent sendIntent = new Intent(Intent.ACTION_VIEW);
    sendIntent.putExtra("sms_body", msg);
    sendIntent.putExtra("address", caller);
    sendIntent.setType("vnd.android-dir/mms-sms");
    // startActivity(sendIntent);
    return sendIntent;
}

The browse method takes one argument: the page to search for (page). The page variable is shortened by the voice command used to launch the action. By passing the wanted phrase into a Google search query, the application opens the default browser with the defined query – the recognized speech – and automatically starts the search through the Google search engine.

public Intent browse(String page) {
    page = page.substring(6); /* choiceBrowser.length() */
    String url = "https://www.google.pl/search?q=" + page;
    Intent browseNET = new Intent(Intent.ACTION_VIEW);
    browseNET.setData(Uri.parse(url));
    return browseNET;
}

The alarm method takes two arguments: hour and minute. The program sets an alarm active for the next occurrence of the given hour. Both variables are passed from the MainActivity class and correspond to the hour and minute the alarm will be set to. Due to many recognition errors, setting the hour by voice was discarded; currently this method sets an alarm for an hour defined manually by the user. The errors included:
- recognizing the word "siódma" instead of the number "7",
- never recognizing the number "21",
- trying to set an alarm time beyond the 24-hour scale.

public Intent alarm(int hour, int minute) {
    Intent Alarm = new Intent(AlarmClock.ACTION_SET_ALARM);
    Alarm.putExtra(AlarmClock.EXTRA_HOUR, hour);
    Alarm.putExtra(AlarmClock.EXTRA_MINUTES, minute);
    //Alarm.putExtra(AlarmClock.EXTRA_DAYS, "MONDAY");
    return Alarm;
}
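One way the word-based recognition errors listed above could be mitigated is to map the most common spoken Polish hour words onto integers before falling back to the manually configured hour. The fragment below is a hypothetical sketch only (parseHour is not a method of the released application) and assumes java.util.HashMap, java.util.Map and java.util.Locale are imported.

// Hypothetical helper: turn a recognised hour word or digit string into an int.
static int parseHour(String word) {
    Map<String, Integer> hours = new HashMap<String, Integer>();
    hours.put("siódma", 7);
    hours.put("ósma", 8);
    hours.put("dziewiąta", 9);
    // ... the remaining hour words would be added the same way
    Integer known = hours.get(word.trim().toLowerCase(new Locale("pl", "PL")));
    if (known != null) {
        return known;
    }
    try {
        return Integer.parseInt(word.trim()); // the recogniser often returns plain digits
    } catch (NumberFormatException e) {
        return -1; // signal the caller to fall back to the manually set hour
    }
}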
The maps method takes one argument: the address. The address variable is shortened by the length of the voice command that triggers it. Using the Google Maps package, the address is passed into the maps search by parsing it as a geo location. A useful side effect of the tight integration with Google products is that, as mentioned before, the search for the recognized location starts as close to the current location as possible: searching for a specific street will first search the area near the current position, so the user does not end up with the same street name in a different country.

public Intent maps(String address) {
    address = address.substring(16); /* mapa.length() */
    Uri gmmIntentUri = Uri.parse("geo:0,0?q=" + address);
    Intent mapIntent = new Intent(Intent.ACTION_VIEW, gmmIntentUri);
    mapIntent.setPackage("com.google.android.apps.maps");
    return mapIntent;
}

The checkPolish method takes one argument: the message. It is currently used for text messages; if the project were developed further, the idea was to expose it as a checkbox at application runtime. The method replaces Polish characters in messages: because Polish diacritics dramatically reduce the maximum length of a text message, it switches every Polish-specific character to the associated plain Latin character.

public String checkPolish(String msg) {
    msg = msg.replace("ć", "c");
    msg = msg.replace("ś", "s");
    msg = msg.replace("ą", "a");
    msg = msg.replace("ę", "e");
    msg = msg.replace("ó", "o");
    msg = msg.replace("ż", "z");
    msg = msg.replace("ź", "z");
    msg = msg.replace("ł", "l");
    msg = msg.replace("ń", "n");
    return msg;
}
}

3.3 Android manifest

The Android manifest file is a must-have in any application. It contains the most important information interpreted by the Android system, which is essential to run the code. First of all, it contains the name of the package used by the application. The manifest also declares all the permissions used by the application, which for this project are:
- read contacts
- call phone
- system alert window
- read calendar
- send SMS
- read SMS
- wake lock
- set alarm

Moreover, it defines information about the application itself, such as the minimum Android API level required of the mobile device. Finally, it lists the components and libraries that must be linked to the application.
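It is worth noting that on Android 6.0 (one of the tested versions) the "dangerous" permissions above – contacts, calls, SMS – are no longer granted automatically at install time and must also be requested at run time. The fragment below is only a hedged sketch of how that could be added (it is not part of the project's code) and assumes the support library's ActivityCompat and ContextCompat classes are on the classpath, together with a hypothetical request code PERMISSION_REQUEST.

// Hypothetical runtime-permission check for Android 6.0+ (sketch only).
private static final int PERMISSION_REQUEST = 42;

private void ensurePermissions() {
    String[] needed = {
            Manifest.permission.READ_CONTACTS,
            Manifest.permission.CALL_PHONE,
            Manifest.permission.SEND_SMS
    };
    for (String permission : needed) {
        if (ContextCompat.checkSelfPermission(this, permission)
                != PackageManager.PERMISSION_GRANTED) {
            // the answer arrives asynchronously in onRequestPermissionsResult
            ActivityCompat.requestPermissions(this, needed, PERMISSION_REQUEST);
            return;
        }
    }
}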
Below is the code of the manifest file with the mentioned permissions.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="nowim.speechtotext" >

    <uses-permission android:name="android.permission.READ_CONTACTS" />
    <uses-permission android:name="android.permission.CALL_PHONE" />
    <uses-permission android:name="android.permission.SYSTEM_ALERT_WINDOW" />
    <uses-permission android:name="android.permission.READ_CALENDAR" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <uses-permission android:name="android.permission.READ_SMS" />
    <uses-permission android:name="android.permission.WAKE_LOCK" />
    <uses-permission android:name="com.android.alarm.permission.SET_ALARM" />
    <uses-permission android:name="android.permission.READ_CALENDAR" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme" >
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
        <service android:name="nowim.speechtotext.MinimalizedApplication" >
        </service>
    </application>

</manifest>

3.4 Minimalized application

This class is responsible for letting the application run in the background. It is not finished; for now it only minimizes the application, displaying a clickable dim which can also be freely moved around the screen. The dim is 'always on top' – it works as a service – and to implement this idea the programming forum Stack Overflow [9] was used for inspiration and help with the critical parts of the code.

Img. 6. Screenshots from a Samsung Galaxy S4 running the application. On the left, the dim in its initial coordinates on the screen, drawn over the Messages app. On the right, the dim drawn over a settings screen, relocated by the user. Source: own.

public class MinimalizedApplication extends Service {

    private WindowManager windowManager;
    private ImageView appHead;
    WindowManager.LayoutParams parameters;

    @Override
    public void onCreate() {
        super.onCreate();
        windowManager = (WindowManager) getSystemService(WINDOW_SERVICE);
        appHead = new ImageView(this);
        appHead.setImageResource(R.mipmap.ic_minimalized_head);

As mentioned, the dim works as a service, which is why the MinimalizedApplication class implementing it extends Service and additionally needs a private WindowManager to be able to 'draw' itself over other applications. appHead indicates the icon of the dim, and parameters will be used to position it. Since the dim can be treated as a small application of its own, it has to have its own overridden onCreate.
        parameters = new WindowManager.LayoutParams(
                WindowManager.LayoutParams.WRAP_CONTENT,
                WindowManager.LayoutParams.WRAP_CONTENT,
                WindowManager.LayoutParams.TYPE_PHONE,
                WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE,
                PixelFormat.TRANSLUCENT);
        parameters.gravity = Gravity.TOP | Gravity.LEFT;
        parameters.x = 0;
        parameters.y = 100;

        appHead.setOnTouchListener(new View.OnTouchListener() {
            private int initialX;
            private int initialY;
            private float initialTouchX;
            private float initialTouchY;

            @Override
            public boolean onTouch(View v, MotionEvent event) {
                switch (event.getAction()) {
                    case MotionEvent.ACTION_DOWN:
                        initialX = parameters.x;
                        initialY = parameters.y;
                        initialTouchX = event.getRawX();
                        initialTouchY = event.getRawY();
                        return true;
                    case MotionEvent.ACTION_UP:
                        if ((Math.abs(initialTouchX - event.getRawX()) < 5)
                                && (Math.abs(initialTouchY - event.getRawY()) < 5)) {
                            Toast t = Toast.makeText(getApplicationContext(),
                                    "Clicked", Toast.LENGTH_SHORT);
                            t.show();
                        }
                        return true;
                    case MotionEvent.ACTION_MOVE:
                        parameters.x = initialX + (int) (event.getRawX() - initialTouchX);
                        parameters.y = initialY + (int) (event.getRawY() - initialTouchY);
                        windowManager.updateViewLayout(appHead, parameters);
                        return true;
                }
                return false;
            }
        });

        windowManager.addView(appHead, parameters);
    }

The initial coordinates are set in this block of code, and onTouch, responsible for actions after touching the dim, is overridden so that it now covers the possible cases – moving the dim up, down, left or right, and clicking it.

    @Override
    public void onDestroy() {
        super.onDestroy();
        if (appHead != null) {
            windowManager.removeView(appHead);
        }
    }

    @Override
    public IBinder onBind(Intent intent) {
        return null;
    }
}

MinimalizedApplication ends with directives on how to 'destroy' the dim.
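One consequence of the dim being an overlay is worth noting: on Android 6.0, which the application was tested on in the emulator, the SYSTEM_ALERT_WINDOW permission declared in the manifest must additionally be approved by the user in system settings before a view can be drawn over other apps. A hedged sketch of such a check, which is not part of the released code, could be placed before the service is started (it assumes android.provider.Settings is imported and that the fragment lives inside an activity):

// Hypothetical pre-check before starting MinimalizedApplication (Android 6.0+).
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M && !Settings.canDrawOverlays(this)) {
    // send the user to the "draw over other apps" screen for this package
    Intent grant = new Intent(Settings.ACTION_MANAGE_OVERLAY_PERMISSION,
            Uri.parse("package:" + getPackageName()));
    startActivity(grant);
} else {
    startService(new Intent(this, MinimalizedApplication.class));
}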
4. Application tests

The application was tested during the main development stage and right after it by a group of people consisting of:
- a 22-year-old Polish male, also a native English speaker,
- a 26-year-old Polish female, also a native Spanish speaker,
- a 6-year-old Polish female child with a slight speech defect (mixing „r" with „l"),
- a 51-year-old Polish male.

They were all asked to say some sample commands first, and later simply to try to use the application for everyday tasks such as searching for information on the internet. Most of the exact sentences are listed in Table 1, with an indication of whether the command was recognized properly or not. Since the application was designed to recognize Polish, only Polish native speakers performed the tests; however, the two foreign-language speakers taking part were asked to check whether some non-Polish words would also be recognized correctly.

The main commands used were:
- „Zadzwoń do <content>"
- „Wiadomość <content>"
- „Szukaj <content>"
- „Ustaw alarm <content>"
- „Znajdź na mapie <content>"

Additionally, for the purpose of calculating recognition efficiency, some of the testers were asked to record some free sentences.

Table 1. Commands and the content that was put into them, with an indication of whether they passed [OK] or not [NOK].

Zadzwoń do:
1. Mario [OK]
2. Alberta [OK]
3. Szymona [OK]
4. Szymon [OK]
5. Mariola [OK]
6. Marioli [OK]
7. Mariusz [OK]
8. Mariusza [OK]
9. Ady [OK]

Wiadomość:
1. Do Marysi [OK]
2. Do Marysi [NOK]
3. Do Krystiana [NOK]
4. Do Marysi [OK]
5. Do Krystiana [NOK => Do kryształowej kocham Cię]
6. Do Krystiana [OK]
7. Do Alberta kocham Cię [OK]
8. Do Maria [OK]
9. Do Maria kocham Cię [NOK]
10. Do Szymona [OK]
11. Do Szymona kocham Cię [OK]
12. Mamo kup jajka [NOK => Na Zakopiankę]
13. Tato kup jajka na Zakopiance [NOK]
14. Gotuj telefon na zupę [OK]
15. Zakop telefon [NOK => zakup telefon]
16. Ugotuj telefon za łaływankę (child's babbling) [Ugotuj telefon na małżonkę]
17. Alarmowa na pomoc [OK]
18. Do Mario, Mario wpadnij [OK]
19. Do Marysi, one, two, three [do Marysi łan tu fri]
20. Do Alberta, kup dwa ptaszki [NOK => wiadomości Alberta Kolbuszowa ptak]
21. Dla Alberta, kup dwa ptaszki [NOK => na FB tak dwa ptaszki]

Szukaj:
1. Pet shop [NOK => potrzeba]
2. [OK]
3. [OK]
4. Ticiro [NOK => Tryczyna]
5. Arsenal London Football Club [OK]
6. Jak dojechać do Hamburga z Hamburga [NOK]
7. Słuchaj [NOK]
8. Wyniki lotto [NOK (very loud)]
9. Taxi w Olkuszu [OK]
10. Youtube Moda na sukces [OK]
11. Google translate [OK]
12. Adams.pl [OK]
13. Premier League [OK]
14. Serial (pause) El internado Laguna Negra [NOK => serial]
15. Hotel Mercure Wrocław [OK]
16. Emma Watson [OK]
17. Sherlock Holmes studium w różu [OK]
18. Sherlock Holmes study in pink [OK]
19. Jak żyć [OK]
20. Kiedy Arctic Monkeys przyjadą do Polski [OK]

Ustaw alarm:
1. Na 17 [NOK => 14]
2. Na 12 [OK]
3. Na 5 [OK]
4. Na 20:00 [NOK => na 20 000]
5. Na 17 [OK]
6. Na 21:00 [NOK => na 25]
7. Na 21 [NOK => na 20 1]
8. Na 20:02 [NOK => na 22]
9. Na 20:03 [NOK => na 23]
10. Na północ [OK]
11. Na 00 [OK]
12. Na 15:02 [OK]
13. Na 3 rano [OK]
14. Na 3 po południu [NOK => 3 (rano)]
15. 11:23 [OK]
16. Na 20:07 [NOK => na 2]

Znajdź na mapie:
1. //error – the application was stopped//
2. Hotel spa [OK]
3. Hotel spa w Zakopanem [OK]
4. Hotel spa w Zakopanym [OK]
5. Szkoła Podstawowa numer 2 w Olkuszu [OK]
6. Dworzec Autobusowy Paryż [NOK]
7. Dworzec Autobusowy w Paryżu [OK]
8. Dworzec MDA w Krakowie [OK]
9. Dworzec autobusowy Madryt [OK]
10. Dworzec w Olkuszu [OK]
11. Sklep spożywczy w Bobrownikach [NOK => sklep spożywczy koniczynka]
12. Bobrowniki koło Włoszczowy [OK]
13. Remiza Strażacka Pilczyca [NOK => remiza strażacka w fizyce]
14. Cafe Vinyl [NOK => cafe winy]
15. Green Day Wrocław [OK]
16. Złote Tarasy w Warszawie [OK]
17. Stadion w Gdańsku [OK]
18. Azory Kraków [OK]
19. Stadion Madryt [OK]
20. Londyn, Ashburton Grove [OK]

Some sample sentences:
1. Yyyy Olkusz [NOK => a arkusz]
2. Kapusta [NOK => usta]
3. Witam państwa [OK]
4. Potężna wichura łamiąc duże drzewa trzciną zaledwie tylko kołysze [OK]
5. Cześć [OK]
6. Nowim [OK]
7. Cojes [OK]
8. Trzy ciufcie [NOK]
9. Mikrofon [OK]
10. Przez stół nie da rady [OK]
11. Nie rozumiem Chewbacci [OK]
12. Słowa, zdania, cokolwiek [OK]
13. Stół z powyłamywanymi nogami [OK]
14. Stół bez nóg [OK]
15. Król Karol kupił Królowej Karolinie korale kolory koralowego [OK]

With [OK], the application did exactly what it should. With [NOK] there was some kind of trouble – either the application did not understand what was said and asked to repeat the sentence, or it understood exactly what was said but did not perform the wanted action.

4.1 Methodology
I. Positive try

A try counts as positive when the application meets the expectations for a given command:
- Zadzwoń do – the application dials the wanted person if his or her number is in the phonebook, or displays an information toast that there is no such contact,
- Wiadomość – the application creates a new text message (for now only the message body, without the receiver's info),
- Szukaj – the application connects to the internet, opens the default browser and shows the results of the request through google.com (or results similar to the requested one),
- Ustaw alarm – the application sets the alarm and shows, using a toast message, the information about when the alarm will trigger,
- Znajdź na mapie – the application connects to the internet and shows the requested location using Google Maps,
- Sample sentences – the application displays the exact words that were said, with no mistakes.

II. Negative try

A try counts as negative when the application produces a different outcome than in point I (or does not trigger the wanted action at all):
- Zadzwoń do – the application twists the words (e.g. instead of Mario it hears Marian),
- Wiadomość – the application twists the words or does not trigger the message intent,
- Szukaj – the application cannot connect to the internet or returns different results than wanted,
- Ustaw alarm – the application sets the alarm to an hour different from the requested one,
- Znajdź na mapie – the application wrongly guesses the wanted location,
- Sample sentences – the application twists at least one word.

4.2 „Zadzwoń do" – „Call to" request

Conditions:
- 30/40 tries in very bad conditions (8 people in a room + TV set turned on)
- 10/40 tries in good conditions (only the tester in the room, no external source of sound/noise)

Testers:
- 6-year-old female child – 19 tries
- 26-year-old female – 8 tries
- 51-year-old male – 3 tries
- 22-year-old male – 10 tries

Possible contacts:
- Mario
- Mariusz
- Alberto
- Mariola
- no such contact in the phonebook

Results:
- 34 positive tries
- 6 negative tries (2 times "don't understand, can you repeat please?" + 4 times twisted names)

Comments:
- 4 times the name „Alberto" was pronounced with a mistake; 3 of those times the application guessed the name correctly, 1 time it tried to call „Agata"
- 1 time the child hesitated when saying „Zadzwoń do (pause) Mario" and the application guessed she said „Zadzwoń do Marysi"
- 2 times it confused „Mariusz" with „Mario", both times when testing in bad conditions
- the 2 tries when it did not understand a word at all happened once in good and once in bad conditions
- the twisting could also be due to the fact that both Polish and non-Polish names were included in the tests, and the majority of the tests were performed by a child

4.3 „Wiadomość" – „Message" request

Conditions:
- 40/50 tries in very good conditions (only the tester in the room, no external source of sound/noise)
- 10/50 tries in bad conditions (4 people in a room + TV set)

Testers:
- 6-year-old female child – 30 tries
- 26-year-old female – 10 tries
- 22-year-old male – 10 tries

Results:
- 38 positive tries
- 12 negative tries

Comments:
- most of the negative tries (8) were caused by broadly understood noise (mumbling, laughter, external sound/noise sources, people talking)
- 2 negative tries were due to recognition system failure ("don't understand, can you repeat please?")
- sometimes, though, the application understood mumbling perfectly (e.g. „Albelto" was understood as intended – „Alberto")
- there were times when English words were understood by the application but written down phonetically
- the remaining 2 negative tries occurred when the application understood everything correctly but did not trigger the message intent

4.4 „Szukaj" – „Search" request

Conditions:
- 30/50 tries in very bad conditions (8 people in a room + TV set turned on)
- 20/50 tries in good conditions (only the tester in the room, no external source of sound/noise)

Testers:
- 6-year-old female child – 16 tries
- 26-year-old female – 20 tries
- 22-year-old male – 10 tries
- 51-year-old male – 4 tries

Results:
- 43 positive tries
- 7 negative tries

Comments:
- this request deals very well with non-Polish words
- some of the negative tries occurred when two languages were mixed in one sentence
- some of the negative tries also ended with "don't understand...", which was due to the heavy noise around
- what was disturbing, but happened only once: after 30 consecutive tries the application simply crashed. Sadly, even after debugging it was hard to tell whether it was strictly the application's fault or the recognition system's; the most likely scenario is a temporary lack of internet connection

4.5 „Ustaw alarm" – „Set an alarm" request

Disclaimer – this command is not quite finished, and that was the reason for some of the errors the testers encountered. For this reason, only the speech recognition results from this section were taken into account in the test summary.

Conditions:
- 20/20 tries in neutral conditions (2 people in a room, no additional source of sound/noise)

Testers:
- 6-year-old female child – 5 tries
- 26-year-old female – 5 tries
- 22-year-old male – 10 tries

Results:
- 10 positive tries
- 10 negative tries

Comments:
- the command is somewhat hardcoded to one time of day for now, since there were big problems to solve during development and too little time
- the application does not read some of the more problematic hours properly (e.g. it hears 25 instead of 21), though this is purely the recognition system's fault
- it is problematic to implement the distinction between a.m. and p.m.

4.6 „Znajdź na mapie" – „Find on the map" request

Conditions:
- 40/50 tries in very bad conditions (8 people in a room + TV set turned on)
- 10/50 tries in good conditions (only the tester in the room, no external source of sound/noise)

Testers:
- 6-year-old female child – 4 tries
- 26-year-old female – 30 tries
- 22-year-old male – 10 tries
- 51-year-old male – 6 tries

Results:
- 34 positive tries
- 6 negative tries

Comments:
- the application handles the requested inquiries very well, no matter whether the request was to find e.g. „Szkoła podstawowa" or „Podstawówka"
- the application also makes almost no mistakes when showing the exact address of wanted places, such as stadiums
- the application shows the nearest possible results – so when one is in Warsaw and looking for „Włochy", the result will most likely be the Warsaw neighbourhood of that name and not the country „Włochy" (Italy)
- the application struggled when the inquiry was spoken with hesitation, which can be read both negatively and positively: on the one hand it does not read the request exactly (it waits through the hesitation and reads what comes after it), but on the other hand it tries to match what it heard before the hesitation and, using the location of the phone, shows the result closest in content to the request, delivering it within a reasonably short time
- some of the negative tries were probably the result of bad environmental conditions

4.7 Sample sentences – additional system tests

Conditions:
- 40/50 tries in very bad conditions (8 people in a room + TV set turned on)
- 10/50 tries in good conditions (only the tester in the room, no external source of sound/noise)

Testers:
- 6-year-old female child – 3 tries
- 26-year-old female – 5 tries
- 22-year-old male – 38 tries
- 51-year-old male – 4 tries

Results:
- 41 positive tries
- 9 negative tries

Comments:
- as could be predicted, the majority of the negative tries (8) happened in bad conditions
- most of the time (7) the application asked to repeat the sentence
- in good conditions the application worked nearly perfectly, failing to understand only once
- the sentences were in Polish, English and Spanish
- oddly, the application did not have much trouble phonetically writing down words which do not officially exist in the Polish language (like „nowim", which is a regionalism), or English and Spanish words appearing inside a Polish sentence; when an inquiry consisted entirely of foreign words from one language, it also detected that language just fine. This is due to the prioritization of the Polish language implemented in the code

4.8 Test round-up

Table 2. Test round-up.

Total requests/sentences: 260
Positive tries (everything was fine): 200
Negative tries (at least one thing went wrong): 60
Positive recognition tries (alarm tests included): 76.92%
Negative recognition tries (alarm tests included): 23.08%
Positive application responses: 79.17%
Negative application responses: 20.83%

5. Discussion

What was done: a simple graphical layout, reading the contacts list, handling calls, text messages, internet browsing, map search, a partially working alarm clock, the application working in the background, simple application settings with an "about application" section, and an application icon. Apart from that, extensive research on the topic was carried out, during which voice recognition was tested, analysed and investigated. It also allowed further exploration of the speech recognition engine, as well as of user reactions to and impressions of the application. Throughout the project, development resulted in improved object-oriented skills, knowledge of application structure management and overall Java programming abilities. Due to the scale of the project, some task management was also involved, including splitting the work into blocks, assigning it and monitoring progress; this helped in understanding the complexity and workflow of developing a full application with all its components.

What could be improved: in any further releases, versions or patches, every unreliable function should be improved and every partially implemented method finished. First of all, a smarter implementation of the alarm clock would be in order. Sending text messages and making phone calls could be enhanced by improving the matching against the contacts found. Moreover, the background mode could by default be started by clicking an icon box.
Furthermore, speech samples could be saved as notes for future use, either stored under the current date in the calendar or written to a separate file in a specific folder. Finally, there are two more things that turned out to be hard to implement in this code structure, due to the way the system handles activities and the code organization: reminders and full calendar synchronization. It was decided not to include them in the final project, because reminders caused errors (they failed to save) and full calendar integration failed mostly due to the different Android versions on different phones. For example, the calendar is not visible at all on a Samsung Galaxy S4 running Android 4.1 (additionally causing the application to start not almost immediately but with about a 15-second delay, which is not acceptable), while it works without any problems on a Google Nexus 5 with Android 6.

Conclusions from the tests: the application reacts properly to requests in almost 80% of cases. It reads the sentences and commands properly in over 75% of cases (for comparison, Google claims that its system is correct in 92% of cases [11]), also in very bad conditions and for both adults and small children, which is a positive sign considering the scope of potential users. The VPA may facilitate the use of mobile phones by children, elderly people and some disabled people (e.g. with impaired vision). Easy, intuitive commands understood by the application are the core of this product, which was highlighted by the majority of the testers. A further advantage of this application is the fact that it is open source, does not collect or send any user data anywhere, and can easily be scaled or tailored to a specific user by any programmer with at least basic programming skills. Although the application is not very advanced for now and probably lacks some commands, it still meets the expectations of the testers involved.

6. Bibliography

[1] Kenneth S. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process, Michigan, first printing, July 2012.
[2] Paul Deitel, Harvey Deitel, Abbey Deitel, Android for Programmers: An App-Driven Approach, Crawfordsville, Indiana, December 2013.
[3] Reto Meier, Professional Android Application Development, Indianapolis, 2009.
[4] Bruce Eckel, Thinking in Java, Fourth Edition, Helion, Gliwice, 2006.
[5] Online task organiser "Trello", http://www.trello.com
[6] Direct link to the application's version control repository on GitHub, http://www.github.com/THenry14/Virtual-Assistant-UE
[7] Android for Developers, http://developer.android.com/index.html
[8] Android tutorials and programming tips, http://www.tutorialspoint.com/android/
[9] Stack Overflow, question/answer forum for programmers, http://stackoverflow.com/
[10] Official GNU website, http://www.gnu.org/licenses/gpl-3.0.en.html
[11] Jordan Novet, "Google says its speech recognition technology now has only an 8% word error rate", accessed December 23rd, 2015, http://venturebeat.com/2015/05/28/google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/