CSI Communications
ISSN 0970-647X | Volume No. 36 | Issue No. 2 | May 2012 | ₹ 50/-
www.csi-india.org

Contents

Cover Story
Desi Language Computing - on the Rise: Hareesh N Nampoothiri
“Correcting” SMS Text Automatically: Deepak P and L Venkata Subramaniam

Research Front
Approximate/Fuzzy String Matching using Mutation Probability Matrices: Sajilal Divakaran and Achuthsankar S Nair

Articles
Emails and Web Pages in Local Languages: M Jayalakshmi
A Speech-to-Text System: Nishant Allawadi and Parteek Kumar
Opinion Mining and Sentiment Analysis: Jaganadh G
Telemedicine in the State of Maharashtra: A Case Study: Randhir Kumar, Dr. P K Choudhary, and S M F Pasha

Technical Trends
Extending WEKA Framework for Learning New Algorithms: Satyam Maheshwari and Sunil Joshi

Practitioner Workbench
Programming.Tips() » Passing Variable Number of Arguments in C: Dr. Debasish Jana
Programming.Learn (“Python”) » Plotting with Python: Umesh P

CIO Perspective
Managing Technology » Business Information Systems: Underlying Architectures: Dr. R M Sonar

Security Corner
Information Security » Cyber Crimes on/by Children: Adv. Prashant Mali
IT Act 2000 » Prof. IT Law Demystifies Technology Law Issues: Issue No. 2: Mr. Subramaniam Vutha

PLUS
ICT@Society: Graphic Texting: Achuthsankar S Nair
Brain Teaser: Dr. Debasish Jana
Ask an Expert: Dr. Debasish Jana
Happenings@ICT: ICT News Briefs in April 2012: H R Mohan
CSI Report: Prof. Dipti Prasad Mukherjee and Dr. Dharm Singh
CSI News

Editorial Board
Chief Editor: Dr. R M Sonar
Editors: Dr. Debasish Jana, Dr. Achuthsankar Nair
Resident Editor: Mrs. Jayshree Dhere
Advisors: Dr. T V Gopal, Mr. H R Mohan
Published by: Executive Secretary Mr. Suchit Gogwekar, for Computer Society of India
Design, Print and Dispatch by: CyberMedia Services Limited

Please note: CSI Communications is published by Computer Society of India, a non-profit organization. Views and opinions expressed in the CSI Communications are those of individual authors, contributors and advertisers and they may differ from policies and official statements of CSI. These should not be construed as legal or professional advice. The CSI, the publisher, the editors and the contributors are not responsible for any decisions taken by readers on the basis of these views and opinions. Although every care is being taken to ensure genuineness of the writings in this publication, CSI Communications does not attest to the originality of the respective authors’ content.

© 2012 CSI. All rights reserved. Instructors are permitted to photocopy isolated articles for non-commercial classroom use without fee. For any other copying, reprint or republication, permission must be obtained in writing from the Society. Copying for other than personal use or internal reference, or of articles or columns not owned by the Society without explicit permission of the Society or the copyright owner is strictly prohibited.

Published by Suchit Gogwekar for Computer Society of India at Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093. Tel.
: 022-2926 1700 • Fax : 022-2830 2133 • Email : [email protected]
Printed at GP Offset Pvt. Ltd., Mumbai 400 059.

Know Your CSI

Executive Committee (2012-13/14) »
President: Mr. Satish Babu, [email protected]
Vice-President: Prof. S V Raghavan, [email protected]
Hon. Treasurer: Mr. V L Mehta, [email protected]
Immd. Past President: Mr. M D Agrawal, [email protected]
Hon. Secretary: Mr. S Ramanathan, [email protected]

Nomination Committee (2012-2013)
Dr. D D Sarma
Mr. Bipin V Mehta
Mr. Subimal Kundu

Regional Vice-Presidents
Region - I: Mr. R K Vyas (Delhi, Punjab, Haryana, Himachal Pradesh, Jammu & Kashmir, Uttar Pradesh, Uttaranchal and other areas in Northern India)
Region - II: Prof. Dipti Prasad Mukherjee (Assam, Bihar, West Bengal, North Eastern States and other areas in East & North East India)
Region - III: Mr. Anil Srivastava (Gujarat, Madhya Pradesh, Rajasthan and other areas in Western India)
Region - IV: Mr. Sanjeev Kumar (Jharkhand, Chattisgarh, Orissa and other areas in Central & South Eastern India)
Region - V: Prof. D B V Sarma (Karnataka and Andhra Pradesh)
Region - VI: Mr. C G Sahasrabudhe (Maharashtra and Goa)
Region - VII: Mr. Ramasamy S (Tamil Nadu, Pondicherry, Andaman and Nicobar, Kerala, Lakshadweep)
Region - VIII: Mr. Pramit Makoday (International Members)

Division Chairpersons, National Student Coordinator & Publication Committee Chairman
Division-I : Hardware (2011-13): Dr. C R Chakravarthy, [email protected]
Division-II : Software (2012-14): Dr. T V Gopal, [email protected]
Division-III : Applications (2011-13): Dr. Debesh Das, [email protected]
Division-IV : Communications (2012-14): Mr. Sanjay Mohapatra, [email protected]
Division-V : Education and Research (2011-13): Dr. N L Sarda, [email protected]
National Student Coordinator: Mr. Ranga Raj Gopal
Publication Committee Chairman: Prof.
R K Shyamsundar

Important links on CSI website »
Structure & Organisation: http://www.csi-india.org/web/csi/structure
National, Regional & State Students Coordinators: http://www.csi-india.org/web/csi/structure/nsc
Statutory Committees: http://www.csi-india.org/web/csi/statutory-committees
Collaborations: http://www.csi-india.org/web/csi/collaborations
Join Now: http://www.csi-india.org/web/csi/join
Renew Membership: http://www.csi-india.org/web/csi/renew
Member Eligibility: http://www.csi-india.org/web/csi/eligibility
Member Benefits: http://www.csi-india.org/web/csi/benifits
Subscription Fees: http://www.csi-india.org/web/csi/subscription-fees
Forms Download: http://www.csi-india.org/web/csi/forms-download
BABA Scheme: http://www.csi-india.org/web/csi/baba-scheme
Publications: http://www.csi-india.org/web/csi/publications
CSI Communications*: http://www.csi-india.org/web/csi/info-center/communications
Adhyayan*: http://www.csi-india.org/web/csi/adhyayan
R & D Projects: http://csi-india.org/web/csi/1204
Technical Papers: http://csi-india.org/web/csi/technical-papers
Tutorials: http://csi-india.org/web/csi/tutorials
Course Curriculum: http://csi-india.org/web/csi/course-curriculum
Training Program (CSI Education Products): http://csi-india.org/web/csi/training-programs
Travel support for International Conference: http://csi-india.org/web/csi/travel-support
eNewsletter*: http://www.csi-india.org/web/csi/enewsletter
Current Issue: http://www.csi-india.org/web/csi/current-issue
Archives: http://www.csi-india.org/web/csi/archives
Policy Guidelines: http://www.csi-india.org/web/csi/helpdesk
Events: http://www.csi-india.org/web/csi/events1
President’s Desk: http://www.csi-india.org/web/csi/infocenter/president-s-desk
* Access is for CSI members only.
ExecCom Transacts: http://www.csi-india.org/web/csi/execcom-transacts1
News & Announcements archive: http://www.csi-india.org/web/csi/announcements

CSI Divisions and their respective web links
Division-Hardware: http://www.csi-india.org/web/csi/division1
Division Software: http://www.csi-india.org/web/csi/division2
Division Application: http://www.csi-india.org/web/csi/division3
Division Communications: http://www.csi-india.org/web/csi/division4
Division Education and Research: http://www.csi-india.org/web/csi/division5

List of SIGs and their respective web links
SIG-Artificial Intelligence: http://www.csi-india.org/web/csi/csi-sig-ai
SIG-eGovernance: http://www.csi-india.org/web/csi/csi-sig-egov
SIG-FOSS: http://www.csi-india.org/web/csi/csi-sig-foss
SIG-Software Engineering: http://www.csi-india.org/web/csi/csi-sig-se
SIG-DATA: http://www.csi-india.org/web/csi/csi-sigdata
SIG-Distributed Systems: http://www.csi-india.org/web/csi/csi-sig-ds
SIG-Humane Computing: http://www.csi-india.org/web/csi/csi-sig-humane
SIG-Information Security: http://www.csi-india.org/web/csi/csi-sig-is
SIG-Web 2.0 and SNS: http://www.csi-india.org/web/csi/sig-web-2.0
SIG-BVIT: http://www.csi-india.org/web/csi/sig-bvit
SIG-WNs: http://www.csi-india.org/web/csi/sig-fwns
SIG-Green IT: http://www.csi-india.org/web/csi/sig-green-it
SIG-HPC: http://www.csi-india.org/web/csi/sig-hpc
SIG-TSSR: http://www.csi-india.org/web/csi/sig-tssr

Other Links
Forums: http://www.csi-india.org/web/csi/discuss-share/forums
Blogs: http://www.csi-india.org/web/csi/discuss-share/blogs
Communities*: http://www.csi-india.org/web/csi/discuss-share/communities
CSI Chapters: http://www.csi-india.org/web/csi/chapters
Calendar of Events: http://www.csi-india.org/web/csi/csi-eventcalendar

Important Contact Details »
For queries, correspondence regarding Membership, contact [email protected]

President’s Message
Satish Babu
From : [email protected]
Subject : President’s Desk
Date : 1st May, 2012
Dear Members

CSI organized its customary joint ExeCom on 31st March and 1st April, 2012, where the 2011-12 ExeCom demitted office and the new ExeCom took charge. The ExeCom meeting held on 1st April, 2012 discussed several important policy matters and also started the process of constituting the statutory committees that will steer the activities of CSI during the year. These yearly start-up processes will be completed by the month of May at the latest, so that the committees can get going with their business.

WITFOR: One of the first events of the year supported by CSI was the 5th IFIP World IT Forum (WITFOR), held in New Delhi during 16th-18th April, 2012. The Conference, attended by over 950 delegates and over 80 speakers from India and abroad, was organized in partnership with the Department of Electronics and Information Technology (DEITY), Government of India. The National Organizing Committee of the Forum was headed by the Union Minister of Communications & IT, Mr. Kapil Sibal, who inaugurated the Forum at Vigyan Bhawan. The speakers at the Conference also included the Minister of State for Communications & IT, Mr. Sachin Pilot. The 2-day event focused on the developmental opportunities offered by digital technologies in the areas of agriculture, education, e-Gov, and health.

Nashik Chapter’s 25th Anniversary: It is a pleasure to note that CSI’s Nashik Chapter is entering its 25th year of activity in 2012. One of the very active chapters of CSI, Nashik Chapter has been privileged to carry out a number of important activities for its members and other stakeholders, and also to contribute to the national leadership of CSI. I wish the Nashik Chapter, its leaders, and members many more years of adding value to the CSI community and to society at large.

CSI Events during April: I convey my sincere appreciation to the organizers of the following events that took place during the month of April, 2012.
• 4th International Conference on Human Computer Interaction, held during 18th-21st April, 2012 at Symbiosis Institute of Design (SID), Pune, organized by IFIP TC-13. Many thanks to Prof. Anirudh Joshi.
• RACSS-2012: International Conference on Recent Advances in Computing and Software Systems, held during 25th-27th April, 2012 at Dept. of CSE, SSN College of Engineering, Chennai. I convey my sincere thanks to the joint organization committee of CSI Chennai Chapter & Division IV, IEEE Madras Section, and IEEE CS.

As we get going with the current year, it is important to plan the different events for the year, in particular Conferences, which form an important segment of our activities and also contribute to the financial stability of CSI. The formal call for proposals for events will be put forth shortly, and I request you to start the process of planning events in your locations.

Chapter AGMs and New Office Bearers: In most chapters of CSI, the Annual General Meetings have been conducted and the new chapter Office Bearers have taken charge. CSI is keen that all chapter Office Bearers, especially those new to CSI, get adequate support when they require it, particularly for the conduct of the business of the chapter and for the conduct of events. The key resources for support are your Regional Vice President and the CSI HQ.

Membership Growth: Membership growth is a high-priority area for CSI. While the growth in student membership is satisfactory, the growth in professional and institutional membership has potential for improvement. We are examining different mechanisms to enhance professional membership and attract the new IT professional to CSI. One of the means of doing this is to join hands with other societies, including international societies, to provide additional value to our members. Another mechanism being explored is the use of social media to build a more accessible community. We hope to put in place some of these steps over the next two months to stimulate membership growth.

Kindly contact your RVPs and the CSI HQ Helpdesk (helpdesk@csi-india.org) for any aspect where you need support.

With greetings
Satish Babu
President

Editorial
Rajendra M Sonar, Achuthsankar S Nair, Debasish Jana and Jayshree Dhere
Editors

Dear Fellow CSI Members,

It is a pleasure to bring you this issue of CSIC with a cover story on ‘Linguistic Computing’. Computers have affairs with both programming languages and natural languages. With the wider penetration of ICT in society, especially in the form of mobile phones, the affair with natural languages is becoming more central. While in the case of programming languages it was the programmer who was struggling, in the case of natural language computing the challenge is really for the computer. In a country like India, which is a linguistic cauldron, the problem of linguistic computing is amplified. Organised efforts are on in India towards this end. The Technology Development for Indian Languages (TDIL) programme, launched by the Ministry of Communication & Information Technology (MC&IT), Govt. of India, aims at developing systems to facilitate human-machine interaction without language barrier; creating and accessing multilingual knowledge resources; and integrating them to develop innovative user products and services. The programme also promotes language technology standardization through participation in ISO, UNICODE, World-Wide-Web consortium (W3C) and BIS (Bureau of Indian Standards). Of course, Google is an important player in the scene as the whole world and its languages are of concern to it.
In this issue we have an assortment of articles that range from basic settings and services for using languages on the web and on mobile phones to selected, narrowly focused applications such as sentiment analysis. (We suppose that readers have noted that the cover page depicts the CSI web site translated into various Indian languages by on-line tools.) Hareesh N Nampoothiri, in his cover story article titled “Desi Language Computing on the Rise”, introduces basic desi-language settings and services in computers and mobile phones. Another cover story article, “‘Correcting’ SMS Text Automatically” by P. Deepak and L. Venkata Subramaniam of IBM Research, provides insight into the challenges that unusual abbreviations, shortening and omissions (textese or SMS language) pose to conventional electronic processing of text. The Research Front column brings an article titled “Approximate/Fuzzy String Matching using Mutation Probability Matrices” by Sajilal D and Achuthsankar S Nair. The article addresses fuzzy/approximate string matching in Indian languages. Three other articles on the cover topic are specialised articles in the Articles section. The article on “Emails and Web Pages in Local Languages” by M. Jayalakshmi supplements and complements the first cover story article. Mr. Nishant Allawadi and Prof. Parteek Kumar of Thapar University, in an article titled “Speech-to-Text System”, present speech-to-text conversion using the Hidden Markov Model (HMM). The concept of sentiment analysis is introduced briefly by Jaganadh G in his article titled “Opinion Mining and Sentiment Analysis”. The Articles section also includes an article titled "Telemedicine in the State of Maharashtra: A Case Study" by S M F Pasha, Randhir Kumar and Dr. P K Choudhary, based on their paper submitted at SEARCC 2011. The Technical Trends section is enriched with an article on “Extending WEKA Framework for Learning New Algorithms” by Mr. Satyam Maheshwari and Mr. Sunil Joshi.
The Practitioner Workbench column has a section titled Programming.Tips(), which provides an interesting write-up on “Passing Variable Number of Arguments in C” by Dr Debasish Jana. The other section, called Programming.Learn("Python"), under Practitioner Workbench includes information about "Plotting with Python". The Managing Technology section of the CIO Perspective column includes an article titled “Business Information Systems: Underlying Architectures” by Dr. RM Sonar. It is the third article in the series of articles on Business Information Systems. It throws light on various types of architecture, from single-tier to web-based multi-tier architecture, and discusses key benefits and key issues of the respective systems. The Information Security section of the Security Corner feature has an article titled “Cyber Crimes on/by Children” written by Advocate Prashant Mali. The article starts with two cases and then explains how a child can be at risk in cyber space and how a computing platform can be used by children for committing crime. The IT Act section under Security Corner comes with an article by Advocate Mr. Subramaniam Vutha, wherein he demystifies technology law and provides inputs on electronic (Internet-based) contracts. Our ICT@Society covers a curio theme, "Graphic Texting". As usual there are other regular features such as Brain Teaser, Ask an Expert and Happenings@ICT. CSI Reports and CSI News cover various region, SIG, chapter and student branch events. Please note that we welcome your feedback, contributions and suggestions at [email protected].
With warm regards,
Rajendra M Sonar, Achuthsankar S Nair, Debasish Jana and Jayshree Dhere
Editors

Cover Story
Hareesh N Nampoothiri
University of Kerala, Thiruvananthapuram

Desi Language Computing - on the Rise

English was the first language that got placed in modern computer systems and naturally got accommodated exclusively, to the disadvantage of the other world languages. From the mnemonics used in assembly language, to the programming language keywords, to operating system commands, English embedded itself. Some early programming languages like COBOL almost sounded like the English of non-native speakers of the language. It is easy to weave an Anglo-centric conspiracy story, but in all fairness to the professionals of yesteryear, it must be remembered that computers were not foreseen then as gizmo gadgets that ordinary citizens all over the world would own. As the popularity of notebooks, netbooks, and mobile devices shot up, the language problem began to take centre stage and naturally multiple solutions began to emerge.

Perhaps the turning point in language computing is the emergence of Unicode. Unicode is simply a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems[1]. It set the stage for an organized treatment of a large number of linguistic computing issues. Even though the first version of Unicode was introduced in October 1991, it became popular only in the last decade. As of now, Unicode supports a long list of languages including Indian languages such as Bengali, Hindi, Kannada, Malayalam, Oriya, Tamil, Telugu etc. Now software developers come up with different language packs for different regions and computers are becoming truly desi in this respect. An example is Microsoft's CLIP (Captions Language Interface Pack) for Visual Studio 2010, in whose development the author was also associated.
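The idea that Unicode assigns one number to every character can be seen directly in any Unicode-aware language. The following is a minimal Python sketch, not part of the original article; the sample word "भारत" (Bharat) and the helper name describe are illustrative choices. It lists the code point and official Unicode name of each character and checks that the text survives a round trip through UTF-8, one common byte encoding of those code points.

```python
import unicodedata

def describe(text):
    """Return (code point, official Unicode name) for each character in text.

    `describe` is a hypothetical helper written for this illustration.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

# Illustrative sample: the Hindi word "Bharat" in Devanagari script.
hindi = "भारत"

for code_point, name in describe(hindi):
    print(code_point, name)

# The same code points can be serialised to bytes (here UTF-8) and read back
# on any platform without loss.
utf8_bytes = hindi.encode("utf-8")
print(len(hindi), "code points,", len(utf8_bytes), "bytes in UTF-8")
assert utf8_bytes.decode("utf-8") == hindi
```

Each Devanagari character here occupies a single code point in the U+0900 block, whatever the platform or application, which is the property the legacy font-based encodings lacked.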
Apart from reaching a wider audience by incorporating as many languages as possible, Unicode also opens a wide range of possibilities for developers and service providers to come up with language-based tools and applications for the common man. It is not surprising that Google is the one in the lead, tapping the possibilities in this sector. We introduce below a few of the language-based tools from Google.

What is Unicode?
In the early days, there were many different encoding systems for characters used in computers. These encoding systems used to conflict with one another. That is, two encoding systems might use the same number to represent two different characters, or they might use different numbers for the same character. As a result, any given computer was required to support many different encoding systems, and even then the chances of data getting corrupted were very high. To solve this issue, Unicode provides a unique number for every character irrespective of the platform, application, or language. The Unicode Standard has been adopted by most of the leading players of the industry such as Apple, Microsoft, Oracle, IBM, Sun etc. It is also required by modern standards such as XML, Java, JavaScript, WML etc. It is supported in many operating systems (including Linux distributions), all modern browsers, most of the recent versions of office suites, and many other applications. The Unicode Consortium, a non-profit organization, is dedicated to developing, extending, and promoting use of the Unicode standard. According to them the advantage of using Unicode is: Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption[2].

Google Translate
Google Translate is a free translation service from Google, which provides instant translations between 65 different languages (as of Apr 2012) including some of the major Indian languages like Bengali, Gujarati, Hindi, Tamil, Telugu, and Urdu. Google Translate enables users to translate words, paragraphs of text, or a whole website (using the Translator Toolkit) from one language to another. According to Google, the service aims to make information universally accessible and useful, regardless of the language in which it’s written[3].

How does it work?
Google describes the working of Google Translate as follows: When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation for you. By detecting patterns in documents that have already been translated by human translators, Google Translate can make intelligent guesses as to what an appropriate translation should be. This process of seeking patterns in large amounts of text is called "statistical machine translation". Since the translations are generated by machines, not all translations will be perfect. The more human-translated documents that Google Translate can analyse in a specific language, the better the translation quality will be. This is why translation accuracy will sometimes vary across languages[3].

In Practice
Let's see how it becomes useful in practice by trying to translate a simple paragraph from English to Hindi (Fig. 1). Of course, it does not produce a grammatically correct translation, but it does produce a useful text in Hindi. Apart from providing the translation of the text, it also provides a phonetic rendition of the translated text in the English (Latin) alphabet. One can hear the translated text by clicking the speaker icon. There is also an option to rate the resulting translation by clicking the tick mark. One can rate a particular translation as Helpful, Not helpful, or Offensive. The tool also offers alternative translations and an option to re-order blocks of words for reconstructing the translated sentence (Fig. 2).

Fig. 1: Google Translator Page: http://translate.google.com/
Fig. 2: The tool suggests alternative translations when the user clicks and holds on a block of words

So what about translating from an Indian language to English? For that we need to type in the text in the required Indian language. There is another tool from Google, Google Transliteration (still in labs), that helps you type in other languages without learning the actual keys corresponding to the alphabets of that particular language. Here we will type 'mera bhArath mahaan' to get 'मेरा भारत महान' in Hindi. The transliteration window (Fig. 3) provides the required options to edit and format the text. Google provides a transliteration API that helps developers enable transliteration facilities in their websites. The transliteration API is incorporated in Google Translate as well. When a language other than English is selected in the Google Translate source window, an option to enable phonetic typing becomes available. By enabling the option, one can directly type in the required text in the Translate source window itself. Another option is to copy-paste the typed text from the Google Transliteration window.

Fig. 3: Google Transliteration window

Note: Apart from Google Transliteration, there are many online and offline tools available that will help you type in text in Indian languages. For Windows-based systems one may use Indic Input 2 (for Windows Vista / 7) or Indic Input 1 (for Windows XP). By installing this tool, one can type in text in any text editor (such as Notepad, Wordpad, LibreOffice Writer etc.) by enabling the phonetic keyboard and selecting the appropriate Unicode font. The tool can be downloaded freely from the BhashaIndia website. URL: http://bhashaindia.com/Downloads/

Here are some amusing translation examples: the lyrics of a Hindi film song (Fig. 4) and our national anthem (Fig. 5). When the Hindi film song lyrics are translated, the tool produces acceptable results, but the translation of the national anthem is amusing, to say the least. In short, it produces better translations for simple functional sentences, while for creative writing (such as poems) the results may not be of much use.

Fig. 4: Hindi film song lyrics translated to English
Fig. 5: National Anthem translated from Bengali to English

Developers can integrate the application into their websites, and it automatically translates the website to another language according to the choice selected by the user (Fig. 6). Even though the tool does not produce acceptable results all the time, it will be useful in translating websites to local languages (or foreign languages) using the Translator Toolkit provided by Google. At least the users will get some idea about the contents of the website instead of seeing the website in some alien language.

Fig. 6: Sample website with Google Translate enabled using the API. When the user scrolls over the text, the original text will be displayed as a tool-tip dialogue

What if the website does not provide a translation option by default? Still, it is possible to view the website in a language of your choice. For example, the Computer Society of India website does not have an option to switch between languages. It is nevertheless possible to display the website in Hindi or in any one of the 65 languages provided by Google Translate. If you wish to see the CSI website in Hindi, enter the following URL in the address bar:
http://translate.google.com/translate?tl=hi&u=http://www.csi-india.org
The tl (target language) parameter corresponds to the language of your choice (hi for Hindi, ta for Tamil, bn for Bengali and so on) and u is the URL of the website you wish to translate. The translated version of the CSI website is shown in Fig. 7. Alternatively, you may install Google Toolbar or get a bookmark for your language from the Tools and Resources page. URL: http://translate.google.com/translate_tools

Fig. 7: CSI website translated to Hindi

Mobiles & Tablets Too Go Desi!
It is not happening with computers alone. Most modern mobile devices (smartphones, tablets etc.) boast far more power than the computers we had three decades back. Apple Lisa[4] (released in Jan 1983), the first personal computer which offered a GUI, had a Motorola 68000 processor running at 5 MHz. Now a mid-range smartphone, the Motorola Defy, has an 800 MHz processor. While the Apple Lisa had 1 MB of RAM (only with Lisa 2 did Apple introduce a 10 MB internal hard disk drive!), the Motorola Defy has 512 MB RAM and 2 GB internal storage, and it supports microSDHC up to 32 GB! The tablets currently available in the market are even more powerful and we may consider them as minicomputers, the only difference being the lack of input devices like keyboard and mouse (of course, they permit adding them too via Bluetooth or USB!).

Mobile devices are becoming more popular and the manufacturers are trying to reach the mass public by incorporating local language support in their mobile devices. Clearly, the 'desification' is not going to happen in computers alone but will extend to mobile devices as well. Many of the devices produced by various cell phone/tablet manufacturers like Nokia, Sony, Samsung, LG, Motorola etc. already allow users to select a language for the phone interface. Entering and displaying Indic languages directly in mobile devices (for sending messages, for contact details, for writing notes etc.) is still in the development stages. Apple, the leading mobile device manufacturer, provides local language support in their iPhones and iPads based on the iOS mobile operating system. Even though many of the other devices from various manufacturers do not have native support for Unicode, there are device-specific work-arounds available for incorporating Unicode functionality in those mobile devices, especially for devices based on the Android platform. Android-based devices from Samsung, LG etc. come with support for Indian languages by default. In some mobiles, the Hindi alphabets are printed on the keypad itself along with the English alphabets to make entering text as easy as possible. Fig. 8 shows a low-end Android mobile phone from LG using Google Translate. The text produced is then copy-pasted into a message and sent. If the party receiving the message has a mobile device with Unicode support, the text will be rendered correctly; otherwise the receiver will get a series of squares instead of the actual message.

Fig. 8: Hindi text displayed on an Android mobile phone

It is very obvious that developments in Indian language computing have moved largely to web and mobile platforms rather than stand-alone applications on PCs. The demand for these tools now arises from the common man and not from business or universities. That explains the vibrancy of this field at the present time.

References
[1] Wikipedia - Unicode. http://en.wikipedia.org/wiki/Unicode
[2] What is Unicode? http://www.unicode.org/standard/WhatIsUnicode.html
[3] About Google Translate. http://translate.google.com/about/intl/en_ALL/
[4] Wikipedia - Apple Lisa. http://en.wikipedia.org/wiki/Apple_Lisa

About the Author
Hareesh N Nampoothiri is a visual design consultant with more than a decade of experience who has worked with government organizations like C-DIT, C-DAC, University of Kerala, and other private organizations.
Currently, he is doing interdisciplinary research in ethnic elements in visual design in computer media. He is the author of two books on graphic design and a regular contributor to leading technology magazines including CSI Communications. Kathakali, blogging, and photography are his passions. He has directed a documentary feature on Kathakali and also directed an educational video production for IGNOU, New Delhi.

Cover Story

“Correcting” SMS Text Automatically

Deepak P* and L Venkata Subramaniam**
* IBM Research - India, Bangalore
** IBM Research - India, New Delhi

Abstract
With the rapidly increasing penetration of mobile phones and microblogging, texting language is fast becoming the language of the youth. Characterized by unusual abbreviations, shortening, and omissions, textese or SMS language poses a challenge to conventional electronic processing of text. In this article, we present an overview of recent work on automatically cleaning SMS text.

Introduction
SMS language, also called textese, is becoming increasingly popular with the widespread usage of SMS and microblogging sites to share information. Normalization of text written in such lingo, i.e. conversion to its clean version, is a necessary prerequisite for electronic processing of such text. Conversion of SMSes to non-noisy versions would aid improved speech synthesis to help visually impaired mobile phone users. Clean SMSes can be accurately translated automatically, thus enabling seamless SMS communication between users of different natural languages.

Noise in text is defined as any kind of difference in the surface form of an electronic text from the intended, correct, or original text[6]. Under such a definition, SMS language would qualify as very noisy.
The types of noise in SMS text have been classified[1,6] into various categories such as character deletion, phonetic substitution, and word deletion. Common categories of noise and their examples at the word or phrase level are tabulated in Fig. 1. Many a time, combinations of noise categories may be used to shorten long words. For example, tomorrow may often be transformed to 2mro using a combination of phonetic substitution (“to” transformed to “2”) and character deletion. The same word may be transformed by different users to different kinds of noisy variants. The single word, tomorrow, was observed to manifest in 16 different forms[3,7] in a corpus of a thousand SMSes; a few of them are illustrated in Fig. 2.

Type of Noise            Example
Character deletion       “message” → “msg”
Phonetic substitution    “to” → “2”
Abbreviation             “laugh out loud” → “lol”
Informal usage           “going to” → “gonna”
Word deletion            “driving back home” → “drivin hm”
Fig. 1: Types of noise in SMS text

Fig. 2: Noisy variants of “tomorrow”: 2moro, tomm, tomoro, tomorow, 2mro, tomra, tom, morrow, tomora, tomo, tomrw

SMS normalization refers to the task of converting SMS text that could be noisy into its intended non-noisy form. Thus, an SMS normalization technique could potentially transform the noisy SMS itll b gud 2 c u tonite to the clean version it will be good to see you tonight. Most SMS normalization techniques need a set of noisy SMSes and their clean versions, which may have to be manually generated; this is referred to as the training set. A machine learning algorithm then works on such pairs to learn a model. This learning process is illustrated in Fig. 3.

Fig. 3: Learning process. [Noisy SMS, clean SMS] pairs such as [“btw, r u goin 4 d movie”, “by the way, are you going for the movie?”], [“itll b gud 2 c u tonite”, “it will be good to see you tonight”], and [“lemme no wen u gt thr”, “let me know when you get there”] are fed to the learner, which outputs the learned model.

A simplistic learner may simply learn a set of conditional probabilities as a model, with p(w’|w) denoting the probability that the noisy word w is actually a variant of the non-noisy word w’:

$$p(w' \mid w) = \frac{\#\ \text{SMSes where } w \text{ and } w' \text{ occur in the noisy and clean version respectively}}{\#\ \text{SMSes where } w \text{ occurs in the noisy version}}$$

The normalization phase uses the learned model to normalize (clean) a noisy input SMS and output the clean SMS. Our simple model could be used to replace each word, w, in the noisy SMS by the word v such that p(v|w) is maximum among the conditional probabilities involving w, i.e. p(.|w). An illustration of the normalization phase appears in Fig. 4.

Fig. 4: Normalization process. The noisy SMS “wot a match, luvd evry bit o it” is passed to the model applier, which uses the learned model to output the cleaned SMS “what a match, loved evry bit of it”.

State-of-the-art techniques use more sophisticated models than the simple formulation of conditional probabilities outlined above. We will outline techniques that use statistical machine translation (SMT) and spelling correction-based models in the remainder of the article.

Statistical Machine Translation
We now use a toy example to illustrate how a simple SMT model[2] may be used to learn the mappings between words and their noisy variants using the training set of SMS pairs. Consider two hypothetical noisy SMSes, ma hse and ma buk, which map to their correct variants my house and my book respectively. We will not make any assumptions on the preservation of word ordering in the noisy variant of the clean SMS. Thus, we have two possible word alignments for the [ma hse, my house] pair, which we will initialize to being equally likely. A word alignment for a training SMS is a mapping from each word in the noisy version to a word in the clean version. Such an initial configuration of SMS word alignments is depicted in Table 1.

SMS1: [ma, my] [hse, house] = 0.5
SMS1: [ma, house] [hse, my] = 0.5
SMS2: [ma, my] [buk, book] = 0.5
SMS2: [ma, book] [buk, my] = 0.5
Table 1: Initial word alignment configuration

Now, we will use these word alignment probabilities to populate the word-to-word mapping probabilities between the noisy word vocabulary [ma, hse, buk] and the correct vocabulary [my, house, book]. Since the mapping [ma, my] occurs in two different alignments, each with confidence 0.5, we will initialize the mapping to have a confidence of 1.0. Similarly, all pairs are initialized to the sum of confidences of all alignments in which they occur. Such a matrix, shown in Table 2, is then normalized column-wise so that each word in the target vocabulary (i.e. vocabulary of clean SMSes) has values summing up to unity. This process of creation and normalization of the word-word mapping probability tables is illustrated in Table 2.

Before normalization:
        my      house   book
ma      1.0     0.5     0.5
hse     0.5     0.5     0.0
buk     0.5     0.0     0.5

After column-wise normalization:
        my      house   book
ma      0.50    0.50    0.50
hse     0.25    0.50    0.00
buk     0.25    0.00    0.50
Table 2: Populated word-word table

In an iterative style, the word-mapping probabilities may now be used to compute refined word alignments for the training SMSes. The confidence of each alignment is computed as the product of the word mappings contained in the alignment. Thus, the {[ma,house] [hse,my]} alignment of SMS1 is assigned a confidence of 0.125 (product of 0.50 from [ma,house] and 0.25 from [hse,my]). These are then normalized so that the confidences of all alignments for a single SMS sum up to unity. Table 3 illustrates this process of refinement of word alignment confidences.

SMS1: [ma, my] [hse, house] = 0.50 * 0.50 = 0.250
SMS1: [ma, house] [hse, my] = 0.50 * 0.25 = 0.125
SMS2: [ma, my] [buk, book] = 0.50 * 0.50 = 0.250
SMS2: [ma, book] [buk, my] = 0.50 * 0.25 = 0.125

Normalization of word-alignment probabilities per training SMS:
SMS1: [ma, my] [hse, house] = 0.250/(0.250+0.125) = 0.67
SMS1: [ma, house] [hse, my] = 0.125/(0.250+0.125) = 0.33
SMS2: [ma, my] [buk, book] = 0.250/(0.250+0.125) = 0.67
SMS2: [ma, book] [buk, my] = 0.125/(0.250+0.125) = 0.33
Table 3: Modified word alignments for SMSes

These can then be used to estimate new word-word mapping probabilities, followed by estimation of new alignment probabilities; such an iterative process leads to a final converged matrix approximately of the form shown in Table 4.

        my      house   book
ma      0.99    0.00    0.00
hse     0.00    0.99    0.00
buk     0.00    0.00    0.99
Table 4: Converged word-mapping probabilities

Thus, an iterative sequence of estimating word-alignment probabilities and word-word mappings enables us to drill down to the correct mappings [ma → my, hse → house, buk → book], which can then be used to convert a new SMS to its clean version in a word-by-word manner. Though such a simplistic translation model (called IBM Model 1) is very popular, sophisticated SMT models that can learn many-to-many mappings between words are often used to achieve more accurate mappings.

SMT-based Approaches to SMS Normalization
The SMT paradigm has been found to be the most effective among the various paradigms that have been tried for SMS normalization. An adaptation of the traditional SMT models[2] was first used for SMS normalization to learn phrase-based alignments between the SMS and a candidate clean text. This uses a phrase-based model instead of the word-based model described above and learns mappings between phrases in clean text and phrases in SMSes using an iterative approach. A comparative study of SMS normalization approaches[5] finds that SMT-based systems are significantly less error-prone than other approaches.
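The toy iteration described above (Tables 1 to 4) can be sketched in a few lines of Python. This is only an illustrative sketch of the simplified procedure as worded in this article, not a full IBM Model 1 implementation; the function names `alignments`, `train`, and `normalize` are hypothetical, and the code assumes that each noisy SMS has the same number of words as its clean version and that alignments are one-to-one.

```python
from collections import defaultdict
from itertools import permutations

# Toy training set from the text: (noisy SMS, clean SMS) pairs.
PAIRS = [("ma hse", "my house"), ("ma buk", "my book")]


def alignments(noisy, clean):
    """All one-to-one alignments of noisy words to clean words (assumes equal length)."""
    return [tuple(zip(noisy, perm)) for perm in permutations(clean)]


def train(pairs, iterations=20):
    """Alternate between word-pair table estimation and alignment re-scoring."""
    data = [(n.split(), c.split()) for n, c in pairs]
    # Start with every alignment of an SMS equally likely (Table 1).
    conf = [{a: 1.0 / len(alignments(n, c)) for a in alignments(n, c)}
            for n, c in data]
    table = {}
    for _ in range(iterations):
        # Word-pair score = sum of confidences of the alignments containing it.
        counts = defaultdict(float)
        for per_sms in conf:
            for alignment, weight in per_sms.items():
                for pair in alignment:
                    counts[pair] += weight
        # Column-wise normalization: one column per clean word (Table 2).
        totals = defaultdict(float)
        for (_, clean_word), value in counts.items():
            totals[clean_word] += value
        table = {pair: value / totals[pair[1]] for pair, value in counts.items()}
        # Re-score each alignment as the product of its word mappings,
        # then normalize per SMS (Table 3).
        for per_sms in conf:
            for alignment in per_sms:
                score = 1.0
                for pair in alignment:
                    score *= table.get(pair, 0.0)
                per_sms[alignment] = score
            total = sum(per_sms.values())
            for alignment in per_sms:
                per_sms[alignment] /= total
    return table


def normalize(sms, table):
    """Replace each noisy word by the clean word with the highest table entry."""
    out = []
    for word in sms.split():
        candidates = {c: p for (n, c), p in table.items() if n == word}
        out.append(max(candidates, key=candidates.get) if candidates else word)
    return " ".join(out)


if __name__ == "__main__":
    learned = train(PAIRS)
    print(normalize("ma buk", learned))
```

With a single iteration, `train` returns the column-normalized values of Table 2; with more iterations the entries for [ma, my], [hse, house], and [buk, book] approach 1, in line with Table 4.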
Even in cases where a training set of noisy and clean SMS pairs is unavailable, the machine translation paradigm[4] has been used by creating a pseudo-translation model based on heuristic-based estimation of SMS word to clean word mappings.

Fig. 5: Word HMMs for SMS normalization: (a) graphemic path, (b) phonemic path, (c) cross-linked

Hidden Markov Models for SMS Normalization
Another paradigm that has been explored for SMS normalization is to model omissions and noisy variations explicitly. Towards this, an HMM-based word model[3] is constructed for each word in a training set of words. A hidden Markov model may be considered as a set of interconnected states, each of which may emit certain values, based on its output probabilities, which are then seen in the output. In the formulation proposed in Choudhury et al.[3], the noisy variant of a word is considered to be emitted from the word’s HMM. Consider the word today; the ordered set of graphemes within it is [‘t’, ‘o’, ‘d’, ‘a’, ‘y’] whereas the corresponding set of phonemes is [/T/, /AH/, /D/, /AY/]. Fig. 5(a) represents an HMM constructed out of the graphemes (characters, in our context). This is represented as a linear sequence of hidden states, each state corresponding to a token in the grapheme set. In a non-noisy version, each HMM state would emit the corresponding token; thus, a left-to-right HMM would always emit the correct word. However, since noise is what is to be modeled, each state is formulated to be able to emit either the corresponding grapheme, any other token (represented by ‘@’ in the figure), or nothing at all (represented as ε). A similar phonemic HMM is represented in Fig. 5(b).
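As a rough illustration of the graphemic path only, the probability that a word’s left-to-right HMM emits a given noisy string can be computed with a small dynamic program. This is a simplified sketch, not the cross-linked model of Choudhury et al.: the function name `likelihood` is hypothetical, every state is assumed to share the same fixed emission probabilities (0.8 for its own grapheme, 0.1 for one given other character, 0.1 for emitting nothing), and states are always visited in order, whereas in the actual approach these probabilities are learned.

```python
def likelihood(word, noisy, p_match=0.8, p_other=0.1, p_skip=0.1):
    """Probability that the graphemic HMM of `word` emits the string `noisy`.

    Each state (one per grapheme of `word`) emits its own grapheme with
    probability p_match, a given different character with probability
    p_other, or nothing with probability p_skip. All three values are
    illustrative assumptions, not learned parameters.
    """
    n, m = len(word), len(noisy)
    # f[i][j]: probability that the first i states emitted noisy[:j].
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(m + 1):
            # State i emits nothing.
            f[i][j] = f[i - 1][j] * p_skip
            if j > 0:
                # State i emits the character noisy[j - 1].
                emit = p_match if word[i - 1] == noisy[j - 1] else p_other
                f[i][j] += f[i - 1][j - 1] * emit
    return f[n][m]


if __name__ == "__main__":
    for candidate in ("today", "tdy", "xqz"):
        print(candidate, likelihood("today", candidate))
```

Under these assumptions a plausible shortening such as tdy receives a much higher probability from the HMM of today than an unrelated string of the same length, which is the property a decoder would exploit when choosing the clean word.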
The transformation of a phoneme to a grapheme is itself noisy, and thus, the emission set only includes the graphemes that could possibly map to the phoneme associated with the state. The “to” part in “today” may be transformed to the numeral “2” due to phonemic similarity, and Fig. 5(b) shows how that is accounted for in the phonemic HMM. The graphemic and phonemic HMMs are cross-linked intuitively to produce a single HMM as shown in Fig. 5(c) (emission graphemes are omitted in the figure to reduce clutter). Each clean word, along with its noisy variants, is used as a training corpus to learn the transition probabilities and emission probabilities. For example, at the end of the training, state G1 may have an emission probability distribution [‘T’: 0.8, ε: 0.1, @: 0.1] and an onward state transition distribution [G2: 0.6, P2: 0.4]. Such learnt HMMs are then post-processed and harnessed using standard techniques to decode the “clean” version from a noisy word. Such word-level cleansing is aggregated to achieve normalization of SMS text to its clean version.

Summary
With the increasing popularity of the SMS language through SMSes and microblogging websites, cleansing SMS text is a prerequisite for effective development and deployment of services such as text-to-speech and automatic translation. There has been a lot of interest of late in developing techniques to cleanse SMS text. In this article, we outlined the problem of normalization of SMSes to their intended clean versions, and briefly surveyed various techniques that have been developed for the purpose. We specifically focused on the usage of machine translation models, a popular paradigm for accurate decoding of SMS text.

References
[1] AiTi Aw, et al. (2006). “A Phrase-Based Statistical Model for SMS Text Normalization”, Proceedings of COLING/ACL Conference, Sydney, Australia.
[2] Brown, P, et al. (1993). “The mathematics of statistical machine translation: parameter estimation”, Computational Linguistics, 19(2), 263-311.
[3] Choudhury, M, et al. (2007). “Investigation and modeling of the structure of texting language”, 1st Intl. Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India.
[4] Contractor, D, et al. (2010). “Unsupervised cleansing of noisy text”, Proceedings of the COLING Conference, Beijing, China.
[5] Kobus, C, et al. (2008). “Normalizing SMS: are two metaphors better than one?”, Proceedings of the COLING Conference, Manchester.
[6] Venkata Subramaniam, L, et al. (2009). “A survey of types of text noise and techniques to handle noisy text”, Proceedings of the Third Workshop on Analytics for Noisy Unstructured Text Data, Barcelona, Spain.
[7] Venkata Subramaniam, L (2010). “Noisy Text Analytics”, Tutorial at the NAACL HLT Conference, Los Angeles, USA.

About the Authors
Deepak P is currently with the Information Management group at IBM Research - India, Bangalore. He received a B.Tech degree in computer science and engineering from Cochin University at Kochi, and an M.Tech in the same discipline from IIT Madras, India. He is currently pursuing his PhD with the department of computer science and engineering at IIT Madras. His main research interests are in the areas of data mining, similarity search, case-based reasoning, and information retrieval.

L Venkata Subramaniam received the BE degree in electronics and communication engineering from Mysore University, the MS degree in electrical engineering from Washington University, St. Louis, and the PhD degree in electronics from IIT Delhi. He presently manages the Information Processing and Analytics group in IBM Research India, New Delhi. His research interests include machine learning, natural language processing, speech processing, and their applications to data analytics.
Research Front

Approximate/Fuzzy String Matching using Mutation Probability Matrices

Sajilal Divakaran* and Achuthsankar S Nair**
* FTMS School of Computing, Kuala Lumpur
** University of Kerala

We consider the approximate/fuzzy string matching problem in the Malayalam language and propose a log-odds scoring matrix for score-based alignment. We report a pilot study designed and conducted to collect statistics about what we have termed “accepted mutation probabilities” of characters in Malayalam, as they naturally occur. Based on the statistics, we show how a scoring matrix can be produced for Malayalam, which can be used effectively in numeric scoring for approximate/fuzzy string matching. Such a scoring matrix would enable search engines to widen the search operation in Malayalam. As this is a first attempt, we point out a large number of areas in which further research and consequent improvement are required. We limit ourselves to a chosen set of consonant characters, and the matrix we report is a prototype for further improvement.

Keywords: approximate string matching, fuzzy string matching, scoring matrix, Malayalam Computing, Language Computing.

Introduction
Linguistic computing issues in non-English languages are generally addressed with less depth and breadth, especially for languages which have a small user base. Malayalam, one such language, is one of the four major Dravidian languages, with a rich literary tradition. The native language of the South Indian state of Kerala and the Lakshadweep Islands off the west coast of India, Malayalam is spoken by 4% of India’s population. While Malayalam is integrated fairly well with computers, its user base may not generate huge market interest, and so fine issues of language computing for Malayalam remain unaddressed and unattended.
If we were to search Google for information on the senior author of this paper, Achuthsankar, and gave the query as Achutsankar or Achudhsankar, in both cases Google would land us correctly on the official web page of the author. This “Did you mean” feature of Google is managed by the Google-diff-match-patch[4]. The match part of the algorithm uses a technique known as approximate string matching or fuzzy pattern matching[10]. The close/fuzzy match to any query received by the search engine is routine and obvious to the English language user. However, when a non-English language such as Malayalam is used to query Google, the same facility is not seen in action. When the word പതിനായിരം (Pathinaayiram - the Malayalam word for the number ten thousand) is used as a query in Google Malayalam search, we are directed to documents that contain that word, but not to documents that contain the similar word പയിനായിരം (Payinaayiram - a common mispronunciation of the original word). This is because approximate/fuzzy string matching has not been addressed in Malayalam. In this paper we make preliminary attempts toward addressing this very special issue of approximate/fuzzy string matching in Malayalam.

Approximate/Fuzzy String Matching
The field described as approximate or fuzzy string matching in computer science has been firmly established since the 1980s. Patrick & Geoff[5] define the approximate string matching problem as follows: Given a string s drawn from some set S of possible strings (the set of all strings composed of symbols drawn from some alphabet A), find a string t which approximately matches this string, where t is in a subset T of S. The task is either to find all those strings in T that are “sufficiently like” s, or the N strings in T that are “most like” s. One of the important requirements for analyzing similarity is to have a scientifically derived measure of similarity.
The soundex system of Odell and Russell[13] is perhaps one of the earliest attempts to use such a measure. It uses a soundex code of one letter and three digits. Such codes have been used successfully in hospital databases and airline reservation systems[8]. The Damerau-Levenshtein metric[2] is another such measure: the smallest number of operations (insertions, deletions, substitutions, or reversals) needed to change one string into another. This metric can be used with standard optimization techniques[14] to derive the optimal score for each string matching and thereby choose matches in the order of closeness.

Approximate or fuzzy string matching is in vogue not only in natural languages but also in artificial languages. In fact, approximate string matching has been developed into a fine art in computational sciences such as bioinformatics. Bioinformatics deals mainly with biological sequences derived from DNA, RNA, and amino acids[9]. Dynamic programming algorithms (the Needleman–Wunsch and Smith–Waterman algorithms)[11], which enable fast approximate string matching using carefully crafted scoring matrices, are in great use in bioinformatics. The equivalent of Google for the modern biologist is the basic local alignment search tool (BLAST)[1], which uses scoring matrices such as point accepted mutation matrices (PAM)[3] and BLOcks of Amino Acid SUbstitution Matrix (BLOSUM)[6]. To the best of the authors’ knowledge, no such scoring system is in established use for any natural language, including English, although an attempt in this direction has recently been made for English[7]. The statistics for accepted mutation in English were cleverly derived based on already designed Google searches. In the case of Malayalam, statistics of character mutations are not easily derivable from any corpus, existing search engine, or other language computing tool. Hence, data for this needs to be generated before a scoring matrix system can be developed.
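As an illustration of the edit-distance measure mentioned earlier in this section, the following is a minimal dynamic-programming sketch. It implements the restricted form often called optimal string alignment distance, in which a reversal means swapping two adjacent characters and no substring is edited more than once; that choice of variant and the function name `osa_distance` are assumptions made for this example, not details taken from the cited work.

```python
def osa_distance(a, b):
    """Smallest number of insertions, deletions, substitutions, or
    adjacent reversals needed to turn string a into string b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Reversal of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


if __name__ == "__main__":
    for query in ("Achutsankar", "Achudhsankar"):
        print(query, osa_distance("Achuthsankar", query))
```

Both misspelt queries used as examples in this article are a single operation away from the correct name, which is why a distance-based matcher can recover it. Such a measure treats every substitution as equally costly, which is the limitation a mutation-based scoring matrix is meant to address.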
We will now describe the generation of primary data on natural mutation in Malayalam.

Occurrence and Mutation Probabilities
Malayalam has a set of 51 characters, and basic statistics of their occurrence and mutation are required for developing a scoring matrix. The occurrence probabilities are available, derived from corpora of considerable size in 1971 and again in 2003[12]. We describe here only a subset of characters in view of economy of space. In Table 1, we give the probabilities of one set of consonants, which we have extracted from a small test corpus of Malayalam text derived from periodicals.

Character      ക       ഖ       ഗ       ഘ       ങ       k
Probability    0.606   0.009   0.044   0.004   0.039   0.297
Table 1: Occurrence probabilities of a set of selected Malayalam consonants

We then designed and conducted a study to extract the character mutation probabilities. We selected 150 words that cover all the chosen consonant characters. A dictation was administered to a small group of school children (N=30). The observed mistakes (natural mutations) are tabulated in Table 2 as probabilities. It is noted that the sample size of N=30 is inadequate for a linguistic study of this kind. However, as already highlighted, this paper reports a pilot study to demonstrate proof of the concept. Moreover, the sample size can be made larger once the research community vets the approach put forward by us.

        ക       ഖ       ഗ       ഘ       ങ       k
ക       0.85    0.25    0.45    0.07    0       0.10
ഖ       0       0.55    0       0       0       0
ഗ       0.06    0.04    0.47    0.09    0       0
ഘ       0       0.01    0       0.85    0       0
ങ       0       0       0       0       0       0
k       0.08    0.11    0.08    0       0       0.90
Table 2: Probability of natural mistakes (natural mutation probabilities) of chosen set of consonant characters

Log-odds Scoring Matrix
It is possible to use Table 2 itself for scoring string matches. However, it might be unwieldy in practice. For long strings we will need to multiply probabilities, which might result in numeric underflow. Hence, we will use a logarithmic transformation. Another transformation that we will use is to convert from probability to odds.
The odds can be defined as the ratio of the probability of occurrence of an event to the probability that it does not occur. If the probability of an event is p, then the odds are p/(1-p). We will however not use this formula directly, but define the log-odds score for any given match i-j as:

$$S_{ij} = 10 \log_{10}\left(\frac{p_{ij}}{p_j}\right)$$

In the above equation, $p_{ij}$ is the probability that character i mutates to character j and $p_j$ is the probability of natural occurrence of character j. Thus the negative score for a mutation of a less frequently occurring character will be larger in this scheme. The multiplier 10 is used just to bring the scores to a convenient range. Table 3 shows the log-odds scores thus derived using the occurrence probabilities and mutation probabilities given in Tables 1 and 2. These can be used to score approximate matches and select the most similar one.

        ക       ഖ       ഗ       ഘ       ങ       k
ക       2       15      10      11      -30     -4
ഖ       -30     18      -30     -30     -30     -30
ഗ       -16     6       11      13      -30     -30
ഘ       -30     3       -30     23      -30     -30
ങ       -30     -30     -30     -30     -30     -30
k       -9      11      0.08    -30     -30     5
Table 3: Log-odds probability of natural mistakes (mutation probabilities) of chosen set of consonant characters (We set the score corresponding to 0 as -30. It may be noted that the diagonal elements are strongest in each respective column.)

Results, Discussions, and Conclusion
The prototype scoring matrix we have designed above can be demonstrated to be capable of scoring approximate matches and can therefore be a means of selecting the closest match. We will demonstrate this with an example of scoring four approximate matches for the word കk. Table 4 lists the scores for the four different matches; the exact match scores best. The next best match as per the new scoring scheme is കക.

Word    Match   Character scores   Total Score
കk      കk      2 + 5              7
കk      കഖ      2 - 30             -28
കk      കഘ      2 - 30             -28
കk      കക      2 - 4              -2
Table 4: Demonstrating use of scoring matrix in Table 3 on sample approximate string matches

Our demonstration has been on a chosen set of consonant characters, but it can be expanded to cover all Malayalam characters.
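The scoring of Table 4 can be reproduced with a short sketch that looks up each character pair in Table 3 and sums the scores. The names `CHARS`, `TABLE3`, `score`, and `rank` are hypothetical, the character written as k stands for the sixth consonant exactly as it is printed in the tables, and the indexing convention (row taken from the candidate match, column from the query word) is an assumption inferred from the totals in Table 4. Only equal-length strings over the six listed characters are handled.

```python
# Characters in the order used by the rows and columns of Table 3.
CHARS = ["ക", "ഖ", "ഗ", "ഘ", "ങ", "k"]

# Log-odds scores copied from Table 3 (rows and columns follow CHARS).
TABLE3 = [
    [2, 15, 10, 11, -30, -4],
    [-30, 18, -30, -30, -30, -30],
    [-16, 6, 11, 13, -30, -30],
    [-30, 3, -30, 23, -30, -30],
    [-30, -30, -30, -30, -30, -30],
    [-9, 11, 0.08, -30, -30, 5],
]

INDEX = {ch: pos for pos, ch in enumerate(CHARS)}


def score(query, candidate):
    """Sum of per-character scores for two equal-length strings.

    Assumed convention: row = character of the candidate match,
    column = character of the query word.
    """
    if len(query) != len(candidate):
        raise ValueError("this sketch only scores strings of equal length")
    return sum(TABLE3[INDEX[c]][INDEX[q]] for q, c in zip(query, candidate))


def rank(query, candidates):
    """Candidates sorted from best to worst total score."""
    return sorted(candidates, key=lambda cand: score(query, cand), reverse=True)


if __name__ == "__main__":
    word = "കk"
    for cand in rank(word, ["കk", "കഖ", "കഘ", "കക"]):
        print(cand, score(word, cand))
```

Because the scores are logarithmic, summing them over the characters of a word plays the role that multiplying probabilities would play with Table 2, without the risk of numeric underflow on long strings.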
For demonstrating more general words, a scoring matrix for vowels is essential. We have computed the same and will be reporting it in a forthcoming publication. During our studies, we also noticed that the conventional grouping of characters may not suit our studies. For example, we found that the character ഹ is a possible, though very rare, mutation for ക, even though they are not grouped together conventionally. A regrouping based on natural mutations is work that we see as requiring attention.

To the best of our knowledge, our work is a unique proposition for the Malayalam language, which can be incorporated into Malayalam search engines. We would like to reiterate that our work is at the prototype stage. Neither the size of the corpus nor the number of subjects in the survey is substantial. The authors hope to expand the work with a sizable database from which statistics can be extracted, so that the scoring matrix can be made more reliable. We also propose to validate the scoring approach with sample trials involving language experts.

References
[1] Altschul, S F, et al. (1990). “Basic local alignment search tool”, Journal of Molecular Biology, 215(3), 403-410.
[2] Damerau, F J (1964). “A technique for computer detection and correction of spelling errors”, Communications of the ACM, 7(3), 171-176.
[3] Dayhoff, M O, et al. (1978). “A model of Evolutionary Change in Proteins”, Atlas of Protein Sequence and Structure, 5(3), 345-358.
[4] Google-diff-match-patch, [Online]. Available: http://code.google.com/p/google-diff-match-patch/, Accessed on 20 Jan. 2012.
Continued on Page 37

Article

Emails and Web Pages in Local Languages

M Jayalakshmi
Formerly of Vikram Sarabhai Space Centre, Dept of Space, Govt of India

Emails, text chats, and instant messages become more personal, and at times more impressive, if they are received in the reader's most familiar local language. Similar is the case with online news and local language web pages.
Those who are less literate in English as compared to their fluency in local languages, feel comfortable with a local language scripted emails/web pages compared to the corresponding English versions of the same. Here the local language is used in Indian context only. Let us look into some of the specific language tools and the languages they support. To read or write a local language scripted text, the required fonts must be present on your computer (PC). Windows, Macintosh, and Linux operating systems can use true-type fonts, which are available via downloadable installers. Installation needs to be done only once. Some web browsers have to be set up in utf-8 encoding format also. Nothing further is required for reading. Now in order to create, edit, and upload (send) texts in local languages, some language converters are to be installed or must be available in your PC. A number of language support tools, offline and online, free as well as non-free are available on the net. This article addresses some of these basic tools required to be set up in your PC for this purpose. There are keyboard maps and virtual keyboards supported by office software packages to type directly into the editor to create or update documents in any language, which comes along with the OS. But this will be a cumbersome process unless one is not conversant with that particular language typing and editing. Moreover, the fonts generated out of this process may not be web-fonts and hence readability will be lost. To overcome this, further software conversions and processing may be required to make them web loadable. There are some simpler short cuts to overcome these processes by sticking to the typing in the familiar English keyboard itself. A number of online and offline transliteration (language conversion according to sound) tools are available free on the net in the form of html web pages with multiple text-boxes, like window panes. 
One can type English alphanumeric characters (lower and upper cases in combination) according to the sound of the local language CSI Communications | May 2012 | 14 character to be produced. This is called phoneme transliteration. On the left window (English language editing window) you can type and edit the characters according to the target language phonetics (character sound) and on the right or bottom pane, the vernacular character will be simultaneously generated. For example, the typed text (on left column) will be rendered as follows: After you complete the partial or full editing of the English phonetics corresponding to a local language text, the local language characters will appear on the text-box (right window pane) in a (“Chillaksharam”), they can also be incorporated by these alphanumeric character sequences or from virtual keyboards of Unicode characters installed in your system. Department of Information Technology, Government of India has accepted Unicode encoding for fonts as Indian standard in this regard. Set Up Your System for Local Language Use If you are using Linux operating system, the installation procedure is as follows: 1. Download the font file from the site - स िर ग म प ध िन स Devanagari sa ri ga ma pa Dha ni sa - स िर ग म प ध िन स Hindi sa ri ga ma pa Dha ni sa - സ രി ഗ മ പ ധ നി സ Malayalam sa ri ga ma pa dha ni sa - ஸ ரி க ம ப த னி ஸ Tamil sa ri ga ma pa Dha ni sa - స రి గ మ ప ధ ని స Telugu sa ri ga ma pa Dha ni sa - ಸ ರಿ ಗ ಮ ಪ ಧ ನಿ ಸ Kannada sa ri ga ma pa Dha ni sa - স ির গ ম প ধ িন স Bangala sa ri ga ma pa Dha ni sa - ସ ରି ଗ ମ ପ ଧ ନି ସ Oriya sa ri ga ma pa Dha ni sa - ਸ ਿਰ ਗ ਮ ਪ ਧ ਿਨ ਸ Punjabi sa ri ga ma pa dha ni sa Unicode font. This local language text thus generated, you can copy and paste on the new mail editing area of the email client, in the html editing area of the web inbox, message-box of a chat line, or the web page editing window. Now you are ready to upload and dispatch the vernacular script. 
This is the basic principle used for local language web page creation too. To generate the vowel accents of local language sounds or compound characters in that particular language alphabet, a sequence of English characters may have to be typed at times. The guidelines for this will be generally available in the transliteration language web page itself. But all tools need not support all languages. Indian languages generally have a maximum of 15 vowel sounds and 36 consonants. There are compound letters formed by combination of consonants and vowels. Most of these patterns are handled in these tools. Still there will be a few left out which have to be addressed separately. In languages like Malayalam where certain words end in half sounds 2. 3. 4. 5. 6. Run the command: tar -xvzf Hindi.tar. gz This will create the directory "Hindi" Go into the directory "Hindi" Run the file FontInstaller.sh, give the command: ./FontInstaller.sh Now restart your X server The font is now installed on your machine. You can also create a new directory, say “myfonts” in /user/share/fonts/ and copy the required font in “myfonts” in Fedora. Windows 2000, Windows XP, and Windows Vista have inbuilt support for Unicode encoding at the operating system level, but the feature needs to be enabled. Windows VISTA • • • Go to the Control Panel and then click to the Regional and Language Option. Choose the Country - India. Click on the keyboard and Languages Tab and choose the Hindi keyboard. EN will appear in the system tray. Left click on the EN or press the ALT+SHIFT keys and choose the language to type. www.csi-india.org With the enabling of Unicode in your system, the INSCRIPT keyboard driver and Unicode supported Mangal and Arial Unicode MS fonts will be installed in the system. 
To download the other keyboard drivers, such as Typewriter/Remington and Phonetic/Roman, platform-free and browser-free OpenType fonts, a fonts converter, a keyboard tutor for learning INSCRIPT typing, the Hindi version of Indian Open Office, and other software free of cost, visit the site www.ildc.in
• Choose the language (Hindi)
• Click on ‘Download’ for the required software or driver
• A zip file will get downloaded
• After unzipping the file, run the .exe of that software
The download options on the site include: Option 3 - Open-type fonts, Option 4 - Keyboard Drivers, Option 5 - Fonts Converter.

Unicode can be enabled in Windows 2000 and later operating systems as described below. You should first install the Windows files for the display of Indic languages.

Enable Indic for Windows XP & above
1. Go to Start -> Control Panel -> Regional & Language Options -> Languages tab, tick "Install files for complex scripts..." and click OK.
2. Click OK (Figure below).
3. You will require the Windows XP CD to enable Indic. Again go to Control Panel >> Regional and Language Options >> Languages tab, click on Details, and click on Add to select the language of your choice. From the system tray, click on EN and select the language for typing.

Enable Indic for Windows 2000
1. Go to Start -> Settings -> Control Panel -> Regional Options -> Languages -> Indic (tick Indic) and click OK.
2. Click OK (Figure below).
3. You will require the Windows 2000 CD to enable Indic. Again go to Regional Options and click on Input Locales. Add the languages in which you want to type. From the system tray, click on EN and select the language for typing.

Unicode Fonts
Unicode is a map, a chart of all of the characters, letters, symbols, punctuation marks etc. necessary for writing all of the world’s languages. Graphemes are the basic building blocks of a written script. Grapheme is a synonym for a character. In English, there is a one-to-one correspondence between a character and its glyphs (ornamental marks). Glyphs in a font should comprise a unified design entity. A font represents the graphical form of a script. Fonts are therefore formed from a collection of graphemes and glyphs. Phonemes are the basic building blocks of the phonetics of a language. Graphemes form an abstract conceptual layer between physically conceivable glyphs and phonemes. The Unicode Consortium is standardizing the character sets of the world's languages. Character sets of 30+ languages are currently standardized under Unicode.

A font spanning many Unicode ranges can be helpful in several practical applications. For instance, it can provide some scripts and characters that are hard to find, ease installation of base support for many languages, facilitate documents mixing symbols and language scripts, and improve the appearance of web pages with mixed symbols and scripts. Those who use Windows NT, 2000, and XP can take advantage of Unicode. In these operating systems, it is possible to read, type, print etc. using Unicode mappings, provided of course that you have the appropriate font and keyboard drivers. With the other Windows versions (95, 98, Me), typing in Unicode is not really possible. Unicode also works on recent Mac operating systems.

Virtual Keyboards & Character Maps
The combinations of consonants and vowels that render the different phonetics may be produced by successive key strokes as given below (Fig. 1). This can be done faster with transliteration packages, generally available as HTML forms as given in the subsequent figures (Fig. 3).

[Fig. 1: Key strokes for rendering phonetics (+ implies successive hits)]

Conclusion
Setting up your PC for local language reading and writing, including the installation of fonts and language converters, is a one-time activity. These installations need to be done only once for any typical local language. The rest of the work of reading, typing, editing, and uploading scripts is as easy as for any English language text.
Some of the Unicode fonts for Indian languages are:
1. Windows: Arial Unicode MS, Akshar Unicode, ALPHABETUM Unicode, Aparajita, JanaHindi, JanaMarathi, JanaSanskrit, Kalimati, Kanjirowa, Kokila, Lucida Sans, Mangal, Raghindi, Roman Unicode, Sanskrit 2003, Santipur OT, Saraswati5, shiDeva, SHREEDV0726-OT, SiddhiUni, Sun-ExtA, Thyaka Rabison, TITUS Cyberbit Basic, Uttara, Chrysanthi Unicode, CN-Arial, Code2000, Ekushey Azad, Ekushey Durga, Ekushey Puja, Ekushey Punarbhaba, Ekushey Saraswatii, Ekushey Sharifa, Ekushey Sumit, Free Serif, Likhan, Mitra Mono, Mukti, Mukti Narrow, Raga, Rupali, SolaimanLipi, UniBangla, Vrinda, aakar, padma, Rekha etc.

[Fig. 2: A typical keyboard character map for devanagari font]

[Fig. 3: A typical Hindi transliteration page ("Transliterate to Hindi": text typed as "namaskaara" in the input box appears as नमस्कार in the results box)]

[Fig. 4: Conversion Guidelines]

Language Converters
There are a number of free language converters available for Windows and Linux. The following list refers to a few of them.

Offline Converters
1. Indian language converter (ILC) - Bengali, Hindi, Kannada, Malayalam, Oriya, Punjabi, Sanskrit, Telugu, and Tamil
2. Scripto0.2.0 - Gujarati, Gurumukhi, Hindi, Malayalam
3. Keraleeyam, Varamozhi, mozhi, Madhuri - Malayalam
4. Baraha - Kannada, Hindi, Marathi, Sanskrit, Tamil, Telugu, Malayalam, Gujarati, Gurumukhi, Bengali, Assamese, Manipuri, and Oriya
5. Hindi Editor For The Unicode™ Standard - Hindi

Online Converters
Google mail has a built-in transliteration facility. The language of choice may be selected from a list box in the HTML text editor used for composing mails. Other online converters include:
• Aksharamala
• Bangla Unicode Converter
• Devanagari Editor etc.
Some of the other online URLs are
• http://www.translatorindia.com
• http://www.tamilcube.com
• http://unicode.org/resources/onlinetools.html

Transliteration
A typical transliteration software package, ILC, downloaded from the Internet will look like the following:

[Fig. 5]

2. Macintosh OS 9: Devanagari MT, Devanagari MTS
3. Linux: GNU FreeFont, Devanagari, Lohit Malayalam, Latha, Valluvar etc.

More specifically, these are the fonts for typical Indian language scripts:
1. Hindi - Akshar, Cdac - GIST Surekh, Gargi (Gargi.ttf), JanaHindi (RKJanaHindi.TTF), JanaMarathi (RVJanaMarathi.TTF), Mangal (mangal.ttf), Raghindi (raghu.ttf), Sanskrit 2003 (Sanskrit2003.ttf), Shusha Fonts, Hindi for Devanagari, Arial Unicode
2. Malayalam - Kartika, Arial Unicode, GNU FreeFont, Lohit Malayalam, Meera, dyuthi, rachana, suruma, raghu, Anjali old lipi, ML-Nila
3. Tamil - Akshar Unicode (akshar.ttf), Arial Unicode MS (arialuni.ttf), JanaTamil (RRJanaTamil.ttf), Latha (latha.ttf), ThendralUni (ThendralUni.ttf), TheneeUni (TheneeUni.ttf), VaigaiUni (VaigaiUni.ttf)
4. Telugu - Akshar Unicode (akshar.ttf), Code2000 (code2000.ttf), Gautami (gautami.ttf), Pothana2000 (Pothana2000.ttf), Vemana2000 (Vemana.ttf)
5. Kannada - Akshar Unicode (Akshar.ttf), Arial Unicode MS (arialuni.ttf), JanaKannada (ROJanaKannada.TTF from JanaKannada.zip), Kedage, Mallige (Malige-n.TTF), RaghuKannada (RORaghuKannada_ship.ttf), Saraswati5 (SaraswatiNormal.ttf and SaraswatiBold.ttf), Tunga (Tunga.ttf)
6. Bengali - Arial Unicode MS
All TrueType Unicode fonts are portable to Linux systems.

Bibliography
[1] Baraha - Free Indian Language Software - Typing Software, http://www.baraha.com
[2] Indian language transliteration | Indian language unicode, http://vikku.info/indian-language-unicode-converter
[3] The Indian Language Converter, http://www.yash.info/indianLanguageConverter (the site's code, ilc.zip, is free for use)
[4] GNU FreeFont: Why Unicode fonts? http://www.gnu.org/software/freefont/articles

About the Author
M Jayalakshmi is a retired scientist/engineer from the Vikram Sarabhai Space Centre, Dept of Space, Govt of India. She was the Webmaster of the VSSC intranet and Head of its Enterprise Software Section, Computer Division. Her areas of expertise are 1. Computational Numerical Software in Avionics sub-Systems, 2. Microprocessor-based On-board Computers & Telemetry Systems, 3. Quality Assessment of Launch Vehicle Mission Software, and the development of applications software for the VSSC intranet. She can be contacted at [email protected].

Article
A Speech-to-Text System
Nishant Allawadi* and Parteek Kumar**
* Masters Student, Thapar University, Patiala
** Assistant Professor, CSED, Thapar University, Patiala

Abstract: Speech-to-Text (STT) can be described as a system which converts speech into text. This paper discusses the applications of STT systems in health care instruments, banking devices, aircraft devices, robotics etc. It discusses existing systems such as the SOPC-based Speech-to-Text architecture, the architecture for a Hindi Speech Recognition System using HTK, and Phonetic Speech Analysis for Speech to Text Conversion. It then presents the architecture of the Speech-to-Text system and provides a tutorial to implement it. The tutorial describes the four phases of development of the STT system, namely, data preparation, monophone HMM creation, tied-state triphone HMM creation and execution with Julius. The first phase processes the raw data for further use. The second phase trains the system using monophones. The third phase trains the system using triphones. The final phase explains the execution of the system.
The paper also highlights the futuristic applications of the Speech-to-Text system.

Keywords: HMM, monophones, dictionary and triphones.

Introduction
A Speech-to-Text (STT) system converts speech into text. It takes speech as input and divides it into small segments. These small segments are sounds, known as monophones. The system extracts the feature vectors of the monophones and matches them with stored feature vectors[1]. A Hidden Markov Model (HMM) is used to find the most probable result, which is given out as the text for the input speech. The system is developed by re-estimating the feature vectors at each step of training using HMM Tool Kit (HTK) commands. The HMM is the result of an attempt to model speech generation statistically. It is the most successful and most commonly used model in speech recognition[2].

This paper is divided into six sections. The second section discusses the applications. The third section highlights the existing STT systems. The architecture of the STT system is described in the fourth section. The fifth section describes the implementation of the STT system. The conclusion is drawn in the sixth section.

Applications of the Speech-to-Text System
STT systems are used in hospitals in health care instruments[6]. In banking, STT is implemented in input devices where credit card numbers are given as speech input. It is widely used in aircraft systems, where pilots give audio commands to manage operations in flight. Mobile phones use STT in many of their applications, such as writing text messages by speech input, e-mail documentation, mobile game commands and music player song selection. STT systems are used in computers for writing text documents, and also for opening, closing and operating various applications. Battle management command centres require rapid access to and control of large, rapidly changing information databases.
Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. Robotics is an emerging field where inputs are given to robots in speech format. The robot processes the speech input command and performs actions accordingly[3].

Existing Speech-to-Text Systems
A number of Speech-to-Text systems have been proposed worldwide. A System-on-Programmable-Chip (SOPC) based Speech-to-Text architecture has been proposed by Murugan and Balaji. This speech-to-text system uses isolated word recognition with a vocabulary of ten words (digits 0 to 9) and statistical modeling (HMM) for machine speech recognition. They used Matlab for recording speech. The training steps were performed using PC-based C programs. The resulting HMM models are loaded onto a field-programmable gate array (FPGA) for the recognition phase. The uttered word is recognized based on maximum likelihood estimation.

An architecture for a Hindi Speech Recognition System using HTK has been proposed by Kumar and Aggarwal[7]. The proposed system was built as a speech recognition system for the Hindi language, using the Hidden Markov Model Toolkit (HTK). The architecture has four phases, namely, preprocessing, feature extraction, model generation and pattern classification. The system recognizes isolated words using an acoustic word model. It was trained for 30 Hindi words, with training data collected from eight speakers. The developers reported an accuracy of 94.63%.

Phonetic Speech Analysis for Speech to Text Conversion has been presented by Bapat and Nagalkar[4]. Their work aimed at generating phonetic codes of the uttered speech in a training-less, human-independent manner.
The proposed system has four phases, namely, end point detection, segmenting speech into phonemes, phoneme class identification and phoneme variant identification in the class identified. The proposed system uses differentiation, zero-crossing calculation and FFT operations.

Architecture of Speech-to-Text System
The conversion process of speech to text is divided into four phases, namely, Data preparation, Monophones HMM creation, Tied-state triphones HMM creation and Execution with Julius interface, as given in Fig. 1. The description of each of these phases is given in subsequent sections.

[Fig. 1: Speech-to-Text Conversion Architecture]

Implementation of Speech-to-Text System
The four phases shown in Fig. 1 (Data preparation, Monophones HMM creation, Tied-state triphones HMM creation and Execution with Julius interface) are implemented as described in the following subsections[5].

Data Preparation
This phase prepares the data for processing in subsequent phases. It requires a grammar file, speech files, a vocabulary file and a training text file as raw input. The processing of these files is explained below.

Grammar files
In this phase, the grammar of the language is provided in the form of rules in the .grammar file and the words are provided in the .voca file. The .grammar file defines the recognition rules. The .voca file defines the actual words in each word category and their pronunciation information.

[Fig. 2: Process of creation of .dfa file and .dict file]
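To make the role of the .voca file concrete, the following minimal Python sketch reads text laid out the way the article's .voca example is: a line starting with % opens a word category, and each following line gives a word followed by its monophones. The function name parse_voca and the embedded sample text are illustrative assumptions, not part of HTK or Julius, and the real compilation to .dfa and .dict files is done by the Julius grammar tools rather than by code like this.

```python
def parse_voca(text):
    """Return {category: {word: [monophones]}} from .voca-style text."""
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("%"):
            # A "% NAME" line opens a new word category.
            current = line[1:].strip()
            categories[current] = {}
        else:
            if current is None:
                raise ValueError("word entry found before any % category line")
            # First field is the word, the rest is its pronunciation.
            word, *phones = line.split()
            categories[current][word] = phones
    return categories


# Sample text following the layout of the article's .voca listing.
SAMPLE_VOCA = """\
% NS_B
<s> sil
% NS_E
</s> sil
% CALL
ADVICE ae d v ay s
BOY b oy
"""

if __name__ == "__main__":
    voca = parse_voca(SAMPLE_VOCA)
    for word, phones in voca["CALL"].items():
        print(word, phones)
```

Keeping the categories separate from the pronunciations mirrors the split between the two input files: the .grammar rules refer only to category names such as CALL, while the words and their monophones live in the .voca file.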
The description of the .grammar file is given in (1).

S : NS_B SENT NS_E
SENT: CALL …(1)

The description of the .voca file is given in (2).

% NS_B
<s> sil
% NS_E
</s> sil
% CALL
ADVICE ae d v ay s
BOY b oy …(2)

As given in (1), S refers to the start symbol of the input, while NS_B indicates the beginning silence and NS_E the ending silence, both represented by the sil monophone. The data to be recognized is given by the keyword SENT, which refers to CALL as given in (1). The details of CALL are provided in the .voca file as given in (2). CALL lists the words to be recognized with their monophone combinations. For example, ADVICE, together with the surrounding silences, has the monophone combination “sil ae d v ay s sil”. The .grammar file and .voca file are compiled to generate a dictionary file and a finite automaton file, namely the .dict and .dfa files, respectively, as shown in Fig. 2. These files are required at the time of execution of the system.

Training text file
The training text file is named prompts. It contains a list of words that are to be recorded and the names of the corresponding audio files in which they are to be stored. The description of this file is given in (3).

*/sample1 ADVICE BOY CHARLIE DOOR KICK MAID NURSE ONCE RULE TARGET
*/sample2 TARGET RULE ONCE NURSE MAID KICK DOOR CHARLIE BOY ADVICE
*/sample3 ADVICE ADVICE BOY BOY CHARLIE CHARLIE DOOR DOOR KICK KICK …(3)

Speech files
Speech files are stored in .wav format. These files are recorded with a recording tool such as Audacity. The training text, written in the prompts file, is recorded and saved in these files.

Vocabulary file
This file contains a sorted collection of commonly used words of a language along with their combinations of monophones. It is used as a reference to create a dictionary for the training words. A snapshot of this file is given in (4).

ABACK [ABACK] ax b ae k
ABACUS [ABACUS] ae b ax k ax s
ABALON [ABALON] ae b ax l aa n …(4)

Creation of phones.mlf and dictionary file
In the data preparation phase, the wordlist and words.mlf files are created from the prompts file. The wordlist file contains all the unique words of the prompts file. The words.mlf file contains the same text as the prompts file, with each word on a new line. The wordlist file is used, with the help of the vocabulary file, to create the monophones0 and dictionary files. The dictionary file contains all the training words with their corresponding monophone combinations, and the monophones0 file contains the list of all unique monophones. The dictionary and words.mlf files are used to generate the phones0.mlf file as given in Fig. 3. A monophones1 file is also generated in this process without sp, i.e. the short-pause monophone.

[Fig. 3: Master Label File Creation]

Creation of MFC files
The .mfc files are created from the .wav files by using the HCopy command of HTK with the help of a configuration file, config[8]. These .mfc files contain the feature vectors for the .wav files and are used in subsequent phases for training.

Monophone HMM Creation
This phase is used to create a well-trained set of single-Gaussian monophone HMMs. It requires a prototype for the HMM, the .mfc files, the configuration file, the monophone files and the phones.mlf file. Each HMM file follows the prototype given in the proto file. There are a number of monophones in the HMM file. Generally, each monophone has five states. Here, state 1 and state 5 are the opening and closing states, while states 2, 3 and 4 hold the values of the means and variances for the corresponding monophone. This phase is further divided into three sub-phases, namely, creating flat start monophones and re-estimation, fixing the silence models, and realigning the training data, as given in Fig. 4.

[Fig. 4: Monophone Creation and Training]

Creating Flat Start Monophones and Re-estimation
In this sub-phase, the HMM file is created manually by using default global values of means and variances. These default values are calculated by the HCompV command of HTK with the help of the .mfc files and the config file[8]. The values are re-estimated three times using the HERest command with the help of the previously generated HMM file, the .mfc files, the config file, the phones.mlf file and the monophones0 file[8].

Fixing the Silence Models
This sub-phase makes the model more robust so that it can absorb various impulsive noises in the training data. This is done by including the short-pause monophone in the HMM file and linking it with the sil monophone. In order to do this, a temporary copy of the sil model is created and saved with the name sp. It has 5 states, where state 1 and state 5 are the opening and closing states. States 2 and 4 are removed from the sp model and only state 3 is kept. The HHEd command is used to tie the sp model to the central state of the sil model with the help of the monophones1 file. The script file for this operation is given in (5).

AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]} …(5)

In this manner, short pauses between spoken words are treated as silence. In order to retrain the system, the HERest command is used two times with the help of the previously generated HMM file, the .mfc files, the config file, the phones.mlf file and the monophones1 file[8].

Realigning the Training Data
In case of multiple pronunciations of a word in the dictionary, this sub-phase selects the best possible pronunciation. In order to do this, the HVite command is used with the words.mlf file, the monophones1 file, the dict file, the config file and the previously generated HMM file, and the result is saved in a new transcript file, aligned.mlf. In order to retrain the system, the HERest command is used two times with the newly created aligned.mlf and monophones1[8].

Tied-State Triphones HMM Creation
This phase has two important sub-phases, namely, triphones creation and training, and tied-state triphones creation and training, as shown in Fig. 5.

[Fig. 5: Triphones Creation and Training]

Triphones Creation and Training
In this sub-phase, triphones are created. A triphone is a combination of three monophones. Using triphones improves recognition accuracy, because the system now looks to match a specific sequence of three sounds together rather than only one sound. For example, ADVICE has the triphones given in (6).

sil ae+d ae-d+v d-v+ay v-ay+s ay-s …(6)

The HLEd command is used to create the triphones shown in (6). It requires two files, aligned.mlf and a script, as shown in (7).

WB sp
WB sil
TC …(7)

Since the system has been updated by including the triphones file, the HERest command is used two times to train the system with triphones.

Tied-State Triphones Creation and Training
In this sub-phase, different triphone states are tied together in order to share data and to make the system more robust. In order to tie states, the HHEd command is used with the previously generated HMM file and the triphones file. This command creates the tiedlist file that is used in further training of the system. Since the system has been updated with the new tiedlist file, the HERest command is used two times to retrain the system with it.

Execution with Julius Interface
Julius is an interface used to execute the STT system. Julius requires four files: the .dfa file, the .dict file, the previously generated HMM file and the tiedlist file. The first two files, .dfa and .dict, have already been created in phase 1, and the HMM file and tiedlist file have been created in phase 3. In order to execute the system, these files are passed as parameters in its configuration file, julian.conf, as given in (8).

-hlist tiedlist
-h hmm15/hmmdefs
-dfa sample.dfa
-v sample.dict
-smpFreq 48000 …(8)

The julian command is used to execute the system. It requires julian.conf and mic as parameters. After execution of this command, the system prompts the user to speak the sentence as given in Fig. 6. The speaker can now speak the input sentence and the system gives the corresponding text.

[Fig. 6: System Execution]

Conclusion
A Speech-to-Text system for a small vocabulary can be developed by using HTK commands. As discussed in the architecture, there are four phases in the development of the STT system. The STT system discussed above is speaker dependent. To make this system speaker independent, an adaptation technique is required. An STT system for other languages can also be developed by using the monophones of that language and training the system with training text of that language.

References
[1] A. Kemble Kimberlee, “An Introduction to Speech Recognition,” unpublished.
[2] Aymen M., Abdelaziz A., Halim S., Maaref H., “Hidden Markov Models for automatic speech recognition”, in International Conference on Communications, Computing and Control Applications (CCCA), Hammamet, Tunisia, 2011, pp. 1-6.
[3] Balaganesh M., Logashanmugam E., Aadhitya C.S., Manikandan R., in International Conference on Emerging Trends in Robotics and Communication Technologies (INTERACT), Chennai, India, 2010, pp. 12-15.
[4] Bapat Abhijit V., Nagalkar Lalit K., “Phonetic Speech Analysis for Speech to Text Conversion”, in IEEE Region 10 Colloquium and the Third International Conference on Industrial and Information Systems, Kharagpur, India, 2008, pp. 1-4.
[5] “Create Speaker Dependent Acoustic Model Using Your Voice”, http://www.voxforge.org/home/dev/acousticmodels/windows/create.
[6] Grasso Michael A., “The Long-Term Adoption of Speech Recognition in Medical Applications”, in 16th IEEE Symposium Computer-Based Medical Systems, New York, NY, USA, 2003, pp. 257-262.
[7] Kumar Kuldeep and Aggarwal R.K., “Hindi Speech Recognition System using HTK”, International Journal of Computing and Business Research, vol. 2, pp. 3-7, 2011.
[8] Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying (Andrew) Liu, Gareth Moore, Julian Odell, Dave Ollason, Dan Povey, Valtcho Valtchev and Phil Woodland, The HTK Book, Cambridge University Engineering Department, 2009.

About the Authors
Nishant Allawadi is pursuing a Master of Engineering in Computer Science at Thapar University, Patiala. He received his Bachelor of Technology degree from Guru Jambheshwar University of Science and Technology, Hisar (Haryana) in 2010. He is doing his ME thesis in the field of Natural Language Processing.

Parteek Kumar is Assistant Professor in the Department of Computer Science and Engineering at Thapar University, Patiala. He has more than thirteen years of academic experience. He earned his B.Tech degree from SLIET and his MS from BITS Pilani. He is pursuing his Ph.D in the area of Natural Language Processing at Thapar University. He has published more than 50 research papers and articles in journals, conferences and magazines of repute. He has undergone various faculty development programmes from industries like Sun Microsystems, TCS and Infosys. He has co-authored six books including Simplified Approach to DBMS.
He is acting as Co-PI for the research project on Development of Indradhanush: An Integrated WordNet for Bengali, Gujarati, Kashmiri, Konkani, Oriya, Punjabi and Urdu, sponsored by the Department of Information Technology, Ministry of Communication and Information Technology, Govt. of India.

Article
Opinion Mining and Sentiment Analysis
Jaganadh G
Consultant in Text Analytics and Free and Open Source Software

Introduction
It is human to have an opinion on whatever one experiences in life. Opinion is expressed with the help of language, either written or spoken. Human beings have mined opinion in a natural way ever since they started living as social beings. All their adventures and new acquisitions were subject to opinion mining. Before wearing new apparel to a public function, buying an appliance or watching a movie, people solicited opinions from friends, family and others. They mined the entire collection of opinions with the world's most complex opinion mining system, the human brain.

When human beings entered a consumer-oriented world, corporate and non-corporate establishments started producing, selling and advertising their products and services. Corporate establishments used media to advertise their services and products, which eventually led to word-of-mouth advertisement and sales opportunities. Non-corporate establishments depended almost entirely on word-of-mouth publicity. In both scenarios, people bought and experienced the products and then expressed their opinions on them. These opinions were key factors in determining the market for services and products. Corporate establishments were keen to understand customer opinions and to derive Business Intelligence from them. So they conducted surveys to understand customer needs, satisfaction and dissatisfaction. Consolidated reports on such surveys helped them to improve products and marketing strategy, and even to withdraw products from the market to avoid losses and manage reputation.
This can be called the second generation of Opinion Mining. The advent of Web 2.0 based technologies and tools opened a vast window to express and share opinions, so opinions reached a wide audience across the globe. Opinions expressed through web platforms such as social media (Twitter, Facebook etc.) also made real-time sharing of opinion possible, which leads to real-time ups and downs in the market for corporate and similar entities. Deriving Business Intelligence from the heavy flow of consumer opinion in real time gave birth to a new field of study in Natural Language Processing and Computational Linguistics. This field is called "Opinion Mining". Sentiment Analysis, Sentiment Mining, Opinion Mining, Review Mining, Opinion Detection, Sentiment Detection, Subjectivity Detection, Polarity Classification, Semantic Orientation, Appraisal Extraction etc. all refer to the same art. This article aims to give a brief introduction to Opinion Mining, its technical aspects and its business applications in the real world.

Opinion
To get a deeper insight into the art, let us see what the definition, structure and social role of opinion are. We are surrounded by opinions more than by facts in our life. The Oxford Dictionary defines opinion as (a) 'a view or judgment formed about something, not necessarily based on fact or knowledge' and (b) 'a statement of advice by an expert on a professional matter'. Opinion more or less results from our state of mind when we experience something in day-to-day life. Depending on the socio-cultural standard of the person, the sentiment or opinion is expressed with the help of linguistic units appropriate to the mental state, experience and situation. This expression may be an appraisal or a negative comment, up to the extreme of using sarcasm or unparliamentary words. It is also quite natural for people to compare things when expressing an opinion.
There is another kind of opinion, which comes from experts or experienced people. In social life we call them trustworthy sources of opinion. They provide comparative and structured opinions on the topics on which we seek advice. Such people are there in the online community too. In terms of business and marketing strategy we can call them 'influence leaders' or 'influencers'. We can also categorize such influencers as trustworthy and non-trustworthy, because some of them are biased. We can now observe a structure for opinion: an opinion requires an object (a brand or product, such as a mobile phone or a movie), an opinion holder who experiences the object and expresses the opinion, and the opinion itself.

Opinion Mining/Sentiment Analysis
Information contained in any text document can be subjective, objective or both. Subjective text mostly contains positive or negative opinions, while objective text contains facts. So the art of Opinion Mining and Sentiment Analysis tries to identify the subjectivity and objectivity of a text and further identifies the polarity of subjective text. The polarity of a text will be positive, negative or a mix of both. The polarity of objective text is considered neutral. In short, Sentiment Analysis is the automated extraction of subjective content from digital text and the prediction of its polarity, such as positive or negative. It aims to explore the attitude of the person who created the text. It uses Natural Language Processing and Machine Learning principles to spot linguistic structures that determine polarity.

Detecting Sentiment from Text
We can perform three levels of sentiment analysis over a subjective text: document level sentiment analysis, sentence level sentiment analysis, and faceted (or feature level) sentiment analysis. Document level sentiment analysis aims to detect the sentiment of the whole document.
It is quite obvious that a single document is unlikely to contain 100% positive or 100% negative sentiment. Even so, the analysis predicts the predominant sentiment expressed in the document (for example, predicting the polarity of a full-length review from http://www.rottentomatoes.com/). Sentence-level sentiment prediction aims to identify the polarity of a given sentence in a text. Faceted sentiment analysis aims to predict the polarity of sentences or phrases that deal with attributes of the object in question (such as predicting the sentiment about individual features of a mobile phone from a textual review).

There are different ways to identify and predict sentiment from text: lexicon-based, Natural Language Processing based, and Machine Learning based techniques. There is no harm in trying hybrid approaches to obtain the results.

In the lexicon-based approach, a prepopulated list of words with sentiment probabilities is used to spot key sentiment indicators. The approach is quite straightforward: read a text, look up each word in the lexicon to find its probability values, sum the probabilities for each class, and pick the class with the highest total. Similarly, we can use a prepopulated list of positive and negative words to predict the sentiment.

A combination of linguistic rules and Natural Language Processing techniques can be used to spot opinion indicators and predict the sentiment. Generally, such rules find adjective-noun sequences and examine context rules to get the polarity. For example, in the sentence "Service of XYZ mobile phone is not good", 'good' is a positive word, but the presence of the negation 'not' reverses the polarity of the word. In other words, a simple combination of a negative word and a positive word creates a negative expression. Adjectives or adjective-noun sequences can be identified with POS tagging, chunking, or parsing. Once the chunks are identified, we can apply rules to identify the polarity.
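The word-list idea and the negation rule described above can be sketched in a few lines of Python. This is only an illustrative sketch under simple assumptions, not a production method: the tiny LEXICON, the NEGATIONS set, and the function name sentence_polarity are all hypothetical, words are split on whitespace instead of being POS tagged, and a negation word simply flips the next sentiment word that follows it.

```python
# Hypothetical toy lexicon: word -> score (+1 positive, -1 negative).
# A real system would load a much larger, curated list.
LEXICON = {
    "good": 1, "great": 1, "excellent": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}

# Assumed negation cues; a cue flips the score of the next sentiment word.
NEGATIONS = {"not", "no", "never"}


def sentence_polarity(sentence):
    """Return 'positive', 'negative' or 'neutral' for one sentence."""
    total = 0
    negate = False
    for raw in sentence.lower().split():
        word = raw.strip(".,!?;:\"'")
        if word in NEGATIONS:
            negate = True
            continue
        score = LEXICON.get(word, 0)
        if score != 0:
            total += -score if negate else score
            negate = False  # the negation is consumed by this word
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(sentence_polarity("Service of XYZ mobile phone is not good"))
print(sentence_polarity("The camera is excellent"))
print(sentence_polarity("The phone has a 5 inch screen"))
```

The first call is the example sentence from the text: 'good' is found in the lexicon, but the preceding 'not' flips its score, so the sentence is labelled negative. A sentence with no lexicon words, like the third one, is treated as neutral (objective).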
In the machine learning based approach, sample data is collected to train a selected algorithm. The collected data is manually classified by polarity. The trained model is then used with the algorithm to predict sentiment. Since this article is very short, the details of the process involved in each methodology are deliberately omitted. I hope to cover them in a later note.

Challenges in Sentiment Analysis

Language is the most wonderful, dynamic, and mysterious phenomenon in the universe. Language and its structure are the primary challenge in Sentiment Analysis, especially the language, or "slanguage", used on social networks like Twitter and Facebook. As in society at large, there are false influencers or false opinion leaders who work for money. Identifying such false influencers and spam content is another major challenge in this area. Other interesting challenges include identifying sarcasm and using deep semantic and pragmatic concepts to determine the fine-grained emotion expressed in text. Even though industry has adopted Sentiment Analysis as a technology to earn from, there are still open-ended issues and challenges to be resolved.

Business Applications

Sentiment Analysis is set to be the most widely adopted art from Natural Language Processing in Business and Business Intelligence applications. The popularity of social networks and the high volume of user-generated content, especially subjective content, have created heavy demand for sentiment analysis in business applications. Since it mainly deals with consumer-centric content, the art can be called "Marketing Research 3.0". Sentiment Analysis helps corporates get customer opinion in real time. This real-time information helps them design new marketing strategies, improve product features, and predict the chances of product failure. It is not applied only in consumer-centric applications.
It can be used in politics and diplomacy to get a clear picture of people's attitudes towards election campaigns, strategic policies, and bills. Sentiment Analysis can even predict the effectiveness of "viral marketing" and the chances of ups and downs in stock prices. There are a good number of commercial as well as free sentiment analysis services. Radian6, Sysomos, Viralheat, Lexalytics, AiAiO Labs, etc. are some of the top commercial players in the field. Free tools such as twittersentiment.appspot.com also exist.

I Would Like to Develop a Sentiment Analysis System!!

It is not rocket science. Even you can develop a sentiment analysis system. There are lots of Free and Open Source tools available for performing Natural Language Processing and Machine Learning tasks. A vast amount of consumer-generated text data, prepared for sentiment analysis tasks, is also available on the internet. Tools like GATE, NLTK, Apache Mahout, Weka, Rapidminer, KNIME, OpenNLP, etc. can be used to develop your own sentiment analysis system.

About the Author

Jaganadh G is a Natural Language Processing and Machine Learning developer and researcher with experience in Sentiment Analysis, Information Extraction, Machine Translation, spell checker development, Automatic Speech Recognition (ASR), Text to Speech Systems (TTS), Internationalization of Domain Names (IDN), localization, and Perl and Python programming. He is experienced in preparing software documentation according to ISO and IEEE standards and is well versed in the GNU/Linux operating system. He is a computational linguist with abilities in developing algorithms for Machine Translation and related NLP fields. (365Media Pvt. Ltd., Project Lead (NLP), Coimbatore, Tamilnadu, India; AU-KBC Research Centre, Chennai; C-DAC; C-DIT; Rashtriya Sanskrit Vidyapeeth, Thirupathi, Andhrapradesh)

Article

Telemedicine in the State of Maharashtra: A Case Study

Randhir Kumar*, Dr. P K Choudhary**, and S M F Pasha***
* PhD Candidate of AISSR, University of Amsterdam (The Netherlands)
** HOD, University Department of Sociology, Ranchi University, Ranchi
*** Assistant Manager, Computer Society of India

Abstract: The Government of Maharashtra telemedicine project was operationalised in the year 2007, and since then it has steadily expanded its outreach and number of beneficiaries. It provides an example of how modern ICT can be gainfully used to benefit the masses, who until now were deprived of advanced medical care. This case study attempts to document the path taken by the Health Ministry of Maharashtra in implementing telemedicine successfully.

Key Words: NRHM, EHR, Specialist End, Patient End, Teleradiology

Introduction

Telemedicine is an umbrella term that covers all medical activity having an element of distance (Wootton, 1998).
Telemedicine has been practiced for hundreds of years by means of letters (see Wootton, n.d.), but with the advancement of Information and Communication Technology there has been a manifold increase in its use as a tool for delivering medical treatment. Telemedicine includes not only real-time consultation between patient and expert, but also medical advice on prerecorded medical data, as in 'teleradiology' or 'telepathology' (see footnote 1). A more sophisticated model has been using it extensively for providing health care benefits to underprivileged people. These interventions usually take the form of welfare projects involving substantial investment, coordination, and planning.

The Government of Maharashtra launched its pilot project on Telemedicine in 2007, with one specialist node at KEM Hospital, Parel, Mumbai and 5 sub-district hospitals. The prime target areas for this intervention were tribal areas such as Sindhudurg, Nandurbar, Beed, and Satara. The second phase of expansion involved 5 specialist nodes, 23 district hospitals, and 4 sub-district hospitals. The Maharashtra State Telemedicine project is part of a larger initiative undertaken by the Government of India and the World Health Organisation. Under the banner of the National Rural Health Mission (NRHM), Telemedicine is one of the key initiatives to improve health services for the rural people of India.

Footnote 1: Radiology is a specialized medical branch that uses imaging technologies (X-Ray, MRI, CT scan, etc.) to identify and treat anomalies in the human body. Pathology involves the identification of diseases based on laboratory analysis.

The General Framework of Telemedicine Project in Maharashtra

The overall network of Telemedicine in Maharashtra can be classified under two broad subheadings, viz.
1. Specialist End
2. Patient End

Specialist End: The Specialist End consists of five medical colleges: KEM Hospital Mumbai, B. J. Medical College Pune, GMC Aurangabad, GMC Nagpur, and Sir J. J. Hospital Mumbai. Nanavati Hospital at Mumbai has been made an honorary specialist centre. The J. J. Hospital at Mumbai has a dual role to play. It acts as the main server centre for coordinating between the specialist centers and patient centers. Additionally, it provides consultation service for referred patients through teleconference.

Patient End: The Patient End consists of 27 district hospitals of Maharashtra (see Annexure 1). Furthermore, 4 sub-district hospitals in each district act as centers where patients from nearby areas come to consult the doctors. All the district and sub-district hospitals are equipped with a modern, state-of-the-art telecommunication network system for carrying out teleconferences. The sub-district hospitals are further sub-divided into Regional Hospitals (RH) and Primary Health Centres (PHC). A diagrammatic representation of the present setup is given in Fig. 1.

Fig. 1: An overview of the Telemedicine framework in Maharashtra. SH stands for the five Specialist Hospitals that provide consultation services. DH stands for District Hospitals (27 in number), each of which has 4 sub-district hospitals. Each sub-district hospital further has several primary health centers (not depicted in the figure).

Annexure 1: Names of the districts where Telemedicine has been implemented: Thane, Bombay, Alibagh, Nashik, Pune, Satara, Ratnagiri, Osmanabad, Latur, Bid, Ahmednagar, Parbhani, Jalna, Aurangabad, Jalgaon, Buldana, Amravati, Wardha, Nagpur, Chandrapur, Gadchiroli, Bhandara, Gondia, Nandurbar, Hingoli, Washim, Sindhudurg.

Technical Support: The first phase of telemedicine was technically supported by the Indian Space Research Organisation (ISRO), which provided its expertise in network connectivity. Initially there were serious troubles with internet connectivity, as the connection would often drop. Later, this trouble was solved by using dedicated leased lines of fiber optic cable with high bandwidth capacity. Thereafter a medical equipment supplier company, "Progonosis", provided facilities for video conferencing along with other basic medical equipment such as scanners, BP apparatus, etc.

Management Structure

The whole project has a Mission Managing Director (MD), under whom there are several Joint Directors followed by Assistant Directors. All three positions together form the top management, which makes the critical decisions in the implementation of the overall project. Additionally, independent consultants are hired to give their expertise from time to time. The ground-level day-to-day operations are taken care of by the coordinators and facility managers of technical support services. Each district has a nodal officer who is responsible for the overall day-to-day operation of the telemedicine project in that district. Other than these managerial and support staff, a whole set of dedicated doctors at both the Specialist End and the Patient End are involved in the consultation and treatment of patients. The doctors are not paid any extra salary by the government for consulting patients through telemedicine. However, an honorary sum of Rs. 100/- and Rs. 300/- per patient is paid to the doctors of the District Hospital and the Specialist Hospital respectively.

The Motto of Telemedicine

The primary motive of implementing a pan-state telemedicine network was to provide better access to super-specialty medical care for the residents of remote areas, who either do not have sufficient time or lack the resources to travel to big cities for advanced treatment. Highlighting the present medical system, the Nodal Officer of Mumbai area, Ms.
Sandhya Tayde apprised that "The areas targeted for telemedicine intervention had poor access to trained doctors or medical staff. Furthermore, because of the distance and cost involved, seeking a first-hand specialist opinion was both a time-consuming and a costly affair. Using telemedicine, we have tried to reduce the time of intervention and the cost, and to improve the quality of treatment, by getting specialist opinion at the patient's place of residence." In a way this was a positive development for rural folk who did not have an idea of how and where to go for a particular type of disease or illness. Furthermore, early detection of serious life-threatening illnesses such as cancer, followed by timely intervention, can save lives.

Another key beneficial feature of the Telemedicine intervention is its ability to build and maintain a central database holding all the details of a patient's medical history and the treatment administered to him or her. This means that there is one centralized monitoring hub from which all the data can be accessed from any remote location at any given point of time. It also means that a patient's digitized data, such as X-Ray, CT scan, and pathology reports, is easily accessible, and opinions from different specialists can be sought before deciding on a particular course of treatment. It also ensures completeness and correctness of information, and past data records are often used by specialists for better management of health care services.

The Telemedicine system in Maharashtra has been equipped to seamlessly capture and upload patient information, waveforms, and images from a remote location to a centralized server and to get an expert's opinion or review instantly within the network (intranet) or at a later point of time. An Electronic Health Record (EHR) is generated for each patient and archived in digital format. During cardiac arrest or other emergencies, the ECG and other relevant data can be transmitted instantly, and the doctors at the remote location can suggest a course of action based on the live data.

Other than consulting and archiving medical data, Telemedicine has been innovatively used in Maharashtra to train and develop medical staff at the patient end. The CME (Continuing Medical Education) division is very proactive in disseminating the latest knowledge and medical cases to the staff. CME sessions are organised at regular intervals, and along with technical knowledge, various attitude and behavioral skills sessions are delivered, which in turn helps improve clinical performance and professional development. Additionally, computer-specific skills are imparted via tele-conferencing between medical colleges and district hospitals, to equip the medical professionals to troubleshoot minor technical problems.

Impact and outreach of Telemedicine in Maharashtra

Telemedicine drastically reduced the time taken to seek expert advice. According to Ms. Tayde, the waiting period for a patient to get an appointment with a specialist was earlier three months on average. Now the waiting time has reduced drastically, as the patient's digital information can be diverted to any expert who is willing to handle the case. The junior doctors in the district hospitals also learn in this whole process of referring cases and discussing them with the specialist over tele-conference. At times, special rural camps on community health and ophthalmology are organised using telemedicine equipment mounted on mobile vans. The specialist services extended through telemedicine cover 30 areas of medicine, which is quite broad. The key and most used specialist services are related to cardiology, dermatology, pathology, ophthalmology, ENT, surgery (consultation), neurology, and medicine. The data on the number of patients referred in the year 2010-11 is summarized in Table 1. Table 2 summarizes the total number of patients referred and opinions received for them from 2008 to 2011. Thus one can observe from Table 2 that telemedicine has been quite popular among its end users and has been catering to the service needs of the poor and underprivileged rural people residing in remote areas of Maharashtra.

Table 1: Specialty-wise patients referred and opinions received in the year 2010-11. Source: Arogya Bhavan, CST Mumbai.

S.No.  Specialty       Patients Referred from District   Opinion Received from Specialty Centers
                       (April 2010 to March 2011)        (April 2010 to March 2011)
1      Medicine        1059                              1032
2      Surgery         344                               316
3      OBGY            146                               207
4      Pediatrics      393                               387
5      Cardiology      65                                51
6      Neurology       45                                44
7      Anesthesia      28                                29
8      Chest           25                                23
9      Ophthalmology   24                                24
10     Skin VD         85                                83
11     ENT             76                                43
12     Orthopedics     278                               287
13     Psychiatry      40                                40
14     Radiology       1301                              1400
15     Ayurvedic       68                                30
16     Unani           155                               160
17     Forensic        38                                36

Table 2: The number of patients referred through telemedicine and expert opinions received for the referred cases. Source: Arogya Bhavan, CST, Mumbai.

Year      Patients Referred   Opinion Received
2008-09   538                 448
2009-10   3640                3739
2010-11   4230                4195
Total     8408                8382

Conclusion and the way ahead

The development of telecommunication technology has given birth to modern telemedicine, which has found its way into improving health services for the underprivileged masses. Maharashtra has successfully implemented telemedicine across its districts in two phases. In the first and second phases of the project, all the district and sub-divisional hospitals were linked with the state medical colleges. The Maharashtra government is now planning to implement phase 3 of the project, which proposes to link all the Primary Health Care Centers (PHC, primary level) to the medical colleges (tertiary level).
This means the creation of a complete network of primary (PHC), secondary (District Hospitals), and tertiary (Medical Colleges) levels for ensuring proper and better care of patients. This network is expected to reduce mortality and morbidity, thus saving more lives by ensuring continuity of care throughout the network. The present Telemedicine network in Maharashtra is one of the largest in India.

The Telemedicine intervention has been successful in reducing travel by patients, thereby saving the costs of travel, food, and accommodation, along with the loss of pay from taking leave from regular work. It has also meant less crowding of patients in the specialty hospitals, and the doctors can give their opinion by looking at the digitized data of patients at their convenience. Telemedicine has also reduced the cost of training and developing medical staff for Primary Health Care centers. Telemedicine is therefore a perfect instance where the amalgamation of technology and social cause has resulted in the welfare of the deprived masses.

References
[1] Wootton R. (1998). Telemedicine in the National Health Service. J R Soc Med, Vol. 91, No. 12, pp. 614-21.
[2] Wootton R. (n.d.). 'Telemedicine' in Lock S, Dunea G, Pearn J (eds.), Illustrated Companion to Medicine. UK: Oxford University Press (in press).

About the Authors

Randhir Kumar is a PhD candidate of AISSR (Amsterdam Institute of Social Science Research) at the University of Amsterdam (The Netherlands). He secured his Masters degree in 'Globalisation and Labour Studies' from Tata Institute of Social Sciences (Mumbai), after which he worked as a Research Associate in the Personnel Management and Industrial Relations Area of IIM (Ahmedabad).

Dr. P K Choudhary (Double MA, PhD and NET JRF) is the HOD of the University Department of Sociology, Ranchi University, Ranchi.
Having more than 17 years of experience in research and teaching at University level, he is Program Committee chair for various national and international conferences. He has written several articles and books on various societal issues. Considering his knowledge and expertise, State and Central Governments have given him additional authority to lead various development projects in Jharkhand.

S M Fahimuddin Pasha is an Assistant Manager at Computer Society of India. He holds an MA in Globalization and Labour from Tata Institute of Social Sciences (Mumbai) and an MA in Sociology from Ranchi University. He is on the verge of completing his PhD in Industrial Sociology. He is also a Researcher with the International Institute of Social History (Amsterdam, The Netherlands) and an invitee to the University of Leipzig (Germany) to speak on the issues of 'Detorization of Working Class'.

Technical Trends

Extending WEKA Framework for Learning New Algorithms

Satyam Maheshwari* and Sunil Joshi**
* Assistant Professor, computer applications in SATI Degree, Vidisha (MP)
** Assistant Professor, computer applications in SATI Degree, Vidisha (MP)

Waikato Environment for Knowledge Analysis (WEKA) is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It is designed so that you can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. WEKA was developed at the University of Waikato in New Zealand and is open source software written in Java and issued under the General Public License[2]. It runs on almost any platform and has been tested under Linux, Windows, and Macintosh operating systems.

Recently an article was published in CSI Communications which showed the application of WEKA to bio-inspired algorithms[1]. The authors emphasized an MLP classifier using a genetic algorithm and fuzzy logic, and gave information about the existing framework. In this article, we extend the existing framework of WEKA so that we can add a new classifier or clusterer and then train on a dataset with the new algorithm.

The key features behind WEKA's success are as follows:
1. It is open source and freely available;
2. It provides many different algorithms for data mining and machine learning;
3. It is platform-independent; and
4. It is up-to-date, with new algorithms being added as they appear in the research literature.

WEKA[3] supports several standard data mining tasks, more specifically data preprocessing, clustering, classification, regression, visualization, and feature selection. All of WEKA's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, but some other attribute types are also supported). The easiest way to use WEKA is through a graphical user interface called the Explorer.

Preprocessing data in the Explorer uses a so-called filtering algorithm. These filters can be used to transform the data (e.g. turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria.
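To make "turning numeric attributes into discrete ones" concrete, here is a rough plain-Python sketch of equal-width binning, one common way to discretize a numeric attribute. It is not WEKA's filter code and does not call WEKA. The function name equal_width_bins and the sample ages list are hypothetical, chosen only for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value would land in bin n_bins; clamp it into the last bin.
        bins.append(min(index, n_bins - 1))
    return bins


# Hypothetical numeric attribute (e.g. ages) turned into 3 discrete bins.
ages = [18, 22, 25, 31, 40, 47, 52, 60]
print(equal_width_bins(ages, 3))
```

Each bin index can then be treated as a nominal value (for example "low", "medium", "high"), which is the kind of attribute that many classifiers handle directly.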
The other user interfaces to WEKA are the Experimenter, the KnowledgeFlow, and the Simple CLI. The Experimenter gives access to all of its facilities using menu selection and form filling. The KnowledgeFlow provides an alternative to the Explorer for showing how data flows through the system. It also allows the design and execution of configurations for streamed data processing. The Simple CLI is a command line interface for executing WEKA commands.

The main interface, the Explorer, has several panels that give access to the main components of the workbench. The "Preprocess" panel has facilities for importing data from a database, a comma-separated values (CSV) file, etc., and for preprocessing this data using a filtering algorithm. The "Classify" panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in WEKA) to the resulting dataset; to estimate the accuracy of the resulting predictive model; and to visualize erroneous predictions, ROC curves, or the model itself (if the model is amenable to visualization, e.g. a decision tree). The "Associate" panel provides access to association rule learners that attempt to identify all important interrelationships between various attributes in the data. The "Cluster" panel gives access to the clustering techniques in WEKA, e.g. the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions. The next panel, "Select attributes", provides algorithms for identifying the most predictive attributes in a dataset. The last panel, "Visualize", shows a scatter plot matrix, where individual scatter plots can be selected, enlarged, and analyzed further using various selection operators.

WEKA can handle a number of file formats, including the ever-popular CSV (which can be exported from any spreadsheet program). WEKA prefers, however, to work with ARFF files, which are basically CSV files with some header information tacked on.

Fig. 1: Existing snapshot of WEKA

Suppose we want to implement a special-purpose learning algorithm, i.e. add a new classifier or clusterer that is not included in the existing WEKA GUI, investigate a new learning scheme, or learn more about the inner workings of an induction algorithm by actually programming it ourselves. In that case we integrate a new workspace into WEKA. WEKA can be extended to include elementary learning schemes for research and educational purposes. Fig. 1 shows the existing framework of WEKA. To add a new classifier to WEKA, we follow these steps:

1. Create a new folder in the Windows directory hierarchy, e.g. C:\SmWork\classifiers
2. To enable or disable dynamic class discovery, the relevant file to edit is GenericPropertiesCreator.props (GPC). This file can be obtained from the weka.jar or weka-src.jar archive. These archives can be opened with an archive manager that can handle ZIP files; navigate to the weka/gui directory, where the GPC file is located. All that is required is to change the UseDynamic property in this file from false to true (to enable it) or the other way round (to disable it). After changing the file, just place it in the home directory. To generate the GenericObjectEditor.props (GOE) file, we need to execute the following command:
   java weka.gui.GenericPropertiesCreator %USERPROFILE%\GenericPropertiesCreator.props %USERPROFILE%\GenericObjectEditor.props
3. Remove WEKA.JAR from the CLASSPATH.
4. Edit the GenericPropertiesCreator.props file in the home directory and set UseDynamic to false.
5. Add SmWork/classifiers in GenericPropertiesCreator.props and GenericObjectEditor.props.
6. Run the command
   java -classpath c:\progra~1\weka-36\weka.jar;c:\SmWork\classifiers weka.gui.GUIChooser

Now we can write our new Java code, compile it, and then copy the class file into the specified folder. Fig. 2 shows a snapshot of the newly added classifier. Similarly, we can extend WEKA for clustering and association as well.

Fig. 2: Snapshot of WEKA displaying new added classifier

References
[1] Goli, B and Govindan, G (2011). WEKA - A powerful free software for implementing Bio-inspired Algorithms, CSI Communication, 35(9), 09-11.
[2] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1.
[3] Witten, I H and Frank, E (2005). Data Mining: Practical Machine Learning Tools and Techniques, San Francisco: Morgan Kaufmann.

About the Authors

Satyam Maheshwari received the MTech degree in Computer Technology and Applications from RGPV Bhopal. Since 2003, he has been an Assistant Professor in the department of computer applications in SATI Degree, Vidisha (MP). His research interest is classification of imbalanced datasets in Data Mining. He is a member of IEEE, CSI, and ISTE.

Sunil Joshi received the MCA degree in 2001 from SATI Vidisha. Since 2005, he has been an Assistant Professor in the department of computer applications in SATI Degree, Vidisha (MP). He is currently pursuing a PhD degree in frequent pattern mining at University of RGPV. He is a member of IEEE and ISTE.

Practitioner Workbench

Programming.Tips() » Passing Variable Number of Arguments in C

Dr. Debasish Jana
Editor, CSI Communications

Ever wondered how printf or scanf is declared in C or C++? Why do I raise this? Because printf and scanf are functions that can take a variable number of arguments. For example, you could write:

    printf("%d %c", someinteger, somechar);

where someinteger and somechar are of int and char types respectively:

    int someinteger;
    char somechar;

We could have decided to print only one integer, as below:

    printf("%d", someinteger);

Or simply just a string:

    printf("Hi There");

In the above three examples, we have printf taking three, two and one argument respectively.
If you look closely, you will notice that in all three cases the first argument is a character string. In the first case, the second argument is an integer (int) and the third is a character (char). In the second case, the second argument is an integer (int) and there is no third argument. In the third case, there is no second or third argument. But C does not support function overloading, so we do not expect many different variants of printf (and scanf and similar functions) to be declared. C++ inherited these functions from C, so in C++ printf and scanf take the same form.

In C, there is a syntax for optional parameters: triple dots, i.e. "...". This allows a list of values, as described by the format string (the first argument), to be passed. Thus, the same function can be used to print things like this:

    int someinteger;
    char somechar;
    printf("%d %c", someinteger, somechar);
    printf("%d", someinteger);
    printf("Hi There");

In fact, printf is a function with the following signature:

    int printf(const char *format, ...);

This means that it requires at least one argument, a character string, followed by zero or more arguments (which can be of several different types). The return value (int) is the number of characters printed, or a negative value if an error occurs. The number and type of the arguments are determined by the format string.

The C header file stdarg.h provides facilities for stepping through a list of function arguments of unknown number and type. The important macros are given below:

    void va_start(va_list ap, lastarg);

This initialization macro must be called once before any unnamed argument is accessed. ap must be declared as a local variable, and lastarg is the last named parameter of the function.

    type va_arg(va_list ap, type);

This produces a value that has the type (type) and value of the next unnamed argument. It modifies ap.

    void va_end(va_list ap);

This must be called once after the arguments have been processed and before the function exits.

An example program follows (written in C++, which inherits the same facility):

    #include <iostream>
    #include <stdarg.h>
    using namespace std;

    int sum( int first, ... );

    int main()
    {
        // Call with 3 integers
        // (-1 is used as terminator).
        cout << "sum is: " << sum( 2, 3, 4, -1 ) << endl;

        // Call with 4 integers
        cout << "sum is: " << sum( 5, 7, 9, 11, -1 ) << endl;

        // Call with no integer : just -1 terminator
        cout << "sum is: " << sum( -1 ) << endl;
        return 0;
    }

    // Returns the sum of a variable list of
    // integers
    int sum( int first, ... )
    {
        int s = 0, i = first;
        va_list marker;

        // Initialize variable arguments
        va_start( marker, first );
        while( i != -1 )
        {
            s += i;
            i = va_arg( marker, int );
        }
        va_end( marker ); // reset variable arguments
        return s;
    }

The output when the program is run is given below:

    sum is: 9
    sum is: 32
    sum is: 0

Do you have some interesting programming tips to share? They could be in any programming language or software tool. Share them with us. Send your summarized write-up to CSI Communications with subject line 'Programming Tips' at email address [email protected]

Practitioner Workbench
Umesh P
Department of Computational Biology and Bioinformatics, University of Kerala

Programming.Learn ("Python") » Plotting with Python

Snakes are becoming popular among pet lovers as they are easy to care for, exotic, and do not need to be fed daily like a dog or cat. Corn snake, ball python, California king snake, milk snake, boa constrictor, etc. are popular pet snakes. Among pythons, the ball python is considered to be one of the best pets for beginners. Ball pythons are docile and about 5 feet long. In some countries, there are online stores that deliver snakes on payment.

Matplotlib is an object-oriented plotting library for Python. It offers a MATLAB/Scilab-like application programming interface (API) and produces accurate, high-quality figures that can be used for publication purposes.
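As a minimal sketch of the object-oriented interface mentioned above, the following creates a figure and an axes object explicitly and renders the figure as a PNG at 300 dpi, a resolution often requested for print. The helper name render_png, the sample data, and the choice of the non-interactive Agg backend are assumptions made for illustration and are not part of the original column.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, so no display is needed
import matplotlib.pyplot as plt


def render_png(xs, ys, dpi=300):
    """Hypothetical helper: plot ys against xs and return the PNG bytes."""
    fig, ax = plt.subplots()      # explicit Figure and Axes objects
    ax.plot(xs, ys, "g--")        # green dashed line
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    plt.close(fig)                # release the figure once it is rendered
    return buffer.getvalue()


png_bytes = render_png([1, 2, 3, 4], [4, 3, 2, 1])
print(len(png_bytes) > 0)
```

Writing to an in-memory buffer keeps the sketch free of file names; passing a path such as a .png or .pdf file name to savefig instead would store the figure on disk.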
Matplotlib contains the pylab interface, the set of functions provided by matplotlib.pylab to plot graphs. matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB. To start a plotting experiment, first we need to import matplotlib.pyplot.

>>> import matplotlib.pyplot as plt

Here the library matplotlib.pyplot is imported and labeled as plt for easy future reference to the module.

>>> import matplotlib.pyplot as plt
>>> plt.plot([1,2,3,4], [4,3,2,1])
>>> plt.axis([0,5,0,5])
>>> plt.show()

The plot function accepts the plotting points as two arrays holding the x and y coordinates respectively. By default, pyplot joins the points with straight line segments. If you need only a scatter diagram of the points, try the following code:

>>> plt.plot([1,2,3,4], [4,3,2,1], 'ro')

You can plot the graph using different colors and styles by passing a format argument to the plot function.

>>> from pylab import *
>>> x = arange(1., 10., 0.1)
>>> y = x*x
>>> plot(x, y, 'g--')
>>> show()

After plotting the graph, you need to call show() to view it. Here you will get a green dashed line graph; try r for red, y for yellow, etc. We can specify marker shapes with short codes such as s for square, ^ for triangle, etc.

>>> plot(x, y, 'rs')    # Red square
>>> plot(x, y, 'g^')    # Green triangle

Standard mathematical functions can also be plotted. Let us plot a sine curve:

>>> from pylab import *
>>> x = arange(0., 10., 0.1)   # to define x values
>>> y = sin(x)                 # function definition
>>> plot(x, y)                 # to plot
>>> grid(True)                 # to show graph in grid
>>> show()                     # to show the plot

Pylab combines pyplot with numpy functionality. If you import matplotlib.pyplot on its own, you also need to import numpy to define arrays.

CIO Perspective
Dr.
R M Sonar
Chief Editor, CSI Communications

Managing Technology » Business Information Systems: Underlying Architectures

The previous article covered the basic elements of a system: input, processing, and output. Interfaces provide an interactive environment for getting input into a system and presenting output in a variety of forms such as reports. Processing involves a) execution of business logic implemented through programming languages and b) management of the required data: storage, access, and manipulation. In short, software that implements ISs can be logically divided into three layers based on functionality: interfaces (presentation services), core business logic, and data services, as shown in Fig. 1. Table 1 describes these layers. The components that implement the functionality of these layers can be coupled either tightly or loosely. Loose coupling brings a) greater flexibility in developing and deploying components separately in a networked environment in distributed fashion, b) flexibility in interconnecting heterogeneous systems and platforms, and c) better scalability and maintenance of information systems. ISs in which all these layers are managed by a single computer program are called single-tier systems, while ISs that have separate programs/systems to implement each functionality are called three-tier systems. In some systems, business logic may be implemented using multiple programs/systems; these are called n-tier systems (refer Fig. 2).

[Fig. 1: Logical separation of tasks (tiers) in IS. Layers shown: Interfaces; Business logic; Data services]

Table 1: Functionality implemented by layers and components
Layer: Interfaces
  Functionality: Takes care of presentation services. Facilitates input, validation, and output.
  Components/Types: Text-based data entry interfaces, GUI-based (windows) interactive forms, IVR, SMS, WAP and web-based forms, unstructured supplementary service data (USSD), static and interactive reports, and dashboards and multimedia interfaces.
Layer: Business logic
  Functionality: Execution of core processing logic.
  Components/Types: Core modules, functions, procedures, APIs (libraries), web services, stored procedures etc.
Layer: Data services
  Functionality: Defining data models; creation, storage, access, and manipulation of the data required.
  Components/Types: File handling and management, data stores, database management systems (DBMS), XML storage and access etc.

[Fig. 2: Computing architectures. Labels: Single-tier; Client/server; Web-based (N-tier); Flexibility, personalization, access, and ROI; Distributed computing, modularity, open standard, and scalability]

Single-tier Systems
The program that implements the IS takes care of interfaces, business logic, and data services, as shown in Fig. 3. The components of all these layers are tightly connected to each other. Examples of such systems include reservation systems developed in languages like COBOL and deployed under centralized mainframe environments. As shown in Fig. 3, thin clients are just devices with no processing capabilities (called dumb terminals) that are used for input (e.g. data entry) and for displaying information. Many independent ISs developed in languages like C were single tier: the program manages user interfaces, processing, and file handling. Decision support systems developed using desktop productivity tools like MS Excel, which manage user interfaces, processing, and data inside the same Excel workbook, are also examples of single-tier systems.

[Fig. 3: Single-tier systems. Labels: Interface (thin client); Business logic; Data services (file handling); Data files; Mainframe; File storage; Thin clients (e.g. dumb terminals)]

Key benefits
• These are centralized systems where all functionalities are tightly connected and implemented in a single information system (monolithic). They are easier to support and maintain.
• These are secure systems as there are only limited entry points to the system. Users have to access the system through the interfaces provided, typically dumb terminals with no other devices/systems connected to them.
• They are computationally efficient because most of them are written in core programming languages, with no overheads of other software like database servers.

Key issues
• Users have limited choice while accessing data.
• A lot of explicit and exhaustive coding is required, as the program that implements the IS has to manage all functionalities.
• More dependence on the vendor for support, especially if the systems providing customization capabilities are not based on open standards.
• Disadvantages of conventional file handling.
• Since most of these systems are based on centralized computing, failure of such systems can cause major disruption in services.

Client/Server Systems
In client/server systems (the server is referred to as the database server), interfaces are taken care of by client machines (usually desktops) and data services by a DBMS, as shown in Fig. 4. Client machines interact with database systems in a loosely coupled manner. The client machines send requests (or DB commands) to the database systems; the database systems respond to each request by sending the required data or executing the requested command. The business logic is split into two parts: client side and server side. Since the majority of business logic is implemented at the client side, the clients are typically fat clients (machines requiring higher computing resources). ISs developed using tools such as VB (Visual Basic) as front-end and Oracle as back-end fall under this category. Every installation of such an IS at each deployment location (such as a branch office) needs a database server and client machines connected over a local area network.

[Fig. 4: Client/server systems. Labels: Interface; Business logic; Fat client; Client PCs; Network; Business logic; Data services; DB server (DBMS)]

Client/server systems can be further enhanced to have better ROI using thin-client (GUI-based) technologies from vendors such as Citrix. Fig. 5 shows an example. In such architectures, instead of many fat-client machines, only a few client machines (even only one) are used for application processing. Operating systems like Windows 2000 allow multiple instances of the IS to run on the same machine. Using such thin-client technologies, these ISs can be accessed by many users over thin clients. The fat-client machines are typically server machines, often called terminal servers. Such technologies drastically reduce support and maintenance efforts, as interfaces and business logic do not need to be installed on many fat-client machines. Only one instance is shared amongst many users through thin clients. This is a form of virtualization.

[Fig. 5: Thin client based client/server systems. Labels: Interface (thin client); Thin clients; Network; Business logic (e.g. Citrix); Fat client; Terminal server; Business logic; Data services; DB server]

Key benefits
• These are distributed systems normally deployed in a LAN environment where many client machines are connected to a common database server. They use various resources: client side, server side, as well as network.
• Data services are managed by database servers, which take care of data storage, access, and manipulation. They take care of concurrency, redundancy, security, and consistency of data. Most database servers use standard query languages to access and manipulate data.
• Database systems are loosely coupled; end users have a greater degree of freedom in accessing data and creating customized reports based on requirements.
• Option of using various database management systems and client-side development tools.

Key issues
• These systems are deployed in a networked environment; if they are not properly configured, security can be an issue as there can be multiple entry points into the systems. For example, users can have direct access to data in the database server. The database administrator needs to set proper access rights and controls based on users and their roles.
• Scalability can be an issue, especially when the number of clients increases. Load on the database server increases with the number of clients accessing it, as it manages an exclusive session for each one.
• Such systems are difficult to manage, especially to support and maintain, when deployed on a large scale at different locations. Even a small change in the user interface requires updating client components at all locations.
• Dependence on the database, especially if a lot of business logic is implemented at the database server.
• If systems are not properly designed, developed, and configured, this may lead to inefficient use of network bandwidth; for example, a lot of data exchange between client and server.

Web-based N-tier Systems
In web-based systems, the functionalities of all three layers are separated, run on different machines/devices, and are loosely coupled. They are deployed in Internet, intranet (an Internet-like setup within the organization using all the technologies, protocols, and standards that are used in the Internet), and extranet (extending the intranet setup to outside stakeholders like business partners, dealers, vendors, agents etc.) environments. In such ISs, interface functionality is taken care of by client machines/devices, business logic by the web server (which stores and delivers web pages), and data services by database servers. However, in some cases part of the business logic is moved to the DB server. Business logic can be split across multiple servers, such as a web server and an application server (which takes care of specific functional requirements like CRM). The client can be a desktop machine, a thin-client machine, a smart device supporting a browser, or any device that supports Internet connectivity (refer Fig. 6). The computational resource requirements at the client side depend upon the functionality to be executed there. Some clients require more processing power (e.g. rich Internet applications (RIA) and applications that need to install components like ActiveX). However, many web-based information systems just need a browser to access them from the client machine. Table 2 shows examples of how components in the three layers are implemented in single-tier, client/server, and web-based systems.

[Fig. 6: Web-based n-tier systems. Labels: Interface; Client (thin/rich); Desktops, laptops, smart devices, thin clients; Internet/Intranet/Extranet; Business logic; Web server (can be in multiples); Application server; Network; Data services; DB server (can have many instances)]

Key benefits
• These are completely distributed systems and use optimal resources: client side, server side, and Internet/intranet and extranet. However, core business logic and database services can be centralized.
• Database systems are loosely coupled; end users have a greater degree of freedom in accessing data and creating customized reports based on requirements.
• Since components are loosely coupled, these systems are highly scalable (load balancing is possible by deploying many servers) and accessible.
• These systems are based on open standards and can interconnect different systems.
• Core business logic as well as interfaces can be designed and implemented at a granular/component level (e.g. as web services, mashups etc.), thereby increasing reuse; new systems can be built with relatively less effort using service-oriented architecture (SOA).

Key issues
• Since these systems are highly distributed, openly accessible, have multiple entry points, and interconnect many systems, they are vulnerable to attack. If systems are not properly configured, they can face security threats.
• Dependence on network connectivity.

Cloud-based Systems
The Internet has evolved from a platform that delivered web content into a platform for performing a variety of computing services. Instead of managing ISs and IT infrastructure on premise, organizations are outsourcing them to third-party vendors called cloud vendors. Vendors do not sell their software, platforms, or infrastructure as products and solutions but as services. Client organizations do not need to buy them but use and access them on demand. This is equivalent to renting a car instead of owning it. Cloud vendors offer various service models: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Fig. 7 shows the basic architecture of cloud computing. Client organizations can choose services based on their requirements. There is a great deal of flexibility in selecting subscription models based on functional and technical requirements.

Key benefits
• Client organizations neither need to own IT infrastructure and resources nor need to maintain and support them. They do not have to deal with constant changes in technologies.
• Better ROI: cloud vendors make resources available based on requirement and demand. Since cloud vendors provide services to many clients, they can have economies of scale.
• Systems and platforms can be tested before renting/subscribing.
• There are many players who are part of the cloud vendor ecosystem (e.g. independent software vendors, developer and expert communities); client organizations can take advantage of them.
• Different service/subscription models can be opted for depending upon requirements.
• It offers point-to-point and seamless connectivity to the client firm and all its stakeholders, like employees and business partners in the ecosystem. For example, employees can access email directly from cloud services (e.g. Gmail) instead of connecting to a corporate email server. Similarly, business partners can access the system from the same cloud that the organization accesses (instead of accessing it from the firm's IT data center). There is no need even for extranet-like environments.

Key issues
• Security is one of the major concerns for client organizations, as services are offered on a shared basis and executed remotely.
• Lock-in cost can increase if the cloud vendor uses proprietary technologies.
• A cross-country legal framework is needed to enforce service-level agreements (SLAs) between client organizations and cloud vendors.
• Dependence on availability of Internet connectivity and the required bandwidth.

Table 2: Implementation of various layers: some examples
Layer: Interface
  Single: Developed using core programming language/tools.
  Client/server: GUI forms (e.g. Visual Basic). Interfaces are tightly integrated with the client information system.
  Web-based n-tier systems: Web forms/pages. Interfaces are loosely integrated and downloaded from the web server.
Layer: Business logic
  Single: Through core programming logic/supported by tool.
  Client/server: Implemented using languages like VB and partially at the server side using DB programming languages.
  Web-based n-tier systems: Implemented using core programming and scripting languages.
Layer: Data services
  Single: The program which implements the IS configures, accesses, and manipulates data files.
  Client/server: Managed by database servers. Server-side business logic is implemented using DB programming languages such as PL/SQL in Oracle (commonly referred to as SPs: stored procedures).
  Web-based n-tier systems: Managed by database servers, data stores, and XML files.

Summary
Many client and software firms are opting for n-tier computing architectures, and there is a clear shift toward building and using cloud infrastructures and services. The IS/IT has moved from highly distributed systems to centralized architectures (like core

Continued on Page 36

Security Corner
Adv. Prashant Mali [BSc (Physics), MSc (Comp Science), LLB]
Cyber Law Expert
Email: [email protected]

Information Security » Cyber Crimes on/by Children

I would like to start this article with two distinct cases I am handling: one in which the child is the prey of cyber crime and another where the child has committed cyber crime.

Case One: This child is 14 years old. The biggest mistake she made was that she used to write about every single disagreement or fight she had with her mother or father on a daily basis. Moreover, she used to substantiate her loneliness further with a small poem. A cyber criminal befriended her and used her loneliness as a sword to sexually abuse the girl. The girl is in deep mental trauma and the family in distress. Even though we traced the cyber criminal, the larger question still remains.

Case Two: This standard IX boy suffering from dyslexia was abandoned by his girlfriend studying in standard VIII. Moreover, the girl often taunted him with being impotent. This boy decided to take revenge on her and, using the girl's photograph, made a fake profile of her on Facebook. Further, he went ahead and wrote her actual mobile number with a comment that "I am a prostitute. Please call". The girl started receiving hundreds of unsolicited calls. The case was investigated and the boy was arrested for his cyber crime.

Children use the Internet for everything these days, from homework to keeping in touch with friends. Chat rooms, message boards, forums, instant messages, and Facebook have changed the way the world talks to each other.
Thanks to these new communication portals, it is now possible to be in contact with people from all over the world instantly. While the majority of people on the Internet are simply using it for research or a form of entertainment, there are some who use the World Wide Web as a way to stalk and hunt prey. These cyber criminals are considered by most to be psychologically ill and in need of help. However while that is true, these pedophiles are also extremely manipulative and know how to not only attain their prey, but they are also experts at isolating those innocent members of online communities in order to get what they want. According to a recently released survey of online security technology firm, McAfee, 62% of children shared personal information online and 39% of parents were unaware of what their children do online. The survey says 58% of the children polled shared their home address on the Internet, while 12% have been victims of some kind of cyber threat. Technique of Cyber Criminals Cyber criminals often use a tactic called "grooming". The first step in this process is finding a victim. This can be done in a chat room or by reading blogs. The criminal will often look for something to share with the victim. It could be a birthday or a favorite sport, anything will do. This is simply done to initiate communication. The next thing you know emails are being exchanged and a friendship has started. The next step in the "grooming" process is to create a wedge between the victims and their parents, guardians, or protectors of any sort. This can be done by waiting for the right moment. Perhaps an email from the victim describes a disagreement between them and their parent or a blog tells of an argument. This is the perfect opportunity for the cyber criminal to become a friend and ally. Before you know it, the relationship has developed into a trust where the predator is always on the victim's side no matter what. 
Eventually this leads to a face-to-face meeting where the actual crime takes place. It is extremely important for parents to be completely aware of their children's actions on the Internet. What seems like a simple friendship to a child could be a predator catching their prey. What are the Different Signals that Your Child is at Risk on Internet? 1. Your child spends large amounts of time online, especially at night. 2. You find pornography on your child's computer. 3. Your child receives phone calls from men you don't know or is making calls, sometimes long distance, to numbers you don't recognize. 4. Your child receives mail, gifts, or packages from someone you don't know. 5. Your child turns the computer monitor off or quickly changes the screen on the monitor when you come into the room. 6. Your child becomes withdrawn from the family. 7. Your child is using an online account belonging to someone else. Children Can Commit Cyber Crimes in Following Ways by Using the Computer as a Target (Using a Computer to Attack Other Computers) Did you know that the majority of cyber crimes in this category are committed by children? In April 2012, a teenager was arrested for creating a devastating computer worm. How did he learn to do this? A simple Internet search will reveal all the tools necessary to create viruses and hack into others’ computers. Hacking can take a variety of forms, ranging from stealing passwords and classified information to vandalizing websites. Unauthorized entry into an information system through hacking or viruses has serious legal consequences. Talk with your child about the ethical and legal implications of hacking, which attracts up to 3 years of imprisonment and Rs. 5 lakhs of penalty in India. The Computer as a Weapon (Using a Computer to Commit Real World Crimes) Take, for instance, email. Children believe email is harmless because they don’t see the impact on the person who receives it. 
A growing trend with the use of email and Facebook is harassment; children are saying things to other children, both at school and in other communities, that they would never say face-to-face. Parents need to teach their children about appropriate communication through email and Facebook.

The Computer as an Accessory (Using a Computer to Store Illegal Files or Information)
The Internet is a useful tool for finding information in a quick and convenient way. Even though much of this information is available for everyone to use, many products and services found online may not be reproduced or downloaded, especially music and purchasable programs. Popular peer-to-peer software programs make it easy to share copyrighted material and actually encourage downloading. However, it is a violation of copyright law to take music or software from the Internet without the permission of the owner. It is easy for children to understand why theft in the real world is wrong, but it is difficult for them to understand theft of intellectual property. Teach your children not to download pirated or counterfeit material. Downloading illegal material attracts IT Act, 2000 provisions as well as Copyright Act provisions.

Cyber parenting is the need of the hour. Schools and colleges should take initiatives to make parents aware of the current issues, crimes, and the law of the land. I do my bit by conducting free workshops in schools and classes, but a major awareness drive by the Government is the need of the hour.

Security Corner
Mr. Subramaniam Vutha
Advocate
Email: [email protected]

IT Act 2000 » Prof. IT Law Demystifies Technology Law Issues: Issue No. 2

The Basics of an Electronic [Internet-based] Contract:

IT Person: Prof. IT Law, it is a pleasure to meet you again. I look forward to an enlightening discussion with you on technology law issues that people like me should know.
Prof. IT Law: I enjoy talking to you too. What topic should we discuss today?
IT Person: How about electronic contracts?
Prof. IT Law: Yes, that is a fundamental issue in electronic commerce. All commercial dealings over the Internet are in the form of electronic contracts. However, it is so easy to engage in buying or selling over the Internet that we may sometimes overlook the fact that we are getting into electronic contracts.
IT Person: Can you give me some examples, please?
Prof. IT Law: Well, think of the air tickets you buy over the Internet, products you buy on Flipkart or Snapdeal, or train tickets or bus tickets.
IT Person: Yes, I understand. Those are the obvious contractual transactions we engage in.
Prof. IT Law: There are other contracts that are not so obvious to most people. For example, when you access a website, you agree to its terms and conditions, and that is also in the nature of an electronic contract.
IT Person: But I do not sign anything there. On the other hand, when I buy something I click on the BUY button or something like that.
Prof. IT Law: When you browse a site you have, by that very action of browsing, accepted the terms and conditions for accessing that site.
IT Person: But I do not ever read the terms and conditions.
Prof. IT Law: Like millions of others. But that does not mean you have not agreed to the "access terms" of that site. Moreover, it also does not mean that you have no binding electronic contract with that site or its owners.
IT Person: This is confusing. Please explain in a way I can understand.
Prof. IT Law: In terms of contract law, you can accept an offer in many ways. For instance, on a website for sale of products you can accept an offer by ordering a book or a bag. On a website that provides mere information, you can accept their offer of information by merely browsing the site. Thus, your acceptance can be indicated by the mere action of browsing the site, which results in a contract that binds you to its terms.
IT Person: But accepting a contract by just doing something rather than signing off sounds a little incomplete to me.
Prof. IT Law: If the law were not so flexible we would have had to sign documents for every deal we do. For any contract, you need an offer from one party and an acceptance of the offer by another party. Over the Internet that happens all the time.
IT Person: That is interesting.
Prof. IT Law: Yes. In a future meeting we shall discuss how an offer and an acceptance are actually made over the Internet, and the issues that should be kept in mind in electronic commerce.
IT Person: I shall look forward to that. Talking to you is always so stimulating.

Continued from Page 34

banking solutions) deployed at data centers. Now such centrally deployed systems are likely to move to cloud infrastructure. Such shifts are helping client organizations to get rid of managing IT systems, resources, and infrastructure. This has also changed the roles and functions of IT/IS personnel. Such a paradigm shift is going to bring issues and challenges such as security and privacy of confidential data, dependence or lock-in on cloud providers, management and enforcement of SLAs, and cross-country legalities. However, there are initiatives, like private clouds, that address some of these issues and challenges.

[Fig. 7: Basic cloud computing framework. Labels: Interface; Desktops, laptops, smart devices/existing IT setups; Internet; Cloud application; Shared platform; Shared infrastructure; Cloud services; Web/real-time servers; Application/platform servers; DB servers; Network infrastructure]

Bibliography
[1] Laudon, Kenneth C., and Laudon, Jane P. (2012). Management Information Systems: Managing the Digital Firm, 12th edn., Pearson Education.
[2] James O'Brien, George Marakas, and Ramesh Behl. (2010). Management Information Systems, 9th edn., Tata McGraw Hill.
[3] Henry C. Lucas Jr. (2008). Information Technology: Strategic Decision Making For Managers, Wiley India.
[4] http://www.citrix.com/ accessed in April 2012.

ICT@Society
Achuthsankar S Nair
Editor, CSI Communications

Graphic Texting

When you hear the words 'computer art' you might start thinking about the wonders of computer graphics, from Adobe Photoshop to dazzling image processing and morphing software. There was a time when all that computers could handle was plain text. People who used 'line printers' in those days would know how far the computer was from graphics. Even when there were no computers and the king of text processing was the typewriter, strange forms of art used to be practiced with these machines. Such 'typewriter art' is believed to have existed since the 1890s. Expert typists could create a close image of the Mona Lisa by clever over-typing. During the 1950s, some computers even adopted this method to produce graphics from text printers.

[Figure: Fire-breathing Dragon by Joan G. Stark]

Those days are fortunately gone, but art from the keyboard has been reborn on computers in a big way. Joan G. Stark of Cleveland, Ohio, one of the leading ASCII artists, could surprise anyone with the immense creativity she can express on the computer keyboard. (ASCII, or American Standard Code for Information Interchange, is a number coding scheme for computer keyboard characters used since the 1960s. For example, when you type the character 'a' on the keyboard, the number code 97 is what is stored inside the PC. In practice, ASCII is simply a reference to the set of characters that you can see on the keyboard.) The smileys that we often stick in e-mails are miniature ASCII art. However, Stark's variety of ASCII art is not single line. Some pieces, like the fire-spitting dragon, can be a screen-full. She seems to have picked up the liking for keyboard art while she got to play with her father's office typewriter during her childhood.
After hearing about ASCII art five years ago, she has been churning out exciting artwork. All she uses is Notepad and, of course, her wonderful imagination. Links to her works are available on her wiki page. URL: http://en.wikipedia.org/wiki/Joan_Stark. Take a fresh look at the computer keyboard before you visit her site. Do the keys ( ) " ' , - = look capable of creating any art? Now prepare for the pleasant surprise in the links available on her wiki page. Her web site also has enough resources for would-be ASCII artists. Her own works are classified into birds, cats, zoo animals, etc. She has dated, titled, and initialed most of her exhibits. A history of the art, an account of her personal experiments with it, tips for beginners, and links to related sites are available in the External links section of Stark's wiki page. Joan is a mother of four kids, whom she introduces on the web site, not surprisingly, through ASCII art.

Continued from Page 13

[5] Hall, P A V and Dowling, G R (1980). “Approximate String Matching”, ACM Computing Surveys, 12(4), 381-402.
[6] Henikoff, S and Henikoff, J G (1992). “Amino Acid Substitution Matrices from Protein Blocks”, Proceedings of the National Academy of Sciences of the United States of America, 22(22), 10915-10919.
[7] Kanitha, D (2011). “A scoring matrix for English”, MPhil Dissertation in Computational Linguistics, Dept. of Linguistics, University of Kerala.
[8] Leon, D (1962). “Retrieval of misspelled names in an airlines passenger record system”, ACM Communications, 5, 169-171.
[9] Nair, A S (2007). “Computational Biology & Bioinformatics: A Gentle Overview”, Communications of the Computer Society of India, 31(1), 1-13.
[10] Navarro, G (2001). “A Guided Tour to Approximate String Matching”, ACM Computing Surveys, 33(1), 31-88.
[11] Needleman, S B and Wunsch, C D (1970). “A general method applicable to the search for similarities in the amino acid sequence of two proteins”, Journal of Molecular Biology, 48(3), 443-53.
[12] Prema, S (2004). “Report of Study on Malayalam Frequency Count”, Dept. of Linguistics, University of Kerala.
[13] Soundex, [Online]. Available: http://en.wikipedia.org/wiki/Soundex, Accessed on 2 Dec. 2011.
[14] Wagner, R A and Fischer, M J (1974). “The String-to-String Correction Problem”, Journal of the ACM, 21(1), 168-178.

Brain Teaser
Dr. Debasish Jana
Editor, CSI Communications

Crossword » Test your Knowledge on Linguistic Computing

The solution to the crossword, with the name of the first provider of an all-correct solution, will appear in the next issue. Send your answers to CSI Communications at email address [email protected] with subject: Crossword Solution - CSIC May 2012

[Crossword grid not reproduced]

CLUES

ACROSS
1. Determine the part of speech for each word from a sentence (10)
4. Yahoo's text and web page language translation tool (9)
5. A set of parameters defining user's language, country etc. (6)
8. The study of how meaning is affected by context (10)
11. A formal system in mathematical logic for expressing computation by way of variable binding and substitution (6)
13. A database engine for annotated or analyzed text (6)
15. A lexical database for the English language (7)
16. Name of lemma that helps to tell that a language is not regular (7)
17. Vocabulary of a language (7)
21. Type of machine learning task to infer a function from training data (10)
23. Abbreviation formed from the initial parts in a word or a phrase (7)
25. Meaning encoded in a language expression (9)
26. A multilingual dictionary for language translations on Windows (6)
28. Process of analyzing a text as a sequence of tokens (words) (7)
29. A very important data structure (4)

DOWN
2. A company dealing with language translation software (7)
3. Phrase structure grammar (11)
6. ISO standard markup framework for natural language processing (3)
7. A variation of finite automaton (8)
9. The study of the nature, structure, and variation of language (11)
10. A variant form of a morpheme (9)
12. Microsoft's language translation service (4)
14. A search algorithm for traversing or searching a tree structure or alike (10)
18. Interaction between computers and humans (3)
19. The study of the origin and history of individual words (9)
20. Rules that describe formation of correct sentence in a language (7)
21. One of the oldest machine translation companies (7)
22. A variety of a language peculiar to a particular region (7)
24. Abbreviation of processing natural language (3)
27. An international scientific and professional society dealing with computational linguistics (3)

Cartoon caption: "I am a failure as a computational linguist! My son sends me an SMS "U R 2 YY 4 ME" and none of my algorithms could crack it. His friends all are able to read it as "You are too wise for me"."

Solution to April 2012 crossword

[Solution grid not recoverable from the source]

Congratulations to Ms. P Deepa (Chennai), Dr. Suresh Kumar (Faridabad), Er. Aruna Devi (Mysore), Dr. T Revathi (Sivakasi) and Mr. S K Khatri (New Delhi) for getting ALMOST ALL correct answers to April's crossword.

Ask an Expert
Dr. Debasish Jana
Editor, CSI Communications

Your Question, Our Answer

“Take up one idea. Make that one idea your life - think of it, dream of it, live on that idea. Let the brain, muscles, nerves, every part of your body, be full of that idea, and just leave every other idea alone.
This is the way to success.” ~ Swami Vivekananda

Subject: C++ example

Sir, I have a few doubts about C++. Could you please answer the following questions for me? I will be very grateful if you kindly do so.
1. A data member of a class cannot be declared as friend. Why?
2. What should the overloaded operator [ ] return?
3. Can a virtual function be declared as a static member of a class?
4. Should destructors be declared virtual as a good programming practice?
5. Can an overloaded function have default arguments?
Thanks.
Sourideb Bhattacharya
Student, BE (Instrumentation & Electronics Engineering) 3rd year, Jadavpur University, Kolkata

A: Here are the answers to your questions:
1. Friendship grants access rights, and only code (functions, including the member functions of a friend class) can exercise access. A data member never accesses other data or functions, so there is no point in giving an access right to data.
2. The overloaded operator [] is meant for accessing a single element in a collection of elements, such as an array. So the return type should be a reference to the element type, e.g. int& or, in template form, T&. The code snippet in template form is given below:

    #include <stdexcept>

    template <class T>
    class Array {
    private:
        T *data;
        int size;
    public:
        // Record the requested size in the member, then allocate that many elements.
        Array(int s) { data = new T[size = s]; }
        ~Array() { if (data) { delete [] data; } }
        T& operator [] (int indx) {
            if ((indx < 0) || (indx > size - 1)) {
                // One possible way to raise the exception.
                throw std::out_of_range("Array index out of range");
            } else {
                return data[indx];
            }
        }
    };

Here, the overloaded operator [] returns the actual data element by reference. Otherwise, we could not use the element as a modifiable value, as in:

    Array<int> a(10); // int array of 10 elements
    a[0] = 4;         // assign 4 to first element of array

This would not have been possible if you returned by value. The operator would then hand back a copy of the actual element, so the assignment could at best change that temporary copy (for built-in types such as int it would not even compile), leaving the original content unassigned.
3. No. A static member belongs to the class, not to any individual object.
Dynamic binding applies only to objects: which function runs depends on the dynamic type of the object (X or Y, depending on whether it was created with new X or new Y, where Y is a subclass of X). The virtual function mechanism therefore needs an object, so static members cannot be virtual.

    // Assumes #include <iostream> and using namespace std;
    class X {
    public:
        virtual void vf() { cout << "X::vf" << endl; }
    };
    class Y : public X {
    public:
        void vf() { cout << "Y::vf" << endl; }
    };

Now, if we have a program snippet as below:

    X * p = new Y();
    p->vf();

This will print Y::vf and not X::vf. Here, the virtual function vf applies to objects of type X and Y and requires an object instance to be called on. A static function won't do.
4. Yes, for any class that may be used as a base class. Otherwise, in a situation where Y is a subclass of X and you have X* p = new Y, delete p would not call Y's destructor if X's destructor was not declared as virtual.
5. Yes. But overloaded operators cannot.

Send your questions to CSI Communications with subject line 'Ask an Expert' at email address [email protected]

Happenings@ICT
H R Mohan
AVP (Systems), The Hindu, Chennai
Email: [email protected]

ICT News Briefs in April 2012

The following are the ICT news and headlines of interest in April 2012. They have been compiled from various news and Internet sources including the financial dailies - The Hindu, Business Line, Economic Times.

Voices & Views
• In a couple of years, 85% of people will be using smartphones - Microsoft India Chairman.
• Chinese hacker is responsible for cyberattacks on Government of India, military research organisations and shipping companies - Trend Micro.
• Global handset shipments will increase 29% from 1.7 billion in 2012 to 2.2 billion in 2016, of which smartphones to touch 1 billion - ABI Research.
• Emerging markets to spend $1.22 trillion (representing 31% of the worldwide total) on IT in 2012 - Gartner.
• Indian enterprise software market will grow 13% in 2012 with revenue of $3.22 billion – Gartner.
• Tablet sales touch 4.75 lakh in 2011 - CyberMedia Research.
• The US Citizenship and Immigration Services has received about 22,000 petitions (against the cap of 65,000) for H-1B work visas in the first four days.
• Computer Society of India to promote free software - CSI president Satish Babu.
• 'I warned Raja against advancing 2G cutoff date' - Ex-Telecom Secretary.
• Media tablet sales will double this year globally to 12 crore units from 6 crore units in 2011 - Gartner.
• The Indian logistics industry is estimated at $130 billion and is expected to grow to $385 billion in the next four to five years - Mr P. Srikanth Reddy, Chairman, Four Soft.
• Publishers reach settlement with US Justice Dept on e-book pricing.
• By 2015, the market for 'big data' technology and services globally will reach $16.9 billion up from $3.2 billion in 2010. Every day, 2.5 quintillion bytes of data are created – IDC.
• Social networking sites should set up servers in India – Rajasthan CM Gehlot.
• Karnataka's IT exports zoomed nearly 50% to touch Rs. 1.3 lakh crore in 2011-12.
• 'Data breach costs Indian organisations Rs. 5.35 crore annually' – Symantec.

Telecom, Govt, Policy, Compliance
• Govt will help fund buys of foreign firms with high-end cyber security technology.
• Aakash-II, sub $40 Android tablet launch likely in May – Sibal.
• Supreme Court rejects 2G operators' review petition.
• Airtel rolls out 4G at Kolkata, to offer high speed Internet services.
• DoT panel sees merit in merger of BSNL, MTNL.
• Centre may clear Karnataka's plan to set up IT investment region at an estimated investment of Rs. 90,000 crore. The project would generate about 1.1 million direct and 2.7 million indirect jobs.
• AICTE and Microsoft announced the implementation of Microsoft Live@edu for all the technical colleges in India.
• TRAI wants licensing powers under new unified regime.
• The future of Aakash tablet hangs in balance as Datawind and QUAD Electronics have locked horns over alleged violation of agreements.
• Mobile ARPUs start rising for first time in many years.
• DoT asks telcos to comply with new tower radiation norms.
• TRAI sets quality norms for mobile banking services.
• TRAI makes one 'per second' plan mandatory.
• Prospects brighten for silicon wafer fab units as global firms offer support.
• TRAI sets base price for 2G spectrum at 10 times 2008 rate with price varying between Rs. 3,622 and Rs. 14,480 crore per megahertz of airwaves.
• Panel set up to frame norms for telecom firms for issuing SIM cards.
• TRAI launches online facility (www.tccms.gov.in) to monitor consumer complaints.

IT Manpower, Staffing and Top Moves
• Cyrus Mistry and O.P. Bhatt (Ex. SBI Chairman) join TCS board.
• Progress Software to help engineering colleges in setting up incubation centres in Hyderabad.
• Potential job losses in telcos 'enormous' - HR Experts.
• Hiring of NRI professionals up 5% in Jan-Mar 2012.
• Infosys BPO to recruit 13,000 across 18 locations. Also plans to hire 35,000 people this fiscal.
• Steelwedge Software to raise India headcount from 180 to 1150 by 2016.
• Walmart Labs to hire 200 engineers.
• Uninor employees take to the streets to save company.
• TCS employee addition at all-time high with a gross addition of 70,400 employees in the year ending March 2012.
• 150 Bangalore staff hit in Yahoo!'s 2,000 cut globally.
• SingTel Global (India), to expand its operations in five more cities, including Jaipur and Ahmedabad, and double its workforce by 2014.
• Tata Elxsi to increase headcount at Bangalore lab.
• TCS chief, Mr N. Chandrasekaran, to assume the office of Chairman of Nasscom.
• IT companies step up hiring of engineering graduates. The average salary increased by about 10% compared to last year and is in the range of Rs. 3.05 lakh to Rs. 3.25 lakh per annum.
Company News: Tie-ups, Joint Ventures, New Initiatives
• Cisco is considering setting up a manufacturing and services unit in Maharashtra.
• Wipro asks component vendors to disclose emission data as part of Green IT initiatives.
• HCL Info launches operations in Qatar.
• Micromax joins the tablet war with its FunBook priced at Rs. 6,499.
• Facebook's mobile app now in seven Indian languages.
• Local search engine hudku.com launched.
• Reliance emerges as the first telecom operator in the country to offer tablets on both the 3G and CDMA networks after launching the CDMA tablet.
• Kaspersky comes out with suggestions on how to protect your Mac OS. Will be useful to 10 crore Mac OS X users around the world.
• Facebook buys Instagram, a smartphone photo sharing application, for $1 billion.
• Four Soft bets big on cloud-based product for logistics sector.
• MonsterIndia launches app for mobiles.
• HP unveils 'converged cloud' services.
• Wipro to provide tech services for San Francisco Marathon.
• Green Platinum rating for Infosys.
• Now, 'Google Drive' to take on rivals' cloud storage service.
• Samsung overtakes Nokia to become top selling phone brand globally.
• Zenith launches TigerCloud.

CSI Report
Prof. Dipti Prasad Mukherjee* and Dr. Dharm Singh**
* RVP, Region II
** Member SIG-e-Agriculture, CSI

* CSI Region II Meeting at Kolkata

A regional meet of the office bearers of the different chapters of Region II was organized at the Indian Statistical Institute, Kolkata on Sunday, March 25, 2012. The representatives from the Patna (Prof. A K Nayak), Durgapur (Prof. Asish Mukhopadhyay), Siliguri (Dr. Ardhendu Mandal) and Kolkata (the current and incoming chairmen Mr. Sushanta Sinha and Dr. Debasish Jana) chapters were present. The meeting was also attended by the CSI Secretary Prof. H R Vishwakarma, Division III Chair Prof. Debesh Das, the regional student coordinator Prof. Phalguni Mukherjee and national nomination committee member Mr. Subimal Kundu.
Prof. Dipti Prasad Mukherjee, Regional Vice-President Region II, welcomed the gathering and urged the members to increase CSI activity in Eastern India. Prof. H R Vishwakarma discussed the encouraging growth of CSI membership across India, except in the eastern region. The problems faced by smaller chapters like Durgapur and Siliguri were discussed in detail. The possibility of obtaining some seed funds from the CSI headquarters and A-category chapters for smaller chapters was explored at length. A number of senior CSI members present at the get-together expressed their concerns regarding the image of CSI and suggested more quality programs for enhancing the CSI brand value. A set of activities was planned for the Patna, Durgapur and Siliguri chapters. The meeting ended on a positive note of leveraging the potential of Region II in expanding the reach of CSI.

** Special Interest Group on e-Agriculture
Annual Report: 1 April 2011 to 31 March 2012

Background
The Special Interest Group on e-Agriculture was formed in January 2011. The indirect benefits of IT in empowering the Indian farmer are significant and remain to be exploited. The Indian farmer urgently requires timely and reliable sources of information inputs for taking decisions. At present, the farmer depends on the trickling down of decision inputs from conventional sources, which are slow and unreliable. The changing environment faced by Indian farmers makes information not merely useful, but necessary to remain competitive. The role of ICT will be of great importance for the 60 percent of the population that depends on agriculture, as a part of rural development of communities isolated from the urban sector, thereby bridging the digital divide.

Objectives
• To transform technological intervention to increase agriculture production and productivity by ICT.
• To empower the farmers to take quality decisions that will improve agriculture and allied activities.
• To research and develop strategy of ICT application in agriculture and allied activities.
Activities: Events – 2011-2012

Event 1
Host Institute: SIG-WNs, SIG-e Agriculture, DivIV, Udaipur Chapter, CSI, IEI, WFEO, CTAE, TINJR and Co-Sponsored by IEEE Delhi Section
Conference and Theme: A three days International Conference on Emerging Trends in Networks and Computer Communications (ETNCC2011) was organized
Date and Location: 22-24 April, 2011 at the CTAE Udaipur, India.

Event 2
Host Institute: SIG-WNs, SIG-e Agriculture, Udaipur chapter and MPUAT
Conference and Theme: Motivational and expert series of lectures. Speakers: Dr. S. Reisman, President, IEEE Computer Society (Cyber lecturer), Dr. Dharm Singh, Convenor SIG-WNs CSI, Dr. YC Bahtt, Convenor SIG-e-Agriculture
Date and Location: 5th May 2011, CTAE, Udaipur

Event 3
Host Institute: Udaipur Chapter, SIG-WNs, SIG-e-Agriculture, IEI-ULC, CTAE and SGI
Conference and Theme: First CSI Rajasthan State IT Convention and National Conference with Celebration of World Telecommunication and Information Society Day 2011 on “WTISD 2011: Better life in rural communities with ICTs”
Date and Location: May 17-19, 2011, SIGs Campus Udaipur

Event 4
Host Institute: SIG-WNs and e-Agriculture CSI, IEI ULC and TINJR
Conference and Theme: National Seminar on IP Multimedia Communications
Date and Location: October 14-15, 2011, Udaipur

Event 5
Host Institute: IEI, SIG-WNs & e-Agriculture CSI, CTAE and TINJR
Conference and Theme: All India Seminar on Information and Communication Technology for Integrated Rural Development
Date and Location: February 11-12, 2012

Peer recognition achieved within India/globally
This group is a new one and is presently taking up some projects on the research and development side to develop electronic planters for precision farming. More collaborative work is envisaged once the activities are strengthened further.

Plans 2012-13
1. Technical Session in 26th National Convention of Agricultural Engineers in January 2013.
2. Seminar exclusively on theme of e-Agriculture planned at end of 2012.

Dr. R. Srinivasan, Past President and Fellow of CSI, has been appointed as Professor Emeritus in SRM University. Currently he is also serving as Dean Research & PG Studies at RNS Institute of Technology, Bangalore. Dr.
Srinivasan is a member of IEEE, Member of IEEE Computer Society, Fellow of IETE (India) and Life Member of ISTE.

CSI Journal of COMPUTING
ISSN 2277-6702, e-ISSN 2277-7091
www.csijournal.org

Dear CSI Fraternity,

CSI has launched the 'CSI Journal of Computing', with truly original papers from the vibrant community of academia, industrial researchers, innovators, and entrepreneurs around the world. The first issue was released by the Honorable Chief Minister of Maharashtra, Shri Prithviraj Chavan, on the CSI Foundation Day 2012 at Mumbai. The Journal covers topics related to Computer Science, Information Technology, and several boundary areas between these and other fields. It is managed by an International Editorial Board. Initially each volume will have four issues.

Contents of Vol. 1, No. 1, March 2012
• Efficient Face Recognition using Local Active Pixel Pattern (LAPP) for Mobile Environment: Mallikarjuna Rao G, Praveen Kumar, Vijaya Kumari G, and Babu G R
• Scalable Lock-Free FIFO Queues using Efficient Elimination Techniques: V V N Pavan Kumar and K Gopinath
• Direct Approach for Machine Translation from Punjabi to Hindi: Gurpreet Singh Josan and Gurpreet Singh Lehal
• Markov Modeling in Hindi Speech Recognition System: A Review: R K Aggarwal and M Dave
• The Genome Question: Moore vs. Jevons: B Mishra
• Hash Based Key Indexing: A New Approach to Rainbow Table Generation: Deepika Dutta Mishra, C S R C Murthy, A K Bhattacharjee, and R S Mundada
• Bioinformatics for Next Generation Sequencing: Srinivas Aluru

Subscription rates
Individual: CSI member Rs. 400/Volume or US$20/-; Non CSI member Rs. 800/Volume or US$25/-
Library: CSI member Rs. 600/Volume or US$50/-; Non CSI member Rs. 1000/Volume or US$75/-

For bulk discounts and other related information you may contact Mr. SM Fahimuddin Pasha, ([email protected]) Coordinator. I invite you all to reserve your copy as soon as possible through www.csijournal.org/subscription.
Looking forward to your paper contributions to the Journal and subscriptions.

Advertisements and Sponsorships
To make the Journal and publications from CSI vibrant and offer Open Access for the community with a minimal subscription for the print versions, we solicit sponsorships for the journal. Note that the open access version offers very affordable advertisements. For advertisement rates, please refer to www.csijournal.org (also on the cover pages of the journal). Here are some of the varieties of Sponsorship possibilities for Software Houses, Universities, and Government organizations.

Sponsorships: rate, numbers/year, and benefits
Platinum: Rs. 100,000/-. Numbers: two. Benefits: a. Online advertisement of 1 full page - whole year; b. Half page - printed version
Gold: Rs. 75,000/-. Numbers: four. Benefits: a. Online advertisement of 1/2 page - whole year; b. 1/4 page - printed version
Silver: Rs. 50,000/-. Numbers: eight. Benefits: a. Online advertisement of 1/4 page - whole year; b. One column (1/8 page) - printed version
Institutional Memberships: Rs. 25,000/-. Benefits: a. The member institution's name will be carried on the web as well as in the printed version; b. 1/4 page online advertisements - whole year

CSI has vibrant distributorship across the country with 66 chapters, 385 student branches, and over 80,000 memberships. Looking forward to generous sponsorships and institutional memberships from the community to keep CSI publications vibrant.

Satish Babu
President, Computer Society of India

Prof. R K Shyamasundar (TIFR)
Editor-in-Chief, CSI Journal of Computing
Chairman, CSI Publication

CSI News

From CSI Chapters » Please check detailed news at: http://www.csi-india.org/web/csi/chapternews-May2012

GHAZIABAD (REGION I)
Speaker(s): Dr. Pankaj Jalote, Mr. Sunil Asthana, Mr. Amit Goenka, Mr. Navneet B Gupta
Topic and gist: 7 April 2012: 10th National IT Seminar “Recent Trends in Software Technologies (RTST-2012)”
Dr.
Jalote discussed the definition of engineering, especially software engineering, and the skills required. He discussed the roles of the science-based researcher and the engineering researcher, the abilities of a researcher, and the difference between a researcher and a research manager. Mr. Sunil Asthana spoke about developments in the IT industry and covered various aspects of Mobile Commerce, Mobility, Mobile Applications, and Cloud Computing. There were two technical sessions: Emerging Trends & SIG Role in Software Development, and Recent Advances in Software Testing, Maintenance, and Quality Assurance.
Photo: Dr. Pankaj Jalote, delivering the talk during inaugural session of RTST-2012 (L to R: sitting) Dr. A K Puri, Sh. Sunil Asthana, Dr. Vineet Kansal, and Dr. Rabins Porwal

GWALIOR (REGION III)
Speaker(s): Jayu S Bhide
Topic and gist: 1 to 3 March 2012 and 14 March 2012: A program on “HAM Radio”
A program on HAM Radio was jointly conducted by I.P.S. College Gwalior & CSI Gwalior Chapter from 1st to 3rd March and later on 12th March by R.J.I.T. Teknanpur. Mr. Jayu S. Bhide spoke and organized a live demonstration of HAM Radio. Attendees learnt how to set up a HAM station. Students asked questions regarding security and operation of HAM Radio and the speaker answered the queries.
Photo: Mr. J S Bhide and students, during HAM Radio practical

CUTTACK (REGION IV)
Speaker(s): Dr. Lalit Mohan Patnaik, Mr. Sushant Panda
Topic and gist: 5-7 March 2012: Conference and Student Convention on “Cloud Computing”
The objective of the conference was to provide an overview of Cloud Computing, its evolution, when and why to use cloud services, some major market players and what they provide, and to familiarize the participants with the software and services available in the Cloud public domain. The first day was an Industry day, the second day was devoted to technical workshops, and on the third day selected R&D papers were presented by the conference participants.
Photo: inauguration of the Conference (L to R): Mr. Sanjay Mohapatra, Prof. (Dr.) R Misra, Prof. (Dr.)
L M Patnaik, IISC Bangalore, Er. S Rout, Mr. Sushant Panda, and Dr. K C Patra

BANGALORE (REGION V)
Speaker(s): Mr. Srikantan Moorthy, Sr. VP & Group Head, Education & Research, Infosys Technologies
Topic and gist: 17 March 2012: i3 for i3 Club Launch @ Infosys Campus, Bangalore
Mr. Srikantan Moorthy delivered the keynote address on “Top Employability Parameters”. Participants took up three key topics mentioned by Mr. Moorthy, viz. A. Building competency among faculty, B. Building competency among students, and C. Improving industry interaction.
Photo: Participants in group discussion

COIMBATORE (REGION VII)
Speaker(s): Dr. Narasimha Murthy K Bhatta and Mr. Mahesh Kolar
Topic and gist: 10 March 2012: Industry Interaction Day on “Future of Indian IT Sector: Trends, Opportunities and Challenges”
A technical session on 'Cloud Computing' was handled by Dr. Narasimha Murthy K Bhatta. The second technical session, on 'Mobile Technologies', was delivered by Mr. Mahesh Kolar. The panel discussion was held on the theme “Future of Indian IT Sector: Trends, Opportunities and Challenges”. Various trends, opportunities and challenges of the Indian IT industry were discussed by the panel members.
Photo: (L to R) Mr. Ashok Bakthavathsalam, Mr. Mahesh Kolar, Mr. R Shekar, Prof. S Balasubramanian, Mr. Kumar Krishnasami, Dr. Narasimha Murthy K Bhatta, and Mrs. Maya Sreekumar

TIRUCHIRAPPALLI (REGION VII)
Speaker(s): Prof. S Ravimaran, Mr. Ramachandran, Dr. S Selvakumar
Topic and gist: 15 March 2012: National Level Technical Symposium on “Emerging Trends in Computing, Informatics and its applications - COMBLAZE 2k12”
Mr. Ramachandran highlighted the importance of communication skills and hard work. He advised the student community to upgrade their skills continuously. Dr. S. Selvakumar briefed on Cyberspace security, Network security, and technology updates in this Cyber era. He explained various security requisites and security measures.
He explained the techniques with real world scenarios, the latest tools and software for security, and mentioned several resources and references to learn more on the subject.
Photo: Dr. Selvakumar at workshop

From Student Branches » http://www.csi-india.org/web/csi/chapternews-May2012

ABES ENGINEERING COLLEGE, GHAZIABAD (REGION-I)
Topic and gist: 24 March 2012: An intra-college technical paper presentation competition (Techsurge-2012)
An intra-college technical paper presentation competition was organized in collaboration with the Ghaziabad Chapter. The objective was to create awareness among students about emerging technologies and encourage them to take up research on related subjects. Approximately 130 students from different courses participated in this activity. A total of 21 papers were presented during Techsurge-2012.
Photo: Guests on dais at ABES College, Ghaziabad

DR. ZAKIR HUSAIN INSTITUTE, PATNA (REGION-II)
Speaker(s): Prof. (Dr.) A K Nayak and Dr. M N Hoda
Topic and gist: 26 March 2012: One-day Seminar on “Twenty First Century Professionals: Industry Expectations”
In his Inaugural Address, Prof. (Dr.) A. K. Nayak advised students to have dedication, devotion & determination to achieve excellence in the profession. Prof. Hoda stressed that quality of computer education is the need of the hour for catering to the industry demand. He told students to make a sincere effort to develop effective ability within themselves, since students passing out are not meeting the expectations of organizations.
Photo: The dignitaries sitting on the dais during the workshop

SARDAR VALLABHBHAI PATEL INSTITUTE OF TECHNOLOGY (SVIT), VASAD (REGION-III)
Speaker(s): Prof. Virendra Ingle and Prof.
Rinku Chavada
Topic and gist: 15-16 March 2012: Workshop on "Android-based Mobile Application Development"
The workshop covered topics like introduction to Android, the anatomy of Android applications, UI screen elements and layout, and Android data and storage APIs as well as Location-based Services APIs.
Photo: Participants at the workshop

R.V. COLLEGE OF ENGINEERING, BANGALORE (REGION-V)
Speaker(s): Mr. Partha and Dr. S Sathyanarayana
Topic and gist: 19 March 2012: Motivational Talk on “Software Testing - Career”
It was an occasion for certificate distribution to students who cleared the “Software Testing Certification Examination”. Mr. Partha said that people look at testing with a different mindset and that we need to think from the perspective of the customer. Testers are better coders. Dr. S Sathyanarayana advised the participants to make better use of opportunities.
Photo: Participants attending motivational talk on “Software Testing”

ANURADHA ENGINEERING COLLEGE, BULDHANA (REGION-VI)
Speaker(s): Prof. Avinash S Kapse and Dr. S V Agarkar
Topic and gist: 28 February 2012: National Science Day Celebration “Project Exhibition & Debate Competitions”
Prof. Avinash S Kapse talked about the importance of projects in the globalization of knowledge and about projects needed by society. Dr. S V Agarkar gave guidance to students and answered their queries. Students explained their projects.
Photo: Inaugural Session: (L to R) Prof. Avinash Kapse, Dr. S V Agarkar, Shri. Siddheshwarji Wanere, and Students

Speaker(s): Mr. N B Mapari, Prof. Avinash S Kapse, Dr. S V Agarkar, and Prof. K H Walse
Topic and gist: 3-4 March 2012: Two-day Workshop on “Understanding and using Android platform”
Prof. Avinash S Kapse talked about the importance of the workshop and appealed to students to improve their personality. He suggested use of the Android technology in future life. He also spoke about globalization of knowledge. Dr. S V Agarkar spoke about the importance of Android, its applications, and its technology. Prof. K. H. Walse talked about the importance of Android technology in the future.
Photo: Inaugural Session: (L to R) Mr.
D G Vyawahare, Dr. Bhattachrayya, Dr. S V Agarkar, Prof. Avinash Kapse, Prof K H Walse, and Mr. Dhaval Gulhane

K. K. WAGH INSTITUTE OF ENGINEERING EDUCATION & RESEARCH, NASHIK (REGION-VI)
Topic and gist: 13-14 March 2012: National Level Technical Symposium "Equinox 2k12"
Various events were conducted, such as:
• CODE-COGS: Programming Contest
• SPIDER-WEB: Web Designing Contest
• TECHNO HUNT: Project Competition
• SCRATCH YOUR BRAIN: Aptitude & Group Discussion
• NET CONNECT: Networking Workshop
• WORLD WAR III: Robo Wars
Photo: Chief Guest Mr. Piyush Somani, Prof. Dr. S S Sane, Faculties, and Student Member

Topic and gist: 17 March 2012: International Conference on “Emerging Trends in Computer Science and Information Technology-2012 (ETCSIT-2012)”
Professionals and academic researchers presented and discussed their conceptual and experimental work. The conference provided a forum for eminent academicians, technologists, scientists and researchers to exchange their ideas on the latest developments and future trends in Computer Science and IT. ETCSIT-2012 also provided a platform for UG and PG students and encouraged them to present their work based on their final year projects.
Photo: (L to R) Prof. N M Shahabe, Dr. Uday Wad, Dr. Parvati Rajan, Dr. Bhargave, Prof. Dr. S S Sane, Mr. Shekhar Paranjape, Prof. S M Kamalapur, Prof. M B Jhade

MET'S INSTITUTE OF ENGINEERING, NASHIK (REGION-VI)
Speaker(s): Dr. M U Kharat, Mr. Shirode, and Dr. V P Wani
Topic and gist: 9-10 February 2012: Student Convention
For the first time in 12 years, the CSI Regional Convention for Region VI was held. Mr. Shirode enlightened students with his experiences of all-round engineering and his 360 degree principle of looking at the world. Dr. V P Wani, with his motivating words, asked the students to give their 100% effort in whatever competition they participate in and make the competition tougher.
During the Convention, IT Quiz, Paper Presentation, Circuit Trap, Website Design Contest, and Group Discussion Contest were organized. (L to R): Dr. Shirish S Sane, Dr. V P Wani, Mr. Shirode, Mr. Anil Shukla, Mr. Mangesh Pisolkar, and Prof. Aruna Deogire
20 March 2012: Project on “MLearning Framework for Multiple Platforms” The MLearning project won first prize in the CSI-Discover Thinking National Project Student Contest and Expo 2012. Arpeet Kale, Saurabh Rawal, Jaspreet Kaur Kohli, and Komal Bafna, students of Computer Engineering, developed this mobile application. They developed a framework that will deliver engineering education on mobiles through high-quality 2D-3D animations, interactive learning content, and many more such features. Winners: Arpeet Kale and Jaspreet Kaur Kohli with Dr. Trimurthi and other Judges
GOVERNMENT ENGINEERING COLLEGE (GEC), BARTON HILL, TRIVANDRUM (REGION-VII)
Mr. Nabeel Koya A, Dr. K C Chandrasekharan Nair, and Mr. Shibin George 15 February 2012: One-day Technical Festival "Inceptra 2012" Mr. Nabeel Koya deliberated on Cyber Security and Forensics in the current scenario. Dr. K C Chandrasekharan Nair talked on Student Entrepreneurship and the opportunities open to students. Mr. Shibin George conducted a general quiz competition. Events included Bug Hunt, a technical competition ranging from cryptography to debugging; LOL Codes, a coding test on rare and useful programming languages; and Cascade Coding, a challenge on parallel programming. Competitions on Project Presentation and Gaming were also conducted as part of the festival. (L to R): Mr. Anand Kumar, Prof. Jayaprakash P, Prof. G Ramachandran, Dr. Sheela S, Prof. Balu John, and Ms. Sreelakshmi G S
JYOTHI ENGINEERING COLLEGE (JEC), THRISSUR, KERALA (REGION-VII)
Dr.
Gylson and Mr. Chaitany Khanpur 24-25 February 2012: Two-day Workshop on "Cloud Computing" Principal Dr. Gylson Thomas inaugurated the two-day National Workshop on "Cloud Computing". Mr. Chaitany Khanpur gave an in-depth, interactive class on cloud computing, starting from the basics of cloud computing and grid computing. Students also got a hands-on session on implementing a private cloud. During the workshop
KALASALINGAM UNIVERSITY, TAMILNADU (REGION-VII)
Dr. Maluk Ahamed and Dr. Kalaiselvi 28-29 March 2012: Digital Dreams ’12 – National Level Technical Symposium Dr. Maluk advised students to acquire knowledge about their field by attending symposiums and seminars and stressed the importance of maintaining quality standards. In the Symposium, 51 papers were presented. Themes were Distributed Computing, Network Technology, Image Processing, and AI techniques. Dr. M A Maluk Ahamed delivered a lecture on Distributed Computing and Dr. Kalaiselvi spoke on “Medical Imaging”. Other events included Technical Quiz, C-Debugging, Trailer Presentation, Web Designing, Situation Manager, and Treasure Hunt. Dr. M A Maluk Mohammed releases the souvenir of Technical Symposium
MAR BASELIOS COLLEGE OF ENGINEERING (MBCET), TRIVANDRUM (REGION-VII)
24 February 2012: Intercollegiate Code Debugging Contest “Neosoft” The competition consisted of two rounds: the prelims and the final round. The prelim was a written round testing the logical and analytical skills of the participants. The final round was a practical round consisting of three questions, testing the logical and innovative thinking and the teamwork of the participating teams. Code Debugging competition in progress
NATIONAL ENGINEERING COLLEGE (NEC), KOVILPATTI (REGION-VII)
Mr. M K Anand 22 March 2012: Inaugural Function – “National Conference NACCA’12” Mr. M K Anand inaugurated the conference and addressed the gathering.
In his speech, he advised the students not only to look for jobs but also to concentrate on self-employment with innovative ideas. The inaugural session was followed by technical sessions in which advanced topics like Grid Computing, Mobile Computing, Soft Computing, and Distributed Computing were presented. Release of Conference Proceedings by Chief Guest Mr. M K Anand (L to R): Ms. E Siva Sankari, Dr. D Manimegalai, Dr. P Subburaj, Mr. M K Anand, Dr. Kn. K S K Chockalingam, and Mr. N BalaSubramanian
The following new student branches were opened, as detailed below:
REGION I
Model Institute of Engineering and Technology (MIET), Jammu - The first CSI student branch in Jammu & Kashmir was inaugurated on 24th March, 2012. On this occasion, a CSI convention on Disaster Management and e-Governance was organized. Two projects by MIET students were showcased on the occasion: “Social Network Promoting Social Responsibility” by Sajan Sridhar and “Election Management” by Sumit Gupta. Prof. Ankur Gupta described several IT initiatives at MIET, including the filing of 3 patents; in-house development of 2 IT products; and 3 open source IT projects pertaining to learning management, campus ERP, and admission management systems.
REGION III
NRI Institute of Technology and Management (NRIITM), Gwalior - Dr. S K Gumasta gave an inaugural speech on the occasion of the opening of a new student branch at NRIITM on 17th February, 2012. A seminar was jointly organized by NRIITM, Gwalior and the CSI Gwalior chapter on this occasion.
REGION V
REVA Institute and Technology Management (RITM), Bangalore - The inauguration of the REVA CSI Student Branch was held on 11th February, 2012. The Chief Guest of the function was T N Seetharamu, who inaugurated the student chapter. On the occasion, Mr.
Suman Kumar delivered a talk on “Android – The Mobile Technology”, which was attended by a large number of students, faculty, and staff members of the college. REGION VI Institute of Management and Entrepreneurship Development (IMED), Pune - On 29th March, 2012 Inaugural ceremony of “IMED-Student Chapter-CSI” was held in the presence of Dr. M S Prasad and Dr. M V Shitole. Chief Guest of the ceremony was Mr. C G Sahasrabuddhe. Mr. Amit Dangle was guest of honor. REGION VII S. Veerasamy Chettiar College of Engineering and Technology, Tirunelveli - The Inaugural function of student branch was organized on 29th February, 2012. The Chairman Dr. V Murugaiah presided over the function. Mr. Y Kathiresan spoke on the occasion on “Personal Effectiveness”. CSI BRINGS MEMBERS AND OPPORTUNITY TOGETHER Computer Society of India is the recognized association for Information and Communications Technology (ICT) professionals, attracting a large and active membership from all levels of the industry. A member of the Computer Society of India is the public voice of the ICT profession and the guardian of professional ethics and standards in the ICT industry. We also work closely with other industry associations, government bodies, and academia to ensure that the benefits of IT advancement ultimately percolate down to every single citizen of India. Membership demonstrates IT professionalism and gives a member the status and recognition deserved. Join CSI Learn more at www.csi-india.org I am interested in the work of CSI. 
Please send me information on how to become an individual/institutional* member Name ______________________________________ Position held_______________________ Address______________________________________________________________________ ______________________________________________________________________ City ____________Postal Code _____________ Telephone: _______________ Mobile:_______________ Fax:_______________ Email:_______________________ *[Delete whichever is not applicable] Interested in joining CSI? Please send your details in the above format on the following email address. [email protected] CSI Communications | May 2012 | 48 www.csi-india.org CSI Calendar 2012 Date Prof. S V Raghavan Vice President & Chair, Conference Committee, CSI Event Details & Organizers Contact Information May 2012 Events 22-26 May 2012 Workshop on Configuring and Administering Microsoft Share Point 2010 CSI Mumbai Chapter Mr. Abraham Koshy [email protected] 24-27 May 2012 Certificate Course on PMP (Project Management) 4.0 (36 Hours of PDU's) CSI Mumbai Chapter Mr. Abraham Koshy [email protected] 26-27 May 2012 Two - Day Workshop on "Secure Computing Systems" CSI Division II [Software] and Military College of Telecommunication Engineering [MCTE], Mhow. Dr. T V Gopal [email protected] June 2012 Events 8-12 June 2012 Hands on workshop on Microsoft Share Point 2010, Application Development CSI Mumbai Chapter Mr. Abraham Koshy [email protected] 13 June 2012 Software Process Information Network (SPIN) Meet on the topic of Advance Agile Methodology (Scrum etc) CSI Mumbai Chapter Mr. Abraham Koshy [email protected] 21-24 June 2012 Certificate Course on PMP (Project Management) 4.0 (36 Hours of PDU's) CSI Mumbai Chapter Mr. Abraham Koshy [email protected] July 2012 Events 26-28 July 2012 International Conference on Advances in Cloud Computing (ACC-2012) CSI, Bangalore Chapter and CSI Division I Dr. Anirban Basu [email protected] Dr. 
C R Chakravarthy [email protected] August 2012 Events 31 Aug-1 Sep 2012 3rd International Conference on Transforming Healthcare with IT CSI Division II (Software), Hyderabad Dr. T V Gopal [email protected] www.transformhealth-it.org September 2012 Events 5-7 September 2012 International Conference on Software Engineering (CONSEG 2012) CSI Division II (Software), Indore 13-14 September Global Science and Technology Forum Business Intelligent Summit and Awards 2012 CSI Division II (Software), Singapore Dr. T V Gopal [email protected] www.conseg2012.org Dr. T V Gopal [email protected] www.globalstf.org/bi-summit November 2012 Events 29 Nov-1 Dec 2012 Third International Conference on Emerging Applications of Information Technology (EAIT 2012) CSI Kolkata Chapter Event at Kolkata, URL: https://sites.google.com/site/csieait2012/ D P Mukherjee/Debasish Jana/ Pinakpani Pal/R T Goswami [email protected] December 2012 Events 1-2 December 2012 47th Annual National Convention of CSI (CSI 2012) Organized by CSI Kolkata Chapter, URL: http://csi-2012.org/ Subimal Kundu/D P Mukherjee/ Phalguni Mukherjee/J K Mandal [email protected] 14-16 December 2012 International Conference on Management of Data (COMAD-2012) SIGDATA, CSI, Pune Chapter and CSI Division II Mr. C G Sahasrabudhe Shekhar_sahasrabudhe@ persistent.co.in Please send your event news to [email protected] . Low resolution photos and news without gist will not be published. Please send only 1 photo per event, not more. Kindly note that news received on or before 20th of a month will only be considered for publishing in the CSIC of the following month. Registered with Registrar of News Papers for India - RNI 31668/78 Regd. No. MH/MR/N/222/MBI/12-14 Posting Date: 10&11 every month. Posted at Patrika Channel Mumbai-I If undelivered return to : Samruddhi Venture Park, Unit No.3, 4th floor, MIDC, Andheri (E). 
Mumbai-400 093
47th Annual National Convention of the Computer Society of India Organized by The Kolkata Chapter December 1-2, 2012, Science City, Kolkata In conjunction with 2012 Third International Conference on Emerging Applications of Information Technology (EAIT-2012) Call for Papers and Participation
Advisory Committee R N Lahiri, Chair Event Chair Subimal Kundu Organizing Committee D P Mukherjee, Chair S Sinha, Co-Chair Program Committee P Mukherjee, Chair J K Mandal, Co-Chair Finance Committee R T Goswami, Chair D Dutta, Co-Chair Convention Committee S Daspal D P Sinha S Roychowdhury Avik Bose Anirudhha Nag Prashant Verma Gurudas Nag Gautam Hajra Md Aliullah Chinmay Ghosh T Chattopadhyay Subir Lahiri Debasish Jana Pinakpani Pal
Convention Website: http://csi-2012.org/ Paper Submission: Aug 30, 2012 Paper Acceptance: Sept 30, 2012
Please contact: CSI Kolkata Chapter 5 Lala Lajpat Rai Sarani (Elgin Road), 4th Floor, Kolkata 700 020 Phone: 2281-4458 Telefax: 2280-2035 Email: [email protected] Web: http://csi-kolkata.org/
Convention Theme: Intelligent Infrastructure
Convention Event: International Conference on Intelligent Infrastructure
The Computer Society of India Kolkata Chapter (CSIKC) cordially invites you to participate in the 47th Annual National Convention of CSI. While this event will follow in the glorious footsteps of previous conventions, it will still be a unique event focussing on the theme of Intelligent Infrastructure.
CSI and CSI Kolkata Chapter: Formed in 1965, the CSI has been instrumental in guiding the Indian IT industry since its formative years. CSIKC is the oldest chapter, and the first CSI Annual National Convention was held in Kolkata at the Indian Statistical Institute in 1965. To commemorate the achievement of CSI, CSIKC will host CSI-2012. The event will comprise Plenary Sessions, Paper Presentations, and Panel Discussions.
Intelligent Infrastructure: Compelling changes in society and nature require unprecedented fusion between the physical and the virtual worlds. Today’s society is a complex system of systems; it is a combination of economic development, public safety, healthcare, energy and utilities, transportation, education and various other systems. The function of intelligent infrastructure is to model as well as manage these complex interconnected systems based on a greater understanding of the interconnectivity and utilisation of the latest developments in ICT. The inter-disciplinary nature of intelligent infrastructure provides a great deal of opportunity for creative approaches to problem solving. The International Conference on Intelligent Infrastructure in CSI-2012 aims to provide a platform for fruitful deliberations on this theme of the hour. The theme includes (but is not limited to) the following topics:
• Intelligent Infrastructure Applications
  ° Precision Agriculture and Smart Growth Systems
  ° Smart Grids and Wide Area Measurement Systems
  ° Intelligent Building Automation Systems
  ° Intelligent Energy and Water Management Systems
  ° Intelligent Manufacturing, Healthcare, Transportation Systems
• Intelligent Infrastructure Technologies
  ° Smart Structures, Federated Devices, Sensor Signal Processing and Modelling
  ° Miniature Wireless Sensors and Networks, Nanoscale Sensors
  ° Security Issues in Smart Infrastructures, Smart GIS
  ° Computational and Machine Intelligence Tools
• Intelligent Infrastructure Platforms
  ° Sensor Web-enablement, Sensor Data Analytics
  ° Management of Big Data and Associated Development Technologies
  ° Next Generation Data Centre Technologies for the Exascale Era
Conference in Conjunction: 2012 Third International Conference on Emerging Applications of Information Technology (EAIT-2012), Nov 29 – Dec 01, 2012, Indian Statistical Institute, Kolkata. EAIT-2012 Website: https://www.sites.google.com/site/csieait2012
Proceedings: Original unpublished research articles, development notes and position papers aligned with the theme of the convention will be published in the Proceedings of the International Conference on Intelligent Infrastructure. The author instructions for paper submission are available at http://csi-2012.org/.
Media Partner for twin mega events EAIT-2012 and CSI-2012
Journal Special Issues: Extended versions of the selected papers presented in the conference will be published in Journals. CSI Journal of Computing (ISSN: 2277-7091) will publish a special issue on Intelligent Infrastructure after FAST TRACK review of selected papers from the conference.