Cover Story Article Article Article Article Research Front Cover Story

₹ 50/ISSN 0970-647X | Volume No. 36 | Issue No. 2 | May 2012
Cover Story
Desi Language Computing on the Rise 5
“Correcting” SMS Text
Automatically 9
Research Front
Article
Approximate/Fuzzy String
Matching using Mutation
Probability Matrices 12
Emails and Web Pages in Local
Languages 14
Article
Article
A Speech-to-Text
System 18
Opinion Mining and
Sentiment Analysis 22
Article
Telemedicine in the State of
Maharashtra: A Case Study
24
www.csi-india.org
Practitioner Workbench
Programming.Tips() »
Passing Variable Number
of Arguments in C 29
Practitioner Workbench
CIO Perspective
Programming.Learn (“Python”) »
Plotting with Python 30
Managing Technology »
Business Information Systems:
Underlying Architectures 31
CSI Communications | May 2012 | B
CSI Communications
Contents
Volume No. 36 • Issue No. 2 • May 2012
Cover Story
5   Desi Language Computing - on the Rise
    Hareesh N Nampoothiri
9   “Correcting” SMS Text Automatically
    Deepak P and L Venkata Subramaniam

Research Front
12  Approximate/Fuzzy String Matching using Mutation Probability Matrices
    Sajilal Divakaran and Achuthsankar S Nair

Articles
14  Emails and Web Pages in Local Languages
    M Jayalakshmi
18  A Speech-to-Text System
    Nishant Allawadi and Parteek Kumar
22  Opinion Mining and Sentiment Analysis
    Jaganadh G
24  Telemedicine in the State of Maharashtra: A Case Study
    Randhir Kumar, Dr. P K Choudhary, and S M F Pasha

Technical Trends
27  Extending WEKA Framework for Learning New Algorithms
    Satyam Maheshwari and Sunil Joshi

Practitioner Workbench
29  Programming.Tips() » Passing Variable Number of Arguments in C
    Dr. Debasish Jana
30  Programming.Learn (“Python”) » Plotting with Python
    Umesh P

CIO Perspective
31  Managing Technology » Business Information Systems: Underlying Architectures
    Dr. R M Sonar

Security Corner
35  Information Security » Cyber Crimes on/by Children
    Adv. Prashant Mali
36  IT Act 2000 » Prof. IT Law Demystifies Technology Law Issues: Issue No. 2
    Mr. Subramaniam Vutha

PLUS
37  ICT@Society: Graphic Texting
    Achuthsankar S Nair
38  Brain Teaser
    Dr. Debasish Jana
39  Ask an Expert
    Dr. Debasish Jana
40  Happenings@ICT: ICT News Briefs in April 2012
    H R Mohan
41  CSI Report
    Prof. Dipti Prasad Mukherjee and Dr. Dharm Singh
43  CSI News

Editorial Board
Chief Editor
Dr. R M Sonar
Editors
Dr. Debasish Jana
Dr. Achuthsankar Nair
Resident Editor
Mrs. Jayshree Dhere
Advisors
Dr. T V Gopal
Mr. H R Mohan

Published by
Executive Secretary
Mr. Suchit Gogwekar
For Computer Society of India

Design, Print and Dispatch by
CyberMedia Services Limited

Please note:
CSI Communications is published by Computer Society of India, a non-profit organization.
Views and opinions expressed in the CSI Communications are those of individual authors,
contributors and advertisers and they may differ from policies and official statements of
CSI. These should not be construed as legal or professional advice. The CSI, the publisher,
the editors and the contributors are not responsible for any decisions taken by readers on
the basis of these views and opinions.
Although every care is being taken to ensure genuineness of the writings in this publication,
CSI Communications does not attest to the originality of the respective authors’ content.
© 2012 CSI. All rights reserved.
Instructors are permitted to photocopy isolated articles for non-commercial classroom use
without fee. For any other copying, reprint or republication, permission must be obtained
in writing from the Society. Copying for other than personal use or internal reference, or of
articles or columns not owned by the Society without explicit permission of the Society or the
copyright owner is strictly prohibited.
Published by Suchit Gogwekar for Computer Society of India at Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093.
Tel. : 022-2926 1700 • Fax : 022-2830 2133 • Email : [email protected] Printed at GP Offset Pvt. Ltd., Mumbai 400 059.
Know Your CSI
Executive Committee (2012-13/14)
»
President
Mr. Satish Babu
[email protected]
Vice-President
Prof. S V Raghavan
[email protected]
Hon. Treasurer
Mr. V L Mehta
[email protected]
Immd. Past President
Mr. M D Agrawal
[email protected]
Hon. Secretary
Mr. S Ramanathan
[email protected]
Nomination Committee (2012-2013)
Dr. D D Sarma
Mr. Bipin V Mehta
Mr. Subimal Kundu
Region - I
Mr. R K Vyas
Delhi, Punjab, Haryana, Himachal
Pradesh, Jammu & Kashmir,
Uttar Pradesh, Uttaranchal and
other areas in Northern India.
Region - II
Prof. Dipti Prasad Mukherjee
Assam, Bihar, West Bengal,
North Eastern States
and other areas in
East & North East India
Region - III
Mr. Anil Srivastava
Gujarat, Madhya Pradesh,
Rajasthan and other areas
in Western India
Region - IV
Mr. Sanjeev Kumar
Jharkhand, Chattisgarh,
Orissa and other areas in
Central & South
Eastern India
Region - V
Prof. D B V Sarma
Karnataka and Andhra Pradesh
Region - VI
Mr. C G Sahasrabudhe
Maharashtra and Goa
Region - VII
Mr. Ramasamy S
Tamil Nadu, Pondicherry,
Andaman and Nicobar,
Kerala, Lakshadweep
Region - VIII
Mr. Pramit Makoday
International Members
Regional Vice-Presidents
Division Chairpersons, National Student Coordinator & Publication Committee Chairman
Division-I : Hardware (2011-13)
Dr. C R Chakravarthy
[email protected]
Division-II : Software (2012-14)
Dr. T V Gopal
[email protected]
Division-IV : Communications
(2012-14)
Mr. Sanjay Mohapatra
[email protected]
Division-V : Education and Research
(2011-13)
Dr. N L Sarda
[email protected]
Division-III : Applications (2011-13)
Dr. Debesh Das
[email protected]
National Student Coordinator
Mr. Ranga Raj Gopal
Publication Committee
Chairman
Prof. R K Shyamsundar
Important links on CSI website »
Structure & Organisation
http://www.csi-india.org/web/csi/structure
National, Regional & State Students Coordinators
http://www.csi-india.org/web/csi/structure/nsc
Statutory Committees
http://www.csi-india.org/web/csi/statutory-committees
Collaborations
http://www.csi-india.org/web/csi/collaborations
Join Now http://www.csi-india.org/web/csi/join
Renew Membership
http://www.csi-india.org/web/csi/renew
Member Eligibility
http://www.csi-india.org/web/csi/eligibility
Member Benefits
http://www.csi-india.org/web/csi/benifits
Subscription Fees
http://www.csi-india.org/web/csi/subscription-fees
Forms Download
http://www.csi-india.org/web/csi/forms-download
BABA Scheme
http://www.csi-india.org/web/csi/baba-scheme
Publications
http://www.csi-india.org/web/csi/publications
CSI Communications*
http://www.csi-india.org/web/csi/info-center/communications
Adhyayan*
http://www.csi-india.org/web/csi/adhyayan
R & D Projects
http://csi-india.org/web/csi/1204
Technical Papers
http://csi-india.org/web/csi/technical-papers
Tutorials
http://csi-india.org/web/csi/tutorials
Course Curriculum
http://csi-india.org/web/csi/course-curriculum
Training Program (CSI Education Products)
http://csi-india.org/web/csi/training-programs
Travel support for International Conference
http://csi-india.org/web/csi/travel-support
eNewsletter*
http://www.csi-india.org/web/csi/enewsletter
Current Issue
http://www.csi-india.org/web/csi/current-issue
Archives
http://www.csi-india.org/web/csi/archives
Policy Guidelines
http://www.csi-india.org/web/csi/helpdesk
Events
http://www.csi-india.org/web/csi/events1
President’s Desk
http://www.csi-india.org/web/csi/infocenter/president-s-desk
* Access is for CSI members only.
ExecCom Transacts
http://www.csi-india.org/web/csi/execcom-transacts1
News & Announcements archive http://www.csi-india.org/web/csi/announcements
CSI Divisions and their respective web links
Division-Hardware
http://www.csi-india.org/web/csi/division1
Division Software
http://www.csi-india.org/web/csi/division2
Division Application
http://www.csi-india.org/web/csi/division3
Division Communications
http://www.csi-india.org/web/csi/division4
Division Education and Research http://www.csi-india.org/web/csi/division5
List of SIGs and their respective web links
SIG-Artificial Intelligence
http://www.csi-india.org/web/csi/csi-sig-ai
SIG-eGovernance
http://www.csi-india.org/web/csi/csi-sig-egov
SIG-FOSS
http://www.csi-india.org/web/csi/csi-sig-foss
SIG-Software Engineering
http://www.csi-india.org/web/csi/csi-sig-se
SIG-DATA
http://www.csi-india.org/web/csi/csi-sigdata
SIG-Distributed Systems
http://www.csi-india.org/web/csi/csi-sig-ds
SIG-Humane Computing
http://www.csi-india.org/web/csi/csi-sig-humane
SIG-Information Security
http://www.csi-india.org/web/csi/csi-sig-is
SIG-Web 2.0 and SNS
http://www.csi-india.org/web/csi/sig-web-2.0
SIG-BVIT
http://www.csi-india.org/web/csi/sig-bvit
SIG-WNs
http://www.csi-india.org/web/csi/sig-fwns
SIG-Green IT
http://www.csi-india.org/web/csi/sig-green-it
SIG-HPC
http://www.csi-india.org/web/csi/sig-hpc
SIG-TSSR
http://www.csi-india.org/web/csi/sig-tssr
Other Links
Forums
http://www.csi-india.org/web/csi/discuss-share/forums
Blogs
http://www.csi-india.org/web/csi/discuss-share/blogs
Communities*
http://www.csi-india.org/web/csi/discuss-share/communities
CSI Chapters
http://www.csi-india.org/web/csi/chapters
Calendar of Events
http://www.csi-india.org/web/csi/csi-eventcalendar
Important Contact Details »
For queries, correspondence regarding Membership, contact [email protected]
President’s Message
Satish Babu
From : [email protected]
Subject : President’s Desk
Date : 1st May, 2012
Dear Members
CSI organized its customary joint ExeCom on 31st March and
1st April, 2012 where the 2011-12 ExeCom demitted office and
the new ExeCom took charge. The ExeCom meeting held on
1st April, 2012, discussed several important policy matters
and also started the process of constitution of the statutory
committees that would steer the activities of CSI during the
year. These yearly start-up processes will be completed by
May at the latest, so that the committees can get going with
their business.
WITFOR: One of the first events of the year that was
supported by CSI, was the 5th IFIP World IT Forum
(WITFOR), held in New Delhi during 16th-18th April, 2012.
The Conference, attended by over 950 delegates and
over 80 speakers from India and abroad, was organized
in partnership with the Department of Electronics and
Information Technology (DEITY), Government of India. The
National Organizing Committee of the Forum was headed
by the Union Minister of Communications & IT, Mr. Kapil
Sibal, who inaugurated the Forum at Vigyan Bhawan. The
speakers at the Conference also included the Minister of
State for Communications & IT, Mr. Sachin Pilot. The 2-day
event focused on the developmental opportunities offered
by digital technologies in the areas of agriculture, education,
e-Gov, and health.
Nashik Chapter’s 25th Anniversary: It is a pleasure to
note that CSI’s Nashik Chapter is entering its 25th year
of activity in 2012. One of the very active chapters of CSI,
Nashik Chapter has been privileged to carry out a number of
important activities for its members and other stakeholders,
and also contribute to the national leadership of CSI. I wish
the Nashik Chapter, its leaders, and members many more
years of adding value to the CSI community and to society
at large.
CSI Events during April: I convey my sincere appreciation
to the organizers of the following events that took place during
the month of April, 2012.
• 4th International Conference on Human Computer
Interaction held during 18th-21st April, 2012 at Symbiosis
Institute of Design (SID), Pune, organized by IFIP TC-13.
Many thanks to Prof. Anirudh Joshi.
• RACSS-2012: International Conference on Recent
Advances in Computing and Software Systems held
during 25th-27th April, 2012 at Dept. of CSE, SSN College
of Engineering, Chennai. I convey my sincere thanks to the
joint organization committee of CSI Chennai Chapter &
Division IV, IEEE Madras Section, and IEEE CS.
As we get going with the current year, it is important to plan
the different events for the year, in particular Conferences,
which form an important segment of our activities and also
contribute to the financial stability of CSI. The formal call for
proposals for events will be put forth shortly, and I request you
to start the process of planning events in your locations.
Chapter AGMs and New Office Bearers: In most chapters of
CSI, the Annual General Meetings have been conducted and
the new chapter Office Bearers have taken charge. CSI is keen
that all chapter Office Bearers - especially those new to CSI -
get adequate support when they require it, particularly about
the conduct of the business of the chapter and the conduct
of events. The key resources for support are your Regional Vice
President and the CSI HQ.
Membership Growth: Membership growth is a high-priority
area for CSI. While the growth in student membership is
satisfactory, the growth in professional and institutional
membership has potential for improvement. We are examining
different mechanisms to enhance professional membership
and attract the new IT professional to CSI. One of the means
of doing this is to join hands with other societies, including
international societies, to provide additional value to our
members. Another mechanism being explored is the use of
social media to build a more accessible community. We hope
to put in place some of these steps over the next two months
to stimulate membership growth.
Kindly contact your RVPs and the CSI HQ Helpdesk
(helpdesk@csi-india.org) for any aspect where you need support.
With greetings
Satish Babu
President
Editorial
Rajendra M Sonar, Achuthsankar S Nair, Debasish Jana and Jayshree Dhere
Editors
Dear Fellow CSI Members,
It is a pleasure to bring you this issue of CSIC with its cover story on
‘Linguistic Computing’. Computers have affairs with both
programming languages and natural languages. With the wider
penetration of ICT in society, especially in the form of mobile
phones, the affair with natural languages is becoming more
central. While in the case of the programming languages it was
the programmer who was struggling, in case of natural language
computing, the challenge is really for the computer.
In a country like India, which is a linguistic cauldron, the
problem of linguistic computing is amplified. Organised efforts
are on in India towards this end. Technology Development for
Indian Languages (TDIL) programme launched by the Ministry
of Communication & Information Technology (MC&IT), Govt. of
India aims at developing systems to facilitate human-machine
interaction without language barrier; creating and accessing
multilingual knowledge resources; and integrating them to
develop innovative user products and services. The programme
also promotes language technology standardization through
participation in ISO, UNICODE, World-Wide-Web consortium
(W3C) and BIS (Bureau of Indian Standards). Of course, Google
is an important player in the scene as the whole world and its
languages are of concern to it.
In this issue we have an assortment of articles ranging from
basic settings and services related to the use of language on the
web and in mobile phones to selected specialised applications
such as sentiment analysis. (Readers will have noted that the
cover page depicts the CSI web site translated into various
Indian languages by on-line tools.)
Hareesh Nampoothiri, in his cover story article titled “Desi
Language Computing - on the Rise”, introduces basic desi-language
settings and services in computers and mobile phones. Another
cover story article, “‘Correcting’ SMS Text Automatically” by
Deepak P and L Venkata Subramaniam of IBM Research, provides
insight into the challenges that textese, or SMS language, with its
unusual abbreviations, shortenings and omissions, poses to
conventional electronic processing of text.
Research Front column brings an article titled “Approximate/
Fuzzy String Matching using Mutation Probability Matrices” by
Sajilal D and Achuthsankar S Nair. The article addresses fuzzy/
approximate string matching in Indian languages. Three other
articles on the cover topic appear as specialised articles in the Articles
section. The article on “Emails and Web Pages in Local Languages”
by M. Jayalakshmi supplements and complements the first cover
story article. Mr. Nishant Allawadi and Prof. Parteek Kumar of
Thapar University, in an article titled “A Speech-to-Text System”,
present speech-to-text conversion using the Hidden Markov Model
(HMM). The concept of sentiment analysis is introduced briefly by
Jaganadh G in his article titled “Opinion Mining and Sentiment
Analysis”.
Articles section also includes an article titled "Telemedicine
in the State of Maharashtra: A Case Study" by S M F Pasha, Randhir
Kumar and Dr. P K Choudhary based on their paper submitted at
SEARCC 2011. Technical Trends section is enriched with an article
on “Extending WEKA Framework for Learning New Algorithms”
by Mr. Satyam Maheshwari and Mr. Sunil Joshi.
Practitioner Workbench column has a section titled
Programming.Tips() and it provides an interesting write-up on
“Passing Variable Number of Arguments in C” by Dr Debasish
Jana. The other section, called Programming.Learn("Python"),
under Practitioner Workbench includes information about "Plotting
with Python".
Managing Technology section of the CIO Perspective
column includes an article titled “Business Information Systems:
Underlying Architectures” by Dr. RM Sonar. It is the third article in
the series of articles on Business Information Systems. It throws
light on various types of architecture starting from single-tier to
web-based multi-tier architecture and discusses key benefits and
key issues of the respective systems.
Information Security section of the Security Corner feature
has an article titled “Cyber Crimes on/by Children” written by
Advocate Prashant Mali. The article starts with two cases and
then goes on to explain how a child can be at risk in cyber
space and how a computing platform can be used by children to
commit crime. The IT Act section under Security Corner comes
with an article by Advocate Mr. Subramaniam Vutha, wherein he
demystifies technology law and provides inputs on electronic
(Internet-based) contract.
Our ICT@Society covers a curio theme "Graphic Texting". As
usual there are other regular features such as Brain Teaser, Ask an
Expert and Happenings@ICT. CSI Reports and CSI News are about
various region, SIG, chapter and student branch events.
Please note that we welcome your feedback, contributions
and suggestions at [email protected].
With warm regards,
Rajendra M Sonar, Achuthsankar S Nair,
Debasish Jana and Jayshree Dhere
Editors
Cover Story
Desi Language Computing - on the Rise
Hareesh N Nampoothiri
University of Kerala, Thiruvananthapuram
English was the first language that got placed in modern computer systems and naturally got accommodated exclusively,
to the disadvantage of the other world languages. From the mnemonics used in assembly language, to the programming
language keywords, to operating system commands, English embedded itself. Some early programming languages
like COBOL almost sounded like the English of non-native speakers of the language. It is easy to weave an Anglo-centric
conspiracy story, but in all fairness to the professionals of yesteryear, it must be remembered that computers
were not foreseen then as gizmos that ordinary citizens all over the world would own. As the popularity of
notebooks, netbooks, and mobile devices shot up, the language problem began to take centre stage and, naturally,
multiple solutions began to emerge. Perhaps the turning point in language computing was the emergence of Unicode.
Unicode is simply a computing industry standard for the consistent encoding, representation, and handling of text
expressed in most of the world's writing systems[1]. It set the stage for an organized development of a large number of
linguistic computing issues.
Even though the first version of Unicode was introduced in October 1991, it became popular only in the last decade.
As of now, Unicode supports a long list of languages including Indian languages such as Bengali, Hindi, Kannada,
Malayalam, Oriya, Tamil, Telugu etc. Software developers now come up with language packs for different regions,
and computers are becoming truly desi in this respect. An example is Microsoft's CLIP (Captions Language Interface Pack)
for Visual Studio 2010; the author was associated with the development of one such language interface pack.
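To make the idea of a consistent encoding concrete, here is a minimal Python sketch that lists the Unicode code point, official character name, and UTF-8 bytes of a few characters. The sample characters and the helper name `describe` are illustrative choices made for this sketch; they do not come from any of the tools discussed in this article.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name, UTF-8 bytes as hex) per character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"          # the unique number Unicode assigns
        name = unicodedata.name(ch)              # official character name
        utf8_hex = ch.encode("utf-8").hex()      # how the character is stored as bytes
        rows.append((ch, code_point, name, utf8_hex))
    return rows

# One Latin, one Devanagari (Hindi), and one Tamil character.
for row in describe("Aहத"):
    print(row)
```

Whatever the platform or application, the Devanagari letter above is always U+0939; only the byte-level encoding (UTF-8 here) is a separate choice.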
Apart from reaching a wider audience through incorporating as many languages as possible, Unicode also opens a
wide range of possibilities for developers and service providers to come up with language-based tools and applications
for the common man. It is not surprising that Google is the one in the lead, tapping the possibilities in this sector. We
introduce below a few of the language-based tools from Google.
Google Translate
Google Translate is a free translation
service from Google, which provides
instant translations between 65 different
languages (as of Apr 2012) including some
of the major Indian languages like Bengali,
Gujarati, Hindi, Tamil, Telugu, and Urdu.
Google Translate enables users to
translate words, paragraphs of text, or
a whole website (using the Translator
Toolkit) from one language to another.
According to Google the service aims to
make information universally accessible
and useful, regardless of the language in
which it’s written[3].
How does it work?
Google describes the working of Google
Translate as follows: When Google
Translate generates a translation, it looks for
patterns in hundreds of millions of documents
to help decide on the best translation for you.
By detecting patterns in documents that
have already been translated by human
translators, Google Translate can make
intelligent guesses as to what an appropriate
translation should be. This process of seeking
patterns in large amounts of text is called
"statistical machine translation". Since the
translations are generated by machines,
not all translation will be perfect. The more
human-translated documents that Google
Translate can analyse in a specific language,
the better the translation quality will be. This
is why translation accuracy will sometimes
vary across languages[3].
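The idea of guessing a translation from patterns in human-translated documents can be illustrated with a toy sketch. This is not Google's system: the tiny parallel corpus below is made up for illustration, and the function names `train` and `translate` are hypothetical. The sketch simply counts which target phrase human translators used most often for a source phrase and returns that one.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made "human translations": (English phrase, Hindi phrase).
PARALLEL_CORPUS = [
    ("thank you", "धन्यवाद"),
    ("thank you", "शुक्रिया"),
    ("thank you", "धन्यवाद"),
    ("good morning", "सुप्रभात"),
]

def train(pairs):
    """Count how often each target phrase was used for each source phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def translate(table, phrase):
    """Return the most frequently seen translation, or None for an unseen phrase."""
    if phrase not in table:
        return None
    return table[phrase].most_common(1)[0][0]

table = train(PARALLEL_CORPUS)
print(translate(table, "thank you"))    # the variant seen twice wins
print(translate(table, "good night"))   # unseen phrase: no guess possible
```

Even in this toy form, the quoted observation holds: the more human-translated examples there are for a language pair, the more phrases the table covers and the more reliable the counts become.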
What is Unicode?
In the early days, there were many different encoding systems for characters used in computers. These encoding
systems used to conflict with one another. That is, two encoding systems might use the same number to represent
two different characters, or they might use different numbers for the same character. As a result, any given
computer was required to support many different encoding systems, and even then the chances of data getting
corrupted were very high.
To solve this issue, Unicode provides a unique number for every character irrespective of the platform,
application, or language. The Unicode Standard has been adopted by most of the leading players of the industry such as Apple,
Microsoft, Oracle, IBM, Sun etc. Also, it is required by modern standards such as XML, Java, JavaScript, WML etc. It is supported in
many operating systems (including Linux distributions), all modern browsers, most of the recent versions of office suites, and many
other applications.
The Unicode Consortium, a non-profit organization, is dedicated to developing, extending, and promoting the use of the Unicode standard.
According to them, the advantage of using Unicode is:
Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy
character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries
without re-engineering. It allows data to be transported through many different systems without corruption[2].
In Practice
Let's see how it becomes useful in practice
by trying to translate a simple paragraph
from English to Hindi (Fig. 1). Of course, it
does not produce a grammatically correct
translation, but it does produce a useful
text in Hindi. Apart from providing the
translation of the text, it also provides the
phonetic rendition of the text in English.
One can hear the translated text by
clicking the speaker icon.
Fig. 1: Google Translator Page: http://translate.google.com/
There is also an option to rate the
resulting translation by clicking the tick
mark. One can rate a particular translation
as Helpful, Not helpful, or Offensive. The
tool also offers alternative translations
and an option to re-order blocks of words
for reconstructing the translated sentence
(Fig. 2).
Fig. 2: The tool suggests alternative translations when the user clicks and holds on a block of words
So what about translating from an Indian language to English? For that, we need to type in the text in the
required Indian language. There is another tool from Google, Google Transliteration (still in labs), that
helps you type in other languages without learning the actual keys corresponding to the alphabets of
that particular language. Here we will type 'mera bhArath mahaan' to get 'मेरा भारत महान' in Hindi.
The transliteration window (Fig. 3) provides the required options to edit and format the text. Google
provides a transliteration API that helps developers enable transliteration facilities in their websites.
The transliteration API is incorporated in Google Translate as well. When a language other than
English is selected in the Google Translate source window, an option to enable phonetic typing will be
available. By enabling the option, one can directly type in the required text in the Translate source
window itself. Another option is to copy-paste the typed text from the Google Transliteration window.
Fig. 3: Google Transliteration window
Note: Apart from Google Transliteration, there are many online and offline tools available that will
help you to type in text in Indian languages. For Windows-based systems one may use Indic Input 2 (for
Windows Vista / 7) or Indic Input 1 (for Windows XP). By installing this tool, one can type in text in any
text editor (such as Notepad, Wordpad, LibreOffice Writer etc.) by enabling the phonetic keyboard and
selecting the appropriate Unicode font. The tool can be downloaded freely from the BhashaIndia website.
URL: http://bhashaindia.com/Downloads/
Here are some amusing translation examples: the lyrics of a Hindi film song (Fig. 4) and our national
anthem (Fig. 5). When the Hindi film song lyrics are translated, the tool produces acceptable results, but
the translation of the national anthem is amusing, to say the least. In short, for simple functional
sentences it produces better translations, and for creative writing (such as poems) the results may not
be of much use.
Fig. 4: Hindi film song lyrics translated to English
Fig. 5: National Anthem translated from Bengali to English
Developers can integrate the application in their websites, and it automatically translates the website
to another language according to the choice selected by the user (Fig. 6). Even though the tool does not
produce acceptable results all the time, it will be useful in translating websites to local languages (or
foreign languages) using the Translator Toolkit provided by Google. At least the users will get some
idea about the contents of the website instead of seeing the website in some alien language.
What if the website does not provide a translation option by default? Still, it is possible to view the
website in a language of your choice. For example, the Computer Society of India website does not have an
option to switch between languages. But it is still possible to display the website in Hindi or in any one
of the 65 languages provided by Google Translate. If you wish to see the CSI website in Hindi, enter the
following URL in the address bar:
http://translate.google.com/translate?tl=hi&u=http://www.csi-india.org
The tl (target language) parameter corresponds to the language of your choice (hi for Hindi, ta for Tamil,
bn for Bengali and so on) and u is the URL of the website you wish to translate. The translated version of
the CSI website is shown in Fig. 7.
Alternatively, you may install Google Toolbar or get a bookmark for your language from the Tools and
Resources page. URL: http://translate.google.com/translate_tools
Mobiles & Tablets Too Go Desi!
It is not happening with computers alone. Most modern mobile devices (smartphones, tablets etc.) boast
more processing power than the computers we had three decades back. The Apple Lisa[4] (released in Jan 1983),
one of the first personal computers to offer a GUI, ran on a Motorola 68000 processor at 5 MHz. Now a
mid-range smartphone, the Motorola Defy, has an 800 MHz processor. While the Apple Lisa had 1 MB of RAM
(Apple introduced a 10 MB internal hard disk drive only with the Lisa 2!), the Motorola Defy has 512 MB
of RAM and 2 GB of internal storage, and it supports microSDHC cards of up to 32 GB! The tablets currently
available in the market are even more powerful, and we may consider them miniature computers, the only
difference being the lack of input devices like a keyboard and mouse (of course, these too can be added via
Bluetooth or USB!). Mobile devices are becoming more popular, and manufacturers are trying to reach the
mass public by incorporating local language support in their devices. Clearly, the 'desification' is not
going to happen in computers alone; it will extend to mobile devices as well.
Fig. 6: Sample website with Google Translate enabled using the API. When the user scrolls over the text, the original text will be displayed as a tool-tip dialogue
Fig. 7: CSI website translated to Hindi
Many of the devices produced by cell phone/tablet manufacturers like Nokia, Sony, Samsung, LG, Motorola
etc. already allow users to select a language for the phone interface. Entering and displaying Indic
languages directly in mobile devices (for sending messages, for contact details, for writing notes etc.)
is still in the development stages. Apple, the leading mobile device manufacturer, provides local language
support in its iPhones and iPads based on the iOS mobile operating system. Even though many devices from
other manufacturers do not have native support for Unicode, there are device-specific work-arounds
available for incorporating Unicode functionality in those devices, especially for devices based on the
Android platform. Android-based devices from Samsung, LG etc. come with support for Indian languages by
default. In some mobiles, the Hindi alphabets are printed on the keypad itself along with the English
alphabets to make entering text as easy as possible. Fig. 8 shows Hindi text produced using Google
Translate on a low-end Android mobile phone from LG. The text produced is then copy-pasted into a message
and sent. If the party receiving the message has a mobile device with Unicode support, the text will be
rendered correctly; otherwise the receiver will get a series of squares instead of the actual message.
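The "series of squares" effect can be imitated with a small Python sketch. It is only a simulation under stated assumptions: real devices decide per font and per glyph, whereas this sketch uses a hypothetical set of supported scripts and takes the first word of each character's Unicode name (for example DEVANAGARI) as a rough script label. The names `script_of` and `render_preview` are made up for this illustration.

```python
import unicodedata

def script_of(ch):
    """Rough script label: first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def render_preview(text, supported_scripts):
    """Replace characters from unsupported scripts with a square, keeping spaces.

    Simplification: punctuation and digits are treated like any other
    character, so they also need an entry in supported_scripts.
    """
    out = []
    for ch in text:
        if ch.isspace() or script_of(ch) in supported_scripts:
            out.append(ch)
        else:
            out.append("□")
    return "".join(out)

message = "मेरा भारत महान"
print(render_preview(message, {"LATIN", "DEVANAGARI"}))  # device with Hindi support
print(render_preview(message, {"LATIN"}))                # device without it
```

The message itself is identical in both cases; only the receiving side's ability to display Devanagari characters differs, which is why the sender cannot tell in advance how the text will appear.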
It is clear that developments
in Indian language computing have largely moved
to web and mobile platforms
rather than stand-alone applications
on PCs. The demand for these tools now
arises from the common man and not from
businesses or universities. That explains the
vibrancy of this field in current times.
References
[1] Wikipedia - Unicode
http://en.wikipedia.org/wiki/Unicode
[2] What is Unicode?
http://www.unicode.org/standard/WhatIsUnicode.html
[3] About Google Translate
http://translate.google.com/about/intl/en_ALL/
[4] Wikipedia - Apple Lisa
http://en.wikipedia.org/wiki/Apple_Lisa
Fig. 8: Hindi text displayed on an Android mobile phone
About the Author
Hareesh N Nampoothiri is a visual design consultant with more than a decade of experience, who has worked with
government organizations like C-DIT, C-DAC, University of Kerala, and other private organizations. Currently, he
is doing interdisciplinary research in ethnic elements in visual design in computer media. He is the author of two
books on graphic design and a regular contributor to leading technology magazines including CSI Communications.
Kathakali, blogging, and photography are his passions. He has directed a documentary feature on Kathakali and also
directed an educational video production for IGNOU, New Delhi.
Cover
Story
Deepak P* and L Venkata Subramaniam**
* IBM Research - India, Bangalore; [email protected]
** IBM Research - India, New Delhi; [email protected]
“Correcting” SMS Text Automatically
Abstract
With the rapidly increasing penetration of
mobile phones and microblogging, texting
language is fast becoming the language
of the youth. Characterized by unusual
abbreviations, shortening, and omissions,
textese or SMS language poses a challenge
to conventional electronic processing of
text. In this article, we present an overview
of recent work on automatically cleaning
SMS text.
Introduction
SMS language, also called textese,
is becoming increasingly popular
with widespread usage of SMS and
microblogging sites to share information.
Normalization of text written in such lingo,
i.e. conversion to their clean versions, is a
necessary prerequisite to enable electronic
processing of such text. Conversion of
SMSes to non-noisy versions would
aid improved speech synthesis to help
visually impaired mobile phone users.
Clean SMSes can be accurately translated
automatically, thus enabling seamless
SMS communication between users of
different natural languages.
Noise in text is defined as any kind
of difference in the surface form of an
electronic text from the intended, correct,
or original text[6]. Under such a definition,
SMS language would qualify to be very
noisy. The types of noise in SMS text have
been classified[1,6] into various categories
such as character deletion, phonetic
substitution, and word deletion. Common
categories of noise and their examples at
the word or phrase level are tabulated in
Fig. 1. Many a time, combinations of noise
categories may be used to shorten long
words. For example, tomorrow may often be
transformed to 2mro using a combination
of phonetic substitution (“to” transformed
to “2”) and character deletion. The same
word may be transformed by different
users to different kinds of noisy variants.
The single word, tomorrow, was observed
to manifest in 16 different forms[3,7] in a
corpus of thousand SMSes; a few of them
are illustrated in Fig. 2.
SMS normalization refers to the
task of converting SMS text that could be
noisy into its intended non-noisy form.
Thus, an SMS normalization technique
could potentially transform the noisy
SMS itll b gud 2 c u tonite to the clean
version it will be good to see you tonight.
Most SMS normalization techniques
need a set of noisy SMSes and their clean
versions that may have to be manually
generated, referred to as the training
set. A machine learning algorithm then
works on such pairs to learn a model. This
learning process is illustrated in Fig. 3. A
simplistic learner may simply learn a set of
conditional probabilities as a model, with
p(w’|w) denoting the probability that the
noisy word w is actually a variant of the
non-noisy word w’:
\[
p(w' \mid w) = \frac{\#\,\text{SMSes where } w \text{ and } w' \text{ occur in the noisy and clean version respectively}}{\#\,\text{SMSes where } w \text{ occurs in the noisy version}}
\]
The normalization phase uses the
learned model to normalize (clean) a
noisy input SMS and output the clean
SMS. Our simple model could be used to
replace each word, w, in the noisy SMS by
the word v such that p(v|w) is maximum
among the conditional probabilities
involving w, i.e. p(.|w). An illustration
of the normalization phase appears in
Fig. 4. State-of-the-art techniques use
more sophisticated models than the simple
formulation of conditional probabilities
outlined above. We will outline techniques
that use statistical machine translation
(SMT) and spelling correction-based
models in the remainder of the paper.
Fig. 2: Noisy variants of “tomorrow”: 2moro, tomm, tomoro, tomorow, 2mro, tomra, tom, morrow, tomora, tomo, tomrw
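The simple conditional-probability learner and the replace-by-maximum normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `learn_model` and `normalize` are hypothetical, words are split on whitespace, and the two training pairs are the toy pairs used later in the article.

```python
from collections import Counter, defaultdict


def learn_model(pairs):
    """Estimate p(w'|w) from (noisy_sms, clean_sms) training pairs."""
    noisy_count = Counter()             # SMSes whose noisy version contains w
    joint_count = defaultdict(Counter)  # SMSes with w (noisy) and w' (clean)
    for noisy, clean in pairs:
        clean_words = set(clean.split())
        for w in set(noisy.split()):
            noisy_count[w] += 1
            for w_clean in clean_words:
                joint_count[w][w_clean] += 1
    return {
        w: {wc: n / noisy_count[w] for wc, n in counts.items()}
        for w, counts in joint_count.items()
    }


def normalize(sms, model):
    """Replace each noisy word w by the clean word v with the highest p(v|w)."""
    cleaned = []
    for w in sms.split():
        candidates = model.get(w)
        # Words never seen in training are left unchanged (an assumption).
        cleaned.append(max(candidates, key=candidates.get) if candidates else w)
    return " ".join(cleaned)


training_pairs = [("ma hse", "my house"), ("ma buk", "my book")]
model = learn_model(training_pairs)
print(model["ma"])             # 'my' scores 1.0; 'house' and 'book' score 0.5
print(normalize("ma", model))  # my
```

With so little data, hse co-occurs equally often with my and house, so this model cannot tell the two apart; that ambiguity is what the alignment-based models in the following sections are meant to resolve.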
Statistical Machine Translation
Type of Noise            Example
Character deletion       “message” → “msg”
Phonetic substitution    “to” → “2”
Abbreviation             “laugh out loud” → “lol”
Informal usage           “going to” → “gonna”
Word deletion            “driving back home” → “drivin hm”
Fig. 1: Types of noise in SMS text

Fig. 3: Learning process ([Noisy SMS, clean SMS] pairs → Learner → Learned model). Example pairs:
“btw, r u goin 4 d movie” / “by the way, are you going for the movie?”
“itll b gud 2 c u tonite” / “it will be good to see you tonight”
“lemme no wen u gt thr” / “let me know when you get there”

We now use a toy example to illustrate
how a simple SMT model[2] may be used
to learn the mappings between words and
their noisy variants using the training set
of SMS pairs. Consider two hypothetical
noisy SMSes, ma hse and ma buk, which
map to their correct variants my house and
my book respectively. We will not make
any assumptions on the preservation of
word ordering in the noisy variant of the
clean SMS. Thus, we have two possible
word alignments for the [ma hse, my house]
pair that we will initialize to being equally
likely. A word alignment for a training SMS
is a mapping from each word in the noisy
version to a word in the clean version.
Such an initial configuration of SMS word
alignments is depicted in Table 1.

SMS1: [ma, my] [hse, house] = 0.5
SMS1: [ma, house] [hse, my] = 0.5
SMS2: [ma, my] [buk, book] = 0.5
SMS2: [ma, book] [buk, my] = 0.5
Table 1: Initial word alignment configuration

        my      house   book
ma      1.0     0.5     0.5
hse     0.5     0.5     0.0
buk     0.5     0.0     0.5
Column-wise normalization
        my      house   book
ma      0.50    0.50    0.50
hse     0.25    0.50    0.00
buk     0.25    0.00    0.50
Table 2: Populated word-word table
Now, we will use these word
alignment probabilities to populate the
word-to-word mapping probabilities
between the noisy word vocabulary [ma,
hse, buk] and the correct vocabulary
[my, house, book]. Since the
mapping [ma, my] occurs
in two different alignments,
each with confidence 0.5,
we will initialize the mapping
to have a confidence of
1.0. Similarly, all pairs are
initialized to the sum of
confidences of all alignments
in which they occur.
Such a matrix, shown in
Table 2, is then normalized
column-wise so that each word in the
target vocabulary (i.e. vocabulary of
clean SMSes) has values summing up
to unity. Such a process of creation and
normalization of the word-word mapping
probability tables is illustrated in Table 2.
In an iterative style, the word-mapping
probabilities may now be used
to compute refined word alignments for
training SMSes. The confidence of each
alignment is computed as the product
of the word mappings contained in
the alignment. Thus, the {[ma,house]
[hse,my]} alignment of SMS1 is assigned
a confidence of 0.125 (product of
0.50 from [ma,house] and 0.25 from
[hse,my]). These are then normalized so
that the confidences of all alignments for
a single SMS sum up to unity. Table 3
illustrates this process of refinement
of word alignment confidences. These
can then be used to estimate new word-word
mapping probabilities followed
by estimation of new alignment
probabilities; such an iterative process
leads to a final converged matrix
approximately of the form shown in
Table 4. Thus, an iterative sequence of
estimating word-alignment probabilities
and word-word mappings enables us
to drill down to the correct mappings
[ma → my, hse → house, buk → book]
that can then be used to convert a new
SMS to its clean version in a word-by-word
manner. Though such a simplistic
translation model (called IBM Model 1)
is very popular, sophisticated SMT
models that can learn many-to-many
mappings between words are often used
to achieve more accurate mappings.

SMS1: [ma, my] [hse, house] = 0.50 * 0.50 = 0.250
SMS1: [ma, house] [hse, my] = 0.50 * 0.25 = 0.125
SMS2: [ma, my] [buk, book] = 0.50 * 0.50 = 0.250
SMS2: [ma, book] [buk, my] = 0.50 * 0.25 = 0.125
Normalization of word-alignment probabilities per training SMS
SMS1: [ma, my] [hse, house] = 0.250/(0.250+0.125) = 0.67
SMS1: [ma, house] [hse, my] = 0.125/(0.250+0.125) = 0.33
SMS2: [ma, my] [buk, book] = 0.250/(0.250+0.125) = 0.67
SMS2: [ma, book] [buk, my] = 0.125/(0.250+0.125) = 0.33
Table 3: Modified word alignments for SMSes

        my      house   book
ma      0.99    0.00    0.00
hse     0.00    0.99    0.00
buk     0.00    0.00    0.99
Table 4: Converged word-mapping probabilities

Fig. 4: Normalization process (Noisy SMS → Model applier, using the learned model → Cleaned SMS).
Example: “wot a match, luvd evry bit o it” → “what a match, loved evry bit of it”
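The iterative procedure of the toy example (sum the alignment confidences into a word-word table, normalize it column-wise, rescore the alignments, repeat) can be sketched as follows. This follows the worked example in the text, where each alignment pairs every noisy word with a distinct clean word; it is not a full IBM Model 1 implementation, and the function name `train` and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def train(pairs, iterations=20):
    """pairs: list of (noisy_words, clean_words) with equal-length word lists."""
    # Start with every one-to-one alignment of an SMS being equally likely.
    alignments = []
    for noisy, clean in pairs:
        options = [tuple(zip(noisy, perm)) for perm in permutations(clean)]
        alignments.append({a: 1.0 / len(options) for a in options})

    table = {}
    for _ in range(iterations):
        # Word-word table: sum of confidences of alignments containing the pair.
        counts = defaultdict(float)
        for confidences in alignments:
            for alignment, conf in confidences.items():
                for pair in alignment:
                    counts[pair] += conf
        # Column-wise normalization: values for each clean word sum to one.
        column_total = defaultdict(float)
        for (_, clean_word), value in counts.items():
            column_total[clean_word] += value
        table = {(n, c): v / column_total[c] for (n, c), v in counts.items()}
        # Rescore each alignment as a product, then normalize per SMS.
        for confidences in alignments:
            for alignment in confidences:
                score = 1.0
                for pair in alignment:
                    score *= table.get(pair, 0.0)
                confidences[alignment] = score
            total = sum(confidences.values())
            if total > 0:
                for alignment in confidences:
                    confidences[alignment] /= total
    return table, alignments


pairs = [("ma hse".split(), "my house".split()),
         ("ma buk".split(), "my book".split())]
table, alignments = train(pairs, iterations=1)
print(table[("ma", "my")], table[("hse", "my")])  # 0.5 0.25, as in Table 2
```

A single iteration reproduces the normalized values of Table 2 and the refined alignment confidences of Table 3 (about 0.67 and 0.33); running more iterations pushes the ma/my, hse/house and buk/book entries towards 1, in line with Table 4.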
SMT-based Approaches to SMS
Normalization
The SMT paradigm has been found to
be the most effective among the various
paradigms that have been tried for SMS
normalization. An adaptation of the
traditional SMT models[2] was first used
for SMS normalization to learn phrase-based
alignments between the SMS
and a candidate clean text. This uses a
phrase-based model instead of the word-based
model described above and learns
mappings between phrases in clean text
and phrases in SMSes using an iterative
approach. A comparative study of SMS
normalization approaches[5] finds that
SMT-based systems are significantly less
error-prone than other approaches. Even
in cases where a training set of noisy
and clean SMS pairs is unavailable, the
machine translation paradigm[4] has been
used by creating a pseudo-translation
model based on heuristic estimation
of SMS word to clean word mappings.
Fig. 5: Word HMMs for SMS normalization: (a) Graphemic path (states G1 ‘T’, G2 ‘O’, G3 ‘D’, G4 ‘A’, G5 ‘Y’ between S0 and S6, each also able to emit ‘@’), (b) Phonemic path (states P1 /T/, P2 /AH/, P3 /D/, P4 /AY/, with S1 emitting “2”), (c) Cross-linked
Hidden Markov Models for SMS
Normalization
Another paradigm that has been explored
for SMS normalization is to model
omissions and noisy variations explicitly.
Towards this, an HMM-based word
model[3] is constructed for each word in
a training set of words. A hidden Markov
model may be considered as a set of
interconnected states, each of which may
emit certain values based on its output
probabilities, which are then seen in the
output. In the formulation proposed in
Choudhury et al.[3], the noisy variant of a
word is considered to be emitted from the
word’s HMM.
Consider the word today; the
ordered set of graphemes within it
is [‘t’, ‘o’, ‘d’, ‘a’, ‘y’] whereas the
corresponding set of phonemes is [/T/,
/AH/, /D/, /AY/]. Fig. 5(a) represents an
HMM constructed out of the graphemes
(characters, in our context). This is
represented as a linear sequence of
hidden states, each state corresponding
to a token in the grapheme set. In a non-noisy
version, each HMM state would
emit the corresponding token; thus, a
left-to-right HMM would always emit the
correct word. However, since noise is what
is to be modeled, each state is formulated
to be able to emit either the corresponding
grapheme, any other token (represented
by ‘@’ in the figure), or nothing at all
(represented as ε). A similar phonemic
HMM is represented in Fig. 5(b).
The transformation of a phoneme to a
grapheme is itself noisy, and thus, the
emission set only includes the graphemes
that could possibly map to the phoneme
associated with the state. The “to” part
in “today” may be transformed to the
numeral “2” due to phonemic similarity,
and Fig. 5(b) shows how that is accounted
for in the phonemic HMM.
The graphemic and phonemic HMMs
are cross-linked intuitively to produce a
single HMM as shown in Fig. 5(c) (emission
graphemes are omitted in the figure to
reduce clutter). Each clean word, along
with its noisy variants, is used as a training
corpus to learn the transition probabilities
and emission probabilities. For example,
at the end of the training, state G1 may
have an emission probability distribution
[‘T’:0.8, ε:0.1, @:0.1] and an onward state
transition distribution as [G2: 0.6, P2: 0.4].
Such learnt HMMs are then post-processed
and harnessed using standard techniques
to decode the “clean” version from a
noisy word. Such word-level cleansing is
aggregated to achieve normalization of
an SMS text to its clean version.
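To make the graphemic path of Fig. 5(a) concrete, the sketch below computes the probability that a linear left-to-right grapheme HMM for a clean word emits a given noisy string, where each state emits its own grapheme, some other token, or nothing. It is a simplification rather than the model of Choudhury et al.: every state shares the same probabilities (0.8, 0.1, 0.1, borrowed from the G1 example above), `p_other` is applied to whichever wrong character is observed, and the phonemic path and cross-links are left out. The function name `emission_likelihood` is hypothetical.

```python
def emission_likelihood(word, observed, p_match=0.8, p_other=0.1, p_skip=0.1):
    """Probability that the grapheme HMM of `word` emits the string `observed`."""
    n, m = len(word), len(observed)
    # f[i][j]: probability that the first i states emitted the first j characters
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(0, min(i, m) + 1):
            total = f[i - 1][j] * p_skip  # state i emits nothing (epsilon)
            if j > 0:
                same = observed[j - 1] == word[i - 1]
                total += f[i - 1][j - 1] * (p_match if same else p_other)
            f[i][j] = total
    return f[n][m]


# Rank candidate clean words for a noisy token by how well they explain it.
noisy = "tdy"
candidates = ["today", "teddy", "tomorrow"]
ranked = sorted(candidates, key=lambda w: emission_likelihood(w, noisy), reverse=True)
print(ranked[0])
```

In a trained model each state would carry its own emission and transition distributions learnt from noisy variants, and decoding would use standard HMM techniques rather than this direct comparison.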
Summary
With increasing popularity of the
SMS language through SMSes and
microblogging websites, cleansing SMS
text is a prerequisite for effective
development and deployment of services
such as text-to-speech and automatic
translation. There has been a lot of interest
in developing techniques to cleanse SMS
text of late. In this article, we outlined the
problem of normalization of SMSes to
their intended clean versions, and briefly
surveyed various techniques that have been
developed for the purpose. We specifically
focused on the usage of machine translation
models, a popular paradigm for accurate
decoding of SMS text.
References
[1] AiTi Aw, et al. (2006). “A Phrase-Based
Statistical Model for SMS Text
Normalization”, Proceedings of COLING/
ACL Conference, Sydney, Australia.
[2] Brown, P, et al. (1993). “The mathematics
of statistical machine translation:
parameter estimation”, Computational
Linguistics, 19(2), 263-311.
[3] Choudhury, M, et al. (2007).
“Investigation and modeling of the
structure of texting language”, 1st
Intl. Workshop on Analytics for Noisy
Unstructured Text Data, Hyderabad,
India.
[4] Contractor, D, et al. (2010).
“Unsupervised cleansing of noisy text”,
Proceedings of the COLING Conference,
Beijing, China.
[5] Kobus, C, et al. (2008). “Normalizing
SMS: are two metaphors better than
one?” Proceedings of the COLING
Conference, Manchester.
[6] Venkata Subramaniam, L, et al. (2009).
“A survey of types of text noise and
techniques to handle noisy text”,
Proceedings of the Third Workshop on
Analytics for Noisy Unstructured Text
Data, Barcelona, Spain.
[7] Venkata Subramaniam, L (2010).
“Noisy Text Analytics”, Tutorial at the
NAACL HLT Conference, Los Angeles,
USA.
About the Authors
Deepak P is currently with the Information Management group at IBM Research - India, Bangalore. He received
a B.Tech degree in computer science and engineering from Cochin University at Kochi, and M.Tech in the same
discipline from IIT Madras, India. He is currently pursuing his PhD with the department of computer science and
engineering at IIT Madras. His main research interests are in the area of data mining, similarity search, case-based
reasoning and information retrieval.
L Venkata Subramaniam received the BE degree in electronics and communication engineering from Mysore
university, the MS degree in electrical engineering from Washington University, St. Louis, and the PhD degree in
electronics from IIT Delhi. He presently manages the Information Processing and Analytics group in IBM Research - India,
New Delhi. His research interests include machine learning, natural language processing, speech processing
and their applications to data analytics.
Research
Front
Sajilal Divakaran* and Achuthsankar S Nair**
*FTMS School of Computing, Kuala Lumpur
**University of Kerala
Approximate/Fuzzy String Matching using
Mutation Probability Matrices
We consider the approximate/fuzzy string matching problem in the Malayalam language and propose a log-odds
scoring matrix for score-based alignment. We report a pilot study designed and conducted to collect statistics
on what we have termed the “accepted mutation probabilities” of characters in Malayalam, as they naturally
occur. Based on the statistics, we show how a scoring matrix can be produced for Malayalam which can be used
effectively in numeric scoring for approximate/fuzzy string matching. Such a scoring matrix would enable
search engines to widen the search operation in Malayalam. As this is a first attempt of its kind, we point out a
large number of areas in which further research and consequent improvement are required. We limit ourselves
to a chosen set of consonant characters, and the matrix we report is a prototype for further improvement.
Keywords – approximate string matching, fuzzy string matching, scoring matrix, Malayalam Computing, Language Computing.
Introduction
Linguistic computing issues in non-English
languages are generally addressed
with less depth and breadth, especially
for languages which have a small user base.
Malayalam, one such language, is one of
the four major Dravidian languages, with a
rich literary tradition. The native language
of the South Indian state of Kerala and the
Lakshadweep Islands off the west coast
of India, Malayalam is spoken by 4% of
India’s population. While Malayalam is
integrated fairly well with computers,
its user base may not generate
huge market interest, and so the finer issues
of language computing for Malayalam
remain unaddressed and unattended.
If we were to search Google to look
for information on the senior author of this
paper, Achuthsankar, and we gave the query
as Achutsankar or Achudhsankar, in both
cases Google would land us correctly in the
official web page of the author. This “Did
you mean” feature of Google is managed by
the Google-diff-match-patch[4]. The match
part of the algorithm uses a technique
known as the approximate string matching
or fuzzy pattern matching[10]. The close/
fuzzy match to any query that is received by
the search engine is routine and obvious to
the English language user. However, when
a non-English language such as Malayalam
is used to query Google, the same facility
is not seen in action. When the word
പതിനായിരം (Pathinaayiram - Malayalam
word for the number ten thousand) is used
as a query in Google Malayalam search, we
are directed to documents that contain a
similar word (Payinaayiaram - a common
mispronunciation of the original word) but
not the word പയിനായിരം. This is because
approximate/fuzzy string matching has
not been addressed in Malayalam. In this
paper we make preliminary attempts
toward addressing this very special issue
of approximate/fuzzy string matching in
Malayalam.
Approximate/Fuzzy String
Matching
The field described as approximate or
fuzzy string matching in computer science
has been firmly established since the 1980s.
Patrick & Geoff[5] define the approximate
string matching problem as follows: Given
a string s drawn from some set S of possible
strings (the set of all strings composed
of symbols drawn from some alphabet
A), find a string t which approximately
matches this string, where t is in a subset
T of S. The task is either to find all those
strings in T that are “sufficiently like” s,
or the N strings in T that are “most like”
s. One of the important requirements to
analyze similarity is to have a scientifically
derived measure of similarity. The Soundex
system of Odell and Russell[13] is perhaps
one of the earliest attempts to
use such a measure. It uses a Soundex
code of one letter and three digits.
Such codes have been used successfully in
hospital databases and airline reservation
systems[8]. The Damerau-Levenshtein metric[2]
proposed a measure: the smallest number
of operations (insertions, deletions,
substitutions, or reversals) needed to change one
string into another. This metric can be used
with standard optimization techniques[14]
to derive the optimal score for each string
match and thereby choose matches in
the order of closeness.
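As an illustration of the edit-count measure just described, the following sketch computes the restricted form of the Damerau-Levenshtein distance (often called optimal string alignment), in which a reversal is a swap of two adjacent characters. The function name `osa_distance` is hypothetical, and the query strings are the two misspellings mentioned in the Introduction.

```python
def osa_distance(a, b):
    """Smallest number of insertions, deletions, substitutions, or adjacent
    reversals needed to change string a into string b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # reversal
    return d[len(a)][len(b)]


target = "Achuthsankar"
for query in ["Achutsankar", "Achudhsankar"]:
    print(query, osa_distance(query, target))  # each query is 1 edit away
```

Every operation costs 1 here; the scoring-matrix approach developed later in the article replaces this uniform cost with character-specific scores.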
Approximate or fuzzy string
matching is in vogue not only in
natural languages but also in artificial
languages. In fact approximate string
matching has been developed into a
fine art in computational sciences, such
as bioinformatics. Bioinformatics deals
mainly with bio sequences derived
from DNA, RNA, and amino acid
sequences[9]. Dynamic programming
algorithms (the Needleman–Wunsch and
Smith–Waterman algorithms)[11],
which enable fast approximate string
matching using carefully crafted scoring
matrices, are in great use in bioinformatics.
The equivalent of Google for the modern
biologist is the basic local alignment search
tool (BLAST)[1], which uses scoring
matrices such as point accepted mutation
matrices (PAM)[3] and BLOcks of Amino
Acid SUbstitution Matrix (BLOSUM)[6]. To
the best of the knowledge of the authors,
such a scoring system is not in existence
for any natural language including English.
Recently an attempt has been made in
this direction for the English language[7]. The
statistics for accepted mutation in English
were cleverly derived based on already
designed Google searches.
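For readers unfamiliar with score-based alignment, the sketch below shows the Needleman–Wunsch dynamic programme for the best global alignment score of two strings. The substitution scores (+1 for a match, -1 for a mismatch) and the gap penalty of -2 are placeholder assumptions; in bioinformatics the `score` function would look values up in a matrix such as PAM or BLOSUM. The names `alignment_score` and `simple_score` are hypothetical.

```python
def alignment_score(a, b, score, gap=-2):
    """Best global alignment score of a and b (Needleman-Wunsch recurrence)."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" with prefixes of b
    for i in range(1, len(a) + 1):
        cur = [i * gap]                          # aligning prefix of a with ""
        for j in range(1, len(b) + 1):
            cur.append(max(prev[j - 1] + score(a[i - 1], b[j - 1]),  # align pair
                           prev[j] + gap,                            # gap in b
                           cur[j - 1] + gap))                        # gap in a
        prev = cur
    return prev[-1]


def simple_score(x, y):
    """Placeholder substitution score: +1 for identical symbols, -1 otherwise."""
    return 1 if x == y else -1


print(alignment_score("GATTACA", "GATTACA", simple_score))  # 7
print(alignment_score("GATTACA", "GATACA", simple_score))   # 4
```

Only the previous row of the table is kept, which is enough when the score alone is needed; recovering the alignment itself requires the full table and a traceback.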
In the case of Malayalam, statistics
of character mutations are not easily
derivable from any corpus or any existing
search engines or other language
computing tools. Hence, data for this
needs to be generated to go ahead with
development of scoring matrix system. We
will now describe generation of primary
data of natural mutation in Malayalam.
Occurrence and Mutation
Probabilities
Malayalam has a set of 51 characters,
and basic statistics of their occurrence and
mutation are required for developing
a scoring matrix. The occurrence
probabilities are available, derived from
a corpus of considerable size in 1971 and
again in 2003[12]. We describe here only a
subset of characters in view of economy of
space. In Table 1, we give the probabilities
of one set of consonants, which we have
extracted from a small test corpus of
Malayalam text derived from periodicals.

Character      ക       ഖ       ഗ       ഘ       ങ       k
Probability    0.606   0.009   0.044   0.004   0.039   0.297
Table 1: Occurrence probabilities of a set
of selected Malayalam consonants
We then designed and conducted a
study to extract the character mutation
probabilities. We selected 150 words that
cover all the chosen consonant characters.
A dictation was administered to a
small group of school children (N=30).
The observed mistakes (natural mutations)
are tabulated in Table 2 as probabilities.
It is noted that the sample size
of N=30 is inadequate for a linguistic
study of this kind. However, as already
highlighted, this paper reports a pilot
study to demonstrate proof of
concept. Moreover, the sample size can be
made larger once the research community
vets the approach put forward by us.
       ക       ഖ       ഗ       ഘ       ങ       k
ക      0.85    0.25    0.45    0.07    0       0.10
ഖ      0       0.55    0       0       0       0
ഗ      0.06    0.04    0.47    0.09    0       0
ഘ      0       0.01    0       0.85    0       0
ങ      0       0       0       0       0       0
k      0.08    0.11    0.08    0       0       0.90
Table 2: Probability of natural mistakes
(natural mutation probabilities) of chosen
set of consonant characters
Log-odds Scoring Matrix
It is possible to use Table 2 itself for
scoring string matches. However, it might
be unwieldy in practice. For long strings we
will need to multiply probabilities, which
might result in numeric underflow. Hence,
we will use a logarithmic transformation.
Another transformation that we will use is to convert
from probability to odds. The odds can be
defined as the ratio of the probability of
occurrence of an event to the probability
that it does not occur. If the probability of an
event is p, then the odds are p/(1-p). We will
however not use this formula directly, but
define odds for any given match i-j as:
In the above equation, p_ij is the
probability that character i mutates to
character j and p_j is the probability of
natural occurrence of character j. Thus
the negative score for a mutation of a
less frequently occurring character will
be more in this scheme. The multiplier
10 is used just to bring the scores to a
convenient range. Table 3 shows the log-odds
scores thus derived using the occurrence
probabilities and mutation probabilities
given in Tables 1 and 2. These can be used
to score approximate matches and select
the most similar one.
       ക       ഖ       ഗ       ഘ       ങ       k
ക      2       15      10      11      -30     -4
ഖ      -30     18      -30     -30     -30     -30
ഗ      -16     6       11      13      -30     -30
ഘ      -30     3       -30     23      -30     -30
ങ      -30     -30     -30     -30     -30     -30
k      -9      11      0.08    -30     -30     5
Table 3: Log-odds probability of natural
mistakes (mutation probabilities) of chosen
set of consonant characters (We set score
corresponding to 0 as -30. It may be noted that
the diagonal elements are strongest in each
respective column.)
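The defining equation for the log-odds score is not reproduced in this text. As a sketch only, one form consistent with the description (a log-odds of p_ij against p_j, multiplied by 10) is 10 × log10(p_ij / p_j), rounded to the nearest integer. The base-10 logarithm, the rounding, and the function name `log_odds_score` are assumptions and are not stated by the authors.

```python
import math

ZERO_SCORE = -30  # the article assigns -30 wherever the mutation probability is 0


def log_odds_score(p_ij, p_j):
    """Assumed form: 10 * log10(p_ij / p_j), rounded to the nearest integer."""
    if p_ij == 0:
        return ZERO_SCORE
    return round(10 * math.log10(p_ij / p_j))


# p_ij taken from Table 2 and p_j from Table 1 for three diagonal entries
print(log_odds_score(0.55, 0.009))  # ഖ-ഖ: 18
print(log_odds_score(0.85, 0.004))  # ഘ-ഘ: 23
print(log_odds_score(0.90, 0.297))  # k-k: 5
print(log_odds_score(0.0, 0.606))   # zero-probability mutation: -30
```

Under these assumptions the three diagonal entries agree with Table 3, but some other entries do not (ഗ-ഗ, for example, comes out as 10 rather than 11), so the authors' exact formula or rounding may differ from this sketch.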
Results, Discussions, and
Conclusion
The prototype scoring matrix we have
designed above can be demonstrated
to be capable of scoring approximate
matches and can therefore be a means
of selecting the closest match. We will
demonstrate this with an example of
scoring four approximate matches for
the word കk. Table 4 lists the scores for
the four different matches and the exact
match scores best. The next best match as
per the new scoring scheme is കക.
Word    Match    Score      Total Score
കk      കk       2 + 5      7
കk      കഖ       2 - 30     -28
കk      കഘ       2 - 30     -28
കk      കക       2 - 4      -2
Table 4: Demonstrating use of scoring
matrix in Table 3 on sample approximate
string matches
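The scoring in Table 4 amounts to summing, position by position, the Table 3 entry for each pair of characters. A minimal sketch follows; the dictionary holds only the Table 3 entries needed for this example (row = character in the candidate, column = character in the query), the sixth consonant is written as "k" exactly as it appears in the tables above, and the function name `match_score` is hypothetical. The sketch assumes strings of equal length in which every character is a single code point.

```python
# SCORES[candidate_char][query_char], transcribed from Table 3
SCORES = {
    "ക": {"ക": 2, "k": -4},
    "ഖ": {"ക": -30, "k": -30},
    "ഘ": {"ക": -30, "k": -30},
    "k": {"ക": -9, "k": 5},
}


def match_score(query, candidate):
    """Sum of per-position scores for two strings of equal length."""
    return sum(SCORES[c][q] for q, c in zip(query, candidate))


query = "കk"
candidates = ["കk", "കഖ", "കഘ", "കക"]
for cand in sorted(candidates, key=lambda c: match_score(query, c), reverse=True):
    print(cand, match_score(query, cand))
```

Sorting by score puts the exact match first (7) and കക second (-2), as reported in Table 4. Handling insertions and deletions would need an alignment step with gap penalties, which the article does not cover.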
Our demonstration has been on a
chosen set of consonant characters, but
it can be expanded to cover all Malayalam
characters. For demonstrating more
general words, scoring matrix for vowels
is essential. We have computed the same
and will be reporting it in a forthcoming
publication. During our studies, we also
noticed that the grouping of characters
as done conventionally may not suit
our studies. For example, we found that
the character ഹ is a possible mutation
for ക, very rarely, even though they are
not grouped together conventionally. A
regrouping based on natural mutations
is a work we see as requiring attention.
To the best of our knowledge, our
work is a unique proposition for the
Malayalam language, which can be
incorporated into Malayalam search
engines. We would like to reiterate that
our work is in the prototype stage. Neither
the size of the corpus nor the number of
subjects in the survey is substantial.
The authors hope to expand the work with
a sizable database from which statistics
can be extracted, so that the scoring
matrix can be made more reliable. We also
propose to validate the scoring approach
with sample trials involving language experts.
References
[1] Altschul, S F, et al. (1990). “Basic local
alignment search tool”, Molecular
Biology, 215(3), 403-410.
[2] Damerau, F J (1964). “A technique
for computer detection and
correction of spelling errors”, ACM
Communications, 7(3), 171-176.
[3] Dayhoff, M O, et al. (1978). “A model
of Evolutionary Change in Proteins”,
Atlas of protein sequence and structure,
5(3), 345-358.
[4] Google-diff-match-patch, [Online].
Available: http://code.google.com/p/google-diff-match-patch/,
Accessed on 20 Jan. 2012.
Continued on Page 37
Article
M Jayalakshmi
Formerly of Vikram Sarabhai Space Centre, Dept of Space, Govt of India
Emails and Web Pages in Local Languages
Emails, text chats, and instant messages
become more personal, and at times more
impressive, if they are received in the
most familiar local language. The same is true
of online news and local language
web pages. Those who are less literate in
English than in their local language
feel more comfortable with emails/web pages
in the local language script than with
the corresponding English
versions. Here, local
language is used in the Indian context only. Let
us look into some of the specific language
tools and the languages they support.
To read or write a local language
scripted text, the required fonts must
be present on your computer (PC).
Windows, Macintosh, and Linux operating
systems can use TrueType fonts, which
are available via downloadable installers.
Installation needs to be done only once.
Some web browsers also have to be set up
to use the UTF-8 encoding format. Nothing
further is required for reading.
Now in order to create, edit, and
upload (send) texts in local languages, some
language converters have to be installed or
must be available in your PC. A number of
language support tools, offline and online,
free as well as non-free, are available on the
net. This article addresses some of the
basic tools required to be set up in your PC
for this purpose. There are keyboard maps
and virtual keyboards, which come along with the OS
and are supported by office
software packages, to type directly into the
editor to create or update documents in any
language.
But this will be a cumbersome process unless
one is conversant with typing and editing
in that particular language. Moreover, the
fonts generated out of this process may
not be web-fonts and hence readability will
be lost. To overcome this, further software
conversions and processing may be required
to make them web loadable. There are
some simpler short cuts to avoid these
processes by sticking to typing on the
familiar English keyboard itself.
A number of online and offline
transliteration (language conversion
according to sound) tools are available free
on the net in the form of html web pages with
multiple text-boxes, like window panes. One
can type English alphanumeric characters
(lower and upper cases in combination)
according to the sound of the local language
character to be produced. This is called
phoneme transliteration.
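The idea of phoneme transliteration can be illustrated with a toy lookup-based converter. This is a sketch under stated assumptions: the table covers only the seven syllables of the sa ri ga ma example used in this article for Devanagari, the names `SYLLABLES` and `transliterate` are hypothetical, and real tools implement a complete scheme with vowel signs and compound letters.

```python
# Roman syllable -> Devanagari, limited to the syllables of the example
SYLLABLES = {
    "sa": "स", "ri": "रि", "ga": "ग", "ma": "म",
    "pa": "प", "Dha": "ध", "ni": "नि",
}


def transliterate(text, table=SYLLABLES):
    """Greedy longest-match replacement of Roman syllables; case-sensitive."""
    keys = sorted(table, key=len, reverse=True)  # try longer syllables first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # spaces and unknown characters pass through
            i += 1
    return "".join(out)


print(transliterate("sa ri ga ma pa Dha ni sa"))
```

The output is a Unicode string, so it can be pasted into any editor, mail client, or web page that has a suitable font available.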
On the left window (English language
editing window) you can type and edit
the characters according to the target
language phonetics (character sound) and
on the right or bottom pane, the vernacular
character will be simultaneously
generated. For example, the typed text (on
left column) will be rendered as follows:

Language      Typed text                  Rendered text
Devanagari    sa ri ga ma pa Dha ni sa    स रि ग म प ध नि स
Hindi         sa ri ga ma pa Dha ni sa    स रि ग म प ध नि स
Malayalam     sa ri ga ma pa dha ni sa    സ രി ഗ മ പ ധ നി സ
Tamil         sa ri ga ma pa Dha ni sa    ஸ ரி க ம ப த னி ஸ
Telugu        sa ri ga ma pa Dha ni sa    స రి గ మ ప ధ ని స
Kannada       sa ri ga ma pa Dha ni sa    ಸ ರಿ ಗ ಮ ಪ ಧ ನಿ ಸ
Bangla        sa ri ga ma pa Dha ni sa    স রি গ ম প ধ নি স
Oriya         sa ri ga ma pa Dha ni sa    ସ ରି ଗ ମ ପ ଧ ନି ସ
Punjabi       sa ri ga ma pa dha ni sa    ਸ ਰਿ ਗ ਮ ਪ ਧ ਨਿ ਸ

After you complete the partial
or full editing of the English phonetics
corresponding to a local language text,
the local language characters will appear
in the text-box (right window pane) in a
Unicode font. You can copy the local language
text thus generated and paste it into the
new mail editing area of the email client,
the html editing area of the web inbox, the
message-box of a chat line, or the web
page editing window. Now you are ready to
upload and dispatch the vernacular script.
This is the basic principle used for local
language web page creation too.
To generate the vowel accents of local
language sounds or compound characters
in that particular language alphabet, a
sequence of English characters may have
to be typed at times. The guidelines for
this will be generally available in the
transliteration language web page itself.
But all tools need not support all
languages. Indian languages generally
have a maximum of 15 vowel sounds
and 36 consonants. There are compound
letters formed by combination of
consonants and vowels. Most of these
patterns are handled in these tools. Still
there will be a few left out which have to
be addressed separately.
In languages like Malayalam
where certain words end in half sounds
(“Chillaksharam”), they can also be
incorporated by these alphanumeric
character sequences or from virtual
keyboards of Unicode characters
installed in your system. The Department of
Information Technology, Government
of India has accepted Unicode
encoding for fonts as the Indian standard
in this regard.
Set Up Your System for Local
Language Use
If you are using the Linux operating system,
the installation procedure is as follows:
1. Download the font file from the site
2. Run the command: tar -xvzf Hindi.tar.gz
3. This will create the directory "Hindi"
4. Go into the directory "Hindi"
5. Run the file FontInstaller.sh, give the
command: ./FontInstaller.sh
6. Now restart your X server
The font is now installed on your machine.
You can also create a new directory,
say “myfonts” in /usr/share/fonts/ and
copy the required font in “myfonts” in
Fedora.
Windows 2000, Windows XP, and Windows Vista have inbuilt support for Unicode encoding at the operating system level, but the feature needs to be enabled.
Windows VISTA
• Go to the Control Panel and click on Regional and Language Options. Choose the country: India.
• Click on the Keyboards and Languages tab and choose the Hindi keyboard. EN will appear in the system tray.
• Left click on EN or press the ALT+SHIFT keys and choose the language to type.
Once Unicode is enabled in your system, the INSCRIPT keyboard driver and the Unicode-supported Mangal and Arial Unicode MS fonts will be installed in the system.
To download other keyboard drivers, such as Typewriter/Remington and Phonetic/Roman, platform-free and browser-free Open type fonts, a fonts converter, a keyboard tutor for learning INSCRIPT typing, the Hindi version of Indian Open Office, and other software free of cost, visit the site www.ildc.in
• Choose the language (Hindi)
• Click on ‘Download’ for the required software and driver
• A zip file will get downloaded
• After unzipping the file, run the .exe of that software
Option - 3 Open-type fonts
Option - 4 Keyboard Drivers
Option - 5 Fonts Converter
Unicode can be enabled in Windows 2000 and later versions of the operating system as under. You should first install the Windows files for display of Indic languages.
Enable Indic for Windows XP & above
1. Go to Start -> Control Panel -> Regional & Language Options -> Languages Tab -> (tick the Install files for complex scripts...) and click OK.
2. Click OK (Figure Below).
3. You will require the Windows XP CD to enable Indic.
Again go to Control Panel >> Regional and Language Options >> click on the Languages Tab. Click on Details and click on Add to select the language of your choice. From the system tray, click on EN and select the language for typing.
Enable Indic for Windows 2000
1. Go to Start -> Settings -> Control Panel -> Regional Options -> Languages -> Indic (tick the Indic) and click OK.
2. Click OK (Figure Below).
3. You will require the Windows 2000 CD to enable Indic.
Again go to Regional Options and click on Input Locales. Add the languages in which you want to type. From the system tray, click on EN and select the language for typing.
Unicode Fonts
Unicode is a map, a chart of all of the characters, letters, symbols, punctuation marks etc. necessary for writing all of the world’s languages.
Graphemes are the basic building blocks of a written script. Grapheme is a synonym for a character. In English, there is a one-to-one correspondence between a character and its glyphs (ornamental marks). Glyphs in a font should comprise a unified design entity. A font represents the graphical form of a script. Fonts are therefore formed with a collection of graphemes and glyphs.
Phonemes are the basic building blocks of the phonetics of a language. Graphemes form an abstract conceptual layer in between physically conceivable glyphs and phonemes. The Unicode Consortium is standardizing the character sets of the world's languages. Character sets of 30+ languages are currently standardized under Unicode.
A font spanning many Unicode ranges can be helpful in several practical applications. For instance, it can provide some scripts and characters that are hard to find, ease installation of base support for many languages, facilitate documents mixing symbols and language scripts, and improve the appearance of web pages with mixed symbols and scripts.
Those who use Windows NT, 2000, or XP can take advantage of Unicode. In these operating systems, it is possible to read, type, print etc. using Unicode mappings, provided of course that you have the appropriate font and keyboard drivers. With the other Windows versions (95, 98, Me), typing in Unicode is not really possible. Unicode also works on recent Mac operating systems.
Virtual Keyboards & Character Maps
The combinations of consonants and vowels that render the different phonetics may be entered by successive key strokes as given below (Fig. 1). The same text can be produced faster with transliteration packages, generally available as html forms as shown in the subsequent figures (Fig. 3).
Fig. 1: Key strokes for rendering phonetics (+ implies successive hits)
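The key-stroke idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables CONSONANTS and VOWEL_SIGNS and the functions transliterate and code_points are hypothetical names, they cover only the Devanagari letters needed for "sa ri ga ma pa Dha ni sa", and real transliteration tools use far larger tables and rules. The second helper lists the Unicode code point behind each character produced.

```python
import unicodedata

# Illustrative key-sequence tables (assumption: only a handful of letters).
CONSONANTS = {"s": "स", "r": "र", "g": "ग", "m": "म", "p": "प", "dh": "ध", "n": "न"}
# The inherent vowel 'a' needs no sign; 'i' maps to DEVANAGARI VOWEL SIGN I.
VOWEL_SIGNS = {"a": "", "i": "\u093f"}


def transliterate_syllable(syllable):
    """Map one Roman syllable (consonant + vowel) to Devanagari."""
    key = syllable.lower()
    # Try the two-letter consonant ("dh") before the one-letter consonant.
    for split in (2, 1):
        consonant, vowel = key[:split], key[split:]
        if consonant in CONSONANTS and vowel in VOWEL_SIGNS:
            return CONSONANTS[consonant] + VOWEL_SIGNS[vowel]
    raise ValueError(f"no mapping for {syllable!r}")


def transliterate(text):
    """Transliterate space-separated Roman syllables."""
    return " ".join(transliterate_syllable(s) for s in text.split())


def code_points(text):
    """List (code point, Unicode name) for every non-space character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch))
            for ch in text if not ch.isspace()]


if __name__ == "__main__":
    hindi = transliterate("sa ri ga ma pa Dha ni sa")
    print(hindi)
    for point, name in code_points(hindi):
        print(point, name)
```

The consonant is stored before its vowel sign in the output string, even though the sign for "i" is drawn to the left of the consonant on screen; placing it correctly is the job of the font and the rendering engine.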
Fig. 2: A typical keyboard character map for Devanagari font
Fig. 3: A typical Hindi transliteration page (typing "namaskaara" under "Type your text here" shows नमस्कार under "See your results here")
Language Converters
There are a number of free language converters available for Windows and Linux. The following list refers to a few of them.
Offline Converters
1. Indian language converter (ILC) - Bengali, Hindi, Kannada, Malayalam, Oriya, Punjabi, Sanskrit, Telugu, and Tamil.
2. Scripto0.2.0 – Gujarathi, Gurumukhi, Hindi, Malayalam
3. Keraleeyam, Varamozhi, mozhi, Madhuri - Malayalam
4. Baraha - Kannada, Hindi, Marathi, Sanskrit, Tamil, Telugu, Malayalam, Gujarati, Gurumukhi, Bengali, Assamese, Manipuri, and Oriya languages.
5. Hindi Editor For The Unicode™ Standard – Hindi
Online Converters
Gmail has a built-in transliteration facility. The language of choice may be selected from a list box in the html text editor for mails.
• Aksharamala
• Bangla Unicode Converter
• Devanagari Editor etc.
Some of the other online URLs are
• http://www.translatorindia.com
• http://www.tamilcube.com
• http://unicode.org/resources/onlinetools.html
Fig. 4: Conversion Guidelines
Transliteration
A typical transliteration software package, ILC, downloaded from the Internet will look like the following:
Fig. 5
Conclusion
Setting up your PC for local language reading and writing, including the installation of fonts and language converters, is a one-time activity. These installations need to be done only once for any typical local language. The rest of the work of reading, typing, editing, and uploading scripts is as easy as for any English language text.
Some of the Unicode fonts for Indian languages are:
1. Windows: Arial Unicode MS, Akshar Unicode, ALPHABETUM Unicode, Aparajita, JanaHindi, JanaMarathi, JanaSanskrit, Kalimati, Kanjirowa, Kokila, Lucida Sans, Mangal, Raghindi, Roman Unicode, Sanskrit 2003, Santipur OT, Saraswati5, shiDeva, SHREEDV0726-OT, SiddhiUni, Sun-ExtA, Thyaka Rabison, TITUS Cyberbit Basic, Uttara, Chrysanthi Unicode, CN-Arial, Code2000, Ekushey Azad, Ekushey Durga, Ekushey Puja, Ekushey Punarbhaba, Ekushey Saraswatii, Ekushey Sharifa, Ekushey Sumit, Free Serif, Likhan, Mitra Mono, Mukti, Mukti Narrow, Raga, Roman Unicode, Rupali, Saraswati5, SolaimanLipi, Sun-ExtA, UniBangla, Vrinda, aakar, Chrysanthi Unicode, CN-Arial, Code2000, padma, Rekha etc.
2. Macintosh OS 9: Devanagari MT, Devanagari MTS
3. Linux: GNU FreeFont, Devanagari, Lohit Malayalam, Latha, Valluvar etc.
More specifically, these are the fonts for typical Indian language scripts:
1. Hindi - Akshar, Cdac - GIST Surekh, Gargi (Gargi.ttf), JanaHindi (RKJanaHindi.TTF), JanaMarathi (RVJanaMarathi.TTF), Mangal (mangal.ttf), Raghindi (raghu.ttf), Sanskrit 2003 (Sanskrit2003.ttf), Shusha Fonts, Hindi for Devanagari, Arial Unicode
2. Malayalam - Kartika, Arial Unicode, GNU FreeFont, Lohit Malayalam, Meera, dyuthi, rachana, suruma, raghu, Anjali old lipi, ML-Nila
3. Tamil - Akshar Unicode (akshar.ttf), Arial Unicode MS (arialuni.ttf), JanaTamil (RRJanaTamil.ttf), Latha (latha.ttf), ThendralUni (ThendralUni.ttf), TheneeUni (TheneeUni.ttf), VaigaiUni (VaigaiUni.ttf)
4. Telugu - Akshar Unicode (akshar.ttf), Code2000 (code2000.ttf), Gautami (gautami.ttf), Pothana2000 (Pothana2000.ttf), Vemana2000 (Vemana.ttf)
5. Kannada - Akshar Unicode (Akshar.ttf), Arial Unicode MS (arialuni.ttf), JanaKannada (ROJanaKannada.TTF from JanaKannada.zip), Kedage, Mallige (Malige-n.TTF), RaghuKannada (RORaghuKannada_ship.ttf), Saraswati5 (SaraswatiNormal.ttf and SaraswatiBold.ttf), Tunga (Tunga.ttf)
6. Bengali - Arial Unicode MS
All TrueType Unicode fonts are portable to Linux systems.
Bibliography
[1] Baraha - Free Indian Language Software - Typing Software, http://www.baraha.com
[2] Indian language transliteration | Indian language unicode, http://vikku.info/indian-languageunicode-converter
[3] The Indian Language Converter, http://www.yash.info/indianLanguageConverter. The code used is free for use.
[4] GNU FreeFont: Why Unicode fonts? http://www.gnu.org/software/freefont/articles
About the Author
M Jayalakshmi is a retired scientist/engineer from the Vikram Sarabhai Space Centre, Dept of Space, Govt of India. She was the Webmaster of the VSSC intranet and Head of its Enterprise Software Section, Computer Division. Her expertise is in the areas of 1. Computational Numerical Software in Avionics sub-Systems, 2. Microprocessor based On-board computers & Telemetry Systems, 3. Quality Assessment of Launch Vehicle Mission Software, and development of applications software for the VSSC intranet. She can be contacted at [email protected].
Article
Nishant Allawadi* and Parteek Kumar**
* Masters Student, Thapar University, Patiala
** Assistant Professor, CSED, Thapar University, Patiala
A Speech-to-Text System
Abstract: A Speech-to-Text (STT) system converts speech into text. This paper discusses the applications of STT systems in health care instruments, banking devices, aircraft devices, robotics etc. It reviews existing systems such as the SOPC-based Speech-to-Text architecture, the architecture for a Hindi Speech Recognition System using HTK, and Phonetic Speech Analysis for Speech to Text Conversion. It then presents the architecture of the Speech-to-Text system and provides a tutorial for implementing it. The tutorial describes four phases of development, namely, data preparation, monophone HMM creation, tied-state triphone HMM creation and execution with Julius. The first phase processes the raw data for further use. The second phase trains the system using monophones. The third phase trains the system using triphones. The final phase explains the execution of the system. The paper also highlights the futuristic applications of the Speech-to-Text system.
Keywords: HMM, monophones, dictionary and triphones.
Introduction
A Speech-to-Text (STT) system converts speech into text. It takes speech as input and divides it into small segments. These small segments are sounds, known as monophones. It extracts the feature vectors of the monophones and matches them with stored feature vectors[1]. A Hidden Markov Model (HMM) is used to find the most probable result, which is given out as the text for the input speech. The system is developed by re-estimating the feature vectors at each step of training using HMM Tool Kit (HTK) commands. The HMM is a result of the attempt to model speech generation statistically. It is the most successful and commonly used model in speech recognition[2].
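How an HMM yields a "most probable result" can be illustrated with the standard Viterbi algorithm on a toy model. The sketch below is not the HTK implementation: the two states (the monophones b and oy of the word BOY), the quantized observations "low" and "high", and every probability are made-up assumptions chosen only to keep the example small.

```python
import math


def _log(p):
    """Logarithm that tolerates zero probabilities."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + _log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])


# Toy model (all numbers are illustrative assumptions).
STATES = ["b", "oy"]
START = {"b": 0.9, "oy": 0.1}
TRANS = {"b": {"b": 0.5, "oy": 0.5}, "oy": {"b": 0.1, "oy": 0.9}}
EMIT = {"b": {"low": 0.8, "high": 0.2}, "oy": {"low": 0.3, "high": 0.7}}

if __name__ == "__main__":
    path, prob = viterbi(["low", "low", "high", "high"], STATES, START, TRANS, EMIT)
    print(path, prob)
```

In a real recognizer the observations are continuous feature vectors and the emission probabilities come from Gaussian densities, but the search for the best state sequence follows the same pattern.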
This paper is divided into six sections. The second section discusses the applications. The third section highlights the existing STT systems. The architecture of the STT system is described in the fourth section. The fifth section describes the implementation of the STT system. The conclusion is given in the sixth section.
Applications of the Speech-to-Text
System
STT systems are applicable in hospitals for health care instruments[6]. In banking, STT is implemented in input devices where credit card numbers are given as speech input. It is widely used in aircraft systems, where pilots give audio commands to manage operations in flight. Mobile phones use STT in many applications, such as writing text messages by speech input, e-mail documentation, mobile game commands and music player song selection. STT systems are used in computers for writing text documents, and also for opening, closing and operating various applications. Battle management command centres require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. Robotics is an emerging field where inputs are given to robots in speech format. The robot processes the speech input command and performs actions accordingly[3].
Existing Speech-to-Text Systems
A number of Speech-to-Text systems have been proposed worldwide.
A System-on-Programmable-Chip (SOPC) based Speech-to-Text architecture has been proposed by Murugan and Balaji. This speech-to-text system uses isolated word recognition with a vocabulary of ten words (digits 0 to 9) and statistical modeling (HMM) for machine speech recognition. They used the Matlab tool for recording speech. The training steps were performed using PC-based C programs. The resulting HMM models are loaded onto a Field-programmable gate array (FPGA) for the recognition phase. The uttered word is recognized based on maximum likelihood estimation.
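Isolated word recognition by maximum likelihood can be sketched as follows: each vocabulary word has its own HMM, the likelihood of the observed sequence is computed under every model with the forward algorithm, and the word whose model scores highest is chosen. This is a generic illustration, not the SOPC design described above; the two word models, their state names and all probabilities are hypothetical.

```python
def forward_likelihood(observations, model):
    """Probability of the observation sequence under one HMM (forward algorithm)."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    alpha = {s: start[s] * emit[s][observations[0]] for s in start}
    for obs in observations[1:]:
        alpha = {
            s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in alpha)
            for s in start
        }
    return sum(alpha.values())


def recognize(observations, word_models):
    """Pick the word whose model gives the highest likelihood."""
    return max(word_models, key=lambda w: forward_likelihood(observations, word_models[w]))


# Two left-to-right toy models sharing the same topology (illustrative numbers).
_TRANS = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s1": 0.0, "s2": 1.0}}
_START = {"s1": 1.0, "s2": 0.0}
WORD_MODELS = {
    "one": {"start": _START, "trans": _TRANS,
            "emit": {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.1, "b": 0.9}}},
    "two": {"start": _START, "trans": _TRANS,
            "emit": {"s1": {"a": 0.1, "b": 0.9}, "s2": {"a": 0.9, "b": 0.1}}},
}

if __name__ == "__main__":
    print(recognize(["a", "a", "b"], WORD_MODELS))
```

With a ten-word digit vocabulary the same loop would simply run over ten models instead of two.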
An architecture for a Hindi Speech Recognition System using HTK has been proposed by Kumar and Aggarwal[7]. The system was built as a speech recognition system for the Hindi language using the Hidden Markov Model Toolkit (HTK). The proposed architecture has four phases, namely, preprocessing, feature extraction, model generation and pattern classification. The system recognizes isolated words using an acoustic word model. It was trained for 30 Hindi words, with training data collected from eight speakers. The developers reported an accuracy of 94.63%.
Phonetic Speech Analysis for Speech to Text Conversion has been given by Bapat and Nagalkar[4]. Their work aimed at generating phonetic codes of the uttered speech in a training-less, human-independent manner. The proposed system has four phases, namely, end point detection, segmenting speech into phonemes, phoneme class identification and phoneme variant identification in the class identified. The proposed system uses differentiation, zero-crossing calculation and FFT operations.
Fig. 1: Speech-to-Text Conversion Architecture (input files -> Data Preparation -> Monophones HMM Creation -> Tied-State Triphones HMM Creation -> Execution with Julius -> Text)
Fig. 2: Process of creation of .dfa file and .dict file (.grammar and .voca -> .dfa and .dict file creation -> .dfa and .dict)
Architecture of Speech-to-Text
System
The conversion process of speech to text
is divided into four phases, namely, Data
preparation, Monophones HMM creation,
Tied-state triphones HMM creation and
Execution with Julius interface as given
in Fig. 1. The description of each of these
phases is given in subsequent sections.
Implementation of Speech-to-Text
System
The four phases shown in Fig. 1, namely, Data preparation, Monophones HMM creation, Tied-state triphones HMM creation and Execution with Julius interface, are implemented as described in the following sub-sections[5].
Data Preparation
This phase prepares the data for processing in subsequent phases. It requires a grammar file, speech files, a vocabulary file and a training text file as raw input. The processing of these files is explained below.
Grammar files
In this phase, the grammar of the language is provided in the form of rules in the .grammar file, and the words are provided in the .voca file. The .grammar file defines the recognition rules. The .voca file defines the actual words in each word category and their pronunciation information. The description of the .grammar file is given in (1).
S : NS_B SENT NS_E
SENT: CALL
…(1)
The description of the .voca file is given in (2).
% NS_B
<s> sil
% NS_E
</s> sil
% CALL
ADVICE ae d v ay s
BOY b oy
…(2)
As given in (1), S refers to the start symbol of input, while NS_B indicates the beginning of silence and NS_E indicates the end of silence by the sil monophone. The data to be recognized is given by the keyword SENT, which refers to CALL as given in (1). The details of CALL are provided in the .voca file as given in (2). CALL provides the recognition of words with their monophone combinations. For example, ADVICE has the monophone combination “sil ae d v ay s sil”. The .grammar file and .voca file are compiled to generate a dictionary file and a finite automata file, namely the .dict and .dfa files, respectively. These files are required at the time of execution of the system as shown in Fig. 2.
Training text file
The training text file is named prompts. It contains a list of words that are to be recorded and the names of the corresponding audio files in which they are to be stored. The description of this file is given in (3).
*/sample1 ADVICE BOY CHARLIE DOOR KICK MAID NURSE ONCE RULE TARGET
*/sample2 TARGET RULE ONCE NURSE MAID KICK DOOR CHARLIE BOY ADVICE
*/sample3 ADVICE ADVICE BOY BOY CHARLIE CHARLIE DOOR DOOR KICK KICK
…(3)
Speech files
Speech files are stored in .wav format. These files are recorded with a recording tool like audacity. The training text, written in the prompts file, is recorded and saved in these files.
Vocabulary file
This file contains a sorted collection of commonly used words of a language along with their combinations of monophones. This file is used as a reference to create a dictionary for the training words. A snapshot of this file is given in (4).
ABACK [ABACK] ax b ae k
ABACUS [ABACUS] ae b ax k ax s
ABALON [ABALON] ae b ax l aa n
…(4)
Creation of phones.mlf and dictionary file
In the data preparation phase, the wordlist and words.mlf files are created from the prompts file. The wordlist file contains all the unique words of the prompts file. The words.mlf file contains the same text as the prompts file, with each word of the prompts file on a new line. The monophones0 file and the dictionary file are created from the wordlist file with the help of the vocabulary file. The dictionary file contains all the training words with their corresponding monophone combinations, and the monophones0 file contains the list of all unique monophones. The dictionary and words.mlf files generate the phones0.mlf file as given in Fig. 3. A monophones1 file is also generated in this process without sp i.e. short-pause monophone.
Fig. 3: Master Label File Creation (prompts -> Word List Creation -> wordlist; wordlist and vocabulary -> Dictionary Creation -> dictionary and monophones; prompts -> Master Label File Creation -> words.mlf; words.mlf and dictionary -> Phoneme Master Label File Creation -> phones.mlf)
Creation of MFC files
The .mfc files are created from .wav files by using the HCopy command of HTK with the help of a configuration file, config[8]. These .mfc files contain the feature vectors for
the .wav files and are used in subsequent phases for training.
Monophone HMM Creation
This phase is used to create a well-trained set of single-gaussian monophone HMMs. This phase requires a prototype for the HMM, .mfc files, a configuration file, monophone files and the phones.mlf file for creating the HMM. Each HMM file follows the prototype given in the proto file. There are a number of monophones in the HMM file. Generally, each monophone has five states. Here, state 1 and state 5 are the opening and closing states, while states 2, 3 and 4 hold the values of the means and variances for the corresponding monophone. This phase is further divided into three sub-phases, namely, creating flat start monophones and re-estimation, fixing the silence models and realigning the training data, as given in Fig. 4.
Creating Flat Start Monophones and Re-estimation
In this sub-phase, the HMM file is created manually by using default global values of means and variances. These default values are calculated by the HCompV command of HTK with the help of the .mfc files and the config file[8]. These values are re-estimated three times using the HERest command with the help of the previously generated HMM file, .mfc files, config file, phones.mlf file and monophones0 file[8].
Fixing the Silence Models
This sub-phase is used to make the model more robust, so that it absorbs various impulsive noises in the training data. This is done by including a short pause monophone in the HMM file and linking it with the sil monophone. In order to do this, a temporary copy of the sil model is created and saved with the name sp. It has 5 states, where state 1 and state 5 are the opening and closing states. State 2 and state 4 are removed from the sp model and only state 3 is kept. The HHEd command is used to tie the sp model with the central state of the sil model with the help of the monophones1 file. The script file for this operation is given in (5).
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
…(5)
In this manner, short pauses between spoken words are treated as silence. In order to retrain the system, the HERest command is used two times with the help of the previously generated HMM file, .mfc files, config file, phones.mlf file and monophones1 file[8].
Realigning the Training Data
In case of multiple pronunciations of a word in the dictionary, this phase selects the best possible pronunciation. In order to do this, the HVite command is used with the words.mlf file, monophones1 file, dict file, config file and the previously generated HMM file, and the result is saved in a new transcript file, i.e., aligned.mlf. In order to retrain the system, the HERest command is used two times with the newly created aligned.mlf and monophones1[8].
Fig. 4: Monophone Creation and Training (Creating Flat Start Monophones and Re-estimating -> Fixing Silence Models -> Realigning Training Data)
Tied-State Triphones HMM Creation
This phase has two important sub-phases, namely, triphones creation and training, and tied-state triphones creation and training, as shown in Fig. 5.
Triphones Creation and Training
In this sub-phase, triphones are created. A triphone is a combination of three monophones. This greatly improves recognition accuracy, because the system now looks to match a specific sequence of three sounds together rather than only one sound. For example, ADVICE has triphones as given in (6).
sil
ae+d
ae-d+v
d-v+ay
v-ay+s
ay-s
…(6)
The HLEd command is used to create triphones as given in (6). It requires two files, aligned.mlf and a script, as shown in (7).
WB sp
WB sil
TC
…(7)
As the system has been updated by including the triphones file, the HERest command is used two times to train the system with triphones.
Fig. 5: Triphones Creation and Training (Creating Triphones from Monophones and Training -> Creating Tied-State Triphones and Training)
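The triphone naming used in the ADVICE example (left-phone, then "+" right context, with sil and sp treated as word boundaries) can be reproduced with a short Python function. This is only a simplified sketch of the naming convention, not a replacement for the HLEd command; the function name word_internal_triphones and its boundaries parameter are hypothetical.

```python
def word_internal_triphones(phones, boundaries=("sil", "sp")):
    """Convert a monophone sequence into word-internal triphone labels.

    Boundary phones (sil, sp) are kept as they are and are never used
    as the left or right context of a neighbouring phone.
    """
    labels = []
    for i, phone in enumerate(phones):
        if phone in boundaries:
            labels.append(phone)
            continue
        left = phones[i - 1] if i > 0 and phones[i - 1] not in boundaries else None
        right = (phones[i + 1]
                 if i + 1 < len(phones) and phones[i + 1] not in boundaries
                 else None)
        label = phone
        if left:
            label = f"{left}-{label}"
        if right:
            label = f"{label}+{right}"
        labels.append(label)
    return labels


if __name__ == "__main__":
    # Monophones of ADVICE, surrounded by silence.
    for label in word_internal_triphones("sil ae d v ay s sil".split()):
        print(label)
```

For the ADVICE monophones the function yields sil, ae+d, ae-d+v, d-v+ay, v-ay+s, ay-s and a closing sil, so the first and last phones of a word end up with only one context, as in the listing in the text.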
Tied-State Triphones Creation and Training
In this sub-phase, different triphone states are tied together in order to share the data and to make the system more robust. In order to tie states, the HHEd command is used with the previously generated HMM file and the triphones file. This command creates the tiedlist file that is used in further training of the system. Since the system has been updated with the new tiedlist file, the HERest command is used to retrain the system two times with the newly created tiedlist file.
Execution with Julius Interface
Julius is an interface used to execute the STT system. Julius requires four files: the .dfa file, the .dict file, the previously generated HMM file and the tiedlist file. The first two files, the .dfa file and the .dict file, have already been created in phase 1, and the HMM file and tiedlist file have been created in phase 3. In order to execute the system, these files are passed as parameters in its configuration file, i.e., julian.conf as given in (8).
-hlist tiedlist
-h hmm15/hmmdefs
-dfa sample.dfa
-v sample.dict
-smpFreq 48000
…(8)
The julian command is used to execute the system. It requires julian.conf and mic as parameters. After execution of this command the system prompts the user to speak the sentence as given in Fig. 6.
Fig. 6: System Execution
Now the speaker can speak the input sentence and the system will give its corresponding text.
Conclusion
A Speech-to-Text system for a small vocabulary can be developed by using HTK commands. As discussed in the architecture, there are four phases in the development of the STT system. The STT system discussed above is speaker dependent. To make this system speaker independent, an adaptation technique is required.
An STT system for other languages can also be developed by using the monophones of that language and training the system with training text of that language.
References
[1] A. Kemble Kimberlee, “An Introduction to Speech Recognition (Unpublished work style),” unpublished.
[2] Aymen M., Abdelaziz A., Halim S., Maaref H., “Hidden Markov Models for automatic speech recognition”, in International Conference on Communications, Computing and Control Applications (CCCA), Hammamet, Tunisia, 2011, pp. 1-6.
[3] Balaganesh M., Logashanmugam E., Aadhitya C.S., Manikandan R., in International Conference on Emerging Trends in Robotics and Communication Technologies (INTERACT), Chennai, India, 2010, pp. 12-15.
[4] Bapat Abhijit V., Nagalkar Lalit K., “Phonetic Speech Analysis for Speech to Text Conversion”, in IEEE Region 10 Colloquium and the Third International Conference on Industrial and Information Systems, Kharagpur, India, 2008, pp. 1-4.
[5] “Create Speaker Dependent Acoustic Model Using Your Voice”, http://www.voxforge.org/home/dev/acousticmodels/windows/create.
[6] Grasso Michael A., “The Long-Term Adoption of Speech Recognition in Medical Applications”, in 16th IEEE Symposium Computer-Based Medical Systems, New York, NY, USA, 2003, pp. 257-262.
[7] Kumar Kuldeep and Aggarwal R.K., “Hindi Speech Recognition System using HTK”, International Journal of Computing and Business Research, vol. 2, pp. 3-7, 2011.
[8] Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying (Andrew) Liu, Gareth Moore, Julian Odell, Dave Ollason, Dan Povey, Valtcho Valtchev and Phil Woodland, The HTK Book, Cambridge University Engineering Department, 2009.
About the Authors
Nishant Allawadi is pursuing Master’s of Engineering in Computer Science at Thapar University, Patiala. He has received his
Bachelor of Technology degree from Guru Jambheshwar University of Science and Technology, Hisar (Haryana) in the year
2010. He is doing his ME thesis in the field of Natural Language Processing.
Parteek Kumar is Assistant Professor in the Department of Computer Science and Engineering at Thapar University, Patiala.
He has more than thirteen years of academic experience. He has earned his B.Tech degree from SLIET and MS from BITS
Pilani. He is pursuing his Ph.D in the area of Natural Language Processing from Thapar University. He has published more
than 50 research papers and articles in Journals, Conferences and Magazines of repute. He has undergone various faculty
development programmes from industries like Sun Microsystems, TCS and Infosys. He has co-authored six books including
Simplified Approach to DBMS. He is acting as Co-PI for the research Project on Development of Indradhanush: An Integrated
WordNet for Bengali, Gujarati, Kashmiri, Konkani, Oriya, Punjabi and Urdu sponsored by Department of Information
Technology, Ministry of Communication and Information Technology, Govt. of India.
Article
Jaganadh G
Consultant in Text Analytics and Free and Open Source Software
Opinion Mining and Sentiment Analysis
Introduction
It is human to have an opinion on whatever one experiences in life. Opinion is expressed with the help of language, either written or spoken. Human beings have mined opinion in a natural way ever since they started living as social beings. All their adventures, new procurements etc. were subject to opinion mining. Before wearing new apparel for a public function, buying an appliance or watching a movie, people solicited opinions from friends, family and others. They mined the entire opinion collection with the world's most complex opinion mining system, the "human brain". When human beings entered a consumer-oriented world, corporate and non-corporate establishments started producing, selling and advertising their products/services. Corporate establishments used media to advertise their services/products, which eventually led to word-of-mouth advertisement and sales opportunities. Non-corporate establishments depended almost entirely on word-of-mouth publicity. In both scenarios people bought and experienced products and then expressed their opinions on them. These opinions were key factors in determining the market for services and products. Corporate establishments were keen to understand customer opinions and to derive Business Intelligence from them. So they conducted surveys to understand customer needs, satisfaction and dissatisfaction. Consolidated reports on such surveys helped them to improve products and marketing strategy, and even to withdraw products from the market to avoid loss and manage reputation. This can be called the second generation of Opinion Mining. The advent of Web 2.0 based technologies and tools opened a vast window to express and share opinions. Thus opinions reached a wide audience across the globe. Also, opinions expressed through web platforms such as social media (Twitter, Facebook etc.) created the opportunity for real-time sharing of opinion, which leads to real-time market ups and downs for corporate and similar entities. Deriving Business Intelligence from the heavy flow of consumer opinion in real time gave birth to a new field of study in Natural Language Processing and Computational Linguistics. This field is called "Opinion Mining". Sentiment Analysis, Sentiment Mining, Opinion Mining, Review Mining, Opinion Detection, Sentiment Detection, Subjectivity Detection, Polarity Classification, Semantic Orientation, Appraisal Extraction etc. refer to the same art. The current article aims to give a brief introduction to Opinion Mining, its technical aspects and its business applications in the real world.
Opinion
To get a deeper insight into the art, let us look at the definition, structure and social role of opinion.
We are surrounded by opinions more than facts in our life. The Oxford Dictionary defines opinion as (a) 'a view or judgment formed about something, not necessarily based on fact or knowledge' (b) 'a statement of advice by an expert on a professional matter'. Opinion more or less results from the state of mind when we experience something in our day-to-day life. Based on his or her socio-cultural standard, a person expresses the sentiment/opinion with the help of linguistic units appropriate to the mental state, experience and situation. This expression may be an appraisal or a negative comment, up to the extreme of using sarcasm or unparliamentary words. It is also quite natural that people compare things when expressing opinion. There is another kind of opinion, which comes from experts or experienced people. In social life we call them trustworthy sources of opinion. They provide comparative and structured opinion on the topics on which we seek advice. Such people are there in the online community too. In terms of business and marketing strategy we can call them 'influence leaders' or 'influencers'. We can also categorize such influencers as trustworthy and non-trustworthy, because some are biased. Now we can observe a structure for the opinion: an opinion requires an object (a brand/product such as a mobile phone or a movie), an opinion holder who experiences the object and expresses the opinion, and the opinion itself.
Opinion Mining/Sentiment Analysis
Information contained in any text document can be subjective, objective or both. Subjective text mostly contains positive or negative opinions, while objective text contains facts. The art of Opinion Mining and Sentiment Analysis tries to identify the subjectivity or objectivity of a text and then identifies the polarity of the subjective text. The polarity of a text will be positive, negative or a mix of both. The polarity of objective text is considered neutral. In short, Sentiment Analysis is the automated extraction of subjective content from digital text and the prediction of its polarity, such as positive or negative. It aims to explore the attitude of the person who created the text. It uses Natural Language Processing and Machine Learning principles to spot the linguistic structures that determine polarity.
Detecting Sentiment from Text
We can perform three levels of sentiment analysis over a subjective text: document level sentiment analysis, sentence level sentiment analysis, and faceted (or feature level) sentiment analysis. Document level sentiment analysis aims to detect the sentiment of the whole document. It is quite obvious that a single document is unlikely to contain 100% positive or 100% negative sentiment. Still, the analysis predicts the predominant sentiment expressed in the document (for example, predicting the polarity of a full length review from http://www.rottentomatoes.com/). Sentence level sentiment prediction aims to identify the polarity of a given sentence in a text. Faceted sentiment analysis aims to predict the polarity of sentences or phrases which deal with attributes of the object in question (such as predicting the sentiment about individual features of a mobile phone from a textual review).
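To make the difference between sentence level and document level analysis concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the word lists, the naive sentence splitter and the function names are hypothetical and are not taken from any particular tool. Each sentence is labelled by counting cue words, and the document label is simply the predominant sentence label.

```python
import re
from collections import Counter

# Hypothetical toy word lists; a real system would use a much larger lexicon.
POSITIVE = {"good", "great", "excellent", "love", "amazing"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "boring"}


def sentence_polarity(sentence):
    """Label one sentence by counting positive and negative cue words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption; real text needs more care)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def document_polarity(text):
    """Return the predominant sentence label, or 'neutral' on a tie."""
    counts = Counter(sentence_polarity(s) for s in split_sentences(text))
    pos, neg = counts["positive"], counts["negative"]
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


review = "The plot was great. The acting was excellent. The ending was boring."
print([sentence_polarity(s) for s in split_sentences(review)])
print(document_polarity(review))
```

The review above contains one negative sentence, yet the document level label is positive because positive sentences predominate. Faceted analysis would go one step further and attach each label to the attribute being discussed (plot, acting, ending).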
There are different ways to identify and predict sentiment from text: lexicon based, Natural Language Processing based and Machine Learning based techniques. There is no harm in trying hybrid approaches to obtain the results. In the lexicon based approach, a prepopulated list of words with sentiment probabilities is used to spot key sentiment indicators. The approach is quite straightforward: read a text, look up each word in the lexicon to find its probability values, sum the probabilities per class, and pick the class with the highest total. Similarly, we can use a prepopulated list of positive and negative words to predict the sentiment. A combination of linguistic rules and Natural Language Processing techniques can also be used to spot opinion indicators and predict the sentiment. Generally, such rules find adjective-noun sequences and examine context rules to get the polarity. For example, in the sentence “Service of XYZ mobile phone is not good”, 'good' is a positive word but the presence of the negation 'not' reverses the polarity of the word. In other words, a simple combination of a negative and a positive word creates a negative expression. Adjectives or adjective-noun sequences can be identified with POS tagging, chunking or parsing. Once the chunks are identified we can apply rules to identify the polarity. In the machine learning based approach, sample data is collected to train a selected algorithm. The collected data is manually labelled with polarity values. The trained models are then used to predict the sentiment of new text. Since this article is very short, the details of the process involved in each methodology are omitted deliberately. I hope I can cover them in a later note.
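The lexicon based approach with a simple negation rule can be sketched in a few lines of Python. This is a hedged illustration, not the method of any specific product: the lexicon entries, their scores, the negation list and the function names are all assumptions made for the example, and the negation rule only flips the word that directly follows 'not', 'no' or 'never'.

```python
import re

# Hypothetical lexicon: word -> polarity score in [-1, 1].
LEXICON = {"good": 0.8, "excellent": 1.0, "bad": -0.8, "poor": -0.6, "slow": -0.4}
NEGATIONS = {"not", "no", "never"}


def lexicon_score(text):
    """Sum lexicon scores, flipping the sign of a word that directly follows a negation."""
    total = 0.0
    negate = False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True
            continue
        score = LEXICON.get(word, 0.0)
        total += -score if negate else score
        negate = False  # the negation only affects the next word
    return total


def predict(text):
    """Map the summed score to a polarity class."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(predict("Service of XYZ mobile phone is not good"))
print(predict("Service of XYZ mobile phone is good"))
```

With this toy lexicon the first sentence is classified as negative and the second as positive. A rule this simple misses cases such as "not very good"; that is where POS tagging, chunking or parsing would be needed to find the word the negation actually applies to.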
Challenges in Sentiment Analysis
Language is the most wonderful, dynamic and mysterious phenomenon in the universe. Language and its structure are the primary challenge in Sentiment Analysis, especially the language or “slanguage” used on social networks like Twitter and Facebook. As in society, there are false influencers or false opinion leaders who work for money. Identification of such false influencers and of spam content is another major challenge in this area. There are other interesting challenges in sentiment analysis, such as the identification of sarcasm and the use of deep semantic and pragmatic concepts to determine fine-grained emotion expressed in text. Even though industry has adopted it as a technology to earn from, there are still open-ended issues and challenges to be resolved.
Business Applications
Sentiment Analysis is likely to be the most widely adopted art from Natural Language Processing in Business and Business Intelligence applications. The popularity of social networks and the high volume of user generated content, especially subjective content, have created a heavy demand to adopt sentiment analysis in business applications. Since it mainly deals with consumer centric content, the art can be called “Marketing Research 3.0”. Sentiment Analysis helps corporates to get customer opinion in real time. This real-time information helps them to design new marketing strategies, improve product features and predict the chances of product failure. It is not applied only in consumer centric applications. It can be used in politics and diplomacy to get a clear picture of people's attitudes towards election campaigns, strategic policies and bills. Sentiment Analysis can even help to predict the effectiveness of “viral marketing” and the chances of ups and downs in stock prices.
There are a good number of commercial as well as free sentiment analysis services. Radian6, Sysomos, Viralheat, Lexalytics, AiAiO Labs, etc. are some of the top commercial players in the field. Some free tools, such as twittersentiment.appspot.com, exist too.
I Would Like to Develop a Sentiment
Analysis System !!
It is not rocket science. You too can develop a sentiment analysis system. There are lots of Free and Open Source tools available for performing Natural Language Processing and Machine Learning tasks. A vast amount of consumer generated text data, prepared for sentiment analysis tasks, is also available on the internet. Tools like GATE, NLTK, Apache Mahout, Weka, Rapidminer, KNIME, OpenNLP etc. can be used to develop your own sentiment analysis system.
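Before reaching for any of these toolkits, the machine learning workflow described earlier (label sample data, train a model, predict) can be tried with nothing but the Python standard library. The sketch below is a minimal multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is chosen here only as a convenient example algorithm; the training sentences, labels and function names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """samples: list of (text, label). Returns per-label word counts, label counts, vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label in label_counts:
        logp = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical, manually labelled training data.
TRAIN = [
    ("the battery life is great", "positive"),
    ("excellent camera and great screen", "positive"),
    ("poor service and bad battery", "negative"),
    ("the screen is bad", "negative"),
]
model = train(TRAIN)
print(predict(model, "great camera"))
print(predict(model, "bad service"))
```

On this toy data the first call returns positive and the second negative. Four sentences are of course far too few for a real model; the publicly available review corpora mentioned above are the natural next step.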
References
[1] Bing Liu (2010). "Sentiment Analysis
and Subjectivity". Handbook of
Natural Language Processing, Second
Edition, (editors: N. Indurkhya and
F. J. Damerau), 2010.
[2] Peter Turney (2002). "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews". Proceedings of the Association for Computational Linguistics (ACL). pp. 417–424.
[3] Bo Pang; Lillian Lee and Shivakumar
Vaithyanathan (2002). "Thumbs
up? Sentiment Classification using
Machine Learning Techniques".
Proceedings of the Conference
on Empirical Methods in Natural
Language Processing (EMNLP).
pp. 79–86.
[4] Michelle de Haaff (2010), Sentiment Analysis, Hard But Worth It!, CustomerThink, retrieved 2010-03-12.
[5] Lipika Dey, S K Mirajul Haque
(2008). "Opinion Mining from
Noisy Text Data". Proceedings of
the second workshop on Analytics
for noisy unstructured text data,
pp. 83-90.
[6] Minqing Hu; Bing Liu (2004).
"Mining and Summarizing Customer
Reviews". Proceedings of KDD 2004.
[7] Pang, Bo; Lee, Lillian (2008). Opinion
Mining and Sentiment Analysis. Now
Publishers Inc.
About the Author
Jaganadh G is a Natural Language Processing and Machine Learning Developer and Researcher with experience in Sentiment
Analysis, Information Extraction, Machine Translation, Spell checker Development, Automatic Speech Recognition (ASR),
Text to Speech System (TTS), Internationalization of Domain Names (IDN), Localization, Perl and Python programming.
Experienced in preparing software documentation according to ISO and IEEE standards. Well versed in GNU/Linux operating
system. A smart Computational Linguist with abilities in developing algorithms for Machine Translation and related NLP field.
(365Media Pvt. Ltd., Project Lead (NLP), Coimbatore, Tamilnadu, India, AU-KBC Research Centre, Chennai, C-DAC, C-DIT,
Rashtriya Sanskrit Vidyapeeth, Thirupathi, Andhrapradesh)
Article
Randhir Kumar*, Dr. P K Choudhary**, and S M F Pasha***
* PhD Candidate of AISSR, at University of Amsterdam (The Netherlands)
** HOD University Department of Sociology, Ranchi University, Ranchi
*** Assistant Manager,Computer Society of India
Telemedicine in the State of Maharashtra: A Case Study
Abstract: The Government of Maharashtra telemedicine project was operationalised in the year 2007 and since then it has been expanding its outreach and number of beneficiaries. It provides an example of how modern ICT can be gainfully used for benefitting the masses, who till now were deprived of advanced medical care. This case study attempts to document the path taken by the Health Ministry of Maharashtra in implementing telemedicine successfully.
Key Words: NRHM, EHR, Specialist End, Patient End, Teleradiology
Introduction
Telemedicine is an umbrella term which covers all medical activity having an element of distance (Wootton, 1998). Although telemedicine has been practiced for hundreds of years by means of letters (see Wootton, n.d), with the advancement of Information and Communication Technology there has been a manifold increase in the use of telemedicine as a tool for delivering medical treatment. Telemedicine not only includes real-time consultation between patient and expert, but also the element of getting medical advice on prerecorded medical data, as in the case of ‘teleradiology’ or ‘telepathology’[1]. More sophisticated models use it extensively for providing health care benefits to underprivileged people. These interventions usually take the form of welfare projects involving substantial investment, coordination and planning.
The Government of Maharashtra launched its pilot project on Telemedicine in the year 2007, with one specialist node at KEM Hospital, Parel, Mumbai and 5 sub-district hospitals. The prime target areas for this intervention were tribal areas such as those of Sindhudurg, Nandurbar, Beed and Satara. The second phase of expansion involved the participation of 5 specialist nodes, 23 district hospitals and 4 sub-district hospitals.
The Maharashtra State Telemedicine project is part of a larger initiative undertaken by the Government of India and the World Health Organisation. Under the banner of the National Rural Health Mission (NRHM), Telemedicine is one of the key initiatives to improve health services for the rural people of India.
The General Framework of
Telemedicine Project in Maharashtra
The overall network of Telemedicine in
Maharashtra can be classified under two
broad subheadings, viz.
1. Specialist End
2. Patient End
Specialist End: The Specialist End consists of five medical colleges. The medical colleges that have been developed as the specialist end are KEM Hospital Mumbai, B. J. Medical College Pune, GMC Aurangabad, GMC Nagpur, and Sir J. J. Hospital Mumbai. Nanavati Hospital at Mumbai has been made an honorary specialist centre.
The J. J. Hospital at Mumbai has a dual role to play. It acts as the main server centre for coordinating between the specialist centers and patient centers. Additionally, it also provides consultation services for referred patients through teleconference.
Annexure 1: Names of the districts where Telemedicine has been implemented: Thane, Bombay, Alibagh, Nashik, Pune, Satara, Ratnagiri, Osmanabad, Latur, Bid, Ahmednagar, Parbhani, Jalna, Aurangabad, Jalgaon, Buldana, Amravati, Wardha, Nagpur, Chandrapur, Gadchiroli, Bhandara, Gondia, Nandurbar, Hingoli, Washim, Sindhudurg.
[Figure: the Specialist End (SH 1 to SH 5) linked to the Patient End (DH 1 to DH 27) and the sub-district hospitals]
Fig. 1: An overview of the Telemedicine framework in Maharashtra. SH stands for the five Specialist Hospitals that provide consultation services. DH stands for District Hospitals (27 in number), each of which in turn has 4 sub-district hospitals. Each sub-district hospital further has several primary health centers (not depicted in the figure).
Patient End: The Patient End consists of 27 district hospitals of Maharashtra (see Annexure 1). Furthermore, 4 sub-district hospitals in each district act as centers where patients from nearby areas come to consult the doctors. All the district and sub-district hospitals are equipped with a modern, state-of-the-art telecommunication network system for carrying out teleconferences. The sub-district hospitals are further sub-divided into Regional Hospitals (RH) and Primary Health Centres (PHC).
The diagrammatic representation of the present set-up is depicted in Fig. 1.
Technical Support: The first phase of telemedicine was technically supported by the Indian Space Research Organisation (ISRO), which provided its expertise in network connectivity. Initially there were serious troubles with internet connectivity, as the connection would often be snapped. Later, this trouble was solved by using dedicated leased lines of fiber optic cable with high bandwidth capacity. Thereafter a medical equipment supplier company, “Progonosis”, provided facilities for video conferencing along with other basic medical equipment such as scanners, BP apparatus etc.
1 Radiology is a specialized medical branch which involves the use of imaging technologies (X-Ray, MRI, CT Scan etc.) to identify and treat anomalies in the human body. Pathology involves the identification of diseases based on laboratory analysis.
Management Structure
The whole project has a Mission Managing Director (MD) under whom there are several Joint Directors followed by Assistant Directors. All three positions together form the top management, which makes the critical decisions in the implementation of the overall project. Additionally, independent consultants are hired to give their expertise from time to time.
The ground level day-to-day operations are taken care of by the coordinators and facility managers of the technical support services. Each district has a nodal officer who is responsible for the overall day-to-day operation of the telemedicine project in that district. Other than these managerial and support staff, a whole set of dedicated doctors at both the Specialist and Patient End are involved in the consultation and treatment of patients. The doctors are not paid any extra salary by the government for consulting patients through telemedicine. However, an honorary sum of Rs. 100/- and Rs. 300/- per patient is paid to the doctors of the District Hospital and the Specialist Hospital respectively.
The Motto of Telemedicine
The primary motive of implementing a pan-state telemedicine network was to provide better access to super-specialty medical care for the residents of remote areas, who either do not have sufficient time or lack the resources to travel to big cities for advanced treatment. Highlighting the present medical system, the Nodal Officer of the Mumbai area, Ms. Sandhya Tayde, apprised that “The areas targeted for telemedicine intervention had a poor access to trained doctors or medical staff. Furthermore, due to the distance factor and cost involved in seeking a first hand specialist opinion was both time consuming and costly affair. We using telemedicine have tried to reduce the time of intervention and cost and improve the quality of treatment by getting specialist opinion at their place of residence only.” In a way this was a positive development for rural folks who did not have an idea of how and where to go for a particular type of disease or illness. Furthermore, by early detection of serious life-threatening illnesses such as cancer, lives can be saved through timely intervention.
Another key beneficial feature of the Telemedicine intervention is its ability to build and maintain a central database having all the details pertaining to a patient's medical history and the treatment administered to him/her. This means that there is one centralized monitoring hub from which all the data can be accessed from any remote location at any given point of time. This also means that a patient's digitized data related to X-Ray, CT scan, pathology reports etc. are easily accessible, and opinions from different specialists can be sought before deciding on a particular course of treatment. It also ensures completeness and correctness of information, and past data records are often utilized by specialists for better management of health care services.
The Telemedicine system in Maharashtra has been equipped to seamlessly capture and upload patient information, waveforms and images from a remote location to a centralized server and get an expert's opinion or review instantly within the network (intranet) or at a later point of time. An Electronic Health Record (EHR) is generated for each patient and is archived in digital format. During cardiac arrest or other emergencies, the ECG and other relevant data can be instantly transmitted and the doctors at the remote location can suggest a course of action based on the live data.
Other than consulting and archiving medical data, Telemedicine has been innovatively used in Maharashtra to train and develop medical staff at the patient end. The CME (Continuing Medical Education) division is very proactive in disseminating the latest knowledge or medical cases to the staff. CME is organised at regular intervals and, along with technical knowledge, various sessions on attitude and behavioral skills are delivered, which in turn helps in improving clinical performance and professional development. Additionally, via tele-conferencing between medical colleges and district hospitals, computer-specific skill sets are imparted to equip the medical professionals to troubleshoot minor technical problems.
Impact and outreach of Telemedicine in Maharashtra
Telemedicine drastically reduced the time taken to seek expert advice. According to Ms. Tayde, earlier the wait period for a patient to get an appointment with a specialist was on average three months. However, the wait time has now reduced drastically, as they can divert the patient's digital information to any expert who is willing to handle the case. The junior doctors involved at the district hospital also learn in this whole process of referring the cases and having a discussion with the specialist over tele-conference. At times special rural camps on community health and ophthalmology are organised through telemedicine equipment mounted on mobile vans.
The specialist services extended through telemedicine cover 30 areas of medicine, which is quite broad. The key and most used specialist services are related to cardiology, dermatology, pathology, ophthalmology, ENT, surgery (consultation), neurology and medicine. The data related to the number of patients referred in the year 2010-11 has been summarized in Table 1. Table 2 summarizes the total number of patients referred and the opinions received for them from the year 2008-11.
S.No.  Specialty       Patients Referred from District   Opinion Received from Specialty Centers
                       (April 2010 to March 2011)        (April 2010 to March 2011)
1      Medicine        1059                              1032
2      Surgery         344                               316
3      OBGY            146                               207
4      Pediatrics      393                               387
5      Cardiology      65                                51
6      Neurology       45                                44
7      Anesthesia      28                                29
8      Chest           25                                23
9      Ophthalmology   24                                24
10     Skin VD         85                                83
11     ENT             76                                43
12     Orthopedics     278                               287
13     Psychiatry      40                                40
14     Radiology       1301                              1400
15     Ayurvedic       68                                30
16     Unani           155                               160
17     Forensic        38                                36
Table 1: Specialty-wise patients referred and opinions received for the same in the year 2010-11.
Source: Arogya Bhavan, CST Mumbai.
Year      Patients Referred   Opinion Received
2008-09   538                 448
2009-10   3640                3739
2010-11   4230                4195
Total     8408                8382
Table 2: The number of patients referred through telemedicine and expert opinions received for the referred cases.
Source: Arogya Bhavan, CST, Mumbai.
Thus one can observe from the table above that telemedicine has been quite popular among its end users and has been catering to the service needs of the poor and underprivileged rural people residing in remote areas of Maharashtra.
Conclusion and the way ahead
The development of telecommunication technology has given birth to modern telemedicine, which has found its way into improving health services for the underprivileged masses. Maharashtra has successfully implemented telemedicine across its districts in two phases. In the first and second phases of the project, all the district and sub-divisional hospitals were linked with the state medical colleges. Now the Maharashtra government is planning to implement phase 3 of the project, which proposes to link all the Primary Health Care Centers (PHC, primary level) to medical colleges (tertiary level). This means the creation of a complete network of primary (PHC), secondary (District Hospitals) and tertiary (Medical Colleges) levels for ensuring proper and better care of the patients. This network is expected to reduce mortality and morbidity, thus saving more lives by ensuring continuity of care throughout the network.
The present setup of the Telemedicine network in Maharashtra is one of the largest in India. The Telemedicine intervention has been successful in reducing travel by patients and therefore saving the costs involved in travel, food and accommodation, along with the pay lost by taking leave from regular work. It has also meant less crowding of patients in the specialty hospitals, and the doctors can give their opinion by looking at the digitized data of patients at their convenience. Telemedicine has also reduced the cost involved in the training and development of medical staff for Primary Health Care centers. Therefore, telemedicine is a perfect instance where the amalgamation of technology and social cause has resulted in the welfare of the deprived masses.
References
[1] Wootton R. (1998) Telemedicine in
the National Health Service, J R Soc
Med. Vol. 91, No. 12, pp. 614-21.
[2] Wootton R. (n.d) ‘Telemedicine’ in
Lock S, Dunea G, Pearn J, (eds.)
Illustrated Companion to Medicine,
UK: Oxford University Press (in
press).
About the Authors
Randhir Kumar is a PhD candidate of AISSR (Amsterdam Institute of Social Science Research) at University of Amsterdam
(The Netherlands). He secured his Masters degree in 'Globalisation and Labour Studies' from Tata Institute of Social Sciences
(Mumbai); after which he worked as a Research Associate in Personnel Management and Industrial Relations Area of IIM
(Ahmedabad).
Dr. P K Choudhary (Double MA, PhD and NET JRF) is a HOD of University Department of Sociology, Ranchi University,
Ranchi. Having more than 17 years of Experience in Research and Teaching at University level, he is Program Committee chair for
various national and International Conferences. He has written several articles and books on various societal issues. Considering
his knowledge and expertise, State and Central Governments have given him additional authority to lead various Development
Projects of Jharkhand.
S M Fahimuddin Pasha is an Assistant Manager at Computer Society of India. He has done an MA in Globalization and Labour from Tata Institute of Social Sciences (Mumbai) and an MA in Sociology from Ranchi University. He is on the verge of completing his PhD in Industrial Sociology. He is also a Researcher with the International Institute of Social History (Amsterdam, The Netherlands) and an invitee to the University of Leipzig (Germany) to address the issues of 'Detorization of Working Class'.
Technical
Trends
Satyam Maheshwari* and Sunil Joshi**
* Assistant Professor, computer applications in SATI Degree, Vidisha (MP)
** Assistant Professor, computer applications in SATI Degree, Vidisha (MP)
Extending WEKA Framework for
Learning New Algorithms
Waikato Environment for Knowledge Analysis (WEKA) is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It is designed so that you can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. WEKA was developed at the University of Waikato in New Zealand and is open source software, written in Java and issued under the General Public License[2]. It runs on almost any platform and has been tested under Linux, Windows, and Macintosh operating systems. Recently an article was published in CSI which showed the application of WEKA to bio-inspired algorithms[1]. The authors emphasized an MLP classifier using genetic algorithm and fuzzy logic, and gave information about the existing framework. In this article, we extend the existing framework of WEKA so that we can add a new classifier or clusterer and then train on the dataset with the new algorithms.
The key features behind WEKA's success are as follows:
1. It is open source and freely available;
2. It provides many different algorithms for data mining and machine learning;
3. It is platform-independent; and
4. It is up-to-date, with new algorithms being added as they appear in the research literature.
WEKA[3] supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection. All of WEKA's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, although other types of attributes are also supported). The easiest way to use WEKA is through a graphical user interface called the Explorer. The other user interfaces to WEKA are Experimenter, KnowledgeFlow, and Simple CLI. The Experimenter gives access to all of its facilities using menu selection and form filling. The KnowledgeFlow provides an alternative to Explorer for showing how data flows through the system. It also allows the design and execution of configurations for streamed data processing. The Simple CLI is a command line interface for executing WEKA commands.
The main interface, Explorer, has several panels that give access to the main components of the workbench. The “Preprocess” panel has facilities for importing data from a database, a comma-separated values (CSV) file etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g. turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria. The “Classify” panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in WEKA) to the resulting dataset; to estimate the accuracy of the resulting predictive model; and to visualize erroneous predictions, ROC curves, or the model itself (if the model is amenable to visualization, e.g. a decision tree). The “Associate” panel provides access to association rule learners that attempt to identify all important interrelationships between various attributes in the data. The “Cluster” panel gives access to the clustering techniques in WEKA, e.g. the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions. The next panel, “Select attributes”, provides algorithms for identifying the most predictive attributes in a dataset. The last panel, “Visualize”, shows a scatter plot matrix, where individual scatter plots can be selected, enlarged and analyzed further using various selection operators. WEKA can handle a number of file formats, including the ever-popular CSV (which can be exported from any spreadsheet program). WEKA prefers, however, to work with ARFF files, which are basically CSV files with some header information tacked on.
Fig. 1: Existing snapshot of WEKA
Suppose we want to implement a special-purpose learning algorithm, i.e. add a new classifier or clusterer which is not included in the existing WEKA GUI, or want to investigate a new learning scheme, or want to learn more about the inner workings of an induction algorithm by actually programming it ourselves. In that case we integrate a new workspace into WEKA.
WEKA can be extended to include elementary learning schemes for research and educational purposes. Fig. 1 shows the existing framework of WEKA. To add a new classifier to WEKA, we follow these steps:
1. Create a new folder in the Windows directory hierarchy, e.g. C:\SmWork\classifiers
2. To enable or disable dynamic class discovery, the relevant file to edit is GenericPropertiesCreator.props (GPC). This file can be obtained from the weka.jar or weka-src.jar archive. These archives can be opened with an archive manager that can handle ZIP files; navigate to the weka/gui directory, where the GPC file is located. All that is required is to change the UseDynamic property in this file from false to true (for enabling it) or the other way round (for disabling it). After changing the file, just place it in the home directory. To generate the GenericObjectEditor.props (GOE) file, we need to execute the following command:
    java weka.gui.GenericPropertiesCreator %USERPROFILE%\GenericPropertiesCreator.props %USERPROFILE%\GenericObjectEditor.props
3. Remove WEKA.JAR from the CLASSPATH.
4. Edit the GenericPropertiesCreator.props file in the home directory and set UseDynamic to false.
5. Add SmWork/classifiers in GenericPropertiesCreator.props and GenericObjectEditor.props.
6. Run the command
    java -classpath c:\progra~1\weka-3-6\weka.jar;c:\SmWork\classifiers weka.gui.GUIChooser
Now we can write our new Java code, compile it, and then copy the class file into the specified folder. Fig. 2 shows a snapshot of the newly added classifier. Similarly, we can extend WEKA for clustering and association as well.
Fig. 2: Snapshot of WEKA displaying new added classifier
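Whichever classifier is added, it will need data to train on, and the article notes that WEKA prefers ARFF files, which are CSV data with a header. The Python sketch below shows roughly what that header looks like by building a small ARFF text from rows of values. It is an illustration under assumptions: the function name, the relation name and the weather-style data are hypothetical, only numeric and nominal attributes are handled, and names or values containing spaces would need quoting that is not shown here.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes: list of (name, kind); kind is "numeric" or a list of nominal values.
    rows: list of value lists, in the same order as the attributes.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute %s %s" % (name, kind))
        else:
            # Nominal attribute: the allowed values are listed in braces.
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical example data.
attributes = [
    ("outlook", ["sunny", "rainy"]),
    ("temperature", "numeric"),
    ("play", ["yes", "no"]),
]
rows = [["sunny", 85, "no"], ["rainy", 65, "yes"]]
arff_text = to_arff("weather", attributes, rows)
print(arff_text)
```

Everything after the @data line is ordinary comma-separated data; the @relation and @attribute lines are the header information that tells WEKA the name and type of each column.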
References
[1] Goli, B and Govindan, G (2011).
WEKA - A powerful free software
for implementing Bio- inspired
Algorithms, CSI Communication,
35(9), 09-11.
[2] Mark Hall, Eibe Frank, Geoffrey
Holmes, Bernhard Pfahringer, Peter
Reutemann, Ian H. Witten (2009);
The WEKA Data Mining Software:
An Update; SIGKDD Explorations,
Volume 11, Issue 1.
[3] Witten, I H and Frank, E (2005). Data
Mining: Practical Machine Learning
Tools and Techniques, San Francisco:
Morgan Kaufmann.
About the Authors
Satyam Maheshwari received the MTech degree in Computer Technology and Applications from RGPV Bhopal. Since 2003, he has been an Assistant Professor in the department of computer applications in SATI Degree, Vidisha (MP). His research interest is classification of imbalanced datasets in Data Mining. He is a Member of IEEE, CSI, and ISTE.
Sunil Joshi received the MCA degree in 2001 from SATI Vidisha. Since 2005, he has been an Assistant Professor in the department of computer applications in SATI Degree, Vidisha (MP). Currently he is pursuing a PhD degree in frequent pattern mining at University of RGPV. He is a member of IEEE and ISTE.
Practitioner
Workbench
Dr. Debasish Jana
Editor, CSI Communications
Programming.Tips() »
Passing Variable Number of Arguments in C
Ever wondered how printf or scanf is declared in C or C++? I raise this because printf and scanf are functions that can take a variable number of arguments. For example, you could write:
printf("%d %c", someinteger, somechar);
where someinteger and somechar are of int and char types respectively:
int someinteger;
char somechar;
We could have decided to print only one integer as below:
printf("%d", someinteger);
Or, simply, just a string as:
printf("Hi There");
In the above three examples, printf takes three, two, and one argument respectively. If you look closely, you will notice that in all three cases the first argument is a character string. In the 1st case, the second argument is an integer (int) and the third argument is a character (char). In the 2nd case, the second argument is an integer (int) and there is no third argument. In the 3rd case, there is no second or third argument.
But C does not support function overloading, so we should not expect that many different variants of printf (and scanf and similar functions) are declared. C++ inherited these functions from C, so in C++ printf and scanf take the same form.
In C, there is a syntax for optional parameters: the triple dots, i.e. "...". This allows the caller to pass a list of values matching the format string (the first argument). Thus, the same function can be used to print things like this:
int someinteger;
char somechar;
printf("%d %c", someinteger, somechar);
printf("%d", someinteger);
printf("Hi There");
In fact, printf is a function with the following signature:
int printf(const char *format, ...);
This means that it requires at least one argument, a character string, followed by zero or more arguments (which can be of several different types). The return value (int) is the number of characters printed. The number and types of the arguments are determined by the format string.
The C header file stdarg.h provides facilities for stepping through a list of function arguments of unknown number and type. The important macros are given below:
void va_start(va_list ap, lastarg);
This initialization macro is to be called once before any unnamed argument is accessed. ap must be declared as a local variable, and lastarg is the last named parameter of the function.
type va_arg(va_list ap, type);
This produces a value of the given type (type), namely the value of the next unnamed argument. It modifies ap.
void va_end(va_list ap);
This must be called once after the arguments have been processed and before the function exits.
An example program follows:
#include <iostream>
#include <cstdarg>
using namespace std;

int sum( int first, ... );

int main()
{
    // Call with 3 integers
    // (-1 is used as terminator).
    cout << "sum is: "
         << sum( 2, 3, 4, -1 )
         << endl;
    // Call with 4 integers
    cout << "sum is: "
         << sum( 5, 7, 9, 11, -1 )
         << endl;
    // Call with no integer : just -1 terminator
    cout << "sum is: "
         << sum( -1 )
         << endl;
    return 0;
}

// Returns the sum of a variable list of
// integers
int sum( int first, ... )
{
    int s = 0, i = first;
    va_list marker;
    // Initialize variable arguments
    va_start(marker, first);
    while( i != -1 )
    {
        s += i;
        i = va_arg( marker, int);
    }
    va_end( marker ); // reset variable arguments
    return s;
}
The output when the program is run is given below:
sum is: 9
sum is: 32
sum is: 0
Do you have some Interesting Programming Tips to share? This could be in any Programming Language or Software tool. Share with us. Send your summarized write-up to CSI
Communications with subject line ‘Programming Tips’ at email address [email protected]
Practitioner
Workbench
Umesh P
Department of Computational Biology and Bioinformatics, University of Kerala
Programming.Learn (“Python”) »
Plotting with Python
Snakes are becoming popular among pet lovers because they are easy to care for, exotic, and do not need to be fed daily like a dog or cat. Corn snake, Ball python, California King snake, Milk snake, Boa constrictor etc. are popular pet snakes. Among pythons, the Ball python is considered to be one of the best pets for beginners. Ball pythons are docile and grow to about 5 feet long. In some countries, there are online stores that deliver snakes on payment.
Matplotlib is an object-oriented plotting library for Python. It offers a MATLAB/Scilab-like application programming interface (API) and produces accurate, high-quality figures that can be used for publication purposes.
Matplotlib contains the pylab interface, which is the set of functions provided by matplotlib.pylab to plot graphs. matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB.
To start a plotting experiment, first we need to import matplotlib.pyplot.
>>> import matplotlib.pyplot as plt
Here the library matplotlib.pyplot is imported and labeled as plt for easy future reference to the module.
>>> import matplotlib.pyplot as plt
>>> plt.plot([1,2,3,4], [4,3,2,1])
>>> plt.axis([0,5,0,5])
>>> plt.show()
The plot function accepts the plotting points as two arrays holding the x and y coordinates respectively. By default, pyplot joins consecutive points with straight line segments. If you need only a scatter diagram of the points, try the following code:
>>> plt.plot([1,2,3,4], [4,3,2,1], 'ro')
You can plot the graph using different colors and styles by passing a format string as an additional argument to the plot function.
>>> from pylab import *
>>> x = arange(1.,10.,0.1)
>>> y = x*x
>>> plot(x,y,'g--')
>>> show()
After plotting the graph, you need to call the show() command to view it.
Here you will get a green dashed line graph; try r for red, y for yellow etc. We can specify marker shapes with short codes such as s for square, ^ for triangle etc.
>>> plot(x,y,'rs')   # Red square
>>> plot(x,y,'g^')   # Green triangle
Standard mathematical functions can also be plotted. Let us plot a sine curve:
>>> from pylab import *
>>> x = arange(0.,10.,0.1)   # to define x values
>>> y = sin(x)               # function definition
>>> plot(x,y)                # to plot
>>> grid(True)               # to show graph in grid
>>> show()                   # to show the plot
Pylab combines pyplot with numpy functionality. If you import matplotlib.pyplot on its own, you also need to import numpy to define arrays.
CIO Perspective
Dr. R M Sonar
Chief Editor, CSI Communications
Managing Technology »
Business Information Systems:
Underlying Architectures
The previous article covered the basic elements of a
system such as input, processing, and output.
Interfaces facilitate interactive environment
to get input into a system and present
output in a variety of forms such as reports.
Processing involves a) execution of business
logic implemented through programming
languages and b) management of required
data: storage, access, and manipulation.
In short, software that implements ISs can
be logically divided into three layers based
on functionality: interfaces (presentation
services), core business logic, and data services
as shown in Fig. 1. Table 1 describes these
layers. The components which implement
functionality of those layers can be coupled
either tightly or loosely. Loose coupling
brings a) greater flexibility in developing
and deploying components separately
in networked environment in distributed
fashion, b) flexibility in interconnecting
heterogeneous systems and platforms, and
c) better scalability and maintenance of
information systems. An IS in which all these layers are managed by a single computer program is called a single-tier system, while ISs that have separate programs/systems to implement the individual layers are called three-tier systems. In some systems, business logic may be implemented using multiple programs/systems; these are called n-tier systems (refer Fig. 2).
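The three layers described above can be illustrated with a small sketch. The following Python example is only an illustration under stated assumptions: the accounts table, the function names (fetch_balance, apply_interest, render_report, credit_interest), and the 4% rate are hypothetical and do not come from this article. An in-memory SQLite database from the standard library stands in for data services, and each layer is kept in its own function so that one layer could be replaced without touching the others.

```python
import sqlite3

# --- Data services layer: storage, access, and manipulation of data ---
def open_store():
    """Create an in-memory database with one hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO accounts VALUES ('asha', 1000.0)")
    conn.commit()
    return conn

def fetch_balance(conn, name):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]

def store_balance(conn, name, balance):
    conn.execute("UPDATE accounts SET balance = ? WHERE name = ?", (balance, name))
    conn.commit()

# --- Business logic layer: core processing rules, no SQL or formatting ---
def apply_interest(balance, rate_percent):
    """Return the balance after adding simple interest (hypothetical rule)."""
    return round(balance * (1 + rate_percent / 100.0), 2)

# --- Interface layer: presentation only ---
def render_report(name, balance):
    return f"Account {name}: balance {balance:.2f}"

def credit_interest(conn, name, rate_percent):
    """Wire the three layers together for one request."""
    new_balance = apply_interest(fetch_balance(conn, name), rate_percent)
    store_balance(conn, name, new_balance)
    return render_report(name, new_balance)

if __name__ == "__main__":
    print(credit_interest(open_store(), "asha", 4))
```

In a single-tier system all three functions would live in one program, as they do here; in a three-tier deployment each layer would run as a separate program or system, with the same boundaries between them.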
Single-tier Systems
The program that implements the IS takes care of interfaces, business logic, and data services as shown in Fig. 3. The components of all these layers are tightly connected to each other. Examples of such systems include reservation systems developed in languages like COBOL and deployed in centralized mainframe environments. As shown in Fig. 3, thin clients are just devices with no processing capabilities (called dumb terminals) that are used for input (e.g. data entry) and for displaying information. Many independent ISs developed in languages like C were single tier, with the program managing user interfaces, processing, and file handling. Decision support systems developed using desktop productivity tools like MS Excel, which manage user interfaces, processing, and data inside the same Excel workbook, are also examples of single-tier systems.
Key benefits
• These are centralized systems where all functionalities are tightly connected and implemented in a single information system (monolithic). They are easier to support and maintain.
• These are secure systems as there are only limited entry points to the system. The users have to access the system through the interfaces provided, typically dumb terminals with no other devices/systems connected to them.
• They are computationally efficient because most of them are written in core programming languages, with no overheads from other software like database servers.
Key issues
• Users have limited choice while accessing data.
• A lot of explicit and exhaustive coding is required as the program that implements the IS has to manage all functionalities.
• More dependence on the vendor for support, especially if the systems providing customization capabilities are not based on open standards.
• Disadvantages of conventional file handling.
• Since most of these systems are based on centralized computing, failure of such systems can cause major disruption in services.
Fig. 1: Logical separation of tasks (tiers) in IS (interfaces, business logic, and data services)
Fig. 2: Computing architectures (single-tier, client/server, and web-based (n-tier); figure labels: flexibility, personalization, access, and ROI; distributed computing, modularity, open standard, and scalability)
Fig. 3: Single-tier systems (thin clients, e.g. dumb terminals, connected to a mainframe with file storage)
Table 1: Functionality implemented by layers and components
Layer | Functionality | Components/Types
Interfaces | Takes care of presentation services. Facilitates input, validation, and output. | Text-based data entry interfaces, GUI-based (windows) interactive forms, IVR, SMS, WAP and web-based forms, unstructured supplementary service data (USSD), static and interactive reports, and dashboards and multimedia interfaces.
Business logic | Execution of core processing logic. | Core modules, functions, procedures, APIs (libraries), web services, stored procedures etc.
Data services | Defining data models; creation, storage, access, and manipulation of the data required. | File handling and management, data stores, database management systems (DBMS), XML storage and access etc.
Client/Server Systems
In client/server systems (the server is referred to as the database server), interfaces are taken care of by client machines (usually desktops) and data services by the DBMS as shown in Fig. 4. Client machines interact with database systems in a loosely coupled manner. The client machines send requests (or send DB commands) to the database systems; the database systems respond to each request and send the required data or execute the requested command. The business logic is split into two parts: client side and server side. Since the majority of business logic is implemented at the client side, the clients are typically fat clients (machines requiring higher computing resources). ISs developed using tools like VB (Visual Basic) as front-end and Oracle as back-end fall under this category. Every installation of such an IS at each deployment location (such as a branch office) needs a database server and client machines connected over a local area network.
Client/server systems can be further enhanced to have better ROI using thin-client (GUI-based) technologies like the ones from vendors such as Citrix. Fig. 5 shows an example. In such architectures, instead of many fat-client machines, only a few client machines (even only one) are used for application processing. Operating systems like Windows 2000 allow multiple instances of an IS to run on the same machine. Using such thin-client technologies, these ISs can be accessed by many users over thin clients. The fat-client machines are typically server machines, often called terminal servers. Such technologies drastically reduce support and maintenance efforts as there is no need to install interfaces and business logic on many fat-client machines. Only one instance is shared amongst many users through thin clients. This is some sort of virtualization.
Fig. 4: Client/server systems
Fig. 5: Thin client based client/server systems
Key benefits
• These are distributed systems normally deployed in a LAN environment where many client machines are connected to a common database server. They use various resources: client side, server side as well as network.
• Data services are managed by database servers which take care of data storage, access, and manipulation. They take care of concurrency, redundancy, security, and consistency of data. Most of the database servers use standard query languages to access and manipulate data.
• Database systems are loosely coupled; end users have a greater degree of freedom in accessing data and creating customized reports based on requirement.
• Option of using various database management systems and client side development tools.
Key issues
• These systems are deployed in a networking environment; if not properly configured, security can be an issue as there can be multiple entry points into the systems. For example, users can have direct access to data in the database server. The database administrator needs to set proper access rights and controls based on users and their roles.
• Scalability can be an issue, especially when the number of clients increases. Load on the database server increases with the number of clients accessing that server, as it manages an exclusive session for each one.
• Such a system is difficult to manage, especially support and maintenance, when deployed on a large scale at different locations. Even a small change in the user interface requires updating client components at all locations.
• Dependence on the database, especially if a lot of business logic is implemented at the database server.
• If systems are not properly designed, developed, and configured, it may lead to inefficient use of network bandwidth; for example, a lot of data exchange between client and server.
Web-based N-tier Systems
In web-based systems, the functionalities of all three layers are separated, run on different machines/devices, and are loosely coupled. They are deployed in Internet, intranet (an Internet-like setup within the organization using all the technologies, protocols, and standards that are used on the Internet), and extranet (extending the intranet setup to outside stakeholders like business partners, dealers, vendors, agents etc.) environments. In such ISs, interface functionality is taken care of by client machines/devices, business logic by the web server (which stores and delivers web pages), and data services by database servers. However, in some cases part of the business logic is moved to the DB server. Business logic can be split across multiple servers like a web server and an application server (which takes care of specific functional requirements like CRM). The client can be a desktop machine, thin-client machine, smart device supporting a browser, or any device that supports Internet connectivity (refer Fig. 6). The computational resource requirements at the client side depend upon the functionality to be executed there. Some clients require more processing power, e.g. rich Internet applications (RIA) and applications that need to install components like ActiveX. However, many web-based information systems just need a browser to access them from a client machine.
Fig. 6: Web-based n-tier systems
Key benefits
• These are completely distributed systems and use optimal resources: client side, server side, and Internet/intranet and extranet. However, core business logic and database services can be centralized.
• Database systems are loosely coupled; end users have a greater degree of freedom in accessing data and creating customized reports based on requirement.
• Since components are loosely coupled, these systems are highly scalable (load balancing is possible by deploying many servers) and accessible.
• These systems are based on open standards and can interconnect different systems.
• Core business logic as well as interfaces can be designed and implemented at granular/component level (e.g. as web services, mashups etc.), thereby increasing reuse; new systems can be built with relatively less effort using service-oriented architecture (SOA).
Key issues
• Since these systems are highly distributed, openly accessible, have multiple entry points, and interconnect many systems, they are vulnerable to attack. If systems are not properly configured, they can face security threats.
• Dependence on network connectivity.
Table 2 shows examples of how components in the three layers are implemented in single, client/server, and web-based systems.
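The separation of tiers in a web-based system can be sketched with the Python standard library alone. This is a minimal illustration, not a production design: the products table, the 10% tax rule, and the names price_with_tax, Handler, start_server, and client_get are all hypothetical and introduced only for this example. An in-memory SQLite database stands in for the DB server, a small HTTP handler plays the web server, and a plain HTTP call plays the client device.

```python
import http.client
import json
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Data services tier: an in-memory SQLite database stands in for the DB server.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE products (name TEXT, price REAL)")
DB.executemany("INSERT INTO products VALUES (?, ?)",
               [("book", 250.0), ("bag", 900.0)])
DB.commit()
DB_LOCK = threading.Lock()

def price_with_tax(name, tax_percent=10):
    """Business logic tier: look up a price and apply a hypothetical tax rule."""
    with DB_LOCK:
        row = DB.execute("SELECT price FROM products WHERE name = ?",
                         (name,)).fetchone()
    if row is None:
        return None
    return round(row[0] * (100 + tax_percent) / 100, 2)

class Handler(BaseHTTPRequestHandler):
    """Web server tier: turns HTTP requests into business-logic calls."""
    def do_GET(self):
        name = self.path.strip("/")
        total = price_with_tax(name)
        body = json.dumps({"product": name, "total": total}).encode()
        self.send_response(200 if total is not None else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def start_server():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def client_get(port, name):
    """Client tier: any HTTP-capable device; here a plain http.client call."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", f"/{name}")
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()

if __name__ == "__main__":
    srv = start_server()
    print(client_get(srv.server_address[1], "book"))
    srv.shutdown()
    srv.server_close()
```

The client knows only the URL, the handler knows only the business function, and only the business function touches the database, so each tier could be moved to a different machine or replaced independently.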
Cloud-based Systems
The Internet has evolved from a platform that delivered web content to a platform that performs a variety of computing services. Instead of managing information systems (ISs) and IT infrastructure on premise, organizations are outsourcing them to third-party vendors called cloud vendors. Vendors do not sell their software, platforms, or infrastructure as products and solutions but as services. Client organizations do not need to buy them but use and access them on demand. This is equivalent to renting a car instead of owning it. Cloud vendors offer various service models: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Fig. 7 shows the basic architecture of cloud computing. The client organizations can choose services based on their requirements. There is a great deal of flexibility in selecting subscription models based on functional and technical requirements.
Key benefits
• Client organizations neither need to own IT infrastructure and resources nor need to maintain and support them. They do not have to deal with constant changes in technologies.
• Better ROI: cloud vendors make resources available based on requirement and demand. Since cloud vendors provide services to many clients, they can have economies of scale.
• Systems and platforms can be tested before renting/subscribing etc.
• There are many players who are part of the cloud vendor ecosystem (e.g. independent software vendors, developer, and expert communities); the client organizations can take advantage of the same.
• Different service/subscription models can be opted for depending upon requirement.
• It offers point-to-point and seamless connectivity to the client firm and all its stakeholders like employees and business partners in the ecosystem. For example, employees can access email directly from cloud services (e.g. Gmail) instead of connecting to/accessing it from a corporate email server. Similarly, business partners can access the system from the cloud that the organization accesses (instead of accessing it from the firm's IT data center). There is no need to even have extranet kind of environments.
Key issues
• Security is one of the major concerns for client organizations as services are offered on a shared basis and executed remotely.
• Lock-in cost can increase in case the cloud vendor uses proprietary technologies.
• Cross-country legal framework to enforce service-level agreements (SLA) between client organizations and cloud vendors.
• Dependence on availability of Internet connectivity and required bandwidth.
Table 2: Implementation of various layers: some examples
Layer | Single | Client/server | Web-based n-tier systems
Interface | Developed using core programming language/tools. | GUI forms (e.g. Visual Basic). Interfaces are tightly integrated to client information system. | Web forms/pages. Interfaces are loosely integrated and downloaded from web server.
Business logic | Through core programming logic/supported by tool. | Implemented using languages like VB and partially at server side using DB programming languages. | Implemented using core programming and scripting languages.
Data services | Program which implements the IS configures, accesses, and manipulates data files. | Managed by database servers. Server side business logic is implemented using DB programming languages such as PL/SQL in Oracle (commonly referred to as SPs: stored procedures). | Managed by database servers, data stores, and XML files.
Summary
Many client and software firms are opting for n-tier computing architectures, and there is a clear shift toward building and using cloud infrastructures and services. The IS/IT has moved from highly distributed systems to centralized architectures (like core
Continued on Page 36
Security Corner
Adv. Prashant Mali [BSc (Physics), MSc (Comp Science), LLB]
Cyber Law Expert
Email: [email protected]
Information Security »
Cyber Crimes on/by Children
I would like to start this article with two
distinct cases I am handling: one in which the
child is the prey to cyber crime and another
where the child has committed cyber crime.
Case One: This child is 14 years old. The
biggest mistake she made was that she used
to write about every single disagreement or fight
she had with her mother or father on a daily
basis. Moreover, she used to substantiate
her loneliness further with a small poem. A
cyber criminal befriended her and used her
loneliness as a sword to sexually abuse the
girl. The girl is in deep mental trauma and the
family in distress. Even though we traced the
cyber criminal, the larger question still
remains.
Case Two: This standard IX boy
suffering from dyslexia was abandoned by
his girlfriend studying in VIII. Moreover, the
girl often taunted him with being impotent.
This boy decided to take revenge on her, and
using the girl’s photograph made her fake
profile on Facebook. Further, he went ahead
and wrote her actual mobile number with
a comment that “I am a prostitute. Please
call”. The girl started receiving hundreds of
unsolicited calls. The case was investigated
and the boy was arrested for his cyber crime.
Children use the Internet for everything
these days, from homework to keeping in
touch with friends. Chat rooms, message
boards, forums, instant messages, and
Facebook have changed the way the world
talks to each other. Thanks to these new
communication portals, it is now possible
to be in contact with people from all over
the world instantly. While the majority of
people on the Internet are simply using it
for research or a form of entertainment,
there are some who use the World Wide
Web as a way to stalk and hunt prey. These
cyber criminals are considered by most to
be psychologically ill and in need of help.
However, while that is true, these pedophiles
are also extremely manipulative: they know
not only how to find their prey but also how to
isolate those innocent members of online
communities in order to get what they want.
According to a recently released survey
by the online security technology firm McAfee,
62% of children shared personal information
online and 39% of parents were unaware of
what their children do online. The survey says
58% of the children polled shared their home
address on the Internet, while 12% have been
victims of some kind of cyber threat.
Technique of Cyber Criminals
Cyber criminals often use a tactic called
"grooming". The first step in this process is
finding a victim. This can be done in a chat
room or by reading blogs. The criminal will
often look for something to share with the
victim. It could be a birthday or a favorite
sport, anything will do. This is simply done
to initiate communication. The next thing
you know emails are being exchanged and
a friendship has started. The next step in
the "grooming" process is to create a wedge
between the victims and their parents,
guardians, or protectors of any sort. This can
be done by waiting for the right moment.
Perhaps an email from the victim describes
a disagreement between them and their
parent or a blog tells of an argument. This
is the perfect opportunity for the cyber
criminal to become a friend and ally. Before
you know it, the relationship has developed
into a trust where the predator is always on
the victim's side no matter what. Eventually
this leads to a face-to-face meeting where
the actual crime takes place.
It is extremely important for parents
to be completely aware of their children's
actions on the Internet. What seems like
a simple friendship to a child could be a
predator catching their prey.
What are the Different Signals that
Your Child is at Risk on Internet?
1. Your child spends large amounts of
time online, especially at night.
2. You find pornography on your child's
computer.
3. Your child receives phone calls from
men you don't know or is making calls,
sometimes long distance, to numbers you
don't recognize.
4. Your child receives mail, gifts, or
packages from someone you don't know.
5. Your child turns the computer monitor
off or quickly changes the screen on the
monitor when you come into the room.
6. Your child becomes withdrawn from
the family.
7. Your child is using an online account
belonging to someone else.
Children Can Commit Cyber Crimes
in the Following Ways
The Computer as a Target (Using a
Computer to Attack Other Computers)
Did you know that the majority of cyber
crimes in this category are committed by
children? In April 2012, a teenager was
arrested for creating a devastating computer
worm. How did he learn to do this? A simple
Internet search will reveal all the tools
necessary to create viruses and hack into
others’ computers. Hacking can take a variety
of forms, ranging from stealing passwords and
classified information to vandalizing websites.
Unauthorized entry into an information
system through hacking or viruses has serious
legal consequences. Talk with your child about
the ethical and legal implications of hacking,
which attracts up to 3 years of imprisonment
and Rs. 5 lakhs of penalty in India.
The Computer as a Weapon (Using
a Computer to Commit Real World
Crimes)
Take, for instance, email. Children believe
email is harmless because they don’t see
the impact on the person who receives it.
A growing trend with the use of email and
Facebook is harassment; children are saying
things to other children—both at school and
in other communities—that they would never
say face-to-face. Parents need to teach their
children about appropriate communication
through email and Facebook.
The Computer as an Accessory (Using
a Computer to Store Illegal Files or
Information)
The Internet is a useful tool for finding
information in a quick and convenient way.
Even though much of this information is
available for everyone to use, many products
and services found online are not permissible
to be reproduced or downloaded, especially
music and purchasable programs.
Popular peer-to-peer software programs
make it easy to share copyrighted material
and actually encourage downloading.
However, it is a violation of copyright law
to take music or software from the Internet
without the permission of the owner. It is easy
for children to understand why the theft in the
real world is wrong, but it is difficult for them
to understand theft of intellectual property.
Teach your children not to download pirated
or counterfeit material. Downloading illegal
material attracts IT Act, 2000 provisions as
well as Copyright Act provisions.
Cyber Parenting is the need of the hour;
schools and colleges should take initiatives
to make parents aware of the current issues,
crimes, and the law of the land. I do my bit
by conducting free workshops in schools and
classes, but a major awareness drive by the
Government is the need of the hour.
Security Corner
Mr. Subramaniam Vutha
Advocate
Email: [email protected]
IT Act 2000 »
Prof. IT Law Demystifies Technology
Law Issues: Issue No. 2
The Basics of an Electronic
[Internet-based] Contract:
IT Person: Prof. I. T. Law, it is a pleasure to meet you again. I look forward to an enlightening discussion with you on Technology law issues that people like me should know.
Prof. IT Law: I enjoy talking to you too. What topic should we discuss today?
IT Person: How about electronic contracts?
Prof. IT Law: Yes, that is a fundamental issue in electronic commerce. All commercial dealings over the Internet are in the form of electronic contracts. However, it is so easy to engage in buying or selling over the Internet that we may sometimes overlook the fact that we are getting into electronic contracts.
IT Person: Can you give me some examples, please?
Prof. IT Law: Well, think of the air tickets you buy over the Internet, products you buy on Flipkart or Snapdeal, or train tickets or bus tickets.
IT Person: Yes, I understand. Those are the obvious contractual transactions we engage in.
Prof. IT Law: There are other contracts that are not so obvious to most people. For example, when you access a website, you agree to their terms and conditions and that is also in the nature of an electronic contract.
IT Person: But I do not sign anything there. On the other hand, when I buy something I click on the BUY button or something like that.
Prof. IT Law: When you browse a site you have, by that very action of browsing, accepted the terms and conditions for accessing that site.
IT Person: But I do not ever read the terms and conditions.
Prof. IT Law: Like millions of others. But that does not mean you have not agreed to the “access terms” of that site. Moreover, it also does not mean that you have no binding electronic contract with that site or its owners.
IT Person: This is confusing. Please explain in a way I can understand.
Prof. IT Law: In terms of contract law, you can accept an offer in many ways. For instance, on a website for sale of products
you can accept an offer by ordering a
book or a bag. On a website that provides
mere information, you can accept their
offer of information by merely browsing
the site. Thus, your acceptance can be
indicated by the mere action of browsing
the site, which results in a contract that
binds you to its terms.
IT Person: But accepting a contract by just
doing something rather than signing off
sounds a little incomplete to me.
Prof. IT Law: If the law were not so flexible
we would have had to sign documents
for every deal we do. For any contract,
you need an offer from one party and
an acceptance of the offer by another
party. Over the Internet that happens all
the time.
IT Person: That is interesting.
Prof. IT Law: Yes. In a future meeting
we shall discuss how an offer and an
acceptance are actually made over the
Internet, and the issues that should be
kept in mind in electronic commerce.
IT Person: I shall look forward to that.
Talking to you is always so stimulating.
Continued from Page 34
[Fig. 7: Basic cloud computing framework. Labels: desktops, laptops, smart devices/existing IT setups; Internet; interface; cloud services (cloud application, shared platform, shared infrastructure); web/real-time servers; application/platform servers; DB servers; network infrastructure]
banking solutions) deployed at data centers.
Now such centrally deployed systems are
likely to move to cloud infrastructure. Such
shifts are helping client organizations to
get rid of managing IT systems, resources,
and infrastructure. This has also changed
roles and functionalities of IT/IS personnel.
Such a paradigm shift is going to bring some
issues and challenges, such as security and
privacy of confidential data, dependence or
lock-in on cloud providers, management and
enforcement of SLAs, and cross-country
legalities. However, there are initiatives like
having private clouds to take care of some
such issues and challenges.
Bibliography
[1] Laudon, Kenneth. C., and Laudon,
Jane. P. (2012). Management
information systems: Managing the
Digital Firm, 12th edn., Pearson
Education.
[2] James O'Brien, George Marakas, and
Ramesh Behl. (2010). Management
Information Systems, 9th edn., Tata
McGraw Hill.
[3] Henry C. Lucas Jr. (2008). Information
Technology: Strategic Decision Making
For Managers, Wiley India.
[4] http://www.citrix.com/ accessed in
April 2012.
ICT@ Society
Achuthsankar S Nair
Editor, CSI Communications
Graphic Texting
When you hear the words 'computer art' you might
start thinking about the wonders of computer
graphics, from Adobe Photoshop to dazzling
image processing and morphing software.
There was a time when all that computers could
handle was plain text. People who used 'line
printers' in those days would know how far the
computer was from graphics.
Well, even when there was no computer and the
king of text processing was the typewriter,
strange forms of art used to be practiced with
these machines. Such 'typewriter art' is believed
to have existed since the 1890s. Expert typists
could create a close image of the Mona Lisa by
clever over-typing.
[Figure: Fire-breathing Dragon by Joan G. Stark]
During the 1950s, some computers even adopted
this method to produce graphics from text
printers. Those days are fortunately gone, but
the art from the keyboard has been reborn on
computers in a big way. Joan G. Stark of
Cleveland, Ohio, one of the leading ASCII
artists, could surprise anyone with the
immense creativity she can reflect on the
computer keyboard.
(ASCII, or American Standard Code
for Information Interchange, is a number
coding scheme for computer keyboard
characters, in use since the 1960s. For
example, when you type the character
'a' on the keyboard, the number code
97 is what is stored inside the PC. In
practice, ASCII is simply a reference to
the set of characters that you can see on
the keyboard.)
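The mapping from characters to number codes can be seen in a few lines of C++. This is only a minimal sketch: the helper names `ascii_code` and `ascii_codes` are made up for illustration, and it assumes a platform with an ASCII-compatible character set, which is true of practically every PC.

```cpp
#include <string>
#include <vector>

// Return the number code stored for one character.
// Hypothetical helper; assumes an ASCII-compatible platform.
int ascii_code(char c) {
    return static_cast<unsigned char>(c);
}

// Return the number codes for every character in a piece of text.
std::vector<int> ascii_codes(const std::string& text) {
    std::vector<int> codes;
    for (char c : text) {
        codes.push_back(ascii_code(c));
    }
    return codes;
}
```

For instance, `ascii_code('a')` gives 97, and the smiley `:-)` is stored as the three codes 58, 45 and 41.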
The smileys that we often stick in
e-mails are miniature ASCII art. However,
Stark's variety of ASCII art is not single
line. Some of her pieces, like the fire-spitting
dragon, can fill a screen. She seems
to have picked up the liking for keyboard
art when she got to play with her father's
office typewriter during her childhood.
Since hearing about ASCII art five years
ago, she has been churning out exciting
artwork. All she uses is Notepad, and
of course her wonderful imagination. Links
to her works are available on her wiki page.
URL: http://en.wikipedia.org/wiki/Joan_Stark
Take a fresh look at the computer
keyboard before you visit her site. Do the
keys ( ) " ' , - = look capable of creating any
art? Now prepare for the pleasant surprise
in the links available on her wiki page. Her
web site also has enough resources for
would-be ASCII artists. Her own works are
classified into birds, cats, zoo animals etc.
She has dated, titled and initialed most
of her exhibits.
A history of the art, an account of
her personal experiments with it, tips for
beginners and links to related sites are
available in the External links section of
Stark's wiki page. Joan is a mother
of four kids, whom she introduces on the
web site, not surprisingly, through ASCII
art.
Continued from Page 13
[5] Hall, P A V and Dowling, G R (1980).
“Approximate String Matching”,
ACM Computing Surveys, 12(4), 381-402.
[6] Henikoff, S and Henikoff, J G (1992).
“Amino Acid Substitution Matrices
from Protein Blocks”, Proceedings of
the National Academy of Sciences of
the United States of America, 89(22),
10915-10919.
[7] Kanitha, D (2011). “A scoring matrix
for English”, MPhil Dissertation in
Computational Linguistics, Dept. of
Linguistics, University of Kerala.
[8] Leon, D (1962). “Retrieval of
misspelled names in an airlines
passenger record system”,
Communications of the ACM, 5, 169-171.
[9] Nair, A S (2007). “Computational
Biology & Bioinformatics: A Gentle
Overview”, Communications of the
Computer Society of India, 31(1), 1-13.
[10] Navarro, G (2001). “A Guided Tour to
Approximate String Matching”, ACM
Computing Surveys, 33(1), 31-88.
[11] Needleman, S B and Wunsch, C D
(1970). “A general method applicable
to the search for similarities in
the amino acid sequence of two
proteins”, Journal of Molecular Biology,
48(3), 443-53.
[12] Prema, S (2004). “Report of Study on
Malayalam Frequency Count”, Dept. of
Linguistics, University of Kerala.
[13] Soundex, [Online]. Available:
http://en.wikipedia.org/wiki/Soundex,
Accessed on 2 Dec. 2011.
[14] Wagner, R A and Fischer, M J (1974).
“The String-to-String Correction
Problem”, Journal of the ACM, 21(1),
168-178.
Brain Teaser
Dr. Debasish Jana
Editor, CSI Communications
Crossword »
Test your Knowledge on Linguistic Computing
The solution to the crossword, with the name of the first provider of an all-correct solution, will appear in the next issue. Send your answers to CSI
Communications at email address [email protected] with subject: Crossword Solution - CSIC May 2012
[Crossword grid, squares numbered 1 to 29]
CLUES
ACROSS
1. Determine the part of speech for each word from a sentence (10)
4. Yahoo's text and web page language translation tool (9)
5. A set of parameters defining user's language, country etc. (6)
8. The study of how meaning is affected by context (10)
11. A formal system in mathematical logic for expressing
computation by way of variable binding and substitution (6)
13. A database engine for annotated or analyzed text (6)
15. A lexical database for the English language (7)
16. Name of lemma that helps to tell that a language is not
regular (7)
17. Vocabulary of a language (7)
21. Type of machine learning task to infer a function from training
data (10)
23. Abbreviation formed from the initial parts in a word or a
phrase (7)
25. Meaning encoded in a language expression (9)
26. A multilingual dictionary for language translations on
Windows (6)
28. Process of analyzing a text as a sequence of tokens
(words) (7)
29. A very important data structure (4)
DOWN
2. A company dealing with language translation software (7)
3. Phase structure grammar (11)
6. ISO standard markup framework for natural language
processing (3)
7. A variation of finite automaton (8)
9. The study of the nature, structure, and variation of language
(11)
10. A variant form of a morpheme (9)
12. Microsoft's language translation service (4)
14. A search algorithm for traversing or searching a tree structure
or alike (10)
18. Interaction between computers and humans (3)
19. The study of the origin and history of individual words (9)
20. Rules that describe formation of correct sentence in a
language (7)
21. One of the oldest machine translation companies (7)
22. A variety of a language peculiar to a particular region (7)
24. Abbreviation of processing natural language (3)
27. An international scientific and professional society dealing
with computational linguistics (3)
Solution to April 2012 crossword
[Solution grid not reproduced]
Congratulations to
Ms. P Deepa (Chennai), Dr. Suresh Kumar (Faridabad), Er. Aruna Devi (Mysore),
Dr. T Revathi (Sivakasi) and Mr. S K Khatri (New Delhi)
for getting ALMOST ALL correct answers to the April crossword.
[Cartoon] "I am a failure as a computational linguist! My son sends me an SMS "U R 2 YY 4 ME" and
none of my algorithms could crack it. His friends all are able to read it as "You are too wise for me"."
Ask an Expert
Dr. Debasish Jana
Editor, CSI Communications
Your Question, Our Answer
“Take up one idea. Make that one idea your life - think of it, dream of it, live on that idea. Let the brain,
muscles, nerves, every part of your body, be full of that idea, and just leave every other idea alone.
This is the way to success.”
~ Swami Vivekananda
Subject: C++ example
Sir, I have a few questions on C++ about which I have doubts.
Could you please answer these questions for me? I will be very
grateful if you kindly do so.
1. A data member of a class cannot be declared as friend. Why?
2. What should the overloaded operator [ ] return?
3. Can a virtual function be declared as a static member of a class?
4. Should destructors be declared virtual as a good programming
practice?
5. Can an overloaded function have default arguments?
Thanks.
Sourideb Bhattacharya
Student,
BE (Instrumentation & Electronics Engineering) 3rd year
Jadavpur University, Kolkata
A Here are the answers to your questions:
1. Friendship is a grant of access rights. A data member cannot itself
access other data or functions; only functions can. So there is no point
in giving access rights to a data member.
2. The overloaded operator [] is meant for accessing a single
element in a collection of elements, such as an array. So the return type
should be a reference to the element type, e.g. int& or, in template
form, T&. The code snippet in template form is given below:
#include <stdexcept>

template <class T>
class Array
{
private:
    T *data;
    int size;
public:
    Array(int s)
    {
        size = s;            // remember the length first
        data = new T[size];  // then allocate that many elements
    }
    ~Array() {
        delete [] data;      // delete [] on a null pointer is harmless
    }
    T& operator [] (int indx) {
        if ((indx < 0) || (indx > size - 1)) {
            throw std::out_of_range("Array index out of range");
        }
        return data[indx];
    }
};
Here, the overloaded operator [] returns the actual data
element by reference; otherwise, we could not modify the
element, as in:
Array<int> a(10);//int array of 10 elements
a[0] = 4; // assign 4 to first element of array
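This would not have been possible if you returned by value.
The operator would then return a copy of the actual element: for a
built-in type such as int the assignment would not even compile, and
for a class type the value would be assigned to the temporary copy,
leaving the original element unchanged.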
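To try this out, here is a self-contained sketch along the same lines. The class name `SimpleArray`, the `const` overload and the disabled copying are our own additions for illustration; they are not part of the answer above.

```cpp
#include <cstddef>
#include <stdexcept>

// Illustrative container (hypothetical name) with checked element access.
template <class T>
class SimpleArray {
    T* data_;
    std::size_t size_;
public:
    // new T[n]() value-initializes, so int elements start at 0.
    explicit SimpleArray(std::size_t n) : data_(new T[n]()), size_(n) {}
    ~SimpleArray() { delete[] data_; }

    // Copying is disabled to keep the sketch short.
    SimpleArray(const SimpleArray&) = delete;
    SimpleArray& operator=(const SimpleArray&) = delete;

    // Returns a reference, so a[i] = value changes the stored element.
    T& operator[](std::size_t i) {
        if (i >= size_) throw std::out_of_range("index out of range");
        return data_[i];
    }

    // Read-only access for const objects.
    const T& operator[](std::size_t i) const {
        if (i >= size_) throw std::out_of_range("index out of range");
        return data_[i];
    }

    std::size_t size() const { return size_; }
};
```

With `SimpleArray<int> a(10);`, the statement `a[0] = 4;` stores 4 in the first element, while `a[10]` throws `std::out_of_range` instead of touching memory outside the array.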
3. No. A static member belongs to the class and not to any
object, so it has no this pointer. Dynamic binding depends on the
dynamic type of the object (which depends on how the new was
issued, like new X; or new Y; where Y is a subclass of X), so the virtual
function mechanism is applicable only to calls made on objects. Static
members cannot be virtual.
class X
{
public:
    virtual void vf() {
        cout << "X::vf" << endl;
    }
};
class Y : public X
{
public:
    void vf() {
        cout << "Y::vf" << endl;
    }
};
Now, if we have a program snippet as below:
X * p = new Y();
p->vf();
This will print Y::vf and not X::vf, because the call is resolved
from the dynamic type of the object that p points to. A virtual
function thus needs an object instance to be called on, which a
static member function does not have. Static won't do.
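The contrast between the two kinds of member can be sketched in a few self-contained lines. The function names `name`, `kind` and `describe` are hypothetical, and the functions return strings instead of printing so that the result is easy to check.

```cpp
#include <string>

class X {
public:
    virtual ~X() {}
    // Virtual: resolved at run time from the dynamic type of the object.
    virtual std::string name() const { return "X::name"; }
    // Static: belongs to the class, has no this pointer, cannot be virtual.
    static std::string kind() { return "X::kind"; }
};

class Y : public X {
public:
    std::string name() const override { return "Y::name"; }
    // This hides X::kind; it does not override it.
    static std::string kind() { return "Y::kind"; }
};

// Calls both functions through a pointer to the base class.
std::string describe(const X* p) {
    return p->name() + " " + p->kind();
}
```

Passing the address of a `Y` object to `describe` yields `Y::name X::kind`: the virtual call follows the object, while the static call is fixed by the pointer's declared type.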
4. Yes, whenever the class is meant to be used as a base class. Otherwise,
in a situation where Y is a subclass of X, and you have X* p = new Y, then
delete p would not call Y's destructor if X's destructor was not declared
as virtual (formally, the behaviour is undefined).
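The effect of a virtual destructor can be observed with a small sketch. The names `Base`, `Derived`, `g_log` and `destroy_through_base` are ours, chosen only for this illustration; the log string simply records which destructors run.

```cpp
#include <string>

// Records the order in which destructors run (illustration only).
static std::string g_log;

class Base {
public:
    // Virtual, so deleting through a Base pointer is safe.
    virtual ~Base() { g_log += "~Base "; }
};

class Derived : public Base {
public:
    ~Derived() override { g_log += "~Derived "; }
};

// Creates a Derived, deletes it through a Base pointer, returns the log.
std::string destroy_through_base() {
    g_log.clear();
    Base* p = new Derived;
    delete p;  // runs ~Derived first, then ~Base
    return g_log;
}
```

Because `~Base` is virtual, `delete p` runs the derived destructor before the base one. Without the `virtual` keyword the same `delete` would have undefined behaviour, and typically only `~Base` would run.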
5. Yes. But overloaded operators cannot have default arguments
(the function call operator () is the one exception).
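As a minimal sketch of answer 5, the two overloads below each carry a default argument. The function name `repeat` and its parameters are hypothetical, picked only for this illustration.

```cpp
#include <string>

// Overload 1: repeat a single character; the count defaults to 3.
std::string repeat(char c, int count = 3) {
    return std::string(count, c);
}

// Overload 2: repeat a whole string; the count defaults to 2.
std::string repeat(const std::string& s, int count = 2) {
    std::string out;
    for (int i = 0; i < count; ++i) {
        out += s;
    }
    return out;
}
```

Here `repeat('*')` gives `***` and `repeat(std::string("ab"))` gives `abab`. One caution: default arguments must not make two overloads indistinguishable. Declaring both `f(int)` and `f(int, int = 0)` is allowed, but the call `f(1)` is then ambiguous and will not compile.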
Send your questions to CSI Communications with subject line ‘Ask an Expert’ at email address [email protected]
Happenings@ICT
H R Mohan
AVP (Systems), The Hindu, Chennai
Email: [email protected]
ICT News Briefs in April 2012
The following are the ICT news and headlines
of interest in April 2012. They have been
compiled from various news and Internet
sources including the financial dailies - The
Hindu, Business Line, Economic Times.
Voices & Views
• In a couple of years, 85% of people will
be using smartphones - Microsoft India
Chairman.
• A Chinese hacker is responsible for cyber-attacks on the Government of India, military
research organisations and shipping
companies - Trend Micro.
• Global handset shipments will increase
29% from 1.7 billion in 2012 to 2.2 billion
in 2016, of which smartphones will account
for 1 billion - ABI Research.
• Emerging markets to spend $1.22 trillion
(representing 31% of the worldwide
total) on IT in 2012 - Gartner.
• Indian enterprise software market will
grow 13% in 2012 with revenue of $3.22
billion – Gartner.
• Tablet sales touch 4.75 lakh in 2011 - CyberMedia Research.
• The US Citizenship and Immigration
Services has received about 22,000
petitions (against the cap of 65,000) for
H-1B work visas in the first four days.
• Computer Society of India to promote
free software - CSI president Satish Babu.
• ‘I warned Raja against advancing 2G cutoff date' - Ex-Telecom Secretary.
• Media tablet sales will double this year
globally to 12 crore units from 6 crore
units in 2011 - Gartner.
• The Indian logistics industry is estimated
at $130 billion and is expected to grow to
$385 billion in the next four to five years
- Mr P. Srikanth Reddy, Chairman, Four
Soft.
• Publishers reach settlement with US
Justice Dept on e-book pricing.
• By 2015, the market for ‘big data'
technology and services globally will
reach $16.9 billion up from $3.2 billion in
2010. Every day, 2.5 quintillion bytes of
data are created – IDC.
• Social networking sites should set up
servers in India – Rajasthan CM Gehlot.
• Karnataka's IT exports zoomed nearly
50% to touch Rs. 1.3 lakh crore in 2011-12.
• ‘Data breach costs Indian organisations
Rs. 5.35 crore annually’ – Symantec.
Telecom, Govt, Policy, Compliance
• Govt will help fund buys of foreign firms
with high-end cyber security technology.
• Aakash-II, sub $40 Android tablet launch
likely in May – Sibal.
• Supreme Court rejects 2G operators’
review petition.
• Airtel rolls out 4G at Kolkata, to offer high
speed Internet services.
• DoT panel sees merit in merger of BSNL,
MTNL.
• Centre may clear Karnataka's plan to set
up IT investment region at an estimated
investment of Rs. 90,000 crore. The
project would generate about 1.1 million
direct and 2.7 million indirect jobs.
• AICTE and Microsoft announced the
implementation of Microsoft Live@edu
for all the technical colleges in India.
• TRAI wants licensing powers under new
unified regime.
• The future of the Aakash tablet hangs
in the balance as Datawind and QUAD
Electronics have locked horns over
alleged violation of agreements.
• Mobile ARPUs start rising for first time in
many years.
• DoT asks telcos to comply with new
tower radiation norms.
• TRAI sets quality norms for mobile
banking services.
• TRAI makes one ‘per second’ plan
mandatory.
• Prospects brighten for silicon wafer fab
units as global firms offer support.
• TRAI sets base price for 2G spectrum at
10 times 2008 rate with price varying
between Rs. 3,622 and Rs. 14,480 crore
per megahertz of airwaves.
• Panel set up to frame norms for telecom
firms for issuing SIM cards.
• TRAI launches online facility (www.
tccms.gov.in) to monitor consumer
complaints.
IT Manpower, Staffing and Top Moves
• Cyrus Mistry and O.P. Bhatt (Ex. SBI
Chairman) join TCS board.
• Progress Software to help engineering
colleges in setting up incubation centres
in Hyderabad.
• Potential job losses in telcos 'enormous'
- HR Experts.
• Hiring of NRI professionals up 5% in Jan-Mar 2012.
• Infosys BPO to recruit 13,000 across
18 locations. Also plans to hire 35,000
people this fiscal.
• Steelwedge Software to raise India
headcount from 180 to 1150 by 2016.
• Walmart Labs to hire 200 engineers.
• Uninor employees take to the streets to
save company.
• TCS employee addition at all-time
high with a gross addition of 70,400
employees in the year ending March
2012.
• 150 Bangalore staff hit in Yahoo!'s 2,000
cut globally.
• SingTel Global (India), to expand its
operations in five more cities, including
Jaipur and Ahmedabad, and double its
workforce by 2014.
• Tata Elxsi to increase headcount at
Bangalore lab.
• TCS chief, Mr N. Chandrasekaran,
to assume the office of Chairman of
Nasscom.
• IT companies step up hiring of
engineering graduates. The average
salary increased by about 10% compared
to last year and in the range of Rs. 3.05
lakh to Rs. 3.25 lakh per annum.
Company News: Tie-ups, Joint
Ventures, New Initiatives
• Cisco is considering setting up a
manufacturing and services unit in
Maharashtra.
• Wipro asks component vendors to
disclose emission data as part of Green
IT initiatives.
• HCL Info launches operations in Qatar.
• Micromax joins the tablet war with its
FunBook priced at Rs. 6,499.
• Facebook’s mobile app now in seven
Indian languages.
• Local search engine hudku.com launched.
• Reliance emerges as the first telecom
operator in the country to offer tablets on
both the 3G and CDMA networks after
launching the CDMA tablet.
• Kaspersky comes out with suggestions
on how to protect your Mac OS. Will be
useful to 10 crore Mac OS X users around
the world.
• Facebook buys Instagram, a smartphone
photo-sharing application, for $1 billion.
• Four Soft bets big on cloud-based product
for logistics sector.
• MonsterIndia launches app for mobiles.
• HP unveils ‘converged cloud' services.
• Wipro to provide tech services for San
Francisco Marathon.
• Green Platinum rating for Infosys.
• Now, ‘Google Drive' to take on rivals'
cloud storage service.
• Samsung overtakes Nokia to become top
selling phone brand globally.
• Zenith launches TigerCloud.
CSI Report
Prof. Dipti Prasad Mukherjee* and Dr. Dharm Singh**
* RVP, Region II
** Member SIG-e-Agriculture, CSI
* CSI Region II Meeting at Kolkata
A regional meet of the office bearers of different chapters of the Region II was organized
at the Indian Statistical Institute, Kolkata on Sunday, March 25, 2012. The representatives
from the Patna (Prof. A K Nayak), Durgapur (Prof. Asish Mukhopadhyay), Siliguri
(Dr. Ardhendu Mandal) and Kolkata (the current and incoming chairmen Mr. Sushanta
Sinha and Dr. Debasish Jana) chapters were present. The meeting was also attended by
the CSI Secretary Prof. H R Vishwakarma, Division III Chair Prof. Debesh Das, the regional
student coordinator Prof. Phalguni Mukherjee and national nomination committee member
Mr. Subimal Kundu. Prof. Dipti Prasad Mukherjee, Regional Vice-President Region II, welcomed
the gathering and urged those present to increase CSI activity in Eastern India. Prof. H R Vishwakarma
discussed the encouraging growth of CSI membership across India, except in the eastern region.
The problems faced by smaller chapters like Durgapur and Siliguri were discussed in detail.
The possibility of obtaining some seed funds from the CSI headquarters and A-category chapters for smaller chapters was explored at length. A
number of senior CSI members present at the get-together expressed their concerns regarding the image of CSI and suggested more quality
programs for enhancing the CSI brand value. A set of activities was planned for the Patna, Durgapur and Siliguri chapters. The meeting ended on a
positive note of leveraging the potential of Region II in expanding the reach of CSI.
** Special Interest Group on e-Agriculture
Annual Report: 1 April 2011 to 31 March 2012
Background
The Special Interest Group on e-Agriculture was formed in January 2011. The indirect benefits of IT in empowering the Indian farmer are significant and
remain to be exploited. The Indian farmer urgently requires timely and reliable sources of information inputs for taking decisions. At present,
the farmer depends on the trickling down of decision inputs from conventional sources, which are slow and unreliable. The changing environment
faced by Indian farmers makes information not merely useful, but necessary to remain competitive. ICT will be of great importance for the
60 percent of the population that depends on agriculture, as a part of rural development, which is isolated from the urban sector, thereby bridging the digital divide.
Objectives
• To transform technological intervention to increase agriculture production and productivity by ICT.
• To empower the farmers to take quality decisions that will improve agriculture and allied activities.
• To research and develop a strategy for ICT application in agriculture and allied activities.
Activities: Events – 2011-2012
Host Institute: SIG-WNs, SIG-e Agriculture, DivIV, Udaipur Chapter, CSI, IEI, WFEO, CTAE, TINJR and Co-Sponsored by IEEE Delhi Section
Conference and Theme: Three-day International Conference on Emerging Trends in Networks and Computer Communications (ETNCC2011)
Date and Location: 22-24 April, 2011 at the CTAE, Udaipur, India

Host Institute: SIG-WNs, SIG-e Agriculture, Udaipur chapter and MPUAT
Conference and Theme: Motivational and expert series of lectures. Speakers: Dr. S. Reisman, President, IEEE Computer Society (Cyber lecturer), Dr. Dharm Singh, Convenor SIG-WNs CSI, Dr. YC Bahtt, Convenor SIG-e-Agriculture
Date and Location: 5th May 2011, CTAE, Udaipur

Host Institute: Udaipur Chapter, SIG-WNs, SIG-e-Agriculture, IEI-ULC, CTAE and SGI
Conference and Theme: First CSI Rajasthan State IT Convention and National Conference with Celebration of World Telecommunication and Information Society Day 2011 on “WTISD 2011: Better life in rural communities with ICTs”
Date and Location: May 17-19, 2011, SIGs Campus, Udaipur

Host Institute: SIG-WNs and e-Agriculture CSI, IEI ULC and TINJR
Conference and Theme: National Seminar on IP Multimedia Communications
Date and Location: October 14-15, 2011, Udaipur

Host Institute: IEI, SIG-WNs & e-Agriculture CSI, CTAE and TINJR
Conference and Theme: All India Seminar on Information and Communication Technology for Integrated Rural Development
Date and Location: February 11-12, 2012
Peer recognition achieved within India/globally
This group is a new one and is presently taking up some research and development projects to develop electronic planters for precision
farming. More collaborative work is envisaged once the activities are strengthened further.
Plans 2012-13
1. Technical Session in 26th National Convention of Agricultural Engineers in January 2013.
2. Seminar exclusively on theme of e-Agriculture planned at end of 2012.
Dr. R. Srinivasan, Past President and Fellow of CSI, has been appointed as Professor Emeritus in SRM University. Currently he is also serving as Dean Research & PG Studies
at RNS Institute of Technology, Bangalore. Dr. Srinivasan is a member of IEEE, Member of IEEE Computer Society, Fellow of IETE (India) and Life Member of ISTE.
CSI Journal of
COMPUTING
ISSN 2277-6702
e-ISSN 2277-7091
www.csijournal.org
Dear CSI Fraternity,
CSI has launched the ‘CSI Journal of Computing’, with truly original papers from the vibrant community of academia, industrial researchers,
innovators, and entrepreneurs around the world. The first issue was released by the Honorable Chief Minister of Maharashtra, Shri
Prithviraj Chavan, on the CSI Foundation Day 2012 at Mumbai. The Journal covers topics related to Computer Science, Information
Technology, several boundary areas among these and other fields. It is managed by an International Editorial Board. Initially each volume
will have four issues.
Contents of Vol. 1, No. 1, March 2012
• Efficient Face Recognition using Local Active Pixel Pattern (LAPP) for Mobile Environment: Mallikarjuna Rao G, Praveen Kumar,
Vijaya Kumari G, and Babu G R
• Scalable Lock-Free FIFO Queues using Efficient Elimination Techniques: V V N Pavan Kumar and K Gopinath
• Direct Approach for Machine Translation from Punjabi to Hindi: Gurpreet Singh Josan and Gurpreet Singh Lehal
• Markov Modeling in Hindi Speech Recognition System: A Review: R K Aggarwal and M Dave
• The Genome Question: Moore vs. Jevons: B Mishra
• Hash Based Key Indexing: A New Approach to Rainbow Table Generation: Deepika Dutta Mishra, C S R C Murthy, A K Bhattacharjee,
and R S Mundada
• Bioinformatics for Next Generation Sequencing: Srinivas Aluru
Subscription rates:
Individual: CSI member Rs. 400/Volume or US$20/-; Non-CSI member Rs. 800/Volume or US$25/-
Library: CSI member Rs. 600/Volume or US$50/-; Non-CSI member Rs. 1000/Volume or US$75/-
For bulk discounts and other related information you may contact Mr. SM Fahimuddin Pasha, ([email protected]) Coordinator.
I invite you all to reserve your copy as soon as possible through www.csijournal.org/subscription.
Looking forward to your paper contributions to the Journal and subscriptions.
Advertisements and Sponsorships
To make the Journal and publications from CSI vibrant and offer Open Access for the community with a minimal subscription for the
print versions, we solicit sponsorships for the journal. Note that the open access version offers very affordable advertisements. For
advertisement rates, please refer to www.csijournal.org (also on the cover pages of the journal). Here are some of the varieties of
Sponsorship possibilities for Software Houses, Universities, and Government organizations.
Sponsorships (rate and numbers/year, benefits)
Platinum: Rs. 100,000/-; Numbers: two
a. Online advertisement of 1 full page - whole year
b. Half page - printed version
Gold: Rs. 75,000/-; Numbers: four
a. Online advertisement of 1/2 page - whole year
b. 1/4 page - printed version
Silver: Rs. 50,000/-; Numbers: eight
a. Online advertisement of 1/4 page - whole year
b. One column (1/8 page) - printed version
Institutional Memberships: Rs. 25,000/-
a. The member institution's name will be carried on the
web as well as in the printed version
b. 1/4 page online advertisements - whole year
CSI has a vibrant distribution network with 66 chapters, 385 student branches, and over 80,000 members across the country.
Looking forward to generous sponsorships and institutional memberships from the community to keep CSI publications vibrant.
Satish Babu
President,
Computer Society of India
Prof. R K Shyamasundar (TIFR)
Editor-in-Chief, CSI Journal of Computing
Chairman, CSI Publication
CSI News
From CSI Chapters »
Please check detailed news at:
http://www.csi-india.org/web/csi/chapternews-May2012
SPEAKER(S)
TOPIC AND GIST
GHAZIABAD (REGION I)
Dr. Pankaj Jalote, Mr. Sunil Asthana, Mr. Amit Goenka, 7 April 2012: 10th National IT Seminar “Recent Trends in Software
Mr. Navneet B Gupta
Technologies (RTST-2012)”
Dr. Jalote discussed the definition of engineering, especially software
engineering, & the skills required. He discussed the roles of the science-based researcher
and the engineering researcher, the abilities of a researcher, and the difference between
a researcher & a research manager.
Mr. Sunil Asthana spoke about developments in IT Industry and covered
various aspects of Mobile Commerce, Mobility, Mobile Applications, and
Cloud Computing. There were two technical sessions: Emerging Trends &
SIG Role in Software development and Recent Advances in Software testing,
maintenance, and quality assurance.
Dr. Pankaj Jalote, delivering the talk during inaugural session of RTST-2012
(L to R: sitting) Dr. A K Puri, Sh. Sunil Asthana, Dr. Vineet Kansal, and
Dr. Rabins Porwal
GWALIOR (REGION III)
Jayu S Bhide
1 to 3 March 2012 and 14 March 2012: A program on “HAM Radio”
A program on HAM Radio was jointly conducted by I.P.S. College Gwalior &
CSI Gwalior Chapter from 1st to 3rd March and later on 12th March by R.J.I.T.
Teknanpur. Mr. Jayu S. Bhide spoke and organized a live demonstration of
HAM Radio. Attendees learnt how to set up the HAM Station. Students
asked questions regarding security and operation of HAM Radio and speaker
answered the queries.
Mr. J S Bhide and students, during HAM Radio practical
CUTTACK (REGION IV)
Dr. Lalit Mohan Patnaik, Mr. Sushant Panda,
5-7 March 2012: Conference and Student Convention on “Cloud Computing”
Objective of the conference was to provide an overview on Cloud Computing,
the evolution, when and why to use the cloud services, some major market
players and what they provide, and to familiarize the participants on the
software and services available in the Cloud public domain. The first day
was an Industry day, the second day was devoted to technical workshops
and on the third day selected R&D papers were presented by the conference
participants.
Photograph showing inauguration of the Conference
(L to R): Mr. Sanjay Mohapatra, Prof. (Dr.) R Misra, Prof. (Dr.) L M
Patnaik, IISC Bangalore, Er. S Rout, Mr. Sushant Panda, and Dr. K C Patra
BANGALORE (REGION V)
Mr. Srikantan Moorthy, Sr. VP & Group Head, Education 17 March 2012: i3 for i3 Club Launch @ Infosys Campus, Bangalore
& Research, Infosys Technologies
Mr. Srikantan Moorthy delivered key note address on “Top Employability
Parameters”. Participants took up three key topics mentioned by Mr.
Moorthy viz, A. Building competency among faculty B. Building Competency
among students and C. Improving industry interaction.
Participants in group discussion
SPEAKER(S)
TOPIC AND GIST
COIMBATORE (REGION VII)
Dr. Narasimha Murthy K Bhatta and Mr. Mahesh Kolar
10 March 2012: Industry Interaction Day on “Future of Indian IT Sector:
Trends, Opportunities and Challenges”
A technical session on ‘Cloud Computing’ was handled by Dr. Narasimha
Murthy K Bhatta. The second technical session on ‘Mobile Technologies’
was delivered by Mr. Mahesh Kolar. The panel discussion was held on the theme,
“Future of Indian IT Sector: Trends, Opportunities and Challenges”. Various
trends, opportunities and challenges of Indian IT industry were discussed by
panel members.
(L to R) Mr. Ashok Bakthavathsalam, Mr. Mahesh Kolar, Mr. R Shekar,
Prof. S Balasubramanian, Mr. Kumar Krishnasami, Dr. Narasimha Murthy
K Bhatta, and Mrs. Maya Sreekumar
TIRUCHIRAPPALLI (REGION VII)
Prof. S Ravimaran, Mr. Ramachandran, Dr. S Selvakumar
15 March 2012: National Level Technical Symposium on “Emerging Trends
in Computing, Informatics and its applications - COMBLAZE 2k12”
Mr. Ramachandran highlighted importance of communication skill and hard
work. He advised student community to upgrade their skills continuously. Dr. S.
Selvakumar briefed on Cyberspace security and Network security and technology
updates in this Cyber era. He explained various security requisites and security
measures. He explained the techniques with real world scenarios, latest tools and
software for security and mentioned several resources and references to learn
more on the subject.
Dr. Selvakumar at workshop
From Student Branches »
http://www.csi-india.org/web/csi/chapternews-May2012
SPEAKER(S)
TOPIC AND GIST
ABES ENGINEERING COLLEGE, GHAZIABAD (REGION-I)
24 March 2012: An intra-college technical paper presentation competition
(Techsurge-2012)
Intra-college technical paper presentation competition was organized in
collaboration with Ghaziabad Chapter. Objective was to create awareness
among students about emerging technologies and encourage them to take
up research on related subjects. Approximately 130 students participated in
this activity from different courses. Total 21 papers were presented during
Techsurge-2012.
Guests on dais at ABES College, Ghaziabad
SPEAKER(S)
TOPIC AND GIST
DR. ZAKIR HUSAIN INSTITUTE, PATNA (REGION-II)
Prof. (Dr.) A K Nayak and Dr. M N Hoda
26 March 2012: One-day Seminar on “Twenty First Century Professionals:
Industry Expectations”
In his inaugural address, Prof. (Dr.) A. K. Nayak advised students to bring
dedication, devotion, and determination to achieving excellence in the
profession. Prof. Hoda stressed that quality computer education is the need
of the hour to meet industry demand. He urged students to make a
sincere effort to develop their abilities, since graduating students
are not meeting the expectations of organizations.
The dignitaries sitting on the dais during the workshop
SARDAR VALLABHBHAI PATEL INSTITUTE OF TECHNOLOGY (SVIT), VASAD (REGION-III)
Prof. Virendra Ingle and Prof. Rinku Chavada
15-16 March 2012: Workshop on "Android-based Mobile Application
Development"
The workshop covered topics like introduction to Android, the anatomy of
Android applications, UI screen elements and layout, and Android data and
storage APIs as well as Location-based Services APIs.
Participants at the workshop
R.V. COLLEGE OF ENGINEERING, BANGALORE (REGION-V)
Mr. Partha and Dr. S Sathyanarayana
19 March 2012: Motivational Talk on “Software Testing - Career”
The occasion was the distribution of certificates to students who
cleared the “Software Testing Certification Examination”. Mr. Partha said that
people look at testing with a different mindset and that testers need to think
from the perspective of the customer. Testers are better coders. Dr. S Sathyanarayana
advised the participants to make better use of opportunities.
Participants attending motivational talk on “Software Testing”
ANURADHA ENGINEERING COLLEGE, BULDHANA (REGION-VI)
Prof. Avinash S Kapse and Dr. S V Agarkar
28 February 2012: National Science Day Celebration “Project Exhibition &
Debate Competitions”
Prof. Avinash S Kapse talked about the importance of projects in the globalization of
knowledge and about projects needed by society. Dr. S V Agarkar gave guidance
to students and answered their queries. Students explained their projects.
Inaugural Session: (L to R) Prof. Avinash Kapse, Dr. S V Agarkar,
Shri. Siddheshwarji Wanere, and Students
Mr. N B Mapari, Prof. Avinash S Kapse, Dr. S V Agarkar,
and Prof. K H Walse
3-4 March 2012: Two-day Workshop on “Understanding and using
Android platform”
Prof. Avinash S Kapse talked about the importance of the workshop and appealed
to students to improve their personality. He suggested using Android
technology in the future and also spoke about the globalization of knowledge. Dr. S
V Agarkar spoke about the importance of Android, its applications, and its technology.
Prof. K H Walse talked about the importance of Android technology in the future.
Inaugural Session: (L to R) Mr. D G Vyawahare, Dr. Bhattachrayya,
Dr. S V Agarkar, Prof. Avinash Kapse, Prof. K H Walse, and Mr. Dhaval Gulhane
K. K. WAGH INSTITUTE OF ENGINEERING EDUCATION & RESEARCH, NASHIK (REGION-VI)
13-14 March 2012: National Level Technical Symposium "Equinox 2k12"
Various events were conducted, such as:
• CODE-COGS: Programming Contest
• SPIDER-WEB: Web Designing Contest
• TECHNO HUNT: Project Competition
• SCRATCH YOUR BRAIN: Aptitude & Group Discussion
• NET CONNECT: Networking Workshop
• WORLD WAR III: Robo Wars
Chief Guest Mr. Piyush Somani, Prof. Dr. S S Sane, Faculties, and Student Member
17 March 2012: International Conference on “Emerging Trends in Computer
Science and Information Technology-2012 (ETCSIT-2012)”
Professionals and academic researchers presented and discussed their
conceptual and experimental work. The conference provided a forum for
eminent academicians, technologists, scientists, and researchers to exchange
their ideas on the latest developments and future trends in Computer Science
and IT. ETCSIT-2012 also provided a platform for UG and PG students and
encouraged them to present their work based on their final-year projects.
(L to R): Prof. N M Shahabe, Dr. Uday Wad, Dr. Parvati Rajan, Dr. Bhargave,
Prof. Dr. S S Sane, Mr. Shekhar Paranjape, Prof. S M Kamalapur, Prof. M B Jhade
MET’S INSTITUTE OF ENGINEERING, NASHIK (REGION-VI)
Dr. M U Kharat, Mr. Shirode, and Dr. V P Wani
9-10 February 2012: Student Convention
The CSI Regional Convention for Region VI was held for the first time in
12 years. Mr. Shirode enlightened students with his experiences of all-round
engineering and his 360-degree principle for looking at the world. Dr. V P Wani,
with his motivating words, asked the students to give 100% effort in
whatever competition they participate in and so make the competition tougher.
During the Convention, an IT Quiz, Paper Presentation, Circuit Trap, Website
Design Contest, and Group Discussion contest were organized.
(L to R): Dr. Shirish S Sane, Dr. V P Wani, Mr. Shirode,
Mr. Anil Shukla, Mr. Mangesh Pisolkar, and Prof. Aruna Deogire
20 March 2012: Project on “MLearning Framework for Multiple Platforms”
The MLearning project won first prize in the CSI-Discover Thinking National Project
Student Contest and Expo 2012. Arpeet Kale, Saurabh Rawal, Jaspreet
Kaur Kohli, and Komal Bafna, students of Computer Engineering,
developed this mobile application. Their framework will deliver
engineering education on mobiles through high-quality
2D-3D animations, interactive learning content, and many more such
features.
Winners: Arpeet Kale and Jaspreet Kaur Kohli with Dr. Trimurthi and other Judges
GOVERNMENT ENGINEERING COLLEGE (GEC), BARTON HILL, TRIVANDRUM (REGION-VII)
Mr. Nabeel Koya A, Dr. K C Chandrasekharan Nair, and
Mr. Shibin George
15 February 2012: One-day Technical Festival "Inceptra 2012"
Mr. Nabeel Koya deliberated on Cyber Security and Forensics in the current
scenario. Dr. K C Chandrasekharan Nair talked about student entrepreneurship
and the opportunities open to students. Mr. Shibin George conducted a general quiz
competition. Events included Bug Hunt, a technical competition ranging from
cryptography to debugging; LOL Codes, a coding test on rare and useful
programming languages; and Cascade Coding, a challenge on parallel
programming. Competitions on Project Presentation and Gaming were also
conducted as part of the festival.
(L to R): Mr. Anand Kumar, Prof. Jayaprakash P, Prof. G Ramachandran,
Dr. Sheela S, Prof. Balu John, and Ms. Sreelakshmi G S
JYOTHI ENGINEERING COLLEGE (JEC), THRISSUR, KERALA (REGION-VII)
Dr. Gylson and Mr. Chaitany Khanpur
24-25 February 2012: Two-day Workshop on "Cloud Computing"
The Principal, Dr. Gylson Thomas, inaugurated the two-day national
workshop on "Cloud Computing". Mr. Chaitany Khanpur gave an in-depth,
interactive class on cloud computing, starting from the basics of cloud
computing and grid computing. Students also got a hands-on session on
implementing a private cloud.
During the workshop
KALASALINGAM UNIVERSITY, TAMILNADU (REGION-VII)
Dr. Maluk Ahamed and Dr. Kalaiselvi
28-29 March 2012: Digital Dreams ’12 – National Level Technical Symposium
Dr. Maluk advised students to acquire knowledge about their field by
attending symposiums and seminars and stressed the importance of
maintaining quality standards. In the Symposium, 51 papers were presented.
Themes were Distributed Computing, Network Technology, Image Processing,
and AI techniques. Dr. M. A. Maluk Ahamed delivered a lecture on Distributed
Computing and Dr. Kalaiselvi spoke on “Medical Imaging”. Other events
included Technical Quiz, C-Debugging, Trailer Presentation, Web Designing,
Situation Manager, and Treasure Hunt.
Dr. M A Maluk Mohammed releases the souvenir of Technical Symposium
MAR BASELIOS COLLEGE OF ENGINEERING (MBCET), TRIVANDRUM (REGION-VII)
24 February 2012: Intercollegiate Code Debugging Contest “Neosoft”
The competition consisted of two rounds: the prelims and the final round.
The prelim was a written round, testing the logical and analytical skills of the
participants. The final round was a practical round consisting of three questions,
testing the logical reasoning, innovative thinking, and teamwork of the participating
teams.
Code Debugging competition in progress
NATIONAL ENGINEERING COLLEGE (NEC), KOVILPATTI (REGION-VII)
Mr. M K Anand
22 March 2012: Inaugural Function – “National Conference NACCA’12”
Mr. M K Anand inaugurated the conference and addressed the gathering.
In his speech, he advised the students not only to look for jobs but also
to concentrate on self-employment with innovative ideas. The inaugural
session was followed by technical sessions in which advanced topics
like Grid Computing, Mobile Computing, Soft Computing, and Distributed
Computing were presented.
Release of Conference Proceedings by Chief Guest Mr. M K Anand
(L to R): Ms. E Siva Sankari, Dr. D Manimegalai, Dr. P Subburaj,
Mr. M K Anand, Dr. Kn. K S K Chockalingam, and Mr. N BalaSubramanian
The following new student branches were opened, as detailed below:
REGION I

Model Institute of Engineering and Technology (MIET), Jammu - The first CSI student branch in Jammu & Kashmir was
inaugurated on 24th March, 2012. On this occasion a CSI convention on Disaster Management and e-Governance was
organized. Two projects by MIET students were showcased on the occasion: a “Social Network Promoting Social
Responsibility” by Sajan Sridhar and “Election Management” by Sumit Gupta. Prof. Ankur Gupta described several IT
initiatives at MIET, including the filing of 3 patents, in-house development of 2 IT products, and 3 open source IT projects
pertaining to learning management, campus ERP, and admission management systems.
REGION III

NRI Institute of Technology and Management (NRIITM), Gwalior - Dr. S K Gumasta gave an inaugural speech on
the occasion of opening a new student branch at NRIITM on 17th February, 2012. A seminar was jointly organized
by NRIITM, Gwalior and CSI Gwalior chapter on this occasion.
REGION V

REVA Institute and Technology Management (RITM), Bangalore - Inauguration of REVA CSI Student Branch was
held on 11th February, 2012. The Chief Guest of the function was T N Seetharamu, who inaugurated the student
chapter. On the occasion, Mr. Suman Kumar delivered a talk on “Android – The Mobile Technology”, which was
attended by a large number of students, faculty, and staff members of the college.
REGION VI

Institute of Management and Entrepreneurship Development (IMED), Pune - The inaugural
ceremony of “IMED-Student Chapter-CSI” was held on 29th March, 2012 in the presence of Dr. M S Prasad and Dr. M V Shitole. The Chief
Guest of the ceremony was Mr. C G Sahasrabuddhe. Mr. Amit Dangle was the guest of honor.
REGION VII

S. Veerasamy Chettiar College of Engineering and Technology, Tirunelveli - The inaugural function of the student
branch was organized on 29th February, 2012. The Chairman, Dr. V Murugaiah, presided over the function. Mr. Y
Kathiresan spoke on “Personal Effectiveness” on the occasion.
CSI BRINGS MEMBERS AND OPPORTUNITY TOGETHER
Computer Society of India is the recognized association for Information and
Communications Technology (ICT) professionals, attracting a large and active
membership from all levels of the industry. A member of the Computer Society of India
is the public voice of the ICT profession and the guardian of professional ethics and
standards in the ICT industry. We also work closely with other industry associations,
government bodies, and academia to ensure that the benefits of IT advancement
ultimately percolate down to every single citizen of India. Membership demonstrates
IT professionalism and gives a member the status and recognition deserved.
Join
CSI
Learn more at www.csi-india.org
I am interested in the work of CSI. Please send me information on how to become an individual/institutional*
member
Name ______________________________________ Position held_______________________
Address______________________________________________________________________
______________________________________________________________________
City ____________Postal Code _____________
Telephone: _______________ Mobile:_______________ Fax:_______________ Email:_______________________
*[Delete whichever is not applicable]
Interested in joining CSI? Please send your details in the above format on the following email address. [email protected]
CSI Calendar 2012
Prof. S V Raghavan
Vice President & Chair, Conference Committee, CSI
Date
Event Details & Organizers
Contact Information
May 2012 Events
22-26 May 2012
Workshop on Configuring and Administering Microsoft SharePoint 2010
CSI Mumbai Chapter
Mr. Abraham Koshy
[email protected]
24-27 May 2012
Certificate Course on PMP (Project Management) 4.0 (36 Hours of PDU's)
CSI Mumbai Chapter
Mr. Abraham Koshy
[email protected]
26-27 May 2012
Two-Day Workshop on "Secure Computing Systems"
CSI Division II [Software] and Military College of Telecommunication Engineering
[MCTE], Mhow.
Dr. T V Gopal
[email protected]
June 2012 Events
8-12 June 2012
Hands-on workshop on Microsoft SharePoint 2010, Application Development
CSI Mumbai Chapter
Mr. Abraham Koshy
[email protected]
13 June 2012
Software Process Information Network (SPIN) Meet on the topic of Advance Agile
Methodology (Scrum etc)
CSI Mumbai Chapter
Mr. Abraham Koshy
[email protected]
21-24 June 2012
Certificate Course on PMP (Project Management) 4.0 (36 Hours of PDU's)
CSI Mumbai Chapter
Mr. Abraham Koshy
[email protected]
July 2012 Events
26-28 July 2012
International Conference on Advances in Cloud Computing (ACC-2012)
CSI, Bangalore Chapter and CSI Division I
Dr. Anirban Basu
[email protected]
Dr. C R Chakravarthy
[email protected]
August 2012 Events
31 Aug-1 Sep 2012
3rd International Conference on Transforming Healthcare with IT
CSI Division II (Software), Hyderabad
Dr. T V Gopal
[email protected]
www.transformhealth-it.org
September 2012 Events
5-7 September 2012
International Conference on Software Engineering (CONSEG 2012)
CSI Division II (Software), Indore
Dr. T V Gopal
[email protected]
www.conseg2012.org
13-14 September 2012
Global Science and Technology Forum Business Intelligent Summit and Awards
CSI Division II (Software), Singapore
Dr. T V Gopal
[email protected]
www.globalstf.org/bi-summit
November 2012 Events
29 Nov-1 Dec 2012
Third International Conference on Emerging Applications of Information Technology
(EAIT 2012)
CSI Kolkata Chapter Event at Kolkata, URL: https://sites.google.com/site/csieait2012/
D P Mukherjee/Debasish Jana/
Pinakpani Pal/R T Goswami
[email protected]
December 2012 Events
1-2 December 2012
47th Annual National Convention of CSI (CSI 2012)
Organized by CSI Kolkata Chapter, URL: http://csi-2012.org/
Subimal Kundu/D P Mukherjee/
Phalguni Mukherjee/J K Mandal
[email protected]
14-16 December 2012
International Conference on Management of Data (COMAD-2012)
SIGDATA, CSI, Pune Chapter and CSI Division II
Mr. C G Sahasrabudhe
Shekhar_sahasrabudhe@persistent.co.in
Please send your event news to [email protected] . Low-resolution photos and news without a gist will not be published.
Please send only 1 photo per event. Kindly note that only news received on or before the 20th of a month will be
considered for publishing in the CSIC of the following month.
Registered with Registrar of News Papers for India - RNI 31668/78
Regd. No. MH/MR/N/222/MBI/12-14
Posting Date: 10&11 every month. Posted at Patrika Channel Mumbai-I
If undelivered return to :
Samruddhi Venture Park, Unit No.3,
4th floor, MIDC, Andheri (E). Mumbai-400 093
47th Annual National Convention of the Computer Society of India
Organized by The Kolkata Chapter
December 1-2, 2012, Science City, Kolkata
In conjunction with 2012 Third International Conference on
Emerging Applications of Information Technology (EAIT-2012)
Call for Papers and Participation
Advisory Committee
R N Lahiri, Chair
Event Chair
Subimal Kundu
Organizing Committee
D P Mukherjee, Chair
S Sinha, Co-Chair
Program Committee
P Mukherjee, Chair
J K Mandal, Co-Chair
Finance Committee
R T Goswami, Chair
D Dutta, Co-Chair
Convention Committee
S Daspal
D P Sinha
S Roychowdhury
Avik Bose
Anirudhha Nag
Prashant Verma
Gurudas Nag
Gautam Hajra
Md Aliullah
Chinmay Ghosh
T Chattopadhyay
Subir Lahiri
Debasish Jana
Pinakpani Pal
Convention Website:
http://csi-2012.org/
Paper Submission:
Aug 30, 2012
Paper Acceptance:
Sept 30, 2012
Please contact:
CSI Kolkata Chapter
5 Lala Lajpat Rai Sarani (Elgin Road),
4th Floor, Kolkata 700 020
Phone: 2281-4458
Telefax: 2280-2035
Email: [email protected]
Web: http://csi-kolkata.org/
Convention Theme:
Intelligent Infrastructure
Convention Event:
International Conference on Intelligent Infrastructure
The Computer Society of India Kolkata Chapter (CSIKC) cordially invites you to
participate in the 47th Annual National Convention of CSI. While this event will follow
the glorious footsteps of previous conventions, it would still be a unique event focussing
on the theme of Intelligent Infrastructure.
CSI and CSI Kolkata Chapter: Formed in 1965, the CSI has been instrumental in guiding
the Indian IT industry since its formative years. CSIKC is the oldest chapter and the first
CSI Annual National Convention was held in Kolkata at the Indian Statistical Institute
in 1965. To commemorate the achievement of CSI, CSIKC will host CSI-2012. The
event will comprise Plenary Sessions, Paper Presentations and Panel Discussions.
Intelligent Infrastructure: Compelling changes in society and nature require
unprecedented fusion between the physical and the virtual worlds. Today’s society is a
complex system of systems; it is a combination of economic development, public safety,
healthcare, energy and utilities, transportation, education and various other systems.
The function of intelligent infrastructure is to model as well as manage these complex
interconnected systems based on a greater understanding of the interconnectivity and
utilisation of the latest developments in ICT. The inter-disciplinary nature of intelligent
infrastructure provides a great deal of opportunity for creative approaches to problem
solving. The International Conference on Intelligent Infrastructure in CSI-2012 aims to
provide a platform for fruitful deliberations on this theme of the hour.
The theme includes (but is not limited to) the following topics:
• Intelligent Infrastructure Applications
° Precision Agriculture and Smart Growth Systems
° Smart Grids and Wide Area Measurement Systems
° Intelligent Building Automation Systems
° Intelligent Energy and Water Management Systems
° Intelligent Manufacturing, Healthcare, Transportation Systems
• Intelligent Infrastructure Technologies
° Smart Structures, Federated Devices, Sensor Signal Processing and Modelling
° Miniature Wireless Sensors and Networks, Nanoscale Sensors
° Security Issues in Smart Infrastructures, Smart GIS
° Computational and Machine Intelligence Tools
• Intelligent Infrastructure Platforms
° Sensor Web-enablement, Sensor Data Analytics
° Management of Big Data and Associated Development Technologies
° Next Generation Data Centre Technologies for the Exascale Era
Conference in Conjunction:
2012 Third International Conference on
Emerging Applications of Information
Technology (EAIT-2012)
Nov 29 – Dec 01, 2012, Indian Statistical
Institute, Kolkata
EAIT-2012 Website:
https://www.sites.google.com/site/csieait2012
Proceedings: Original unpublished research articles, development notes
and position papers aligned with the theme of the convention will be
published in the Proceedings of the International Conference on Intelligent
Infrastructure. The author instructions for paper submission are available at
http://csi-2012.org/.
Media Partner for twin mega events
EAIT-2012 and CSI-2012
Journal Special Issues: Extended versions of the selected papers presented
in the conference will be published in Journals. CSI Journal of Computing
(ISSN: 2277-7091) will publish a special issue on Intelligent Infrastructure
after FAST TRACK review of selected papers from the conference.