CSIC 2015( March ) - Computer Society Of India

Rs. 50/- | ISSN 0970-647X | Volume No. 38 | Issue No. 12 | March 2015
www.csi-india.org
Cover Story
Machine Translation System –
An Indian Perspective 7
Technical Trends
Role of Machine Translation for
Multilingual Social Media 12
Research Front
Different Approaches for Word Sense
Disambiguation: A Main Process in
Machine Translation 19
Article
Routing Challenges in
Internet of Things 26
Security Corner
A Case Study of Kachwala
Mistry & Partners 35
Security Corner
Photographing a Woman without
her Consent - No Law in India to
Prosecute 39
CSI Communications | March 2015 | 1
Computer Society of India, Chennai
IEEE Computer Society, Madras
IEEE Professional Communication Society, Madras
Results of Student Essay Contest on
Harnessing the Power of ICT for our New Initiatives
Computer Society of India, Chennai Chapter, in association with the IEEE Computer Society, Madras and IEEE Professional
Communication Society, Madras conducted an Essay Contest in two streams: Stream 1: Open to School Students (from 8th
Standard to Plus 2); and Stream 2: Open to College/Polytechnic Students (UG/PG students of all disciplines).
The participants had the option of submitting an essay on “ICT for Digital India” or “ICT for Make in India” or “ICT for
Clean India” by 31st Jan 2015.
Submitted essays were evaluated on criteria such as originality, novelty, applicability, potential value of the proposed
idea(s) and clarity and style of presentation by a panel consisting of Mr. Ramesh Gopalaswamy (Author, Consultant and
Guest Faculty, IIT Madras), Mr. Pramod Mooriath (President, Qatalys Software Tech & Chair, CSI Chennai), Ms. Latha
Ramesh (VP-Academic Engagement & Service Delivery, Classle Knowledge Pvt Ltd & Past Chair, CSI Chennai) and
Mr. K. Adhivarahan (ICT Consultant & Past Chair, CSI Chennai).
We present in the table below the winners of the first three prizes in each stream. Consolation prizes of
Rs. 1000/= and Certificates of Merit have also been announced for a select number of participants. For the full
list, please visit http://goo.gl/FziCmK. Our congratulations to all the winners and thanks to all the participants.
Prize | Amount (Rs) | School Stream Winners | College Stream Winners
First | 10,000 | Karthik Balaji M, St. John’s Public School, Chennai | Ganesh L, Panimalar Inst. of Tech, Chennai
Second | 5,000 | Sanjana Lakshmi CN, St. John’s Public School, Chennai | Swati Kesarwani, Shambhunath Inst. of Engg & Tech, Lucknow
Second | 5,000 | Shlok Prakash, Kendriya Vidyalaya, Chennai | Vipin Paul, Mount Zion College of Engg & Tech, Pudukkottai
Third | 2,500 | Vijayalakshmi Sundar, Pushpalata Vidya Mandir, Tirunelveli | Akhila Sai V, Panimalar Engg. College, Chennai
Third | 2,500 | Vineel Tipirneni, Sri Chaitanya School, Vijayawada | Manjula S, College of Engg., Anna Univ., Chennai
Third | 2,500 | Gowri R, SDAV Hr. Sec School, Chennai | Sivabalan KC, Tamilnadu Agricultural Univ., Coimbatore
Third | 2,500 | Mangala Shenoy K, HHSIBS Hr. Sec School, Kasaragod | Jayamathan S, Sri Ramakrishna Engg. College, Coimbatore
We would like to thank Dynamic Group, Anjana Software Solutions Pvt. Ltd, HP Networking, Cognitive Platform Solutions
(CPS) Pvt Ltd, Orbit Innovations and CloudReign Technologies for the generous sponsorship of the prizes.
Our thanks to Prof. San Murugesan (Adjunct Professor, University of Western Sydney, Australia) and Mr. S. Ramasamy
(GM, Great Lakes Institute of Management & Past RVP-VII and Past Chair, CSI Chennai) for their support in the
successful conduct of this essay contest. We also take the opportunity to thank all those who had helped us in this contest
and facilitated the participation.
The prize winning essays are being hosted at the website at http://goo.gl/FziCmK and the ideas presented in them will be
shared with various agencies for possible implementation.
The prize money and the certificates will be sent to the winners during March 2015. Queries if any in this regard may be
sent to [email protected]
H.R. Mohan
Convener, Student Essay Contest
CSI Communications
Contents
Volume No. 38 • Issue No. 12 • March 2015
Editorial Board
Chief Editor
Dr. R M Sonar
Editors
Dr. Debasish Jana
Dr. Achuthsankar Nair
Resident Editor
Mrs. Jayshree Dhere
Published by
Executive Secretary
Mr. Suchit Gogwekar
For Computer Society of India
Design, Print and
Dispatch by
CyberMedia Services Limited
Cover Story
7 Machine Translation System – An Indian Perspective
Elizabeth Sherly
10 An Overview of Machine Translation
Arun Kumar N
Technical Trends
12 Role of Machine Translation for Multilingual Social Media
Hardik A Gohel
16 Machine Translation: Amazing Blend of Knowledge-Based Algorithms and Information Technology
D G Jha
Research Front
19 Different Approaches for Word Sense Disambiguation: A Main Process in Machine Translation
Sunita Rawat and Manoj Chandak
22 Data Compression – An Overview and Trends in Genomics
Biji C L and Manu K Madhu
Articles
26 Routing Challenges in Internet of Things
Amol Dhumane and Rajesh Prasad
28 Secured Outsourcing Data & Computation to the Untrusted Cloud – New Trend
Sumit Jaiswal, Subhash Chandra Patel and Ravi Shankar Singh
30 Intelligence for Diagnostic Imaging in the Medical World
Richa Sharma and T R Gopalakrishnan Nair
Practitioner Workbench
32 Programming.Tips() » Geometric Transformations in ‘C’ using OpenGL Graphics API
Bharti Trivedi
Innovations in India
34 Collaborative Invention Mining – Make Your Ideas Patentable
Taruna Gupta and Jyothi Viswanathan
Security Corner
35 Case Studies in IT Governance, IT Risk and Information Security » A Case Study of Kachwala Mistry & Partners
Vishnu Kanhere
38 IT Act 2000 » Electronic/Digital Evidence & Cyber Law - Part 2
Prashant Mali
39 IT Act 2000 » Photographing a Woman without her Consent - No Law in India to Prosecute
Prashant Mali
PLUS
40 Brain Teaser
Dr. Debasish Jana
41 Happenings@ICT
H R Mohan
44 CSI Reports
45 CSI News
Please note:
CSI Communications is published by Computer Society of India, a non-profit organization. Views and opinions expressed in the CSI Communications are those of individual authors, contributors and advertisers and they may differ from policies and official statements of CSI. These should not be construed as legal or professional advice. The CSI, the publisher, the editors and the contributors are not responsible for any decisions taken by readers on the basis of these views and opinions.
Although every care is being taken to ensure genuineness of the writings in this publication, CSI Communications does not attest to the originality of the respective authors’ content.
© 2012 CSI. All rights reserved.
Instructors are permitted to photocopy isolated articles for non-commercial classroom use without fee. For any other copying, reprint or republication, permission must be obtained in writing from the Society. Copying for other than personal use or internal reference, or of articles or columns not owned by the Society without explicit permission of the Society or the copyright owner is strictly prohibited.
Published by Suchit Gogwekar for Computer Society of India at Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093.
Tel. : 022-2926 1700 • Fax : 022-2830 2133 • Email : [email protected] Printed at GP Offset Pvt. Ltd., Mumbai 400 059.
Know Your CSI
Executive Committee (2013-14/15)
President
Mr. H R Mohan
[email protected]
Vice-President
Prof. Bipin V Mehta
[email protected]
Hon. Secretary
Mr. Sanjay Mohapatra
[email protected]
Hon. Treasurer
Mr. Ranga Rajagopal
[email protected]
Immd. Past President
Prof. S V Raghavan
[email protected]
Nomination Committee (2014-2015)
Prof. P. Kalyanaraman
Mr. Sanjeev Kumar
Mr. Subimal Kundu
Regional Vice-Presidents
Region - I
Mr. R K Vyas
Delhi, Punjab, Haryana, Himachal Pradesh, Jammu & Kashmir, Uttar Pradesh, Uttaranchal and other areas in Northern India.
[email protected]
Region - II
Mr. Devaprasanna Sinha
Assam, Bihar, West Bengal, North Eastern States and other areas in East & North East India
[email protected]
Region - III
Prof. R P Soni
Gujarat, Madhya Pradesh, Rajasthan and other areas in Western India
[email protected]
Region - IV
Mr. Hari Shankar Mishra
Jharkhand, Chhattisgarh, Orissa and other areas in Central & South Eastern India
[email protected]
Region - V
Mr. Raju L Kanchibhotla
Karnataka and Andhra Pradesh
[email protected]
Region - VI
Dr. Shirish S Sane
Maharashtra and Goa
[email protected]
Region - VII
Mr. S P Soman
Tamil Nadu, Pondicherry, Andaman and Nicobar, Kerala, Lakshadweep
[email protected]
Division Chairpersons
Division-I : Hardware (2013-15)
Prof. M N Hoda
[email protected]
Division-II : Software (2014-16)
Dr. R Nadarajan
[email protected]
Division-III : Applications (2013-15)
Dr. A K Nayak
[email protected]
Division-IV : Communications (2014-16)
Dr. Durgesh Kumar Mishra
[email protected]
Division-V : Education and Research (2013-15)
Dr. Anirban Basu
[email protected]
Publication Committee (2014-15)
Dr. S S Agrawal, Chairman
Prof. R K Shyamasundar, Member
Prof. R M Sonar, Member
Dr. Debasish Jana, Member
Dr. Achuthsankar Nair, Member
Dr. Anirban Basu, Member
Prof. A K Saini, Member
Prof. M N Hoda, Member
Dr. R Nadarajan, Member
Dr. A K Nayak, Member
Dr. Durgesh Kumar Mishra, Member
Mrs. Jayshree Dhere, Member
Important links on CSI website »
About CSI: http://www.csi-india.org/about-csi
Structure and Organisation: http://www.csi-india.org/web/guest/structureandorganisation
Executive Committee: http://www.csi-india.org/executive-committee
Nomination Committee: http://www.csi-india.org/web/guest/nominations-committee
Statutory Committees: http://www.csi-india.org/web/guest/statutory-committees
Who's Who: http://www.csi-india.org/web/guest/who-s-who
CSI Fellows: http://www.csi-india.org/web/guest/csi-fellows
National, Regional & State Student Coordinators: http://www.csi-india.org/web/guest/104
Collaborations: http://www.csi-india.org/web/guest/collaborations
Distinguished Speakers: http://www.csi-india.org/distinguished-speakers
Divisions: http://www.csi-india.org/web/guest/divisions
Regions: http://www.csi-india.org/web/guest/regions1
Chapters: http://www.csi-india.org/web/guest/chapters
Policy Guidelines: http://www.csi-india.org/web/guest/policy-guidelines
Student Branches: http://www.csi-india.org/web/guest/student-branches
Membership Services: http://www.csi-india.org/web/guest/membership-service
Upcoming Events: http://www.csi-india.org/web/guest/upcoming-events
Publications: http://www.csi-india.org/web/guest/publications
Student's Corner: http://www.csi-india.org/web/education-directorate/student-s-corner
CSI Awards: http://www.csi-india.org/web/guest/csi-awards
CSI Certification: http://www.csi-india.org/web/guest/csi-certification
Upcoming Webinars: http://www.csi-india.org/web/guest/upcoming-webinars
About Membership: http://www.csi-india.org/web/guest/about-membership
Why Join CSI: http://www.csi-india.org/why-join-csi
Membership Benefits: http://www.csi-india.org/membership-benefits
BABA Scheme: http://www.csi-india.org/membership-schemes-baba-scheme
Special Interest Groups: http://www.csi-india.org/special-interest-groups
Membership Subscription Fees: http://www.csi-india.org/fee-structure
Membership and Grades: http://www.csi-india.org/web/guest/174
Institutional Membership: http://www.csi-india.org/web/guest/institiutionalmembership
Become a member: http://www.csi-india.org/web/guest/become-a-member
Upgrading and Renewing Membership: http://www.csi-india.org/web/guest/183
Download Forms: http://www.csi-india.org/web/guest/downloadforms
Membership Eligibility: http://www.csi-india.org/web/guest/membership-eligibility
Code of Ethics: http://www.csi-india.org/web/guest/code-of-ethics
From the President Desk: http://www.csi-india.org/web/guest/president-s-desk
CSI Communications (PDF Version): http://www.csi-india.org/web/guest/csi-communications
CSI Communications (HTML Version): http://www.csi-india.org/web/guest/csi-communicationshtml-version
CSI Journal of Computing: http://www.csi-india.org/web/guest/journal
CSI eNewsletter: http://www.csi-india.org/web/guest/enewsletter
CSIC Chapters SBs News: http://www.csi-india.org/csic-chapters-sbs-news
Education Directorate: http://www.csi-india.org/web/education-directorate/home
National Students Coordinator: http://www.csi-india.org/web/national-studentscoordinators/home
Awards and Honors: http://www.csi-india.org/web/guest/251
eGovernance Awards: http://www.csi-india.org/web/guest/e-governanceawards
IT Excellence Awards: http://www.csi-india.org/web/guest/csiitexcellenceawards
YITP Awards: http://www.csi-india.org/web/guest/csiyitp-awards
CSI Service Awards: http://www.csi-india.org/web/guest/csi-service-awards
Academic Excellence Awards: http://www.csi-india.org/web/guest/academic-excellenceawards
Contact us: http://www.csi-india.org/web/guest/contact-us
Important Contact Details »
For queries, correspondence regarding Membership, contact [email protected]
President’s Message
H R Mohan
From : President’s Desk :: [email protected]
Subject : President's Message
Date : 1st March 2015
Dear Members
Let me begin my message by congratulating Ms. Mini Ulanat, our
National Student Coordinator and past chairperson of CSI Kochi for having
been selected to receive the prestigious Chevening fellowship to attend
the TCS Cyber Security Programme, a 12 week intensive course starting
during the last week of Feb 2015 at Cranfield University, Defence Academy
of the United Kingdom, UK. Ms. Mini has expressed her desire to share her
learnings and spread awareness about these important topics on her return
from the programme.
I had the opportunity of participating in the CSI@50 celebrations and the
TechNext India 2014-15 convention on the theme “IT Education Solemnised”
organized by CSI Mumbai during 31st Jan – 1st Feb 2015 in association
with FOSSEE and IIT Bombay. Mr. B. N. Satpathy, Sr. Advisor, NITI Aayog
(Planning Commission) inaugurated the convention and explored the
possibilities of CSI working with NITI Aayog in its various initiatives. Shri.
D. Sivanandhan, IPS, Former Director General of Police, Maharashtra, in his
keynote address on “Ever Moving Boundary of Cyber Security” presented the
realities and urged that the Govt. and professional societies like CSI should
work together in creating awareness in areas of cyber security. The convention,
in terms of technical content, was comparable to our annual convention and was run
very professionally, with multiple parallel tracks catering to the needs of various
stakeholders like professionals, academics and students. The major highlight
of the convention was the Principal Roundtable Meet on “Online Education
(MOOCs)”, where both the academic and industry participants deliberated
on the need, relevance, trends and future of e-learning. Prof. D.B. Phatak, our
Fellow and Padma Shri award recipient, in his address stated the possibilities of
MOOCs being accepted by educational agencies in our conventional systems
of education, stressed the need for preparing our teachers for Blended
Learning, and invited CSI to be a part of the online training programme being
planned by IIT Bombay. The convention, which also had a mini exhibition,
attracted around 500 participants. Team TechNext India, under the
leadership of Chairman Mr. Sandip Chintawar and Vice Chairman Dr. Suresh
Chandra J Gupta, deserves appreciation for its efforts.
I congratulate the CSI student branch of University of Petroleum and
Energy Studies (UPES), Dehradun for having taken the lead in organizing the
State level, Regional level and National level Student Conventions in a row
and for announcing NGCT-2015, the 1st International Conference
on Next Generation Computing Technologies, to be held during Sep 2015. The NSC,
held during 5-6 Feb 2015 with a focus on Cyber Security, attracted over 400
student participants from various parts of the country. I thank the Chair & Vice
Chair of CSI Dehradun, Dr. T. N. Jowhar and Dr. Vinay Avasthi, Mr. R.K. Vyas,
RVP-1, and Dr. M.N. Hoda, Chair-Div I, CSI, for their active role in promoting CSI
in the northern part of India by organizing such events.
The CSI Delhi and The National Capital Region (NCR) CSI chapters are
systematically planning for the Golden Jubilee convention, the 50th annual
convention -- CSI-2015 during the first week of Dec 2015 by organizing a series
of events every month for the last few months and building up the tempo for the
annual meet. As a part of this, a meeting was organized on 16th Feb 2015 on
the theme “Make in India” with Dr. Ajay Kumar, IAS, Joint Secretary & Director
General, NIC as the Chief Guest. Dr. Ajay Kumar, in his address highlighted
the importance of the “Make in India” initiative and the GOI’s steps such as
supporting research, nurturing innovation and startups, providing Internet
access, promoting Net neutrality. He added that DeitY currently works with CSI
in many of its initiatives and will continue to do so in future. A panel of industry
and academic experts deliberated on how to go about making the “Make in
India” a successful initiative. I reiterated that, though only two of the 25 sectors
identified in this initiative relate directly to ICT, the importance of the role of ICT
in the remaining 23 sectors cannot be underestimated. As a part of spreading
awareness about these important initiatives of Government, CSI conducted a
Student Essay Contest on the themes “Make in India”, “Digital India” and “Clean
India”. There was enthusiastic participation with over 200 participants. The
results of this contest are now available at http://goo.gl/FziCmK. We plan to
compile the views of these young and creative minds and present them to DeitY.
At the end of this meeting Mr. S.D. Sharma, the Chair of CSI Delhi, a few OBs of
NCR CSI Chapters and Execom members based at Delhi briefed us on the progress
of CSI-2015 related activities. I am confident that the milestone event, the 50th
annual convention, will be a grand and memorable one.
I had the opportunity of interacting with Dr. Anant Agarwal, CEO of edX
and Professor at MIT when he was at Chennai delivering a leadership lecture
at IIT Madras on “Reinventing Education” providing an overview of MOOCs
and edX, which aspires to reinvent education through online learning. The edX
initiative, whose mission includes increasing access to education for students
worldwide through MOOCs and substantially enhancing campus education in
both quality and efficiency through blended online approaches, has partners
throughout the world, including IITs in India, and is keen on having CSI as a partner
in India. In line with this, the need for standards in e-learning was discussed
in the recently held Sectional Committee meeting on e-learning of the BIS
Committee on Electronics and Information Technology at Delhi, and it was felt that
India, being a large country with wide diversity and language barriers, should
take the lead and actively participate in making global standards. CSI, with
a large number of student and academic members with an Education Research
background, is ideally positioned to contribute to this initiative along with
the existing SIG on Technology Enhanced Learning.
One of my long-standing dreams is to have a publishing arm
at CSI, “CSI Press”, on the lines of other professional societies such as IEEE CS
and ACM, which have IEEE CS Press and ACM Press respectively, and to engage in
educational and knowledge sharing activities. Sustaining the society’s activities
with just membership fees and dwindling sponsorships is quite difficult.
Organizing state-of-the-art technology events at affordable cost across the country,
bringing out quality proceedings and high-standard journals in emerging areas
of CSE and ICT, creating online courses, incubating ideas and creating IPs etc.
will go a long way in enhancing the value of our service to society and
help us grow with financial stability. Our attempt to bring out CSI Transactions on
ICT with Springer is progressing well. During CSI-2014 at Hyderabad and
subsequently at CSI ED at Chennai, I had discussions with APress, the book
publishing division of Springer, which has shown interest in partnering with CSI
and helping our dream come true. With this initiative we can encourage our
industry and academic members having expertise and knowledge to publish
books and monographs under CSI Press for global readers and also make
them available to our large member community in the country. I urge all our
members to share their views on this important aspect.
As we approach the year end, a lot of student related competitions such
as Discover Thinking Programming Contest, Alan Turing Quiz Competition,
and Project Contests are run at different regions for talent recognition. CSI ED
along with NSC coordinated with SSCs and RSCs and conducted many of them,
and a few are at the finals stage. I thank all the individuals who spearheaded these
events and the SBs and chapters that helped in their successful conduct.
The 5th edition of the IT Excellence Awards, our annual prestigious
industry IT projects recognition event received a phenomenal response from
industry across all verticals. The projects were evaluated by an eminent jury
panel including our knowledge partner, Deloitte. The winners were recognized
in a gala event recently held in Mumbai.
My one year term as President of CSI was enjoyable, challenging and
educative. I thank the Execom members, fellows and senior members, and
chapter OBs for their guidance and support in executing my responsibilities.
The CSIC editors and the board have done excellent work in bringing out
a quality publication in a timely manner and meeting the expectations of all the
stakeholders of CSI. The CSI HQ and CSI ED staff have been very cooperative.
My industry and academic contacts have readily accepted my requests and
supported CSI events organized at various chapters and student branches. And
finally, I owe a lot to the members of CSI for having given me an opportunity
to serve CSI. I welcome the new Execom headed by Prof. Bipin V Mehta and
wish them success in taking CSI to newer heights. While this is my last message to
CSIC readers as President of CSI, I wish to continue my relationship through
my regular column in CSIC and through the CSI eNewsletter.
With best wishes and warm regards
H.R. Mohan
President
Computer Society of India
Editorial
Rajendra M Sonar, Achuthsankar S Nair, Debasish Jana and Jayshree Dhere
Editors
Dear Fellow CSI Members,
March 2015 issue marks four years since the present editorial
team of four of us - Dr. Rajendra M. Sonar, Dr. Achuthsankar S.
Nair, Dr. Debasish Jana and Mrs. Jayshree A. Dhere, took over CSI
Communications, and it will be our last issue. Our editorial board
took charge of CSI Communications from April 2011. In departing
grief, we murmur in the tune of William Shakespeare, ‘Farewell, my
dearest sister, fare thee well. The elements be kind to thee, and make
thy spirits all of comfort: fare thee well.’
First of all, we thank CSI for giving us this opportunity to be the
Editors of CSI Communications. From April 2011 onwards, CSI
Communications set out on a quest for rejuvenation with a new
look, a new content format and technically rich content, with a mission to
change from a merely news-heavy newsletter to a technical magazine
with an adequate news section. CSIC being a magazine
for the membership at large, the challenge was to provide technically rich
content at the level of a general audience of varied member categories.
With this vision it transformed itself into a Knowledge Digest for the
IT Community. Each issue became technically rich, with mostly theme-based
contributions in the Cover Story, Technical Trends, Research
Front and Article sections. Added to that, we introduced columns
like Practitioner Workbench with sections like Programming.Tips(),
Programming.Learn() and Software Engineering.Tips(), Security
Corner with sections like Information Security and IT Act 2000, CIO
Perspective, HR, IT Industry Perspective, ICT@Society, Brain Teaser,
Ask an Expert, Happenings@ICT, On the Shelf! and Innovations in India,
all these in addition to CSI News and Announcements, which took a
smaller number of pages.
We got an overwhelming response from all over India and from abroad too.
Many stalwarts like Bjarne Stroustrup, Jeffrey Ullman, Grady Booch,
Ivar Jacobson, Philippe Kruchten and Narsingh Deo gladly contributed,
either through an article or an exclusive interview, and CSI Communications
became richer and richer in content. Dr. Sonar and Mrs. Dhere were in
Mumbai, so they could meet each other, but the other two, Dr. Nair
from Kerala and Dr. Jana from Kolkata, never met face to face, either with each
other or with the other two Editors. There were hardly any meetings
among us other than mail exchanges, yet the synergy that
developed within the team continued, with each issue being technically
supervised by one of Dr. Sonar, Dr. Nair or Dr. Jana with backbone
support from Mrs. Dhere. Most of the time CSIC was published on time;
very rarely was it delayed by a day or two for reasons not in our
control. We were careful in the review process and didn’t
select just any article or contribution submitted. Plagiarism was a big
issue and we had to be selective in choosing the better ones.
Our first issue as a team was the April 2011 issue with MAD (Mobile
Application Development) as the cover theme, and now our last joint
issue, March 2015, has Machine Translation as its cover theme.
Machine Translation uses computers to translate from one natural
language to another. The translation is governed by linguistic rules rather
than done word by word. The challenge lies in extracting the
meaning or semantics of the source language to translate into
the target. There are two broad categories of machine translation
techniques: rule-based (e.g. Systran) and statistical (e.g. Google
Translate). Although Robert Frost said, ‘Poetry is what gets lost in
translation’, today’s near perfect machine translators, in spite of a few
limitations, play a promising role for the community at large.
Our cover story section is enriched with two articles – the first
one titled Machine Translation System – An Indian Perspective by Ms
Elizabeth Sherly providing insight into the theme in the Indian
context and the other one titled An Overview of Machine Translation by
Arun Kumar N providing a general overview along with a brief on
translation tools and recent trends.
Our Technical Trends section has two articles. The first one, titled
Role of Machine Translation for Multilingual Social Media, elaborates
on the importance of machine translation in today’s all-enticing
social media and is authored by Hardik A Gohel, while the second
one, titled Machine Translation: Amazing Blend of Knowledge-Based
Algorithms and Information Technology, is written by
Prof (Dr.) D G Jha and explains how intricacies associated with
morphological analysis, syntactic analysis and content analysis
make machine translation quite complex.
In the Research Front section we have two articles. The first one,
titled Different Approaches for Word Sense Disambiguation: A Main
Process in Machine Translation, is written by Sunita Rawat and Manoj
Chandak. It throws light on various algorithmic techniques for
working out the intended sense of words. The second article is not directly related
to the theme. It is titled Data Compression – An Overview and Trends
in Genomics and is written by Biji C L and Manu K Madhu.
Our Article section brings to you three articles on varied
technical topics viz. Routing Challenges in Internet of Things by
Amol Dhumane, Dr. Rajesh Prasad; Secured Outsourcing Data &
Computation to the Untrusted Cloud – New Trend written by Sumit
Jaiswal, Subhash Chandra Patel, Dr. Ravi Shankar Singh and
Intelligence for Diagnostic Imaging in the Medical World by Richa
Sharma & T.R. Gopalakrishnan Nair.
Under the Practitioner Workbench column, in the Programming.Tips()
section, there is an article by Bharti Trivedi on Geometric
Transformations in ‘C’ using OpenGL Graphics API. In the Innovations
in India column we have an article by Taruna Gupta and Jyothi
Viswanathan of TCS on Collaborative Invention Mining – Make Your
Ideas Patentable, which elaborates on using the patent, an IPR
protection instrument, to protect ideas.
In the Security Corner column, in the continuing section of Case Studies
in IT Governance, IT Risk and Information Security, we have a case study of a
firm called Kachwala Mistry & Partners, which decides to opt for a machine
translation solution, and Dr. Vishnu Kanhere explains what
should be done so far as IT governance is concerned. In the IT Act
2000 section, there are two articles by Adv Prashant Mali – the first, on
modifications in the IT Act regarding Electronic/Digital Evidence, is
titled Electronic/Digital Evidence & Cyber Law – Part 2, and the other,
titled Photographing a Woman without her Consent - No Law in India to
Prosecute, throws light on the legal status of the issue and
why the law should be modified.
We provide the solution to last month’s crossword on Quantum
Computing but regret to mention that there is no new crossword in
this issue since this is the last issue that we are editing. There are
other regular features such as Happenings@ICT written by Mr. H
R Mohan, CSI President, CSI Announcements and Calls for Papers,
Chapter and Student Branch News and CSI Reports.
We thank all those who provided feedback to us, both for improvement and
for encouragement, all the contributors who helped build a content-rich
magazine, and all those readers who enthusiastically looked
forward to receiving the magazine month after month.
Thanks once again and warm regards,
Rajendra M Sonar, Achuthsankar S Nair,
Debasish Jana and Jayshree Dhere
Editors
Cover Story
Elizabeth Sherly
Professor, IIITM-Kerala
Introduction
Machine Translation (MT) is the process
of automatically translating one natural
language into another
without human intervention. The first
work on MT started in 1946, drawing on the experience of breaking
enemy codes during World War II, but
after 60 years of research MT is still an
open problem. Today the need for MT
systems is at its peak, as we live in a
multilingual society in a global village
that requires the use of different languages
for communication and in which national and
international boundaries are diminishing.
A number of pioneering projects
and research efforts were carried out in the US and
Europe, beginning at the University of
Washington, the University of California and the
Massachusetts Institute of Technology.
The first machine translation system was
demonstrated by Georgetown University
in collaboration with IBM; it could
translate a carefully selected sample of
49 Russian sentences into English.
During 1950 to 1960, research in MT
progressed in leaps and bounds. The US then
formed an advisory committee to examine
the prospects of MT, but it found the systems weak and
slow, and major funding
for MT systems virtually dried up. A decade
later the field revived with the adoption of
different MT approaches and models,
mainly dictionary-based, rule-based,
statistical and hybrid MT systems.
India woke up to MT systems a
bit late, just over two decades
ago. India, with its 22 official languages,
values its culture, heritage and languages, and
is greatly in need of local language support
to counter the dominance of English
in computing. India has about 25 years
of history in language computing (LC),
which has gone through its ups and downs.
But during the last decade there has been
a paradigm shift, with a significant leap towards
LC and MT systems as a new computing
arena. Scientists who were somewhat
reluctant to take up language computing
for research have come to choose Language Technology (LT) as a
mainstream area of research and, interestingly,
industry giants like Microsoft and Google
have entered language computing in a
big way. This phenomenal shift in LT
has happened as language technology
tools have become indispensable for enhancing
products and services in high-growth
markets such as mobile applications,
healthcare, IT services, financial services,
online retail, call centres, publishing and
media. In this article, some of the
major MT projects for Indian
languages are described, along with the techniques and
models used.
Anglabharti (1991)
Anglabharti, the first system of its kind among Indian MT systems, was developed at IIT Kanpur under the leadership of Prof. R M K Sinha. It is a multilingual machine-aided translation project for translating from English to Indian languages, primarily Hindi. It uses a pattern-directed approach with structures resembling a context-free grammar. A 'pseudo-target' is generated that is applicable to a group of Indian languages. A set of rules is acquired through corpus analysis to identify the plausible constituents with respect to which movement rules for the 'pseudo-target' are constructed. A number of semantic tags are used to resolve sense ambiguity in the source language. The strategy used in Anglabharti lies between the transfer and the interlingua approaches. It is better than the transfer approach, as the translation is valid for a host of target languages, but it falls short of a genuine interlingua in that it does not attempt complete disambiguation or understanding of the text to be translated. The English-to-Hindi Anglabharti system, known as AnglaHindi, is available as a web-enabled system at http://anglahindi.iiitk.ac.in and is used for translation in the health domain. Work is also in progress on English to Telugu/Tamil translation.
Mantra (1999)
The Mantra (MAchiNe assisted TRAnslation) MT system, developed by C-DAC Bangalore, translates gazette notifications of the Government and parliamentary proceedings from English to Indian languages and vice versa. It uses Lexicalized Tree Adjoining Grammar (LTAG) to represent the English and Hindi grammars, and the Tree Adjoining Grammar (TAG) technique for parsing and generation. It also preserves the formatting of the input Word document across the translation. The work was later extended to Hindi-English and Hindi-Bengali translation.
AnglaMT is a rule-based machine translation system developed by C-DAC for translating English text into Indian languages, using the pseudo-interlingua approach of IIT Kanpur. It analyses the English input only once and creates an intermediate structure, with most of the disambiguation already performed, from which the Indian-language output is generated. This approach has been adapted by C-DAC centres, with the support of TDIL, DeitY, to create eight MT systems. Mantra, AnglaBharati and MaTra are some of the other products developed by C-DAC.
Frame Based System for Dravidian Languages (1999)
This work was carried out at Cochin University as Suman Mary Idikula's doctoral thesis. For sentence comprehension it considers the Karaka relations, that is, the semantico-syntactic relations between the verb and the other related constituents of a sentence. For machine translation, the source language has free word order and the target language has fixed word order; here Malayalam is the source language and English the target language. The work gives an elegant account of the relation between vibakthi and karaka roles in Dravidian languages.
Anusaaraka (2000)
To take advantage of the similarity among Indian languages for MT, a translation system based on the principles of Paninian grammar was developed by IIT Kanpur in association with the University of Hyderabad. It is domain-free, but it has mainly been applied to translating children's stories. An alpha version was deployed for translation from five regional languages (Punjabi, Bengali, Telugu, Kannada and Marathi) into Hindi. Anusaaraka essentially maps local word groups between the source and target languages. Where the languages differ, the system introduces extra notation to preserve the information of the source language (Sudip N. Sivaji B). The Anusaaraka project is funded by Technology Development for Indian Languages (TDIL), DeitY, and IIIT Hyderabad is continuing its development from English to Hindi under the supervision of Prof. Rajeev Sangal.
Fig. 1: Malayalam to Tamil MT system
Anglabharti-II and Anubharti-II (2004)
Modified versions of Anglabharti and Anubharti were developed by IIT Kanpur with a different approach that addresses many of the shortcomings of the earlier architecture. The new approach uses a Generalized Example-Base (GEB) for hybridization, in addition to a Raw Example-Base (REB). The system first attempts a match in the REB and GEB before invoking the rule-based approach. Automated pre-editing and paraphrasing are two additions to the new translation system, which have resulted in greater accuracy and robustness. The technology has now been transferred to eight different language pairs. Similarly, Anubharti has been revised with a varying degree of hybridization of different paradigms for translation from Hindi to other Indian languages.
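The "example base first, rules as fallback" control flow described above can be sketched roughly as follows. This is only an illustration, not the actual Anglabharti-II implementation: the example bases, the romanised Hindi strings and the function names (`match_reb`, `match_geb`, `rule_based`) are hypothetical, and the rule-based stage is a placeholder.

```python
import re

# Hypothetical toy data; a real system would hold far larger example bases.
RAW_EXAMPLES = {"good morning": "suprabhat"}  # REB: exact sentence pairs
GENERALIZED_EXAMPLES = [                       # GEB: templates with slots
    (re.compile(r"^my name is (\w+)$"), "mera naam {0} hai"),
]

def match_reb(sentence):
    """Look for an exact match in the raw example base."""
    return RAW_EXAMPLES.get(sentence)

def match_geb(sentence):
    """Look for a template match in the generalized example base."""
    for pattern, template in GENERALIZED_EXAMPLES:
        found = pattern.match(sentence)
        if found:
            return template.format(*found.groups())
    return None

def rule_based(sentence):
    """Placeholder for the rule-based engine, used only as a fallback."""
    return "<rule-based translation of: %s>" % sentence

def translate(sentence):
    normalized = sentence.strip().lower()
    # Try the example bases first, in order, before invoking the rules.
    for stage in (match_reb, match_geb):
        result = stage(normalized)
        if result is not None:
            return result
    return rule_based(normalized)

print(translate("Good morning"))
print(translate("My name is Ravi"))
print(translate("Where is the station"))
```

Ordering the stages from most specific (exact examples) to most general (rules) means the expensive rule-based analysis runs only when no stored example applies.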
Shiva and Shakti MT System (2005)
Two machine translation systems from English to Hindi, Shiva and Shakti, are being developed jointly by Carnegie Mellon University, USA, the Indian Institute of Science, Bangalore, and the International Institute of Information Technology, Hyderabad. Shiva is an example-based machine translation system, while Shakti follows a hybrid approach that combines rule-based and statistical methods. A new release of Shakti is in progress for three target languages: Hindi, Marathi and Telugu.
A number of other projects were developed between 2005 and 2010. They include a hybrid statistical MT system for English to Bengali at Jadavpur University; example-based systems for the English to Kannada and Kannada to Tamil language pairs; a Punjabi to Hindi MT system using direct word-to-word translation at Punjabi University, Patiala; and a hybrid example-based MT system for English to Indian languages using minimal linguistic resources. Since it is difficult to list all the MT initiatives, the rest of this article is confined to some of the ongoing MT projects. Most of the current LT and MT systems are funded by TDIL, DeitY, GOI and are run in consortium mode.
Tamil-Hindi and Hindi-Tamil Machine Aided Translation System (2005)
The Tamil-Hindi Machine-Aided Translation System was developed by Prof. C.N. Krishnan at the Anna University KB Chandrashekhar (AU-KBC) Research Centre, Chennai. The system is based on the Anusaaraka machine translation system; the input text is in Tamil and the output is Hindi text. It uses lexical-level translation and has 80-85% coverage. A Tamil morphological analyser and a Tamil-Hindi bilingual dictionary are by-products of this system. The group also developed a prototype English-Tamil Machine-Aided Translation system, which performs exhaustive syntactic analysis but has a limited vocabulary (100-150) and a small set of transfer rules. The MT system has three major components: a morphological analyser for the source language, a mapping unit and a target-language generator. The Tamil-Hindi Machine Aided Translation (MAT) system has a performance in the range of 75%.
Fig 2: Parallel corpora Translation for Hindi to Malayalam
Indian Language to Indian Language Machine Translation System (Sampark) (2006)
IL-ILMT is a consortium project headed by IIIT Hyderabad, with 11 participating institutions: University of Hyderabad, CDAC-Noida and Pune, AUKBC-Anna University, IIT Kharagpur, IISc Bangalore, IIIT Allahabad, Tamil University, Jadavpur University, IIT Mumbai and IIITM-Kerala. The main objective is to build bidirectional systems for 9 language pairs {Tamil-Hindi, Telugu-Hindi, Marathi-Hindi, Bengali-Hindi, Tamil-Telugu, Urdu-Hindi, Kannada-Hindi, Punjabi-Hindi, Malayalam-Tamil}. The major tasks are to enlarge the (domain-based) dictionaries; to develop a morphological analyser, sentence parser, chunker and generator for the source and target languages as appropriate; to include discourse parsing, anaphora resolution, Multi-Word Expression (MWE) handling, Named Entity Recognition (NER) and Word Sense Disambiguation (WSD); and to apply new statistical MT techniques to improve accuracy. The project is headed by Prof. Rajeev Sangal and Prof. Dipti Misra of IIIT Hyderabad. A screenshot of the Malayalam to Tamil MT system developed by IIITM-Kerala as part of ILMT is shown in Fig. 1.
Indian Language Corpora Initiative (ILCI) (2009)
ILCI is a consortium project headed by JNU New Delhi under TDIL of DeitY, GOI, to build a common language platform by creating parallel annotated corpora in 17 Indian languages, with Hindi as the source language. Phase I of the project is to build annotated parallel corpora, with standards, from Hindi to other Indian languages and English in the domains of tourism and health. It covers 8 Indo-Aryan languages (Hindi, Urdu, Punjabi, Bangla, Oriya, Gujarati, Marathi and Konkani), 3 Dravidian languages (Tamil, Telugu and Malayalam) and English. In Phase II, Assamese, Nepali, Bodo, Kashmiri, Kannada and Manipuri are added. Corpora of about 1 lakh entries covering health, tourism, agriculture and entertainment have been created, annotated and tagged, and chunking is in progress. Tools for parsing and chunking, and system generators, are also in progress; together these serve as a big resource for building MT systems. The project is headed by Dr. Girish Nath Jha of JNU New Delhi. The parts-of-speech tagged output of the Hindi to Malayalam module of IIITM-Kerala on the JNU site is shown in Fig. 2.
UNL based MT System (2010)
The Universal Networking Language (UNL) follows the interlingua approach: the source language is converted to UNL form by an encoder, and the UNL form is then decoded into the target language using hypergraph concepts. IIT Mumbai tried it for English to Hindi and Marathi, Anna University worked on Tamil to Malayalam, and IIITM-K developed a decoder for Malayalam using UNL for machine translation. Although UNL provides a strong mapping to the semantic and syntactic features of a language, the linguistic features have to be coded case by case for each language, which makes the process tedious and complex. As a result, a number of UNL-based projects in Indian languages, namely Hindi, Punjabi, Bangla, Kannada, Tamil and Malayalam, could not progress as expected.
IndoWordnet and Indradhanush (2010)
IndoWordNet is a linked lexical knowledge base of wordnets of 18 Indian languages, viz. Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu and Urdu. The project is under TDIL of DeitY and is headed by Dr. Pushpak Bhattacharyya of IIT Mumbai.
The Indradhanush WordNet Consortium comes under the umbrella of IndoWordNet, along with two other consortia: the North East WordNet Consortium, which works on “Development of NE WordNet: An Integrated WordNet for North East Languages: Assamese, Bodo, Manipuri and Nepali”, and the Dravidian WordNet Consortium, which works on “Development of Dravidian WordNet: An Integrated WordNet for Telugu, Tamil, Kannada and Malayalam”. These wordnets are developed at different institutes in India and coordinated by IIT Bombay. They are constructed by linking to the Hindi and English WordNets and to each other. The main objectives are automatic multilingual dictionary creation, machine translation and cross-lingual information retrieval.
Many other MT systems are under development, with notable contributions from the Central Institute of Indian Languages (CIIL) under the Linguistic Data Consortium for Indian Languages (LDC-IL), CDAC, the IITs and the IIITs. In the Dravidian languages (Kannada, Telugu, Tamil and Malayalam) there have been significant developments; the major players are Dravidian University, AUKBC, Anna University, University of Hyderabad, Amrita University, CIIL-Mysore, Tamil University-Thanjavur, IIIT-Hyderabad and IIITM-Kerala. IT companies, mainly Microsoft and Google, are also contributing MT systems to cater to the multitude of translation scenarios today.
Conclusion
The survey reveals that most present-day MT systems for Indian languages use statistical and hybrid approaches, since rule-based and example-based systems failed in many situations. This is because of the morphologically rich, inflectional and agglutinative nature of Indian languages. Many of the earlier systems also tried to incorporate more linguistic features than the computational techniques then available could handle. Better models and techniques are now available. Rather than injecting more linguistic detail, systems should be designed around computational models and newer techniques, to which linguistic data is fed just as health data or financial data would be, i.e. without burdening the system with much of the intrinsic complexity of the language. More robust systems can then be expected.
References
[1] Sinha, RMK; Sivaraman, K; Agrawal, A; Jain, R. ANGLABHARTI: a multilingual machine aided translation project on translation from English to Indian languages, Systems, Man and Cybernetics, IEEE, vol 5, 1995.
[2] Suman Idikula; Design and Development of an adaptable Frame based System for Dravidian Language Processing, Doctoral Thesis, CUSAT (1999).
[3] Rinju O R, Rajeev R R, Reghu Raj P C, Elizabeth Sherly, Morphological Analyser for Malayalam: Probabilistic Method Vs Rule Based, International Journal of Computational Linguistics and Natural Language Processing, Vol 2 Issue 10, October 2013.
[4] Rajeev RR, Jisha P Jayan, and Elizabeth Sherly, “Parts of Speech Tagger for Malayalam”, IJCSIT International Journal of Computer Science and Information Technology, Vol 2, No.2, December 2009, pp 209-213.
[5] Biji Nair, Elizabeth Sherly, Language Dependent Features for UNL-Malayalam Deconversion, IJCA International Journal of Computer Application, vol 100, No.6, pp 37-41, Aug 2014.
About the Author
Elizabeth Sherly obtained her Ph.D in Computer Science, in Artificial Neural Networks, from Kerala University in 1995. She is now Professor at IIITM-Kerala, Trivandrum, and has 25 years of experience in research and teaching. She is the Principal Investigator of two prestigious Language Technology projects, ILCI and ILMT, of TDIL, DeitY, GOI. Her other research interests are Object Oriented Technology, data mining and Image Processing. She has 50 publications to her credit and is guiding a dozen Ph.D students in CS. She can be reached at [email protected]
Cover Story
Arun Kumar N
Assistant Professor, Amrita School of Arts and Sciences, Kochi
An Overview of Machine Translation
Introduction
Language is an effective medium of communication. It represents the ideas and expressions of the human mind. Several thousand languages exist in the world, reflecting its linguistic diversity, and it is difficult for any individual to know and understand them all. Hence translation was adopted as a means of conveying messages from one language into another. Today, in the era of information and communication technology, there is a revolution in the field of machine translation. Several tools, free as well as proprietary, are now available that support translation of text into one or more languages.
Machine Translation (MT), also loosely referred to as Computer Aided Translation, is basically the use of software programs specifically designed to translate spoken as well as written text from one language into another. It falls under the area of natural language processing. At the most basic level, MT performs word-for-word translation. Translation depends on the morphology of the language. Morphology is the identification, analysis and description of the structure of a language's morphemes and other linguistic units, such as root words, affixes and parts of speech. Languages rich in morphology include the Dravidian languages, Hungarian and Turkish, while English and Chinese are poor in morphology. A morphologically rich language has the advantage of easier processing at the higher stages of translation.
Techniques
There are three main types of machine translation: EBMT, RBMT and SMT.
EBMT stands for Example Based Machine Translation, which translates sentences from one language into another using a bilingual corpus. The basic units of EBMT are sequences of words, and the basic technique is matching these word sequences against the corpus. In EBMT, the input sentence is decomposed into a set of tokens known as fragmental phrases. These fragmental phrases are translated into target-language phrases by the principle of translation by analogy, by referring to suitable examples in the corpus. A corpus can be supervised, semi-supervised or unsupervised. A supervised corpus provides tagged sentences, generally tagged using a Hidden Markov Model. An unsupervised corpus contains plain sentences. A semi-supervised corpus contains a mixture of the two and can be tagged using a predefined set of rules and the Viterbi algorithm.
The ambiguity that arises during word-to-word translation from the source to the target language can be resolved using word sense disambiguation algorithms such as the Selection Restrictions algorithm, Lesk's algorithm, the conceptual density algorithm or the Random Walk algorithm.
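As an illustration of one of these algorithms, the following is a minimal sketch of a simplified variant of Lesk's algorithm: it picks the sense whose dictionary gloss shares the most words with the context of the ambiguous word. The sense labels, glosses and stop-word list are made up for the example and are not taken from any real lexicon.

```python
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "to", "that", "and", "or", "is"}
)

# Hypothetical sense inventory for the English word "bank".
BANK_SENSES = {
    "bank_finance": "an institution that accepts deposits of money and makes loans",
    "bank_river": "sloping land beside a river or other body of water",
}

def simplified_lesk(word, context, sense_glosses):
    """Return the sense whose gloss overlaps most with the context words."""
    context_words = {w.lower().strip(".,;:!?") for w in context.split()}
    context_words -= STOPWORDS | {word.lower()}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = set(gloss.lower().split()) - STOPWORDS
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk(
    "bank", "He sat on the bank of the river and watched the water", BANK_SENSES
))  # -> bank_river
```

Once the sense is chosen, an MT system would select the target-language word attached to that sense rather than translating the surface word directly.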
The accuracy of EBMT depends on the number of examples in the corpus. Using a corpus reduces the human cost. The major drawbacks of EBMT are that the search becomes expensive as the corpus grows large and that knowledge acquisition is still problematic.
RBMT stands for Rule Based Machine Translation. RBMT systems are based on linguistic information about the source and target languages, retrieved mainly from (unilingual, bilingual or multilingual) dictionaries and grammars covering the main semantic, morphological and syntactic regularities of each language. They use a set of predefined rules and a sample corpus as part of translation. RBMT can be classified into three categories: direct, transfer and interlingua.
A direct RBMT system maps input to output using basic translation rules. It adopts word-for-word translation from the source language to the target language. A transfer RBMT system employs morphological and syntactic analysis. Its transformation process is decomposed into three steps: analysis, transfer and synthesis. The source text is analysed on the basis of linguistic information such as morphology, parts of speech, syntax and semantics. The syntactic or semantic structure of the source language is then transferred into the syntactic or semantic structure of the target language, from which the target text is synthesised. This approach depends on the language pair involved. The interlingua RBMT system is considered the third generation of machine translation. It aims to create linguistic homogeneity across the globe. In this system the source language is transformed into an intermediate language that is independent of any of the languages involved in the translation. This intermediate representation is known as the interlingua, and it can be transformed into multiple target languages.
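The three steps of a transfer system can be sketched with a deliberately tiny example. Everything here is an assumption made for illustration: the analyser handles only three-word subject-verb-object English sentences, the lexicon is a hypothetical English to romanised Hindi word list, and synthesis simply emits subject-object-verb order.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    subject: str
    verb: str
    obj: str

# Hypothetical toy bilingual lexicon (English -> romanised Hindi).
LEXICON = {
    "ram": "raam",
    "eats": "khaata hai",
    "mango": "aam",
    "reads": "padhta hai",
    "book": "kitaab",
}

def analyse(sentence: str) -> Clause:
    """Analysis: parse a 'subject verb object' sentence into a structure."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 3:
        raise ValueError("toy analyser handles only three-word SVO sentences")
    return Clause(subject=words[0], verb=words[1], obj=words[2])

def transfer(clause: Clause) -> Clause:
    """Transfer: map each source-language item to its target equivalent."""
    return Clause(
        subject=LEXICON.get(clause.subject, clause.subject),
        verb=LEXICON.get(clause.verb, clause.verb),
        obj=LEXICON.get(clause.obj, clause.obj),
    )

def synthesise(clause: Clause) -> str:
    """Synthesis: linearise in the target word order (subject-object-verb)."""
    return f"{clause.subject} {clause.obj} {clause.verb}"

def translate(sentence: str) -> str:
    return synthesise(transfer(analyse(sentence)))

print(translate("Ram eats mango"))  # -> raam aam khaata hai
```

Because `transfer` and `synthesise` are specific to one language pair, adding a new target language means writing new versions of both, which is the language-pair dependency noted above.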
The advantages of RBMT are that it is effective for core phenomena, being based on linguistic theory, and that it is easy to build an initial system. The main drawbacks are that the rules must be formulated by experts, so the system is difficult to maintain and extend, and that it is ineffective for marginal phenomena.
SMT stands for Statistical Machine Translation. It is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The idea behind statistical machine translation comes from information theory.
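The information-theoretic idea is commonly written as the noisy-channel model. The notation below is introduced here for illustration and does not come from the article: f is the source sentence, e a candidate target sentence, P(e) a language model estimated from target-language text and P(f | e) a translation model estimated from the bilingual corpus.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator P(f) can be dropped because it does not depend on e. The search over candidate sentences e is what makes decoding computationally expensive.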
The advantages of SMT are that it uses numerical knowledge extracted from a corpus, which reduces the human cost, and that the model is mathematically grounded. The drawbacks of SMT are that it has no linguistic background, that its search cost is high and that it is hard to capture long-distance phenomena.
All of these MT systems involve the following general steps:
Morphological analysis: applies word formation rules.
Lexical analysis: dictionary representation of words.
Syntactic analysis: parsing of sentences.
Semantic analysis: meaning representation.
Pragmatics: determines how a sentence is used.
Discourse: processing of connected sentences.
All these systems require a corpus for implementation. The corpus contains sample sentences, which can be tagged using a parts-of-speech tagger. Parts-of-speech (POS) tagging is the process of assigning a tag to each word in a sentence. It is the lowest level of syntactic analysis. POS tagging is mainly used in information retrieval, text-to-speech conversion and word sense disambiguation. It sits between morphology and parsing. It requires a standard set of tags, known as a tag set, for representation as well as implementation. A tag can be assigned to each word in a sentence with the help of a tag set and a stochastic HMM tagger with the bigram assumption. Bigram taggers assign tags on the basis of sequences of two items (usually assigning the tag of word n on the basis of the tag of word n-1).
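A bigram HMM tagger of this kind is usually decoded with the Viterbi algorithm. The sketch below assumes a tiny three-tag tag set and hand-written probabilities; all of the numbers and the names `TAGS`, `START_P`, `TRANS_P` and `EMIT_P` are invented for illustration, whereas a real tagger would estimate them from a tagged corpus.

```python
import math

# Hypothetical toy model: tag set, start, transition and emission probabilities.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.4, "cat": 0.3, "walk": 0.2, "barks": 0.1},
    "VERB": {"barks": 0.5, "walk": 0.3, "sees": 0.2},
}

def logp(p, floor=1e-8):
    """Log-probability with a small floor so unseen events are not -inf."""
    return math.log(p if p > 0 else floor)

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under a bigram (first-order) HMM."""
    if not words:
        return []
    # best[i][t] = (log-probability of best path ending in tag t, previous tag)
    best = [{
        t: (logp(start_p.get(t, 0.0)) + logp(emit_p[t].get(words[0], 0.0)), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for tag in tags:
            # The bigram assumption: only the previous tag matters.
            prev_tag = max(
                tags,
                key=lambda p: best[-1][p][0] + logp(trans_p[p].get(tag, 0.0)),
            )
            score = (best[-1][prev_tag][0]
                     + logp(trans_p[prev_tag].get(tag, 0.0))
                     + logp(emit_p[tag].get(word, 0.0)))
            column[tag] = (score, prev_tag)
        best.append(column)
    # Backtrace from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
# -> ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences, and the floor value is a crude stand-in for proper smoothing of unseen words.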
Translation Tools
Some of the tools developed for machine translation are as follows. The Anusaaraka project (1995) was built to translate sentences from Telugu, Kannada, Bengali, Punjabi and Marathi into Hindi. Mantra (1999) is a translation tool devised for English to Hindi in specific domains such as administration, office orders and office memoranda. The Matra system (2004) is used for English to Hindi translation of news stories. Anglabharati and Anubharati are used for translating sentences in English into other languages. Anuvaadak is an English to Hindi translation tool used in domains such as official, formal, agriculture and linguistic.
Recent Trends
The recent trend in machine translation is to combine machine translation systems with data mining techniques to build corpus-independent systems that can be used for extracting information from multiple databases. Machine translation combined with image processing is useful for extracting sentences from images and translating them into the intended target language. Speech-to-text, image-to-text and speech-to-speech translation are some of the research areas in this field.
References
[1] Sneha Tripathi and Juran Krishna Sarkhel, “Approaches to machine translation”, Annals of Library and Information Studies, Vol. 57, December 2010.
[2] Beaven, J (1998): ‘MT: 10 years of development’, Terminologie et Traduction 1998: 1, 242-256.
[3] Carl, M and Way, A eds. (2003): Recent advances in example-based machine translation. Dordrecht: Kluwer.
[4] Bassnett-Mcguire, Susan: 1990, Translation studies, Great Britain, The Chaucer Press Ltd.
[5] Arnold, DJ, Balkan, L, Meijer S, Humphreys, RL and Sadler, L: 1994, Machine translation: an introductory guide, London, Blackwells-NCC.
About the Author
Arun Kumar N is working as Assistant Professor at Amrita School of Arts and Sciences, Kochi. He is presently pursuing doctoral research at Amrita Vishwa Vidyapeetham University in the area of natural language processing. His areas of interest include natural language processing, algorithms and image processing. He has 5 years of teaching experience.
Technical Trends
Hardik A Gohel
Assistant Professor, AITS, Rajkot and active Member of CSI
Role of Machine Translation for Multilingual Social Media
Introduction to Machine Translation
Machine Translation (MT) is translation carried out by a computer. It is a sub-field of computational linguistics that studies the use of software to translate text or speech from one natural language to another. The translation is done by the computer, without human involvement. The technique dates from the 1950s and is also known as automated, automatic or instant translation.
In fact, the concept of machine translation can be traced back to the 17th century, when Rene Descartes proposed a “Universal Language” in which equivalent ideas in different tongues would share the same symbol. It first became a field of research, however, in the 1950s. The first public demonstration, by the Georgetown University MT research team with IBM, took place in 1954.
Terminologies
It is necessary not only to obtain the target-language text automatically but also to place it correctly in the resulting document. This is possible only when the right terminology has been supplied to the machine translation system. Let us now see how machine translation works. Basically, there are two types of machine translation system: rule-based machine translation and statistical machine translation.
A rule-based machine translation system uses a combination of language and grammar rules together with dictionaries of common words. It is also known as knowledge-based machine translation. Specialist dictionaries are created that focus on particular industries or disciplines. Systems of this type typically deliver reliable translations with accurate terminology, provided they have been properly trained with such specially created dictionaries.
The other type is the statistical machine translation system. It has no knowledge of language rules; instead it learns by analysing large amounts of data for each language pair. It can be trained for a specialised industry sector or discipline using additional data relevant to that sector. Typically, statistical machine translation delivers more fluent-sounding but less reliable translations.
Statistical and rule-based translation perform about equally well for languages such as French and Spanish, whereas statistical translation is particularly suited to minority languages. Rule-based translation can perform better on languages such as Korean, Japanese, Russian and German.
The differences between statistical machine translation (SMT) and rule-based machine translation (RBMT) are given in Table 1.
Statistical Based Machine Translation | Rule Based Machine Translation
It is better for user-generated content and broad-domain material such as patents. | It is better for records and even software.
It may translate software tags. | It protects software tags.
It is more suitable for on-the-fly translation of content with a short shelf life. | It is suitable for translations that will be edited later or changed during translation.
It uses the most likely terms, which are not necessarily the ones an individual would prefer. | It applies modifications to terms and uses the correct grammar.
It is not predictable. | It is predictable.
It has longer update cycles (once or twice a year). | It is faster to update (can be on a daily basis).
It can be free or open source. | It is expensive to license.
It is very heavy on processing resources. | It is very heavy on linguistic resources.
SMT creates more flowing sentences. | RBMT creates less flowing sentences.
It can handle poor grammar, but does not improve much with unnatural authoring. | It does appreciably better when unnatural authoring is in place.
SMT can handle over 50 languages out of the box, e.g. Google and Bing Translator. | RBMT can handle 20 target languages out of the box.
Table 1: Difference between SMT & RBMT
Fig. 1: People from Different Region Communication with help of Machine Translation
Fig. 2: Multilingualism of India with Machine Translation
The best way to understand machine translation is to analyse Google's translation. It does not rest on the intellectual assumptions of early machine translation efforts. Nor is it just an algorithm intended to extract the meaning of an expression from its syntax and vocabulary. In fact, it does not deal with meaning at all.
Instead of treating a linguistic expression as something to be decoded, Google Translate treats it as something that has probably been said before. It uses huge computing power to search the internet in the blink of an eye, looking for the expression in some text that exists alongside its matching translation. The mass of content it scans includes all the papers put out by the European Union in 24 languages, much of what the United Nations and its agencies have ever put in writing in their 6 official languages, and a large quantity of other material, from the records of international tribunals to company reports, as well as all the books and bilingual articles that have been put up on the web by individuals, booksellers, libraries, authors and academic departments. Drawing on the patterns of matches that already exist between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what has been submitted to it. Almost all the time, it works. It is quite spectacular, and it is largely responsible for the new mood of optimism about the prospects for “fully automated high-quality machine translation”. Google Translate could not work without a very large pre-existing body of translations.
It is built upon the millions of hours of work by the human translators who produced the texts that Google Translate searches. At present, Google offers two-way machine translation among 58 languages, that is 3,306 separate translation services, more than have ever existed in all human history till date.
Google Translate, with the help of machine translation, also provides voice recognition for Hindi and seven other Indian languages. The latest version of Google Translate supports Hindi, Gujarati, Bengali, Marathi, Punjabi, Kannada, Tamil and Telugu, covering the major languages of India. Recently, Google introduced advertisements in Hindi on its network, as there are more than 500 million people speaking Hindi worldwide.
An analysis of machine translation in social media found more than 6,000 multilingual posts.
Multilingualism, Social Media and Machine Translation
Columbia Business School Centre conducted a research study on global brand leadership, which looked at the marketing tools that corporations have adopted recently. Social network accounts are the most preferred tool, used by 85% of corporations. These include brand accounts on Facebook, Twitter, Google+, Foursquare and others. The problem is that companies are looking to market in the native languages of their business markets instead of in English only. Since the number of non-English-speaking users is rising day by day, it is necessary to communicate in their native languages. According to Search Engine Journal, establishing a worldwide presence across all social media platforms will help boost brand awareness. Ideally, an organization's presence on social media serves as a portal to its website. Social media thus helps companies achieve their goal of marketing across the globe.
On analysing social media statistics further, we found more than 6,000 multilingual posts. In these threads, one or more comments were in a language different from that of the thread-starter, apparently signifying that people were able to communicate with people in other languages through machine translation. Table 2 gives the statistics of multilingual comments.
Sr. No. | Number of Languages | Percentage
1 | Two Languages in Comments on thread | 85%
2 | Three Languages in Comments on thread | 15%
3 | Four+ Languages in Comments on thread | 3%
Table 2: Number of Languages on Social Media threads Comments
Moreover, the majority of multilingual posts involved English together with another language, with English-Spanish and English-Portuguese being the most frequent combinations among bilingual threads (Table 3).
Sr. No. | Languages Pairs | No. of Threads (Approx)
1 | English & Portuguese | 2500
2 | English & Spanish | 1150
3 | English & French | 650
4 | English & Italian | 400
5 | English & Turkish | 300
6 | Catalan & English | 250
7 | German & English | 200
8 | English & Vietnamese | 150
9 | English & Japanese | 100
10 | English & Russian | 100
Table 3: Top 10 Threads of Social Media with Multilingual Pairs
The above study relates to multilingualism worldwide. Let us now look at the many languages of India and their connection to social media through machine translation. India is culturally diverse, and a great many languages are spoken by the more than 1.2 billion people who live in the country. It is true that 200 million Indians are able to understand English, but according to the 2001 records about half a billion Indians reported Hindi as their mother tongue. Furthermore, in rural India, 43% of citizens said that they would readily adopt social media if there were content in their respective local languages. Table 4 gives the position of Indian languages among the top 10 most commonly spoken languages worldwide, according to figures from Accredited Language Services (ALS).
Indian Language | Rank of Spoken Worldwide
Hindi | 4th
Urdu | 5th
Bengali | 7th
Punjabi | 10th
Table 4: Rank of Indian Languages spoken worldwide
A popular social medium is Twitter, about which it is said that the world may like to tweet, but the Japanese love to! The problem is that the Japanese tweet in their native language. If a multinational company wants to market its products through Twitter in Japan, it is mandatory to tweet in Japanese to gain the maximum number of followers. But if the company tweets only in Japanese, other nations will not understand it. So the solution is to create, and keep updated, multiple Twitter accounts.
Another popular social medium is Facebook. Companies using Facebook for global marketing had to create separate pages, as with Twitter. But in 2012, Facebook provided a new tool to streamline global page creation for companies. With this tool, any organization can set up localized versions of its cover photos, Page apps, profile photos, news feed stories and About information. The English version might welcome users with “Hello”, whereas users visiting from Spanish-speaking countries would see “Hola”. In short, globally available pages allow corporations to create a distinct brand identity.
Fig. 3: Multilingualism & Social Media Interaction
Facebook at present supports 13 Indian languages and is determined to facilitate all major Indic languages and to actively advance them on various platforms. Mark Zuckerberg, co-founder and chairman of Facebook, recently met PM Modi to discuss his plan to develop Facebook in other Indian languages by applying advanced machine translation.
The growth rate of local-language usage is estimated to be more than four times that of English. -Google
For the last three years, social media platforms have been rolling out Machine Translation (MT) in the hope of facilitating multilingual interactions. People may well interact with each other through social media when they know each other and share a common language. But what about people who have common interests but not a common language? As discussed above, companies are also working to create a distinct brand identity through multilingual social media.
As mentioned above, Facebook was the first social medium to offer a multilingual facility using machine translation; Google+ and Twitter began providing it later. The machine translation tool that Facebook launched is known as the New In-Line translation tool. It automatically translates conversations and posts on Facebook pages. It is different from Google's machine translation tool: the service is provided by Microsoft and works on posts on any individual's profile as well as on pages. For example, if you speak and understand only English and find a comment in Gujarati, you will see a Translate button next to that comment, which opens a pop-out window with the English translation. Furthermore, Facebook learns from its users to improve accuracy: a user can enter a human translation in that pop-out window, and if it receives enough positive votes from other users for its accuracy, it replaces the existing translation the next time the text is translated. Page administrators can manage all these translations using the “Manage Translations” link beneath posts on the pages they manage.
As mentioned, Twitter is the second-largest social medium in the world; only 50% of tweets are in English and the rest are in various other languages. Twitter allows only 140 characters, so these short texts are not difficult to translate, and there is little need for reordering by a human translator. Machine translation would therefore be the first choice of users. Machine translation for Twitter can be considered a domain adaptation problem, as there is no large bilingual Twitter corpus. Domain adaptation is considered significant because the performance of a statistical machine translation system degrades when it is faced with tasks from different domains. However, existing work has mainly looked into adaptation to domains that are similar to the training data. Twitter calls for domain adaptation research and testing, since huge amounts of monolingual in-domain data are freely available through its streaming application programming interface.
Accomplishment of Machine Translation in Social Media
• Quick language translation is possible using the in-line translation available in multilingual social media.
• It is free for all users of social media who share the same interests but speak different languages.
• The most significant accomplishment of machine translation in multilingual social media is that it is supported by, and can be used in, all internet browsers.
• By using machine translation, an individual can accomplish global communication through social media.
• At present, multilingual social media supports a large number of languages, so people have a wide choice of languages for translation.
• It offers links for suggesting a human translation where needed; once a suggestion gathers enough votes, it replaces the existing translation.
• It is very useful for multinational companies seeking better branding of their products in local markets.
Challenges with Machine Translation in Social Media
• To date, the machine translation used in social media is not 100% accurate, so further effort is required to make it more accurate.
• It is quite difficult to decide which translation is accurate and which is not, and this is a very big challenge.
• At present, if you opt for human translation on social media, there is a charge for it.
• Each type of machine translation has its own drawbacks, which also apply to multilingual social media.
• Some languages, such as Vietnamese and a few others, do not have enough content online from which a machine can learn to translate.
Conclusion
Multilingual social media enabled by machine translation is a very innovative idea for extending the usage of social media. It serves not only social interaction but also the branding of multinational products and services worldwide. Social media is one of the most significant ways to promote a product and to extend a network, but it was limited by its reliance on English, which can be understood by 50% of people worldwide. Multilingual social media with machine translation still faces some challenges, but it is the most important way to give communication a personal touch.
About the Author
Hardik A Gohel, an academician and researcher, is an assistant professor at AITS, Rajkot and an active member of CSI. His research spans Artificial Intelligence and Intelligent Web Applications and Services. He also focuses on how to popularize Artificial Intelligence in the study of Computer Science. He has 28 publications in journals and proceedings of national and international conferences. He is also working as a Research Consultant. He contributed cover stories to CSI Communications last year and a Technical Trends article last month. He can be reached at [email protected]
Technical
Trends
D G Jha
Professor & Area Chairperson – IT; Programme Coordinator – MCA K J Somaiya Institute of Management Studies and Research,
Vidyanagar, Vidyavihar, Mumbai
Introduction
The dictionary meaning of the term
‘Technology’ is ‘the application of
mechanical and applied sciences to
industrial use’ [3]; ‘the sum total of the
technical means employed to meet the
material needs of society’; ‘the technical
terms used in science, arts etc.’ [13]; ‘study
or use of mechanical arts and applied
sciences; these subjects collectively’[2].
Technology needs to be perceived as a social phenomenon, one that possesses complete autonomy and remains unaffected by the society in which it exists. The power of technology lies in determining its own course away from any form of social control. Once the momentum of technological development is firmly established, it becomes difficult to stop before the process is complete. However, the decision whether to continue or abandon a project is undeniably human, and it would therefore be unwise to declare technology a monster threatening human existence. Technology in itself is neutral and passive:
In Lynn White Jr.'s words: “Technology opens doors; it does not compel man to enter.”
The developments in Information Technology have had an impact on society's general perception of information. The impact has been fourfold: storage (society expects to be able to store more than was previously conceived); manipulation (society expects to be able to realign information for its own benefit, to increase understanding and discover new relationships); distribution (society expects to be able to distribute information quickly, efficiently, cheaply and in a language understandable to the recipient); and creation (society now expects the creation of new information to be facilitated by these new technologies) [8].
The key issue is to understand the effect of society on information technology rather than to analyse the impact of information technology on society.
Any technology that is regarded as highly innovative reaches obsolescence sooner or later, with another successful innovation taking its place. The complexity of human society makes it impossible to resolve what drives the advancement of technology, i.e., any attempt at identifying the causes and effects that drive technological advances will not yield any extrapolative value. The challenge is to make intelligent use of the extraordinary power of electronic information systems for the benefit of society.
The use of Information Technology is directly linked to the enormous increase in computers' power (processing capabilities), user friendliness (Graphical User Interface), connectivity (global networks) and Artificial Intelligence (matching the human thinking process). Machine Translation is one such innovation that combines knowledge-oriented computing concepts and technology.
The Knowledge-Technology Connection – hard to achieve
The entire life cycle of any project conceptually now seeks the support of information technology, as can be visualized in Fig. 1 [5]. When people get involved in designing a system, they usually arrange the interacting components in such a manner that an objective or set of objectives gets accomplished.
Fig. 1: Data … Knowledge/Intelligence cascading impact
For an organisation or an enterprise, one of the key contributors to the generation of newer, innovative and creative ideas is the communication of facts in a language that is decipherable to the recipient. The translation of created text into another language (as desired by the recipient) can now be achieved by machine, albeit with many inherent errors in the algorithm (the steps specified for translation) involved in the process, as illustrated below:
While translating English text to Hindi using Google Translate, some interesting observations were made: for the sentence ‘I am Manish’, the Hindi version was accurate, while for the similar sentence ‘I am Divyanshu’, the resulting text lacked the Hindi rendering of the proper noun ‘Divyanshu’ [this is presented only to make the point and not to undermine the efforts of the research team at Google] [12].
English input | Hindi output
I am Manish | मैं मनीष हूँ
I am Divyanshu | मैं Divyanshu हूँ
A few more interesting observations:
English input | Hindi output
Icecream | आइसक्रीम
I like icecream | मुझे आइसक्रीम पसंद है
I like strawberry icecream | मैं झरबेरी मलाईबर्फ़ पसंद
I love Vanila icecream | मैं वनीला कुल्फ़ी प्यार करता हूँ
I love strawberry flavor in icecream | मैं मलाईबर्फ़ में स्ट्रॉबेरी का स्वाद प्यार करता ह
I like vanila flavor in icecream | मैं मलाईबर्फ़ में वनीला स्वाद पसंद
Needless to say, the recipient of translated text expects the Hindi version to be accurate. Using the translated Hindi version in this case would turn out to be disastrous, and it needs to be accompanied by a warning (disclaimer): “…any inferences at the user’s risk, the website does not guarantee the exactness of the translation”. Therefore, when converting an entire user manual (say) written in English (or any source language) to Hindi (or any target language) using machine translation, the resulting document would require careful editing, as the machine translates only the content that maps onto its vocabulary set and translation algorithm.
The above examples have been cited only to emphasise that machine translation requires more formal linguistics, ‘real world knowledge’ and an understanding of semantic barriers in the algorithm design.
All this indicates that translation is a tough task for a computer, for it involves –
• Understanding the source text
• Converting it into the target language desired by the recipient
• Generating correct target text that is meaningful to the recipient [6]
Machine Translation Fundamentals
Machine translation can be viewed as an automated system that analyzes text in a source language and produces ‘meaningful equivalent’ text in a target language without human intervention (see Fig. 2).
Source Language → Human Interpreter → Target Audience
i.e., human interpreter is replaced with computer
Source Language → Machine Translator → Target Audience
Fig. 2: Machine Translator replacing human interpreter
According to the presentation by Bonnie J. Dorr, Eduard H. Hovy and Lori S. Levin [1], there exist three main methodologies for machine translation – Direct, Transfer and Interlingua (see Fig. 3).
Fig. 3: Different methodologies for Machine Translation [Source: http://mttalks.ufal.ms.mff.cuni.cz/images/f/f1/Pyramid.png]
Source Language (SL) is the original text (in a particular language) that needs to be translated into another language, referred to as the target language (TL). Word structure is an important building block that helps in understanding a language. It defines the manner in which a word is constructed and the elements of which it is made. For example, the elements of which the word ‘unproductively’ is made can be visualised as: [4]
Fig. 4: The example of word structure [Source: http://iffahrahim.files.wordpress.com/2012/05/mm.jpg]
By the technique of morphological analysis (a method for exploring all possible solutions to a multi-dimensional, non-quantified problem, developed by Fritz Zwicky), the word structure for the source text is formed [9].
The target text is then generated using the technique of morphological generation. The morphological generator uses a set of lexical and morphological properties to address the issues related to different syntactic categories, which may include the usage of nouns, adjectives, adverbs, etc. The morphological generator combines a stem/root and suffixes to generate a word, i.e.
Stem/root + suffixes → Word
This methodology is referred to as the direct methodology [7].
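The "stem/root + suffixes → word" rule can be illustrated with a minimal Python sketch. It is not the generator used in the cited work: the function names and the two English spelling rules are assumptions chosen only to show the idea, and each suffix is assumed to be a non-empty string.

```python
VOWELS = set("aeiou")

def attach_suffix(word: str, suffix: str) -> str:
    """Attach one suffix, applying two simplified (assumed) spelling rules."""
    if word.endswith("e") and suffix[0] in VOWELS:
        return word[:-1] + suffix          # hope + ing -> hoping
    if (word.endswith("y") and len(word) > 1
            and word[-2] not in VOWELS and suffix[0] != "i"):
        return word[:-1] + "i" + suffix    # happy + ness -> happiness
    return word + suffix                   # default: plain concatenation

def generate_word(stem: str, suffixes: list[str], prefix: str = "") -> str:
    """Stem/root + suffixes -> word, with an optional prefix."""
    word = stem
    for suffix in suffixes:
        word = attach_suffix(word, suffix)
    return prefix + word

print(generate_word("product", ["ive", "ly"], prefix="un"))
```

A real generator would drive such rules from a lexicon and from the morphological properties of each syntactic category rather than from hard-coded spelling checks.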
Syntactical structure defines the rules whereby words or other elements of sentence structure are combined to form grammatical sentences. Syntactic analysis is the process of analysing a string of symbols conforming to the rules of a formal grammar associated with natural languages or computer languages [11]. The resultant structure is referred to as the syntactic structure. In the transfer methodology, the word structure is converted to a syntactic structure using syntactic analysis.
Semantic structure focuses on the relation between signifiers. Signifiers could be signs, symbols, words and phrases, together with their meaning and their denotation (interpretation). Linguistic semantics is the study of meaning that is used for understanding human expression through language. The syntactic structure is converted to a semantic structure using a semantic analyser [10].
Semantic analysis is the process of relating syntactic structures to the entire content, i.e., the syntactic structure, comprising levels from phrases and clauses through to sentences and paragraphs, is related to the level of the entire content along with the associated meanings, which are language-independent. The transfer methodology also involves the generation of a semantic structure.
Finally, the Interlingua methodology uses a single representation for both SL and TL that moves away from language-specific characteristics to create a “language-neutral” representation. Fig. 5 provides the abstract view of the Interlingua methodology.
Fig. 5: Abstract view of Interlingua methodology (components shown: Analyzer, Synthesizer)
Conclusions
Machine translation is not straightforward. It involves rewriting an entire text in another language. Though technology is expected to make the task easier, machine translation is a complex process that does not always result in accurate translation; many therefore perceive that the necessary post-editing is more time-consuming than doing a manual translation from scratch. The intricacy associated with morphological analysis, syntactic analysis and analysis of content makes machine translation more complex. In short, it is all about designing an algorithm that will help the system understand the content before translating it.
References
[1] Bonnie J Dorr, Eduard H Hovy, Lori S Levin. nd. “Machine Translation: Interlingua Methods.” Available at http://verbs.colorado.edu/~mpalmer/Ling7800/MachineTranslation.ppt
[2] Dictionary. nd. The Oxford English Mini Dictionary.
[3] Dictionary. 2001. The New International Webster’s Pocket Dictionary of the English Language. New Revised Edition. New Delhi: CBS Publishers & Distributors.
[4] English Language and Linguistics. nd. Available at http://www.putlearningfirst.com/language/05words/05words.html
[5] Jha, DG. 2007. “Computer Concepts and Management Information System.” New Delhi: Prentice-Hall of India Private Limited.
[6] Machine Translation I. nd. Available at http://personalpages.manchester.ac.uk/staff/harold.somers/LELA30431/Machine%20Translation%20I.ppt
[7] Mallamma V Reddy & Dr. M Hanumanthappa. nd. “Sentence translation for Kannada using morphological analyser and generator.” Available at http://www.academia.edu/2451907/sentence_translation_for_kannada_using_morphological_analyser_and_generator
[8] Meadowcroft B. nd. “The impact of Information Technology on work and Society.” Available at http://www.benmeadoecroft.com/reports/impact/
[9] Wikipedia. 2015. “Morphological analysis (problem-solving).” Available at http://en.wikipedia.org/wiki/Morphological_analysis_%28problem-solving%29
[10] Wikipedia. 2015. “Semantic analysis (linguistics).” Available at http://en.wikipedia.org/wiki/Semantic_analysis_%28linguistics%29
[11] Natural Language Processing. nd. Available at http://language.worldofcomputing.net/machine-translation/machine-translationoverview.html
[12] Translate. nd. Available at https://translate.google.co.in/
[13] Webster’s New English Dictionary. 2004. New Delhi: BPB Publications.
About the Author
Prof (Dr.) D G Jha is currently working as Professor and Area Chairperson - IT at K J Somaiya Institute of Management Studies and Research. He has over 25 years of experience and has authored a textbook in the area of computing concepts and Management Information Systems. He holds a Ph.D. from the University of Mumbai. He is also the programme coordinator of MCA. His areas of interest are computing concepts, DBMS, information systems, and HRIS.
Research
Front
Sunita Rawat* and Manoj Chandak**
*Assistant Professor, Visweswaraiah Technological University, Karnataka, India
**Professor and Head of Department, Computer Science & Engineering, Ramdeobaba College of Engineering, Nagpur
Natural language is the most common way to communicate with each other, but it is not possible to understand all languages. To understand different languages, machine translation (MT) is required. MT is an excellent application that helps us understand another language in very little time and at low cost. In this context, researchers face problems such as words that are pronounced the same but have totally different meanings, words that are spelled differently but have identical meanings, and combinations of words that change the meaning. Word Sense Disambiguation (WSD) is needed to resolve such problems. It is used to determine the correct meaning of a word with respect to the context in which it is used.
WSD is essentially a classification task, where word senses are the classes and the context provides the evidence. Every occurrence of a word is assigned to one or more of its possible classes based on the evidence. Words are assumed to have a finite and discrete set of senses drawn from an ontology, a dictionary or a lexical knowledge base. WSD has clear relationships with other fields such as lexical semantics, whose main aim is to define, analyze, and understand the relationships between “word”, “meaning”, and “context” [1].
The significance of WSD has been widely acknowledged in computational linguistics. WSD is not thought of as an end in itself but as an enabler for other tasks and applications of natural language processing (NLP) and computational linguistics, such as parsing, machine translation, text mining, semantic interpretation, knowledge acquisition and information retrieval. On the other hand, despite its theoretical significance, explicit WSD has not always demonstrated benefits in real applications. In general, a WSD module is either a black box encapsulating an explicit WSD process that can be dropped into any application, much like a syntactic parser or a part-of-speech (POS) tagger, or a task-specific “module” of a particular application in a precise domain, integrated so completely into the system that it is hard to isolate. Explicit WSD has not yet been persuasively demonstrated to have a significant positive effect on any application [2].
Selection of Word Senses
A word sense is a commonly accepted meaning of a word. For example, consider the following two sentences:
(a) She chopped the vegetables with a chef’s knife.
(b) A man was beaten and cut with a knife.
The word knife is used in the above sentences with two different senses: a tool (a) and a weapon (b). The two senses are clearly related, since they possibly refer to the same object; however, the object's intended uses are not the same. The examples make it clear that determining the sense inventory of a word is a key problem in word sense disambiguation [3].
Approach
Two approaches are followed for Word Sense Disambiguation (WSD): the knowledge-based approach and the machine learning-based approach. The knowledge-based approach requires external lexical resources such as WordNet, a dictionary or a thesaurus. In the machine learning-based approach, systems are trained to perform the task of word sense disambiguation. These two approaches are briefly discussed below:
Knowledge Based Approach
The advantage of the knowledge-based
methods over the supervised and the
clustering methods is that training data is
not required for each word that needs to
be disambiguated. This allows the system
to disambiguate words in running text,
referred to as all-words disambiguation.
Three kinds of knowledge-based approaches are discussed here.
Dictionary Based Approach
A dictionary provides both the means of constructing a sense tagger and the target senses to be used. Machine Readable Dictionaries (MRDs) are used to perform large-scale disambiguation. In this approach, all the senses of the word to be disambiguated are retrieved from the dictionary. These senses are then compared with the dictionary definitions of all the remaining words in the context [4]. The sense with the highest overlap with these context words is chosen as the correct sense.
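The overlap procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny dictionary, its sense labels and the stop-word list are invented for the example, whereas a real system would read its senses from a machine-readable dictionary.

```python
# Hypothetical toy dictionary: word -> {sense label: definition}.
TOY_DICTIONARY = {
    "bank": {
        "bank_finance": "an institution that accepts deposits of money and makes loans",
        "bank_river": "the sloping land beside a river or stream",
    },
    "deposit": {"deposit_money": "a sum of money placed in an account at an institution"},
    "river": {"river_water": "a large natural stream of water flowing over land"},
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "in", "at", "to"}

def tokenize(text):
    """Lower-case a definition and drop punctuation and stop words."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - STOPWORDS

def disambiguate(target, context_words, dictionary):
    """Pick the sense of `target` whose definition overlaps most with the
    definitions of the other words in the context."""
    context_signature = set()
    for word in context_words:
        if word == target:
            continue
        for definition in dictionary.get(word, {}).values():
            context_signature |= tokenize(definition)
    best_sense, best_overlap = None, -1
    for sense, definition in dictionary[target].items():
        overlap = len(tokenize(definition) & context_signature)
        if overlap > best_overlap:  # ties keep the first-listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("bank", ["deposit", "bank"], TOY_DICTIONARY))
print(disambiguate("bank", ["river", "bank"], TOY_DICTIONARY))
```

Counting shared definition words is the simplest possible overlap score; weighting rare words more heavily is a common refinement.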
WordNet Based Approach
WordNet superficially resembles a thesaurus in that it groups words together based on their meanings. However, there are some important differences. WordNet interlinks not just word forms (strings of letters) but specific senses of words. Therefore, words that are found in close proximity to one another in the network are semantically disambiguated. WordNet also labels the semantic relations among words, whereas in a thesaurus the groupings of words do not follow any explicit pattern other than meaning similarity.
Thesaurus Based Approach
A thesaurus is a resource that groups words according to their similarity. Thesauruses such as Roget and WordNet are produced manually, while others, such as the pioneering work of Sparck Jones (1986) and the more recent advances of Grefenstette (1994) and Lin (1998), are produced automatically from text corpora.
Machine Learning Based Approach
In the machine learning approach, systems are trained to carry out the task of word sense disambiguation. The role of the classifier is to learn features and assign senses to new, unseen examples. The initial input is the target word, i.e., the word to be disambiguated, and the context, i.e., the text in which it is embedded [5].
Methodology
Here we discuss three methods for word sense disambiguation: knowledge-based methods, supervised learning and unsupervised learning.
Knowledge-Based Methods for WSD
Knowledge-based methods represent a distinct category in word sense disambiguation (WSD). The performance of such knowledge-intensive methods is usually exceeded by their corpus-based alternatives, but they have the advantage of larger coverage. Knowledge-based methods for WSD are usually applicable to all words in unrestricted text, whereas corpus-based techniques are applicable only to those words for which annotated corpora are available.
Supervised Learning Method for WSD
Supervised methods are similar to the AI methods of the early 1970s (Ide & Veronis, 1998). Such methods use a manually created set of annotated corpora to train an algorithm. A supervised algorithm will typically identify patterns and rules concerning word senses in the pre-annotated corpora, which can then be applied to new corpora. For example, the pre-annotated corpora may contain the word bank in numerous texts. The supervised algorithm finds the words that appear around the occurrences of bank and creates a “bag of words” for each word sense. The algorithm uses these bags of words when it is run on a new corpus to infer the correct sense of each occurrence. This information is stored as information vectors. In supervised learning, it is assumed that the correct (target) output values are known for each input. The actual output is compared with the target output and, if there is a difference, the system generates an error signal. This error signal helps the system learn and reach the desired (target) output.
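The “bag of words” idea for the word bank can be sketched as follows. The four annotated sentences, the sense labels and the stop-word list are hypothetical training data made up for this illustration, and the scoring rule (summing word counts) is one simple choice among many.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "at", "to", "from", "he", "she", "they"}

def context_words(sentence, target):
    """Words around the target, without the target itself or stop words."""
    return [w for w in sentence.lower().split()
            if w != target and w not in STOPWORDS]

def train(examples, target):
    """Build one bag of words (a Counter) per annotated sense."""
    bags = defaultdict(Counter)
    for sentence, sense in examples:
        bags[sense].update(context_words(sentence, target))
    return bags

def predict(bags, sentence, target):
    """Choose the sense whose bag shares the most word occurrences."""
    words = context_words(sentence, target)
    scores = {sense: sum(bag[w] for w in words) for sense, bag in bags.items()}
    return max(scores, key=scores.get)

# Hypothetical pre-annotated corpus for the target word "bank".
ANNOTATED = [
    ("she deposited money at the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("they fished from the river bank", "river"),
    ("the boat drifted to the muddy bank", "river"),
]
bags = train(ANNOTATED, "bank")
print(predict(bags, "he took a loan from the bank", "bank"))
print(predict(bags, "a muddy river bank", "bank"))
```

In practice, the annotated corpus would contain many occurrences per sense, and the counts would usually feed a trained classifier rather than a raw sum.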
Unsupervised Learning Methods for WSD
No supervision is provided in unsupervised learning. Take the example of a tadpole: it learns to swim without any supervision, so its learning process is independent. In this technique, feature vector representations of unlabeled instances are taken as input and grouped into clusters according to a similarity metric. These clusters are then labeled by hand with known word senses. In machine learning, the task of unsupervised learning is to find hidden structure in unlabeled data [6]. Corrections to the network weights are not performed by an external agent, as in many cases we do not even know what solution the network should produce. The network itself has to decide what output is best for a given input and reorganize accordingly.
Algorithmic Approach
In this article we have discussed
knowledge-based, supervised and
unsupervised algorithmic approaches.
Knowledge-based Algorithms
Knowledge-based methods rely on external lexical resources to disambiguate the
senses of a word. Here we describe three different knowledge-based algorithms
that have been used in WSD: a similarity algorithm, a vector algorithm and a
topic model-based WSD system.
Knowledge-based Similarity Algorithm
In the general English domain, semantic similarity and relatedness measures have
been applied to the task of WSD. These measures assign a score indicating how
similar or related two concepts are. Semantic relatedness is the more general
notion: for example, foot and sock are related but not similar, whereas foot and
hand are both related and similar. This method has previously been used to
disambiguate words in general English, with WordNet as the knowledge source.
Knowledge-based Vector Algorithm
In this method, the vector creation module creates a test vector for each
instance in the test data and a concept vector for each possible concept of the
target word. The concept vector is created using information about that concept
from a knowledge source, such as its definition or synonymous terms. For
example, [Patwardhan, 2003] uses the definitions of a concept and its related
concepts, while [Mohammad and Hirst, 2006] and [Humphrey et al., 2006] use the
terms in a knowledge source associated with a concept's categorization.
Topic Model-based WSD System
Li et al. (2010) use topic models (Blei et al., 2003), which represent text
corpora using generative probability distributions, as the central component of
their WSD system. Topics are distributions over words, and each document is
modeled as a mixture of latent topics. Li et al. (2010) extract from WordNet one
sense paraphrase per word sense. The topic model is used to estimate a vector
of the topic distribution for the context of the target word (usually the
sentence in which it occurs) and a vector for the sense paraphrase of each
candidate sense. The cosine between these vectors is taken as the final score
for the word sense[7]. The topic model approach might yield better performance
with different parameter settings.
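The cosine scoring step can be sketched as follows. The three-topic distributions are invented values used only to show the arithmetic; in the system described above they would be estimated by a topic model, which is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Assumed topic distribution of the sentence containing the target word.
context = [0.70, 0.20, 0.10]

# Assumed topic distributions of the sense paraphrases (illustrative values only).
sense_paraphrases = {
    "bank (financial institution)": [0.80, 0.10, 0.10],
    "bank (river side)": [0.10, 0.20, 0.70],
}

scores = {sense: cosine(context, vec) for sense, vec in sense_paraphrases.items()}
best = max(scores, key=scores.get)
print(scores)
print(best)
```

The sense whose paraphrase has the topic distribution closest in direction to that of the context receives the highest score.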
Algorithms based on Supervised Learning
The supervised learning algorithms compared in this study (Support Vector
Machines, Neural Networks, Decision Trees) are commonly used for WSD and differ
considerably in the way they perform classification.
Support Vector Machines (SVM) Classifier
The Support Vector Machine method is based on the idea of learning a linear
hyperplane from the training set that separates positive samples from negative
samples. A basic Support Vector Machine is a binary classifier that assigns each
sample to either the true class or the false class. For WSD, the SVM must be
adapted to multiclass classification, since a word may have several
meanings[8]. Support Vector Machines (SVMs) are a relatively recent class of
machine learning techniques and are among the most robust and successful
classification algorithms.
Neural Network Classifier
A neural network, an interconnection of artificial neurons, is another
supervised approach. The models used for WSD in this approach are the Hidden
Markov Model or the back-propagation-based feed-forward network. Pairs of input
features and expected outputs form the input to the learning technique. The aim
of this approach is to use the input features to partition the training contexts
into non-overlapping sets. As new pairs from the training set are presented, the
weights among the neurons are adjusted so that the expected output has a larger
value than the other outputs[9].
Decision Tree Classifier
Decision trees are one of the most powerful and widely used inductive learning
methods. These classifiers are especially common in data mining. Their
robustness to noisy data and their capability to learn disjunctive expressions
make them seem suitable for document classification. They are built by
hierarchically dividing the underlying data space using different text features.
Construction proceeds in two phases: tree building (in a top-down manner) and
tree pruning (in a bottom-up manner). The decision tree method takes the data
described by its features as input. It partitions the records recursively, using
a breadth-first approach or a depth-first greedy approach, until all the data
items have been assigned to a particular class.
Algorithms based on Unsupervised Learning
Clustering is a type of unsupervised learning. In clustering, the objects of the
dataset are grouped into clusters such that each group differs from the others
and the objects within the same cluster are very similar to each other. There is
no predefined set of classes, which means that the resulting clusters are not
known before the clustering algorithm is executed[10].
Self-Organization Maps (SOM)
The Self-Organization Map (SOM) uses a competition and cooperation mechanism
to achieve unsupervised learning. The SOM was proposed by Professor T. Kohonen
in 1982. After adequate training, the output layer of a SOM network is separated
into different regions, and different neurons respond differently to different
input samples. As this process is automatic, all the input documents are
clustered.
Hierarchical Clustering
Hierarchical methods are well-known clustering techniques that can be very
useful for various data mining tasks. Such a clustering scheme produces a
sequence of clusterings in which each clustering is nested into the next
clustering in the sequence. Since hierarchical clustering is a greedy search
algorithm based on local search, the merging decisions made early in the
agglomerative process are not necessarily the right ones. Hierarchical methods
are nevertheless commonly used for clustering in data mining.
K-means for Text Clustering
K-means is a partition-based clustering method in which items are classified as
belonging to one of K groups. The outcome of the partitioning is a set of K
clusters such that similar items belong to the same cluster. Every cluster has a
centroid, or cluster representative. When there are many clusters, the centroids
can be further clustered to produce a hierarchy within the dataset. The K-means
algorithm uses an iterative approach to cluster the database. The number of
clusters, K, is fixed in advance by the user. The Euclidean distance is used to
calculate the distance of a data point from a particular centroid.
[Summary of Word Sense Disambiguation Approaches]
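The iterative K-means procedure with Euclidean distance can be sketched in a few lines. The two-dimensional points stand in for document vectors and, like the choice of initial centroids, are assumptions made for illustration; this is a plain textbook-style version, not the implementation of any cited work.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D document vectors; K = 2, first and last points as initial centroids.
docs = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(docs, centroids=[docs[0], docs[-1]])
print(centroids)
print(clusters)
```

The result depends on the initial centroids, which is one reason K has to be chosen by the user and the algorithm is often run several times.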
Comparative Analysis
Most of the methods discussed above have advantages and disadvantages, which
are summarised here. Supervised methods are accurate, but rely on pre-annotated
corpora to be effective. This can be overcome using unsupervised methods,
although those methods have difficulty in determining why and how word senses
are different[11]. Knowledge-based methods can solve this problem, although the
external lexical resources are difficult to create manually. It is unclear what
it will take to create an algorithm that can disambiguate fine-grained word
senses with greater than human-level accuracy.
In this article we have discussed the various approaches to word sense
disambiguation and machine translation in natural language processing. We have
also compared the most well-known classification algorithms, such as decision
trees, neural networks, SVMs, self-organizing feature maps, hierarchical
clustering, k-means and some knowledge-based algorithms. Both supervised and
unsupervised methods have advantages and disadvantages: on the one hand, simple
supervised methods can be applied to disambiguate a small pre-defined set of
words; on the other hand, for more robust applications, unsupervised methods
seem more suitable, as they can deal with a larger portion of the lexicon.
References
[1] Ling Che and Yangsen Zhang, “Study on Word Sense Disambiguation Knowledge Base Based on Multi-sources,” Intelligent Systems and Applications (ISA), IEEE, Wuhan, 2011, pp. 1-4.
[2] Yorick Wilks, “Preference Semantics,” in Formal Semantics of Natural Language, ed. by E. L. Keenan, III, pp. 329–348, Cambridge University Press, Cambridge, UK, 1975.
[3] Roberto Navigli, “Word Sense Disambiguation: A Survey”, Università di Roma La Sapienza.
[4] Bridget Thomson McInnes, “Supervised and Knowledge-based Methods for Disambiguating Terms in Biomedical Text using the UMLS and MetaMap”, September 2009.
[5] Word Sense Disambiguation, First Stage Report, Kanwal Rekhi School of Information Technology, Indian Institute of Technology, Powai, Mumbai, 2006-2007.
[6] Reza Soltanpoor, Mehran Mohsenzadeh and Morteza Mohaqeqi, “A New Approach for Better Document Retrieval and Classification Performance Using Supervised WSD and Concept Graph”, First International Conference on Integrated Intelligent Computing, IEEE, 2010.
[7] Annemarie Friedrich, Nikos Engonopoulos, Stefan Thater and Manfred Pinkal, “A Comparison of Knowledge-based Algorithms for Graded Word Sense Assignment”, Proceedings of COLING 2012: Posters, pp. 329–338.
[8] Vapnik V N, “The Nature of Statistical Learning Theory,” Springer Verlag, Heidelberg, DE, 1995.
[9] F M Lesk, “Automatic Sense Disambiguation using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone”, in Proceedings of ACM SIGDOC Conference, Toronto, Canada, 1986, pp. 25-26.
[10] Kehar Singh, Dimple Malik and Naveen Sharma, “Evolving limitations in K-means algorithm in data mining”, IJCEM International Journal of Computational Engineering & Management, Vol. 12, April 2011.
[11] David Justin Craggs, “An Analysis and Comparison of Predominant Word Sense Disambiguation Algorithms”.
About the Authors
Sunita Rawat received BE Degree in Computer Technology from Nagpur University, Maharashtra, India and Master Degree in
Computer Engineering from North Maharashtra University, Maharashtra, India and is currently working as an assistant professor
at Visweswaraiah Technological University, Karnataka, India. Her research interests are text mining and Word Sense Disambiguation.
Dr. M B Chandak holds a Ph.D. in Computer Science & Engineering and is presently working as Professor and Head of Department,
Computer Science & Engineering at Ramdeobaba College of Engineering, Nagpur (an autonomous institute). He has a total of
21 years of academic experience. His research domain is Machine Translation and Natural Language Processing. He has
a total of 72 publications in international journals of repute.
Research Front
Biji C L* and Manu K Madhu**
*Ph.D. from University of Kerala
**M. Tech. Student, School of Computer Sciences, M G University, Kottayam
Data Compression – An Overview and Trends in Genomics
Introduction
The magical touch of compression can be felt in many modern computer and
communication technologies. From email communication to the sharing of video
through YouTube and the transmission of pictures through WhatsApp, lightning
speed is possible only because of compression. Moving large files as they are
over the Internet is time consuming and demands higher bandwidth. The best
practice is to shrink the files by throwing away the redundant data. For
instance, as shown in Fig. 1, by taking advantage of the human visual system, it
is possible to reduce the size with varying levels of quality degradation
acceptable for different purposes.
Fig. 1: An example of image compression for different picture resolutions: 409 KB, 37 KB and 25 KB
(Photo: Lakshmy Gopalaswamy by Hareesh N)
Even in our day-to-day life, knowingly or unknowingly, we use compression.
Arranging things in the best possible manner within the available space is also
an example of compression. So, from a layman's point of view, compression is the
process of discovering structures that exist in the data[1]. Data compression is
widespread in number systems, natural languages and even in mathematical
notation. It plays a very important role in communications technology,
especially digital multimedia. Many practical developments such as mobile
computing, digital and satellite TV, and computer systems such as memory
structures and disks employ data compression. But centuries before these
developments in technology, poets and painters used the principle of
compression. The imaginative skill of the artist helped to condense onto a piece
of paper, with a minimum of words or images, an elaborate message that might
otherwise require pages and pages of explanation. For example, a small extract
of the well-known poem Elegy Written in a Country Churchyard by Thomas Gray is
shown in Fig. 2.
Full many a gem of purest ray serene
The dark unfathom'd caves of ocean bear:
Full many a flower is born to blush unseen,
And waste its sweetness on the desert air.
Fig. 2: Data Compression Analogy with Poetry
(Source: http://en.wikipedia.org/wiki/Elegy_Written_in_a_Country_Churchyard)
The poet describes how many a blessed beauty in nature goes unseen because we
limit ourselves in understanding it. Many intrinsic meanings can be perceived in
these lines. In general, in order to understand paintings or poetry, one should
know the principle behind the artistic creation, and the same is true for data
compression technology. In short, we can define data compression as the process
of transforming data from one representation to another so that it takes less
storage space or less transmission time. Most real-world data have inherent
redundancy in the form of structural similarity or hidden patterns. Exploiting
these redundancies helps us to represent data in fewer bits. Hence, in this
context, data compression may be viewed as the art of representing information
in a compact form[1]. Even as technology improves towards better mass storage
systems, the steady increase in data creates a continuing need for compression
techniques.
A look back in history reveals that the concept laid out by Claude E. Shannon
in his famous 1948 landmark paper “A Mathematical Theory of Communication”
helped to frame the concept of data compression, which is indispensable in many
fields of communication[2]. The record shows the strong influence of Hartley’s
paper on the proposal of the concept of information. Here the term information
has nothing to do with its meaning in common parlance. The intuitive ideas put
forward by Shannon helped to relate surprise and information[3], which will be
explained in detail in the subsequent sections. Shannon even proposed the limit
at which a message can be transmitted from one end to another through a channel
without loss of information.
The abstract concept of information proposed by Shannon forms the foundation
of all technological advancements in the field of data storage and transmission
systems.
Fig. 3: Claude E. Shannon – Father of Information Theory
(Source: http://www.nndb.com/people/934/000023865/)
Compression
Any source of information can be translated into an efficient representation
using compression techniques for better storage and transmission. Compression
may be lossy or lossless. If the input file can be reproduced exactly from the
compressed file, the scheme is called lossless compression. Text compression is
an example of lossless compression. On the other hand, if the reconstructed file
is not exactly the same as the input file, the scheme is lossy. Video
compression is an example of lossy compression. Any compression algorithm
consists of two stages, as shown in Fig. 4: a source model, which describes the
redundancy of the given message, followed by the selection of an optimal
encoding technique for a more precise and smaller representation of the
message[1].
Fig. 4: Compression algorithm stages
Understanding the nature of the message is crucial in any compression problem.
For example, in the case of natural language, one can build a source model based
on the statistical structure of the language. A model can be static or adaptive.
In a static model, the probability distribution of each symbol is computed from
a large corpus of data and is fixed in the compressor and decompressor. For
example, the frequencies of symbols in the English language may be modeled from
a large corpus of English text. In adaptive models, by contrast, an intelligent
predictor is used to compute the probability distribution of symbols. Thus
compression may be viewed as an artificial intelligence problem. The enormous
knowledge about the statistics of language helps to produce a reduced text or
the encoded message. For instance, have a close look at Example 1.
Example 1:
Th_ _ss_nti_l M_ssi_g
Cla_de Sh_nn_n _nd Th_ M_k_ng _f
_nfo_m_ti_n The_ry
Even though a few letters are missing, we are still able to read the text as
“The essential missing Claude Shannon and the making of information theory”. As
explained by Shannon, “any one speaking a language possesses an enormous
knowledge of the statistics of the language. Familiarity with the words and
grammar enables to fill in missing or incorrect letters[4]”. This in fact forms
the foundation of any compression algorithm.
Fig. 5: Shannon’s Prediction Model of communication system using reduced text (Source: C.E.
Shannon, “Prediction and Entropy of Printed English”, The Bell system, Technical Journal, vol.27, pp.50-64, July,
September, 1950)
In the subsequent sections, the commonly used compression terms Information and
Entropy are explained with a few analogies, followed by the current compression
trends in the field of genomics.
Information
Information is a common term that we encounter in our daily life. The term has
been used broadly in many different areas with many intuitive meanings, which
generally creates confusion. As per Shannon, the semantic aspect of
communication is irrelevant[1]. All forms of messages, such as text, images,
audio or video, can be transmitted using two states like “yes” or “no”.
Information may then be defined as the minimum number of yes or no questions
needed to determine the state, like “on (1) or off (0)”. Any system that is
defined by two states has the bit as its fundamental atom. Hence the bit is the
unit of information. In the 1948 landmark paper Shannon wrote, “If the number of
messages in the set is finite then this number or any monotonic function of this
number may be regarded as information”[2]. Thus, as proposed by Shannon,
information may be mathematically expressed as
$$I = -\log(1/N) = -\log P \qquad (1)$$
where P = 1/N is the probability of occurrence of the event. Thus, the
information contained in a message depends on the probability of occurrence: the
information content decreases with increasing probability of occurrence.
Mathematically, information is the logarithm of the inverse of the probability
(p). This clearly reflects Shannon’s idea that there is more information in rare
events like “winning a lottery”, while a highly probable event like “Sun rises
in the east” carries less information. Suppose there are n symbols
{a1, a2, …, an} emanating independently of each other from a source, with
probabilities {p1, p2, …, pn} respectively. Then the information content of any
message of size k made out of these symbols is given by
$$I = -\sum_{i=1}^{k} \log p_i \qquad (2)$$
where the sum runs over the k symbols of the message and p_i is the probability
of the i-th of them.
The famous simple prediction game example[5] helps one to understand the
intuitive meaning of information. To start with, imagine any random number from
1 to 100. One can predict the number by asking logical yes or no questions. For
example, one can reduce the search space by directly asking whether the number
is less than 50. Now the search space is reduced to one half, which increases
our confidence in predicting the number. Further, one can ask whether it is less
than 25, so that the search space is reduced again. More logical questions, such
as whether it is a prime number or an odd number, help us to predict the exact
number. How can we connect this prediction game to information theory concepts?
In the example, the total number of possibilities is 100. Information is the
logarithm of the number of possibilities. Hence log2(100) = 6.6, so nearly 7
questions are required to correctly guess the exact number. In other words,
nearly 7 bits of information are present in the event.
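The link between the guessing game and the logarithm can be checked with a short sketch. The halving strategy below is one assumed way of playing the game (always asking whether the number lies in the lower half of the remaining range); the function name is hypothetical.

```python
import math


def questions_needed(secret, low=1, high=100):
    """Count the yes/no questions a halving strategy asks to pin down `secret`."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1  # ask: "is the number less than or equal to mid?"
        if secret <= mid:
            high = mid
        else:
            low = mid + 1
    return questions


worst_case = max(questions_needed(n) for n in range(1, 101))
print(math.log2(100))             # about 6.64 bits
print(math.ceil(math.log2(100)))  # 7
print(worst_case)                 # 7
```

No number between 1 and 100 needs more than 7 questions under this strategy, which matches the rounded-up value of log2(100).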
As another example, consider the following sequence from a source with an
alphabet of 4 symbols {A, T, G, C}:
AATGGCACCT
Let p(A), p(T), p(G) and p(C) be the probabilities of occurrence of A, T, G and
C respectively.
p(A) = 3/10 = 0.3, p(T) = 2/10 = 0.2, p(G) = 2/10 = 0.2, p(C) = 3/10 = 0.3
The information content of A, T, G and C can be computed as
I(A) = -log2 p(A) = 1.74 bits, I(T) = -log2 p(T) = 2.32 bits,
I(G) = -log2 p(G) = 2.32 bits and I(C) = -log2 p(C) = 1.74 bits. Almost all
symbols have equal probability, hence the uncertainty is high. Each symbol has
around 2 bits of self-information, and summing the self-information of the four
distinct symbols gives
$$I = -\sum_i \log_2 p_i = 8.12 \text{ bits}$$
As another example, consider the tossing of a biased coin with P(H) = 1/5 and
P(T) = 4/5. Then I(H) = -log2(1/5) = 2.32 bits and I(T) = -log2(4/5) = 0.32
bits. Tails occurs more often, which means the event is almost certain, hence
its self-information is low. Heads occurs rarely, hence its self-information is
high.
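The self-information values for the two examples follow directly from I = -log2(p) and can be recomputed with a few lines. The helper name `self_information` is hypothetical; the symbol probabilities are simply the relative frequencies in the sequence AATGGCACCT.

```python
import math
from collections import Counter


def self_information(p):
    """Self-information, in bits, of an event with probability p."""
    return -math.log2(p)


sequence = "AATGGCACCT"
counts = Counter(sequence)
probabilities = {symbol: n / len(sequence) for symbol, n in counts.items()}

for symbol in "ATGC":
    print(symbol, probabilities[symbol], round(self_information(probabilities[symbol]), 2))

# Biased coin with P(H) = 1/5 and P(T) = 4/5.
print(round(self_information(1 / 5), 2))  # 2.32
print(round(self_information(4 / 5), 2))  # 0.32
```

The rarer symbols (probability 0.2) carry about 2.32 bits each and the more frequent ones (probability 0.3) about 1.74 bits, in line with the rule that less probable events carry more information.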
Entropy
Entropy is a measure of uncertainty or lack of information. It denotes “the
amount of surprise created on us”[5]. For instance, the news “School locks up
UKG student in dog house” certainly creates more surprise than “School locks up
a dog in dog house”. Since the first incident is rare, the number of bits
required to encode it is larger than for the second news item.
In general, entropy is the average amount of information produced by the event.
It intuitively gives the number of bits per symbol actually required to store
the data. Thus entropy provides a bound for lossless compression.
Mathematically, entropy (H) is the average of the self-information of all
possible events, weighted by their probabilities of occurrence (pi):
$$H = -\sum_i p_i \log_2 p_i$$
As mentioned earlier, the significance of information entropy is that it tells
us the minimum number of bits required to encode the message digitally. This
means that by measuring the entropy of a message, one can know whether there is
scope for compressing that message. The number of bits per character required to
represent English text, if all letters and the space are considered to have the
same probability, is log2(27) = 4.75 bits. But the underlying structure of the
English language clearly shows that the probability of occurrence of the
letters in a message is not uniform. Based on standard estimates of the
probability of occurrence of the English letters and the space,
$$H = -\sum_i p_i \log_2 p_i = -\left[p(\text{'a'}) \log_2 p(\text{'a'}) + p(\text{'b'}) \log_2 p(\text{'b'}) + \dots + p(\text{'z'}) \log_2 p(\text{'z'}) + p(\text{' '}) \log_2 p(\text{' '})\right] = 4.14 \text{ bits}$$
As per Shannon, “If the language is translated into binary digits (“0” or “1”)
in the most efficient way, the entropy H is the average number of binary digits
required per letter of the original language”[3]. This shows that English text
can ideally be represented using about 4 bits per character, based on the
probability of occurrence of each letter.
Fig. 6: Illustration of Entropy with analogy: School locks up UKG student in dog house - a rare event,
hence entropy is more (Photo: http://www.munsif.tv/articles/2014/09/29/ukg-student-locked-kennelprincipal-arrested)
Consider the coin tossing experiment with the following outcomes:
HTHTTTTHHT
Case    Probability
H       4/10
T       6/10
$$H = -\sum_i p_i \log_2 p_i = -[0.4 \log_2(0.4) + 0.6 \log_2(0.6)] = 0.97 \text{ bits/symbol}$$
Let us again consider the example:
AATGGCACCT
The entropy H is calculated as
$$H = -[0.3 \log_2(0.3) + 0.2 \log_2(0.2) + 0.2 \log_2(0.2) + 0.3 \log_2(0.3)] = 1.97 \text{ bits/symbol}$$
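Both entropy calculations can be reproduced from the symbol frequencies of the two example sequences. The function below is a minimal sketch of the formula H = -∑ pi log2(pi); the name `entropy` is hypothetical and the probabilities are estimated from the sequence itself. With the frequencies 0.3, 0.2, 0.2 and 0.3 the four-symbol example evaluates to about 1.97 bits per symbol.

```python
import math
from collections import Counter


def entropy(message):
    """Shannon entropy of a sequence, in bits per symbol, from its symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(round(entropy("HTHTTTTHHT"), 2))  # 0.97
print(round(entropy("AATGGCACCT"), 2))  # 1.97
print(round(math.log2(27), 2))          # 4.75, the uniform 27-symbol case
```

The last line is the uniform case for 26 letters plus the space, which is the upper bound that any non-uniform letter distribution falls below.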
The probabilities of occurrence of the 4 symbols in the sequence are almost the
same, so such messages are ideally represented using about two bits per symbol.
The statistical nature of language helps to “reduce the entropy” by selecting a
proper model. This knowledge in turn helps to store the message in a more
efficient manner. With the fundamental idea of compression in place, we
cordially invite our readers to focus on the compression trends in Genomics.
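A two-bits-per-symbol representation of a four-letter alphabet can be illustrated with a simple packing scheme. The code table below is an assumption chosen for illustration (any fixed one-to-one assignment of the four bases to two-bit codes would do), and the sketch is not a real genomic file format.

```python
# Assumed mapping from bases to 2-bit codes; any fixed one-to-one assignment would do.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = {code: base for base, code in CODES.items()}


def pack(sequence):
    """Pack a DNA string into bytes, four bases per byte."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        packed.append(byte)
    return bytes(packed)


def unpack(packed, length):
    """Recover the first `length` bases from the packed bytes."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


sequence = "AATGGCACCT"
packed = pack(sequence)
print(len(sequence), "characters ->", len(packed), "bytes")
print(unpack(packed, len(sequence)) == sequence)
```

Stored as plain text the ten bases occupy ten bytes, whereas the packed form needs three; the sequence length has to be kept alongside the bytes so that padding in the last byte can be discarded.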
Compression Trends in Genomics
Compression has alleviated bottlenecks in many different fields, ranging from
Internet services to the multimedia industries, and its healing touch can be
felt even in the field of Genomics. Compression is a technology that shrinks
data so that it can be stored on the same disk more effectively. The new biology
brings out a new form of biological data, DNA (deoxyribonucleic acid), that
helps to reveal the mysteries of life. DNA is equivalent to a text file with an
alphabet of four letters {A, T, G, C}, which forms the genetic code that runs
our life.
Fig. 7: A sample DNA sequence
DNA is responsible for the unique traits that are passed on to offspring from
both parents, and this macromolecule determines the variation in the genes
accountable for the look of hair or eyes. For example, the Malayalam actor
Indrajith, owing to his inherited traits, has a strong resemblance to his
parents Mallika and Sukumaran, as shown in Fig. 8. It is interesting to note
that not only the curly hair or the long nose but also the day-to-day activities
of every cell are controlled by the secret code engraved deep inside the nucleus
of the cell.
Fig. 8: An example for DNA traits - Resemblance of Malayalam film actor Indrajith with his parents
(Source: http://en.wikipedia.org/wiki)
The human body maintains a symphony of one hundred million million cells
(100,000,000,000,000). The symphony is controlled through the code that resides
inside the nucleus of the cell, which one inherits from one's parents. The human
cell nucleus contains 23 pairs of chromosomes. Each chromosome contains a
twisted-ladder-shaped DNA (deoxyribonucleic acid) molecule. The two strands of
DNA are known as the coding strand and the template strand. Each of them is the
complement of the other, and the two strands are connected by hydrogen bonds. In
DNA strands, Adenine always pairs with its complement Thymine, and Guanine
always pairs with its complement Cytosine. The human genome is made of 3 billion
genetic letters. Ever since the complete draft of the first human genome in
2003, biologists have marveled at many insightful surprises. It took nearly 13
years and $1 billion to publish the first draft of the human genome. A decade
later, with high-throughput sequencing technology, genomic data is growing and
the cost is decreasing dramatically. The price of sequencing has gone down to
$5000 and is expected to drop further. Currently, genomic data is accumulating
at petabyte scales[6]. Storing a petabyte would require a stack of DVDs nearly
2 miles tall. The huge accumulation of data is surpassing the hardware available
for storing it. As shown in Fig. 9, sequencing technologies have outpaced
Moore’s law by a long way.
Fig. 9: Sequencing Cost trend 2002-2014
(Source: http://www.nature.com/news/technologythe-1-000-genome-1.14901)
Even in the midst of this flood of data, a greater understanding of the
individual genome is of great interest to physicians and scientists. Early
identification of genetic risk, disease prediction and treatment have been made
possible by genetic understanding. Moreover, prescribing the rate of drug dosage
for each individual is revolutionizing the personalized medicine industry too.
In the future, DNA sequences may need to be kept on a hand-held card, like a
credit or debit card, for medical decision making. Considering the demand for
processing, analyzing, transmitting and storing the huge volume of data, DNA
data compression seems a viable choice to manage the flood of data.
Fig. 10: Personal Genome Card (Photo by Hareesh N.)
The new voluminous data poses many computational challenges. As mentioned
earlier, the language of DNA has an alphabet of 4 letters. Hence, Shannon’s
information entropy is close to 2 bits per base, which forms the upper limit for
encoding the bases and corresponds to a naïve encoding of 2 bits per base.
Understanding the nature of DNA data and exploiting its repeat properties helps
to frame an expert source model capable of compressing DNA sequences. As the
data explosion continues, we expect that novel compression algorithms will have
to flourish for effective DNA data management.
References
[1] Khalid Sayood, “Introduction to Data Compression”, Elsevier, 2nd edition, 2000.
[2] C E Shannon, “A Mathematical Theory of Communication”, The Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October, 1948.
[3] Arun K S and Achuthsankar S Nair, “It's 60 years since “kpbwcyxz” became more informative than ‘I love you’”, IEEE Potentials, Vol. 29, 2012, pp. 16-19.
[4] C E Shannon, “Prediction and Entropy of Printed English”, The Bell System Technical Journal, vol. 27, pp. 50-64, July, September, 1950.
[5] Achuthsankar S Nair, “Claude Shannon & Information Theory”, pp. 26-28, Info Kairali, 2003.
[6] Vivien Marx, “The Big Challenges of Big Data”, Nature, vol. 498, pp. 255-260, 2013.
About the Authors
Biji C.L. is currently working towards her Ph.D. from University of Kerala. She is interested in communicating science
through popular science magazines and has earlier contributed to CSI communications.
Manu K. Madhu is an M. Tech. student of School of Computer Sciences, M G University, Kottayam. Apart from the
academic life, he is a passionate poem writer and he enjoys cooking.
Article
Amol Dhumane* and Rajesh Prasad**
*Assistant Professor, Computer Engineering Department of NBN Sinhgad School of Engineering, Ambegaon(Bk), Pune
**Professor & Head, Computer Engineering Department of NBN Sinhgad School of Engineering, Ambegaon(Bk), Pune
Routing Challenges in Internet of Things
Abstract: We are moving towards the Internet of Things (IoT). Ubiquitous and pervasive computing is the nucleus of IoT. The number of
sensors deployed across the globe is very large and is growing rapidly. These sensors act like the digital
skin of the earth. Sensors collect raw data continuously, and this raw data is interpreted to generate knowledge from it.
The routing of data from source to sink is a fundamental component of any large-scale network. In IoT, the communicating devices
work with dissimilar networking standards, may experience irregular connectivity with each other, and many of them can be resource
constrained. These characteristics raise several routing challenges which traditional routing protocols did not face. It
is therefore essential to understand the context while routing data on future networks. This survey addresses the routing mechanism and
related challenges in IoT.
Index Terms – Internet of Things (IoT), context awareness, ubiquitous computing.
Introduction
Kevin Ashton from MIT coined the term
“Internet of Things” in early 2000’s. It stands
for a “world-wide network of interconnected
objects uniquely addressable, based on
standard communication protocols”[2].
Though IoT is a widely used term, its definition
is still fuzzy. IoT is a technological revolution
that represents the future of computing and
communications, and it aims at increasing
the ubiquity of the Internet by integrating
every object for interaction via embedded
systems, which leads to highly distributed
network of devices communicating with
human beings as well as other devices[5].
Compared with the traditional information
networks, IoT has three new goals, i.e. more
extensive interconnection, more intensive
information perception and more
comprehensive intelligence service[3].
Due to the explosion of short-range
networks and the proliferation of devices
connected to them, a seamless
interconnection between devices is
steadily being created. These short-range
networks include wireless sensor
networks (WSNs), radio frequency
identification (RFID) networks, Bluetooth
and Zigbee networks. The connected devices
are expected to work together to create,
gather and share information through a
sequence of communication steps, with or
without human intervention.
At present, we need to build a
reference architectural model that will allow
interoperability between different types of systems.
The new research areas in IoT envision the
interconnection of objects in everybody’s
daily life. These research areas call for
communication between heterogeneous
devices. This heterogeneity
can be in terms of size, computational
power, memory and energy. The energy
of a device is one of its most important
resources: when it runs out, the network
experiences intermittent connectivity,
which makes the routing challenge
in IoT more complex.
IoT supports various types of
communication such as device to device,
device to human or human to device. The
communication can be intradomain or
interdomain, and it can be single hop or
multihop. In multihop communication,
devices relay information to achieve end
to end communication between source
and destination. Traffic patterns and
data flows are highly directional. These
patterns are classified into point to point,
point to multipoint and multipoint to
point. Due to the heterogeneous nature
of IoT, some intelligence is required in the
communication process. Intelligence in
this context is the ability of a device to
be aware of the environment in which it is
operating and to collaborate with other
devices to use the data it has collected
from its environment[1].
Many large scale wireless networks
use low powered embedded devices for
data acquisition and actuation related
applications. These embedded devices
work under severe energy constraints
and communicate over a lossy channel.
Such low power devices, which form
part of a large scale wireless network
containing a varying number of other devices,
may enter or leave the network at random
times. Upcoming wireless routing
solutions must therefore be highly
energy efficient, scalable
and self-sufficient.
This article is organized as follows:
the next section discusses the types of
routing protocols, the section after it
covers the routing challenges that
need to be addressed, and the last section
concludes the article.
Types of Routing Protocols
Routing protocols are classified as
proactive, reactive or hybrid according to
the way they make routing decisions. Proactive
protocols maintain route information in
tabular form at all times, reactive protocols
build routes on demand, and hybrid
protocols combine proactive and reactive
routing algorithms. Table 1 lists protocols
of each type.
Table 1: Routing protocol types
Protocol Type | Protocol Name
Proactive | Optimized link state routing (OLSR), Destination sequenced distance vector (DSDV), Topology dissemination based on reverse path forwarding (TBRPF)
Reactive | Dynamic source routing (DSR), Ad-hoc on demand distance vector (AODV)
Hybrid | Zone based hierarchical link state routing protocol (ZRP)
Reactive protocols utilize
bandwidth more efficiently and are more
suitable for dynamic networks, whereas
proactive protocols are suitable for static
networks.
From a researcher’s point of view,
reactive protocols are more suitable for
WSNs because routes change frequently,
and a proactive protocol would need to
update its routing tables constantly.
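The difference between the proactive and reactive strategies can be illustrated with a minimal Python sketch. It is not an implementation of OLSR, DSDV, DSR or AODV: the topology, the class names and the breadth-first route search are all assumptions made for illustration. The proactive router computes a next-hop entry for every destination up front, while the reactive router searches for a route only when one is requested.

```python
from collections import deque

# Hypothetical topology: node -> set of neighbours (undirected links).
TOPOLOGY = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}


def shortest_path(topology, source, destination):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(topology.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def first_hop(path):
    """Next hop on a path, or None when there is no path or no hop to take."""
    return path[1] if path and len(path) > 1 else None


class ProactiveRouter:
    """Keeps a next-hop entry for every destination at all times."""

    def __init__(self, topology, node):
        self.node = node
        self.rebuild(topology)

    def rebuild(self, topology):
        # Must be called again after every topology change.
        self.table = {}
        for destination in topology:
            hop = first_hop(shortest_path(topology, self.node, destination))
            if hop is not None:
                self.table[destination] = hop

    def next_hop(self, destination):
        return self.table.get(destination)


class ReactiveRouter:
    """Discovers a route only when one is requested, then caches it."""

    def __init__(self, topology, node):
        self.topology = topology
        self.node = node
        self.cache = {}
        self.discoveries = 0  # counts on-demand route searches

    def next_hop(self, destination):
        if destination not in self.cache:
            self.discoveries += 1
            path = shortest_path(self.topology, self.node, destination)
            self.cache[destination] = first_hop(path)
        return self.cache[destination]
```

In this sketch the proactive router pays the cost of computing routes to all destinations whether or not they are used, which mirrors the constant table maintenance mentioned above. The reactive router pays only for the destinations it actually sends to, at the price of a discovery delay on first use.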
Akkaya et al.[5] grouped routing
protocols for WSNs, which are a major
component of IoT, into the following
categories: (1) data-centric, (2)
hierarchical, (3) location based, (4) QoS-aware.
Data-centric protocols do not need
a globally unique ID for every sensor node;
they perform multihop routing by using
attribute-based naming mechanisms. Hierarchical
protocols partition the network into small
clusters, each with a node acting as a
cluster head. Location based algorithms
exploit knowledge of the geographical
location of a node to achieve energy
efficient routing. QoS-aware protocols
can explicitly deal with multi-constrained
requests for data transmissions.
This classification was further extended
by Boukerche et al.[6], who added two more
categories of routing protocols: flat and
multipath. The flat category refers to the case in
which a large number of nodes work together
to sense the environment. The nodes are all
alike, and global IDs are not assigned
to them. The multipath category contains
algorithms that compute multiple paths
from sources to destinations in order to
handle failing nodes effectively.
Challenges in Routing
Routing in a network made up of smart
objects has unique characteristics. These
characteristics led to the formation of a new
IETF working group (WG) known as ROLL
(Routing Over Low power and Lossy networks),
whose aim is to specify a routing protocol for
low power lossy networks, known as RPL[7]. In
this section we discuss the major challenges
that can arise in the routing process of IoT.
1. Deployment of nodes: In traditional
networks the topology was known
exactly before the network was
established. In a WSN, which is an
important component of IoT, it is very
difficult to keep the topology fixed because
the nodes are deployed randomly in the field.
2. Heterogeneous devices: Devices
differ in the network standards they
use and the applications they support.
They also differ in resources: some
devices are resource constrained and
some are not.
3. Diverse networking standards: IoT
is an umbrella that brings together various
technologies such as traditional
networks, WSN, Zigbee and WiFi.
The working principles of
these technologies are diverse, and they
use different protocol stacks.
4. Intermittent connectivity: Because
battery life is limited, the network
topology can change at any time.
Intermittent connectivity
can also be caused by
highly mobile devices, which get
disconnected from the network when
they move.
5. Multihop communication: Most
of the devices used in IoT are low
powered, short range transmitters,
so they have to rely on relaying
to move data from
source to destination.
6. Fault tolerance: Environmental
factors, deployment mechanisms or
energy constraints can degrade
the overall network performance at
any time, so routing protocols need
a mechanism for handling such
unexpected events.
7. Security: Routing security becomes
an issue when some participants are
dishonest. Hop to hop authentication
is not enough. Cryptography can
mitigate the effects to some extent,
but not completely.
8. Context awareness: Context aware
computing mainly includes five
sub-technologies: (1) getting
context, (2) context modeling, (3)
context reasoning, (4) context-conflict
solving and (5) context storage and
management[4]. In a context aware
environment, the system has to use context
information to make the necessary
changes in the routing process.
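As an illustration of how context information such as residual energy or link quality could influence a routing decision, the following Python sketch scores candidate next hops. The `Neighbour` fields, the weights and the energy cut-off are hypothetical values chosen for the example; they do not come from RPL or from any protocol cited in this article.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    name: str
    residual_energy: float  # fraction of battery left, 0.0 to 1.0
    link_quality: float     # estimated delivery ratio, 0.0 to 1.0
    hops_to_sink: int       # advertised distance to the sink


def score(n, w_energy=0.5, w_link=0.3, w_hops=0.2):
    """Higher is better. The weights are illustrative assumptions."""
    return (w_energy * n.residual_energy
            + w_link * n.link_quality
            + w_hops / (1 + n.hops_to_sink))


def choose_next_hop(neighbours, min_energy=0.1):
    """Pick the best-scoring neighbour whose battery is above min_energy."""
    usable = [n for n in neighbours if n.residual_energy >= min_energy]
    if not usable:
        return None  # no relay available: connectivity is interrupted
    return max(usable, key=score)


candidates = [
    Neighbour("n1", residual_energy=0.05, link_quality=0.9, hops_to_sink=1),
    Neighbour("n2", residual_energy=0.80, link_quality=0.7, hops_to_sink=2),
    Neighbour("n3", residual_energy=0.40, link_quality=0.9, hops_to_sink=1),
]
best = choose_next_hop(candidates)
```

Here `n1` is closest to the sink but is excluded because its battery is nearly empty, and `n2` is preferred over `n3` because the sketch weights residual energy most heavily. A different weighting would encode a different routing policy.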
Conclusion
In this article we discussed the
basics of routing mechanisms and the related
challenges in IoT. The Internet has changed our
lives to a great extent over the last two decades.
Now it is time to connect everything to the
Internet to make our lives more
comfortable. As we are going to connect
every possible ‘thing’ to the Internet, we have
to address the routing issues discussed
in this article.
The future is IoT, but many things
still need to be resolved. At the edge
of the future Internet, it will be essential
in the coming years to make routing a
context aware mechanism.
References
[1] O Bello and S Zeadally, “Intelligent Device-to-Device Communication in the Internet of Things (IoT)”, to appear in IEEE Systems Journal, 2014.
[2] INFSO D.4 Networked Enterprise RFID, INFSO G.2 Micro Nanosystems in Cooperation with the Working Group RFID of the ETP EPOSS, “Internet of Things in 2020, Roadmap for the Future, Version 1.1,” European Commission, Information Society and Media, Tech. Rep., May 2008.
[3] H Zhou, K Hou, “CIVIC: An Power- and Context-Aware Routing Protocol for Wireless Sensor Networks,” Proc. IEEE WiCom’07, 2009, pp. 2771-2774.
[4] Zhikui Chen, Haozhe Wang, “A Context-Aware Routing Protocol on Internet of Things Based on Sea Computing Model”, Journal of Computers, Vol. 7, No. 1, January 2012, pp. 96-105.
[5] K Akkaya, M Younis, “A survey on routing protocols for wireless sensor networks”, Ad Hoc Networks 3 (3) (2005) 325–349.
[6] A Boukerche, M Ahmad, B Turgut, D Turgut, “A taxonomy of routing protocols in sensor networks”, in: A Boukerche (Ed.), Algorithms and Protocols for Wireless Sensor Networks, Wiley, 2008, pp. 129–160 (Chapter 6).
[7] T Winter et al., “RPL: IPv6 Routing Protocol for Low-Power and Lossy Networks”, IETF RFC 6550, Mar. 2012.
Acknowledgements
We are thankful to our Principal Dr. S.D.
Markande and Prof. S.P. Patil, Head, IT
department, NBN Sinhgad School of
Engineering for their constant support and
motivation.
About the Authors
Amol Dhumane received his master’s (M.E. Computer) degree from Bharati Vidyapeeth University, Pune in 2008.
He is working as an Assistant Professor in the Computer Engineering Department of NBN Sinhgad School of Engineering,
Ambegaon(Bk), Pune. He has 10 years of working experience. His area of interest is congestion control in computer
networks. He has published over 10 papers in national and international conferences. He is a lifetime member of ISTE.
Dr. Rajesh Prasad received his master’s (M.E. Computer) degree from College of Engineering, Pune in 2004 and his
doctorate degree from SGGS, Nanded in 2012. He is working as Professor & Head in the Computer Engineering Department of
NBN Sinhgad School of Engineering, Ambegaon(Bk), Pune. He has 18 years of experience. His areas of interest are Soft
Computing, Text Analytics and Information Management. He has published over 40 papers in national and international
journals. He is a lifetime member of CSI and ISTE.
Article
Sumit Jaiswal*, Subhash Chandra Patel* and Ravi Shankar Singh**
*Ph.D. Student, Department of Computer Science & Engineering, IIT (B.H.U.), Varanasi
**Assistant Professor, Department of Computer Science & Engineering, IIT (B.H.U.), Varanasi
Secured Outsourcing Data & Computation to
the Untrusted Cloud – New Trend
Cloud computing has evolved as a
recent trend in computing based on the
model of delivering services.
Cloud computing has transformed
the way people think about software
delivery & licensing, computing utility
& infrastructure. The cloud computing
concept is based on efficiently sharing
resources among customers, with
resources dynamically reallocated to
customers on demand. Cloud
computing offers several benefits such as
multitenancy (shared resources),
fast deployment, pay-for-use, lower
costs, scalability, rapid provisioning,
rapid elasticity and ubiquitous network
access. Broadly, the services offered
by cloud computing fall under the SaaS
(Software as a Service), PaaS (Platform
as a Service) and IaaS (Infrastructure as a
Service) categories. Users access the
services offered by the cloud without
having to bother about where and how those
services are hosted. Many IT vendors
provide cloud based computing services, storage
services (synchronizing operations
across multiple devices) and application
hosting services, giving customers
minimal information
about the background operations.
Dropbox, Google Drive,
Microsoft’s Skydrive, Amazon’s Simple
Storage Service (S3) and Elastic Compute
Cloud (EC2) are prominent examples
of cloud services.
Outsourcing Computation to Cloud: A
New Trend
Nowadays, we are witnessing
tremendous growth in the penetration
of mobile devices and tablets. These
devices are cheap, but resource
constraints limit their computational
power. When an operation requires
computing power beyond the scale of a
mobile device, the device is inefficient.
This is where the idea of outsourcing
computation to a third party (the cloud)
looks promising. The computationally
weak client outsources the computation
of a function f (for some inputs x1, x2 . . .
xn) to a third party cloud. The operation
is performed in the cloud and the
result f(xi) = yi is returned to the user.
The user/enterprise pays for the
computing service in terms of the
computing cycles used, as measured by
the cloud service provider. Cloud
computing therefore has significant
potential for mobile devices (weak
computational devices), enabling
them to perform hard computing
tasks by outsourcing them to the cloud. The
outsourced computation can be anything
from solving linear equations to
photo manipulation or
modular exponentiation.
A well-known project of a similar kind is
SETI@home, in which a huge volume of
radio transmission data is scanned for
any extra-terrestrial information. Many
people around the world volunteer
by contributing the idle cycles of their
CPUs to SETI. Thus a large number of
computers collaborate on the huge task of
scanning the data by donating their
CPU cycles. In the case of cloud computing,
mobile devices (computationally
weak devices) can similarly outsource their
computation tasks to the cloud (third party).
Outsourcing Data to Cloud (and its
Untrusted Nature)
The cloud consists of infrastructure
maintained and operated at remote
locations, which may lie beyond the
geographical boundaries of countries or
even continents. Security concerns arise
when a user/enterprise who wishes to use
cloud services has to outsource
data to a remote location for computation.
In other words, by outsourcing
private business data, the user/enterprise
gives up direct control over it. The
details of the services and their processes
are not completely transparent during
operation in the cloud, so the
user/enterprise has to “trust” the Cloud
Service Provider (CSP) with the handling
and privacy of the data. This raises an
important security question: “Are we
willing to put our sensitive/personal
data in a remote cloud?” To ensure
privacy and confidentiality of data, we
require mechanisms which ensure that the
CSP can extract no information about the
data beyond what it needs to perform the
required operations.
Possible Mechanism to Ensure Privacy
and Confidentiality of Data in Cloud
Threats to the confidentiality and privacy of
data in the cloud may come both from the
Cloud Service Provider (CSP) itself and
from outside entities. The exact
nature of the threat may not be
known clearly, but some precautions can
be taken in advance to prevent a breach of
privacy of sensitive user data on cloud
premises.
One efficient way to achieve
privacy and confidentiality is to perform
“encryption” of data before transmitting
it to the cloud. Data in encrypted form
is unintelligible and meaningless. An
adversary who does not know the keys
cannot feasibly learn anything about the
data, regardless of how much freedom it
has over the stored data. The adversary
may still modify or erase the data, in
which case the owner can easily
detect it.
Therefore, for confidentiality, data
being outsourced to the cloud should be in
encrypted form. The enterprise/user can
outsource data in encrypted form
to the cloud for processing or storage.
Thus, the cloud provides storage
of the data under its IaaS service.
Outsourcing Data: Outsourcing
data to the cloud is simple: the data
just needs to be encrypted before
transmission. One important aspect to
note is the management of encryption
keys after storage, that is, the secure
distribution of keys to authenticated
users and the prevention of their
misuse. Encryption can be done in two
ways: symmetric and asymmetric. In
symmetric encryption, the same key is used to
encrypt and decrypt the data. The sender
and receiver must keep this key secret,
as it alone is used to
encrypt and decrypt the data. If the key is
lost or compromised, the confidentiality and
privacy of the encrypted data are endangered.
In asymmetric encryption,
two different keys are used, one for
encryption (public key) and another for
decryption (private key). The decryption
key is kept secret by the owner, and the
encryption key is declared publicly and
distributed to the users who want to
transmit data. Anyone, anywhere in the
world, can securely transmit data to the
cloud using the public key, but only the
owner of the private key can read the
plaintext after decryption. Thus,
using both encryption schemes, one can
easily share and outsource data to the cloud
(since the cloud provides a platform to access
data from anywhere at any time).

Fig. 1: Encrypted data storage and retrieval from cloud

Outsourcing Computation: With the
advancement of cloud computing, there is a
growing trend to “outsource” computing
from a (relatively) weak computational
device (mobiles, tablets etc.) to a more
powerful one. Enterprises now rent
computing power from Cloud Service
Providers on a pay per use basis. To
protect personal data such as medical
data, biological data and educational
records from leaking, this service model
is adapted so that the computation is
performed over the data while it remains
in encrypted form.
Recent research in this field has
focused on using Secure Function
Evaluation (SFE) to compute
a given function f(x1, x2, x3, … xn), where
input xi can be the private input of the ith user
(or all inputs may belong to a single user).
Here the user(s) wish to compute a function
over their private inputs. Because the user(s)
have computationally weak devices
(e.g. mobile, tablet), they transmit
public values yi (the values xi
encrypted using private keys ki) corresponding
to their private inputs xi, along with a
representation of the encoded function
f’ equivalent to their function
f. At the end of the protocol/
computation, the Cloud Service Provider
(CSP) should not be able to infer any
information (about the private inputs
or the function f itself) other than f’(y1,
y2, y3, … yn). Finally, the user(s) should
be able to verify the correctness of the
result by jointly computing f from f’, i.e.
decrypting the result of the encoded function
f’ using the keys ki should yield the result of
the original function f:

$$ f(x_1, x_2, x_3, \ldots, x_n) = \mathrm{JointDecryption}_{\{k_i\}_{i=1}^{n}}\left[\, f'(y_1, y_2, y_3, \ldots, y_n) \,\right] $$

Fig. 2: Searching (operation) over encrypted (private) data
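The symmetric and asymmetric schemes described under Outsourcing Data can be contrasted in a short Python sketch. Both ciphers here are toy constructions chosen so that the example runs with the standard library only: a SHA-256 keystream XOR stands in for a symmetric cipher, and textbook RSA with tiny primes stands in for an asymmetric one. Neither is secure, and all names and parameters are assumptions made for illustration.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: the same key and function encrypt and decrypt."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        # One 32-byte keystream block per 32 bytes of data.
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


# Toy asymmetric key pair (textbook RSA with tiny primes, illustration only).
P, Q = 61, 53
N = P * Q                   # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                      # public exponent, published to every sender
D = pow(E, -1, PHI)         # private exponent, kept by the data owner


def rsa_encrypt(message: int, public_key=(E, N)) -> int:
    """Anyone holding the public key can encrypt (message must be < N)."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def rsa_decrypt(ciphertext: int, private_key=(D, N)) -> int:
    """Only the holder of the private exponent can decrypt."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


# Symmetric: sender and receiver share one secret key.
shared_key = b"shared-secret"
stored = keystream_xor(shared_key, b"patient record 42")
recovered = keystream_xor(shared_key, stored)

# Asymmetric: encrypt with the public key, decrypt with the private key.
ciphertext = rsa_encrypt(65)
plaintext = rsa_decrypt(ciphertext)
```

The sketch makes the key management difference visible: in the symmetric case the single `shared_key` must reach every authorised party secretly, while in the asymmetric case only `D` has to stay secret and `(E, N)` can be published. A real deployment would use a vetted cryptographic library rather than either toy.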
Recently, research has focused
on efficient verifiable computing, where
the users are able to efficiently
verify the correctness of outsourced
computation performed by a
third party with significantly less work
than the computation itself requires.
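The article does not describe a particular verification scheme. As an illustration of checking an outsourced result with less work than recomputing it, the sketch below uses Freivalds' algorithm, a classical probabilistic check for matrix products. This is a much simpler technique than the cryptographic verifiable-computing protocols referred to above, and it is swapped in here only to make the idea concrete. The function names and the number of rounds are assumptions.

```python
import random


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def mat_mul(a, b):
    """Full matrix product: the expensive job a client would outsource."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def freivalds_check(a, b, c, rounds=20, rng=random):
    """Return True if c is probably a*b.

    Each round multiplies by a random 0/1 vector, which costs O(n^2)
    instead of the O(n^3) needed to recompute the product. A wrong c
    survives one round with probability at most 1/2.
    """
    width = len(c[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(width)]
        if mat_vec(a, mat_vec(b, r)) != mat_vec(c, r):
            return False
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
honest_result = mat_mul(a, b)           # what an honest cloud returns
tampered_result = [row[:] for row in honest_result]
tampered_result[0][0] += 1              # what a faulty cloud might return
```

Calling `freivalds_check(a, b, honest_result)` always succeeds, while a tampered result is rejected except with probability at most 2^-20 for the default 20 rounds.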
Homomorphic encryption is
encryption that allows mathematical
operations, such as addition or
multiplication, to be performed on data
while it is in its encrypted state. Since the
results are also in encrypted form, the
private key is required to decrypt them.
It thus allows computations
to be performed on confidential encrypted
data without disclosing the private data.
Research on improvements and applications
of homomorphic encryption in cloud
computing is ongoing.
Therefore, using homomorphic
encryption, users/enterprises will
be able to securely outsource heavy
computation to the cloud (computation
as a service on a pay per use model). The
Cloud Service Provider
(CSP) can perform the desired computations
over encrypted data without having
any information about the data itself. Later,
the users will be able to verify the
correctness of the computation with
much less effort.
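The article does not name a specific homomorphic scheme. One well-known additively homomorphic scheme is the Paillier cryptosystem, and the sketch below uses it only as an illustration of computing on encrypted data. The primes are tiny and hard-coded, so this is a toy that shows the algebra, not a secure implementation, and the function names are assumptions.

```python
import math
import secrets


def generate_keys(p=1009, q=1013):
    """Toy Paillier key pair using generator g = n + 1 (tiny primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid inverse because g = n + 1
    return n, (lam, mu, n)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n with the public modulus n."""
    n_squared = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_squared) * pow(r, n, n_squared)) % n_squared


def decrypt(private_key, ciphertext):
    lam, mu, n = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """The cloud's operation: multiply ciphertexts to add the plaintexts."""
    return (c1 * c2) % (n * n)


public_n, private_key = generate_keys()
c1 = encrypt(public_n, 120)
c2 = encrypt(public_n, 45)
c_sum = add_encrypted(public_n, c1, c2)   # done without the private key
total = decrypt(private_key, c_sum)       # done by the data owner
```

The party running `add_encrypted` never sees 120, 45 or their sum; only the holder of `private_key` can decrypt `c_sum`. Paillier supports addition only; schemes that support both addition and multiplication on ciphertexts are considerably more involved.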
Conclusion
Cloud computing is a promising
paradigm that offers a cheap alternative
to small and medium enterprises and users.
This article has outlined the benefits as well
as the risks of the cloud for users/enterprises.
It highlights one of the major
security and privacy concerns that
any enterprise/user faces while using
cloud services (storing private data
in the cloud using Amazon S3, outsourcing of
computation). Recent research in this
field has been discussed in the article.
Cloud security is an active research area,
and several IT vendors that are now
investing heavily in cloud computing are
conducting research of their own.
Outsourcing data and
computation to the cloud can prove to be a
key trend in cloud computing if the existing
privacy and security challenges and the
concerns of enterprises/users are met.
About the Authors
Sumit Jaiswal is currently working towards his PhD in the Department of Computer Science and Engineering,
IIT (BHU), Varanasi. He received his M.Tech from NIT Durgapur in 2013. He is a student member of the Cryptology
Research Society of India. His research interests include Information Security, Cryptography and Cloud
Computing.
Subhash Chandra Patel received his M.Tech. degree in Information Security from Guru Gobind Singh
Indraprastha University, New Delhi in 2010. He is currently pursuing a Ph.D. in the Department of Computer
Science & Engineering, IIT (BHU), Varanasi, India. His research interests include Cloud Computing Security
and Information Security.
Dr. Ravi Shankar Singh received his Ph.D. in Computer Science and Engineering from IIT (BHU), India
in 2010. He has been working as an Assistant Professor at IIT (BHU), Varanasi since 2004. His research interests
include Cloud Computing Structures, Algorithms and High Performance Computing.
Article
Richa Sharma* and T R Gopalakrishnan Nair**
*Research Associate, Jain University and VP, Advanced Imaging and Computer Vision Group, Research and Industry Centre (RIIC), Dayananda
Sagar Institutions Bangalore
**Aramco Endowed Chair, PMU, KSA and VP, Advanced Imaging and Computer Vision Group, Research and Industry Centre (RIIC),
Dayananda Sagar Institutions Bangalore
Intelligence for Diagnostic Imaging in the Medical World
Abstract: Diagnostic imaging with the help of intelligent machines and intuitive algorithms has made several strides in the
recent past in detecting multiple malfunctions of the body expressed as diseases. It is rapidly evolving to include sophisticated image
analysis on top of existing optical, magnetic resonance, computed tomographic and nuclear imaging technologies. As the number
of patients and related medical images increases while the number of qualified doctors remains finite, intelligent diagnostic
systems are increasingly becoming the need of the day. These are expected to be applied widely for finding a variety of diagnostic
indications of many different types of abnormalities in medical images. This article covers state of the art research and developments
in Computer Aided Diagnostics, challenges, and possibilities.
Introduction
When intelligent algorithms are applied
to digital images to detect and diagnose
a disease automatically, the process is
called Computer Aided Diagnostics
(CAD). CAD is becoming a priceless tool
in medicine. It uses noninvasively
produced images of the internal organs of
the body for diagnostic purposes, so that a
warning can be given to people
in places where no sophisticated
systems or trained doctors are available
to examine them in depth and confirm a
diagnosis. In noninvasive
procedures, internal structures of the body
(hidden under the skin and bones) are
imaged without any cut in the patient’s body.
It is also possible to cut or remove organs and
tissues (as in a biopsy) for imaging; such
procedures are usually considered part of
pathology rather than medical imaging.
Imaging modalities available today
(X-ray, MRI, CT scan etc.) provide an effective
means of creating visual representations of
the interior of the body for clinical analysis
and medical intervention. Knowledge of
the anatomy of normal and
diseased tissues has increased considerably
because of the latest technology and
innovation in imaging methods.
In medical images, traditionally, the
value associated with each pixel carries
information about the status
(appearance in terms of color, intensity etc.)
of the corresponding part of the organ. In
projection radiography, the pixel values are
generated by X-ray radiation. X-ray radiation
is absorbed differently by components
such as bone, fat, and muscle, which produces
different intensity patterns in the medical
images. In medical ultrasonography,
the pixel values are generated by ultrasonic
waves and echoes which penetrate the
tissues to visualise the internal structure
of the organs. A single pixel in a medical
image is by no means intended to represent
a single cell of the organ. Nevertheless, when
a collection of cells changes due to infection
or a similar cause, this change can be visualised
as a change in the hue of the pixels in
digital medical images.
Medical images are traditionally
stored in DICOM format. DICOM differs
from most other data formats
because, apart from preserving the
image at its best resolution, it groups
information into data sets. This means that a
file of a chest X-ray image, for example,
contains the patient ID, imaging
details, manufacturer details etc.
along with the pixel values. Hence the
pixel data cannot be separated
from this information, even by mistake.
The main aim of traditional CAD
is to improve the accuracy of
diagnosis. Computer-aided simple
triage (CAST) is another type of CAD,
mainly used in emergencies where a
prompt diagnosis is
required. CAST performs a fully automatic
initial interpretation of the patient’s condition
and classifies the result into meaningful
categories such as positive/negative or
critical/minor/normal.
CAD is part of routine clinical
work for the detection of breast cancer on
mammograms, lung and colon cancer,
and a large array of orthopedic and
muscular issues at many screening sites
of advanced hospitals across the world.
This article provides insight into state of the
art research and developments in CAD,
challenges, and recommendations.
Research and Developments in CAD
Automatic detection of disease has
been of interest to many researchers. A
number of techniques and approaches have
been proposed, but how far
they can be applied in field practice
is yet to be verified in the long run.
Research in CAD is proceeding along two
basic dimensions. The first approach focuses on
better technology for acquiring
more precise images: images with better
resolution and more precise
data are expected to produce better
diagnostic results. The second approach, on
the other hand, focuses on applying better or
smarter algorithms to existing images, which
can provide more accurate results
from the existing image datasets themselves.
According to a study published in the
European Journal of Radiology, Riverain
Technologies has developed computer-aided
detection (CAD) software named
ClearRead +Detect, with bone suppression
technology, to detect subtle lung cancers
on chest X-ray images that usually go
unrecognized by radiologists. This
could lead to an earlier diagnosis by an
average of 18 months. The software marks
suspicious regions on a conventional
chest X-ray image so that radiologists can
further evaluate these areas.
CAD4TB was released in 2010 and its
next version came in 2012. It is used
to detect tuberculosis from a
chest radiograph. It classifies the chest as
normal or abnormal and marks the suspected
region. The software is also capable of
sending the report through mobile phones. It
generates a report in the form of a CAD score
within 30 seconds at zero variable cost, with
93% sensitivity and 69% specificity. Cases with
a high score can be referred for a sputum test.
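To see what 93% sensitivity and 69% specificity mean in practice, the following Python sketch works through a hypothetical screening population. The population size of 1000 and the 10% prevalence are assumptions chosen for round numbers; only the sensitivity and specificity figures come from the text above.

```python
def confusion_counts(population, prevalence, sensitivity, specificity):
    """Expected true/false positives and negatives for a screening test."""
    diseased = round(population * prevalence)
    healthy = population - diseased
    true_pos = round(diseased * sensitivity)   # diseased and flagged
    true_neg = round(healthy * specificity)    # healthy and cleared
    return {
        "tp": true_pos,
        "fn": diseased - true_pos,             # diseased but missed
        "tn": true_neg,
        "fp": healthy - true_neg,              # healthy but flagged
    }


def positive_predictive_value(counts):
    """Share of flagged cases that actually have the disease."""
    return counts["tp"] / (counts["tp"] + counts["fp"])


# Hypothetical cohort: 1000 people screened, 10% of whom have the disease.
counts = confusion_counts(1000, 0.10, sensitivity=0.93, specificity=0.69)
ppv = positive_predictive_value(counts)
```

Under these assumptions 93 of the 100 diseased people are flagged and 7 are missed, while 279 of the 900 healthy people are flagged as well, so only a quarter of the flagged cases are true positives. This is consistent with using the CAD score to select cases for a confirmatory sputum test rather than as a final diagnosis.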
Recently, Oxford University
researchers have come up with a computer
program that identifies various conditions
such as Down’s syndrome, Angelman
syndrome, or Progeria from facial features
in photographs and returns possible
matches ranked by likelihood.
In another direction, a prototype
has been developed for measuring the
acoustic signals generated by
superparamagnetic nanoparticles (SNPs). This
system estimates the 3D location of a
tumor deep under the tissue surface in real
time. Preliminary results demonstrate the
ability to localize small tumors (a few mm
in diameter) with a positioning error of less
than 4 mm. It can detect the presence of a tumor
at a depth of a few cm below the skin surface
and triangulate its location. A more
powerful “image and treat” system can be
developed using the presence of these SNPs.
Recently, Harvard University, in
collaboration with the Max Planck Institute
for Brain Research, Frankfurt, Germany,
has come up with a novel imaging method
and algorithms to analyze the complex
synaptic network formed by billions of
interconnected neurons. In this method,
neural tissue is thinly sliced and each
section is imaged with a scanning electron
microscope at high resolution, generating
a map of all connections made by each
cell. This provides a detailed wiring diagram
of the brain: the connectome. The daunting
challenge here is size and complexity.
In order to enable long-term health
monitoring, technologies are emerging to
design the sensors which can be stuck on
electronic tattoos or directly printed onto
human skin. There are mainly four emerging
unobtrusive and wearable technologies
for acquisition of health information: 1)
unobtrusive sensing methods, 2) smart
textile technology, 3) flexible-stretchable-printable electronics, and 4) sensor fusion.
Challenges and Recommendations
Algorithms and approaches used for CAD
vary widely depending on the specific
application, imaging modality, and other
factors. For example, the segmentation of
lung tissue has different requirements from
the segmentation of the kidney because
basic features of these two regions are very
different from each other in terms of size,
shape, texture, and geometry of surrounding
area around region of interest (ROI).
External imaging conditions such as noise,
lighting conditions, partial volume effects,
and motion can also affect performance
of segmentation algorithms significantly.
Furthermore, each imaging modality has
its own peculiarities to deal with. Selection
of the right method for pre-processing is
crucial for achieving desired segmentation
result. There is currently no single method
that yields acceptable results for every
medical image. Hence there is considerable
scope for further work in this field.
Images with better resolution and
contrast levels will have more data for
analysis, which can lead to more accurate
diagnosis. Hence, the challenges to CAD
start with the very first step of image
processing, that is, image acquisition.
Research is under way in this direction,
and many new imaging methods like
Photo Acoustic Tomography, Array-Based
Micro-Ultrasound Scanner for Preclinical
Imaging, etc. have been introduced.
Currently, Computed Tomography (CT)
and Positron Emission Tomography (PET)
are being used for CAD. Diffraction-Enhanced
Imaging and Phase-Contrast
X-ray Imaging (PCI) are innovative
methods that are sensitive to the refraction
of the X-rays in matter. PCI is mainly
adapted to visualize weakly absorbing
details like those often encountered
in biology and medicine. Recently,
MicroCT has come into existence. It
produces cross-sectional views
non-destructively after capturing a 3D view
of the rotated object. It uses refraction phase
contrast imaging rather than absorption. In
the case of MicroCT, resolution does not
fall but remains the same as the sample moves
away from the source. Multiple detector
magnification (at varying distances and
angles) is used for zooming without
cutting down the samples. Better CAD
software can be developed by utilizing
the additional details received from these
newly introduced imaging modalities.
Photometry is the science of the
measurement of light in terms of perceived
brightness to the human eye. The human eye
is not equally sensitive to all wavelengths
of visible light. The concept of photometry
can be incorporated and utilised in CAD
systems in an effective way to make them
perform much better.
Sometimes, images with their limited
capacity may not capture the signature of
the disease. An expert doctor is well aware
of this fact. That is why, in order to confirm
the diagnosis, the doctor usually considers,
along with the radiological images, the case
history of the patient, symptoms and clinical
evidence, and on top of that applies the
background knowledge gained over
many years of practice. Hence, another
challenge to CAD lies in linking the data
recovered from medical images with a
semantic knowledge-base. An effective
method is needed which can combine
the information received from images
with the patient’s case history, clinical
evidence and background knowledge
effectively and automatically. Assigning the
right level of specificity and sensitivity for
such a tool may be a big challenge, and it
requires very careful research, design, and
development.
References
[1] Goldenberg R, Peled N, Computer-aided simple triage, Int J Comput Assist Radiol Surg, Sept. 2011, 6(5):705-11. doi:10.1007/s11548-011-0552-x. Epub 2011 Apr. 16.
[2] Riverain Technologies, retrieved from http://www.riveraintech.com/riverain-receivescfda-approval/ (accessed Oct. 27, 2014).
[3] CAD4TB - diagnostic imaging analysis group, retrieved from http://www.diagnijmegen.nl/index.php/CAD4TB (accessed Oct. 27, 2014).
[4] T Salach, I Steinberg, and I Gannot, Tumor Localization Using Magnetic Nano-Particles Induced Acoustic Signals, Tel Aviv University, Israel, & Johns Hopkins University, IEEE Trans Biomed Eng, 2014, Volume 61, Issue 8, Pages 2313-2323. doi: 10.1109/TBME.2013.2286638. Epub 2013 Oct. 21.
[5] Unobtrusive Sensing and Wearable Devices for Health Informatics, IEEE Trans. on Biomedical Eng., Vol 61, Issue 5, 2014, DOI: 10.1109/TBME.2014.2309951.
[6] Emerging Imaging Technologies in Medicine, edited by Mark A Anastasio, Patrick La Riviere, CRC Press, Dec. 06, 2012.
[7] Lighting flashcards | Quizlet, retrieved from http://quizlet.com/17556191/lighting-flashcards/ (accessed Oct. 27, 2014).
[8] Computer-aided diagnosis of rare genetic disorders from family snaps, retrieved from http://www.ox.ac.uk/news/2014-06-24computer-aided-diagnosis-rare-geneticdisorders-family-snaps (accessed Oct. 27, 2014).
About the Authors
Richa Sharma has 12 years of teaching and research experience. She holds the M.Tech. degree (I.T. BHU, Varanasi) and is engaged in
her doctoral program. Her areas of interests are digital image processing and medical image analysis. She is a Member of International
Association of Computer Science and Information Technology and Computer Society of India.
Dr. T R Gopalakrishnan Nair has 30 years of experience in professional field spread over industry, research and education. He holds
degrees M.Tech. (I.I.Sc., Bangalore) and Ph.D. in Computer Science. He is currently the Saudi Aramco Endowed Chair of Technology
and Information Management, at Prince Mohammad Bin Fahd University. He is a winner of PARAM award for technology innovations.
(www.trgnair.org)
Practitioner
Workbench
Bharti Trivedi
ICT Consultant, Adjunct Professor at M.S. University of Baroda
Programming.Tips() »
Geometric Transformations in ‘C’ using OpenGL Graphics API
OpenGL is a software interface to graphics hardware. It
is designed as a streamlined, hardware-independent interface
that can be implemented on many different hardware platforms. A
companion library, the OpenGL Utility Library (GLU), provides
higher-level modeling features such as quadric surfaces and
NURBS (non-uniform rational B-spline) curves and surfaces.
The interface consists of more than 300 distinct commands
to specify objects and operations to produce interactive 2D / 3D
applications. OpenGL is a state machine: you put it into
various states (or modes) that remain in effect until you change them.
Translation, rotation and scaling are the basic 2D geometric
transformations, whereas reflection and shearing are usually treated
as additional transformations that can be expressed in terms of them
(a reflection, for instance, is a scaling with a factor of -1).
Geometric transformations are needed as a
viewing aid, as a modeling tool and as an image manipulation tool.
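Before turning to OpenGL, it may help to see what these transformations do to a single point. The sketch below is plain C with no OpenGL calls; it represents each 2D transformation as a 3x3 homogeneous matrix and applies it to a point. The names Vec2, Mat3 and the mat3_* helpers are made up for this illustration and are not part of OpenGL, GLU or GLUT.

```c
/* 2D point and 3x3 homogeneous transformation matrix (row-major). */
typedef struct { double x, y; } Vec2;
typedef struct { double m[3][3]; } Mat3;

Mat3 mat3_identity(void)
{
    Mat3 r = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    return r;
}

/* Translation by (tx, ty). */
Mat3 mat3_translate(double tx, double ty)
{
    Mat3 r = mat3_identity();
    r.m[0][2] = tx;
    r.m[1][2] = ty;
    return r;
}

/* Scaling by (sx, sy); a factor of -1 gives a reflection. */
Mat3 mat3_scale(double sx, double sy)
{
    Mat3 r = mat3_identity();
    r.m[0][0] = sx;
    r.m[1][1] = sy;
    return r;
}

/* Shear along x: x' = x + k * y. */
Mat3 mat3_shear_x(double k)
{
    Mat3 r = mat3_identity();
    r.m[0][1] = k;
    return r;
}

/* Matrix product a * b: applying the result performs b first, then a. */
Mat3 mat3_mul(Mat3 a, Mat3 b)
{
    Mat3 r;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            r.m[i][j] = 0.0;
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
        }
    return r;
}

/* Transform point p, treated as the column vector (x, y, 1). */
Vec2 mat3_apply(Mat3 t, Vec2 p)
{
    Vec2 r;
    r.x = t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2];
    r.y = t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2];
    return r;
}
```

For example, mat3_apply(mat3_translate(15, 30), p) moves the point (15, 25) to (30, 55), and mat3_scale(1, -1) reflects (45, 55) about the x-axis to (45, -55). Composite transformations are obtained with mat3_mul(); the order of the factors matters.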
C program using the OpenGL (GLUT) library to perform
geometric transformations:
#include <GL/glut.h>   // GLUT header, also pulls in the GL and GLU headers (<GLUT/glut.h> on macOS)
#include <stdlib.h>
// Draws the test object: a yellow triangle with a black outline,
// a red and a blue square, and a small green triangle.
void object()
{
    glBegin(GL_TRIANGLES);          // filled yellow triangle
    glColor3f(1.0, 1.0, 0.0);
    glVertex2f(15, 25);
    glVertex2f(75, 25);
    glVertex2f(45, 55);
    glEnd();
    glBegin(GL_LINE_LOOP);          // black outline of the same triangle
    glColor3f(0.0, 0.0, 0.0);
    glVertex2f(15, 25);
    glVertex2f(75, 25);
    glVertex2f(45, 55);
    glEnd();
    glBegin(GL_POLYGON);            // red square
    glColor3f(1.0, 0.0, 0.0);
    glVertex2f(30, 30);
    glVertex2f(35, 30);
    glVertex2f(35, 35);
    glVertex2f(30, 35);
    glEnd();
    glBegin(GL_POLYGON);            // blue square
    glColor3f(0.0, 0.0, 1.0);
    glVertex2f(55, 30);
    glVertex2f(60, 30);
    glVertex2f(60, 35);
    glVertex2f(55, 35);
    glEnd();
    glBegin(GL_TRIANGLES);          // small green triangle
    glColor3f(0.0, 1.0, 0.0);
    glVertex2f(40, 40);
    glVertex2f(50, 40);
    glVertex2f(45, 45);
    glEnd();
}
// Draws the x and y axes as black lines of width 2.
void axis()
{
    glColor3f(0.0, 0.0, 0.0);
    glLineWidth(2);
    glBegin(GL_LINES);
    glVertex2f(-100, 0);            // x-axis
    glVertex2f(100, 0);
    glVertex2f(0, 100);             // y-axis
    glVertex2f(0, -100);
    glEnd();
    glLineWidth(1);
}
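// Each function saves the current matrix, applies its own
// transformation, draws the object and restores the matrix, so that
// one transformation does not carry over into the next.
void reflect_x()                    // reflection about the x-axis
{
    glPushMatrix();
    glScalef(1, -1, 1);
    object();
    glPopMatrix();
}
void reflect_y()                    // reflection about the y-axis
{
    glPushMatrix();
    glScalef(-1, 1, 1);
    object();
    glPopMatrix();
}
void reflect_xy()                   // reflection about the origin
{
    glPushMatrix();
    glScalef(-1, -1, 1);
    object();
    glPopMatrix();
}
void trans()                        // translation by (15, 30)
{
    glPushMatrix();
    glTranslated(15, 30, 0);
    object();
    glPopMatrix();
}
void rotate()                       // rotation by 30 degrees about the vertex (15, 25)
{
    glPushMatrix();
    glTranslatef(15, 25, 0);        // applied last: move the pivot back
    glRotatef(30, 0, 0, 1);         // rotate about the z-axis
    glTranslatef(-15, -25, 0);      // applied first: move the pivot to the origin
    object();
    glPopMatrix();
}
void scale()                        // uniform scaling by 0.5 about the origin
{
    glPushMatrix();
    glScalef(0.5, 0.5, 1);
    object();
    glPopMatrix();
}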
void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    glColor3f(0, 0, 0);
    axis();
    object();                       // original object
    trans();                        // Figure (a)
    rotate();                       // Figure (b)
    scale();                        // Figure (c)
    reflect_x();                    // Figure (d)
    reflect_y();                    // Figure (e)
    reflect_xy();                   // Figure (f)
    glFlush();
}
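void init(void)
{
    glClearColor(1.0, 1.0, 1.0, 0.0);   // white background
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluOrtho2D(-80, 90, -80, 90);       // visible region: left, right, bottom, top
    glMatrixMode(GL_MODELVIEW);         // later transformations act on the model-view matrix
    glLoadIdentity();
}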
int main(int argc, char **argv)
{
    glutInit(&argc, argv);              // initialise GLUT before creating the window
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);
    glutInitWindowSize(300, 300);
    glutInitWindowPosition(0, 0);
    glutCreateWindow("Transformations");
    init();
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
Figures (a) to (f) show the output screens.
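The rotate() function in the listing turns the object about its vertex (15, 25) rather than about the origin. OpenGL multiplies each new transformation onto the current matrix, so the call written last is the one applied to the vertices first: the object is moved so that the pivot sits at the origin, rotated, and moved back. The plain-C sketch below follows the same three steps for a single point. Point2 and rotate_about() are hypothetical names for this illustration, and the function takes the cosine and sine of the angle as arguments only to keep the sketch free of any maths-library dependency.

```c
typedef struct { double x, y; } Point2;

/* Rotate p about pivot by an angle given through its cosine and sine. */
Point2 rotate_about(Point2 p, Point2 pivot, double cos_t, double sin_t)
{
    /* Step 1: translate so that the pivot lies at the origin. */
    double dx = p.x - pivot.x;
    double dy = p.y - pivot.y;

    /* Step 2: rotate about the origin (counter-clockwise for positive angles). */
    double rx = dx * cos_t - dy * sin_t;
    double ry = dx * sin_t + dy * cos_t;

    /* Step 3: translate back to the pivot. */
    Point2 out = { rx + pivot.x, ry + pivot.y };
    return out;
}
```

With the pivot (15, 25) the pivot itself stays at (15, 25) for any angle, and for a 90 degree turn (cosine 0, sine 1) the vertex (75, 25) moves to (15, 85). For the 30 degrees used in the listing, the caller would pass cos(30°) and sin(30°).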
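[Figures (a) to (f): output screens for translation, rotation, scaling, reflection about the x-axis, reflection about the y-axis and reflection about the origin, respectively]
About the Author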
Dr. Bharti Trivedi, an Academician and Administrator, is a dynamic professional with two decades of experience and with expertise in
research, teaching, project management, corporate training and consultancy. She has done Masters and Ph.D in Computer Science.
She is a renowned faculty at M.S. University of Baroda, National Academy of Indian Railways, Indian Institute of Materials
Management. She is Director of Apex Technology. She is a recipient of national award for best Faculty at IIMM. She is a noted author
and speaker on the emerging applications of ICT and has guest lectured at various universities in India and abroad on wide range
of topics on emerging trends of IT. She also delivers the industrial courses to business executives and IT professionals globally (in
China, India and South Korea). She has presented scientific papers at various conferences at Dubai, Wrexham- London, Malaysia,
South Korea. She was member of editorial board of “International Journal of Green Computing” (IGI Global, PA, USA). She is in the
national website committee of IIMM, life member of CSI and ISC. She can be contacted at email [email protected]
Innovations
in India
Taruna Gupta and Jyothi Viswanathan
Corporate IPR Group, TCS
Collaborative Invention Mining - Make Your Ideas Patentable
Enterprises generate a lot of ideas
to bring out innovative solutions to meet
customer’s requirements or to solve their
own business problems. This leads to the
following scenarios:
• An inventor comes up with a new
  idea but he/she is not very sure
  how unique the idea is and how it
  measures up to other competitor
  products/patents that could be
  similar or overlapping.
• An inventor feels that his/her idea is
  commonplace and not patentable.
• An inventor feels that his/her idea is
  very different as compared to what
  already exists, but when the details
  of the case are studied, then the
  same does not stand up to the test of
  patentability.
To deal with such scenarios, TCS
came up with an innovative design for
a process and system, referred to as
Collaborative Invention Mining (CIM) to
help the inventors widen, lengthen and
deepen their idea to mature it iteratively
into a patentable invention.
The CIM system is a collaborative
platform for inventors and other
stakeholders (Moderator, Prior Art Analyst,
Claims Analyst, Technical Writer and
others) to discuss an idea and question,
deliberate, and segment the same as part
of the Storm-Form-Norm-Compose sub-processes. It includes various modules
such as idea detailing, search and analytics,
workflow management, idea management,
claim construction, metrics generation,
and display and visualisation to enable the
overall process.
An idea sharing matrix template is
used as the basis for real-time collaboration.
Stakeholders key in their inputs into the Idea
Detailing Tree matrix, which is governed by
a set of rules for each sub-process. Different
areas of the matrix are unlocked in a phased
manner after receiving and processing
inputs from stakeholders at various
sub-process levels. There is a defined entry
criteria, a set of tasks to be completed that
are validated in the collaborative platform
and then if the output meets the exit criteria,
the next area of the matrix is available to the
inventor community.
Fig. 1: TCS Collaborative Invention Mining Model
The maturing of the idea takes
place iteratively as it flows through the
following Storm-Form-Norm-Compose
sub-processes:
1. During the Storming sub-process,
   the idea is widened and categorized
   along the Category dimensions of
   Novelty, Inventive Step and Utility
   through collaborative questioning.
   More utility scenarios typically
   emerge at this stage for the idea.
2. The widened idea is further
   channeled into the Forming
   sub-process where it is lengthened across
   the Area dimensions of Process,
   Technology, Measurement and
   System through collaborative
   deliberation.
3. It is then deepened
   as part of the Norming sub-process
   across Characteristic
   attributes such as Efficiency,
   Adaptability, Agility and
   Anticipative attributes through
   collaborative segmentation.
4. Lastly, the claim
   elements are converged and
   analysed to result into patent
   claim statements as part of the
   Composing sub-process.
On basis of assigned weightages and
scoring at each step, a final invention score
is computed that indicates how strong
the idea is, to be patentable. The patent
claim statements are mapped into a tree-like
structure to visualise independent
and dependent claims, which are later
input to a resultant, corresponding patent
specification.
CIM is being used to determine
patentability of a number of ideas in TCS.
It is serving to impart confidence to new
and experienced inventors to objectively
assess patentability of their inventions
in a governed manner, provide newer
dimensions to enhance their ideas over
what is already known, and help establish
inventor communities within the enterprise.
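The article states that a final invention score is computed from weightages and scores assigned at each step, but does not give the formula. One plausible reading is a weighted average over the four sub-processes, sketched below in C. Everything in the sketch is an assumption made for illustration: the StageScore type, the invention_score() function, the weights and the score scale are hypothetical and are not taken from the TCS system.

```c
#include <stddef.h>

/* One sub-process (Storming, Forming, Norming or Composing) with a
   hypothetical weightage and the score it received. */
typedef struct {
    const char *name;
    double weight;
    double score;
} StageScore;

/* Weighted average of the stage scores; returns 0 when the total
   weight is zero (for example, when no stage has been scored yet). */
double invention_score(const StageScore *stages, size_t n)
{
    double total = 0.0;
    double weight_sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        total += stages[i].weight * stages[i].score;
        weight_sum += stages[i].weight;
    }
    return weight_sum > 0.0 ? total / weight_sum : 0.0;
}
```

With equal weights and made-up stage scores of 8, 6, 4 and 2, the result is 5. Dividing by the sum of the weights keeps the result on the same scale as the individual stage scores, whatever weights are chosen.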
References
[1] TCS Patent Published Application – EP2637130 A1, US 13/493,162, 608/MUM/2012, ‘Collaborative system and method to mine inventions’ - Santosh Kumar Mohanty, Shampa Sarkar, Jyothi Viswanathan
Fig. 2: TCS CIM – Storm-Form-Norm-Compose process
About the Authors
Taruna Gupta is a senior member of the Corporate IPR Group at TCS. In her current role, she is responsible for driving Copyright initiatives
across TCS, also to drive IP creation strategy and execution for several TCS units. This involves working with the various TCS units to promote,
protect and profit from TCS IP in the form of business aligned patent portfolios, IP led solutions, copyrights and trademarks. Prior to this role,
she led Presales for TCS Life Sciences & Healthcare, Energy & Resources, and for a large global banking customer relationship. In her earlier
roles, she headed the TCS Knowledge Management Practice, and has led Program Management for many large projects.
Jyothi Viswanathan is a member of TCS' Corporate IPR group. In her current role, she is responsible for IP Maintenance, Trademarks and
also helping business units identify and protect IP. She also drives other IP led initiatives which focus on promoting and profiting from IP.
She is also a registered Patent Agent with the Indian Patent Office.
Innovators interested in publishing about their work can send a brief write up in 150 words to Dr Anirban Basu, Chairman, CSI Div V, at [email protected].
Security Corner
Vishnu Kanhere
Convener SIG – Humane Computing (Former Chairman of CSI Mumbai Chapter)
Case Studies in IT Governance, IT Risk and Information Security »
Machine Translation – Quantum Leap or Flash in the Pan
Machine Translation (MT), simply put, is the use of software to translate text or speech from one natural language to
another. At its most basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually
cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the
target language is needed. To process any translation, human or automated, the meaning of a text in the original (source)
language must be fully restored in the target language, i.e. the translation. While on the surface this seems straightforward,
it is far more complex. Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading
to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.
MT software is customized by domain or profession (such as weather reports), improving output by limiting the scope
of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is
used. It follows that machine translation of government and legal documents more readily produces usable output than
conversation or less standardized text. Machine translation with no human involvement was pioneered in the 1950s and
has come a long way in the last 60 years.
Primarily, there are two types of automated or instant translation software, rule based and statistical. Rule-based
systems use a combination of language and grammar rules plus dictionaries for common words. Specialist dictionaries
are created to focus on certain industries or disciplines. Statistical systems have no knowledge of language rules. Instead
they “learn” to translate by analyzing large amounts of data for each language pair. They can be trained for specific
industries or disciplines using additional data relevant to the sector needed.
Machine Translation has made considerable advances in terms of accuracy, consistency and even fluency and flow
of language when translating from one natural language to another. However, when, where and how MT is to be used, and
which type of MT, need to be carefully considered beforehand, given the various aspects and issues involved.
While machine translation does have significant advantages like speed, the ability to process large volumes and low cost of
translation, there are some tough challenges in using MT, given the fact that no two languages are alike in terms of
structure, grammar, use of words and many other aspects.
Given this background the current Case in Information Systems is being presented. The facts of the case are based
on information available in media reports, online information and some real life incidents. Although every case may cover
multiple aspects it will have a predominant focus on some aspect which it aims to highlight.
A case study cannot and does not have one right answer. In fact, an answer given with enough understanding and
application of mind can seldom be wrong. The case gives a situation, often a problem, and seeks responses from the
reader. The approach is to study the case, develop the situation, fill in the facts and suggest a solution. Depending on
the approach and perspective the solutions will differ but they all lead to a likely feasible solution. Ideally, a case study
solution is left to the imagination of the reader, as the possibilities are immense. Readers’ inputs and solutions on the
case are invited and may be shared. A possible solution from the author’s personal viewpoint is also presented.
A Case Study of Kachwala Mistry & Partners
Sameer Kachwala is the Senior Partner of
Kachwala Mistry & Partners. Of late the
firm is facing a lot of difficulties in filing
plaints, registration papers and other
legal work on time. One of the causes of
concern is the inordinate delay in getting
the documents translated. The State of
Anarthapur and its neighboring State
Nirdhanabad, where the firm’s practice
is primarily concentrated, both have
nine major languages of which two are
used in the courts and are official state
languages for documents, land records
and legislation. The High Court of both
states also uses English as its official
language. The firm being an international
law firm with clients mainly in UK and US
uses English to transact its business.
The problem of delay had arisen
due to a series of events – first a strike
in the official translation department of
Nirdhanabad, followed by a language
agitation in Anarthapur, which was further
compounded by the fact that two of the
senior translators who were with the firm
for over 30 years had retired and their
services were no longer available. The firm
had tried different alternatives including
outsourcing work to external agencies but
somehow things were not working out.
Sameer and his partner Adi had just
returned from a business trip overseas,
where they had come across Machine
Translation software and were very
impressed. Determined to do something
about the situation both Sameer and Adi
have decided to introduce MT and call a
meeting of all partners and senior staff
including the IT manager.
Sameer is quick to list the advantages:
• MT is fast - When time is a crucial
  factor, machine translation can really
  help. You don’t have to spend hours
  poring over dictionaries to translate.
  Instead, the software provides quality
  output in no time.
• MT is economical - It is comparatively
  cheap. There is an initial investment,
  but in the long run it is a very small
  cost considering the return it
  provides. A professional translator
  charges on a per-page basis, which is
  extremely costly, while software has a
  one-time cost.
• MT can deal with multiple languages
  – In a state using nine languages the
  same software can simultaneously
  provide translation from / to any /
  all the languages – whereas for each
  combination a different professional
  translator may be required.
• MT can process large volumes of data
  – A heavy work load will result in
  work piling up, backlog and delays in
  manual translation, but not so when
  MT software is used.
Adi added that there could be
some issues about the quality and accuracy of
translation, but was
sure that the experienced staff would be able
to deal with them. “After all, we do review the
translations before submitting them,” he
had remarked.
Predictably Anil, Makarand and
Hemant, all in their thirties, had jumped at
the idea. The IT manager was elated too.
The resistant lot seemed to be Dwarkanath,
who belonged to the old guard, and
Yogendra, the staff representative.
Yogendra was outspoken and raised
the issue of the three vacancies in the
translation department which had not yet
been filled, and attributed the problems to
this. He was also apprehensive that with
the MT software, not only would these
vacancies remain unfilled, but the five staff
currently employed in the translation
department would also be asked to go.
Dwarkanath had more fundamental
issues. He stressed the ambiguity and gaps
in words in different languages. The word
“run” in English has more than fifty different
meanings and usages, and so do many
other words in all languages. This
newfangled software does not understand the
context and the fine nuances of language.
He illustrated the difference in
syntax between languages with the
output he obtained when the MT software
translated a sentence from English into
French and then back, along with a
Romanian rendering, all of which differ
in structure from the original –
• She ran into the room. (English)
• Elle entra dans la salle en courant. (French)
• She entered into the room in/while running. (English)
• Ea intra in camera alergând. (Romanian)
Vinod, the junior IT Assistant, chipped
in with how Google Translate had actually
helped:
Google Translate helps paramedics deliver
baby - February 10, 2015
The Swahili word for “thanks” might
be appropriate to Google Translate,
after it helped two paramedics deliver
a Congolese woman’s baby in Ireland
this week. Men’s lifestyle website
Joe.ie cited a report by The Corkman news
site that the incident occurred this week
between Macroom and Lissarda, when the
paramedics were bringing the woman to a
hospital to deliver her child.
“It’s something that I think I won’t
ever forget as I was translating Swahili
into English somewhere on the side of the
road between Macroom and Lissarda,”
paramedic Gerry McCann said. McCann
and Shane Mulcahy, were taking the
woman to the Cork University Maternity
Hospital when the baby was due. The
problem, however, was that the woman
spoke limited English. Thinking quickly,
McCann opened Google Translate on his
phone to communicate with the woman.
As a result, a “beautiful baby girl” was
born, the report said.
Sameer and Adi who had already
made up their mind grabbed this
opportunity and pushed through the
change of adopting MT software
unanimously at the meeting. However
realizing that there was indeed merit
in the issues raised in the meeting they
thought it wise to consult Radha who had
extensive experience in MT software and
its implementation, to guide them in this
exercise.
Radha has had a series of meetings, both
group and one-on-one, with the partners,
different departments and staff and has
now come up with an understanding of
the major issues. What would be your
thoughts if you were Radha?
Solution
The situation
The firm is currently facing several issues in
meeting deadlines, filing plaints, affidavits
and preparing legal documentation.
One of the main reasons is the backlog
in translations. The firm operates in an
environment using multiple (up to nine)
local languages as well as English. The
backlog has been aggravated due to strike
in the Translation department of the court
and other issues which have put pressure
on the translation staff of the firm. This
department is currently handicapped due
to the retirement of two senior translators
for whom no replacement has been found.
The situation is likely to worsen
in the long run with increasing costs
of translation, difficulty in outsourcing
translation of confidential and sensitive
documents and ever shortening deadlines
and increased expectations of speed and
quality of output – given the current level
of competition and globalization.
Machine Translation provides distinct
advantages like speed of processing, ability
to translate large volume of text / matter,
comparatively cheaper cost per page of
translation, ability to simultaneously deal
with multiple languages and above all
improved translation ability with the recent
advances in MT software. Adoption of MT
software seems to be the option of choice
as it does provide a viable alternative to
supplement and strengthen the present
process of manual translations.
However, as noted, there are many
limitations and shortcomings of MT
software. There are quality issues, and issues
of inability to translate effectively where
structural or linguistic differences exist
and where context provides the meaning.
The level of accuracy expected in a law
firm is of professional standard, and an MT
software translation may not be able to
meet or even come near to that quality.
The consequences
The issues and challenges, and even the
resistance within the organization, are being
aggravated not by the proposed adoption
of MT software but by the manner in which it
is proposed to be introduced and used. Unless
the issues and challenges are met and
overcome and the resistance within is
addressed satisfactorily, the firm will end
up with more problems than before after
implementation of MT software.
The Strategy
The right strategy for Kachwala Mistry
& Partners, the law firm, at this stage
would be:
Identify and address the issues,
challenges and resistance systematically:
Issues and Challenges –
1. MT software output is not consistent
   in quality.
2. MT software overlooks / cannot
   understand context, which has a
   significant impact on meaning.
3. MT software is unable to effectively
   deal with differences in language
   structure, differences in construction,
   words having multiple meanings
   and usages, idioms and phrases,
   structural bilingual ambiguity,
   lexical differences and a whole lot of
   linguistic issues.
4. Translation is not merely word
   replacement, and MT software cannot
   and does not take into account
   the customary usages and body of
   knowledge and conventions specific
   to particular languages.
Resistance from within –
1. Perceived curtailment of jobs and
   redundancy in the once strong
   translation department.
2. Inability of software to address
   language issues.
3. The possibility that translation
   software is not developed / available for
   several of the languages, and the fear,
   especially among the older employees, of
   being unable to cope with the new system.
Adoption of Hybrid Approach: The
right approach to be adopted in this
situation will be to adopt a human /
machine compromise – a hybrid approach.
Machine translation with what we call
pre- and post-editing is a methodology in
which a linguist “trains” or programs the
machine-translation engine to correctly
translate context-specific terminology,
phrases with double meanings and case-based client exceptions to rules where the
MT platform may have otherwise made
a mistake. The content is then processed
by the machine translation software and
then after translation, a professional
human translator reviews the output and
edits it for technical accuracy, style and
comprehensibility.
Providing MT software to supplement
and strengthen the Translation Department
and not as a replacement: Providing MT
software as a tool will remove staff
resistance bred on fears of replacement
/ redundancy. It will also address the
concerns of quality of translation as it
will be reviewed. Introducing the human
element thus removes most of the major
issues, challenges and resistance.
Creation of Decision Rule for
assigning jobs to MT software: Decision
rules should be framed on criteria covering
aspects like volume of work, nature of
the document to be translated and intended
use (internal / external). Certain jobs can then be
entirely assigned to MT software with
limited review. Others, which do not meet the
criteria or are difficult, will see limited use
of MT software with substantial human
intervention.
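Such a decision rule could be written down quite compactly. The C sketch below is only one possible formulation: the three criteria (volume, nature of document, intended use) come from the text above, while the document categories, the 50-page threshold and all the names are assumptions made purely for illustration. An actual rule would have to be agreed by the partners and the Translation Department.

```c
/* Hypothetical job attributes; the case names the criteria but not their values. */
typedef enum { DOC_ROUTINE, DOC_CONTRACT, DOC_COURT_FILING } DocNature;
typedef enum { USE_INTERNAL, USE_EXTERNAL } IntendedUse;
typedef enum { ROUTE_MT_LIMITED_REVIEW, ROUTE_HUMAN_LED } Route;

typedef struct {
    int pages;          /* volume of work          */
    DocNature nature;   /* nature of the document  */
    IntendedUse use;    /* internal or external    */
} TranslationJob;

/* Assumed threshold above which a job counts as bulk work. */
enum { BULK_PAGES = 50 };

Route assign_route(const TranslationJob *job)
{
    /* Anything that leaves the firm or goes to court is translated
       under substantial human control, with MT used only as an aid. */
    if (job->use == USE_EXTERNAL || job->nature == DOC_COURT_FILING)
        return ROUTE_HUMAN_LED;

    /* Routine internal documents: MT output with a limited review. */
    if (job->nature == DOC_ROUTINE)
        return ROUTE_MT_LIMITED_REVIEW;

    /* Other internal documents: rely on MT only for bulk work. */
    return job->pages >= BULK_PAGES ? ROUTE_MT_LIMITED_REVIEW
                                    : ROUTE_HUMAN_LED;
}
```

Under these assumed rules a 10-page routine internal note goes to MT with limited review, while a court filing of any length remains human-led.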
The way forward is to adopt the
hybrid approach to improve the working
of the Translation Department, with MT
software being acquired and introduced
not to replace it but to supplement and
strengthen its working and improve the
overall efficiency and effectiveness of
the firm. This way, the firm can balance
the need for the speed and cost benefits of
machine translation and address the
potential pitfalls.
An effective solution is generally
expected to proceed on these lines.
About the Author
Dr. Vishnu Kanhere is an expert in taxation, fraud examination, information systems security and system audit and has done his Ph.D. in
Software Valuation. He is a practicing Chartered Accountant, a qualified Cost Accountant and a Certified Fraud Examiner. He has over 30
years of experience in consulting, assurance and taxation for listed companies, leading players from industry and authorities, multinational and
private organizations. A renowned faculty at several management institutes, government academies and corporate training programs, he has
been a key speaker at national and international conferences and seminars on a wide range of topics and has several books and publications
to his credit. He has also contributed to the National Standards Development on Software Systems as a member of the Sectional Committee
LITD17 on Information Security and Biometrics of the Bureau of Indian Standards, GOI. He is former Chairman of CSI, Mumbai Chapter and has
been a member of Balanced Score Card focus group and CGEIT- QAT of ISACA, USA. He is currently Convener of SIG on Humane Computing
of CSI and Topic Leader – Cyber Crime of ISACA(USA). He can be contacted at email id [email protected]
Security Corner
Prashant Mali
Advocate, Cyber Law & Cyber Security Expert, Author, Speaker
[email protected]
IT Act 2000»
Electronic/Digital Evidence & Cyber Law- Part 2
[Earlier an article titled Electronic Evidence &
Cyber Law by the current author had appeared
in the CSIC September 2012 issue.]
This article is necessitated because,
since my earlier article, the position on
admissibility of electronic evidence in
Indian courts has changed: courts now
follow verbatim what the Indian Evidence
Act says. The surge in cyber crime and influx
of technology has made it necessary
to elevate the safeguard standards of
electronic evidence submitted in court.
The “standard of proof” in the form of
electronic evidence should be “more
accurate and stringent” compared to other
documentary evidence, tested with the
touchstone of relevance and admissibility
before it is admitted in court. This has
necessitated amendments in the Evidence
Act, 1872. The timely alterations and
amendments will make it an efficacious
tool of combat for cyber world challenges.
The recent judgment of the Hon’ble
Supreme Court in ANVAR P.V.
VERSUS P.K. BASHEER AND OTHERS,
CIVIL APPEAL NO. 4226 OF 2012,
decided on September 18, 2014, has put to rest
the controversies and contradictory
judgments on the admissibility of
electronic evidence. The Court, after
interpreting Sections 22A, 45A, 59,
65A & 65B of the Evidence Act, held that
an electronic record is not admissible as
evidence in a court of law without a certificate
under Section 65B(4) of the Evidence Act. It clarified
that no oral evidence or expert opinion under
Section 45A of the Evidence Act can be resorted
to in order to prove the veracity and genuineness of the
computer output. It may be submitted that
oral evidence or an expert opinion is not a
substantive piece of evidence and without
independent and reliable corroboration it
may have no value in the eyes of law. The
evidence may be judged but it needs to be
emphasized that to rule out the possibility
of any kind of tampering the standard of
proof has to be more stringent about its
authenticity and accuracy as compared to
other documentary evidence. Even prior
to trial, especially at the stage of ex-parte
injunctions in intellectual property matters,
the strength of the electronic records is
weighed by Courts extensively. In the
absence of any cogent evidence regarding
the source and the manner of acquisition
of computer output, the authenticity of the
computer output is questionable.
The significant judgment on
admissibility of electronic records is the
case of Anvar P.V. vs P.K. Basheer & Others,
in which the Court overruled
its previous ruling on admissibility
of secondary evidence in State vs Navjot
Sandhu (2005) 11 SCC 600. The Court
further held that provisions such as Section
45A of the Indian Evidence Act which
provide for the opinion of examiner of
electronic evidence can only be availed
once the provisions of Section 65B are
satisfied. Hence compliance with Section
65B is now mandatory for persons who
intend to rely upon emails, websites or
any electronic record in a civil or criminal
trial to which provisions of the Evidence
Act are applicable. To further elucidate
admissibility of electronic records let us take
a comparative view of both the judgments:
Rules of Admissibility as Per State vs
Navjot Sandhu
In State vs Navjot Sandhu
(the Parliament attack case), the
Respondent was convicted under various
provisions of the Indian Penal Code and the
Prevention of Terrorism Act, 2002. The call
records of the accused were the evidence
that subsequently formed the basis of
the conviction. In appeal
before the Supreme Court, the admissibility
of the call records as electronic evidence
was adjudicated. The Court held that, to
make the call records admissible, the printouts
obtained from the computers/servers
and certified by a responsible official of
the service providing Company can be led
into evidence through a witness who can
identify the signatures of the certifying
officer or speak facts based on his personal
knowledge. The Supreme Court stated that
irrespective of the compliance of Section
65B of the Evidence Act, there is no bar to
adducing secondary evidence under the
other provisions of the Evidence Act, namely
Sections 63 & 65. The Court held that
merely because a certificate containing the
details in Section 65B(4) is not filed in the
instant case, does not mean that secondary
evidence cannot be given even if the law
permits such evidence to be given in the
circumstances mentioned in the relevant
provisions, namely Sections 63 & 65.
New Rules of Admissibility as per Anvar
P.V. vs P.K. Basheer & Others
The Supreme Court in Anvar P.V. vs
P.K.Basheer & Others has overruled the
earlier judgment position in State vs Navjot
Sandhu. The Court has now held that any
documentary evidence in the form of an
electronic record can be proved only in
accordance with the procedure prescribed
under Section 65B of the Evidence Act.
The Court reasoned that Section 65B of
the Evidence Act inserted by way of an
amendment, is a special provision which
governs digital evidence and will override the
general provisions with respect to adducing
secondary evidence under the Evidence
Act. Section 65B mandates that every
electronic record will be admissible only if
it is supported by an affidavit of the party,
made by the person who has procured
access to the electronic record or who is in
control of the computer terminal (in case of
an email). Such a person may be called as a
witness at the stage of trial.
Conclusion
In my opinion, it can be fairly concluded
that Anvar’s case neatly binds up the law on
electronic evidence, and in doing so
the Hon. Supreme Court has created a
special law that overrides the general
law of documentary evidence on the
principle lex specialis derogat legi generali.
I suggest law enforcement agencies
and investigating officers need to be
updated on the authentication process
regarding the admissibility of electronic/
digital evidences. Currently in the case
Ratan Tata vs Union of India Writ Petition
(Civil) 398 of 2010, a compact disc (CD)
containing intercepted telephone calls
was introduced in the Supreme Court
without following procedure contained
in the Evidence Act. This calls for proper
training in effective handling and storage
of electronic evidence to ease the
hiccups that arise in trial procedures. I
appeal to complainants to obtain
a certificate under Section 65B(4)
along with any electronic/digital evidence,
such as printouts of snapshots from mobile
phones, that they submit as documentary
evidence in any matter.
Security Corner
Prashant Mali
Advocate, Cyber Law & Cyber Security Expert, Author, Speaker
[email protected]
IT Act 2000»
Photographing a Woman without her Consent - No Law in
India to Prosecute
I strongly feel that technological
advances facilitate our everyday life
but at the same time create the possibility
of privacy violations. The rampant use of
smartphones and the burst of technology have
also increased the burden
on the legal system to update its archaic laws.
There is an urgent need to amend our
laws. The law must seek to protect one
thing: the safety and well-being of women.
Let us take a broad overview of the laws
in India to understand this lacuna in our
legal system. There is an urgent need to
amend the Indian Penal Code by inserting
a new Section 509A after the existing
Section 509, which would prohibit “a person
from photographing a woman without her
consent”. It is also necessary to create an
all-inclusive definition of privacy as it stands
today, along with its growing relevance to
the cyber world.
The Information Technology Act,
2000 for instance, in its entirety does
not forbid “a person from photographing a
woman without her consent”.
To elucidate my point further, on a
careful reading, Section 66E
concerns itself only with punishment for
violation of privacy where photographs of
private parts are taken. Likewise, Section
67 deals with punishment for publishing
and transmitting obscene material in
electronic form. Section 67A deals with
pornography; Section 67B deals with
pornography concerning children.
Now let us review the relevant sections of the
Indian Penal Code, 1860.
Section 294 deals with obscene acts
and songs; Section 354 deals with assault
or criminal force to a woman with intent to
outrage her modesty; Section 509 deals with
word, gesture or act intended to insult the
modesty of a woman. There is a difference
between Sections 354 and 509. Section
509 specifically treats insult to
modesty as the principal ingredient of the
offence, as stated in Santha
Vs State of Kerala: the intention to insult the
modesty of a woman must be coupled with
the fact that the insult is caused, whereas
Section 354 deals with outraging the
modesty of a woman. A suspected stalker
has been arrested for clicking a woman on
his cell phone at the Netaji Bhawan Metro
station, setting the stage for a test case
dealing with privacy in public places in
the age of ubiquitous digital gadgets. The
accused, a 35-year-old mechanic living near
Park Street in Kolkata, has been charged
with “insulting the modesty of a woman, by
word, gesture or act” under Section 509 of
the Indian Penal Code.
But all these sections are silent on
the act of a man photographing a woman
without her consent.
Let us take the ruling on Machindra
Chate’s appeal for quashing an FIR filed
under Sec. 354 of the IPC. The Bombay High Court
said “Even if you keep your hand on the
shoulder of a woman, it is for the lady
to comment on the nature of the touch,
whether it was friendly, brotherly or
fatherly.” The Supreme Court offered some
clarity in a 2007 judgment about the term
“outrage the modesty”. A precise definition
of what constitutes a woman’s ‘modesty’ was
given by the Supreme Court as “The essence
of a woman’s modesty is her sex.” Further, the
bench said in a judgment, “The act of pulling
a woman, removing her saree, coupled with
a request for sexual intercourse, would be
an outrage to the modesty of a woman,
and knowledge that modesty is likely to
be outraged, is sufficient to constitute the
offence.”
The urgency to seek the insertion
of a new Section 509A in the Indian
Penal Code, 1860 is based on the fact that
the burst of technology has invaded our lives
in the form of ubiquitous mobile phones,
which means that photos are taken more
frequently and the images are used by some
for private sexual gratification. Ultimately,
then, this is a social malady.
A recent news report in “The Times Of
India” dated 10/02/2014 states that in
Mumbai a pub staffer was arrested for
filming women without their consent in
the toilet. In this case the Mumbai police
have invoked Section 354 (molestation)
and Section 66 of the Information Technology Act, 2000.
A senior IPS officer in Karnataka has
been booked by the Bangalore police for
allegedly clicking ‘obscene pictures’ of two
young women at a cafe-restaurant on the
Cunningham Road. Police have registered
a case of assault of a woman with intent
to outrage her modesty and criminal
intimidation. Police have also seized the
mobile phone used to click the images.
Now let us take a look at the position
in other countries. A female judge in
Washington DC dismissed charges against
a Virginian man, accused of voyeurism for
allegedly taking pictures of women’s skirts
at the Lincoln Memorial, saying that women
should have no expectation of privacy in a
public place. In another case a 40-year-old
man was arrested in Kawasaki City, Japan
for taking pictures of a young woman next
to him on the train. The photos in question
did not include any surreptitious
under-the-skirt shots. The law states that it doesn’t
matter what you are taking a picture of, if
the woman being photographed is made to
feel uncomfortable or starts feeling anxious,
you are liable to be arrested. Even so much
as pointing a camera in the victim’s direction
without taking a picture is grounds for arrest.
The point is with changing times
and technology, more harm can be done
with photos of a woman clicked without
consent and then uploaded on the internet
for viewing and gratifying sexual needs.
In recent times, intrusion of privacy goes
beyond the bedroom and has come out in
public spaces as well. Privacy is defined
explicitly in the following:
Case Law: R. Rajagopal vs. State of T.N.
(1994): Auto Shankar & Nakkeeran - Right
to privacy held to be implicit in Article 21.
“It is the right to be left alone”. This “right
to be left alone” includes right not to have
your personal data collected, published or
otherwise processed without your consent.
Conclusion & Suggestion: We need a law
to take the notion of privacy in a public place
seriously. The act of clicking photos of women
at will by any gadgets without their consent
is abuse of power by a man against women
who by and large are vulnerable in public
spaces. All it takes is one click to upload a
snap to the internet, and the snap might exist
on a server and circulate somewhere we are
totally oblivious to. It also is unlawful to view
and photograph people inside residences or
other places where privacy is expected, even
when the photographer is standing in public.
The breach of the social norms can result in
opprobrium, coercion, danger, and violence,
and as such should not be ignored. Therefore,
an amendment of the existing Section 509 by
the insertion of a Section 509A in the IPC that
clearly defines the act of taking photographs
of a woman without her consent as an offence,
with consent as the key ingredient, is much
needed. The nuisance and awkwardness
caused by the indiscriminate use of mobile
phone cameras to click photographs of
women in a reckless and irresponsible
manner, exploiting the vulnerability of this
section of society, will then be curtailed, and public
etiquette and social maturity will be infused
through law and order.
Brain Teaser
Debasish Jana
Editor, CSI Communications
Crossword »
On Being “Discontinued”
Solution to February 2015 crossword
We’ve been contributing the Crossword column under the Brain
Teaser section since April 2011. Over the months, we have
seen increased interest and enthusiasm among readers
of CSIC. Month after month, we have
been creating crossword puzzles on the theme topic,
checking responses from the solution providers and publishing
the names of all or nearly all correct solution providers. It’s been
a magnificent experience. Sorry for the disappointment.
We are overwhelmed by the responses and solutions
received from our enthusiastic readers.
Congratulations!
ALL correct answers to February 2015 month’s
crossword received from the following readers:
Er. Aruna Devi (Surabhi Softwares, Mysore) and
Surendra Khatri (Senior CSI Member, Retired From
Survey of India)
Happenings@ICT
H R Mohan
ICT Consultant, Former AVP (Systems), The Hindu & President, CSI
Email: hrmohan.csi@gmail.com
ICT News Briefs in February 2015
The following are the ICT news and
headlines of interest in February 2015. They
have been compiled from various news &
Internet sources including the dailies - The
Hindu, Business Line, and Economic Times.
Voices & Views
• India will soon be 2nd largest market for robot assisted surgery. At present, the US is the largest followed by Japan, Korea and India.
• The IT industry directly employs around three million people and indirectly about 10 million.
• India is the 4th largest base for young businesses in the world with over 3,100 tech start-ups. This is set to increase to 11,500 by 2020 and create about 2,50,000 jobs - Nasscom.
• While most Indian software providers operate at over 20% profit margin in large markets such as the US and Europe, their profit margins in India fall in the single digits due to late payments and delays.
• Indian SMB market will grow 15% per annum to propel IT spending in the sector to over $18.5 billion by 2018 - Nasscom.
• DBTL subscribers cross 10 crore. Has transferred Rs. 4,299 crore since November 15, 2014, through 11.33 crore transactions. Aadhaar-based DBT to cover all schemes from next fiscal.
• Currently, 65-70% of the $90 billion Indian ESDM market relies on imports. This to be reduced to 50% by 2016 with local manufacturing.
• The global telecom outsourcing market to hit $76 billion by 2016 - Analysts.
• Internet will influence & impact $35 billion worth FMCG sales in India over the next five years.
• Ten-fold rise in patent applications - about 1,500 patents in fiscal 2014 against 150 in 2009 – Nasscom.
• The IT industry grew from $100 million in 1992 to $146 billion last year.
• Industrial internet will usher in the next revolution – Experts
• In 2014, India saw the launch of 1,137 mobile handset models, around 19% more than the 957 models launched in 2013.
• Though India has 5.5 million 4G capable devices, only about 85,000 subscribers are active LTE users.
• In 2001, the number of Internet users in India was 6 million against 250 million now. The current Internet economy, 2.7% of the GDP is to increase to 4-5% by 2020.
• Digital India in the next 10 years will have a $550 billion to $1 trillion impact on the GDP. 350-550 million Indians to join mobile internet in four years – McKinsey.
• India’s smartphone market shrunk by 4% for the first time in the Q4 of 2014.
• In 2014, Delhi, Hyderabad, Chennai and Chandigarh constituted 45% of all the malware detection, with the rest of the country remaining 55%.
• Budget fails to address IT industry issues – Nasscom.
Govt, Policy, Telecom, Compliance
• The central government’s eBiz platform - a one-stop online shop for services to investors, will fully integrate the services of all Central ministries and departments by May 31.
• Out of 54 project proposals worth investments of $3 billion, 28 project proposals of $1 billion in electronics manufacturing have been approved.
• The Centre to be ready with the National Skill Development Policy within the next six months.
• Digital India: conclave soon on the use of geographic information systems.
• Prasad, IT minister wants to replicate ‘White Revolution’ in IT space.
• Web-based tool to track atrocities on dalits, adivasis.
• The creation of the fab ecosystem coupled with the products and systems value chain is expected to create 4.5 lakh jobs, making a potential future economic impact of $40 billion, over its project life span. - IESA.
• e-commerce needs a fair tax deal.
• Car-makers seek spectrum for hi-tech vehicles.
• TRAI cuts connect charges, lays ground for cheaper calls. Landline calls could get cheaper by about 20 paise a minute. STD calls to become cheaper after the reduction in carriage charge.
• Software exporters seek restoration of tax incentives for SEZs in Budget.
IT Manpower, Staffing & Top Moves
• Indian Angel Network (IAN) to offer feedback on social start-up ideas.
• www.infaparambrata.com, online platform for recruiting film talents, launched.
• IT Professionals Welfare Association (ITPWA) held a protest against lay-offs “through structural transformation” and “workforce optimisations”.
• Paytm aims to grow its employee strength to 5,000 by end of 2015 from around 2,000 now.
• Ajuba, which has around 3,500 employees in Chennai, plans to increase by 20-30% by the year-end.
• Trade unions want white paper on hiring; appraisals in IT industry.
• Mid-level IT engineers face re-skilling challenge.
• The Indian IT industry, during its growth towards a $100-billion sector, added 3 million people, but will add less than a million people for the next $100 billion in revenues.
• Foxconn’s Chennai plant, which suspended all operations from 24 December 2014, officially closed on 10th Feb 2015, affecting 1300 employees.
• Cyberabad techies to take buses for a day to cut carbon footprint. Majority of the three lakh employees use private vehicles to go to their respective work places.
• eBay lays off 350 in India.
• IT, BFSI sector to create maximum jobs in first half of 2015 - Naukri survey.
• Nasscom to offer IT courses such as Big Data analytics, cyber security and design engineering in tier-ii and tier-iii cities.
Company News: Tie-ups, Joint Ventures,
New Initiatives
• The concept of installing Wi-Fi hotspots at tea centres is being pushed by MUFT Internet as a part of a project to roll out internet browsing services at 22,000 tea vendors in Mumbai for a monthly investment of Rs. 1,500 (towards Wi-Fi equipment and support).
• Internet-for-all idea tops in the TiE contest for school kids.
• ESPNcricinfo, Google team up to provide real-time and relevant updates on the sport, anytime and anywhere on their mobile devices.
• Nasscom to open its third startup warehouse in Chhattisgarh after Bangalore and Kolkata as a part of its initiative to nurture 10,000 start-ups.
• Cisco to help develop Visakhapatnam as a smart city.
• Ahmedabad-based eInfochips, Toshiba to build modular phone Spiral-3 for Google; it may be priced around $50 onward.
• Alibaba arm buys 25% in Paytm parent for $700 m.
• Huawei India upbeat on opportunities, opens R&D facility in Bengaluru. It spends over 10% of its revenues on R&D.
• www.cabus.in set up to offer inter-city cabs at the fare of bus travel.
• Myntra, a part of Flipkart, plans to transform itself into a ‘mobile only’ company.
• TCS Fit4Life concept, which marries wellness, team spirit and social cause into one, engages 900 students across 9 cities in its inaugural run.
• RCom, Facebook (Internet.org) to join hands for taking Internet to the masses. Offers free access to 33 specific websites, including jobs, weather and news sites.
• ICICI Bank rolls out e-wallet Pockets that will allow transactions through a mobile phone with or without a bank account.
• Nasscom ties up with Entrepreneurship Café to help pick your Start-up Soulmate.
• ACT offers ‘fastest Internet package’ with 100 mbps speed at Rs. 2,799 a month.
• To bond with start-ups, TCS partners with Startupbootcamp FinTech, a tech accelerator.
• e-logistics matters in agri-biz.
• Jet Airways to test use of mobile boarding passes.
• Tech Mahindra turns focus on women’s safety, unveils ‘Fightback Plus’.
• Online real estate platform CommonFloor.com launches “the world’s first virtual reality innovation in real estate for the masses” for virtual walkthroughs.
• Epson to offer managed services for corporates.
• ECIL to roll out “Tek Robot”, a robotics programme in 500 schools in Tamil Nadu through Tek Wizard.
Technical Campus
Explore to Invent
IC3T 2015
International Conference on Computer and Communications
Hosted by: CMR Technical Campus
In Association with CSI Hyderabad Chapter, Division 5, Education and Research, CSI – India
Dates: 24th-26th July 2015, Venue: CMR Technical Campus, Hyderabad
www.cmrtc.ac.in/ic3t2015/
Call for Participation/Paper Presentation
Following the big success of the first IC3T in March 2014, the 2nd International
Conference on Computer & Communication Technologies - IC3T 2015 will be organized in association with Division V, Education
& Research, CSI India, jointly by the Departments of CSE & ECE of CMR Technical Campus during
24th – 26th July 2015.
In this regard, all prospective authors are invited to submit their original research articles related to the
themes of various special sessions and subjects related to Computers and Communication Technologies. IC3T
acts as a major forum for the presentation of innovative ideas, approaches, developments and research projects,
and will also be a platform for the exchange of information between researchers and industry professionals.
Paper Submission and Proceedings: Submitted articles should be neither previously published nor under
consideration for publication elsewhere. All papers will be refereed through a peer review process. Proceedings
will be published by a prestigious international publisher & available online. Prospective authors are invited to
submit paper(s) not exceeding 8 pages written in A4 size.
Submit your papers in below link:
www.easychair.org/conferences/?conf=ic3t2015
Scopes: IC3T 2015 will provide a platform for researchers and practitioners to interact with one another and discuss
the state-of-the-art developments in the field. The topics of the conference will cover all aspects of research and
applications in Intelligent Computing, including but not limited to:
• Big Data
• Machine Learning
• Cellular Automata
• Membrane Computing
• Cellular Computing
• Mobile Computing
• Compressed Sensing
• Molecular Computing
• Computational Nanotechnology
• Multi-Agent Systems
• Data Intensive Computer Architecture
• Data Mining and Knowledge Discovery
• Embedded Systems
• Pattern Recognition
• Evolutionary Computing
• Resource Management and Scheduling
• Fuzzy Computing
• Sensor Networks and Social Sensing
• Green Computing
• Smart Environments and Applications
• Human-Computer Interaction
• Swarm Intelligence & Swarm Robotics
• Intelligent Control
• Ubiquitous Intelligence and Computing
• Intelligent E-Learning Systems
• Web Intelligence and Computing
• Intelligent Video & Image Processing
• Wireless Networks
• Internet Security
• Wireless Protocols and Architectures
• Knowledge Management and Networks
• Optical Computing
• Neural Computing
Special Sessions on
Software Engineering and Applications,
Cyber Security and Digital Forensics,
Applications for Fuzzy Systems in Engineering
Important Dates:
Submission of Full Manuscript: 10th March 2015
Last date of Notification: 15th March 2015
Notification of Acceptance: Acceptance will be sent as soon as the reviews are completed.
Camera Ready Paper: 20th March 2015
Early Registration Starts: 20th March 2015
Conveners:
Dr. K Srujan Raju, HOD-CSE
Prof. G Srikanth, HOD-ECE
Ph: +91-9246874862, [email protected]
Ph: +91-9248727226, [email protected]
Springer Editorial Member: Dr. S Chandra Satapathy
Application for Editors of CSI Communications
Computer Society of India invites applications from professionals who are Life Members of CSI for appointment
as Editors of the CSI Communications for an initial period of one year extendable by another year. Editors can
be from Academic, R&D or from the Industry with excellent R&D credentials and preferably with experience in
editing scientific and/or technical journals.
CSI Communications (CSIC) is the most important mouthpiece of the Computer Society of India. It is published every month with a variety
of articles on technology and also contains reports of CSI activities in different places. The Editors should be ready to
devote time and work with the editorial staff to make the CSIC an excellent and attractive magazine for the members. The hard/soft
copies of CSIC are distributed to all members across the globe.
Interested Members may send their application by March 25, 2015 with resume, details of relevant experience, list of publications,
references etc. to
Hony. Secretary,
Computer Society of India
Email: [email protected]
EXECCOM TRANSACT
Report by Mr. Sanjay Mohapatra, Hon Secretary, CSI
The third meeting of the CSI Executive Committee for the year 2014-15 was held on December 12, 2014 at Hyderabad. I take pleasure
in sharing some of the discussions and decisions taken during this meeting.
• CSI Nomination Committee for the year 2014-15 confirmed that the CSI election process started with the publishing of the schedule of CSI ExecCom as well as Chapter Elections in the October 2014 issue of CSIC. NC chair also confirmed that amendments to the CSI Constitution & Byelaws as well as Chapter Byelaws will be included in the e-ballot along with voting for other vacant posts.
• Chairman, NC once again appealed to RVPs to impress upon chapter NCs to complete the chapter election process at the earliest, preferably before January 2015. He also informed ExecCom that Bangalore and Kolkata chapters will be realigning voting for Treasurer and MC positions with ExecCom elections.
• ExecCom approved the list of Lifetime Achievement, Hon Fellowship and Fellowship Award winners presented by the Awards Committee 2014-15 for consideration of the National Council.
• RVP I presented details of the CSI 2015 Convention at Delhi with the names of Organising Committee, Programme Committee and Finance Committee Chairs and preparations with respect to venue finalization and accommodation facilities. RVP I mentioned that although the CSI 2015 convention will be hosted by CSI Delhi Chapter, other chapters in NCR, viz. Ghaziabad, Noida and Gurgaon, will be actively supporting this Convention.
• ExecCom RESOLVED and approved the Annual Report and Audited Accounts for 2013-14 submitted by the Hon Treasurer.
• ExecCom also confirmed the reappointment of M/s. Pruthviraj C Shah as National Auditor, and M/s. Dutta Ghosh & Associates and M/s. N Sivaprasad Associates as Regional Auditors. ExecCom also considered a revision in audit fee charges depending on the category of chapter.
• ExecCom RESOLVED that Chapters that have submitted audited accounts or bank statements (as applicable), chapter election results, opening of new bank account at SBI MIDC Mumbai under unified banking system, closure of old bank accounts with linking of FDs to chapter’s new bank account or revived with opening of new bank account till November 30th 2014 will be considered as CSI Chapters. List of chapters declared inoperative for non-compliance with these statutory norms will be published in the January issue of CSIC.
• ExecCom reviewed the progress on Excellence in IT and YITP Awards.
• ExecCom approved the extension of period up to March 31st 2015 for the discount scheme on Life membership of CSI.
CSI Reports
From CSI SIG and Divisions »
Please check detailed news at:
http://www.csi-india.org/web/guest/csic-reports
SPEAKER(S)
TOPIC AND GIST
Annual student convention of CSI (Odisha) 2015
Dr. AK Nayak, Dr. RN Behera, Dr. RN Satpathy, Dr. PK Subudhi, Manas Ranjan Pattanaik, Rashmiranjan Sutar, Subhas Sahoo &
Dr. PK Subudhi. Judges - Prof. Satya Ranjan Mohapatra, Prof. Jagannath Ray & Prof. Sanket Mishra.
3-4 January 2015: Annual Student Convention of CSI theme “Augmentation of ICT in rural voyage”
First day was earmarked with events ranging from ICT Grilling and Paper and
Poster Presentation. ICT Grilling had participations from GITAM College,
ABIT and GIFT students. Winners were Arup Bid and Kumar Ujjwal of 6th
sem CSE. Second day was marked by events like Code Debugging, Droid
Android and Round Table which had response from various colleges like HIT,
GITAM, KIST including students from GIFT. Winners were Arup Bid (6th
Sem, CSE, GIFT), Ankit Sharma (6th Sem, CSE, GIFT) & Deepak Sahu (HIT,
Bhubaneswar).
Guests on stage
Division-III (Applications) of CSI, Patna University and CSI Patna Chapter
Dr. Ranjeet Kumar Verma, AK Nayak, Arun Kumar Sinha, Amrendra
Mishra, SK Srivastava, KP Singh, Shams Raza, Purnendu Narayan &
RS Mishra
7 February 2015: Seminar on the theme “Cloud Computing: A Paradigm Shift in ICT”
Seminar was inaugurated by Prof. Verma along with Prof. Nayak and
Prof. Sinha as Guest of Honour. Dr. Verma highlighted benefits of Cloud
Computing and pleaded for its wide application for dissemination of
knowledge. Prof. Sinha said cloud computing is beneficial not only
for education but also for Banking, Agricultural, Health and Science.
Prof. Nayak pointed out that ICT is changing rapidly by adopting faster,
effective and latest computing technologies which is contributing
significantly for inclusive growth of society. Prof. Mishra discussed about
growing popularity of cloud computing.
Guests and dignitaries on stage
K L University, Koneru, CSI Education Directorate and CSI Koneru-Chapter
Dr. K Gopi Krishna, P Thrimurthy, Dr. LSS Reddy, Dr. A Anand
Kumar, Dr. K Thirupathi Rao, AV Praveen Krishna, Koneru
Satyanaryana, Chandrashekhar Sahasrabudhe, Dr. Nilesh K Modi,
Uma Devi B, Saurabh Agrawal, Shailaja Sardessai and Sougouna, S
Ramasamy, Mini Ulanat, M Gnanasekaran & Shirini.
21 February 2015: 5th National Level Competition of the “CSI Discover
Thinking Funquiz-2015”
Winning teams from each of the states representing TN (Hosur), Maharashtra
(Pune), Puducherry, Gujarat (Ahmedabad), Kerala (Cochin), Goa and AP, participated
at the National Level Competition. Winners are 1) Firdous Fatma & Y. Chinmayee,
Sri Prakash Vidya Niketan, Paykaroapeta, AP (I Prize-Rs.15000/- + Trophy +
Certificate) 2) Adithyan Unni & Athul Unnikrishnan, Bhavans Vidya Mandir,
Ernakulam, Cochin, Kerala (II Prize-Rs.10000/- + Trophy + Certificate) 3) Aadi Bhure
& Chinmay Mandke, New India School, Pune, Maharashtra (III Prize- Rs.5000/- +
Trophy +Certificate). Dr. Gopi Krishna & Prof. Thrimurthy distributed Cash Prizes,
Trophies & Certificates.
National Level Final Winners at K L University
CSI News
From CSI Chapters »
Please check detailed news at:
http://www.csi-india.org/web/guest/csic-chapters-sbs-news
SPEAKER(S)
TOPIC AND GIST
DELHI (REGION I)
SD Sharma, VK Gupta, Dr. VB Aggarwal and Dr. AK Bansal
8 February 2015: Golden Jubilee celebrations technical talk on “ROBOTICS
and Trends in Info Tech Education”
Mr. Sharma explained that robotics technology is taking the lead in all fields,
especially in manufacturing. He also stressed the need for awareness of new
trends in technical education. Dr. Aggarwal covered robotics,
touching on Unmanned Aerial Vehicles (UAVs) being used in defence. He
presented the latest trends and information on technical education in today’s Indian
scenario.
Dr. AK Bansal proposing vote of thanks and gratitude to one and all
CHANDIGARH (REGION I)
Mr. Subhash Chander Jain
14 November 2014: Student Symposium on “Information and
Communication Trends in Technology”
The symposium aimed to give students from varied fields an opportunity
to showcase their innovative ideas and research. Themes of the symposium
included cloud computing, software engineering, grid computing, green
computing, data mining, big data and image processing. There was an expert
talk by Mr. Subhash Chander Jain on “Optical Fiber – An important Tool for
Communication”. Students gave poster presentations on research topics,
which were evaluated by judges. Cash prizes worth Rs.20000/- were awarded to
the 3 winners and 2 consolation prize recipients.
Participants and organizers of the symposium
HARIDWAR (REGION I)
Dr. MS Aswal, Dr. Mayank Aggrawal, Prof. RD Kaushik,
Chirag Goel, Spandan Kumar and Nishant Kumar
20 February 2015: National Level Students workshop on “Web Initials
WEB-2015”
The workshop was specifically designed for students to learn and gain hands-on
experience with Web languages, from the basics (HTML,
PHP, CSS, jQuery, JavaScript) to industry-level web development (MVC
introduction). The inaugural session was conducted by Dr. Aswal and Dr. Aggrawal. More
than 120 students participated. Prof. Kaushik distributed certificates.
Distribution of certificates to Students
LUCKNOW (REGION I)
Chief Guest Prof. RK Khandal, Ajay Singh, Dr. Ashok
Chandra
7 December 2014: National conference on “Digital India: eMpowering
e Governance”
Prof. Khandal set the tone and called for free exchange of information &
resources among stakeholders. Mr. Singh asserted that the path to India’s
digital dream goes through UP. Dr. Ashok Chandra stressed using India’s
digital resources efficiently. Sessions focused on enablers and challenges
in rural outreach, infrastructure, security and communication services, and
collaborative digital platforms (Social Media, IoT, Smart City). Topics covered
included Digital Mandi, Smart Grid and improvements in rural governance through
IT. Selected papers presented focused on eGovernance,
eRegistration, cyber security, digital media management and social service
using e-services. Panel discussion participants were Dr. Harsharan Das, CV
Singh, Ashesh Agrawal, PS Ganpathy, Dr. Upendra Kumar and Ajay Singh.
Honoring the Guest
KOLKATA (REGION II)
Devadatta Sinha, Dr. Swapan Purkait, Arindam Gupta,
Dr. Ambar Dutta, Devaprasanna Sinha, Subir Lahiri,
Dr. RT Goswami, Prof. Subrata Basak, Subimal Kundu &
JK Mandal
7 February 2015: Sixteenth Eastern Regional Young IT Professional Awards
(YITPA) Contest
Dr. Goswami stated the rationale behind the YITPA contest. Prof. Basak mentioned the
importance of interaction between industry and academia. The four presentations
were: Heartsense: Estimating Physiological Vitals on Smart Phone Using
Photoplethysmography by Anirban Dutta Choudhury and Rohan Banerjee, 3D
Reconstruction Using Smart Phone Sensors by Brojeshwar Bhowmick and Apurbaa
Mallik, Speech-Based Access for Agricultural Commodity Prices in Six Indian
Languages by Milton Samirakshma Bepari, Joyanta Basu, Rajib Roy and Soma Khan
& Development of A Dynamic GIS Model for Preparing EIA and EMP for Open Cast
Coal Mining by Mousumi Kundu, Subhajit Das & Sauvik Sarkar. Recipients of prizes
were: 1st Anirban Dutta Choudhury & Rohan Banerjee, 2nd Milton Samirakshma
Bepari, Joyanta Basu, Rajib Roy and Soma Khan, Special Mention: B. Bhowmick and
Apurbaa Mallik. Awards were presented by Subimal Kundu.
Group photograph of organizers, judges and participants
AHMEDABAD (REGION III)
Bipin Mehta, Anilbhai Patel, Dr. Mahendra Sharma,
Dr. Bhimaraya Metri, Dr. Amit Patel, Rajen Purohit,
Dr. Rajneesh Das, Dr. Nilesh Modi, Dr. Neeta Shah,
Dr. Panduranga Vithal M, Dr. Ravichandran, Dr. HJ Jani,
Ruchit Shurti, Vibha Desai, Sanjay Gaden, Dr. Nityesh
Bhatt, Dr. Shubrat Sahu & Dr. Jayesh Agaja
6-7 February 2015: 7th International Conference on “Emerging Management
Perspectives, Practices, & Research Trends and First Doctoral Colloquium”
Dr. Metri delivered keynote speech followed by various plenary sessions by
Dr. Neeta Shah, Dr. Panduranga Vithal M. and Dr. Ravichandran. Dr. Jani
delivered session on Business Analytics, Mr. Shurti discussed case on Mining
Industry and Perishable food supply chain, Dr. Modi delivered session on Social
Media and Cyber Security, Mrs. Desai delivered an expert session, Mr. Sanjay
Gaden spoke on Mission simplifying E-Governance & Dr. Bhatt delivered session
on Contemporary ICT enabled operation Practices. A Doctoral Colloquium
Workshop on Advanced Data Analysis Techniques was also organized, and 20
PhD scholars attended it. Dr. Sahu & Dr. Agaja delivered sessions on Exploratory
Factor Analysis and Confirmatory Factor Analysis.
Guests and dignitaries on stage
Bharat Patel, Dr. Bhushan Trivedi, Dr. Harsh Bhatt, Jigar
Raval & Prerna Agrawal
10 February 2015: 5th National Discover Thinking Quiz for Young Learners
for Region - III
Around 21 teams (42 participants) from different schools took part.
The preliminary first round was an objective test. Students were given 15 minutes
for 25 questions. At the end of the round, six teams qualified for the final round.
A session on Information Security for You was also arranged. Dr. Bhatt and
Mr. Raval gave a short session on Cyber Crime and Cyber Security. The final round
was divided into three sub-rounds: Straight Round, Passing Round and Rapid
Fire (Buzzer) Round. The quiz was judged by Ms. Prerna Agrawal. Winners
Patel Sunil Jigneshbhai & Goswami Parth Kirankumar were honored with a
Certificate of Achievement, Memento and Cash Prize.
Participant students, organizers and judges
GWALIOR (REGION III)
Chief guest Dr. Veer Singh and Shailendra Satyarthi
17 January 2015: Computer Quiz CQ-2015
The purpose of the quiz was to share knowledge and create a competitive environment
among school children on a single platform. Scindia School Fort Gwalior got
the First position, Gwalior Glory Second position and Scindia School Fort
got third position. Finally, the team from Scindia School Fort Gwalior won the rolling
shield. Prizes and certificates were distributed to the winning teams and
consolation prizes to all others.
From Left: Ace Technology Official Dr VK Rao, Shailendra Satyarthi, Dr Shashi
Vikasit in Valedictory program
MYLAVARAM (REGION V)
HR Mohan, Dr. P Trimurthy, Dr. Raju LK, Y Kathiresan,
AV Praveen Krishna, K Timma Reddy, Dr. EV Prasad &
Dr. NRM Reddy
28 January 2015: Inauguration of CSI Mylavaram Chapter
The Mylavaram, Krishna District, Andhra Pradesh chapter was inaugurated
by Dr. H R Mohan, President of CSI by lighting the lamp. Later all the other
dignitaries joined him for the same.
Guests on stage for Chapter Inauguration
Dr. EV Prasad and GNV Raja Reddy
13-14 February 2015: Workshop on “ALICE 3 (3D Programming in Java)”
The workshop was inaugurated by Dr. Prasad. GNV Raja Reddy, Instructor, Oracle
Academy, was the resource person. Hands-on experience was provided to participants.
Alice is an innovative 3D programming environment that makes it easy to create
an animation for telling a story, playing an interactive game or making a video to share on
the web. Alice is a freely available teaching tool designed to be a student's first
exposure to object-oriented programming. It allows students to learn fundamental
programming concepts in the context of creating animated movies and simple
video games. In Alice, 3-D objects (e.g. people, animals & vehicles) populate a
virtual world. Students designed their own animations, stories and quizzes.
Alice3 Workshop Inauguration
COCHIN (REGION VII)
Charles Andrews, KB Rajasekharan and Soman SP
11 January 2015: CSI Discover Thinking Quiz 2015
The competition was meant for middle school students from class 6 to class 9.
105 teams from various schools in central Kerala participated. A
preliminary written test was held to select 6 teams for the
final. Behind the top 5 teams, 3 teams tied on the same marks, so a
tie-breaker test was used to settle the 6 finalists. The winners, who go on to the
Nationals, were from Bhavans Vidya Mandir, Elamakara (Adithyan Unni - IX,
Athul Unnikrishnan - IX). Cash prizes and trophies were distributed by Mr
Rajasekharan and Mr Soman.
Participants and organizers team
TRIVANDRUM (REGION VII)
Vishnukumar S, G Neelakantan and Rajesh P
9 January 2015: Talk on “Relevance of Professional Bodies in Engineering
Colleges”
Mr. Vishnukumar delivered talk on ‘Relevance of Professional Bodies in
Engineering Colleges’. Mr. Neelakantan delivered talk on ‘Industry Relevant
Academic Projects’ and Mr. Rajesh delivered lecture on ‘Mean Stack Based
Web Application Development’.
Mr. G Neelakantan taking the sessions
Anshad Ameenza
14 February 2015: Workshop on “Software Defined Networking”
Content topics of workshop were - 1) Current Stage of Networking Industry
2) What is Software Defined ‘Everything’? 3) What is Software Defined
Networking (SDN)? 4) Introduction to Network Virtualization (NV) 5) SDN
Architecture and Components 6) SDN Development Models 7) Controller
Based Networking 8) Use Cases – Future and Present 9) SDN Adoption
Approach and 10) Demo of Few Well Known Solutions.
Anshad Ameenza during sessions
VELLORE (REGION VII)
14 February 2015: Workshop on “Open Source Technologies”
V Gurunatha Prasad
Mr. Gurunatha Prasad from Avinash Infotech covered
open source tools such as Firefox, Linux, Apache, Wireshark & VirtualBox, and
creating web applications using Apache. Around 65 participants attended
the workshop.
Participants attending the workshop
From Student Branches »
(REGION - III)
AESICS, AHMEDABAD
24-01-2015 - Regional Student Convention (Region-III) held at AESICS
CSI Student Branch, Ahmedabad
(REGION - IV)
HI-TECH INSTITUTE OF TECHNOLOGY, KHORDHA
19-01-2015 - Mr Panchanan Das, Dr. Bhagirathi Behera & Prof. (Dr) R N
Satpathy during the lecture on “Self Employment”
(REGION-V)
MVJ COLLEGE OF ENGINEERING, BANGALORE
14-02-2015 - Guest Lecture on “Technology Trends: Impacts on Business
and Consumer”
(REGION-V)
CMR TECHNICAL CAMPUS, HYDERABAD
05-02-2015 - Resource Persons Addressing during the Workshop on
“Oracle (OCA)”
(REGION-V)
RAVINDRA COLLEGE OF ENGINEERING, KURNOOL
13-02-2015 – Mr. Y Kathiresan, Senior Manager, Education Directorate
& Mr. Raju L Kanchibhotla, RVP-V, during “CSI Student Branch
inauguration”
(REGION-VII)
SKR ENGINEERING COLLEGE, CHENNAI
31-01-2015 - Dr. M. Senthilkumar, Mr. Y Kathiresan, Dr. R Suguna,
Mr. S M Nandhakumar & Mr. S Suresh during Seminar on “Your Unique
Identity and CSI”
(REGION-VII)
ER. PERUMAL MANIMEKALAI COLLEGE OF ENGINEERING, HOSUR
20-12-2014: Principal Dr. Chithra, SBC Ms. V Keerthika, with the resource
persons Mr. P Aravind at workshop on “Massive Open edX online
Learning platform”. Student Volunteer Harish welcoming all.
(REGION-VII)
EINSTEIN COLLEGE OF ENGINEERING, TIRUNELVELI
20-01-2015: Technical Quiz (C language) Dr. R Velayutham, Dr. K Ramar,
Prof. A Amudhavanan & Prof. M Suresh Thangakrishnan with Prize
winner Ms. Shunmugavalli
(REGION-VII)
K S R INSTITUTE FOR ENGINEERING AND TECHNOLOGY, TIRUCHENGODE
23-08-2014: Mr. S Gobidoss, Chief Educational Officer, Salem during the
Workshop to give basic ideas on usage of computer software, hardware
and internet to head masters and head mistress from various schools
(REGION-VII)
SARANATHAN COLLEGE OF ENGINEERING, TRICHY
10-01-2015: “AppDhoom” – one day workshop on Mobile Application
Development by Mr. Prithivi – Target soft Systems, Chennai
(REGION-VII)
JAMAL MOHAMED COLLEGE (AUTONOMOUS), TIRUCHIRAPPALLI
16-12-2014: In Inter-Collegiate Technical Symposium, SWAP 2K14, Students
of Cauvery College for Women receiving the overall championship award
from Dr. A K Khaja Nazeemudheen, Secretary and Correspondent
(REGION-VII)
SRI RAMAKRISHNA ENGINEERING COLLEGE, COIMBATORE
19-12-2014: Tamilnadu State Student Convention inauguration by
Mr. Satheesh Kanagasabapathy, CTS
(REGION-VII)
VELAMMAL ENGINEERING COLLEGE, CHENNAI
04-02-2015 - Mr. Raja Venkatesh during the “Quiz Competition” for II
year CSE students
(REGION-VII)
EINSTEIN COLLEGE OF ENGINEERING, TIRUNELVELI
06-02-2015 – Mr. Prithviraj during the Training programme on “Mobile
application Development – appdhoom”
(REGION-VII)
K. S. RANGASAMY COLLEGE OF TECHNOLOGY, THIRUCHENGODE
09-01-2015: RVP Mr. Soman inaugurating the Regional Student
Convention while Prof. S. Balu, SBC-CSI, Principal Dr. K Thyagarajah, Dr. B
G Geetha, HOD, CSE and Dr. R. Sasikala, HOD, IT are on the dais
Please send your student branch
news to Education Director at
[email protected]. News sent
to any other email id will not be
considered. Please send only 1 photo
per event, not more.
FORM IV
(Rule No. 8)
Statement about ownership and other particulars of the ‘CSI Communications’
1. Place of Publication: Computer Society of India, Unit No. 3, 4th Floor, Samruddhi Venture Park, Marol MIDC Area, Andheri (E). Mumbai 400 093.
2. Periodicity of its Publication: Monthly
3. Printer's Name: Mr. Suchit Gogwekar
Nationality: Indian
Address: Computer Society of India, Unit No. 3, 4th Floor, Samruddhi Venture Park, Marol MIDC Area, Andheri (E). Mumbai 400 093.
4. Publisher's Name: Mr. Suchit Gogwekar
Nationality: Indian
Address: Computer Society of India, Unit No. 3, 4th Floor, Samruddhi Venture Park, Marol MIDC Area, Andheri (E). Mumbai 400 093.
5. Editor’s Name: Dr. R M Sonar
Nationality: Indian
Address: Computer Society of India, Unit No. 3, 4th Floor, Samruddhi Venture Park, Marol MIDC Area, Andheri (E). Mumbai 400 093.
6. Names and Address of Individuals who own the newspaper and partners or shareholders holding more than one percent of the total capital: Computer Society of India, Unit No. 3, 4th Floor, Samruddhi Venture Park, Marol MIDC Area, Andheri (E). Mumbai 400 093.
I, Suchit Gogwekar, hereby declare that the particulars given above are true to my knowledge and belief.
1st March, 2015
Sd/Suchit Gogwekar
Signature of the Publisher
CSI Calendar 2015
Bipin V Mehta
Vice President, CSI & Chairman, Conf. Committee
Email: [email protected]
Date | Event Details & Organizers | Contact Information
March 2015 events
21 Mar 2015 | DIGITAL INDIA SUMMIT-2015, Golden Jubilee Year Celebrations, organized by Computer Society of India (Delhi & Gurgaon Chapter, Region-I & Division-I) at Mapple (Basement), India Habitat Centre, Lodhi Road, New Delhi-03. | Shiv Kumar
27-28 Mar 2015 | International Conference on ICT in Healthcare organized by Sri Aurobindo Institute of Technology, Indore in association with CSI Indore, Udaipur Chapter and CSI Division III and Division IV Communication. http://www.csi-udaipur.org/icthc-2015/ | Dr. Durgesh Kumar Mishra, [email protected]; Prof. A K Nayak, [email protected]; Prof. Amit Joshi, [email protected]
April 2015 events
3-4 Apr 2015 | National Conference on Creativity and Innovations in Technology Development (NCCITD’15) at Udaipur. Organised by CSI Udaipur Chapter, Division IV, ACM Udaipur Chapter and S S College of Engineering, Udaipur. www.csi-udaipur.org | Amit Joshi, [email protected]; Dr Jaydeep Ameta, [email protected]
11-12 Apr 2015 | Two Day National Conference on ICT Applications “CONICTA-2014” at IIBM Auditorium, Patna, organized by CSI Patna Chapter in association with Div-III and Div-IV of Computer Society of India | Prof. A K Nayak, [email protected]; Prof. Durgesh Kumar Mishra, [email protected]
24-25 Apr 2015 | ICON’15 “All India Conference On “Sustainable product in Computer Science & Engineering organized by Chhatrapati Shivaji Institute of association with CSI Division IV, CSI Region IV | Prashant Richhariya, [email protected]
May 2015 events
15–17 May 2015 | International Conference on Emerging Trend in Network and Computer Communication (ETNCC2015) at Department of Computer Science, School of Computing and Informatics, Polytechnic of Namibia in Association with Computer Society of India Division IV and SIGWC http://etncc2015.org/ | Prof. Dharm Singh, [email protected]
17 May 2015 | WTISD 2015 - Telecommunications and ICTs: Drivers of Innovations. Organised by: CSI Udaipur Chapter, IE(I) ULC at Udaipur http://www.csi-udaipur.org | Dr. Y C Bhatt, [email protected]; Amit Joshi, [email protected]
Aug 2015 event
7-8 Aug 2015 | ICICSE-2015: 3rd International Conference on Innovations in Computer Science & Engineering in collaboration with Computer Society of India (CSI) | Dr. H S Saini, [email protected]; Dr. D D Sarma, [email protected]
Sept 2015 event
10-12 Sep 2015 | International Conference on Computer Communication and Control (IC42015) at Medicaps Group of Institutions, Indore in association with CSI Division IV, Indore Chapter and IEEE MP Subsection. | Dr. Pramod S Nair, [email protected]; Prof. Pankaj Dashore, [email protected]
Oct 2015 events
9–10 Oct 2015 | International Congress on Information and Communication Technology (ICICT-2014) at Udaipur. Organised by CSI Udaipur Chapter, Div-IV, SIG-WNs, SIG-e-Agriculture and ACM Udaipur Chapter www.csi-udaipur.org/icict-2014 | Dr. Y C Bhatt, [email protected]; Amit Joshi, [email protected]
16-17 Oct 2015 | 6th Edition of the International Conference on Transforming Healthcare with IT to be held at Hotel Lalit Ashok, Bangalore, India. http://transformhealth-it.org/ | Mr. Suresh Kotchatill, Conference Coordinator, [email protected]
Registered with Registrar of News Papers for India - RNI 31668/78
Regd. No. MCN/222/2015-2017
Posting Date: 10 & 11 every month. Posted at Patrika Channel Mumbai-I
Date of Publication: 10 & 11 every month
Brig. SVS Choudhury awarded LTA
If undelivered return to :
Samruddhi Venture Park, Unit No.3,
4th floor, MIDC, Marol, Andheri (E). Mumbai-400 093
Life Time Achievement Award
Prof. DVR Vithal awarded LTA
Dr. C R Chakravarthi awarded LTA
Hon Fellowship Award
Dr. G Satheesh Reddy awarded Hon Fellowship
Dr. Achyutananda Samanta awarded Hon Fellowship
Shri Bharat Goenka awarded Hon Fellowship
Fellowship Award
Dr. Dipti Prasad Mukherjee awarded Fellowship
Shri. Satish Babu awarded Fellowship
Dr. H S Saini awarded Fellowship
Shri. S Ramanathan awarded Fellowship
Shri. H R Viswakarma awarded Fellowship
Shri KVSS Rajeswar Rao awarded Fellowship