digital object identifier in effective media library management

DIGITAL OBJECT IDENTIFIER IN EFFECTIVE MEDIA LIBRARY
MANAGEMENT – AN INDIAN PERSPECTIVE
M TAMIZHCHELVAN11, AC GANESH21, S SWAMINATHAN32
1 Librarian, The New Indian Express, Express Estates, Club House Road, Chennai - 600002, Tamil Nadu, India.
2 Chief Web Writer, Express Network Private Limited, Express Estates, Club House Road, Chennai - 600002, Tamil Nadu, India.
3 Programmer, CricInfo India Private Limited, 25, RK Salai, Mylapore, Chennai - 600004, Tamil Nadu, India.
Abstract
The need for identification has persisted across generations, first on paper, then on plastic and now in digital form. The Digital Object Identifier (DOI) is a concept for identifying content in a digital environment so that end users can reach it. DOIs can point to documents, images, sounds, video clips, parts of works, gateways, works under development, evolving works, invoice screens (e.g. order forms, invitational membership forms or pay-per-view forms), rights agreements, a page pointing to an object, and even constantly changing sources such as news headlines or stock quotes. Virtually anything that a URL might point to now could be handled by the DOI system. A DOI makes it possible to identify information on a digital network and to associate it with related current data. DOIs are designed for use in any digital network, not just the World Wide Web, which is only one recent aspect of the evolution of digital networks and the use of digital objects within them. DOIs can be used in open or proprietary digital networks in broadcasting, in multimedia systems, or indeed in any conceptual framework. They can be thought of as an abstract specification that has a reference implementation in current Internet technologies. From a media library point of view, with so much news flowing through a media organization, one needs to be both a subject expert and a systems expert to identify, organize, store and retrieve information. Many attempts have been made over the years to organize newspapers so that news or information can be identified, from bound volumes to the recent full-text newspaper databases. Digital identifiers play a part in this. Though DOI may not be used in the true sense, information that is collected, stored and retrieved using digital media, be it CD-ROMs, tape drives or web-based applications, relies on some form of identification, which may be numbers generated by the database itself or entered manually. With such identifiers, one can easily locate a news item hosted under various categories, either to modify it or to retrieve it. Digital media libraries are also slowly moving into helping the organization bring out e-books, which enhance and sustain a value-added service to readers and browsers and bring in some extra revenue. This paper therefore looks at ways to organize a media library effectively for managing, storing, retrieving, indexing and classifying information and for the overall management of news flow in a media setup, and it sees a need for DOI as a tool to reach different target audiences and end users.
Keywords: Media, Information, management, digital media, digital object identifier, copyright, digital rights management,
news, photos, audiovisuals, e-books, e-publishing.
Introduction
Managing, capturing and archiving information in a media library is an art. The subjects vary, the targets vary, and the users range from politicians to professors, students to social scientists and editors to elder statesmen.
“Information” is being treated like a kind of soup that “content providers” scoop out of pots and dump
wholesale into information systems. But it does not work that way. Good information retrieval design requires
just as much expertise about information and systems of information organization as it does about the technical
aspects of systems. (Bates, 1998).
Though the concept of the Digital Object Identifier (DOI) first came out of the Association of American Publishers (AAP), it is slowly gaining acceptance in other kinds of digital network, where it associates content with related current data and provides a link between a user and the author or publisher of a work. Created in 1996 to facilitate an e-commerce market for digital content and to provide solutions for copyright protection and anti-piracy in the digital environment, DOI has come a long way towards managing content in the day-to-day affairs of an organisation dealing in digital content.
Large media houses with diversified businesses in print, television, radio and the Internet are slowly moving towards setting up a centralized content management system (CMS). A CMS helps in managing various forms of content that are stored at different locations and on different servers and media: daily news, features, supplements and special issues; images, whether in-house or agency pictures; advertisements; audio and video; e-commerce related information; and short messaging service (SMS) content. The CMS helps one to create and publish content, to manage it more cost-effectively and to make better decisions. It integrates and automates the processes that support efficient and effective delivery of content in the required format, be it intranet, CD-ROM, Internet, e-book or print. This is where the role of the Digital Object Identifier (DOI) has become crucial. Though media houses may not have taken up a dedicated server for this purpose, they have understood the need for, and the benefits of, a unique identifying system by which one can retrieve and disseminate information both in-house and to users in this competitive age. This paper therefore looks at ways to organize a media library effectively for managing, storing, retrieving, indexing and classifying information and for the overall management of news flow in a media setup, and it sees a need for DOI as a tool to reach different target audiences and end users.
The objectives and benefits of the Digital Object Identifier are as follows:
• The end user should have no difficulty in accessing the information he or she wants to read.
• The classification and categorization of the publication should be user-friendly.
• The navigation should be simple and user-friendly.
• Well-organized content in electronic format benefits archiving and easy retrieval.
• Business processes are accelerated and decision support is improved.
• Documents can be accessed anytime from anywhere.
• Direct integration benefits resource planning systems.
From an online newspaper perspective, since the volume of content is huge and information flows in round the clock, an information professional plays a varied role, from that of a librarian to that of a project manager who has to plan, choose, test and decide on the hardware, the servers for storing data, and the software for easy access, retrieval and content management.
Overview of DOI in media library environment
Access to information from various sources is of utmost importance to journalists writing their stories. Many attempts have been made over the years to organize newspapers so that news or information can be identified, from bound volumes to the recent full-text newspaper databases.
Digital identifiers play a part in this. Though DOI may not be used in the true sense, information that is collected, stored and retrieved using digital media, be it CD-ROMs, tape drives or web-based applications, relies on some form of identification, which may be numbers generated by the database itself or entered manually. For example, daily news arrives as local news, state news, national news, sports news, international news, editorials, district news, photos and so on, received from various sources and places, either through agencies or from staffers, who assign certain special digital identifiers. Agencies have their own identifiers under which they send their pictures and news, and these items are selected and stored under the said categories. This digital identification varies from one newspaper organization to another.
From the library point of view, to digitize the existing newspaper by category, a name has to be assigned based on the edition date, both for the daily edition and for the supplements, features and special supplements that come along with the newspaper. For a weekly or fortnightly newspaper or magazine, one again has to assign an identifier based on the weekly or fortnightly edition date. Digital classification may also have to be assigned to pictures, based on the daily edition, for easy retrieval. Archiving and storing newspapers and magazines is an art, and the available modes of storage are CD-ROMs, tapes and servers.
Classification is entirely different for the online edition of a newspaper or magazine hosted on the net. On the website, one has to classify news items with the browsing reader in mind, as navigation should be simple and retrieval easy. Here, news items are hosted either statically or dynamically. For static pages the digital identifier is simple, as it is assigned manually, whereas for dynamic pages the database assigns the values. One can therefore easily identify a news item hosted under various categories, either to modify it or to retrieve it. Our organization is a typical example: we host as many as four newspapers in different Indian languages, including English, and each newspaper or magazine has been assigned an identifier for different purposes of identification. For example, DN stands for Dinamani (a Tamil-language newspaper), DN followed by H stands for its headlines, and so on for other categories. This pattern is followed for the other languages too. For special topics or sections created for various events, a similar digital identity is created.
Comprehensive access to information in newspapers has long been a recognized need. Many libraries, historical societies and news organizations have attempted to meet that need in a variety of ways over the years. From bound ledgers, loose-leaf notebooks and card files, they have moved to the most recent trend of archiving the full text of newspapers in digital format and making it available over the Internet as full-text newspaper databases.
On the indexing front, the current trend is to move to computer-assisted indexing. With the
exception of a relatively few large metropolitan newspaper indexes published and distributed to subscribers,
most indexes to local newspapers are not published. They are usually one-of-a-kind projects stored in public or
academic library file drawers, notebooks and even shoeboxes. Increasingly they are stored on computer disks.
The quality and continuity of these indexes vary considerably. Patron usage and satisfaction with these local
newspaper indexes have not been well documented.
Another area that digital media libraries are slowly moving into is helping the organization bring out e-books, which enhance and sustain a value-added service to readers and browsers and also bring in some extra revenue for the organization.
How the DOI works in media organisation
News comes from different countries, states and cities to a common news server, from which items are chosen and used for different editions based on priority and importance. Each file name has a unique identification based on the user who sends it, the source from which it is sent and the category of the news item. Once the basic plan is ready, the next step is to organize the content so as to present it in a readable format. This may be text, a document, HTML, PDF, content stored in a database, or a combination of these. Electronic information may be divided into two types, namely streaming content and non-streaming content. Streaming content is content in which multimedia components such as real-time audio/video, movies and videoconferencing are used to present interviews, audio/video features, songs, movie clips and the like. Non-streaming content includes text, pictures and graphics presented in the form of static or dynamic pages.
With the arrival of broadband Internet, both storage and presentation of streaming and non-streaming content have become much more complicated, because one needs both kinds of server: streaming content has to be stored on a media server (of the kind used with RealPlayer or Media Player), while non-streaming content goes to a database server. One therefore has to choose the mode of delivery and create content accordingly.
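The split between media server and database server can be sketched as a small routing function. This is only an illustration: the paper names the two content classes but not the file types, so the extension list and the function name `choose_server` are assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical list of streaming file types; the paper does not enumerate them.
STREAMING_EXTENSIONS = {".rm", ".ram", ".avi", ".mpg", ".mp3", ".wav"}


def choose_server(filename: str) -> str:
    """Return the kind of server a file should be stored on.

    Streaming content goes to a media server; everything else
    (text, pictures, graphics) goes to the database server.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in STREAMING_EXTENSIONS:
        return "media server"
    return "database server"


print(choose_server("interview.RM"))
print(choose_server("DN10PM.jpg"))
```

Deciding the route from the file type at upload time keeps the choice of delivery mode in one place, so content creators do not have to know which physical server is involved.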
Server: Different servers are configured to store content received from various centres and agencies.
Server A is configured to store daily news and the related QuarkXPress or PDF files.
• This is further divided by centre
Server B stores images related to the daily edition, which are retrieved for use.
• In-house images for all editions, be it English or other language editions
• Agency images (e.g. Reuters, PTI, AFP, AP) that are subscribed to, downloaded and used for the edition
Server C stores features and supplements that go along with the newspaper or magazine.
• This contains content as well as images for the particular features or supplements
Server D stores all advertisements related to the edition.
• Advertisement images for the particular edition
Server E stores all e-commerce and mobile format content.
Server F stores backup data.
Identifier: A unique naming convention with a prefix is used for each news item or image. An image is named with the publication prefix, the date and the image name with its extension, e.g. IE10Vajpayee.jpg or DN10PM.jpg, so that it can be matched to the news item it supports. For news, one chooses the category, topic and so on while uploading, and on submission the system generates a unique news ID. In a typical ID, DN stands for Dinamani and L for latest news, followed by year/month/date/hours/minutes/seconds. This system is followed because the resulting number is unique and the chance of duplication is remote. The news ID helps to identify the item for editing or deleting, and gives vendors a way to access the item when content is shared with them.
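The news ID scheme described above can be sketched as follows. The paper gives the order of the parts (publication, category, then year/month/date/hours/minutes/seconds) but not the exact layout, so the four-digit year, the absence of separators and the function names are assumptions made for illustration.

```python
from datetime import datetime
from typing import Tuple


def make_news_id(publication: str, category: str, when: datetime) -> str:
    """Build an ID such as DN + L + timestamp (layout of the timestamp is assumed)."""
    return f"{publication}{category}{when:%Y%m%d%H%M%S}"


def parse_news_id(news_id: str) -> Tuple[str, str, datetime]:
    """Split an ID back into its parts.

    Assumes a two-letter publication code and a one-letter category code,
    as in the DN / L example.
    """
    publication, category, stamp = news_id[:2], news_id[2], news_id[3:]
    return publication, category, datetime.strptime(stamp, "%Y%m%d%H%M%S")


news_id = make_news_id("DN", "L", datetime(2004, 2, 11, 9, 5, 30))
print(news_id)                 # DNL20040211090530
print(parse_news_id(news_id))
```

Because the timestamp is resolved to the second, two items in the same publication and category collide only if they are submitted within the same second, which matches the remark that duplication is remote rather than impossible.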
Similarly, for e-books and e-commerce products, a unique product code identifies the book or product for buyers. Typically, a book cover is scanned as an image and given a unique name, and the image is then put on display. The book details are entered along with a unique product code and other details such as the publisher's name, the author's name, the year of publication and the ISBN. The product code has a prefix (TAM, KAN, TEL or ENG) followed by a number in the case of books, or CD- followed by a number in the case of CDs. Here TAM stands for Tamil-language books, KAN for Kannada books, TEL for Telugu books, and so on.
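A minimal sketch of validating and decoding such book codes is given below. The paper states the prefixes but not the length of the number or any separator, so the pattern (prefix immediately followed by digits), the reading of ENG as English and the function name `parse_book_code` are assumptions.

```python
import re

# Prefixes from the text; "ENG" is assumed to mean English.
LANGUAGE_PREFIXES = {
    "TAM": "Tamil",
    "KAN": "Kannada",
    "TEL": "Telugu",
    "ENG": "English",
}

# Assumed layout: language prefix immediately followed by one or more digits.
_BOOK_CODE = re.compile(r"^(TAM|KAN|TEL|ENG)(\d+)$")


def parse_book_code(code: str):
    """Return (language, number) for a book product code, or raise ValueError."""
    match = _BOOK_CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised book code: {code!r}")
    return LANGUAGE_PREFIXES[match.group(1)], int(match.group(2))


print(parse_book_code("TAM0042"))  # ('Tamil', 42)
```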
Directory: The directory holds all the DOI numbers and the addresses of the servers, routes requests to the publishers and acts as an intermediary between the user and the rights-holder.
When a publisher moves, changes server or sells rights, the directory is updated and the DOI directs the user to the new location. Since the number remains the same and stays attached to the same content, one avoids the annoying “File Not Found” error so often met on the Internet.
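The directory's role can be illustrated with a small in-memory lookup table. This is a sketch of the idea only, not the DOI system's actual resolver; the class, its method names, the identifier and the example.org addresses are all hypothetical.

```python
class Directory:
    """Maps a persistent identifier to the current location of its content."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier: str, url: str) -> None:
        """Record a new identifier; an identifier is never reused."""
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = url

    def relocate(self, identifier: str, new_url: str) -> None:
        """Update the location when content moves; the identifier stays the same."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        """Return the current location for an identifier."""
        return self._locations[identifier]


directory = Directory()
directory.register("DNL20040211090530", "http://old.example.org/news/1.html")
directory.relocate("DNL20040211090530", "http://new.example.org/archive/1.html")
print(directory.resolve("DNL20040211090530"))
```

Users and vendors keep only the identifier, so moving the content requires a single update in the directory and no change to any link already in circulation.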
The database: The database consists of the content or information provided by the vendor or information
provider that was requested by the user. The publisher also maintains a response screen, which is the first thing
the user sees after clicking the DOI icon. The response screen or the Index page might comprise the content
itself, or it might contain information about how to purchase the content.
Photo library: Similarly, a large number of photos keep coming into the library for reference and use, and they need to be stored without damage. Since the volume of pictures is large, they are stored in both physical and digital formats, but the classification and indexing differ between the two. The digital pictures are stored both on the server and on CDs, and a system has therefore been developed in which the photos are classified under different heads for easy reference in both forms, manual and electronic. Each and every photo has a unique name. In the naming convention of an agency such as Reuters, names start the year with MDF followed by a serial number that keeps running until the year-end, with a suffix (MDF1234a, b, c and so on) for duplicate copies. The Indian news agency Press Trust of India (PTI) has a different convention: it uses the prefix IND followed by the date for India-related photos and FGN for foreign categories. The organization's own media library stores photos with a classification prefix, such as SPT for sports, POL for political pictures and REG for regional, for reference and easy access.
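The prefix conventions above lend themselves to a simple lookup. The sketch below covers the in-house prefixes named in the text and the Reuters-style serial with an optional duplicate suffix; the file names, the "Unclassified" fallback and the function names are assumptions for illustration.

```python
import re

# In-house classification prefixes named in the text.
CATEGORY_PREFIXES = {"SPT": "Sports", "POL": "Political", "REG": "Regional"}

# Reuters-style name as described: MDF, a serial number, optional duplicate letter.
_REUTERS_NAME = re.compile(r"^MDF(\d+)([a-z]?)$")


def classify_photo(filename: str) -> str:
    """Return the category for an in-house photo name, based on its first three letters."""
    return CATEGORY_PREFIXES.get(filename[:3].upper(), "Unclassified")


def parse_reuters_name(name: str):
    """Return (serial, duplicate_suffix) for a name like MDF1234a, or None if it does not match."""
    match = _REUTERS_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(classify_photo("SPT0101.jpg"))   # Sports
print(parse_reuters_name("MDF1234a"))  # (1234, 'a')
```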
Methodology
Given below is the DOI flow that is in the pipeline to streamline a paid-content site in the near future. The authors also show the stages through which their use of the DOI format has progressed over the years. The screen shots show how content, namely news, photos and audio/video, is currently stored on the server with a unique assigned number followed by the news category and a database-generated number. In the initial years the photos were kept on a local system under DOS, where a separate directory was created for each category, say National, Foreign, Personalities and so on.
Results:
A screen shot of the yesteryear identification is shown below:
FAB01    JPG     11,215  08-25-99   8:52p  FAB01.JPG
FAB02    JPG     23,641  08-25-99   8:53p  FAB02.JPG
FAC01    JPG     13,938  08-25-99   8:53p  FAC01.JPG
FAD01    JPG     15,286  08-25-99   8:53p  FAD01.JPG
FAD02    JPG     24,893  08-25-99   8:54p  FAD02.JPG
FAG01    JPG     11,656  08-25-99   8:54p  FAG01.JPG
FAH01    JPG     32,718  08-25-99   8:55p  FAH01.JPG
FAH02    JPG     31,447  08-25-99   8:56p  FAH02.JPG
FAH03    JPG     27,416  08-25-99   8:57p  FAH03.JPG
FAK01    JPG     13,651  08-25-99   8:57p  FAK01.JPG
Figure 1. Photos of foreign personalities as named in this system.
NCH25C   JPG     13,871  08-16-98  11:14p  NCH25C.JPG
NCH26    JPG     21,318  08-26-99   2:56a  NCH26.JPG
NCH27    JPG     15,623  08-26-99   2:56a  NCH27.JPG
NCH28C   JPG     59,899  08-16-98  11:15p  NCH28C.JPG
NCH29    JPG     15,286  08-26-99   2:57a  NCH29.JPG
NCH30C   JPG     59,899  08-26-99   2:57a  NCH30C.JPG
NCH30CA  JPG     48,122  08-26-99   2:58a  NCH30CA.JPG
NCH31    JPG     11,139  08-26-99   2:58a  NCH31.JPG
NCH32    JPG     15,817  08-26-99   2:58a  NCH32.JPG
Figure 2. National leaders given in the coding system
LSH05    JPG     10,522  08-26-99  12:44a  LSH05.JPG
LSH06    JPG     15,764  08-26-99  12:44a  LSH06.JPG
LSH061   JPG     57,853  08-26-99  12:45a  LSH061.JPG
LSH09    JPG     12,910  08-26-99  12:45a  LSH09.JPG
LSI-06   JPG    106,569  08-26-99  12:47a  LSI-06.JPG
Figure 3. Local leaders are given like this
ADVANI   <DIR>  11-08-00   4:34p  advani
CLINTON  <DIR>  11-08-00   3:44p  Clinton
FGN      <DIR>  11-08-00   3:46p  FGN
KARGIL2  <DIR>  11-08-00   4:03p  KARGIL2
SONIA    <DIR>  11-08-00   4:16p  SONIA
VAJPAI   <DIR>  11-08-00   4:35p  vajpai
Figure 4. Top personalities are given as folder name
Figure 5. An example of the photos and their classifications, stored in a FoxPro database on a LAN server for reference and use.
With advances in server management and technology and the impact of ICT, including the Internet, the authors have started storing news, photos, audiovisuals, e-books and e-shopping/e-commerce content as described below.
Screen shots under the different categories are shown in the appendix. They show how content is stored on a dedicated server under an assigned number for NEWS, PHOTOS, SPECIAL FEATURES/SUPPLEMENTS and AUDIO/VIDEO, followed by the file name.
Given below is an example of the effective use of DOI in media, where news, photos and audiovisuals have been used for a client.
In the case of e-books and e-commerce, one needs to click on the image of the novel, short-story collection and so on, which leads to an index page from where one can download or read the abstract or story.
Discussions:
Client level Interface:
The client sends the Admin a request to update one or more of the following options. The Admin analyzes the request, sets up the initial information in the required database and provides the client with the required web-based CMS.
• Set up the initial information such as country code, city code and company code. The international telephone code structure has been chosen for the country code and city code, as it is simple, time-tested and used effectively worldwide.
• Setup => Add, Edit, Delete and List
• Photos => Add, Edit, Delete and List (with date and keyword based listing)
• News items => Add, Edit, Delete and List (with date and keyword based listing)
• Audio & Video => Add, Edit, Delete and List (with date and keyword based listing)
Once the Admin has provided the required web-based CMS, the client uses the system and requests any of the above actions (Add, Edit, Delete and List).
According to the client's action, the program recognizes the client ID (already assigned by the Admin during initial setup), sends queries to the required tables and stores the content in the appropriate directory on the web server.
Storing process on the web server:
* The base directory on the web server is taken to be /doi/admin/
In case of a photo update
* The path will be
/doi/country code/city code/client code/archive/photos/ddmmyyyy/ph-XXXX-ddmmyyyymmhhss.jpg
where XXXX stands for a random four-digit number.
In case of a news update
* The path will be
/doi/country code/city code/client code/archive/news/ddmmyyyy/NWS-XXXX-ddmmyyyymmhhss.html
where XXXX stands for a random four-digit number.
In case of an audio and video update
* The path will be
/doi/country code/city code/client code/archive/av/ddmmyyyy/av-XXXX-ddmmyyyymmhhss.avi
where XXXX stands for a random four-digit number.
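The three path patterns can be generated by one helper, sketched below. Several details are assumptions: the date folder is taken to be an eight-digit ddmmyyyy, the file stamp follows the mmhhss order literally as written in the pattern, the four-digit number is zero-padded, and the function name, the client code "NIE" and the sample codes (91 and 44, the telephone codes for India and Chennai) are chosen only for illustration.

```python
import random
from datetime import datetime

# File-name prefix and extension for each content kind, taken from the patterns in the text.
KINDS = {
    "photos": ("ph", "jpg"),
    "news": ("NWS", "html"),
    "av": ("av", "avi"),
}


def build_archive_path(country_code, city_code, client_code, kind, when, rng=random):
    """Build the storage path for one uploaded item.

    `rng` may be the random module or a random.Random instance,
    which makes the four-digit number reproducible in tests.
    """
    prefix, extension = KINDS[kind]
    serial = rng.randint(0, 9999)              # random four-digit number, zero-padded
    day = when.strftime("%d%m%Y")              # ddmmyyyy folder (assumed eight digits)
    stamp = when.strftime("%d%m%Y%M%H%S")      # ddmmyyyy + mmhhss, order as in the text
    return (
        f"/doi/{country_code}/{city_code}/{client_code}/archive/"
        f"{kind}/{day}/{prefix}-{serial:04d}-{stamp}.{extension}"
    )


print(build_archive_path("91", "44", "NIE", "photos", datetime(2004, 2, 11, 9, 5, 30)))
```

Combining a random number with a timestamp resolved to the second makes a clash between two uploads by the same client very unlikely, though not impossible; a stricter system would check for an existing file before writing.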
Technical Specification:
At present there are four kinds of update under the DOI concept, namely photos, news, audio and video, and e-commerce. A database has been created with tables that are used for validation.
Examples of client-side validation and of uploading news, photographs and audiovisuals are presented as figures in the appendix. There are also separate screens for edit, delete and listing; these are not shown but are explained.
Using DOI in Digicom environment and rights management
Taking this media organization as an example, rights management works essentially with the help of software. The software usually sits on the content creator's server and is designed to ensure secure (i.e. protected) distribution of that content over the Web. The objectives are to prevent copying or duplication and to protect the content. As a value-added service, access to e-books and purchases through e-commerce are offered through the website. Books in English and in other languages, classified under various heads, can be chosen and purchased online from the website. The covers of the books are scanned and unique database numbers are assigned to the books. A person selects a book or a range of books, adds them to the cart and makes an online payment, and the book is then sent physically. In the case of e-books, the user can go through the index of the available content and then register by paying the necessary amount. Once the payment is made and the user is authenticated, he is given a user name and password with which he can enter the area where the content is available for reading and printing. In this way, one does not lose control over the products displayed and the digital content is protected. Control can be maintained over operations such as playing, printing, copying and saving.
In the case of online shopping for books, CDs, dress materials, jewellery and so on, the shopping cart metaphor is used, as by many online retailers: the customer sees the range of items for sale and moves an item to his or her shopping cart, and then a permission must be issued to the customer (sometimes in the form of an "offer URL") before actual access to the encrypted file is granted. The customer can then either access the item directly or go to the offer URL to download or print its content. Imagine that the online retailer is selling e-books, and you will understand how this might work.
Conclusion
DOI: The idea of DOI needs to take off with publishers. Identifiers are easy to assign, though creating and maintaining a database needs some work. There are benefits for the public and for libraries at large: it is hoped that publishers will assign DOIs to out-of-print books for which they hold rights, which would benefit researchers, scholars and the public at large by helping to create a unified catalogue. With such a system, books that are out of print but available in a certain library can be identified. This may be a Herculean task for libraries, but it should pay off in the long run. As far as digital rights management (DRM) is concerned, many companies worldwide are entering the business, and established ones are developing new products or enhancing older ones. Many top-line companies, including Adobe, are working on DRM technologies. Apart from this, a new set of trusted intermediaries has emerged between the content or online mall provider and the end users; they are the clearinghouses in the business chain. They handle the payment gateway and process the payments made online, and these companies work in the background. Information managers, librarians and, in our case, web writers are using DRM systems extensively.
The need of the hour is for the International DOI Foundation (IDF) to sit with media companies, understand how they work and come out with standards. Publishers, libraries and others have standards, or people striving for standards, in their respective professions, whereas media is the only profession that has no standards for digital library management, at least in India. Either standards or guidelines are needed, or large media houses need to share their experience of standardization so that smaller firms can follow them. Since each firm uses its own methods of classification, indexing, storage and identification, a huge amount of information is unnecessarily duplicated or lost for lack of standards. In the current scenario, a librarian has to be mentally prepared to work on these lines, acting as a coordinator, though there is still a clash of roles, since librarians are not accustomed to such a role and have always been proud owners who value their possessions. In the years to come, librarians and information professionals need to go a step further and start producing digital content in the form of text, PDF, HTML, pictures, graphs, audio/video, broadband content, CDs, MPEGs, MP3s and so on.
Appendix 1:
Given below are screen shots of the news, photos, supplements/features and uploading systems.
Appendix 2:
Given below are screen shots of the client side authentication and uploading system.
Notes and References:
1. Cooper, Charles (2000): E-books: An idea still ahead of its time, ZDNet News, issue dated August 9, 2000.
2. Reuka K (2002): Electronic books and the future of libraries: LIST 2002, 27-28 January, 2002. (133 p)
3. Chuoksey SS (2002): On-line Library Services towards On-line education system, MANLIBNET 2002, 3-5 April 2002, (258 p)
4. Rose MJ (2001): E-Books Live On After Mighty Fall, Wired News, Dec. 18, 2001.
5. Digital Rights Management for eBooks: Publisher Requirements, version 1.0, Association of American Publishers, New York, NY and Washington, D.C. USA, November 2000 (available at http://www.publishers.org/home/drm.pdf).
6. Bide, Mark, In Search of the Unicorn, The Digital Object Identifier from a User Perspective, BNBRF Report 89, Book Industry Communications, London, February 1998 (available at http://www.bic.org.uk/unicorn2.pdf).
7. Bernstein, Paula, DOI: A New Identifier for Digital Content. Searcher: The Magazine for Database
Professionals, Vol. 6, No. 1, Jan. 1998 (available at http://www.infotoday.com/searcher/jan98/story4.htm).
8. The DOI Handbook, version 0.5.1, The International DOI Foundation, Washington, D.C. USA and Geneva,
Switzerland, September 2000 (available at http://www.doi.org/handbook_2000/index.html).
9. Paskin, Norman, Digital Object Identifier: implementing a standard digital identifier as the key to effective
digital rights management, The International DOI Foundation, Kidlington, Oxfordshire, United Kingdom,
2000 at 3 (available at http://www.doi.org/doi_presentations/aprilpaper.pdf).
10. Bates, Marcia J. (1988), "How to Use Controlled Vocabularies More Effectively in Online Searching,"
Online, 12(6), pp. 45-56.
11. Bates, Marcia J. (1998), "Indexing and Access for Digital Libraries and the Internet: Human, Database, and
Domain Factors," Journal of the American Society for Information Science, 49 (13), pp. 1185-1205.
12. Bates, Marcia J. (1999), "The Invisible Substrate of information Science," Journal of the American Society
for Information Science, 50, (12), pp. 1043-1050.
13. Bates, Marcia J. (2002), "The Cascade of Interactions in the Digital Library Interface," Information
Processing and Management, 38(3), pp. 381-400.
14. Bates, Marcia J. (2002), After the Dot-Bomb: Getting Web Information Retrieval Right This Time, First
Monday, 7 (7).
15. Bowker, Geoffrey C. and Star, Susan Leigh (1998) (Ed.), "How Classifications Work: Problems and
Challenges in an Electronic Age", Library Trends, 47(2).
16. Svenonius, Elaine. (1983), "Use of classification in online retrieval." Library Resources and Technical
Services, 27 (1), pp. 76-80.