digital object identifier in effective media library management
DIGITAL OBJECT IDENTIFIER IN EFFECTIVE MEDIA LIBRARY MANAGEMENT - AN INDIAN PERSPECTIVE

M TAMIZHCHELVAN (1), AC GANESH (2), S SWAMINATHAN (3)

1 Librarian, The New Indian Express, Express Estates, Club House Road, Chennai - 600002, Tamil Nadu, India.
2 Chief Web Writer, Express Network Private Limited, Express Estates, Club House Road, Chennai - 600002, Tamil Nadu, India.
3 Programmer, CricInfo India Private Limited, 25, RK Salai, Mylapore, Chennai - 600004, Tamil Nadu, India.

Abstract

There has been a need for identity across generations. It has taken the form of paper, then plastic, and now digital. The Digital Object Identifier (DOI) is a concept that helps to identify the needs of end users using technology in a digital environment. DOIs can point to documents, images, sounds, video clips, parts of works, gateways, works under development, evolving works, invoice screens (e.g. order forms, invitational membership forms or pay-per-view forms), rights agreements, a page pointing to an object, and even constantly changing sources such as news headlines or stock quotes. Virtually anything that a URL might point to now could be handled by the DOI system. The DOI makes it possible to identify information on a digital network and to associate it with related current data. DOIs are designed for use in any digital network, not just the World Wide Web, which is only one recent aspect of the evolution of digital networks and the use of digital objects within them. DOIs can be used in open or proprietary digital networks in broadcasting, multimedia systems, or indeed any conceptual framework. DOIs can be thought of as an abstract specification that has a reference implementation in current Internet technologies. From a media library point of view, with so much news flowing through a media organization, one needs to be both a subject expert and a systems expert to identify, organize, store and retrieve information.
Many attempts have been made over the years to organize newspapers so that news or information can be identified, from bound volumes to the recent full-text newspaper databases. Digital identifiers play a part in this. Though the DOI may not be used in the true sense, information that is collected, stored and retrieved using digital media, be it CD-ROMs, tape drives or web-based applications, relies on some form of identification, which may be a number generated by the database itself or entered manually. One can therefore easily identify a news item hosted under various categories, either to modify it or to retrieve it effectively. Digital media libraries are also slowly moving into helping the organization bring out e-books, to enhance, sustain and provide a value-added service to readers and browsers and to bring in some extra revenue for the organization. This paper therefore looks at ways to organize a media library effectively for managing, storing, retrieving, indexing and classifying information and for the overall management of news flow in a media setup, and it sees a need for the DOI as a tool to reach different target audiences and end users.

Keywords: Media, Information, management, digital media, digital object identifier, copyright, digital rights management, news, photos, audiovisuals, e-books, e-publishing.

Introduction

Managing, capturing and archiving information in a media library is an art. The subjects vary, the targets vary, and the users vary from politicians to professors, students to social scientists and editors to elder statesmen. “Information” is being treated like a kind of soup that “content providers” scoop out of pots and dump wholesale into information systems. But it does not work that way. Good information retrieval design requires just as much expertise about information and systems of information organization as it does about the technical aspects of systems. (Bates, 1998).
Though the concept of “Digital Object Identifiers (DOI)” first came out of the Association of American Publishers (AAP), it is slowly gaining acceptance in other forms of digital network, where it is associated with related current data to provide a link between a user and the author or publisher of a material. Created in 1996 to facilitate an e-commerce market for digital content and to provide solutions for copyright protection and anti-piracy in the digital environment, the DOI has come a long way in managing content in the day-to-day affairs of an organisation dealing in digital content.

Large media houses with diversified businesses in print, television, radio and the Internet are slowly moving towards setting up a centralized content management system (CMS). The CMS helps in managing various forms of content: daily news, features, supplements and special issues; images, whether in-house or agency pictures; advertisements; audio and video; e-commerce related information; and short messaging services (SMS). These are stored at different locations and on different servers and media. The CMS helps one to create and publish content, to manage the content more cost-effectively, and to make better decisions. It supports the integration and automation of the processes behind efficient and effective delivery of content in the required format, be it Intranet, CD-ROM, Internet, e-book or paper/print.

This is where the role of the Digital Object Identifier (DOI) has become crucial. Though media houses may not have opted for a unique server, they have understood the need for and the benefits of a unique identifying system by which one can retrieve and disseminate information, both in-house and to users, in this competitive age.
This paper therefore looks at ways to organize a media library effectively for managing, storing, retrieving, indexing and classifying information and for the overall management of news flow in a media setup, and it sees a need for the DOI as a tool to reach different target audiences and end users.

The objectives and benefits of the Digital Object Identifier are:
• The end user must have no difficulty in accessing the information that he desires to read.
• The classification and categorization of the publication should be user-friendly.
• The navigation should be simple and user-friendly.
• Well-organized content in electronic format benefits archiving and easy retrieval.
• Business processes are accelerated and decision support is improved.
• Documents can be accessed anytime from anywhere.
• Direct integration adds benefit to resource planning systems.

From an online newspaper perspective, since the volume of content is huge and information flows round the clock, an information professional plays a varied role, from that of a librarian to that of a project manager who has to plan, choose, test and decide on the hardware, the servers for storing data, and the software for easy access, retrieval and content management.

Overview of DOI in media library environment

Access to information from various sources is of utmost importance to journalists for writing their stories. Many attempts have been made over the years to organize newspapers so that news or information can be identified, from bound volumes to the recent full-text newspaper databases. Digital identifiers play a part in this. Though the DOI may not be used in the true sense, information that is collected, stored and retrieved using digital media, be it CD-ROMs, tape drives or web-based applications, relies on some form of identification, which may be a number generated by the database itself or entered manually.
For example, daily news takes the form of local news, state news, national news, sports news, international news, editorials, district news, photos and so on. These are received from various sources and places, either through agencies or from staffers, who assign certain special digital identifiers. Agencies have their own identifiers under which they send pictures and news, which are then selected and stored under the said categories. This digital identification varies from one newspaper organization to another.

From the library's point of view, digitizing the existing newspaper by category requires names to be assigned on the basis of the edition date, both for the daily edition and for the supplements, features and special supplements that accompany it. For a weekly or fortnightly newspaper or magazine, an identifier again has to be assigned based on the weekly or fortnightly edition date. A digital classification may also have to be assigned to pictures, based on the daily edition, for easy retrieval. Archiving and storing newspapers and magazines is an art, and storage may be on CD-ROMs, tapes or servers.

Classification is entirely different for the online edition of a newspaper or magazine hosted on the net. On the website, news items have to be classified with the people browsing the site in mind, as navigation should be simple and retrieval easy. Here, news items are hosted either statically or dynamically. For static pages, the digital identifier is simple because it is assigned manually, whereas for dynamic pages the database assigns the values. One can therefore easily identify a news item hosted under various categories, either to modify it or to retrieve it effectively.
Our organization is a typical example. We host as many as four newspapers in different Indian languages, including English, and each language newspaper or magazine has been assigned an identifier for the purpose of identification. For example, DN stands for Dinamani (a newspaper in Tamil, one of the Indian languages), and DN followed by H stands for its headlines, and so on for other categories. The same pattern is followed for the other languages. For special topics or sections created for particular events, a similar digital identity is created.

Comprehensive access to information in newspapers has long been a recognized need. Many libraries, historical societies and news organizations have attempted over the years to meet that need in a variety of ways. From bound ledgers, loose-leaf notebooks and card files, they have moved to the most recent trend of storing the full text of newspapers, archiving news in a digital format and making it available over the Internet in the form of full-text newspaper databases. On the indexing front, the current trend is to move to computer-assisted indexing.

With the exception of a relatively few large metropolitan newspaper indexes published and distributed to subscribers, most indexes to local newspapers are not published. They are usually one-of-a-kind projects stored in public or academic library file drawers, notebooks and even shoeboxes. Increasingly they are stored on computer disks. The quality and continuity of these indexes vary considerably. Patron usage of and satisfaction with these local newspaper indexes have not been well documented.

Digital media libraries are also slowly moving into helping the organization bring out e-books, to enhance, sustain and provide a value-added service to readers and browsers and to bring in some extra revenue for the organization.
How the DOI works in media organisation

News comes from different countries, states and cities to a common news server, from which it is chosen and used for different editions according to priority and importance. Each file name has a unique identification based on the user who sends it, the source from which it is sent, and the category under which the news item is sent.

Once the basic plan is ready, the next step is to organize the content so as to present it in a readable format. This may be text, a document, HTML, PDF, content stored in a database, or a combination of these. Electronic information may be divided into two types, namely streaming content and non-streaming content. Streaming content is content in which multimedia components such as real-time audio/video, movies and videoconferencing are used to present interviews, audio/video features, songs, movie clips and the like. Non-streaming content includes text, pictures and graphics presented in the form of static or dynamic pages.

With the arrival of broadband Internet, both storing and presenting streaming and non-streaming content have become much more complicated, because one needs both kinds of server: streaming content has to be stored on a media server such as Real Player or Media Player, while non-streaming content goes to a database server. One therefore has to choose the mode of delivery and create content accordingly.

Server: Different servers are configured to store content received from the various centres and agencies.

Server A is configured to store daily news and related QuarkXPress or PDF files.
• This is further divided by centre.
Server B stores the images related to the daily edition, which are retrieved from it for use.
• In-house images for all editions, whether English or other language editions
• Agency images (for example Reuters, PTI, AFP, AP) that are subscribed to, downloaded and used for the edition
Server C stores the features and supplements that go along with the newspaper or magazine.
• This contains content as well as images for the particular features or supplements.
Server D stores all advertisements related to the edition.
• Advertisement images for the particular edition
Server E stores all e-commerce and mobile-format content.
A further server stores backup data.

Identifier: A unique naming convention with a prefix is used for every news item and image. For an image, the prefix is followed by the publication date and the image name with its extension, for example IE10Vajpayee.jpg or DN10PM.jpg, so that the image can be matched to the news item it supports.

For news: While uploading the news, one has to choose the category, topic and so on, and then upload. On submission of the news, the system generates a unique news ID. In a typical example, DN stands for Dinamani and L stands for latest news, followed by year/month/date/hours/minutes/seconds. This system is followed because the resulting number is unique and the chance of duplication is remote. The number helps to identify the news item for editing or deleting, and lets vendors with whom the content is shared access the item by its news ID.

Similarly, for e-books and e-commerce products, a unique product code is given to identify the book or product for buyers. In a typical example, a book cover is scanned as an image and the cover image is given a unique name and put on display. The book details are entered along with the unique product code and other details such as the name of the publisher, the author's name, the year of publication and the ISBN. The product code has a prefix TAM, KAN, TEL or ENG followed by a number in the case of books, or CD followed by a number. Here TAM stands for Tamil language books, KAN for Kannada books, TEL for Telugu books, and so on.
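The naming conventions described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the organization's actual system: the function names `make_news_id` and `make_product_code`, the four-digit year, the absence of separators and the zero-padded four-digit product number are all hypothetical. Only the prefixes (DN, L, TAM, KAN, TEL, ENG) and the year/month/date/hours/minutes/seconds ordering are taken from the description.

```python
from datetime import datetime

# Language prefixes named in the text; the dictionary keys are illustrative.
BOOK_PREFIXES = {"tamil": "TAM", "kannada": "KAN", "telugu": "TEL", "english": "ENG"}


def make_news_id(publication: str, category: str, submitted: datetime) -> str:
    """Join publication code, category code and the submission time
    written as year/month/date/hours/minutes/seconds (assumed layout)."""
    return f"{publication}{category}{submitted:%Y%m%d%H%M%S}"


def make_product_code(language: str, number: int) -> str:
    """Join a language prefix and a running number (zero padding is assumed)."""
    prefix = BOOK_PREFIXES[language.lower()]
    return f"{prefix}{number:04d}"


if __name__ == "__main__":
    # DN = Dinamani, L = latest news, as in the example in the text.
    print(make_news_id("DN", "L", datetime(2003, 8, 15, 14, 30, 5)))
    print(make_product_code("Tamil", 42))
```

A timestamp with one-second resolution makes duplicates unlikely rather than impossible: two items submitted in the same category within the same second would receive the same ID, which is consistent with the text's remark that the chance of duplication is remote.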
Directory: The directory holds all the DOI numbers and the addresses of the servers. It routes requests to the publishers and acts as an intermediary between the user and the rights-holder. When a publisher moves, changes server or sells rights, the directory is updated and the DOI directs the user to the new location. Since the number remains the same and stays attached to the same content, one can avoid the annoying “File not found” error associated with the Internet.

The database: The database consists of the content or information, supplied by the vendor or information provider, that the user requested. The publisher also maintains a response screen, which is the first thing the user sees after clicking the DOI icon. The response screen or index page might comprise the content itself, or it might contain information about how to purchase the content.

Photo library: Similarly, a large number of photos keep coming into the library for reference and use, and they need to be stored without damage. Since the volume of pictures is large, they are stored in both physical and digital formats, but the classification and indexing differ between the two. The digital pictures are stored both on the server and on CDs, and a system has therefore been developed in which the photos are classified under different heads for easy reference in both forms, manual and electronic. Every photo has a unique name. In the naming convention used by an agency such as Reuters, names start each year with MDF followed by a serial number that keeps running until the year end, with a suffix (MDF1234a, b, c and so on) for duplicate copies. An Indian news agency, Press Trust of India (PTI), has different conventions though.
PTI uses the prefix IND followed by a date for India-related photos and FGN for foreign categories. The organization's own media library, by contrast, stores photos with a category prefix such as SPT for sports, POL for political pictures and REG for regional pictures, for reference and easy access.

Methodology

Given below is the DOI flow that is in the pipeline to streamline the paid-content site in the near future. The authors have also shown the stages through which they have progressed in using the DOI format over the years.

Given below are screen shots of how the content, namely news, photos and audio/video, is stored on the server with a unique number assigned, followed by the news category and the database-generated number that is currently in use.

In the initial years the photos were kept on the local system in DOS mode, where a separate directory was created for each category, say National, Foreign, Personalities and so on.

Results: A screen shot of the earlier identification scheme is shown below:

FAB01    JPG    11,215  08-25-99   8:52p  FAB01.JPG
FAB02    JPG    23,641  08-25-99   8:53p  FAB02.JPG
FAC01    JPG    13,938  08-25-99   8:53p  FAC01.JPG
FAD01    JPG    15,286  08-25-99   8:53p  FAD01.JPG
FAD02    JPG    24,893  08-25-99   8:54p  FAD02.JPG
FAG01    JPG    11,656  08-25-99   8:54p  FAG01.JPG
FAH01    JPG    32,718  08-25-99   8:55p  FAH01.JPG
FAH02    JPG    31,447  08-25-99   8:56p  FAH02.JPG
FAH03    JPG    27,416  08-25-99   8:57p  FAH03.JPG
FAK01    JPG    13,651  08-25-99   8:57p  FAK01.JPG

Figure 1. Names given to photos of foreigners in this system.

NCH25C   JPG    13,871  08-16-98  11:14p  NCH25C.JPG
NCH26    JPG    21,318  08-26-99   2:56a  NCH26.JPG
NCH27    JPG    15,623  08-26-99   2:56a  NCH27.JPG
NCH28C   JPG    59,899  08-16-98  11:15p  NCH28C.JPG
NCH29    JPG    15,286  08-26-99   2:57a  NCH29.JPG
NCH30C   JPG    59,899  08-26-99   2:57a  NCH30C.JPG
NCH30CA  JPG    48,122  08-26-99   2:58a  NCH30CA.JPG
NCH31    JPG    11,139  08-26-99   2:58a  NCH31.JPG
NCH32    JPG    15,817  08-26-99   2:58a  NCH32.JPG

Figure 2. Names given to photos of national leaders in the coding system.

LSH05    JPG    10,522  08-26-99  12:44a  LSH05.JPG
LSH06    JPG    15,764  08-26-99  12:44a  LSH06.JPG
LSH061   JPG    57,853  08-26-99  12:45a  LSH061.JPG
LSH09    JPG    12,910  08-26-99  12:45a  LSH09.JPG
LSI-06   JPG   106,569  08-26-99  12:47a  LSI-06.JPG

Figure 3. Names given to photos of local leaders.

ADVANI   <DIR>  11-08-00   4:34p  advani
CLINTON  <DIR>  11-08-00   3:44p  Clinton
FGN      <DIR>  11-08-00   3:46p  FGN
KARGIL2  <DIR>  11-08-00   4:03p  KARGIL2
SONIA    <DIR>  11-08-00   4:16p  SONIA
VAJPAI   <DIR>  11-08-00   4:35p  vajpai

Figure 4. Top personalities are given as folder names.

The following is an example of the photos and their classifications stored in a FoxPro database on a LAN server for reference and use.

Figure 5.

With advances in server management and technology and the impact of ICT, including the Internet, the authors have started storing news, photos, audiovisuals, e-books and e-shopping/e-commerce content as shown in the following example. Screen shots under the different categories, showing how content is stored on a unique server with an assigned number for NEWS, PHOTOS, SPECIAL FEATURES/SUPPLEMENTS and AUDIO/VIDEO followed by the file name, are presented in the appendix.

Given below is the effective use of the DOI in media, where news, photos and audiovisuals have been used for a client. In the case of e-books and e-commerce, one needs to click on the image of the novel, short story collection or other item, which leads to an index page from which one can download or read the abstract or story.

Discussions:

Client level Interface: The client sends a request to the Admin to update one or more of the following options. The Admin analyzes the client's request, sets up the initial information in the required database and provides the required web-based CMS to the client.
• Set up the initial information such as country code, city code, company code etc.
We have used the international telephone code structure for the country code and city code, as these codes are simple, time-tested and used effectively worldwide.
• Setup => Add, Edit, Delete and List
• Photos => Add, Edit, Delete and List (with date-based and keyword-based listing)
• News items => Add, Edit, Delete and List (with date-based and keyword-based listing)
• Audio & Video => Add, Edit, Delete and List (with date-based and keyword-based listing)

Once the Admin has provided the required web-based CMS, the client uses the system and requests any of the above actions (Add, Edit, Delete and List). According to the client's action, the program recognizes the client ID (which the Admin assigned during the initial setup), sends queries to the required tables and stores the content in the appropriate directory on the web server.

Storing process on the web server:
Consider that the base directory on the web server is /doi/admin/.
• For a photo update, the path will be /doi/country code/city code/client code/archive/photos/ddmmyyyy/ph-XXXX-ddmmyyyymmhhss.jpg
• For a news update, the path will be /doi/country code/city code/client code/archive/news/ddmmyyyy/NWS-XXXX-ddmmyyyymmhhss.html
• For an audio and video update, the path will be /doi/country code/city code/client code/archive/av/ddmmyyyy/av-XXXX-ddmmyyyymmhhss.avi
In each case XXXX stands for a random four-digit number.

Technical Specification: At present four kinds of content are updated under the DOI concept, namely photos, news, audio and video, and e-commerce. A database has been created with tables that are used for validation and use. Examples of client-side validation and of the uploading of news, photographs and audiovisuals are presented as figures in the appendix. There are also separate screens for edit, delete and listing; these are not shown but work as explained above.
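The storage layout described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the authors' program: the function name `build_storage_path`, the `CONTENT_TYPES` table and the sample client code `ie` are hypothetical; the date directory is assumed to be ddmmyyyy; and the time part of the file name follows the pattern exactly as printed in the text (mmhhss, read here as minutes, hours, seconds), which may itself be a misprint for hhmmss.

```python
import random
from datetime import datetime
from pathlib import PurePosixPath

# Folder, file prefix and extension for each content type listed in the text.
CONTENT_TYPES = {
    "photos": ("photos", "ph", "jpg"),
    "news": ("news", "NWS", "html"),
    "av": ("av", "av", "avi"),
}


def build_storage_path(country, city, client, kind, when, rng=random):
    """Return /doi/<country>/<city>/<client>/archive/<kind>/<date>/<file>."""
    folder, prefix, ext = CONTENT_TYPES[kind]
    serial = rng.randint(0, 9999)          # the "random four-digit number"
    date_dir = when.strftime("%d%m%Y")     # assumed ddmmyyyy directory name
    stamp = when.strftime("%d%m%Y%M%H%S")  # ddmmyyyy + mmhhss as printed
    name = f"{prefix}-{serial:04d}-{stamp}.{ext}"
    return PurePosixPath("/doi", country, city, client, "archive",
                         folder, date_dir, name)


if __name__ == "__main__":
    # 91 and 44 are the telephone codes for India and Chennai; "ie" is made up.
    print(build_storage_path("91", "44", "ie", "news",
                             datetime(2003, 8, 15, 14, 30, 5)))
```

Combining the client hierarchy, the date directory, the timestamp and a random serial keeps names from colliding across clients and across uploads made in the same second, at the cost of long paths.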
Using DOI in Digicom environment and rights management

Taking this media organization as an example, rights management is carried out essentially with the help of software. The software usually sits on the server of the content creator and is designed to ensure secure (i.e. protected) distribution of the content over the Web. The objectives are to prevent copying or duplication and to protect the content.

As a value-added service, access to e-books and purchases through e-commerce are offered through the website. Both English and other language books, classified under various heads, can be chosen and purchased online from the website. The covers of the books are scanned and unique database numbers are assigned to the books. A person selects a book or a range of books, adds them to the cart and makes an online payment, and the book is then sent physically. In the case of e-books, the user can go through the index of the available content and then register by paying the necessary amount. Once the payment is made and the user has been authenticated, he is provided with a user name and password with which he can enter the area where the content is available for reading and printing. In this way one does not lose control over the products displayed, and the digital content is protected. Control can be maintained over operations such as playing, printing, copying and saving.

For online shopping of books, CDs, dress materials, jewellery and the like, the shopping cart metaphor is used, as it is by many online retailers: the customer sees the range of items for sale and moves an item to his or her shopping cart, and a permission must then be issued to the customer (sometimes in the form of an "offer URL") before actual access to the encrypted file is granted. The customer can then either access the item directly or go to the offer URL to download or print the content of the item.
Imagine that the online retailer is selling e-books, and you will understand how this might work.

Conclusion

DOI: The idea of the DOI needs to take off with publishers. Identifiers are easy to assign, though creating and maintaining a database needs some work. There are certain benefits for the public and for libraries at large: there are hopes that publishers will assign DOIs to out-of-print books for which they hold rights, which would benefit researchers, scholars and the public at large by helping to create a unified catalog. With such a system, books that are out of print but available in a certain library can be identified. This may be a Herculean task for libraries, but it would pay off in the long run.

As far as digital rights management is concerned, many companies worldwide are entering the business, and established ones are developing new products or enhancing older ones. Many top-line companies, including Adobe, are working on DRM technologies. Apart from this, a new set of trusted intermediaries has emerged between the content or online mall provider and the end users; they are the clearinghouses in the business chain. They handle the payment gateway and process the payments made online, and these companies work in the background. Information managers, librarians or, in our case, web writers use DRM systems extensively.

The need of the hour is for the International DOI Foundation (IDF) to sit with all media companies, try to understand the mechanism of how they work, and try to come out with standards. Publishers, libraries and others have standards, or people working and striving for standards, in their respective professions, whereas media is the only profession that has no standards as far as digital library management is concerned, at least in India. Therefore some standards or guidelines are needed, or large media houses need to share their experiences in standardization so that smaller firms can follow them.
Since every firm uses its own methods of classification, indexing, storing and identification, a huge amount of information is unnecessarily duplicated or lost due to the lack of standards. In the current scenario, librarians have to be mentally prepared to work along these lines, though there is still a clash of roles: they are not accustomed to acting as coordinators, having always been proud owners who value their possessions. In the years to come, librarians and information professionals need to go a step further and start producing digital content in the form of text, PDF, HTML, pictures, graphs, audio/video, broadband content, CDs, MPEGs, MP3s and so on.

Appendix 1: Given below are screen shots of news, photos, supplements/features, uploading systems etc.

Appendix 2: Given below are screen shots of the client side authentication and uploading system.

Notes and References:

1. Cooper, Charles, (2000): E-books: An idea still ahead of its time, ZDNet News, Issue dated August 9, 2000.
2. Reuka K, (2002): Electronic books and the future of libraries: LIST 2002, 27-28 January, 2002. (133 p)
3. Chuoksey SS (2002): On-line Library Services towards On-line education system, MANLIBNET 2002, 3-5 April 2002, (258 p)
4. Rose MJ (2001): E-Books Live On After Mighty Fall, Wired News, Issue dated Dec. 18, 2001.
5. Digital Rights Management for eBooks: Publisher Requirements, version 1.0, Association of American Publishers, New York, NY and Washington, D.C. USA, November 2000 (available at http://www.publishers.org/home/drm.pdf).
6. Bide, Mark, In Search of the Unicorn, The Digital Object Identifier from a User Perspective, BNBRF Report 89, Book Industry Communications, London, February 1998 (available at http://www.bic.org.uk/unicorn2.pdf).
7. Bernstein, Paula, DOI: A New Identifier for Digital Content. Searcher: The Magazine for Database Professionals, Vol. 6, No. 1, Jan. 1998 (available at http://www.infotoday.com/searcher/jan98/story4.htm).
8. The DOI Handbook, version 0.5.1, The International DOI Foundation, Washington, D.C. USA and Geneva, Switzerland, September 2000 (available at http://www.doi.org/handbook_2000/index.html).
9. Paskin, Norman, Digital Object Identifier: implementing a standard digital identifier as the key to effective digital rights management, The International DOI Foundation, Kidlington, Oxfordshire, United Kingdom, 2000 at 3 (available at http://www.doi.org/doi_presentations/aprilpaper.pdf).
10. Bates, Marcia J. (1988), "How to Use Controlled Vocabularies More Effectively in Online Searching," Online, 12(6), pp. 45-56.
11. Bates, Marcia J. (1998), "Indexing and Access for Digital Libraries and the Internet: Human, Database, and Domain Factors," Journal of the American Society for Information Science, 49(13), pp. 1185-1205.
12. Bates, Marcia J. (1999), "The Invisible Substrate of information Science," Journal of the American Society for Information Science, 50(12), pp. 1043-1050.
13. Bates, Marcia J. (2002), "The Cascade of Interactions in the Digital Library Interface," Information Processing and Management, 38(3), pp. 381-400.
14. Bates, Marcia J. (2002), "After the Dot-Bomb: Getting Web Information Retrieval Right This Time," First Monday, 7(7).
15. Bowker, Geoffrey C. and Star, Susan Leigh (1998) (Ed.), "How Classifications Work: Problems and Challenges in an Electronic Age", Library Trends, 47(2).
16. Svenonius, Elaine. (1983), "Use of classification in online retrieval." Library Resources and Technical Services, 27(1), pp. 76-80.