Project Document Cover Sheet
Project Information
Project Acronym: YS-IMG
Project Title: Yale-SOAS Islamic Manuscript Gallery
Start Date: 1 September 2009
End Date: 31 August 2009
Lead Institution: School of Oriental and African Studies, University of London
Project Director: John Robinson
Project Manager & contact details: Huei-Lan Liu [email protected]
Partner Institutions: Yale University
Project Web URL: http://www.soas.ac.uk/ysimg/ http://www.library.yale.edu/img/
Programme Name (and number): JISC-NEH Transatlantic Digitisation Collaboration Grants
Programme Manager: Alastair Dunning

Document Name
Document Title: Project Plan
Reporting Period:
Author(s) & project role: John Robinson
Date:
Filename: Project plan_ysimg
URL:
Access: ⌧ Project and JISC internal; General dissemination

Document History
Version 1: 26/10/09
Version 2: 24/11/09

Document title: JISC Project Plan
Project Acronym: YS-IMG
Version: v1.0
Contact: John Robinson
Date: 1 October 2009

JISC Project Plan

Overview of Project

1. Background

The majority of manuscripts accessible online today are Psalters, books of hours, bestiaries, and other medieval manuscripts from Europe. The study of Western manuscripts is well supported and enjoys high standards of critical scholarship. Middle Eastern manuscript culture is less well known, primarily because the materials are far harder for qualified scholars to examine or study. Enhancing access therefore offers great potential to materially change our understanding of Middle Eastern cultures of medieval, early modern and modern times. Yale and SOAS seek to join collections, resources, staff, and expertise to create a united source of reference works related to Middle Eastern manuscripts and to develop the technical apparatus needed for connecting reference materials to original works.
The proposed selection represents a wealth of intellectual interests that will highlight the contributions of Middle Eastern scholars, among them philosophers, poets, physicians, and scientists. The project partners plan to provide new and enhanced access to these important materials, as well as to develop a technical model that can be used by other libraries.

Increasing interest and attention have been given of late to the scholarly community that specializes in Arabic and Middle Eastern Studies. In the U.S., there has been an enormous rise in enrolments in Arabic classes. In May 2007, the Modern Language Association (MLA) published “Enrolments in Languages Other Than English in United States Institutions of Higher Education, Fall 2006,” in which Furman et al. presented comparative data on foreign language enrolments from 1998 to 2006. The authors reported that “Arabic continued its impressive expansion: from 1998 to 2002, it lifted its enrolments by 92.3%, and between 2002 and 2006 by a remarkable 126.5%.” [2]

In the UK, Ph.D. theses reflect the increase in enrolment and attention paid to Middle Eastern studies. In 1997, 55 theses were accepted; the average per year by 2006 was 86. Birmingham, SOAS, and Oxford are the top three institutions granting PhDs in Middle Eastern topics since 1997.

Concurrent with this visible growth, specific interest has been focused on the research needs of a scholarly community that was established centuries ago. In 2007, responding to an initiative in the United Kingdom regarding the importance of supporting Islamic Studies in higher education, the Joint Information Systems Committee (JISC) issued a call for an investigation into user needs within the field. The University of Exeter won the bid to complete the study, and in June of 2008 the project team published the results, based on data extracted from online questionnaires, focus groups, and telephone interviews.
In addition, the study’s authors reviewed reading lists from UK institutions, doctoral theses, and existing online gateways to Islamic Studies materials. The recommendation ranked first by the authors was the creation of a gateway to Islamic resources, including primary texts, fully digitised Islamic manuscript catalogues, and reference tools such as dictionaries and Islamic websites.

A worldwide initiative has further concentrated on manuscripts in particular. Following the First Islamic Manuscript Conference held at King's College, Cambridge, in 2005, the conference participants encouraged the founding of a global association to coordinate the efforts of scholars and librarians working with Islamic manuscripts. In the following year, forty-five founding members established The Islamic Manuscript Association (TIMA), an international group pledged to protect Islamic manuscript collections and to support those individuals working with these collections. One of TIMA's ongoing projects focuses on facilitating access to these manuscripts.

Yale and SOAS have collaborated to form a small but selective list of resources in Arabic and Western scripts from their collections. In addition, the project organizers have culled lists of existing digital copies of Arabic and Middle Eastern manuscripts held in repositories of prominent libraries around the world, for example in the digital library at the Beinecke as well as the British Library. We intend this selection to be a pilot project that is scalable and extensible to other collections on the partner campuses or in other libraries, such as the Digital Shikshapatri collection at Oxford and the Genizah manuscript collections at Cambridge.

The proposed project builds on work done at Yale University Library in three digital initiatives related to the Middle East.
First, the OACIS project (http://www.library.yale.edu/oacis/) laid the foundation by creating an electronic union list of serials published in and about the Middle East. From this beginning, the library team expanded the usability of the bibliographical catalogue by digitising full text articles from a selection of academic journals published in the Middle East. Second, with funding from the National Endowment for the Humanities (NEH), this digitisation began with the Iraq ReCollection project (http://www.library.yale.edu/digIraq/), which converted nine Iraqi journals (104,000 pages). Third, the full text entries from this digitisation effort became the first deposited articles in a searchable repository developed as part of the AMEEL project (http://www.library.yale.edu/ameel/). Like OACIS, Project AMEEL began with funding from the U.S. Department of Education. The AMEEL project also added a regional selection of academic journals spanning Tunisia to Saudi Arabia. Development work during the AMEEL project has produced the technical infrastructure for search, retrieval and display of the journal articles using the open source FEDORA repository software. At present, over 125,000 pages have been deposited, with a goal of finishing 240,000 pages by the fall of 2009.

While accomplishing these goals, the Yale project team has gained considerable expertise in Arabic text digitisation. Having started in 2005 with assistance from the digital staff at the Bibliotheca Alexandrina, the Yale team has since formulated its own digitisation workflow. Further, the team has worked to automate as much of this workflow as possible, in order to keep labor costs low while producing high quality scanned images and OCR output. Additionally, the Yale team held a digitisation workshop, geared toward the needs of U.S. academic libraries, at the November 2008 annual Middle East Studies Association conference.
(http://www.library.yale.edu/ameel/MESAworkshop/index.htm)

Digitisation projects at SOAS are increasing in number and scope. The Endangered Languages Archive (ELAR), started in 2005 as part of an international network of digital endangered language archives, permits scholars to deposit documentation and descriptions of endangered languages (http://elar.soas.ac.uk/). Begun in Fall 2008 with JISC funding, the Fürer-Haimendorf project plans to digitise, research, catalogue and mount online approximately 20,000 photographs from the Fürer-Haimendorf archive held at SOAS (http://www.soas.ac.uk/furer-haimendorf/). Christoph von Fürer-Haimendorf (1909-1995) amassed an important collection of his own photographs, film, and written materials during fifty years of scholarship on tribal cultures in South Asia and the Himalayas. The project is part of a longer-term strategy to mount online the entire Fürer-Haimendorf archive, as well as other special collections at SOAS.

The library at SOAS has not previously worked on digitisation projects involving manuscripts or printed text, as other UK institutions such as Oxford and Cambridge have. The Yale-SOAS partnership therefore aims not only to address user needs by uniting essential reference material related to Arabic and Middle Eastern manuscripts, but also to share expertise gained at Yale and increase digitisation capacities at SOAS, especially as the work relates to the digitisation of Arabic text and the integration of digital resources.

2. Aims and Objectives

Yale University Library and the School of Oriental and African Studies (SOAS) seek to improve online access to trans-Atlantic collections of digitised manuscripts, manuscript catalogues, and dictionaries by creating a virtual archive, open and freely accessible, for researchers working in the field of Arabic and Middle Eastern Studies.
2.1 Digitisation endeavour

• To create an integrated set of full-text digital resources supporting manuscript research from manuscript catalogues and dictionaries by converting materials in Arabic, Persian, and Western scripts (primarily Latin, German, Spanish, and French) and depositing these into searchable repositories.
• To augment existing digital collections of Arabic and Persian manuscripts by scanning, depositing, and indexing selected Yale- and SOAS-held historical manuscripts.

2.2 Integration Project

• To develop an infrastructure to integrate manuscripts with related reference resources.
• To build a suite of tools that will analyse digitised materials and construct internal cross-references for connecting the materials in the archive.

3. Overall Approach

3.1 Preparation and processing of materials

We will use a combination of in-house work and outsourcing, sharing lessons learned from previous projects, to complete all image capture. Varying costs will be factored into the budget to account for this hybrid approach. The manuscripts will be processed at on-campus facilities, with appropriate supervision by the relevant curators and Preservation librarians, to limit the exposure of the fragile documents to outside conditions. The scanning of the dictionaries and catalogues may be outsourced, depending on their physical condition.

Three OCR software products will be used: 1) Automatic Reader – OCR Gold from Sakhr Software Co., in Cairo, Egypt; 2) VERUS, the OCR product from NovoDynamics in Ann Arbor, Michigan; and 3) ABBYY FineReader, from ABBYY, an international company founded by David Yang for the automated translation of Russian dictionaries. ABBYY FineReader is known for high accuracy conversion of text, especially in Western scripts. Sakhr and VERUS were developed for Arabic text specifically. VERUS, due to its original design, can handle a mix of languages and degraded documents better than Sakhr.
On the other hand, Sakhr’s engine is based on a study of modern newspapers from the Middle East and thus recognizes a wider range of vocabulary. By incorporating more than one OCR software package into the digitisation workflow, we can accommodate and manage varying conditions found in the selected materials. The OCR conversion of texts with a mixture of languages may require periodic modifications to existing workflows. The adjustments will be made promptly and will be documented so that workflow knowledge may be shared with other libraries.

We will follow established best practices for TEI mark-up as well as the Project AMEEL model to include Dublin Core for repository organization and MARCXML metadata for librarian perusal. Individuals with language expertise, as well as metadata training, will manage mark-up and quality assurance tasks.

The Yale team will share best practices and guidelines with the SOAS team. For example, we will perform quality control checks on a statistical sample of all finished work, in accordance with American National Standards Institute ANSI/ASQ Z1.4-2003. A random sample equal to 10% of the total batch of files shall serve as the inspection sample for each file type. In tiered approaches employed in previous projects, a batch failing the 10% test is rechecked using a different sampling of 5% to determine final acceptance or rejection (and reprocessing) of the batch.

Experience at SOAS suggests that, while post-digitisation inspection is necessary, it is essential to build quality control into the digitisation process in order to minimize the need to reprocess material. SOAS has adopted a strategy of embedding essential metadata within the image file using IPTC tags; this reduces the risk of orphan files (where image files cannot be found by the catalogue database or subsequently identified).
It also facilitates the parallel development of the database (Phase 2) while digitisation proceeds (Phase 1), with the embedded metadata being automatically ingested into the cataloguing database when ready.

3.2 Organization of and access to materials

3.2.1 Metadata generation and Cross-reference links

All metadata created for ingest to the Fedora archive will be stored in XML format. We will begin with MARC21 records extracted from the partners’ Online Public Access Catalogue (OPAC). Within the first month of the project, the technical team will convert these records into Dublin Core (DC) and MARCXML files at the title, volume, and author levels for manuscript catalogues and dictionaries; the manuscript records will have DC and MARCXML at the title, author, and accession level. The MARCXML file will be available when viewing searched materials, so that librarians may review encoded data. The customized Dublin Core (DC) file, added at the time of ingestion into the digital archive, will include tags to describe the resource as well as external and redirect instructions in the technical metadata for those materials with recognized cross-references.

We have, initially, identified three Use Cases as follows:

♦ Use Case #1: a direct link exists from a digitised manuscript catalogue to a digital copy of a manuscript freely accessible via the Internet.
♦ Use Case #2: a “See also” link from a digitised manuscript catalogue to a digital copy of another manuscript from the same author or, if possible, the same subject.
♦ Use Case #3: a direct or a “See also” link from a digitised manuscript catalogue to a digitised dictionary entry.

We will begin the proof of concept with those Arabic or Persian manuscripts already available in Yale’s BRBL digital library. For those resources identified as existing in non-partner digital libraries, we will seek permission and technical specifications to create and maintain cross-reference links.
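The three use cases could be modelled as simple link records that are written into the Dublin Core file at ingest. The following is only a minimal sketch under stated assumptions: the `CrossReference` class, the `type` attribute on `dc:relation`, the bare `record` wrapper element, and the example identifiers and URLs are all hypothetical and are not taken from the project's actual metadata profile. Only the Dublin Core element namespace is standard.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


@dataclass(frozen=True)
class CrossReference:
    """One link from a catalogue entry to a related digital resource (hypothetical model)."""
    use_case: int     # 1, 2 or 3, following the numbering in the plan
    link_type: str    # "direct" or "see_also"
    source_id: str    # assumed identifier of the digitised catalogue entry
    target_url: str   # assumed URL of the manuscript or dictionary entry


def to_dc_record(title: str, refs: List[CrossReference]) -> ET.Element:
    """Build a small DC record with one dc:relation element per cross-reference."""
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    for ref in refs:
        relation = ET.SubElement(record, f"{{{DC_NS}}}relation")
        # The "type" attribute is an illustrative convention, not part of Dublin Core.
        relation.set("type", ref.link_type)
        relation.text = ref.target_url
    return record


refs = [
    CrossReference(1, "direct", "cat-0001", "https://example.org/ms/123"),
    CrossReference(2, "see_also", "cat-0001", "https://example.org/ms/456"),
]
record = to_dc_record("Example catalogue entry", refs)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the link type as data rather than as separate element names means a new kind of link, such as the dictionary links of Use Case #3, needs no change to the record-building code.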
3.2.2 Cross-collection searching: EPrints / Fedora [1]

Project AMEEL at Yale uses Fedora 2.2, an open source software product. The Fedora framework provides for OAI compatible harvesting for resource discovery, thus permitting other repositories to discover the newly generated metadata. EPrints, in use at SOAS and other UK academic institutions, is also open source software for OAI compliant repositories. Both repository approaches share commonalities, which will be explored to resolve the connectivity needed for searching the joined collections simultaneously. While the design of the front door to the proposed digital archive may appear the same, the underlying architecture at each library site will correspond to the requirements of the repository software in use. The technical teams from both campuses will develop modules of code that can be adapted to both software approaches.

[1] Note: Each partner will develop its own front end to host the digital objects. Yale will integrate the digitised materials with its existing archives through its AMEEL (An Arabic and Middle Eastern Electronic Library) portal site; SOAS will mount its own collections on the Digital Archives and Special Collections website. A suite of tools will be developed to enable cross-collection searching and cross-references for connecting materials in the archive.

3.2.3 Durable URLs and Citation creation

Persistent identifiers are essential in developing a digital archive, since storage media and formats are sure to change over the life of a digital library. Without persistent identifiers, citation links will cease to function and cause frustration for library patrons.
Some current standards for persistent identifiers include:

• PURLs, or Persistent Uniform Resource Locators, which use an intermediate resolution service that points to the URL of the digital object;
• DOIs, or Digital Object Identifiers, which are managed by an open membership consortium and provide a name to identify a digital object that remains unchanged over the life of the object. The location of the object may change, but the name associated with it will not;
• Handles: a handle server is a naming management system that creates a unique identifier at the time an object is added to a repository.

The Yale and SOAS technical teams will collaborate on the configuration of persistent identifiers compatible with both repository approaches.

3.3 Storage, maintenance and protection of data

The Fedora repository will reside on a Linux server at Yale; the EPrints repository will also be Linux-based but at SOAS. The archival TIFFs will be stored separately from the repository, but at each institution, which will arrange suitable off-site backup. Both teams will follow technical policies, including sensible naming conventions and file structure, so that content can be integrated effectively and efficiently as the project progresses. The servers will include systems management subscriptions to keep the server software protected and current. Information Technology support staff at the partner locations will regularly monitor server traffic, guard against attacks, and apply widely accepted security practices to development and maintenance servers.

SOAS uses a mix of out-sourced and in-house server solutions, working in partnership with the University of London Computer Centre (ULCC). It is likely that ULCC will be used to host the EPrints repository while the archive of high resolution image files will be on SOAS’s own data storage solution.

4. Project Outputs

• Digital copies of 16,800 pages in fifteen significant manuscript catalogues and six dictionaries, and sixteen historical manuscripts (approx. 3000 leaves). See Appendix A for a consolidated list of selected materials for digitisation.
• OCR text extraction and metadata of selected manuscript catalogues and Arabic and Persian dictionaries.
• Cross-reference links from the initial set of existing scanned manuscripts to newly digitised catalogues and dictionaries.
• OAI configuration for the newly digitised materials to be discovered by other electronic resources and indexed by internet search engines.
• Digitised materials to be deposited into open and freely accessible networked repositories.
• Findings regarding the new digital collection and project documentation to be published on the project website for use by other academic libraries.

5. Project Outcomes

• Support manuscript research from manuscript catalogues and dictionaries, many of which exist only in printed form with publication dates from the 19th century, by converting materials in Arabic, Persian, and Western scripts (primarily Latin, German, Spanish, and French) and depositing these into searchable repositories.
• Serve as a scalable and extensible model for other special collections and libraries rich in manuscripts and related reference materials.
• Completion of transatlantic specialist collections by making them electronically accessible via the internet.
• Enhanced preservation of rare and fragile materials.
• Highlight the contribution made to world knowledge by Arab philosophers, physicians, and scientists.

6. Stakeholder Analysis

Stakeholder: SOAS and Yale
Interest / stake: Status as leaders in Islamic studies
Importance: High

Stakeholder: SOAS and Yale Libraries
Interest / stake: Models for other special collections and libraries to follow suit
Importance: High

Stakeholder: Researchers around the world
Interest / stake: Enhanced access to valuable historical manuscripts linked to robust reference materials
Importance: High

Stakeholder: Transatlantic collections
Interest / stake: Online access to rare resources made easy through cross-repository searching
Importance: Medium to High

7. Risk Analysis

Each risk is listed with its Probability (P, 1-5), Severity (S, 1-5), Score (P x S), and the action to prevent or manage it.

Staffing

• Loss of staff, in particular specialist staff. P 3, S 5, Score 15. Action: Comprehensive documentation, including contractual obligations, will help to minimize the risk. In the case of the specialist staff, SOAS has recruited two people; Yale has cross-trained staff.
• Project manager unable to spend sufficient time on project due to other work commitments. P 3, S 4, Score 12. Action: SOAS is a small institution, so there is no full-time back-up for individual members of staff. Project tasks will be allocated to other members of the project team, effectively spreading the load. Yale’s Project Director will monitor.
• Not able to recruit suitable staff. P 2, S 4, Score 8. Action: The project team is already in place and it is intended that a number of digital photographers will be recruited from the student body at SOAS to ensure that there is a pool of competent staff to complete the project.

Organisational

• Project fails to keep to schedule. P 2, S 3, Score 6. Action: Effective monitoring, regular meetings, SMART objectives, and contingency timing built into the key targets will ensure that the project keeps to schedule.
• Project scope creep and/or over-run. P 1, S 4, Score 4. Action: The project has deliberately chosen a small list of important materials. Regular review of objectives will ensure that the project is restricted to its initial aims. Any areas of the collection identified as potential projects for the future will be documented and form part of the “next steps” strategy.
• Lack of communication between project partners. P 1, S 3, Score 3. Action: Regular team meetings and prompt sharing/distribution of documentation; key project outcomes and documents stored in central facility available to all partners; regular reviews of progress and objectives against plan.
• Cost over-run/failure to keep to budget. P 2, S 3, Score 6. Action: There will be regular reviews of financial expenditure and most of the costs are up front in terms of equipment purchase and staff costs.
• Failure to maintain metadata creation and cataloguing targets. P 2, S 4, Score 8. Action: The project team has agreed to ensure both quantity and quality outputs for cataloguing and metadata.

Technical

• Poor quality metadata. P 2, S 4, Score 8. Action: Metadata guidelines will be drawn up based on Dublin Core and other successful web-based projects of a similar nature. The team will follow mutually agreed-upon standards for ensuring quality control of the metadata.
• Poor quality images. P 2, S 4, Score 8. Action: The digitisation assistants carrying out the work have received or will receive training on the equipment. The team will test quality at regular intervals.
• Loss of digital images. P 2, S 5, Score 10. Action: Ensure back-up and off-site storage of images.
• Inappropriate storage. P 1, S 2, Score 2. Action: Workflow will include storing files on centralized resilient RAID system at both partner campuses.
• Technology failures. P 2, S 3, Score 6. Action: Essential equipment will be under warranty or support. Potential down time will be included as a contingency in the work plan.

External suppliers

• Failure of contractors to deliver Web system. P 2, S 5, Score 10. Action: Clear requirements and timescale to be agreed with contractor, with penalty clauses. Clear and frequent milestones to be agreed with contractor.

Legal

• IP issues. P 1, S 5, Score 5.
• Copyright issues. P 1, S 5, Score 5.
Action (both legal risks): Copyright permission is on file at Yale for the only title published after 1923. Yale will consult with its General Counsel. A copyright audit will be carried out in consultation with SOAS’s Information Compliance Manager.

8. Standards

Name of standard or specification, with notes:

Metadata standards: TEI, MARCXML, Dublin Core
Notes: We will follow established best practices for TEI mark-up as well as the Project AMEEL model to include Dublin Core for repository organization and MARCXML metadata for librarian perusal. Individuals with language expertise, as well as metadata training, will manage mark-up and quality assurance tasks. The Yale team will share best practices and guidelines with the SOAS team.

Image standards: TIFF, JPG
Notes: Following the Project AMEEL model, the archival format will be TIFF, while the display format, i.e. from the repository to the page viewer, will be JPG to achieve faster online delivery.

Optical Character Recognition: OCR Gold, VERUS, ABBYY
Notes: Three OCR software products will be used: 1) Automatic Reader – OCR Gold from Sakhr Software Co., in Cairo, Egypt; 2) VERUS, the OCR product from NovoDynamics in Ann Arbor, Michigan; 3) ABBYY FineReader, from ABBYY, an international company founded by David Yang for the automated translation of Russian dictionaries.

Repository: Fedora, EPrints, OAI-PMH
Notes: Project AMEEL at Yale uses Fedora 2.2, an open source software product. The Fedora framework provides for OAI compatible harvesting for resource discovery, thus permitting other repositories to discover the newly generated metadata. EPrints, in use at SOAS and other UK academic institutions, is also open source software for OAI compliant repositories. Both repository approaches share commonalities, which will be explored to resolve the connectivity needed for searching the joined collections simultaneously. The technical teams from both campuses will develop modules of code that can be adapted to both software approaches.
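To illustrate the OAI compatible harvesting named in the repository notes, the sketch below builds an OAI-PMH ListRecords request and reads record identifiers out of a response. It is a minimal example under stated assumptions: the base URL and the sample identifiers are hypothetical, and no network call is made. The `verb`, `metadataPrefix` and `set` parameters and the response namespace come from the OAI-PMH 2.0 protocol, which both Fedora and EPrints can expose.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the identifier of every record header in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:ms-1</identifier></header></record>
    <record><header><identifier>oai:example.org:ms-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai")
identifiers = record_identifiers(SAMPLE_RESPONSE)
print(url)
print(identifiers)
```

Because the same request works against any compliant endpoint, one harvesting module of this kind could in principle serve both repositories, which is the sort of shared code the plan describes.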
Standards for digitization:

Media: manuscripts
Resolution: 600ppi
Ratio: 1:1
Archival format: TIFF
Compression: uncompressed or lossless compression; no LZW
Number of digital copies: 4 (small, medium, large, thumbnail)

Media: text, bitonal
Resolution: 300ppi*
Ratio: 1:1
Archival format: TIFF**
Compression: CCITT Group 4+
Number of digital copies: 1***

Media: text, greyscale
Resolution: 300ppi*
Ratio: 1:1
Archival format: TIFF**
Compression: LZW++
Number of digital copies: 1***

* The resolution for text is set at 300ppi based on experience with text extraction using OCR software.
** Following the Project AMEEL model, the archival format will be TIFF, while the display format, i.e. from the repository to the page viewer, will be JPG to achieve faster online delivery.
*** The thumbnail will be generated at the time automated scripts deposit each digital file into the repository.
+ CCITT Group 4 is an image compression scheme named for the Comité Consultatif International Téléphonique et Télégraphique (CCITT), a telecommunications standards body created in 1956.
++ LZW is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.

9. Technical Development

9.1 Metadata generation and cross-reference links

We will conduct the preliminary analyses using an Open Source product called AraMorph, which returns morphological tokens, or categorized blocks of text for lexical analysis. The resulting index will be reviewed manually to determine rules for extracting essential keywords, titles, place names, and subject headings. In addition, we will rely on language experts to develop crosswalks, or interpretive tables, to link modern spellings to the many transliteration schemas, varying over time, which are present in the selected materials. For example, the manuscript listings for ibn Jazlah and al-Suyuti in the Bodleian catalogue and the Rieu supplement appear as Ali B. Djazla and Alsoiuthi.
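A crosswalk of this kind can be pictured as a lookup table from historical transliterations to a modern spelling. The sketch below is an illustration only: the two entries use the example names from the text, paired in the order given (an assumption about which variant matches which author), while the function names and the normalisation rules (case folding, stripping diacritics, collapsing whitespace) are hypothetical choices rather than the project's method.

```python
import unicodedata

# Variant spelling (normalised) -> modern spelling. Entries taken from the
# examples in the text; a real table would be compiled by language experts.
CROSSWALK = {
    "ali b. djazla": "ibn Jazlah",
    "alsoiuthi": "al-Suyuti",
}


def normalise_key(name):
    """Lower-case, strip diacritics and collapse whitespace for table lookup."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def modern_spelling(name):
    """Return the modern spelling for a catalogue variant, or the name unchanged."""
    return CROSSWALK.get(normalise_key(name), name)


print(modern_spelling("Ali B. Djazla"))
print(modern_spelling("  ALSOIUTHI "))
print(modern_spelling("Unlisted Name"))
```

Returning the input unchanged for unknown names keeps unmatched catalogue entries visible for manual review instead of silently dropping them.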
Further, in order to compile a full listing of existing digital copies of targeted manuscripts, we will employ two methods: 1) OAI harvesting of appropriately configured online databases, and 2) student workers with language skills, who will conduct Internet searches and review the OAI harvested results. The Yale team will share the methodology and results with the SOAS technical team in order to determine mutual processes for link creation.

9.2 Page viewing

As part of the AMEEL project at Yale, the technical team has developed a page turner to simulate the reading experience, with adjustable page size and navigation. The technical team for the proposed project will work to adapt these methods, as well as other suitable Open Source methods, as needed to deliver a page viewer of high usability for library patrons. There are two basic outcomes from this effort: 1) to permit search word highlighting, and 2) to manage oversize displays from manuscript folios, catalogues, and dictionaries that allow the patron to work with more than one display at a time as well as enlarge specified sections of the works.

9.3 Repositories and cross-collection searching

Yale’s Project AMEEL has developed a full-text repository using the Fedora open source framework; SOAS uses EPrints for its digital libraries. The two technical teams will create the necessary tools to permit cross-collection transatlantic searching of each other’s archive, regardless of the entry point chosen by the library patron. We will ensure that interoperability between EPrints (SOAS) and Fedora (Yale) is a key goal during the project.

10. Intellectual Property Rights

Out of copyright resources will be our first priority for this digitisation project, in order to avoid the time-consuming efforts needed for seeking copyright permission.
However, exceptions will also be considered if they enhance accessibility. We will carry out copyright clearance and obtain permissions for digitisation and online access. The IPR of the digitised images, metadata and transcripts of the manuscripts will lie with SOAS and Yale.

Project Resources

11. Project Partners

Yale University Library
Yale has extensive experience in digitising Arabic material and is an acknowledged expert in applying OCR systems to Arabic script. Its role is to share best practices and knowledge in digitisation and OCR text extraction with the SOAS team, and to select and digitise materials that are lacking in SOAS Library.

12. Project Management

The Project will be under the supervision of the Digitisation Project Board, established to oversee the running of digitisation projects at SOAS. The Project Team will be responsible for delivering the project. Conference calls between project partners will be scheduled monthly, usually on the last Friday of each month, to report on progress and agree actions and deadlines for the next stage.

Project title, name, and responsibilities:

• Principal Investigator and Project Director (SOAS): John Robinson. John is the Director of Library and Information Services.
• Principal Investigator (Yale): Ann Okerson. Ann is Yale's Associate University Librarian with specific responsibility for Collections Development and International Programs.
• Project Director (Yale): Elizabeth A. S. Beaudin. Elizabeth is the Manager of International Digital Projects, Yale.
• Curator (SOAS): Narguess Farzad. Narguess is a senior fellow in Persian in the Department of Languages and Cultures of Near and Middle East.
• Curator (Yale): Simon Samoeil. Simon has been Curator of the Near East Collection at Yale Library since 1990 and is a member of the Yale Council on ME Studies.
• Project Manager (SOAS): Huei-Lan Liu. Responsible for the management, coordination and administration of the project. Huei-Lan is the Repository Support Officer.
• Academic Advisor (SOAS): Peter Colvin. Peter is a former specialist librarian for Islamic Middle East in SOAS Library.
• Project Technical Consultant (SOAS): Malcolm Raggett. Malcolm is the Head of the Centre for Digital Asia, Africa and the Middle East.
• Systems Programmer (Yale): Xinjian Guo. Guo is a senior member of the central Library IT staff.
• Preservation and Collection Care Librarian (Yale): Ian Bogus. Ian is Head of the Collections Care unit in the Preservation department.
• Archivist (Yale): Bill Landis. Bill is Head of Arrangement, Description, & Metadata Coordinator.
• Digitisation Assistant (SOAS): Alex Shipman. Responsible for digitising and scanning images and the input of basic metadata.

13. Programme Support

Facilitate cooperation with other JISC supported Islamic digitisation projects.

14. Budget

See Appendix B

Detailed Project Planning

15. Workpackages

See Appendix C

16. Evaluation Plan

• Timing: Oct 2009 – Jun 2010. Factor to Evaluate: Digitisation of 3,000 folio images; 16,800 page images. Questions to Address: Completing on schedule. Method(s): Monitoring progress at Project team meetings. Measure of Success: 100% of images are completed to schedule.
• Timing: Oct 2009 – Aug 2010. Factor to Evaluate: OCR text extraction and consequent metadata creation. Questions to Address: Lexical analysis of OCR-extracted text. Method(s): Quality control checks on a statistical sample of all finished work. Measure of Success: A random sample equal to 10% of the total batch of files being accepted.
• Timing: Jan 2010 – Aug 2010. Factor to Evaluate: Cross-referencing. Questions to Address: Usability testing. Method(s): Identify 12 cross-reference links to be created during OCR extraction. Measure of Success: 100% of the twelve links properly produced from automated scripts developed during the project.
• Timing: Jan 2010 – Aug 2010. Factor to Evaluate: Cross-collection searching. Questions to Address: Usability testing. Method(s): Usability study with control groups of undergraduate and graduate students and specific types of searches. Measure of Success: We will judge success when students can retrieve 80% or more of the search materials.
• Timing: Aug 2010 onwards. Factor to Evaluate: Usage statistics. Questions to Address: How visitors find the content, where they are from, which content is popular. Method(s): Collection of a range of usage statistics (website traffic; searches; downloads) for further analysis. Measure of Success: Ongoing monitoring.

17. Quality Plan

• Output: XML mark-up and metadata. Timing: Sep 2009 – Dec 2009. QA method(s): Inspection of randomly selected batch files. Evidence of compliance: Conformant with international metadata/encoding standards. Quality responsibilities: Yale/SOAS. Quality tools (if applicable): TEI, Dublin Core, MARCXML.
• Output: Creation of scanned images. Timing: Oct 2009 – Jun 2010. Quality criteria: Image quality. Evidence of compliance: Images meet the required standards. Quality responsibilities: Yale/SOAS. Quality tools (if applicable): TIFF, JPG.
• Output: Usability of cross-collection search. Timing: Jan 2010 – Aug 2010. QA method(s): Usability study with control groups of students. Evidence of compliance: Minimum 80% retrieval success rate. Quality responsibilities: Yale/SOAS.

18. Dissemination Plan

• Timing: Oct 2009. Dissemination Activity: Wiki. Audience: Project team members. Purpose: To share project documentation and communication.
• Timing: Oct 2009. Dissemination Activity: Project website. Audience: JISC, SOAS students and staff, academic community, other digitisation projects. Purpose: To highlight the project, comply with JISC requirements and Key Message: The existence of the project and that advice and feedback are
• Timing: Oct 2009. Dissemination Activity: One day workshop.
27th October 2009 SOAS Project team members Dec 2009 Presentation at NACIRA (National Conference for Information Resources on Asia). December 2009 Presentation at MELCOM (Middle East Libraries Committee) UK Meeting. 12th or 13th January 2010 Presentation at MELCOM International Conference. Cordoba, 19th-21th April 2010 Presentation at TIMA (The Islamic Manuscript Association). Cambridge, 8th-10th July 2009 Project launch event Academic community Jan 2010 Apr 2010 July 2010 July 2010 encourage feedback about the process To familiarise with issues related to digitisation project To promote the existence of this resource Highlight the contribution made to world knowledge by Arab philosophers, physicians, and scientists To promote the existence of this openly accessible online resource The availability of the online resource relating to Islamic studies Academic community Academic community welcome as part of the process Academic community Academic and state sector communities 19. Exit and Sustainability Plans Project Outputs Digitised materials Action for Take-up & Embedding Digital images are captured according to preservation standards Findings regarding the new digital collection and project documentation to be published for use by other academic libraries Documentation OCR text extraction Cross-collection searching Project Outputs Incorporate different OCR software packages into the digitisation workflow Digitised materials to be deposited into partner’s repositories Why Sustainable Action for Exit Outputs checked and approved by preservation experts Ensure all procedures and technical standards are documented and made available on project Wiki or website Quality control of conversion of texts and periodic modifications to existing workflows Infrastructure is implemented to permit cross-collection transatlantic searching Online repository Will continue to be maintained by SOAS Digitised materials Will be preserved by SOAS Scenarios for Taking Forward Investigate 
further funding opportunities to enhance and expand content To upgrade/migrate to new hardware/software formats Metadata Standards Maintained by SOAS To be used and enhanced Page 14 of 24 Document title: JISC Project Plan Issues to Address Further funding opportunities Ensuring built in capacity to fund upgrades Funding and staff Project Acronym: YS-IMG Version: v1.0 Contact: John Robinson Date: 1 October 2009 to support new digitisation projects Appendixes Appendix A. List of Selected Materials Page 15 of 24 Document title: JISC Project Plan capacity to maintain and extend this resource Project Acronym: YS-IMG Version: v1.0 Contact: John Robinson Date: 1 October 2009 Page 16 of 24 Document title: JISC Project Plan Project Acronym: YS-IMG Version: v1.0 Contact: John Robinson Date: 1 October 2009 Page 17 of 24 Document title: JISC Project Plan Project Acronym: YS-IMG Version: v1.0 Contact: John Robinson Date: 1 October 2009 Page 18 of 24 Document title: JISC Project Plan Project Acronym: YS-IMG Version: v1.0 Contact: John Robinson Date: 1 October 2009 Appendix B. Project Budget Project Acronym: YS-IMG Version: v1.0 Contact: John Robinson Date: 1 October 2009 Page 20 of 24 Document title: JISC Project Plan Project Acronym: Version: Contact: Date: Appendix C. Workpackages JISC WORK PACKAGE WORKPACKAGES Month 1: 2: 3: 4: 5: Sep 09 Sep 09 Sep 09 Sep 09 Oct 09 Project Initiation Integration Digitisation Dissemination Evaluation 1 2 3 4 5 6 7 8 9 10 11 12 X X X X X X X X X X X X X X X X X X X X X X X X X X X x X X X x X X X x X X X X X X X X X Project start date: 1 September 2009 Project completion date: 31 August 2009 Duration: 12 months Page 21 of 24 Document title: JISC Project Plan Last updated: April 2007 13 14 15 16 17 18 19 20 21 22 23 24 Project Acronym: Version: Contact: Date: Earliest start date Latest completion date 1. Wiki Sep 2009 Oct 2009 Tikiwiki site for use by teams 2. Project Web Site Oct 2009 Oct 2009 Initial project promotion site Yale/SOAS 3. 
Recruitment of staff Sep 2009 Nov 2009 Digitisation assistant recruited SOAS 4. Text Analysis Sep 2009 Dec 2009 Mapping of extracted text for keyword and subject heading 12/09 delivery of final mapping to tech team Curatorial teams 5. Lexical Token Creation Oct 2009 Mar 2010 Modification of AraMorph to generate extracted keywords test dataset by 01/10; working specs by 03/10 Yale 6. Metadata Identification: TEI, DTD creation Sep 2009 Dec 2009 TEI schema for ingest ingest of test dataset Yale/SOAS 7. Metadata Mark-up tool Oct 2009 Dec 2009 Online forms for use with TEI schema first use by team 01/10 Yale/SOAS Workpackage and activity Outputs Milestone Responsibility YEAR 1 WORKPACKAGE 1: Project Initiation Objective: Staff recruitment, Project website Entry of meeting notes; subsequent monthly updates Yale/SOAS WORKPACKAGE 2: Integration Objective: Infrastructure development, analysis of digitised materials Page 22 of 24 Document title: JISC Project Plan Last updated: April 2007 Project Acronym: Version: Contact: Date: 8. Page Turner Oct 2009 Aug 2010 Modification to incorporate MSS images and active links prototype demonstrated 02/10 Yale 9. Citation Generation Oct 2009 Aug 2010 Bookbag features – save query, research notes prototype demonstrated 04/10 Yale/SOAS 10. Cross-Collection Searching Jan 2009 Aug 2010 Ground truth testing of known data deposited into repository usability testing 02/10 Yale/SOAS 11. 12 MSS links – manual link generation Oct 2009 Dec 2009 Ground truth testing of 12 links usability testing 02/10 Yale/SOAS 12. MSS links – automated link generation Jan 2010 Aug 2010 Ground truth testing of 12 links usability testing 07/10 Yale/SOAS 13. Workshop on Text Digitisation Oct 2009 Oct 2009 Participation and workbook 10/27/09 workshop EB 14. Scanning Oct 2009 Jun 2010 3000 folio images; 16,800 page images 15. OCR Processing Oct 2009 Aug 2010 16,800 page conversions half by 02/10 Yale/SOAS 16. 
Quality Control Oct 2009 Aug 2010 10% randomly selected from 16,800 page conversions half by 02/10 Yale/SOAS 17. Deposit Objects to Archives Oct 2009 Aug 2010 3000 folio images; 16,800 page images; metadata files for each 33% by 02/10 Yale/SOAS WORKPACKAGE 3: Digitisation Objective: scanning, depositing, and indexing materials Page 23 of 24 Document title: JISC Project Plan Last updated: April 2007 Yale/SOAS Project Acronym: Version: Contact: Date: Oct 2009 Aug 2010 Publishable documentation reviewed by team by 05/10 19. Press Releases; Talks at conferences Sep 2009 Aug 2010 Press releases Presentations at beginning and completion MELCOM 2010 20. Launch Event Jul 2010 Jul 2010 21. Sustainability Planning Jan 2010 Aug 2010 Mission statement; text for distribution to possible funding sources first session- 01/10; second session 04/10 Yale/SOAS 22. Usability Study with focus group Feb 2010 Aug 2010 Full text content searchable by student testers Able to retrieve 80% or more of the search materials Yale/SOAS 23. Functionality testing July 2010 Aug 2010 Metadata harvesting; search and retrieval; page viewing; citation generation System meet specifications Yale/SOAS 24. Final Project Report Aug 2010 Aug 2010 Report submitted to JISC 18. Compile Workflow Documentation Yale/SOAS WORKPACKAGE 4: Dissemination Objective: to highlight the project, promote the existence of this resource AO /EB /PC PC / HL WORKPACKAGE 5: Evaluation Objective: functionality and usability study Members of Project Team: AO=Ann Okerson (Yale) EB=Elizabeth Beaudin (Yale) PC=Peter Colvin (SOAS) HL=Huei-Lan Liu (SOAS) Page 24 of 24 Document title: JISC Project Plan Last updated: April 2007 HL
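
Illustration of the quality-control sample (activity 16 and the Evaluation Plan): the plan calls for a random sample equal to 10% of each accepted batch, which for 16,800 page conversions is 1,680 files. The following is a minimal Python sketch of how such a sample might be drawn. The function name, the file-name pattern, and the seed are hypothetical and are not part of the project's tooling; the plan does not specify how the sample is to be selected.

```python
import random


def draw_qc_sample(batch, fraction=0.10, seed=None):
    """Return a sorted random sample of `fraction` of the items in `batch`.

    Sampling is without replacement, so no file is checked twice.
    A fixed `seed` makes the selection repeatable for audit purposes.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    items = list(batch)
    if not items:
        return []
    size = max(1, round(len(items) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(items, size))


# Hypothetical file names standing in for the 16,800 page conversions.
pages = [f"page_{n:05d}.xml" for n in range(1, 16801)]
sample = draw_qc_sample(pages, seed=2009)
print(len(sample), "files selected for inspection")
```

Recording the seed alongside each batch would let a second reviewer reproduce the same selection; leaving `seed` unset gives a fresh selection on every run.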