JISC Grant Funding Call Name of Programme & Strand:

Cover Sheet for Proposals
JISC Grant Funding Call
Name of Programme & Strand: Information Environment 2011 Programme: Deposit of research outputs and Exposing digital content for education and research
Programme Tags: "INF11" and "JISCexpo"
Name of Call Area Bidding For: Strand B - Expose
Name of Lead Institution: University of Birmingham
Name of Department where project would be based: Institute for Textual Scholarship and Electronic Editing, School of Philosophy, Theology and Religion
Full Name of Proposed Project: Linking documents, works and texts
Full Contact Details for Primary Lead and/or Contact for the Project:
  Name: Peter Robinson
  Position: Senior Research Fellow
  Email: [email protected]
  Tel: 0121 415 8441
  Skype/VoIP: peterrr73
  Address: College of Arts and Law, University of Birmingham
  Postal Code: B29 6LG
Length of Project: 9 months
Project Start Date: 1 June 2010
Project End Date: 28 February 2011
Total Funding Requested from JISC: £48,315
Funding Broken Down over Financial Years (April - March): £48,315
Project Description / Abstract: This project will deploy an ontology of works, documents and texts to create over 500,000 RDF records. It will link these to item-level records for the works and documents and to books and articles dealing with specific text segments and manuscripts. These records will be made available over the web for harvesting by RDF federators. The project will create and document a public access demonstrator, showing how a dynamic interface can be built on the harvested RDF records. A public report will outline the lessons learnt from this work.
Keywords describing project: Web 2.0; Resource Discovery; Research and innovation; Digital libraries
I have looked at the example FOI form at Appendix B and included an FOI form in the attached bid: YES
I have read the Call, Briefing Paper and associated Terms and Conditions of Grant at Appendix D: YES
1. Project overview
1.1 Summary
1.1.1 This project begins with a request from a scholar: show me all the manuscripts which have the New Testament Greek text of Chapter 1, Verse 1, of the Gospel of John. Further requests will follow: for the exact pages of those manuscripts containing that verse; for digital images of those pages; for transcripts of the text on those pages; for annotations on these; and for links to articles and books discussing these pages. Many of these materials are on the web. Yet locating them is extraordinarily difficult: a highly skilled expert scholar could spend hours with search engines and portals, and still not find all there is.
1.1.2 This is exactly the kind of task for which Web 2.0 technologies were created. We could
create unambiguous metadata for each of the objects mentioned in the last paragraph; web
crawlers could harvest all this metadata, and purpose-designed search engines could lead the
reader to the materials sought. More specifically: we could use an ontology to define the various
objects (manuscripts, texts, images, books and more) and their relationships. We could then use
RDF statements, following the 'linked data' model, to populate our ontology with manuscripts,
texts, images and other resources. We could place the RDF statements in a repository, open to
harvesting; we could offer an interface with key search functionality, and open our metadata so
that others can build their own access routes to our data.
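To make the linked-data model concrete, the sketch below writes the kind of statements just described for a single work, document, page and text. It uses the N-Triples serialization of RDF. It is a minimal illustration under stated assumptions. The example.org namespaces and the class and property names (Work, Document, Text, isPartOf, isTextOf, isTextIn, occursOn) are hypothetical placeholders, not the terms of the ontology this project will use. Only rdf:type and rdfs:label are standard vocabulary.

```python
# Minimal sketch: N-Triples for one work, one document, one page and the text
# that joins them. EX and ID are hypothetical namespaces; the class and
# property names are illustrative placeholders, not the project's ontology.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/ontology/"   # hypothetical ontology namespace
ID = "http://example.org/id/"         # hypothetical namespace for instances


def iri(value: str) -> str:
    """Write an IRI in N-Triples form: <...>."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Write a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def statement(s: str, p: str, o: str) -> str:
    """One N-Triples statement: subject, predicate, object and a final dot."""
    return f"{s} {p} {o} ."


def describe_text(work_id, work_label, doc_id, doc_label, page_id):
    """Declare a work, a document, one page of the document, and the text
    that instantiates the work in that document on that page."""
    work = iri(ID + "work/" + work_id)
    doc = iri(ID + "document/" + doc_id)
    page = iri(ID + "document/" + doc_id + "/" + page_id)
    text = iri(ID + "text/" + work_id + "/" + doc_id)
    return [
        statement(work, iri(RDF_TYPE), iri(EX + "Work")),
        statement(work, iri(RDFS_LABEL), literal(work_label)),
        statement(doc, iri(RDF_TYPE), iri(EX + "Document")),
        statement(doc, iri(RDFS_LABEL), literal(doc_label)),
        statement(page, iri(EX + "isPartOf"), doc),
        statement(text, iri(RDF_TYPE), iri(EX + "Text")),
        statement(text, iri(EX + "isTextOf"), work),
        statement(text, iri(EX + "isTextIn"), doc),
        statement(text, iri(EX + "occursOn"), page),
    ]


if __name__ == "__main__":
    for line in describe_text("John", "The Gospel of John",
                              "Sinaiticus", "Codex Sinaiticus", "page-1"):
        print(line)
```

Each statement can be harvested on its own. A crawler that also finds statements about images of the same page URI can join the two without any knowledge of the interface that produced either.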
1.1.3 That is what this project will do, for the digital data it holds for four large textual traditions: the Greek New Testament; Dante's Commedia and Monarchia; and Chaucer's Canterbury Tales. It will use an ontology of works, documents and texts to answer the first question given above: show me the manuscripts (documents) which contain a text of the work the Gospel of John.[1] Linkage to further ontologies will permit the reader to find resources relating to these documents, texts and works: images, transcripts, catalogue entries, and online and offline materials of every kind. Much work has already been done in creating ontologies for the declaration and linkage of 'item-level' objects (books, articles), for example in the EU Discovery and DELOS projects and in the FRBR-based AUSTLIT ontology. This project will extend these 'item-level' ontologies through an ontology of works, texts and documents jointly developed by the PI and Federico Meschini (Loyola University). The project's base, the Institute for Textual Scholarship and Electronic Editing (ITSEE), holds some fourteen thousand pages of manuscript transcripts and other information. This material is drawn from over 140 manuscripts across four major textual traditions: the Greek New Testament, Chaucer's Canterbury Tales, and Dante's Monarchia and Commedia. The project will create RDF triple records for every text of each of these four works on every one of these pages: around 500,000 records.[2] It will link these to item-level RDF records for the works and documents. A reader will then be able to go seamlessly from an entry for the Gospel of John to a listing of the chapters and verses which compose it, to the manuscripts and manuscript pages which hold these, and to the images and transcripts on the web for those pages. Linkages will also be made in the other direction for two of the traditions, the New Testament and the Canterbury Tales. These will point to books and articles dealing with specific text segments and manuscripts, as catalogued in the Birmingham Research Publications Database and in the Chaucer Bibliography Online.
[1] The use of the terms 'document', 'work' and 'text' follows the usual practice of textual scholarship (e.g. G. Thomas Tanselle, A Rationale of Textual Criticism). A 'document' is the physical carrier of a text, e.g. Codex Sinaiticus (cf. FRBR 'item'; CIDOC-CRM 'information carrier'). A 'work' is the intellectual object, e.g. the Gospel of John (cf. FRBR 'work'; CIDOC-CRM 'information object'). A 'text' is an instance of a work in a document, as in the text of the Gospel of John in Codex Sinaiticus. For FRBR, 'Functional Requirements for Bibliographic Records', see the IFLA website http://www.ifla.org/publications/functional-requirements-for-bibliographic-records. For CIDOC-CRM, an effort comparable to FRBR for cultural heritage documentation, see http://cidoc.ics.forth.gr/.
[2] A separate bid by the PI to the 'deposit' strand of this JISC call also proposes to create metadata for these same materials. However, the aims of the two bids do not overlap, though some economies in resource use will be achieved if both are funded. These economies will be applied to further development of the interface tools to be developed by the two projects.
1.1.4 These records will be made available over the web for harvesting by RDF federators. The project will then create and document a public access demonstrator, showing how a dynamic interface can be built on the harvested RDF records. Some of the RDF records will be derived from data held by ITSEE's partners in Münster and Florence, to demonstrate the handling of distributed records. A public report will outline the lessons learnt from this work.
1.2 Response to JISC Objectives
1.2.1 This project has three parts:
1. Creation of some 500,000 RDF records from existing research data;
2. Creation of an open-access demonstrator;
3. A publicly available report detailing the lessons of this project, in terms of barriers encountered, opportunities exposed, and paths for further exploitation.
These three parts correspond to the three areas of work detailed in §29 of the grant funding call. In summary, the project will create a body of data of sufficient critical mass to test the hypothesis of this JISC call. That hypothesis is that the linked data model, as outlined in the "Four rules for Linked Data" (http://data.gov.uk/wiki/Linked_Data), will have considerable benefits for research data.
1.3 Value to the JISC community
1.3.1 Over the last two decades, many projects around the world have accumulated large quantities of digital data relating to documents, texts and works. Almost all of this has been encoded at the level of the page (for example, digital images of manuscript pages). Much has also been encoded at the level of parts of works (for example, transcripts of particular pages holding particular parts of works).
1.3.2 However, there is a significant gap between the capacities of cataloguing systems and the level of detail now available within this data. Standard cataloguing systems inherit the print model of item-level identification, and so are extremely powerful at identifying particular copies of particular books. More recently, the application of higher-level abstractions to item-level records has permitted more complex grouping and retrieval. The well-known FRBR entity definitions represent documents, works and texts as defined by this project. They also represent persons, corporate bodies, concepts, objects, events and places, as well as the relationships between these entities. CIDOC-CRM provides similar abstractions for cultural heritage materials. A harmonization of the FRBR and CIDOC-CRM definitions has been created as a single ontology ('Modelling Intellectual Processes: The FRBR-CRM Harmonization', at http://cidoc.ics.forth.gr/docs/doer_le_boeuf.pdf). This concentration on item-level data has created successful models for complex linkages of, for example, books and articles. The AUSTLIT database (http://www.austlit.edu.au/), built entirely on FRBR ontologies, illustrates this. Through AUSTLIT one can go to an author and see a list of that author's works. One can then see the various expressions of those works (films and plays, as well as novels, poems and articles), and be taken to catalogue records for individual copies of those items (and, recently, to electronic versions of those).
1.3.3 In the print world, there was no need to consider how records might point to individual pages of documents, or to separate parts of works. In the digital world, however, the standard unit of information is one browser screen: an image or text transcript of a single manuscript page. Accordingly, there is now (as stated) a large body of 'born-digital' materials for which we have information below the item level, often in immense amounts. To take just one dramatic instance, Codex Sinaiticus as a whole might be just one 'item-level' record. Yet for it we have information about the exact placing of each of the half-million words transcribed from the manuscript on each of its 800 surviving pages. For each word, we also know exactly its place in the verse, chapter and biblical book to which it belongs: altogether, over one million separate pieces of information. An ontology which extends linkages between textual objects below the item level will permit all this to be expressed in a form accessible through existing item-level ontologies. One could then (in the AUSTLIT example) navigate further, to a particular digital image of a particular page.
1.3.4 It would be difficult to overstate the possible impact of this work on that part of the JISC community which deals with documents and the texts contained in them. At present, finding individual parts of works or documents is difficult, and usually dependent on a particular project interface (for example, the Codex Sinaiticus interface at www.codexsinaiticus.org). The methods proposed by this project will make finding a part of a work or document precise and certain. Expressing this information as publicly available metadata will make it possible for users to go directly to the resource, independent of the project interface. See further 2.5 below.
1.3.5 This project may also prepare the way for a much larger impact. Typically, texts exist in
many copies, distributed in many places. Finding, identifying, digitizing, transcribing and editing
them is a task for whole communities, not just for the very few scholars who have so far been
able to do this work. But for this to happen, we need a secure means of identifying all the
individual parts of all these individual texts. This project offers a crucial first step towards that.
1.4 Innovation
1.4.1 As explained in the last section, much work has been done within the broad digital library community on item-level records and on the relationships among them and other entities. Very little work has been done on developing a formal ontology for constructs below the item level. Here is a paradox, and an opportunity. The paradox is that almost all digital projects dealing with text find a need to declare objects below the item level, for example by stating the sequence of pages in a manuscript. Yet very little of this information is exposed to public view: typically, the reader must go through the project interface to reach it. The opportunity is to create a means by which this data can be reliably and efficiently exposed.
1.4.2 With Federico Meschini, the PI has developed an ontology for documents, works and texts. It provides the granularity required to support identification and linkage below the item level. This reaches not just to the level of the page or text segment, but to words, to individual characters, even to the smallest mark on the page. The ontology is based on some five years of preparation by the PI (first published in 'Current directions in the making of digital editions: towards interactive editions', Ecdotica 2007). It will be formally presented in a joint paper at the ADHO 2010 conference in London. This project will be the first substantial instantiation of this ontology.
1.4.3 Do we need a new ontology? Are there existing systems which could achieve what we
want? There are three other efforts to create formal structures which might be used to address the
needs of this project. The first is the Canonical Text Services initiative
(http://chs75.harvard.edu/projects/diginc/techpub/cts). The CTS scheme is not expressed as a
formal ontology; it does not provide a secure discrimination between documents, works and texts;
it is optimized for efficient retrieval of text fragments by applications, rather than formal
definition of a scheme for labelling fragments. However, translation of CTS data to and from the
ontology here proposed would be straightforward.
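A short sketch can illustrate the claim that CTS data could be translated into the proposed ontology. It parses a simple CTS URN of the form urn:cts:NAMESPACE:TEXTGROUP.WORK[.VERSION]:PASSAGE and maps the work and passage onto a URI for a segment of a work. The target namespace and URI layout are hypothetical, and the URN in the usage line is illustrative. The version component is parsed but deliberately not carried into the work-segment URI. Linking a version to a particular document or text would need further statements, which are not shown.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical namespace for work segments in the project's ontology.
WORK_BASE = "http://example.org/id/work/"


@dataclass(frozen=True)
class CtsUrn:
    namespace: str            # e.g. "greekLit"
    textgroup: str            # e.g. "tlg0031"
    work: str                 # e.g. "tlg004"
    version: Optional[str]    # edition or translation, if given
    passage: Optional[str]    # citation such as "1.1", if given


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split urn:cts:NAMESPACE:TEXTGROUP.WORK[.VERSION[.EXEMPLAR]][:PASSAGE].
    Any exemplar component is ignored in this sketch."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0].lower() != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    work_parts = parts[3].split(".")
    if len(work_parts) < 2:
        raise ValueError(f"CTS URN lacks a work component: {urn!r}")
    version = work_parts[2] if len(work_parts) > 2 else None
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(parts[2], work_parts[0], work_parts[1], version, passage)


def to_work_segment(urn: CtsUrn) -> str:
    """Map a CTS URN to a (hypothetical) URI for a segment of a *work*.
    The version is dropped: a work segment such as 'chapter 1, verse 1'
    is independent of any particular edition or document."""
    uri = f"{WORK_BASE}{urn.namespace}/{urn.textgroup}.{urn.work}"
    if urn.passage:
        uri += "/" + urn.passage.replace(".", "/")
    return uri


if __name__ == "__main__":
    # Illustrative URN; prints http://example.org/id/work/greekLit/tlg0031.tlg004/1/1
    print(to_work_segment(parse_cts_urn("urn:cts:greekLit:tlg0031.tlg004:1.1")))
```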
1.4.4 The other two possible methods are both digital library systems which create 'digital wrappers' for related objects and define the relationships among them. These are the Library of Congress Metadata Encoding and Transmission Standard (METS: http://www.loc.gov/standards/mets/mets-home.html) and the Open Archives Initiative Object Reuse and Exchange (OAI-ORE: http://www.openarchives.org/ore/). These provide powerful systems for managing objects within digital libraries. However, the resources this project addresses may be anywhere, not necessarily within digital library systems: that is the nature of linked data. Indeed, many of the objects to be referenced by this project are not digital at all. A manuscript is made of parchment, not bytes. A particular strength of the linked data model is its clear distinction between an 'information resource' (an object in digital form, such as a digital image) and a 'non-information resource' (an object not in digital form, such as a manuscript).[3] That said, there are elements within these systems which are highly relevant to the needs of this project. For example, from this project's metadata one could readily create a sequence of digital images representing all the pages of a manuscript. From this, a METS list of all the images could be created and then sent to a METS image viewer. This suggests that the linked data to be created by this project and systems such as METS and OAI-ORE are complementary. This project's ontology could be used to declare fine-grained relationships among objects, which would then enable intelligent handling of these objects by digital library systems.
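The METS route mentioned above might look like the following minimal sketch. It uses Python's standard xml.etree.ElementTree to turn an ordered list of page labels and image URLs, as could be read off this project's metadata, into a METS fileSec and physical structMap. The image URLs are hypothetical and the MIME type is an assumption. The output omits METS headers and descriptive metadata, and it has not been validated against the METS schema. It is meant only to show how little is needed once the page sequence is known.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace (ElementTree's {uri}tag form)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(pages):
    """Build a minimal METS document from an ordered list of
    (page label, image URL) pairs. Only a fileSec and a physical structMap
    are produced; no headers or descriptive metadata."""
    root = ET.Element(mets("mets"))
    file_grp = ET.SubElement(ET.SubElement(root, mets("fileSec")),
                             mets("fileGrp"), USE="image")
    struct_map = ET.SubElement(root, mets("structMap"), TYPE="physical")
    manuscript = ET.SubElement(struct_map, mets("div"), TYPE="manuscript")
    for order, (label, url) in enumerate(pages, start=1):
        file_id = f"IMG{order:04d}"
        file_el = ET.SubElement(file_grp, mets("file"), ID=file_id,
                                MIMETYPE="image/jpeg")  # assumed image format
        # FLocat points at the image; the href attribute is in the XLink namespace.
        ET.SubElement(file_el, mets("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        # One page div per image, in manuscript order, pointing at its file.
        page = ET.SubElement(manuscript, mets("div"), TYPE="page",
                             ORDER=str(order), LABEL=label)
        ET.SubElement(page, mets("fptr"), FILEID=file_id)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_mets([("1r", "http://example.org/images/ms1/1r.jpg"),
                      ("1v", "http://example.org/images/ms1/1v.jpg")]))
```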
2. Project plan
2.1 Timetable and deliverables
2.1.1 Months 1-3: Creation of c.500,000 RDF records from materials held within ITSEE, estimated as follows. For the Greek NT: 100 mss of parts of the NT, each of 50 pages, each page containing 30 verses = 100 × 50 × 30 = 150,000. Dante's Monarchia: 22 × 80 × 25 = 44,000. Dante's Commedia: 9 × 200 × 70 = 126,000. Canterbury Tales: 12 × 500 × 35 = 210,000. Total: 530,000.
Deposit of these in the University Institutional Repository; exposure of these to RDF
federating systems. Creation of a project website.
Deliverable 1: the RDF records, mounted on the Birmingham IR
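The record estimate in 2.1.1 is simple multiplication: one record for each text unit on each page of each manuscript. The sketch below recomputes it. It takes 30 verses per page for the New Testament, which is the figure that yields the stated 150,000. The per-page unit for the other three traditions is not stated, so it is left generic here. The totals come to 530,000 records from 143 manuscripts and 14,560 pages. These figures are in line with the 'c.500,000' records, 'over 140 manuscripts' and 'some fourteen thousand pages' given elsewhere in this proposal.

```python
# Per-tradition estimates as stated in 2.1.1:
# (manuscripts, pages per manuscript, text units per page).
# "Text units" are verses for the New Testament; for the other traditions
# the unit (line, verse or segment) is not specified.
ESTIMATES = {
    "Greek New Testament": (100, 50, 30),
    "Dante, Monarchia": (22, 80, 25),
    "Dante, Commedia": (9, 200, 70),
    "Chaucer, Canterbury Tales": (12, 500, 35),
}


def records_per_tradition(estimates):
    """One RDF record per text unit per page: manuscripts x pages x units."""
    return {name: mss * pages * units
            for name, (mss, pages, units) in estimates.items()}


counts = records_per_tradition(ESTIMATES)
total_records = sum(counts.values())
total_manuscripts = sum(mss for mss, _, _ in ESTIMATES.values())
total_pages = sum(mss * pages for mss, pages, _ in ESTIMATES.values())

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: {n:,}")
    print(f"Total: {total_records:,} records, {total_manuscripts} manuscripts, "
          f"{total_pages:,} pages")
```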
2.1.2 Months 4-6: Building two access demonstrators. The first will show navigation strategies for movement within the RDF records created by this project, from documents and works to their parts. It will show how alternative interfaces to the data may be developed from the metadata alone. The second demonstrator will show linkages between the RDF records created using the project ontology and other resources. We will implement two kinds of linkage:
i. From item-level catalogue records into the records made by this project. That is, a query to an online catalogue for 'The Gospel of John' should link to the records for that work in this project's ontology. The reader should be able to go from the catalogue entry through the project ontology to a list of all manuscripts containing this work, and thence to associated digital images and transcripts.
ii. From the records made by this project to resources outside the ontology developed by this project. The project will implement these links for two sets of data. First, it will implement links to the publications (books and articles) relating to the New Testament texts by members of the New Testament editing team in ITSEE. These are listed in the Birmingham Research Publications Database, maintained by the Birmingham Institutional Repository team. Second, it will implement links to the Chaucer Bibliography Online for the Canterbury Tales materials. For example, the Chaucer Bibliography Online lists an article by Hugh Keenan on lines 345-346 of the General Prologue. The project will create a link between the instances of those lines in the RDF records and the online Bibliography, and express that link too in RDF form.
The demonstrators will then be linked to the project website.
Deliverables 2 and 3: the access demonstrators
[3] For this rather ugly terminology, see http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/.
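The Keenan example can be sketched as RDF statements. This minimal sketch rests on several assumptions. The bibliography item is linked once to the work-level lines 345-346. Each manuscript's instance of those lines is then linked to the corresponding work-level line, so a reader can move from the article to every manuscript text of those lines. The bibliography URI, the manuscript identifiers, the namespaces and the property names discusses and isInstanceOf are all hypothetical. The assignment of lines to manuscripts in the usage example is also made up.

```python
# Hypothetical namespaces and property names; not the project's vocabulary.
EX = "http://example.org/ontology/"
ID = "http://example.org/id/"


def bibliography_links(bib_item, work, first, last, witnesses):
    """Return N-Triples linking a bibliography item to lines first..last of a
    work, and each manuscript's instance of those lines to the work line.

    witnesses maps a (hypothetical) manuscript id to the set of line numbers
    that manuscript actually contains; lines a manuscript lacks get no link."""
    triples = []
    for n in range(first, last + 1):
        work_line = f"<{ID}work/{work}/line/{n}>"
        triples.append(f"<{bib_item}> <{EX}discusses> {work_line} .")
        for ms in sorted(witnesses):
            if n in witnesses[ms]:
                text_line = f"<{ID}text/{work}/{ms}/line/{n}>"
                triples.append(f"{text_line} <{EX}isInstanceOf> {work_line} .")
    return triples


if __name__ == "__main__":
    # Lines 345-346 of the General Prologue, as in the Keenan example;
    # the bibliography URI and manuscript ids are invented for illustration.
    for t in bibliography_links("http://example.org/bibliography/item-1",
                                "CT-GP", 345, 346,
                                {"MS-A": {345, 346}, "MS-B": {345}}):
        print(t)
```

This design links the article to the work-level line rather than to each manuscript. As a result, a manuscript transcribed later needs only its own isInstanceOf statement to become reachable from the article.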
2.1.3 Months 7-9: Dissemination activities. The project will host a one-day workshop showing
its methods and results. A report will be placed on the project website.
Deliverables 4 and 5: the workshop; public report.
2.2 Project management
2.2.1 Project management will follow the model offered by JISC's Project Management Guidelines (May 2008, p. 9 ff.), with responsibilities divided among three parties:
1. A project steering group. This will meet three times: at the commencement of the project in June 2010, at the end of month 3, and at the project end. The steering group will consist of the PI, two senior researchers in the university from outside the project, and the co-director (with the PI) of ITSEE, Professor David Parker. The project manager will report monthly to the steering group.
2. The project manager: the PI, Robinson, who will devote one half-day a week to the project throughout.
3. The project technical officer (Green), reporting weekly to the project manager.
2.3 Risks: staff recruitment
2.3.1 The two key project staff, the PI and the technical officer, are already in post. For the PI, the scheduled completion of other projects before June 2010 will free time to work on this project. The technical officer (Green) is currently employed at 0.4 time, and so will be free to take up this 50% post on 1 June. Should he not be available, posts of this nature, even short-term ones, invariably draw a strong field of applicants in this university and can be filled quickly. This might, however, delay the project by one or two months.
2.4 IPR
2.4.1 In many cases, the data to which the metadata points (manuscript images, transcripts) has IPR restrictions. However, no such restrictions apply to any of the metadata, in the form of RDF records, to be generated by this project. All these RDF records will be made available free to all under the Creative Commons Attribution-ShareAlike licence. This will permit the widest possible use and re-use of the records. Note that we will not apply a 'non-commercial' restriction to the licence. There are important commercial users of metadata such as this project will create (e.g. Talis), and the metadata should be as readily available to them as it is to anyone else.
2.5 Sustainability
2.5.1 In the first place, the project will secure the longevity of the metadata by depositing it in the University Institutional Repository, as sets of RDF/XML files each containing multiple RDF records. The RDF/XML files will themselves have OAI-PMH compliant metadata. This will expose the data to worldwide RDF aggregators and enable retrieval through RDF federation systems (e.g. Jena, Sesame) and the standard RDF query language, SPARQL.
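In practice, retrieval from the deposited RDF would be done with SPARQL against a store such as Jena or Sesame. To show what such a query does, the sketch below pairs an illustrative SPARQL query with a toy in-memory matcher that evaluates the same basic graph pattern. The query asks the opening question of this proposal: which documents, and which pages, hold a text of John 1:1? The vocabulary and identifiers are hypothetical, and prefixed names are treated as plain strings. The matcher is for exposition only and is not a substitute for a real RDF store.

```python
# Equivalent SPARQL, for comparison only; the vocabulary is hypothetical.
SPARQL_EQUIVALENT = """
PREFIX ex: <http://example.org/ontology/>
SELECT ?doc ?page WHERE {
  ?text ex:isTextOf <http://example.org/id/work/John/1/1> ;
        ex:isTextIn ?doc ;
        ex:occursOn ?page .
}
"""

# A tiny hypothetical graph: two texts of John 1:1 and one of John 1:2.
GRAPH = [
    ("text:1", "ex:isTextOf", "work:John/1/1"),
    ("text:1", "ex:isTextIn", "doc:A"),
    ("text:1", "ex:occursOn", "doc:A/page-3"),
    ("text:2", "ex:isTextOf", "work:John/1/1"),
    ("text:2", "ex:isTextIn", "doc:B"),
    ("text:2", "ex:occursOn", "doc:B/page-1"),
    ("text:3", "ex:isTextOf", "work:John/1/2"),
    ("text:3", "ex:isTextIn", "doc:A"),
    ("text:3", "ex:occursOn", "doc:A/page-3"),
]


def match(pattern, triple, binding):
    """Match one triple pattern (terms starting with '?' are variables)
    against one triple, extending an existing binding; None if it fails."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind an unbound variable, or check a bound one is consistent.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(graph, patterns):
    """Join the patterns left to right, as a basic graph pattern does."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


if __name__ == "__main__":
    rows = query(GRAPH, [("?text", "ex:isTextOf", "work:John/1/1"),
                         ("?text", "ex:isTextIn", "?doc"),
                         ("?text", "ex:occursOn", "?page")])
    for row in rows:
        print(row["?doc"], row["?page"])
```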
2.5.2 We see the linked-data model behind the project as having a much more important impact on sustainability. One of the premises of this project is that the current model is fundamentally flawed.[4] Under that model, most access to high-quality digital resources depends entirely on the interface made by the projects which created those resources. The data is therefore available only so long as the interface is available. As interfaces are extremely system- and browser-dependent, this is likely to be a rather short time. This project offers an alternative. By creating rich metadata for each distinct digital element (even a single character in a text on one page of a manuscript), it will be possible to create multiple access routes to the data from the metadata alone. These would complement, and could ultimately replace, the dedicated interfaces so far created. To return to the example at the beginning of this proposal: from the metadata alone, one could create an interface for resources relating to the first verse of the Gospel of John. It would give access to each manuscript which has this text, to its images and transcripts, and to other materials relating to these.
[4] Key documents setting out the approach to the interface which lies behind this proposal are the papers by Roger Bagnall, Greg Crane and the PI at the 'Shape of Things to Come' conference, Charlottesville, March 2010: http://shapeofthings.org/papers/ (user name shapeofthings, password papers; to be published by Rice University Press in April 2010).
3. Engagement with the community
3.1 Project stakeholders
3.1.1 It is in the nature of 'linked data' that everyone, everywhere is a stakeholder: the road leads to every door. The following groups have a special interest in this project, in order of widening focus:
i. Scholars interested in the text of the four textual traditions
ii. Scholars interested in other works and documents susceptible to the same methodology as that developed for this project
iii. Linked data developers, for whom the volume and characteristics of data on documents, works and texts will present challenges
iv. Everyone interested in these texts.
Even for the narrowest of these communities, (i) above, the numbers are large. The annual conferences of the Society of Biblical Literature draw several thousand professional scholars. Over 25,000 copies of the fundamental text-critical edition of the Greek New Testament, the Nestle-Aland edition, are sold or given away each year, mostly to students in seminaries and universities. For the largest of these groupings, (iv) above, one may point to the more than one million individual visitors to the Codex Sinaiticus website from July to November 2011.
3.2 Dissemination
3.2.1 The project will target the first three stakeholder groups listed above, as follows:
Scholars interested in the text of the four textual traditions: ITSEE co-director Parker will be responsible for the New Testament texts. He will present the project at the annual meeting of all participants in the Birmingham-Münster NT editing projects. PI Robinson will be responsible for the Dante and Chaucer materials, and will present these at the annual Kalamazoo medieval conference, the most widely attended single conference in medieval studies. In addition, links to the access demonstrator will be provided from the websites of all four editorial groups.
Scholars interested in other works and documents: these will be targeted through presentations at the two major international conferences on textual scholarship: the Society for Textual Scholarship, New York (March 2011), and the European Society for Textual Scholarship, Pisa (November 2010). The PI is the UK representative on the ESF-COST InterEdition project, and the project will be presented to that group as well.
Linked data developers: a workshop in the last months of the project will present the project's methodologies, focusing on the possibilities for dynamic interface development from the metadata created by the project. The ADHO 2010 presentation of the documents, works and texts ontology by the PI and Meschini will be developed into an article to be submitted to a major digital humanities journal.
There is no direct way of targeting the fourth group: everyone interested in these texts. This project will focus on the first three groups. Later projects may seek to reach and foster wider textual communities, from the starting point provided by this project. This project's records for document and text segments below the item level will be linked to the item-level records for the whole documents and texts of which they are part. As a result, any reader coming through a catalogue interface to the whole document or text will also be able to navigate through to the parts of the documents or texts recorded according to this ontology, and to the links between them. This will make resources related to the individual pages of documents and segments of texts considerably more visible than they are at present.
4. Impact
4.1 The project and the wider community
4.1.1 The project will have achieved its immediate aims if it reaches the first three stakeholder groups identified in 3.1 above. However, there is a further aim, toward which this project is a critical first step: the creation of 'textual communities' for the editing of large textual traditions based on digital technology. Within these communities, scholars and readers will carry out the entire editing process, using (among others) the editorial tools and standards developed by the PI. The textual communities will be open. Anyone interested in a text (say, Dante's Commedia) will be able to find which manuscripts and printed editions hold the text, and to locate digital images and transcripts of these. They will also be able to compare, search and analyse these using many different tools, and to contribute their own knowledge and materials for others to use.
4.1.2 The ontology created and implemented by this project will be a key enabling technology towards this vision. Other tools have been or are being made for these communities, e.g. 'Son of Suda On Line' (SoSOL), in development by the Integrating Digital Papyrology project. Consider the following scenario:
i. A reader notices that a new set of digital images for a manuscript of the New Testament has been created. He or she knows what part of the New Testament is contained on each page. The browser presents a tool which allows the reader to state, for each page, what text is on it. This is converted into RDF statements using the ontology created here, and deposited in an RDF store.
ii. A reader somewhere else in the world has declared that he or she is interested in this particular text. RDF records using this project's ontology are generated, stating this reader's interest.
iii. Elsewhere, an RDF federating application matches the availability of the new images of the text in (i) with the reader's interest in this text in (ii). It generates an RSS record which is sent to the reader: 'you might be interested in this website, which contains an image of a text of A, in document B'.
iv. By submitting a query through the browser to the RDF store, the reader in (ii) discovers that no transcript of this text on this page is available. He or she makes a transcript of this page and places it on a website. Again, RDF statements about this new transcript are generated and deposited in an RDF store.
One could extend this scenario indefinitely. Other readers could find the new transcript, correct it
and augment it; others could then compare the transcript with other texts of the same part of the
New Testament found in other documents; others could annotate it in various ways. In every
case, the ontology first instantiated by this project would have a crucial role, in setting out the
links in the chain.
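Steps (i) to (iii) of this scenario reduce to a matching problem. The minimal sketch below assumes that text segments are identified hierarchically (e.g. NT/John/1/1). Under that assumption, an interest in the whole Gospel of John covers a new image of John 1:1. The record types and the identifier scheme are illustrative assumptions. Delivery of the message as an RSS item is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageAvailable:
    """Step (i): a page image declared to contain a given text segment."""
    url: str
    segment: str    # hypothetical hierarchical id, e.g. "NT/John/1/1"
    document: str


@dataclass(frozen=True)
class ReaderInterest:
    """Step (ii): a reader's declared interest in a text segment."""
    reader: str
    segment: str


def covers(interest: str, segment: str) -> bool:
    """True if the segment of interest equals, or contains, the given segment.
    The '/' guard stops 'NT/John/1/1' from matching 'NT/John/1/10'."""
    return segment == interest or segment.startswith(interest + "/")


def notifications(images, interests):
    """Step (iii): pair each interest with every newly available image of a
    segment it covers, and compose the message to send to that reader."""
    messages = []
    for want in interests:
        for image in images:
            if covers(want.segment, image.segment):
                messages.append((want.reader,
                                 f"you might be interested in {image.url}, which "
                                 f"contains an image of a text of {image.segment}, "
                                 f"in document {image.document}"))
    return messages


if __name__ == "__main__":
    new_images = [ImageAvailable("http://example.org/img/ms9/17.jpg",
                                 "NT/John/1/1", "MS-9")]
    interests = [ReaderInterest("reader-1", "NT/John"),
                 ReaderInterest("reader-2", "NT/Mark")]
    for reader, text in notifications(new_images, interests):
        print(reader, "->", text)
```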
4.1.3 Or, another scenario: a reader is looking at the first line of the Canterbury Tales in their browser. A piece of software running in the background notices this and reasons: this person is reading the first line of the Tales, so what is out there that is relevant to what he or she is reading? The software queries the RDF store and locates records using this project's ontology. It sorts the information into transcripts, images, commentaries, etc., and sends a message to the reader through the browser: 'you might be interested in ...'.
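After the query, the background software in this second scenario needs two steps. It must select the resources relevant to the reader's current position, then sort them by kind. The sketch below assumes a hypothetical convention of hierarchical segment identifiers (e.g. CT/GP/1 for line 1 of the General Prologue). Under that convention, a commentary on the whole Prologue counts as relevant to its first line. The resource records and URLs are made up for illustration.

```python
from collections import defaultdict

# Hypothetical resource records: (url, kind, segment), where segment is a
# hierarchical id such as "CT/GP/1" (Canterbury Tales, General Prologue, line 1).
RESOURCES = [
    ("http://example.org/transcripts/ms-a/gp-1", "transcript", "CT/GP/1"),
    ("http://example.org/images/ms-a/1r.jpg", "image", "CT/GP/1"),
    ("http://example.org/commentary/gp", "commentary", "CT/GP"),
    ("http://example.org/transcripts/ms-a/gp-2", "transcript", "CT/GP/2"),
]


def related(resources, reading):
    """Group resources relevant to the segment being read: those about that
    exact segment, or about a larger segment containing it."""
    grouped = defaultdict(list)
    for url, kind, segment in resources:
        if reading == segment or reading.startswith(segment + "/"):
            grouped[kind].append(url)
    # Sort kinds and URLs so the message is stable.
    return {kind: sorted(urls) for kind, urls in sorted(grouped.items())}


def suggestion(grouped):
    """Compose the message shown in the browser (empty if nothing was found)."""
    if not grouped:
        return ""
    counts = ", ".join(f"{kind} ({len(urls)})" for kind, urls in grouped.items())
    return f"you might be interested in: {counts}"


if __name__ == "__main__":
    print(suggestion(related(RESOURCES, "CT/GP/1")))
```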
4.2 This project, the community and sustainability
4.2.1 It is usual to see sustainability and community as two issues requiring separate strategies: one could pursue a centrally based model of sustainability, and deal separately with community building. This project takes a different view. We believe that there is one solution to both problems. We aim to build a single model for the making of scholarly editions in the digital age that is both sustainable and permits the widest engagement with the community. Our model is the creation of textual communities for collaborative editing of large textual traditions based on digital technology, through services and data distributed across the web.
4.2.2 How will the creation of textual communities address sustainability? Sustainability is not only a matter of data handling standards and routines. Sustainability of any kind depends on community will. So long as people want to read the Commedia, they will want access to editions of it and information about it. However, the will of the community must be given practical shape, as crucial materials can be lost through negligence. The open architecture proposed by this project offers a route towards sustainability of distributed resources. It interlocks with existing and foreseen data storage and migration facilities, particularly those of the institutional repository movement.
4.3 Evaluation
4.3.1 The project will be able to provide statistical measures of its progress, as follows:
i. RDF records created, categorized by type
ii. Accesses to the RDF records on the Birmingham IR
iii. Accesses to digital materials on ITSEE servers referenced from the RDF records
iv. Users of the access demonstrator
v. Incorporations of access demonstrator elements on other websites.
These statistical measures will be used throughout the project to assess its progress. They will be considered in particular at the second steering group meeting, at the end of month 3. They will be supplemented by a user survey in the last three months of the project. Resources are allocated in the project to commission a draft evaluation report based on the survey results and statistics. The PI will then revise this draft and submit it as the final evaluation report.
5. Previous Experience of Project Staff
Peter Robinson, PI: co-director of ITSEE and of the Canterbury Tales project. He has been involved in the making of digital editions since 1990, and has produced, as editor or facilitator, twenty digital publications. He led the EU-funded MASTER project, which created the manuscript description encoding on which the TEI P5 manuscript description element is based. He was also the major contributor to the TEI P4 chapters on text transcription and apparatus encoding. Most recently he served on the Technical Standards Working Party of the Codex Sinaiticus project, and led the JISC-funded Virtual Manuscript Room project. [10% time, directly allocated]
Jill Russell: manages the University of Birmingham institutional repository and is closely
involved in developing the University’s emerging strategies for archiving digital documents and
other research outputs. She holds a Master's degree in Library and Information Studies and has
extensive experience of work in Higher Education libraries. She has a successful record of
managing internal and external projects. [Member of the steering group]
David Parker: co-director of ITSEE and Executive Editor of the International Greek New
Testament Project; PI of the Codex Sinaiticus Project and the IGNTP Project; Co-PI of
the Vetus Latina Iohannes Project [Member of steering group and NT consultant]
Zeth Green, technical officer: holds an undergraduate degree in Theology and a Master's in
Electronic Editing. Ten years' experience in web development, with particular expertise in Python
and XML databases; vice-chair of Python UK Society [50%, directly allocated]
6. Budget

                                            Apr10–Mar11   Apr11–Mar12   TOTAL
Directly Incurred Staff
Robinson: 10%, grade 9, point 51            £5,473        –             £5,473
Green: 50%, grade 7, point 30               £14,987       –             £14,987
Total Directly Incurred Staff (A)           £20,460       –             £20,460

Non-Staff
Travel and expenses                         £1,000        –             £1,000
Hardware/software                           £2,000        –             £2,000
Dissemination                               £2,000        –             £2,000
Evaluation                                  £1,500        –             £1,500
Other                                       –             –             –
Total Directly Incurred Non-Staff (B)       £6,500        –             £6,500

Directly Incurred Total (C) (A+B=C)         £26,960       –             £26,960

Directly Allocated
Estates                                     £3,438        –             £3,438
Other                                       –             –             –
Directly Allocated Total (D)                £3,438        –             £3,438

Indirect Costs (E)                          £17,917       –             £17,917

Total Project Cost (C+D+E)                  £48,315       –             £48,315
Amount Requested from JISC                  £48,315       –             £48,315
Institutional Contributions                 £0            –             £0

Percentage Contributions over the life of the project
JISC: 100%
Partners: 0%
Total: 100%

No. FTEs used to calculate indirect and estates charges, and staff included
No. FTEs: 0.6
Which staff: Robinson, Green
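As a cross-check on the budget arithmetic, the following sketch recomputes the subtotals from the figures stated in the budget. All amounts fall in the 2010–11 financial year. The variable names are illustrative; only the amounts come from the budget itself.

```python
import math

# Apr10–Mar11 figures from the budget (nothing falls in Apr11–Mar12).
staff = {"Robinson (10%)": 5_473, "Green (50%)": 14_987}
non_staff = {
    "Travel and expenses": 1_000,
    "Hardware/software": 2_000,
    "Dissemination": 2_000,
    "Evaluation": 1_500,
    "Other": 0,
}
directly_allocated = {"Estates": 3_438, "Other": 0}
indirect_costs = 17_917           # (E)
requested_from_jisc = 48_315
institutional_contribution = 0

A = sum(staff.values())           # Total Directly Incurred Staff
B = sum(non_staff.values())       # Total Directly Incurred Non-Staff
C = A + B                         # Directly Incurred Total
D = sum(directly_allocated.values())  # Directly Allocated Total
E = indirect_costs
total = C + D + E                 # Total Project Cost

assert (A, B, C, D) == (20_460, 6_500, 26_960, 3_438)
assert total == requested_from_jisc + institutional_contribution == 48_315
assert math.isclose(0.10 + 0.50, 0.6)   # FTEs: Robinson + Green

jisc_share = 100 * requested_from_jisc / total
print(f"Total project cost £{total:,}; JISC share {jisc_share:.0f}%")
# prints: Total project cost £48,315; JISC share 100%
```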
FOI Withheld Information Form
1. We would like JISC to consider withholding the following sections or paragraphs
from disclosure, should the contents of this proposal be requested under the
Freedom of Information Act, or if we are successful in our bid for funding and our
project proposal is made available on JISC’s website.
2. We acknowledge that the FOI Withheld Information Form is of indicative value
only and that JISC may nevertheless be obliged to disclose this information in
accordance with the requirements of the Act. We acknowledge that the final
decision on disclosure rests with JISC.
Section / Paragraph No.    Relevant exemption from disclosure under FOI    Justification
-                          -                                               -
Professor Vincent Gaffney
Director of Research and Knowledge Transfer
College of Arts and Law
University of Birmingham
Edgbaston
B15 2TT
United Kingdom
16th April 2010
Statement of support
I am writing to affirm the support of the University of Birmingham for the application
submitted by Dr Peter Robinson under the title "Linking documents, works and texts" to the call
closing at 12 noon UK time on Tuesday 20th April 2010. This project has implications for,
and so involves staff from, many parts of the university. I affirm that there has been
wide consultation among all these divisions of the university in the preparation of the bid,
that appropriate commitments of staff time have been made for this project in the event of
the bid's success, and that the project costings have been prepared and approved by the
University finance office. I affirm also that the University will administer this project
should the bid be successful.
Yours sincerely,
Professor Vincent Gaffney
Appendix: the Virtual Manuscript Room project, funded by JISC 2008–2009
Project facts: start date 1 September 2008; finished 30 September 2009.
Funding: £69,000 from JISC, matched by the University of Birmingham
Resources: one full-time member of staff; one manager at 10% time (the PI of this project)
URL: http://vmr.bham.ac.uk
This project, based in the Institute for Textual Scholarship and Electronic Editing (ITSEE) at the
University of Birmingham, addressed the issues of both cost and metadata. The first aim was to
establish a pipeline for the efficient submission of a full set of manuscript images, with
accompanying metadata, to a web interface. This was achieved for 138 sets of manuscript
images: 71 from the Mingana collection, 22 of Geoffrey Chaucer's Canterbury Tales, 38
minuscules of the Greek New Testament and 7 of Dante's Commedia, amounting to around 40,000
manuscript images. At a total cost to JISC of around £1.50 per image, this represents excellent
value. Indeed, the marginal cost of adding a further set of manuscript images to the system is
much lower than that: it takes around 15 minutes' work to add a folder containing a full set of
images for a whole manuscript to the VMR, inclusive of metadata generation, so the marginal cost
is a matter of pennies per image. The images for each manuscript can then be viewed online
through the image viewer: for example, at http://vmr.bham.ac.uk/Collections/Mingana/Islamic_Arabic_1572/table/.
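The submission pipeline itself is not described in detail here. The following is a minimal, hypothetical sketch of the folder-to-metadata step. It assumes one folder per manuscript whose image filenames sort in page order. The record fields, the accepted file types and the function name are illustrative assumptions, not the VMR's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# File types treated as page images (an assumption for this sketch).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


@dataclass
class ImageRecord:
    manuscript_id: str  # taken from the folder name (hypothetical convention)
    sequence: int       # 1-based position of the image within the manuscript
    filename: str


def build_manifest(folder: Path) -> list[ImageRecord]:
    """Create one metadata record per image file in a manuscript folder.

    Images are ordered by filename, so zero-padded names such as
    0001.jpg, 0002.jpg are assumed to reflect page order.
    """
    images = sorted(
        (p for p in folder.iterdir()
         if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES),
        key=lambda p: p.name,
    )
    return [ImageRecord(folder.name, i, p.name)
            for i, p in enumerate(images, start=1)]
```

Because the per-image records are derived entirely from the folder listing, adding a further manuscript needs no hand-keyed per-image metadata. That is consistent with the low marginal cost described above.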
The second aim was to create appropriate metadata for each manuscript and each image that
would allow the images and manuscripts to be accessed through the University of Birmingham
Institutional Repository. This would greatly increase their exposure on the web and also provide a
route to long-term sustainability. This was achieved for the 71 Mingana manuscripts: for example,
http://epapers.bham.ac.uk/116/ provides parallel access to the same manuscript. Subject to
copyright agreements, currently being negotiated with many of the manuscript-holding institutions,
images from the other three collections (New Testament, Chaucer and Dante) will be made
available for public access in the same way as the Mingana manuscripts.
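The metadata schema used for the repository records is not stated here. Many repositories expose simple Dublin Core through the OAI-PMH `oai_dc` format. Assuming that format, a manuscript-level record could be sketched as follows. The choice of elements and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the OAI-PMH simple Dublin Core format.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def manuscript_record(title: str, collection: str, viewer_url: str,
                      image_count: int) -> str:
    """Return a minimal oai_dc record describing one digitized manuscript.

    The mapping of fields to Dublin Core elements is illustrative only.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")

    def add(element: str, value: str) -> None:
        ET.SubElement(root, f"{{{DC}}}{element}").text = value

    add("title", title)
    add("type", "Image")
    add("relation", collection)    # the collection the manuscript belongs to
    add("identifier", viewer_url)  # link to the images in the VMR viewer
    add("description", f"Full set of {image_count} digital images")
    return ET.tostring(root, encoding="unicode")
```

Pointing `identifier` at the VMR viewer URL is one way to give the repository record and the VMR parallel access to the same images, as with the Mingana manuscripts.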
These achievements have given the VMR a sound foundation to serve as the base both for
digitization and for the range of editing activities carried out within ITSEE and by its partners.
As a pathway towards digitization, a plan has been developed to digitize all 3000 manuscripts of the
Mingana Collection and to use the same combination of the VMR and the Institutional Repository to
present them to the world. As a base for editing, ITSEE's partner in the New Testament work, the
Institute for New Testament Textual Research, has implemented its own version of the Virtual
Manuscript Room, at http://intf.uni-muenster.de/vmr/NTVMR/IndexNTVMR.php, with links
between the Münster and Birmingham implementations. The VMR project in Birmingham is also
adding facilities to allow scholars to provide further information on materials held on the site,
and a JISC-sponsored Workshop on Collaborative Editing in September 2009 explored the use of
the VMR as a host for community-based editing.