Development of a Topographical Transcription Method
Introduction
In past years, digitization was chiefly about transforming analog documents into a digital
representation. Problems of color management, lightness scale, resolution and
geometrical distortion had to be solved, and they have been. Today, the necessary methods can be
considered well elaborated and satisfactory for the creation of digital images of analog
documents. Numerous digitization initiatives have thus led to the formation of web portals that
offer digital facsimiles, corresponding metadata and tools for searching and browsing. However,
only a few tools are available to uncover the full potential of these digital facsimiles with
respect to their use in humanities research.
Noticing this deficit, we are developing SALSAH, a Virtual Research Environment (VRE) for
the humanities. SALSAH (System for Annotation and Linkage of Sources in Arts and
Humanities) is a collaborative research platform allowing for the visualization, the annotation
and the linkage of digital resources in the humanities. The application is completely web-based
and makes it possible to use digital resources in humanities research directly on
the web. This way, research data such as annotations and linkages emerges in a born-digital form.
When thinking about digital resources such as digital facsimiles and methods to make use of
them in the humanities, it is also necessary to think of methods for transcription and text
constitution in the digital medium. Furthermore, we have to consider the digital facsimile to
be a representation of an analog document offering text but possibly also pictorial information
such as illustrations etc. Putting the focus on pictorial aspects, we can also conceive text as
being pictorial in the first place. For this reason, we are developing a topographical
transcription method for digital facsimiles within SALSAH.
This article first briefly describes the general purpose, the functionality and the data model of
SALSAH. It then presents general thoughts about the benefit of digital facsimiles and
describes the topographical transcription method currently being developed as an extension of
SALSAH’s core functionality.
SALSAH
SALSAH has been under development at the Imaging & Media Lab of the University of Basel since
summer 2009. It originated in an art-historical context and now also serves further humanities
disciplines. Besides the “Narrenschiff”-project1, SALSAH is used by two edition
projects: the “Anton Webern Gesamtausgabe”2 and the “Kritische Robert Walser-Ausgabe”3.
SALSAH is designed as a general VRE for the humanities (Schweizer 2011: 147ff.).
Currently, SALSAH offers methods to work with digital facsimiles (the support of audio and
moving image is already planned).
SALSAH offers the functionality to:
- visualize various digital resources simultaneously
- annotate digital resources and to share these annotations collaboratively
- create links between digital resources and to annotate them
- create Regions of Interest (ROI) within digital resources and to annotate and link them
- access external repositories of digital resources and to apply SALSAH’s functionality to them
1 Together with Prof. Barbara Schellewald, Institute of Art History, University of Basel
2 Institute of Musicology, University of Basel
3 Institute of German Philology, University of Basel
By the use of SALSAH’s annotation and linkage functionality, the research data emerges in
the digital medium and is directly connected to the digital resources it refers to. Figure 1
shows an example out of the art historical “Narrenschiff”-project. The elliptical shapes
represent digital objects (here: books and pages) while annotations are indicated by
rectangles. The arrows show how the digital objects and annotations are related to each other.
The book is characterized by two annotations: title and date of publication. A book is a
compound object which means that it consists of other objects: single pages. Each page
belonging to the “Narrenschiff” thus refers to the digital object representing the book. Like
the book itself, each page can be annotated4. Because pages have a certain order, they are
annotated with a pagination. Furthermore, each page can be described by composing a page
description.
By creating links between digital objects, relations between them can be expressed. Each link
is again treated as a digital object that can be annotated (here with a description). In this way,
the link’s semantics can be expressed. By annotating and linking digital objects, the research
knowledge emerges as a network-like structure which can be browsed and extended by other
researchers. By working on the same digital corpus, humanities researchers can collaborate
with each other – either within a working group or even in an interdisciplinary setting.
Figure 1 Structure of the Research Data within SALSAH
[Diagram labels: Book (Title: Das Narrenschiff; Date of Publication: 3rd March 1495); Page, three instances (Pagination: 1 verso; Page Description: This page shows a fool); Link (Description: Interesting illustrations of a fool)]
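The structure described above (objects, annotations, compound objects and annotatable links) can be illustrated with a minimal Python sketch. It is not SALSAH's actual implementation: the class names `Resource` and `Link`, the field names and the helper `parts_of` are hypothetical, and the choice of which two pages the link connects is an assumption made only for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Resource:
    """A digital object (e.g. a book or a page) carrying annotations."""
    kind: str
    annotations: dict[str, str] = field(default_factory=dict)
    part_of: Resource | None = None  # a page refers to the book it belongs to


@dataclass(eq=False)
class Link(Resource):
    """A link is itself a digital object, so it can be annotated too."""
    targets: list[Resource] = field(default_factory=list)


def parts_of(whole: Resource, resources: list[Resource]) -> list[Resource]:
    """Return the objects that refer to `whole` as their compound object."""
    return [r for r in resources if r.part_of is whole]


# Values taken from Figure 1.
book = Resource("Book", {"Title": "Das Narrenschiff",
                         "Date of Publication": "3rd March 1495"})
pages = [Resource("Page", part_of=book) for _ in range(3)]
pages[0].annotations["Pagination"] = "1 verso"
pages[0].annotations["Page Description"] = "This page shows a fool"

# Assumption: the link relates the first two pages.
link = Link("Link", {"Description": "Interesting illustrations of a fool"},
            targets=[pages[0], pages[1]])
```

Because `Link` derives from `Resource`, the description attached to the link is stored in exactly the same way as the annotations of a book or a page, which mirrors the idea that a link's semantics are expressed by annotating it.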
The annotations and even the digital objects available can be defined specifically for each
project. Each project within SALSAH can define the semantics of its digital objects and
which annotations they may have. Digital objects may have a digital representation (that is,
digital data representing some physical aspects of the analog object, such as the digital data of
a digital facsimile, which represent the local reflectance of a page of text), but they can also be
abstract constructs (e.g. a person, who may be characterized by name and birthdate, although there is
no further digital data representing the physical aspects of a person).
4 In terms of the data model, annotations and metadata are not distinguished: metadata are also annotations. But
in the Graphical User Interface (GUI), metadata will be presented separately from the annotations since they are
quite definite while annotations can be regarded as more subjective and thus open to discussion.
For digital facsimiles, we have recently developed a method to define Regions of Interest.
These regions are geometrically described areas on the digital facsimile and can be annotated
and linked like other digital objects. This functionality makes it possible to reference
parts of pictorial resources directly.
Figure 2 Creation of a Region of Interest
Figure 2 shows the creation of a Region of Interest consisting of two polygon shapes. The
region can be annotated with a comment. Art historians could describe specifically defined
areas of pages of the “Narrenschiff” by using this functionality. Each region consists of one or
more geometrical shapes and annotations which can be configured according to the research
project’s needs.
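A region of this kind can be sketched as a small data structure: one or more polygons on a facsimile plus annotations. The following Python sketch is only an illustration. The names `Region` and `contains`, the facsimile identifier and the use of relative coordinates between 0 and 1 are assumptions, not details of SALSAH; the point-in-polygon test uses the standard ray-casting technique.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Point = tuple[float, float]


def _inside(p: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how often a ray to the right of p crosses an edge."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Region:
    """A Region of Interest: geometrical shapes plus annotations."""
    facsimile_id: str                      # hypothetical identifier
    polygons: list[list[Point]]            # assumed relative coordinates (0..1)
    annotations: dict[str, str] = field(default_factory=dict)

    def contains(self, p: Point) -> bool:
        return any(_inside(p, poly) for poly in self.polygons)


# A region consisting of two polygon shapes, as in Figure 2.
roi = Region(
    "narrenschiff-page",
    [
        [(0.1, 0.1), (0.4, 0.1), (0.4, 0.3), (0.1, 0.3)],
        [(0.6, 0.6), (0.8, 0.6), (0.7, 0.8)],
    ],
    {"Comment": "example comment"},
)
```

Storing coordinates relative to the image size would keep a region valid at any zoom level; whether SALSAH does so is not stated in the text.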
All of this functionality can also be applied to remote resources not stored in SALSAH’s local
database. We have already implemented a connection to the assets of the e-codices-project5.
Due to SALSAH’s flexible data model, the facsimiles of e-codices can be annotated as if they
were locally stored in SALSAH. In fact, only the annotations created in SALSAH are
stored locally; the remote facsimiles are merely referenced in the SALSAH database.
SALSAH is thus designed as a shared system: remote resources can be accessed and
annotated within the SALSAH environment. On the other hand, all the annotations and links
stored in SALSAH could be made available to the outside by implementing an interface
accessible via a web service. We are currently working on an interface to export SALSAH’s
data. Such an interface would also allow for online connections. For example, the e-codices
website could then indicate whether annotations created in SALSAH exist for certain
manuscripts.
5 The project can be accessed here: http://www.e-codices.ch. It currently (last access: 23rd November 2011)
encompasses 833 manuscripts from 34 different libraries.
Transcribing Digital Facsimiles
Having digital images of analog sources available and digital tools to address them, we are
able to conceive a method to transcribe digital facsimiles and subsequently to constitute texts.
First, a brief outline of the importance of facsimiles is given. Then a method is
described which allows for the creation of transcriptions directly in the SALSAH
environment.
Importance of Facsimiles
The digitization of analog documents makes them available as digital images, or
digital facsimiles. These digital images represent the analog documents with reference to their
visual appearance. This allows for the examination of illustrations and all other kinds of
pictorial elements contained in these documents. Unlike text-based representations, digital
images represent the original material in a non-abstract way6 not presuming the separation of
textual information from the document itself by identifying textual characters. In the (digital)
facsimile, the surface of the document and the textual information are still one entity (Gabler
2007: 198).
Taking the example of the Burgunderchronik of the 15th-century scribe Diebold Schilling
from Bern, the most easily accessible edition is the purely text-based edition of Gustav Tobler
(Tobler 1897 and Tobler 1901) presenting the manuscript kept in Zurich (known as the
Zürcher Schilling) in the edition text and the official chronicle from Bern (known as the
Berner Schilling) as a variant in the critical apparatus. The illuminations of both manuscripts
are only briefly described in a register in the appendix of the edition. The assumptions of the
editor seem to have been that the text of the Zürcher Schilling is more authentic because it is
thought of as a more original version while the text of the official Berner Schilling is
conceived as a censored copy (Tobler 1901: 347). The editor’s interest is thus directed not
towards the documents themselves (their reception etc.) but towards finding the best text
available. As a consequence, both manuscripts are presented in one edition (implying that
they are manifestations of the same text) but not without building a hierarchy between them
(the text of one manuscript is presented in the edition text, the other manuscript’s text is
presented in the critical apparatus). In the printed facsimile editions existing for
both manuscripts (only available in a few libraries and archives), however, the overall impression of the
two manuscripts is very different. While they offer a very similar text where
they converge7, their illuminations are of very different kinds
and thus constitute very different relations between texts and pictorial elements, significantly
influencing the perception of the manuscripts. Apart from the original
documents, only facsimile editions reveal these aspects. But since these print editions are
expensive and not widespread, their benefit is limited8.
Looking at the more recent history of editing in German philology, we can see a paradigm change
in the seventies towards an edition technique that consistently integrates facsimiles. The
Frankfurter Hölderlin-Ausgabe (FHA) realized by Dietrich E. Sattler (Sattler 1975-2008)
applied a novel way of editing. Instead of presenting the constituted text as the edition text
accompanied by a critical apparatus containing its variants, this edition made visible the entire
analytical process beginning with the facsimile and ending with a constituted text (Martens
1982: 52ff.). The edited text representing the final state in the constitution process can thus be
conceived as the result of an analytical process openly presented to the reader via the
consistent integration of facsimiles, their diplomatic transcription and a phase analysis
(Martens 1982: 53f.).
6 Of course, the making of digital images can also be conceived as an abstraction from the original, implying
decisions about perspective, resolution, color adjustment etc.
7 The manuscripts are of different temporal extent.
8 In fact, e-codices has already digitized the Berner Schilling. Once it is made available on their website, the
accessibility of this manuscript will be unproblematic.
The integration of the facsimile in the edition ensures the transparency of the analytical
process the edition has undertaken. The reference to the facsimile also emphasizes the status
of the document the transcription and process of text constitution are based on (Gabler 2007:
199).
Topographical Transcription Method
The transcription method being developed in SALSAH is topographically oriented. The
transcription process begins by defining visually coherent areas on the digital facsimile (using
SALSAH’s functionality to create geometrical figures on the facsimile) to be encoded into
textual characters line by line. Manuscripts may not offer a single overall text area but
several distinct areas of textual information (text blocks, annotations, notes, marginalia,
glosses etc.). Addressing them topographically makes it possible to transcribe them individually.
By transcribing these areas line by line, the correspondence between the encoded text and the
facsimile is maintained.
Figure 3 Diplomatic Transcription of a Page of the “Narrenschiff” in SALSAH
Figure 3 shows SALSAH’s transcription tool, which is still in an early state of development9. On
the left-hand side, the facsimile is displayed. The regions defined on the facsimile are shown
accordingly on the right side as rectangular shapes. Each of these shapes offers an
editable area where the transcription of the corresponding part of the facsimile can be
entered. While the facsimile can be freely zoomed and panned, the transcription area on the
right side always shows the whole page because it is thought of as a typification of the textual
information given by the facsimile. While the facsimile is conceived as an image (even
though offering textual information), the transcription area on the right side requires the
encoding of the transcription as textual characters. By doing this area by area, the sequential
relation between the areas can initially be left open. The characters within the areas
have to be entered in a linear order, but such an order is not presupposed between the single
areas themselves.
9 This is an example out of the “Narrenschiff” which often combines textual and pictorial information. The
transcription method will be used soon in the Webern-project to transcribe supplement material like letters, notes
etc.
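The distinction between a fixed order inside an area and an open order between areas can be sketched as follows. This is a minimal Python illustration under assumptions: the names `TranscriptionArea`, `PageTranscription` and `sequence`, the region identifiers and the placeholder text are all hypothetical and not taken from SALSAH.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TranscriptionArea:
    """Text entered for one region of the facsimile, kept line by line."""
    region_id: str                               # hypothetical region identifier
    lines: list[str] = field(default_factory=list)

    def text(self) -> str:
        # Within an area the order of lines (and characters) is fixed.
        return "\n".join(self.lines)


@dataclass
class PageTranscription:
    """All areas of one page; no order between the areas is stored."""
    areas: dict[str, TranscriptionArea] = field(default_factory=dict)

    def add(self, area: TranscriptionArea) -> None:
        self.areas[area.region_id] = area

    def sequence(self, order: list[str]) -> str:
        """Combine areas in an order that is supplied only when needed."""
        return "\n".join(self.areas[region_id].text() for region_id in order)


# Placeholder text, not an actual transcription.
page = PageTranscription()
page.add(TranscriptionArea("main", ["first line of the text block",
                                    "second line of the text block"]))
page.add(TranscriptionArea("gloss", ["marginal note"]))

main_first = page.sequence(["main", "gloss"])
gloss_first = page.sequence(["gloss", "main"])
```

Keeping the order of areas outside the stored transcription means that two researchers can propose different sequences over the same diplomatic transcription without altering it.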
Properties can be assigned to the transcription text of each area. As in a word processor,
the user is able to make a text selection and to choose a property (like bold, underline etc.).
Because SALSAH is designed as a generic and general system for the humanities, the
available properties can be defined specifically for each project. These properties represent
visual attributes of the transcribed text.
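One common way to attach such properties without altering the transcribed characters is to store them as character ranges alongside the text. The sketch below assumes this approach for illustration only; the text does not say how SALSAH stores properties, and the names `Span`, `StyledText`, `assign` and `props_at` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # index of the first character of the selection
    end: int     # index after the last character (exclusive)
    prop: str    # e.g. "bold" or "underline"


class StyledText:
    """Transcription text of one area with properties kept as ranges."""

    def __init__(self, text: str, allowed: set[str]) -> None:
        self.text = text
        self.allowed = allowed           # property set defined per project
        self.spans: list[Span] = []

    def assign(self, start: int, end: int, prop: str) -> None:
        if prop not in self.allowed:
            raise ValueError(f"property {prop!r} is not defined for this project")
        if not 0 <= start < end <= len(self.text):
            raise ValueError("selection lies outside the text")
        self.spans.append(Span(start, end, prop))

    def props_at(self, index: int) -> set[str]:
        """Return all properties that apply to the character at `index`."""
        return {s.prop for s in self.spans if s.start <= index < s.end}


styled = StyledText("Das Narrenschiff", allowed={"bold", "underline"})
styled.assign(4, 16, "bold")        # select "Narrenschiff"
styled.assign(0, 3, "underline")    # select "Das"
```

Since the characters themselves are never changed, overlapping selections are unproblematic: a character may simply fall into several ranges.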
Furthermore, structural relations can be defined, either within a single transcription area or
between several such areas. In the current state of development, we are thinking of the two
basic operations insertion and deletion, which could then be combined into a substitution or a
transposition. In practice, it would be possible to express textual dynamics by defining
structural relations resulting in alternative sequences of textual characters. For example, we
could think of the overwriting of characters by others. We would then have an initial set of
characters which then would have been substituted by others. Or we could think of additional
text which could be considered as an insertion.
The transcription of a facsimile as described above may offer more than one linear text.
The sequential combination of transcription areas and the definition of structural relations10
(deletions, insertions, substitutions, transpositions) allow for the building of multiple
readings. A reading is thought of as an unambiguous sequence of characters representing a
certain interpretation of the facsimile. Each reading may be annotated with a comment etc. by
other researchers and various readings could be interrelated to each other in order to express
their semantic difference. For example, several readings could represent different states in the
genesis of a text. These different states are based on the analysis of corrections (insertions,
deletions, substitutions, transpositions) present in the digital facsimile.
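How structural relations give rise to multiple readings can be sketched in a few lines of Python. The sketch rests on assumptions that go beyond the text: the names `Segment` and `readings` are hypothetical, each relation is treated as either not yet applied or applied, a substitution is modelled as a deletion and an insertion sharing one relation identifier, every marked segment must carry such an identifier, and the example words are invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Segment:
    text: str
    op: str | None = None        # None, "insertion" or "deletion"
    relation: str | None = None  # required if op is set; shared ids toggle together


def readings(segments: list[Segment]) -> list[str]:
    """Enumerate every unambiguous character sequence the relations allow."""
    ids = sorted({s.relation for s in segments if s.op is not None})
    result = []
    for states in product([False, True], repeat=len(ids)):
        applied = dict(zip(ids, states))  # relation id -> correction applied?
        parts = []
        for s in segments:
            if s.op is None:
                parts.append(s.text)
            elif s.op == "insertion" and applied[s.relation]:
                parts.append(s.text)      # inserted text exists only afterwards
            elif s.op == "deletion" and not applied[s.relation]:
                parts.append(s.text)      # deleted text exists only beforehand
        result.append("".join(parts))
    return result


# One insertion ("very ") and one substitution ("old" replaced by "new").
line = [
    Segment("the "),
    Segment("very ", "insertion", "r1"),
    Segment("old", "deletion", "r2"),
    Segment("new", "insertion", "r2"),
    Segment(" book"),
]
all_readings = readings(line)
```

With two independent relations the sketch yields four readings, and a single insertion yields two, one without the inserted text and one including it. Which of these sequences correspond to actual states in the genesis of a text remains an editorial decision that a researcher would record by annotating the readings.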
Each reading built by using this transcription method is transparent because it can be
traced back to the diplomatic transcription which is directly related to the facsimile. The
described method is not a special tool offered by SALSAH but an integral application of its
annotation and linkage possibilities. That way, the constitution of texts representing the
content of documents can be seen as a task not fundamentally different from the constitution
of research knowledge among sources within SALSAH as a VRE. As any other form of
knowledge within SALSAH, the process of transcribing and the constitution of readings can
be reconstructed in their generation as well as criticized by annotation.
10 Already a simple insertion offers two alternative sequences: a reading without it and another including it.
Bibliography
GABLER Hans Walter (2007), ‘The Primacy of the Document in Editing’, Ecdotica, vol. 4,
pp. 197-207.
MARTENS Gunter (1982), ‘Texte ohne Varianten? Überlegungen zur Bedeutung der
Frankfurter Hölderlin-Ausgabe in der gegenwärtigen Situation der Editionsphilologie’,
Zeitschrift für deutsche Philologie, vol. 101, Sonderheft: Probleme neugermanistischer
Edition, pp. 43-64.
SATTLER Dietrich E. (ed.) (1975-2008), Friedrich Hölderlin. Sämtliche Werke. ‘Frankfurter
Ausgabe’, Frankfurt am Main, 20 vols plus supplements.
SCHWEIZER Tobias, ROSENTHALER Lukas (2011), ‘SALSAH – eine virtuelle
Forschungsumgebung für die Geisteswissenschaften’, in Konferenzband. EVA 2011 Berlin.
Elektronische Medien & Kunst, Kultur, Historie, Berlin, pp. 147-153.
TOBLER Gustav (ed.) (1897-1901), Die Berner-Chronik des Diebold Schilling 1468-1484, 2
vols, Bern.