Carl G Stahmer
Director of Digital Scholarship
University of California Davis Library
Associate Director
English Broadside Ballad Archive
Forthcoming: Huntington Library Quarterly
DO NOT DISTRIBUTE
Digital Analytical Bibliography:
Ballad Sheet Forensics, Preservation, and the Digital Archive
In late February, 1954, Allan Stevenson was on fellowship at the Huntington Library engaged
in research on the sources of paper used in the printing of early English books when news broke that
the Pierpont Morgan Library had acquired one of three known copies of the Constance Missale—a
liturgical text believed by many to predate the Gutenberg Bible (printed in 1455) by as many as five
years, thereby making it the oldest extant text printed using moveable type.1 The officers of the Morgan
Library certainly believed this to be true, as they invested in excess of $100,000.00 to acquire the text.2
On hearing news of the acquisition, in Stevenson's own words, “It occurred to [him] that, although half
a million words had been spilt concerning the typography of the Missale, decidedly few had been
ventured concerning the paper which had served to set the type before the gaze of men.”3
As a result of this revelation, Stevenson spent the next 13 years meticulously documenting and
cataloguing observable aspects of the material text. Building on the previous work of scholars such as
Bühler, Hupp, Masson, and Scholderer, he created a chronological and spatial map of the movement of
1
Scholars had variously dated the Missale from 1448 to the mid 1470s. For a good overview of the various publication
dates ascribed to the text see Allan Stevenson, The Problem of the Missale Speciale (London: Bibliographical Society,
1967), esp. Chapter 1, “The Riddle of the Book.”
2
Raymond A. Lajoie, "Constance Missal Considered World's Oldest Book," Gadsden Times, September 5, 1954.
Accounting for inflation, in 2014 the acquisition would have cost in excess of $850,000.00.
3
Stevenson, Problem of the Missale Speciale, 29.
individual type sorts and fonts as they appeared both in the Constance Missale and in various other
texts from the fifteenth century.4 By also meticulously measuring the orientation of watermarks to the
wire and chain lines on the paper on which the work was printed, as well as the movement of these
lines in relation to each other, he was also able to construct a picture of the production history of the
paper on which the Constance Missale was printed.5 As a result of this work with the visible remnants
of the paper making and printing processes, ghosts of the work's material past, Stevenson was able to
settle one of the highest-profile scholarly debates of his time and definitively show that the Constance
Missale had actually been published in 1473, after the Gutenberg Bible.
4
A “sort” is an individual, physical piece of type representing a letter or symbol. Individual sorts were cast in a matrix,
typically a hand mould, into which molten metal is poured. After cooling, the matrix is opened and the sort is released.
The new sort must then be cleaned and filed by hand to remove rough edges prior to use. As a result of this hand filing
process, and its unique printing history, each sort is to some extent unique. For previous scholarship on which
Stevenson’s method was based, see Curt F. Bühler, The Fifteenth-century Book; the Scribes, the Printers, the
Decorators (Philadelphia: Univ. of Pennsylvania Press, 1960); also "Another View on the Dating of the Missale Speciale
Constantiense," The Library, 5th ser., no. 14 (1959): 1-10; Otto Hupp, Ein Missale Speciale, Vorläufer Des Psalteriums
Von 1457. Beitrag Zur Geschichte Der Ältesten Druckwerke (Munich: München-Regensburg, 1898); Irvine Masson, The
Mainz Psalters and Canon Missae, 1457-1459 (London: Printed for the Bibliographical Society, 1954); and Victor
Scholderer, Fifty Essays in Fifteenth- and Sixteenth-century Bibliography, ed. Dennis E. Rhodes (Amsterdam: M.
Hertzberger & Co., 1966).
5
Figure 1 shows both a paper making moulde and a piece of unprinted paper (from a different moulde). The “chains” in the
paper making moulde are metal or wood structural ribs that help the frame hold its form (vertically oriented in Figure 1).
“Wires” are then strung across the frame at a right angle to the chains (horizontally oriented in Figure 1). The wires act
as the actual bed onto which the fibers that will constitute the paper come to rest during the paper making process.
Watermarks were typically sewn onto the frame at the end of the moulde production. These physical structures of the
paper making moulde leave an impression on the resulting piece of paper. All of these various elements migrate relative
to each other during the paper making process, creating a unique fingerprint on each resulting piece of paper.
Figure 1: Paper making moulde and a blank sheet of paper (printed from different moulde)
showing the impressions left by the physical apparatus of the paper making process
Stevenson’s work built upon an established form of enquiry known as analytical bibliography,
which relied upon, in his words, “searching out forms of material evidence” and developing techniques to
effectively establish the printing history of texts.6 His particular application of the methods of
analytical bibliography to the problem of the Constance Missale proved so convincing that it caused
Victor Scholderer, whose own work on the text Stevenson refuted, to state definitively that, “The new
procedure will be required of the cataloguer henceforward, now that we see how decisive the evidence
of the paper can be.”7 Alas, I can safely report that Scholderer's predictions for the future of
cataloguing and bibliography have not come to pass. With the exception of a few rarefied cases, neither
6
Stevenson, Problem of the Missale Speciale, 26.
7
Victor Scholderer, foreword to Stevenson, Problem of the Missale Speciale.
library cataloguers nor textual scholars are en masse combing the world's libraries with magnifying
glasses and rulers in hand measuring idiosyncrasies of paper and type.8
There are several reasons why Stevenson’s application of analytical bibliography has not
evolved into a mainstream scholarly approach to the study of early modern texts in general or of the
broadside ballad in particular. First and foremost is the simple fact that it is labor intensive. It took
Stevenson thirteen years of concerted study looking at a relatively small number of texts (compared to
the vast body of extant broadside ballads let alone other published works) to complete, compile, and
publish his analysis. Neither the workflows nor budgets of libraries or academic departments are
internally capable of sustaining such a gargantuan effort. Given this reality, this work can only be
conducted through the successful acquisition of outside funding. Such funding opportunities have
always been sparse, and are today more sparse than ever.9 As a result, pursuing this type of research is,
in the majority of cases, simply not institutionally feasible.
A second factor that has prevented the application of analytical bibliography to broadside
ballads has been limited access. Prior to the advent of the digital archive, most scholars simply did not
have access to any view of the majority of ballads other than through microfilm or prints made from
microfilm. These microfilm-derived versions, which presented only high-contrast masks of the
originals, did not adequately capture the traces of the material printing process upon which the
analytical bibliographic method relied. The only way for scholars to observe and document this crucial
8
The major exception is the dating of the William Shakespeare quartos, where these techniques were applied by a group of
dedicated scholars who, through their efforts, succeeded not only in creating a complete catalogue of font types used in
the quartos but also, through this effort, established their publication history. See R. B. McKerrow, Printers' &
Publishers' Devices in England & Scotland 1485–1640 (London: Chiswick Press, 1913); also Sir Walter Greg, "On
Certain False Dates in Shakespearian Quartos," The Library, 2nd ser., no. 9 (1908): 113-31.
9
American Academy of Arts and Sciences, The State of the Humanities: Funding 2014, Report April 2014, accessed
September 9, 2014, http://www.humanitiesindicators.org/binaries/pdf/HI_FundingReport2014.pdf.
information was to view the originals in their institutional repositories, which were (and are) scattered
around the globe and, in some cases, closed to viewing. The logistics of access thereby added yet
another layer of temporal and financial burden to the endeavor.
Computer technologies offer the potential to overcome the temporal, access, and cost barriers
that have heretofore stood in the way of large-scale adoption of such meticulous forms of analytical
bibliography, opening up new possibilities for advances not only in broadside ballad studies but in
early modern studies in general. At present, a strong majority of extant pre-eighteenth-century English
broadside ballads are freely available online, where they can be studied by anyone, anywhere, with an
internet connection and a web browser.10 This unprecedented access has already had a positive impact
on both the quantity and quality of ballad scholarship;11 and digitization efforts are ongoing.12 As such,
we find ourselves in an era of “Big Data” ballad scholarship, where both humans and, of more
importance to analytical bibliography, computers have access to large-scale, analyzable stores of
cultural data.
In an analytical bibliographic ecosystem, computer access to big ballad data supersedes the
need for human access because computers can easily overcome what has historically been the method's
10
As of this writing, the English Broadside Ballad Archive, or EBBA, provides high-resolution facsimiles (in a
variety of views) of 59.75% of all extant seventeenth-century English broadside ballads. In addition, the Bodleian
Library Broadside Ballad Project provides digital access to their holdings of around 2,500 pre-1800 ballads and over
30,000 ballads from the sixteenth through to twentieth century. See English Broadside Ballad Archive, hereafter
referred to as EBBA, http://ebba.english.ucsb.edu. See also Bodleian Ballads, http://www.bodley.ox.ac.uk/ballads/.
11
See Carl G. Stahmer, "Open Access Results in Increased Scholarship of English Broadside Ballads," in Carl Stahmer,
PhD - Digital Humanist (blog), July 7, 2014, accessed September 1, 2014, http://www.carlstahmer.com/2014/07/openaccess-results-in-increased-scholarship-of-english-broadside-ballads/.
12
Work at EBBA is ongoing, with the recent awarding of a fifth round of funding from the NEH to support the digitization
of the early printed ballads held at Harvard University’s Houghton Library. EBBA's goal is to provide public access to
100% of the extant broadside ballads from its period of focus (circa 11,000 items).
primary barrier to entry: the sheer time involved in carrying out its methodologies of categorization,
measurement, and calculation. Because analytical bibliography is mathematically rather than
semantically intensive, computers are actually better equipped to engage in this type of scholarship
than are their human counterparts. I am not here advocating for a completely automated mode of
scholarship. What I am advocating for is a mode of digital analytical bibliography in which ballad
scholars can leverage computational proficiencies in order to answer important, previously unanswered
questions.
As was the case with the Constance Missale prior to Stevenson's analytical bibliographic work,
the obscured production history of broadside ballads remains at present one of the most significant
impediments to their scholarly study. A majority of the extant English broadside ballads have no
imprinted publication date, and many contain incomplete or no information regarding the printer and/or
publisher of the ballad. Scholars have historically assigned possible publication date ranges to these
ballads, often spanning as long as 50 years, through references made within a ballad to historical events
of known date, through reference to known periods of activity by an identified printer or publisher
(where one exists), and sometimes simply by reference to particular stylistic features. As with the
Constance Missale, a digital analytical bibliographic approach would directly engage this knowledge
deficit.
In order to accomplish its work, a digital analytical bibliography would need to develop a set of
verifiable methodologies for extracting and analyzing a variety of material aspects of the text. As a
starting point for this new method, I suggest the following forms of information extraction and
computational analysis, all of which we currently have the technological capability to implement: (1)
an analysis of the frequent practice of reusing blocks, rules, or ornament on multiple broadsides over
time; (2) the identification and tracking of individual sorts of type as they are used and reused on
multiple ballads over time; (3) an analysis of how woodcuts, ornament, and type arranged in the
printing press forme tend to drift from their original positions as multiple copies of a single ballad are
printed; and (4) the analysis of wire lines, chain lines, and watermarks according to the methodology
established by scholars such as Stevenson.
A computational analysis of the frequent practice of reusing blocks, rules, or ornament on
multiple broadsides over time would significantly add to our knowledge of their printing history.
Broadsides are object-oriented by nature, composed of collections of representational units that, as we
see discussed in many essays in this issue, were arranged and re-arranged across various printings.13
Text, musical score, woodblock impressions, and ornamentation were used and re-used in various
contexts to create a multitude of unique compositions. There was, in fact, a robust trade in the building
blocks of the broadside printing industry. Individual blocks, sorts of type, rules, etc. were frequently
sold and/or traded between printers and publishers. As a result, it is common, for example, to find the
same woodblock used multiple times across completely different broadside ballads.
Computers offer the possibility of tracking these building blocks as they moved from broadside
to broadside and also from printer to printer. For the past two years, EBBA has been developing
Content Based Image Retrieval (CBIR)14 software specifically designed to allow scholars to search a
collection of digital images for all occurrences of a seed image. A beta version of the platform, which
we call Arch-V, short for Archive Vision, is currently being implemented at EBBA where it is being
used to power EBBA's nascent Ballad Impression Archive (BIA), which will provide a fully searchable
and catalogued index of all woodblock impressions that appear on the broadside ballads in the EBBA
13
By “object-oriented” I mean here to consciously invoke the modern, computer science sense of the term as describing a
design approach that produces a complex process through the arrangement of a set of discrete sub-objects, each being a
complete entity in its own right. See "Object-oriented Design," Wikipedia, accessed December 9, 2014,
http://en.wikipedia.org/wiki/Object-oriented_design.
14
See A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based Image Retrieval at the End of
the Early Years," IEEE Transactions on Pattern Analysis and Machine Intelligence 22, no. 12 (2000): 1349-380.
doi:10.1109/34.895972.
archive.15
A detailed discussion of the inner workings of Arch-V is beyond the scope of this essay, but a
brief introduction will help to demonstrate how Arch-V in particular, and CBIR in general, can be
leveraged to support an analytical bibliographic approach to ballad scholarship. Arch-V functions by
creating a searchable index of “features” that appear in the images in a collection. Specifically, it
utilizes an algorithm, called SURF feature extraction, to identify meaningful moments in an image
where visible lines either diverge in their directions or end.16 Figure 2 below shows a woodblock
impression extracted from EBBA 31981 (University of Glasgow Library Euing 370) with its SURF
feature points highlighted.
15
In their current beta incarnation, BIA and Arch-V are available to users by selecting the “Impression Archive” tab
while viewing any ballad in the archive.
16
This is a very simplified explanation of SURF feature points. For an in-depth explanation see Herbert Bay, Andreas Ess,
Tinne Tuytelaars, and Luc Van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image
Understanding 110, no. 3 (2008): 346-59, accessed February 7, 2010, doi:10.1016/j.cviu.2007.09.014.
Figure 2: SURF feature points identified by Arch-V in woodblock impression.
The size of the drawn circle indicates the magnitude of the identified feature point.
In the above example, each circle represents an identified feature point, and the radial line of each
circle represents the computer’s mathematical assessment of the directions in which these features are
oriented. Thus, for example, the portion of the image where the woman’s right elbow meets her body
produces what the computer understands as a triangular feature that orients outward according to the
radial line.17 Notice that here (as in many cases) the system will calculate several overlapping features
of variant size. Here, we see one circled feature representing the small triangle where the elbow itself
meets the body, a larger circle that represents the entire shape up to the cuff of the arm and bottom of
17
The “radial line” is the straight line that extends from the center of each circular feature point identifier to its exterior
boundary/circumference.
the blouse, etc., until we reach the end of the lady’s fingertips. These overlapping features are
represented by the collection of concentrically drawn feature points, each with a slightly different radial
orientation indicator as appropriate.
Once Arch-V has identified the feature points that belong to an image, it associates the feature
point collection with the image, and stores this association in a searchable index. When conducting a
search of the EBBA archive, Arch-V extracts the feature points from the seed image used in the search (such as the
angle of the lady’s elbow) and utilizes these points as a query to search against the entire collection’s
feature point index. As a result, similar individual shapes are identified, and similar combinations of
shapes are combined into patterns of increasing significance. Figure 3 below shows a selection from a
sample BIA search return for one of the woodblock impressions that appears on EBBA 30213 (British
Library Roxburghe 1.308-309).
Figure 3: Sample BIA search return showing found examples
of the same or similar block used to produce impressions on other broadsides
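The index-and-query step described above can be illustrated with a minimal sketch. It is not Arch-V's code: the image identifiers, the three-number "descriptors," and the distance threshold are all hypothetical, and a real SURF descriptor has many more dimensions. The sketch shows only the general idea of ranking indexed images by how many of the seed image's feature points find a close counterpart.

```python
from math import dist

# Hypothetical toy index: image id -> list of feature descriptors.
# Real descriptors are much longer vectors; three numbers keep the example readable.
INDEX = {
    "ballad_A": [(0.10, 0.90, 0.30), (0.80, 0.20, 0.50), (0.40, 0.40, 0.90)],
    "ballad_B": [(0.11, 0.88, 0.31), (0.79, 0.22, 0.48), (0.05, 0.05, 0.05)],
    "ballad_C": [(0.95, 0.95, 0.10), (0.30, 0.70, 0.60), (0.60, 0.10, 0.20)],
}


def count_matches(query, candidates, max_distance=0.05):
    """Count query descriptors whose nearest candidate lies within max_distance."""
    matches = 0
    for q in query:
        nearest = min((dist(q, c) for c in candidates), default=float("inf"))
        if nearest <= max_distance:
            matches += 1
    return matches


def search(query, index, max_distance=0.05):
    """Rank indexed images by the number of matched feature points, best first."""
    scores = {
        image_id: count_matches(query, descriptors, max_distance)
        for image_id, descriptors in index.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Feature points taken from a (hypothetical) seed impression.
seed = [(0.10, 0.90, 0.30), (0.80, 0.20, 0.50)]
ranking = search(seed, INDEX)
```

In this toy data the first two images each match both seed points and the third matches none, which mirrors how a search return groups impressions from the same or a similar block ahead of unrelated ones.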
As shown in Figure 3, Arch-V allows scholars to track the appearance of an individual
woodblock impression across multiple incarnations. In practice, this can reveal exact or variant copies
of the same ballad, the re-use of a block across the many similar editions and/or printings of ballads, or
the appearance of distinct but similar woodblocks. The examples of the fourth impression in Figure 3
above, on Pepys 1.457 (EBBA 20032) and the fifth impression, on Roxburghe 1.54-55 (EBBA 30040),
represent instances where the same woodblock was reused to create an illustration for completely
different broadside ballads printed by different printers. Table 1 below indicates the current, most
accurate known publication-related data as reflected in the citation information provided in EBBA for
three of the broadside ballads grouped by the above impression search:
Shelfmark             EBBA ID   Date          Imprint
-------------------   -------   -----------   ---------------------------------------------------------
Roxburghe 1.308-309   30213     1623-1661 ?   Printed at London by M.P. for F. Groue, / neere the
                                              Sarazens head with / out New-gate.
Roxburghe 1.54-55     30040     1611-1656 ?   London, Printed by M.P. for Edward Wright / at his Shop
                                              neere Christ Church gate.
Pepys 1.457           20032     1632 ?        Printed by A.M. for H.G.
Table 1: Three broadside ballads printed with same woodblock
In addition to the ballads above that were printed using this block, the printer of EBBA 20032, A.M.,
also utilized the same block in a book that he printed in 1630.18
The current publication dates associated with the three ballads above indicate that EBBA 30213
has the latest potential publication date of the three ballads (1661), with EBBA 30040 following
(1656). Both of these broadsides were printed by M.P. However, we know that the same block was
also used by A.M. to print both broadsides and books as early as 1630. The current cataloguing record
thus indicates potentially overlapping publication date ranges for broadsides produced by different
18
See Tom Thumbe, His Life and Death Wherein Is Declared Many Maruailous Acts of Manhood, Full of Wonder, and
Strange Merriments: Which Little Knight Liued in King Arthurs Time, and Famous in the Court of Great-Brittaine
(London: Printed for Iohn Wright, 1630). For printing attribution of this book see University of Ghent library, accessed
November 20, 2014. http://lib.ugent.be/catalog/rug01:001503347.
printers using the exact same block. As only one printer could have been in possession of the block at
any one time, this means that either: 1) M.P. and A.M. were trading the block back and forth, thereby
allowing them to produce printings in an overlapping time period; or 2) the block was first owned by
one printer and then acquired by the other such that there is no overlap in publication usage by the two
printers.
It is most likely that the two printers used the block sequentially and not during overlapping
periods. The figure carved on the block and its subsequent impression is, by all accounts, a generic
figure, and the block is used as such by both printers, where it serves as a multi-use male character,
representing everything from lover to father. Given the fact that both M.P. and A.M. are known to
have had many such stock figures in their block inventories, it is unlikely that either would have had
need to, on multiple occasions, negotiate a short term lease or loan of that particular block for a
particular printing.19
The evidence of the impressions both supports the above theory and allows us to establish a
more accurate publication date sequence for the ballads.20 Figure 4 below presents a side-by-side
rendition of the three impressions in question:
19
It is, of course, within the realm of possibility that they traded back and forth in order to keep their inventories fresh, but
there is no actual evidence to this end; and, given the fact that each maintained in their inventory multiple generic male
figures, this is the least likely scenario.
20
I wish to both thank and credit Megan Palmer-Browne for lending her expertise in working with woodcut impressions and
early modern printers to establishing the proper sequence of publication and providing information on Augustine
Matthews’ personal history as indicated in the discussion that follows.
Figure 4: Three impressions showing growth of wormholes
Moving from left to right, notice that there are no wormholes in the leftmost impression, the
appearance of wormholes in the center impression, and finally their continued growth and accentuation
in the rightmost impression. This clearly establishes the following publication sequence for the
impressions: Pepys 1.457 (EBBA 20032) was printed by A.M., after which M.P. printed Roxburghe
1.308-309 (EBBA 30213) and then Roxburghe 1.54-55 (EBBA 30040).
In light of the above evidence, the most likely scenario is that A.M. held the block first and then
transferred ownership to M.P., who died circa 1656.21 This jibes with the historical record as the only
known printer from the period with the initials A.M., Augustine Matthews, was condemned to lose his
press for reprinting Cole’s Holy Table without license and was subsequently listed by Sir John Lambe
as a pauper in 1634, presumably after losing his press. Given this information, we can, with a high
level of confidence, re-establish the actual printing dates of the three ballads on which the impressions
appear as follows:
21
Hyder E. Rollins, "Martin Parker, Ballad-Monger," Modern Philology 16, no. 9 (1919): 449-474.
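Shelfmark             EBBA ID   Date          Imprint
-------------------   -------   -----------   ---------------------------------------------------------
Pepys 1.457           20032     1632-1634     Printed by A.M. for H.G.
Roxburghe 1.308-309   30213     1632-1656 ?   Printed at London by M.P. for F. Groue, / neere the
                                              Sarazens head with / out Newgate.
Roxburghe 1.54-55     30040     1632-1656 ?   London, Printed by M.P. for Edward Wright / at his Shop
                                              neere Christ Church gate.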
Table 2: Revised publication dates for three broadside ballads printed with same woodblock
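The reasoning that leads from Table 1 to Table 2 can be restated as a small interval calculation. The sketch below is an illustration only, under two assumptions drawn from the discussion above: the wormhole evidence fixes the print order of the three ballads, and Pepys 1.457 can be no later than 1634. The function name and data layout are hypothetical.

```python
# Catalogued (earliest, latest) years, listed in the print order implied by
# the growth of the wormholes. The 1634 upper bound for EBBA 20032 is the
# essay's inference from Augustine Matthews' loss of his press.
sequence = [
    ("EBBA 20032", 1632, 1634),  # printed by A.M.
    ("EBBA 30213", 1623, 1661),  # printed by M.P.
    ("EBBA 30040", 1611, 1656),  # printed by M.P.
]


def narrow(seq):
    """Tighten each date range so that it is consistent with the print order."""
    items = [list(row) for row in seq]
    # Forward pass: no item can predate the item printed before it.
    for prev, cur in zip(items, items[1:]):
        cur[1] = max(cur[1], prev[1])
    # Backward pass: no item can postdate the item printed after it.
    for cur, nxt in zip(reversed(items[:-1]), reversed(items[1:])):
        cur[2] = min(cur[2], nxt[2])
    return [tuple(row) for row in items]


revised = narrow(sequence)
for ebba_id, earliest, latest in revised:
    print(f"{ebba_id}: {earliest}-{latest}")
```

Run on the catalogued ranges, the two passes reproduce the revised ranges of Table 2: the earliest date of both M.P. ballads rises to 1632, and the latest date of EBBA 30213 falls to 1656 because it precedes EBBA 30040.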
The above is just one example of how CBIR, by allowing us to follow the migration of the
physical objects used in the printing process, can help us refine our knowledge of the history of the
broadside ballads themselves. This same technology can also be leveraged to examine typeface to
similar advantage. As discussed with the case of the Constance Missale, the identification and tracking
of individual sorts of type and fonts as they were used and reused over time was an important element
of analytical bibliography as it evolved over the course of the late nineteenth and twentieth
centuries. This kind of typographic analysis involves identifying particular fonts and/or identifiable
variant(s) or flaws in an individual sort (as reflected in its print manifestation), collecting printed
examples of the same font, letter, or figure from other printed material from the time, and then
examining these suspects for the identified variant(s), thereby positively identifying other places where
the same font or sort was used.22 This technique has typically been employed only on small, select
22
For example, one might find the removal of a jutting spur on the left from a “u” or “r” character so that it could be placed
immediately adjacent to an “r” or “x” with a jutting spur on the right without the two spurs running into each other on
the printed page. This type of post-casting modification of sorts was common. Because these modifications were
produced by hand, each has a unique signature.
collections of work, because the dataset involved in a complete analysis is too large to be human
processed. If we were to segment the text in every extant work from the fifteenth to the nineteenth
centuries into a collection of individual fonts or, more extreme still, individual letters that appear in print, the resulting
dataset would be so large that no human could work through it in a lifetime or cognitively grapple with
it even given multiple lifetimes. Imagine extracting every single “r” that appears in print in
every work printed in the seventeenth century and then examining each of these extracted “r”s by hand
to see how closely it resembles one particular “r.” The spreadsheet would simply be too large to be
comprehended.23
Whereas the above task would be daunting if not impossible for a human scholar, for a properly
programmed computer with CBIR capabilities, the task is rudimentary. In addition to EBBA's Arch-V
platform, there are currently several other, important projects working to make this kind of analysis a
reality. The Early Modern Optical Character Recognition Project (eMOP) at the Initiative for Digital
Humanities, Media, and Culture (IDHMC) at Texas A&M University has been at the forefront of work
to utilize digital technologies to investigate both font and type.24 Work at eMOP has shown that font
analysis and identification can be successfully leveraged to improve text recognition during the OCR
extraction of textual content from early modern texts, even when working with lower and varied
resolution images such as those found in EEBO. Additionally, pilot work carried out in conjunction
23
George A. Miller, "The Magical Number Seven, plus or minus Two: Some Limits on Our Capacity for Processing
Information," Psychological Review 63, no. 2 (1956): 81-97. Miller's study remains the definitive work on the human
brain's ability to simultaneously hold in memory multiple pieces of information. According to the study, the brain is
capable of dealing with only five to nine pieces of information at a time. As a result, any human scholar could
only hold an image of a few “r” sorts in consciousness while scanning other “r”s for possible matches. All of these
possible matches would then have to be set aside for later actual comparison—all in all, a very inefficient and
impractical system given the volume of information needing to be processed.
24
Early Modern OCR Project, http://emop.tamu.edu/, Initiative for Digital Humanities, Media, and Culture,
http://idhmc.tamu.edu/.
with EBBA has suggested that identification of individual type sorts can be accomplished given a
collection of higher resolution images.
Figure 5: Using Arch-V to find text printed using a particular sort
Figure 5 above provides sample output of Arch-V's feature point matching capabilities applied
to the task of identifying text printed using a particular, known sort. In the above image, the small “r”
to the left is an image extracted from a different printed page than the page extract shown on the right.
In this case, a human scholar first identified the seed “r” (which appears in the upper left corner of the
image above) in which the spur on the bottom left of the letter is separated from the remainder of the
letter as it appears in print. This is visible to the human eye as the space between the main body of the
“r” and the small dot that appears at the bottom left hand corner (note that in the above image the
computer has identified and marked this space as a feature point represented by a purple circle). This
unique feature of this particular “r” could be the result of wear, filing, or poor inking. If it were produced by
poor inking, however, it is unlikely that one would find the exact same representation appearing on multiple
printings, especially if they belong to completely different type-settings. As such, the appearance of
the same “r” on multiple ballads suggests the re-use of the same sort across printings. By extracting
and then matching feature points, Arch-V is, in fact, able to find the same “r” in a completely different
print location (which appears on the right side of the image above).25 Note that other “r”s appear in the
same extract of the sheet, but these are not matched by the system (a typecase would, of course, contain
multiple “r”s). The lines drawn between the feature points in the left and right sides of the image
indicate the points where the computer has made a match.
The above does not constitute definitive proof that the same sort was used to produce the two
“r”s, as it is theoretically possible (though unlikely) that poor inking on two unrelated print runs could
produce an identical impression or that two “r”s from the same matrix were identically filed by a
printer or exhibit identical wear; however, when run on the average desktop computer, CBIR systems
can perform this type of analysis on thousands of pages of text an hour, and thus produce a highly
accurate output set for subsequent human examination and corroboration. The addition of a feedback
loop into the system, whereby a human scholar feeds information back to the system telling the
computer which of its matches were accurate and which were false, would quickly produce a system
capable of exceeding human accuracy when dealing with large data sets.
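One very simple form such a feedback loop might take is sketched below. This is an assumption for the purpose of illustration, not a description of how Arch-V or any existing system works: it supposes that each candidate match carries a similarity score, that a scholar has marked a sample of candidates as true or false, and that the system then picks the acceptance threshold that disagrees least with those human judgments. All scores and names are hypothetical.

```python
def best_threshold(reviewed):
    """Pick the score cutoff that disagrees least with the human reviewer.

    reviewed is a list of (score, is_true_match) pairs, where is_true_match
    records the scholar's judgment of a candidate the computer proposed.
    """
    candidates = sorted({score for score, _ in reviewed})

    def errors(threshold):
        # A candidate is accepted when its score reaches the threshold; an
        # error is any case where acceptance and the human label differ.
        return sum((score >= threshold) != label for score, label in reviewed)

    return min(candidates, key=errors)


# Hypothetical review session: six candidate matches judged by a scholar.
reviewed = [
    (0.95, True),
    (0.90, True),
    (0.80, True),
    (0.75, False),
    (0.72, False),
    (0.60, False),
]
threshold = best_threshold(reviewed)
print(threshold)  # 0.8
```

Each new round of human review adds labelled pairs, so the cutoff can be recomputed as the reviewed sample grows.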
An analysis of the migration of the physical objects arranged in the printing press forme as
multiple copies of a single ballad are printed is another potentially important tool for the digital
analytical bibliographer. In the letterpress era, the design for each page to be printed was created by hand by
combining sorts, blocks, rules, lead spacers, and filler furniture in a tightly bound forme, thereby
creating a negative relief of the desired, printed page. It was not only common but nearly unavoidable
in the early days of print for the various items bound in the forme to shift over the course of a print run.
Because the press itself applies pressure in a regularized fashion, this migration tends to occur in flows
25
The fact that the computer is able to find another example of the same “r” with a hanging foot suggests that this anomaly is
the result of a physical defect in the sort and not of poor inking.
that are identifiable over time.26 Figure 6 below shows a composite, overlaid image of three extant
copies of a single broadside ballad with one copy appearing in its original black print and two overlaid
copies appearing in colorized red and green print for the purpose of demonstration. This composite view
allows the viewer to clearly see the movement of type and block in the forme when the individual
copies of the ballad were coming off the press.
Figure 6: Migration of objects in the forme during printing
Analysis of the movement of objects within the forme during printing as depicted in Figure 6 can be
automated using existing technologies.27 It is, in fact, worth noting that some of these very technologies
were used to construct Figure 6 as a means of making this movement humanly readable. Applying
computational algorithms designed to make predictions about the flow of objects to the item
segmentation and identification data depicted above is all that would be needed to endow the computer
26
By “regularized pressure” I mean to suggest not identical levels of force with each print but the fact that the mechanical
nature of the press directs whatever force is being applied into a regularized direction/vector.
27
The Paragon project at the Center for Digital Humanities, University of South Carolina, is an NEH funded project
devoted to developing image based text collation tools. As part of this effort, they have created a host of image
segmentation tools. See Wang, Song, and David L. Miller. "Paragon," Center for Digital Humanities: Paragon, accessed
September 15, 2014. http://cdh.sc.edu/projects/paragon.
with the ability to model these representations of difference into a prediction of printing order.28
Because the force applied by the printing process is directionally consistent, the individual items in a
forme tend likewise to spread in a directionally consistent fashion.29 The pieces travel, as it were,
down a machine mediated road. Happily, a range of such algorithms has been developed by the
engineering sciences for tracking and understanding just this type of travel by objects (such as, for
example, water, ice, and electrons through logic gates), and they are readily available for
implementation by humanities scholars as open source software libraries.30
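The inference from directional migration to printing order can be illustrated with a far simpler technique than the flow-modeling libraries mentioned above. The Python sketch below assumes that the copies have already been registered to a common frame and that the position of one tracked object (a block or a sort) has been measured in each; the copy names and coordinates are hypothetical. It finds the dominant direction of drift and sorts the copies along it. The result is an ordering only up to reversal: deciding which end of the sequence came off the press first requires other evidence.

```python
import math


def order_by_drift(positions):
    """Sort copy identifiers along the dominant direction of drift.

    `positions` maps a copy identifier to the (x, y) location of one tracked
    object in that copy, measured in a shared, registered coordinate frame
    (an assumption of this sketch; registration is not shown).
    """
    ids = list(positions)
    n = len(ids)
    mean_x = sum(positions[i][0] for i in ids) / n
    mean_y = sum(positions[i][1] for i in ids) / n

    # Second moments of the point cloud around its centre.
    sxx = sum((positions[i][0] - mean_x) ** 2 for i in ids)
    syy = sum((positions[i][1] - mean_y) ** 2 for i in ids)
    sxy = sum((positions[i][0] - mean_x) * (positions[i][1] - mean_y) for i in ids)

    # Angle of the principal axis, i.e. the direction of greatest spread.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # Project each copy's position onto that axis and sort by the projection.
    def along_axis(i):
        return (positions[i][0] - mean_x) * ux + (positions[i][1] - mean_y) * uy

    return sorted(ids, key=along_axis)


# Hypothetical measurements for three copies of one ballad.
measured = {
    "copy_b": (10.4, 20.2),
    "copy_a": (10.0, 20.0),
    "copy_c": (11.0, 20.5),
}
print(order_by_drift(measured))
```

A fuller treatment would track many objects at once and weigh their evidence together; the single-object case is shown only to make the logic of the inference concrete.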
A final technique of digital analytical bibliography is the analysis of the materiality of the paper
on which broadside ballads are printed. This would serve two primary purposes: (1) to reasonably
identify the printer of a given broadside; and (2) to suggest a potential production order of
broadsides, specifically those printed using the same batch of paper. By tracking both the appearance
and shifting of watermarks on paper across multiple publications, Stevenson is able to create a history
of the paper itself, showing which printer owned and was using a particular paper, in a particular place,
and at a particular time. By matching the paper in the Constance Missale to this map, in sum, he is able
to effectively date the work. Image segmentation (the extraction of shapes from a larger image) is a
primary focus of computer science image manipulation research, and the state of the art is quite
advanced. As such, this manually intensive work can easily be carried out computationally by the
28
In computing terms, segmentation is the process of “segmenting” an image into discrete, identifiable parts. A simple
segmentation routine would, for example, separate the above image into one segment representing the title text above
the block impression and another for the block impression. If we were segmenting an image of the entire broadside, the
remaining text columns would also be segmented into separate units.
29
When force is applied in a directionally consistent manner, the flow will likewise migrate in a similarly consistent
manner. The distance that individual objects move will vary even if the force applied remains consistent, but the
direction of flow remains consistent.
30
Two examples of such libraries currently in wide use are OpenFOAM, http://www.openfoam.com/, and SU2,
http://su2.stanford.edu/. Both provide a large collection of tools for analyzing and modeling object flow.
computer, thereby allowing us to implement Stevenson's method across large collections of texts.31
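Segmentation in its simplest form can be sketched as follows. The Python below assumes that a page image has already been reduced to a grid of marked (1) and unmarked (0) pixels, and it groups touching marked pixels into regions, reporting a bounding box for each. It is a generic connected-component routine offered for illustration, not the watermark extraction method of the thesis cited above; the function name and the sample grid are hypothetical.

```python
from collections import deque


def segment(grid):
    """Return bounding boxes (top, left, bottom, right) of marked regions.

    `grid` is a list of rows containing 1 for a marked pixel and 0 otherwise.
    Pixels belong to the same region when they touch horizontally or
    vertically (4-connectivity).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from the first unseen marked pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# A tiny hypothetical "page" containing two separate marked regions.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(segment(page))
```

Each bounding box isolates one shape from the larger image, which is the precondition for comparing that shape across sheets.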
The evidence of the paper can also be used to suggest the production order of individual copies
of the same ballad printed in a single print run. This can be accomplished by tracking spatial changes in
the relationship between wire and chain lines and watermarks across pieces of paper. The paper making
process was skilled manual labor that involved a rhythmical shake of the moulde after it was pulled
through the pulpy water, in order to entangle and bind the pulp fibers as their suspension medium
(water) drains through the bottom of the wire moulde. As a result of the shaking, it was common for
sewn watermarks to shift relative to the wire and chain lines, and, in older mouldes, for the wire and
chain lines to sometimes shift relative to each other. This shifting and spreading tends to happen in a
regularized flow, such that, given a stack of resulting paper, one can typically use such migration to
establish an order in which the paper itself was manufactured. Because the data acquired from this
analysis relates directly to the paper itself and not necessarily to the order in which texts were actually
printed on paper (the paper could easily have been shuffled prior to printing) and because of its extreme
labor intensiveness, this type of textual analysis is quite rare. But whereas the measurement of wire and
chain lines on individual sheets of paper represents a significant work effort for a human scholar, for a
computer examining a suitable image, the task is trivial and can be accomplished in a matter of
seconds.32 The subsequent task of analyzing the movement of these lines across multiple sheets in order
to derive a manufacturing order is, as with the process described above for analyzing the flow of
objects in the printing forme, similarly trivial for the computer. As such, today this approach offers
31
See Hazem Ali Abd Al Faleh Al Hiary, "Paper-based Watermark Extraction with Image Processing," PhD diss.,
University of Leeds, 2008, accessed January 12, 2013, http://etheses.whiterose.ac.uk/1355/1/hazem.pdf. A digital
application of this methodology would benefit from being applied to the largest possible collection.
32
As described below, the successful application of a computerized method is greatly enhanced by the acquisition of
specialized digital images designed specifically to reveal aspects of the physical paper not readily apparent in standard
photographic surrogates (regardless of resolution) or even by viewing the original artifact.
itself as another potentially valuable tool in the digital analytical bibliographic toolkit.
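The measurement and ordering steps described in this paragraph can be sketched in Python under some strong simplifying assumptions: that each sheet has been photographed with back-lighting, that the image has been collapsed into a profile of average brightness per pixel column (so that chain lines appear as peaks), and that the horizontal position of the watermark has been measured separately. The function names, the brightness threshold, and the sample data are hypothetical. As with the forme, the resulting sequence is recoverable only up to reversal, and it describes the manufacture of the paper, not the order of printing.

```python
def chain_line_columns(profile, min_height):
    """Return column indices where the brightness profile has a local peak."""
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] >= min_height
        and profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
    ]


def watermark_offset(profile, watermark_x, min_height):
    """Distance from the watermark to the nearest chain line on its left."""
    lines = [x for x in chain_line_columns(profile, min_height) if x <= watermark_x]
    if not lines:
        raise ValueError("no chain line found to the left of the watermark")
    return watermark_x - max(lines)


def order_sheets(sheets, min_height):
    """Sort sheet identifiers by how far the watermark has shifted.

    `sheets` maps an identifier to (profile, watermark_x). Sorting assumes
    the shift grows steadily as the mould is used, which is the premise of
    the paragraph above and not something this code can verify.
    """
    return sorted(
        sheets,
        key=lambda s: watermark_offset(sheets[s][0], sheets[s][1], min_height),
    )


# Hypothetical profile with chain lines at columns 2 and 7, shared by
# three sheets whose watermarks sit at different positions.
profile = [1, 1, 9, 1, 1, 1, 1, 9, 1, 1]
sheets = {"s1": (profile, 4), "s2": (profile, 6), "s3": (profile, 5)}
print(chain_line_columns(profile, 5))
print(order_sheets(sheets, 5))
```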
None of the above scenarios is mere science fiction. All are based on actual work currently
underway. This is not to say that there is not still work to be done to perfect these techniques. The
algorithms of extraction and analysis can be improved, as can the types of images to which they are
applied. To date, most digitization efforts, including EBBA’s, have focused on acquiring and curating
digital versions of original artifacts that capture, as much as possible, a human visual experience of the
artifact. The high-resolution, color JPEG or PNG is an attempt to provide the human viewer with as
much of the same information as he or she could glean from an examination of the original. But the
digital analytical bibliographic method relies on a computer reader rather than a human one, and
computers “see” differently than humans. As such, to maximize the application of computing
technologies, we need to start capturing and curating views of historical printed material that will be of
maximum use to the computer itself. We need to archive as much for the computer reader as the human
one.
Photographic methods such as Hyperspectral Imaging, Reflectance Transformation Imaging,
and simple back-light photography have all been shown to reveal visually hidden aspects of both print
and paper materiality.33 The large-scale capturing of such images across digital archives would
33
Hyperspectral Imaging is a photographic process that captures data beyond the human visible light spectrum. It is known
to reveal altered, erased, covered, and even un-inked marks on a text. See Patrick Shiel, Malte Rehbein, and John
Keating, "The Ghost in the Manuscript: Hyperspectral Text Recovery and Segmentation," in Kodikologie Und
Paläographie Im Digitalen Zeitalter - Codicology and Palaeography in the Digital Age, ed. Malte Rehbein, Patrick
Sahle, and Torsten Schassan (Norderstedt: Institute for Documentology and Scholarly Editing, 2009), 159-74. See also
John R. Quain, "Peeling Back the Hidden Pages of History with Hyperspectral Photography," American Photo.
Reflectance Transformation Imaging is a photographic process that combines photographs of an object taken from a
fixed camera perspective but with a shifting light source. It is known to reveal altered, erased, covered, and un-inked
marks on a text that are not visible to the human under normal viewing conditions. See "Reflectance Transformation
Imaging (RTI)," Cultural Heritage Imaging, accessed September 14, 2014,
http://culturalheritageimaging.org/Technologies/RTI/. See also Todd R. Hanneken, Integrating Spectral and Reflectance
Transformation Imaging Technologies for the Digitization of Manuscripts and Other Cultural Artifacts, PDF (San
Antonio: St. Mary's University, June 30, 2014), http://palimpsest.stmarytx.edu/integrating/WhitePaper-20140630.pdf.
Finally, for information on back-light photographic methods, see Hazem Ali Abd Al Faleh Al Hiary, 81-112.
dramatically expand and improve the results of digital analytical bibliography and should be pursued.
But even in the absence of such data, digital analytical bibliography still offers the potential to enhance
both the depth and breadth of our knowledge of the broadside ballad specifically and of historical
printed materials in general. Digitally attending to the materiality of our texts will allow us to re-create
the complex printing networks out of which our artifacts of study were born. The use of these
techniques could revolutionize early modern studies by providing a chronological and geospatial map
of the entire printing history of the period.