Carl G Stahmer
Director of Digital Scholarship, University of California, Davis Library
Associate Director, English Broadside Ballad Archive
Forthcoming: Huntington Library Quarterly
DO NOT DISTRIBUTE

Digital Analytical Bibliography: Ballad Sheet Forensics, Preservation, and the Digital Archive

In late February 1954, Allan Stevenson was on fellowship at the Huntington Library, researching the sources of the paper used to print early English books, when news broke that the Pierpont Morgan Library had acquired one of three known copies of the Constance Missale. Many believed this liturgical text predated the Gutenberg Bible (printed in 1455) by as many as five years, which would make it the oldest extant text printed with moveable type.1 The officers of the Morgan Library certainly believed this to be true, as they paid in excess of $100,000.00 to acquire the text.2 On hearing news of the acquisition, in Stevenson's own words, “It occurred to [him] that, although half a million words had been spilt concerning the typography of the Missale, decidedly few had been ventured concerning the paper which had served to set the type before the gaze of men.”3 As a result of this revelation, Stevenson spent the next thirteen years meticulously documenting and cataloguing observable aspects of the material text. Building on the previous work of scholars such as Bühler, Hupp, Masson, and Scholderer, he created a chronological and spatial map of the movement of individual type sorts and fonts as they appeared both in the Constance Missale and in various other texts from the fifteenth century.4 By also meticulously measuring the orientation of watermarks relative to the wire and chain lines on the paper on which the work was printed, as well as the movement of these lines in relation to each other, he was able to construct a picture of the production history of that paper.5 Through this work with the visible remnants of the papermaking and printing processes, the ghosts of the work's material past, Stevenson was able to resolve one of the highest-profile scholarly debates of his time and show definitively that the Constance Missale had actually been published in 1473, after the Gutenberg Bible.

1 Scholars had variously dated the Missale from 1448 to the mid-1470s. For a good overview of the various publication dates ascribed to the text, see Allan Stevenson, The Problem of the Missale Speciale (London: Bibliographical Society, 1967), esp. chapter 1, “The Riddle of the Book.”
2 Raymond A. Lajoie, "Constance Missal Considered World's Oldest Book," Gadsden Times, September 5, 1954. Adjusted for inflation, the acquisition would have cost in excess of $850,000.00 in 2014.
3 Stevenson, Problem of the Missale Speciale, 29.
4 A “sort” is an individual, physical piece of type representing a letter or symbol. Individual sorts were cast in a matrix, typically a hand mould, into which molten metal was poured. After cooling, the matrix was opened and the sort released. The new sort then had to be cleaned and filed by hand to remove rough edges before use. As a result of this hand filing, and of its unique printing history, each sort is to some extent unique. For previous scholarship on which Stevenson’s method was based, see Curt F. Bühler, The Fifteenth-Century Book: The Scribes, the Printers, the Decorators (Philadelphia: University of Pennsylvania Press, 1960); see also "Another View on the Dating of the Missale Speciale Constantiense," The Library, 5th ser., 14 (1959): 1-10; Otto Hupp, Ein Missale Speciale, Vorläufer des Psalteriums von 1457: Beitrag zur Geschichte der ältesten Druckwerke (Munich: München-Regensburg, 1898); Irvine Masson, The Mainz Psalters and Canon Missae, 1457-1459 (London: Printed for the Bibliographical Society, 1954); and Victor Scholderer, Fifty Essays in Fifteenth- and Sixteenth-Century Bibliography, ed. Dennis E. Rhodes (Amsterdam: M. Hertzberger, 1966).
5 Figure 1 shows both a papermaking mould and a piece of unprinted paper (made on a different mould). The “chains” in the papermaking mould are metal or wood structural ribs that help the frame hold its form (vertically oriented in Figure 1). “Wires” are then strung across the frame at a right angle to the chains (horizontally oriented in Figure 1). The wires act as the actual bed on which the fibers that will constitute the paper come to rest during the papermaking process. Watermarks were typically sewn onto the frame at the end of the mould's production. These physical structures of the mould leave an impression on the resulting piece of paper. All of these elements migrate relative to each other over the course of the papermaking process, creating a unique fingerprint on each resulting piece of paper.

Figure 1: Papermaking mould and a blank sheet of paper (made on a different mould) showing the impressions left by the physical apparatus of the papermaking process

Stevenson’s work built upon an established form of enquiry known as analytical bibliography, which relied upon, in his words, “searching out forms of material evidence” and developing techniques to establish the printing history of texts.6 His application of the methods of analytical bibliography to the problem of the Constance Missale proved so convincing that Victor Scholderer, whose own work on the text Stevenson refuted, declared definitively that “The new procedure will be required of the cataloguer henceforward, now that we see how decisive the evidence of the paper can be.”7

Alas, I can safely report that Scholderer's predictions for the future of cataloguing and bibliography have not come to pass. With the exception of a few rarefied cases, neither library cataloguers nor textual scholars are combing the world's libraries en masse, magnifying glasses and rulers in hand, measuring idiosyncrasies of paper and type.8

6 Stevenson, Problem of the Missale Speciale, 26.
7 Victor Scholderer, foreword to Stevenson, Problem of the Missale Speciale.

There are several reasons why Stevenson’s application of analytical bibliography has not evolved into a mainstream scholarly approach to the study of early modern texts in general or of the broadside ballad in particular. First and foremost is the simple fact that it is labor intensive. It took Stevenson thirteen years of concerted study of a relatively small number of texts (compared with the vast body of extant broadside ballads, let alone other published works) to complete, compile, and publish his analysis. Neither the workflows nor the budgets of libraries or academic departments can sustain such a gargantuan effort internally. This work can therefore be conducted only with outside funding, and such funding opportunities have always been scarce and are today scarcer than ever.9 As a result, pursuing this type of research is, in the majority of cases, simply not institutionally feasible.

A second factor that has prevented the application of analytical bibliography to broadside ballads has been limited access. Before the advent of the digital archive, most scholars could view the majority of ballads only through microfilm or prints made from microfilm. These microfilm-derived versions, which presented only high-contrast masks of the originals, did not adequately capture the traces of the material printing process on which the analytical bibliographic method relied.
The only way for scholars to observe and document this crucial information was to view the originals in their institutional repositories, which were (and are) scattered around the globe and, in some cases, closed to viewing. The logistics of access thereby added yet another layer of temporal and financial burden to the endeavor.

8 The major exception is the dating of the Shakespeare quartos, where these techniques were applied by a group of dedicated scholars who succeeded not only in creating a complete catalogue of the fonts used in the quartos but also, through this effort, in establishing their publication history. See R. B. McKerrow, Printers' & Publishers' Devices in England & Scotland 1485–1640 (London: Chiswick Press, 1913); also Sir Walter Greg, "On Certain False Dates in Shakespearian Quartos," The Library, 2nd ser., 9 (1908): 113-31.
9 American Academy of Arts and Sciences, The State of the Humanities: Funding 2014 (report, April 2014), accessed September 9, 2014, http://www.humanitiesindicators.org/binaries/pdf/HI_FundingReport2014.pdf.

Computer technologies offer the potential to overcome the temporal, access, and cost barriers that have heretofore stood in the way of large-scale adoption of such meticulous forms of analytical bibliography, opening up new possibilities for advances not only in broadside ballad studies but in early modern studies in general.
At present, a strong majority of extant pre-eighteenth-century English broadside ballads are freely available online, where anyone, anywhere, with an internet connection and a web browser can study them.10 This unprecedented access has already had a positive impact on both the quantity and quality of ballad scholarship,11 and digitization efforts are ongoing.12 We thus find ourselves in an era of “Big Data” ballad scholarship, in which both humans and, of more importance to analytical bibliography, computers have access to large-scale, analyzable stores of cultural data. In an analytical bibliographic ecosystem, computer access to big ballad data supersedes the need for human access, because computers can easily overcome what has historically been the method's primary barrier to entry: the sheer time involved in carrying out its methodologies of categorization, measurement, and calculation. Because analytical bibliography is mathematically rather than semantically intensive, computers are actually better equipped to engage in this type of scholarship than are their human counterparts. I am not here advocating a completely automated mode of scholarship. What I am advocating is a mode of digital analytical bibliography in which ballad scholars can leverage computational capabilities to answer important, previously unanswered questions.

10 As of this writing, the English Broadside Ballad Archive (hereafter EBBA) provides high-resolution facsimiles (in a variety of views) of 59.75% of all extant seventeenth-century English broadside ballads. In addition, the Bodleian Library Broadside Ballad Project provides digital access to its holdings of around 2,500 pre-1800 ballads and over 30,000 ballads from the sixteenth through the twentieth centuries. See English Broadside Ballad Archive, http://ebba.english.ucsb.edu; see also Bodleian Ballads, http://www.bodley.ox.ac.uk/ballads/.
11 See Carl G. Stahmer, "Open Access Results in Increased Scholarship of English Broadside Ballads," Carl Stahmer, PhD - Digital Humanist (blog), July 7, 2014, accessed September 1, 2014, http://www.carlstahmer.com/2014/07/openaccess-results-in-increased-scholarship-of-english-broadside-ballads/.
12 Work at EBBA is ongoing, with the recent award of a fifth round of funding from the NEH to support the digitization of the early printed ballads held at Harvard University’s Houghton Library. EBBA's goal is to provide public access to 100% of the extant broadside ballads from its period of focus (circa 11,000 items).

As was the case with the Constance Missale before Stevenson's analytical bibliographic work, the obscured production history of broadside ballads remains one of the most significant impediments to their scholarly study. A majority of the extant English broadside ballads have no imprinted publication date, and many contain incomplete or no information about the printer and/or publisher. Scholars have historically assigned possible publication date ranges to these ballads, often spanning as much as 50 years. They have based these ranges on references within a ballad to historical events of known date, on the known periods of activity of an identified printer or publisher (where one exists), and sometimes simply on particular stylistic features. As with the Constance Missale, a digital analytical bibliographic approach would directly address this knowledge deficit. To accomplish its work, a digital analytical bibliography would need to develop a set of verifiable methodologies for extracting and analyzing a variety of material aspects of the text.
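In computational terms, narrowing a date range from several independent kinds of evidence is an interval intersection. The Python sketch below illustrates that one step under simplifying assumptions. It is not a description of EBBA's cataloguing practice. Each piece of evidence is reduced to an inclusive (earliest, latest) range of years, with None standing for an unknown bound. The figures in the usage lines (a catalogue range of 1623-1661 and a printer who died c. 1656) are borrowed from the case study later in this essay.

```python
from typing import Optional

# An inclusive (earliest, latest) range of years; None marks an unknown bound.
YearRange = tuple[Optional[int], Optional[int]]


def intersect_ranges(*evidence: YearRange) -> YearRange:
    """Return the narrowest range consistent with every piece of evidence.

    Raises ValueError when the evidence is contradictory (empty intersection).
    """
    lows = [lo for lo, _ in evidence if lo is not None]
    highs = [hi for _, hi in evidence if hi is not None]
    earliest = max(lows) if lows else None  # latest of the lower bounds
    latest = min(highs) if highs else None   # earliest of the upper bounds
    if earliest is not None and latest is not None and earliest > latest:
        raise ValueError(f"conflicting evidence: {earliest} > {latest}")
    return (earliest, latest)


# Hypothetical usage: a catalogue range combined with a printer's lifespan.
catalogue_range = (1623, 1661)
printer_limit = (None, 1656)  # assumes the printer stopped work c. 1656
print(intersect_ranges(catalogue_range, printer_limit))  # (1623, 1656)
```

When the evidence is contradictory, the function raises an error rather than silently returning an empty range. This flags the record for human review instead of hiding the conflict.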
As a starting point for this new method, I suggest the following forms of information extraction and computational analysis, all of which we currently have the technological capability to implement: (1) an analysis of the frequent practice of reusing blocks, rules, or ornament on multiple broadsides over time; (2) the identification and tracking of individual sorts of type as they are used and reused on multiple ballads over time; (3) an analysis of how woodcuts, ornament, and type arranged in the printing press forme tend to drift from their original positions as multiple copies of a single ballad are printed; and (4) the analysis of wire lines, chain lines, and watermarks according to the methodology established by scholars such as Stevenson.

A computational analysis of the frequent practice of reusing blocks, rules, or ornament on multiple broadsides over time would significantly add to our knowledge of their printing history. Broadsides are object-oriented by nature. They are composed of collections of representational units that, as many essays in this issue discuss, were arranged and rearranged across various printings.13 Text, musical score, woodblock impressions, and ornamentation were used and reused in various contexts to create a multitude of unique compositions. There was, in fact, a robust trade in the building blocks of the broadside printing industry: individual blocks, sorts of type, rules, and the like were frequently sold and/or traded between printers and publishers. As a result, it is common to find the same woodblock used multiple times across completely different broadside ballads. Computers offer the possibility of tracking these building blocks as they moved from broadside to broadside and from printer to printer. For the past two years, EBBA has been developing Content-Based Image Retrieval (CBIR)14 software specifically designed to allow scholars to search a collection of digital images for all occurrences of a seed image.
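Once an image-matching step has attributed each impression to a physical block, following that block from broadside to broadside and from printer to printer is largely a matter of bookkeeping. The sketch below is a hypothetical illustration and is not part of Arch-V or EBBA. The `Impression` and `BlockIndex` names and the block identifier are invented. The three EBBA IDs and printers' initials in the usage lines are those of the ballads listed in Table 1 later in this essay.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One appearance of a physical printing object on one ballad sheet."""
    block_id: str  # hypothetical ID assigned once image matching groups impressions
    ebba_id: str   # EBBA record on which the impression appears
    printer: str   # printer as given (often only initials) in the imprint


class BlockIndex:
    """Hypothetical index from each physical block to its impressions."""

    def __init__(self) -> None:
        self._by_block: dict[str, list[Impression]] = defaultdict(list)

    def add(self, impression: Impression) -> None:
        self._by_block[impression.block_id].append(impression)

    def ballads_using(self, block_id: str) -> list[str]:
        """EBBA IDs of every ballad bearing an impression of the block."""
        return sorted(i.ebba_id for i in self._by_block.get(block_id, []))

    def printers_of(self, block_id: str) -> dict[str, list[str]]:
        """Group the ballads bearing a block's impressions by printer."""
        grouped: dict[str, list[str]] = defaultdict(list)
        for imp in self._by_block.get(block_id, []):
            grouped[imp.printer].append(imp.ebba_id)
        return {printer: sorted(ids) for printer, ids in grouped.items()}


index = BlockIndex()
for ebba_id, printer in [("30213", "M.P."), ("30040", "M.P."), ("20032", "A.M.")]:
    index.add(Impression("male-figure-1", ebba_id, printer))  # invented block ID
print(index.printers_of("male-figure-1"))
# {'M.P.': ['30040', '30213'], 'A.M.': ['20032']}
```

Keeping impressions as small, immutable records mirrors the object-oriented view of the broadside. The same block record can be queried by ballad or by printer without copying data.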
A beta version of the platform, which we call Arch-V (short for Archive Vision), is currently being implemented at EBBA. There it powers EBBA's nascent Ballad Impression Archive (BIA), which will provide a fully searchable and catalogued index of all woodblock impressions that appear on the broadside ballads in the EBBA archive.15 A detailed discussion of the inner workings of Arch-V is beyond the scope of this essay, but a brief introduction will help to demonstrate how Arch-V in particular, and CBIR in general, can be leveraged to support an analytical bibliographic approach to ballad scholarship. Arch-V functions by creating a searchable index of “features” that appear in the images in a collection. Specifically, it uses a feature extraction algorithm called SURF to identify meaningful points in an image where visible lines either diverge in direction or end.16 Figure 2 shows a woodblock impression extracted from EBBA 31981 (University of Glasgow Library Euing 370) with its SURF feature points highlighted.

13 By “object-oriented” I mean here to consciously invoke the modern, computer science sense of the term as describing a design approach that produces a complex process through the arrangement of a set of discrete sub-objects, each being a complete entity in its own right. See "Object-oriented Design," Wikipedia, accessed December 9, 2014, http://en.wikipedia.org/wiki/Object-oriented_design.
14 See A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Transactions on Pattern Analysis and Machine Intelligence 22, no. 12 (2000): 1349-80, doi:10.1109/34.895972.
15 In their current beta form, BIA and Arch-V are available to users who select the “Impression Archive” tab while viewing any ballad in the archive.
16 This is a much simplified explanation of SURF feature points. For an in-depth explanation, see Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding 110, no. 3 (2008): 346-59, accessed February 7, 2010, doi:10.1016/j.cviu.2007.09.014.

Figure 2: SURF feature points identified by Arch-V in a woodblock impression. The size of each drawn circle indicates the scale of the identified feature point.

In this example, each circle represents an identified feature point, and the radial line of each circle represents the computer’s mathematical assessment of the direction in which that feature is oriented. Thus, for example, the portion of the image where the woman’s right elbow meets her body produces what the computer understands as a triangular feature oriented outward along the radial line.17 Notice that here (as in many cases) the system calculates several overlapping features of varying size. One circled feature represents the small triangle where the elbow itself meets the body. A larger circle represents the entire shape up to the cuff of the arm and the bottom of the blouse, and so on, out to the lady’s fingertips. These overlapping features are represented by the collection of concentrically drawn feature points, each with a slightly different radial orientation indicator as appropriate.

17 The “radial line” is the straight line that extends from the center of each circular feature point marker to its circumference.

Once Arch-V has identified the feature points that belong to an image, it associates the feature point collection with the image and stores this association in a searchable index. When a user searches the EBBA archive, Arch-V extracts the feature points from the seed image (such as the angle of the lady’s elbow) and uses these points as a query against the entire collection’s feature point index.
As a result, similar individual shapes are identified, and similar combinations of shapes are grouped into patterns of increasing significance. Figure 3 shows a selection from a sample BIA search return for one of the woodblock impressions that appear on EBBA 30213 (British Library Roxburghe 1.308-309).

Figure 3: Sample BIA search return showing examples of the same or similar blocks used to produce impressions on other broadsides

As Figure 3 shows, Arch-V allows scholars to track the appearance of an individual woodblock impression across multiple incarnations. In practice, this can reveal exact or variant copies of the same ballad, the reuse of a block across many similar editions and/or printings of ballads, or the appearance of diverse but similar woodblocks. The fourth impression in Figure 3, on Pepys 1.457 (EBBA 20032), and the fifth, on Roxburghe 1.54-55 (EBBA 30040), are instances in which the same woodblock was reused to illustrate completely different broadside ballads printed by different printers. Table 1 shows the most accurate currently known publication data, as reflected in EBBA's citation information, for three of the broadside ballads grouped by this impression search:

Shelfmark | EBBA ID | Date | Imprint
--- | --- | --- | ---
Roxburghe 1.308-309 | 30213 | 1623-1661? | Printed at London by M.P. for F. Groue, / neere the Sarazens head with / out New-gate.
Roxburghe 1.54-55 | 30040 | 1611-1656? | London, Printed by M.P. for Edward Wright / at his Shop neere Christ Church gate.
Pepys 1.457 | 20032 | 1632? | Printed by A.M. for H.G.
Table 1: Three broadside ballads printed with the same woodblock

In addition to the ballads above, the printer of EBBA 20032, A.M., also used the same block in a book that he printed in 1630.18 The current publication dates associated with the three ballads indicate that EBBA 30213 has the latest potential publication date of the three (1661), followed by EBBA 30040 (1656). Both of these broadsides were printed by M.P. However, we know that A.M. used the same block to print both broadsides and books as early as 1630. The current cataloguing record thus indicates potentially overlapping publication date ranges for broadsides produced by different printers using the exact same block. As only one printer could have possessed the block at any one time, one of two things must be true. Either (1) M.P. and A.M. were trading the block back and forth, which allowed them to produce printings in overlapping periods; or (2) the block was first owned by one printer and then acquired by the other, so that there was no overlap in the two printers' use of it.

18 See Tom Thumbe, His Life and Death Wherein Is Declared Many Maruailous Acts of Manhood, Full of Wonder, and Strange Merriments: Which Little Knight Liued in King Arthurs Time, and Famous in the Court of Great-Brittaine (London: Printed for Iohn Wright, 1630). For the printing attribution of this book, see University of Ghent Library, accessed November 20, 2014, http://lib.ugent.be/catalog/rug01:001503347.

It is most likely that the two printers used the block sequentially rather than during overlapping periods. The figure carved on the block, and hence its impression, is by all accounts a generic one, and both printers use the block as such: it serves as a multi-use male character, representing everything from lover to father. Both M.P. and A.M. are known to have had many such stock figures in their block inventories. It is therefore unlikely that either would have needed, on multiple occasions, to negotiate a short-term lease or loan of this particular block for a particular printing.19 The evidence of the impressions both supports this theory and allows us to establish a more accurate publication sequence for the ballads.20 Figure 4 presents a side-by-side rendition of the three impressions in question.

19 It is, of course, within the realm of possibility that they traded the block back and forth in order to keep their inventories fresh, but there is no actual evidence to this end. Given that each maintained multiple generic male figures in inventory, this is the less likely scenario.
20 I wish to thank and credit Megan Palmer-Browne. She lent her expertise in woodcut impressions and early modern printers to establishing the proper sequence of publication, and she provided the information on Augustine Matthews’s personal history given in the discussion that follows.

Figure 4: Three impressions showing growth of wormholes

Moving from left to right, notice that there are no wormholes in the leftmost impression, that wormholes appear in the center impression, and that they continue to grow and become more pronounced in the rightmost impression. This clearly establishes the following publication sequence for the impressions: Pepys 1.457 (EBBA 20032) was printed by A.M., after which M.P. printed Roxburghe 1.308-309 (EBBA 30213) and then Roxburghe 1.54-55 (EBBA 30040). In light of this evidence, the most likely scenario is that A.M. held the block first and then transferred ownership to M.P., who died circa 1656.21 This accords with the historical record. The only known printer of the period with the initials A.M., Augustine Matthews, was condemned to lose his press for reprinting Cole’s Holy Table without license. He was subsequently listed by Sir John Lambe as a pauper in 1634, presumably after losing his press. Given this information, we can, with a high level of confidence, re-establish the actual printing dates of the three ballads on which the impressions appear as follows:

21 Hyder E. Rollins, "Martin Parker, Ballad-Monger," Modern Philology 16, no. 9 (1919): 449-74.

Shelfmark | EBBA ID | Date | Imprint
--- | --- | --- | ---
Pepys 1.457 | 20032 | 1632-1634 | Printed by A.M. for H.G.
Roxburghe 1.308-309 | 30213 | 1632-1656? | Printed at London by M.P. for F. Groue, / neere the Sarazens head with / out Newgate.
Roxburghe 1.54-55 | 30040 | 1632-1656? | London, Printed by M.P. for Edward Wright / at his Shop neere Christ Church gate.

Table 2: Revised publication dates for three broadside ballads printed with the same woodblock

The above is just one example of how CBIR, by allowing us to follow the migration of the physical objects used in the printing process, can help us refine our knowledge of the history of the broadside ballads themselves. This same technology can also be leveraged to examine type to similar advantage. As discussed in the case of the Constance Missale, the identification and tracking of individual sorts of type and fonts as they were used and reused over time was an important element of analytical bibliography as it evolved over the course of the late nineteenth and twentieth centuries.
This kind of typographic analysis begins with identifying particular fonts and/or identifiable variants or flaws in an individual sort (as reflected in its printed manifestation). The analyst then collects printed examples of the same font, letter, or figure from other printed material of the period and examines these suspects for the identified variants, thereby positively identifying other places where the same font or sort was used.22 This technique has typically been employed only on small, select collections of work, because the dataset involved in a complete analysis is too large for humans to process. Suppose we were to segment the text of every extant work from the fifteenth to the nineteenth centuries into individual fonts or, more extreme still, into the individual letters that appear in print. The resulting dataset would be so large that no human could work through it in a lifetime, or cognitively grapple with it even given multiple lifetimes. Imagine extracting every single “r” that appears in every work printed in the seventeenth century and then examining each of these extracted “r”s by hand to see how closely it resembles one particular “r.” The spreadsheet would simply be too large to comprehend.23 Whereas this task would be daunting if not impossible for a human scholar, for a properly programmed computer with CBIR capabilities it is rudimentary. In addition to EBBA's Arch-V platform, several other important projects are currently working to make this kind of analysis a reality.

22 For example, one might find that a jutting spur on the left of a “u” or “r” had been removed so that the sort could be placed immediately adjacent to an “r” or “x” with a jutting spur on the right without the two spurs running into each other on the printed page. This type of post-casting modification of sorts was common. Because these modifications were produced by hand, each has a unique signature.
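The "every r" comparison is straightforward to express in code. The sketch below swaps in a simpler technique than the feature matching Arch-V uses: normalized cross-correlation, a standard template-matching score. It assumes each glyph has already been segmented from the page, binarized, and scaled to the same small grid. The glyph names and pixel values in the usage lines are invented for illustration.

```python
import math

# A glyph as a grid of ink values (1 = ink, 0 = bare paper); all glyphs
# compared must already be segmented and scaled to the same grid size.
Glyph = list[list[float]]


def ncc(a: Glyph, b: Glyph) -> float:
    """Normalized cross-correlation: 1.0 for identical patterns, near 0 for
    unrelated ones, negative for inverted ones."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("glyphs must have the same number of pixels")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # a blank glyph carries no pattern


def rank_candidates(seed: Glyph, candidates: dict[str, Glyph], top: int = 5):
    """Score every extracted glyph against the seed; return the best few
    (name, score) pairs for a human to inspect."""
    scored = [(name, ncc(seed, glyph)) for name, glyph in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Invented 3x3 glyphs; the lone pixel at bottom right stands in for a flaw.
seed = [[1, 1, 0], [1, 0, 0], [1, 0, 1]]
candidates = {
    "page9-r3": [[1, 1, 0], [1, 0, 0], [1, 0, 0]],  # same letter, no flaw
    "page2-r7": [[1, 1, 0], [1, 0, 0], [1, 0, 1]],  # same letter and flaw
}
print([name for name, _ in rank_candidates(seed, candidates)])
# ['page2-r7', 'page9-r3']
```

The computer scores every extracted letter. It returns only the short list at the top, which keeps the human reviewer within the few items Miller's work suggests a person can hold in mind at once.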
The Early Modern Optical Character Recognition Project (eMOP) at the Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas A&M University has been at the forefront of efforts to use digital technologies to investigate both font and type.24 Work at eMOP has shown that font analysis and identification can be successfully leveraged to improve text recognition during the OCR extraction of textual content from early modern texts. This holds even for low- and variable-resolution images such as those found in Early English Books Online (EEBO). Additionally, pilot work carried out in conjunction with EBBA has suggested that individual type sorts can be identified given a collection of higher-resolution images.

23 George A. Miller, "The Magical Number Seven, plus or minus Two: Some Limits on Our Capacity for Processing Information," Psychological Review 63, no. 2 (1956): 81-97. Miller's study remains the definitive work on the human brain's ability to hold multiple pieces of information in memory simultaneously. According to the study, the brain can deal with only between five and nine pieces of information at a time. As a result, any human scholar could hold an image of only a few “r” sorts in consciousness while scanning other “r”s for possible matches. All of these possible matches would then have to be set aside for later comparison. All in all, this would be a very inefficient and impractical system given the volume of information to be processed.
24 Early Modern OCR Project, http://emop.tamu.edu/; Initiative for Digital Humanities, Media, and Culture, http://idhmc.tamu.edu/.

Figure 5: Using Arch-V to find text printed using a particular sort

Figure 5 above provides sample output of Arch-V's feature point matching capabilities applied to the task of identifying text printed using a particular, known sort.
In the above image, the small “r” on the left was extracted from a different printed page than the page extract shown on the right. In this case, a human scholar first identified the seed “r” (which appears in the upper left corner of the image), in which the spur at the bottom left of the letter is separated from the rest of the letter as it appears in print. This is visible to the human eye as the space between the main body of the “r” and the small dot at its bottom left-hand corner. (Note that the computer has identified and marked this space as a feature point, represented by a purple circle.) This unique feature of this particular “r” could be the result of wear, filing, or poor inking. If it were produced by inking, however, it would be unlikely to appear identically on multiple printings, especially ones belonging to completely different settings of type. The appearance of the same “r” on multiple ballads therefore suggests the reuse of the same sort across printings. By extracting and then matching feature points, Arch-V is, in fact, able to find the same “r” in a completely different print location (on the right side of the image).25 Note that other “r”s appear in the same extract of the sheet but are not matched by the system (a typecase would, of course, contain multiple “r”s). The lines drawn between the feature points on the left and right sides of the image indicate where the computer has made a match.
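The match lines in Figure 5 can be approximated with a common descriptor-matching heuristic, Lowe's ratio test. The test accepts a seed feature only when its nearest neighbour on the candidate page is clearly closer than the second-nearest. The sketch below is an illustration built on stated assumptions, not Arch-V's code:

- It presumes feature descriptors have already been extracted (SURF produces 64 numbers per point).
- It uses two-number toy descriptors for readability.
- The thresholds `ratio`, `max_dist`, and `min_matches` are illustrative defaults, not tuned values.

```python
import math

# A feature descriptor: a fixed-length vector of numbers (SURF uses 64).
Descriptor = tuple[float, ...]


def match_features(seed: list[Descriptor], page: list[Descriptor],
                   ratio: float = 0.75,
                   max_dist: float = math.inf) -> list[tuple[int, int]]:
    """Pair each seed feature with its nearest page feature, keeping the pair
    only if it passes Lowe's ratio test and an absolute distance cap."""
    if len(page) < 2:
        return []  # the ratio test needs a runner-up to compare against
    pairs = []
    for i, d in enumerate(seed):
        dists = sorted((math.dist(d, p), j) for j, p in enumerate(page))
        (best, j), (second, _) = dists[0], dists[1]
        if best <= max_dist and best < ratio * second:
            pairs.append((i, j))
    return pairs


def is_candidate_match(seed: list[Descriptor], page: list[Descriptor],
                       min_matches: int = 3, ratio: float = 0.75,
                       max_dist: float = math.inf) -> bool:
    """Flag a page region for human review when enough features agree."""
    return len(match_features(seed, page, ratio, max_dist)) >= min_matches


# Toy two-number descriptors for a seed "r" and a candidate page region.
seed_r = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
page = [(10.0, 10.0), (0.1, 0.0), (5.0, 0.1), (0.0, 5.2), (20.0, 20.0)]
print(match_features(seed_r, page))  # [(0, 1), (1, 2), (2, 3)]
```

Requiring several agreeing features, rather than one, reflects the caution that follows in the text. A positive result marks a candidate for human corroboration; it does not prove that the same sort was used.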
The above does not constitute definitive proof that the same sort was used to produce the two “r”s. It is theoretically possible (though unlikely) that poor inking on two unrelated print runs could produce an identical impression, or that two “r”s cast from the same matrix were identically filed by a printer or exhibit identical wear. However, when run on an average desktop computer, CBIR systems can perform this type of analysis on thousands of pages of text an hour, and can thus produce a highly accurate set of candidate matches for subsequent human examination and corroboration. Adding a feedback loop to the system, whereby a human scholar tells the computer which of its matches were accurate and which were false, would quickly produce a system capable of exceeding human accuracy when dealing with large data sets.
An analysis of the migration of the physical objects arranged in the printing press forme as multiple copies of a single ballad are printed is another potentially important tool for the digital analytical bibliographer. In the letterpress era, the design for each page to be printed was created by hand by combining sorts, blocks, rules, lead spacers, and filler furniture in a tightly bound forme, thereby creating a negative relief of the desired printed page. It was not only common but nearly unavoidable in the early days of print for the various items bound in the forme to shift over the course of a print run. Because the press itself applies pressure in a regularized fashion, this migration tends to occur in flows that are identifiable over time.26
25 The fact that the computer is able to find another example of the same “r” with a hanging foot suggests that this anomaly is the result of a physical defect in the sort and not of poor inking.
Figure 6 below shows a composite, overlaid image of three extant copies of a single broadside ballad, with one copy appearing in its original black print and two overlaid copies appearing in colorized red and green print for the purpose of demonstration. This composite view allows the viewer to see clearly how type and blocks moved in the forme as the individual copies of the ballad were coming off the press.
Figure 6: Migration of objects in the forme during printing
Analysis of the movement of objects within the forme during printing, as depicted in Figure 6, can be automated using existing technologies.27 It is, in fact, worth noting that some of these very technologies were used to construct Figure 6 as a means of making this movement humanly readable. Applying computational algorithms designed to make predictions about the flow of objects to the item segmentation and identification data depicted above is all that would be needed to endow the computer with the ability to model these representations of difference into a prediction of printing order.28 Because the force applied by the printing process is directionally consistent, the individual items in a forme tend likewise to spread in a directionally consistent fashion.29 The pieces travel, as it were, down a machine-mediated road.
26 By “regularized pressure” I mean to suggest not identical levels of force with each impression but the fact that the mechanical nature of the press directs whatever force is being applied along a regularized direction/vector.
27 The Paragon project at the Center for Digital Humanities, University of South Carolina, is an NEH-funded project devoted to developing image-based text collation tools. As part of this effort, the project has created a host of image segmentation tools. See Song Wang and David L. Miller, "Paragon," Center for Digital Humanities: Paragon, accessed September 15, 2014, http://cdh.sc.edu/projects/paragon.
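The step from measured displacement to a predicted printing order can be sketched briefly. This is not the author's method, and it does not use the flow-modelling libraries cited in the notes. It is a hypothetical stand-in that assumes segmentation has already yielded the (x, y) position of one tracked object (say, a corner of a woodblock) in each surviving copy, in a shared coordinate frame, and it substitutes a simple principal-axis projection for a full flow model. Because the drift is directionally consistent, the copies should fall roughly along a line, and their order along that line suggests their order off the press. The copy names and coordinates in any real use would come from the archive; none are given here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def principal_direction(points: List[Point]) -> Point:
    """Unit vector along the axis of greatest spread of the points
    (the first principal component of their 2-D scatter)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def order_by_drift(positions: Dict[str, Point]) -> List[str]:
    """Order copies by how far the tracked object sits along the dominant
    drift direction. The data fix the sequence only up to reversal: which
    end came first off the press must be settled by other evidence."""
    names = list(positions)
    dx, dy = principal_direction([positions[name] for name in names])
    return sorted(names,
                  key=lambda name: positions[name][0] * dx + positions[name][1] * dy)
```

In practice one would track many objects in the forme at once and check that their individual orderings agree before trusting the result.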
Happily, a range of such algorithms has been developed in the engineering sciences for tracking and understanding just this type of travel by objects (for example, water, ice, and electrons through logic gates), and they are readily available to humanities scholars as open source software libraries.30
A final technique of digital analytical bibliography is the analysis of the materiality of the paper on which broadside ballads are printed. This would serve two primary purposes: (1) to reasonably identify the printer of a given broadside; and (2) to suggest a potential production order for broadsides printed on the same batch of paper. By tracking both the appearance and the shifting of watermarks on paper across multiple publications, Stevenson was able to create a history of the paper itself, showing which printer owned and was using a particular paper, in a particular place, and at a particular time. By matching the paper in the Constance Missale to this map, he was, in sum, able to date the work effectively. Image segmentation (the extraction of shapes from a larger image) is a primary focus of computer science image manipulation research, and the state of the art is quite advanced. As such, this manually intensive work of identifying and comparing watermarks can easily be carried out by computer, thereby allowing us to implement Stevenson's method across large collections of texts.31
28 In computing terms, segmentation is the process of “segmenting” an image into discrete, identifiable parts. A simple segmentation routine would, for example, separate the above image into one segment representing the title text above the block impression and another for the block impression. If we were segmenting an image of the entire broadside, the remaining text columns would also be segmented into separate units.
29 When force is applied in a directionally consistent manner, the objects it moves will likewise migrate in a directionally consistent manner. The distance that individual objects move will vary even if the force applied remains constant, but the direction of flow remains consistent.
30 Two examples of such libraries currently in wide use are OpenFOAM, http://www.openfoam.com/, and SU2, http://su2.stanford.edu/. Both provide a large collection of tools for analyzing and modeling object flow.
The evidence of the paper can also be used to suggest the production order of individual copies of the same ballad printed in a single print run. This can be accomplished by tracking spatial changes in the relationship between wire lines, chain lines, and watermarks across sheets of paper. Papermaking was skilled manual labor that involved a rhythmical shake of the moulde after it was pulled through the pulpy water, in order to entangle and bind the pulp fibers as their suspension medium (water) drained through the bottom of the wire moulde. As a result of the shaking, it was common for sewn watermarks to shift relative to the wire and chain lines and, in older mouldes, for the wire and chain lines to shift relative to each other. This shifting and spreading tends to happen in a regularized flow, such that, given a stack of the resulting paper, one can typically use such migration to establish the order in which the paper itself was manufactured. Because the data acquired from this analysis relates directly to the paper itself and not necessarily to the order in which texts were actually printed on it (the paper could easily have been shuffled prior to printing), and because of its extreme labor intensiveness, this type of analysis is quite rare.
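The ordering of sheets by watermark migration can be sketched in a few lines. This is a hypothetical illustration, not Stevenson's procedure. It assumes that for each sheet the horizontal position of a fixed reference point on the watermark and the positions of the chain lines have already been measured in millimeters, that all sheets come from a single moulde, and that the sewn watermark drifted steadily in one direction as that moulde was used. The sheet identifiers and the measurement layout are invented for the example.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# Hypothetical measurements per sheet, in millimeters from the left edge:
# (x-position of a fixed point on the watermark, sorted chain-line positions).
Measurement = Tuple[float, Sequence[float]]


def watermark_offset(watermark_x: float, chain_lines: Sequence[float]) -> float:
    """Distance from the watermark's reference point to the nearest chain
    line on its left. chain_lines must be sorted in ascending order."""
    i = bisect_right(chain_lines, watermark_x)
    if i == 0:
        raise ValueError("watermark lies to the left of every chain line")
    return watermark_x - chain_lines[i - 1]


def order_sheets(sheets: Dict[str, Measurement]) -> List[str]:
    """Sort sheet identifiers by watermark offset. Under the steady-drift
    assumption this approximates manufacturing order, up to reversal; it says
    nothing about printing order if the stack was shuffled before printing."""
    return sorted(sheets, key=lambda sid: watermark_offset(*sheets[sid]))
```

Measuring the offset from the neighboring chain line, rather than from the sheet edge, keeps the comparison independent of how each sheet was trimmed.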
But whereas the measurement of wire and chain lines on individual sheets of paper represents a significant work effort for a human scholar, for a computer examining a suitable image the task is trivial and can be accomplished in a matter of seconds.32 The subsequent task of analyzing the movement of these lines across multiple sheets in order to derive a manufacturing order is, like the process described above for analyzing the flow of objects in the printing forme, trivial for the computer. As such, this approach today offers itself as another potentially valuable tool in the digital analytical bibliographic toolkit.
31 See Hazem Ali Abd Al Faleh Al Hiary, "Paper-based Watermark Extraction with Image Processing," PhD diss., University of Leeds, 2008, accessed January 12, 2013, http://etheses.whiterose.ac.uk/1355/1/hazem.pdf. A digital application of this methodology would benefit from being applied to the largest possible collection.
32 As described below, the successful application of a computerized method is greatly enhanced by the acquisition of specialized digital images designed specifically to reveal aspects of the physical paper not readily apparent in standard photographic surrogates (regardless of resolution) or even when viewing the original artifact.
None of the above scenarios is mere science fiction. All are based on actual work currently underway. This is not to say that no work remains to perfect these techniques. The algorithms of extraction and analysis can be improved, as can the types of images to which they are applied. To date, most digitization efforts, including EBBA’s, have focused on acquiring and curating digital versions of original artifacts that capture, as much as possible, a human visual experience of the artifact. The high-resolution color JPEG or PNG is an attempt to provide the human viewer with as much of the same information as he or she could glean from an examination of the original.
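The measurement of chain lines that the discussion above treats as trivial for a computer can be sketched under stated assumptions. The sketch assumes a back-lit (transmitted-light) image of the sheet, already rotated so that chain lines run vertically and supplied as rows of grayscale values; in transmitted light the paper is thinner where it lay on the wires, so chain lines show as brighter bands. It averages each pixel column into a brightness profile and reports columns that are local maxima well above the page background. The threshold parameter `k` is an illustrative choice, and real images would also need smoothing and a pixel-to-millimeter calibration.

```python
from statistics import mean, pstdev
from typing import List, Sequence

# A grayscale image as rows of pixel brightness values (0 = black).
Image = Sequence[Sequence[float]]


def column_profile(image: Image) -> List[float]:
    """Mean brightness of each pixel column."""
    height = len(image)
    return [sum(row[c] for row in image) / height for c in range(len(image[0]))]


def chain_line_columns(image: Image, k: float = 1.0) -> List[int]:
    """Columns that are local brightness maxima and exceed the profile mean
    by more than k population standard deviations. The first and last
    columns are skipped; for a flat-topped band several pixels wide, the
    band's rightmost column is reported."""
    profile = column_profile(image)
    cutoff = mean(profile) + k * pstdev(profile)
    return [c for c in range(1, len(profile) - 1)
            if profile[c] > cutoff
            and profile[c] >= profile[c - 1]
            and profile[c] > profile[c + 1]]


def spacings(columns: List[int]) -> List[int]:
    """Distances, in pixels, between successive chain lines."""
    return [b - a for a, b in zip(columns, columns[1:])]
```

The resulting spacings, once converted to millimeters, are the per-sheet measurements whose changes from sheet to sheet would feed an ordering step.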
But the digital analytical bibliographic method relies on a computer reader rather than a human one, and computers “see” differently from humans. To make the most of computing technologies, therefore, we need to start capturing and curating views of historical printed material that will be of maximum use to the computer itself. We need to archive as much for the computer reader as for the human one. Photographic methods such as Hyperspectral Imaging, Reflectance Transformation Imaging, and simple back-light photography have all been shown to reveal visually hidden aspects of both print and paper materiality.33 The large-scale capturing of such images across digital archives would dramatically expand and improve the results of digital analytical bibliography and should be pursued.
But even in the absence of such data, digital analytical bibliography still offers the potential to enhance both the depth and breadth of our knowledge of the broadside ballad specifically and of historical printed materials in general. Digitally attending to the materiality of our texts will allow us to re-create the complex printing networks out of which our artifacts of study were born. The use of these techniques could revolutionize early modern studies by providing a chronological and geospatial map of the entire printing history of the period.
33 Hyperspectral Imaging is a photographic process that captures data beyond the human visible light spectrum. It is known to reveal altered, erased, covered, and even un-inked marks on a text. See Patrick Shiel, Malte Rehbein, and John Keating, "The Ghost in the Manuscript: Hyperspectral Text Recovery and Segmentation," in Kodikologie Und Paläographie Im Digitalen Zeitalter - Codicology and Palaeography in the Digital Age, ed. Malte Rehbein, Patrick Sahle, and Torsten Schassan (Norderstedt: Institute for Documentology and Scholarly Editing, 2009), 159-74. See also John R. Quain, "Peeling Back the Hidden Pages of History with Hyperspectral Photography," American Photo. Reflectance Transformation Imaging is a photographic process that combines photographs of an object taken from a fixed camera perspective but with a shifting light source. It is known to reveal altered, erased, covered, and un-inked marks on a text that are not visible to the human eye under normal viewing conditions. See "Reflectance Transformation Imaging (RTI)," Cultural Heritage Imaging, accessed September 14, 2014, http://culturalheritageimaging.org/Technologies/RTI/. See also Todd R. Hanneken, Integrating Spectral and Reflectance Transformation Imaging Technologies for the Digitization of Manuscripts and Other Cultural Artifacts, PDF (San Antonio: St. Mary's University, June 30, 2014), http://palimpsest.stmarytx.edu/integrating/WhitePaper-20140630.pdf. Finally, for information on back-light photographic methods, see Hazem Ali Abd Al Faleh Al Hiary, 81-112.