Digitally connecting the scattered heritage: a Polish perspective
Marcin Werla, [email protected]
Wrocław, September 8, 2014

Development of digital libraries infrastructure in Poland
Figure: increase of the number of digital libraries between 2002 and 2013 (annotation: several hundred institutions, 1.8M objects).

Digital libraries in the PIONIER Network
In total, around 2 million digital objects.

~70 institutional digital libraries (top 10 by number of objects):
1. Cyfrowa Biblioteka Narodowa Polona: 308 933
2. Jagiellońska Biblioteka Cyfrowa: 258 280
3. e-biblioteka Uniwersytetu Warszawskiego: 161 965
4. Repozytorium Cyfrowe Instytutów Naukowych: 46 958
5. Biblioteka Cyfrowa Uniwersytetu Wrocławskiego: 43 989
6. Polska Biblioteka Internetowa: 32 071
7. Biblioteka Cyfrowa Uniwersytetu Łódzkiego: 23 593
8. Biblioteka Cyfrowa Politechniki Śląskiej: 22 444
9. Muzeum Narodowe w Warszawie: 13 060
10. Akademicka Biblioteka Cyfrowa KRAKÓW: 11 789
Average size: 15 826 objects; median: 1 357 objects.

~40 regional digital libraries (top 10 by number of objects):
1. Wielkopolska Biblioteka Cyfrowa: 222 521
2. Śląska Biblioteka Cyfrowa: 105 423
3. Małopolska Biblioteka Cyfrowa: 86 918
4. Kujawsko-Pomorska Biblioteka Cyfrowa: 75 674
5. Biblioteka Cyfrowa – Regionalia Ziemi Łódzkiej: 52 376
6. Elbląska Biblioteka Cyfrowa: 43 847
7. Bałtycka Biblioteka Cyfrowa: 39 916
8. Pomorska Biblioteka Cyfrowa: 38 221
9. Zachodniopomorska Biblioteka Cyfrowa "Pomerania": 29 733
10. Podlaska Biblioteka Cyfrowa: 28 927
Average size: 24 020 objects; median: 9 399 objects.

Digital libraries in the PIONIER Network: regional digital libraries
– The idea of regional collaboration took shape during the initiation of the Wielkopolska Digital Library in 2002
– They allow smaller institutions to secure their collections in digital form and to make them available on-line
– They optimize the use of shared IT infrastructure
– They are also implemented on a national scale (FIDES, RCIN) as well as on a local scale (Tarnowska DL, Płocka DL, …)
– They make access to digital content easier by providing a single point of access

Practice of regional digital libraries
For readers, they are simply web portals giving access to cultural heritage collections from many institutions under a single web address.
In practice, they are run as consortia which, on the basis of knowledge exchange and collaboration, give their participants:
– access to the IT infrastructure necessary to put digital collections on-line
– ways to professionally preserve digital copies for the long term
– know-how for preparing high-resolution digital materials and metadata
– wide promotion of their resources
– very good conditions for acquiring additional funding in joint projects

Figure: popularity of the Digital Library of Wielkopolska in 2013 (according to Google Analytics).

Practice of regional digital libraries
The structure of a regional digital library's collections often reflects the complexity of the consortium:
– regional collections
– thematic collections
– institutional collections
Regional collaboration brings many benefits, but it also requires compromises:
– a common metadata schema
– a common web interface
– less emphasis on the identity of single institutions, more on the consortium's identity

Practice of regional digital libraries
A good way to balance collaboration with the promotion of individual institutions is to build virtual repositories on top of regional digital libraries.

Role of regional digital libraries
Regional digital libraries are increasingly often the basis for new information services related to the heritage of a region. They serve as repositories of source data, which makes these information services richer and more trustworthy.

DInGO software – "Digitise and Go"
Technical ingredients of regional digital libraries:
– dLibra: system for digital libraries (e.g. …)
– dMuseion: system for digital museums (e.g. …)
– dLab: system for management of digitisation processes
– dArceo: system for long-term digital preservation

Digitisation process and DInGO software
Figure: process stages shown are selection of objects for digitisation (planned objects), digitisation and standardisation, preparation of the digital object (MASTER files and presentation files), archiving, on-line publishing and on-line access.

Promotion of regional heritage at the (inter)national level
Regional consortia allow small institutions to appear on the Internet.
Regional digital libraries aggregate local and regional heritage in digital form.
National-level access and promotion is organized through metadata aggregation from distributed sources into one central database.
This is the responsibility of the Digital Libraries Federation of the PIONIER Network.
The Federation collaborates with Europeana, moving these regional collections even higher, to the international level.

Digital Libraries Federation (DLF)
Public portal
• Searching, browsing
• Digitisation plans, persistent identifiers
Data provider for external services
• Europeana, DART-Europe
• KaRo
Information website for DL creators
• News, publications
• Digital libraries database
Advanced services for DL administrators
• Traffic monitoring
• Metadata analysis module
Competence center for professionals
• E-learning courses
• Q&A platform

Who is providing data to DLF?
Hundreds of institutions from all over Poland: digital libraries, repositories, digital museums and digital archives.

What kind of objects can you find in DLF?
Figure: object types in DLF, based on metadata analysis done on September 3, 2014. The object types are shown separately for materials created before 1945 (80% of objects) and materials created after 1945 (20% of objects). The categories shown include journals (the largest category in both groups), books, articles, old prints, postcards, photos, manuscripts, archival documents, ephemera, PhD theses, electronic documents and other.

Figure: increase of the number of objects in the DLF, from ~75 thousand objects at the public opening of DLF in 2007 to ~2 million objects in 2014.

DLF statistics
Presently: ~2 million objects; 325 institutions; 105 data sources.
During 2013: 4.5 million views; 1.1 million visits; 560 thousand unique users.

Collaboration with Europeana (European Digital Library, Museum and Archive)
2009: beginning of collaboration in EuropeanaLocal; Federation connected to Europeana
2010: participation in the Europeana API pilot program
2011: Polish edition of Hack4Europe
2012: two more Hack4Europe contests as part of the Europeana Awareness project
2013: collaboration on Europeana 1989; Europeana Cloud project started

Visibility of Polish collections in Europeana (data from September 3, 2014)
Top 10 countries in Europeana (number of objects, share of all objects):
1. France: 3 876 048 (11.9%)
2. Germany: 3 650 312 (11.2%)
3. Netherlands: 3 515 861 (10.8%)
4. Spain: 2 975 847 (9.1%)
5. Sweden: 2 707 656 (8.3%)
6. Italy: 2 655 770 (8.1%)
7. United Kingdom: 2 486 594 (7.6%)
8. Norway: 1 766 490 (5.4%)
9. Poland: 1 711 099 (5.2%)
10. Ireland: 1 090 660 (3.3%)

Top 10 data providers to Europeana (number of objects, share of all objects):
1. The European Library: 6 368 924 (19.5%)
2. Hispana: 2 240 932 (6.9%)
3. OpenUp!: 2 103 884 (6.4%)
4. Athena: 2 025 754 (6.2%)
5. CARARE: 2 005 866 (6.1%)
6. Federacja Bibliotek Cyfrowych (Digital Libraries Federation): 1 405 903 (4.3%)
7. Linked Heritage: 1 381 668 (4.2%)
8. Swedish Open Cultural Heritage: 1 331 865 (4.1%)
9. Arts Council Norway: 1 062 881 (3.3%)
10. CultureGrid: 1 036 395 (3.2%)

Europeana and private collections
How can private collections be saved together with their social context?
– Public collection days and home digitisation
– Community contributions
– Long-term preservation
Example of the high value of private collections

Summarizing: most important success factors
Regional collaboration
– The development of digital libraries in Poland, in their current form, was initiated as a series of regional projects, often WITHOUT any dedicated external funding
– In such a "regional digital library" model there are usually:
  • one host institution, which provides the technical infrastructure
  • a number of partners, which provide content
– The first consortium was: Poznan Foundation of Scientific Libraries, PSNC, and academic and public institutions from the Wielkopolska region
– Such an approach:
  • lowers the costs for each participating institution (in many respects)
  • gives small libraries the opportunity to promote their collections on-line
  • provides a natural platform for collaboration in subsequent projects
  • requires acceptance of the regional consortium's "identity"

Summarizing: most important success factors
Good technical support
– Shared technology platform (in the case of Poland: dLibra/DInGO)
  • common development directions
  • shared development costs
  • none of the typical risks of project-based funding: unmaintained in-house solutions, abandoned commercial software, rising prices and vendor lock-in
  • documentation and technical support available locally
  • a natural environment for developing a good user community
– Requires a reliable technology partner with a proper business model

Summarizing: lessons learned
The bottom-up approach made all of this possible
– Did I forget to mention any central institutions in my presentation?
but…
– Some things were not standardized at the central level at first, and then "standards" were created in many places in parallel: 40+ variations of Dublin Core
– Other solutions were copied blindly, although they could have been tailored to specific local needs: the curse of the DjVu format's popularity

Most important challenges
Quality in mass digitization projects
– How can you check, within a month, the quality of what a commercial company spent 6–8 months preparing?
– How can you eliminate cheating companies without cancelling the project?
Long-term digital preservation
– How can you make sure that the results of hundreds of digitisation projects are properly secured for the future?

Most important challenges
Data interoperability
– How can you make sure that newly developed small systems follow digital library best practices?
– How can the data be used automatically with tools for digital humanities researchers?
Open access to data and proper rights labelling
– Metadata: copyrighted or not? Europeana requires a CC0 statement
– Content: is digitisation a creative process? Can commercial reuse of public domain materials be free?
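A common way to handle the quality question is statistical acceptance sampling. The task is to check months of vendor output within a month. Instead of inspecting every scan, you inspect a random sample and accept the batch only if the sample contains no defects. The slides do not say which method the projects used, so the Python sketch below shows only one possible approach. The function names, the 1% tolerated defect rate and the 50 000-scan batch are illustrative assumptions.

```python
from __future__ import annotations

import math
import random


def zero_defect_sample_size(max_defect_rate: float, risk: float = 0.05) -> int:
    """Smallest n such that a batch whose true defect rate is max_defect_rate
    would show zero defects in n random scans with probability at most
    `risk`, i.e. the smallest n with (1 - max_defect_rate) ** n <= risk.
    Uses the binomial approximation (conservative for a finite batch).
    """
    if not (0 < max_defect_rate < 1 and 0 < risk < 1):
        raise ValueError("max_defect_rate and risk must both be in (0, 1)")
    return math.ceil(math.log(risk) / math.log(1 - max_defect_rate))


def draw_sample(scan_ids: list[str], n: int, seed: int | None = None) -> list[str]:
    """Choose n distinct scans uniformly at random for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(scan_ids, min(n, len(scan_ids)))


if __name__ == "__main__":
    # Hypothetical delivery of 50 000 scans; tolerate at most 1% defective ones.
    batch = [f"scan_{i:06d}" for i in range(50_000)]
    n = zero_defect_sample_size(0.01)
    to_inspect = draw_sample(batch, n, seed=42)
    print(n, len(to_inspect))  # prints: 299 299
```

Suppose a sample of 299 scans contains no defects. Then a batch defect rate of 1% or more can be ruled out at 95% confidence. The sample size depends on the tolerated defect rate, not on the batch size, which makes a one-month check of a large delivery feasible.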
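For long-term preservation, most preservation workflows rely on a basic safeguard called fixity checking. You record a checksum for every MASTER file when it is archived, then periodically recompute the checksums and compare. The slides do not describe how dArceo implements this. The Python sketch below is a generic illustration using the standard library's SHA-256, with hypothetical function and file names.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large MASTER files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root with a previously stored manifest."""
    current = build_manifest(root)
    common = manifest.keys() & current.keys()
    return {
        "missing": sorted(manifest.keys() - current.keys()),
        "changed": sorted(k for k in common if manifest[k] != current[k]),
        "unexpected": sorted(current.keys() - manifest.keys()),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "page_001.tif").write_bytes(b"master scan 1")
        (root / "page_002.tif").write_bytes(b"master scan 2")
        manifest = build_manifest(root)                   # taken at ingest
        (root / "page_001.tif").write_bytes(b"bit rot")   # simulate corruption
        (root / "page_002.tif").unlink()                  # simulate loss
        print(verify(root, manifest))
        # prints: {'missing': ['page_002.tif'], 'changed': ['page_001.tif'], 'unexpected': []}
```

In practice, you would store the manifest separately from the files it describes, for example in another storage location. That way, a single storage failure cannot damage both the files and their reference checksums.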
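Two parts of the deck depend on machine-readable harvesting: DLF's aggregation of metadata from distributed sources into one central database, and the question of using the data automatically. The slides do not name a protocol. The sketch below assumes OAI-PMH with unqualified Dublin Core (the oai_dc format), a widely used standard for this kind of harvesting. It parses ListRecords responses with Python's xml.etree and merges records from several sources by OAI identifier. The sample repositories, identifiers and function names are hypothetical. A real harvester would also fetch the pages over HTTP and follow resumptionToken paging.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text: str) -> dict[str, dict[str, list[str]]]:
    """Turn one OAI-PMH ListRecords response in oai_dc format into
    {OAI identifier: {Dublin Core element: [values]}}."""
    root = ET.fromstring(xml_text)
    records: dict[str, dict[str, list[str]]] = {}
    for rec in root.iter(f"{OAI}record"):
        header = rec.find(f"{OAI}header")
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry no metadata
        oai_id = (header.findtext(f"{OAI}identifier") or "").strip()
        metadata = rec.find(f"{OAI}metadata")
        if not oai_id or metadata is None:
            continue
        fields: dict[str, list[str]] = {}
        for el in metadata.iter():
            if el.tag.startswith(DC) and el.text and el.text.strip():
                fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        records[oai_id] = fields
    return records


def aggregate(responses: dict[str, str]) -> dict[str, dict]:
    """Merge responses from several data sources into one central index.
    OAI identifiers normally embed the repository's own namespace, so they
    are treated as globally unique; the first source to deliver an ID keeps it."""
    central: dict[str, dict] = {}
    for source, xml_text in responses.items():
        for oai_id, fields in parse_list_records(xml_text).items():
            central.setdefault(oai_id, {"source": source, "dc": fields})
    return central


# Tiny hypothetical responses standing in for two harvested repositories.
DC_OPEN = ('<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
           'xmlns:dc="http://purl.org/dc/elements/1.1/">')


def _response(*records: str) -> str:
    """Wrap record fragments in a minimal ListRecords envelope (demo only)."""
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
            + "".join(records) + "</ListRecords></OAI-PMH>")


SAMPLE_A = _response(
    '<record><header><identifier>oai:lib-a.example:1</identifier></header>'
    f'<metadata>{DC_OPEN}<dc:title>Map of Poznań</dc:title>'
    '<dc:subject>maps</dc:subject><dc:subject>Poznań</dc:subject>'
    '</oai_dc:dc></metadata></record>',
    '<record><header status="deleted">'
    '<identifier>oai:lib-a.example:2</identifier></header></record>',
)
SAMPLE_B = _response(
    '<record><header><identifier>oai:lib-b.example:7</identifier></header>'
    f'<metadata>{DC_OPEN}<dc:title>Parish register</dc:title>'
    '</oai_dc:dc></metadata></record>',
)

if __name__ == "__main__":
    index = aggregate({"Library A": SAMPLE_A, "Library B": SAMPLE_B})
    for oai_id, entry in index.items():
        print(oai_id, entry["source"], entry["dc"]["title"])
```

Using setdefault means the first source to deliver an identifier wins. A production aggregator would instead update records on re-harvest, typically using OAI-PMH's from argument for incremental harvesting. Mapping each source's local Dublin Core variant onto one common schema would be a separate normalization step.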
Coordination of Europeana-related efforts
– Assuring proper representation of Polish heritage

Cloud technologies in the cultural sector
Small institutions: LoCloud
Figure: LoCloud services for small institutions (small libraries, private archives, home museums, local memory institutions): aggregation, DLaaS, mapping, enrichment, cloud services, remote support and education, and wide access through Europeana.

Cloud technologies in the cultural sector
LoCloud Collections – Digital Library Service in a cloud
– The service is now open and available for testing
– Version 1.0 is planned for January 2015
– Until the end of 2015 the service is free; after that it must become self-sustainable

Cloud technologies in the cultural sector
European infrastructure: Europeana Cloud
Figure: diagram comparing two set-ups. Participants shown include The European Library, EU-Screen, the Digital Libraries Federation, Europeana Research and Portal Europeana, among others.

IMPACT European Center of Competence
– Tools
– Optimization of resource usage in digitisation processes
– Standardization of data and tools
– Prizes, contests, events
– Founding members
– Best practices
IMPACT CoC in Digitisation: shared infrastructure for digital library competence centers (data, services, trainings)

Virtual Transcription Laboratory
Virtual Transcription Laboratory (…) offers:
– a free tool supporting the creation of textual versions of historical documents
– a dedicated OCR service for all VTL users
– a crowdsourcing platform for collaborating on transcriptions of digitized documents

Examples of projects in VTL
Books, old prints, address books, magazines and more…

OCR training tool
OCR training tool for profiling with historical documents

Thank you for your attention!
Marcin Werla ([email protected])
Poznań Supercomputing and Networking Center, affiliated to the Institute of Bioorganic Chemistry of the Polish Academy of Sciences
ul. Noskowskiego 12/14, 61-… Poznań, POLAND
Office: phone center: (+48 61) 858-20-00, fax: (+48 61) 852-59-54, e-mail: [email protected]