Digitally connecting the scattered heritage

Digitally connecting the scattered heritage:
a Polish perspective
Marcin Werla
[email protected]
Wrocław, September 8, 2014
Development of digital libraries infrastructure in Poland
[Chart: Increase of the number of digital libraries between 2002 and 2013. X-axis: years 2002 to 2013; Y-axis: number of digital libraries, 0 to 90.]
- Several hundred institutions
- 1.8M objects
Digital libraries in the PIONIER Network
In total, around 2 million digital objects
• ~70 institutional digital libraries
    No.  Name                                               Size
     1   Cyfrowa Biblioteka Narodowa Polona                 308 933
     2   Jagiellońska Biblioteka Cyfrowa                    258 280
     3   e-biblioteka Uniwersytetu Warszawskiego            161 965
     4   Repozytorium Cyfrowe Instytutów Naukowych           46 958
     5   Biblioteka Cyfrowa Uniwersytetu Wrocławskiego       43 989
     6   Polska Biblioteka Internetowa                       32 071
     7   Biblioteka Cyfrowa Uniwersytetu Łódzkiego           23 593
     8   Biblioteka Cyfrowa Politechniki Śląskiej            22 444
     9   Muzeum Narodowe w Warszawie                         13 060
    10   Akademicka Biblioteka Cyfrowa KRAKÓW                11 789
  Average size: 15 826 objects
  Median: 1 357 objects
• ~40 regional digital libraries
    No.  Name                                               Size
     1   Wielkopolska Biblioteka Cyfrowa                    222 521
     2   Śląska Biblioteka Cyfrowa                          105 423
     3   Małopolska Biblioteka Cyfrowa                       86 918
     4   Kujawsko-Pomorska Biblioteka Cyfrowa                75 674
     5   Biblioteka Cyfrowa - Regionalia Ziemi Łódzkiej      52 376
     6   Elbląska Biblioteka Cyfrowa                         43 847
     7   Bałtycka Biblioteka Cyfrowa                         39 916
     8   Pomorska Biblioteka Cyfrowa                         38 221
     9   Zachodniopomorska Biblioteka Cyfrowa "Pomerania"    29 733
    10   Podlaska Biblioteka Cyfrowa                         28 927
  Average size: 24 020 objects
  Median: 9 399 objects
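The averages above are far larger than the medians, which is what one would expect when a few very large libraries dominate. As a rough illustration only, the sketch below computes both statistics for the ten largest regional libraries listed above. The sizes of the remaining ~30 libraries are not given here, so the results will differ from the overall average and median quoted on the slide.

```python
from statistics import mean, median

# Sizes (number of objects) of the ten largest regional digital libraries,
# copied from the list above. The remaining ~30 libraries are not listed,
# so these results will not match the slide's overall average and median.
top10_regional = [
    222_521, 105_423, 86_918, 75_674, 52_376,
    43_847, 39_916, 38_221, 29_733, 28_927,
]

top10_mean = mean(top10_regional)
top10_median = median(top10_regional)

# A mean well above the median indicates a right-skewed distribution:
# a few large collections pull the average up.
skew_ratio = top10_mean / top10_median

print(f"mean:   {top10_mean:,.1f}")
print(f"median: {top10_median:,.1f}")
print(f"mean / median: {skew_ratio:.2f}")
```

Even within the top ten, the mean is noticeably higher than the median, because the largest library alone holds more than twice as many objects as the second largest.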
Digital libraries in the PIONIER Network
• Regional digital libraries
  - The idea of regional collaboration took shape during the launch of the Wielkopolska Digital Library in 2002
  - Allow smaller institutions to secure their collections in digital form and make them available online
  - Optimize the use of shared IT infrastructure
  - They are also implemented at national scale (FIDES, RCIN) and at local scale (Tarnowska DL, Płocka DL, …)
  - Make access to digital content easier by providing a single point of access
Practice of regional digital libraries
• For readers, they are simply web portals giving access to cultural heritage collections from many institutions under a single web address
• In practice, they are run as consortia which, through knowledge exchange and collaboration, give their participants:
  - Access to the IT infrastructure necessary to put digital collections online
  - Ways to professionally preserve digital copies for the long term
  - Know-how for preparing high-resolution digital materials and metadata
  - Wide promotion of resources
  - Very good conditions for acquiring additional funding in joint projects
[Chart] Wielkopolska Digital Library: popularity in 2013
According to Google Analytics
Practice of regional digital libraries
• The structure of a regional digital library's collections often reflects the complexity of the consortium
  - Regional collections
  - Thematic collections
  - Institutional collections
• Regional collaboration gives many benefits, but also requires compromises
  - Common metadata schema
  - Common web interface
  - Less emphasis on the identity of single institutions, more on the consortium's identity
Practice of regional digital libraries
• Virtual repositories built on top of regional digital libraries are a good way to balance collaboration with the promotion of individual institutions
Role of regional digital libraries
• Regional digital libraries are increasingly often the basis for new information services related to the heritage of a region
• They are used as repositories of source data, making the information services richer and more trustworthy
DInGO software: "Digitise and Go"
http://dingo.psnc.pl/
Technical ingredients of regional digital libraries
• dLibra: system for digital libraries (e.g. http://jbc.bj.uj.edu.pl/)
• dMuseion: system for digital museums (e.g. http://cyfrowe.mnw.art.pl/)
• dLab: system for management of digitisation processes
• dArceo: system for long-term digital preservation
Digitisation process and DInGO software
[Workflow diagram. Stages shown: selection of objects for digitisation; digitisation, standardisation; archiving; preparation of digital object; online publishing; online access. Items passed between stages: planned objects, MASTER files, presentation files.]
Promotion of regional heritage on (inter)national level
• Regional consortia allow small institutions to appear on the Internet
• Regional digital libraries aggregate local and regional heritage in digital form
• National-level access and promotion are organized by aggregating metadata from distributed sources into one central database
• This is the responsibility of the Digital Libraries Federation of the PIONIER Network
• The Federation collaborates with Europeana, moving these regional collections even higher, to the international level
http://fbc.pionier.net.pl/
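The aggregation idea described above (metadata from many distributed sources collected into one central, searchable database) can be sketched in a few lines. This is a minimal in-memory illustration, not a description of how the Federation is actually implemented: the record fields, the source names and the `aggregate` and `search` helpers are all hypothetical, and the slides do not say which harvesting protocol is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A minimal metadata record; the field names are hypothetical."""
    identifier: str   # identifier unique within its source
    title: str
    source: str       # name of the digital library providing the record


def aggregate(sources: dict[str, list[dict]]) -> dict[str, Record]:
    """Merge records from many sources into one central index.

    Keys of the central index combine source name and local identifier,
    so records from different libraries never overwrite each other.
    """
    central: dict[str, Record] = {}
    for source_name, records in sources.items():
        for raw in records:
            key = f"{source_name}:{raw['id']}"
            central[key] = Record(raw["id"], raw["title"], source_name)
    return central


def search(central: dict[str, Record], term: str) -> list[Record]:
    """Case-insensitive title search over the central index."""
    needle = term.lower()
    return [r for r in central.values() if needle in r.title.lower()]


# Hypothetical sample data standing in for two regional digital libraries.
sample_sources = {
    "library-a": [{"id": "1", "title": "Regional newspaper, 1925"}],
    "library-b": [
        {"id": "1", "title": "City map"},
        {"id": "2", "title": "Newspaper supplement"},
    ],
}

central_index = aggregate(sample_sources)
hits = search(central_index, "newspaper")
print(len(central_index), "records aggregated;", len(hits), "match 'newspaper'")
```

Only metadata is copied into the central index; each record keeps the name of its source, so a portal built this way can send the reader back to the library that holds the object itself.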
Digital Libraries Federation (DLF)
http://fbc.pionier.net.pl/
Public portal
  • Searching, browsing
  • Digitisation plans, persistent identifiers
Data provider for external services
  • Europeana, DART-Europe
  • KaRo
Information website for DL creators
  • News, publications
  • Digital libraries database
Advanced services for DL administrators
  • Traffic monitoring
  • Metadata analysis module
Competence center for professionals
  • E-learning courses
  • Q&A platform
Who is providing data to DLF?
Hundreds of institutions from all over Poland
Digital libraries, repositories, digital museums, digital archives
What kind of objects can you find in DLF?
Based on metadata analysis done on September 3, 2014
• 80% of objects: materials created before 1945
• 20% of objects: materials created after 1945
[Two pie charts of object types, one per group. Shares legible in the extracted labels: journal 80% and 46%; other 16% and 6%; article 14%; book 12% and 5%; photo 4% and 1%; electronic document 4%; ephemera 3% and 1%; PhD thesis 3%; oldprint 2%; postcard 2%; manuscript 1%; archival document 1%.]
Increase of the number of objects in the DLF
2007: public opening of DLF, ~75 thousand objects
2014: ~2 million objects
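The two figures above imply a steep average growth rate. As a back-of-the-envelope check (both numbers are approximate on the slide, and real growth was certainly not uniform from year to year), the compound annual growth rate can be computed like this:

```python
# Figures from the slide (both approximate).
objects_2007 = 75_000      # at the public opening of DLF
objects_2014 = 2_000_000
years = 2014 - 2007

growth_factor = objects_2014 / objects_2007
# Compound annual growth rate: the constant yearly rate that would turn
# the 2007 figure into the 2014 figure over `years` years.
annual_rate = growth_factor ** (1 / years) - 1

print(f"overall growth: x{growth_factor:.1f}")
print(f"implied average annual growth: {annual_rate:.0%}")
```

Under these assumptions the collection grew by more than half every year on average.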
DLF statistics
Presently:
• ~2 million objects
• 325 institutions
• 105 data sources
During 2013:
• 4.5 million views
• 1.1 million visits
• 560 thousand unique users
Collaboration with Europeana
Europeana.eu = European Digital Library, Museum and Archive
2009
Beginning of collaboration in EuropeanaLocal
Federation connected to Europeana
2010
Europeana API pilot program participation
2011
Polish edition of Hack4Europe
2012
Two more Hack4Europe contests as a part of Europeana Awareness project
2013
Collaboration on Europeana 1989
Europeana Cloud project started
Visibility of Polish collections in Europeana
Data from http://www.europeana.eu/ (September 3, 2014)
Top 10 countries in Europeana
    No.  Country           Share    Objects
     1   France            11.9%    3 876 048
     2   Germany           11.2%    3 650 312
     3   Netherlands       10.8%    3 515 861
     4   Spain              9.1%    2 975 847
     5   Sweden             8.3%    2 707 656
     6   Italy              8.1%    2 655 770
     7   United Kingdom     7.6%    2 486 594
     8   Norway             5.4%    1 766 490
     9   Poland             5.2%    1 711 099
    10   Ireland            3.3%    1 090 660
Top 10 data providers to Europeana
    No.  Provider                          Share    Objects
     1   The European Library              19.5%    6 368 924
     2   Hispana                            6.9%    2 240 932
     3   OpenUp!                            6.4%    2 103 884
     4   Athena                             6.2%    2 025 754
     5   CARARE                             6.1%    2 005 866
     6   Federacja Bibliotek Cyfrowych      4.3%    1 405 903
     7   Linked Heritage                    4.2%    1 381 668
     8   Swedish Open Cultural Heritage     4.1%    1 331 865
     9   Arts Council Norway                3.3%    1 062 881
    10   CultureGrid                        3.2%    1 036 395
Europeana and private collections
How to save private collections together with their social context?
Public collection days and home digitisation
fbc.pionier.net.pl/zbiorki
Community contributions
europeana1989.eu
Long-term preservation
fbc.pionier.net.pl/zbiorki
Example of high value of private collections
Summarizing - Most important success factors
• Regional collaboration
  - The development of digital libraries in Poland, as they are at the moment, was initiated as a series of regional projects, often WITHOUT any dedicated external funding
  - In such a "regional digital library" model there are usually:
    • One host institution providing the technical infrastructure
    • A number of partners providing content
  - The first consortium was: Poznań Foundation of Scientific Libraries, PSNC, and academic and public institutions from the Wielkopolska region (http://www.wbc.poznan.pl/)
  - Such an approach:
    • Lowers the costs for each participating institution (in many respects)
    • Gives small libraries the opportunity to promote their collections online
    • Provides a natural platform for collaboration on subsequent projects
    • Requires acceptance of the regional consortium's "identity"
Summarizing - Most important success factors
• Good technical support
  - Shared technology platform (in the case of Poland: dLibra/DInGO)
    • Common development directions
    • Shared development costs
    • Lack of typical risks related to project-based funding
      - In-house solutions that are no longer maintained
      - Abandoned commercial software
      - Rising prices and vendor lock-in
    • Documentation and technical support available locally
    • Natural environment for the development of a good user community
  - Requires a reliable technology partner with a proper business model
Summarizing - Lessons learned
• Bottom-up approach made all that possible
  - Did I forget to mention any central institutions in my presentation?
• but…
  - Some things were not initially standardized at the central level, and then "standards" were created in many places in parallel
    • 40+ variations of Dublin Core
  - Other solutions were blindly copied, although they could have been tailored to specific local needs
    • The curse of the DjVu format's popularity
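To make the cost of "40+ variations of Dublin Core" concrete: an aggregator has to map every local variant back onto a common set of elements before records can be searched together. The sketch below shows one possible way to do that. The set of 15 element names is the standard Dublin Core Metadata Element Set, but the variant names in `LOCAL_VARIANTS` and the sample record are invented for illustration and are not taken from any Polish digital library.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical local variants mapped to standard elements. A real
# mapping would have to be built per data source.
LOCAL_VARIANTS = {
    "author": "creator",
    "keywords": "subject",
    "resource_type": "type",
    "licence": "rights",
}


def normalize(record: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Map a record's field names onto Dublin Core elements.

    Returns the normalized record and the list of fields that could not
    be mapped, so that they can be reviewed instead of silently dropped.
    """
    normalized: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        # Strip an optional "dc:" prefix and ignore case differences.
        name = field.strip().lower().removeprefix("dc:")
        name = LOCAL_VARIANTS.get(name, name)
        if name in DC_ELEMENTS:
            normalized.setdefault(name, []).append(value)
        else:
            unmapped.append(field)
    return normalized, unmapped


record = {"DC:Title": "City chronicle", "Author": "Unknown", "Shelfmark": "A 12"}
normalized, unmapped = normalize(record)
print(normalized)
print(unmapped)
```

Returning the unmapped fields separately is a deliberate choice in this sketch: with dozens of local schema variants, the fields that fail to map are exactly the ones a person needs to look at.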
Most important challenges
• Quality in mass digitization projects
  - How to check within a month the quality of what a commercial company spent 6-8 months preparing?
  - How to eliminate cheating companies without cancelling the project?
• Long-term digital preservation
  - How to make sure that the results of hundreds of digitisation projects are properly secured for the future?
Most important challenges
• Data interoperability
  - How to make sure that newly developed small systems follow digital library best practices?
  - How to use data automatically with tools for digital humanities researchers?
• Open access to data and proper rights labelling
  - Metadata: copyrighted or not?
    • Europeana requires a CC0 statement
  - Content
    • Is digitisation a creative process?
    • Can commercial reuse of public domain materials be free?
• Coordination of Europeana-related efforts
  - Assuring proper representation of Polish heritage
Cloud technologies in the cultural sector
Small institutions: LoCloud
http://locloud.eu/
[Diagram labels: small libraries, private archives, home museums, local memory institutions; mapping, aggregation, enrichment, DLaaS, cloud services, remote support and education; wide access; Europeana]
Cloud technologies in the cultural sector
• LoCloud Collections: Digital Library Service in a cloud
  - https://locloud.pl/
• The service is now open and available for testing
• Version 1.0 is planned for January 2015
• Until the end of 2015 the service is free; after that it must become self-sustaining
Cloud technologies in the cultural sector
European infrastructure: Europeana Cloud
http://pro.europeana.eu/web/europeana-cloud
[Diagram comparing two arrangements ("vs"). Labels: The European Library, EU-Screen, Digital Libraries Federation, …; Portal Europeana, Europeana Research, …]
IMPACT European Center of Competence
http://digitisation.eu/
IMPACT CoC in Digitisation
• Shared infrastructure for digital libraries competence centers
• Optimization of resource usage in digitisation processes
• Standardization of data and tools
• Tools, data, services, trainings
• Best practices
• Prizes, contests, events
• Founding members
Virtual Transcription Laboratory
• Virtual Transcription Laboratory (http://wlt.synat.pcss.pl) offers:
  - A free tool supporting the creation of textual versions of historical documents
  - A dedicated OCR service for all VTL users
  - A crowdsourcing platform for collaborating on transcriptions of digitized documents
Examples of projects in VTL
http://wlt.synat.pcss.pl
Books, old-prints, address books, magazines and more…
OCR training tool for profiling with historical documents
OCR training tool
http://wlt.synat.pcss.pl/cutouts
Thank you for your attention!
Marcin Werla ([email protected])
http://dl.psnc.pl/
Poznań Supercomputing and Networking Center
affiliated to the Institute of Bioorganic Chemistry of the Polish Academy of Sciences,
ul. Noskowskiego 12/14, 61- Poznań, POLAND
Office: phone center: (+48 61) 858-20-00, fax: (+48 61) 852-59-54,
e-mail: [email protected], http://www.psnc.pl