INPA's Biological Collection Data Quality Improvement
Laurindo Campos
[email protected]
MCTI/INPA
The National Institute for Amazonian Research
Information Technology Coordination
BioGeo Informatics Unit
Semantic Interoperability Laboratory
4th SIBBR Workshop - Petrópolis-RJ, 25-29 of August 2014
Outline
• Biodiversity Scenarios: Global and National
• INPA and its presence in Amazonia
• Data Quality Issues
• Disseminating Biological Data
• INPA's IT Evolution
• Concluding Remarks
Biodiversity Scenarios: Global and National
Global Diversity
• Of ~1.7 million known species:
  ¤ 56% are insects
  ¤ 14% are plants
  ¤ 2.7% are mammals and birds
• It is estimated that 4-20 million species have not yet been described
Megadiverse countries
Latin America and the Caribbean is the region with the greatest biological diversity on the planet:
50% of the world’s tropical forests
33% of its total mammals
35% of its reptilian species
41% of its birds
50% of its amphibians
Six countries in Latin America
National Scenario:
• Brazil: the highest share of biodiversity in the world (~20%) (Assunção, 2011)
• Six biomes (disruption and degradation are the main threats)
• Combined pressures are forcing the loss of habitat and species
• Planning and decisions depend on data/metadata management
National Scenario (cont.):
• Data and metadata are collected as a preliminary process in scientific experiments
• Management is mandatory
• Sharing data, analysis and synthesis is crucial
• Data governance: data and information as commodities (Jason Kolb, 2011)
Barriers to overcome:
1. Data policy for organizations;
2. Improvements in infrastructure;
3. Improvements in data quality;
4. Effective management and use of data/metadata and information.
INPA and its presence in Amazonia
Mission: "To generate and disseminate knowledge and technologies, and to enable human resources for the development of the Amazon"
THEMES
• Biodiversity: Knowledge of the biological diversity of the Amazon region.
• Society, Environment and Health: Dynamics of human populations of the Amazon and its social and environmental implications.
• Environmental Dynamics: Understanding the Amazon ecosystem.
• Technology & Innovation: Application of the knowledge produced on natural resources for the development of techniques, processes and products that meet the socioeconomic demands.
Research Areas: ecology; botany; aquaculture; entomology; Aquatic Biology; Health Science; Natural Products; Forest Products; tropical forest; agronomy; Food Technology; Climate and Water Resources; Humanities and Social Sciences
MSc and PhD Programs: ecology; botany; entomology; agriculture; tropical forest; Aquatic Biology and Fisheries; Genetics, Conservation & Evolutionary Biology; Biological Reserves Management; Biotechnology (UFAM); Regional Products and Biotechnology (UEA); Science of Food (UFAM)
Brazilian Amazonia - INPA Central and State Centers
[Map of INPA centers. Legend: Consolidated; Partnership. Labeled locations: São Gabriel da Cachoeira, Santarém, Tefé.]
INPA’s New Geographic Approach
Amazonia sensu latissimo,
Source: Eva & Huber (2004).
Expertise - Projects, Partnerships & Training
Program of Scientific Collections and Archives
Zoological Collections
• INVERTEBRATES: ~3,500,000 insects
• REPTILES AND AMPHIBIANS: ~17,000
• FISH: ~120,000 specimens from various rivers
• BIRDS: ~800 specimens
• MAMMALS: ~5,242 specimens
HERBARIUM: 217,462 records
CARPOTECA (fruit collection): 2,500 samples
WOOD COLLECTION: ~10,445 samples
MICROBIOLOGICAL COLLECTIONS: medical and agroforestry
To follow principles
• The way to fulfill INPA's mission is to treat data as a long-term asset and to manage it within a coordinated framework.
• Principles of data quality need to be applied at all stages of the data management process (capture, digitization, storage, analysis, presentation and use).
• Focus on two keys to the improvement of data quality: prevention and correction.
Data Quality
POOR QUALITY
• Jeopardizes the decision-making process, the credibility of data and the satisfaction of users;
• Raises the costs of data management and reduces the effective use and value of data (Redman, 1996).
POOR DATA QUALITY: IMPACTS
• Pervasiveness of poor data;
• Troublesome data and collection management;
• Difficult data integration and database merging;
• Scientific and institutional reputation.
(Dalcin, 2005)
APPROACH
• Since data users have a wide range of needs, and data are collected from different sources, INPA must enable data of known (good) quality to be shared.
• For each specific dataset, it must document how the data were compiled and verified, and use that documentation to enrich the metadata description.
• Implement data curation activities.
Data and Computational Resources
Data Curation: Structure
[Organization chart. Boxes shown: INPA Directorate; SDIN; COAE; CPAF; Research; CTIN; Collections Program; LIS; Scientific Data Curation; NBGI; Large Projects.]
DATA CURATION @ INPA
• Ongoing management activities that maintain scientific data over the long term so that it remains available for reuse and preservation.
• E.g.: LBA, GEOMA, PELD, PPBIO, TEAM, GO AMAZON, ATTO, etc.;
• Institutional Data Committee (Researchers & IT Professionals): implementing and enforcing data policy;
DATA CURATION @ INPA
Best practices and the development/adoption of specific tools.
Focus on:
• Accuracy of taxonomic identification
• Precision of the location and associated information in the record
• Clarity of the recording approach and methodology
• Accuracy in producing and documenting the record
• Quality of data transmission
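As an illustration of the "precision of the location" item, here is a minimal sketch of an automated coordinate check. The function name `coordinate_issues`, the three-decimal threshold and the issue messages are assumptions made for this example; they do not describe a tool used at INPA.

```python
from decimal import Decimal, InvalidOperation


def coordinate_issues(lat_text, lon_text, min_decimals=3):
    """Return a list of problems found in a latitude/longitude pair given as text."""
    try:
        lat, lon = Decimal(lat_text), Decimal(lon_text)
    except InvalidOperation:
        return ["coordinates are not numeric"]
    if not (lat.is_finite() and lon.is_finite()):
        return ["coordinates are not numeric"]

    issues = []
    if not -90 <= lat <= 90:
        issues.append("latitude out of range")
    if not -180 <= lon <= 180:
        issues.append("longitude out of range")
    if lat == 0 and lon == 0:
        # 0,0 is a common placeholder for "unknown" rather than a real locality.
        issues.append("0,0 placeholder coordinates")

    # Count decimal places as written; the coarser of the two values limits precision.
    decimals = min(-lat.as_tuple().exponent, -lon.as_tuple().exponent)
    if decimals < min_decimals:
        issues.append(f"low precision ({max(decimals, 0)} decimal places)")
    return issues
```

Keeping the values as text until they are parsed with `Decimal` preserves the number of decimal places the collector actually recorded, which a `float` conversion would lose.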
Generic Error Pattern (English, 1999)
• Data cleansing
• Error patterns
  ¤ Domain value redundancy
  ¤ Missing data values
  ¤ Incorrect data values
  ¤ Nonatomic data values
  ¤ Domain schizophrenia
  ¤ Duplicate occurrences
  ¤ Inconsistent data values
  ¤ Information quality contamination
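Three of the error patterns above (missing data values, duplicate occurrences and nonatomic data values) are simple enough to detect mechanically. The sketch below shows one possible way to do it. The field names, the `find_error_patterns` function and the catalog numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical required fields for a specimen record.
REQUIRED = ("catalog_number", "scientific_name", "locality")


def find_error_patterns(records):
    """Map the index of each problematic record to the error patterns it shows."""
    catalog_counts = Counter(rec.get("catalog_number") for rec in records)
    report = {}
    for index, rec in enumerate(records):
        problems = []
        # Missing data values: a required field is absent or blank.
        for field in REQUIRED:
            if not str(rec.get(field) or "").strip():
                problems.append(f"missing:{field}")
        # Duplicate occurrences: the same catalog number appears more than once.
        catalog = rec.get("catalog_number")
        if catalog and catalog_counts[catalog] > 1:
            problems.append("duplicate")
        # Nonatomic data values: several names packed into a single field.
        name = rec.get("scientific_name") or ""
        if ";" in name or "|" in name:
            problems.append("nonatomic:scientific_name")
        if problems:
            report[index] = problems
    return report


# Made-up records for illustration only.
records = [
    {"catalog_number": "X-1", "scientific_name": "Bertholletia excelsa", "locality": "Manaus"},
    {"catalog_number": "X-1", "scientific_name": "Bertholletia excelsa", "locality": "Manaus"},
    {"catalog_number": "X-2", "scientific_name": "", "locality": "Tefé"},
    {"catalog_number": "X-3", "scientific_name": "Hevea brasiliensis; Carapa guianensis", "locality": "Santarém"},
]
report = find_error_patterns(records)
```

Checks like these cover only the detection half of the "prevention and correction" pair. The remaining patterns, such as incorrect or inconsistent values, generally need reference data or expert review.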
Disseminating Biological Data
Information Network: Traditional Infrastructure
[Diagram. Labels shown: Science, Application, Policy; Tools for Presentation; Tools for Synthesis; Tools for Analysis; Information; Information Infrastructure; Data Providers; Data Digitization.]
Adapted from Erick Mata, 2008.
Applications / data access today...
[Diagram built up over several slides. Components shown:
• INPA web site and INPA DiGIR portal, on server colecoes.inpa.gov.br (INPA Biological Collections)
• Biodiversidade Amazônica (PPBio Amazônia) web site and DiGIR portal, on server Biodiversidadeamazonica.net.br (holdings of other collections in Western Amazonia)
• speciesLink web site, DiGIR portal and network, together with the Paraná (Taxon-line), Espírito Santo, São Paulo and Rio de Janeiro networks
• Tools: maps, modeling, data cleaning, automatic georeferencing
• GBIF, IABIN, SIBBr]
SIBBr: A National Initiative
Slide from SIBBR/LNCC, 2013
Towards a better cyberinfrastructure
Conceptual Map
Slide from D. Pennington
Conceptual Landscape of Technology Enabling Science - CLTES
• Mental Model
• Research Design
• Collect Data
• Conduct Analyses
• Dissemination/Publishing
• Cyberinfrastructure Systems
CLTES - Technology & Research Cycle
Slide from D. Pennington
CLTES at INPA, MPEG, SIBBr, GBIF, Probio II, Biota, Cria, etc.
Slide adapted from D. Pennington
Infrastructure for improving data management
From Data on the Web to a Web of Data
EVOLUTION OF DATA/METADATA
(Tim Berners-Lee's Open Data Classification, 2010)
★ On the web, open license
★★ Machine-readable data
★★★ Non-proprietary format
★★★★ RDF standards
★★★★★ Linked RDF
★★★★★ Linked to rich descriptions capable of supporting interoperability
Linked Open (Biological) Data - LOD
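The star classification is cumulative: each level assumes that all the previous ones are met. A minimal sketch of that rule follows. The criterion keys and the `open_data_stars` function are names invented for this example, not part of any standard API.

```python
# Criteria in the order of the star levels listed on the slide (illustrative key names).
CRITERIA = (
    "open_license",            # 1 star: on the web, open license
    "machine_readable",        # 2 stars: machine-readable data
    "non_proprietary_format",  # 3 stars: non-proprietary format
    "uses_rdf",                # 4 stars: RDF standards
    "linked_rdf",              # 5 stars: linked RDF
)


def open_data_stars(dataset):
    """Count consecutive criteria met, stopping at the first one that fails."""
    stars = 0
    for criterion in CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars


# A dataset published as an open, machine-readable, non-proprietary file, but not as RDF.
example = {"open_license": True, "machine_readable": True, "non_proprietary_format": True}
rating = open_data_stars(example)
```

Stopping at the first unmet criterion means a dataset published as RDF but without an open license still scores zero, which matches the cumulative reading of the scale.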
EVOLUTION OF THE WEB
Concluding Remarks
• Data quality refers to the understanding and description of the processes of data acquisition, treatment and management; information production, usage and delivery; and data modeling and implementation.
• A significant aspect of data quality is tied to the Internet and to a robust cyberinfrastructure, which improve the way information is delivered.
• Researchers (biologists) must follow and trust the new way data is managed.
Thank you!
Laurindo Campos
[email protected]
Partners and Collaborators