A brochure - 3TU.Datacentrum

3TU.Datacentrum
Showcasing 3TU.Datasets
This brochure shows some examples of what 3TU.Datacentrum can do for researchers and their research data. Take
a look and read the stories of your fellow researchers. Many more interesting datasets are available; browse our
datacentre to discover data that you can perhaps reuse in your own research.
Contact us
• If you wish to deposit a single dataset in 3TU.Datacentrum you can easily use the self-upload facility.
• If you have complex datasets that require customised solutions, please contact us.
• If you have any questions about data management, digital object identifiers (DOIs) or data citation, we’re looking
forward to helping you!
We offer services during the whole research data lifecycle and also frequently organize symposia, seminars, training
and workshops.
3TU.Datacentrum - Prometheusplein 1, NL - 2628 ZC Delft
P.O. Box 98, NL - 2600 MG Delft - +31 (0)15 27 88 600
[email protected]
datacentrum.3tu.nl - data.3tu.nl - datacite.tudelft.nl
BAS WOLS
I think increasing your citation score will apply in the
long term, once a dataset becomes more and more
recognized as an official publication entity
Depositing research data underlying a
dissertation
In 2010, Bas Wols received a doctoral degree in Civil
Engineering at TU Delft. He used computational fluid
dynamics (CFD) to study the hydraulics in ozone and UV
drinking water treatment systems. His work resulted in
a modification of an ozone installation in Amsterdam
(Waternet), improving water disinfection. Also, his
simulations have shed light on energy-efficient UV-filtering of drinking water.
During his research project, a colleague drew Wols’
attention to 3TU.Datacentrum. His main reason for
depositing his data in a data repository was to give other
researchers access. “I would have profited from having
such data available during my own research. Because
data sharing isn’t mainstream yet, I had to extract data
from scientific articles manually. This is neither efficient
nor accurate”. Wols’ data consist of measurements,
modelling data and movies, linked together in one
dataset. “I have deposited all data which underlie my
research. In this way researchers can in theory replicate
what I have done”.
Depositing his data at 3TU.Datacentrum was quite easy,
Wols says. “The 3TU.Datacentrum employees outlined
their ambition of giving access to research data clearly.
I had to transform my data into the NetCDF format. This
format allows for downloading parts of the dataset and
querying it. Then, I brought a hard disk to the staff of
the 3TU.Datacentrum and my data were put online”.
To increase citability, digital object identifiers (DOIs)
were assigned to his research data as well as to his
dissertation. “I think increasing your citation score will
apply in the long term, once a dataset becomes more and
more recognized as an official publication entity”.
TOBIAS OTTO
It has taken some time to set up the data archiving
process and to create the metadata. But we save that
time now by being able to easily access our data
Benefiting from a continuously updated time
series of climate data
Tobias Otto and Herman Russchenberg (Atmospheric
Remote Sensing, TU Delft) take part in the CESAR
consortium (Cabauw Experimental Site for Atmospheric
Research). This consortium operates a large set of
instruments to study the atmosphere and its interaction
with the land surface. One of these instruments is the
TU Delft IRCTR drizzle radar, IDRA in short. It is located
on top of the 213-metre-high tower next to the
village of Cabauw between Gouda and Utrecht. IDRA
measures drizzle (very fine rain), precipitation, and low
clouds with a high spatial and temporal resolution. The
long-term operation of IDRA will support the CESAR
objective to monitor trends in atmospheric changes.
The IDRA weather radar measurements consist of a large
time series of numerical data. The first dataset dates
from April 2009 and is continuously updated. To ensure
longevity and easy access, the datasets were stored at
3TU.Datacentrum. The data are stored in NetCDF format
and reside on an OPeNDAP server. The dataset is enriched
with metadata to make the dataset self-explanatory. The
datasets are freely available and are easily accessible to
the users. In 2010 the IDRA dataset was given a digital
object identifier (DOI). Due to this permanent link,
the IDRA dataset is much easier to find on the digital
highway. “IDRA is now more visible to the scientific
community”, Otto says.
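Because the data reside on an OPeNDAP server, a user can request part of a variable instead of downloading a whole file. The following is only a minimal sketch of how such a request might be phrased with a DAP2 constraint expression. The server address, file name and variable name are hypothetical and are not taken from the actual IDRA dataset.

```python
def opendap_subset_url(base_url, variable, slices, response="ascii"):
    """Build a DAP2 constraint-expression URL for part of one variable.

    slices holds one (start, stride, stop) index triple per dimension;
    in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(
        "[{}:{}:{}]".format(start, stride, stop) for start, stride, stop in slices
    )
    return "{}.{}?{}{}".format(base_url, response, variable, hyperslab)


# Hypothetical server path and variable name, for illustration only.
url = opendap_subset_url(
    "http://example.org/opendap/radar/sample.nc",
    "reflectivity",
    [(0, 1, 9), (0, 2, 98)],
)
print(url)
```

The request asks for the first ten entries along the first dimension and every second entry up to index 98 along the second, so only that slice would travel over the network.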
“It has taken some time to set up the data archiving
process and to create the metadata. But we save that
time now by being able to easily access our data”, Otto
explains. “The data are also used for education; not
only at TU Delft but also elsewhere, thus enhancing
collaboration. Within the European ERASMUS programme,
two students from Politecnico di Bari (Italy) who already
had experience working with IDRA data came to
Delft to do parts of their MSc project with us”. The future
for the IDRA data seems bright: “We expect that in the
near future, IDRA data will be used even more in various
research projects, for example to validate and refine high-resolution atmospheric simulations”.
RICARDO SEGUEL
3TU.Datacentrum is providing a great service to the
science community, being a central hub to boost
collaboration between researchers
Depositing a Virtual Machine for testing
business protocol adaptors
Ricardo Seguel (Information Systems, TU/e) has obtained
his doctoral degree designing business protocol adaptors.
Business protocols describe the order in which messages
are communicated to another partner in a business
chain. As the business protocols of each partner support
its own way of working, the business protocols can
easily mismatch. Mismatches can be resolved by using
protocol adaptors. In his dissertation, Seguel presents an
efficient, automated method to build a minimal adaptor
for two business protocols that have a behavioural
mismatch. Moreover, he identifies how protocol
adaptation can be used to support the flexible formation
of business chains. “Collaboration in business chains is
essential for organizations to be competitive in modern
markets”, Seguel underlines the importance of developing
business protocol adaptors.
“I decided to deposit a Virtual Machine with all the test
data and the prototype software tool to 3TU.Datacentrum
to make it available to anyone wishing to replay the
tests and experiments explained in my thesis”, Seguel
says. “Depositing the data in 3TU.Datacentrum was really
easy. I just pointed a link to the SHARE System where
the Virtual Machine resides and then it was copied to the
data store of 3TU.”
According to Seguel the main benefit from depositing
his data has been that anyone can validate his research
findings. “Anyone can access and check your prototype,
experiments and test data. And the data can be used
in other research too”, he adds. Seguel appreciates the
setting up of 3TU.Datacentrum: “It is a great initiative
that provides a great service to the science community
being a central hub to boost collaboration among
researchers of all countries and research fields”.
NICO SOMMERDIJK
People may use the same datasets for things we were not
looking for, thus generating new science with the same
data
Preparing for the next level: How can we
stimulate actual reuse of datasets?
Nico Sommerdijk (Department of Chemical Engineering
and Chemistry, TU/e) studies the formation and materials
properties of biominerals like calcium phosphate, calcium
carbonate and iron oxides. Biominerals often have
superior physical properties when compared to man-made materials. Many scientists want to synthesize new
materials with biosimilar properties, applying bioinspired
mineralization techniques. To be able to design such
materials, the mechanisms of biomineral formation have
to be unravelled. Sommerdijk mimics natural systems in
his laboratory and studies the mechanisms involved with
advanced electron-microscopy techniques (Cryo-TEM).
Sommerdijk has uploaded a dozen gigabytes of
experimental data to 3TU.Datacentrum, encompassing the
data for an article that his department published in the
leading journal Science. The dataset was too large for the
publisher to make it available along with the article, so
he turned to 3TU.Datacentrum. Sommerdijk: “Putting your
data online will increase the reliability of research. If
you process your data, you make changes in the original
dataset. Fellow researchers can now look back into the
steps taken and check if we did our job properly.” Also,
experimental data often contain relevant information
that the original researchers did not extract. “People may
use the same datasets for things we were not looking
for, thus generating new science with the same data”,
Sommerdijk adds. To stimulate reuse of deposited datasets,
Sommerdijk thinks 3TU.Datacentrum should proactively
advertise within the research community: “Research data
will not be reused if no one knows they are available”.
For example, he thinks researchers in countries with tight
research budgets could greatly benefit if they knew about
the existence of reusable datasets. “Once you’ve got the
data, all you need is a cool head and a fast computer for
analysing it. Even if you lack funds for pricy experiments,
you can still make a contribution to science.”
SERGEY FROLOV
In the world of open science progress will happen faster
Datasets of ‘The Dutch particle’
In 2012, the research team of Leo Kouwenhoven (TU Delft’s
Kavli Institute of Nanoscience and the FOM Foundation)
obtained first signatures of a new fundamental particle: the
Majorana fermion. The team managed to create a nanoscale
device in which a pair of Majorana fermions ‘appeared’ at
either end of a nanowire. The discovery was named amongst
the top ten breakthroughs in science in 2012 by Science
and Physics World. “The Majorana fermion has now become
famous as ‘the Dutch particle’. It is a bizarre particle because
it represents its own anti-particle and it can also be used
to build quantum computers”, Sergey Frolov describes the
relevance of its discovery. Frolov was the researcher who was
responsible for having the underlying datasets published at
3TU.Datacentrum.
Frolov: “Our research team strongly believes in openness of
the scientific process. Sharing raw experimental data is a
powerful means of enhancing the value of scientific results”.
Currently, scientists exchange images of their data as figures
printed in scientific journals. “This is silly, since most data
are taken in digital form using computers. And readers will
most likely read the paper on their own computers”, Frolov
adds. He sees a future where raw datasets will be part of the
paper. “You can read and tweak the dataset in each figure,
rotate them in 3D, apply math to the data”, Frolov envisions.
According to Frolov depositing the data at 3TU.Datacentrum
was very easy: “The staff did everything! We chose
3TU.Datacentrum because they are based at our university and
they were very helpful in accommodating our datasets even
though they were not in their preferred format”. The Majorana
experiment accumulated 3000 data sets, only a couple of
which made it to the paper. “Seeing more can be very useful
for colleagues who want to think about the experiments”.
Frolov strongly supports all efforts for more openness in
science and new tools for scientists. “Our colleagues are
now able to explore our data on their own computers, study
them, scrutinize and analyse our conclusions. In fact, since
the data were published, over 100 other papers appeared in
scientific literature citing our results. Over a six-month period
this is very impressive feedback. In the world of open science
progress will happen faster”, Frolov concludes.
Photo: courtesy of TU Delft
LEON OSINSKI
Researchers are much more keen to tell about their data
than about their publications
The data librarian: Your partner for depositing
your data at 3TU.Datacentrum
Leon Osinski works at the Information Expertise Center
/ Library at Eindhoven University of Technology. In the
past years, Leon spent more and more time supporting
researchers with data management and acquiring datasets
for 3TU.Datacentrum. Anticipating the fast developments
in the field, Osinski was officially appointed as a ‘data
librarian’ in May 2012. In Eindhoven, Leon is proactively
providing support for the future where depositing your
data will be just as natural as doing the research itself.
At this moment Osinski focuses on PhD students in order
to acquire the datasets belonging to their PhD projects.
He organizes tailored workshops on research data
management for researchers. “Often, researchers do not
know which part of their data they should submit to a
long-term data archive like 3TU.Datacentrum. It helps to
sit down together, leaving no data overlooked”. Close
co-operation between a researcher and a data librarian
ensures that datasets remain understandable for years to
come.
What Osinski likes about being a data librarian is being a
pioneer. He is constantly on the lookout for new services
that may benefit the research community. What he would
like the most is that researchers call him in advance and
say ‘Soon I’m going to start a research project that will
generate a lot of data that I would like to have archived
and eventually published. Can you help me find a way to
do so?’
Leon finds his inspiration in the shift of the focus
from publications to research data: “Researchers are
much more keen to tell about their data than about their
publications. Being a data librarian has brought me closer
to the scientific process. And I love it”.
MARTIJN WESTHOFF
If you contact 3TU.Datacentrum the moment raw data
are available you can deliver your data in the right
format and with appropriate metadata right away
Data conversion as a service of
3TU.Datacentrum
In 2011 Martijn Westhoff earned a doctoral degree in Civil
Engineering (TU Delft) with his thesis ‘High resolution
temperature observations to identify different runoff
processes’. During his study Martijn collected hydrological
data in a river basin in Luxembourg in a project called
DARELUX (Data Archiving River Environment LUXemburg).
He measured water temperatures with a fibre optic cable
in order to identify groundwater contributions to the
stream. Together with his advisor Wim Luxemburg, he
set up the sensors and analysed the data. With
this kind of information the generation of floods can be
better understood.
The DARELUX datasets were made available by
3TU.Datacentrum for reuse by other projects and
disciplines. Westhoff is an advocate of open data:
“Research becomes more transparent and other
researchers can verify your research or use the data
for their own research”. Since the DARELUX dataset was
deposited, a lot of data conversion has already been
performed by 3TU.Datacentrum. The dataset first was
converted to a homemade XML format. Subsequently, the
NcML (XML version of NetCDF) was created and afterwards
the dataset was converted to NetCDF. At this moment the
dataset was moved from the 3TU.Datacentrum (Fedora)
server to OPeNDAP. The conversion to XML was done
because in XML standard metadata are assigned. In this
way information about the content of the dataset is easy
to add, keeping the data comprehensible and readable
to future users. The data were subsequently converted
to NetCDF because this format increases the ways to
interact with the data.
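To give an impression of what the intermediate NcML step looks like, here is a minimal sketch that builds a small NcML document, the XML description of a NetCDF file's dimensions, variables and metadata attributes. The dimension, variable and attribute names are assumptions chosen for illustration; they do not reproduce the actual DARELUX dataset or the conversion tooling used by 3TU.Datacentrum.

```python
import xml.etree.ElementTree as ET

# Namespace of the NcML 2.2 schema.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_ncml(dim_name, dim_length, var_name, var_type, attributes):
    """Return an NcML <netcdf> element describing one dimension and one variable."""
    ET.register_namespace("", NCML_NS)
    root = ET.Element("{%s}netcdf" % NCML_NS)
    ET.SubElement(
        root, "{%s}dimension" % NCML_NS, name=dim_name, length=str(dim_length)
    )
    variable = ET.SubElement(
        root, "{%s}variable" % NCML_NS, name=var_name, shape=dim_name, type=var_type
    )
    # Each metadata item becomes an <attribute> child of the variable.
    for key, value in attributes.items():
        ET.SubElement(variable, "{%s}attribute" % NCML_NS, name=key, value=value)
    return root


# Hypothetical names and values, for illustration only.
root = build_ncml(
    "time",
    3,
    "water_temperature",
    "float",
    {"units": "degC", "long_name": "stream water temperature"},
)
ncml_text = ET.tostring(root, encoding="unicode")
print(ncml_text)
```

Keeping the units and a descriptive name next to the variable is what lets a future user interpret the numbers without the original researcher at hand.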
Westhoff thinks that 3TU.Datacentrum is a great
initiative. He would advise other researchers to contact
3TU.Datacentrum the moment raw data are available: “In
this way you can deliver your data in the right format
and with appropriate metadata right away”.
WIL VAN DER AALST
I think we stand on the threshold of a development
where it is no longer acceptable to publish papers
without making datasets available.
Event logs residing at 3TU.Datacentrum allow
for testing of process mining techniques
Wil van der Aalst and Boudewijn van Dongen
(Information Systems, TU/e) address an emerging
research area called ‘process mining’. With so-called event
logs the paths people follow in information systems
are discovered. These event logs are generated by
various types of systems ranging from X-ray machines to
enterprise information systems. Process mining is a data-intensive and highly empirical research area. With process
mining the event logs are analysed. These analyses
may lead to better business processes and information
systems that enhance user experience and truly support
workflows in organizations.
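As a rough illustration of how paths can be discovered from an event log, the sketch below counts which activity directly follows which within each case. This is one common starting point in process mining, not necessarily the technique used by the TU/e group, and the tiny log with its case identifiers and activity names is invented for the example.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count how often one activity directly follows another within a case.

    events is an iterable of (case_id, timestamp, activity) tuples.
    """
    traces = defaultdict(list)
    # Order events per case by timestamp to reconstruct each trace.
    for case_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


# Invented event log, for illustration only.
log = [
    ("case-1", 1, "register"),
    ("case-1", 2, "examine"),
    ("case-1", 3, "discharge"),
    ("case-2", 1, "register"),
    ("case-2", 2, "examine"),
    ("case-2", 3, "examine"),
    ("case-2", 4, "discharge"),
]
relations = directly_follows(log)
for (first, second), count in sorted(relations.items()):
    print(first, "->", second, count)
```

The resulting counts can be read as a simple graph of the paths people actually followed, including the repeated examination in the second case.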
Van der Aalst and Van Dongen were always willing to
share or publish their datasets. Moreover, they have been
thinking of setting up their own repository for quite some
time. However, they didn’t have the manpower to set it
up. 3TU.Datacentrum made a difference by providing two
student assistants who worked on the determination of
standard and specific metadata elements of event logs
among others. A lot of customization had to be done.
Because the TU/e event logs are easily accessible and can
be referred to, they have been successfully used for the
‘Business Process Intelligence Challenge’, an international
competition for process mining techniques. “As ‘process
mining’ is a young and emerging research area, it is
important that the datasets in the form of event logs are
made available. In this way, process-mining techniques
can be tested”, says Van der Aalst.
The datasets have been assigned a digital object identifier (DOI),
which makes the event logs easy to find and citable by
colleagues in the Netherlands or abroad. “The usefulness
of depositing our event logs at 3TU.Datacentrum is
evident. The many emails with requests for event logs
can easily be answered by giving the DOI of the dataset”,
says Van der Aalst. “I think we stand on the threshold
of a development where it is no longer acceptable to
publish papers without making datasets available. Thanks
to 3TU.Datacentrum we can be a frontrunner”.
3TU.Datacentrum:
Making your research output visible
Dear researcher,
We at the 3TU.Datacentrum are proud to be your facility
for your research output. Thousands of datasets
are now hosted at 3TU.Datacentrum. Alongside
the long-term archive and permanent access to your
data we offer different data services including hands-on
assistance with grant applications and training. With the
data labs we support data management of data-intensive
research. This is especially useful in international
collaborations enabling all the participants in a research
project to work on the data, anytime, anywhere.
We supply digital tools that connect to your digital
working environment and advise you on the sustainable
management of your data. At the end of the research
project, the complete datasets can easily be integrated
in the data archive. Your data will be clearly described
with citation information. When depositing your data
at 3TU.Datacentrum a digital object identifier (DOI) can
be assigned, which allows research data to be cited. In
this way the authors of the research data get the credit
they deserve. Your data are kept safe and accessible to
others thus increasing your visibility in the scientific
community.
Jeroen Rombouts, Director of 3TU.Datacentrum
Do you wish your dataset to be shown here in the next edition? Let us know!
Text & Photography: Verbeeldingskr8 | Concept and editing: Alenka Prinčič, 3TU.Datacentrum | Lay-out & Printing: Edauw+Johannissen BV | 2014