A brochure - 3TU.Datacentrum
3TU.Datacentrum

Showcasing 3TU.Datasets

This brochure shows some examples of what 3TU.Datacentrum can do for researchers and their research data. Take a look and read the stories of your fellow researchers. Many more interesting datasets are available; browse our datacentre to discover data that you can perhaps reuse in your own research.

Contact us

• If you wish to deposit a single dataset in 3TU.Datacentrum, you can easily use the self-upload facility.
• If you have complex datasets that require customised solutions, please contact us.
• If you have any questions about data management, digital object identifiers (DOIs) or data citation, we look forward to helping you!

We offer services during the whole research data lifecycle and also frequently organize symposia, seminars, training and workshops.

3TU.Datacentrum - Prometheusplein 1, NL - 2628 ZC Delft
P.O. Box 98, NL - 2600 MG Delft - +31 (0)15 27 88 600
[email protected]
datacentrum.3tu.nl - data.3tu.nl - datacite.tudelft.nl

BAS WOLS

I think increasing your citation score will apply in the long term, once a dataset becomes more and more recognized as an official publication entity

Depositing research data underlying a dissertation

In 2010, Bas Wols received a doctoral degree in Civil Engineering at TU Delft. He used computational fluid dynamics (CFD) to study the hydraulics in ozone and UV drinking water treatment systems. His work resulted in a modification of an ozone installation in Amsterdam (Waternet), improving water disinfection. Also, his simulations have shed light on energy-efficient UV filtering of drinking water.

During his research project, a colleague drew Wols’ attention to 3TU.Datacentrum. His main reason for depositing his data in a data repository was to give other researchers access. “I would have profited from having such data available during my own research. Because data sharing isn’t mainstream yet, I had to extract data from scientific articles manually.
This is neither efficient nor accurate”. Wols’ data consist of measurements, modelling data and movies, linked together in one dataset. “I have deposited all data which underlie my research. In this way researchers can in theory replicate what I have done”.

Depositing his data at 3TU.Datacentrum was quite easy, Wols says. “The 3TU.Datacentrum employees outlined their ambition of giving access to research data clearly. I had to transform my data into NetCDF format. This format allows for downloading parts of the dataset and querying it. Then, I brought a hard disk to the staff of 3TU.Datacentrum and my data were put online”.

To increase citability, digital object identifiers (DOIs) were assigned to his research data as well as to his dissertation. “I think increasing your citation score will apply in the long term, once a dataset becomes more and more recognized as an official publication entity”.

TOBIAS OTTO

It has taken some time to set up the data archiving process and to create the metadata. But we save that time now by being able to easily access our data

Benefiting from a continuously updated time series of climate data

Tobias Otto and Herman Russchenberg (Atmospheric Remote Sensing, TU Delft) take part in the CESAR consortium (Cabauw Experimental Site for Atmospheric Research). This consortium operates a large set of instruments to study the atmosphere and its interaction with the land surface. One of these instruments is the TU Delft IRCTR drizzle radar, IDRA for short. It is located on top of the 213-metre-high tower next to the village of Cabauw, between Gouda and Utrecht. IDRA measures drizzle (very fine rain), precipitation, and low clouds with a high spatial and temporal resolution. The long-term operation of IDRA will support the CESAR objective to monitor trends in atmospheric changes.

The IDRA weather radar measurements consist of a large time series of numerical data. The first dataset dates from April 2009 and is continuously updated.
To ensure longevity and easy access, the datasets were stored at 3TU.Datacentrum. The data are stored in NetCDF format and reside on an OPeNDAP server. The dataset is enriched with metadata to make it self-explanatory. The datasets are freely available and easily accessible to users. In 2010 the IDRA dataset was given a digital object identifier (DOI). Thanks to this permanent link, the IDRA dataset is much easier to find on the digital highway. “IDRA is now more visible to the scientific community”, Otto says.

“It has taken some time to set up the data archiving process and to create the metadata. But we save that time now by being able to easily access our data”, Otto explains. “The data are also used for education; not only at TU Delft but also elsewhere, thus enhancing collaboration. Within the European ERASMUS programme, two students from Politecnico di Bari (Italy) who already had experience working with IDRA data came to Delft to do parts of their MSc project with us”.

The future for the IDRA data seems bright: “We expect that in the near future, IDRA data will be used even more in various research projects, for example to validate and refine high-resolution atmospheric simulations”.

RICARDO SEGUEL

3TU.Datacentrum is providing a great service to the science community, being a central hub to boost collaboration between researchers

Depositing a Virtual Machine for testing business protocol adaptors

Ricardo Seguel (Information Systems, TU/e) obtained his doctoral degree by designing business protocol adaptors. Business protocols describe the order in which messages are communicated to another partner in a business chain. As each partner’s business protocol supports its own way of working, the protocols can easily mismatch. Mismatches can be resolved by using protocol adaptors.
In his dissertation, Seguel presents an efficient, automated method to build a minimal adaptor for two business protocols that have a behavioural mismatch. Moreover, he identifies how protocol adaptation can be used to support the flexible formation of business chains. “Collaboration in business chains is essential for organizations to be competitive in modern markets”, says Seguel, underlining the importance of developing business protocol adaptors.

“I decided to deposit a Virtual Machine with all the test data and the prototype software tool to 3TU.Datacentrum to make it available to anyone wishing to replay the tests and experiments explained in my thesis”, Seguel says. “Depositing the data in 3TU.Datacentrum was really easy. I just provided a link to the SHARE System where the Virtual Machine resides and then it was copied to the data store of 3TU.”

According to Seguel, the main benefit of depositing his data has been that anyone can validate his research findings. “Anyone can access and check your prototype, experiments and test data. And the data can be used in other research too”, he adds. Seguel appreciates the setting up of 3TU.Datacentrum: “It is a great initiative that provides a great service to the science community being a central hub to boost collaboration among researchers of all countries and research fields”.

NICO SOMMERDIJK

People may use the same datasets for things we were not looking for, thus generating new science with the same data

Preparing for the next level: How can we stimulate actual reuse of datasets?

Nico Sommerdijk (Department of Chemical Engineering and Chemistry, TU/e) studies the formation and materials properties of biominerals like calcium phosphate, calcium carbonate and iron oxides. Biominerals often have superior physical properties when compared to man-made materials. Many scientists want to synthesize new materials with biosimilar properties, applying bioinspired mineralization techniques.
To be able to design such materials, the mechanisms of biomineral formation have to be unravelled. Sommerdijk mimics natural systems in his laboratory and studies the mechanisms involved with advanced electron-microscopy techniques (Cryo-TEM).

Sommerdijk has uploaded a dozen gigabytes of experimental data to 3TU.Datacentrum, encompassing the data for an article that his department published in the leading journal Science. The dataset was too large for the publisher to make it available along with the article, so he turned to 3TU.Datacentrum. Sommerdijk: “Putting your data online will increase the reliability of research. If you process your data, you make changes in the original dataset. Fellow researchers can now look back into the steps taken and check if we did our job properly.” Also, experimental data often contain relevant information that the original researchers did not extract. “People may use the same datasets for things we were not looking for, thus generating new science with the same data”, Sommerdijk adds.

To stimulate reuse of deposited datasets, Sommerdijk thinks 3TU.Datacentrum should proactively advertise within the research community: “Research data will not be reused if no one knows they are available”. For example, he thinks researchers in countries with tight research budgets could greatly benefit if they knew about the existence of reusable datasets. “Once you’ve got the data, all you need is a cool head and a fast computer for analysing it. Even if you lack funds for pricey experiments, you can still make a contribution to science.”

SERGEY FROLOV

In the world of open science progress will happen faster

Datasets of ‘The Dutch particle’

In 2012, the research team of Leo Kouwenhoven (TU Delft’s Kavli Institute of Nanoscience and the FOM Foundation) obtained first signatures of a new fundamental particle: the Majorana fermion.
The team managed to create a nanoscale device in which a pair of Majorana fermions ‘appeared’ at either end of a nanowire. The discovery was named amongst the top ten breakthroughs in science in 2012 by Science and Physics World. “The Majorana fermion has now become famous as ‘the Dutch particle’. It is a bizarre particle because it represents its own anti-particle and it can also be used to build quantum computers”, says Sergey Frolov, describing the relevance of its discovery.

Frolov was the researcher responsible for having the underlying datasets published at 3TU.Datacentrum. Frolov: “Our research team strongly believes in openness of the scientific process. Sharing raw experimental data is a powerful means of enhancing the value of scientific results”. Currently, scientists exchange images of their data as figures printed in scientific journals. “This is silly, since most data are taken in digital form using computers. And readers will most likely read the paper on their own computers”, Frolov adds. He sees a future where raw datasets will be part of the paper. “You can read and tweak the dataset in each figure, rotate them in 3D, apply math to the data”, Frolov envisions.

According to Frolov, depositing the data at 3TU.Datacentrum was very easy: “The staff did everything! We chose 3TU.Datacentrum because they are based at our university and they were very helpful in accommodating our datasets even though they were not in their preferred format”.

The Majorana experiment accumulated 3000 datasets, only a couple of which made it to the paper. “Seeing more can be very useful for colleagues who want to think about the experiments”. Frolov strongly supports all efforts for more openness in science and new tools for scientists. “Our colleagues are now able to explore our data on their own computers, study them, scrutinize and analyse our conclusions.
In fact, since the data were published, over 100 other papers appeared in scientific literature citing our results. Over a six-month period this is very impressive feedback. In the world of open science progress will happen faster”, Frolov concludes.

Photo: courtesy of TU Delft

LEON OSINSKI

Researchers are much more keen to tell about their data than about their publications

The data librarian: Your partner for depositing your data at 3TU.Datacentrum

Leon Osinski works at the Information Expertise Center / Library at Eindhoven University of Technology. In the past years, Leon spent more and more time supporting researchers with data management and acquiring datasets for 3TU.Datacentrum. Anticipating the fast developments in the field, Osinski was officially appointed as a ‘data librarian’ in May 2012. In Eindhoven, Leon is proactively providing support for a future where depositing your data will be just as natural as doing the research itself.

At this moment Osinski focuses on PhD students in order to acquire the datasets belonging to their PhD projects. He organizes tailored workshops on research data management for researchers. “Often, researchers do not know which part of their data they should submit to a long-term data archive like 3TU.Datacentrum. It helps to sit down together, leaving no data overlooked”. Close co-operation between a researcher and a data librarian ensures that datasets remain understandable for years to come.

What Osinski likes about being a data librarian is being a pioneer. He is constantly on the lookout for new services that may benefit the research community. What he would like the most is that researchers call him in advance and say ‘Soon I’m going to start a research project that will generate a lot of data that I would like to have archived and eventually published.
Can you help me find a way to do so?’ Leon finds his inspiration in the shift of focus from publications to research data: “Researchers are much more keen to tell about their data than about their publications. Being a data librarian has brought me closer to the scientific process. And I love it”.

MARTIJN WESTHOFF

If you contact 3TU.Datacentrum the moment raw data are available you can deliver your data in the right format and with appropriate metadata right away

Data conversion as a service of 3TU.Datacentrum

In 2011 Martijn Westhoff earned a doctoral degree in Civil Engineering (TU Delft) with his thesis ‘High resolution temperature observations to identify different runoff processes’. During his study Martijn collected hydrological data in a river basin in Luxembourg in a project called DARELUX (Data Archiving River Environment LUXemburg). He measured water temperatures with a fibre optic cable in order to identify groundwater contributions to the stream. Together with his advisor Wim Luxemburg, he set up the sensors and analyzed the data. With this kind of information the generation of floods can be better understood.

The DARELUX datasets were made available by 3TU.Datacentrum for reuse by other projects and disciplines. Westhoff is an advocate of open data: “Research becomes more transparent and other researchers can verify your research or use the data for their own research”.

Since the DARELUX dataset was deposited, a lot of data conversion has already been performed by 3TU.Datacentrum. The dataset was first converted to a homemade XML format. Subsequently, the NcML (XML version of NetCDF) was created and afterwards the dataset was converted to NetCDF. The dataset has since been moved from the 3TU.Datacentrum (Fedora) server to OPeNDAP. The conversion to XML was done because XML allows standard metadata to be assigned.
In this way information about the content of the dataset is easy to add, keeping the data comprehensible and readable to future users. The reason for subsequently converting the data to NetCDF is that this format increases the ways to interact with the data.

Westhoff thinks that 3TU.Datacentrum is a great initiative. He advises other researchers to contact 3TU.Datacentrum the moment raw data are available: “In this way you can deliver your data in the right format and with appropriate metadata right away”.

WIL VAN DER AALST

I think we stand on the threshold of a development where it is no longer acceptable to publish papers without making datasets available.

Event logs residing at 3TU.Datacentrum allow for testing of process mining techniques

Wil van der Aalst and Boudewijn van Dongen (Information Systems, TU/e) address an emerging research area called ‘process mining’. With so-called event logs the paths people follow in information systems are discovered. These event logs are generated by various types of systems ranging from X-ray machines to enterprise information systems. Process mining is a data-intensive and highly empirical research area. With process mining the event logs are analysed. These analyses may lead to better business processes and information systems that enhance user experience and truly support workflows in organizations.

Van der Aalst and Van Dongen were always willing to share or publish their datasets. Moreover, they had been thinking of setting up their own repository for quite some time. However, they didn’t have the manpower to set it up. 3TU.Datacentrum made a difference by providing two student assistants who worked on, among other things, the determination of standard and specific metadata elements of event logs. A lot of customization had to be done.
Because the TU/e event logs are easily accessible and can be referred to, they have been successfully used for the ‘Business Process Intelligence Challenge’, an international competition for process mining techniques. “As ‘process mining’ is a young and emerging research area, it is important that the datasets in the form of event logs are made available. In this way, process-mining techniques can be tested”, says Van der Aalst.

The datasets have been assigned a digital object identifier (DOI), which makes the event logs easy to find and cite for colleagues in the Netherlands or abroad. “The usefulness of depositing our event logs at 3TU.Datacentrum is evident. The many emails with requests for event logs can easily be answered by giving the DOI of the dataset”, says Van der Aalst. “I think we stand on the threshold of a development where it is no longer acceptable to publish papers without making datasets available. Thanks to 3TU.Datacentrum we can be a frontrunner”.

3TU.Datacentrum: Making your research output visible

Dear researcher,

We at 3TU.Datacentrum are proud to be your facility for your research output. Thousands of datasets have been hosted at 3TU.Datacentrum to date. Alongside the long-term archive and permanent access to your data, we offer different data services, including hands-on assistance with grant applications and training.

With the data labs we support data management of data-intensive research. This is especially useful in international collaborations, enabling all the participants in a research project to work on the data, anytime, anywhere. We supply digital tools that connect to your digital working environment and advise you on the sustainable management of your data. At the end of the research project, the complete datasets can easily be integrated in the data archive. Your data will be clearly described with citation information.
When depositing your data at 3TU.Datacentrum, a digital object identifier (DOI) can be assigned, which allows research data to be cited. In this way the authors of the research data get the credit they deserve. Your data are kept safe and accessible to others, thus increasing your visibility in the scientific community.

Jeroen Rombouts, Director of 3TU.Datacentrum

Your notes:

Do you wish your dataset to be shown here in the next edition? Let us know!

Text & Photography: Verbeeldingskr8 | Concept and editing: Alenka Prinčič, 3TU.Datacentrum | Lay-out & Printing: Edauw+Johannissen BV | 2014