Rebecca Wilson, University of Bristol
DataSHIELD
Taking the analysis to the data
Dr Becca Wilson, D2K Research Group, University of Bristol
McGill University, OICR, Maelstrom Research
The Norwegian Institute of Public Health, Dept of Epidemiology
MRC Epidemiology Unit, Cambridge
Eindhoven Technical University
F1000 Research Journal
@Data2Knowledge @drbeccawilson #d2kDatashield

Rationale
Barriers to data access and analysis arise in any discipline from a range of scenarios:
• Ethical-legal or governance restrictions surrounding confidentiality/disclosure of confidential data
• Maintaining control of intellectual property
• Physical size of the data
DataSHIELD provides a flexible, modular, open-source solution ideally placed to grow a broad user and development community.

The DataSHIELD Approach
• DataSHIELD was born from a requirement in the biomedical and social sciences to co-analyse individual patient data from different sources without disclosing identity or sensitive information
• Under DataSHIELD, raw data never leaves the data provider; only non-disclosive summary statistics are returned to the researcher
• The researcher is able to do the analysis themselves using R
• The analysis is taken to the data, not the data to the analysis

Example Infrastructure
[Sequence of diagram slides; the only label captured in the transcript is "Includes R parser".]

DataSHIELD Status
• DataSHIELD methodology and infrastructure proven
  – Gaye, A. et al (2014). DataSHIELD: taking the analysis to the data, not the data to the analysis. International Journal of Epidemiology
  – Jones, EM et al (2012). DataSHIELD – shared individual-level analysis without sharing data: a biostatistical perspective. Norwegian Journal of Epidemiology
• Current functionality (http://www.datashield.ac.uk/latest-release/):
  – descriptive stats (e.g. mean)
  – exploratory stats (e.g. histogram)
  – contingency tables (e.g. 1D and 2D)
  – modelling (survival analysis using piecewise exponential regression, glm)
• Currently enhancing existing functions, developing further modelling tools (glmm), and exploring other datasets: genomics, geospatial, text
• Currently in pilot phase in 10 European studies (www.bioshare.eu):
  – Healthy Obese Project
  – Environmental Core Project: effects of environmental exposures on cardio-respiratory and mental health in European adults
[One further "DataSHIELD status" slide was a figure; its content is not captured in the transcript.]

DataSHIELD Future
• Support and Training: historically grant funded; investigating alternative funding models
  – Free: support from wiki
    • http://www.datashield.ac.uk/wiki
    • http://wiki.obiba.org/display/CAG/Home
  – User Support: access to support forum
    • www.datashield.ac.uk/forum
    • code, questions, troubleshooting error messages
    • complex questions may require giving a DataSHIELD developer access permissions to the portal in order to replicate the error
  – Consortium Support: for infrastructure and users
    • Support implementation of Opal/DataSHIELD
    • Design / develop new packages or functionality
    • Opal monitoring system: users and data providers can see
      – When/which Opal servers are down
      – Clues as to why they are down (memory, load etc)
      – Alerts sent to 2 designated people at each data provider
• Training: can provide training on:
  – Introduction to R (assuming statistical knowledge) [1 day]
  – Introduction to DataSHIELD (assuming R and stats knowledge) [1-2 days]
  – Developer workshop [3-4 days]
• Broadening use:
  – Applications outside academia e.g. academic publishing, university data repositories etc
• Different types of DataSHIELD:
  – Single site DataSHIELD
  – Vertical DataSHIELD

Further Info
Any Questions?
www.datashield.ac.uk

Guidelines for Data Providers
• Hardware requirements:
  – Server or VM to install Opal, plus a database server (mongodb or mysql) to hold the data, e.g. NCDS BioSHaRE has:
    • 2 vCPU 2.6GHz
    • 4GB RAM
    • 20GB disk space (for db server)
  – Authorise Opal to receive/send comms via web services (through your firewall) to the DataSHIELD portal
• Operations: designate 2 people to maintain Opal and DataSHIELD. Responsible for:
  – Installing / setting up Opal and DataSHIELD
  – Joining the DataSHIELD/Opal mailing list
  – Maintaining software updates
  – Resilience of the data service
  – Transparency about the disclosure level

Guidelines for Client Portal
• Client Portal: designate 2 people to maintain the DataSHIELD client portal. Responsible for:
  – Installing / setting up DataSHIELD client portal
  – Joining the DataSHIELD/Opal mailing list
  – Maintaining software updates
  – Resilience of the client portal
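
Illustrative sketch (not from the slides)
The idea stated under "The DataSHIELD Approach", that raw data stays with each data provider and only non-disclosive summary statistics reach the researcher, can be illustrated with a pooled mean. This is a minimal sketch under stated assumptions, not DataSHIELD's actual API: DataSHIELD itself is used from R, Python is used here purely for illustration, and the names `site_summary`, `pooled_mean` and `MIN_COUNT`, together with the threshold value of 5, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical disclosure threshold: a site with fewer records than this
# releases nothing. The value 5 is an assumption for illustration only.
MIN_COUNT = 5


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate values that are allowed to leave a data provider."""
    total: float
    count: int


def site_summary(values: Sequence[float],
                 min_count: int = MIN_COUNT) -> Optional[SiteSummary]:
    """Runs at the data provider. Returns only a sum and a count,
    or None when there are too few records to release safely."""
    if len(values) < min_count:
        return None
    return SiteSummary(total=float(sum(values)), count=len(values))


def pooled_mean(summaries: Sequence[Optional[SiteSummary]]) -> float:
    """Runs at the researcher's side. Combines per-site aggregates;
    individual-level values are never seen here."""
    usable = [s for s in summaries if s is not None]
    if not usable:
        raise ValueError("no site returned a releasable summary")
    return sum(s.total for s in usable) / sum(s.count for s in usable)


# Toy individual-level data held separately at three hypothetical sites.
site_a = [5.1, 4.9, 6.0, 5.5, 5.0]
site_b = [6.2, 5.8, 6.1, 5.9, 6.0, 6.3]
site_c = [4.0, 4.2]  # below the threshold, so this site releases nothing

summaries = [site_summary(v) for v in (site_a, site_b, site_c)]
result = pooled_mean(summaries)
print(f"pooled mean over releasing sites: {result:.3f}")
```

The pooled mean computed from per-site sums and counts equals the mean that would be obtained by stacking the releasing sites' records, which is why the researcher does not need the individual-level data for this statistic.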