EPCC News
The newsletter of EPCC, the supercomputing centre at the University of Edinburgh
Issue 79, Summer 2016

Best in breed: novel computing brings innovation to the farmyard
Fortissimo: better design through HPC

In this issue: Large Synoptic Survey Telescope: data-intensive research in action; Creating a safe haven for health data; Join us at EuroMPI!

From the Director

Regular readers will notice that this welcome is now in the singular rather than the plural. I'd also like to welcome our newest avid reader, Alison Kennedy, the new Director of the Science and Technology Facilities Council's Hartree Centre. In all seriousness, I wish my friend and long-time colleague Alison all the best in her new role, which she took up at the start of April. Alison has been a key member of EPCC staff for over 20 years and we all wish her well.

I hope this issue of EPCC News conveys the huge breadth of EPCC's activities. Many scientific users know us mainly through the ARCHER service (and all of the previous national HPC services that we have run). ARCHER has never been busier – indeed we are well aware of the challenges we face in terms of meeting the expectations of our many users. Busy systems mean long queuing times – a clear argument for more Government investment in national HPC services. However, any new investment will take time and this doesn't help the immediate needs of our many users who face long turn-around times.

But EPCC does much more than supply national HPC services. I hope that you find the articles on our many and varied industry projects interesting. As the Fortissimo project is showing, the need for HPC among Europe's small to medium-sized enterprises (SMEs) continues to grow. With 122 partners this is our most complex EC project, but it only scratches the surface of the potential demand: we estimate more than 30,000 SMEs could easily benefit from HPC in Europe today.

And if you want further proof of the breadth of HPC, look to our example on page 11 – which also explains the attractive Scottish beast on the cover!

Mark Parsons
EPCC Director
[email protected]

Contents

Better design for business: Fortissimo helps European SMEs access HPC services
A scaleable, extensible data infrastructure: in partnership with industry
Working for Scottish industry: HPC on demand
Easier HPC adoption: supporting Scottish SMEs
Generating impact from research: speeding innovation from research to industry
Measuring power in parallel technologies: Adept project achievements
Data-intensive research in action: preparing for the Large Synoptic Survey Telescope
A safe haven for UK health data: health informatics
Exploiting parallelism for research: the INTERTWinE project
Spreading the word: ARCHER Champions
Software skills for researchers: Software Carpentry workshops
UK Many-Core Developer Conference: UKMAC 2016 review
HPC outreach: EPCC at the Big Bang Fair

Contact us
www.epcc.ed.ac.uk
[email protected]
+44 (0)131 650 5030

EPCC is a supercomputing centre based at The University of Edinburgh, which is a charitable body registered in Scotland with registration number SC005336.
Diversity in HPC

The Diversity in HPC (DiHPC) project is working to showcase the diversity of talent in the high performance computing community. No personal characteristic should be a barrier to participating in HPC. This includes disability, sexuality, sex, ethnicity, gender identity, age, religion, culture, pregnancy and family.

The DiHPC project provides a set of resources and tools to help the HPC community engender change, implement best practice and champion diversity. Faces of HPC is a series of stories about people who represent the diversity of the HPC community, championing role models for an inclusive culture.

Diversity in HPC is a project developed by EPCC and funded through the UK national supercomputing facility ARCHER and the EPSRC.
Read more about our work: www.hpc-diversity.ac.uk

Join us at EuroMPI!

Now in its 23rd year, EuroMPI is the leading conference for users and developers of MPI. Join us in the beautiful city of Edinburgh for four days of networking, discussion and skills building.

The theme of this year's conference is "Modern Challenges to MPI's dominance in HPC". Through the presentation of contributed papers, posters and invited talks, attendees will have the opportunity to share ideas and experiences and to contribute to the improvement and furthering of message-passing and related parallel programming paradigms. A panel will discuss the challenges facing MPI, what the HPC community needs from MPI to adapt for the future, and whether MPI will survive as we proceed to Exascale and beyond. In addition to the conference's main technical programme, one-day and half-day tutorials will be held.

Women in HPC partnership

We are delighted to announce that EuroMPI 2016 will be the first conference to work in partnership with Women in HPC (WHPC). EuroMPI 2016 is committed to helping broaden diversity in the HPC community and beyond. This commitment to diversity, and in particular to addressing the under-representation of women, comes from a recognition that diversity fosters innovation and is a benefit to society. Our partnership with WHPC will help us ensure that the conference is accessible and welcoming to all and will encourage us to challenge the status quo.

Keynotes

EuroMPI 2016 will host an exciting programme of keynotes from across the MPI community discussing the pros and cons of using MPI and the challenges we face. The speakers for this year's conference include:

How can MPI fit into today's big computing? Jonathan Dursi, Ontario Institute for Cancer Research
MPI: The once and future king. Bill Gropp, University of Illinois at Urbana-Champaign
The MPI Tools Information Interface. Kathryn Mohror, Lawrence Livermore National Laboratory
HPC's a-changing, so what happens to everything we know? David Lecomber, Allinea Software

EuroMPI will run from 25-28 September in Edinburgh, UK.

Registration: www.eurompi2016.ed.ac.uk/registration
Early bird registration closes 22 July 2016.
For more information on the partnership with WHPC, see www.eurompi2016.ed.ac.uk/diversity

Dan Holmes
[email protected]

Fortissimo: better design through HPC

A consortium of Europe's leading supercomputing centres and HPC experts is developing the Fortissimo Marketplace, a one-stop-shop where end users will access modelling and simulation services, plus high-performance data analytics.

Koenigsegg's One:1, developed with the assistance of Fortissimo. Image: Julia LaPalme.

The advantages of using high performance computing (HPC) in modelling and simulation are well established. However, it has proved more difficult for small companies to gain these benefits compared to larger ones.
This is typically because the cost of entry has been too high: only larger companies have been able to afford the investments required to buy and run HPC systems, and to provide the necessary expertise to use them.

Fortissimo has learned from the success of cloud computing by offering HPC on-demand as a pay-per-use service, removing any need for end-users to buy and run their own systems. This dramatically reduces the risk for first-time users of HPC, and allows users to cut costs since they only pay for the resources they use. Access to HPC experts through the Fortissimo Marketplace helps to get users up and running more quickly.

The Marketplace development is being driven by over 50 experiments involving various types of stakeholders such as end-users, simulation service providers and software vendors. The experiments are designed to prove the concept of the Fortissimo Marketplace and determine what features it should have. They will also provide an initial set of services that will be available through the Marketplace.

One of the services that will be offered through Fortissimo is being developed by Dublin City University and EPCC. It uses discrete event simulation to model the operation of industrial manufacturing processes, allowing them to be optimised to improve business performance. Evaluating the many possible scenarios requires a lot of computing power. The benefit of the Fortissimo approach is that the discrete event simulation experts do not need to own or run their own HPC system: they simply access the systems in the Fortissimo HPC cloud when they need to. This allows them to focus on their area of expertise and to build up a scalable business.

Fortissimo has attracted much attention from industry, with a number of software and service companies interested in joining the Marketplace. HPCwire awarded it Best Use of HPC in the Cloud in both the Readers' and Editors' Choice categories at the Supercomputing 2015 conference.

A prototype of the Marketplace was released a few months ago and work has been continuing to validate the approach and add features. An updated version, intended to fully support the services being developed by the experiments, will be launched in the near future.

Mark Sawyer
[email protected]

Fortissimo is a collaborative project that enables European SMEs to be more competitive globally through the use of simulation services running on an HPC cloud infrastructure.
www.fortissimo-project.eu
Fortissimo Marketplace: www.fortissimo-marketplace.com
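The discrete event simulation service mentioned above is not reproduced here, but the underlying technique is easy to sketch. The following minimal Python example uses an invented single-machine production line and made-up job parameters; it shows only the basic event-queue mechanics that such a service would scale up across many scenarios on HPC.

```python
import heapq
import random

# Minimal discrete event simulation of a single production machine.
# Illustrative only: the model, names and timings are invented, not
# taken from the Fortissimo service described in the article.

def simulate(n_jobs=20, mean_arrival=5.0, mean_service=4.0, seed=1):
    random.seed(seed)
    events = []                              # priority queue of (time, kind, job)
    t = 0.0
    for job in range(n_jobs):                # schedule all job arrivals
        t += random.expovariate(1.0 / mean_arrival)
        heapq.heappush(events, (t, "arrival", job))

    queue, busy_until, waits = [], 0.0, []
    while events:
        time, kind, job = heapq.heappop(events)
        if kind == "arrival":
            queue.append((time, job))
        # start the next queued job whenever the machine is free
        if queue and time >= busy_until:
            arrived, job = queue.pop(0)
            waits.append(time - arrived)
            busy_until = time + random.expovariate(1.0 / mean_service)
            heapq.heappush(events, (busy_until, "departure", job))
    return sum(waits) / len(waits)

if __name__ == "__main__":
    print("mean wait per job:", round(simulate(), 2))
```

A real experiment would evaluate many such scenarios (different machine counts, buffer sizes and schedules), which is exactly where the on-demand HPC capacity described above comes in.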
Industry partnership: building a scaleable, extensible data infrastructure

Modern genome-sequencing technologies are easily capable of producing data volumes that can swamp a genetic researcher's existing computing infrastructure. EPCC is working with the breeding company Aviagen to build a system that allows such researchers to scale up their data infrastructures to handle these increases in volume without compromising their analytical pipelines.

To achieve the desired scalability and reliability, the system uses a distributed columnar database in which the data is replicated across a number of compute and data nodes. More compute nodes and storage can easily be added as the data volumes increase, without affecting the analyses that operate on the data.

The pipelines use Aviagen's in-house queue management framework to exploit parallelism by distributing tasks across a set of available heterogeneous compute nodes. Using this parallel framework, we are implementing a bespoke task library that provides basic functionality (such as matrix multiplication) so that a researcher need only plug together the various analytical operations they require. The framework deals with managing the distribution of the parallel tasks, dependencies between tasks, and management of the distributed data.

The analytics code had to be rewritten and parallelised to allow it to scale up as the volume of data increases. The new analytical pipelines operate on HDF5 extracts from the data store, with the data filtered at this stage to include only the data relevant to the subsequent calculation. HDF5 is a data model, library and file format for storing and managing data; it is designed for flexible and efficient I/O and for high-volume, complex data.

This system, combining the columnar database and the parallel analytics library, will allow data archiving and data processing in a scalable, extensible manner. Aviagen will be able to add more data analysis functionality as needed.

"The collaboration with EPCC promises to give us the ability to handle increasingly large amounts of data."
Andreas Kranis, Research Geneticist, Aviagen

Eilidh Troup
[email protected]

Amy Krause
[email protected]

Aviagen: http://en.aviagen.com
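Aviagen's actual schema and pipeline are not shown in the article, so the snippet below is only an illustrative sketch of the HDF5 extract step it describes: write a small stand-in data store with h5py, then read back just the rows relevant to a subsequent calculation. All dataset names, shapes and the filtering criterion are invented.

```python
import numpy as np
import h5py

# Illustrative only: the datasets and filter below are invented stand-ins,
# not Aviagen's real data store or pipeline.

def write_example(path):
    """Create a small HDF5 file standing in for a genotype data store."""
    with h5py.File(path, "w") as f:
        f.create_dataset("genotypes",
                         data=np.random.randint(0, 3, size=(1000, 50)),
                         chunks=True, compression="gzip")
        f.create_dataset("animal_id", data=np.arange(1000))
        f.create_dataset("line", data=np.random.randint(0, 4, size=1000))

def extract_for_calculation(path, line_of_interest=2):
    """Read only the rows relevant to the next calculation."""
    with h5py.File(path, "r") as f:
        mask = f["line"][:] == line_of_interest   # small metadata column
        idx = np.nonzero(mask)[0]
        # h5py fancy indexing reads just the selected rows from disk
        genotypes = f["genotypes"][idx, :]
        ids = f["animal_id"][idx]
    return ids, genotypes

if __name__ == "__main__":
    write_example("extract.h5")
    ids, g = extract_for_calculation("extract.h5")
    print(len(ids), "animals selected; genotype block shape", g.shape)
```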
HPC: working for Scottish industry

Through EPCC, Scottish company Global Surface Intelligence is using HPC to make global forestry and crop yield growth predictions. Ronnie Galloway of GSi explains how.

Global Surface Intelligence (GSi) is an Edinburgh-based company with a global reach and expertise in using Earth Observation (EO) via satellite images to determine commercially valuable information on forest growth and crop yield predictions. GSi has developed bespoke machine-learning software that learns to recognise what satellites are "seeing" when covering important global assets such as forestry and crops. By understanding what is on the ground, often across vast areas, GSi provides agri-investors, forestry management and investment companies, farmers, crop traders and government with invaluable ongoing insight into the value of those forests or crops.

The software developed by GSi is estimated to be 100,000 times quicker than other similar software when run on conventional non-HPC systems; the additional benefits of running GSi's software on HPC machines are therefore substantial.

GSi first partnered with EPCC in 2013 through the Scottish Enterprise-funded 'Supercomputing Scotland' programme, which provided GSi with funding for EPCC expertise and compute power to parallelise and integrate GSi's software with HPC. In 2016 the company has a commercial relationship with EPCC and enjoys the benefits of access to 1,500 cores on INDY (see 'On-demand HPC for industry', below), allowing GSi to run different jobs simultaneously.

The vast data ingested by the GSi-Platform is stored efficiently at EPCC. The data and compute infrastructures are co-located at EPCC's Advanced Computing Facility. The high-bandwidth, low-latency interconnect reduces the need for copy-managing the data through other means. This presents a huge commercial advantage to GSi in reducing the time and effort needed to provide EO analysis of land assets. At all times, EPCC provides expertise and advice to GSi in maximising the efficiency of using HPC in EO and big data management.

The GSi relationship with the University of Edinburgh extends to a Data Lab-funded project in 2016 employing a PhD student from the School of Geosciences related to the crop yield aspect of the business. Allied to the close working relationship between EPCC and GSi, this typifies a relevant and vital collaboration between industry and academia that helps a local SME tackle truly global challenges.

Ronnie Galloway, Consultant, Global Surface Intelligence Ltd
[email protected]

GSi: www.surfaceintelligence.com
The Data Lab: www.thedatalab.com
All images courtesy of GSi.

On-demand HPC for industry

The INDY service used by GSi (see previous article) is part of EPCC's on-demand computing service Accelerator, which brings supercomputing capability straight to our clients' desktops. Through a simple internet connection they gain cost-effective access to an impressive range of high-performance computing (HPC) resources, including ARCHER, the national HPC service.

Accelerator is targeted at engineers and scientists solving complex simulation and modelling problems in fields such as bioinformatics, computational biology, computational chemistry, computational fluid dynamics, finite element analysis, life sciences and earth sciences.

The Accelerator model provides direct access to HPC platforms delivering the highest levels of performance. Unlike cloud-based services, no inefficient virtualisation techniques are deployed. The highest levels of data security are provided, and the service is administered directly by the client using a range of administration and reporting functions. The service is fully supported with an integral help desk, and EPCC support staff are available to help with usage problems such as compiling codes and running jobs.

INDY is a dual-configuration Linux/Windows HPC cluster aimed at industrial users from the scientific and engineering communities who require on-demand access to mid-range, industry-standard HPC. The system comprises 24 back-end nodes and two front-end login nodes connected by a high-performance, low-latency interconnect. There are four 16-core AMD Opteron processors per node, giving 64 cores per node and 1,536 cores in total. As standard, each back-end node has 256 GB of shared RAM, with two large-memory back-end nodes configured with 512 GB RAM to support applications that require a larger shared-memory resource. The system has support for the future installation of up to two GPGPU cards (NVIDIA or AMD) per node.

INDY utilises IBM's industry-leading Platform HPC cluster management software, providing job-level dynamic provisioning of compute nodes into either Windows or Linux depending on a user's specific O/S requirement.

George Graham
[email protected]

On-demand at EPCC
To discuss our on-demand Accelerator service, contact George Graham: [email protected], +44 (0) 131 651 3460 or +44 (0) 777 370 8191.
www.epcc.ed.ac.uk/facilities/demand-computing

SHAPE: Making HPC adoption easier for SMEs

It can be challenging for SMEs to adopt HPC. They may have no in-house expertise, no access to hardware, or be unable to commit resources to a potentially risky endeavour. This is where SHAPE comes in: it makes it easier for SMEs to make use of high-performance computing in their business, whether to improve product quality, reduce time to delivery or provide innovative new services to their customers.

Successful applicants to the SHAPE programme get effort from a PRACE HPC expert and access to machine time at a PRACE centre.
In collaboration with the SME, the PRACE partner helps them try out their ideas for using HPC to enhance their business. So far, SHAPE has assisted over 20 SMEs (see the project website for examples), and the third call for applications has just closed, so more companies will benefit from this enabling programme. SHAPE will continue in the next phase of PRACE, and the plan is to have six-monthly calls (the next opens in June 2016), giving ample opportunity for SMEs to investigate what HPC can do for their business.

Paul Graham, EPCC
[email protected]

SHAPE (SME HPC Adoption Programme in Europe) is a pan-European initiative supported by the PRACE (Partnership for Advanced Computing in Europe) project. The programme aims to raise awareness and provide European SMEs with the expertise necessary to take advantage of the innovation possibilities created by high-performance computing (HPC), thus increasing their competitiveness. The programme allows SMEs to benefit from the expertise and knowledge developed within the top-class PRACE Research Infrastructure.

Albatern: producing power from waves

Albatern is an innovative Scottish SME of 15 engineers. Its wave power generation product consists of buoyant three-armed Squid modules which can link with up to three other Squids. The Squid modules and their link-arms contain mechanisms to generate power, capturing the heave and surge motion of the waves via hydraulics. In this way Albatern has developed a highly scalable, modular wave power generator.

Albatern's project, supported by SHAPE, marked the start of the development of a physics code capable of simulating and predicting the power of a large-scale Wavenet array (100 or more devices). Wave energy prototypes are large, expensive and funded through risk capital. As a result, prototype simulation also forms an essential part of the device design process. To progress beyond the limitations of current, commercially available software, it was proposed to construct a new, modular solver capable of capturing the behaviour of large-scale Wavenet arrays.

Through SHAPE and with the support of PRACE experts, Albatern has prototyped a parallel multibody dynamics solver, using the PETSc open-source numerical library, and scaled it out on ARCHER, the Cray XC30 hosted by EPCC.

"Simulations demonstrating the potential cost and performance improvements gained through deploying extremely large, coupled wave energy arrays will be a breakthrough for the industry," says Dr William Edwards of Albatern. "PRACE has helped Albatern develop in-house software that will directly aid expanding the scope of their simulation capability. Albatern is now in a position to write a multibody dynamics code that will share common parts of the simulation procedure, allowing interchange of either the simultaneous or sequential methods."
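Albatern's PETSc-based solver is not public here, so the following is a deliberately tiny stand-in: a chain of floats modelled as coupled, damped masses driven by a sinusoidal wave and integrated with SciPy. Every parameter is invented; the sketch only conveys what "multibody dynamics of a Wavenet array" means at the smallest possible scale.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy stand-in for a multibody wave-energy array: N floats coupled to their
# neighbours by linear springs/dampers and driven by a sinusoidal wave.
# All parameters are invented; Albatern's actual (PETSc-based) solver is
# far more sophisticated than this sketch.

N = 8             # number of floats ("Squid" modules in the article)
m = 1.0e3         # mass of each float (kg)
k = 5.0e3         # coupling stiffness between neighbours (N/m)
c = 2.0e2         # damping standing in for power take-off (N s/m)
A, omega = 0.5, 0.8    # wave amplitude (m) and angular frequency (rad/s)

def rhs(t, y):
    z, v = y[:N], y[N:]                            # heave positions, velocities
    wave = A * np.sin(omega * t - np.arange(N))    # phase lag along the array
    acc = np.zeros(N)
    for i in range(N):
        force = -c * v[i] + k * (wave[i] - z[i])   # drive towards wave surface
        if i > 0:
            force += k * (z[i - 1] - z[i])         # coupling to left neighbour
        if i < N - 1:
            force += k * (z[i + 1] - z[i])         # coupling to right neighbour
        acc[i] = force / m
    return np.concatenate([v, acc])

sol = solve_ivp(rhs, (0.0, 200.0), np.zeros(2 * N), max_step=0.05)
v = sol.y[N:, :]
# Mean power dissipated in the dampers, averaged over the run.
power = c * np.mean(np.sum(v**2, axis=0))
print(f"approximate mean absorbed power: {power:.1f} W")
```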
NEXIO: Amping up electromagnetic modelling

NEXIO SIMULATION is a French SME that develops electromagnetic simulation software called CAPITOLE-EM to study the electromagnetic behaviour of any product during the design process, before the manufacturing phase. After a first step performed locally in France using the HPC-PME initiative, its PRACE SHAPE project has enabled the company, via access to HPC resources and expertise in HPC and numerical simulation, to move from a personal-computer version of this software to an HPC version.

Electromagnetic simulation is used increasingly often because of the proliferation of communication devices such as mobile phones and modems. Studying the effects of interference between pieces of equipment has become essential for large industrial companies in aeronautics, space, automotive and other sectors, in order to improve the performance of transmitting and receiving systems, or antennas.

NEXIO SIMULATION tackles electromagnetic simulation problems with larger frequencies and model dimensions that lead to linear systems with millions of unknowns: one of the biggest challenges that researchers in this field encounter. Such problems call for special numerical techniques that can greatly reduce the numerical effort and complexity of the solution, as well as the memory required.

"These techniques are usually based both in physical and mathematical properties," says Pascal de Resseguier of NEXIO SIMULATION. "However, there is a certain point where these methods are not enough and we need to add some more gain. There it enters the era of parallelisation and HPC systems.

"Parallel codes can extremely reduce computational times if they have a good scalability with the number of cores. Getting to an efficient and optimised parallel code requires some expertise and resources which are hard to reach for a SME. We expect that half of the future sales of CAPITOLE-EM will come from the HPC version developed through this SHAPE project."

To find out more, see the SHAPE website or contact the SHAPE team at [email protected]

SHAPE: www.prace-ri.eu/hpc-access/shape-programme
PRACE: www.prace-ri.eu
Albatern: http://albatern.co.uk

Impacting on industry

EPCC is engaged in two collaborative projects designed to generate impact from research in science, technology, engineering and mathematics.

'Accelerating impact by developing advanced modelling techniques in multiphase flow for the chemical process industry'

Global equipment manufacturers in the chemical and oil and gas industries, such as Sulzer Chemtech, the industrial partner in this project, often rely on commercial Computational Fluid Dynamics (CFD) software tools for the design of their equipment. These commercial codes are currently unable to handle complex two-phase flows which exhibit challenging interfaces between gas and liquids, such as travelling waves. The formation of interfacial waves, and their frequency and amplitude, is particularly difficult to model in industrial environments.

This project, led by Dr Prashant Valluri, accelerates the impact of world-leading research at the University of Edinburgh in the modelling of complex flow systems for industrial applications such as distillation, absorption, carbon capture and oil refining. It aims to lead to new practices in CFD modelling, disrupting industry's current reliance on empirical design practice for chemical technology equipment such as structured packings. A new software tool relying on rigorous high-performance computing simulation of multiphase flow and transport phenomena will be developed, with expert feedback from users at Sulzer, so that it can be routinely used by industry in the future.

Carolyn Brock
[email protected]

Dr Valluri explains the importance of TPLS, a high-resolution 3D Direct Numerical Simulation code for two-phase flows that EPCC has developed in collaboration with him and Dr Lennon Ó Náraigh (University College Dublin): "Understanding multiphase flows with rigorous simulations is crucial for the accurate and economic design of any industrial units.
"Until recently, rigorous flow simulations were mainly restricted to academic environments, with only empirical simulation methods being so-called 'design-ready' despite tremendous errors. However, over the past decade, falling costs and faster multi-thread processors have led to cluster computing becoming more widespread in industrial R&D units and powerful supercomputing clusters such as ARCHER becoming more accessible.

"Industry is now getting ready to embrace rigorous simulations, not only for accuracy but also for a strong economic argument given smaller trial-and-error commissioning downtimes and reduced physical pilot plant trials.

"Our IAA project with Sulzer is an example. EPCC is at the heart of TPLS Solver. Through a series of HECToR/ARCHER and EPSRC projects, we have been fortunate to have EPCC by our side all along. Their best practices in optimisation, data management, code structures and numerical strategies have given TPLS its ultra-powerful bite, making it the only two-phase flow direct numerical simulation solver bespoke for supercomputing architectures with a choice of two highly powerful interface-capturing algorithms.

"Now at version 2 with over 700 downloads since 2013, and many more physical and computational enhancements underway, we are confident that with EPCC by our side, industrial/commercial uptake of TPLS will increase in the next four years!"

The funds for these projects were awarded from the Engineering & Physical Sciences Research Council's (EPSRC) Impact Acceleration Account (IAA), which is aimed at enhancing innovation opportunities and encouraging partnership working between universities and industry.

'Development of a hand-held device for measuring semen, part 2: transferring the DDM prototype into an advanced commercial prototype'

The second project focuses on assessing bull semen motility. The British market in bull semen is worth around £50m a year, with 75% of all dairy cattle breeding being by artificial insemination. To date, however, there is no easily portable method of accurately and objectively determining the parameters that characterise bull semen in an on-farm setting, some of which are part of crucial assessments in maximising bovine conception rates and thus herd efficiency.

Dr Vincent Martinez and Professor Wilson Poon (Institute of Condensed Matter & Complex Systems, University of Edinburgh) have pioneered the use of Differential Dynamic Microscopy (DDM) for characterising motile microorganisms. In collaboration with RAFT Solutions Ltd and using previous IAA funding, they have validated the use of DDM for accurately assessing bull semen. Moreover, they have also built a portable, first-generation prototype and used it successfully on-farm to characterise clinical semen samples.

Following this success, there is now a need to develop the technology into an advanced commercial prototype/IP package to enable subsequent clinical/industrial validation work. This milestone requires software and hardware development as well as 'voice of the customer' market validation. The three-way collaboration between ICMCS, EPCC and RAFT Solutions Ltd will be vital to delivering the next key milestone in this project.

Dr Vincent Martinez
[email protected]

Dr Prashant Valluri
[email protected]

EPSRC: www.epsrc.ac.uk
Institute of Condensed Matter and Complex Systems: www.ph.ed.ac.uk/icmcs
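The DDM prototype's software is not described in detail above, but the core calculation behind Differential Dynamic Microscopy is well established: an image structure function built from differences of Fourier-transformed frames, azimuthally averaged over wavevector. The sketch below runs on a synthetic image stack purely so that it executes; frame sizes, lag times and bin counts are arbitrary choices for illustration.

```python
import numpy as np

# Minimal sketch of the core DDM calculation: the image structure function
# D(q, tau) = <|FFT(I(t+tau)) - FFT(I(t))|^2>, azimuthally averaged over q.
# The "images" here are random noise so the script runs standalone; a real
# analysis would use microscope frames of the sample.

def ddm(frames, lags):
    n_t, ny, nx = frames.shape
    ffts = np.fft.fft2(frames)                      # FFT of every frame
    qy = np.fft.fftfreq(ny)[:, None]
    qx = np.fft.fftfreq(nx)[None, :]
    q = np.sqrt(qx**2 + qy**2)
    q_bins = np.linspace(0, q.max(), 30)
    which_bin = np.digitize(q.ravel(), q_bins)

    result = np.zeros((len(lags), len(q_bins) + 1))
    for i, tau in enumerate(lags):
        diff = ffts[tau:] - ffts[:n_t - tau]        # all frame pairs at this lag
        d_q = np.mean(np.abs(diff)**2, axis=0).ravel()
        # azimuthal average: mean of D(q) over each |q| bin
        sums = np.bincount(which_bin, weights=d_q, minlength=len(q_bins) + 1)
        counts = np.bincount(which_bin, minlength=len(q_bins) + 1)
        result[i] = sums / np.maximum(counts, 1)
    return q_bins, result

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stack = rng.random((64, 128, 128))              # synthetic image stack
    q_bins, d = ddm(stack, lags=[1, 2, 4, 8])
    print("structure function array shape:", d.shape)
```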
Adept: nearing the finish line!

The Adept project has been working hard for over two years to further understanding of how power is used in parallel software and hardware, and we are now on the finishing straight. Here we take stock of our achievements and reflect on how to focus our efforts in the final phase. We also consider life after the project ends: how do we want to exploit the technologies we have developed and the knowledge we have gained? How do we ensure a lasting legacy for Adept?

Parallel computing is no longer limited to large-scale HPC systems, and parallel technologies are becoming critical to everyday life. Parallelism on every scale is in use throughout society, from the HPC machines in our labs to the smartphones in our pockets. Small and large businesses alike now need sensible, affordable parallel systems in order to remain competitive, and there is a vast array of different parallel commodity hardware now available. Investigating and increasing the efficiency of such devices is therefore no longer an abstract concern but a real and pressing need. Financial needs, environmental concerns, system requirements – all of these considerations and more will affect how systems are built in future.

Adept Power Measurement System

One of our key outcomes is the sophisticated Adept Power Measurement System (APMS). This fine-grained measurement infrastructure reads the current and voltage from the power lines that feed the different components of a computer system, eg CPU, memory or disk. The APMS is capable of measuring multiple components simultaneously at the very high resolution of 1 million samples per second.

Adept Benchmark Suite

To complement the APMS, the Adept project has also developed a suite of benchmarks that can be used to test and evaluate existing systems. The benchmarks are designed to be used for system characterisation, and they target specific operations and common computational patterns. The suite consists of three different types of benchmark:

Micro benchmarks: small single-purpose functions such as basic operations on scalar data types, branch and jump statements, function calls, I/O operations, inter-process communication, or memory access operations.

Kernel benchmarks: computational patterns and kernels that largely consist of the operations from the micro benchmarks.

Application benchmarks: small applications that consist of multiple computational kernels.

Our Benchmark Suite includes a wrapper for Intel's Running Average Power Limit (RAPL) system, which is an in-band method for reading power and energy counters on certain Intel CPUs. Together, the Power Measurement System and Benchmark Suite form a powerful set of diagnostic tools to allow in-depth analysis of an application's power use in every aspect of a system.
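Adept's own RAPL wrapper and measurement boards are not shown here. As a hedged illustration of the in-band RAPL counters mentioned above, the sketch below samples the energy counter exposed by the Linux powercap interface around a workload; it assumes /sys/class/powercap/intel-rapl:0 exists and is readable, which typically requires a supported Intel CPU and elevated privileges.

```python
import time
from pathlib import Path

# Sketch of in-band energy measurement via Intel RAPL, read through the
# Linux "powercap" sysfs interface rather than Adept's own wrapper.
# Assumes /sys/class/powercap/intel-rapl:0 exists and is readable.

RAPL = Path("/sys/class/powercap/intel-rapl:0")

def read_energy_uj():
    return int((RAPL / "energy_uj").read_text())

def measure(workload):
    """Return (result, joules, seconds) for a callable workload."""
    max_uj = int((RAPL / "max_energy_range_uj").read_text())
    e0, t0 = read_energy_uj(), time.time()
    result = workload()
    e1, t1 = read_energy_uj(), time.time()
    delta = e1 - e0
    if delta < 0:                     # the counter wrapped around
        delta += max_uj
    return result, delta / 1e6, t1 - t0

if __name__ == "__main__":
    r, joules, secs = measure(lambda: sum(i * i for i in range(10_000_000)))
    print(f"energy: {joules:.2f} J over {secs:.2f} s "
          f"(average power {joules / secs:.1f} W)")
```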
Adept Performance and Power Prediction Tool

The Adept project is not limited to measuring power consumption, and another important outcome of the project is our Performance and Power Prediction Tool. Using detailed statistical modelling that examines a software binary, we are able to predict how well a CPU and memory hierarchy system will perform and how power-efficient it will be, even if we do not have access to that system or even if that system does not yet exist.

The Adept tool will impact on software developers and system designers by freeing them from making poorly informed decisions about how to implement changes to their systems. It allows for the design of smarter, cheaper, and more efficient systems, because a system's performance and power behaviours can be matched to a specific workload. Giving owners and developers the freedom and flexibility to know how their equipment will perform prior to porting their workloads means giving them the ability to make better choices about what they implement, how, and when.

In the final few months of the project, we will be focusing on improving the Adept tool wherever possible to make its predictions increasingly accurate, and we will use our measurement infrastructure to conduct a wide range of experiments around power-efficiency techniques in software development. But we will also focus significant effort on the exploitation of the project outcomes to ensure the lasting impact of our research.

A lot of challenging work remains to be done; however, the finish line is now in sight. We are certain that Adept will deliver on all its objectives and more!

Mirren White
[email protected]

A set of Adept power measurement boards, fully wired up and ready for deployment.
The Adept Power Measurement system.

The Adept project focuses on balancing power consumption and performance in both parallel software and hardware.
www.adept-project.eu

Large Synoptic Survey Telescope: data-intensive research in action

A number of recent, significant discoveries have propelled astronomy research into the spotlight. The discovery of dark matter and dark energy at the beginning of the 21st century overturned our understanding of how the Universe works. And the first observation of a gravitational wave earlier this year confirmed Albert Einstein's long-standing hypothesis precisely 100 years after it was first published in his general theory of relativity.

This is an exciting time for astronomy in the UK, a fact that is reflected by our involvement in and leadership of some amazingly ambitious new telescopes. The European Space Agency's Euclid dark Universe programme will launch a space telescope in 2020 to answer our most pressing questions about the dark Universe. The Square Kilometre Array (SKA) radio telescope, coordinated from Jodrell Bank, will be able to see back to the early Universe, to the time when cosmological structures such as galaxies and stars first began to form, when it commences operation in 2022. And in Chile construction is underway on the Large Synoptic Survey Telescope (LSST) – the most ambitious optical telescope ever undertaken – which should "see first light" in 2019.

While the outputs of LSST will challenge astronomers for years to come, the ambition of the LSST is already creating significant challenges for the engineers and computational scientists involved in its construction and future operation. At the heart of the telescope sits a 3.2-gigapixel camera (more than 100 times the number of elements of a current top-of-the-range digital camera), which is being designed in part in the UK. Thanks to this camera, the telescope will produce more than 100 petabytes of data during a 10-year survey that will image more than half of the sky with unprecedented depth and sensitivity.

LSST:UK consortium

UK astronomers have ambitious plans for LSST to advance understanding of dark energy, to identify and study near-Earth objects, to detect and follow transient events, and to progress supernova science. To support its ambition, the community has formed a consortium called LSST:UK with representation from every astronomy department in the country, and – with support from the Science and Technology Facilities Council – the consortium has secured full membership of the LSST.

George Beckett
[email protected]

LSST data mining sphere. The LSST team has developed an innovative "overlapping partitioning" method for storing enormous amounts of information for rapid access.
By overlapping equally sized packets of information in the partitioned sphere, searching for nearest-neighbour sources becomes quick and efficient. The technique has been shown to work just as efficiently with increasingly complex systems. The improved algorithms resulting from this innovative architecture will be available as open-source software that can be used by a broad spectrum of fields to transform access to large databases. Image: LSST.

Cut-away image of the LSST camera showing its inner workings. Image courtesy of LSST.

LSST:UK Science Centre

Construction progress in Chile is mirrored by scientific progress here in the UK, as scientists make their preparations in an £18 million STFC-funded project called the LSST:UK Science Centre (LUSC). The pre-operations phase of LUSC, which is led by the University of Edinburgh, started in July 2015 and will run for four years. During this term, the infrastructure to host and analyse LSST data (called the Data Access Centre) will be designed, and science groups will define and optimise the workflows that will be run in the Data Access Centre.

Engagement with the international community is vital during the construction phase. LSST:UK is already building strong relationships with the core teams of scientists and technologists in the United States and France. Further, we are looking towards collaboration opportunities with peer activities in Euclid, SKA and the LHC, exploiting the UK's unique position of being involved in all three of these programmes.

The programme of work in the lead-up to first light in 2019 is ambitious and exciting. The volume and rate of data generated by LSST will break today's databases and analysis software, and will challenge established astronomy practices and expectations. This is data-intensive research in action.

The Large Synoptic Survey Telescope (LSST) project will conduct a 10-year survey of the sky that will deliver a 200-petabyte set of images and data products which will address some of the most pressing questions about the structure and evolution of the Universe and the objects in it.

LSST:UK Science Centre: www.lsst.ac.uk
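The LSST database's real partitioning scheme is far more sophisticated than can be shown here, but the idea in the caption above can be sketched in a few lines: tile the sky into equally sized chunks, store each source both in its own chunk and in the overlap margin of neighbouring chunks, and answer a neighbour search from a single chunk. The code below uses a flat-sky approximation and invented chunk and overlap sizes purely for illustration; it is not the LSST implementation.

```python
import numpy as np
from collections import defaultdict

# Illustrative sketch of "overlapping partitioning": sources are stored in
# their own chunk and in any neighbouring chunk whose overlap margin they
# fall inside, so a neighbour search touches only one chunk. Flat-sky
# approximation; all sizes are invented for illustration.

CHUNK = 1.0      # chunk size in degrees
OVERLAP = 0.05   # overlap margin in degrees

def build_chunks(ra, dec):
    chunks = defaultdict(list)
    for i, (r, d) in enumerate(zip(ra, dec)):
        cx, cy = int(r // CHUNK), int(d // CHUNK)
        for nx in (cx - 1, cx, cx + 1):
            for ny in (cy - 1, cy, cy + 1):
                # keep the source if it lies inside this chunk plus its margin
                if (nx * CHUNK - OVERLAP <= r < (nx + 1) * CHUNK + OVERLAP and
                        ny * CHUNK - OVERLAP <= d < (ny + 1) * CHUNK + OVERLAP):
                    chunks[(nx, ny)].append(i)
    return chunks

def nearest(chunks, ra, dec, r0, d0):
    """Nearest source to (r0, d0), answered from a single chunk; exact
    whenever the true neighbour lies within the chunk's overlap margin."""
    key = (int(r0 // CHUNK), int(d0 // CHUNK))
    idx = np.array(chunks[key])
    dist2 = (ra[idx] - r0) ** 2 + (dec[idx] - d0) ** 2
    return idx[np.argmin(dist2)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ra, dec = rng.uniform(0, 10, 100_000), rng.uniform(0, 10, 100_000)
    chunks = build_chunks(ra, dec)
    print("nearest source index:", nearest(chunks, ra, dec, 5.0, 5.0))
```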
Creating a safe haven for health data

Safe havens allow data from electronic records to be used to support research when it is not practicable to obtain individual patient consent, while protecting patient identity and privacy. EPCC is now the operator of the new NHS National Services Scotland (NSS) national safe haven, in collaboration with the Farr Institute of Health Informatics Research, which provides the infrastructure.

Enabling researcher access to sensitive data sources is a complex process. Data providers manage their risk by making data supply dependent on research projects meeting specific information governance, data stewardship and system security requirements, in some cases through audited assessment. These system requirements place a very substantial burden on individual research projects, and in some cases these requirements alone can make projects unviable.

However, the whole supplier risk-management process can be streamlined, and in some cases eliminated entirely, if research projects use an appropriately accredited safe haven facility to broker access to the data. Safe havens act as secure virtual data rooms in which the data suppliers deposit data for the research projects to access. The practice of providing researcher access to NHS patient and health data has been pioneered in the UK through governance initiatives such as the Scottish Health Informatics Programme (SHIP).

NSS safe haven

The new NHS National Services Scotland (NSS) national safe haven service implementation work started in September 2015, with the live service rolled out during December and January 2016. Now fully operational, the safe haven is both physical and remote. It offers a secure file transfer and submission service for data providers, and a range of access methods, analytics platforms and tools for researchers. The standard service offered to research projects is secure remote browser-based access to a locked-down virtual desktop MS Windows system with MS Excel, SPSS, Stata, SAS and R.

Development and operation of the new NSS safe haven presented new challenges for EPCC, although the safe haven model is mature and relatively well understood, with expertise in it readily found in the HPC community. This project therefore prompted the development of new capability within EPCC, bringing security management and secure data stewardship as new core skills to the system development team. Implementing and operating the extensive supporting infrastructure (including enterprise products for the virtual desktop infrastructure) for the new safe haven has been the key to delivery of the service and evolution of the new security environment.

Information governance

The information governance and security regime of the safe haven has now reached the standard where NHS national data sets and Department for Work and Pensions (DWP) data can be hosted by the service, and the next goal is to host the NSS national image archive for research purposes. Information governance in a safe haven environment is very much the primary concern, and HPC a secondary one.

EPCC is working closely with NSS and the Farr Institute to extend and enhance the new safe haven service beyond its current basic compute capability to provide traditional HPC services within the safe haven. A higher-powered compute cluster and petabyte-scale storage services are being developed alongside the safe haven. The intention is to provide a more capable, secure analytic environment for health research that continues to meet the data stewardship and sharing security needs of data providers such as the NHS and DWP. These services will be rolled out later this year.

Donald Scobbie
[email protected]

The Farr Institute is a UK-wide research collaboration. Publicly funded by a consortium led by the Medical Research Council, the Institute is committed to using big data to advance the health and care of patients and the public.
Farr NSS Safe Haven:
• 3-node Hyper-V hypervisor platform
• 46 virtual servers: Windows and Linux
• 20 research partners and institutions
• 103 registered researcher users
• 62 active projects

www.farrinstitute.org

INTERTWinE: boosting research by exploiting parallelism

The first exascale computers, capable of performing 1×10^18 calculations per second using tens of millions of CPUs, are likely to be produced within the next few years. However, current versions of scientific software cannot produce enough concurrent tasks to keep such a high number of CPUs busy at once, even if there are enough tasks waiting to be processed. Such inefficient use would mean that an exascale-capable machine would in fact be unlikely to achieve exascale performance.

The INTERTWinE project is addressing this by helping scientists to find and exploit the parallelism that already exists within their software. By working with real software and popular programming techniques, we ensure the focus is aligned to scientists' pressing needs and applications. We have identified a number of key parallel programming models that have been widely adopted in current scientific software. However, in order to achieve large-scale parallelism, these often need to be used together, and INTERTWinE focuses on this interoperability of programming models.

EPCC is contributing to various areas of the technical work. For instance, we are investigating how to combine a thread-based model with a distributed-memory model for off-node parallelism. In order to make this transparent to the user, yet highly performant, we are focusing at the runtime level; for example by using a directory, which knows where data is located, to hide explicit data movement, and using a cache to limit the amount of communication in the first place.

It is important that the improvements to the technologies meet the requirements of the application developers. Another of EPCC's technical contributions is the optimisation and improvement of existing parallel applications using these new technologies. EPCC is currently focusing on Ludwig, a lattice Boltzmann code for complex fluids, and investigating the best mix of programming technologies in order to achieve good performance and scalability. The lessons learned from this work are not only fed back into the programming interoperability work, but also into best practice, training and the development of standards to meet interoperability demands. In addition, we are working with the relevant standards bodies to better understand and support the particular requirements of interoperability with other programming models.

Catherine Inglis
[email protected]

The INTERTWinE team in Barcelona.

To subscribe to news from INTERTWinE, sign up at: www.intertwine-project.eu/newsletter

INTERTWinE is led by EPCC and funded by the EC Horizon 2020 Research & Innovation programme for three years from 1 October 2015.
www.intertwine-project.eu
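INTERTWinE's runtime-level work is not reproduced here; the toy sketch below simply illustrates the directory-plus-cache idea described above. A directory records which node owns each block of a distributed array, and a local cache avoids repeating the (here simulated) remote fetch, so the application code never deals with data movement explicitly. All names and the layout are invented for illustration.

```python
# Toy illustration of a directory plus cache hiding data movement from the
# application. The "remote fetch" is simulated; this is not INTERTWinE's
# runtime, only the concept described in the article.

class Directory:
    def __init__(self, n_blocks, n_nodes):
        # block i lives on node i % n_nodes (a simple block-cyclic layout)
        self.owner = {i: i % n_nodes for i in range(n_blocks)}

    def owner_of(self, block):
        return self.owner[block]

class NodeRuntime:
    def __init__(self, rank, directory, local_store):
        self.rank = rank
        self.directory = directory
        self.local_store = local_store   # blocks this node owns
        self.cache = {}                  # blocks fetched from elsewhere
        self.remote_fetches = 0

    def get(self, block):
        """Return a block, hiding where it lives from the caller."""
        if block in self.local_store:
            return self.local_store[block]
        if block in self.cache:
            return self.cache[block]
        owner = self.directory.owner_of(block)
        self.remote_fetches += 1         # stand-in for real communication
        data = f"<block {block} fetched from node {owner}>"
        self.cache[block] = data         # keep it for next time
        return data

if __name__ == "__main__":
    directory = Directory(n_blocks=8, n_nodes=4)
    node0 = NodeRuntime(0, directory, {0: "<block 0>", 4: "<block 4>"})
    for b in [0, 3, 3, 7, 4, 3]:
        node0.get(b)
    print("remote fetches:", node0.remote_fetches)  # 2: blocks 3 and 7, once each
```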
ARCHER Champions: spreading the word

ARCHER Champions began with a vision: every research organisation that could benefit from ARCHER should have someone local who knows about the routes to access ARCHER and who can help potential users to get started. We want Champions to tell us how we can improve support for them and their local users, and how to start joining up all the HPC facilities and the people with the expertise around the UK.

The Engineering and Physical Sciences Research Council agreed that these ideas were worth funding, and so we were able to launch the ARCHER Outreach project and, as part of that, ARCHER Champions. We consulted experts around the country on the best and most useful way forward and, based on their suggestions, in March we organised a meeting in Edinburgh to gather HPC experts, ARCHER users and interested researchers to share our ideas on ARCHER Champions.

Members of the ARCHER team began by outlining what ARCHER is, what it offers researchers, how it fits into the national HPC infrastructure, how to access ARCHER, the training available and the user support structures. We invited discussion on the obstacles to accessing HPC facilities (both ARCHER and others), what the ARCHER team should do next, and the concerns, frustrations and uncertainties of new users. We even managed to bust a few misconceptions about problems with using ARCHER.

The name "ARCHER Champions" was reviewed. It was always intended to imply "enthusiasts championing the use of ARCHER" rather than "supreme ARCHER users", and this discussion reassured some of those present who felt they were not (yet) champion users.

The meeting had a terrific atmosphere of positivity, bringing together lots of enthusiasm and experience, and has provided us with a wealth of ideas for taking ARCHER Champions forwards. In the next few weeks we will ensure all the Champions have access to ARCHER, with a budget, so that if they are not already users they can experience using ARCHER for themselves and be able to demonstrate it to others. We will also forge further links with other HPC networks such as HPC-SIG and the Research Software Engineers, with a view to co-locating a future Champions meeting and continuing to provide our Champions with resources and information.

Jo Beech-Brandt
[email protected]

We would like to thank everyone who has helped get ARCHER Champions off to such a great start. To get involved, email: [email protected]

See our webpage for details of all current Champions and the resources shared at the meeting. We will continue to add resources and information about future events.
www.archer.ac.uk/community/champions

Software Carpentry

Teaching researchers the software development skills essential to their work.

Software Carpentry (SC) is an international collaboration offering highly interactive two-day workshops. The SC model is based on a community of certified instructors who teach at the workshops, and contributors who maintain the lesson materials. The lesson materials are all available under a Creative Commons BY licence and are maintained by the community itself. They are used as the modular bricks to build the typical SC workshop curriculum, which must include SC's core topics: automating tasks using the Unix shell; structured programming in Python, R, or MATLAB; and version control using Git or Mercurial. SC is about methods and practices, rather than specific tools.

SC has enjoyed a steady increase in popularity, thanks to an engaged community that has succeeded in introducing this format of training to academic departments all over the world. This growth led to the creation of the Software Carpentry Foundation, which holds the reins of the initiative and whose main
SC has a strong international vocation and is subdivided into regional administrations, which interface with the groups involved in the training (host/learners and instructors/lesson-maintainers). EPCC has always played a crucial role in promoting SC across the UK. A number of active SC instructors are based at EPCC and we host the UK regional administration (as part of the Edinburgh branch of the Software Sustainability Institute), which coordinates most SC workshops in the UK. ARCHER (the UK supercomputing facility, hosted by EPCC) has, since 2014 used the SC format as a regular part of its training. Data Carpentry A noteworthy development has been the birth of Data Carpentry (DC), a sibling project which shares most of SC’s operations. DC focuses on introductory computational skills for data management and analysis, and Giacomo Peru [email protected] Instructors come from a range of backgrounds and are often researchers with sound experience of research software development and a clear sense of the pitfalls. They are certified after an extensive online course or an intensive two-day face-to-face one. SC instructors are volunteers who offer to teach at workshops for free and sometimes during their work leave. Katy Wrathall, Flickr targets a less experienced audience than SC, offering a curriculum which features spreadsheets, data cleaning (OpenRefine), R, visualisation with R and SQL. DC is enjoying good success as learners gain direct, tangible benefits. Future developments of both SC and DC are likely to come from efforts to establish them as more regular training models within academic departments (and in Centres for Doctoral Training), as well as from the development of lessons that are more domainoriented, eg based on the use of a domain-specific sample dataset throughout the course. For example, the development of HPC Carpentry is moving in this direction. What participants say Alexander Konovalov, University of St Andrews (learner) “The course I attended covered many aspects of delivering handson training to novice learners (presumably scientists with no formal training in programming). Acquiring such skills is very important to improve researchers’ productivity and facilitate collaboration, and I hope to contribute by recommending and delivering software carpentry training in my domain. “Techniques that I have practiced here will certainly help me in teaching computer science modules as well.” Aleksandra Pawlik, University of Manchester (instructor) “I have taught on SC and DC courses since 2013 and have met a very wide range of audiences. I recommend the courses to all researchers whose work is heavily dependent on any type of software and/or deals with large datasets. I would also recommend it to other professionals such as librarians and administrators, and we do teach them as well. “In my experience learners leave the course with the feeling of having learned very relevant skills and capable of significantly improving both the quality and the quantity of their workflows.” The newsletter of EPCC, the supercomputing centre at the University of Edinburgh SC was founded in 1998 by Greg Wilson, formerly of EPCC, and arose from the growing awareness that something should be done to link domain specific knowledge-bases with software development and computational training. http://software-carpentry.org 21 UKMAC 2016: UK Many-Core Developer Conference Edinburgh hosted the UK ManyCore Developer Conference in May 2016. 
UKMAC 2016: UK Many-Core Developer Conference

Edinburgh hosted the UK Many-Core Developer Conference in May 2016. This informal day of talks spanning the whole landscape of accelerated, heterogeneous and many-core computing brought together academic and industrial researchers striving to improve the programmability of modern computing systems, which are becoming increasingly powerful at the expense of complexity. The informal nature of the UKMAC series provides invaluable opportunities for participants to meet colleagues and swap stories of many-core successes and challenges.

A highlight of the day was the discussion provided by keynote speaker Andrew Richards, CEO of Codeplay Software, an Edinburgh-based company that develops compilers for many-core parallel systems and also works on associated parallel programming models and standards. Andrew emphasised the increasing importance of parallel computing, particularly in relation to the recent explosion in machine learning usage within mainstream markets such as online services and self-driving cars. He also gave his thoughts on how to address the challenges of performance portability (the ability of software to run well across different hardware architectures) and composability (the ability of different software components to interoperate effectively).

The remainder of the day covered a range of topics, including experiences with exploiting GPUs for scientific applications, frameworks to ease the programmability of FPGAs for image-processing algorithms, and work to enable applications written in high-level languages such as Java to utilise modern many-core devices.

Alan Gray
[email protected]

UKMAC, in which EPCC had an organisational role, was held at the Informatics Forum in Edinburgh, which was bathed in sunshine as spring finally arrived. This was the 7th event in the series and the first in Scotland, with previous meetings in Cambridge, Oxford, Imperial, and Bristol.

The presentation slides are available at: http://conferences.inf.ed.ac.uk/UKMAC2016

The Big Bang Fair

EPCC has an experienced outreach team and under ARCHER we have increased the scale of our activity, enthusing even more children about computational science and supercomputing. However, the Big Bang Fair was a step up again. It is the UK's largest celebration of science, technology, engineering and maths for young people, with around 70,000 people attending over four days. Our stand presented three main activities, with Wee Archie, our mini supercomputer, proving particularly popular.

Wee Archie

We used Wee Archie to run an enhanced version of our dino-racer demo, with children able to build their own dinosaurs on the system. Wee Archie comprises 18 Raspberry Pi 2s, a network switch, a power supply unit (PSU), and Ethernet cables in a transparent case. The LED lights on each of the system's nodes show how the workload on a parallel system is balanced, with some nodes carrying out more work than others. Wee Archie is an excellent tool for explaining the basics of a parallel computer and how the components all fit together.

Build your own supercomputer

This game, which we presented on iPads, allows players to design, build and operate their own supercomputer. As with a real system, decisions about the type of components to buy must be balanced against running costs and the income generated from clients. The leader board proved to be a lot of fun, and the game also allowed us to demonstrate the main components of a high performance computing (HPC) system and to highlight some of the challenges of running such a system.

Post sort demo

A simple but fun demo, the post sort introduces parallel algorithms in a practical way. By sorting a series of envelopes while working together, the children learned about parallelism and the different possible bottlenecks.
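The post sort demo uses envelopes rather than code, but a small software analogue (sketched below with invented data) makes the same point: the piles can be sorted in parallel, while the final merge is serial and becomes the bottleneck as more helpers join in.

```python
import heapq
import random
from multiprocessing import Pool

# A small software analogue of the post sort demo: split the "envelopes"
# among workers, sort each pile in parallel, then merge the sorted piles.
# The final merge is serial, which is exactly the kind of bottleneck the
# demo is designed to expose. Illustrative only.

def sort_chunk(chunk):
    return sorted(chunk)

def parallel_sort(data, n_workers=4):
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_workers) as pool:
        sorted_chunks = pool.map(sort_chunk, chunks)   # parallel phase
    return list(heapq.merge(*sorted_chunks))           # serial merge phase

if __name__ == "__main__":
    envelopes = [random.randint(1, 10_000) for _ in range(100_000)]
    result = parallel_sort(envelopes)
    print("sorted correctly:", result == sorted(envelopes))
```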
How did we do?

So how did it go? The event was a great success, the stand was constantly busy, and people generally went away with a better understanding of what HPC is and why it is important. Overall this was a great event for us. 2017 here we come!

Lorna Smith
[email protected]

EPCC Outreach
www.epcc.ed.ac.uk/outreach/discover-and-learn

Wee Archie is a portable, functional cluster developed by EPCC to demonstrate applications and concepts relating to parallel systems.
www.epcc.ed.ac.uk/outreach/discover-and-learn/facts-and-fun/wee-archie

Master's degrees in High Performance Computing (HPC) and in HPC with Data Science

From EPCC at the University of Edinburgh

EPCC is the UK's leading supercomputing centre. We are a major provider of HPC training in Europe, and have an international reputation for excellence in HPC education and research. Our two MSc programmes have a strong practical focus and provide access to leading-edge HPC systems such as ARCHER, which is the UK's largest, fastest and most powerful supercomputer. Through EPCC's strong links with industry, all students are offered the opportunity to undertake an industry-based dissertation project.

The University of Edinburgh is consistently ranked among the top 50 universities in the world.*
* Times Higher World University Ranking

Apply now: www.epcc.ed.ac.uk/msc