Hi-Res PDF - Polyhedron
Newsletter Summer 2007

With dual-core processors now outselling single cores and quad-cores just arriving on the market, we look at how developers can exploit this extra processing power and stay ahead of their competitors.

If you buy a new computer today, the chances are that it will have a dual or even a quad-core processor. Intel and AMD have taken a break from endlessly ramping up clock speed, and are instead putting several CPUs into a single socket - a trend which will undoubtedly continue for a few years. To get the maximum performance from these new architectures, programmers have to divide their problem up so that several CPUs can work on it simultaneously, and then merge the results into a coherent whole before presenting them to the user. This can be a daunting task, with many new ways to go wrong and get confused. Thankfully, there are tools to ease the pain by providing standardised ways to analyse and implement parallel solutions, and to visualize and debug the resulting software systems.

Analysis

The first step in working towards parallelization is to analyze the existing code. The Intel VTune Performance Analyzer has been designed to find application bottlenecks quickly by using different techniques to gather tuning data. The Sampling technique runs with a very low overhead and quickly shows where the application is spending most of its time, and the Call Graph profiler gives a pictorial view of program flow and calling sequences.

Introduce Parallelism

The simplest option, provided by some compilers (see the Compiler Comparison table), is "Auto-Parallelization". This requires no user intervention - the compiler automatically detects loops that can be executed in parallel and creates appropriate code. This can be useful, but it's not a panacea: you can usually do better with some programmer input.

OpenMP

OpenMP is the industry standard for developing portable multithreaded applications.
The compiler directives are an easy and powerful way to convert sections of serial code into parallel code. Parallelism is added incrementally so that the sequential program gradually evolves into a parallel program. OpenMP is normally restricted to "shared memory" systems, though Intel have recently introduced "Cluster OpenMP", which extends the model to clusters.

MPI

MPI is designed for "clusters" - groups of separate computers, each with its own memory, which exchange data via a high-speed network. However, MPI programs also run on shared memory systems, often with efficiency similar to OpenMP. The MPI model is quite different to OpenMP - you have to think of multiple copies of a program passing messages to each other, rather than a single program spawning new threads. It is harder to program, but more flexible; it is the most common choice for high performance super-computers.

Debugging and Tuning

There are many new classes of error in parallel programming. For example, a "data race" occurs when multiple threads access the same memory and the results depend on which gets there first. Threads may also stall or block each other. The Intel Thread Checker and Thread Profiler can be used to detect data races, pinpoint errors and highlight thread imbalances. Similar problems arise with MPI and, in addition, the time spent passing messages can eat into the gains from parallelization. Intel's Cluster Toolkit for Linux provides an excellent Trace Analyzer and Collector to show graphically how time is being spent in each separate thread.
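As a minimal sketch of the data race described above, and of two common ways to avoid it, the Python example below sums a list using several threads. Python is used purely for illustration (the compilers discussed in this newsletter target Fortran, C and C++), and the function names `parallel_sum` and `locked_sum` are hypothetical. The first version gives each thread a private slot and merges the slots at the end, much as an OpenMP reduction does; the second keeps one shared total and guards it with a lock.

```python
import threading


def parallel_sum(values, n_threads=4):
    """Sum `values` with several threads, using one private slot per thread."""
    partial = [0] * n_threads

    def worker(tid):
        # Each thread reads its own strided slice and writes only its own slot,
        # so no two threads ever update the same memory location.
        partial[tid] = sum(values[tid::n_threads])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before merging
    return sum(partial)  # the merge step runs in a single thread


def locked_sum(values, n_threads=4):
    """Sum `values` into one shared total that is protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(tid):
        nonlocal total
        for v in values[tid::n_threads]:
            with lock:  # only one thread at a time may update `total`
                total += v

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    data = list(range(1000))
    print(parallel_sum(data), locked_sum(data))
```

Without the lock in `locked_sum`, two threads could read the same old value of `total` and one of the updates might be lost, which is the "results depend on which gets there first" behaviour. The private-slot version avoids the problem altogether and usually scales better, because the threads never wait on each other.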
Polyhedron's website is widely respected as a source for impartial reference material. Visit us at http://www.polyhedron.com/

Compiler Comparison

Windows compilers

                      ABSOFT 10  INTEL 10  LAHEY 7.1  NAG 5.0  PGI 7.x  Salford 5.10
OpenMP                YES*       YES       NO         YES      YES      NO
64-bit support        YES        YES       NO         NO       YES      NO
2003 extensions       NO         YES       NO         YES      NO       NO
VS2005 Integration    NO         YES       Soon       NO       YES      YES

Linux compilers

                      ABSOFT 10  INTEL 10  LAHEY 8.0  NAG 5.0  PGI 7.x  Pathscale 3.0
OpenMP                YES*       YES       YES        YES      YES      YES
Auto-Parallelization  YES*       YES       YES        NO       YES      NO
64-bit support        YES        YES       YES        YES      YES      YES
2003 extensions       NO         YES       NO         YES      NO       NO

* Separate MP compiler

Portland PGI Compilers & Tools

Polyhedron are pleased to announce their new partnership with Oregon-based The Portland Group. The company offers high performance scalar and parallel Fortran, C and C++ compilers and tools for 32-bit and 64-bit workstations, servers and clusters running either Windows or Linux. The PGI Visual Fortran product fully integrates the PGI suite of Fortran compilers into Microsoft Visual Studio 2005, with standard features such as syntax colouring, tips and keyword completion, and enhanced features such as graphical symbolic debugging of multi-thread and OpenMP applications.

PathScale (EveryKnownOptimizationPath)

High Performance 32-bit & 64-bit Compiler Suite. New to Polyhedron - PathScale Compiler Suite

• C, C++, and Fortran 77/90/95 compilers
• Industry leading optimizations
• Complete support for OpenMP 2.0 (including WORKSHARE)
• Complete support for 64-bit and 32-bit x86 compilation
• Code generation for AMD64 ABI, AMD Opteron, and Intel EM64T
• QLogic optimized AMD Core Math Library (available for download)
• New advanced serial debugger: Pathdb
• Compatible with GNU/gcc tool chain and popular third-party debuggers
• Supported on SUSE, RedHat, and Fedora Linux

We pride ourselves on our technical support and can offer advice to help you get the most out of your programming.
Alternatively, if you need to use our programming services, no job is too big or small. We use the software that we sell!

V7.1 of Winteracter, the Fortran GUI and graphics toolkit, has now been released and includes the following features:

• 3D models can now be split into parts, allowing transformations, visibility, material and names to be applied to sections of a model
• 3D DXF support has been upgraded to include the interrogation of the number of vertices and facets, part recognition and faster loading of 'shared vertices'
• Database interrogation via ODBC on Windows, Linux and Mac OS X
• Support for multi-line graphics text-blocks
• Faster bitmap rotation on Windows
• X/Winteracter now available for Mac OS X/x86 using Intel or g95 compilers
• Support for Absoft Pro Fortran v10/Win32 as well as Win64, Linux32 and Linux64
• WiDE improvements to Fortran source reformatter and source code exchange between Windows and Linux/Mac OS X
• Help Editor: HTML tag insert options, access to Microsoft's HTML Tag Reference, expanded documentation
• Advanced documentation search options under Windows and new wsearch tool on Linux and Mac OS X
• Additional help options and HTML tag insertion options in Troubleshooter editor

All trademarks are acknowledged.

Tecplot 360 is CFD & Numerical Simulation Visualization Software. Finally, with just one tool you can analyze and explore complex datasets, arrange multiple XY, 2D and 3D plots, create animations and then communicate your results with brilliant, high-quality output.

Tecplot 360 is designed for speed. Smarter loading of data results in a faster time to display the first image and gives the ability to open files that could not previously be opened. Unlike the previous single-threaded version, Tecplot 360's intensive computing operations are now spread across all available CPUs, resulting in faster streamtraces, slices and iso-surfaces.
Tecplot Focus

Tecplot Focus is a great way to get into Tecplot functionality if you don't need the support of CFD data formats, CFD analysis and transient data. Its great price still includes features such as multi-frame layout, macros and automation, XY, 2D and 3D plotting, and multi-platform support.

Intel® Compilers V10

Intel® Visual Fortran compilers now include Visual Studio 2005 PPE, so you only need to purchase a copy of VS if you are doing mixed language programming. The Professional Editions of Intel® Fortran for Windows, Linux and Mac OS include the Intel® Math Kernel Library (MKL®), and customers can upgrade from Standard to Professional for the price of Professional support.

Other new Fortran compiler features include:

• More Fortran 2003 features (Win/Linux/MAC)
• Updated COM Server Wizard (Win)
• Improved performance & threading (Win/Linux/MAC)
• Security checking & diagnostics (Win/Linux/MAC)
• Optimization reports (Win/Linux/MAC)
• Support for latest multi-core processors (Win/Linux/MAC)
• 64-bit Mac OS X support (MAC)

Intel® C++ new features include:

• Improved performance & threading (Win/Linux/MAC)
• Security checking & diagnostics (Win/Linux/MAC)
• Windows Vista and Visual Studio 2005 support (Win)
• Optimization reports (Win/Linux/MAC)
• Support for latest multi-core processors (Win/Linux/MAC)

Customers can also upgrade Intel® C++ Standard to Professional for the price of Professional support. Intel® C++ Professional adds Intel® Threading Building Blocks, Intel® Integrated Performance Primitives and MKL®. Intel Visual C++ does not include Microsoft Visual Studio PPE - it still requires Microsoft Visual C++ or above.
GINO 7.0 is now available and includes the following new features:

• Importing of 2D and 3D DXF files, including facilities to enquire the entity count, graphical extent and list of layer names
• Import DXF polymesh surface for interpretation by GINOSURF
• Access to hardware fonts by name and point size
• Generate BMP/JPEG/PNG image containing OpenGL graphics
• Generate colour-scaled XY plots together with automatic graduated colour bar
• Extend the cut and fill surface functionality, catering for break lines and site boundaries
• Cater for vertical fault lines in surfaces, allowing for vertical discontinuity between two heights
• Addition of Spinner/up-down control in GINOMENU
• Generate Manifest files for maintaining Windows XP look and feel
• Allow definition of HyperText link callbacks in GINOMENU
• Improved Code Editor management in GINOMENU Studio, including new Source Code viewer
• Support for Visual Studio 2005, including full online Help integration

Lahey LF64 for 64-bit Linux

LF64 is available in two configurations, Express and Professional. LF64 PRO adds auto-parallelization, OpenMP compatibility, the Winteracter Starter Kit (WiSK) for creating Windows GUIs and displaying graphics, thread-safe BLAS and LAPACK, Polyhedron's Automake utility, and the Fujitsu SSL2 math library (thread-safe for parallel applications). LF64 compiles code for 64-bit machines, and the resulting code won't run on 32-bit computers. LF95 for Linux 32-bit will run on 32-bit or 64-bit Linux machines, but it will only produce 32-bit code.

Based on Mathematica, the tool of choice for scientific research, engineering analysis, modelling and technical education, gridMathematica 2 delivers an optimized parallel environment for modern multiprocessor machines, clusters, grids and supercomputers.
Features include:

• Parallelization at the Mathematica language level
• Support for multiprocessor machines, clusters and grids
• High-performance MathLink communication protocol optimized for all common configurations
• Efficient, adaptive load balancing
• User-programmable scheduling
• Support for tracing and debugging

Ease of Development

gridMathematica introduces only a small number of new parallel computing constructs, and users familiar with Mathematica can transition to gridMathematica without difficulty. Furthermore, programs written in Mathematica can be easily modified to run on a grid. Even users who are new to Mathematica can use its high-level programming capabilities and thousands of built-in functions to solve grid-computing problems that used to require thousands of lines of code in C or Fortran.

Platform Independence

gridMathematica is platform independent and can be used on dedicated multiprocessor machines as well as on homogeneous and heterogeneous clusters. The only technical requirement, apart from the ability to run Mathematica, is a TCP/IP connection between the individual computing nodes. This connection allows customers to run the same code on any available machines without any porting work. It also makes it easy to build ad-hoc clusters out of under-utilized computers or to take advantage of low-use periods.

Polyhedron Iterative Linear Solver

Many of you will have seen that, on our web site, there is a short paper about Nested Factorization, an iterative linear solver which was developed over 25 years ago by Dr John Appleyard, one of the founders of Polyhedron, and his then colleague Dr Ian Cheshire. Ever since, this algorithm has been at the core of the ECLIPSE reservoir simulator from Schlumberger. However, Nested Factorization is not widely known outside the oil industry. The NF benchmark on the Polyhedron compiler comparison pages includes a simple implementation of that algorithm.
Over the past couple of years, Polyhedron has developed a radical new version of this algorithm which, unlike the original, is applicable to general sparse matrices - particularly those arising from space filling meshes. Serial and MPI versions of the new algorithm have been implemented successfully in the current version of the ECLIPSE software, and results have been extremely good. Polyhedron is now looking to exploit this new algorithm in other applications and industries. If you have an application that depends on the fast solution of huge sparse matrices using iterative methods, please email [email protected] to arrange a meeting.

Visit our newly designed web-site at www.polyhedron.com for independent compiler comparisons, advice on programming and helpful articles.

Silverfrost FTN95 V5.10

Find your bugs fast with FTN95. Download the trial version from our web-site to evaluate the compiler. See for yourself how you can save hours using CHECKMATE to find your undefined variables.

Absoft Pro Fortran

NEW: IMSL 32-bit & 64-bit numerical libraries for Pro Fortran v10 under Mac/Intel are now shipping. Currently IMSL numerical libraries are supported on over 70 platforms worldwide, but IMSL for MacOS/Intel is available only through Absoft.

Polyhedron Software Ltd, Linden House, 93 High Street, Standlake, WITNEY, OX29 7RH, United Kingdom
Tel (+44/0)1865-300579, Fax (+44/0)1865-300232
[email protected]
www.polyhedron.com