Automated Design Exploration and Optimization + HPC Best Practices

Transcription

Automated Design Exploration and Optimization + HPC Best Practices
Automated Design Exploration and Optimization + HPC Best Practices 1
© 2011 ANSYS, Inc.
June 8, 2012
Laz Foley & Simon Pereira Confidence by Design
Detroit June 5, 2012
Outline
• The Path to Robust Design
• ANSYS DesignXplorer •
•
•
•
Mesh Morphing and Optimizer
RBF Morph
Adjoint Solver
HPC Best Practices
The Path to Robust Design
obust Design is an NSYS Advantage
ngle Physics olution
Robust Design
Optimization
•Algorithms
•Published API
Design Exploration
“What if” Study
Multiphysics
Solution
•Integration Platform
ccuracy, robustness, eed…
•Parametric Platform
•DOE, Response Surfaces, Correlation, Sensitivity, Unified reporting, etc.
•Six Sigma Analysis
•Probabilistic Algorithms
•Adjoint solver methods
Single Physics/Multi Physics
e physics is the entry point mulation
i‐Physics enables real al prototyping
Fluid Pressure Distribution
2 way coupled with a transient structural solution
Stress ?
omagnetic Force Density with Thermal‐Stress and Electromagnetic Force load Deformation 




“What If?”
Interactively adjust the parameter values and “Update”
Needed for "What If?"
Parametric CAD Connections
Pervasive Parameters
Persistent Updates
Managed State, Update Mechanisms
Remote Solve Manager (RSM)
Design Exploration
Optimization
Optimal Candidates
ANSYS DesignXplorer
Initial vs. Optimized Design
Output
Initial Design
Optimized
Tt Ratio
1.116
1.126
pt Ratio
1.674
1.709
η [%]
71.65
76.25
Power [MW]
1.208
1.268
Engineers can easily appreciate the value of understanding.
Thermal
Stress
Pressure & Flow Velocity
Parametric Geometry
Deformation
ust manifold design
Sigma Analysis
All samples reports max deformation below 1.5 mm mum Displacement should not exceed 1.5 mm
Parameters
Diameter of the d
ess at inlet
al Temperature
e RPM
Response Parameters
Max Flow Temperature
Max Deformation
Max Von‐Mises stress
Uncertainty of input Response Surface showing the effect of engine speed and thickness at outlet on Where are you?
may want to consider how far along the “path” your simulation practices are
haps you could improve your use of optimization and greater leverage your investment in Simulation?
ANSYS DesignXplorer
Integral with Workbench
• Parametric multiphysics
modeling with automated updates
• Bi‐directional CAD, RSM, scripting, reporting and more...
Robust Design Tools at ANSYS
NSYS DesignXplorer
Unified Workbench solution
NSYS Fluent Built‐in Mesh Morphing and shape Optimization (MMO) tools
Baseline Design
Being hooked up to DX for 14.5
Adjoint solver
High sensitivity – changes to shape have a big effect on drag
Optimized Design
High sensitivity – Shape on Downforce
ࣔࢊ࢕࢝࢔ࢌ࢕࢘ࢉࢋ
࢙ࣔࢎࢇ࢖ࢋ
࢘ࢇࢍ
Low sensitivity
R14
Robust Design Tools at ANSYS
NSOFT Optimetrics
Produce “families of curves”
Simultaneous solve with DSO packs
Access to EBU Adjoint
nd more
XY Plot 1
Ansoft Corporation
Maxwell3DDesign1
350.00
Curve Info
Fm
Setup1 : LastAdaptive
Gap='0.001in'
300.00
Fm
Setup1 : LastAdaptive
Gap='0.002in'
Fm
Setup1 : LastAdaptive
Gap='0.003in'
Fm [newton]
250.00
Fm
Setup1 : LastAdaptive
Gap='0.004in'
200.00
Fm
Setup1 : LastAdaptive
Gap='0.005in'
150.00
Fm
Setup1 : LastAdaptive
Gap='0.006in'
100.00
50.00
0.00
0.00
100.00
200.00
300.00
400.00
500.00
ampturns [A]
600.00
700.00
800.00
900.00
1000.00
Optimization Partners
NSYS simulation software has been effectively used in concert with many optimization partners
MATLAB (Mathworks)
ModeFrontier (Esteco)
OptiSLang (Dynardo)
eArtius
Optimus (Noesis)
RBF‐Morph
Sculptor (Optimal)
Sigma Technology (IOSO)
TOSCA (FE‐DESIGN)
iSight (Dassault)
Qfin (Qfinsoft)
ANSYS DesignXplorer
ANSYS DesignXplorer
DesignXplorer is everything under this Parameter bar…
• Low cost & easy to use!
• It drives Workbench • Improves the ROI!
ANSYS Workbench Solvers
DX
Design of Experiments
With little more effort than for a single run, you can use DesignXplorer to create a DOE Correlation Matrix
Understand how your parameters are correlated/influenced by other parameters!
Sensitivity
Understand which parameters your design is most sensitive to!
Response Surface
Understand the ensitivities of the output arameters (results) wrt the input parameters.
2D Slices Response
3D Response
Goal‐Driven Optimization
Use an optimization algorithm or screening to understand tradeoffs or discover optimal design candidates!
Robustness Evaluation
Input parameters have variation!
Make sure your design is robust!
Six Sigma, TQM
ut meters also!
Understand how our performance will vary with your Predict how many parts will likely fail?
Understand which inputs require the Industry Testimonials
“The ease of using simulation tools has helped to transform our organization from a test‐centric culture to an analysis‐centric culture.”
Bob Tickel, Director of Analysis at Cummins
“This technology makes it possible to quickly evaluate hundreds of designs in batch processes to explore the complete design space so that we know we have the best possible design.”
Stresses Ken Karbon, Staff Engineer, General Motors
er the course of the design process, Dyson’s engineers steadily improved the formance of the fan to the point that the final design has an amplification ratio of 15 ne, a 2.5‐fold improvement over the six‐to‐one ratio of the original concept design. team investigated 200 different design iterations using simulation, which was 10 es the number that would have been possible had physical prototyping been the mary design tool. Physical testing was used to validate the final design, and the ults correlated well with the simulation analysis.”
Example 1: Slit Die
Need uniform outflow
Minimize pressure drop P2
P3
Good
GOOD
BAD
Pressure Drop
Bad
P1
Example 2: Combustor
Diffuser Length
Outlet
Exit Height
Outlet
− 3 parameters
− Minimize pressure loss
− Minimize mach number
Outlet
Inlet
Dump Gap
Sensitivity
Mesh Morphing and Optimizer
Fluent Morpher‐Optimization Feature
Allows users to optimize product design based on shape deformation to achieve design objective
Based on free‐form deformation tool coupled with various optimization methods
Mesh Morphing
lies a geometric design change directly to the mesh in the solver
s a Bernstein polynomial‐based morphing scheme
eform mesh deformation defined on a matrix of control points leads to a ooth deformation
orks on all mesh types (Tet/Prism, CutCell, HexaCore, Polyhedral)
r prescribes the scale and direction of deformations to control points distributed evenly through the rectilinear region.
Process
What if?
OR
Optimizer
Setup Case
Setup Case
Run
Run
Regions
Setup Morph
Parameters
Setup Optimizer
Morph
Deformation
Optimize
Evaluate
Choose Optimizer
Auto
Optimal Deformation Definition
• Define constraint(s) (if any)
• Select control points and prescribe the relative ranges of motion
Objective Function
• Objective Function: Equal flow rate
Baseline Design
Optimized Design
Optimizer
Algorithms; Compass, Powell, Rosenbrock, Simplex, Torczon
uto
• Optimize!
Example: L‐Shaped Duct
Application: L‐shaped duct
Objective Function: Uniform flow at the outlet
Significant Improvement in Flow Uniformity
RBF Morph
How Does RBF‐Morph work?
A system of radial functions is used to fit a solution for the mesh
movement/morphing, from a list of source points and their
prescribed displacements
Radial Basis Function interpolation is used to derive the displacement at any location in the space
The RBF problem definition is mesh independent.
RBF‐Morph is Integrated with Fluent
Example 1: Internal Flow
Example 3: External Flow
Example 4: 50:50:50 Optimisation
Courtesy of Volvo Cars
Recently conducted conceptual study by ANSYS in conjunction with Volvo Cars
50 Million cell hybrid mesh of Volvo XC60
50 Design variants investigated using RBF‐Morph Addon for ANSYS Fluent and Workbench Design Explorer
50 hours total clock time to complete full optimisation on HPC Cluster
External Aerodynamics
olvo XC60 vehicle model
Four shape parameters
RBF Morph to define shape parameters
ANSYS DesignXplorer
• To drive shape parameters
• To create DOE
• To perform Goal Driven Optimization
eal Aerodynamics ptimization Process
– Capacity
– Automatic – Fast Step #1 : Baseline Model
Volume Mesh – TGrid
Cell Count : 50.2 Million Cells
rism Layers : 10 (First Aspect Ratio 10, Growth 1.1)
rism Count : 24.4 Million Cells
kewness < 0.9
Prism Layers
Cut Plane Y=0
Step #2 : CFD Setup
oundary Conditions
nlet : Velocity Inlet 100 kmph
Outlet : Pressure Outlet, 0 Pa (Gage)
Side walls : Wall, no‐slip
Top wall : Wall, no‐slip
olver Settings
Steady, PBCS, Green Gauss Node Based Gradient
Fluid : Incompressible air, Density = 1.225 kg/m3,
Turbulence : Realizable K‐epsilon, Non‐equilibrium wall treatment
Discretization : Pressure – Standard Momentum, TKE, TDR – 2nd Order
• Solution Controls
– Courant Number = 200
– ERF
Momentum, Pressure = 0.75
– URFs Density = 1.0, Body Forces = 1.0
Step #3 : RBF‐Morph
Boat Tail Angle (P2)
Roof Drop Angle (P3)
Step #3 : RBF‐Morph
Step #3 : RBF‐Morph
• Fully integrated within FLUENT and Workbench
• Easy to use
• Parallel => rapidly morph large size models
• Mesh independent solution works with all element
types (tetrahedral, hexahedral, polyhedral, etc.)
• Superposition of multiple RBF-solutions makes the
FLUENT case truly parametric (only 1 mesh is stored)
• RBF-solution can also be applied on the CAD
• Precision: exact nodal movement and exact feature
preservation.
Step #4 : Setup DesignXplorer
Central Composite Design, Face Centered, Enhanced
Step #5 : Running Simulations
Task
768 Cores
384 Cores
288 Cores
240 Cores
144 Cores
Time (Seconds)
Time (Seconds)
Time (Seconds)
Time (Seconds)
Time (Seconds)
Baseline Case (i.e. Design Point 1)
d volume mesh of baseline into the CFD solver and y solver settings
Solution
225
340
365
481
228
6979
11153
14409
17256
27246
ing CFD data file
681
538
558
600
532
Each Subsequent Design Point
ph vehicle shape
84
59
65
69
100
Solution
1284
1754
2208
2630
4100
ing CFD data file
734
559
572
621
532
30.80
35.63
42.98
50.28
72.19
al Run Time (Wall Clock) eded for All 50 Design nts (Hours)
Results :
Optimization
Results :
low Results Discussion
– Design point 1, 9, 19 & 25
– Velocity contours
– Iso‐surface of total pressure = 0.0
Design Points
Boat Tail Angle
(P2)
Long Roof Angle Green House (P3)
(P4)
Front Spoiler Drag Force (N) Angle (P5)
(P1)
1
0.000
0.000
0.000
0.000
388.01
9
0.000
1.500
0.000
1.900
393.01
19
1.850
‐2.300
‐0.700
0.000
372.30
25
‐1.850
1.500
‐0.700
0.000
397.33
Design Point #1
Design Point #19
Design Point #25
Adjoint Solver
Adjoint Solution?
An adjoint solver allows specific information about a fluid system to be computed that is very difficult to gather otherwise.
The adjoint solution itself is a set of derivatives.
• They are not particularly useful in their raw form and must be post‐processed appropriately.
• The derivative of an engineering quantity with respect to all of the inputs for the system can be computed in a single calculation.
– Example: Sensitivity of the drag on an airfoil to its shape.
There are 4 main ways in which these derivatives can be used:
1.
Qualitative guidance on what can influence the performance of a system strongly.
2.
Quantitative guidance on the anticipated effect of specific design changes.
3.
Guidance on important factors in solver numerics.
4.
Gradient‐based design optimization.
How to Use the Results ‐ Qualitative
GOAL: Identify features of a system design that are most influential in the performance of the system.
EXAMPLE:
– Sensitivity of the Drag on a NACA 0012 airfoil to changes in the shape of the airfoil.
– The shape sensitivity field is extracted from the adjoint solution in a post‐processing step.
High sensitivity – changes to shape have a big effect on drag
Low sensitivity – changes to shape have a small effect on drag
How to Use the Results ‐ Quantitative
GOAL: Identify specific system design changes that benefit the performance and quantify the improvement in performance that is anticipated.
EXAMPLE:
– Design modifications to turning vanes in a 90 degree elbow to reduce the total pressure drop.
– The optimal adjustment that is made to the shape is defined by the shape sensitivity field (steepest descent algorithm).
– Effect of each change can be computed in advance based on linear extrapolation.
Baseline
Modified
Original P = ‐232.8 Pa
Expected change computed using the adjoint and linear extrapolation = 10.0 Pa
Make the change and recompute the solution.
Actual change = 9.0 Pa
How to Use the Results ‐ Solver Numerics
GOAL: Identify aspects of the solver numerics and computational mesh that have a strong influence on quantities that are being computed that are of engineering interest.
EXAMPLE:
– Use the adjoint solution to identify parts of the mesh where mesh adaption will benefit the computed drag by reducing the influence of discretization errors.
Adapted Mesh
Baseline Mesh
Adapted Mesh
Detail
How to Use the Results ‐ Optimization
GOAL: Perform a sequence of automated design modifications to improve a specific performance measure for a system
EXAMPLE:
– Gradient‐based optimization of the total pressure drop in a pipe.
– Flow solution is recomputed and the adjoint recomputed at each design iteration.
100
90
80
Initial design
ptot [Pa]
70
60
50
40
30
Final design
20
30% reduction in total pressure drop after 30 design iterations
10
0
Mesh Morphing
Once a desired change to the geometry of the system has been selected, how is that change to be made?
• Mesh morphing provides a convenient and powerful means of changing the geometry and the computational mesh.
– Use Bernstein polynomial‐based morphing scheme discussed earlier
Mesh Morphing & Adjoint Data
Flow
Example: Sensitivity of lift to surface shape
Select portions of the geometry to be modified
Adjoint to deformation operation
Surface shape sensitivity becomes control point sensitivity (chain rule for differentiation)
Benefit of this approach is two‐fold
Smooths the surface sensitivity field
Provides a smooth interior and boundary mesh deformation
Mesh Morphing, Adjoint Data & Constraints
The adjoint solution is determined based on the specific flow physics of the problem in hand.
The effect of other practical engineering constraints must be reconciled with the adjoint data to decide on an allowable design change.
Example:
– Some walls within the control volume may be constrained not to move.
– A minimal adjustment is made to the control‐point sensitivity field so that deformation of the fixed walls is eliminated.
Fixed wall
Fixed wall
Moveable walls
Current Functionality
e adjoint solver is released with all Fluent 14 packages.
ocumentation is available
Theory
Usage
Tutorial
Case study
aining is available
nctionality is activated by Loading the adjoint solver add‐on module
new menu item is added at the top level
Current Functionality
Application Drivers
ey initial application areas are:
Low‐speed external aerodynamics
– F1 (increase downforce)
– Production automobiles (decrease drag)
Low‐speed internal flows
– Total pressure drop (reduce losses)
n Fluent 14.5 a mechanism for users to efine a wide range of observables of nterest will be provided.
•
•
•
•
Forces
Moments
Pressure drop
Swirl
•
•
•
•
•
Ratios
Products
Variances
Linear combinations
Unary operations
Current Scope
NSYS‐Fluent flow solver has very broad scope
djoint is configured to compute solutions based on some assumptions
Steady, incompressible, laminar flow.
Steady, incompressible, turbulent flow with standard wall functions.
First‐order discretization in space.
Frozen turbulence.
he primary flow solution does NOT need to be run with these restrictions
Strong evidence that these assumptions do not undermine the utility of the adjoint solution data for engineering purposes.
lly parallelized.
radient algorithm for shape modification
Mesh morphing using control points.
djoint‐based solution adaption
Example 1: Automotive Aerodynamics
Surface map of the drag sensitivity to shape changes
Example 2: Pressure Drop in a Duct
Total Pressure Drop (Pa)
Geometry
Predicted
Result
Original
‐‐‐
‐22.0
Modified
‐14.8
‐18.3
ggressive adjustment results in a 17% reduction in loss in just one design iteration
HPC Best Practices
Guidelines :
• Know your hardware lifecycle
• Have a goal in mind for what you want to achieve • Using Licensing productively
• Using ANSYS provided processes effectively
Hardware Considerations
•
This section is meant to provide an overview of the different hardware components and how they can effect solution time. •
Hopefully this will give you some of the tools to understand why some of the benchmark numbers in better detail. •
ANSYS would always recommend that the best thing to do before buying a system is to look at the latest benchmarks.
•
If you are not sure please ask. Effect of Clock Speed
Impact of CPU Clock on Application Performance
Processor: Xeon X5600 Series
Hyper Threading: OFF, TURBO: ON
Active cores: 12/node; Memory speed: 1333 MHz
(performance measure is improvement relative to CPU Clock 2.66 GHz)
1.40
1.35
Improvement due to Clock
Higher is better
1.30
1.25
1.20
1.15
2.66 GHz
1.10
2.93 GHz
1.05
3.47 GHz
1.00
0.95
0.90
0.85
0.80
Clock Ratio eddy_417K aircraft_2M turbo_500K sedan_4M
truck_14M
Effect of Memory Speed
We can see here the effect of memory speed. Impact of DIMM speed on ANSYS/FLUENT Application Performance
(Intel Xeon x5670, 2.93 GHz)
Hyper Threading: OFF, TURBO: ON
Active threads per node: 12
(performance measure improvement is relative to memory speed of 1066 MHz)
This has implications on how you build your hardware.
On other processors non‐
optimally filling the memory channels can slow the memory speed.
125%
120%
Impact of Memory Speed
Some processors types have slower memory speeds by default. 130%
115%
110%
1066 MHz
105%
1333 MHz
100%
95%
90%
85%
80%
eddy_417K
turbo_500K
aircraft_2M
sedan_4M
ANSYS/FLUENT Model
truck_14M
Turbo Boost (Intel) / Turbo Core (AMD)
Turbo Boost (Intel)/ Turbo Core(AMD) is a form of over‐clocking that allows you to give more GHz to individual processors when others are idle.
With the Intel’s have seen variable performance with this ranging between 0‐8% improvement depending on the numbers of cores in use.
The graph below for CFX on a Intel X5550. This only sees a maximum of 2.5% improvement. Hyper‐Threading: ANSYS Fluent
Hyper‐Threading Technology makes a single physical processor appear as two logical processors. Evaluation of Hyperthreading on ANSYS/FLUENT Performance
iDataplex M3 (Intel Xeon x5670, 2.93 GHz)
TURBO: ON
(measurement is improvement relative ot Hyperthtreading OFF)
This is not the same as physically having two logical processors and does not give double the speedup.
It is worth noting that this has licensing implications as you would need to oversubscribe the physical cores and hence would need double the HPC Licenses. Improvemet due to Hyperthreading .
Higher is better
In our tests we’ve seen as high as a 20% increase in performance although you can see the actual performance can be quite variable from the graph opposite. HT OFF (12 threads on 12 physical cores)
HT ON (24 threads on 12 physical cores)
1.10
1.05
1.00
0.95
0.90
eddy_417K
turbo_500K
aircraft_2M
ANSYS/FLUENT Model
sedan_4M
truck_14M
AMD vs. Intel
•
Traditionally Intel take the “power approach” in general in their 2 socket systems (faster core but less of them per processor/socket). •
Traditionally AMD take the economies of scale approach (more cores per processor but individually slower clock speeds). •
Remember that this landscape changes because they are constantly in competition with each other. •
Please note that whilst we do have some numbers for the new Intel Sandy‐bridge chips we do not have scaling numbers for the equivalent AMD 6200 series at the time of writing this presentation. 2 Socket vs. 4 Socket Systems
Current 4 socket systems come up slower than their 2 socket counterparts (based on Intel Westmere vs. Xeon E7‐8837).
• Clock speed slower
• Memory speed slower
• No additional memory bandwidth.
Performance of ANSYS Fluent on two‐socket and four‐socket based systems
Performance measure is Fluent Rating (higher values are better) 4‐socket based Systems
IBM HX5 Blade, X3850
(Xeon E7‐8837 series)
2‐socket based Systems
HS22/HS22V Blade, 3550/3650 M3, Dx360 M3
(Xeon 5600 Series)
odes
1
Sockets
Cores
2
12
Fluent
Rating
88
Nodes
Sockets
Cores
1
2
16
Fluent
Rating
96
Effect of the Interconnect
When going for multiple systems linked together the interconnect becomes an important factor.
ANSYS/FLUENT Performance
iDataplex M3 (Intel Xeon x5670, 12C 2.93 GHz)
Network: Gigabit, 10-Gigabit, 4X QDR Infiniband (QLogic, Voltaire)
Hyperthreading: OFF, TURBO: ON
Models: truck_14M
QLogic
10-Gigabit
Gigabit
FLUENT Rating
4000
Higher is better
The interconnect is the fabric that connects the nodes. We can see from the graph opposite with FLUENT how quickly the performance of Gigabit Ethernet drops off. Voltaire
5000
3000
2000
1000
0
12
24
48
96
192
Number of Cores used by a single job
384
768
ANSYS Fluent Auto‐Partitioning
Time to Partition 200M Cavity Case over 768 cores
cavity case, 768 cores
12
to partitioning is now very quick
Time in seconds
ss than 10s to process 800M cells!
10
8
6
4
2
0
rial pre‐partitioning step no Time
nger required
200M
400M
600M
800M
2.914
4.706
6.617
9.86
Time in seconds
Time to Partition 111M Truck Case
truck_111m
9
8
7
6
5
4
3
2
1
ANSYS CFX Partitioning
ptimize parallel partitioning in multi‐core clusters (CFX)β
Partitioner determines number of connections between partitions and optimizes part.‐host assignments
P4
P6
P2
P7
P1
P8
P3
P5
Partitioning step finds adjacency amongst partitions; partitions with max adjacency are grouped on same compute nodes
‐use previous results to initialize calculations on large problem (CFX) β
P1
P3
P2
P7
Large case interpolation for cases with >~100M nodes
P5
P6
P4
P8
ean up of coupled partitioning option for multi‐domain cases (CFX)
Eliminates ‘isolated’ partition spots
amatically reduced partitioning times for cases th fluid‐solid interfaces and very large numbers of gions
Compute Node 1
Compute Node 2
ANSYS Fluent Parallel Scalability
Intel Harpertown
nsistently improved scalability
7000
6000
oss releases
5000
4000
6.3.0
3000
12.0.0
2000
1000
0
dan, 4M cells
0
n X5560 @ 2.80GHz (Nehalem EP)
100
200
300
400
500
600
Intel Westmere
70000
60000
50000
40000
13.0.0
12.0.0
13.0.0
30000
20000
14.0.0
ANSYS Fluent Parallel Scalability
Intel Harpertown
nsistently improved scalability
450
400
oss releases
350
300
250
6.3.0
200
12.0.0
150
100
50
0
ck, 111M cells
0
ICE 8400EX, Intel 6‐core
200
400
600
800
1000
1200
Intel Westmere hex‐core 2.93 GHz
2500
2000
1500
13.0.0
12.0.0
13.0.0
1000
500
14.0.0
ANSYS Fluent Parallel Scalability on Intel
SYS Fluent 14
tive Performance
er is better
Leading Performance for fluid flow simulation
Sedan_4m
1.86
Geomean
1
6 core Xeon X5675
Geomean
8 core Xeon E5‐2680
Truck_14m
1.53
1
6 core Xeon X5675
8 core Xeon E5‐2680
The memory bandwidth of the Intel® Xeon®
processor E5-2600 product family allows excellent
scalability and per core performance.
Support for higher speed memory DIMMs, added
on-core capacity for memory loads, as well as a
larger cache size are key to extending performance
and scalability.
Higher memory bandwidth has a pronounced impact
with fully coupled solver applications, which are the
most memory intensive. Sedan_4m is shown as an
example of fully coupled solver performance.
Truck_14m is representative of segregated solver
performance. The horizontal line at 1.63 represents
the geomean speedup over 6 standard benchmarks.
ANSYS CFX Parallel Scalability on Intel
Intel Xeon 5650
AirliftReactor
BigPipe
CombBVM
CombEDM
Cylinder
IndyCar
Internal
LeMansCar
LES_001
Pump
RadCity
RadFurnace
tageCompressor
aticMixer100MM
StaticMixer100
StaticMixer200
taticMixer 400k
Turbine
Wigley100
Intel Xeon E5-2680
• Good scalability and more operations per
clock make obtaining results on Intel®
Xeon® E5 1.68x faster than on Intel Xeon
5600 platforms
• For end user it is about faster turnaround
or solving larger tasks with the same
resources along with lower TCO
Source: Published/submitted/approved results as of March 6, 2012. Software and workloads used in performance tests
may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may
Including Monitors
3072 cores
alability with Monitors
Scalability to higher core counts
Simulations with monitors including plotting and printing
Hex‐core mesh, F1 car, 130 million cells
monitor‐enabled Monitor support optimizations 200
400
600
800
1000
Example data for scaling with R14 monitors
maintain scalability expectations
Fluids I/O
UENT, CFX and AUTODYN use a “singular” e structure. This means there is one global set of files and every process writes to them.
is methodology falls down at a large mber of cores where the file I/O becomes bottleneck.
CFX deals with this by using inline compression cdat)
FLUENT has both inline compression (cdat) and at v12.x introduced support for a Parallel File pdat). rallel file system support in ANSYS UENT
– ~10x ‐ 20x speedup for data write
– Eliminates scaling bottleneck for data ANSYS FLUENT
Serial I/O
Parallel I/O
HPC Fluids Demonstration Case
To Demonstrate 50:50:50 Method
– Volvo XC60 vehicle model
– Four shape parameters
– RBF Morph (Integrated within FLUENT) to define shape parameters
– Grid morphing in parallel
ANSYS WorkBench (Frame Work to Automate Process)
– To drive shape parameters
– To create DOE
– To perform Goal Driven Optimization
The 50:50:50 Method
50
50 design points in the design space
EXTENT
50
50 million cells used in CFD simulation of each design point
ACCURACY
50
50 hours total elapsed time to simulate all the design points
SPEED
“One – Click” – Entire design space is simulated and post‐
processed completely automatically after the initial baseline case setup
HPC Fluids Demonstration Case
STEP 1
Prepare Meshed Model for Baseline Vehicle Shape
STEP 2
CFD Solver Setup, Define Shape Parameters
STEP 3
Generate DOE using Input Shape Parameters
Mesh Morpher Integrated within FLUENT
Solver (FLUENT), Optimizer (DX) & Post Morph Vehicle Shape
Processor (CFD Post) Integrated within STEP 4
ANSYS WorkBench
Run CFD Simulation
Collate Data,
HPC Fluids Demonstration Case
Task
768 Cores
384 Cores
288 Cores
240 Cores
144 Cores
Time (Seconds)
Time (Seconds)
Time (Seconds)
Time (Seconds)
Time (Seconds)
Baseline Case (i.e. Design Point 1)
d volume mesh of baseline into the CFD solver and y solver settings
Solution
225
340
365
481
228
6979
11153
14409
17256
27246
ing CFD data file
681
538
558
600
532
Each Subsequent Design Point
ph vehicle shape
84
59
65
69
100
Solution
1284
1754
2208
2630
4100
ing CFD data file
734
559
572
621
532
30.80
35.63
42.98
50.28
72.19
al Run Time (Wall Clock) eded for All 50 Design nts (Hours)
HPC Fluids Demonstration Case
Compute Cluster Details
1. Intel’s Endeavor Cluster
2. Intel Xeon X5670 (dual socket)
3. Clock speed 2.93 GHz
4. Six cores per socket (12 cores per node)
5. 24 GB RAM @ 1333 MHz, SMT ON, Turbo ON
6. QDR Infiniband
7. RHEL Server Release 6.1
GPU Acceleration for CFD
Radiation View Factor calculation (ANSYS FLUENT 14 ‐
beta)
capability for “specialty physics” view factors, ray tracing, reaction rates, etc. Getting the right setup is a balancing act..
Factors to Consider
•
HPC Licensing Cost
•
Cost of Hardware •
Complexity of Deployment and Maintenance
HPC Licensing Cost
•
ANSYS HPC is licensed in either the HPC Workgroup/Enterprise (or individually) or HPC Packs.
•
Given that it is licensed per partition (which in most cases translated to a core) – the best value for money is in getting the best scalability per core as possible. •
When running multiple cores make sure you are using them as effectively as the memory bandwidth allows.
Cost of Hardware
•
ANSYS will, in general, recommend the best hardware for performance that gets you the best out of your licensing investment. However you may need to make trade‐off's for your budget. •
2 socket systems provide the best performance but more inherently more complexity (and hence cost) because of the need for high speed interconnects when in a cluster. •
Current 4 socket systems have less performance than their 2 socket counterparts but are also cheaper because of their lack of requirement for the high speed interconnects to get to higher numbers of nodes at the low end.
Complexity of Deployment and Maintenance
•
A large cluster can have significant overheads in ease of deployment & on‐going maintenance costs.
•
A 4 socket system, whilst having less performance, may provide an easier deployment and maintenance route at the lower end and will be a better fit to what the average IT department is used to. •
Often users get too caught up on per core performance at the detriment of not getting any extra speedup at all. •
It is important to purchase something you feel you can internally support. •
Purchase 3rd party support for high performance clusters if you do not feel you have the skills to support it internally. Remember the Following ...
If you opt for unsupported infrastructure
– This does not mean that it will not work but you use them at your own risk.
– We may ask you to replicate it on a system that is supported before providing further support if you run into problems!
We recommend:
– Buying Supported Operating systems and Hardware
– Using ANSYS Supported Practices
– Talking to us before buying! It is in all our interests that you get this right!
Information Available
NSYS Partner Solutions
– http://www.ansys.com/corporate/partners/partners‐hpc.asp
• Reference configurations
• Performance data
• White papers
• Sales contact points
erformance Data
– http://www.ansys.com/benchmarks
Information Available
NSYS Platform Support
http://www.ansys.com/services/ss‐platform‐support.asp
– Platform Support Policies
– Supported Platforms
– Supported Hardware
– Tested systems
NSYS Virtual Demo Room
http://www.ansys.com/demoroom/
– Click on HPC!
Information Available
e Manual
Sections on best practices and parallel processing for various solvers
nstallation walkthroughs for installing the products, parallel processing, licensing and RSM remote solve manager)
NSYS Advantage
Online Magazine
Information Available
stomer Portal
http://www1.ansys.com/customer/
– Knowledge Resources
– Installation and Systems FAQ’s
stomer Support
http://www1.ansys.com/customer/
Portal, Email or Phone
Automated Design Exploration and Optimization + HPC Best Practices