CS-Storm Hardware Guide HR90-2003-D

Contents

About the CS-Storm Hardware Guide
CS-Storm System Description
CS-Storm Power Distribution
CS-Storm System Cooling
  CS-Storm Chilled Door Cooling System
CS-Storm Rack Conversion Kits
CS-Storm Environmental Requirements
CS-Storm Chassis Components
  Front and Rear Panel Controls and Indicators
  Hard Drive Support
  1630W Power Supplies
  Power Backplane
  GPU Power Connections
  Add-in Card LED Indicators
  PCI Riser Interface Boards
  Flex-Foil PCIe Interface Cables
CS-Storm GPU Sleds
  GPU Trays
  Right and Left PCI Riser Boards
NVIDIA Tesla GPUs
  NVIDIA GPU Boost and Autoboost
CS-Storm Fan Control Utility
S2600WP Motherboard Description
  Component Locations
  Architecture
  E5-2600 v2 Processor Features
  Integrated Memory Controller (IMC)
  RAS Modes
  Integrated I/O Module
  Riser Card Slots
  Integrated BMC
S2600TP Motherboard Description
  Component Locations
  Architecture
  E5-2600 v3 Processor Features
  Intel E5-2600 v4 Processor Features
S2600x Processor Support
  Motherboard System Software
  Memory Population Rules
  Motherboard Accessory Options
  BIOS Security Features
  QuickPath Interconnect
  InfiniBand Controllers
Motherboard BIOS Upgrade

About the CS-Storm Hardware Guide
The CS-Storm Hardware Guide describes the components in the 2626X and 2826X server chassis and system-level information about the CS-Storm platform.

Hardware Releases

HR90-2003-D (April 2016): Added clarification that the 2826X compute chassis supports 4 or 8 GPUs and the I/O/login chassis supports 4 GPUs. Added a section describing Intel® Xeon® E5-2600 v4 (Broadwell) processor features.
HR90-2003-C (March 2016): Added notes that the Intel Xeon E5-2600 v4 (Broadwell) processor family and DDR4 2400 MT/s memory are available. Included a reference/link to the new Cray Chilled Door Operator Guide in the System Cooling section.
HR90-2003-B (August 2015): Added shipping, operating, and storage environment requirements. Also added information on the following topics: K80 GPUs, the Intel Taylor Pass motherboard, and additional motherboard I/O capabilities through a 24-lane PCI slot.
HR90-2003-A: Added CS-Storm rack conversion kit information.
HR90-2003: Initial release of the CS-Storm Hardware Guide, covering the 2626X server chassis and supported Intel® Washington Pass (S2600WP) motherboards with NVIDIA® K40 GPUs.

Scope and Audience

This publication does not include information about peripheral I/O switches or network fabric components; refer to the manufacturer's documentation for that equipment. This document assumes the user has attended Cray hardware training courses and is experienced in maintaining HPC equipment.

Feedback

Visit the Cray Publications Portal at https://pubs.cray.com and use the "Contact us" link in the upper-right corner to make comments online. Comments can also be emailed using the [email protected] address. Your comments and suggestions are important to us. We will respond to them within 24 hours.

CS-Storm System Description

The Cray® CS-Storm cluster supercomputer is an air-cooled, rackmounted, high-density system based on 2U, 24-inch wide servers mounted in a 48U rack.

Figure 1. CS-Storm Front View

Features:
● Each CS-Storm rack can hold up to 22 rackmounted servers, models 2626X and 2826X.
● Delivers up to 329 double-precision GPU teraflops of compute performance in one 48U rack.
● Completely air-cooled platform.
● Single 100A, 480V, 3-phase power feed to a custom PDU in each cabinet.
● Optional custom liquid-cooled 62kW rear-door heat exchanger.
● Multiple interconnect topology options, including 3D torus/fat tree, single/dual rail, and QDR/FDR InfiniBand.
● 2626X and 2826X compute and I/O servers are 2U, standard EIA, 24-inch wide:
○ Compute nodes host 4 or 8 NVIDIA® K40 or K80 GPUs in each chassis.
○ Based on Intel® S2600WP and S2600TP motherboards with 16 DIMM slots.
○ Support for up to six 2.5-inch solid-state hard drives.
○ Support for up to three 1630W power supplies and N+1 redundancy.
○ Optional QDR/FDR InfiniBand host channel adapters.

A 48U rack includes a single power distribution unit (PDU) that provides the electrical connections for equipment in the rack. A single facility power connection supplies 480V, 3-phase, 100A power (up to 62kW per rack). Because the power supplies accept 277V input power, the rack supports 480V facility power without an optional rackmounted power transformer.

The system supports a comprehensive HPC software stack, including tools that can be customized to work with most open-source and commercial compilers, schedulers, and libraries.
The Cray HPC cluster software stack includes Cray's Advanced Cluster Engine (ACE™) management software, which provides network, server, cluster, and storage management capabilities with easy system administration and maintenance. The system also supports the optional Cray Programming Environment on Cluster Systems (Cray PE on CS), which includes the Cray Compiling Environment, Cray Scientific and Math Libraries, and Performance Measurement and Analysis Tools.

Figure 2. Cray CS-Storm Rear View (42U or 48U, 19-inch or 24-inch EIA standard rack [24-inch, 48U shown] with the rack PDU, the optional input-power step-down transformer for 208V equipment and the water-cooled door, and the facility 480V/100A power connection)

Table 1. Cray CS-Storm General Specifications

Architecture:
● Air cooled, up to 22 servers per 48U rack
● Supported rack options and corresponding maximum number of server nodes:
○ 24-inch rack, 42U and 48U options – 18 and 22 nodes, respectively
○ 19-inch rack, 42U and 48U options – 10 and 15 nodes, respectively

Power:
● Up to 63 kW in a 48U standard cabinet, depending on configuration
● 480V power supplied to the rack with a choice of 208VAC or 277VAC 3-phase power supplies; an optional rack-mounted transformer is required for 208V equipment

Cooling:
● Air cooled
● Airflow: 3,000 cfm; intake: front; exhaust: back
● Optional passive or active chilled-cooling rear-door heat exchanger

Cabinet Weight:
● 2,529 lbs.; 248 lbs./sq. ft. per cabinet (48U standard air-cooled door)
● 2,930 lbs.; 287 lbs./sq. ft. per cabinet (48U with optional rear-door heat exchanger)

Cabinet Dimensions: 88.5" x 30" x 49" (88.5" x 30" x 65" with optional rear-door heat exchanger)

Processors (per node): One or two 64-bit Intel Xeon E5-2600 processors (v2 on S2600WP; v3 [Haswell] and v4 [Broadwell] on S2600TP)

Memory (per node):
● Sixteen DIMM slots across eight memory channels
● S2600WP:
○ Up to 512 GB registered DDR3 (RDIMM), load-reduced DDR3 (LRDIMM), or unregistered DDR3 (UDIMM)
○ DDR3 transfer rates of 800/1066/1333/1600/1867 MT/s
● S2600TP:
○ Up to 1,024 GB RDDR4 or LRDDR4
○ DDR4 transfer rates of 1600/1866/2133/2400 MT/s

Chipset:
● S2600WP: Intel C600-A Platform Controller Hub (PCH)
● S2600TP: Intel C610

Accelerators (per node): NVIDIA Tesla GPU accelerators (K40 – 12 GB GDDR5 memory, K80 – 24 GB GDDR5 memory)
● Support for 4 or 8 NVIDIA® Tesla® K40 or K80 GPU accelerators
● K40: One GK110 GPU and 12 GB of GDDR5 on-board memory
● K80: Two GK210 GPUs and 24 GB of GDDR5 on-board memory (12 GB per GPU)

Interconnect:
● Optional InfiniBand with Mellanox ConnectX®-3/Connect-IB, or Intel True Scale host channel adapters
● Options for single- or dual-rail fat tree or 3D torus

External I/O Connections:
● DB-15 video connector
● Two RJ-45 network interfaces for 10/100/1000 LAN
● One stacked two-port USB 2.0 (Port 0/1) connector
● Optional InfiniBand QSFP port

Internal I/O connectors/headers:
● Bridge slot to extend board I/O:
○ Four SATA/SAS ports for the backplane
○ Front control panel signals
○ One SATA 6 Gb/s port for Disk on Module (DOM)
● One USB 2.0 connector
● One 2x7-pin header for the system fan module
● One DH-10 serial Port A connector
● One SATA 6 Gb/s port (Port 1)
● One 2x4-pin header for Intel RMM4 Lite
● One 1x4-pin header for the Storage Upgrade Key
● One 1x8-pin connector for the backup power control connector (S2600TP)
Power Connections: Two sets of 2x3-pin connectors

System Fan Support: Three sets of dual-rotor fans, software controlled using the hydrad daemon

Riser Support: Four PCIe 3.0 riser slots
● Riser slot 1 – x16 PCIe 3.0
● Riser slot 2 – S2600TP: x24 PCIe 3.0, x16 with InfiniBand
● Riser slot 2 – S2600WP: one x16 PCIe 3.0 and one x8 PCIe 3.0 in one physical slot, or one x8 PCIe 3.0 with InfiniBand
● Riser slot 3 – S2600WP: x16; S2600TP: x24
● Riser slot 4 – x16 PCIe 3.0
● One bridge slot for board I/O expansion

Video:
● Integrated 2D video graphics controller
● DDR3 memory (S2600WP – 128 MB, S2600TP – 16 MB)

Hard Drive Support:
● S2600WP: One SATA port at 6 Gb/s on the motherboard. Four SATA/SAS ports (from SCU0; SAS support requires a storage upgrade key) and one SATA 6 Gb/s port (for DOM) are supported through the motherboard bridge board (SATA backplane). Six solid-state hard drives are supported in each chassis.
● S2600TP: Ten SATA 6 Gb/s ports, two of which are SATA DOM compatible.

RAID Support:
● Intel RSTe RAID 0/1/10/5 for SATA mode
● Intel ESRT2 RAID 0/1/10/5 for SAS/SATA mode

Server Management:
● Cray Advanced Cluster Engine (ACE™): complete remote management capability
● On-board ServerEngines® LLC Pilot III® controller
● Support for Intel Remote Management Module 4 Lite solutions
● Intel Light-Guided Diagnostics on field-replaceable units
● Support for Intel System Management Software
● Support for Intel Intelligent Power Node Manager (PMBus®)

CS-Storm Power Distribution

CS-Storm systems that are fully deployed at data centers with a 480V power source typically use the 72-outlet rack PDU shown below. The 1630W power supplies in each CS-Storm server connect to the 277VAC PDU outlets using 1.5 m power cords.

Rack PDU

Figure 3. CS-Storm 48U Rack PDU

The CS-Storm rack PDU receives 480VAC 3-phase facility power through a single AC input connector (Hubbell HBL5100P7W, 3-phase 277/480VAC, 4P5W, fed by a 5-core 2 AWG power cable). Each phase supports 100A maximum. The output voltage to each AC output connector on the PDU is 277VAC. A 60A, 36-outlet version of the rack PDU is also available for less populated configurations.

Rack PDU features:
● Input current: 100A maximum per phase
● Output voltage: 277VAC or 208VAC
● Output power: 1.8 kW per port (6.5A nominal, 10A maximum by design)
● Frequency: 50-60 Hz
● One AC input connector: 480VAC – Hubbell HBL5100P7W
● 72 output connectors, 277VAC 10A outlets (277VAC or 208VAC) – RongFeng RF-203P-HP
● 104.3A per phase @ 277VAC output
● A circuit breaker at the bottom of the PDU protects and applies power to all outlets

Cray TRS277 Step-Down Transformer

The TRS277 step-down, 2U rackmount transformer can power other 208V switches and equipment that cannot accept 277VAC input power.

Figure 4. Cray TRS277 Step-Down Transformer

Specifications:
● Output power: 1.2 kW (1.5 kVA)
● Frequency: 50-60 Hz
● Input: 277VAC
● Output: 220VAC (10 outlets, C13)

PDU Options

Cray offers other PDU options besides the rack PDU and transformer described above. PDU choices may be based on data center facilities/requirements, customer preferences, and system/rack equipment configurations.
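As a rough illustration of how these PDU figures combine, the sketch below estimates the total load and per-phase output current for a rack of CS-Storm servers. It is not a sizing procedure from this guide: it assumes the servers are balanced evenly across the three phases and uses the guide's own numbers (277 VAC outlets, 1.8 kW per PDU port, 100 A maximum per phase, and the ~2745 W heavy-load figure quoted later for an 8-GPU compute node).

```python
# Rough rack-power sketch (illustrative only; not a procedure from this guide).
# Assumes a balanced three-phase load and uses the guide's figures:
# 277 VAC outlets, 1.8 kW per PDU port, 100 A max per phase, ~2745 W per node.

OUTLET_VOLTAGE = 277.0      # VAC, PDU output (line-to-neutral)
PORT_LIMIT_W = 1800.0       # 1.8 kW per PDU port
PHASE_LIMIT_A = 100.0       # maximum input current per phase

def rack_load(nodes: int, watts_per_node: float = 2745.0, psus_per_node: int = 3):
    """Estimate total rack load, per-phase current, and per-PSU share."""
    total_w = nodes * watts_per_node
    per_phase_a = total_w / (3 * OUTLET_VOLTAGE)   # balanced across three phases
    per_psu_w = watts_per_node / psus_per_node     # load sharing across the supplies
    return total_w, per_phase_a, per_psu_w

if __name__ == "__main__":
    total_w, per_phase_a, per_psu_w = rack_load(nodes=22)
    print(f"Total rack load:   {total_w / 1000:.1f} kW")
    print(f"Per-phase current: {per_phase_a:.1f} A (limit {PHASE_LIMIT_A:.0f} A)")
    print(f"Per-PSU load:      {per_psu_w:.0f} W (port limit {PORT_LIMIT_W:.0f} W)")
```

For 22 nodes this works out to roughly 60 kW and about 73 A per phase, consistent with the 62-63 kW and 100 A-per-phase limits stated above.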
CS-Storm Chassis Power Distribution

Three (N+1) power supplies in the 2626X/2826X chassis receive power from the rack PDU. The power supplies are installed in the rear of the chassis and distribute power to all the components in the chassis through the power backplane. The power backplane is located at the bottom of the chassis, below the motherboard. Each PCIe riser receives power from the power backplane, which supports the PCI add-on cards and 4 GPUs in the GPU sled. 12V auxiliary power from the right and left PCI risers connects to each GPU tray through a blind connector when the tray is installed in the GPU sled.

Figure 5. CS-Storm Power Subsystem Major Components (left and right PCI risers providing PCIe bus power and control to the GPUs and add-on cards, fan power and tachometer connectors to the GPU fans, 12V auxiliary power connectors to the GPUs, 12V power to the motherboard, hard drive power, the power button, the 1630W power supplies receiving 277VAC or 208VAC from the rack PDU, and the power backplane)

CS-Storm System Cooling

The CS-Storm 2626X/2826X server chassis is air-cooled by two central chassis fans, power supply fans, and a fan at each end of each GPU sled. These fans pull air in from the front of the chassis and push air out the back as shown below. The central chassis fans 1A and 1B pull cool air in from the front across the hard disk drives and direct it over the motherboard, power backplane, and 1630W power supplies. These central chassis fans send tachometer signals to, and receive power from, the power backplane. The central chassis fan speed is controlled by the baseboard management controller (BMC) integrated on the motherboard. GPU fans 1-4 receive power and send tachometer signals through the PCI riser and power backplane. GPU fan speeds are controlled by the hydrad fan speed control utility.

Figure 6. CS-Storm 2626X/2826X Chassis Cooling Subsystem (front-to-rear airflow through GPU fans 1-4, chassis fans 1A and 1B, and the power supply cooling fans; the chassis fan power and tachometer cables connect to the PCI riser and power backplane)

CS-Storm Chilled Door Cooling System

An optional Motivair® ChilledDoor® rack cooling system is available for attaching to the back of CS-Storm racks. The 48U rack supports an optional 62kW chilled-water heat exchanger; the 42U rack supports a 57kW heat exchanger. The chilled door is hinged and replaces the rear door of the CS-Storm rack. The chilled door uses 65°F facility-supplied water or a coolant distribution unit (CDU) provided with the cooling door system. The chilled door removes heat from the air exhausted out the back of the rack. Fans inside the chilled door draw the heated air through a heat exchanger, where the heat load is removed and transferred to the cooled water system.

A menu-driven programmable logic controller (PLC) with a built-in screen and alarm system is included in the chilled door. The PLC gives access to all controls, alarms, and event history. The LCD display indicates normal operating conditions, with an override display for alarms. Parameters monitored and controlled include fan speed, water flow, inlet air, outlet air, inlet water, and outlet water temperatures.
Refer to the Cray Chilled Door Operator Guide for detailed information about the ChilledDoor and CDU control and monitoring systems, programming displays, and set points and alarms. That guide also includes maintenance and operation procedures, and a list of spare parts is included at the end of the document.

Figure 7. CS-Storm Rack Rear Door Heat Exchanger

Figure 8. CS-Storm Rear Door Heat Exchanger Cooling System (65°F chilled water supply with water returning to the chiller at 75°F; EC fans in the door move 75°F room air in and out of the standard or custom server rack)

CS-Storm Rack Conversion Kits

There are two CS-Storm rack conversion kits:
● 24-to-19: Mounts 24-inch CS-Storm servers in a 19-inch rack
● 19-to-24: Mounts 19-inch servers/switches in a CS-Storm 24-inch rack

CS-Storm 24-to-19-inch Vertical-Mount Conversion Kit

A custom 14U rack conversion kit is available for mounting CS-Storm servers vertically in a 19-inch rack rather than in their normal horizontal position in a 24-inch rack. This assembly is shown in CS-Storm 24- to 19-inch Conversion Assembly (Figure 9) and is also referred to as the 14U vertical-mounting kit. The assembly has five 2U-wide slots for mounting up to five CS-Storm 2626X or 2826X servers.

The front-right handle and bracket of the 2626X/2826X server must be removed and replaced with a server bracket. The server is slid into the 14U rack assembly on its side and is secured to the lower tray with a single screw. Another bracket is attached to the rear of the 2626X/2826X server. This rear bracket acts as a safety stop to prevent the 2626X/2826X server from unintentionally being removed from the rack. To remove the server, press the release pin on the front of the assembly to disengage the locking clip built into the roof of the conversion rack.

DANGER: Heavy object. A mechanical lift or two-person lift is required, depending on your equipment. Serious injury, death, or equipment damage can occur if these instructions are not followed:
● Each CS-Storm server can weigh up to 93 lbs (42 kg).
● When installing these servers below 28U, a server lift is required to remove and/or install them from a rack. If a lift is not available, two or more people must use safe lifting techniques to remove/install these servers.
● When installing these servers at or above 28U, a server lift is required to remove/install a 2626X or 2826X server.
● Personnel handling this equipment must be trained to follow these instructions. They are responsible for determining if additional requirements are necessary under applicable workplace safety laws and regulations.

A CS-Storm Server Lift video is available that shows how to assemble the lift and use it to remove or install a CS-Storm server.
Configuration Rules for CS-Storm Servers in 19-inch Racks
● Top-of-rack (TOR) switches must be contained in the 2U air shroud.
● Switches must be from the Cray-approved list (front-to-back/port-side airflow only).
● 42U CS-Storm rack:
○ Maximum of two vertical-mounting kits per rack (up to 10 CS-Storm servers)
○ Other 19-inch wide servers can be installed in the empty 12U space
● 48U CS-Storm rack:
○ Maximum of three vertical-mounting kits per rack (up to 15 CS-Storm servers)
○ Other 19-inch wide servers can be installed in the empty 4U space

Installation Recommendations
● Install conversion kit parts from the front/rear of the cabinet. There is no need to remove any cabinet side panels.
● Use two people, one at the front and one at the rear, to install the upper and lower trays from the front of the rack.
● Install the lower tray, then the four braces, then the upper tray, working bottom to top.
● Position the screws in the rear mounting brackets to set the tray length to fit the desired post-to-post spread, 30 inches recommended (see figure). Do not fully tighten these screws until the trays and upper-to-lower braces are installed.
● Tilt the trays at an angle, side to side, when installing them from the front of the cabinet. The front cutout on each side provides clearance around the front posts so the trays can then be leveled and attached to the posts.

Figure 9. CS-Storm 24- to 19-inch Conversion Assembly

Notes on the conversion assembly:
● The server-removal safety bracket/ear (beveled edge faces the rear, attached with three M4 x 6.0 flush-head screws) acts as a safety stop so the server cannot be removed from the rack unintentionally; pressing the spring-loaded server release pin disengages the locking clip in the roof of the upper tray.
● Parts for the 2626X/2826X server (front lock bracket and handle, safety bracket/ear, and screws) are provided in a hardware conversion bag, separate from the rack conversion kit; hardware for the 24- to 19-inch conversion kit is provided with the kit. The 2626X/2826X front bracket with handle replaces the rack handle/bracket.
● The 14U assembly consists of upper and lower 14U trays with five 2U server slots, joined by two front and two rear upper-to-lower braces (sixteen M4 x 6.0 screws attach the braces to the trays). Rear tray mounting brackets (2 right, 2 left) attach the tray assembly to the rear vertical mounting rail, with screw positions set for a 30-inch post-to-post mounting depth; a tray cutout provides clearance around the posts when installing each tray.

CS-Storm 19- to 24-inch Conversion Kit

A custom 10U conversion kit is available for mounting standard 19-inch servers/switches in a CS-Storm 24-inch wide rack. This assembly is shown in CS-Storm 19- to 24-inch Conversion Assembly (Figure 10) and is also referred to as the 10U conversion kit. This kit has a load limit of 600 lbs (270 kg).
Figure 10. CS-Storm 19- to 24-inch Conversion Assembly (two 10U front frames and two 10U rear frames with four horizontal bars and four frame-alignment tabs mount a standard 19-inch 1U server/switch in the 24-inch wide, 42U or 48U rack; the distance between the outside edges of the front and back 10U frames should be set at 29.5 inches (75 cm) for 19-inch equipment; load limit 270 kg/600 lbs)

CS-Storm Server Support Brackets

Each CS-Storm 2626X and 2826X rackmount server sits on a pair of support brackets mounted to the 24-inch rack. These support brackets are included in all preconfigured CS-Storm cabinets. A set of support bracket/angle assemblies should be ordered when additional or separate CS-Storm servers are ordered. Support bracket/angle part numbers for mounting 2626X and 2826X servers in 24-inch racks:
● 101072200: left support angle assembly
● 101072201: right support angle assembly

Figure 11. CS-Storm Server Support Brackets (front and rear angles form a shelf that supports the server and attach with four M6 x 16.0 pan screws; front angle 101072200 on the left and 101072201 on the right)

CCS Environmental Requirements

The following table lists shipping, operating, and storage environment requirements for Cray Cluster Systems.

Table 2. CCS Environmental Requirements

Operating:
● Operating temperature: 41° to 95° F (5° to 35° C) [up to 5,000 ft (1,500 m)]
○ Derate the maximum temperature (95° F [35° C]) by 1.8° F (1° C) per 1,000 ft (305 m) of altitude above 5,000 ft (1,525 m)
○ Temperature rate of change must not exceed 18° F (10° C) per hour
● Operating humidity: 8% to 80% non-condensing; humidity rate of change must not exceed 10% relative humidity per hour
● Operating altitude: up to 10,000 ft (3,050 m)

Shipping:
● Shipping temperature: -40° to 140° F (-40° to 60° C); temperature rate of change must not exceed 36° F (20° C) per hour
● Shipping humidity: 10% to 95% non-condensing
● Shipping altitude: up to 40,000 ft (12,200 m)

Storage:
● Storage temperature: 41° to 113° F (5° to 45° C); temperature rate of change must not exceed 36° F (20° C) per hour
● Storage humidity: 8% to 80% non-condensing
● Storage altitude: up to 40,000 ft (12,200 m)

2626X/2826X Chassis Components

Cray CS-Storm 2626X/2826X rackmount servers use an EIA-standard 24-inch wide chassis. They are configured as compute nodes or I/O (login/service) nodes. The nomenclature 2626X/2826X is used to describe features common to both node types. Major components of both node types are shown below, after the features table.

Table 3. CS-Storm Node Types
● 2626X / 2826X: Compute node and I/O or login node
● 2626X8 / 2826X8: Compute node
● 2626X8N / 2826X8N: Compute node with 4 or 8 NVIDIA GPUs
● 2626X2 / 2826X2: I/O or login node
● 2626X2N / 2826X2N: I/O or login node with 4 NVIDIA GPUs

Figure 12. 2626X/2826X Chassis (standard 24-inch wide 2U rackmount chassis showing the input power connection from the rack PDU, six hard drive bays, and the front panel controls and indicators)

Table 4. CS-Storm 2626X/2826X Features
Architecture:
● 2U rackmounted servers in a 24-inch wide chassis
● One Intel S2600WP or S2600TP motherboard per rackmount server

Power:
● Dual 1630W redundant power supplies (optional N+1)
● Power input of 277 VAC at 10A from the rack PDU
● Compute node with eight K40/K80 GPUs measured at 2745W under heavy load

Cooling:
● Six 80mm x 80mm x 38mm fans
● Passive heat sinks on the motherboard and GPUs
● Operating temperature: 10°C to 30°C
● Storage temperature: -40°C to 70°C

Weight:
● Up to 93 lbs (42 kg)
● Mechanical lift required for safe installation and removal

Motherboard: One per chassis
● 2626X – Washington Pass (S2600WP)
● 2826X – Taylor Pass (S2600TP)

Memory Capacity:
● 2626X (S2600WP) – up to 512 GB DDR3
● 2826X (S2600TP) – up to 1,024 GB DDR4

Disk Subsystem:
● On-board SATA 6 Gb/s, optional HW RAID with BBU (N/A with on-board InfiniBand)
● Up to six 2.5-inch removable SATA/SAS solid-state drives

Expansion Slots:
● Compute node:
○ 1 riser card slot: x8 PCIe 3.0
● I/O, login, or service node:
○ 1 riser card slot: x8 PCIe 3.0 (external)
○ 1 riser card slot: x16 PCIe 3.0 (external)
○ 1 riser card slot: x16 PCIe 3.0 (internal)

System management:
● Cray Advanced Cluster Engine (ACE™): complete remote management capability
● Integrated BMC with IPMI 2.0 support
● Remote server control (power on/off, cycle) and remote server initialization (reset, reboot, shut down)
● On-board ServerEngines® LLC Pilot III® controller
● Support for Intel Remote Management Module 4 Lite solutions
● Intel Light-Guided Diagnostics on field-replaceable units
● Support for Intel System Management Software
● Support for Intel Intelligent Power Node Manager (the CS-Storm 1630W power supply is a PMBus®-compliant power supply)

Figure 13. 2626X/2826X Compute Node Components (left and right GPU sleds with GPU fans, slot 2 and slot 3 PCI riser interface boards, left PCI riser, 1630W power supplies [N+1], SATA backplane, motherboard, flex-foil PCI cables, chassis fans, solid-state hard drives, and hard drive backplanes)

Figure 14. 2626X2 I/O or Login Node Components (left GPU sled with 4 GPUs, slot 2 and slot 3 PCI riser interface boards, left PCI riser, x8 Gen3 PCIe slot, 1630W power supplies, SATA backplane, S2600WP motherboard, flex-foil PCI cables to the right PCI riser, InfiniBand/FC/10GbE add-on cards, RAID add-on card, fans, hard drives, and hard drive backplanes)
Figure 15. 2826X2 I/O or Login Node Components (left GPU sled with 4 GPUs, slot 2 and slot 4 PCI riser interface boards, left PCI riser, S2600TP motherboard, SATA backplane, 1630W power supplies, x8 Gen3 PCIe slot, InfiniBand/FC/10GbE and RAID add-on cards, flex-foil PCI cables to the right PCI riser, fans, hard drives, and hard drive backplanes; the slot 3 PCI cable assembly wraps under the motherboard and provides Gen3 x16 and Gen3 x8 connections from Taylor Pass slot 3 toward the right PCI riser)

Figure 16. 2626X8 CPU Node Block Diagram (S2600WP motherboard PCI slots 2 and 3 connect through the slot 2 and slot 3 riser interface boards to the left PCI riser, while slots 1 and 4 connect through flex-foil cables to the right PCI riser; each riser's PLX switches fan out to the GPUs/accelerators and carry 12V, tachometer, and power-management signals; the power backplane distributes 12V from the 1630W power supplies [277 or 208VAC from the PDU], and the SATA backplane and bridge board serve the hard drive backplanes)

Figure 17. 2826X8 CPU Node Block Diagram (same topology using the S2600TP motherboard: slots 2 and 4 connect to the left PCI riser through riser interface boards, and slots 1 and 3 connect to the right PCI riser through flex-foil cables; GPU groups 1-4 attach to the risers)

2626X/2826X Front and Rear Panel Controls and Indicators

The front panel controls and indicators are identical for the 2626X/2826X CPU and I/O nodes.

Figure 18. 2626X/2826X Front Panel Controls and Indicators
● GPU status – Green/Red
● GPU power status – Green/Red
● ID LED – Off/White
● System status LED – Off/Red
● Power status LED – Off/Red
● ID button/LED – Off/White
● Reset button/LED – Off/White
● Power button/LED – Off/Blue

Power Button

The power button is used to apply power to chassis components. Pressing the power button initiates a request to the BMC integrated into the S2600 motherboard, which forwards the request to the ACPI power state machines in the S2600 chip set.
The power button is monitored by the BMC and does not directly control power on the power supply.

● Off-to-On Sequence: The integrated BMC monitors the power button and any wake-up event signals from the chip set. A transition from either source results in the integrated BMC starting the power-up sequence. Because the processors are not executing, the BIOS does not participate in this sequence. The hardware receives the power good and reset signals from the integrated BMC and then transitions to an ON state.
● On-to-Off Sequence (operating system down): The System Control Interrupt (SCI) is masked. The BIOS sets up the power button event to generate an SMI and checks the power button status bit in the ACPI hardware registers when an SMI occurs. If the status bit is set, the BIOS sets the ACPI power state of the machine in the chip set to the OFF state. The integrated BMC monitors power state signals from the chip set and de-asserts PS_PWR_ON to the power supply backplane. As a safety mechanism, if the BIOS fails to service the request, the integrated BMC automatically powers off the system in four to five seconds.
● On-to-Off Sequence (operating system up): If an ACPI operating system is running, the power button switch generates a request through SCI to the operating system to shut down the system. The operating system retains control of the system, and operating system policy determines the sleep state into which the system transitions, if any. Otherwise, the BIOS turns off the system.

Reset Button: The reset button initiates a reset request forwarded by the integrated BMC to the chip set. The BIOS does not affect the behavior of the reset button.

ID Button: The ID button toggles the state of the chassis ID LED. If the LED is off, pushing the ID button lights the ID LED. It remains lit until the button is pushed again or until a chassis identify command is received to change the state of the LED.

GPU Status: The GPU status LED indicates whether fatal errors have occurred on a PLX chip or GPU. Green/ON: all GPUs and PLX chips are working normally. Red/ON: a fatal error has occurred on a GPU or PLX chip.

GPU Power Status: The GPU power status LED indicates the power status for the PLX chips on the right and left PCI risers. Green/ON: GPU power normal. Red/ON: one or more GPU power failures.

ID LED: The ID LED is used to visually identify a specific server installed in the rack or among several racks of servers. The ID LED can be illuminated by pushing the ID button or by using a chassis identification utility. White/ON: identifies the server.

System Status: The system status LED indicates a fatal or non-fatal error in the system. OFF: normal operation. Amber (solid ON): fatal error. Amber (blinking): non-fatal error.

System Power: The system power status LED indicates S2600 motherboard power status. OFF: power off. Blue/ON: power on.
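Because the power, reset, and ID functions above are routed through the integrated BMC (IPMI 2.0), the same operations can normally be driven remotely over the BMC's LAN interface as well as from the front panel. The following is a minimal sketch using ipmitool from Python; the BMC hostname and credentials are placeholders, and site procedures (for example, ACE-managed power control) take precedence over this example.

```python
# Minimal sketch: exercise BMC chassis functions (power state, ID LED) remotely
# with ipmitool. The hostname and credentials below are placeholders, not
# values from this guide.
import subprocess

BMC = ["ipmitool", "-I", "lanplus", "-H", "bmc-hostname", "-U", "admin", "-P", "password"]

def chassis(*args: str) -> str:
    """Run an ipmitool chassis subcommand against the node's BMC."""
    result = subprocess.run(BMC + ["chassis", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(chassis("power", "status"))   # e.g. "Chassis Power is on"
print(chassis("identify", "15"))    # light the chassis ID LED for 15 seconds
# chassis("power", "on") / chassis("power", "soft") request power changes
# through the BMC, mirroring the front-panel button behavior described above.
```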
Rear Panel Controls and Indicators

Rear panel controls and indicators are shown in the following figure. A red/green LED on each power supply indicates a failed condition. The power supplies are designated as shown in the figure.

Figure 19. 2626X/2826X Rear Panel Controls and Indicators
● 1630W power supplies (2+1): PSU 1, PSU 2, PSU 3
● S2600WP: ID LED, NIC 1 (RJ45), NIC 2 (RJ45), video (DB15), status LED, POST code LEDs (8), stacked 2-port USB 2.0, and InfiniBand port (QSFP) with link and activity LEDs (S2600WPQ/WPF only)
● S2600TP: ID LED, NIC 1 (RJ45), NIC 2 (RJ45), video (DB15), status LED, POST code LEDs (8), stacked 2-port USB 2.0, dedicated management port (RJ45), and InfiniBand port (QSFP+) with link and activity LEDs (S2600TPF only)

2626X and 2826X Hard Drive Support

The S2600WP/S2600TP motherboards support one SATA port at 6 Gb/s for DOM; four SATA/SAS ports (3 Gb/s on the S2600WP, 6 Gb/s on the S2600TP) to the backplane are supported through the bridge board slot on the motherboard. The SATA backplane PCB is installed in the bridge board slot and supports five of the six solid-state drives mounted in the front of the chassis (HDD1-HDD6). HDD2 is cabled to SATA port 1 on the motherboard. On the S2600WP, SAS support requires a storage upgrade key.

Figure 20. Disk Drive Latch

Figure 21. 2626X/2826X Disk Subsystem (the SATA backplane in the bridge board I/O expansion slot provides SAS 0-3 to HDD 1-4 and SATA 0 to HDD 5, with motherboard SATA port 1 shown cabled to HDD 6; SATA backplane control/status and hard drive power come from the power backplane)

1630W Power Supplies

Cray CS-Storm 2626X and 2826X servers use 1630W high-efficiency power supplies. The server can operate normally from two power supplies; a third power supply is optional to provide a redundant N+1 configuration. Each power supply receives power from the rack PDU and plugs into the power backplane assembly in the server chassis. The power supplies support Power Management Bus (PMBus™) technology and are managed over this bus. The power supplies can receive 277 VAC or 208 VAC (200-277 VAC input). An optional rackmounted transformer steps down 480 VAC facility power to 208 VAC for use by other switches/equipment in the rack.

1630W power supply features:
● 1630W continuous power (including 5V standby, 5VSB): 12V/133A, 5VSB/6A
● N+1 redundant operation and hot-swap capability
● High-line input voltage operation with active power factor correction
● Power Management Bus (PMBus™)
● Dimensions: 40.0 mm (H) x 76.0 mm (W) x 336 mm (L)
● High efficiency: CSCI-2010 80 PLUS Gold compliant

Figure 22. 2626X/2826X 1630W Power Supplies (two or three [N+1] 1630W power supplies receive 277 VAC from the rack PDU, or 208 VAC from an optional rackmount step-down transformer or other 208V source, and plug into the power backplane for power and PMBus connections)

1630W Power Supply LED Status

Green Power LED: A green/amber LED indicates the power supply status. A slowly blinking green POWER LED (PWR) indicates that AC is applied to the PSU and that standby voltage is available. The same LED illuminates steady green to indicate that all power outputs are available.

Amber Failure LED: The amber LED blinks slowly or illuminates solid ON to indicate that the power supply has failed or has reached a warning status and must be replaced.

Table 5. 1630W Power Supply LED Status
● No AC power to any power supply: green PWR LED off, amber FAIL LED off
● Power supply failure (includes over-voltage, over-current, over-temperature, and fan failure): green PWR LED off, amber FAIL LED on
● Power supply warning events where the power supply continues to operate (high temperature, high power, slow fan): green PWR LED off, amber FAIL LED blinking at 1 Hz
● AC present / 5VSB on (PSU off): green PWR LED blinking, amber FAIL LED off
● Power supply on and OK: green PWR LED on, amber FAIL LED off
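Since the 1630W supplies are PMBus-managed and report into the motherboard BMC, their presence and failure state can usually also be read through standard IPMI sensor queries, complementing the LEDs described above. A minimal sketch follows; the BMC address and credentials are placeholders, and the exact sensor names depend on the platform's sensor data records.

```python
# Minimal sketch: read power-supply and fan sensor records from the BMC with
# ipmitool. Sensor names vary by SDR, so treat the output as illustrative.
import subprocess

BMC = ["ipmitool", "-I", "lanplus", "-H", "bmc-hostname", "-U", "admin", "-P", "password"]

def sdr_type(sensor_type: str) -> list[str]:
    """Return SDR lines for one sensor type (e.g. 'Power Supply', 'Fan')."""
    out = subprocess.run(BMC + ["sdr", "type", sensor_type],
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line.strip()]

for line in sdr_type("Power Supply"):
    print(line)   # presence/failure state of each 1630W supply
for line in sdr_type("Fan"):
    print(line)   # chassis fan tachometer readings reported to the BMC
```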
2626X/2826X Power Backplane

Each power supply in a 2626X/2826X chassis plugs into the power backplane, located at the bottom of the chassis below the motherboard. The power backplane provides all the power and control connections for the motherboard, front panel controls and indicators, fans, PCIe bus power, and 12V auxiliary power for the GPUs. PCI bus power, 12V auxiliary power, fan power, tachometer signals, and system management bus (SMB) control signals are routed through the left and right PCI risers and distributed to the GPUs, fans, and PCI add-on cards.

Fan power (12V) and tachometer signals are provided to each fan through two 4-pin connectors on the power backplane. A 14-pin header provides power good and fan tachometer signals to the motherboard SMB. Three disk drive power connectors supply 5V and 12V to the hard drive backplanes at the rear of each disk drive. Motherboard SMB control and status signals, LED indicators, and the power and reset buttons on the front panel also connect to a 14-pin header on the power backplane assembly.

Figure 23. 2626X/2826X Power Backplane Assembly (1630W power supply connectors; left and right PCI riser power and control for motherboard PCI slots 2/3 and 1/4; 12V fan power and tachometer connectors; 12V at 55A to the motherboard; power good, SMB, and fan tachometer signals to the motherboard fan header; 5V and 12V to the hard drive backplanes; and front panel and SATA backplane control, monitoring, fault, and power status connections)

Figure 24. 2626X Power Backplane Block Diagram

GPU Power Connections

GPUs receive power from their respective right or left PCI riser. The power backplane provides 12V power to each PCI riser. A blind connector on the GPU interface board (IFB), attached to the GPU tray, plugs into the PCI riser. The GPU IFB uses either PCIe 6-pin and 8-pin connectors (K40) or an EPS 8-pin connector (K80) to provide power to each GPU. The fan control daemon (hydrad) provides an active fan control utility that enables adjustment of fan speeds and powering GPUs on or off.

Figure 25. 2626X/2826X 12V Auxiliary Power Connectors (the GPU IFB, which differs for the K40 and K80, connects the GPU tray's blind 12V connector on the PCI riser to PCIe 6-pin and 8-pin auxiliary power connectors for the K40 or to an EPS-12V 8-pin power connector for the K80)

Figure 26. 2626X/2826X PCI Riser Block Diagram (the power backplane supplies 12V at 58A to the riser; on-riser DC-DC converters generate 3.3V, 1.8V, and 0.9V; a PLX chip bridges the motherboard Gen3 x16 connector to two Gen3 x16 PCI slots, with EEPROM, SMB multiplexing, clock buffering, reset, JTAG, and voltage monitoring)

2626X and 2826X Plug-in Card LED Indicators

There are four plug-in cards that may be installed in 2626X/2826X servers.
The meaning of the LEDs on each card is described below.

Table 6. 2626X Plug-in Cards
● Fibre Channel: Dual-port 8Gb Fibre Channel to PCIe host bus adapter
● RAID Controller: Eight-port PCIe RAID controller
● InfiniBand: Connect-IB (InfiniBand FDR) single-port QSFP+ host channel adapter card
● 10GbE: ConnectX-3 10GbE Ethernet dual SFP+ PCIe adapter card

Fibre Channel Adapter:

Table 7. Dual-port Fibre Channel HBA LED (Light Pipe) Scheme
● Power off: yellow off, green off, amber off
● Power on (before firmware initialization): yellow on, green on, amber on
● Power on (after firmware initialization): yellow flashing, green flashing, amber flashing
● Firmware error: yellow, green, and amber LEDs flashing alternately
● Online, 2 Gbps link/I/O activity: yellow off, green off, amber on and flashing
● Online, 4 Gbps link/I/O activity: yellow off, green on and flashing, amber off
● Online, 8 Gbps link/I/O activity: yellow on and flashing, green off, amber off
● Beacon: yellow flashing, green off, amber flashing

RAID Controller: The LEDs on the RAID card (one per port) are not visible from outside the server chassis. When lit, each LED indicates that the corresponding drive has failed or is in an unconfigured-bad state.

InfiniBand: There are two LEDs on the I/O panel. When data is transferring, normal behavior is solid green and flashing yellow.
● Yellow – physical link
○ Constant yellow indicates a good physical link
○ Blinking indicates a problem with the physical link
○ If neither color is on, the physical link has not been established
○ When the logical link is established, the yellow LED turns off
● Green – logical (data activity) link
○ Constant green indicates a valid logical (data activity) link without data transfer
○ Blinking green indicates a valid logical link with data transfer
○ If the LED lights only yellow, with no green, the logical link has not been established

ConnectX-3 10GbE Ethernet: There are two I/O LEDs per port in dual-port designs (four LEDs between the two ports).
● Green – physical link
○ Constant on indicates a good physical link
○ If neither LED is lit, the physical link has not been established
● Yellow – logical (data activity) link
○ Blinking yellow indicates data is being transferred
○ Stays off when there is no activity

2626X/2826X PCI Riser Interface Boards

CS-Storm riser slot 2 and riser slot 3 (slot 4 on the S2600TP) support PCIe add-on cards when the left GPU cage is removed. Add-on cards in riser slot 2 are secured to the rear panel of the blade. I/O or login servers provide openings on the rear panel for add-on cards. Add-on cards in slot 3 (slot 4) are supported by a bracket and must connect to the rear panel with a cable assembly.

The slot 2 PCI riser interface board is a x24 PCIe Gen3 bus. It supports a Gen3 x16 bus and also a Gen3 x8 PCIe slot for an add-on card mounted to the rear panel.

Figure 27. Slot 2 and Slot 3 PCI Riser Interface Boards (the slot 2 PCI riser interface board with its PCI x8 add-on card slot, and the slot 3 [S2600WP] / slot 4 [S2600TP] PCI riser interface board on the motherboard)

2626X/2826X Flex-Foil PCIe Interface Cables

PCIe slot 1 and slot 4 on the S2600WP (slot 3 on the S2600TP) motherboards connect to the right PCI riser through flex-foil cables. For I/O/login nodes (or when the right GPU sled is removed in compute nodes), add-on cards are supported through slot 1 and slot 4/slot 3. These cards plug into the same right PCI riser board used for GPUs. An optional RAID card connects to the hard drive backplanes on the front panel.
Cards in slot 4 are mounted to a bracket inside the chassis. Add-on cards through slot 1 are secured to the rear panel.

Figure 28. 2626X/2826X Slot 1 and Slot 4 Flex-Foil Cables (one flex-foil cable runs under the motherboard to PCI slot 4 on the S2600WP [slot 3 on the S2600TP] and another connects to motherboard PCI slot 1; the add-on card is secured to the rear panel, and an optional RAID add-on card connects to the hard drive backplanes)

Note: On the S2600TP, the slot 3 (x24) PCI cable assembly includes a flex cable (x16) to the right PCI riser card, just like the S2600WP. The S2600TP PCI cable assembly includes an additional x8 flex cable for an optional low-profile add-on card that is mounted to the chassis rear panel (not shown).

GPU Sleds

The right and left GPU sleds each support 4 NVIDIA® Tesla® K40 or K80 GPUs (8 GPUs per 2626X8N/2826X8N node). The GPU sleds are secured to the chassis with two thumb screws and lift straight out of the chassis. A fan at each end of the GPU sled draws air in from the front of the chassis and pushes air out the rear. The GPU fans receive power and tachometer signals through 4-pin connectors on the PCI riser. The right GPU riser is connected to the motherboard using a flex-foil cable. The left GPU sled is connected to slots 2 and 3 on the motherboard using PCIe interface PCBs.

Figure 29. 2626X8/2826X8 Right and Left GPU Sled Components (PCI interface boards to motherboard PCIe slots 2 and 3, PLX PCIe switch devices, flex-foil cables to motherboard PCI slots 1 and 4 [S2600WP], GPU slot groups 1-4 each with trays 0 and 1, fan power and tachometer connections, and the push-pull fan configuration)

Note: On the S2600TP, a different PCI cable assembly connects to slot 3. It includes a flex cable (x16) to the right PCI riser card, just like the S2600WP, plus an additional x8 flex cable for an optional low-profile add-on card mounted to the chassis rear panel (not shown). The GPU group numbering is the same for both S2600WP and S2600TP motherboards.

Figure 30. Remove Right GPU Sled from 2626X Chassis (disconnect the flex-foil cable connectors from motherboard PCIe slots 1 and 4, then lift the sled by its handle)

GPU Trays

GPU trays are easily removed from the GPU sled. The trays are attached to the sled by a screw at each end of the tray, and two handles are provided for removing each tray. Power (12V) to the GPU or accelerator card is provided from the GPU riser card through a blind-mate connector. The blind-mate connector routes 12V power through a GPU interface board (IFB) attached to the GPU tray. Power from the IFB connects to the GPU through different power connectors as shown in the following figure.

Figure 31. 2626X and 2826X GPU Tray Power Connectors (the GPU IFB, which differs for the K40 and K80, routes 12V from the PCI riser's blind connector to PCIe 6-pin and 8-pin auxiliary power connectors for the K40 or to an EPS-12V 8-pin power connector for the K80)

Custom Accelerator Cards

A GPU tray supports different sized GPUs or custom-designed accelerator cards. The largest accelerator card dimensions that the GPU tray can support are 39.06 mm x 132.08 mm x 313.04 mm. Each compute node can support up to 8 accelerator cards (up to 300W per card) per chassis.
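For a quick software-side inventory of the accelerators installed in a node, the NVIDIA management library can report each GPU's name, power limit, and PCIe link configuration. A minimal sketch, assuming the NVIDIA driver and the nvidia-ml-py (pynvml) bindings are installed on the node:

```python
# Minimal sketch: enumerate the GPUs in a node and report power limits and
# PCIe link settings via NVML. Requires the NVIDIA driver plus the
# nvidia-ml-py (pynvml) package; output depends on the installed GPUs.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        limit_w = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000.0  # mW -> W
        gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
        width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
        print(f"GPU {i}: {name}, power limit {limit_w:.0f} W, PCIe Gen{gen} x{width}")
finally:
    pynvml.nvmlShutdown()
```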
Figure 32. Accelerator Card Tray Dimensions

Right and Left PCI Riser Boards

Two PCI riser boards (left and right) connect the motherboard PCI slots to the GPU, accelerator card, or I/O add-on card. The right PCI riser used in compute servers supports 4 GPUs or accelerator cards; the right PCI riser used in I/O or login servers supports 2 add-on PCI cards.

Each PCI riser receives power from the power backplane and provides 12V power and control to the Gen3 x16 PCI slots on the riser. PCI riser edge connectors plug into the power backplane assembly to receive 12V power. Voltage regulators provide 0.9V, 1.8V, and 3.3V to the Gen3 x16 PCI slots. 12V power and control signals from the power backplane are connected to the Gen3 x16 PCI slots and to two blind power connectors that supply 12V auxiliary power to the GPUs.

Two Gen3 x16 PCI slots on the PCI riser support 4 GPU or accelerator cards. The PCI riser includes two ExpressLane™ PEX 8747 devices (PLX chips). The PLX chip is a 48-lane, 5-port PCIe Gen3 switch device and supports multi-host PCIe switching capability. Each PLX chip multiplexes the single Gen3 x16 PCI slot from the motherboard into two Gen3 x16 PCI buses to support 2 GPUs or accelerator cards. The PLX chip supports peer-to-peer traffic and multicast for maximum performance.

● Flex-foil cables connect motherboard slots 1 and 4 to the right PCI riser.
● Interface PCBs connect motherboard slots 2 and 3 to the left PCI riser.
● Login or I/O nodes use a different right PCI riser and mounting hardware to support 2 PCI add-on cards. One card (FC or GbE) is secured to the rear panel, and an internal RAID controller add-on card is mounted internally and connects to the hard drive backplanes.

Figure 33. PCI Riser Block Diagram – Single PLX Chip (the power backplane supplies 12V at 58A; DC-DC converters provide 3.3V, 1.8V, and 0.9V; the PLX chip bridges the motherboard Gen3 x16 connector to two Gen3 x16 PCI slots, with EEPROM, SMB multiplexing, clock buffer, reset, JTAG, and voltage monitoring)

Figure 34. 2626X/2826X Left GPU PCI Riser Components (Gen3 x16 PCIe connectors to motherboard PCI slots 2 and 3, two PLX chips, four PCI connectors to the GPU trays/accelerator cards, fan power and tachometer connectors, a blind power connector, and the edge connector to the power backplane)

The right PCI riser used for compute nodes supports 4 GPUs or accelerator cards. The right PCI riser for I/O or login nodes supports 2 add-on PCI cards.

Figure 35. 2626X/2826X Right GPU PCI Riser Components (Gen3 x16 PCIe connectors to motherboard slot 1 and slot 4 [S2600WP] / slot 3 [S2600TP] via flex-foil cables, two PLX chips, 12V auxiliary power connectors, Gen3 x16 PCI slots to the accelerator cards, and the edge connector to the power backplane)

Figure 36. Add-on Card Right Riser Card (I/O and login node right PCI riser carrying an FC, IB, or GbE add-on card and a RAID add-on card)

GPU Fault Conditions

The front panel indicators include a GPU power status indicator (green=good, red=fault) and a GPU fault status indicator.
Each PCI riser supports two high-performance, low-latency PCIe switch devices (PLX chip, PEX8747) that support multi-host PCIe switching capabilities. Each PLX chip provides end-to-end cyclic redundancy checking (ECRC) and poison bit support to ensure data path integrity. The front panel GPU status indicates fatal errors have occurred on a PLX chip or GPU: Green On All GPUs and PLX chips are working normally Red On A fatal error has occurred a GPU or PLX chip The front panel GPU power status indicates the power status from the PLX chips on the right and left PCI risers: Green On GPU power normal Red On One or more GPU power failures Figure 37. PLX Chip Error Indicators HR90-2003-D CS-Storm Hardware Guide 42 NVIDIA Tesla K40 and K80 GPUs NVIDIA Tesla K40 and K80 GPUs The NVIDIA® Tesla® K40 and K80 graphics processing units (GPUs) are dual-slot computing modules that use the Tesla (267 mm length) form factor. Both K40 and K80 support PCI Express Gen3. They use passive heat sinks for cooling. Tesla K40/K80 modules ship with ECC enabled by default to protect the register files, cache and DRAM. With ECC enabled, some of the memory is used for the ECC bits, so the available memory is reduced by ~6.25%. Processors and memory for these GPU modules are: ● ● K40 ○ One GK110B GPU ○ 12 GB of GDDR5 on-board memory ○ ~11.25 GB available memory with ECC on K80 ○ Two GK210B GPUs ○ 24 GB of GDDR5 on-board memory (12 GB per GPU) ○ ~22.5 GB available memory with ECC on Figure 38. NVIDIA K40 and K80 GPUs K40 K80 Table 8. NVIDIA K40 and K80 Features Feature K40 K80 GPU ● Processor cores: 2880 ● Processor cores: 2496 ● Core clocks: ● Core clocks: ○ HR90-2003-D Base clock: 745 MHz CS-Storm Hardware Guide ○ Base clock: 560 MHz ○ Boost clocks: 562 MHz to 875 MHz 43 NVIDIA Tesla K40 and K80 GPUs Feature K40 ○ Board Memory BIOS Power Connectors K80 Boost clocks: 810 MHz and 875 MHz ● Package size: 45 mm × 45mm 2397 pin ball grid array (SFCBGA) ● Package size: 45 mm × 45mm 2397 pin ball grid array (SFCBGA) ● PCI Express Gen3 ×16 system interface ● Physical dimensions: 111.15 mm (height) × 267 mm (length), dual-slot ● Memory clock: 3.0GHz ● Memory clock: 2.5 GHz ● Memory bandwidth 288 GB/sec ● ● Interface: 384-bit Memory bandwidth 480 GB/sec (cumulative) ● Interface: 384-bit ○ Total board memory: 12 GB ○ 24 pieces of 256M × 16 GDDR5, SDRAM ○ Total board memory: 24 GB ○ 48 pieces of 256M × 16 GDDR5, SDRAM ● 2Mbit serial ROM ● 2Mbit serial ROM ● BAR1 size: 16 GB ● BAR1 size: 16 GB per GPU One 6-pin CPU power connector One 8-pin CPU power connector One 8-pin CPU power connector Table 9. 
NVIDIA Tesla K40 and K80 Board Configuration Specification K40 K80 Graphics processor One GK110B Two GK210B Core clocks Base clock: 745 MHz Base clock: 560 MHz Boost clocks: 810 MHz and 875 MHz Boost clocks: 562 – 875 MHz Memory clock 3.0 GHz 2.5 GHz Memory Size 12 GB 24 GB (per board) 12 GB (per GPU) Memory I/O 384-bit GDDR5 384-bit GDDR5 Memory bandwidth 288 GB/s (per board) 480 GB/s (per board) 240 GB/s (per GPU) Memory configurations 24 pieces of 256M x 16 GDDR5 SDRAM 48 pieces of 256M × 16 GDDR5 SDRAM Display connectors None HR90-2003-D None CS-Storm Hardware Guide 44 NVIDIA Tesla K40 and K80 GPUs Specification K40 K80 Power connectors PCIe 6-pin EPS-12V 8-pin PCIe 8-pin Board power/TDP 235 W 300 W Power cap level 235 W 150 W per GPU 300 W per board BAR1 size 16 GB 16 GB (per GPU) Extender support Straight extender is the default and the long offset extender is available as an option Straight extender or long offset extender Cooling Passive heat sink Passive heat sink ASPM Off Off K40 and K80 Connectors and Block Diagrams The K40 receives power through PCIe 6-pin and 8-pin connectors. A Y-cable from these connectors plugs into a K40 interface board (IFB) attached to the GPU tray. The K80 uses a single EPS-12V 8-pin connector/cable that plugs into a K80 IFB. The IFB boards uses a blind power connector that plugs into the GPU riser card. Figure 39. K40 and K80 Block Diagrams NVIDIA K80 GPU 384b 12GB GDDR5 24 pieces 256Mx16 BIOS 2Mbit ROM GPU FB GK110B BIOS 2 Mbit ROM GK210B 384b Power Supply PEX PCI Edge Connector GPU Riser Card 12V PCIe 6 pin 12V Gen3 x16 PCI bus PCIe 8 pin 12V aux. power PCI Edge Connector K40 IFB GPU Riser Card Blind power connector to GPU Riser Card HR90-2003-D 384b GPU FB PEX GPIO GPIO Gen3 x16 PCI bus GK210B PLX PCIe Switch CS-Storm Hardware Guide 12GB GDDR5 24 pieces 256Mx16 12GB GDDR5 24 pieces 256Mx16 NVIDIA K40 GPU Power Supply 12V EPS-12V 8 pin 12V aux. power K80 IFB Blind power connector to GPU Riser Card 45 NVIDIA Tesla K40 and K80 GPUs NVIDIA GPU Boost and Autoboost NVIDIA GPU Boost™ is a feature available on NVIDIA Tesla products that makes use of any power and thermal headroom to boost application performance by increasing GPU core and memory clock rates. GPU Boost is customized for compute intensive workloads running on clusters. Application workloads that have headroom can run at higher GPU clocks to boost application performance. If power or thermal limits are reached, the GPU clock scales down to the next available clock setting so that the GPU board remains below the power and thermal limit. The GPU clocks available under NVIDIA GPU Boost are: ● Base Clock: A clock defined to run the thermal design power (TDP) application under TDP test conditions (worst-case board under worst-case test conditions). For Tesla products, the TDP application is typically specified to be a variation of DGEMM. ● Boost Clock(s): The clocks above the base clock and they are available to the GPU when there is power headroom. The number of boost clocks supported, vary from K40 to K80. NVIDIA GPU Boost gives full control to end-users to select the core clock frequency that fits their workload the best. The workload may have one or more of the following characteristics: ● Problem set is spread across multiple GPUs and requires periodic synchronization. ● Problem set spread across multiple GPUs and runs independent of each other. 
● Workload has “compute spikes.” For example, some portions of the workload are extremely compute intensive, pushing the power higher, while other portions are moderate.
● Workload is compute intensive throughout, without any spikes.
● Workload requires fixed clocks and is sensitive to clock fluctuations during execution.
● Workload runs in a cluster where all GPUs need to start, finish, and run at the same clocks.
● Workload or end user requires predictable performance and repeatable results.
● Data center runs different types of workloads at different hours of the day to better manage power consumption.
GPU Boost on K40
The K40 ships with the GPU clock set to the base clock. To enable GPU Boost, the end user can use NVML or nvidia-smi to select one of the available GPU clocks or boost levels. A user or system administrator can select higher clock speeds, or disable autoboost and manually set the right clocks for an application, by either running the nvidia-smi command line tool or using the NVIDIA Management Library (NVML). nvidia-smi can control application clocks without any changes to the application.
GPU Boost on K80
The K80 ships with Autoboost enabled by default. In Autoboost mode, when the Tesla K80 is used for the first time, the GPUs start at the base clock and raise the core clock to higher levels automatically as long as the board stays within the 300 W power limit. Tesla K80 autoboost can automatically match the performance of explicitly controlled application clocks. If the K80 clocks should not boost automatically, the Autoboost feature can be disabled and the module locked to a clock supported by the GPU. The K80 autoboost feature enables GPUs to work independently without the need to run in lock step with all the GPUs in the cluster. The following table summarizes the GPU Boost behavior and features for K40 and K80.
Table 10. K40 and K80 Boost Features
GPU clocks: K40 - 745 MHz, 810 MHz, 875 MHz; K80 - 562 MHz to 875 MHz in 13 MHz increments
Base clock: K40 - 745 MHz; K80 - 560 MHz
Autoboost (NVIDIA GPU Boost enabled by default): K40 - No, the end user has to explicitly select a clock using nvidia-smi/NVML; K80 - Yes, enabled by default to boost the clock based on power headroom
Ability to select clocks via nvidia-smi/NVML: K40 - Yes; K80 - Yes
Ability to disable NVIDIA GPU Boost: K40 - Yes, using nvidia-smi/NVML; K80 - Yes, using nvidia-smi/NVML
API for GPU Boost
NVML is a C-based API for monitoring and managing the various states of Tesla products. It provides direct access to submit queries and commands, and is also exposed through nvidia-smi. NVML documentation is available from https://developer.nvidia.com/nvidia-management-library-nvml. The following table is a summary of nvidia-smi commands for using GPU Boost.
Table 11. nvidia-smi Command Summary
View the supported clocks: nvidia-smi -q -d SUPPORTED_CLOCKS
Set one of the supported clocks: nvidia-smi -ac <memory clock,graphics clock>
Make the clock settings persistent across driver unload (enable persistence mode): nvidia-smi -pm 1
Make the clock settings revert to base clocks after the driver unloads (turn off persistence mode): nvidia-smi -pm 0
View the clock in use: nvidia-smi -q -d CLOCK
Reset clocks back to the base clock (as specified in the board specification): nvidia-smi -rac
Allow “non-root” access to change the graphics clock: nvidia-smi -acp 0
Enable auto boosting the GPU clocks: nvidia-smi --auto-boost-default=ENABLED -i 1
Disable auto boosting the GPU clocks: nvidia-smi --auto-boost-default=DISABLED -i 0
Allow “non-root” access to set autoboost: nvidia-smi --auto-boost-permission=UNRESTRICTED -i 0
When using non-default applications clocks, driver persistence mode should be enabled. Persistence mode ensures that the driver stays loaded even when no NVIDIA® CUDA® or X applications are running on the GPU. This maintains the current state, including requested applications clocks. If persistence mode is not enabled and no applications are using the GPU, the driver unloads and any current user settings revert back to the defaults for the next application. To enable persistence mode run:
# sudo nvidia-smi -pm 1
The driver will attempt to maintain requested applications clocks whenever a CUDA context is running on the GPU. However, if no contexts are running, the GPU reverts back to idle clocks to save power and stays there until the next context is created. Thus, if the GPU is not busy, you may see idle current clocks even though the requested applications clocks are much higher.
NOTE: By default, changing the application clocks requires root access. If the user does not have root access, the user can request that the cluster manager allow non-root control over application clocks. Once changed, this setting persists for the life of the driver before reverting back to root-only defaults. Persistence mode should always be enabled whenever changing application clocks or enabling non-root permissions to do so.
Using GPU Boost
● The K40 runs at a base clock of 745 MHz. Run a workload at the base clock and check the power draw using an NVML or nvidia-smi query. If the power draw is less than 235 W, select a higher boost clock and run the application again. A few iterations and some experimentation may be needed to see which boost clock works best for a specific workload.
● The K80 ships with Autoboost enabled. The GPUs start boosting the clock depending on the power headroom.
● If K40 and K80 GPUs are used with several others in a cluster, root access may be needed to try and set different clocks. The nvidia-smi -acp 0 command grants permission to set different boost clocks.
● Experimentation may be needed to find a clock speed that works best for a workload spread across multiple GPUs running at the same clock speed.
● Selecting the highest boost clock on a K40 or K80 is likely the best option when running a workload where each GPU works independently on a problem set and there is little interaction or collaboration between GPUs.
Hydra Fan Control Utility
The hydra fan control utility monitors and controls GPUs and fans in CS-Storm servers.
This utility controls Cray designed PCIe expansion and fan control logic through the motherboard BMC. The utility runs as a Linux service daemon (hydrad) and is distributed as an RPM package. Fan control utility (hydrad) features: ● Supports 8 GPUs or customer accelerators ● Supports Intel motherboards ● Active/manual fan control for 8x GPUs with fan localization (left or right) ● Supports Red Hat Enterprise Linux (RHEL) 6 ● GPU power on/off for energy saving ● User-programmable fan control parameters ● Power data monitoring with energy counter for PSU, motherboard and GPU Figure 40. 2626X/2826X Fan Control Block Diagram hydra command hydra CLI IPMI GPU Fans Motherboard I2C BMC ADT7462 Fan Controller Fan1 Fan2 Fan3 SMBPBI IPMB Data file Status update hydrad daemon start/stop hydrad PCI1 Fan4 I2C MUX IPMI PCI2 Config file PCI3 (PCI4 S2600TP) PCI4 (PCI3 S2600TP) I2C MUX I2C MUX I2C MUX GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 GPU8 The CS-Storm fan control utility RPM package includes the following: /usr/sbin/hydrad The hydrad daemon is the main part of the hydra utility and runs as a service daemon on Linux OS. It starts and stops by the init script at runlevel 3, 4, and 5. When the service HR90-2003-D CS-Storm Hardware Guide 49 Hydra Fan Control Utility starts, hydrad parses the /etc/hydra.conf file for runtime environment information, then identifies/discovers the motherboard BMC, GPU, and fan control hardware logic on the system. The service then monitors the GPU status and fan speed every second. The fan speed varies according to GPU temperature, or what is defined in hydra.conf.The hydrad service updates the data file/var/tmp/hydra_self whenever the GPU or fan status has changed. /usr/sbin/hydrad.sh This script is called by /etc/rc.d/init.d/hydra and invokes the hydrad service. It generates a /tmp/hydrad.log file. /usr/sbin/hydra The hydra fan utility provides following command line interface (CLI) to users. ● Show GPU status ● Control GPU power on/off ● Show fan status ● Set active/manual fan control mode ● Set fan speed under manual mode /etc/hydra.conf This file contains the running environment for the hydrad service. The running parameters for fan speed and GPU temperature can be adjusted on the system. Restart the hydrad service to apply changes made to the hydra.conf file. RPM Package After installing the hydra RPM package, the hydra utility automatically registers and starts up the service daemon. If you want to change any parameters, modify your /etc/hydra.conf file, then stop and start the hydra service. Install: # rpm -ihv ./hydra-0.4-0.x86_64.rpm The hydrad service will startup automatically during install. hydrad keeps running as a service daemon unless the package is removed. Remove: # rpm -e hydra The /etc/hydra.conf file is moved to /etc/hydra.conf.rpmsave for the next installation. Data File Once hydrad starts up, a data file is created at /var/tmp/hydra_self. It contains all GPU and fan information that hydrad collects. Both hydrad and hydra use this data file to monitor and control the hydra system. This file can be used as a snapshot image of the latest system status. HR90-2003-D CS-Storm Hardware Guide 50 Hydra Fan Control Utility Configuration Parameters The hydrad runtime environment is modified using the /etc/hydra.conf configuration file. The /etc/ hydra.conf file contains following parameters. Use the hydra config command to display/verify the current GPU environment settings. Modify the /etc/ hydrad.conf then restart the hydrad service. 
● activefan (on, off, default is on). Selects active or manual fan control mode ● debug (on, off, default is off). When this option is set to on, hydrad outputs debug messages to /tmp/ hydrad.log. ● discover (on, off, default is on). hydrad responds if there is a broadcast packet issued from hscan.py on the network UDP 38067 port. ● fanhigh (fannormal − 100%, default is 85%). The PWM duty value of high speed. If the GPU maximum temperature is higher than hightemp, the fan speed is set by this high duty value. The default setting is full speed. ● fanlow (5% - fannormal, default is 10%). The pulse-width modulation (PWM) duty value for low speed. If the GPUs maximum temperature is lower than normaltemp, the fan speed is set according to this low duty value. The default value 10%, set for the idled state of the GPU, which reduces fan power consumption. ● fannormal (fanlow − fanhigh, default is 65%). The PWM duty value of normal fan speed. If the GPU maximum temperature is higher than normaltemp, and lower than hightemp, the fan speed is set run by this normal duty value. ● fanspeed (5 - 100%, default is 85%). The default fan speed after you set manual fan control mode. ● CAUTION: ○ GPU Overheating ○ Manually setting the default fan speed to low can overheat the GPU. Monitor GPU temperature after manually setting the fan speed to avoid damage to the GPU or accelerator. ● gpuhealth (on, off, default is on). Set gpuhealth to off to disable the GPU monitoring function if GPUs are not installed in the system ● gpumax (0°C - 127°C, default is 90°C). The maximum GPU temperature allowed. If a GPU exceeds the gpumax value, hydrad issues an event in the event log. Set the proper gpumax temperature for the type of GPU installed in the system. ● gpu_type (auto, K10, K20, K40, K80, MIC, default is auto). You can define the type of your GPU/MIC. If you set auto, hydrad will automatically detect the type of GPU (requires additional time). ● hightemp (normaltemp - 127°C, default is 75°C). The minimum temperature where the fan runs at high speed. If a GPU exceeds this high temperature value, the fan runs at high speed. ● login_node (on, off, default is off). When this option is set to on, hydrad operates for a login or I/O node. The I/O or login nodes do not support the Group A components : ● ○ PCI1, PCI4 ○ FAN1, FAN4 loglevel (info, warning, critical, default is info). Controls what events hydrad logs to the /tmp/ hydrad.log file HR90-2003-D CS-Storm Hardware Guide 51 Hydra Fan Control Utility ● nodepower (on, off, default is off). hydrad monitors motherboard power consumption. ● normaltemp (0°C - hightemp, default is 60°C). The minimum temperature where the fan runs at normal speed. If a GPU temperature exceeds the normal temperature value, the fan runs at normal speed. ● polling (1 - 100 seconds, default is 2 seconds). Controls how often hydrad service accesses the GPU and fan controller ● psu_health (on, off, default is off). hydrad monitors GPU power consumption. ● psupower (on, off, default is on). hydrad checks and monitors power status and consumption of the three PSUs. ● sysloglevel (info, warning, critical, default is warning). The hydrad service also supports the syslog facility using this log level. hydrad event logs are written to /var/log/messages. ● CAUTION: ○ GPU Overheating ○ Manually setting the default fan speed to low can overheat the GPU. Monitor GPU temperature after manually setting the fan speed to avoid damage to the GPU or accelerator. 
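As an illustration of how these parameters fit together, the following is a hypothetical /etc/hydra.conf built from the default values listed above. Treat it as a sketch; the exact key names and syntax of the file shipped on a given system may differ (for example, the hydra config output shown later in this chapter uses abbreviated names such as low, normal, and high).
activefan=on
debug=off
discover=on
fanlow=10
fannormal=65
fanhigh=85
fanspeed=85
gpuhealth=on
gpumax=90
gpu_type=auto
normaltemp=60
hightemp=75
login_node=off
loglevel=info
nodepower=off
polling=2
psu_health=off
psupower=on
sysloglevel=warning
After editing the file, stop and start the hydra service (see the hydra commands that follow) so the new values take effect.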
hydra Commands To start the fan control service: # service hydra start To stop the fan control service: # service hydra stop Fan control utility settings are controlled from the /etc/hydra.conf configuration file when hydrad is started. To disable or enable active fan control: # hydra fan [on|off] on: Active Fan Control by GPU temperature off: Manual Fan Control To set manual fan control to a specific PWM duty value (% = 10 to 100): # hydra fan off # hydra fan [%] Command line options (examples shown below): # hydra Usage: hydra [options] <command> Options: - D :display debug message - f <file> :use specific hydrad data file. default: /var/tmp/hydra_self - v :display hydra version Commands: config :display running hydrad settings gpu [on|off] :display or control GPU power node :display node status sensor :display GPU temperatures fan [%|on|off] :display fan status, set duty cycle, active control, manual control HR90-2003-D CS-Storm Hardware Guide 52 Hydra Fan Control Utility power [node|gpu|clear] :display PSU, motherboard and GPU power status or reset the energy counter hydra config: Display Configuration Paramaters The hydra config command displays parameter values that the hydra service is currently using. Values can be changed in the /etc/hydrad.conf file and implemented by stopping and starting the hydra service. [root@hydra3]# hydra config uid=0 cid=0 id=0 gpu_map=00000000 gpu_type=auto normaltemp=60 hightemp=75 gpumax=90 fanspeed=100 low=50 normal=80 high=100 polling=2 loglevel=info sysloglevel=warning activefan=on gpu_health=on psu_health=on nodepower=off gpupower=off login_node=off debug=off ok [root@hydra3]# hydra gpu: GPU Power Control The CS-Storm has power control logic for the all GPUs that can be controlled using a hydrad CLI command. GPU power can be disabled to reduce power consumption. The default initial power state for GPUs is power on. If the GPU power is off, the GPU is not powered on when powered on, unless GPU power is enabled using the CLI command. The following limitations exist for GPU power control: ● The OS may crash if GPU power is set to off while the operating system is active due to the disabled PCI link. ● Reboot the operating system after enabling power to a GPU so that the GPU is recognized. Show the GPU status or on/off the GPU power. The power operation is performed for all installed GPUs. Individual GPU control is not allowed. Status information includes Bus number, PCI slot, Mux, power status, GPU Type, Product ID, firmware version, GPU slave address, temperature, and status. Use the following commands to enable or disable power to the GPUs. Args: <non>: MIC, GPU on : off : Display GPU status: Bus(PCI#,Mux), Power, Type, Product ID, FWVer for slave address, Temperature and Status. Turn on the all GPU power. Turn off the all GPU power. HR90-2003-D CS-Storm Hardware Guide 53 Hydra Fan Control Utility [root@hydra3]# hydra gpu # Slot Mux Power Type PID FWVer 0 1 1 on K40 1023 1 1 2 on K40 1023 2 2 1 on K40 1023 3 2 2 on K40 1023 4 3 1 on K40 1023 5 3 2 on K40 1023 6 4 1 on auto 7 4 2 on auto ok [root@hydra3]# hydra gpu off ok [root@hydra3]# hydra gpu on ok [root@hydra3]# Addr Temp Status 9eH 31 ok 9eH 33 ok 9eH 32 ok 9eH 33 ok 9eH 31 ok 9eH 32 ok hydra node: Motherboard BMC Status The hydra node command displays motherboard BMC status, Product ID, BMC firmware version and IP settings. 
[root@hydra3]# hydra node Prod-ID: 004e BMC Ver: 1.20 BMC CH1: 00:1e:67:76:4e:91 ipaddr: 192.168.1.57 netmask: 255.255.255.0 gateway: 192.168.1.254 BMC CH2: 00:1e:67:76:4e:92 ipaddr: 0.0.0.0 netmask: 0.0.0.0 gateway: 0.0.0.0 Sensors: 4 p1_margin: ok ( -49.0 p2_margin: ok ( -55.0 inlet: ok ( 31.0 outlet: ok ( 45.0 ok [root@hydra3]# 'C) 'C) 'C) 'C) hydra fan: Display Fan Status and Set Control Mode The hydrad fan command displays fan status and changes fan control mode and speed. When active fan control is disabled, the fan speed is automatically set to the default manual fan speed. Use the hydrad fan command to display controller chip revision, slave address, control mode and fan status. Args: <none>: on : off : % : Display FAN status: Chip Rev, slave addr, control mode and FAN status. Set Active Fan control mode. Set Manual Fan control mode. Set FAN speed duty. 5-100(%) [root@hydra3]# hydra fan ADT7462 Rev : 04h ADT7462 Addr: b0h Active Fan : on Fan Stat RPM Duty HR90-2003-D CS-Storm Hardware Guide 54 Hydra Fan Control Utility FAN1 FAN2 FAN3 FAN4 Ok ok ok ok ok 9591 9574 9574 9574 50 50 50 50 Set fan control mode to manual: [root@hydra3]# hydra fan off ok [root@hydra3]# hydra fan ADT7462 Rev : 04h ADT7462 Addr: b0h Active Fan : off Fan Stat RPM Duty FAN1 ok 13300 100 FAN2 ok 12980 100 FAN3 ok 13106 100 FAN4 ok 13466 100 Ok Set fan duty cycle to 70%: [root@hydra3]# hydra fan 70 ok [root@hydra3]# hydra fan ADT7462 Rev : 04h ADT7462 Addr: b0h Active Fan : off Fan Stat RPM Duty FAN1 ok 12356 70 FAN2 ok 12300 70 FAN3 ok 12300 70 FAN4 ok 12244 70 Ok Set fan control mode to active. [root@hydra3 ~]# hydra fan on Ok hydra sensor: Display GPU Temperatures The hydra sensor command displays GPU temperatures [root@hydra3 ~]# hydra sensor PCI1-A PCI1-B PCI2-A PCI2-B 31 33 32 33 ok [root@hydra3 ~]# PCI3-A 31 PCI3-B 32 PCI4-A PCI4-B hydra power: Display Power Values The hydra power command displays PSU, motherboard and GPU power status and can be used to reset the peak/average and energy counters. Args: HR90-2003-D CS-Storm Hardware Guide 55 Hydra Fan Control Utility <none>: node : gpu : clear : Display PSU power status Display Motherboard power status Display GPU power status Reset all Peak/Average and Energy Counters [root@hydra]# hydra power No Pwr Stat Temp Fan1 Fan2 +12V Curr ACIn Watt Model 00 on ok 27 7776 6656 11.9 42 207 572 PSSH16220 H 01 on ok 27 6144 5248 11.9 43 207 572 PSSH16220 H 02 Power : 84.0 A 1122 W (Peak 1226 W, Average 1129 W) Energy: 3457.5 Wh in last 11013secs(3h 3m 33s) ok [root@hydra]# hydra PMDev : ADM1276-3 0 Power : 12.2 V 10.5 Energy: 576.1 Wh in ok power node (ok) p: 368.0 a: 187.6 A 193 W (Peak 228 W, Average 188 W) last 11011secs(3h 3m 31s) [root@hydra]# hydra power gpu No Slot Stat +12V Curr Watt Peak Avrg Model 1 PCI1 ok 12.2 20.0 366.5 495.0 367.8 ADM1276-3 2 PCI2 ok 12.2 20.8 386.9 485.2 387.7 ADM1276-3 3 PCI3 ok 12.1 20.0 365.5 480.6 364.3 ADM1276-3 4 PCI4 ok 12.2 18.4 339.3 483.2 340.6 ADM1276-3 Power : 78.9 A 1450 W (Peak 1534 W, Average 1407 W) Energy: 4310.9 Wh in last 11019secs(3h 3m 39s) ok 0 0 0 0 [root@hydra]# hydra power clear ok [root@hydra]# hydra power No Pwr Stat Temp Fan1 Fan2 +12V Curr ACIn Watt Model 00 on ok 27 7776 6656 11.9 42 207 560 PSSH16220 H 01 on ok 27 6144 5248 11.9 42 207 558 PSSH16220 H 02 Power : 84.0 A 1118 W (Peak 1118 W, Average 1129 W) Energy: 1.9 Wh in last 1secs(0h 0m 1s) ok [root@hydra]# Fan Speeds by GPU Temperature As described above, fan speeds increase and decrease based on GPU termperatures. 
If one of GPU gets hot and exceeds the next temperature region, hydrad immediately changes the fan speed to reach target speed. As the GPU gets back to a low temperature below the region, hydrad will decrease the fan speed step by step. Duty % fanhigh |---------------------------| / ^ | / | | L | fannormal |---------+=======>+--------| / ^ | / | | / | | / | HR90-2003-D CS-Storm Hardware Guide 56 Hydra Fan Control Utility | L | fanlow |========>+-----------------| +---------------------------- Temperature 'C normaltemp hightemp GPU and Fan Localization Each group of fans is controlled independently. The GPU temperature for a group does not affect the other group's fan speed. The fan speeds are determined by the GPUs within the same group. The I/O or login nodes do not support the Group A components. The two central chassis fans (fan 1A and 1B) are not control by hydrad. Fans 1A and 1B are controlled by the motherboard BMC. GPUs and fans are separated into two groups: ● ● Group A components (Right GPU sled): ○ PCI1 - GPU1, GPU2 ○ PCI4 - GPU7, GPU8 ○ FAN1, FAN4 Group B components (Left GPU sled): ○ PCI2 - GPU3, GPU4 ○ PCI3 - GPU5, GPU6 ○ FAN2, FAN3 Note: The 2626X2 and 2826X2 login nodes don't have Group A components. Group A and B fans run independently from each other. The temperature of GPUs in one group do not effect the fan speeds in the other group. Fan speeds are determined by the GPU temperatures in that group. Motherboard dedicated fans 1A and 1B are not controlled by hydrad. These fans are controlled by the motherboard BMC. No Power or Unknown GPU States If there is no power or the GPU state is unknown, hydrad sets the fans speeds to either: ● Idle Speed (10%), if all of GPUs are powered off ● Full Speed (100%), if one GPU is unidentified or in an abnormal state (no thermal status reported for example) Fan Control Watchdog Timeout Condition The CS-storm system includes hardware watchdog timeout logic to protect the GPUs from overheating in the event hydrad malfunctions. The fan speed is set to full speed after 5-10 seconds if any of the following conditions occur: ● System crash ● BMC crash ● hydrad crash ● hydrad service is stopped HR90-2003-D CS-Storm Hardware Guide 57 Hydra Fan Control Utility ● hydra fan utility package is removed Discover Utility A discovery utility (hscan.py) identifies all CS-Storm systems/nodes that are running hydrad. The hscan.py utility provides the following information from hydrad. (hydrad contains the internal identification/discovery service and provides information through UDP port 38067.) You can turn off the discover capability using the discover=off option on the hydra.conf on each CS-Strom system. ● system: IP address, MAC of eth0, hostname, node type, hydrad version ● gpu: GPU temperature and type ● fan: fan status, pwm (L/R) and running speed (RPM) ● power: PSU, node, GPU power status If hydra is not running on the system, hscan.py will not display any information even though the system is running and online. NOTE: The fan, power and temperature information can not be displayed together. So the -T, -S, -P, -N, G and -F options can not be combined. Usage: ./hscan.py [options] <command> Options: -h : display this message -w <time> : waiting time for response packet -i <nic> : specific ethernet IF port (eth0, eth1,...) 
-m : display Mac address -c : display Current IP address -l : display system Location (cid-uid) -n : display host Name -t : display Node type -v : display hydrad Version -d : display hydrad Date -F : display Fan status -T : display gpu Temperature -S : display pci Slot devices -P : display PSU Power status -N : display Node Power status -G : display GPU Power status Getting hscan.py The hscan.py binary file is located /usr/sbin/hscan.py after installing the RPM package. [root@sona]# rpm -ihv hydra-1.0-3.x86_64.rpm Preparing... ##################################### [100%] 1: hydra ##################################### [100%] chkconfig --level 345 hydra on Starting hydra: [ OK ] [root@sona]# which hscan.py /usr/sbin/hscan.py [root@sona]# Copy the hscan.py binary file to a directory in the default path, or the ccshome directory. HR90-2003-D CS-Storm Hardware Guide 58 Hydra Fan Control Utility $ scp root@s074:/usr/sbin/hscan.py . hscan.py 100% 7764 7.6KB/s 00:00 $ Option Handling Options for the discovery utility are displayed in the order they are entered: $ ./hscan.py -mcn 00:1e:67:56:11:fd 192.168.100.74 sona $ ./hscan.py -ncm sona 192.168.100.74 00:1e:67:56:11:fd $ System Information When you run ./hscan.py without option, each hydrad displays basic system information to your command window. $ ./hscan.py 0-0 cn 00:1e:67:56:11:fd 192.168.100.74 v1.0rc3(Sep/23/20140) sona $ GPU Information GPU information cannot be displayed with fan information on the same command line. The -G, and -P options display GPU information. $ ./hscan.py -lcG 00-0 192.168.100.74 1A:0 1B:0 2A:29 2B:28 3A:25 3B:26 4A:0 4B:0 01-0 $ ./hscan.py -lcP 00000-00 192.168.100.74 1A:- 1B:- 2A:K40 2B:K40 3A:K20 3B:K20 4A:- 4B:Fan Information Fan information cannot be displayed with GPU information on the same command line. The -F option, displays fan information. $ ./hscan.py –lcF 00000-00 192.168.1 A5h 10% 10% 0(bad) 5115(ok) 4631(ok) 0(bad) Power Information If you enter any one of the -P, -N, or -T options, ./hscan.py displays PSU, Node and GPU power information. $ ./hscan.py -lcP 00000-00 192.168.100.106 PSU 74A 1026W (1086/1014) 251Wh-14m 00000-00 192.168.100.19 PSU 42A 620W (1250/667) 160Wh-14m $ ./hscan.py -lcN 00000-00 192.168.100.106 Node 16A 319W (332/311) 77Wh-14m 00000-00 192.168.100.19 Node 9A 185W (204/186) 45Wh-14m $ ./hscan.py -lcG 00000-00 192.168.100.106 GPU 59A 1130W (1228/1125) 282Wh-15m 00000-00 192.168.100.19 GPU 33A 634W (1564/696) 171Wh-14m $ HR90-2003-D CS-Storm Hardware Guide 59 S2600WP Motherboard Description S2600WP Motherboard Description The S2600 Washington Pass motherboard (S2600WP) is designed to support the Intel® Xeon® E5-2600 v2 (IvyBridge) processor family. There are three board SKUs based on the following different hardware configurations: ● S2600WP: Base SKU ● S2600WPQ: Base SKU with Mellanox® ConnectX-3® InfiniBand® QDR populated ● S2600WPF: Base SKU with Mellanox ConnectX-3 InfiniBand FDR populated The S2600WP motherboard supports the following Cray server products: ● Cray GB512X, GB612X, and GB812X GreenBlade servers ● Cray CS-Storm 2626X I/O/login and compute servers ● Cray CS-300 2620XT compute servers (4 motherboards) Refer to the Intel Server Board S2600WP Technical Product Specification, G44057, for more detailed troubleshooting information for the S2600WP motherboard. Figure 41. S2600WP Motherboard Table 12. 
S2600WP Motherboard Specifications Feature Description Processors Support for one or two Intel Xeon Processor E5-2600 series processors HR90-2003-D ● Up to eight GT/s Intel QuickPath Interconnect (Intel QPI) ● LGA 2011/Socket R processor socket ● Maximum thermal design power (TDP) of 130 W CS-Storm Hardware Guide 60 S2600WP Motherboard Description Feature Description Memory ● Up to 512 GB/s in 16 DIMM slots across 8 memory channels (4 channels per processor using 32 GB DIMMs) ● 1066/1333 /1866 MT/s DDR3 RDIMMs/LR-DIMMs with ECC (1 DPC) ● 1066/1333 MT/s DDR3 RDIMMs/LR-DIMMs with ECC (2 DPC) ● DDR3 standard I/O voltage of 1.5V(All Speed) and DDR3 Low Voltage of 1.35V (1 600 MT/s or below) Chipset Intel C600-A platform controller hub (PCH) External I/O Connections ● DB-15 video connectors ● Two RJ-45 network interfaces for 10/100/1000 LAN ● One stacked two-port USB 2.0 (Port 0/1) connector ● One InfiniBand QDR QSFP port (SKU: S2600WPQ only) ● One InfiniBand FDR QSFP port (SKU: S2600WPF only) ● Bridge slot to extend board I/O Internal I/O connectors/headers ○ SCU0 (four SATA/SAS 3 Gb/s ports) for backplane ○ Front control panel signals ○ One SATA (Port 0) 6 Gb/s port for DOM ● One USB 2.0 connector (USB port 2/3) ● One 2x7-pin header for system FAN module ● One DH-10 serial Port A connector ● One SATA 6 Gb/s (port 1) ● One 2x4-pin header for Intel RMM4 Lite ● One 1x4-pin header for Storage Upgrade Key Power Connections Two sets of 2x3-pin connectors System Fan Support Three sets of dual rotor fans Add-in Riser Support Four PCIe Gen III riser slots ● Riser slot 1 supports a PCI Gen3x16 Riser ● Riser slot 2 of S2600WP supports one PCI Gen3x16 riser and one PCI Gen3x8 riser in one physical slot at the same time or PCI Gen3x8 riser [for Intel rIOM (Intel input/output module)] ● Riser slot 2 of S2600WPQ and S2600WPF supports PCIe Gen3 x16 Riser or PCI Gen3x8 Riser [ for Intel rIOM (Intel input/output module) ] ● Slot 3 supports a PCIe Gen3x16 Riser There is one Bridge Slot for board I/O expansion. Video HR90-2003-D ● Integrated 2D video graphics controller CS-Storm Hardware Guide 61 S2600WP Motherboard Description Feature Description ● 128MB DDR3 Memory Hard Drive Support One SATA port at 6 Gbps on board. Four SATA/SAS ports (from SCU0; SAS support needs storage upgrade key) and one SATA 6 Gbps port (for DOM) are supported through bridge board RAID Support ● Intel RSTe RAID 0/1/10/5 for SATA mode ● Intel ESRT2 RAID 0/1/10/5 for SAS/SATA mode ● On-board ServerEngines® LLC Pilot III® Controller ● Support for Intel Remote Management Module 4 Lite solutions ● Intel Light-Guided Diagnostics on field replaceable units ● Support for Intel System Management Software ● Support for Intel Intelligent Power Node Manager (PMBus®) Server Management S2600WP Component Locations S2600WP component locations and connector types are shown below. Figure 42. S2600WP Component Locations CPU 2 DIMMs (8) Fan control (2x7) Riser slot 4 (x16) CPU 1 DIMMs (8) Riser slot 3 (x16) DIMM G1 DIMM C1 DIMM G2 DIMM H1 DIMM C2 DIMM D1 DIMM H2 DIMM D2 Bridge board Riser slot 2 (x16) InfiniBand QDR/FDR InfiniBand diagnostic and status LED IPMB USB (2x5) InfiniBand QSFP Power (2x3) USB (2) Status and ID LED PCH 600-A Power (2x3) CPU 2 VGA CPU 1 NIC chip DIMM F2 DIMM B2 DIMM F1 DIMM E2 DIMM B1 DIMM A2 DIMM E1 DIMM A1 NIC 2 BMC NIC 1 Storage upgrade key SATA Riser slot 1 CMOS RMM4 (x16) battery Lite port 1 Serial port A Figure 43. S2600WP Connectors HR90-2003-D CS-Storm Hardware Guide 62 S2600WP Motherboard Description Table 13. 
S2600WP Connector Locations A - NIC port 1 (RJ45) E - Status LED B - NIC port 2 (RJ45) F - Dual-port USB connector C - DB15 video out G - QSFP Connector D - ID LED H - InfiniBand diagnostic and status LED S2600WP Architecture The S2600WP is designed around the integrated features and functions of the Intel Xeon E5-2600 (Ivy Bridge) processor family. Features of the different S2600WP SKUs and a functional block diagram are included below. Table 14. S2600WP Features Board S2600WP S2600WPQ /S2600WPF Form Factor 6.8 " (173mm) x 18.9 " (480mm) CPU Socket LGA2011, Socket R Chipset Intel C600-A Chipset PCH Memory 16 DDR3 RDIMMs/LR-DIMMs/UDIMMs with ECC Slots Three PCI Express Gen3 x16 connectors Four PCI Express Gen3 x16 connectors One system bridge board connector One PCI Express® Gen3 x16 + x8 connector One system bridge board connector Ethernet InfiniBand Dual GbE, Intel I350 GbE N/A Single port of InfiniBand QDR /FDR SATA Storage One SATA III port (6Gb/s) on base board and one SATA III port (6Gb/s) on the bridge board SAS Storage Four SAS ports ( 3Gb/s, on the backplane of server system H2000WP) from SCU0 through bridge board (The SAS support needs storage upgrade key). Software RAID Processor Support Intel ESRT2 SAS/SATA RAID 0,1,5,10 or Intel RSTe SATA RAID 0,1,5, and 10 Maximum 130W TDP Video Integrated in BMC iSMS On-board ServerEngines LLC Pilot III Controller with IPMI 2.0 support Intel Light-Guided Diagnostics on field replaceable units Support for Intel System Management Software HR90-2003-D CS-Storm Hardware Guide 63 S2600WP Motherboard Description Board S2600WP S2600WPQ /S2600WPF Support for Intel System Management Software Support for Intel Intelligent Power Node Manager (PMBus) Power Supply 12V and 5VS/B PMBus Figure 44. S2600WP Block Diagram Intel E5-2600 v2 Processor Features With the release of the Intel Xeon processor E5-2600 and E5-2600 v2 product family, several key system components, including the CPU, Integrated Memory Controller (IMC), and Integrated IO Module (IIO), have been combined into a single processor package and feature per socket. Two Intel QuickPath Interconnect point-to-point links capable of up to 8.0 GT/s, up to 40 lanes of Gen 3 PCI Express links capable of 8.0 GT/s, and 4 lanes of DMI2/PCI Express Gen 2 interface with a peak transfer rate of 5.0 GT/s. The processor supports up to 46 bits of physical address space and 48-bit of virtual address space The following list provides an overview of the key processor features and functions that help to define the performance and architecture of the S2600WP motherboard. For more comprehensive processor specific HR90-2003-D CS-Storm Hardware Guide 64 S2600WP Motherboard Description information, refer to the Intel Xeon processor E5-2600 v2 product family specifications listed in Intel’s official website – www.intel.com. 
Processor features: ● Up to 8 execution cores ● Each core supports two threads (Intel Hyper-Threading Technology) ● 46-bit physical addressing and 48-bit virtual addressing ● 1GB large page support for server applications ● A 32-KB instruction and 32-KB data first-level cache (L1) for each core ● A 256-KB shared instruction/data mid-level (L2) cache for each core ● Up to 2.5MB per core instruction/data last level cache (LLC) Supported Technologies: ● Intel Virtualization Technology (Intel VT) ● Intel Virtualization Technology for Directed I/O (Intel VT-d) ● Intel Trusted Execution Technology (Intel TXT) ● Intel Advanced Vector Extensions (Intel AVX) ● Intel Hyper-Threading Technology ● Execute Disable Bit ● Intel Turbo Boost Technology ● Intel Intelligent Power Technology ● Data Direct I/O (DDIO) ● Enhanced Intel Speed Step Technology ● Non-Transparent Bridge (NTB) S2600WP Integrated Memory Controller (IMC) The integrated memory controller (IMC) has the following features: ● Unbuffered DDR3 and registered DDR3 DIMMs ● LR DIMM (Load Reduced DIMM) for buffered high capacity memory subsystems ● Independent channel mode or lockstep mode ● Data burst length of eight cycles for all memory organization modes ● Memory DDR3 data transfer rates of 800, 1066, 1333, and 1600 MT/s ● 64-bit wide channels plus 8-bits of ECC support for each channel ● DDR3 standard I/O Voltage of 1.5 V and DDR3 Low Voltage of 1.35 V ● 1-Gb, 2-Gb, and 4-Gb DDR3 DRAM technologies supported for these devices: ● UDIMM DDR3 – SR x8 and x16 data widths, DR – x8 data width ● RDIMM DDR3 – SR,DR, and QR – x4 and x8 data widths ● LRDIMM DDR3 – QR – x4 and x8 data widths with direct map or with rank multiplication ● Up to 8 ranks supported per memory channel, 1, 2 or 4 ranks per DIMM HR90-2003-D CS-Storm Hardware Guide 65 S2600WP Motherboard Description ● Open with adaptive idle page close timer or closed page policy ● Per channel memory test and initialization engine can initialize DRAM to all logical zeros with valid ECC (with or without data scrambler) or a predefined test pattern ● Isochronous access support for Quality of Service (QoS) ● Minimum memory configuration: independent channel support with 1 DIMM populated ● Integrated dual SMBus master controllers ● Command launch modes of 1n/2n ● RAS Support: ○ Rank Level Sparing and Device Tagging ○ Demand and Patrol Scrubbing ○ DRAM Single Device Data Correction (SDDC) for any single x4 or x8 DRAM device. Independent channel mode supports x4 SDDC. x8 SDDC requires lockstep mode ○ Lockstep mode where channels 0 & 1 and channels 2 & 3 are operated in lockstep mode ○ Data scrambling with address to ease detection of write errors to an incorrect address. ○ Error reporting via Machine Check Architecture ○ Read Retry during CRC error handling checks by iMC ○ Channel mirroring within a socket ○ CPU1 Channel Mirror Pairs (A,B) and (C,D) ○ CPU2 Channel Mirror Pairs (E,F) and (G,H) ○ Error Containment Recovery ● Improved Thermal Throttling with dynamic Closed Loop Thermal Throttling (CLTT) ● Memory thermal monitoring support for DIMM temperature HR90-2003-D CS-Storm Hardware Guide 66 S2600WP Motherboard Description Figure 45. S2600WP Integrated Memory Controller (IMC) and Memory Subsystem S2600WP Supported Memory Table 15. 
S2600WP RDIMM Support Guidelines (Subject to Change) Speed (MT/s) and Voltage Validated by Slot per Channel (SPC) and DIMM Per Channel (DPC)2 1 slot per channel 1 DPC Ranks Per DIMM & Data Width Memory Capacity Per DIMM1 1.35V 1066, 1333 1.5V SRx8 1GB 2GB 4GB 1066, 1333, 1600 DRx8 2GB 4GB 8GB 1066, 1333 1066, 1333, 1600 SRx4 2GB 4GB 8GB 1066, 1333 1066, 1333, 1600 DRx4 4GB 8GB 16GB 1066, 1333 --- QRx4 8GB 16GB 32GB 800 1066 QRx8 4GB 8GB 16GB 800 1066 1. Supported DRAM Densities are 1Gb, 2Gb, and 4Gb. Only 2Gb and 4Gb are validated by Cray 2. Command Address Timing is 1N HR90-2003-D CS-Storm Hardware Guide 67 S2600WP Motherboard Description Table 16. S2600WP LRDIMM Support Guidelines (Subject to Change) Speed (MT/s) and Voltage Validated by Slot per Channel (SPC) and DIMM Per Channel (DPC)3,4,5 1 slot per channel Ranks Per DIMM & Data Width1 1 DPC Memory Capacity Per DIMM2 1.35V 1.5V QRx4 (DDP)6 8GB 32GB 1066, 1333 1066, 1333 QRx8 (DPP)6 4GB 16GB 1066, 1333 1066, 1333 QRx4 (DDP)6 8GB 32GB 1066, 1333 1066, 1333 QRx8 (DPP)6 4GB 16GB 1. Physical Rank is used to calculate DIMM Capacity 2. Supported and validated DRAM Densities are 2Gb and 4Gb 3. Command address timing is 1N 4. The speeds are estimated targets and will be verified through simulation 5. For 3SPC/3DPC – Rank Multiplication (RM) >=2 6. DDP – Dual Die Package DRAM stacking. P – Planar monolithic DRAM Dies. S2600WP Memory RAS Modes The S2600WP motherboard supports the following memory reliability, availability, and serviceability (RAS) modes: ● Independent Channel Mode ● Rank Sparing Mode ● Mirrored Channel Mode ● Lockstep Channel Mode Regardless of RAS mode, the requirements for populating within a channel must be met at all times. Note that support of RAS modes that require matching DIMM population between channels (Mirrored and Lockstep) requires that ECC DIMMs be populated. Independent Channel Mode is the only mode that supports non-ECC DIMMs in addition to ECC DIMMs. For RAS modes that require matching populations, the same slot positions across channels must hold the same DIMM type with regards to size and organization. DIMM timings do not have to match, but timings will be set to support all DIMMs populated (that is, DIMMs with slower timings will force faster DIMMs to the slower common timing modes). HR90-2003-D CS-Storm Hardware Guide 68 S2600WP Motherboard Description Independent Channel Mode Channels can be populated in any order in Independent Channel Mode. All four channels may be populated in any order and have no matching requirements. All channels must run at the same interface frequency but individual channels may run at different DIMM timings (RAS latency, CAS Latency, and so on). Rank Sparing Mode In Rank Sparing Mode, one rank is a spare of the other ranks on the same channel. The spare rank is held in reserve and is not available as system memory. The spare rank must have identical or larger memory capacity than all the other ranks (sparing source ranks) on the same channel. After sparing, the sparing source rank will be lost. Mirrored Channel Mode In Mirrored Channel Mode, the memory contents are mirrored between Channel 0 and Channel 2 and also between Channel 1 and Channel 3. As a result of the mirroring, the total physical memory available to the system is half of what is populated. Mirrored Channel Mode requires that Channel 0 and Channel 2, and Channel 1 and Channel 3 must be populated identically with regards to size and organization. 
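One practical consequence of Mirrored Channel Mode is that the operating system sees roughly half of the installed DIMM capacity. The commands below are a minimal sanity check from Linux, assuming the standard dmidecode utility and /proc interfaces are available; they are generic tools, not specific to this motherboard.
List the populated DIMM slots and their sizes:
# dmidecode -t memory | grep "Size:" | grep -v "No Module"
Compare the installed total against what the OS reports (expect about half when mirroring is enabled):
# grep MemTotal /proc/meminfo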
DIMM slot populations within a channel do not have to be identical but the same DIMM slot location across Channel 0 and Channel 2 and across Channel 1 and Channel 3 must be populated the same. Lockstep Channel Mode In Lockstep Channel Mode, each memory access is a 128-bit data access that spans Channel 0 and Channel 1, and Channel 2 and Channel 3. Lockstep Channel mode is the only RAS mode that allows SDDC for x8 devices. Lockstep Channel Mode requires that Channel 0 and Channel 1, and Channel 2 and Channel 3 must be populated identically with regards to size and organization. DIMM slot populations within a channel do not have to be identical but the same DIMM slot location across Channel 0 and Channel 1 and across Channel 2 and Channel 3 must be populated the same. S2600WP Integrated I/O Module The processor’s integrated I/O module provides features traditionally supported through chipset components. The integrated I/O module provides the following features: ● PCI Express Interfaces: The integrated I/O module incorporates the PCI Express® interface and supports up to 40 lanes of PCI Express. Following are key attributes of the PCI Express interface: ○ Gen3 speeds at 8GT/s (no 8b/10b encoding) ○ X16 interface bifurcated down to two x8 or four x4 (or combinations) ○ X8 interface bifurcated down to two x4 ● DMI2 Interface to the PCH: The platform requires an interface to the legacy Southbridge (PCH) which provides basic, legacy functions required for the server platform and operating systems. Since only one PCH is required and allowed for the system, any sockets which do not connect to PCH would use this port as a standard x4 PCI Express® 2.0 interface. ● Integrated IOAPIC: Provides support for PCI Express® devices implementing legacy interrupt messages without interrupt sharing. ● Non Transparent Bridge: PCI Express® non-transparent bridge (NTB) acts as a gateway that enables high performance, low overhead communication between two intelligent subsystems; the local and the remote HR90-2003-D CS-Storm Hardware Guide 69 S2600WP Motherboard Description subsystems. The NTB allows a local processor to independently configure and control the local subsystem, provides isolation of the local host memory domain from the remote host memory domain while enabling status and data exchange between the two domains. ● Intel QuickData Technology: Used for efficient, high bandwidth data movement between two locations in memory or from memory to I/O. Figure 46. S2600WP I/O Block Diagram Riser Slot 3 and slot 4 can only be used in dual processor configurations. With dual processor configurations, there is still an add-in graphic card in the PCI slot, the default video output is still from on-board integrated BMC until the users enable “dual monitor Video” in BIOS. Users need to determine whether Legacy VGA video output is enabled for PCIe slots attached to Processor Socket 1 (PCIe slot 1 and slot 2) or 2 (PCIe slot 3 and slot 4). Socket 1 is the default. You can change “legacy VGA socket” in BIOS setup interface from default “CPU socket 1” to “CPU socket 2” to enable video output through add-in graphic card which is in Riser slot 3 or 4. HR90-2003-D CS-Storm Hardware Guide 70 S2600WP Motherboard Description Figure 47. S2600WP PCIe Express Lanes Block Diagram S2600WP Riser Card Slots The S2600WP provides four riser card slots identified by Riser Slot 1, Riser Slot 2, Riser Slot 3, and Riser Slot 4. The PCIe signals for each riser card slot are supported by each of the two installed processors. 
All lanes routed to Riser Slot 1 and Riser Slot 2 are from CPU 1. All lanes routed to Riser Slot 3 and Riser Slot 4 are from CPU 2. The table lists the PCIe connections to CPU1 and CPU2 on the S2600WP motherboard Table 17. S2600WP CPU1 and CPU2 PCIe Connectivity CPU Port IOU Width Connection CPU1 DMI2 IOU2 x4 PCH (lane reversal, no polarity inversion) CPU1 PE1 IOU2 x8 QDR/FDR InfiniBand CPU1 PE2 IOU0 x16 Riser 1 CPU1 PE3 IOU1 x16 Riser 2 CPU2 DMI2 IOU2 x4 Unused CPU2 PE1 IOU2 x8 Unused CPU2 PE2 IOU0 x16 Riser 3 HR90-2003-D CS-Storm Hardware Guide 71 S2600WP Motherboard Description CPU CPU2 Port PE3 IOU IOU1 Width x16 Connection Riser 4 NOTE: Riser Slot 3 and slot 4 can only be used in dual processor configurations. With dual processor configurations, there is still an add-in graphic card in the PCI slot, the default video output is still from on-board integrated BMC until the users enable “dual monitor Video” in BIOS. Users need to determine whether Legacy VGA video output is enabled for PCIe slots attached to Processor Socket 1 (PCIe slot 1 and slot 2) or 2 (PCIe slot 3 and slot 4). Socket 1 is the default. You can change “legacy VGA socket” in BIOS setup interface from default “CPU socket 1” to “CPU socket 2” to enable video output through add-in graphic card which is in Riser slot 3 or 4. On S2600WP, the riser slot 2 has a x16 PCIe Gen 3 and a x8 PCIe Gen 3 electrical interface in the physical slot. On S2600WPQ and S2600WPF motherboards, the riser slot 2 supports a x16 PCIe Gen 3 electrical interface. Riser3 and riser4 are both customized slots which support 1x16 PCIe Gen3. The S2600WP motherboard supports riser ID which can configure all 4 riser slots to 1x16 PCI Gen 3 port or 2x8 PCI Gen 3 ports. Riser ID Configuration 1 1x16 0 1x8 The placement of the rear IO connectors and layout of the components on the board must be made to support a MD2, low profile card in the Riser1, and a rIOM (Intel Input/Output Module) mounted on a riser carrier for Riser 2. Riser 3 and 4 on S2600WP support off-board standard full height, full length I/O cards including double wide GPU boards. To support GPU boards, each riser provides 66W of 12V power as well as 10W of 3.3V power 2x8 boards being hosted in a customized chassis generate 20W of 3.3V, the number of 12amp pins on the riser is increased to accommodate this. S2600WP Integrated Baseboard Management Controller The Intel® Washington Pass (S2600WP) motherboard utilizes the baseboard management features of the Server Engines® Pilot-III Server Management Controller. The following figure provides an overview of the features implemented on the S2600WP from each embedded controller. Figure 48. S2600WP BMC Components Block Diagram The Integrated BMC is provided by an embedded ARM9 controller and associated peripheral functionality that is required for IPMI-based server management. How firmware uses these hardware features is platform dependent. HR90-2003-D CS-Storm Hardware Guide 72 S2600WP Motherboard Description Features of the Integrated BMC management hardware: ● 400 MHz 32-bit ARM9 processor with memory management unit (MMU) ● Two independent10/100/1000 Ethernet Controllers with RMII/RGMII support ● DDR2/3 16-bit interface with up to 800 MHz operation ● Twelve 10-bit ADCs ● Sixteen fan tachometers ● Eight pulse width modulators (PWM) ● Chassis intrusion logic ● JTAG Master ● Eight I2C interfaces with master-slave and SMBus timeout support. All interfaces are SMBus 2.0 compliant. 
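Because the integrated BMC provides IPMI-based server management, it can be exercised with generic IPMI tooling. The commands below are a minimal sketch assuming ipmitool is installed and the BMC is reachable in-band; the LAN channel number is an assumption and may differ on a given system.
Report the BMC device ID and firmware revision:
# ipmitool mc info
Show the LAN channel settings (IP address, MAC address, gateway):
# ipmitool lan print 1
List the sensors the BMC exposes (temperatures, fan tachometers, voltages):
# ipmitool sdr list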
● Parallel general-purpose I/O Ports (16 direct, 32 shared) ● Serial general-purpose I/O Ports (80 in and 80 out) ● Three UARTs Platform Environmental Control Interface (PECI) ● Six general-purpose timers ● Interrupt controller ● Multiple SPI flash interfaces ● NAND/Memory interface ● Sixteen mailbox registers for communication between the BMC and host ● LPC ROM interface ● BMC watchdog timer capability ● SD/MMC card controller with DMA support ● LED support with programmable blink rate controls on GPIOs ● Port 80h snooping capability ● Secondary Service Processor (SSP), which provides the HW capability of offloading time critical processing tasks from the main ARM core. ● Server Engines® Pilot III contains an integrated SIO, KVMS subsystem and graphics controller with the following features: Super I/O Controller The integrated super I/O controller provides support for the following features as implemented on the S2600WP: ● Keyboard style/BT interface for BMC support ● Two fully functional serial ports, compatible with the 16C550 ● Serial IRQ support ● Up to 16 shared GPIO available for host processor ● Programmable wake-up event support ● Plug and play register set ● Power supply control HR90-2003-D CS-Storm Hardware Guide 73 S2600WP Motherboard Description Keyboard and Mouse Support The S2600WP does not support PS/2 interface keyboards and mice. However, the system BIOS recognizes USB specification-compliant keyboard and mice. Wake-up Control The super I/O contains functionality that allows various events to power on and power off the system. Graphics Controller and Video Support The integrated graphics controller provides support for the following features as implemented on the S2600WP: ● Integrated graphics core with 2D hardware accelerator ● DDR-2/3 memory interface supports up to 256 MB of memory ● Supports all display resolutions up to 1600 x 1200 16bpp @ 60Hz ● High speed integrated 24-bit RAMDAC The integrated video controller supports all standard IBM VGA modes. This table shows the 2D modes supported for both CRT and LCD: Table 18. Video Modes 2D Mode Refresh Rate (Hz) 2D Video Mode Support 8 bpp 16 bpp 32 bpp 640x480 60, 72, 75, 85, 90, 100, 120, 160, 200 Supported Supported Supported 800x600 60, 70, 72, 75, 85, 90, 100, 120,160 Supported Supported Supported 1024x768 60, 70, 72, 75,85,90,100 Supported Supported Supported 1152x864 43,47,60,70,75,80,85 Supported Supported Supported 1280x1024 60,70,74,75 Supported Supported Supported 1600x1200 60 Supported Supported Supported Video resolutions at 1600x1200 are supported only through the external video connector located on the rear I/O section of the S2600WP. Utilizing the optional front panel video connector may result in lower video resolutions. The S2600WP provides two video interfaces. The primary video interface is accessed using a standard 15-pin VGA connector found on the back edge of the S2600WP. In addition, video signals are routed to a 14-pin header labeled “FP_Video” on the leading edge of the S2600WP, allowing for the option of cabling to a front panel video connector. Attaching a monitor to the front panel video connector will disable the primary external video connector on the back edge of the board. The BIOS supports dual-video mode when an add-in video card is installed. In the dual mode (on-board video = enabled, dual monitor video = enabled), the onboard video controller is enabled and is the primary video device. The add-in video card is allocated resources and is considered the secondary video device. 
The BIOS Setup utility provides options to configure the feature. HR90-2003-D CS-Storm Hardware Guide 74 S2600WP Motherboard Description Remote KVM The Integrated BMC contains a remote KVMS subsystem with the following features: ● USB 2.0 interface for keyboard, mouse and remote storage such as CD/DVD ROM and floppy ● USB 1.1/USB 2.0 interface for PS2 to USB bridging, remote keyboard and mouse ● Hardware-based video compression and redirection logic ● Supports both text and graphics redirection ● Hardware assisted video redirection using the frame processing engine ● Direct interface to the integrated graphics controller registers and frame buffer ● Hardware-based encryption engine HR90-2003-D CS-Storm Hardware Guide 75 S2600TP Motherboard Description S2600TP Motherboard Description The S2600 Taylor Pass motherboard (S2600TP) is designed to support the Intel® Xeon® E5-2600 v3 and v4 processor families. Previous generation Xeon processors are not supported. Figure 49. S2600TP Motherboard Table 19. S2600TP Motherboard Specifications Feature Description Processors Support for one or two Intel Xeon Processor E5-2600 v3 and v4 series processors Memory ● Up to eight GT/s Intel QuickPath Interconnect (Intel QPI) ● LGA 2011/Socket R processor socket ● Maximum thermal design power (TDP) of 160 W ● Sixteen DIMM slots across eight memory channels ● Registered DDR4 (RDIMM), Load Reduced DDR4 (LRDIMM) ● Memory DDR4 data transfer rates of 1600/1866/2133/2400 MT/s Chipset Intel C610 chipset External I/O Connections ● DB-15 video connectors ● Two RJ-45 network interfaces for 10/100/1000 Mbps ● One dedicated RJ-45 port for remote server management ● One stacked two-port USB 2.0 (Port 0/1) connector ● One InfiniBand QDR QSFP port (SKU: S2600TPF only) HR90-2003-D CS-Storm Hardware Guide 76 S2600TP Motherboard Description Feature Description Internal I/O connectors/headers ● Bridge slot to extend board I/O ○ Four SATA 6 Gb/s ports to backplane ○ Front control panel signals ○ One SATA 6 Gb/s port for SATA DOM ○ One USB 2.0 connector (port 10) ● One internal USB 2.0 connector (port 6/7) ● One 2x7-pin header for system fan module ● One 1x12-pin control panel connector ● One DH-10 serial Port A connector ● One SATA 6 Gb/s port for SATA DOM ● Four SATA 6 Gb/s connectors (port 0/1/2/3) ● One 2x4-pin header for Intel RMM4 Lite ● One 1x4-pin header for Storage Upgrade Key ● One 1x8-pin backup power control connector Power Connections Two sets of 2x3-pin connectors (power backplane) System Fan Support ● One 2x7-pin connector ● Three 1x8-pin fan connectors Add-in Riser Support Four PCIe Gen III riser slots ● Riser slot 1 provides x16 lanes ● Riser slot 2 provides x24 lanes on S2600TP and x16 lanes on S2600TPF ● Riser slot 3 provides x24 lanes ● Riser slot 4 provides x16 lanes There is one Bridge Slot for board I/O expansion. 
Video:
● Integrated 2D video graphics controller
● 16 MB DDR3 memory
On-board storage and controller options: Ten SATA 6 Gb/s ports, two of which are SATA DOM compatible
RAID support:
● Intel Rapid Storage RAID Technology (RSTe) 4.0
● Intel Embedded Server RAID Technology 2 (ESRT2) with optional RAID C600 Upgrade Key to enable SATA RAID 5
Server management:
● On-board Emulex Pilot III controller
● Support for Intel Remote Management Module 4 Lite solutions
● Intel Light-Guided Diagnostics on field replaceable units
● Support for Intel System Management Software
● Support for Intel Intelligent Power Node Manager (PMBus®)
S2600TP Component Locations
S2600TP component locations and connector types are shown below.
Figure 50. S2600TP Component Locations (callouts: riser slots 1-4, bridge board, system fans 1-3, fan connector, control panel, SATA ports 0-3, SATA DOM, SATA SGPIO, IPMB, diagnostic LEDs, RMM4 Lite, SATA RAID 5 upgrade key, serial port A, backup power control, CR2032 (3 V) battery, main power 1/2, CPU 1/2, VRS, VGA, NIC 1/2, USB, status and ID LEDs, HDD activity LED, InfiniBand port (QSFP+), and dedicated management port)
NOTE: The InfiniBand port (QSFP+) is only available on the S2600TPF.
Figure 51. S2600TP Connectors (rear connectors: ID LED, status LED, NIC 1 and NIC 2 (RJ45), video (DB15), stacked 2-port USB 2.0, dedicated management port (RJ45), POST code LEDs (8), and, on the S2600TPF only, the InfiniBand port (QSFP+) with its link and activity LEDs)
RJ45/NIC LEDs. Activity LEDs are included for each RJ45 connector. The link/activity LED (at the right of the connector) indicates a network connection when On, and transmit/receive activity when blinking. The speed LED (at the left of the connector) indicates 1,000-Mbps operation when green, 100-Mbps operation when amber, and 10-Mbps operation when Off. The following table provides an overview of the LEDs.
LED Color / LED State / NIC State:
● Green/Amber (A): Off = 10 Mbps; Amber = 100 Mbps; Green = 1,000 Mbps
● Green (B): On = active connection; Blinking = transmit/receive activity
ID LED. This blue LED is used to visually identify a specific motherboard/server installed in the rack or among several racks of servers. The ID button on the front of the server/node toggles the state of the chassis ID LED. If the LED is Off, pushing the ID button lights the ID LED. It remains lit until the button is pushed again or until a chassis identify command is received to change the state of the LED. The LED has a solid On state when it is activated through the ID button; it has a 1 Hz blink when activated through a command.
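The chassis identify command mentioned above can be issued with any standard IPMI client; this guide does not prescribe one, but a minimal sketch using the open-source ipmitool (the remote address and credentials are placeholders, assuming a BMC LAN channel has been configured as described later in this guide) looks like this:
# Blink the chassis ID LED for 60 seconds on the local node (in-band)
ipmitool chassis identify 60
# Or remotely, over the BMC LAN interface
ipmitool -I lanplus -H IP_Address -U root -P password chassis identify 60
# Turn the ID LED off again
ipmitool chassis identify 0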
Status LED. This bicolor LED lights green (status) or amber (fault) to indicate the current health of the server. Green indicates normal or degraded operation. Amber indicates the hardware state and overrides the green status. The states detected by the BMC and other controllers are included in the Status LED state. The Status LED on the front of the server/node and this motherboard Status LED are tied together and show the same state. The System Status LED states are driven by the on-board platform management subsystem. A description of each LED state follows.
Off (not ready; system is not operating):
● System is powered off (AC and/or DC)
● System is in Energy-using Product (EuP) Lot 6 Off mode/regulation (note 1)
● System is in the S5 soft-off state
Green, solid on (OK): Indicates the system is running (in S0 state) and status is healthy; there are no system errors. AC power is present, the BMC has booted, and management is up and running. After a BMC reset with the chassis ID solid on, the BMC is booting Linux; control has been passed from BMC u-Boot to BMC Linux, and the system remains in this state for approximately 10-20 seconds.
Green, blinking at ~1 Hz (degraded; the system is operating in a degraded state although still functional, or is operating in a redundant state with an impending failure warning). System degraded:
● Power supply/fan redundancy loss
● Fan warning or failure
● Non-critical threshold crossed (temperature, voltage, power)
● Power supply failure
● Unable to use all installed memory
● Correctable memory errors beyond threshold
● Battery failure
● Error during BMC operation
Amber, solid on (critical, non-recoverable; system is halted): Fatal alarm; the system has failed or shut down.
Amber, blinking at ~1 Hz (non-critical; the system is operating in a degraded state with an impending failure warning, although still functioning). Non-fatal alarm; system failure likely:
● Critical threshold crossed (temperature, voltage, power)
● Hard drive fault
● Insufficient power from PSUs
● Insufficient cooling fans
1. The overall power consumption of the system is referred to as System Power States. There are a total of six different power states, ranging from S0 (the system is completely powered on and fully operational) to S5 (the system is completely powered off), with the states S1, S2, S3, and S4 referred to as sleeping states.
BMC Boot/Reset Status LED Indicators. During the BMC boot or BMC reset process, the System Status and System ID LEDs are used to indicate BMC boot process transitions and states. A BMC boot occurs when AC power is first applied to the system. A BMC reset occurs after a BMC firmware update, after receiving a BMC cold reset command, and upon a BMC watchdog-initiated reset. These two LEDs define the following states during the BMC boot/reset process:
● BMC/video memory test failed: chassis ID LED solid blue; status LED solid amber. Non-recoverable condition. Contact Cray service for information on replacing the motherboard.
● Both universal bootloader (u-Boot) images bad: chassis ID LED blinking blue (6 Hz); status LED solid amber. Non-recoverable condition. Contact Cray service for information on replacing the motherboard.
● BMC in u-Boot: chassis ID LED blinking blue (3 Hz); status LED blinking green (1 Hz). Blinking green indicates a degraded state (no manageability); blinking blue indicates u-Boot is running but has not transferred control to BMC Linux. The system remains in this state for 6-8 seconds after a BMC reset while it pulls the Linux image into flash.
● BMC booting Linux: chassis ID LED solid blue; status LED solid green. Solid green with solid blue after an AC cycle/BMC reset indicates that control has passed from u-Boot to BMC Linux. Remains in this state for ~10-20 seconds.
● End of BMC boot/reset process; normal system operation: chassis ID LED off; status LED solid green. Indicates BMC Linux has booted and manageability functionality is up and running. Fault/status LEDs operate normally.
POST Code Diagnostic LEDs. There are two rows of four POST code diagnostic LEDs (eight total) on the back edge of the motherboard.
These LEDs are difficult to view through the back of the server/node chassis. During the system boot process, the BIOS executes a number of platform configuration processes, each of which is assigned a specific hex POST code number. As each configuration routine is started, the BIOS displays the given POST code on the POST code LEDs. To assist in troubleshooting a system hang during the POST process, the LEDs display the last POST event run before the hang.
During early POST, before system memory is available, serious errors that would prevent a system boot with data integrity cause a system halt with a beep code and a memory error code displayed through the POST code LEDs. Less fatal errors cause a POST error code to be generated as a major error. POST error codes are displayed in the BIOS Setup Error Manager screen and are logged in the system event log (SEL), which can be viewed with the selview utility. The BMC deactivates the POST code LEDs after POST is completed.
InfiniBand Link/Activity LED. These dedicated InfiniBand Link/Activity LEDs are not visible from the back of the server/node chassis. The amber LED indicates an established logical link. The green LED indicates an established physical link.
NOTE: The InfiniBand port only applies to the S2600TPF motherboard.
S2600TP Architecture
The architecture of the S2600TP motherboard is developed around the integrated features and functions of the Intel® processor E5-2600 v3/v4 product families, the Intel C610 chipset, the Intel Ethernet Controller I350 (1 GbE), and the Mellanox Connect-IB adapter (S2600TPF only).
Figure 52. S2600TPF Block Diagram (shows the two processors connected by QPI links at 9.6 GT/s, with riser slots 1 through 4, the bridge board, and the single-port QSFP+ interface)
Intel E5-2600 v3 Processor Features
The Intel® Xeon® processor E5-2600 v3 (S2600WT/S2600KP/S2600TP) product family combines several key system components into a single processor package, including the CPU, Integrated Memory Controller (IMC), and Integrated IO Module (IIO). Each processor package includes two Intel QuickPath Interconnect point-to-point links capable of up to 9.6 (v3) GT/s, up to 40 lanes of Gen 3 PCI Express links capable of 8.0 GT/s, and 4 lanes of DMI2/PCI Express Gen 2 interface with a peak transfer rate of 5.0 (v2)/4.0 (v3) GT/s. The following list provides an overview of the key processor features and functions that help to define the performance and architecture of the S2600WT/S2600KP/S2600TP motherboards. For more comprehensive processor-specific information, refer to the Intel Xeon processor E5-2600 v3 product family specifications available at: www.intel.com.
Processor features:
● Maximum execution cores:
○ S2600KP/S2600TP - 12
○ S2600WT - 18
● When enabled, each core can support two threads (Intel Hyper-Threading Technology)
● 46-bit physical addressing and 48-bit virtual address space
● 1 GB large page support for server applications
● A 32-KB instruction and 32-KB data first-level cache (L1) for each core
● A 256-KB shared instruction/data mid-level (L2) cache for each core
● Up to 2.5 MB per core instruction/data last level cache (LLC)
Supported technologies:
● Intel Virtualization Technology (Intel VT)
● Intel Virtualization Technology for Directed I/O (Intel VT-d)
● Intel Advanced Vector Extensions 2 (Intel AVX2)
● Intel Hyper-Threading Technology
● Execute Disable Bit
● Advanced Encryption Standard (AES)
● Intel Turbo Boost Technology
● Intel Intelligent Power Technology
● Enhanced Intel SpeedStep Technology
● Intel Node Manager 3.0
● Non-Transparent Bridge (NTB)
● Intel OS Guard
● Intel Secure Key
● S2600WT:
○ Intel Trusted Execution Technology (Intel TXT)
○ Trusted Platform Module (TPM) 1.2 - optional
Intel E5-2600 v4 Processor Features
The Intel® Xeon® E5-2600 v4 product family (codename Broadwell-EP) provides over 20 percent more cores and cache than the previous v3 generation (Haswell-EP). The E5-2600 v4 product family supports faster memory and includes integrated technologies for accelerating workloads such as database transactions and vector operations. The E5-2600 v4 processors are manufactured using Intel's 14-nanometer process technology versus the 22-nanometer process used for Haswell-EP. Both E5-2600 v3 and v4 processors use the same LGA 2011-v3 (R3) sockets.
The E5-2600 v4 processors also support 3D die-stacked LRDIMMs (3DS LRDIMMs), along with CRC error correction on writes to DDR4 memory. With three 3DS LRDIMMs per channel, the maximum supported frequency drops to 1600 MHz.
E5-2600 v4 product family processor features:
● Up to 22 cores and 44 threads per socket
● Up to 55 MB of last-level cache (LLC) per socket
● Up to 2400 MT/s DDR4 memory speed
Optimized Intel® AVX 2.0
Cores running AVX workloads do not automatically decrease the maximum turbo frequency of other cores in the socket running non-AVX workloads. Previously, one AVX instruction on one core slowed the clock speed of all other cores on the same socket. Now, only the cores that run AVX code reduce their clock speed, allowing the other cores to run at higher speeds.
Intel Transactional Synchronization Extensions (TSX)
TSX increases the performance of transaction-intensive workloads. It provides a flexible mechanism that accelerates multi-threaded workloads by dynamically exposing otherwise hidden parallelism. TSX helps boost performance for online transaction processing (OLTP) and other multi-threaded workloads that are slowed by memory locking. TSX removes the locking mechanisms that keep threads from manipulating the same data in main memory and lets all threads complete their work in parallel.
When running mixed AVX workloads, Intel Turbo Boost Technology with improvements takes advantage of power and thermal headroom to increase processor frequencies across a wide range of workloads.
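Whether a given node's processors expose these capabilities (AVX2, TSX/RTM, RDSEED, AES, and so on) can be checked from Linux without entering the BIOS. The sketch below reads the standard /proc/cpuinfo flags; it is a generic Linux check rather than a utility documented in this guide.
# Confirm the installed processor model (for example, E5-2600 v3 versus v4)
grep -m1 'model name' /proc/cpuinfo
# List which of the relevant instruction-set extensions the cores report
grep -o -w -E 'avx2|hle|rtm|rdseed|aes' /proc/cpuinfo | sort -u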
Resource Director Technology (RDT)
The new RDT enables operating systems and virtual machine managers to monitor and manage shared platform resources in detail, down to memory bandwidth or cache allocation. RDT allows for the partitioning of resources on a per-application or per-VM basis, and works by assigning RMID attributes to a particular thread, which can then be managed with RDT. Intel Node Manager complements RDT by monitoring and controlling server power, thermals, and utilization. In combination with Intel® Data Center Manager, Intel Node Manager enables IT departments to dynamically optimize energy consumption at every level, from individual servers, racks, and rows to entire data centers. RDT provides enhanced telemetry data to enable administrators to automate provisioning and increase resource utilization. This includes Cache Allocation Technology, Code and Data Prioritization (CDP), Memory Bandwidth Monitoring (MBM), and enhanced Cache Monitoring Technology (CMT).
Memory Bandwidth Monitoring (MBM)
The new MBM builds on the Cache Monitoring Technology (CMT) infrastructure to allow monitoring of bandwidth from one level of the cache hierarchy to the next, in this case focusing on the L3 cache, which is typically backed directly by system memory. With this enhancement, memory bandwidth can be monitored.
Gather Improvements
Performs faster vector gather operations.
IO Directory Cache (IODC)
Reduces memory access time for IO transactions to shared lines.
Hardware Controlled Power Management (HWPM)
HWPM enables the hardware to make power management decisions. It moves processor performance state (P-state) control from the operating system into the processor hardware. P-states are used to manage processor power consumption.
Enhanced Hardware Assisted Security Features
Crypto Speedup
This feature improves RSA public-key and AES-GCM symmetric-key cryptography. The new ADOX and ADCX cryptography instructions accelerate secure session initiation protocols based on RSA, ECC, and the Secure Hash Algorithm (SHA). These instructions enable the processor to more efficiently perform a mathematical operation frequently used in public key cryptography.
New Random Seed Generator (RDSEED)
Enables faster 128-bit encryption. The E5-2600 v4 family provides an integrated random number generator for creating security keys and a random bit generator for seeding software-based solutions. Both technologies help to provide high-quality keys for enhanced security. Use RDSEED for seeding a software pseudorandom number generator (PRNG) of arbitrary width. Use RDRAND for applications that require high-quality random numbers.
Supervisor Mode Access Protection (SMAP)
SMAP terminates popular attack vectors against operating systems and prevents malicious attempts to trick the operating system into branching off user data. SMAP provides hardware-based protection against privilege escalation attacks.
S2600x Processor Support
The S2600 motherboard series (S2600WP/S2600TP/S2600KP/S2600WT) supports the Intel® Xeon® E5-2600 (v2, v3, v4) processors as shown in the following table.
Table 20. S2600 Socket, Processor, ILM, and TDP Specifications (Motherboard / Socket / Processor / Chipset / ILM mounting hole dimensions / Maximum board TDP)
● S2600WP: LGA 2011 / E5-2600 v2 / C600-A / 84x106 (2011 narrow) / 135 W
● S2600TP: LGA 2011-3 / E5-2600 v3 and v4 / C610 / 56x94 (2011-3 narrow) / 160 W
● S2600KP: LGA 2011-3 / E5-2600 v3 and v4 / C610 / 80x80 (2011-3 square) / 160 W
● S2600WT: LGA 2011-3 / E5-2600 v3 and v4 / C612 / 56x94 (2011-3 narrow) / 145 W
The S2600 motherboard series uses one of two types of LGA 2011 processor sockets.
These sockets have 2011 protruding pins that touch contact points on the underside of the processor. The protruding pins inside the processor socket are extremely sensitive. Other than the processor, no object should make contact with the pins inside the socket. A damaged socket pin may render the socket inoperable and produce erroneous processor or other system errors.
Processor Socket Assembly
Each processor socket of the S2600 motherboards is pre-assembled with an independent loading mechanism (ILM) retention device that holds the CPU in place while applying the exact amount of force required for the CPU to be properly seated. The ILM and back plate allow for secure placement of the processor and processor heat sink as shown below.
As part of their design, ILMs have differently placed protrusions that are intended to mate with cutouts in processor packages. These protrusions, also known as ILM keying, prevent installation of incompatible processors into otherwise physically compatible sockets. Keying also prevents ILMs from being mounted with a 180-degree rotation relative to the socket. Different variants (or generations) of the LGA 2011 socket and associated processors come with different ILM keying, which makes it possible to install processors only into generation-matching sockets.
Processors that are intended to be mounted into LGA 2011-v3 sockets are all mechanically compatible regarding their dimensions and ball pattern pitches, but the designations of contacts are different between generations of the LGA 2011 socket and processors, making them electrically and logically incompatible. The original LGA 2011 socket is used for Sandy Bridge-E/EP and Ivy Bridge-E/EP processors. LGA 2011-v3 (R3) sockets are used for Haswell-E and Broadwell-E processors.
There are two types of ILM, with different shapes and heat sink mounting hole patterns for each. The square ILM is the standard type, and the narrow ILM is available for space-constrained applications. A matching heat sink is required for each ILM type.
Figure 53. Intel Processor Socket Assembly
Processor Heat Sinks
Two types of heat sinks are used with S2600 series motherboards in rackmount/node chassis. Each of the S2600 boards uses two different heat sinks for processors 1 and 2. The two heat sinks are not interchangeable. The heat sinks have thermal interface material (TIM) on the underside. Use caution so as not to damage the TIM when removing or replacing a heat sink. The heat sinks are attached to the motherboard with four Phillips screws. The heat sink fins must be aligned to the front and back of the chassis for correct airflow. Incorrect alignment of a heat sink will cause serious thermal damage. A heat sink is required even if no processor is installed.
Table 21. S2600 Series Heat Sinks
● S2600WT/S2600TP/S2600WP:
○ Processor 1: Cu/Al 84 mm x 106 mm heat sink (rear), part FXXCA84X106HS
○ Processor 2: Ex-Al 84 mm x 106 mm heat sink (front), part FXXEA84X106HS
● S2600KP:
○ Processor 1: Cu/Al 91.5 mm x 91.5 mm heat sink (rear), part FXXCA91X91HS
○ Processor 2: Ex-Al 91.5 mm x 91.5 mm heat sink (front), part FXXEA91X91HS2
Intel Motherboard System Software
The motherboard includes an embedded software stack to enable, configure, and support various system functions on CS-Storm systems.
This software stack includes:
● Motherboard BIOS
● Baseboard management controller (BMC) firmware
● Management engine (ME) firmware
● Field replaceable unit (FRU) data
● Sensor data record (SDR) data
● Host channel adapter (HCA) firmware for systems using Mellanox InfiniBand
The system software is pre-programmed on the motherboard during factory assembly, making the motherboard functional at first power on after system integration. FRU and SDR data is installed onto the motherboard during system integration to ensure the embedded platform management subsystem is able to provide the best performance and cooling for the final system configuration. The motherboard software stack, as well as firmware/drivers for other equipment, may be updated during Cray manufacturing and system testing to ensure the most reliable system operation.
IMPORTANT: Installing firmware upgrade packages/releases
● Customers may find individual or combined firmware packages/releases on Intel websites. Cray and Intel do not support mixing and matching package components from different releases. Doing so could damage the motherboard. Customers should NOT download firmware packages directly from Intel, unless specifically instructed to do so by Cray. CCS Engineering must validate and approve firmware supplied by Intel and will supply Cray customers with the appropriate packages. To request a firmware package from Cray, customers must submit a report of the current version information as printed by the Intel One-Boot Flash Update Utility. To generate this report, customers should run flashupdt -i.
● Mellanox and Intel collaborate to release periodic updates to the HCA firmware. Customers should NOT obtain HCA firmware packages directly from Intel or Mellanox, unless specifically instructed to do so by Cray. CCS Engineering must validate and approve firmware and will supply Cray customers with the appropriate packages. To request an HCA firmware image, customers must provide the HCA's PSID and current firmware version. Firmware images are custom built for specific PSIDs, and mismatching of PSID between firmware and HCA is not allowed. Use the flint command to obtain the PSID for the Mellanox on-board HCA.
FRU and SDR Data
As part of initial system integration, the motherboard/system is loaded with the proper FRU and SDR data to ensure the embedded platform management system is able to monitor the appropriate sensor data and operate the system with optimum cooling and performance. The BMC supports automatic configuration of the management engine after any hardware configuration changes are made. Once Cray completes the initial FRU/SDR update, subsequent automatic configuration occurs without the need to perform additional SDR updates or provide other user input to the system when any of the following components are added or swapped:
● Processors
● I/O modules (dedicated slot modules)
● Storage modules (dedicated slot modules)
● Power supplies
● Fans
● Fan upgrades from non-redundant to redundant
● Hot swap backplanes
● Front panels
S2600WP and S2600TP Memory Population Rules
A total of 16 DIMMs are configured with two CPUs, four channels per CPU, and two DIMMs per channel. The nomenclature for DIMM sockets is detailed in the following table. DIMM socket locations are shown in the illustration.
NOTE: Although mixed DIMM configurations are supported, Cray performs platform validation only on systems that are configured with identical DIMMs installed.
Table 22. S2600WP/S2600TP DIMM Channels
Processor socket 1: Channel A (0): A1, A2; Channel B (1): B1, B2; Channel C (2): C1, C2; Channel D (3): D1, D2
Processor socket 2: Channel E (0): E1, E2; Channel F (1): F1, F2; Channel G (2): G1, G2; Channel H (3): H1, H2
Figure 54. S2600WP/S2600TP Memory Population Rules
● Each processor provides four banks of memory, each capable of supporting up to 4 DIMMs.
● DIMMs are organized into physical slots on DDR3/DDR4 memory channels that belong to processor sockets.
● The memory channels from processor socket 1 are identified as Channels A, B, C, and D. The memory channels from processor socket 2 are identified as Channels E, F, G, and H.
● The silk-screened DIMM slot identifiers on the board provide information about the channel, and therefore the processor, to which they belong. For example, DIMM_A1 is the first slot on Channel A on processor 1; DIMM_E1 is the first DIMM socket on Channel E on processor 2.
● The memory slots associated with a given processor are unavailable if the corresponding processor socket is not populated.
● A processor may be installed without populating the associated memory slots, provided a second processor is installed with associated memory. In this case, the memory is shared by the processors. However, the platform suffers performance degradation and latency due to the remote memory.
● Processor sockets are self-contained and autonomous. However, all memory subsystem support (such as memory RAS and error management) in the BIOS setup is applied commonly across processor sockets.
The following generic DIMM population requirements generally apply:
● All DIMMs must be DDR3 (S2600WP) or DDR4 (S2600TP) DIMMs.
● Unbuffered DIMMs should be ECC (S2600WP).
● Mixing of registered and unbuffered DIMMs is not allowed per platform.
● Mixing of LRDIMMs with any other DIMM type is not allowed per platform.
● Mixing of DDR3/DDR4 voltages is not validated within a socket or across sockets by Intel. If 1.35 V (DDR3L) and 1.50 V (DDR3) DIMMs are mixed, the DIMMs run at 1.50 V.
● Mixing of DDR3/DDR4 operating frequencies is not validated within a socket or across sockets by Intel. If DIMMs with different frequencies are mixed, all DIMMs run at the common lowest frequency.
● Quad-rank RDIMMs (S2600WP) are supported but not validated by Intel. When populating a quad-rank DIMM with a single- or dual-rank DIMM in the same channel on the S2600TP, the quad-rank DIMM must be populated farthest from the processor. Intel MRC checks for correct DIMM placement.
● A maximum of eight logical ranks (ranks seen by the host) per channel is allowed.
● Mixing of ECC and non-ECC DIMMs is not allowed per platform.
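The guide does not name a tool for checking the installed DIMM population from the operating system. One common option on Linux nodes is the standard dmidecode utility, which is not part of the Cray/Intel tool set described here, so treat the following as a generic sketch:
# List every DIMM slot with its locator, size, speed, and type (run as root)
dmidecode -t memory | grep -E '^[[:space:]]*(Locator|Size|Speed|Type):'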
Publishing System Memory
● The BIOS displays the “Total Memory” of the system during POST if Display Logo is disabled in the BIOS setup. This is the total size of memory discovered by the BIOS during POST, and is the sum of the individual sizes of the installed DDR3/DDR4 DIMMs in the system.
● The BIOS displays the “Effective Memory” of the system in the BIOS setup. The term Effective Memory refers to the total size of all DDR3/DDR4 DIMMs that are active (not disabled) and not used as redundant units.
● The BIOS provides the total memory of the system on the main page of the BIOS setup. This total is the same as the amount described by the first bullet above.
● If Quiet Boot/Display Logo is disabled, the BIOS displays the total system memory on the diagnostic screen at the end of POST. This total is the same as the amount described by the first bullet above.
● Some operating systems do not display the total physical memory. They display the amount of physical memory minus the approximate amount of memory space used by BIOS components.
Intel Motherboard Accessory Options
The following server management and security options are available on CS-Storm systems and are described below:
● RMM4 Lite
● Embedded RAID Level 5
● Trusted Platform Module (TPM)
RMM4 Lite
The optional Intel Remote Management Module 4 (RMM4) Lite is a small board. The RMM4 enables remote keyboard, video, and mouse (KVM) and media redirection on the server/node system from the built-in BMC web console, accessible from a remote web browser. A 7-pin connector is included on the motherboard for plugging in the RMM4 Lite module. This connector is not compatible with the RMM3 module.
The BMC integrated on the motherboard supports basic and advanced server management features. Basic features are available by default. Advanced management features are enabled with the addition of the RMM4 Lite key. After the RMM4 Lite module is installed, the advanced management features are available through both the RJ45 dedicated management port (S2600WT/S2600KP/S2600TP) and the on-board BMC-shared NIC ports. The RMM4 captures, digitizes, and compresses video and transmits it with keyboard and mouse signals to and from a remote computer.
Figure 55. RMM4 Lite Installation
How to Enable RMM4 Lite
This procedure describes how to enable RMM4 for remote access control of a management node, which helps site service personnel remotely power a management node down and up. The procedure uses the syscfg utility, which can modify BIOS, CMOS, and IPMI settings on the node's motherboard. For more information, refer to the Intel® Remote Management Module 4 and Integrated BMC Web Console User Guide. The following procedure uses syscfg commands to configure the BMC LAN and RMM4. The BIOS setup and IPMI tools can also be used.
1. Log in to the management node as root.
2. Use the following ipmitool command, where LAN_virtual_channel is a numeric LAN virtual channel, to display the LAN settings:
mgmt1# ipmitool lan print LAN_virtual_channel
LAN channel 1 is usually already in use from the factory defaults, so select an unused LAN channel for configuring. LAN channel 3 is used in this example.
3. Create a privileged user account for root as user 2 and enable it with a password for use on LAN channel 3:
mgmt1# /ha_cluster/tools/bin/syscfg /u 2 "root" "password"
mgmt1# /ha_cluster/tools/bin/syscfg /ue 2 enable 3
4. Configure a static IP address and subnet mask for LAN channel 3:
mgmt1# /ha_cluster/tools/bin/syscfg /le 3 static IP_Address Subnet
5. Configure a default gateway for LAN channel 3:
mgmt1# /ha_cluster/tools/bin/syscfg /lc 3 12 Default_Gateway
How to Remotely Access the RMM4 Module
1. Open the configured IP address of the RMM4 in a web browser (IE or Firefox): https://RMM4_IP_Address
2. Authenticate the user account with the credentials created in the How to Enable RMM4 Lite procedure above.
3. The RMM4 GUI should appear. It allows the user to remotely access the management node console, power on/off functions, and various other system functions that can be controlled remotely.
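Once the BMC LAN channel has been configured as above, basic power control is also possible without the web GUI, using any IPMI 2.0 client. A minimal sketch with the open-source ipmitool follows; IP_Address, root, and password are the values configured in the How to Enable RMM4 Lite procedure and should be replaced with the site's own settings.
# Query the current power state of the remote management node
ipmitool -I lanplus -H IP_Address -U root -P password chassis power status
# Gracefully power the node down, then power it back up
ipmitool -I lanplus -H IP_Address -U root -P password chassis power soft
ipmitool -I lanplus -H IP_Address -U root -P password chassis power on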
Embedded RAID Level 5
Most of the CS-Storm motherboards include support for the following two embedded software RAID options. Both options support RAID levels 0, 1, 5, and 10. SATA RAID 5 support is provided with the appropriate Intel RAID C600 Upgrade Key for ESRT2. For RSTe, RAID 5 is available as standard (no option key is required).
● Intel® Embedded Server RAID Technology 2 (ESRT2), based on LSI® MegaRAID SW RAID technology
● Intel Rapid Storage Technology (RSTe)
The RAID upgrade key is a small PCB that carries up to two security EEPROMs that are read by the system ME to enable different versions of the LSI RAID 5 software stack. Installing the optional RAID C600/SATA RAID upgrade key (RKSATA4R5) on the motherboard enables Intel ESRT2 SATA RAID 5.
RAID level 5 provides highly efficient storage while maintaining fault tolerance on three or more drives. RAID 5 is well suited for applications that require high amounts of storage while maintaining fault tolerance. With RAID 5, both data and parity information are striped across three or more hard drives. To use RAID 5, a minimum of three hard disk drives is required.
RAID partitions created with RSTe or ESRT2 cannot span the two embedded SATA controllers. Only drives attached to a common SATA controller can be included in a RAID partition. The <F2> BIOS Setup utility provides options to enable/disable RAID and to select which embedded RAID software option to use.
Figure 56. RAID 5 Upgrade Key Installation
Trusted Platform Module (TPM)
Some motherboards support the optional Trusted Platform Module (TPM), which plugs into a 14-pin connector labeled TPM. The TPM is a hardware-based security device that addresses concerns about boot process integrity and offers better data protection. The TPM must be enabled through the Security tab in the <F2> BIOS Setup utility. TPM protects the system start-up process by ensuring it is tamper-free before releasing system control to the operating system. A TPM device provides secured storage for data such as security keys and passwords. A TPM device has encryption and hash functions. A TPM device is secured from external software attacks and physical theft. A pre-boot environment, such as the BIOS and OS loader, uses the TPM to collect and store unique measurements from multiple factors within the boot process to create a system fingerprint. This unique fingerprint remains the same unless the pre-boot environment is tampered with, and it is used to compare against future measurements to verify boot process integrity.
BIOS Security Features
The motherboard BIOS supports a variety of system security options designed to prevent unauthorized system access or tampering with server/node settings on CS-Storm systems. System security options include:
● Password protection
● Front panel lockout
The BIOS Security screen allows a user to enable and set user and administrative passwords and to lock the front panel buttons so they cannot be used. The screen also allows the user to enable and activate the Trusted Platform Module (TPM) on motherboards configured with this option.
Entering the BIOS Setup
The <F2> BIOS Setup utility is accessed during POST. To enter the BIOS Setup using a keyboard (or emulated keyboard), press the <F2> key during boot time when the logo or POST diagnostic screen is displayed. The Main screen is displayed unless serious errors have occurred, in which case the Error Manager screen is displayed.
At initial system power on, a USB keyboard is not functional until the USB controller has been initialized during the POST process. When the USB controller is initialized, the system beeps once. Only after that time are key strokes from a USB keyboard recognized, allowing access into the <F2> BIOS Setup utility. The following message is displayed on the diagnostic screen or under the Quiet Boot logo screen:
Press <F2> to enter setup, <F6> Boot Menu, <F12> Network Boot
After pressing the <F2> key, the system eventually loads the BIOS Setup utility and displays the BIOS Setup Main Menu screen. Whenever information is changed (except date and time), the system requires a save and reboot (<F10>) for the changes to take effect. Pressing the <Esc> key discards the changes and resumes POST to continue to boot the system.
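When console redirection to Serial Port A is enabled (as in the recommended BIOS settings later in this guide), the BIOS Setup can also be reached without a local keyboard and monitor by opening a Serial Over LAN session to the node's BMC. This is a hedged sketch using ipmitool, assuming the BMC LAN configuration from the RMM4 procedure; the address and credentials are placeholders.
# Open a Serial Over LAN console to the node's BMC
ipmitool -I lanplus -H IP_Address -U root -P password sol activate
# Reset the node so POST output appears on the SOL console, then press F2 at the prompt
ipmitool -I lanplus -H IP_Address -U root -P password chassis power reset
# When finished, close the SOL session with the "~." escape sequence or deactivate it
ipmitool -I lanplus -H IP_Address -U root -P password sol deactivate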
BIOS Security Options Menu
The BIOS Security screen provides options to configure passwords and lock front panel control buttons. The BIOS uses passwords to prevent unauthorized access to the server. Passwords can restrict entry to the BIOS Setup utility, restrict use of the Boot Device pop-up menu during POST, suppress automatic USB device reordering, and prevent unauthorized system power on. A system with no administrative password allows anyone who has access to the server/node to change BIOS settings.
The administrative and user passwords must be different from each other. Once set, a password can be cleared by setting it to a null string. Clearing the administrator password also clears the user password. Entering an incorrect password three times in a row during the boot sequence places the system in a halt state. A system reset is then required to exit the halt state.
Figure 57. BIOS Security Screen
Administrator Password Status: Information only. Indicates the status of the Administrator password.
User Password Status: Information only. Indicates the status of the User password.
Set Administrator Password: This password controls "change" access to Setup. The Administrator has full access to change settings for any Setup options, including setting the Administrator and User passwords. When Power On Password protection is enabled, the Administrator password may be used to allow the BIOS to complete POST and boot the system. Deleting all characters in the password entry field removes a previously set password. Clearing the Administrator password also clears the User password.
Set User Password: The User password is available only if the Administrator password is installed. This option protects Setup settings as well as boot choices. The User password only allows limited access to the Setup options, and no choice of boot devices. When Power On Password is enabled, the User password may be used to allow the BIOS to complete POST and boot the system.
Power On Password: When Power On Password is enabled, the system halts soon after power on and the BIOS prompts for a password before continuing POST and booting. Either the Administrator or User password may be used. If an Administrator password is not set, this option is grayed out and unavailable. If enabled and the Administrator password is removed, this option is disabled.
Front Panel Lockout
If Front Panel Lockout is enabled in BIOS setup, the following front panel features are disabled:
● The Off function of the Power button
● The System Reset button
If enabled, the power button Off and reset buttons on the server front panel are locked. This setting also locks the NMI Diagnostic Interrupt button on motherboards that support this feature. If enabled, power off and reset must be controlled through a system management interface, and the NMI button is not available.
S2600 QuickPath Interconnect
The Intel® QuickPath Interconnect is a high-speed, packetized, point-to-point interconnect used in the processor. The narrow high-speed links stitch together processors in a distributed shared memory and integrated I/O platform architecture. It offers much higher bandwidth with low latency. The Intel QuickPath Interconnect has an efficient architecture, allowing more interconnect performance to be achieved in real systems. It has a snoop protocol optimized for low latency and high scalability, as well as packet and lane structures enabling quick completion of transactions. Reliability, availability, and serviceability (RAS) features are built into the architecture.
The physical connectivity of each interconnect link is made up of twenty differential signal pairs plus a differential forwarded clock. Each port supports a link pair consisting of two uni-directional links to complete the connection between two components. This supports traffic in both directions simultaneously. To facilitate flexibility and longevity, the interconnect is defined as having five layers: Physical, Link, Routing, Transport, and Protocol.
The Intel QuickPath Interconnect includes a cache coherency protocol to keep the distributed memory and caching structures coherent during system operation. It supports both low-latency source snooping and a scalable home snoop behavior. The coherency protocol provides for direct cache-to-cache transfers for optimal latency.
S2600 InfiniBand Controllers
Intel® S2600 motherboards can be populated with a new-generation InfiniBand (IB) controller on CS-Storm systems. The Mellanox® ConnectX®-3 and Connect-IB controllers support Virtual Protocol Interconnect® (VPI), providing 10/20/40/56 Gb/s IB interfaces.
Figure 58. ConnectX®-3 InfiniBand Block Diagram
Major features and functions include:
● Single InfiniBand port: SDR/DDR/QDR on S2600WPQ, SDR/DDR/QDR/FDR on S2600WPF, with port remapping in firmware
● Performance optimization: achieving single-port line-rate bandwidth
● PCI Express® Gen3 x8 to achieve 2.5, 5, or 8 GT/s link rate per lane
● Low power consumption: 6.5 W typical (ConnectX-3), 7.9 W typical (Connect-IB)
Device Interfaces
Major interfaces of the Mellanox ConnectX-3 and Connect-IB chips:
● Clock and reset signals: include the core clock input and chip reset signals
● Uplink bus: The PCI Express bus is a high-speed uplink interface used to connect ConnectX to the host processor. ConnectX supports a PCI Express 3.0 x8 uplink connection with transfer rates of 2.5 GT/s, 5 GT/s, and 8 GT/s per lane. The PCI Express interface may also be referred to as the "uplink" interface.
● Network interface: A single network port connecting the device to a network fabric. InfiniBand is configured to 10/20/40/56 Gb/s.
● Flash interface: Chip initialization and host boot
● I2C-compatible interfaces: For chip, QSFP/QSFP+ connector, and chassis configuration and monitoring
● Management link: Connects to the BMC via SMBus and NC-SI
● Others: Include MDIO, GPIO, and JTAG
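To confirm from the operating system which HCA is present on the PCI Express uplink and whether its port is up, standard tools such as lspci and the ibstat utility from the infiniband-diags package can be used. Neither tool is specific to this guide, so treat the following as a generic sketch:
# Confirm the Mellanox HCA is visible on the PCI Express uplink
lspci | grep -i mellanox
# Report port state, physical state, and active rate for the onboard HCA
ibstat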
Quad Small Form-Factor Pluggable (QSFP) Connector
The Mellanox ConnectX-3 port is connected to a single QSFP connector. The Connect-IB port is connected to a single QSFP+ connector. The following figure shows the application reference between the Mellanox ConnectX-3 and the QSFP connector.
NOTE: 2 or 3 meter InfiniBand cables are recommended for better EMI performance.
Figure 59. Connection Between ConnectX-3 and QSFP (only one channel shown for simplicity; the ASIC SerDes Rx/Tx pairs on the host board pass through the host edge card connector and the module card edge (host interface) of the QSFP module to the optical connector/port (optical interface), carrying lanes Rx1-Rx4 and Tx1-Tx4)
S2600JF and S2600WP Motherboard BIOS Upgrade
This motherboard upgrade procedure describes how to upgrade from BIOS package version v01.06.001 to v02.01.002 on the S2600JF and S2600WP motherboards only, on clusters NOT managed by the Cray ACE management suite. Downgrading the motherboard BIOS package is not advised.
NOTE: The Advanced Cluster Engine (ACE) management software utility plugins can be used to perform various utility tasks on an ACE-managed system, such as upgrading the BIOS on compute nodes or upgrading the firmware on InfiniBand HCAs and switches.
IMPORTANT: These procedures are valid ONLY for updating the BIOS package via the Intel One-Boot Flash Update (OFU) utility and NOT via EFI. Updating the BIOS package via EFI should be performed only when approved by Cray CS Engineering. If the motherboard BIOS package is older than v01.06.001, contact Cray Support before proceeding. If nodes are equipped with accelerator cards, the accelerators must be disabled before proceeding with the BIOS upgrade.
Intel One-Boot Flash Update Utility
The One-Boot Flash Update Utility (OFU) allows users to upgrade motherboard BIOS and firmware locally while the host operating system is booted. Upgrading motherboard BIOS and firmware via this method is regarded by Cray as standard operating procedure. The prerequisite tools provided by Intel are described in the following table.
Table 23. Intel One-Boot Flash Update Utility Tools (Name / Filename / Linux application / Description)
● One-Boot Flash Utility / Linux_OFU_V11_B15.zip / flashupdt / The One-Boot Flash Update Utility is a program used for updating the system BIOS, BMC, Field Replaceable Units (FRU), and Sensor Data Records (SDR) of systems that support the Rolling Update feature.
● System Event Log Viewer / Linux_SELVIEWER_V11_B10.zip / selview / The SEL Viewer Utility is a DOS application used for viewing system event records.
● Save and Restore System Configuration Utility / Linux_SYSCFG_V12_B10.zip / syscfg / The System Configuration Utility (syscfg.exe) is a command-line utility that provides the ability to display, configure, save, and restore certain system firmware, BIOS, and Intel® Server Management settings on a single server or across multiple identical model servers (cloning).
● System Information Retrieval Utility / Linux_sysinfo_V12_B11.zip / sysinfo / The Intel System Information Retrieval Utility (hereinafter referred to as sysinfo) is used for collecting system information.
Intel distributes each of the above applications as a combined release or as separate releases. Customers are advised to install the latest available version of these utilities on any system receiving a BIOS update. To determine the currently installed version of these utilities, run the following commands:
# ./flashupdt -i
# ./syscfg -i
# ./sysinfo -i
# ./selview -i
When upgrading to BIOS 02.01.0002 or later, install the latest version of flashupdt (minimum: Version 11, Build 14), as-is from Intel, before proceeding.
CAUTION: Failure to use the latest Intel utilities could damage the motherboard and render it unusable.
Finally, flashupdt and syscfg have additional dependencies that must be installed before a BIOS upgrade can be performed:
● ncurses-libs-5.7-3.20090208.el6.i686.rpm
● libstdc++-4.4.6-3.el6.i686.rpm
These dependencies are normally included in the Red Hat/CentOS distribution ISO, but if they are not installed, run the following commands to install them:
# rpm -ivh libstdc++-4.4.6-3.el6.i686.rpm
# rpm -ivh ncurses-libs-5.7-3.20090208.el6.i686.rpm
A complete user guide for the Intel OFU is available from Intel at the following link:
http://download.intel.com/support/motherboards/server/ism/sb/intel_ofu_user_guide.pdf
Obtaining Motherboard BIOS/ME/BMC/SDR Firmware Files
Intel releases a combined System Firmware Update Package for use with the OFU utility. This package includes firmware for these components:
● Motherboard BIOS
● Manageability Engine (ME)
● Baseboard Management Controller (BMC)
● Sensor Data Record (SDR/FRUSDR)
CAUTION: Each Intel release is provided as a complete package. Cray and Intel do not support mixing and matching package components from different releases. Doing so could damage the motherboard.
Customers should NOT obtain BIOS firmware packages directly from Intel, unless specifically instructed to do so by Cray. CCS Engineering must validate and approve Intel-supplied firmware and will supply Cray customers with the appropriate packages. To request a firmware package from Cray, customers must submit a report of the current version information as printed by the Intel OFU. To generate this report, customers should run flashupdt -i.
[root@system]# ./flashupdt -i
One Boot Flash Update Utility Version 11.0 Build 14
Copyright (c) 2013 Intel Corporation
System BIOS and FW Versions
BIOS Version:......... SE5C600.86B.01.06.0001
BMC Firmware Version:
  Op Code:........... 1.16.4010
  Boot Code:......... 01.14
ME Firmware Version:.. 02.01.05.107
SDR Version:.......... SDR Package 1.09
Baseboard Information:
  Base Board ID:..... S2600JF
  Asset Tag Name:.... ....................
System Information:
  Manufacturer Name:. CRAY
  Product Name:...... gb812x-rps
  Version Number:.... ....................
Chassis Information:
  Manufacturer Name:. CRAY
  Chassis Type:...... Main Server Chassis
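To bundle the version information that Cray asks for into a single report, the version commands listed above can be captured to a file. This is a minimal sketch; the report filename is illustrative, and the utilities are assumed to reside in the current directory as in the examples above.
# Collect current BIOS/BMC/ME/SDR and utility versions into one report for Cray
REPORT=/tmp/firmware-versions-$(hostname).txt
{ ./flashupdt -i; ./syscfg -i; ./sysinfo -i; ./selview -i; } > "$REPORT" 2>&1
echo "Send $REPORT to Cray when requesting a firmware package."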
Obtaining Mellanox-onboard HCA Firmware Files
Select motherboards on the CS-Storm platform include an onboard Mellanox InfiniBand Host Channel Adapter (HCA). These motherboards are identified by the model numbers S2600WPQ and S2600WPF. Mellanox and Intel collaborate to release periodic updates to the HCA firmware. Customers should NOT obtain HCA firmware packages directly from Intel or Mellanox, unless specifically instructed to do so by Cray. CCS Engineering must validate and approve firmware and will supply Cray customers with the appropriate packages.
To request an HCA firmware image, customers must provide the HCA's PSID and current firmware version. Firmware images are custom built for specific PSIDs, and mismatching of PSID between firmware and HCA is not allowed. To obtain the PSID for the Mellanox-onboard HCA, run the following command:
[root@system]# flint -d /dev/mst/mt4099_pci_cr0 query
Image type:      FS2
FW Version:      2.30.8000
FW Release Date: 4.5.2014
Device ID:       4099
Description:     Node             Port1            Port2            Sys image
GUIDs:           001e6703004835c4 001e6703004835c5 001e6703004835c6 001e6703004835c7
MACs:                             001e674835c5     001e674835c6
VSD:             n/a
PSID:            INCX-3I358C10551
Instructions for BIOS Package Upgrade
Customers should follow the upgrade procedures described in the Intel One-Boot Flash Update Utility User Guide:
http://www.intel.com/support/motherboards/server/sysmgmt/sb/CS-029303.htm
NOTE: The Intel OFU supports only a limited variety of Intel motherboards. Customers are advised to verify compatibility of the boards with the utility.
Introduction of New Motherboards
Motherboard replacements are expected during the life of the cluster, and all nodes must be running a common BIOS package version. Therefore, it may be necessary to upgrade or downgrade the firmware of the replacement motherboard so that it matches the rest of the cluster.
Motherboards are expected to ship with factory default settings, which are not compatible with certain other hardware on the CS-Storm platform. Cray recommends the following minimum BIOS settings in addition to the factory defaults.
1. Press F2 to enter the BIOS configuration utility Main page.
2. To revert all BIOS settings to the factory optimal default settings, use the F9 key.
3. Save settings and reboot using F10, then re-enter the BIOS configuration utility Main page.
4. Configure (disable) Quiet Boot from the Main page.
Figure 60. Disable Quiet Boot in BIOS
5. Configure processor settings in the Advanced->Processor Configuration menu.
Processor C6: Disabled
Intel (R) Hyper-Threading Tech: Disabled
Figure 61. Disable Processor C6 and Hyper-Threading
6. Configure system performance in Advanced->Power & Performance.
CPU Power and Performance Policy: Performance
Figure 62. Enforce Performance Power Management Policy
7. Configure memory behavior in Advanced->Memory Configuration.
Memory SPD Override: Enabled
8. Configure accelerator support in Advanced->PCI Configuration.
Memory Mapped I/O Above 4GB: Enabled
Memory Mapped I/O Size: 256G (512G for NVIDIA K40 systems)
Figure 63. Modify PCI Configuration
9. Configure system fan behavior in Advanced->System Acoustic and Performance Configuration.
FAN PWM Offset: 40
10. Configure AC power loss behavior in Server Management.
Resume on AC Power Loss: Last State
11. Enable console redirection in Server Management->Console Redirection.
Console Redirection: Serial Port A
NOTE: Leave all other console redirection settings as-is (optimal defaults).
12. Save and Exit (F10).
13. Replacement motherboards lack any Cray-specific data, such as the chassis AP serial number. To restore this information, run the following commands and reboot the system for the changes to take effect:
# ./flashupdt -set chassis Snum "<Node serial number>"
# ./flashupdt -set product Snum "<Node serial number>"
# ./flashupdt -set chassis Pnum "<Node Model # (eg. CRAY-512X)>"
# ./flashupdt -set chassis Mn "CRAY"
# ./flashupdt -set product Mn "CRAY"
# ./flashupdt -set product Pn "<Node Model #>"
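After the reboot, the restored FRU values can be confirmed with the same version report used elsewhere in this guide; the serial number, product name, and manufacturer fields appear in the flashupdt -i output shown in the earlier example.
# Verify the restored chassis and product FRU fields
./flashupdt -i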