Recipient Name - Rhode Island Housing
REQUEST FOR PROPOSALS

INTRODUCTION

Through this Request for Proposals (“RFP”), Rhode Island Housing seeks proposals from qualified firms with expertise to perform remediation and IT Disaster Recovery. Our production site consists of FAS 2040/Three ESX Hosts/One VC/about 50 VMs, and our Disaster Recovery site consists of FAS 2040/Three ESX Hosts/One VC/over 50 VMs. The strategy is to ensure that data from our Storage Area Network (SAN) and applications are replicated, stored and quickly accessible should Rhode Island Housing experience a disaster. Our primary data center is located at 44 Washington Street, Providence RI, 02903. The Disaster Recovery site will be located at 1 Federal Street, Springfield Massachusetts. The two sites are connected by a 100MB fiber connection.

INSTRUCTIONS

The proposal should be submitted to [email protected] no later than 5:00 PM May 4, 2012. Proposals should be presented on electronic business letterhead. Respondents are advised that all submissions (including those not selected for engagement) may be made available to the public on request upon completion of the process and award of a contract(s). Accordingly, any information included in the proposal that the respondent believes to be proprietary or confidential should be clearly identified as such.

SCOPE OF WORK

Please see Attachment A.

ITEMS TO BE INCLUDED WITH YOUR PROPOSAL

A. General Firm Information
1. Provide a brief description of your firm, including but not limited to the following:
a. Name of the principal(s) of the firm.
b. Name, telephone number and email address of a representative of the firm authorized to discuss your proposal.
c. Address of all offices of the firm.
d. Number of employees of the firm.

B. Experience and Resources:
1. Describe your firm and its capabilities. In particular, support your capacity to perform the Scope of Work.
2. Indicate which principals and associates from your firm would be involved in providing services to Rhode Island Housing.
Provide appropriate background information for each such person and identify his or her responsibilities.
3. Provide a detailed list of references including a contact name and telephone number for organizations or businesses for whom you have performed similar work.
4. Identify any conflict of interest that may arise as a result of business activities or ventures by your firm and associates of your firm, employees, or subcontractors as a result of any individual’s status as a member of the board of directors of any organization likely to interact with Rhode Island Housing.
5. Identify any material litigation, administrative proceedings or investigations in which your firm is currently involved. Identify any material litigation, administrative proceedings or investigations, to which your firm or any of its principals, partners, associates, subcontractors or support staff was a party, that has been settled within the past two (2) years.
6. Describe how your firm will handle actual and/or potential conflicts of interest.

C. Fee Structure: Fixed fee based project.
The cost of services is one of the factors that will be considered in awarding this contract. The information requested in this section is required to support the reasonableness of your fees.
1. Please provide a cost proposal for the consulting work described in Attachment A.
2. Please provide any other fee information applicable to the engagement that has not been previously covered that you wish to bring to the attention of Rhode Island Housing.

D. Miscellaneous
1. Rhode Island Housing encourages the participation of persons of color, women, persons with disabilities and members of other federally and State-protected classes. Describe your firm’s affirmative action program and activities.
Include the number and percentage of members of federally and State-protected classes who are either principals or senior managers in your firm, the number and percentage of members of federally and State-protected classes in your firm who will work on Rhode Island Housing’s engagement and, if applicable, a copy of your Minority- or Women-Owned Business Enterprise state certification.
2. Discuss any topics not covered in this Request for Proposals that you would like to bring to Rhode Island Housing’s attention.

E. Certifications
1. Rhode Island Housing insists upon full compliance with Chapter 27 of Title 17 of the Rhode Island General Laws, Reporting of Political Contributions by State Vendors. This law requires State Vendors entering into contracts to provide services to an agency such as Rhode Island Housing, for the aggregate sum of $5,000 or more, to file an affidavit with the State Board of Elections concerning reportable political contributions. The affidavit must state whether the State Vendor (and any related parties as defined in the law) has, within 24 months preceding the date of the contract, contributed an aggregate amount in excess of $250 within a calendar year to any general officer, any candidate for general office, or any political party.
2. Does any Rhode Island “Major State Decision-maker,” as defined below, or the spouse or dependent child of such person, hold (i) a ten percent or greater equity interest, or (ii) a Five Thousand Dollar or greater cash interest in this business? For purposes of this question, “Major State Decision-maker” means:
(i) All general officers; and all executive or administrative head or heads of any state executive agency enumerated in § 42-6-1 as well as the executive or administrative head or heads of state quasi-public corporations, whether appointed or serving as an employee.
The phrase “executive or administrative head or heads” shall include anyone serving in the positions of director, executive director, deputy director, assistant director, executive counsel or chief of staff;
(ii) All members of the general assembly and the executive or administrative head or heads of a state legislative agency, whether appointed or serving as an employee. The phrase “executive or administrative head or heads” shall include anyone serving in the positions of director, executive director, deputy director, assistant director, executive counsel or chief of staff;
(iii) All members of the state judiciary and all state magistrates and the executive or administrative head or heads of a state judicial agency, whether appointed or serving as an employee. The phrase “executive or administrative head or heads” shall include anyone serving in the positions of director, executive director, deputy director, assistant director, executive counsel, chief of staff or state court administrator.
If your answer is “Yes,” please identify the Major State Decision-maker, specify the nature of their ownership interest, and provide a copy of the annual financial disclosure required to be filed with the Rhode Island Ethics Commission pursuant to R.I.G.L. §§36-14-16, 17 and 18.
3.
Please include a letter from your president, chairman or CEO certifying that (i) no member of your firm has made inquiries or contacts with respect to this Request for Proposals other than in an email or written communication to [email protected] to seek clarification of the Scope of Work set forth in this proposal, from the date of this RFP through the date of your proposal, (ii) no member of your firm will make any such inquiry or contact until after June 1, 2012, (iii) all information in your proposal is true and correct to the best of her/his knowledge, (iv) no member of your firm gave anything of monetary value or promise of future employment to a Rhode Island Housing employee or Commissioner, or a relative of the same, based on any understanding that such person’s action or judgment will be influenced and (v) your firm is in full compliance with Chapter 27 of Title 17 of the Rhode Island General Laws, Reporting of Political Contributions by State Vendors.

EVALUATION AND SELECTION

A selection committee consisting of Rhode Island Housing employees (the “Committee”) will review all proposals and make a determination based on the following factors:
- Professional capacity to undertake the scope of work
- Proposed fee structure (must be fixed fee)
- Ability to perform within time and budget constraints
- Evaluation of potential work plans
- Previous work experience and performance with Rhode Island Housing and/or similar organizations
- Recommendations by references
- Firm minority status and affirmative action program or activities
- The time line for completion of remediation and disaster recovery implementation
- Other pertinent information submitted
- Having the skill set and experience to perform all aspects of remediation and configuration for Disaster Recovery
- Holding the skill set and the expertise to complete production failover to the Disaster Recovery site and back to the Production site within an acceptable time frame (15 minutes or less).
Also provide High Availability for selected applications
- Ability to deliver the final solution by June 15, 2012
- Meeting and fulfilling requirements outlined on the check list (Attachment B)

Rhode Island Housing may invite one or more finalists to make presentations. In its sole discretion, Rhode Island Housing may negotiate with one or more firms who have submitted qualifications to submit more detailed proposals on specific projects as they arise. By this Request for Proposals, Rhode Island Housing has not committed itself to undertake the work set forth. Rhode Island Housing reserves the right to reject any and all proposals, to rebid the original or amended scope of services and to enter into negotiations with one or more respondents. Rhode Island Housing reserves the right to make those decisions after receipt of responses. Rhode Island Housing’s decision on these matters is final.

For additional information contact: Abdel El idrissi 401-457-1121 [email protected].

Together with its partners, Rhode Island Housing works to ensure that all people who live and work in Rhode Island can afford a healthy, attractive home that meets their needs. Rhode Island Housing uses all of its resources to provide low-interest loans, grants, education and assistance to help Rhode Islanders find, rent, buy, build and keep a good home. Created by the General Assembly in 1973, Rhode Island Housing is a privately funded public purpose corporation.

Attachment A
Scope of Work

General:
Rhode Island Housing is seeking proposals from vendors to perform a comprehensive Remediation and IT Disaster Recovery implementation. The selected vendor must be qualified in our primary technologies: NetApp SAN, VMware, HP servers, Cisco switches and Juniper firewalls. The work to be completed includes two environments: a production environment in Providence, RI and our recovery site located in Springfield, MA.
Each technical environment consists of FAS 2040, VSphere 4 and all related dependencies and subset components. This fixed-price bid is to include professional services and associated expenses (travel accommodations, after hours and weekend work for both Providence and the co-location in Springfield, MA). During the Remediation and IT Disaster Recovery Phase the selected vendor will address remediation of both environments, software upgrades, connectivity, firewall configuration, backups, replication, internet bandwidth and all aspects of disaster recovery including failover and testing (from production to disaster recovery site and back to production).

Technical Requirements:
Rhode Island Housing requires the senior engineers carrying out the implementation to be certified and have extensive working knowledge with proven track records for these types of implementations:
1. NetApp (all modules)
2. VMware (all modules)
3. Cisco/Juniper (switches, routers and firewalls)
4. Disaster and Recovery
5. Linux & Windows 2003/2008/SQL 2005 (working knowledge)
6. Novell /OES/GroupWise (working knowledge)
7. Symantec Backup
Proposer must apply best practice at all times and must spell out the step-by-step methodology by which the remediation, disaster recovery connectivity testing and full disaster recovery will be conducted.

Systems Remediation and Disaster Recovery Implementation:
Proposer is expected to perform a comprehensive remediation and disaster recovery, addressing each of the items identified as needing remediation in the Disaster Recovery Assessment - Phase 1, in accordance with and as supplemented by the recommendations contained in the VMware Health Check - Production Report and VMware Health Check Disaster Recovery Report, as appropriate to the environment. The Rhode Island Housing Phase I Design diagram sets forth our illustration of our anticipated connectivity configuration.
These documents are provided as Exhibits 1-4 to this RFP and are referred to as the Documents. In carrying out the work specified in the Documents 1-4, the proposer must ensure that each of the following criteria is addressed and satisfied:
- The proposed design must have built-in redundancy so that no single point of failure will result in an outage or downtime
- Zoning speed should be 8 GB
- Consultant should ensure that effective Remote Management features exist in the solution so issues can be addressed remotely to eliminate frequent visits to the co-location.
- The proposed design should ensure that all VMs without exception are replicated to disaster recovery and have full disaster recovery functionality
- Consultant must achieve real time replication or close to it (less than 1/2 hour) and, in addition, provide high availability for selected systems and applications
- Disaster recovery site testing must include all aspects of Rhode Island Housing normal operation: accessing applications, file servers, emails, databases, printing, VPN connections.
- The proposed design must address VPN connection for users and site to site
- As part of disaster recovery design, users should have the ability to perform normal functions by either connecting directly from Providence to Springfield using their desktops or through the use of laptops with VPN
- Consultant will need to use wire management and label all data center equipment including wires
- After remediation, a Peer Assessment (Health Check assessment for VMware, NetApp, etc.) has to take place prior to any disaster recovery test. The assessment has to be done by an engineer not involved in the remediation project.
The result must meet the recommendations in Exhibits 1-4; only then can the consultant start disaster recovery configuration and testing
- Setup and configuration of data collection tools to determine rate of change for volumes being replicated to the Disaster Recovery site
- The selected vendor will install the most current software releases and all hardware must be tested and operational.
- Backup and restore processes must be verified and tested and must meet requirements (see Exhibits 1-4 for details)
- Backup solution must be a full backup solution: disk to disk to tape of 10 TB with a backup time of 8 hours or less
- Reconfigure Symantec and work on resolving speed issues (No drives will be purchased)
- Consultant will be responsible for the security, the moving, and the installation & configuration of all disaster recovery equipment at the Springfield co-location
- Proposer will have to conduct a detailed design review with Rhode Island Housing staff to validate that the design decisions meet the requirements
- Proposer must describe, in detail, their ability to provide a solution for each of the line items described in Exhibit 1. A non-response will be equivalent to “no solution available” for the specification.
It is the vendor’s responsibility to correctly correlate their response to each item
- It is the proposer’s responsibility to remediate all existing issues or complications arising from the proposed design
- Prior to starting the work on the project, consultant must perform inventory, verify equipment and software quantities, and confirm that everything is available to start the project
- Provide a list of any additional equipment or software required to complete this project
- Consultant is responsible for making connectivity configurations for disaster recovery
- Oshean, our disaster recovery co-location partner, and SecureWorks, our firewall provider, are there to fulfill and facilitate firewall changes
- If proposer believes that any of the above requirements are in conflict with the Documents, they must identify the conflict in the response to this RFP and set it forth specifically

Summary of key Deliverables
- Completion of all remediation work set forth in the Documents, in accordance with the requirements set forth above
- A successful full disaster recovery test that meets the Rhode Island Housing requirements set forth in this RFP and the Documents
- Delivery of a detailed test report that shows successful implementation of remediation work
- Delivery of full documentation that outlines step by step how to declare, execute and recover back to Production
- Delivery of an as-built document that encompasses the physical and logical layout of Production & Disaster Recovery, including Visio diagrams
- Delivery of training and a knowledge transfer document

Inquiries and Communication
All inquiries and other communications with respect to this RFP are to be directed ONLY to the following email address: [email protected]

Attachment B
Check List
This form needs to be returned with your proposal to confirm your understanding of the assignment and agreement to provide the required services.
___ 1.
This is a Remediation and IT Disaster Recovery Phase; the outcome is a full disaster recovery within the parameters outlined in Attachment A and Exhibits 1-4
___ 2. The proposed fee must be fixed for the duration of the project and without exclusions.
___ 3. Project shall commence immediately after the execution of an agreement and the completion date must be no later than June 15th, 2012
___ 4. This type of project will need to have at least 2 to 3 engineers on the premises, engaged and committed to the project
___ 5. The full solution implementation must be completed within 4 to 5 weeks from the signing of the agreement.
___ 6. Proposer must have at least five (5) years’ experience installing systems and a list of locally completed projects (3 minimum) equivalent in size and system type to this project. Consultant must provide contact names, telephone numbers and dates of completion.
___ 7. If Proposer cannot supply a list based on the above criteria, provide a list of five system installations regionally that are of the same manufactured type and model. (Include contact names, telephone numbers, dates of completion and size of systems).
___ 8. Proposer shall provide a list of all qualified engineers and project managers to be assigned to this project, including relevant training programs completed by each, and years of related experience in area of expertise.
___ 9. Provide a detailed connectivity for disaster recovery and testing methodology.
___ 10. Provide a detailed project plan with a timetable of completion for each component addressed in Exhibit 1 and in the bullet points in Attachment A, and apply the best practice set forth in Exhibits 3-4.
___ 11. Commitment to successfully address every item in Attachments A and B, and implement all the recommendations contained in the health check assessment Documents (Exhibits 1-3).
___ 12. Proposer has to show the physical and logical layout of their disaster recovery design.
___ 13.
Backup solution must be spelled out in great detail in your response to this RFP.
___ 14. Commitment to visiting the co-location prior to starting the project, the goal of the visit is to be familiar with the site as well as check equipment, fiber line, phone line, rack.

Exhibit 1
Disaster Recovery Assessment Phase 1 – A Consultative Document on Remediation and Design
Rhode Island Housing – Phase 1 - Disaster Recovery Assessment

Executive Summary
Overview
  Phase 1 – Deliverables
Assessment Summary
  Components Analyzed in this report – Production and DR Environments
Detailed Report – Health Check Audit List
  Conduct an end-to-end overview of the whole environment
  Review overall utilization and performance metrics of virtual environment
  Analyze auto-support messages and Syslogs
  Verification of multi-pathing on all hosts
  Review of cabling
  Explore licenses in hand versus what is needed to accomplish disaster recovery
  Assess SAN switches & Fiber Channel zoning
  Routing, connectivity, bandwidth, firewalls, site to site replication
  Aggregate configuration laid out for best performance
  Security risks and exposures
  Switch traffic separation (i.e. console 2, vMotion 2, Network 4... )
  Time servers for production and disaster recovery
  DNS & DHCP redundancy for both environments
  Spindle count per aggregate
  Space usage and hot spares
  NetApp Alua
  NetApp BMC
  VMware DRS
  High Availability
  VMware DPM
  Vlan/ Tagging
  VMware SRM
  VMware Thin Provisioning
  NetApp Aggregates and volumes
  VMotion
  Data Protection
  Storage Networking
  NetApp Deduplication
  NetApp SnapShot
  NetApp FlexClone
  NetApp Snap Vault
  Monitoring & Management
  NetApp SnapManager
  NetApp Operation Manager
  NetApp Protection Manager
  NetApp Provisioning Manager
  Backup and Recovery
  Verify all software licensing for NetApp/VMware/Backup/Disaster Recovery
  Explore NetApp/VMware/HP software and firmware upgradability
  VMware Ports configurations and binding
  Error logs for both environments
  Address configuration for graceful shutdown of the environment during power loss (Symmetra LX, 16KVA tower)
Opportunities to optimize configurations and improve performance
Roadmap for software, configuration enhancement and upgrades
Coordination regarding Disaster Recovery
  Coordination with Oshean
  Coordination with SecureWorks
Procurement Needed
Logical and Physical Layout Visio Diagrams

Executive Summary

Overview

Rhode Island Housing (RIH) has engaged Vendor Company for Phase 1 of the Disaster Recovery Site Implementation Project. Phase 1, at its highest level, is to provide an assessment and analysis of RIH’s current production and future disaster recovery environments (currently staged at main datacenter in Providence) in order to facilitate the Disaster Recovery Site Implementation Project, known as Phase 2. Phase 2, at its highest level, includes moving the Disaster Recovery Environment to Oshean’s datacenter and configuring for proper disaster recovery functionality. The purpose of Phase 1 is twofold. First is to identify any current issues with production and disaster recovery environments that would prevent proper functionality of disaster recovery. These issues will be defined as necessary remediation steps for Phase 2. Second is to provide a design and any other necessary information for the implementation portion of Phase 2, which is configuring and moving the Disaster Recovery Environment to Oshean’s datacenter.

Phase 1 – Deliverables

The deliverables in this assessment report include the following items as detailed in the RIH document, Attachment A: Scope of Work, provided in this report as Appendix A.
- Assessment Summary
- Detailed Report – Health Check Audit List
- Detailed Report – VMware Health Check
- Coordination regarding Disaster Recovery
- As Built Document – Logical and Physical Layout Visio Diagrams
- Report addressing opportunities to optimize configurations and improve performance
- Report providing roadmap for software, configuration enhancement and upgrades

Assessment Summary

The Assessment of RIH’s production and disaster recovery environments was performed by analyzing those components of both environments in order to determine current health and any requirements needed to facilitate the transition to Phase 2. This summary will identify the components (hardware and software) that were analyzed and the method for analysis used which yielded the data for all deliverables in this project. Reading the reports below you will find the specific audit list items (as determined by Attachment A), the components included in the analysis, and the recommendation for Phase 2.

Components Analyzed in this report – Production and DR Environments
- NetApp FAS2040 w/ dual controllers and DS4243 Disk Shelf
  o Analysis Tool – Vendor Company Engineer Manual Analysis and NetApp Operations Manager
- VMware – vSphere 4.1 – Vendor Company Engineer Manual Analysis and VMware Health Check Analyzer
- Juniper SNG w/ HA – Dell SecureWorks Engineer Analysis and Vendor Company Engineer Manual Analysis
- Cisco 3750s & 9124mds – Vendor Company Engineer Manual Analysis
- Cabling (Fiber and Ethernet) – Vendor Company Engineer Manual Analysis
- Bandwidth – Vendor Company Engineer, Oshean Engineer and Dell SecureWorks Engineer Manual Analysis
- DNS / DHCP – Vendor Company Engineer Manual Analysis
- Backups – Symantec, HP StorageWorks Library, NetApp SnapManager – Vendor Company Engineer Manual Analysis

Based on all the items analyzed in the Components section above, there are many items that require remediation. Recommended remediation will be detailed in the 2 sections below (Detailed Report).
In addition, any items of note that are important for the delivery of Phase 2 will be detailed in the 2 sections below as Design Notes.

Detailed Report – Health Check Audit List

The list below was pulled from Attachment A (see Appendix) as developed by RIH and was used as a guide by the Vendor Company Engineer to provide thorough analysis. This section will define the specific audit item and then subsequently detail both necessary remediation steps and design notes; items that are required as deliverable in Phase 2.

Conduct an end-to-end overview of the whole environment
Necessary Remediation – Remediation is defined in individual items of audit list below.
Design Notes – See design notes of each individual audit list item below in addition to reports in this document around Phase 2 coordination, optimization opportunity, upgrade roadmap and Visio diagrams.

Review overall utilization and performance metrics of virtual environment
Necessary Remediation – VMware – Upgrade ESX to vSphere 5 in both Production and DR and upgrade each virtual machine hardware version to 8.0.
Design Notes –
VMware – qty three (3) ESX hosts (per environment) with minimal load on cpu and memory
HP Proliant DL380 G7 – qty three (3) (per environment) with 196GB Ram each

Analyze auto-support messages and Syslogs
Necessary Remediation – NetApp – Syslog messages indicated that system time was wrong across all controllers on both filers (DR and Production). Recommendation is to configure internet time server.
Design Notes – Verify time servers are syncing with NetApp (all controllers) and providing proper time.

Verification of multi-pathing on all hosts
Necessary Remediation – VMware – Multi-pathing is properly configured for all datastores, however three (3) Raw Device Mappings exist in Production environment. Recommendation is to migrate RDM data to VMFS datastores and ensure multi-pathing is correct.
Design Notes – Multi-pathing exists for all VMFS datastores, mapped over Fiber Channel paths (four (4) paths each). Verify that the number of paths is correct for those RDMs which were migrated to new VMFS datastores in remediation.

Review of cabling
Necessary Remediation –
Fiber – Recommendation is to trace all fiber connections from Production ESX hosts and verify that each HBA interface on each HP server is patched into a separate fiber switch and configured properly in that switch, so that there is no single point of failure.
Ethernet – Recommendation is to trace all Production ESX Ethernet adapters (HP servers) and verify they are patched into separate switches. It is also recommended to ensure PCI and onboard NICs are used redundantly across VMware vswitches: each vswitch should have both a PCI and an onboard NIC assigned to it.
Design Notes – In order to accomplish Ethernet redundancy at both Production and DR facilities, a qty of one (1) Cisco 3750 24-port switch must be procured.

Explore licenses in hand versus what is needed to accomplish disaster recovery
Necessary Remediation –
NetApp – Install, configure and test FlexClone license for both controllers on DR and Production SAN.
Juniper – Need to coordinate setup, configuration and testing of IPS and IDS with Dell SecureWorks.
Design Notes –
NetApp – Procure qty four (4) FlexClone licenses (two (2) controllers X two (2) SANs).
Juniper – Procure licensing for IDS and IPS from Dell SecureWorks.

Assess SAN switches & Fiber Channel zoning
Necessary Remediation – None noted. Ensure running at 8gb for all fiber.
Design Notes – Fiber Zoning is appropriate; details below:
a. Prod A – 1 VSAN with 15 Zones
b. Prod B – 1 VSAN with 70 Zones
c. DR A – 1 VSAN with 15 Zones
d. DR B – 1 VSAN with 30 Zones

Routing, connectivity, bandwidth, firewalls, site to site replication
Necessary Remediation –
VMware – SRM is licensed but is not configured. Recommendation is to configure SRM and ensure replication works.
NetApp – FlexClone needs to be licensed.
NetApp SnapMirror – No replication schedule set. Need to set a replication schedule on all controllers.
Prod Controller A – replication not set for volumes datastore05, 07, 08.
Prod Controller B – replication not set for volumes datastore17, 18, 19.
DR Controller A – destination not set for datastore01, 02, 03, 04, 05, 06.
DR Controller B – destination not set for datastore11, 12, 13, 14, 15, 16.
Design Notes –
VMware – Configure SRM.
NetApp – Configure FlexClone.
Bandwidth – Sufficient for requirements; 100mb fiber provided by Oshean.
Firewalls – Production site has Juniper SNG (managed by Dell Secureworks) with High Availability. Routing is in place for DR site subnets. DR site has a Cisco ASA provided by Dell Secureworks. Vendor will need to create the logical firewall configuration for the DR site (form provided by Dell) and send it to SecureWorks.

Aggregate configuration laid out for best performance
Necessary Remediation – NetApp Plug-in for ESX – MBR Tools not installed. Recommendation is to install, configure and test NetApp MBR Tools.
Design Notes – This ESX console-based tool tests and aligns guest file systems on a VMDK for VMFS and NFS datastores. Aligning the file system block boundaries to the underlying NetApp storage system LUN ensures the best storage performance. The data is migrated from a backup of the original -flat.vmdk file to a new, properly aligned -flat.vmdk file.

Security risks and exposures
Necessary Remediation –
NetApp – CIFS and NFS are running on both SANs but are not used. Recommendation is to turn off both services.
NetApp – Create users for delegated (administrative) tasks and configure auditing of account access.
VMware – Discrepancies exist between security profiles on the service consoles of all ESX hosts. Recommendation is to set security profiles to VMware default.
VMware – Create users for delegated (administrative) tasks and configure auditing of account access.
Cisco – Vlans are configured on Cisco Core 3750 but ACLs are not being leveraged in any capacity. Recommendation is to configure Access Groups that allow only necessary traffic between production Vlans and management Vlans (those subnets used for service console, vmkernel, NetApp Ethernet Management) and apply them to the management Vlans.
Cisco – Vlan tagging is not being used in any capacity. Recommendation is to use tagging at a minimum for the NetApp Management Vlan and the VMware Service Console Vlan.
Juniper – (previously stated in licensing section above) IPS and IDS are not licensed. Need to procure these through SecureWorks and ensure they are configured for proper intrusion prevention and deep packet inspection.
Design Notes – For the purposes of Phase 2, the vendor will be required to supply an experienced Cisco Engineer for the remediation of all Cisco items (ACL design and configuration).

Switch traffic separation (i.e. console 2, vMotion 2, Network 4... )
Necessary Remediation – VMware – Prod and DR – vmotion, vmkernel and production port groups all share the same physical NICs, and virtual switches are not separated by traffic type. Recommendation is to segment vswitches by traffic type (vmkernel for vmotion, management, vm production).
Design Notes –
VMware – Prod and DR – Each vswitch should have at least two (2) physical NICs attached. Physical NICs uplinked to virtual switching should be on separate PCI buses (i.e. 1 onboard NIC and 1 PCIe NIC to each vswitch).
General – Also note, in the Cisco items of the security section above, the need to create separate vlans for management networks.

Time servers for production and disaster recovery
Necessary Remediation –
NetApp – DR and Prod – Current time server is set to vCenter. Recommendation is to add internet time server.
VMware – Prod and DR – ESX hosts currently get their time from vCenter and DRVCenter respectively. Recommendation is to add internet time server.
Design Notes – NetApp and ESX should get time from internet time servers. Virtual machines should be configured to get time either from the ESX host or from a network time server (domain controller).

DNS & DHCP redundancy for both environments
Necessary Remediation –
VMware – DR hosts are not configured with DNS servers. Configure DNS on these hosts.
VMware – DR – There is currently no DNS server for the Disaster Recovery environment. Recommendation is to build a VM with DNS configured to provide DNS to the DR environment.
DHCP Prod – Currently one (1) DHCP server serves the scope for production systems. Recommendation is to deploy a second DHCP server and configure for split scope.
Design Notes – Production DNS servers are 192.168.2.230 and .231 and are AD Integrated.

Spindle count per aggregate
Necessary Remediation – None noted.
Design Notes – Note the remediation necessary in the “Aggregate configuration laid out for best performance” section.

Space usage and hot spares
Necessary Remediation – NetApp – two (2) spare disks are assigned to each aggregate (Production and DR). Recommendation is to add one (1) spare disk to aggregate as RAID is configured for DP and two (2) spare disks are unnecessary.
Design Notes – Aggregate usage should not exceed 90%. Volume usage should not exceed 85%.

NetApp ALUA
Necessary Remediation –
NetApp – ALUA is not currently enabled on any fiber initiators (igroup). Recommendation is to enable, configure and test ALUA.
VMware – Once ALUA is enabled on the NetApp initiators, configure the ESX hosts to use the ALUA paths.
Design Notes – None

NetApp BMC
Necessary Remediation – None noted; it is properly configured.
Design Notes – None

VMware DRS
Necessary Remediation – VMware – DRS set to maximum aggressiveness. Recommendation is to set to default.
Design Notes – None

High Availability
Necessary Remediation – None noted.
Design Notes –
VMware – DR and Production are properly configured.
NetApp – DR and Production – Each SAN has two (2) controllers and is properly configured for HA (active / active).
Juniper SNG – Dell SecureWorks confirmed that HA is properly configured for the Juniper pair.
Cisco ASA DR – Need to procure an additional Cisco ASA from Secureworks and coordinate with them for configuration. Vendor will verify that proper HA is set up, configured and tested by Secureworks.

VMware DPM
Necessary Remediation – None noted.
Design Notes – DPM is not being used and configuration is not recommended.

Vlan / Tagging
Necessary Remediation – Note – also see “Security risks and exposures” section.
Cisco 3750 Core – Currently Vlans exist for all subnets on this device. Tagging is not being used in any capacity. Recommendation is to implement tagging for Vlans used by VMware and NetApp management.
Design Notes – Note – Phase 2 vendor should supply an experienced Cisco engineer. Tagged Vlans should include VMware service console (DR and Production), VMkernel (DR and Production), and NetApp Management (Ethernet interfaces configured DR and Production).

VMware SRM
Necessary Remediation – Note – also see “Routing, connectivity, bandwidth, firewalls, site to site replication” section.
VMware – SRM is currently licensed but not configured. Recommendation is to configure VMware SRM and test functionality.
Design Notes – SRM should be configured according to the best practice developed jointly by VMware and NetApp in this document: http://media.netapp.com/documents/tr-3671.pdf

VMware Thin Provisioning
Necessary Remediation – See “Monitoring and Management” section.
Design Notes – Thin Provisioning is being leveraged extensively, which is fine. Only qty eighteen (18) vmdk disks are thick provisioned. Recommendation is to maintain at least 20% free space on VM datastores and monitor datastore utilization to ensure storage does not become overcommitted.
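The 20% free-space guideline and the overcommitment concern in the Thin Provisioning design notes can be expressed as a small check. The following is a minimal sketch in Python, not part of the assessment: the datastore names and all capacity figures are hypothetical, and "overcommitted" is assumed here to mean that the space provisioned to virtual disks exceeds the datastore capacity. In practice the figures would come from vCenter or another inventory source.

```python
from dataclasses import dataclass
from typing import List

# Guideline from the design note above: keep at least 20% free per datastore.
MIN_FREE_FRACTION = 0.20


@dataclass
class Datastore:
    name: str              # hypothetical datastore label
    capacity_gb: float     # total formatted capacity
    used_gb: float         # space actually consumed on disk
    provisioned_gb: float  # total size promised to VMs (thin and thick disks)


def free_fraction(ds: Datastore) -> float:
    """Fraction of the datastore capacity that is not yet consumed."""
    return (ds.capacity_gb - ds.used_gb) / ds.capacity_gb


def review(ds: Datastore) -> List[str]:
    """Return findings for one datastore; an empty list means no concerns."""
    findings = []
    if free_fraction(ds) < MIN_FREE_FRACTION:
        findings.append(f"{ds.name}: free space below 20%")
    # Assumption: overcommitment means provisioned space exceeds capacity.
    if ds.provisioned_gb > ds.capacity_gb:
        findings.append(f"{ds.name}: overcommitted (provisioned exceeds capacity)")
    return findings


# Hypothetical figures for illustration only.
for ds in (
    Datastore("example_ds_a", capacity_gb=1000, used_gb=600, provisioned_gb=900),
    Datastore("example_ds_b", capacity_gb=1000, used_gb=850, provisioned_gb=1400),
):
    print(ds.name, review(ds) or "ok")
```

A check of this kind could feed the datastore notifications recommended in the “Monitoring & Management” section, so that thresholds are alerted on rather than reviewed by hand.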
NetApp Aggregates and volumes Necessary Remediation – On Production SAN there are seven (7) disks that are not assigned to an aggregate. Recommendation is to assign them. On Production SAN, disk 0d.01.4 is not currently owned by any controller. Recommendation is to ensure proper firmware version of disk and assign disk to a controller and subsequently add this disk to aggregate. On DR SAN there are eighteen (18) disks not assigned to an aggregate. Recommendation is to assign them. On Prod SAN Controller A, volumes restore and vol2 are not being used. Recommendation is to delete. Design Notes – NetApp – Both SANs – 28 disks assigned to single aggregate but no performance issues noted. Ensure all disks, per remediation, are assigned to a controller and added to an aggregate. VMotion Necessary Remediation – See “Switch Traffic Separation” section. Design Notes vMotion – Works correctly but recommendation is to have vmkernel for vMotion on its own vSwitch. Data Protection Necessary Remediation – See “Backup and Recovery” section. Design Notes NetApp, Prod and DR, data is contained within Raid DP (similar to Raid 6) aggregates. Each aggregate has a designated hot spare disk available. This configuration allows for up to two (2) simultaneous disk failures at any given time inside an aggregate. Storage Networking Necessary Remediation – Production – Both Controllers A & B - e0p is 198.15.1.x/24 – unknown subnet. Recommendation is to determine reason and disable. E0b is 10.44.55.x/24 – unknown subnet. Recommendation is to determine reason and disable. NetApp interfaces e0a – Production is 10.44.10.x/24; DR is 10.55.10.x/24. This is on the same subnet as production lan for servers. Recommendation is to configure e0a and an additional NetApp interface with ips in a management vlan. See Also “Review of Cabling” and “Switch Traffic Separation” sections. Design Notes – NetApp Storage Paths, Prod and DR, are configured using Fiber Channel. 
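The Storage Networking finding above, that e0a sits on the same subnet as the server LAN at each site, can be verified mechanically once a management vlan is chosen. The sketch below uses Python's standard ipaddress module. The server LAN networks are taken from the finding (the report elides the host octets, so only the /24 networks are used), while the proposed management subnets are hypothetical placeholders, not values from the assessment.

```python
import ipaddress

# Server LAN subnets per the Storage Networking finding above.
SERVER_LANS = {
    "Production": ipaddress.ip_network("10.44.10.0/24"),
    "DR": ipaddress.ip_network("10.55.10.0/24"),
}


def on_server_lan(site: str, interface_network: str) -> bool:
    """True if the interface's network overlaps the site's server LAN."""
    net = ipaddress.ip_network(interface_network)
    return net.overlaps(SERVER_LANS[site])


# Current e0a networks as reported, and hypothetical management vlan subnets.
current = {"Production": "10.44.10.0/24", "DR": "10.55.10.0/24"}
proposed = {"Production": "10.44.99.0/24", "DR": "10.55.99.0/24"}  # assumed values

for site in SERVER_LANS:
    print(site, "current e0a shares server LAN:", on_server_lan(site, current[site]))
    print(site, "proposed mgmt shares server LAN:", on_server_lan(site, proposed[site]))
```

The same function could be pointed at the e0p and e0b networks once their purpose is determined.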
NetApp Deduplication Necessary Remediation – Production - Deduplication window is too short. Recommendation is to have at least a two (2) hour window per volume and only run during non-business hours daily. DR - Deduplication not currently configured for any volume. Use same configuration method as Production, except set the daily schedule for at least four (4) hours after Production Deduplication schedule begins. Design Notes – Deduplication can be enhanced and gain back more storage by grouping Virtual Machines with same OS on the same NetApp volume. NetApp SnapShot Necessary Remediation – Snapshotting is currently configured incorrectly as the retention period is 365 dailies. Recommendation is to setup, configure and test snapshotting as follows: a. Daily – qty five (5) b. Hourly– 7am – 7pm – qty twelve (12) c. Monthly – qty twelve (12) d. Yearly = one (1) Design Notes – None NetApp FlexClone Necessary Remediation – FlexClone is not currently licensed. Need to procure and setup, configure and test FlexClone Design Notes – Procure qty four (4) FlexClone licenses (2 SANS X 2 Controllers). FlexClone is necessary for testing NetApp failover to DR. NetApp Snap Vault Necessary Remediation – None noted. Design Notes NetApp Snap Vault is not being used. It is a backup and recovery product but this will add an additional layer of complexity to the backup and DR plan that is unnecessary. See “Backup and Recovery” section regarding design. Monitoring & Management Necessary Remediation – NetApp –NetApp tools are currently dispersed amongst several servers. Recommendation is to consolidate NetApp Management Tools onto a single server for ease of use by IT Personnel. Recommendation is to also configure NetApp Operations Manager for proper monitoring and alerting of Prod and DR filers. VMware – Only default alerts are currently configured, which is fine, but notifications are not configured. 
Recommendation is to configure notifications for important alerts when thresholds are exceeded for storage, networking, resource utilization, and critical errors.
Networking – No snmp monitoring is currently being leveraged for the Cisco switching. Recommendation is to procure a central monitoring software solution for notifications on snmp traps and server-specific thresholds (e.g. Ipswitch What’s Up Gold or Solarwinds).
Design Notes –
NetApp – Operations Manager is the primary tool used here to accomplish comprehensive monitoring and management.
VMware – Monitoring and notifications are built-in, but need to be configured.

NetApp SnapManager
Necessary Remediation – SnapManager for Virtual Infrastructure single file restore does not appear to be operative. RIH needs to be able to recover VMs from this utility or perform file-level restoration. Recommendation is to troubleshoot why this is not working and ensure file-level recovery capability.
Design Notes – This tool should be installed, configured and tested on the same server with other NetApp utilities.

NetApp Operations Manager
Necessary Remediation – See also “Monitoring and Management” section above. SnapMirror is not set in Operations Manager; it needs to be set up, configured and tested.
Design Notes – This tool provides comprehensive NetApp monitoring and management and should be installed, configured and tested on the same server with other NetApp utilities.

NetApp Protection Manager
Necessary Remediation – This tool provides comprehensive NetApp data protection monitoring and management and should be installed, configured and tested on the same server with other NetApp utilities.
Design Notes – This tool should be used as part of RIH’s use of SnapManager.

NetApp Provisioning Manager
Necessary Remediation – This tool provides comprehensive NetApp storage provisioning and configuration management and should be installed, configured and tested on the same server with other NetApp utilities.
Design Notes – This tool will provide a single place for RIH to provision new storage and also configure things like deduplication and thin provisioning.

Backup and Recovery
Necessary Remediation – Symantec Backup Servers = BK38, RIH40, RIH173, RIHBackup253
1. The HP StorageWorks Library has only a single full height LTO-5 drive, and current backups-to-tape take 36 hours to complete. Recommendation is to procure up to three (3) half height tape drives.
2. Recommendation is to adopt a disk-to-disk-to-tape solution. Will need to procure inexpensive SATA storage for this (at least 15TB useable).
3. VCenter – VCenter Databases (Prod and DR) should be backed up leveraging a SQL maintenance schedule, generating a bkf file to be backed up by BackupExec. Maintenance schedule requires SQL Standard or Enterprise which is not currently owned.
Design Notes – Recommended Design for Backups – Production
a. Apps, Files, SQL, Groupwise = Symantec BackupExec (disk-to-disk-to-tape)
b. VMs, NetApp Volumes = NetApp SnapManager to SAN
c. NetApp Volumes = BackupExec NDMP Option (already owned and configured). Change configuration to run a month-end to tape only with retention of qty twelve (12) to have snapshots of NetApp Volumes on tape.
d. Backup Exec Note – RIH currently owns version 2012 SP3 (installed)
e. Backup-to-Disk Notes – Storage Device for backups must have at least 15TB useable space and have qty 2 8gb Fibre Channel interfaces. In addition the current backup server only has 1 port for Fibre Channel on its HBA. An 8gb dual port Fibre Channel HBA must be purchased, installed and configured in the current HP Proliant backup server.
f. Vendor needs to show that connectivity and performance of the backup design actually work as designed; the vendor must be committed to the performance as designed and troubleshoot if necessary. The expectation is that interfaces perform at 8gb/s and backups run at optimal speed.
Procurement Needs
a.
qty three (3) – LTO5 half height drives for HP StorageWorks Library b. qty one (1) – NAS / SAN – Inexpensive Storage device (at least 15TB usable space) for backup-to-disk solution with 2 port 8gb Fibre Channel interface. c. Qty one (1) – PCI Express Fibre Channel 8gb Dual Port HBA for HP Proliant Backup server Verify all software licensing for NetApp/VMware/Backup/Disaster Recovery VMware 1. All licensing for DR is currently owned (ESX Enterprise and SRM). NetApp 1. qty four (4) FlexClone is required for DR (specifically for testing failover). 2. All other required licensing is owned. 3. SnapManager products – Recommendation is to ensure support is current and access to new version is always available. Symantec 1. Recommendation is to maintain Symantec Support Agreement for availability of newer versions. Explore NetApp/VMware/HP software and firmware upgradability 1. General - Firmware will be available for all hardware (HP Proliant Servers, NetApp, Cisco Switches, Juniper) – Recommendation is to upgrade all relevant firmware before disaster recovery site goes live. 2. NetApp – Recommendation is to upgrade Prod and DR to DataOntap 8.x (most stable). 3. HP DL380 G7 (qty six (6) Servers) – 8 X CPU (two (2) sockets) – 196GB RAM (maxed)– has been confirmed as vSphere 5.0 Certified Hardware. 4. ESX – Current version is vSphere 4.1 Build 348481. Recommendation is to upgrade Prod and DR to vSphere 5. 5. vCenter – Currently running version 4. Recommendation is to upgrade DR and Prod to vCenter 5. VMware Ports configurations and binding Necessary Remediation – Multiple ports on network adapters on ESX hosts are not uplinked. Recommendation is to use ports available in collaboration with vSwitch and port segmentation. See “Vlan / Tagging” section. Design Notes See Visio Diagrams – Logical and Physical Designs Error logs for both environments Necessary Remediation – All remediation for errors have been noted in other sections. 
Design Notes – See “Monitoring & Management” section.

Address configuration for graceful shutdown of the environment during power loss (Symmetra LX, 16KVA tower)
Necessary Remediation –
VMware – Virtual machines are not configured for automatic startup / shutdown. Recommendation is to uninstall any APC PowerChute software from virtual machines and install, configure and test the APC agent on all ESX hosts (Prod and DR). All virtual machines need to be configured for startup/shutdown in the ESX configuration.
NetApp – NetApp filers (Prod and DR) are not configured for APC Network Shutdown. Need to set up, configure and test UPS shutdown on NetApp.
Design Notes –
VMware – APC Network Shutdown
1. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007036
NetApp – Configure snmp trap on NetApp for APC Symmetra and add the FAS filers as enabled trap receivers from the APC Network Management Card. Use the “enable ups” command in DataOnTap to configure the UPS and shutdown event.

Opportunities to optimize configurations and improve performance
RIH has engaged Vendor Company for assessment of both the production and disaster recovery environments in order to identify any opportunities to optimize current performance. Almost all optimization notes have already been covered in the previous section: Health Check Audit List. In addition Vendor Company has provided RIH with a VMware Health Assessment of both DR and Prod virtual datacenters. The recommendation to improve performance and optimize configuration is to remediate all items identified in these sections. The VMware Consultant, Vendor Engineer, has also identified the following items in regards to performance optimization:
1. In the VMware Disaster Recovery Environment resource pools are not currently being employed. Recommendation is to mimic resource pool configuration of the Production environment for DR.
This will ensure resources such as CPU and Memory are allocated to the proper virtual machines 2. In the VmWare Disaster Recovery and Production Environments VM Swap space is set to use local ESX storage (HP DAS). The recommendation is to configure virtual machines to swap inside their respective vmfs datastores. This will ensure swap usage is not crossing disparate storage controllers and increase performance. 3. In the VmWare Disaster Recovery and Production Environments multiple Network adapters on the ESX hosts are not being utilized. Recommendation is to uplink all available network adapters and assign them to virtual switches. This will ensure redundancy in vswitch uplinks and also increase performance as more physical adapters can be used for vswitch bandwidth. 4. In the VmWare Disaster Recovery and Production Environments the VCenter server database should be upgraded to SQL Server Standard 2008R2. In addition the database should be moved off the OS partition. With SQL Server Standard there is enhanced support for SQL maintenance. Roadmap for software, configuration enhancement and upgrades RIH has engaged Vendor Company for assessment of both the production and disaster recovery environments in order to identify a roadmap for software upgrades and configuration enhancement. Upgradeability of hardware and software had already been identified in the previous section: Health Check Audit List. This section will provide those software upgrades recommended for Phase 2 as an overview. 1. NetApp – Currently DR and PROD filers are running DataOnTap version 7.3.4. It is recommended to upgrade DataOnTap to v.8.1. The new version of DataOnTap contains enhancements for both snapshotting and flexclone. In addition the new version has fewer restrictions on aggregate size and number of disks. 2. Cisco – RIH employs Cisco switching throughout its environment and they are all under a current Cisco Support Agreement. 
It is recommended to perform IOS upgrades with version consistency across all 3750s and 9124mds – about 35 switches total (DR and Prod) 3. HP Proliant DL 380 G7 – There are 3 of these servers each in both DR and Prod. The servers are the ESX hosts for both environments. It will be important, as part of the upgrade to ESXi5, to perform firmware upgrades on all hardware components with particular attention to the BIOS version and HBA firmware. 4. ESX – RIH currently runs ESX 4.1. It is recommended that RIH be upgraded to ESXi 5 before the Disaster Recovery environment is moved to the Oshean datacenter. One key benefit for RIH in ESXi version 5 will be Storage DRS. It will be key to configure Storage DRS as part of the upgrade. At a high level the new version provides the following: a. Improved Reliability and Security — with fewer lines of code and independence from general purpose OS, ESXi drastically reduces the risk of bugs or security vulnerabilities and makes it easier to secure your hypervisor layer. b. Streamlined Deployment and Configuration — ESXi has far fewer configuration items than ESX, greatly simplifying deployment and configuration and making it easier to maintain consistency. c. Higher Management Efficiency — The API-based, partner integration model of ESXi eliminates the need to install and manage third party management agents. You can automate routine tasks by leveraging remote command line scripting environments such as vCLI or PowerCLI. d. Simplified Hypervisor Patching and Updating — Due to its smaller size and fewer components, ESXi requires far fewer patches than ESX, shortening service windows and reducing security vulnerabilities. Coordination regarding Disaster Recovery RIH has engaged Vendor Company for the information required for the coordination of disaster recovery Phase 2. 
Per RIH Attachment A: Scope of Work the following needs for coordination have been identified: Commitment to discuss Customer's disaster recovery needs with OSHEAN Customer's disaster recovery needs with SecureWorks Strategy for disaster recovery implementation This section will detail the information gathered by the Vendor Company engineer in this engagement as it pertains to the items above. Coordination with Oshean The Vendor Company engineer had a formal conversation with Oshean in order to determine their SLA in terms of the Phase 2 engagement. Oshean is providing RIH with a secure datacenter in Springfield Massachusetts that will serve as RIH’s space for the new disaster recovery environment. Oshean is primarily responsible for the following items per their SLA: Adequate space and rack in Springfield datacenter for RIH equipment Adequate power and backup (UPS and generator) for RIH equipment A 100mb Cox-leased fiber point-to-point line (dual path except for local loop in Providence) from the RIH Providence RI datacenter to the DR datacenter in Springfield Massachusetts Proactive monitoring of the health and uptime of the Cox-leased line Availability to the datacenter for physical access by RIH IT personnel Coordination with RIH for physical access to the datacenter by RIH vendors Coordination with SecureWorks The Vendor Company engineer had a formal discussion with Dell Secureworks as it relates to the coordination and SLA for Phase 2. Secureworks is the vendor for RIH who provides hardware and management for the firewalls in Providence and Springfield. Currently Secureworks maintains QTY 2 Juniper SNG firewalls for RIH in Providence for their production datacenter. Since the firewalls are managed no access is given to RIH or vendors for login or configuration. It will therefore be important for the Phase 2 vendor to coordinate closely with RIH and Secureworks to ensure proper configuration of the new firewall to be installed in Springfield. 
Vendor Company spoke with the project manager and engineer at Secureworks who will be responsible for delivery of the new firewall and determined the following:
Secureworks will provide QTY 1 Cisco ASA 5510 – 2nd ASA may be procured by RIH
Secureworks will perform configuration of ASA
RIH Vendor must provide Secureworks with the logical configuration for the Cisco ASA
Logical Configuration is provided to Secureworks by filling out a configuration form
Secureworks, after configuration has been applied, will ship the ASA to the RIH Providence Datacenter and vendor will be responsible for delivery and install in Springfield
RIH vendor must physically install Cisco ASA into rack provided by Oshean
Secureworks project management team for Phase 2 is only responsible for delivery of configuration and hardware up to testing of site-to-site connectivity
Once connectivity and proper configuration have been verified, the Secureworks project management team becomes disengaged
Any issues beyond site-to-site connectivity and configuration must be communicated to SecureWorks leveraging the standard support channels
SecureWorks support is facilitated by RIH IT personnel
SecureWorks provides ongoing monitoring of the hardware that it manages for RIH

Regarding the coordination with Oshean for the delivery of Phase 2, Oshean has communicated that RIH must facilitate vendor access to the Springfield datacenter by notifying Oshean of dates and times. Access by vendors to the datacenter can be accomplished via escort by RIH personnel with keycard access or by obtaining keycard access for the vendor. It is also important to note that the leased line for connectivity to the datacenter is provided by Oshean. Any issues with connectivity over this line must go through Oshean’s NOC. Finally, Oshean does not own and is not responsible for any equipment procured by RIH (all DR servers, hardware or software).
The vendor used in Phase 2 must be responsible for the DR equipment during the Phase 2 project in regards to configuration and functionality.

Procurement Needed
QTY 1 Cisco 3750 24-port switch
QTY 4 NetApp Flexclone licenses
Licensing for IDS and IPS for Juniper Production Firewall – (will be procured directly through SecureWorks by RIH) – no need for quote on this
QTY 1 Cisco ASA – in addition to QTY 1 already procured for HA at Disaster Recovery Facility (will be procured directly through SecureWorks by RIH) – no need for quote on this
QTY 1 Central Monitoring Solution – must be able to monitor VMware, NetApp, Cisco and Windows Servers (e.g. What’s Up Gold, Solarwinds)
QTY 3 – LTO5 Half-Height Tape Drives for existing HP StorageWorks Library
QTY 1 – SAN / NAS storage solution for disk-to-disk backups – SATA storage with at least 15TB useable space. Must have dual-port 8gb Fiber interface for connectivity to HP Proliant Backup server.
QTY 1 – PCI Express 8Gb Fibre Channel Dual-Port HBA for HP Proliant

Logical and Physical Layout Visio Diagrams

Exhibit 2
VMware Health Check Report for Rhode Island Housing
VMware and Rhode Island Housing Confidential

© 2010 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. This product is covered by one or more patents listed at http://www.vmware.com/download/patents.html. VMware, VMware vSphere, VMware vCenter, the VMware “boxes” logo and design, Virtual SMP and VMotion are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
VMware, Inc 3401 Hillview Ave Palo Alto, CA 94304 www.vmware.com

Contents
1. Executive Summary
1.1 Report Overview
1.2 Assessment Highlights
1.3 Next Steps
2. Recommended Action Items
3. Health Check Assessment
3.1 Availability/Management
3.2 Performance
4. Appendix A: Audited Inventory
4.1 Host Configurations
4.2 Networking Configurations
4.3 Storage
4.4 Virtual Datacenter
5. Appendix B: Health Check Assessment Checklist
6. Appendix C: References

1. Executive Summary
1.1 Report Overview
This report summarizes activities and findings from a VMware Health Check that was conducted for Rhode Island Housing.
This report contains:
- Recommended changes to configuration and/or usage per VMware best practices that may improve availability/management or performance of VMware components
- Inventory of components analyzed
- Checklist of assessment activities performed

1.2 Assessment Highlights
Analysis Period: March, 2012
Datacenters: Rhode Island Housing – Providence Production Site
Contributing Participants: Rhode Island Housing: Abdel El idrissi; Vendor Engineer, VMware Consultant
Summary of Activities:
- Performed Standard Health Check Assessment Checklist (see Appendix)
- Gathered system information collected from VMware HealthAnalyzer
- Interviewed participants to discuss priority issues and concerns
- Conducted knowledge transfer to
    o clarify understanding of VMware component requirements and behavior
    o clarify changes to configuration and usage per VMware best practices
- Reviewed documents supplied by Rhode Island Housing

1.3 Next Steps
Rhode Island Housing should review this report and consider the recommended action items. A follow-up consult and/or Health Check is also advised. If required, VMware, through its Professional Services Organization or via one of its many partner organizations, is able to assist Rhode Island Housing in implementing the recommended actions as detailed within this report.

2.
Recommended Action Items

Priority  Component           Recommended Action Item
1         Network             Change portgroup security default settings ForgedTransmits and MACAddressChanges to Reject unless app requires it
1         Virtual Datacenter  Set up a redundant service console portgroup to use a separate vmnic/uplink, and an alternate isolation response gateway address for more reliability in HA isolation detection:
                              - Set up a redundant service console portgroup to use a separate vmnic/uplink on a separate subnet
                              - Specify "isolation address" for the redundant service console (das.isolationaddress2)
                              - Increase the failure detection time (das.failuredetectiontime) setting to 20000 milliseconds or greater
1         Virtual Datacenter  Avoid making resource pools and VMs siblings in a hierarchy in order to avoid unexpected performance
1         Virtual Datacenter  Disconnect vSphere Clients from the vCenter Server when they are no longer needed
1         Virtual Datacenter  Use vCenter Server roles, groups, and permissions in order to provide appropriate access and authorization to virtual infrastructure. Avoid using Windows built-in groups (Administrators)
1         Virtual Datacenter  Avoid changing default firewall rules and ports unless necessary
1         Virtual Machines    Check to make sure that VMware Tools are installed, running, and not out of date for running VMs
1         Virtual Machines    Limit use of snapshots and for short term use
1         Virtual Machines    Check to make sure that VMs meet the requirements for VMotion
1         Virtual Machines    Allocate only as much virtual hardware as required for each VM. Disable any unused or unnecessary virtual hardware devices.
2         Host                Avoid changes to advanced parameter settings unless necessary
2         Network             Check to make sure there is redundancy in networking paths and components to avoid single points of failure (e.g. at least 2 paths to each network)
2         Storage             Allocate separate space on shared datastores for templates and media/ISOs from datastores for VMs
2         Storage             Check to make sure there is redundancy in storage paths and components to avoid single points of failure
2         Virtual Machines    Use the latest version of vmxnet that is supported by the guest OS
2         Virtual Machines    Select the correct guest OS type in the VM configuration to match the guest OS
2         Virtual Machines    Use reservations/limits selectively on VMs that need it; don't set reservation too high or limits too low
2         Virtual Machines    Consider using virtual hardware v7 to take advantage of additional capabilities (like VMXNET3, PVSCSI)
2         Virtual Machines    Disable copy/paste between guest OS and remote console
3         Network             Configure NICs and physical switch speed and duplex settings consistently. Set to autonegotiation for 1Gb NICs
3         Network             Distribute vmnics for a portgroup across different PCI busses for greater redundancy
3         Virtual Machines    Use the correct virtual SCSI hardware (e.g. BusLogic Parallel, LSILogic SAS/Parallel, VMware Paravirtual)

3. Health Check Assessment

3.1 Availability/Management

Observation 1
Advanced Settings for the following 3 ESX host(s) have been changed from defaults:
  10.44.10.10
  10.44.10.14
  10.44.10.12
Priority: 2
Component: Host
Recommendation: Avoid changes to advanced parameter settings unless necessary
Justification: ESX/ESXi hosts have a default configuration that normally does not need to be changed.
Certain advanced parameter settings can be selectively changed if they are recommended VMware best practices and documented, or under direction by VMware support in order to address specific issues. Otherwise the advanced parameter settings should not be changed, as that may have an adverse and unpredictable impact on management, availability or performance of the virtual infrastructure.
If an advanced parameter setting needs to be changed, make sure that the changes are consistently applied to all applicable hosts within the environment or cluster. Also, maintain proper change management procedures and document configuration changes.
An example of a valid advanced parameter change is setting up a redundant service console portgroup and configuring an alternate isolation response gateway address for more reliability in HA isolation detection (by modifying the das.isolationaddress2 and das.failuredetectiontime parameters).
An example of an advanced parameter setting that should not be changed is disabling transparent page sharing by setting the sched.mem.pshare.enable parameter to false. Transparent page sharing provides numerous advantages for memory resource management and should not be disabled.

Observation 2
The portgroup/vSwitches on the following 3 host(s) have less than 2 uplink paths:
  10.44.10.10
  10.44.10.14
  10.44.10.12
Priority: 2
Component: Network
Recommendation: Check to make sure there is redundancy in networking paths and components to avoid single points of failure (e.g. at least 2 paths to each network)
Justification: In order to ensure that there is no service disruption, it is important to ensure that the networking configuration is fault resilient to accommodate networking path and component failures. It is recommended that all portgroups and distributed virtual portgroups are configured with at least two uplink paths using different vmnics.
Use NIC teaming with at least two active NICs or, in the case of the service console/management portgroup, one active and at least one standby. Set the failover policy with the appropriate active and standby NICs for failover. Connect each physical adapter to a different physical switch for an additional level of redundancy. Upstream physical network components should also have the necessary redundancy in order to accommodate physical component failures.

Observation 3
The vmnics on the following 3 host(s) are not distributed across different PCI busses:
  10.44.10.10
  10.44.10.14
  10.44.10.12
Priority: 3
Component: Network
Recommendation: Distribute vmnics for a portgroup across different PCI busses for greater redundancy
Justification: Distributing vmnics for a portgroup across different PCI busses provides greater redundancy from failures related to a particular PCI bus. It is also important to team vmnics from different PCI busses in order to improve fault resiliency from component failures.
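
The two redundancy checks described in Observations 2 and 3 can be sketched as a short script. This is a minimal illustration only and is not part of the Health Check tooling: the function names, the portgroup-to-uplink mapping, and the vmnic-to-PCI-bus mapping are all hypothetical sample data, not values taken from the audited hosts.

```python
from typing import Dict, List


def portgroups_lacking_uplink_redundancy(
    uplinks: Dict[str, List[str]], minimum: int = 2
) -> List[str]:
    """Return portgroups that have fewer than `minimum` distinct uplinks."""
    return sorted(pg for pg, nics in uplinks.items() if len(set(nics)) < minimum)


def portgroups_on_single_pci_bus(
    uplinks: Dict[str, List[str]], pci_bus_of: Dict[str, str]
) -> List[str]:
    """Return portgroups whose uplinks all sit on the same PCI bus."""
    flagged = []
    for pg, nics in uplinks.items():
        buses = {pci_bus_of[nic] for nic in nics}
        if nics and len(buses) < 2:
            flagged.append(pg)
    return sorted(flagged)


# Hypothetical sample data (assumed for illustration only).
sample_uplinks = {
    "Service Console": ["vmnic0", "vmnic1"],
    "VMotion": ["vmnic2"],
    "VM Network": ["vmnic0", "vmnic4"],
}
sample_pci_bus = {"vmnic0": "03", "vmnic1": "03", "vmnic2": "04", "vmnic4": "07"}

print(portgroups_lacking_uplink_redundancy(sample_uplinks))
print(portgroups_on_single_pci_bus(sample_uplinks, sample_pci_bus))
```

In this sample, a portgroup can pass the uplink-count check and still be flagged by the PCI-bus check, which is why the report lists the two findings separately.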

Observation 4
The portgroup security settings ForgedTransmit or MacAddressChanges are not set to Reject on the following 3 host(s):
Host: 10.44.10.10
  MAC address changes: VMotion, VM Network, Service Console 2, Service Console, No Network, 10.44.10.x
  Forged Transmit: VMotion, VM Network, Service Console 2, Service Console, No Network, 10.44.10.x

  MAC address changes: VMotion, VM Network, Service Console 2, Service Console, No Network, 10.44.10.x
  Forged Transmit: VMotion, VM Network, Service Console 2, Service Console, No Network, 10.44.10.x

  MAC address changes: VMotion, VM Network, Service Console 2, Service Console, No Network, 10.44.10.x
  Forged Transmit: VMotion, VM Network, Service Console 2, Service Console, No Network, 10.44.10.x
Priority: 1
Component: Network
Recommendation: Change portgroup security default settings ForgedTransmits and MACAddressChanges to Reject unless app requires it
Justification: It is recommended that both of these options are set to Reject for improved security. In order to protect against MAC address impersonations and prevent ESX/ESXi from honoring requests to change the effective MAC address to anything other than the initial MAC address, change the settings to Reject. By setting the MACAddressChange setting to Reject, ESX/ESXi compares the source MAC address being transmitted by the Guest OS with the effective MAC address for its adapter to see if they match. If the addresses do not match, ESX/ESXi drops the packet. This allows packets with impersonated addresses to be dropped before they are delivered, and the Guest OS assumes that the packets have been dropped.
For VMs that require overriding this setting (intrusion detection or MSCS VM), create a special port group for these (and only these) VMs with the modified settings.
References:
  "Configuring the ESX/ESXi Host" section in VMware Infrastructure 3 Security Hardening Guide http://www.vmware.com/vmtn/resources/726

Observation 5
The following 3 ESX host(s) have datastore(s) with no redundant storage path configured between the datastore and ESX host:
Host: 10.44.10.10
  ESX_LUN_1 - naa.600c0ff000d54b87958e314801000000
  ESX_LUN_0 - naa.600c0ff000d54024c28e314801000000

  ESX_LUN_1 - naa.600c0ff000d54b87958e314801000000
  ESX_LUN_0 - naa.600c0ff000d54024c28e314801000000

  ESX_LUN_1 - naa.600c0ff000d54b87958e314801000000
  ESX_LUN_0 - naa.600c0ff000d54024c28e314801000000
Priority: 2
Component: Storage
Recommendation: Check to make sure there is redundancy in storage paths and components to avoid single points of failure
Justification: Configuring multiple paths to storage improves availability and in some cases allows for load balancing. For FC storage, redundant fabrics are highly recommended.
References:
  SAN System Design and Deploy http://www.vmware.com/resources/techresources/772

Observation 6
The following 3 ESX host(s) do not have redundant service console port groups on distinct vSwitches:
  10.44.10.10
  10.44.10.14
  10.44.10.12
The default HA failure detection time has not been changed for the following 1 Cluster(s):
  Production
No alternate isolation addresses have been specified for the following 1 Cluster(s):
  Production
Priority: 1
Component: Virtual Datacenter
Recommendation: Set up a redundant service console portgroup to use a separate vmnic/uplink, and an alternate isolation response gateway address for more reliability in HA isolation detection:
- Set up a redundant service console portgroup to use a separate vmnic/uplink on a separate subnet
- Specify "isolation address" for the redundant service console (das.isolationaddress2)
- Increase the failure detection time (das.failuredetectiontime) setting to 20000 milliseconds or greater
Justification: Although NIC teaming is used to account for NIC failures, overall redundancy for HA heartbeats and isolation response detection can be made more reliable by setting up a redundant service console on a separate subnet. Each service console network should have one isolation address it can reach. When you set up service console redundancy, you must specify an additional isolation response address for the secondary service console network. VMware also recommends increasing the failure detection time setting to 20000 ms or greater.
References:
  "Best Practices for VMware HA Clusters" section in vSphere Availability Guide http://www.vmware.com/support/pubs
  "VMware HA Best Practices" section in VMware High Availability: Concepts, Implementation, and Best Practices http://www.vmware.com/resources/techresources/402
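
As an illustration of the Observation 6 recommendation, the sketch below checks a set of HA advanced options against the two settings named in the report. It is a hedged example, not a VMware tool: it makes no vSphere API calls, the options are represented as a plain dictionary purely for illustration, and the function name and sample isolation address are assumptions.

```python
from typing import Dict, List

# Threshold taken from the recommendation above (20000 milliseconds or greater).
RECOMMENDED_MIN_DETECTION_MS = 20000


def check_ha_advanced_options(options: Dict[str, str]) -> List[str]:
    """Return findings for HA advanced options that miss the recommendation."""
    findings = []
    # The redundant service console network needs its own isolation address.
    if not options.get("das.isolationaddress2"):
        findings.append("das.isolationaddress2 is not set")
    raw = options.get("das.failuredetectiontime")
    if raw is None:
        findings.append("das.failuredetectiontime has not been changed from its default")
    elif int(raw) < RECOMMENDED_MIN_DETECTION_MS:
        findings.append(
            f"das.failuredetectiontime is {raw} ms, below {RECOMMENDED_MIN_DETECTION_MS} ms"
        )
    return findings


# Hypothetical cluster settings; the address is a made-up example value.
recommended = {
    "das.isolationaddress2": "192.0.2.1",
    "das.failuredetectiontime": "20000",
}

print(check_ha_advanced_options({}))
print(check_ha_advanced_options(recommended))
```

An empty dictionary corresponds to the state reported for the Production cluster, where neither option had been set.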

Observation 7
The following 1 Windows default user/group is being used for vCenter user roles/permissions:
  Administrators (for entity Datacenters)
Priority: 1
Component: Virtual Datacenter
Recommendation: Use vCenter Server roles, groups, and permissions in order to provide appropriate access and authorization to virtual infrastructure. Avoid using Windows built-in groups (Administrators)
Justification: By default, any user or group who is a member of the local Administrators group of the Windows Server running vCenter Server will have full administrative control of vCenter Server (and the virtual infrastructure). This can give system administrators who are not virtual infrastructure administrators access to the virtual infrastructure. Use the appropriate vCenter Server roles and assign them to the appropriate vCenter Administrators AD group to ensure access is limited to virtual infrastructure administrators. Before removing users or groups from vCenter Server, make sure that you create and test access to vCenter Server for the new users and groups.
Reference:
  "Managing Users, Groups, Roles, and Permissions" section in vSphere Basic System Administration http://www.vmware.com/support/pubs

Observation 8
The firewall settings for the following ESX host(s) have been modified from the default:
  10.44.10.10
  10.44.10.14
  10.44.10.12
Refer to findings data for more specific details about the settings that have been modified.
Priority: 1
Component: Virtual Datacenter
Recommendation: Avoid changing default firewall rules and ports unless necessary
Justification: The default firewall rules are configured in order to ensure adequate security while allowing communication with the appropriate virtual infrastructure components.
Unless required to enable communication for virtual infrastructure services, avoid changing the firewall rules as that can introduce additional security issues. It is best to leave the default security firewall settings, which block all incoming and outgoing traffic that is not associated with an enabled service. If you do enable a service and open ports, make sure to document the changes, including the purpose for opening each port, and consistently make the changes on all the appropriate ESX/ESXi hosts.
Avoid changing the default ports unless necessary. These ports are required for vCenter Server, ESX hosts and other components to communicate with the virtual infrastructure.
References:
  "Configure the Firewall for Maximum Security" section in VMware Infrastructure 3 Security Hardening http://www.vmware.com/resources/techresources/727
  TCP and UDP Ports for vCenter Server, ESX hosts, and other network components management access KB http://kb.vmware.com/kb/012382

Observation 9
The following 1 VM(s) have SCSI controllers which differ from the default SCSI controller for their guest OS:
  GPO (version 7) on host 10.44.10.10 is using VirtualIDEController
Priority: 3
Component: Virtual Machines
Recommendation: Use the correct virtual SCSI hardware (e.g. BusLogic Parallel, LSILogic SAS/Parallel, VMware Paravirtual)
Justification: Selecting the incorrect virtual SCSI hardware can prevent the VM from properly booting or impact the performance of the VM. Check the Guest Operating System Installation Guide for the correct virtual SCSI hardware that is supported. vCenter Server automatically selects the default SCSI adapter that is supported for the guest OS of the VM.
In general, older guest OSs might require BusLogic. LSILogic SAS is available for VMs with virtual hardware v7. LSI Logic is best for workloads that drive less than 2000 IOPS and 8 outstanding I/Os.
VMware Paravirtual PVSCSI adapter can be used for environments where hardware and applications drive a high amount of I/O throughput. PVSCSI is best for workloads that drive more than 2000 IOPS and 8 outstanding I/Os. This adapter is not suited for DAS environments and has some other limitations, such as:
1. Supported on a few guest OS (e.g. Win Server 2003/8, RHEL5)
2. Hot add/remove requires a bus rescan from within the guest
3. Cannot boot a Linux guest or Windows guest (prior to ESX4 U1); can be used as a data disk
4. VMware FT and MSCS cluster is not supported
References:
  Configuring Disks to use VMware Paravirtual SCSI KB http://kb.vmware.com/kb/1010398
  Guest Operating System Installation Guide http://www.vmware.com/support/pubs
  Do I Choose PVSCSI or LSI Logic virtual adapter on ESX 4.0 for non-IO intensive workloads? http://kb.vmware.com/kb/1017652

Observation 10
The following 4 VM(s) do not meet some of the VMotion requirements (either floppy/cd-rom found, VM in internal network, network or datastore not visible to all ESX in cluster):
  RIH16 COPY MM RIH44 RIH39 oestester
Priority: 1
Component: Virtual Machines
Recommendation: Check to make sure that VMs meet the requirements for VMotion
Justification: In order to facilitate VMotion of VMs between hosts, the following requirements must be met:
1. The source and destination hosts must use shared storage and the disks of all VMs must be available on both source and target hosts
2. VM should not be connected to internal networks
3. The portgroup names must be the same on the source and destination hosts (easier with vDS, vNetwork Distributed Switch)
4. VMotion requires a 1Gb network
5. CPU compatibility: source and destination hosts must have compatible CPUs (relaxed for EVC, Enhanced VMotion Compatibility)
6. No devices attached that prevent VMotion (CDROM, Floppy, serial/parallel devices)
References:
  VMware VMotion and CPU Compatibility www.vmware.com/files/pdf/vmotion_info_guide.pdf

Observation 11
Copy/paste is not recommended and is enabled on the following 56 VM(s):
  RIH19
  SUSE_BASE
  RIH26
  RIH233
  RIH55
Priority: 2
Component: Virtual Machines
Recommendation: Disable copy/paste between guest OS and remote console
Justification: When VMware Tools runs in a virtual machine, by default you can copy and paste between the guest operating system and the computer where the remote console is running. As soon as the console window gains focus, non-privileged users and processes running in the virtual machine can access the clipboard of the virtual machine console. If a user copies sensitive information to the clipboard before using the console, the user, perhaps unknowingly, exposes sensitive data to the virtual machine.
It is recommended that you disable copy and paste for the guest operating system by creating the parameters shown in the following table. Note that this does not disable copy and paste for users when they access the virtual machine through other means such as Terminal Services. It disables copy and paste only from the virtual machine console, e.g. when using the console from the Virtual Infrastructure client.

Configuration Settings to Disable Copy and Paste
Name                                  Value
isolation.tools.copy.disable          true
isolation.tools.paste.disable         true
isolation.tools.setGUIOptions.enable  false

References:
  "Virtual Machines" section in VMware Infrastructure 3 Security Hardening http://www.vmware.com/resources/techresources/727

3.2 Performance

Observation 1
The following 3 ESX host(s) have Gbps network adapter(s) that are not set to AutoNegotiate:
  10.44.10.10
  10.44.10.14
  10.44.10.12
Priority: 3
Component: Network
Recommendation: Configure NICs and physical switch speed and duplex settings consistently. Set to autonegotiation for 1Gb NICs
Justification: Incorrect network speed and duplex settings can impact performance. The network adapter (vmnic) and physical switch settings need to be checked and set correctly. If your physical switch is configured for a specific speed and duplex setting, you must force the network driver to use the same speed and duplex setting. For Gigabit links, network settings should be set to autonegotiate and not forced.
Setting network adapter speed and duplex settings can be done from the vSphere Client, although a reboot is required for changes to take effect.
Reference:
  Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041

Observation 2
The following 3 datastore(s) have both VMs and Templates:
  DatastoreB4
  DatastoreB5
  DatastoreA3
Priority: 2
Component: Storage
Recommendation: Allocate space on shared datastores for templates and media/ISOs that is separate from the datastores for VMs
Justification: To improve performance, separate VM files from other files such as templates and ISO files that have higher I/O characteristics. A best practice is to dedicate separate shared datastores/LUNs for VM templates and for ISO/FLP files, separate from the VMs themselves.
ISO and FLP media files can be placed either locally in the /vmimages directory on each host or in a shared datastore. To avoid storing unnecessary copies, place media files on shared storage.

Observation 3
The following 1 Resource Pool(s) have both virtual machines and child resource pools as siblings in hierarchy:
  Resources (Owner: Production)
Priority: 1
Component: Virtual Datacenter
Recommendation: Avoid making resource pools and VMs siblings in a hierarchy in order to avoid unexpected performance
Justification: Resource pools help improve manageability and grouping and partitioning of CPU and memory resources. We recommend, however, that resource pools and VMs not be made siblings in a hierarchy. Instead, each level should contain only resource pools or only VMs. This is because by default resource pools are assigned share values that might not compare appropriately with those assigned to VMs, potentially resulting in unexpected performance.
References:
  Performance Best Practices for VMware vSphere 4 http://www.vmware.com/resources/techresources/10041

Observation 4
The following 2 user session(s) have been idle for a long time:
  idle for 4 hours
  idle for 3 hours
Priority: 1
Component: Virtual Datacenter
Recommendation: Disconnect vSphere Clients from the vCenter Server when they are no longer needed
Justification: vCenter Server must keep all client sessions current with inventory changes. Doing this for connected but unused sessions attached to the vCenter Server can affect the vCenter Server system's CPU usage and user interface speed. Disconnect vSphere Client sessions from the vCenter Server when they are no longer needed in order to improve the performance of vCenter Server.
Reference:
  Performance Best Practices for VMware vSphere 4 http://www.vmware.com/resources/techresources/10041

Observation 5
The following 20 VMs (version 7) are not using VMXNET3 even though their configuration and guest OS support it:
  RIH26
  RIH55
  RIH-SFTP
  RIHSCAN
  RIHDOC
Only 5 results are displayed above. See Findings data for more observations of this type.
The following 30 VMs (version 4) are not using VMXNET2 even though their configuration and guest OS support it:
  RIH19
  RIH233
  DEV
  RIH33
  UAT
Priority: 2
Component: Virtual Machines
Recommendation: Use the latest version of vmxnet that is supported by the guest OS
Justification: For best performance, use the vmxnet3 paravirtualized network adapter for operating systems in which it is supported. Note that this requires that the virtual machine use virtual hardware version 7 and that VMware Tools be installed in the guest OS.
If vmxnet3 is not supported by the guest OS, use Enhanced vmxnet (vmxnet2). Both vmxnet3 and Enhanced vmxnet support jumbo frames.
If Enhanced vmxnet is not supported in the guest OS, then use the Flexible device type, which automatically converts each vlance network device to a vmxnet device if VMware Tools is installed.
Refer to the KB in the references and the product documentation for supported guest OS for the particular adapter.
References:
  Choosing a network adapter for your virtual machine KB http://kb.vmware.com/kb/1001805

Observation 6
The following 2 virtual machines have a different OS installed than the one configured:
  VM Name: RIH26; Configured OS: windows7_64Guest; Installed OS: windows7Server64Guest
  VM Name: VMware HealthAnalyzer; Configured OS: rhel5Guest; Installed OS: other26xLinuxGuest
Priority: 2
Component: Virtual Machines
Recommendation: Select the correct guest OS type in the VM configuration to match the guest OS
Justification: Selecting the guest OS type determines the
1. optimal monitor mode to use
2. default optimal devices for the guest OS (such as SCSI controller and network adapter)
3. appropriate VMware Tools to be installed in the guest OS
Thus, it is important to make sure that the guest OS type matches the OS installed in the VM to improve the performance and manageability of the VM.
Note that changing the guest OS type can only be performed when the VM is powered off.

Observation 7
The following 26 virtual machines have resource limits/reservations specified:
  RIH19
  RIH55
  RIH-SFTP
  RIH1
  DEV
Priority: 2
Component: Virtual Machines
Recommendation: Use reservations/limits selectively on VMs that need it; don't set reservation too high or limits too low
Justification: Use reservations selectively on VMs that need it. Specify the minimum acceptable amount of CPU or memory. Don't set reservations too high since it can limit the number of virtual machines you can power on in a resource pool, cluster, or host. Setting reservations can also impact the slot size calculation for HA clusters, which can impact the admission control policy of an HA cluster (for admission control policy of number of host failures).
Setting limits too low can impact the amount of CPU or memory resources available to the VMs, which can impact the overall performance.
Setting reservations/limits on VMs increases the manageability of the virtual infrastructure, so it is important to selectively set these only on VMs that need it.
References:
  General Resource Management Best Practices section in Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041

Observation 8
The following 1 VM(s) have VMware Tools not installed, not up to date, or not running:
  SUSE_BASE (tools status: guestToolsNotInstalled)
Priority: 1
Component: Virtual Machines
Recommendation: Check to make sure that VMware Tools are installed, running, and not out of date for running VMs
Justification: Install VMware Tools in all guests that have supported VMware Tools available. VMware Tools optimize the guests to make them run better inside virtual machines by providing
1. optimized virtual NIC and storage drivers
2. efficient memory management using the balloon driver
3. driver to assist with file system quiescing to facilitate backups
4. improved keyboard, video, and mouse operation
5. graceful shutdown of VMs
6. perfmon integration of virtual machine performance data (for vSphere)
To ensure compatibility and optimal performance, upgrade the VMware Tools for older virtual machines to the highest versions supported by their ESX/ESXi hosts.

Observation 9
The following 5 VM(s) are using virtual hardware older than v7:
  RIH39 is using virtual hardware version vmx-04 on host 10.44.10.10 (host version: 4.1.0)
  WS3.rihmfc.com is using virtual hardware version vmx-04 on host 10.44.10.10 (host version: 4.1.0)
  RIH42 is using virtual hardware version vmx-04 on host 10.44.10.10 (host version: 4.1.0)
  RIH7 is using virtual hardware version vmx-04 on host 10.44.10.10 (host version: 4.1.0)
  RIH237 is using virtual hardware version vmx-04 on host 10.44.10.10 (host version: 4.1.0)
Priority: 2
Component: Virtual Machines
Recommendation: Consider using virtual hardware v7 to take advantage of additional capabilities (like VMXNET3, PVSCSI)
Justification: Virtual hardware version 7 provides numerous additional capabilities such as:
1. vmxnet3 (IPv6 checksum, TSO)
2. PVSCSI
3. Additional capabilities like VMware Fault Tolerance (FT)
Although virtual hardware version 7 can provide additional performance benefits, it is important to note that virtual machines with virtual hardware v7 cannot be run on ESX/ESXi versions earlier than 4.0. This can limit your choices for VMotion, DRS, and DPM. Also, virtual machines that are converted to virtual hardware v7 cannot be reverted back to the earlier version unless you have taken a backup or created a snapshot of the virtual machine prior to converting to v7.
References:
  Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041

Observation 10
The following 5 virtual machine(s) have snapshot(s):
  RIH-SFTP
  RIH4
  RIH7
  RIH6
  RIH42
Priority: 1
Component: Virtual Machines
Recommendation: Limit the use of snapshots, and use them only for the short term
Justification: Snapshots provide point-in-time state captures, allowing VMs to have their state reverted to a snapshot for testing and recovery purposes. Having multiple snapshots results in more disk usage, and although SCSI contention has been significantly improved in VMFS3 and vSphere 4, it is recommended to limit the use of snapshots and keep them only for the short term. Snapshots can also prevent certain operations like Storage VMotion.

Observation 11
Connected virtual hardware devices are found on the following 3 VM(s):
  RIH44
  RIH39
  oestester
Priority: 1
Component: Virtual Machines
Recommendation: Allocate only as much virtual hardware as required for each VM. Disable any unused or unnecessary virtual hardware devices.
Justification: Provisioning a virtual machine with more resources than it requires can, in some cases, reduce the performance of that virtual machine as well as other virtual machines sharing the same host. For example, configuring more vCPUs than required for an application that is single threaded can reduce overall performance. Also, configuring more memory than required can impact the other virtual machines on the same host.
In addition to disabling unnecessary virtual devices within the virtual machine, ensure that no device is connected to a virtual machine if it does not need to be there.
For example, serial and parallel ports are rarely used for virtual machines in a datacenter environment, and CD/DVD drives are usually connected only temporarily during software installation. Disabling any unused or unnecessary virtual hardware devices improves performance (can reduce device polling), improves security, and reduces chances of these devices preventing VMotion. The performance impact of optical drives can be reduced by configuring the VMs to use ISO images instead of physical drives, and can be avoided entirely by disabling optical drives in the virtual machines when the devices are not needed.

References:
ESX and Virtual Machines section in Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041
"Virtual Machines" section in VMware Infrastructure 3 Security Hardening http://www.vmware.com/resources/techresources/727

4. Appendix A: Audited Inventory

4.1 Host Configurations
Host Configuration 1
Platform Specifications:
System: HP ProLiant DL380 G7
CPU: 2 sockets, 8 total cores, Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
RAM: 192 GB
HBAs: 1 dual-channel ICH10 4 port SATA IDE Controller, 2 single-channel ISP2532-based 8Gb Fibre Channel to PCI Express HBA, 1 single-channel Smart Array P410i
NICs: 2 dual-port NC364T PCI Express Quad Port Gigabit Server Adapter, 2 dual-port NC382i Integrated Quad Port PCI Express Gigabit Server Adapter
ESX/ESXi Hosts: 10.44.10.10, 10.44.10.12, 10.44.10.14

4.2 Networking Configurations
Networking Configuration 1
Virtual Datacenter Name: RIH
Cluster Name: Production
ESX/ESXi Hosts: 10.44.10.10, 10.44.10.12, 10.44.10.14

Switch Name | Total Ports | Available Ports | Port Group | Active NICs/Uplinks | Standby NICs/Uplinks
vSwitch0 | 128 | 97 | 10.44.10.x | vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic0, vmnic4, vmnic6 |
vSwitch0 | 128 | 97 | VM Network | vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic0, vmnic4, vmnic6 |
vSwitch0 | 128 | 97 | Service Console 2 | vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic0, vmnic4, vmnic6 |
vSwitch0 | 128 | 97 | Service Console | vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic0, vmnic4, vmnic6 |
vSwitch0 | 128 | 97 | VMotion | vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic0, vmnic4, vmnic6 |
vSwitch1 | 128 | 127 | No Network | |
vSwitch0 | 128 | 102 | VM Network | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic4, vmnic6 |
vSwitch0 | 128 | 102 | 10.44.10.x | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic4, vmnic6 |
vSwitch0 | 128 | 102 | Service Console 2 | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic4, vmnic6 |
vSwitch0 | 128 | 102 | Service Console | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic4, vmnic6 |
vSwitch0 | 128 | 102 | VMotion | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic7, vmnic4, vmnic6 |
vSwitch1 | 128 | 127 | No Network | |
vSwitch0 | 128 | 103 | 10.44.10.x | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic6, vmnic7, vmnic4 |
vSwitch0 | 128 | 103 | VM Network | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic6, vmnic7, vmnic4 |
vSwitch0 | 128 | 103 | Service Console 2 | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic6, vmnic7, vmnic4 |
vSwitch0 | 128 | 103 | Service Console | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic6, vmnic7, vmnic4 |
vSwitch0 | 128 | 103 | VMotion | vmnic0, vmnic1, vmnic2, vmnic3, vmnic5, vmnic6, vmnic7, vmnic4 |
vSwitch1 | 128 | 127 | No Network | |

4.3 Storage
Storage Specifications:
Array: Storage Vendor, Model

Datastore Name | Type | Size (GB) | Free Space (GB) | Comments
DatastoreA1 | VMFS | 500 | 324 |
DatastoreA2 | VMFS | 975 | 345 |
DatastoreA3 | VMFS | 975 | 534 |
DatastoreA4 | VMFS | 975 | 821 |
DatastoreA5 | VMFS | 500 | 445 |
DatastoreA6 | VMFS | 500 | 359 |
DatastoreA8 | VMFS | 950 | 480 |
DatastoreB1 | VMFS | 500 | 499 |
DatastoreB2 | VMFS | 975 | 635 |
DatastoreB3 | VMFS | 975 | 656 |
DatastoreB4 | VMFS | 975 | 688 |
DatastoreB5 | VMFS | 500 | 372 |
DatastoreB6 | VMFS | 500 | 462 |
ESX_LUN_0 | VMFS | 1956 | 860 |
ESX_LUN_1 | VMFS | 1956 | 827 |
LocalStorage 10 | VMFS | 278 | 169 |
LocalStorage 12 | VMFS | 278 | 169 |
LocalStorage 14 | VMFS | 278 | 93 |

4.4 Virtual Datacenter
Datacenter 1:
Virtual datacenter name: RIH
Physical datacenter: RIH Providence Production

Cluster | Enabled Features | Hosts Checked | No. of VMs
Production | HA, DRS | 3 | 53

5. Appendix B: Health Check Assessment Checklist

Component | Check (per Best Practice)
Host | Verify equipment was burned in with memory test for at least 72 hours
Host | Verify all host hardware is on the VMware Hardware Compatibility List (HCL)
Host | Verify all host hardware meets minimum supported configuration
Host | Check CPU compatibility for vMotion and FT
Host | Check ESX/ESXi host physical CPU utilization to make sure that it is not saturated or running in a sustained high utilization
Host | Verify all hosts in the cluster are compatible versions of ESX/ESXi
Host | Check ESX/ESXi host active Swap In/Out rate to make sure that it is not consistently greater than 0
Host | Check to make sure that there is sufficient service console memory (max is 800MB)
Host | Verify that ESX service console root file system is not getting full
Host | Check if any 3rd party agents are running in the ESX service console
Host | Verify that NTP is used for time synchronization
Network | Verify that networking is configured consistently across all hosts in a cluster
Network | Check to make sure there is redundancy in networking paths and components to avoid single points of failure (e.g. 
at least 2 paths to each network) Network If HA is being used, check that physical switches that support PortFast (or equivalent) have PortFast enabled Network Check that NICs for the same uplink have same speeds and duplex settings Network Check that Management/Service Console, Vmkernel, and VM traffic is separated (physical or logical using VLANs) Network Verify that portgroup security settings for ForgedTransmits and MACAddressChanges are set to Reject Network Check the virtual switch portgroup failover policy for appropriate active and standby NICs for failover Network Verify that VMotion and FT traffic is on at least a 1 Gb network Network Check that IP storage traffic is physically separate to prevent sharing network bandwidth © 2010 VMware, Inc. All rights reserved. Page 29 of 33 Health Check Report Component Check (per Best Practice) Storage Verify that VMs are on a shared datastore Storage Check that datastores are masked/zoned to the appropriate hosts in a cluster Storage Check that datastores are consistently accessible from all hosts in a cluster Storage Check that the appropriate storage policy is used for the storage array (MRU, Fixed, RR) Storage Check to make sure there is redundancy in storage paths and components to avoid single point of failure (e.g. 
at least 2 paths to each datastore) Storage Check that datastores are not getting full Virtual Datacenter Check that all datacenter objects use a consistent naming convention Virtual Datacenter Verify that hosts within a cluster maintain a compatible and homogeneous (CPU/mem) to support the required functionality for DRS, DPM, HA, and VMotion Virtual Datacenter Check that FT primaries are distributed on multiple hosts since FT logging is asymmetric Virtual Datacenter Verify that hosts for FT are FT compatible Virtual Datacenter Check that reservations/limits are used selectively on VMs that need it and are not set to extreme values Virtual Datacenter Check that vCenter Server is not running other applications and vCenter add-ons (for large environments and heavily loaded vCenter systems) and is sized appropriately Virtual Datacenter Check that the DB log setting is Normal unless there is a specific reason to set it to High Virtual Datacenter Check that the vCenter statistics level is set to an appropriate level (1 or 2 recommended) Virtual Datacenter Check that appropriate vCenter roles, groups, and permissions are being used VM Check any VMs with CPU READY over 2000 ms VM Check any VMs with sustained high CPU utilization VM Check any VMs with incorrect OS type in the VM configuration compared to the guest OS VM Check any VMs with multiple vCPUs to make sure the applications are not single threaded © 2010 VMware, Inc. All rights reserved. 
Page 30 of 33 Health Check Report Component Check (per Best Practice) VM Check the active Swap In/Out rate of VMs to make sure it is not consistently greater than 0 VM Check that NTP, windows time service, or another timekeeping utility suitable for the OS is used (and not VMware Tools) VM Check that VMware Tools are installed, running, and not out of date for running VMs VM Check VMs that are configured and enabled with unnecessary virtual hardware devices (floppy, serial, parallel, CDROM) and any devices that prevent VMotion VM Check VMs that are not yet on virtual hardware v7 VM Check VM configuration (memory reservation) for VMs running JVM to consider setting reservation to the size of OS+ java heap © 2010 VMware, Inc. All rights reserved. Page 31 of 33 Health Check Report 6. Appendix C: References Item URL Documentation http://www.vmware.com/support/pubs VMTN Technology information http://www.vmware.com/vcommunity/technology VMTN Knowledge Base http://kb.vmware.com Discussion forums http://www.vmware.com/community User groups http://www.vmware.com/vcommunity/usergroups.html Online support http://www.vmware.com/support Telephone support http://www.vmware.com/support/phone_support.html Education Services http://mylearn.vmware.com/mgrreg/index.cfm Certification http://mylearn.vmware.com/portals/certification/ Technical Papers http://www.vmware.com/vmtn/resources Network throughput between virtual machines Detailed explanation of VMotion considerations http://kb.vmware.com/kb/1428 http://www.vmware.com/resources/techresources/1022 Time keeping in virtual machines http://www.vmware.com/vmtn/resources/238 http://kb.vmware.com/kb/1006427 VMFS partitions http://www.vmware.com/vmtn/resources/608 VI3 802.1Q VLAN Solutions http://www.vmware.com/pdf/esx3_vlan_wp.pdf VMware Virtual Networking Concepts Using EMC Celerra IP Storage (VI3 VMware vCenter Update Manager documentation VMware vCenter Update Manager Best Practices Performance Best Practices for VMware vSphere 4.0 
Recommendations for aligning VMFS partitions Performance Troubleshooting for VMware vSphere http://www.vmware.com/resources/techresources/997 http://www.vmware.com/resources/techresources/1036 http://www.vmware.com/support/pubs/vum_pubs.html http://www.vmware.com/resources/techresources/10022 http://www.vmware.com/resources/techresources/10041 http://www.vmware.com/vmtn/resources/608 http://communities.vmware.com/docs/DOC-10352 Large Page Performance http://www.vmware.com/resources/techresources/1039 VMware vSphere PowerCLI http://www.vmware.com/support/developer/windowstoolkit/ VI3 security hardening http://www.vmware.com/vmtn/resources/726 VMware HA: Concepts and Best Practices http://www.vmware.com/resources/techresources/402 Java in Virtual Machine on ESX http://www.vmware.com/files/pdf/Java_in_Virtual_Machines_on_ESX-FINALJan-15-2009.pdf CPU scheduler in ESX 4.0 http://www.vmware.com/resources/techresources/10059 Dynamic Storage Provisioning (Thin Provisioning) http://www.vmware.com/resources/techresources/10073 © 2010 VMware, Inc. All rights reserved. Page 32 of 33 Health Check Report Item URL Understanding memory resource management on ESX http://www.vmware.com/resources/techresources/10062 © 2010 VMware, Inc. All rights reserved. Page 33 of 33 Exhibit 3 VMware Health Check Report for Rhode Island Housing VMware and Rhode Island Housing Confidential Health Check Report © 2010 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. This product is covered by one or more patents listed at http://www.vmware.com/download/patents.html. VMware, VMware vSphere, VMware vCenter, the VMware “boxes” logo and design, Virtual SMP and VMotion are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. 
VMware, Inc
3401 Hillview Ave
Palo Alto, CA 94304
www.vmware.com

Contents
1. Executive Summary
1.1 Report Overview
1.2 Assessment Highlights
1.3 Next Steps
2. Recommended Action Items
3. Health Check Assessment
3.1 Availability/Management
3.2 Performance
4. Appendix A: Audited Inventory
4.1 Host Configurations
4.2 Networking Configurations
4.3 Storage
4.4 Virtual Datacenter
5. Appendix B: Health Check Assessment Checklist
6. Appendix C: References

1. 
Executive Summary

1.1 Report Overview
This report summarizes activities and findings from a VMware Health Check that was conducted for Rhode Island Housing. This report contains:
- Recommended changes to configuration and/or usage per VMware best practices that may improve availability/management or performance of VMware components
- Inventory of components analyzed
- Checklist of assessment activities performed

1.2 Assessment Highlights
Analysis Period: March, 2012
Datacenters: Rhode Island Housing – Staged DR Datacenter located in Providence
Contributing Participants: Rhode Island Housing Abdel El idrissi Vendor Engineer, VMware Consultant
Summary of Activities:
- Performed Standard Health Check Assessment Checklist (see Appendix)
- Gathered system information collected from VMware HealthAnalyzer
- Interviewed participants to discuss priority issues and concerns
- Conducted knowledge transfer to
  o clarify understanding of VMware component requirements and behavior
  o clarify changes to configuration and usage per VMware best practices
- Reviewed documents supplied by Rhode Island Housing

1.3 Next Steps
Rhode Island Housing should review this report and consider the recommended action items. A follow-up consult and/or Health Check is also advised. If required, VMware, through its Professional Services Organization or via one of its many partner organizations, is able to assist Rhode Island Housing in implementing the recommended actions as detailed within this report.

2. 
Recommended Action Items

Priority | Component | Recommended Action Item
1 | Network | Configure networking consistently across all hosts in a cluster
1 | Network | Minimize differences in number of active NICs across hosts within a cluster
1 | Network | Avoid mixing NICs with different speeds and duplex settings on the same uplink for a portgroup/dvportgroup
1 | Network | Configure management/service console, Vmkernel, and VM networks such that there is separation of traffic (physical or logical using VLANs)
1 | Network | Change portgroup security default settings ForgedTransmits and MACAddressChanges to Reject unless app requires it
1 | Virtual Datacenter | Set up a redundant service console portgroup to use a separate vmnic/uplink, and an alternate isolation response gateway address for more reliability in HA isolation detection
    - Set up a redundant service console portgroup to use a separate vmnic/uplink on a separate subnet
    - Specify "isolation address" for the redundant service console (das.isolationaddress2)
    - Increase the failure detection time (das.failuredetectiontime) setting to 20000 milliseconds or greater
1 | Virtual Datacenter | Avoid making resource pools and VMs siblings in a hierarchy in order to avoid unexpected performance
1 | Virtual Datacenter | Use vCenter Server roles, groups, and permissions in order to provide appropriate access and authorization to virtual infrastructure. Avoid using Windows built-in groups (Administrators)
1 | Virtual Datacenter | Avoid changing default firewall rules and ports unless necessary
1 | Virtual Machines | Check to make sure that VMware Tools are installed, running, and not out of date for running VMs
1 | Virtual Machines | Allocate only as much virtual hardware as required for each VM. Disable any unused or unnecessary virtual hardware devices.
2 | Host | Avoid changes to advanced parameter settings unless necessary
2 | Network | Check to make sure there is redundancy in networking paths and components to avoid single points of failure (e.g. at least 2 paths to each network)
2 | Network | Make sure that VMotion traffic is on at least a 1Gb network
2 | Storage | Size datastores appropriately
2 | Storage | Minimize differences in number of storage paths
2 | Virtual Machines | Use the latest version of vmxnet that is supported by the guest OS
2 | Virtual Machines | Select the correct guest OS type in the VM configuration to match the guest OS
2 | Virtual Machines | Consider using virtual hardware v7 to take advantage of additional capabilities (like VMXNET3, PVSCSI)
2 | Virtual Machines | Check to make sure that VMs meet the requirements for VMotion
2 | Virtual Machines | Disable copy/paste between guest OS and remote console
3 | Network | Configure NICs and physical switch speed and duplex settings consistently. Set to autonegotiation for 1Gb NICs
3 | Network | Distribute vmnics for a portgroup across different PCI busses for greater redundancy

3. Health Check Assessment

3.1 Availability/Management

Observation 1: Advanced Settings for the following 3 ESX host(s) have been changed from defaults:
- 10.55.10.14
- 10.55.10.10
- 10.55.10.12
Priority: 2
Component: Host
Recommendation: Avoid changes to advanced parameter settings unless necessary
Justification: ESX/ESXi hosts have a default configuration that normally does not need to be changed. Certain advanced parameter settings can be selectively changed if they are recommended VMware best practices and documented, or under direction by VMware support in order to address specific issues. Otherwise the advanced parameter settings should not be changed, as that may have an adverse and unpredictable impact on management, availability or performance of the virtual infrastructure.
If an advanced parameter setting needs to be changed, make sure that the changes are consistently applied to all applicable hosts within the environment or cluster. Also, maintain proper change management procedures and document configuration changes. An example of a valid advanced parameter change is when a redundant service console portgroup is set up and an alternate isolation response gateway address needs to be configured for more reliability in HA isolation detection (by modifying the das.isolationaddress2 and das.failuredetectiontime parameters). An example of an advanced parameter setting that should not be changed is disabling transparent page sharing by setting the sched.mem.pshare.enable parameter to false. Transparent page sharing provides numerous advantages for memory resource management and should not be changed.

Observation 2: The following 1 cluster(s) do not have networking configured consistently across ESX hosts: DR
Priority: 1
Component: Network
Recommendation: Configure networking consistently across all hosts in a cluster
Justification: Minimize differences in the network configuration across all hosts in a cluster. Consistent networking configuration across all hosts in a cluster eases administration and troubleshooting. Also, since services like VMotion require portgroups to be named consistently in order to work, it is important to have a consistent configuration so that DRS and VMotion capabilities are not disrupted. Also, use a consistent naming convention for virtual switches, portgroups, and uplink groups.
Product/Version: vSphere 4
VMware vSphere 4 introduces VMware vNetwork Distributed Switches (vDS) and Cisco Nexus 1000V distributed switches, which reduce administration time and ensure consistency across the virtual datacenter.
Changes to the distributed virtual portgroup are consistently and automatically applied to all hosts that are connected to the distributed switch. Check the licensing requirements in order to determine if distributed switches can be used in the environment. Consider using distributed switches if possible. Item Comments Observation 3 The portgroup/vSwitches on the following 3 host(s) have less than 2 uplink paths: 10.55.10.14 10.55.10.10 10.55.10.12 Priority 2 Component Network Recommendation Check to make sure there is redundancy in networking paths and components to avoid single points of failure (e.g. at least 2 paths to each network) © 2010 VMware, Inc. All rights reserved. Page 8 of 32 Health Check Report Item Comments Justification In order to ensure that there is no service disruption, it is important to ensure that the networking configuration is fault resilient to accommodate networking path and component failures. It is recommended that all portgroups and distributed virtual portgroups are configured with at least two uplink paths using different vmnics. Use NIC teaming with at least two active NICs or in the case of service console/management portgroup one in active and at least one in standby. Set failover policy with the appropriate active and standby NICs for failover. Connect each physical adapter to different physical switches for an additional level of redundancy. Upstream physical network components should also have the necessary redundancy in order to accommodate physical component failures. Item Comments Observation 4 The vmnics on the following 3 host(s) are not distributed across different PCI busses: 10.55.10.14 10.55.10.10 10.55.10.12 Priority 3 Component Network Recommendation Distribute vmnics for a portgroup across different PCI busses for greater redundancy Justification Distributing vmnics for a portgroup across different PCI busses provides greater redundancy from failures related to a particular PCI bus. 
It is also important to team vmnics from different PCI busses in order to improve fault resiliency from component failures.

Observation 5: Management, VMkernel (Storage, VMotion, FT) and VM Networks traffic are not separate on the following 3 host(s):
- 10.55.10.14
- 10.55.10.10
- 10.55.10.12
Priority: 1
Component: Network
Recommendation: Configure management/service console, Vmkernel, and VM networks such that there is separation of traffic (physical or logical using VLANs)
Justification: Separate the following traffic:
1. management/service console
2. vmkernel for IP storage
3. vmkernel for VMotion
4. vmkernel for FT
5. Virtual Machine network traffic

Traffic separation improves performance, prevents bottlenecks, and increases security. Use physical separation or logical separation using VLANs as appropriate. Note that physical switch ports should be configured as trunk ports for VLANs.
Page 10 of 32 Health Check Report Item Comments Observation 6 The portgroup security settings ForgedTransmit or MacAddressChanges are not set to Reject on the following 3 host(s): Host: 10.55.10.14 VMotion (Setting: MAC address changes ) VM Network (Setting: MAC address changes ) Service Console 2 (Setting: MAC address changes ) Service Console (Setting: MAC address changes ) No Network (Setting: MAC address changes ) 10.44.10.x (Setting: MAC address changes ) VMotion (Setting: Forged Transmit ) VM Network (Setting: Forged Transmit ) Service Console 2 (Setting: Forged Transmit ) Service Console (Setting: Forged Transmit ) No Network (Setting: Forged Transmit ) 10.44.10.x (Setting: Forged Transmit ) VMotion (Setting: MAC address changes ) VM Network (Setting: MAC address changes ) Service Console 2 (Setting: MAC address changes ) Service Console (Setting: MAC address changes ) No Network (Setting: MAC address changes ) 10.44.10.x (Setting: MAC address changes ) VMotion (Setting: Forged Transmit ) VM Network (Setting: Forged Transmit ) Service Console 2 (Setting: Forged Transmit ) Service Console (Setting: Forged Transmit ) No Network (Setting: Forged Transmit ) 10.44.10.x (Setting: Forged Transmit ) VMotion (Setting: MAC address changes ) VM Network (Setting: MAC address changes ) Service Console 2 (Setting: MAC address changes ) Service Console (Setting: MAC address changes ) No Network (Setting: MAC address changes ) 10.44.10.x (Setting: MAC address changes ) VMotion (Setting: Forged Transmit ) VM Network (Setting: Forged Transmit ) Service Console 2 (Setting: Forged Transmit ) Service Console (Setting: Forged Transmit ) No Network (Setting: Forged Transmit ) 10.44.10.x (Setting: Forged Transmit ) Priority 1 Component Network Recommendation Change portgroup security default settings ForgedTransmits and MACAddressChanges to Reject unless app requires it © 2010 VMware, Inc. All rights reserved. 
Justification: It is recommended that both of these options are set to Reject for improved security. In order to protect against MAC address impersonations and prevent ESX/ESXi from honoring requests to change the effective MAC address to anything other than the initial MAC address, change the settings to Reject. By setting the MACAddressChange setting to Reject, ESX/ESXi compares the source MAC address being transmitted by the Guest OS with the effective MAC address for its adapter to see if they match. If the addresses do not match, ESX/ESXi drops the packet. This allows impersonated addresses to be dropped before they are delivered, and the Guest OS assumes that the packets have been dropped. For VMs that require overriding this setting (intrusion detection or MSCS VM), create a special port group for these (and only these) VMs with the modified settings.
References: "Configuring the ESX/ESXi Host" section in VMware Infrastructure 3 Security Hardening Guide http://www.vmware.com/vmtn/resources/726

Observation 7: VMotion traffic for the following 3 host(s) is on a network slower than 1Gb:
- 10.55.10.14
- 10.55.10.10
- 10.55.10.12
Priority: 2
Component: Network
Recommendation: Make sure that VMotion traffic is on at least a 1Gb network
Justification: VMware requires that VMotion traffic be on at least a 1Gb network. Since this traffic is unencrypted, it is also recommended that VMotion traffic be kept separate from other traffic. The VMotion network can be on a separate, isolated, non-routed network segment.
References:
VMware VMotion and CPU Compatibility http://www.vmware.com/resources/techresources/1022
Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041
Page 12 of 32 Health Check Report Item Comments Observation 8 The following 3 ESX host(s) do not have redundant service console port groups on distinct vSwitches: 10.55.10.14 10.55.10.10 10.55.10.12 The default HA failure detection time has not been changed for the following 1 Cluster(s): DR No alternate isolation addresses have been specified for the following 1 Cluster(s): DR Priority 1 Component Virtual Datacenter Recommendation Set up a redundant service console portgroup to use a separate vmnic/uplink, and an alternate isolation response gateway address for more reliability in HA isolation detection Set up a redundant service console portgroup to use a separate vmnic/uplink on a separate subnet Specify "isolation address" for the redundant service console (das.isolationaddress2) Increase the failure detection time (das.failuredetectiontime) setting to 20000 milliseconds or greater Justification Although NIC teaming is used to account for NIC failures, overall redundancy for HA heartbeats and isolation response detection can be made more reliable by setting up a redundant service console on a separate subnet. Each service console network should have one isolation address it can reach. When you set up service console redundancy, you must specify an additional isolation response address for the secondary service console network. VMware also recommends increasing the failure detection time setting to 20000 ms or greater. References: "Best Practices for VMware HA Clusters" section in vSphere Availability Guide http://www.vmware.com/support/pubs "VMware HA Best Practices" section in VMware High Availability: Concepts, Implementation, and Best Practices http://www.vmware.com/resources/techresources/402 Item Comments © 2010 VMware, Inc. All rights reserved. 
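The two HA advanced options named in Observation 8 lend themselves to a simple audit. The Python sketch below is illustrative only: the option names das.isolationaddress2 and das.failuredetectiontime and the 20000 ms threshold come from the report, while the function name, the dictionary representation of a cluster's advanced options, and the sample address (a documentation-range placeholder) are assumptions.

```python
# Minimum failure detection time recommended in the report, in milliseconds.
MIN_FAILURE_DETECTION_MS = 20000


def ha_findings(options):
    """Return human-readable findings for a cluster's HA advanced options.

    `options` is a hypothetical dict of advanced option name -> value,
    for example as exported from the cluster's HA settings.
    """
    findings = []
    if not options.get("das.isolationaddress2"):
        findings.append("no alternate isolation address (das.isolationaddress2)")
    raw = options.get("das.failuredetectiontime")
    if raw is None:
        findings.append("das.failuredetectiontime left at its default")
    elif int(raw) < MIN_FAILURE_DETECTION_MS:
        findings.append(
            f"das.failuredetectiontime is {int(raw)} ms, "
            f"below {MIN_FAILURE_DETECTION_MS} ms"
        )
    return findings


# Hypothetical example: a cluster with no advanced options set, like DR above.
dr_findings = ha_findings({})

# Hypothetical example: a cluster that follows both recommendations.
ok_findings = ha_findings({
    "das.isolationaddress2": "192.0.2.1",   # placeholder address
    "das.failuredetectiontime": "20000",
})
```

An empty result means both recommendations are met. The check says nothing about whether the isolation address is actually reachable from the redundant service console network, which still has to be verified on the hosts.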
Page 13 of 32 Health Check Report Item Comments Observation 9 The following 1 user Windows default users/groups are being used for vCenter user roles/permissions : Administrators (for entity Datacenters) Priority 1 Component Virtual Datacenter Recommendation Use vCenter Server roles, groups, and permissions in order to provide appropriate access and authorization to virtual infrastructure. Avoid using Windows built-in groups (Administrators) Justification By default, any user or group who is a member of the local Administrators group of the Windows Server running vCenter Server will have full administrative control of vCenter Server (and the virtual infrastructure). This can allow other system administrators that are not virtual infrastructure administrators access to the virtual infrastructure. Use the appropriate vCenter Server roles and assign them to the appropriate vCenter Administrators AD group to ensure access is limited to virtual infrastructure administrators. Before removing users or groups from vCenter Server, make sure that you create and test access to vCenter Server for the new users and groups. Reference: "Managing Users, Groups, Roles, and Permissions section in vSphere Basic System Administration http://www.vmware.com/support/pubs Item Comments Observation 10 The firewall settings for the following ESX host(s) have been modified from the default: 10.55.10.14 10.55.10.10 10.55.10.12 Refer to findings data for more specific details about the settings that have been modified Priority 1 Component Virtual Datacenter Recommendation Avoid changing default firewall rules and ports unless necessary © 2010 VMware, Inc. All rights reserved. Page 14 of 32 Health Check Report Item Comments Justification The default firewall rules are configured in order to ensure adequate security while allowing communication with the appropriate virtual infrastructure components. 
Unless required to enable communication for virtual infrastructure services, avoid changing the firewall rules, as that can introduce additional security issues. It is best to leave the default security firewall settings, which block all incoming and outgoing traffic that is not associated with an enabled service. If you do enable a service and open ports, make sure to document the changes, including the purpose for opening each port, and make the changes consistently on all the appropriate ESX/ESXi hosts. Avoid changing the default ports unless necessary. These ports are required for vCenter Server, ESX hosts and other components to communicate with the virtual infrastructure.
References:
"Configure the Firewall for Maximum Security" section in VMware Infrastructure 3 Security Hardening http://www.vmware.com/resources/techresources/727
TCP and UDP Ports for vCenter Server, ESX hosts, and other network components management access KB http://kb.vmware.com/kb/012382

Observation 11: The following 5 VM(s) do not meet some of the VMotion requirements (either floppy/cd-rom found, VM in internal network, network or datastore not visible to all ESX in cluster): XP, IT-Test-VM, RIH39, RIH33, RIH177-software
VMotion traffic for the following 3 host(s) is on less than a 1 Gb network: 10.55.10.14, 10.55.10.10, 10.55.10.12
Priority: 2
Component: Virtual Machines
Recommendation: Check to make sure that VMs meet the requirements for VMotion
Justification: In order to facilitate VMotion of VMs between hosts, the following requirements must be met:
1. The source and destination hosts must use shared storage, and the disks of all VMs must be available on both source and target hosts
2. The VM should not be connected to internal networks
3. The portgroup names must be the same on the source and destination hosts (easier with vDS vNetwork Distributed Switch)
4. VMotion requires a 1 Gb network
5. CPU compatibility: source and destination hosts must have compatible CPUs (relaxed for EVC Enhanced VMotion Compatibility)
6. No devices attached that prevent VMotion (CDROM, floppy, serial/parallel devices)
References: VMware VMotion and CPU Compatibility www.vmware.com/files/pdf/vmotion_info_guide.pdf

Observation 12: Copy/paste is not recommended and is enabled on the following 4 VM(s): XP, RIH232, IT-Test-VM, HP Management
Priority: 2
Component: Virtual Machines
Recommendation: Disable copy/paste between guest OS and remote console
Justification: When VMware Tools runs in a virtual machine, by default you can copy and paste between the guest operating system and the computer where the remote console is running. As soon as the console window gains focus, non-privileged users and processes running in the virtual machine can access the clipboard of the virtual machine console. If a user copies sensitive information to the clipboard before using the console, the user, perhaps unknowingly, exposes sensitive data to the virtual machine. It is recommended that you disable copy and paste for the guest operating system by creating the parameters shown in the following table. Note that this does not disable copy and paste for users when they access the virtual machine through other means such as Terminal Services. It disables copy and paste only from the virtual machine console, e.g. when using the console from the Virtual Infrastructure client.
Configuration Settings to Disable Copy and Paste
Name                                  Value
isolation.tools.copy.disable          true
isolation.tools.paste.disable         true
isolation.tools.setGUIOptions.enable  false
References: "Virtual Machines" section in VMware Infrastructure 3 Security Hardening http://www.vmware.com/resources/techresources/727

3.2 Performance

Observation 1: The following 3 ESX host(s) have Gbps network adapter(s) that are not set to AutoNegotiate: 10.55.10.14, 10.55.10.10, 10.55.10.12
Priority: 3
Component: Network
Recommendation: Configure NICs and physical switch speed and duplex settings consistently. Set to autonegotiation for 1Gb NICs.
Justification: Incorrect network speed and duplex settings can impact performance. The network adapter (vmnic) and physical switch settings need to be checked and set correctly. If your physical switch is configured for a specific speed and duplex setting, you must force the network driver to use the same speed and duplex setting. For Gigabit links, network settings should be set to autonegotiate and not forced. Network adapter speed and duplex settings can be set from the vSphere Client, although a reboot is required for changes to take effect.
Reference: Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041

Observation 2: The following 1 cluster(s) do not have portgroups configured consistently (either name or active NIC total speeds) across ESX hosts: DR
Priority: 1
Component: Network
Recommendation: Minimize differences in number of active NICs across hosts within a cluster
Justification: Having a variance in the number of active NICs across hosts within a cluster can lead to inconsistent network performance as VMs are migrated to other hosts within a cluster.
Hosts that have fewer NIC ports than others might have network bottlenecks, but this might not be obvious if you assume that all hosts have the same number of active NIC ports available.

Observation 3: The following 3 host(s) have mixed NIC speed and duplex settings on a portgroup/dvportgroup uplink: Host: 10.55.10.14; vSwitch0, vSwitch0, vSwitch0
Priority: 1
Component: Network
Recommendation: Avoid mixing NICs with different speeds and duplex settings on the same uplink for a portgroup/dvportgroup
Justification: Having a portgroup/dvportgroup mapped to multiple vmnics at different speeds is not a best practice because, depending on the traffic load balancing algorithm, the network speed of the traffic can be arbitrarily and randomly determined and the result can be undesirable. For example, suppose there are several VMs all connected to a single vSwitch with two outbound adapters, one at 100Mbps and one at a gigabit. Some VMs would be luckier than others depending on how their traffic is routed. A best practice is to ensure the speed is more predictable and deliberately chosen.

Observation 4: The following 1 datastore(s) have too many VMs in them: PlaceholderVMs
Priority: 2
Component: Storage
Recommendation: Size datastores appropriately
Justification: Use consistent LUN sizes, and create one datastore per LUN. Consider the time it takes to restore a LUN in the event of a disk failure when choosing a LUN size. There are restrictions on the maximum LUN size in vSphere, so refer to the Configuration Maximums document.
References: Configuration Maximums http://www.vmware.com/support/pubs

Observation 5: The following 1 Resource Pool(s) have both virtual machines and child resource pools as siblings in hierarchy: Resources (Owner: DR)
Priority: 1
Component: Virtual Datacenter
Recommendation: Avoid making resource pools and VMs siblings in a hierarchy in order to avoid unexpected performance
Justification: Resource pools help improve manageability and grouping and partitioning of CPU and memory resources. We recommend, however, that resource pools and VMs not be made siblings in a hierarchy. Instead, each level should contain only resource pools or only VMs. This is because by default resource pools are assigned share values that might not compare appropriately with those assigned to VMs, potentially resulting in unexpected performance.
References: Performance Best Practices for VMware vSphere 4 http://www.vmware.com/resources/techresources/10041

Observation 6: The following 1 VM (version 7) is not using VMXNET3 even though its configuration and guest OS support it: Update Manager DR 10.55.10.31
The following 1 VM (version 4) is not using VMXNET2 even though its configuration and guest OS support it: XP
Priority: 2
Component: Virtual Machines
Recommendation: Use the latest version of vmxnet that is supported by the guest OS
Justification: For best performance, use the vmxnet3 paravirtualized network adapter for operating systems in which it is supported. Note that this requires that the virtual machine use virtual hardware version 7 and that VMware Tools be installed in the guest OS. If vmxnet3 is not supported by the guest OS, use Enhanced vmxnet (vmxnet2). Both vmxnet3 and Enhanced vmxnet support jumbo frames. If Enhanced vmxnet is not supported in the guest OS, then use the Flexible device type, which automatically converts each vlance network device to a vmxnet device if VMware Tools is installed. Refer to the KB in the references and the product documentation for the guest operating systems supported by each adapter.
References: Choosing a network adapter for your virtual machine KB http://kb.vmware.com/kb/1001805
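The adapter selection order described in the Observation 6 justification can be sketched as a small function. This is a minimal illustration only, not VMware tooling: the function name and parameters are hypothetical, and falling back to the next adapter when the vmxnet3 prerequisites (virtual hardware v7, VMware Tools) are not met is an assumption made for the example. Which guest operating systems support each adapter must still be taken from the KB referenced above.

```python
def recommend_adapter(hw_version: int,
                      tools_installed: bool,
                      guest_supports_vmxnet3: bool,
                      guest_supports_vmxnet2: bool) -> str:
    """Return the preferred virtual NIC type (hypothetical helper).

    Order follows the report: vmxnet3 first, then Enhanced vmxnet
    (vmxnet2), then the Flexible device type.
    """
    # vmxnet3 needs guest support, virtual hardware v7, and VMware Tools.
    if guest_supports_vmxnet3 and hw_version >= 7 and tools_installed:
        return "vmxnet3"
    # Assumption: if vmxnet3 cannot be used, try Enhanced vmxnet next.
    if guest_supports_vmxnet2:
        return "vmxnet2"
    # Last resort named in the report.
    return "flexible"


if __name__ == "__main__":
    # A hardware version 4 guest that supports Enhanced vmxnet only.
    print(recommend_adapter(4, True, False, True))
    # A hardware version 7 guest with Tools and vmxnet3 support.
    print(recommend_adapter(7, True, True, True))
```

The two calls mirror the two cases flagged in the observation: a version 4 VM for which vmxnet2 is the target, and a version 7 VM for which vmxnet3 is the target.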

Observation 7: The following 2 virtual machines have a different OS installed than the one configured:
VM Name: VMware HealthAnalyzer; Configured OS: rhel5Guest; Installed OS: other26xLinuxGuest
VM Name: Update Manager DR 10.55.10.31; Configured OS: windows7_64Guest; Installed OS: windows7Server64Guest
Priority: 2
Component: Virtual Machines
Recommendation: Select the correct guest OS type in the VM configuration to match the guest OS
Justification: Selecting the guest OS type determines the
1. optimal monitor mode to use
2. default optimal devices for the guest OS (such as SCSI controller and network adapter)
3. appropriate VMware Tools to be installed in the guest OS
Thus, it is important to make sure that the guest OS type matches the OS installed in the VM to improve the performance and manageability of the VM. Note that changing the guest OS type can only be performed when the VM is powered off.

Observation 8: The following 5 VM(s) have VMware Tools not installed, not up to date, or not running:
RIH232 (tools status: guestToolsNotInstalled)
IT-Test-VM (tools status: guestToolsNotInstalled)
HP Management (tools status: guestToolsNotInstalled)
test VM (tools status: guestToolsNotInstalled)
Update Manager 10.44.10.31 (tools status: guestToolsNotInstalled)
Priority: 1
Component: Virtual Machines
Recommendation: Check to make sure that VMware Tools are installed, running, and not out of date for running VMs
Justification: Install VMware Tools in all guests that have supported VMware Tools available. VMware Tools optimize the guests to make them run better inside virtual machines by providing
1. optimized virtual NIC and storage drivers
2. efficient memory management using the balloon driver
3. driver to assist with file system quiescing to facilitate backups
4. improved keyboard, video, and mouse operation
5. graceful shutdown of VMs
6. perfmon integration of virtual machine performance data (for vSphere)
To ensure compatibility and optimal performance, upgrade the VMware Tools for older virtual machines to the highest versions supported by their ESX/ESXi hosts.

Observation 9: The following 1 VM(s) are using virtual hardware older than v7: XP is using virtual hardware version vmx-04 on host 10.55.10.14 (host version: 4.1.0)
Priority: 2
Component: Virtual Machines
Recommendation: Consider using virtual hardware v7 to take advantage of additional capabilities (like VMXNET3, PVSCSI)
Justification: Virtual hardware version 7 provides numerous additional capabilities such as:
1. vmxnet3 (IPv6 checksum, TSO)
2. PVSCSI
3. Additional capabilities like VMware Fault Tolerance (FT)
Although virtual hardware version 7 can provide additional performance benefits, it is important to note that virtual machines with virtual hardware v7 cannot be run on ESX/ESXi versions earlier than 4.0. This can limit your choices for VMotion, DRS, and DPM. Also, virtual machines that are converted to virtual hardware v7 cannot be reverted to the earlier version unless you have taken a backup or created a snapshot of the virtual machine prior to converting to v7.
References: Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041

Observation 10: Connected virtual hardware devices are found on the following 5 VM(s): XP, IT-Test-VM, RIH39, RIH33, RIH177-software
Priority: 1
Component: Virtual Machines
Recommendation: Allocate only as much virtual hardware as required for each VM. Disable any unused or unnecessary virtual hardware devices.
Justification: Provisioning a virtual machine with more resources than it requires can, in some cases, reduce the performance of that virtual machine as well as other virtual machines sharing the same host. For example, configuring more vCPUs than required for an application that is single threaded can reduce overall performance. Also, configuring more memory than required can impact the other virtual machines on the same host. In addition to disabling unnecessary virtual devices within the virtual machine, ensure that no device is connected to a virtual machine if it does not need to be there. For example, serial and parallel ports are rarely used for virtual machines in a datacenter environment, and CD/DVD drives are usually connected only temporarily during software installation. Disabling any unused or unnecessary virtual hardware devices improves performance (can reduce device polling), improves security, and reduces the chances of these devices preventing VMotion. Virtual machine performance can also be improved by configuring the VMs to use ISO images instead of physical drives, and the device polling overhead can be avoided entirely by disabling optical drives in the virtual machines when the devices are not needed.
References:
ESX and Virtual Machines section in Performance Best Practices for VMware vSphere 4.0 http://www.vmware.com/resources/techresources/10041
"Virtual Machines" section in VMware Infrastructure 3 Security Hardening http://www.vmware.com/resources/techresources/727

4. Appendix A: Audited Inventory

4.1 Host Configurations

Host Configuration 1
Platform Specifications:
System: HP ProLiant DL380 G7
CPU: 2 sockets, 8 total cores, Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
RAM: 192 GB
HBAs: 1 dual-channel ICH10 4 port SATA IDE Controller, 2 single-channel ISP2532-based 8Gb Fibre Channel to PCI Express HBA, 1 single-channel Smart Array P410i
NICs: 2 dual-port NC364T PCI Express Quad Port Gigabit Server Adapter, 2 dual-port NC382i Integrated Quad Port PCI Express Gigabit Server Adapter
ESX/ESXi Hosts: 10.55.10.10, 10.55.10.12, 10.55.10.14

4.2 Networking Configurations

Networking Configuration 1
Virtual Datacenter Name: RIH
Cluster Name: DR
ESX/ESXi Hosts: 10.55.10.10, 10.55.10.12, 10.55.10.14

Switch Name | Total Ports | Available Ports | Port Group | Active NICs/Uplinks | Standby NICs/Uplinks
vSwitch0 | 128 | 117 | 10.44.10.x | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 117 | Service Console 2 | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 117 | Service Console | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 117 | VMotion | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 117 | VM Network | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch1 | 128 | 127 | No Network | |
vSwitch0 | 128 | 116 | 10.44.10.x | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | Service Console 2 | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | Service Console | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | VMotion | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | VM Network | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch1 | 128 | 127 | No Network | |
vSwitch0 | 128 | 116 | 10.44.10.x | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | Service Console 2 | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | Service Console | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | VMotion | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch0 | 128 | 116 | VM Network | vmnic0, vmnic1, vmnic2, vmnic3, vmnic4, vmnic5, vmnic7 |
vSwitch1 | 128 | 127 | No Network | |

4.3 Storage

Storage Specifications:
Array: Storage Vendor, Model

Datastore Name | Type | Size (GB) | Free Space (GB) | Comments
DRStorage | VMFS | 500 | 385 |
LocalStorage 10 | VMFS | 278 | 269 |
LocalStorage 12 | VMFS | 278 | 269 |
LocalStorage 14 | VMFS | 278 | 269 |
PlaceholderVMs | VMFS | 24 | 23 |

4.4 Virtual Datacenter

Datacenter 1:
Virtual datacenter name: RIH
Physical datacenter: RIH DR Staged Environment in Providence

Cluster | Enabled Features | Hosts Checked | No. of VMs
DR | HA, DRS | | 40

5. Appendix B: Health Check Assessment Checklist

Component | Check (per Best Practice)
Host | Verify equipment was burned in with memory test for at least 72 hours
Host | Verify all host hardware is on the VMware Hardware Compatibility List (HCL)
Host | Verify all host hardware meets minimum supported configuration
Host | Check CPU compatibility for vMotion and FT
Host | Check ESX/ESXi host physical CPU utilization to make sure that it is not saturated or running in a sustained high utilization
Host | Verify all hosts in the cluster are compatible versions of ESX/ESXi
Host | Check ESX/ESXi host active Swap In/Out rate to make sure that it is not consistently greater than 0
Host | Check to make sure that there is sufficient service console memory (max is 800MB)
Host | Verify that ESX service console root file system is not getting full
Host | Check if any 3rd party agents are running in the ESX service console
Host | Verify that NTP is used for time synchronization
Network | Verify that networking is configured consistently across all hosts in a cluster
Network | Check to make sure there is redundancy in networking paths and components to avoid single points of failure (e.g. at least 2 paths to each network)
Network | If HA is being used, check that physical switches that support PortFast (or equivalent) have PortFast enabled
Network | Check that NICs for the same uplink have same speeds and duplex settings
Network | Check that Management/Service Console, Vmkernel, and VM traffic is separated (physical or logical using VLANs)
Network | Verify that portgroup security settings for ForgedTransmits and MACAddressChanges are set to Reject
Network | Check the virtual switch portgroup failover policy for appropriate active and standby NICs for failover
Network | Verify that VMotion and FT traffic is on at least a 1 Gb network
Network | Check that IP storage traffic is physically separate to prevent sharing network bandwidth
Storage | Verify that VMs are on a shared datastore
Storage | Check that datastores are masked/zoned to the appropriate hosts in a cluster
Storage | Check that datastores are consistently accessible from all hosts in a cluster
Storage | Check that the appropriate storage policy is used for the storage array (MRU, Fixed, RR)
Storage | Check to make sure there is redundancy in storage paths and components to avoid single point of failure (e.g. at least 2 paths to each datastore)
Storage | Check that datastores are not getting full
Virtual Datacenter | Check that all datacenter objects use a consistent naming convention
Virtual Datacenter | Verify that hosts within a cluster maintain a compatible and homogeneous (CPU/mem) configuration to support the required functionality for DRS, DPM, HA, and VMotion
Virtual Datacenter | Check that FT primaries are distributed on multiple hosts since FT logging is asymmetric
Virtual Datacenter | Verify that hosts for FT are FT compatible
Virtual Datacenter | Check that reservations/limits are used selectively on VMs that need it and are not set to extreme values
Virtual Datacenter | Check that vCenter Server is not running other applications and vCenter add-ons (for large environments and heavily loaded vCenter systems) and is sized appropriately
Virtual Datacenter | Check that the DB log setting is Normal unless there is a specific reason to set it to High
Virtual Datacenter | Check that the vCenter statistics level is set to an appropriate level (1 or 2 recommended)
Virtual Datacenter | Check that appropriate vCenter roles, groups, and permissions are being used
VM | Check any VMs with CPU READY over 2000 ms
VM | Check any VMs with sustained high CPU utilization
VM | Check any VMs with incorrect OS type in the VM configuration compared to the guest OS
VM | Check any VMs with multiple vCPUs to make sure the applications are not single threaded
VM | Check the active Swap In/Out rate of VMs to make sure it is not consistently greater than 0
VM | Check that NTP, Windows Time service, or another timekeeping utility suitable for the OS is used (and not VMware Tools)
VM | Check that VMware Tools are installed, running, and not out of date for running VMs
VM | Check VMs that are configured and enabled with unnecessary virtual hardware devices (floppy, serial, parallel, CDROM) and any devices that prevent VMotion
VM | Check VMs that are not yet on virtual hardware v7
VM | Check VM configuration (memory reservation) for VMs running JVM to consider setting reservation to the size of OS + Java heap

6. Appendix C: References

Item | URL
Documentation | http://www.vmware.com/support/pubs
VMTN Technology information | http://www.vmware.com/vcommunity/technology
VMTN Knowledge Base | http://kb.vmware.com
Discussion forums | http://www.vmware.com/community
User groups | http://www.vmware.com/vcommunity/usergroups.html
Online support | http://www.vmware.com/support
Telephone support | http://www.vmware.com/support/phone_support.html
Education Services | http://mylearn.vmware.com/mgrreg/index.cfm
Certification | http://mylearn.vmware.com/portals/certification/
Technical Papers | http://www.vmware.com/vmtn/resources
Network throughput between virtual machines | http://kb.vmware.com/kb/1428
Detailed explanation of VMotion considerations | http://www.vmware.com/resources/techresources/1022
Time keeping in virtual machines | http://www.vmware.com/vmtn/resources/238 http://kb.vmware.com/kb/1006427
VMFS partitions | http://www.vmware.com/vmtn/resources/608
VI3 802.1Q VLAN Solutions | http://www.vmware.com/pdf/esx3_vlan_wp.pdf
VMware Virtual Networking Concepts | http://www.vmware.com/resources/techresources/997
Using EMC Celerra IP Storage (VI3 | http://www.vmware.com/resources/techresources/1036
VMware vCenter Update Manager documentation | http://www.vmware.com/support/pubs/vum_pubs.html
VMware vCenter Update Manager Best Practices | http://www.vmware.com/resources/techresources/10022
Performance Best Practices for VMware vSphere 4.0 | http://www.vmware.com/resources/techresources/10041
Recommendations for aligning VMFS partitions | http://www.vmware.com/vmtn/resources/608
Performance Troubleshooting for VMware vSphere | http://communities.vmware.com/docs/DOC-10352
Large Page Performance | http://www.vmware.com/resources/techresources/1039
VMware vSphere PowerCLI | http://www.vmware.com/support/developer/windowstoolkit/
VI3 security hardening | http://www.vmware.com/vmtn/resources/726
VMware HA: Concepts and Best Practices | http://www.vmware.com/resources/techresources/402
Java in Virtual Machine on ESX | http://www.vmware.com/files/pdf/Java_in_Virtual_Machines_on_ESX-FINALJan-15-2009.pdf
CPU scheduler in ESX 4.0 | http://www.vmware.com/resources/techresources/10059
Dynamic Storage Provisioning (Thin Provisioning) | http://www.vmware.com/resources/techresources/10073
Understanding memory resource management on ESX | http://www.vmware.com/resources/techresources/10062

Exhibit 4

Rhode Island Housing – DataCenter Disaster Recovery Inventory
[Diagram: equipment inventory for both sites. Recoverable labels follow.]

Production, Providence, RI:
QTY 3 – HP Proliant DL 380 G7 – 8 X CPU – 2 sockets – 196GB RAM
QTY 2 – Cisco 3570 – 24-port Ethernet Switching
Juniper SSG in HA, SecureWorks Managed
QTY 2 – Cisco 9124MDS Fiber Storage Switching
NetApp FAS 2040 w/ Dual Controllers – QTY 12 600GB SAS in head
NetApp DS4243 24-disk Shelf, QTY 24 – 600GB SAS

Disaster Recovery, Springfield, Mass:
QTY 3 – HP Proliant DL 380 G7 – 8 X CPU – 2 sockets – 196GB RAM
QTY 2 – Cisco 3570 – 24-port Ethernet Switching
Cisco ASA 5510, SecureWorks Managed
QTY 2 – Cisco 9124MDS Fiber Storage Switching
NetApp FAS 2040 w/ Dual Controllers – QTY 12 1TB SATA in head
NetApp DS4243 24-disk Shelf, QTY 24 – 1TB SATA

* 2nd for HA is optionally procured by RIH

Rhode Island Housing – DataCenter Disaster Recovery Physical Design
[Diagram: physical cabling at the Disaster Recovery site, Springfield, Mass. Recoverable labels follow.]
Cisco ASA 5510, SecureWorks Managed
QTY 4 – FC Per ESX Host – FC HBA PCI
QTY 4 Ethernet Per ESX Host – PCI Onboard
QTY 4 Ethernet Per ESX Host – PCIE Ethernet
QTY 2 – FC Per Controller
QTY 2 Ethernet Per Controller – Management Interfaces
Fiber SAN
Oshean 100mb P-to-P link to RIH Providence Production DC
Note – diagram is for demonstration but vendor must have their own detailed design to work from
Cisco 3570 and 9124mds diagrammed here – it is important that cabling is disparately patched for redundancy

Rhode Island Housing – Disaster Recovery Logical Design: VMWare Networking and Storage
[Diagram: virtual switch and storage layout. Recoverable labels follow.]
Vswitch1 – LAN 10.55.10.0/24 (HP - Onboard, HP - PCIE)
Vswitch2 – VmKernel, New Vlan (HP - Onboard, HP - PCIE)
Vswitch1 – Manage, New Vlan (HP - Onboard, HP - PCIE)
HP – FC HBA; FC Alua igroup – qty 4 paths
NetApp - SATA; Vmfs - datastore1; vmdk1, vmdk2, vmdk3

Rhode Island Housing – Disaster Recovery Logical Design: Site-to-Site Replication
[Diagram: replication between the Production and Disaster Recovery sites. Recoverable labels follow.]
Vmware: Production VCenter with SRM; DR VCenter with SRM
(3) HP Proliant – ESXi 5.0 at each site
Firewalls: Juniper SSG320M (Production), Cisco ASA 5510 (Disaster Recovery)
Oshean 100mb P-to-P Fiber Dual Path
Netapp Production: NetApp FAS 2040 w/ Dual Controllers – QTY 12 600GB SAS in head; NetApp DS4243 24-disk Shelf, QTY 24 – 600GB SAS; SnapMirror License Replication; FlexClone License – DR Test
Netapp Disaster Recovery: NetApp FAS 2040 w/ Dual Controllers – QTY 12 1TB SATA in head; NetApp DS4243 24-disk Shelf, QTY 24 – 1TB SATA; SnapMirror License Replication; FlexClone License – DR Test
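
As a rough sizing aid for the site-to-site replication design, the time needed to move a given amount of data across the inter-site link can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes the Oshean link runs at 100 megabits per second (the documents write both "100MB" and "100mb"), it assumes a hypothetical 80% usable throughput, and the 1,000 GB data set is a made-up example rather than a measured figure. The function name and parameters are likewise hypothetical.

```python
def transfer_hours(data_gb: float,
                   link_mbps: float = 100.0,
                   efficiency: float = 0.8) -> float:
    """Estimate hours needed to send data_gb (decimal GB) over the link.

    link_mbps  -- nominal link speed in megabits per second (assumed 100)
    efficiency -- assumed fraction of the link usable for replication
    """
    if data_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("inputs out of range")
    bits_to_send = data_gb * 8e9                    # 1 GB = 8e9 bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits_to_send / usable_bits_per_second / 3600.0


if __name__ == "__main__":
    # Hypothetical 1,000 GB baseline transfer at the assumed settings.
    print(f"{transfer_hours(1000):.1f} hours")
```

Under these assumptions a 1,000 GB baseline takes a little over a day, which suggests that respondents may want to consider how the initial baseline copy is seeded separately from ongoing incremental replication.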