Corporate Overview - Data Center World

DR07 - Maintaining Alignment between
Data Center Investment Decisions and
Disaster Recovery Planning
Mike Andrea
Director, The Strategic Directions Group Pty Ltd
Data Center Institute, Board of Directors
AFCOM Data Center World
Nashville, USA
October 2012
• Strategic ICT management services
• Data center design and development
• Telecommunications and networking
• Project office services
ICT Master Planners and Strategists
www.strategicdirections.com.au
AFCOM Data Center World – Nashville 2012
• DR07 - Maintaining Alignment between Data Center Investment
Decisions and Disaster Recovery Planning
Mike Andrea, Director, Strategic Directions
• With significant focus on data center energy consumption, PUE, and
rising energy costs, many companies are reconsidering their
approach to their data center investments.
• This leads to organisations investigating various options including
upgrades/refurbishment, colocation, and cloud services among
others.
• This presentation will discuss the importance of maintaining strategic
business priority alignment with the critical nature and function of the
data center in disaster recovery planning.
Agenda
• Introduction
• What are Strategic Business Priorities?
• How does this impact Data Centers?
• What is creating change?
– Natural disasters review
• Why align DR Planning?
• Strategies to Consider
• Summary & Questions
Introduction
• Do you
– know what your organisation’s core business is?
– understand how the organisation stays in business?
– know what the minimum operational status or
capability is for the organisation … before it’s gone?
• Are you aware of
– The strategic priorities that might force core business
change?
– How the business makes and spends money and
what is important to the business in a disaster?
• Production / manufacturing
• People, health and safety
• Public and emergency services
Introduction
• Do you
– Know how to recover from small issues to total loss
disasters?
– Know who is in charge should disaster strike?
– Know where disaster / emergency management will be
performed?
– Know which staff might be impacted by different scale
disasters, and your ability to continue to manage,
operate and recover from a disaster?
What are Strategic Business Priorities?
OBJECTIVES
Core Business is industry based:
• Legal
• Financial
• Manufacturing
• Construction
• Government
• Health / Education
• Primary Production
• Resources / Mining
• ICT
CORE BUSINESS
We:
• Deliver legal services
• Are a bank / insurance company
• Build widgets
• Build buildings and infrastructure
• Deliver public services
• Deliver health services
• Deliver education services
• Grow things
• Get resources out of the ground
and produce commodities
• Deliver IT, communications and
data center services
What are Strategic Business Priorities?
[Diagram labels: Core Business, Objectives, Strategic Business Priorities, Priorities]
Strategic Business Priorities:
• Core Business delivery (volume, services, products)
• Hours of Operation – No unplanned down time
• Meet production deadlines
• Meet customer expectations
• Comply with regulatory, legal and contractual obligations
• Make a profit / improve productivity
• Deliver a growth target of x%
• Achieve a customer satisfaction rating > y%
• Achieve a customer retention target of z%
• Enter new markets
• Achieve a market ranking of #1 provider / product / service
What are Strategic Business Priorities?
[Diagram labels: Business Continuity, Operational Mode, Business as Usual, Disaster Recovery]
Business Continuity
• Ongoing, continuous delivery of the core business outcome
• Two operational modes:
– Business as Usual (normal workday functionality, maintenance works, including
typical change activities)
– Disaster Recovery (delivery of core business functions during a disaster, following
a disaster, and recovering from a disaster back to Business as Usual)
• Is your business:
– required to continue delivering core business outcomes during a disaster, or
– able to cease core business functions until the disaster has passed or been
resolved?
How does this impact Data Centers?
• Enable and/or support the Corporate Business Continuity Plan
– The business must be capable of delivering required outcomes during a disaster
• Data Center Strategy
– If the Primary Data Center fails, can the alternative (secondary) facility support
Core Business outcomes … for how long?
– PUE vs Uptime… which is more important… or can they both be achieved
through careful facility design or selection?
• Location
– Facilities supporting Emergency Services should be located in areas with a high
likelihood of continued operation and access in the event of significant natural
disasters
• Tier Rating
– Expecting long-term Tier IV uptime from a Tier I data center is not realistic
– Fire management solution (detection and suppression)
– Concurrent maintainability and fault tolerance?
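The gap between tier ratings becomes concrete when an availability percentage is converted into expected downtime per year. The sketch below uses availability figures that are commonly quoted for Tier I to Tier IV facilities. They do not come from this presentation and should be read as illustrative assumptions, not guarantees for any particular site.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

# Commonly quoted availability percentages per tier.
# ASSUMPTION: illustrative values only, not taken from the presentation.
ASSUMED_TIER_AVAILABILITY = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def annual_downtime_hours(availability_percent: float) -> float:
    """Return expected downtime in hours per year for an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


if __name__ == "__main__":
    for tier, availability in ASSUMED_TIER_AVAILABILITY.items():
        hours = annual_downtime_hours(availability)
        print(f"{tier}: about {hours:.1f} hours of downtime per year")
```

On these assumed figures a Tier I site would expect roughly 29 hours of downtime a year, against less than half an hour for Tier IV, which is why the business's uptime expectation has to be matched to the facility it actually has.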
How does this impact Data Centers?
• Operational Plan
– Staff training, skills retention, roster, location and knowledge levels
– Efficiency, DCIM, performance targets
• Lifecycle Management
– Upgrades, replacements and changing energy profiles
• Investment Plan
– When should expansion and major maintenance works be scheduled
– What is the ROI and expected life of the facility
• Maintenance Plan
– What are the maintenance windows for lengthy maintenance activities
– Will maintenance impact, impede or stop Core Business delivery?
– Do maintenance windows create time zone / time of day issues for
distributed national or global business services?
How does this impact Data Centers?
• Security Model
– Logical and physical security must align to the category of
system and data being supported
– 24x7 security staff
– CCTV Surveillance, digital recording and playback, and archived
footage
• Telecommunications interconnectivity
– Multi-carrier enabled?
– Redundant links between primary and secondary data centers
• Change Management planning, communication,
documentation and rollback strategies
What is Creating Change?
Reasons - Business:
• Commercial / Financial
• Core Business
– Acquisition
– Divesting capability
– Expansion / contraction
• Legal / Regulatory / Jurisdiction
• Property
• Funding
• Privacy
• Security
• Market change
• Customer demand
• Competitive pressure
• Shareholder Interest
• Recent Disaster
• Insurance Policies
[Diagram labels: Reasons (Business, Technical), Options, Plan and Action, Risks]
Reasons - Technical:
• Technology
• Lifecycle (facility, IT, infrastructure)
• Density
• Consolidation, Virtualisation, Cloud
• Efficiency – PUE
• Data Retention Policies
• Security
• Capacity
• Space / Floor Loading
• Performance
• Skillsets
• Telecommunications cost, capability
and capacity
• Support and maintenance
• Age of the data center
• Energy costs (and carbon taxation)
Change: Risks
Risks:
• Mean Time To Recover
• Ability to meet Strategic Priorities
• Environmental / Nature / Climate Change
– flood, fire, earthquake, snow storm, drought
• Impact to Core Business
• Energy Costs
• Skillsets and Knowledge
• Down time / Tier Rating
• Landlord (owner of the building)
• Loss of Reputation
• Bottom Line Costs
• Utility supply (impacted by drought)
• Site / Location
• Cyber Security
• Neighbours
• Fire and Total Site Loss
Risks: Natural Disasters
• Generally speaking:
– Flood
– Fire
– Earthquake
– Volcanic Eruption
– Cyclone / Hurricane / Tornado
– Tsunami
– Tidal Surge
• But, also includes:
– Drought
– Heat Wave
– Dust Storm and Snow Storm
Risks: Natural Disasters
• Flood / Tsunami / Tidal Surge:
– Water damage, silt, mud, salt, corrosion
– Water flow / force of water and debris
– Undermining foundations
– Electrical safety issues
• Fire / Volcanic Activity:
– Direct fire damage
– Burning embers and hot ash (source of new fires)
– Smoke & ash entry into the DC (free air economiser & VESDA
risk)
– Water from fire suppression & forced entry by fire fighters
• Earthquake / land subsidence
– Direct building damage
– Services damage to cables, water & diesel tanks, lifts, etc
Natural Disasters: Direct
• Cyclone / Hurricane / Typhoon / Tornado:
– Wind and projectiles
– Water damage, silt, mud, salt, corrosion, debris
– Power, water and sewerage system impacts
• Drought
– Risk to site water supplies and power generation (power station)
– Generally longer summers (non economiser cycles)
• Heat Wave
– Excessively hot days extending the design parameters of the
cooling systems
• Dust Storm
– Direct dust ingress into the data center and possibly engines
– Clogs filters and dampers
– Fresh air intakes clogged and/or turned off
Natural Disasters: Indirect
• Flood issues:
– Debris in flooded rivers pose significant threat to
bridges
• Many bridges damaged (unsafe) and/or washed away
• Impact on transport corridors significant (road and rail)
• Some communities (supplies and staff) isolated
– Heavy / large debris in rivers can damage major
cross-city bridges (impact on diesel supplies, transport
and staff accessing office and data center sites)
– The scale of the January 2011 flooding across Australia
shows that the geographic impact can extend
over 2,000 km (> 1,200 miles)
Natural Disasters: Indirect
• Flood issues:
– Coal mines (the size of Sydney Harbour) flooded
• no exports, and potential to impact power generation
– Interstate telecommunications links cut by flash flooding
– Basements in buildings flooded – damaged electrical
transformers and switchboards
• Mud and silt left in electrical equipment
• Some buildings’ electrical systems were so badly damaged that the
buildings could not be occupied for months after the event
• Lift wells flooded (needed to be drained)
• Buildings not permitted to be occupied until electrical re-certification
– Office BCP/DR Sites
• All were full and are contracted on a first in / first served basis
• In some cases the DR office couldn’t fit all affected staff
• Staff told to work from home had no power at home
• Remote access connection points (dial-in and Internet VPN) lost power
Natural Disasters: Indirect
• Flood issues:
– Some communities can remain isolated (require food and fuel
drops) for weeks after the event
– Risk to electrical systems can mean
• High-rise buildings are evacuated
• Power distribution is turned off for whole city blocks at transformer level
• Critical Emergency Management data centers came within 8 in of
being shut down at the main switchboard
– Didn’t matter that they had diesel generators
– Emergency Services (not the power company) might have the
final say on whether a building is shut down
– Sewerage systems directly affected and/or turned off
– Storm water drains flowed in the opposite direction from rivers
Natural Disasters: Indirect
• Flood issues:
– ICT DR and Business Continuity Plans failed
• In some cases systems deemed ‘non critical’ (and located in
a single site) were discovered to be mission critical (business
impact was significant)
• BCP had been written but not tested… due to risks…
• Some businesses had two data centers – but both were in the
same city and both were affected by floods (turned off)
• Some businesses believe BCP is the same as DR
• Cyclone / Tornado / Storm issues
– Hospitals evacuated for safety reasons
– Some data centers are on the same grid as hospitals believing
they are safe from power cuts
– Insurance cost increases and/or coverage
Natural Disasters: Indirect
• Indirect issues:
– Commercial
• Government funds redirected to flood repair and economic
recovery
• Some commercial entities had IT budgets reduced to focus
on core business recovery
• Quite a number of IT and data center projects postponed due
to reallocation of funds
– Business drivers have changed to risk awareness and
BCP/DR
– Looting and other illegal activities
– For those businesses not directly affected, the level of
apathy about how safe they are has risen
• It won’t happen to us… we were fine in 2011
Natural Disasters: Risk Occurrence
Billion Dollar Weather / Climate Disasters 1980 - 2011
Source: National Climatic Data Center, NOAA, USA (2012)
Natural Disasters: Risk Occurrence
• Since 1980, 114 billion-dollar weather and climate disasters in the U.S.
• Three more disasters approaching $1B
• Total losses since 1980 of billion-dollar disasters exceed $800 billion.
Source: National Climatic Data Center, NOAA, USA (25 April 2012)
Natural Disasters: Risk Occurrence
14 x Billion-Dollar Weather and Climate Disasters in 2011
Natural Disasters: USA 2012
2012 USA Drought – Direct Impact to Mains Grid Power Generation
http://www.ibtimes.com/partnernet/finance/drought-could-cause-massive-blackouts-across-the-country_9738.htm
In July, U.S. nuclear-power production hit its lowest seasonal levels in nine years as
drought and heat forced nuclear power plants from Ohio to Vermont to slow output.
Nuclear Regulatory Commission spokesman David McIntyre explained, “Heat is the main
issue, because if the river is getting warmer the water going into the plant is warmer and
makes it harder to cool. If the water gets too warm, you have to dial back production,”
McIntyre said. “That’s for reactor safety, and also to regulate the temperature of
discharge water, which affects aquatic life.”
Nuclear is the thirstiest power source. According to the National Energy Technology
Laboratory (NETL) in Morgantown, West Virginia, the average NPP that generates 12.2
million megawatt hours of electricity requires far more water to cool its turbines than
other power plants. NPPs need 2725 liters of water per megawatt hour for cooling. Coal
or natural gas plants need, on average, only 1890 and 719 liters respectively to produce
the same amount of energy.
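The per-megawatt-hour figures above can be turned into total cooling-water volumes with simple arithmetic. The sketch below uses only the numbers quoted in the passage (2725, 1890 and 719 litres per MWh, and an output of 12.2 million MWh). The function and variable names are illustrative.

```python
# Litres of cooling water per megawatt hour, as quoted in the passage above.
LITRES_PER_MWH = {"nuclear": 2725, "coal": 1890, "natural gas": 719}

# Plant output quoted in the passage above: 12.2 million megawatt hours.
OUTPUT_MWH = 12_200_000


def cooling_water_litres(source: str, output_mwh: int = OUTPUT_MWH) -> int:
    """Total cooling water in litres needed to generate output_mwh from a source."""
    return LITRES_PER_MWH[source] * output_mwh


def water_ratio(source: str, baseline: str) -> float:
    """How many times more water per MWh the source needs than the baseline."""
    return LITRES_PER_MWH[source] / LITRES_PER_MWH[baseline]


if __name__ == "__main__":
    for source in LITRES_PER_MWH:
        billions = cooling_water_litres(source) / 1e9
        print(f"{source}: {billions:.2f} billion litres")
    print(f"nuclear vs coal: {water_ratio('nuclear', 'coal'):.2f}x")
    print(f"nuclear vs natural gas: {water_ratio('nuclear', 'natural gas'):.2f}x")
```

For the quoted output this comes to about 33.2 billion litres for nuclear, roughly 1.4 times the coal figure and 3.8 times the natural gas figure, which is the exposure a drought puts at risk.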
Align DR Planning
[Diagram labels: Core Business, Objectives, Strategic Business Priorities, Priorities, Business Continuity, Operational Mode, Business as Usual, Disaster Recovery, Change, Reasons, Risks, Options, Plan and Action, Test and Document]
Align DR Planning
Strategic Business Priorities
• Transform the Business: Innovation & New Markets
• Grow the Business: Profits and Market Share
• Business as Usual: Keep the Lights On
Consider the following example:
• Banking and Finance Sector
• The DR Plan for financial services utilising Bank Tellers
Versus
• The DR Plan for financial services delivering Internet Banking
• The DR Plan for Automatic Teller Machine (ATM) services
What is the difference for the:
• ICT and Telecommunications Strategy
• Data Center Strategy
[Diagram labels: Normal Operations, Disaster Recovery, Disaster Operations, Business Continuity, Support, Interface, Business Services / Apps, Data Center, Access, Telco / Internet]
Align DR Planning
Alignment decisions include
• Data Center Strategy and Topology (primary-primary,
primary-secondary, primary-DR, other…)
• Multi-carrier and Internet connectivity solution
• Time of Day impact (week days, 24x7, time zones)
– Support capabilities and availability (staff and suppliers)
• Customer reach and service availability expectations
• Customer notifications and communication
• Maintenance windows and Tier ratings
• End-to-End Service Availability and Architecture
– Single corded equipment in a Tier III data center?
• Own, lease, colocation, cloud, outsource, other
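The question about single-corded equipment in a Tier III facility is an end-to-end availability question: when every component in a service chain must be up, and failures are assumed to be independent, the chain's availability is the product of the component availabilities. The sketch below illustrates that calculation. The component names and percentages are hypothetical and are not taken from the presentation.

```python
from math import prod
from typing import Iterable


def series_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent, so the individual
    availabilities (fractions between 0 and 1) simply multiply.
    """
    values = list(component_availabilities)
    if not values:
        raise ValueError("at least one component is required")
    if any(not 0.0 <= value <= 1.0 for value in values):
        raise ValueError("availabilities must be fractions between 0 and 1")
    return prod(values)


# HYPOTHETICAL service chain; every figure here is an illustrative assumption.
EXAMPLE_CHAIN = {
    "data center facility": 0.99982,
    "single-corded server": 0.999,
    "carrier link": 0.9995,
}

if __name__ == "__main__":
    end_to_end = series_availability(EXAMPLE_CHAIN.values())
    print(f"End-to-end availability: {end_to_end:.4%}")
```

The end-to-end figure is always lower than the weakest component, so a high facility rating does not by itself deliver the service availability the business expects.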
Align DR Planning
• Change directly impacts the Data
Center, its operations and assets
• Continually re-align the BCP and
DR Plan because you don’t know
when disaster will strike
• Know where and how to access and
apply the BCP and DR Plan during
a disaster
• Ensure only the latest version of the
Plan is distributed and used
(archive / remove previous editions)
Align DR Planning
• Maintain separate offsite records to enable quick retrieval of
– Insurance Policies
– Asset Lists (by site)
– Critical contact names and numbers (e.g. fuel suppliers)
• Schedule a re-testing of the BCP and DR Plan following significant
changes
• Test emergency fuel supplies and travel times via routes that are
unlikely to be impacted in a natural disaster or major traffic incident
Align DR Planning
• Don’t forget “Recovery”
– Key suppliers and spares… where are you placed on the supplier’s priority
list should a significant event impact the city?
– Can you rebuild the ICT architecture and systems from design
documentation that references previous equipment models and platforms
that are no longer available?
– Where will your staff/team be located to manage a rebuild should a Total
Loss occur… can they actually get there?
– Would Strategic Priorities change the solution should a Total Loss occur?
• Could your insurance company dictate a direction or new set of priorities?
Align DR Planning
Examples of misalignment
• Locating your primary data center and secondary data center in the same
city, adjacent to the same river… in noted flood plains (to reduce travel time
between the sites for support staff)
• Locating your primary data center in the same colocation facility that your
Cloud Service provider is using to deliver your redundant, secondary or
backup solution
• Keeping your Insurance Policy records, Asset Lists, BCP and DR Plans in
your primary data center with no alternative, web based access to the
information
• Delaying / stopping data center maintenance works because downtime will
impact core business delivery in a 24/7 business
• Insurance companies placing their primary data center and inbound call
center in a flood plain
• Emergency Services routing all communications links to the Disaster
Response center through a primary data center located in a flood plain
• Loading data center capacity and space to 100%
Strategies to Consider
• Review and understand the Strategic Business Priorities
– They change… at least annually
• Engage with the Risk and Change Managers
• Discuss and review recovery options from small issues to
total site loss events
– Would the site be rebuilt, or would management use the
opportunity to outsource, use colo or move to the cloud?
– Don’t forget people issues
– Know what to do and when
• Formally review and get
business buy-in to the
maximum allowable downtime
during and following a disaster
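One way to make the maximum allowable downtime review repeatable is to record the figure agreed with the business for each service next to the recovery time observed in the latest DR test, then flag any service where recovery would take longer than the business will tolerate. The sketch below is a minimal illustration. The service names echo the banking example earlier in the deck, and all hours are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceTarget:
    name: str
    max_allowable_downtime_hours: float  # figure agreed with the business
    estimated_recovery_hours: float      # e.g. measured in the latest DR test


def services_at_risk(targets: Iterable[ServiceTarget]) -> List[str]:
    """Names of services whose estimated recovery exceeds the agreed downtime."""
    return [
        target.name
        for target in targets
        if target.estimated_recovery_hours > target.max_allowable_downtime_hours
    ]


# HYPOTHETICAL figures for illustration only.
EXAMPLE_TARGETS = [
    ServiceTarget("Internet banking", 4.0, 2.5),
    ServiceTarget("ATM services", 2.0, 6.0),
    ServiceTarget("Bank teller services", 24.0, 12.0),
]

if __name__ == "__main__":
    for name in services_at_risk(EXAMPLE_TARGETS):
        print(f"Recovery estimate exceeds agreed downtime: {name}")
```

Any service that appears in the output needs either a faster recovery option or a renegotiated downtime figure that the business formally accepts.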
Strategies to Consider
• Ensure the data center facilities support business
continuity plans and enable disaster recovery efforts for
the business
• Make the data center manager a member of the Disaster
Management and Response Team
• Reviews of the data center might result in consideration
of:
– Upgrade / refurbishment
– New facility
– Outsource / colocation
– Cloud solution
Strategies to Consider
• Remember: Location, Location, Location
– Flood plains are just that
– Flooding from one event can cover thousands of miles
– Cyclones and Tornadoes don’t follow the same path
– Volcanic ash changes direction with the wind
– Know the impact on your business insurance policies
• Review separation of
– Data Centers… take into account ‘area of impact’
– Carrier Exchanges and Telco Links and Paths
– Key offices, call centers and remote access points
• Know how to supply / support your site(s) with
– Clean water, diesel, staff, spares, and partner / resources
• Review and align Insurance Policies with Strategic
Priorities and DR Plans
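A separation review can start with a simple distance check: compute the great-circle distance between two sites and compare it with the extent of a single disaster's area of impact. The sketch below uses the standard haversine formula. The coordinates and function names are hypothetical, and the test applied (distance greater than the impact extent) is a deliberately crude first pass that ignores terrain, river systems and shared utilities.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Site = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def great_circle_km(site_a: Site, site_b: Site) -> float:
    """Great-circle distance between two sites using the haversine formula."""
    lat1, lon1 = map(radians, site_a)
    lat2, lon2 = map(radians, site_b)
    half_dlat = (lat2 - lat1) / 2.0
    half_dlon = (lon2 - lon1) / 2.0
    a = sin(half_dlat) ** 2 + cos(lat1) * cos(lat2) * sin(half_dlon) ** 2
    return 2.0 * EARTH_RADIUS_KM * asin(sqrt(a))


def exceeds_impact_extent(site_a: Site, site_b: Site, impact_extent_km: float) -> bool:
    """True if the sites are further apart than one event's assumed extent."""
    return great_circle_km(site_a, site_b) > impact_extent_km


if __name__ == "__main__":
    # HYPOTHETICAL sites one degree of longitude apart on the equator.
    primary, secondary = (0.0, 0.0), (0.0, 1.0)
    print(f"Separation: {great_circle_km(primary, secondary):.1f} km")
    print("Outside a 100 km event:", exceeds_impact_extent(primary, secondary, 100.0))
    print("Outside a 2,000 km event:", exceeds_impact_extent(primary, secondary, 2000.0))
```

The same check applies to carrier exchanges, key offices and call centers; the hard part is choosing an impact extent that reflects the events the business actually needs to survive.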
Strategies to Consider
• Formulate a Disaster Management Plan that
targets Natural Disasters
– Plan for ‘what if’ scenarios
– Know where and what systems might need to be
moved, turned off, relocated, secured
– Mimic a disaster and test the plan
• BCP is not the same as DR
– Business Continuity during ‘Business as Usual’ as
against Disaster Recovery
– Test your BCP for each Operational Mode
• Test your DR Strategy… regularly
• Simulate different impact events
• Simulate an event during a maintenance activity
Summary
• Strategic Business Priorities
• Impact on the Data Center
• Risks and Natural Disasters
• Business Continuity is constant
• Change is ongoing
• Align Data Center Investment
with DR Planning
• Strategies to Consider
Questions ?
Thank you
Mike Andrea
Director, The Strategic Directions Group
and
Data Center Institute USA, Board of Directors
Email: [email protected]
Mobile: +61 410 551 080
Web:
www.strategicdirections.com.au
www.afcom.com/dci_about.html
Copyright © 2012 The Strategic Directions Group Pty Ltd. All rights reserved.