Case Study – Disaster Recovery
Network Diagram

[Diagram: the Data Centre and the Disaster Recovery Centre, each hosting a DNS server, a Zimbra mail server, a web server, a Cisco Security Agent server and a media server.]

Need
• Some applications were hosted on SAN-based storage; the customer needed a multi-site DR solution on top of their Fujitsu storage and Brocade SAN.
• The primary and DR sites were connected by a leased line, and the recovery/failover SLA was below 5 minutes.

Our Solution

[Diagram: each site has a Brocade 300 SAN switch attached to a Fujitsu ETERNUS DX90 storage array; the two sites are 500 m apart within the same campus.]

• A simple, script-based DR solution catering to a scale-out application stack.
• An incremental SAN storage replication solution that rides through network outages.
• A node-agnostic recovery mechanism.

DC Data Centre

[Diagram: the primary Data Centre with a LAN switch, DNS server, Zimbra mail server, web server, Cisco Security Agent server, DC management server and media server, connected through a Brocade 300 SAN switch to a Fujitsu ETERNUS DX90 storage array, with FC connectivity to the DR site 500 m away within the same campus.]

DR Data Centre

[Diagram: the Disaster Recovery Data Centre mirrors the primary site, with a LAN switch, DNS server, Zimbra mail server, web server, Cisco Security Agent server, DR management server and media server, a Brocade 300 SAN switch and a Fujitsu ETERNUS DX90 storage array, with FC connectivity back to the Data Centre.]

DC/DR Data Centre Setup

[Diagram: the combined setup showing both sites with their server stacks, the two Data Management Servers (DMS1 and DMS2), the Brocade 300 SAN switches and the Fujitsu ETERNUS DX90 storage arrays, 500 m apart within the same campus.]

Architecture Details

The DR architecture is built around two Data Management Servers (DMS), one at the primary site and one at the secondary site. Each DMS virtualizes the protection storage provisioned from the Fujitsu array. A distributed object file system is provisioned on the DMS to enable instant storage of virtual point-in-time copies of data from the collection of applications. Each application is tied to an SLA that determines the lifecycle of its data. Each SLA specifies:
• the frequency at which application data snapshots are taken;
• the DMS storage pool in which they are kept, for example the Tier I pool on SAS disks for the most recent snapshots, or the de-duplicated Tier II pool on capacity-optimized SATA drives;
• a retention policy directing how long they are stored.
The consolidated application data is stored on the DMS node and de-duplicated to reduce the network bandwidth required to transfer it between sites. The de-duplicated, consolidated data is replicated between the two DMS nodes synchronously or asynchronously using a changed-block-tracking mechanism, and the data always flows from the primary to the secondary site. For every backup job SLA, a corresponding recovery job is provisioned with the metadata of that backup job. A consistent snapshot of the application data and its catalogue is kept at both the primary and secondary sites.
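To make the SLA model above concrete, here is a minimal sketch, in Python, of how a per-application SLA could be encoded for the DMS. It is illustrative only: AppSLA, the pool names and the example applications and intervals are assumptions, not the actual tooling used in this deployment.

    # Illustrative sketch of a per-application SLA: snapshot frequency,
    # target DMS storage pool and retention.  All names are hypothetical.
    from dataclasses import dataclass
    from datetime import timedelta

    TIER1_SAS = "tier1-sas"            # fast SAS pool for the most recent snapshots
    TIER2_DEDUP_SATA = "tier2-dedup"   # de-duplicated, capacity-optimized SATA pool

    @dataclass
    class AppSLA:
        app_name: str
        snapshot_interval: timedelta   # how often application data is snapshotted
        storage_pool: str              # which DMS pool holds the snapshots
        retention: timedelta           # how long snapshots are kept

    # Example policies for the application stack in this case study.
    SLAS = [
        AppSLA("zimbra-mail", snapshot_interval=timedelta(minutes=15),
               storage_pool=TIER1_SAS, retention=timedelta(days=7)),
        AppSLA("web-server", snapshot_interval=timedelta(hours=1),
               storage_pool=TIER2_DEDUP_SATA, retention=timedelta(days=30)),
    ]

    def due_for_snapshot(sla: AppSLA, seconds_since_last: float) -> bool:
        """Return True when an application is due for its next snapshot."""
        return seconds_since_last >= sla.snapshot_interval.total_seconds()

A scheduler on the DMS would evaluate such policies periodically, take the snapshots that are due, place them in the named pool, and expire anything older than the retention window.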
Recovery Cycle and State Transition

[Diagram: state transition between the Data Centre and the Disaster Recovery Centre. The active server at the primary site backs up to a virtualized volume on DMS1 (primary site array); DMS1 replicates synchronously/asynchronously to a virtualized volume on DMS2 (secondary site array); on failover, the passive server restores from DMS2.]

Each application server at the PR/DR sites runs in active/passive mode. The active server's data is periodically stored on the PR DMS; capture is based on the file sets changed since the last backup. The PR DMS transfers the application backup data and its catalogue to the DR DMS synchronously or asynchronously. In case of a local disk failure, the data can be recovered from the local DMS. In case of a site failure, a passive server is provisioned and the backup data is recovered from the DR DMS; recovery always happens over the network. PR/DR site determination is triggered through user-defined scripts (a sketch of such a script is given after the TCO analysis below), and data always flows from the PR DMS to the DR DMS. To make site failover transparent to users, the PR and DR sites each keep a separate name server with the same DNS name mapped to a different address. On the user's console the primary DNS entry points to the PR DNS server and the secondary DNS entry to the DR DNS server, so users are always directed to the active server node of the application.

TCO / Benefit Analysis

The DR setup works over 4 Mbps of network bandwidth and rides through network breakdowns; the de-duplication capability optimizes the bandwidth requirement. The solution is largely script based, requires no over-provisioning, and is cluster aware. The DMS server boots the application in less than 4 seconds, and the application size is less than 10 MB. It integrated seamlessly with the customer's SAN storage. The cost of the entire DR solution was $20K, roughly 10% of the nearest competitor's quote.
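As a closing note on the recovery cycle described earlier, the following is a minimal sketch of what a user-defined PR/DR determination script might look like. The host names, addresses and the dms-restore command are hypothetical placeholders; nsupdate is the standard dynamic-DNS utility. The sketch only illustrates the decision flow (probe the primary site, restore from the DR DMS, repoint the shared DNS name); the actual scripts in the deployment were site specific.

    # Hypothetical failover sketch: if the primary-site DMS is unreachable,
    # recover the application from the DR DMS and repoint its DNS name.
    import socket
    import subprocess

    PR_DMS_HOST = "dms1.primary.example"     # assumed management address of DMS1
    DR_DMS_HOST = "dms2.dr.example"          # assumed management address of DMS2
    DR_DNS_SERVER = "dns.dr.example"         # DR-site name server
    APP_RECORD = "mail.example.com"          # shared DNS name of the application
    DR_APP_IP = "10.20.0.25"                 # address of the passive server at DR

    def pr_site_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
        """Treat the PR site as up if its DMS answers on the management port."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def restore_from_dr_dms(app: str) -> None:
        """Placeholder: recover the latest consistent snapshot of `app` from the DR DMS."""
        subprocess.run(["dms-restore", "--server", DR_DMS_HOST, "--app", app], check=True)

    def repoint_dns(record: str, new_ip: str) -> None:
        """Point the shared DNS name at the DR copy via a dynamic update (nsupdate)."""
        commands = (f"server {DR_DNS_SERVER}\n"
                    f"update delete {record} A\n"
                    f"update add {record} 60 A {new_ip}\n"
                    "send\n")
        subprocess.run(["nsupdate"], input=commands.encode(), check=True)

    if __name__ == "__main__":
        if not pr_site_reachable(PR_DMS_HOST):
            restore_from_dr_dms("zimbra-mail")       # provision the passive server's data
            repoint_dns(APP_RECORD, DR_APP_IP)       # direct users to the DR node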