Good Enterprise Mobility Server™ Deployment Planning Guide
Product Version: 1.1
Doc Rev 2.3
Last Updated: 6-Nov-14
© 2014 Good Technology, Inc. All Rights Reserved.

Table of Contents

Purpose and Scope
Prerequisites
Pre-Deployment Considerations
    Microsoft Windows Server Considerations
    Database Server
    Hardware
    Good Proxy Connections
    Scalability
    High Availability
    Disaster Recovery
Scaling Factors
RTO and RPO
Physical Deployment
    Simplest Deployment
    Typical Deployment
High Availability (HA)
    GEMS-HA Design Principles
    HA for Instant Messaging
        Load Distribution
        Referral
    HA for Presence
        Load Distribution
    HA for Push Notifications
    HA Failover Process/Behavior Summary
    Additional HA Considerations
Disaster Recovery (DR)
    DR Failback Process/Behavior
    Phased Approach Recommendation
Deployment with Good Dynamics
    Network Separation
    Server Instance Configuration in Good Control
    Server-Side Services
Conclusion
Appendix A – Upgrading from Good Connect Classic
    Upgrade Scenario 1: Parallel Server (Recommended)
        Pertinent Considerations in this Scenario
    Upgrade Scenario 2: Repurpose Existing Server
        Pertinent Considerations in this Scenario
Appendix B – Hardware Used for Testing GEMS

Purpose and Scope

Good Enterprise Mobility Server™ (GEMS) consolidates the servers currently supported by Good into a single platform. The purpose of this document is to identify the key planning factors that will influence the performance, reliability, and scalability of your deployed GEMS configuration, and to offer guidance on high availability and disaster recovery options. The guidance presented here is intended to help ensure the highest possible levels of reliability, sustainability, performance, predictability, and end-user availability.
The target audience for this guide includes enterprise IT managers and administrators charged with evaluating technology and network infrastructure, as well as those responsible for making the corresponding business decisions.

This document does not cover general GEMS installation, supporting network installation, or software configuration tasks. Rather, it focuses on infrastructure configuration topics that require careful consideration when you first plan your GEMS deployment. For both general and specific installation and configuration guidance and best practices, see the GEMS Installation and Configuration Guide. First, however, it helps to review the basics of physical deployment.

Prerequisites

The planning information in this document is predicated on the following software releases:

• Good Enterprise Mobility Server (GEMS) – v1.0
• Good Control (GC) – v1.7.38.19
• Good Proxy – v1.7.38.14
• Good Connect Client – v2.3 SR7
• Good Work Client – v1.0

General knowledge of GEMS and the Good Dynamics platform, along with Windows Server environments employing Microsoft Lync, Exchange, and Active Directory, is also required to plan your GEMS deployment effectively.

Pre-Deployment Considerations

Before attempting to deploy GEMS, you may also need to plan for upgrades to the supporting environment. Is your existing change management process sufficient, and are all the required tools at hand? If not, you will need to plan for these as well. In addition, your in-house support team may need additional training.

Other key factors in a GEMS deployment include the Microsoft Windows Server version, the machine hosting GEMS (available RAM and number of CPUs), the Microsoft Lync Server version, the Microsoft Exchange version, the Microsoft SQL Server edition, and the roles and responsibilities of the IT staff supporting these servers and the other vital components of your production network.
Microsoft Windows Server Considerations

Because GEMS uses Microsoft's Unified Communications Managed API (UCMA) to integrate Microsoft Lync with the GEMS Connect and Presence services, the OS version required to run GEMS Connect-Presence depends on the version of Microsoft Lync deployed. Per guidance from Microsoft, use the following guidelines to determine the version of Windows Server supported by GEMS Connect-Presence:

• For Microsoft Lync 2010 deployments, use one of these 64-bit Windows Server versions:
    ◦ 2008 R2
    ◦ 2008 R2 SP1
• For Microsoft Lync 2013 deployments, use one of these 64-bit Windows Server versions:
    ◦ 2008 R2 SP1
    ◦ 2012 R2
• To host the Push Notification Service (PNS) only, use one of these 64-bit Windows Server versions:
    ◦ 2012 R2
    ◦ 2008 R2 SP1

Database Server

A relational database is required for the GEMS Connect and Push Notification services, but not for the Presence service. This database can be part of your existing environment or newly installed. GEMS supports Microsoft SQL Server in the versions and editions listed below. In all cases, the database must be installed and prepared before you start the GEMS installation; that is, the SQL scripts included in the GEMS installation zip file must be executed before the GEMS installation proper begins.

The following versions of Microsoft SQL Server are supported:

• SQL Server 2008 (Express/Standard/Enterprise)
• SQL Server 2008 R2 (Express/Standard/Enterprise)
• SQL Server 2012 (Standard)
• SQL Server 2012 SP1 (Enterprise)

Microsoft provides graphical and command-line tools to assist with database and schema creation, such as SQL Server Management Studio and sqlcmd. Note that although SQL Server Express is easy to install and set up, it has limited resources. For most enterprises, SQL Server Standard or Enterprise edition is recommended.
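The two support lists above can be encoded as a small pre-installation check. The Python below is a minimal sketch, assuming you describe each planned host by a role label; the role names (`"lync2010"`, `"lync2013"`, `"pns-only"`) and the function name are hypothetical, and the version strings are simply transcribed from the lists in this section.

```python
# Supported platforms transcribed from the Windows Server and SQL Server lists above.
# Role labels are hypothetical shorthand for the three Windows Server cases.
WINDOWS_BY_ROLE = {
    "lync2010": {"2008 R2", "2008 R2 SP1"},
    "lync2013": {"2008 R2 SP1", "2012 R2"},
    "pns-only": {"2008 R2 SP1", "2012 R2"},
}
SQL_EDITIONS = {
    "2008": {"Express", "Standard", "Enterprise"},
    "2008 R2": {"Express", "Standard", "Enterprise"},
    "2012": {"Standard"},
    "2012 SP1": {"Enterprise"},
}


def check_platform(role, windows, sql_version=None, sql_edition=None):
    """Return findings for a planned host; an empty list means it matches the lists."""
    findings = []
    allowed = WINDOWS_BY_ROLE.get(role)
    if allowed is None:
        findings.append(f"unknown role {role!r}")
    elif windows not in allowed:
        findings.append(f"Windows Server {windows} is not listed for {role}")
    # Pass sql_version=None for hosts that need no database (Presence does not use SQL).
    if sql_version is not None:
        if sql_edition not in SQL_EDITIONS.get(sql_version, set()):
            findings.append(f"SQL Server {sql_version} {sql_edition} is not listed as supported")
        elif sql_edition == "Express":
            findings.append("SQL Server Express is supported but resource-limited; "
                            "Standard or Enterprise is recommended")
    return findings
```

For example, `check_platform("lync2010", "2012 R2")` reports that Windows Server 2012 R2 is not listed for Lync 2010, while a Lync 2013 host on 2012 R2 backed by SQL Server 2012 SP1 Enterprise returns no findings.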
Hardware

The recommended hardware specification for each GEMS machine, running any combination of the services offered, is shown in the following table:

Component | Specification
CPU | 4 vCPU
Memory | 16 GB RAM
Storage | 50 GB HDD

These specifications are considered sufficient for the majority of use cases. However, your specific enterprise environment, combined with your particular traffic and usage requirements, is the key consideration in determining the actual hardware to implement. Hardware configurations used in testing by Good are listed in Appendix B.

Use Profile Definitions (per server instance) for Push (Mail) Notification

The Mail Push Notification service uses Exchange Web Services (EWS) to watch for messages sent and received. A user profile is characterized by the number of messages a user sends and receives in a typical eight-hour day.

Profile | Messages sent/received per mailbox per day | Activated devices supported per server
Light | 50–100 | 40,000
Medium | 100–200 | 20,000
Heavy | 200–400 | 5,000

For details of the user profiles used for scale testing, refer to the Microsoft Load Gen profiles to determine which profile best suits your needs. The results of testing conducted by Good (see note 1) reveal:

Metric | Medium Profile | Heavy Profile
GEMS CPU utilization | 7% | 29%
GEMS IOPS | 5 | 4
SQL CPU utilization | 25% | 25%
SQL IOPS | 40 | 45

Use Profile Definitions (per server instance) for Presence

Because Presence is exposed as a Good Dynamics Server-Side Service, it can be used by many applications, and the load will vary depending on the characteristics of the application invoking it. Refer to the following table to gauge the load you can place on a server hosting the Presence service.

Profile | Active devices (%) | Activated devices subscribed per server
Medium | 20% | 40,000
Heavy | 50% | 20,000

The Good Work client also uses the GEMS Presence service.
Planning for a larger profile is recommended when sizing for a Good Work deployment, because an email-centric application generates more activity. The Heavy profile results reported here represent each active device subscribing to 100 contacts.

Metric | Heavy contact profile
GEMS Presence service CPU utilization | 9.8%
GEMS memory | 3.5 GB

The Presence service does not use SQL, so disk I/O activity is negligible. Hence, only CPU and memory test results are reflected in the Presence use profile above.

Note 1: Good lab test results are reported for the 90th percentile, a measure of statistical distribution. Whereas the median is the value for which 50% of the actual results were higher and 50% were lower, the 90th percentile is the value for which 90% of the data points are smaller and 10% are greater. It is obtained by sorting the test result values in increasing order and taking the largest value within the first 90% of entries.

Use Profile Definitions (per server instance) for Connect

Here, a profile is characterized by the amount of activity users generate against enterprise Lync deployments.

Profile | Active devices (%) | Activated devices supported per server
Light | 5% | 15,000
Medium | 10% | 10,000
Heavy | 15% | 5,000

The activity used for scale testing followed the general guidelines published in Microsoft's Lync 2010 Capacity Planning for Mobility guidance, in which a user has 60–80 contacts and initiates approximately 4 IM sessions, each lasting approximately 6 minutes, with 1 message sent every 30 seconds during a session. For a more detailed explanation of user profiles and activity testing, see Microsoft Lync 2010 Capacity Planning for Mobility.
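The per-server capacities in the three use-profile tables can be turned into a rough instance count. The Python below is a minimal sketch, assuming that the "activated devices per server" figures divide directly into your activated-device total, that one spare instance is added for the N+1 redundancy described under HA Failover Process/Behavior Summary, and that Connect uses three Good Proxy connections per active device (see Good Proxy Connections). The dictionary and function names are hypothetical, and the figures are Good's lab results, not guarantees for your environment.

```python
import math

# Activated devices per server instance, transcribed from the use-profile tables.
CAPACITY = {
    "push": {"light": 40_000, "medium": 20_000, "heavy": 5_000},
    "presence": {"medium": 40_000, "heavy": 20_000},
    "connect": {"light": 15_000, "medium": 10_000, "heavy": 5_000},
}
# Percentage of activated Connect devices that are active in each profile.
CONNECT_ACTIVE_PERCENT = {"light": 5, "medium": 10, "heavy": 15}
GP_CONNECTIONS_PER_ACTIVE_CONNECT_DEVICE = 3


def instances_needed(service, profile, activated_devices, spare=1):
    """Instances needed to carry the load, plus `spare` instances for N+1 redundancy."""
    per_server = CAPACITY[service][profile]
    return math.ceil(activated_devices / per_server) + spare


def connect_gp_connections(profile, activated_devices):
    """Concurrent Good Proxy connections for the active Connect devices."""
    # Integer ceiling of activated_devices * percent / 100.
    active = -(-activated_devices * CONNECT_ACTIVE_PERCENT[profile] // 100)
    return active * GP_CONNECTIONS_PER_ACTIVE_CONNECT_DEVICE
```

For 25,000 activated Connect devices at the Medium profile, `instances_needed("connect", "medium", 25_000)` gives 3 loaded instances plus 1 spare, and `connect_gp_connections("medium", 25_000)` gives 2,500 active devices times 3, or 7,500 Good Proxy connections.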
Resource Consumption During GEMS-Connect Load Tests (4-core, 16 GB GEMS-Connect server)

Profile | CPU | Memory | Disk I/O (read / write)
Light | 55% | 8.4 GB | 0.000218 MBps / 0.000398 MBps
Heavy | 70% | 9.2 GB | 0.016115245 MBps / 0.000379 MBps

Note: For 10,000 activated devices (containers) and a medium (average) concurrency of 10%, the database size will be no more than 1 GB. IOPS is negligible.

General Performance (Connect, Presence, and Push (Mail) configured on the same machine)

Because of the modular design of GEMS, you can configure and run all or any of the GEMS services on the same machine or on different machines. As with all distributed systems, performance will suffer without well-planned load balancing. One exception should always be made for production environments: do not run SQL Server on the same machine as other GEMS components.

For lighter loads, or fewer users (under 10,000), Connect, Presence, and Mail Push Notifications can be configured to run on the same physical machine with a low or medium load as defined in the profiles above. Refer to the general performance results below to decide between (a) running all services on the same machine and (b) using dedicated servers for each service to optimize performance for your particular traffic and load requirements. For most enterprises, the actual use profile per GEMS instance will fall somewhere between Light and Heavy.

Light profile testing (see note 1) conducted by Good on the recommended hardware configuration running all three services produced the following metrics:

Metric | Light Profile
GEMS CPU utilization | 60%
GEMS IOPS | 17
SQL CPU utilization | 32%
SQL IOPS | 55

Good Proxy Connections

From the perspective of the Good Proxy (GP) server, GEMS is an application server. Any traffic relayed from GEMS to the GP server consumes a concurrent connection session on the GP server.
Consequently, it is important to understand how the individual services on a GEMS machine interact with the GP server:

• Connect – 1 active device requires 3 connections
• Presence – 1 active device requires 1 connection
• Push (Mail) Notification – 1 active device requires 1 connection for EWS

Scalability

GEMS scales linearly. For this reason, and given the specifications cited, you can create additional capacity by adding more GEMS machines. You will then need to scale out the database and Good Proxy resources accordingly to account for the additional capacity. See Scaling Factors below for best practices on utilization measurement.

High Availability

Hardware failure, data corruption, and physical site destruction all pose threats to the availability of GEMS services. You improve availability by identifying the points at which these services can fail; increasing availability means reducing the probability of failure. Ultimately, availability is a function of whether a particular service is functioning properly. Think of availability as a continuum, ranging from 100 percent (a completely fault-tolerant system or service that never goes offline) to 0 percent (never available, never works).

Well-planned HA systems and networks typically have redundant hardware and software that keeps them available despite failures. Well-designed high availability systems avoid single points of failure: any hardware or software component that can fail has a redundant component of the same type.
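To see why redundant components raise availability, the Python below computes the probability that at least k of n instances are up. It is a textbook k-of-n calculation, not a Good tool, and rests on two stated assumptions: each instance is independently available a fraction `a` of the time, and any k surviving instances can carry the full load. Real failures are not always independent (a shared database or a site outage affects every instance), so treat the result as optimistic. Function names and example figures are illustrative.

```python
from math import comb

HOURS_PER_YEAR = 365 * 24


def cluster_availability(a, n, k=1):
    """P(at least k of n independent instances are up), each up with probability a."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def downtime_hours_per_year(availability):
    """Expected hours per year during which the service is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR
```

Under these assumptions, one instance at 99% availability is down about 87.6 hours a year, while two instances where either one suffices reach 99.99%, or under an hour a year. An N+1 cluster of three instances that needs two to carry the load reaches about 99.97%.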
When failures occur, the failover process moves the processing performed by the failed component to the backup component. This process remasters system-wide resources, recovers partial or failed transactions, and restores the system to normal, ideally within a matter of microseconds. The more transparent failover is to users, the higher the availability of the system.

In any event, you cannot manage what you cannot measure, so two planning elements come before anything else. The first is determining the hardware required to manage and deliver the IT services in question, the basis for which is outlined above. Measuring as accurately as possible the number of devices, traffic, and load likely to be placed on GEMS and its services, while allowing adequately for growth, offers the best indication of the server hardware and supporting infrastructure likely to be required.

The second, concentrating solely on GEMS with Connect and Presence and its supporting architecture, is cost justification: the first objective in setting the goals of a high availability/disaster recovery (HA/DR) investment strategy is to develop a cost justification model for the expense required to protect each component. If that expense exceeds the value of the application and data furnished to the business, plus the cost to recover them, then optimizing the protection architecture to reduce the expense is an appropriate course of action. See High Availability (HA) below for a general discussion of HA options and alternatives.

Disaster Recovery

Your data is your most valuable asset for ensuring ongoing operations and business continuity. Disasters, unpredictable by nature, can strike anywhere at any time with little or no warning. Recovering both data and applications from a disaster can be stressful, expensive, and time-consuming, particularly for those who have not taken the time to think ahead and prepare for such possibilities.
However, when disaster strikes, those who have prepared and made recovery plans come through with comparatively little loss or disruption of productivity.

Establishing a recovery site for failover in case your primary site is struck by a disaster is crucial. Good recommends mirroring your entire primary site configuration at the DR site, complete with provision for synchronous byte-level replication of your SQL databases, so that if the system does fail, the replicated copy is up to date. To avoid a “User Resync” situation, the replica must also be highly protected. See Disaster Recovery (DR) below for a discussion of Good's DR recommendations for GEMS.

Scaling Factors

The scale of your GEMS deployment depends largely on the size of your enterprise and its IT logistics: number of sites, distance between sites, number and distribution of mobile users, traffic levels, latency tolerance, high availability (HA) requirements, and disaster recovery (DR) requirements.

With respect to HA/DR, two elements must be considered: applications and data. Most commonly, though not exclusively, HA refers to applications, that is, GEMS Connect and Presence. With clustering, there is a failover server for each primary server (2xN). DR focuses on the availability of both applications and data. The primary driver of your DR solution is the recovery time objective (RTO), the maximum time and minimum service level within which a business process must be restored after a disaster to avert an unacceptable break in business continuity.

Before contemplating the optimal number of servers to deploy, however, it is wise to first determine the right size of an individual server to meet your enterprise's "normal use" profile. There are a number of methods for projecting a traffic and use profile. Actual, real-world measurement is recommended and is made easy by the built-in Windows Performance Monitor tools.
Whatever method you apply, remember that GEMS performance is governed by two principal factors: CPU utilization and available memory, the former being somewhat more critical than the latter.

RTO and RPO

For GEMS deployment planning purposes, the first step in defining your HA/DR planning objective is to balance the value of GEMS and the services it provides against the cost required to protect it. You do this by setting a recovery objective, which includes two principal measurements:

• Recovery Time Objective (RTO) – the duration of time and service level within which the business process must be restored after a disaster (or disruption) to avoid unacceptable consequences associated with a break in business continuity. For instance, the RTO for a payroll function may be two days, whereas the RTO for the mobile communications GEMS provides to close a sale could be a matter of minutes.
• Recovery Point Objective (RPO) – the point in time (relative to the disaster) to which you plan to recover your data. Different business functions will, and should, have different recovery point objectives. RPO is expressed backward in time from the point of failure. Once defined, it specifies the minimum frequency with which backup copies must be made.

If resources were unlimited or free, everything could have the best possible protection; this is never the case. The intent of HA/DR planning is to ensure that available resources are allocated optimally.

Physical Deployment

A production deployment of GEMS requires a clustered configuration, plus consideration of integration with the Good Dynamics server infrastructure and with your existing enterprise systems. Here, it is important to understand the definition of a "GEMS cluster" and of an "instance" within that cluster.
An "instance" is any individual deployment of GEMS, with any combination of services provided by its Java tier and its .NET tier. An instance of GEMS usually runs on one physical machine, but this is not mandatory: the same physical machine could host multiple instances of GEMS whose service endpoints listen on different ports.

A GEMS cluster is simply a group of instances. Within a GEMS cluster, each instance is identical in that they all expose the same services and share a common database. Instances in a cluster can be considered "active / active" in that there is no concept of a "passive" instance used for failover. However, instances in a cluster never communicate with each other or synchronize data.

Because all GEMS instances in a cluster are homogeneous, exposing exactly the same service(s), when an application is configured in GC with a list of server endpoints, any of these endpoints can be expected to provide the same service used by the application. This strategy also simplifies horizontal scaling and replication, as well as recovery from hardware failure by swapping in pre-built spares.

Simplest Deployment

The simplest production deployment of GEMS in a corporate network comprises:

• One Microsoft Lync Server and one Microsoft Exchange server deployed in a corporate network, plus one database.
• A single GEMS cluster made up of two physical instances (for failover). This cluster provides all services (Presence, Instant Messaging, Push Notifications, and Exchange Integration) for all device clients.
• One Good Proxy (GP) server with affinity configured to both instances in the GEMS cluster, along with a single Good Control (GC) server.
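The cluster rules above (every instance exposes the same services and shares one database, instances on the same machine listen on different ports, and a production cluster has at least two instances for failover) lend themselves to a pre-deployment check. The Python below is a hypothetical validator sketch; the `Instance` fields, host names, and service labels are illustrative and are not a GEMS or Good Control API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    host: str
    port: int
    services: frozenset  # e.g. frozenset({"presence", "connect"})
    database: str        # identifier of the shared GEMS database


def validate_cluster(instances):
    """Return a list of rule violations for a planned GEMS cluster."""
    if not instances:
        return ["cluster has no instances"]
    errors = []
    if len(instances) < 2:
        errors.append("at least two instances are needed for failover")
    reference = instances[0]
    seen_endpoints = set()
    for inst in instances:
        endpoint = (inst.host, inst.port)
        # Several instances may share a machine, but not a host:port endpoint.
        if endpoint in seen_endpoints:
            errors.append(f"duplicate endpoint {inst.host}:{inst.port}")
        seen_endpoints.add(endpoint)
        # Homogeneity: same services and same shared database as the first instance.
        if inst.services != reference.services:
            errors.append(f"{inst.host}:{inst.port} exposes different services")
        if inst.database != reference.database:
            errors.append(f"{inst.host}:{inst.port} uses a different database")
    return errors
```

Two instances on the same host are accepted as long as they use different ports, which matches the note that one physical machine can run several instances.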
Typical Deployment

Expanding on the simplest configuration, a typical deployment that adheres to generally accepted IT practices offers high availability (HA) service access within data centers, rather than geographically distributed disaster recovery (DR) sites between data centers.

In this example, GEMS clusters are deployed to two geographical regions (UK and US), furnishing device clients with access to the services provided by GEMS. Two Microsoft Lync Pools are deployed, one in each region. Device clients in each region access the Presence and Connect (IM) services through a GEMS cluster configured to use the Microsoft Lync Pool infrastructure in that region.

There is only one GEMS cluster in the UK region (Cluster #1), and it provides the Presence and Connect services. Two GEMS clusters (Clusters #2 and #3) are deployed in the US region. Cluster #2 provides the Presence and Connect services for device clients in the US region, whereas Cluster #3 provides the Email (Push) Notification service for device clients in both regions. In this example, only two physical instances are required for HA.

A separate GP cluster is deployed in each region. GP servers in each cluster are configured to have affinity to the GEMS cluster(s) used by device clients in their region. Only one GC cluster is necessary. It is deployed in the US region and used by the proxy servers in both GP clusters for both regions.

High Availability (HA)

Availability is measured in terms of outages, which are periods of time when the system is not available to users. Your HA solution must provide as close to an immediate recovery point as possible while ensuring that recovery is faster than with a non-HA solution. Unlike disaster recovery, where the entire system suffers an outage, your high availability solution can be customized to individual GEMS resources and services.
HA for GEMS means that the runtime and Service APIs for Push Notifications, Presence, and Connect are unaffected, from the perspective of a device client, whenever any instance of GEMS goes down or any of its services stops working.

GEMS-HA Design Principles

Services provided by GEMS instances should not differ in their approach to:

i. Even distribution of work over instances
ii. Detection of instance failure
iii. Reallocation of work for existing users

Hence, the following design principles are followed for all services:

• Shared Storage – HA/DR is achieved by adopting a shared storage model and, where possible, making the services provided by GEMS instances stateless, so that device clients can select any GEMS instance regardless of where they were previously connected.
• Client-Side Load Balancing – Clients know the list of server endpoints in a GEMS cluster (with affinity to their GP cluster), and service requests are evenly distributed across those endpoints via client-side load balancing.
• Heartbeat – Services on each instance are responsible for reporting their own health in the shared database.
• Elected Health Watcher pattern – One instance in the cluster is chosen through an election algorithm to watch the health of all the others and then centrally coordinate workload distribution in response to a failed instance. All instances can be watchers, and the election algorithm provides failover for watchers.
• User tables in Shared Storage – To aid failover, the database can be used to determine which instance in a GEMS cluster is currently handling work for which end users.

HA for Instant Messaging

Instant Messaging (IM) is provided by the GEMS-Connect service to the Good Connect client.

Load Distribution

Client devices are aware of a list of endpoints (server instances in the same GEMS cluster) that they can contact for the Connect (IM) service.
Each user session is kept up to date in the database, including which server instance is currently handling it. If a server instance receives a request for a user it has not yet served, it first looks up any sessions the user may have in the database, and may respond with a 503 referral to a different instance that already holds a live session for that user. The Connect client cooperates by obeying referral responses.

Referral

Server instances can be marked "offline" in the database, either because of a heartbeat failure or because another instance in the GEMS cluster has determined that the instance is offline. If the server of record for the user is offline, the newly contacted server can adopt the user session, dynamically establish a session with Lync on behalf of the user, and then transition the user to the new server. The offline status of a server has no effect on requests routed directly to it, but it prevents other servers from referring incoming requests to it.

HA for Presence

The Presence service provided by GEMS consists of an HTTP service called by device clients and a "Lync Presence Provider" (.NET) that integrates with a Lync Pool deployment. GEMS clusters used for the Presence and Connect services will be specialized for this purpose, even though they are capable of supporting Push Notifications and Exchange integration. Put another way, the Presence service is deployed alongside Connect, following the Connect deployment pattern with the Lync infrastructure.

Load Distribution

Device clients can use any instance in the GEMS cluster to establish multiple different Presence subscriptions, for example, matching a list of contacts, email participants, or a GAL search. Moreover, multiple instances in the GEMS cluster can all reuse the same Lync Presence subscription. Presence subscriptions are not long-lived, so they are not suitable for storage in a database.
Instead, they are stored in a persistent cache shared by all instances in the GEMS cluster, where they readily expire. The persistent cache maintains a timestamp for each subscription, which the Presence service uses to determine what new presence information to provide to the client on request.

HA for Push Notifications

The high availability objective for Push Notifications is that device clients should be able to register once for Push Notifications and not be affected by the servers that manage those notifications going up and down. Even though device clients can be expected to resubscribe eventually, the GEMS HA design does not depend on their doing so.

Incoming push registrations are directed at random to any instance of GEMS in a cluster; device clients have no affinity to server instances based on mailbox. If the push registration already exists in the shared database, it is assumed that one instance in the cluster is already managing an EWS Listener subscription for that user. No new action needs to be taken, except to reset the watermark of the push registration in the database for aging purposes.

The EWS Listener Service on each instance of GEMS is responsible for periodically updating its own health status in the shared database. If the EWS Listener Service on any instance fails to refresh its health status within an expected time window, it is considered down. One instance in the cluster is elected as the "Watcher" for this condition; it is then responsible for instructing another instance to take over (and recreate) the EWS subscriptions for user mailboxes that were attributed to the dead instance. This is done by updating the Push Registrations in the shared database to reflect the new instance on which the EWS Listener Service should manage those user mailboxes. When the dead instance comes back, it is simply another instance ready to manage new push registrations.
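The heartbeat, watcher election, and registration takeover described above can be modeled in a few lines. The Python below is a sketch under stated assumptions, not GEMS code: the shared database is an in-memory stand-in with a hypothetical schema, the 60-second timeout is a placeholder for the guide's "expected time window", the election rule (lowest live instance ID wins) is an assumption since the guide does not name the algorithm, and the takeover target is chosen as the least-loaded live instance.

```python
from collections import Counter
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT_S = 60.0  # hypothetical; the guide only says "an expected time window"


@dataclass
class Registration:
    instance: str     # instance whose EWS Listener manages this mailbox
    watermark: float  # last time the registration was (re)submitted, for aging


@dataclass
class SharedDb:
    """In-memory stand-in for the shared GEMS database (hypothetical schema)."""
    heartbeats: dict = field(default_factory=dict)     # instance id -> last heartbeat time
    registrations: dict = field(default_factory=dict)  # mailbox -> Registration


def heartbeat(db, instance, now):
    db.heartbeats[instance] = now


def register_push(db, mailbox, receiving_instance, now):
    reg = db.registrations.get(mailbox)
    if reg is not None:
        reg.watermark = now  # already managed somewhere: only refresh the watermark
    else:
        db.registrations[mailbox] = Registration(receiving_instance, now)


def live_instances(db, now):
    return sorted(i for i, t in db.heartbeats.items() if now - t <= HEARTBEAT_TIMEOUT_S)


def elect_watcher(db, now):
    live = live_instances(db, now)
    return live[0] if live else None  # hypothetical rule: lowest live instance id wins


def reassign_from_dead(db, now):
    """Watcher duty: point mailboxes owned by dead instances at one live instance."""
    live = live_instances(db, now)
    if not live:
        return {}
    load = Counter(r.instance for r in db.registrations.values() if r.instance in live)
    target = min(live, key=lambda i: (load[i], i))  # least-loaded live instance
    moved = {}
    for mailbox in sorted(db.registrations):
        reg = db.registrations[mailbox]
        if reg.instance not in live:
            reg.instance = target  # the DB update is the "instruction" to take over
            moved[mailbox] = target
    return moved
```

Note that a returning instance is not given its old mailboxes back; it simply starts receiving new registrations, as the guide describes.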
HA Failover Process/Behavior Summary

GEMS can scale horizontally and offers N+1 redundancy, which provides failover transparency in the event of a single component failure. This level of resilience is referred to as active / active (also known as "hot"), because backup components actively participate in the system during normal operation. Failover is generally transparent to the user because the backup components are already active within the system.

To configure your GEMS components for this redundancy, take the following measures:

1. Configure additional GEMS machines in a cluster to use the same underlying SQL Server database. This is done through the GEMS Dashboard.
2. Configure the additional GEMS hosts in Good Control. This configuration can happen in two locations within Good Control, depending on your deployment model.

Once configured, each client receives a list of supported GEMS servers when Good Connect or Good Work starts up. The client then chooses a server from the list at random and continues to use that server for the life of the user's session. A session constitutes an active login with the system and persists until the user manually signs out or a 24-hour period (configurable) elapses, whichever comes first. Should one server fail, the client retries other servers from the list until it can log in successfully. Any existing active user session is seamlessly transferred to the new server.

Detailed HA configuration steps are available in the GEMS Installation and Configuration Guide.

Additional HA Considerations

After servers are added for HA, each client must update its policies to become aware of the new systems. Policy updates are performed automatically each time the client is launched or a new policy is detected. However, an update could be delayed if the Good Control (GC) server is overburdened with update requests.
As of the current release, each GC server can process two policy updates per second, so it is important to scale your GC servers to match your policy update requirements. If you are using server affinities, these settings will need to be adjusted to account for the new servers.

Disaster Recovery (DR)

Disaster recovery differs from high availability among instances of a GEMS cluster in that an entire cluster in one region has become unavailable, and device clients need to be redirected to a GEMS cluster in a different region that provides the same services.

The DR model for a GEMS cluster in a data center is to have another, identically configured GEMS cluster in a different data center (failover) that shares the same storage through a replication strategy provided by the vendor of the database and file system. This is the same strategy prescribed by Good Dynamics for disaster recovery of a GC cluster.

A typical pattern for disaster recovery uses a Primary and a Standby data center. Although the GEMS cluster in this pattern is used for Presence and Connect, the pattern should be identical for a GEMS cluster used for Push Notifications and Exchange integration.

Note: Virtual IP is commonly employed by IT for failover, but it is not mandated by Good Dynamics in this case. GP clusters already have a "primary", "secondary", and "tertiary" configuration with respect to an application managed in GC.

A load balancer with a Virtual IP is used to route device traffic to a GP cluster in the primary datacenter. This GP cluster has affinity to the GEMS instances in a GEMS cluster likewise located in the primary datacenter. The load balancer is responsible for periodic health checks of the GEMS cluster in the primary datacenter. If the health check fails, the load balancer initiates failover to the GEMS cluster in the standby datacenter.
Device clients are then routed to a GP cluster with affinity to server instances in that GEMS cluster. The database in the standby datacenter is replicated from the production database in the primary datacenter. However, any state, such as Presence subscriptions and active Lync conversations, is lost and must be recovered as clients submit subsequent requests.

Good Control server instances in both the standby datacenter and the primary datacenter are in the same GC cluster because they all use replicas of the same shared storage. The only difference is that GC server instances in the standby datacenter have affinity with the GP cluster in the same datacenter.

When the health check indicates that the primary datacenter is available once again, the Load Balancer initiates failover back to the GEMS cluster in the primary datacenter.

With respect to push notifications, when a DR failover happens, device clients must resubscribe using the Push Notification Service (PNS) provided by the GEMS cluster in the standby datacenter. There is no expectation that EWS Listener subscriptions for existing users will be automatically recreated.

DR Failback Process/Behavior

Assuming the DR site is properly configured, failover should be transparent to the end user. As noted earlier, the client is aware of multiple GC, GP and GEMS servers with which it can connect. If the primary site goes offline, GEMS clients will try to connect to the services in the secondary site.

Before failing back, make sure that the primary and secondary databases are synchronized. Update DNS accordingly to remap infrastructure resources. From a client perspective, the user may need to quit and relaunch the app. In most cases, however, the process will be transparent to the end user, and the app will reconnect to the primary resources once they come back online.
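Clients learn about additional GC, GP and GEMS machines, including those in a standby site, only through policy updates. This guide gives a throughput of two policy updates per second per GC server for the current release. The sketch below turns that figure into a rough sizing estimate. It assumes that throughput scales linearly with the number of GC servers, that requests are spread evenly across them, and that every client needs exactly one update. The function names are illustrative only.

```python
import math

POLICY_UPDATES_PER_SEC_PER_GC = 2.0  # figure quoted in this guide for the current release


def gc_servers_needed(clients: int, window_seconds: float,
                      rate_per_server: float = POLICY_UPDATES_PER_SEC_PER_GC) -> int:
    """Smallest GC server count that can refresh `clients` policies within the window."""
    if clients <= 0:
        return 0
    return math.ceil(clients / (rate_per_server * window_seconds))


def refresh_time_seconds(clients: int, gc_servers: int,
                         rate_per_server: float = POLICY_UPDATES_PER_SEC_PER_GC) -> float:
    """Time for `gc_servers` GC servers to deliver one policy update to every client."""
    return clients / (rate_per_server * gc_servers)
```

For example, refreshing 10,000 clients within 15 minutes (900 s) needs `gc_servers_needed(10_000, 900) == 6` servers, because one server handles 1,800 updates in that window. With only two GC servers, `refresh_time_seconds(10_000, 2)` is 2,500 s, or roughly 42 minutes.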
Phased Approach Recommendation

Clearly, the key to a successful GEMS disaster recovery event is proper planning. To this end, the following phased approach is recommended:

Phase 1 – Ensure and verify that all services are working properly in the primary site before introducing DR.
Phase 2 – Independent of GEMS, test and verify that the infrastructure is set up properly in the secondary site. This includes, but is not limited to, AD, SQL and Lync.
Phase 3 – Add GC, GP and GEMS machines in the secondary site as appropriate.
Phase 4 – Update the configuration to include the new GC, GP and GEMS machines.
Phase 5 – Test a failover/failback.

Deployment with Good Dynamics

A number of factors should be considered when deploying GEMS services with an existing or newly established GD infrastructure.

Network Separation

Good Control instances in a GC cluster do not need to be reachable by GEMS instances in a GEMS cluster. An IT administrator may prefer this separation, because GC instances could be installed with high-privilege service accounts to perform Kerberos Constrained Delegation (KCD) and may hold sensitive security tokens. In such cases, GC clusters and GEMS clusters can be deployed in different network zones separated by a trust boundary.

Server Instance Configuration in Good Control

Device clients are able to access GEMS instances in a GEMS cluster because each individual network endpoint for each instance in the cluster has been configured in a "Server List". This is the list of endpoints provided to a device client identified by its application ID. For example, a device client activated with a deployment of Good Control as configured below would be presented with three network endpoints to use for access to services in a GEMS cluster. Not shown here is the ability to associate user groups with each network endpoint.
This permits assignment of users to a GEMS cluster accessed via the GP cluster in their region, as described earlier.

These network endpoints configured in the GC do not reflect any physical deployment topology for the actual server instances. IT departments rely on separate infrastructure for routing within the enterprise and across sites. In fact, an IT department may employ a VPN, router, load balancer or other infrastructure behind each of these device-facing network endpoints. Note also that network endpoints configured in this way are implicitly whitelisted by the GC.

Server-Side Services

Service names for each service provided by GEMS are registered on the Good Dynamics Network along with a service definition. An "application" is then created in Good Control, and one or more service definitions are bound to it. In the example below there is an "application" called "com.g3.good.presence", and it has been bound to one server-side service called "G3 Presence Service". Note that the application concept here does not represent an app on a device. Rather, it is a construct that can be used to entitle user and group access to the service(s) bound to it. When a user who is entitled to this application ID uses any GD application on their device, the device client is informed of this server-side service, plus all the network endpoints for it (via the "Application" entitlement in the GC), as illustrated above in Server Instance Configuration in Good Control.

Conclusion

In the simplest practical scenario, a GEMS cluster exposes all GEMS services and has two physical instances for failover, which makes for a simple system to manage. However, in large enterprises, IT organizations typically choose to deploy GEMS in a manner consistent with their existing enterprise systems, matching how Microsoft Lync and Exchange are deployed.
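The relationships described in the last two sections can be modeled as a small data structure. Service definitions are bound to an "application", the application carries a Server List of endpoints, endpoints may be associated with user groups, and an entitled user's device learns both the services and the endpoints. This is a hypothetical sketch, not Good Control's API. `Endpoint`, `Application` and `resolve_for_user` are illustrative names. Treating an endpoint that has no groups as available to every entitled user is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Endpoint:
    address: str                          # device-facing network endpoint
    groups: frozenset = frozenset()       # empty: any entitled user (assumption)


@dataclass
class Application:
    app_id: str                           # e.g. "com.g3.good.presence"
    services: List[str]                   # bound server-side service definitions
    server_list: List[Endpoint]           # endpoints configured in Good Control
    entitled_groups: Set[str] = field(default_factory=set)


def resolve_for_user(app: Application,
                     user_groups: Set[str]) -> Optional[Tuple[List[str], List[Endpoint]]]:
    """Return the services and endpoints a device client would learn about, or None."""
    if not app.entitled_groups & user_groups:
        return None                       # user is not entitled to this application
    endpoints = [e for e in app.server_list
                 if not e.groups or e.groups & user_groups]
    return list(app.services), endpoints
```

As a usage example, suppose an application has one endpoint restricted to a hypothetical "US" group, one restricted to "EU", and one shared endpoint. A "US" user is then given the "US" endpoint and the shared one, and a user in neither entitled group is given nothing. This mirrors how group association lets users reach the GEMS cluster behind the GP cluster in their own region.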
The deployment architecture and HA design principles for GEMS are, in essence, identical to those of Good Dynamics. This consistency becomes increasingly necessary as GEMS seeks to provide the runtime environment for GD Server-Side Services, and ultimately to replace the Application Server runtime environment for Good Control.

Appendix A – Upgrading from Good Connect Classic

Good Enterprise Mobility Server (GEMS) with Connect and Presence (CP) services is built on a different platform than the classic Good Connect server. As a result, there is no direct upgrade path from the classic Good Connect server to GEMS with Connect and Presence. For existing classic Good Connect server environments, please review the following guidance when upgrading to GEMS with Connect and Presence.

The guidance found here covers two of the most common upgrade scenarios. It is not intended to be a step-by-step upgrade procedure, but rather a general overview of the process as a whole. Knowledge of the classic Good Connect server is required. Where appropriate, cross-references to more detailed instructions are indicated.

Upgrade Scenario 1: Parallel Server (Recommended)

In this scenario, a new server is provisioned for GEMS with Connect and Presence to run in parallel with the existing classic Good Connect server. The benefit is that no service interruption is required on the existing Good Connect system while GEMS is deployed. The parallel server upgrade environment can be generally depicted as follows:

Pertinent Considerations in this Scenario

Good Dynamics
We recommend that you upgrade Good Control to v1.7.38.19 and Good Proxy to v1.7.38.14 in preparation for the installation of GEMS.

Service Account
The service account used for the classic Good Connect server can also be used for GEMS.
Database
A new schema (Oracle) or database (MS SQL) will need to be created for use by the new GEMS installation.

Microsoft Lync Configuration
Your existing classic Good Connect Lync application pool can be reused. However, the new GEMS machine must be added as a Trusted Application computer. If you are planning to use the Presence service as well, an additional Application ID will need to be created. Please see the GEMS Installation and Configuration Guide for details.

GEMS Host Machine SSL/TLS Certificate
The new GEMS machine will need its own (unique) SSL/TLS certificate. Please see the GEMS Installation and Configuration Guide for additional detail on setting up the SSL/TLS certificate.

Good Control Configuration
The “Good Connect” application configuration in Good Control will need to be updated to include the new GEMS-Connect service.

Caution: To minimize interruption to production users, Good Connect server affinities should be set up before updating the Good Connect application configuration. It is recommended that you set up two policies: one with user affinity to the classic Good Connect server, and another with affinity to GEMS-Connect. When you schedule your users to be switched over to the new server, make sure you ask them to sign out of their Connect client prior to the maintenance window.

Verification/Testing
Verify that clients can connect to the GEMS-Connect service. This can be done by assigning a user to a policy that contains the new GEMS-Connect service.

Moving Users
After testing is complete, all users can be moved to GEMS by updating each user's policy set. Specifically, update the server affinity to point to the new GEMS machine. As mentioned above under Good Control Configuration, it is also recommended that when you schedule users to be switched over to the new server, you ask them to sign out of their Connect client prior to the maintenance window.
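The two-policy approach and the "move users by updating their policy set" step can be pictured as reassigning users between two policies that differ only in server affinity. The sketch below is hypothetical throughout. `ConnectPolicy`, `move_users`, `users_on`, the policy names and the host names are illustrative and are not Good Control's API. It only models the bookkeeping: a pilot user is moved first for verification, then the remainder, and the classic server is safe to retire once no user is still pinned to it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ConnectPolicy:
    name: str
    server_affinity: str  # server the Good Connect app is pinned to by this policy


def move_users(assignments: Dict[str, ConnectPolicy], users: Iterable[str],
               target: ConnectPolicy) -> Dict[str, ConnectPolicy]:
    """Return a new assignment map with `users` switched to `target`."""
    updated = dict(assignments)  # leave the current assignments untouched
    for user in users:
        if user not in updated:
            raise KeyError(f"unknown user: {user}")
        updated[user] = target
    return updated


def users_on(assignments: Dict[str, ConnectPolicy], server: str) -> List[str]:
    """Users whose policy still pins them to `server` (e.g. before decommissioning)."""
    return sorted(u for u, p in assignments.items() if p.server_affinity == server)
```

Returning a new map rather than mutating the old one makes it easy to compare the before and after states when planning a maintenance window. An empty `users_on(..., classic_server)` result corresponds to the point at which the classic server can be decommissioned or repurposed.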
Classic Good Connect Server
After all users have been moved to the new GEMS machine, the old classic Good Connect server can be decommissioned or repurposed.

Upgrade Scenario 2: Repurpose Existing Server

In this scenario, the existing classic Good Connect server is repurposed for GEMS. As pointed out previously, a direct upgrade on the same machine running classic Good Connect is not possible. The existing classic Good Connect server software must be uninstalled before the GEMS software is installed. The benefit of this approach is that a new server is not needed. It does mean, however, that service on your production Good Connect server will be interrupted. The existing server upgrade environment can be generally depicted as follows:

Pertinent Considerations in this Scenario

Good Dynamics
We recommend that you upgrade Good Control to v1.7.38.19 and Good Proxy to v1.7.38.14 in preparation for the installation of GEMS.

Service Account
The service account used for the classic Good Connect server can be used for GEMS.

Database
You will need to run the DDL/DML database scripts for Oracle or MS SQL to reset the schema or database used by the GEMS product.

Microsoft Lync Configuration
The existing classic Good Connect Lync application pool and Trusted Application Computer can be reused. Again, if you are planning to use the Presence service, an additional Application ID will need to be created. See the GEMS Installation and Configuration Guide for details.

GEMS Host Machine SSL Certificate
If the FQDN of the server did not change, the existing SSL certificate can be reused. However, if you are planning to use the Presence service, the certificate will need to be updated with a SAN (Subject Alternative Name) that includes the Presence service App ID. Consult the relevant section in the GEMS Installation and Configuration Guide for additional instructions.
Good Control Configuration
If the FQDN of the server did not change, the “Good Connect” application configuration in Good Control can remain the same. Please ask users to sign out of Good Connect prior to the upgrade, since their temporary session information on the server will be lost during the upgrade process.

Verification/Testing
Verify that both existing and newly provisioned clients can connect to the GEMS-Connect service.

Appendix B – Hardware Used for Testing GEMS

The following computer hardware was used for PSR validation.

Component | Processor | Memory | OS
EWS Push (Mail) Notification | AMD Opteron 6234, 2.4 GHz, 4 vCPU | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64-bit
Connect | AMD Opteron 6234, 2.4 GHz, 4 vCPU | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64-bit
Presence | AMD Opteron 6234, 2.4 GHz, 4 vCPU | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64-bit
Connect, Presence, and EWS Push (Mail) configured on the same machine | AMD Opteron 6378, 2.39 GHz, 4 cores (virtual) | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64-bit
SQL Server for GEMS | AMD Opteron 6234, 2.4 GHz, 4 vCPU | 8 GB | Microsoft Windows Server 2008 R2 Enterprise 64-bit / MS SQL Server 2008 R2

Note: This hardware profile was used for all GEMS PSR testing. All service configurations were tested running SQL Server on a separate machine.

Legal Notice

This document, as well as all accompanying documents for this product, is published by Good Technology Corporation (“Good”). Good may have patents or pending patent applications, trademarks, copyrights, and other intellectual property rights covering the subject matter in these documents. The furnishing of this, or any other document, does not in any way imply any license to these or other intellectual properties, except as expressly provided in written license agreements with Good. This document is for the use of licensed or authorized users only.
No part of this document may be used, sold, reproduced, stored in a database or retrieval system or transmitted in any form or by any means, electronic or physical, for any purpose, other than the purchaser’s authorized use without the express written permission of Good. Any unauthorized copying, distribution or disclosure of information is a violation of copyright laws. While every effort has been made to ensure technical accuracy, information in this document is subject to change without notice and does not represent a commitment on the part of Good. The software described in this document is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of those written agreements. The documentation provided is subject to change at Good’s sole discretion without notice. It is your responsibility to utilize the most current documentation available. Good assumes no duty to update you, and therefore Good recommends that you check frequently for new versions. This documentation is provided “as is” and Good assumes no liability for the accuracy or completeness of the content. The content of this document may contain information regarding Good’s future plans, including roadmaps and feature sets not yet available. It is stressed that this information is nonbinding and Good creates no contractual obligation to deliver the features and functionality described herein, and expressly disclaims all theories of contract, detrimental reliance and/or promissory estoppel or similar theories. Legal Information © Copyright 2014. All rights reserved. All use is subject to license terms posted at www.good.com/legal. 
GOOD, GOOD TECHNOLOGY, the GOOD logo, GOOD FOR ENTERPRISE, GOOD FOR GOVERNMENT, GOOD FOR YOU, GOOD APPCENTRAL, GOOD DYNAMICS, SECURED BY GOOD, GOOD MOBILE MANAGER, GOOD CONNECT, GOOD SHARE, GOOD TRUST, GOOD VAULT, and GOOD DYNAMICS APPKINETICS are trademarks of Good Technology Corporation and its related entities. All third-party technology products are protected by issued and pending U.S. and foreign patents.