Geoclustering Git
Delivering Performance and Reliability When Using Git for Global Development Teams
Brett Taylor, Go2Group
October 2015

TABLE OF CONTENTS
Introduction
Git: the fastest-growing version control system
    Inherent value
    Challenges
Achieving enterprise-class resiliency with Git
    Clustering architectures
    Types of clustering
    Enterprise Git: Atlassian’s Bitbucket and Bitbucket Data Center
    Bitbucket’s geographic limitations
Go2Group’s Geoclusters
    Overview
    Data Flow
    Bitbucket high availability options
    Benefits of geoclustered architecture
    Performance data
Conclusion
Contacting Go2Group
    Go2Group and GSA
    Contact
    Notice

Introduction
Git is the fastest-growing version control system, but few Git deployments meet enterprise requirements for performance and reliability, especially when they serve globally dispersed software development teams. In this white paper, we take a closer look at how geoclustering Git (placing clustered instances on multiple servers at multiple locations) guarantees availability and enhances performance by sharing the workload and preventing outages. Go2Group’s Geoclusters for Atlassian Bitbucket provides the always-on, always-available experience modern enterprises demand.

Smart companies recognize faulty code as a significant business risk. In one of the biggest outages of 2014, cloud storage company Dropbox suffered a global outage when a bug in an upgrade script tried to reinstall an operating system on an active machine. Incidents like this have shifted IT’s focus on fail-safe design from the network to the application server and to the developer’s application code itself. Software development teams depend on version control systems to improve the product lifecycle delivery process.
Git has become the fastest-growing and most widely distributed version control system on the market, and Atlassian’s Bitbucket has become one of the most popular Git repository management platforms for the always-on enterprise. As a proven architecture for both local and remote Bitbucket clusters, Go2Group Geoclusters gives any company a fully supported global mirroring solution for Bitbucket. Geoclusters provide redundant local Bitbucket mirrors for the best possible performance and an additional layer of availability protection for intellectual property. Because the architecture can span data centers around the world, it makes deployments across distances of more than 100 kilometers viable. Geoclusters build on Git’s native mirroring capability to provide local performance at remote sites, clustering to support continuous integration (CI) farms, and multiple copies of critical source code as part of a comprehensive disaster recovery (DR) solution.

In this paper, we discuss the use of Git as a version control system, how to achieve Git resiliency, how to incorporate Go2Group Geoclusters, and performance data.

Git: the fastest-growing version control system
Version control systems are essential for any organization that develops software, because software development is rarely a solitary effort. Modern development involves large amounts of data, and developers must version all of the information required to release a product. Git meets that demand. Created in 2005 by Linus Torvalds, the creator of Linux, Git is the fastest-growing version control system. As of May 2014, 42.9% of developers responding to the Eclipse Community Development Survey used either Git or GitHub as their primary source control system. As of June 2015, GitHub had more than 10 million users.
Thirty-three percent of respondents to a 2014 Forrester Consulting enterprise survey indicated that 60% or more of their code was stored and managed in Git-based systems.

Inherent value
Git gives each developer a full local repository, so developers can commit changes and create branches locally. Because Git is inherently a distributed version control system, developers can use it on shared projects whose workflows differ from those of a centralized version control system. Often, separate repositories model the more stable branches, and whoever maintains a more stable repository pulls completed work from the contributors’ repositories.

While all distributed version control systems provide some degree of disconnected operation, a major benefit of Git is its ability to work where network connectivity is unreliable or unavailable. The value of disconnected operation depends on how many of a project’s developers regularly work while disconnected, how often they do so, and for how long. Successful businesses seldom work in single-site isolation; Forrester Research calls this the “extended enterprise,” in which employees are expected to do their jobs anytime and anywhere. The more a project involves disconnected work, the more value Git provides.

Of course, disconnected operation is not the only benefit of a local repository. Central repositories frequently become congested during integration, especially on large projects. In that case, the speed of operation depends on how many people are trying to integrate at the same time, the number of conflicts, and the strength of the version control system’s merging capabilities.

However, when the time comes to share work in an enterprise setting, all changes must eventually flow back into the central repository.
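Distance to the central repository sets a latency cost that no amount of server capacity removes. The minimal Python sketch below computes the physical lower bound that propagation delay places on a round trip. It assumes light in optical fiber covers roughly 200 km per millisecond (about two-thirds of its speed in a vacuum); the function names, the round_trips parameter, and the 14,000 km India-to-California fiber path are illustrative assumptions, not measurements from this paper. Real links add routing, queuing, and server time on top of this floor.

```python
# Lower bound on network round-trip time (RTT) imposed by propagation delay.
# Assumption: light in optical fiber travels roughly 200 km per millisecond;
# routing, queuing, and server processing only add to this figure.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Minimum round-trip time in milliseconds over a one-way fiber path."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 2 * distance_km / FIBER_KM_PER_MS


def min_operation_latency_ms(distance_km: float, round_trips: int) -> float:
    """Latency floor for an operation needing `round_trips` sequential trips.

    `round_trips` is a hypothetical parameter: the real count depends on the
    Git transport and on the operation being performed.
    """
    return round_trips * min_rtt_ms(distance_km)


if __name__ == "__main__":
    # Illustrative distances: same site, metro range, and an assumed
    # 14,000 km fiber path between India and California.
    for label, km in [("local", 0), ("metro", 50), ("India-California", 14_000)]:
        print(f"{label:>17}: RTT >= {min_rtt_ms(km):.1f} ms")
```

For the assumed 14,000 km path, min_rtt_ms returns 140.0, so a hypothetical operation that needs three sequential round trips cannot finish in under 420 ms, however fast the servers are. That floor is what serving reads from a nearby copy avoids.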
Challenges
If a developer is working in India and the master repository is in California, for example, every push suffers from network latency. And because Git offers no out-of-the-box managed mirroring (it can create a mirror clone, but nothing keeps remote mirrors in sync automatically), even read operations such as clones and pulls can be slow. Similar problems arise when a master repository serves a large number of concurrent users.

While Git is distributed and avoids some of the vulnerabilities of centralized version control systems, it still has shortcomings, including:
• Access control: To serve geographically dispersed teams, Git deployments often expose all parts of a company’s source code. They can authenticate users, but local mirrors do not automatically apply Bitbucket’s authorization rules. That is, they can verify that users are who they claim to be but have no way to ensure that those users have the right to access a given repository.
• Backup and recovery: Backup and recovery procedures must discover and account for all important repositories. Every distributed version control system needs a comprehensive backup and recovery system to avoid outages if the central repository goes down, because the centralized master repository feeds build automation, code review, and other ALM systems.
• Centralized usage: While Git allows great freedom in the use of local branches and repositories, a central repository is still the focal point of collaboration. A centralized model creates performance bottlenecks for remote teams and scalability bottlenecks for larger sites.

Achieving enterprise-class resiliency with Git
Forrester describes continuous availability as the point at which high availability and disaster recovery become one and the same. An easy, automated process simplifies disaster recovery, reduces administration and application recovery times, facilitates business continuity, and minimizes user impact.
As part of a comprehensive, layered availability strategy, enterprises rely on replicated data kept up to date in near real time.

Clustering architectures
While many components are required to achieve continuous availability, the most appropriate technology for an always-on, always-available Bitbucket is a clustered architecture. This approach provides multiple redundant copies of critical data, with either centralized or independent management of the related metadata.

Clusters were first devised more than 50 years ago, when it became clear that some workloads could no longer fit on a single computer. A cluster is a set of servers that is viewed as a single system and that, together, provides a more available and scalable platform for hosting applications. With clustering, work can be done in parallel. The goal of a cluster is to pool the resources of several servers while achieving high availability and sustained performance. As distributed solutions, clusters are often harder to set up and maintain than their centralized counterparts. However, they are more resilient to failure and allow systems to grow beyond the capacity of a single server.

How does a clustered architecture augment distributed version control systems with regard to distance, latency, and degree of protection? All clustered servers participate equally in servicing user requests and other processing, so the read load (typically 90% or more of the load on a Git server) is evenly balanced across the servers. If one server goes down, the other servers take over automatically, typically within seconds and without manual intervention. When a new server joins the cluster (or an existing one rejoins it), it begins servicing user requests and other processing as soon as it comes online.
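The read-balancing and failover behavior described above can be sketched as a round-robin router that skips failed nodes. This is a minimal Python illustration under stated assumptions, not Go2Group’s or Atlassian’s implementation: the ReadBalancer class, its method names, and the external health monitor that is assumed to call mark_down() and mark_up() are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReadBalancer:
    """Round-robin read routing across cluster nodes, skipping failed nodes.

    Hypothetical sketch: nodes are identified by name, and an external health
    monitor (not shown) is assumed to call mark_down() and mark_up().
    """

    nodes: List[str]
    _down: Set[str] = field(default_factory=set, init=False)
    _next: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        # Copy so that later joins do not mutate the caller's list.
        self.nodes = list(self.nodes)

    def mark_down(self, node: str) -> None:
        # A failed node stops receiving reads at once; no manual failover step.
        self._down.add(node)

    def mark_up(self, node: str) -> None:
        # A new or recovered node serves reads as soon as it is marked up.
        if node not in self.nodes:
            self.nodes.append(node)
        self._down.discard(node)

    def pick(self) -> str:
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self._down:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    balancer = ReadBalancer(["node-a", "node-b", "node-c"])
    balancer.mark_down("node-b")  # simulate a server failure
    print([balancer.pick() for _ in range(4)])
```

With node-b marked down, the four reads go to node-a, node-c, node-a, node-c. Round-robin is only one balancing policy; a production balancer might weight nodes by load or proximity instead.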
Types of clustering
Examples of high availability clusters¹, with their attributes and geographies, include:
• Local cluster: A single set of servers located at one data center or location. Network latency is negligible. All servers access data synchronously.
• Metro cluster: A set of servers placed within “metro” distance (generally up to 50 kilometers) with all sites connected by fiber. Network latency is usually low (under 5 ms at distances of approximately 32 kilometers, or 20 miles). Data is typically replicated by mirroring or by synchronous replication.
• Geocluster: Multiple geographically dispersed sites, each with a local cluster, that can be thousands of kilometers apart. The sites communicate over IP. Because a geocluster keeps multiple instances of the servers, it doubles as high-availability redundancy while also offering performance benefits, since an instance is local to each team. Geoclusters must cope with limited network bandwidth and high latency. Data is replicated asynchronously.

¹ Clusters of this kind have been referred to by many names, including local clusters, campus clusters, metro clusters, geo clusters, stretched clusters, and extended clusters.

In a geocluster, most of the servers are not local to one another and are set up with some distance between them. When systems are too far apart, communication between the multiple sites must be asynchronous. When specifying a solution for dispersed geographies, considerations should include:
• how to make sure that a cluster is up and running
• how to make sure that resources are started only once
• how to manage failover between sites
• how to deal with high latency in the event that resources need to be stopped
• how to ensure that a workload will be restarted on another cluster in a distant location in the event of a catastrophe

Enterprise Git: Atlassian’s Bitbucket and Bitbucket Data Center
As with all types of software, there are many flavors of Git.
Enterprise Git offerings used by developers include Atlassian Bitbucket, CollabNet TeamForge, GitHub Enterprise, GitLab, and WANdisco Git MultiSite.

Atlassian Bitbucket, first released in 2012 under the name Stash, is Atlassian’s Git repository management tool for enterprise teams. It allows everyone in an organization to collaborate easily on Git repositories. Atlassian released Bitbucket Data Center (originally Stash Data Center) in 2014 to provide further scalability and resiliency, with enterprise workloads in mind. Atlassian also integrated Bitbucket Data Center with two of its own tools to speed up development: the JIRA issue tracking software and the Bamboo continuous integration software, which quickly tests new versions of a program.

Bitbucket’s geographic limitations
Atlassian’s Bitbucket Data Center popularized the concept of high-availability Git through its active/active cluster configuration, but it is designed for clustered servers in a single data center, not for multiple sites. Because Bitbucket, like all Git solutions, encourages distributed development across the enterprise, remote sites that suffer from high network latency can experience slow Git operations.

Because modern developers are often geographically remote, committed code frequently moves from one repository to another. A central Git server works well when development is concentrated at one location on one network. However, it does not work well for distributed development teams spread across various locations. The pressing question for many development organizations is: “What are our requirements for code availability, accessibility, and geographies?”

Go2Group’s Geoclusters
As the only ALM-specific geoclustering solution, Go2Group Geoclusters lets developers create clusters at any distance to maintain business continuity.
Performance and availability for read-only operations take a significant leap forward, while all operations remain subject to Bitbucket’s authorization rules. Servers can be in different buildings or on different continents.

Overview
Go2Group Geoclusters for Bitbucket allows teams in remote locations to share the same code base as the local teams working on a project while limiting latency and bandwidth problems; geo-synchronization keeps every site current with the latest code. Before Geoclusters, remote offices struggled to receive the most up-to-date code and to cope with resource-draining bandwidth demands from remote Git servers. They also had a hard time supporting agile development and testing practices such as Git branch builds. Now organizations can seamlessly connect their remote teams worldwide, as if everyone were in the same location.

Geoclusters use multiple redundant computing resources in different geographic locations to form what appears to be a single, highly available system. The biggest challenge in geoclustering is making sure that system states and their associated data are consistent across multiple locations.

[Figure: Go2Group Geoclusters overview]

Synchronous replication from Bitbucket to the Geocluster nodes serves as an always-on backup, eliminating the need for conventional disk mirroring solutions that work only over a LAN. New changes are pushed to each mirror as they arrive, and monitoring tools provide up-to-date status for all mirrors.

Data Flow
The diagrams below show the data flow for user read and write operations.

[Figure: Read operations using geoclusters]

[Figure: Write operations using geoclusters]

The system automatically keeps the mirrors in sync as new updates arrive at the central repository.
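The read and write paths described above can be modeled in a few lines. The Python sketch below is a simplified, hypothetical model, not a Bitbucket or Geoclusters API: the Mirror and Central classes and the stale_mirrors() check are illustrative names, and each repository is reduced to a mapping from ref names to commit IDs. Writes land on the central repository and are pushed to every mirror, reads are answered by the local mirror, and a verification pass flags any mirror whose refs differ from the central copy, in the spirit of the periodic consistency checks described under the benefits of the geoclustered architecture.

```python
from typing import Dict, List

Refs = Dict[str, str]  # ref name (e.g. "refs/heads/main") -> commit ID


class Mirror:
    """Hypothetical read-only copy of the central repository at one site."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.refs: Refs = {}

    def read(self, ref: str) -> str:
        # Read operations are answered from the local copy: no WAN round trip.
        return self.refs[ref]


class Central:
    """Hypothetical central repository: accepts every write, then updates mirrors."""

    def __init__(self, mirrors: List[Mirror]) -> None:
        self.refs: Refs = {}
        self.mirrors = mirrors

    def push(self, ref: str, commit: str) -> None:
        # 1. The write is applied to the central repository first.
        self.refs[ref] = commit
        # 2. The change is then pushed to every mirror as it arrives.
        #    A real system sends this over the network and must handle an
        #    unreachable mirror; here it is a simple in-memory update.
        for mirror in self.mirrors:
            mirror.refs[ref] = commit

    def stale_mirrors(self) -> List[str]:
        """Verification pass: sites whose refs differ from the central copy."""
        return [m.site for m in self.mirrors if m.refs != self.refs]


if __name__ == "__main__":
    apac, emea = Mirror("apac"), Mirror("emea")
    central = Central([apac, emea])
    central.push("refs/heads/main", "3f2a9c1")  # write goes through the center
    print(apac.read("refs/heads/main"))          # read is served locally
    del emea.refs["refs/heads/main"]             # simulate a mirror that drifted
    print(central.stale_mirrors())
```

Keeping the central repository as the single writer is what lets each mirror stay a plain copy with no conflict handling of its own; the verification pass catches a mirror that missed an update.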
[Figure: Data synchronization]

Bitbucket high availability options

Attribute        | Atlassian Bitbucket | Atlassian Bitbucket Data Center | Bitbucket with Go2Group Geoclusters
Number of sites  | Single-site         | Single-site                     | Multi-site
Bandwidth        | High                | Medium                          | Low
Servers          | Single server       | Multiple servers                | Multiple servers
Clustering       | None                | Active-active                   | Synchronous push
Scalability      | Zero                | High                            | Highest
Network latency  | Negligible          | Low                             | High
Replication      | Synchronous         | Synchronous                     | Synchronous
Communications   | None                | Network connection              | Internet Protocol (IP)
Distance         | None                | Single data center              | Unlimited
Overall rating   | Good                | Better                          | Best

Benefits of geoclustered architecture
Go2Group Geoclusters enables continuous availability through architectural and design advantages over single-site solutions, including:

Protection through multiple redundant copies of repositories: Go2Group Geoclustering works with Atlassian Bitbucket to improve recovery time objectives (RTO) by making multiple copies of valuable repositories available at several locations.

Rapid recovery: If one mirror site fails because of an event such as a flood, Geoclusters can route all work to another site, which can take over processing with almost no interruption for connected users. Each mirror is periodically verified to make sure that it is consistent with the central repository.

Full utilization of resources: Because Geoclusters can distribute read activity across all servers, including running a single workload across the whole cluster, it offers the greatest flexibility in the use of resources. Since Geoclusters uses one physical database across the distance, there is no lag in data freshness and no need to implement conflict-resolution schemes.

Simplicity in setup, management, and monitoring: Metrics, verification results, and system health for each site are presented in an intuitive graphical interface.
Bandwidth efficiency in the WAN and improved remote-site performance: LAN bandwidth is effectively free, but WAN bandwidth is not. With Geoclusters, remote WAN users get the same LAN-speed read performance as local users. This is done by maintaining the equivalent of a single copy of the data across the system. Clones, fetches, and other read operations are always served locally, so they generate no WAN traffic.

Performance data
Go2Group performed the following tests over distances of zero (local), 50, and 100 kilometers. The following graph shows the overall performance impact of distance on Bitbucket, measured as a percentage of local performance.

[Figure: Overall Bitbucket performance at 0, 50, and 100 kilometers, as a percentage of local performance]

Note: Write-intensive Bitbucket workloads are generally more affected by distance than read-intensive workloads.

Given these numbers, Atlassian Bitbucket Data Center generally performs acceptably at distances up to 100 kilometers. When distances exceed 100 kilometers, Go2Group Geoclusters for Bitbucket improves on Atlassian’s Bitbucket Data Center.

Conclusion
Distance can have a large effect on performance, so keeping distances short and using dedicated, directly attached networks is optimal, but not always possible. Go2Group Geoclustering for Bitbucket is an attractive architecture that provides scalability, rapid recovery, and even partial disaster recovery protection. Compared with Atlassian Bitbucket or Bitbucket Data Center configurations, Go2Group’s geoclustering architecture provides the highest level of availability for an Atlassian environment where developers must have an always-on, always-available experience.

Contacting Go2Group
Go2Group is a global provider of consulting services, third-party application integrations, data migrations, software testing, and training services for Application Lifecycle Management (ALM) systems.
We’ve implemented thousands of enterprise-level migrations, and we specialize in complex, multi-platform ALM integration projects. Our goal: Make it easy. Our clients say, “We feel like you are part of our team.”

As an Enterprise and Platinum Atlassian Expert, we offer a full suite of services for all Atlassian products and are the world’s largest reseller of Atlassian tools. We’re certified partners for best-of-breed ALM solutions, including Atlassian, HP, IBM, Microsoft, and Perforce. We specialize in integrating ALM tools such as Atlassian, HP, IBM, Microsoft, Perforce, ServiceNow, and many more: users work in the tools they prefer, and the data is synchronized automatically.

Go2Group and GSA
Products and services from Go2Group and its partners, including Atlassian, Microsoft, and Perforce, are available through GSA and through several GWACs and other procurement vehicles. We’re experts in government policies and strategies.

Contact
http://www.go2group.com/
Corporate Office, USA: 138 North Hickory Avenue, Bel Air, MD 21014
Hawaii: 7007 Hawaii Kai Drive, Suite C26, Honolulu, HI 96825
Japan: Le Premier Akihabara 11th Floor, 73 Kanda Neribei-cho, Chiyoda-ku, Tokyo 101-0022
China: Great Wall Computer Building A301, 38 Xueyuan Road, Haidian District, Beijing 100083
Telephone: 877-442-4669 (U.S. toll free); +1-410-879-8102 (U.S.)
Email: [email protected]

Notice
© 2015 Go2Group, Inc. All rights reserved. Subject to change without notice. ConnectALL is a registered trademark of Go2Group, Inc. in the U.S. and other countries. Bitbucket and Bitbucket Data Center are registered trademarks of Atlassian. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.

This white paper is for informational purposes only. Go2Group makes no warranties, express, implied, or statutory, as to the information in this document.

WP-G2G-1000