Geoclustering Git

Delivering Performance and Reliability When Using Git for Global Development Teams
Brett Taylor, Go2Group
October 2015

TABLE OF CONTENTS
Introduction
Git: the fastest growing version control system
  Inherent value
  Challenges
Achieving enterprise-class resiliency with Git
  Clustering architectures
  Types of clustering
  Enterprise Git: Atlassian’s Bitbucket and Bitbucket Data Center
  Bitbucket’s geographic limitations
Go2Group’s Geoclusters
  Overview
  Data Flow
  Bitbucket high availability options
  Benefits of geoclustered architecture
  Performance data
Conclusion
Contacting Go2Group
  Go2Group and GSA
  Contact
  Notice

Introduction
Git is the fastest growing version control system. But few Git systems
meet enterprise requirements for performance and reliability, especially
when deployed by globally diverse software development teams.
In this white paper, we take a closer look at how geoclustering Git
(placing clustered instances on multiple servers at multiple locations)
guarantees availability and enhances performance by sharing the
workload and preventing outages. Go2Group’s Geoclusters for Atlassian
Bitbucket provides the always-on, always-available experience modern
enterprises demand.
Smart companies recognize faulty code as a significant business risk. In
one of the biggest outages of 2014, cloud storage company Dropbox
experienced a global outage when a bug in an upgrade script tried to
reinstall an operating system on an active machine.
As a result, IT’s focus on fail-safe design has extended beyond the
network to the application server and to the developer’s application
code itself.
Software development teams depend on version control systems to
improve the product lifecycle delivery process. Git has become the
fastest-growing and most widely distributed version control system on
the market, and Atlassian’s Bitbucket has become one of the most popular
Git repository management platforms for the always-on enterprise.
As a proven architecture for both local and remote clusters for Bitbucket,
Go2Group Geoclusters allows any company to benefit from a fully
supported global mirroring solution for Bitbucket. Geoclusters provide
redundant local Bitbucket mirrors for the best possible performance and
an additional level of availability protection for intellectual property.
Because this architecture can span data centers around the world, it
makes distances of more than 100 kilometers viable. Geoclusters build on
Git’s native mirroring capability to provide local performance speeds at
remote sites, clustering to support continuous integration (CI) farms,
and multiple copies of critical source code as part of a comprehensive
disaster recovery (DR) solution.
In this paper we discuss the use of Git as a version control system,
achieving Git resiliency, incorporating Go2Group Geoclusters, and
reviewing performance data.
Git: the fastest growing version control system
Version control systems are key for any organization that develops
software because software development is rarely a solitary effort. Modern
development requires large amounts of data, and there’s an ongoing
demand for developers to version all information required for the release
of a product. Git meets that demand.
Created in 2005 by Linus Torvalds, the father of Linux, Git is the fastest
growing version control system. As of May 2014, 42.9% of the software
developers surveyed in the Eclipse Community Development Survey used
either Git or GitHub as their primary source control system. As of June
2015, GitHub had over 10,000,000 users. Thirty-three percent of
respondents to a 2014 Forrester Consulting enterprise survey indicated
that 60% or more of their code was stored and managed by Git-based
systems.
Inherent value
Git provides access to local repositories for developers, giving them the
ability to make changes and create branches locally. Because Git is
inherently a distributed version control system, developers can use it
to work on a shared project with a different workflow than that of a
centralized version control system. Often, separate repositories are used
to model more stable branches, and whoever maintains the more stable
repository pulls completed work from the contributors’ repositories.
Like all distributed version control systems, Git supports disconnected
operation, and a major benefit is its ability to work in an environment
where network connectivity is unreliable or unavailable. The value of
disconnected operation depends on how many of the developers involved in
a project regularly work while disconnected, how often they do so, and
for how long. Successful businesses
seldom work in single-site isolation. Forrester Research describes this as
the “extended enterprise” where employees are expected to perform their
jobs anytime and anywhere.
The more a project involves disconnected work, the more value a Git
system provides. Of course, the ability to work while disconnected is not
the only benefit of having a local repository. With a central repository,
congestion frequently occurs during integration, especially on large
projects. In this case, speed of operation depends on how many people
are trying to integrate at the same time, the number of conflicts, and
the strength of the control system’s merging capabilities.
However, when the time comes to share work in an enterprise setting, all
changes must eventually flow back into the central repository.
Challenges
If a developer is working in India and the master repository is in
California, for example, every push suffers from latency due to network
delays. And since Git does not keep remote mirrors synchronized out of
the box, even read operations, like clones and pulls, can be slow. The
same problems hold true for a master repository that is supporting a
large number of concurrent users.
While Git is distributed and avoids some of the vulnerabilities of
centralized version control systems, it still has some shortcomings,
including:
• Access control: Because every clone holds a full copy of a repository,
Git exposes all parts of a company’s source code to anyone who can reach
it. A Git server can authenticate users, but local mirrors do not
automatically apply Bitbucket’s authorization rules. That is, a mirror
can verify that users are who they claim to be but has no way to ensure
that those users have the right to access a given repository.
• Backup and recovery: Backup procedures must discover and account for
all important repositories. All distributed version control systems
require a comprehensive backup and recovery system to avoid outages if
the central repository goes down, because the centralized master
repository feeds the build automation, code review, and other ALM systems.
• Centralized usage: While Git allows great freedom in the use of local
branches and repositories, a central repository is still the focal point
of collaboration. A centralized model creates performance bottlenecks for
remote teams and scalability bottlenecks for larger sites.
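To make the access-control distinction concrete, here is a minimal Python sketch of a mirror that both authenticates and authorizes a read request. It assumes a hypothetical token table and a per-repository permission table copied from the primary Bitbucket instance; the names TOKENS, READ_PERMISSIONS, and serve_clone are illustrative only and are not part of Bitbucket, Git, or Go2Group Geoclusters.

```python
# Hypothetical data: tokens the mirror can verify (authentication) and
# per-repository read permissions copied from the primary (authorization).
TOKENS = {"tok-asha": "asha", "tok-ben": "ben"}
READ_PERMISSIONS = {
    "PROJ/payments": {"asha"},
    "PROJ/docs": {"asha", "ben"},
}


class AccessDenied(Exception):
    pass


def authenticate(token: str) -> str:
    """Return the user a token belongs to, i.e. who the caller claims to be."""
    user = TOKENS.get(token)
    if user is None:
        raise AccessDenied("unknown token")
    return user


def authorize(user: str, repo: str) -> None:
    """Raise unless the authenticated user may read this repository."""
    if user not in READ_PERMISSIONS.get(repo, set()):
        raise AccessDenied(f"{user} may not read {repo}")


def serve_clone(token: str, repo: str) -> str:
    user = authenticate(token)  # a plain mirror effectively stops here
    authorize(user, repo)       # an authorization-aware mirror also checks this
    return f"serving {repo} to {user}"


if __name__ == "__main__":
    print(serve_clone("tok-ben", "PROJ/docs"))
    try:
        serve_clone("tok-ben", "PROJ/payments")
    except AccessDenied as err:
        print("denied:", err)
```

In this model, the extra authorize() step is what applying the primary’s authorization rules at the mirror amounts to.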
Achieving enterprise-class resiliency with Git
Forrester describes continuous availability as the point at which high
availability and disaster recovery become one and the same. An easy,
automated process simplifies disaster recovery, reduces administration
and application recovery times, facilitates business continuity, and
minimizes user impact.
As part of a comprehensive layered availability strategy, enterprises
choose to rely on replicated data kept up to date in near real-time.
Clustering architectures
While many components are required to achieve continuous availability,
the most appropriate technology for an always-on, always-available
Bitbucket is a clustered architecture. This solution provides multiple
redundant copies of critical data, with either centralized or
independent management of related metadata.
Clusters were devised more than 50 years ago, when it became clear that
some work could no longer be made to fit on a single computer. A cluster
is a set of servers that is viewed as a single system and that, together,
provides a more available and scalable platform for hosting applications.
With clustering, work can be done in parallel. The goal of a cluster is to
pool the resources of several servers while achieving high availability and
sustained performance. As distributed solutions, clusters are often harder
to set up and maintain than their centralized counterparts. However, they
offer more resilience to failure and allow systems to grow beyond the
capacity of a single server.
How does a clustered architecture augment distributed version control
systems with regard to distance, latency, and the degree of protection?
All clustered servers participate equally in servicing user requests and
other processing, so the read load (typically 90% or more of the load on a
Git server) is evenly balanced and distributed among the servers. If one
server goes down, failover to the other servers happens automatically
without manual intervention, typically within seconds. When a new
server joins (or an existing one rejoins) the cluster it begins to service user
requests and other processing automatically, as soon as it comes online.
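The load-sharing and failover behavior described above can be sketched as simple round-robin read routing. This is an illustrative model under stated assumptions, not the scheduling algorithm Bitbucket or Geoclusters actually uses: the ReadBalancer class is hypothetical, and node failures are signaled by explicit mark_down and mark_up calls rather than detected by health checks.

```python
class ReadBalancer:
    """Round-robin read routing that skips nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # position of the next node to try

    def mark_down(self, node):
        # Failover: stop sending reads to a failed node.
        self.healthy.discard(node)

    def mark_up(self, node):
        # A new or returning node starts serving reads immediately.
        if node not in self.nodes:
            self.nodes.append(node)
        self.healthy.add(node)

    def pick(self):
        # Try each node at most once, starting where the last call left off.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    balancer = ReadBalancer(["node-a", "node-b", "node-c"])
    balancer.mark_down("node-b")
    print([balancer.pick() for _ in range(4)])  # node-b is skipped
```

Because every healthy node can serve any read, losing one node only shrinks the rotation; no client has to be reconfigured.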
Types of clustering
Examples of high availability clusters,[1] their attributes, and
geographies include:
• Local cluster: A single set of servers located at one data center or
location. Network latency can be neglected. Data is accessed
synchronously by all servers.
• Metro cluster: A set of servers placed within a “metro” distance
(generally up to 50 kilometers) with all sites connected by fiber.
Network latency is usually low (under 5 ms for distances of
approximately 20 miles, or about 32 kilometers). Data is frequently
replicated, either with mirroring or synchronous replication.
• Geocluster: Multiple geographically dispersed sites, each with a local
cluster, that can be thousands of kilometers apart. The sites
communicate via IP. Because a geocluster keeps multiple server
instances, it doubles as high availability redundancy while also
offering performance benefits, since an instance is local to each team.
Geoclusters need to cope with limited network bandwidth and high
latency. Data is replicated asynchronously. Most of the servers are not
local to one another and are set up with some distance between them.

[1] Clusters of this kind have been referred to by many names, including
local clusters, campus clusters, metro clusters, geo clusters, stretched
clusters, and extended clusters.
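A quick calculation shows why metro clusters can replicate synchronously while geoclusters replicate asynchronously. The sketch below assumes that light travels through optical fiber at roughly 200,000 km/s (about 200 km per millisecond) and ignores routing, switching, and queuing delays, so it yields only a lower bound on round-trip time. The 10,000 km distance is an assumed intercontinental figure, not a measurement.

```python
# Assumption: light in optical fiber covers about 200 km per millisecond
# (roughly two-thirds of its speed in a vacuum).
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone.

    Real links add routing detours, switching, and queuing delay, so
    measured latency is always higher than this figure.
    """
    return 2 * distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    for label, km in [
        ("metro, ~20 miles", 32),
        ("metro limit", 50),
        ("longest tested distance", 100),
        ("intercontinental (assumed)", 10_000),
    ]:
        print(f"{label:28} {km:>6} km  >= {min_round_trip_ms(km):7.2f} ms")
```

Synchronous replication makes every write wait at least one round trip to the remote copy. At metro distances that cost is a fraction of a millisecond; across continents it is on the order of 100 milliseconds per acknowledged write before any real-world overhead, which is why geoclusters replicate asynchronously.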
In geographies where systems are too far apart, communication between
multiple sites must be asynchronous. When specifying a solution for
dispersed geographies, considerations need to include:
• how to make sure that a cluster is up and running
• how to make sure that resources are only started once
• how to manage failover between sites
• how to deal with high latency in the event that resources need to be
stopped
• how to ensure a workload will be restarted on another cluster in a far
removed location in the event of a catastrophe
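One common way to meet the “started only once” and “failover between sites” requirements is lease-based mutual exclusion, a standard technique named here as an illustration rather than as the mechanism any particular product uses. In this Python sketch, a site may run a resource only while it holds an unexpired lease; if the holder stops renewing (for example, because its site is lost), another site can take over once the lease expires. The LeaseTable class is hypothetical, and the sketch assumes a single consistent lease store and comparable clocks at all sites; time is passed in explicitly to keep it deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    holder: str        # site currently allowed to run the resource
    expires_at: float  # time (seconds) after which the lease lapses


class LeaseTable:
    """Lets each resource run at only one site at a time."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.leases: Dict[str, Lease] = {}

    def acquire(self, resource: str, site: str, now: float) -> bool:
        """Grant or renew the lease; return False if another site holds it."""
        lease = self.leases.get(resource)
        if lease is None or lease.expires_at <= now or lease.holder == site:
            self.leases[resource] = Lease(site, now + self.ttl)
            return True
        return False

    def holder(self, resource: str, now: float) -> Optional[str]:
        """Return the site holding a live lease, or None if it has lapsed."""
        lease = self.leases.get(resource)
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


if __name__ == "__main__":
    table = LeaseTable(ttl_seconds=30)
    print(table.acquire("ci-farm", "us-east", now=0))   # True: lease granted
    print(table.acquire("ci-farm", "apac", now=10))     # False: already held
    # us-east stops renewing (site lost); apac takes over after expiry.
    print(table.acquire("ci-farm", "apac", now=31))     # True: failover
```

The time-to-live sets the trade-off: a short lease means faster failover after a catastrophe, while a long lease tolerates brief WAN interruptions without triggering an unnecessary takeover.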
Enterprise Git: Atlassian’s Bitbucket and Bitbucket Data Center
As with all types of software, there are many flavors of Git. Git
products used by developers include Atlassian Bitbucket, CollabNet
TeamForge, GitHub Enterprise, GitLab, and WANdisco Git MultiSite.
Atlassian Bitbucket was released in 2012 (under its original name,
Stash). It is Atlassian’s Git repository management tool for enterprise
teams, allowing everyone in an organization to collaborate easily on Git
repositories.
Atlassian released Bitbucket Data Center in 2014 to provide further
scalability and resiliency, with enterprise workloads in mind.
Bitbucket Data Center also integrates with two other Atlassian tools to
speed the development process: the JIRA bug tracking software and the
Bamboo continuous integration software for quickly testing new versions
of a program.
Bitbucket’s geographic limitations
Atlassian’s Bitbucket Data Center popularized the concept of
high-availability Git through its active/active cluster configuration,
but it is designed for clustered servers in a single data center, not
for multiple sites. Because Bitbucket, like all Git solutions, encourages
distributed development across the enterprise, remote sites that suffer
from high network latency may see slow Git operations.
Because modern-day developers are often geographically remote,
committed code is frequently moved from one repository to another. A
centrally hosted Git server works well when development is integrated
at the same location and on the same network. However, it does not work
well for development teams spread across distant locations.
The pressing question for many developers is “What are my
requirements for code availability, accessibility, and geographies?”
Go2Group’s Geoclusters
As the only ALM-specific geoclustering solution, Go2Group Geoclusters
lets developers create clusters at any distance to maintain business
continuity. Performance and availability for read-only operations
improve significantly, while all operations remain subject to
Bitbucket’s authorization rules. Servers can be in different buildings or
on different continents.
Overview
Go2Group Geoclusters for Bitbucket allows teams in remote locations to
share the same code base as the local teams working on a project, while
limiting latency and bandwidth issues and staying current with the
updated code base through geo-synchronization. Before Geoclusters,
remote offices had difficulty receiving the most up-to-date code and
handling resource-draining bandwidth requests from remote Git servers.
They also struggled to support agile development and testing practices
such as Git branch builds. Now developers can seamlessly connect their
remote teams worldwide, as if they were all in the same location.
Geoclusters involve the use of multiple redundant computing resources
located in different geographical locations to form what appears to be a
single, highly available system. The biggest challenge in geoclustering
is to make sure that system states and their associated data are
consistent across multiple locations.
[Figure: Go2Group Geoclusters overview]
Synchronous replication from Bitbucket to the Geocluster nodes serves as
an always-on backup, eliminating the need for conventional disk
mirroring solutions that only work over a LAN. New changes are pushed
to each mirror as they arrive and monitoring tools provide up-to-date
status of all mirrors.
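The push-on-arrival model and the mirror status view can be illustrated with a small in-memory simulation. It is a hypothetical sketch, not Go2Group’s implementation: Mirror and Replicator are illustrative names, and only ref-to-commit maps are copied, whereas real replication would also transfer Git objects (plain Git can do this with git push --mirror).

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    name: str
    online: bool = True
    refs: dict = field(default_factory=dict)  # ref name -> commit ID


class Replicator:
    """Pushes each ref update from the central repository to every mirror."""

    def __init__(self, mirrors):
        self.central = {}  # the central repository's refs
        self.mirrors = {m.name: m for m in mirrors}

    def on_push(self, ref, commit):
        # Record the update centrally, then forward it to reachable mirrors.
        self.central[ref] = commit
        for mirror in self.mirrors.values():
            if mirror.online:
                mirror.refs[ref] = commit

    def resync(self, name):
        # Bring a mirror that was offline back in line with the central copy.
        mirror = self.mirrors[name]
        mirror.online = True
        mirror.refs = dict(self.central)

    def status(self):
        # Monitoring view: is each mirror identical to the central repository?
        return {
            name: "in sync" if m.refs == self.central else "behind"
            for name, m in self.mirrors.items()
        }


if __name__ == "__main__":
    pune, sydney = Mirror("pune"), Mirror("sydney")
    replicator = Replicator([pune, sydney])
    replicator.on_push("refs/heads/main", "a1b2c3")
    sydney.online = False               # simulate a WAN outage
    replicator.on_push("refs/heads/main", "d4e5f6")
    print(replicator.status())          # sydney is reported as behind
    replicator.resync("sydney")
    print(replicator.status())
```

A mirror that misses updates while offline reports “behind” until it is resynchronized, which is the kind of signal the monitoring tools surface.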
Data Flow
The diagrams below show the data flow for user read and write
operations.
[Figure: Read operations using geoclusters]
[Figure: Write operations using geoclusters]
The system automatically keeps the mirrors in sync as new updates arrive
at the central repository.
[Figure: Data synchronization]
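The read and write paths can be summarized as a routing rule: reads are served by the mirror at the user’s site, and writes go to the central repository, which then updates the mirrors. The Python sketch below encodes that rule under the assumption that each site has at most one mirror; the route function and the example URLs are hypothetical.

```python
READ_OPERATIONS = {"clone", "fetch", "pull", "ls-remote"}
WRITE_OPERATIONS = {"push"}


def route(operation: str, user_site: str, mirrors: dict[str, str],
          primary_url: str) -> str:
    """Return the URL of the server that should handle a Git operation.

    Reads use the mirror at the user's site when one exists and fall back
    to the primary otherwise; writes always go to the primary.
    """
    if operation in WRITE_OPERATIONS:
        return primary_url
    if operation in READ_OPERATIONS:
        return mirrors.get(user_site, primary_url)
    raise ValueError(f"unsupported Git operation: {operation}")


if __name__ == "__main__":
    # Hypothetical URLs for a central instance and two regional mirrors.
    primary = "https://bitbucket.example.com/scm/proj/app.git"
    mirrors = {
        "pune": "https://pune-mirror.example.com/scm/proj/app.git",
        "sydney": "https://sydney-mirror.example.com/scm/proj/app.git",
    }
    print(route("clone", "pune", mirrors, primary))  # local mirror
    print(route("push", "pune", mirrors, primary))   # central repository
```

With plain Git, an individual clone can approximate the same split by fetching from a mirror URL and pushing to the primary, for example with git remote set-url --push origin <primary-url>.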
Bitbucket high availability options

                  Atlassian         Atlassian Bitbucket   Bitbucket with
                  Bitbucket         Data Center           Go2Group Geoclusters
Number of sites   Single-site       Single-site           Multi-site
Bandwidth         High              Medium                Low
Servers           Single server     Multiple servers      Multiple servers
Clustering        None              Active-active         Synchronous push
Scalability       Zero              High                  Highest
Network latency   Negligible        Low                   High
Replication       Synchronous       Synchronous           Synchronous
Communications    None              Network connection    Internet Protocol (IP)
Distance          None              Single data center    Unlimited
Overall rating    Good              Better                Best

Benefits of geoclustered architecture
Go2Group Geoclusters enables continuous availability by employing both
architectural and design advantages over single-site solutions, including:
Protection through multiple redundant copies of repositories:
Go2Group Geoclustering works with Atlassian Bitbucket to improve
recovery time objectives (RTO) by making multiple copies of valuable
repositories available at several locations.
Rapid recovery:
Should one mirror site fail due to an event like a flood, Geoclusters can
route all work to another site, which can take over the processing with
nearly no interruption for connected users. Each mirror is periodically
verified to make sure that it is consistent with the central repository.
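Periodic verification can be as simple as comparing the refs each server advertises. This Python sketch assumes the central repository’s and the mirror’s refs were both captured at about the same moment in the format that git ls-remote prints (a commit ID, a tab, then the ref name on each line); parse_ls_remote and verify_mirror are illustrative helpers, not Geoclusters functions.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Turn git ls-remote output (commit ID, tab, ref name per line) into a dict."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            commit, ref = line.split("\t", 1)
            refs[ref] = commit
    return refs


def verify_mirror(central: dict[str, str], mirror: dict[str, str]) -> dict[str, list[str]]:
    """Report refs the mirror lacks, points elsewhere, or should not have."""
    return {
        "missing": sorted(r for r in central if r not in mirror),
        "stale": sorted(r for r in central if r in mirror and mirror[r] != central[r]),
        "extra": sorted(r for r in mirror if r not in central),
    }


def is_consistent(report: dict[str, list[str]]) -> bool:
    return not any(report.values())


if __name__ == "__main__":
    central = parse_ls_remote("a1\tHEAD\na1\trefs/heads/main\nb2\trefs/heads/dev\n")
    mirror = parse_ls_remote("a1\tHEAD\n9f\trefs/heads/main\n")
    report = verify_mirror(central, mirror)
    print(report, "consistent" if is_consistent(report) else "needs resync")
```

Comparing ref tips is cheap and catches missing, stale, and leftover refs; a fuller check could also confirm that the mirror’s object store is intact, for example with git fsck.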
Full utilization of resources:
Geoclusters’ ability to distribute read activity across all servers,
including running a single workload across the whole cluster, allows the
greatest flexibility in the use of resources. Since Geoclusters uses one
physical database across the distance, there is neither a lag in data
freshness nor any need to implement conflict-resolution schemes.
Simplicity in setup, managing, and monitoring:
Metrics, verification, and system health for each site’s status are presented
in an intuitive graphical interface.
Bandwidth efficiency in the WAN and improved remote site
performance:
Bandwidth is plentiful and inexpensive on a LAN but limited and costly on
a WAN. With Geoclusters, remote WAN users experience the same LAN-speed
read performance as local users. This is done by maintaining the
equivalent of a single copy of the data across the system. Clones,
fetches, and other read operations are served locally, so they generate
no WAN traffic.
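The WAN savings follow from simple arithmetic. This Python sketch uses an assumed, simplified model: without a local mirror, every remote developer’s clone and daily fetches cross the WAN; with a mirror, one copy of the initial data and of each day’s new changes crosses the WAN to update the mirror, and developers read from it over the LAN. The figures in the example are hypothetical, not Go2Group measurements.

```python
def wan_traffic_gb(developers: int, clone_gb: float, daily_change_gb: float,
                   days: int, local_mirror: bool) -> float:
    """Estimate WAN traffic (GB) for one remote site over a period.

    Simplified model: each developer clones once and then fetches each
    day's new changes. With a local mirror, only the mirror pulls that
    data across the WAN; developers read from it over the LAN.
    """
    copies_over_wan = 1 if local_mirror else developers
    return copies_over_wan * (clone_gb + daily_change_gb * days)


if __name__ == "__main__":
    # Hypothetical site: 50 developers, a 2 GB repository, 50 MB of new
    # changes per day, over 20 working days.
    without = wan_traffic_gb(50, 2.0, 0.05, 20, local_mirror=False)
    with_mirror = wan_traffic_gb(50, 2.0, 0.05, 20, local_mirror=True)
    print(f"without mirror: {without:.1f} GB, with mirror: {with_mirror:.1f} GB")
```

In this model, WAN traffic no longer grows with the number of developers at the remote site.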
Performance data
Go2Group performed the following tests over distances of zero (local),
50, and 100 kilometers.
The following graph shows the overall performance impact on Bitbucket
due to distance measured as a percentage of local performance.
Note: Write-intensive Bitbucket workloads are generally more affected by
distance than read-intensive workloads.
These results indicate that Atlassian Bitbucket Data Center generally
performs acceptably at distances of up to 100 kilometers. When distances
exceed 100 kilometers, Go2Group Geoclusters for Bitbucket improves on
Atlassian’s Bitbucket Data Center.
Conclusion
Distance can have a huge effect on performance, so keeping the distance
short and using dedicated, direct-attached networks is optimal, but not
always possible.
Go2Group Geoclustering for Bitbucket is an attractive architecture that
provides scalability, rapid availability, and even partial disaster
recovery protection. Compared with Atlassian Bitbucket or Bitbucket Data
Center alone, Go2Group’s geoclustering architecture provides the highest
level of availability for an Atlassian environment where developers must
have an always-on, always-available experience.
Contacting Go2Group
Go2Group is a global provider of consulting services, third-party
application integrations, data migrations, software testing, and training
services in Application Lifecycle Management (ALM) systems. We’ve
implemented thousands of enterprise-level migrations. We specialize in
complex, multi-platform, ALM integration projects.
Our goal: Make it easy. Our clients say, “We feel
like you are part of our team.”
As an Enterprise and Platinum Atlassian Expert, we offer a full suite of
services for all Atlassian products and are the world’s largest reseller of
Atlassian tools. We’re certified partners for best-of-breed ALM
solutions, including Atlassian, HP, IBM, Microsoft, and Perforce.
We specialize in integrating ALM tools such as Atlassian, HP, IBM,
Microsoft, Perforce, ServiceNow, and many more, so users work in the
tools they prefer while the data is synchronized automatically.
Go2Group and GSA
Products and services from Go2Group and its partners, including
Atlassian, Microsoft, and Perforce, are available via GSA or several
GWACs and procurement vehicles. We are experts in government policies
and strategies.
Contact
http://www.go2group.com/
Corporate Office, USA: 138 North Hickory Avenue, Bel Air, MD 21014
Hawaii: 7007 Hawaii Kai Drive, Suite C26, Honolulu, HI 96825
Japan: Le Premier Akihabara 11th Floor, 73 Kanda Neribei-cho, Chiyoda-ku, Tokyo 101-0022
China: Great Wall Computer Building A301, 38 Xueyuan Road, Haidian
District, Beijing 100083
Telephone: 877-442-4669 (U.S. toll free); +1-410-879-8102 (U.S.)
Notice
© 2015 Go2Group, Inc. All rights reserved. Subject to change without
notice. ConnectALL is a registered trademark of Go2Group, Inc. in the
U.S. and other countries. Bitbucket and Bitbucket Data Center are
registered trademarks of Atlassian. All other brands or products are
trademarks or registered trademarks of their respective holders and
should be treated as such. This white paper is for informational purposes
only. Go2Group makes no warranties, express, implied, or statutory, as to
the information in this document.
WP-G2G-1000