iaas Documentation Release 0.1.0 NorCAMS November 14, 2014

Contents
1 Getting started
2 Installation
3 Design
  3.1 Development hardware (draft)
4 Development
5 Howtos
  5.1 Build docs locally using Sphinx
6 About the project
  6.1 What is NorCAMS anyways?
  6.2 People
  6.3 Tracking the project
  6.4 Meetings
  6.5 Project plan and description
This is our current documentation.
CHAPTER 1
Getting started
CHAPTER 2
Installation
Documentation describing installation of the IaaS platform
CHAPTER 3
Design
High-level documents describing the IaaS platform design
3.1 Development hardware (draft)
The project will deliver a geo-distributed IaaS service across (at least) three locations. A key point is that each location
is built from the same hardware specification. This simplifies the build and limits the influence of external variables as
much as possible while the base platform is being built.
The spec represents a minimal baseline for one site/location.
3.1.1 Networking
4x L3 switches
• Will be connected as routed leaf-spine fabric (OSPF)
• Each with at least 48 ports 10Gb SFP+ / 4 ports 40Gb QSFP
• Switches that support ONIE/OCP preferred
1x L2 management switch
• 48 ports 1GbE, VLAN
• Remote management possible
48x 10GBase-SR SFP+ transceivers
8x 40GBase-SR4 QSFP+ transceivers
3.1.2 Servers
3x management nodes
• 1U 1x12 core with 128GB RAM
• 2x SFP+ 10Gb and 2x 1GbE
• 2x SSD drives RAID1
• Room for more disks
• Redundant PSUs
3x compute nodes
• 1U 2x12 core with 512GB RAM
• 2x SFP+ 10Gb and 2x 1GbE
• 2x SSD drives RAID1
• Room for more disks
• Redundant PSUs
5x storage nodes
• 2U 1x12 core with 128GB RAM
• 2x SFP+ 10Gb and 2x 1GbE
• 8x 3.5” 2TB SATA drives
• 4x 120GB SSD drives
• No RAID, only JBOD
• Room for more disks (12x 3.5” ?)
• Redundant PSUs
Comments
• Management and compute nodes could very well be the same chassis with different specs. Even higher density,
such as half-width nodes, could be considered, but not blade chassis (that would mean non-standard cabling/connectivity).
• The key attribute for the SSD drives is sequential write performance. The SSDs might be PCIe connected.
• 2TB disks are used in the storage nodes to keep recovery times with Ceph short.
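As a rough sanity check, the per-site totals implied by this baseline can be computed with plain shell arithmetic. This is a sketch: the node counts and per-node figures are taken from the lists above, while the Ceph replica count of 3 is an assumption for illustration only; this draft does not specify it.

```shell
#!/bin/sh
# Per-site totals for the draft hardware baseline (section 3.1.2).
# ASSUMPTION: Ceph keeps 3 replicas of each object; the draft does not say so.
sites=3
compute_nodes=3; cores_per_compute=24; ram_gb_per_compute=512   # 2x12 cores, 512GB
storage_nodes=5; hdd_per_storage=8; tb_per_hdd=2                # 8x 2TB SATA (JBOD)
assumed_replicas=3

cores_per_site=$((compute_nodes * cores_per_compute))
ram_gb_per_site=$((compute_nodes * ram_gb_per_compute))
raw_tb_per_site=$((storage_nodes * hdd_per_storage * tb_per_hdd))
raw_tb_total=$((raw_tb_per_site * sites))
# Rough usable capacity: raw divided by the (assumed) replica count.
usable_tb_total=$((raw_tb_total / assumed_replicas))

echo "Compute cores per site:  $cores_per_site"
echo "Compute RAM per site:    ${ram_gb_per_site} GB"
echo "Raw HDD per site:        ${raw_tb_per_site} TB"
echo "Raw HDD, all sites:      ${raw_tb_total} TB"
echo "Usable at ${assumed_replicas}x replicas: ${usable_tb_total} TB"
```

The 120GB SSDs are left out of the HDD figures, and the usable number moves directly with whatever replica count is eventually chosen.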
CHAPTER 4
Development
CHAPTER 5
Howtos
This is a collection of howtos and documentation bits with relevance to the project.
5.1 Build docs locally using Sphinx
This describes how to build the documentation from norcams/iaas locally.
5.1.1 RHEL, CentOS, Fedora
You’ll need the python-virtualenvwrapper package from EPEL
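# Enable EPEL first if it is not already enabled (on CentOS: sudo yum -y install epel-release)
sudo yum -y install python-virtualenvwrapper
# Start a new login shell so the virtualenvwrapper functions are loaded
exit
# Make a virtual Python environment
# This env is placed in ~/.virtualenvs (virtualenvwrapper's default WORKON_HOME)
mkvirtualenv docs
# Activate the docs virtualenv (mkvirtualenv already does this; use workon in later sessions)
workon docs
# Install Sphinx and the Read the Docs theme into it
pip install sphinx sphinx_rtd_theme
# Get the sources if you do not already have a checkout
git clone https://github.com/norcams/iaas.git
# Compile docs
cd iaas/docs
make html
# Open in modern internet browser of choice
xdg-open _build/html/index.html
# Deactivate the virtualenv
deactivate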
CHAPTER 6
About the project
norcams/iaas is an open source effort focused on automating, documenting and delivering all parts of a complete,
OpenStack-based, production-quality infrastructure. This repository is our project handbook.
Infrastructure as code and ‘automation first’ are the main technical driving forces, along with a general need for faster
and more efficient delivery of standardized, self-provisioned services among IT departments in the Norwegian academic sector.
Development is funded by the participating entities, which contribute employees and knowledge to a nationally distributed team of engineers.
Project goals are set, changed and validated within a formal project organization where management from all contributing entities is represented. This project organization is named UH-sky and is coordinated by UNINETT, the Norwegian
NREN organization.
6.1 What is NorCAMS anyways?
It is nothing more than a name label, really. Or a lot more, if you choose for it to be. Pretty confusing, right?
For some background: when the universities started collaborating, it became apparent that we
needed a name of some sort to identify us and what we were trying to do. Key words were technology, collaboration
and learning to continuously improve.
To have something to go by, the NorCAMS name was invented and presented at a meetup in Tromsø in early 2014. It is a
play on words created from Norwegian (or it could be Nordic?) and CAMS. CAMS is an acronym (we all
love them, right?) coined in 2010 by Damon Edwards and John Willis at the first US-based Devopsdays. It stands for
Culture, Automation, Measurement and Sharing and has become a mantra for the devops community and concept.
NorCAMS is used as an identifier of the open source and collaboration aspect of the formal UH-sky IaaS project. It is
useful in several ways, possibly mostly as a marker to show our ambition to be truly open. By not using a more official
name we hope not to scare anyone off, and maybe attract contributors?
Some further references around CAMS and Devops for those interested
• John Willis, July 16, 2010 What Devops Means to Me (explaining CAMS)
• Patrick Debois started Devopsdays in 2009 with Devopsdays Ghent
• James Turnbull, Feb 2010 What DevOps means to me...
6.2 People
6.3 Tracking the project
Contents
• Tracking the project
– Chat room
– Tasks and progress reporting
– Core team weekly schedule
* Daily status meeting
* Weekly planning meeting
– Project calendar
– Social sharing platform
6.3.1 Chat room
All members of the project are expected to join and follow our chat room while working. The chat room is used for
socializing, status updates, informal quick questions and coordinating various group efforts.
Commit messages from the most important git repositories we use are announced in the chat room automatically.
To start using the chat room, connect to an IRC server on the Freenode network and join the #uh-sky room. Remember,
everything in the room is logged on the public internet at https://botbot.me/freenode/uh-sky/
6.3.2 Tasks and progress reporting
The project uses a Trello board for tasks and project planning. Core members are expected to add and manage cards
directly on the board. Tasks described on the cards should not be too complicated to solve; ideally we want cards to
flow through the board each day. If we do this correctly we get a low-cost, low-friction way of reporting progress and
status.
Divide and conquer seems like a good idea to try for this. If a card stays in the same column for a day, divide it and
try to get smaller parts of it to Done!
The Goals column is a bit special. This is where we put larger goals and milestones broken out from the project plan.
Goals move directly to the Done column once they are reached.
The board is public and available at https://trello.com/b/m7tD31zU/iaas. To be able to comment on a card you’ll need
a Trello account. Most of the team members use a Google account as their login identity.
6.3.3 Core team weekly schedule
This table shows which days the core team members are available (1 = full day, 0.5 = half day, 0 = not available).
Jan Ivar, Tor and Erlend work full time.

Name         Monday  Tuesday  Wednesday  Thursday  Friday
Erlend       1       1        1          1         1
Hans-Henry   1       1        1          0         0
Hege         0       0        1          1         0
Jan Ivar     1       1        1          1         1
Marte        0       0        0.5        1         1
Mikael       0       1        1          1         0
Tor          1       1        1          1         1
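The schedule can also be read as weekly capacity per person. A minimal sketch that sums each row with awk (used because POSIX shell arithmetic is integer-only and cannot handle the 0.5 entry). The rows are copied from the table, with "Jan Ivar" written as Jan_Ivar so each name stays a single field; the helper name is hypothetical.

```shell
#!/bin/sh
# Weekly availability per core team member, summed from the schedule table.
# Columns: name, then Monday..Friday (1 = full day, 0.5 = half day, 0 = off).
# "Jan Ivar" is written as Jan_Ivar so every name is a single awk field.
schedule='Erlend     1 1 1   1 1
Hans-Henry 1 1 1   0 0
Hege       0 0 1   1 0
Jan_Ivar   1 1 1   1 1
Marte      0 0 0.5 1 1
Mikael     0 1 1   1 0
Tor        1 1 1   1 1'

# Reads schedule rows on stdin and prints "name days" plus a total line.
weekly_days() {
  awk '{
         days = 0
         for (i = 2; i <= NF; i++) days += $i
         total += days
         printf "%-10s %s\n", $1, days
       }
       END { printf "%-10s %s\n", "Total", total }'
}

printf '%s\n' "$schedule" | weekly_days
```

With the current table this adds up to 25.5 person-days per week for the core team.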
Daily status meeting
The core team has daily meetings at 09:30 every work day. These are short meetings meant to summarize what has
been worked on since yesterday, what will be done today and what blocks progress, if anything. Each team member is
expected to speak briefly about their own situation.
Daily meetings are held on Google Hangouts and published to the project calendar. They are also announced in the
chat room a few minutes before they start.
Weekly planning meeting
The weekly planning meeting is where we discuss direction, milestones and general progress. This is the place for any
larger topics or issues involving the full team. To schedule a topic for this meeting, project members make a card in
Trello and label it as Discussion.
The weekly planning meeting is held on Google Hangouts and published to the project calendar.
6.3.4 Project calendar
Meetings and events are published to a public Google calendar. It is possible to read it as a webpage or subscribe to it
in iCal format.
Right now you’ll need to use the webpage interface to find the Google Hangouts video links for each event. There is
a plan to update the event description field in the iCal data with the Hangout URL by using this Python code, but it has
not been done yet.
6.3.5 Social sharing platform
We have been using a NorCAMS Google+ community to share links to project-related information for a while.
Anyone with relevant content is free to use this as a channel. We put up a web redirect to the community page to make
it easier to find, it is at http://plus.norcams.org
6.4 Meetings
6.4.1 Planning meeting
Every Thursday at 13:00 we have a planning meeting. This is the main arena for discussing the project, choices we are
making and planning ahead.
2014-10-16: Planning meeting
Contents
• 2014-10-16: Planning meeting
– 1. Defining an MVP
– 2. Ceph training
– 3. Openstack Summit
– 4. Meetings with potential partners
Present: Jan Ivar, Tor, Erlend, Hege, Hans-Henry, Mikael
1. Defining an MVP
Jan Ivar brought up some ideas around MVP as a concept. We discussed defining an MVP and what it means, and got as
far as defining a few general limitations and some actions around networking.
Outcome The current list of characteristics defining our MVP
• Networking
– 2x 10Gb fiber (SR) to the core network at each location
– A single /24 public IPv4 subnet per site will be allocated for the infrastructure
• Feature limitations
– No redundancy for the Openstack services
– No central authN/Z
– No persistence (no booting from Cinder) for instances, we’ll use local disk on the compute nodes
Actions
• Write a short introduction about the plans for connectivity to the core network at each site. We will share this
with the local organizations so that they are aware of the plans. (Jan Ivar, Hege)
2. Ceph training
There’s a need for basic Ceph training as part of the project, for at least one person from each organization.
Outcome Using the online course is fine; it has a curriculum and dates posted at
http://www.inktank.com/university/ceph130/. Pricing is not too bad, about $1000 for two days.
Jan Ivar suggests that Tor, Marte, Erlend and Mikael do the training; they’ll need to check if they are allowed and
report back. Tor is ready and might even do it as early as next week. For the rest of the group we are looking at November
19-20.
3. Openstack Summit
We briefly discussed which sessions we want to focus on. It might make sense not to have everybody go
to the same sessions :-) Jan Ivar reported he will focus on networking (IPv6!) and OpenStack maintenance and release
engineering (packaging).
There’s an OpenStack Summit app for Android on Google Play.
4. Meetings with potential partners
The project is in a process where we are meeting with potential partners. Jan Ivar reported on the current status, and
it looks like we’re headed in a good direction. Interest in our project among vendors and external parties
is high.
Jan Ivar shared a presentation (mostly in Norwegian) he gave at one of the meetings.
Contents
– 1. MVP and networking
– 2. Illustrate and document cabling of the equipment
– 3. Keeping people busy the next weeks
– 4. Better ability to draw sketches during a meeting
Present: Jan Ivar, Mikael, Tor
1. MVP and networking
Outcome
• Jump hosts ideally need to provide redundancy for access across locations.
• Ideally we’d want access across the infrastructure for both in-band and out-of-band management (switches and
servers).
• It was (again) suggested that we use private IP addressing for at least the out-of-band management, maybe also
the in-band management.
• The discussion is not finalized
Actions
• Identify IP segments (/24 with possibility for /23)
• Write a spec of how we want the public (service) IP segment routed across two fibers (static routes, how does
redundancy/failover work with this?)
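To compare the two segment sizes mentioned above, the address counts follow directly from the prefix length. A minimal sketch, assuming the usual convention that the network and broadcast addresses cannot be assigned to hosts; the function name is hypothetical.

```shell
#!/bin/sh
# Compare the IPv4 segment sizes discussed: /24, with a possible /23.
# Assumes network and broadcast addresses are not assignable to hosts.

# Prints the total number of addresses in a prefix of the given length.
subnet_size() {
  echo $(( 1 << (32 - $1) ))
}

for prefix in 24 23; do
  total=$(subnet_size "$prefix")
  echo "/$prefix: $total addresses, $((total - 2)) usable for hosts"
done
```

A /24 gives 254 assignable addresses and a /23 gives 510, before subtracting any gateway or router addresses.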
2. Illustrate and document cabling of the equipment
We need to make a cabling illustration to show our current plan for connectivity.
Actions
• Jan Ivar, Tor and Raymond will make a draft next week. Hege will need to verify and ask questions :-)
3. Keeping people busy the next weeks
Jan Ivar and Tor are at Devopsdays Ghent next week, and a lot of people in and around the project will go to Paris for
Openstack Summit the week after.
We should write up some tasks and TODOs for winch to keep people busy when they feel ready to move on with
Puppet.
Actions
• Jan Ivar adds more open cards to Planned in Trello, focusing on winch and Puppet
4. Better ability to draw sketches during a meeting
When discussing we need a better ability to draw and sketch stuff during meetings.
Actions
• We tested a few solutions and https://awwapp.com seems like it might fit our needs. We’ll try using it over the
next few meetings.
Contents
– 1. Further definition of MVP
– 2. Puppet versus hostnames, certificates, global identifiers across sites
– 3. Project codebase versus winch?
– 4. Placement of shared OS images for cloud, testing
Present: Jan Ivar, Tor, Trond, Mikael, Hans-Henry, Erlend
1. Further definition of MVP
Hardware placement in racks, networks, jump hosts. We need to discuss these in more detail.
Outcome Jan Ivar has described (in a drawing) how to populate the rack at each site with the Dell hardware:
[Drawing: rack layout for the Dell hardware at each site (not reproduced)]
The routers and management switches have reversed air flow for our convenience. This will make cabling work easier.
We discussed in-band vs out-of-band management for the routers (compared to servers, that is). The management
ports on the routers will be treated as in-band.
The first revision of the IaaS will be set up in the simplest possible manner with regard to physical failover for the infrastructure
hosts, but will be fully cabled up to provide for future improvement.
Actions
• Jan Ivar creates a detailed cabling chart for all the equipment. The equipment
is to be identically set up at each site
2. Puppet versus hostnames, certificates, global identifiers across sites
Discussion around DNS, public vs internal IP addresses, naming standards and autosigning of Puppet certificates. This
meeting was not ready to make any decisions.
Actions
• The next meeting will feature some written material on the topics to better
facilitate a good debate, and better preparations.
3. Project codebase versus winch?
Winch will continue to be a side project for testing, training and code development. We need to start working on a
new production codebase. It is not yet clear who can start contributing, and when.
Actions
• Jan Ivar will create a skeleton for a new production codebase
• We will have a kickoff day for the new production codebase
• We’ll start iterating together on the codebase and growing it based on feature requests
4. Placement of shared OS images for cloud, testing
We need a web server where we can place images produced by the project.
Actions
• Short-term solution for now: we can all use web servers we already have
• We will create a fork of the bento project and integrate our own code
6.4.2 Technical steering group
Meetings in the technical steering group are called as needed. This group has a mandate granted by the UH-sky
leadership to govern the day-to-day coordination and management needs of the project. These meeting notes were
originally written in Norwegian.
2014-11-10: Technical steering group
Contents
• 2014-11-10: Technical steering group
– 1. The group's mandate
– 2. Meeting format and frequency
– 3. Reporting from the project
– 4. The project plan and status
– 5. Partnerships with vendors
– 6. Ongoing activities
– 7. Questions
Present: Kjetil Otter Olsen, Raymond Kristiansen, Ola Ervik, Per Markussen, Jan Ivar Beddari, Kristin Selvaag
1. The group's mandate
• Is it sufficiently and correctly defined?
• Is anything missing?
Outcome There is general agreement on the mandate. It was asked whether it is natural for the group to report to the UH-sky steering group via the program manager; the group does not see this as a problem. The program manager is best positioned to have an overview
of UH-sky as a whole. There is also informal day-to-day communication with the management level.
Actions
• Jan Ivar publishes the group's mandate together with the minutes from the first meeting
2. Meeting format and frequency
• Jan Ivar does not want to chair the meetings or write the minutes. It is important to keep the roles clearly separated.
• How often should the group meet?
• Email as a tool? Something else?
Outcome
• Kjetil chairs the meeting. At the next meeting, the minutes will not be written by Jan Ivar.
• The project can call the steering group to electronic meetings when needed; for now we are not setting
a fixed meeting frequency. When a meeting is called, it must be held no later than the work week after the matter is raised.
• On Thursday, January 22, there will be an in-person meeting at UNINETT in Trondheim.
• We want to use the Agora platform for sharing access-restricted documents, to the extent necessary.
• Minutes from the technical steering group are published in the project handbook, in the same place as the project's weekly
planning meetings.
Actions
• Jan Ivar and Kristin arrange access to Agora. We can possibly reuse the group that already exists
for UH-sky.
• Jan Ivar publishes the minutes from this meeting at https://iaas.readthedocs.org (under About the project, Meetings).
3. Reporting from the project
• Frequency and form of reporting from the project to the technical steering group?
• Level of detail? How much time can be expected from those who take part in the technical steering group?
Outcome The steering group wants regular insight into the project. The minutes from the weekly meetings serve this purpose
for seeing what we are working on, but they look ahead and do not have much of a reporting focus.
The meeting wants a monthly written report summarizing status against the goals in the project plan. This
report is to be sent by email to the technical steering group. The group comments on any gaps or ambiguities
before the same report is also posted on the UH-sky web pages.
On the question of expected time spent, we have no good answer and prefer to assess it case by case.
Actions
• Jan Ivar plans a monthly report in the first week of each month
• Jan Ivar writes a first report to be published on the UH-sky website
4. The project plan and status
Jan Ivar reports on ongoing activities and how the startup has gone, especially with regard to resources.
Outcome
• We checked the status of the hardware delivery; it looks like it will take a few weeks before everything has been delivered. Jan Ivar is
nevertheless not worried about progress, since the group's learning and mastery process is going well. We will not be held back by lack
of hardware until early December.
• We are aware of the difficulties that project participants with split positions live with every day. Tor Lædre (UIB)
and Erlend Midttun (NTNU) become key people in the project since they are allocated 100%. Tor's working time is by now
well protected, and Ola Ervik reports that this is also taken seriously for Erlend.
• Resources from UNINETT are to be allocated shortly; there is a meeting this coming Thursday. The group would like
feedback from that meeting as soon as the result is known. It is positive that the project looks set to gain the
competence it needs.
• At this point it is unlikely that UNINETT will allocate a total of 100% as the others have done. The group, through Kjetil,
will follow this up.
• The project meeting on Thursday will continue working on the definition of the minimum viable product. The project uses a
methodology close to agile software development. This is new to some, but understanding of the
approach appears to be taking hold well.
Actions
• Kjetil wants to raise the size of UNINETT's allocated resources with the UH-sky steering group once
the result is available. The meeting has no comments on this.
• Jan Ivar will follow up on the hardware delivery.
5. Partnerships with vendors
Reports from meetings with Dell and Red Hat about partnership in the project.
Outcome
• Dell makes the expertise of its internal OpenStack expert group available to us. We get “24 hours every 90 days”
of Paul Brook's time dedicated to us. This is to be added to the project calendar.
• A press release text from Dell is to be sent to the technical steering group. Dell will propose a written
partnership agreement.
• Some explanatory discussion about Red Hat and product versus open source. USIT's customer relationship with Red Hat
is somewhat different from the others'.
Actions
• Jan Ivar sends the text from Dell to everyone for review.
6. Ongoing activities
Reports from ongoing activities in the project.
Outcome
• Definition of the first minimum viable product; the focus is now on networking and the physical cabling diagram
• Ceph training, a course? Several people want this.
• Puppet project in the norcams/winch learning environment
7. Questions
Any other business.
Outcome No significant questions.
The meeting was adjourned at 15:25.
The technical steering group's mandate
Background
There is a need to make technical decisions along the way in the IaaS UH-sky project. The steering group will not be
operational enough to make technical decisions in the project, and therefore appoints a technical steering group for this purpose.
Mandate
1. The group shall make technical decisions within the project's mandate, goals, and financial and time constraints.
2. The technical steering group reports to the steering group via the program manager.
3. All decisions that will lead to changes in the schedule are reported to the steering group with a justification and can be overruled by it. Decisions that change the goals or financial constraints shall be presented to the steering group.
4. The technical steering group shall support the technical project manager, contribute to technical clarifications and actively help
the project reach its goals.
5. The technical steering group shall help anchor technical assessments, solutions and choices in the professional
communities of the participating organizations.
6. The technical project manager reports to the technical steering group on technical matters. Other reporting follows
the project's organization.
7. The technical steering group shall report significant risks it sees to the steering group.
6.5 Project plan and description
UH-sky IaaS platform development
• Project plan and description
– Descriptive summary
* Limitations
* Prerequisites
– Project goals and success criteria
* 1. Develop, document and deliver a base IaaS platform
* 2. Integration of authentication and authorization
* 3. Further develop and verify services to cover ‘traditional workloads’
* 4. Research and suggest a solution for PaaS
* 5. Research and suggest possible SaaS services
* 6. Research and specify a consumer-focused self-service portal
– Project milestones and scheduling
– Resources and budgeting
– Project organization and management
* Core development and engineering
* Technical steering group
* Top-level management and ownership
– Risks
– Appendix
* 1. Support for the Microsoft Windows operating system
* 2. Licensing of instances in the service
* 3. Calculating needed capacity for development
6.5.1 Descriptive summary
This document describes what the IaaS project will develop and deliver. The project aims to position IaaS as a common
building block and vessel for future IT infrastructure and services delivery in the academic sector.
The main project activity is developing, documenting and delivering an open source IaaS platform ready for production
use by June 15th 2015.
Additional activities that expand and build on this platform are also described. These activities will need to be
researched, discussed and specified in greater detail before they can be put into action. The project plan sets the
earliest start for these activities to February/March 2015.
The base IaaS platform will deliver these services:
• Compute
• Storage in 2 variants
– Block storage, accessible as virtual disks for compute instances
– Object storage, accessible over the network as an API
Limitations
• The project will not deliver traditional backup. A common definition of backup states that backup data must be
off-site and off-grid (e.g. tape). A planned property of the storage system is the ability to choose to have an instance
replicated to another location.
• The additional activities described are dependent on the base IaaS platform.
• Initial success criteria for the additional activities are described, but no cost estimates (resources, budget) are
given as part of this project plan.
Prerequisites
To be able to deliver the platform as described, on time, the project must get access to the needed
resources:
• At least 3 people must work full-time (100%) on the main project activity
• No roles less than 50%
• If split roles are used, each person must work with the project in alternating blocks of at least 3 consecutive days
The project will need at least 6 months from the Locations complete milestone to delivery of the platform. This means
that to deliver on time by June 15th 2015, procurement of the needed hardware must be completed within
2014. If hardware is delayed until 2015, the final delivery date will be delayed by the same amount of time, counted
from August 15th 2015, as June and July are not counted due to vacations. For example, if Locations complete is reached in
February 2015, final delivery will be October 15th 2015.
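The scheduling rule above can be sketched as a small shell function. It counts six working months forward from the month Locations complete is reached, always on the 15th, and skips the two months between mid-June and mid-August. The function name and the month granularity are assumptions made for illustration.

```shell
#!/bin/sh
# delivery_date YEAR MONTH (hypothetical helper)
# Counts 6 working months forward from the 15th of the given month.
# The months ending July 15 and August 15 fall in the June/July vacation
# and are not counted.
delivery_date() {
  y=$1; m=$2; counted=0
  while [ "$counted" -lt 6 ]; do
    m=$((m + 1))
    if [ "$m" -gt 12 ]; then m=1; y=$((y + 1)); fi
    if [ "$m" -eq 7 ] || [ "$m" -eq 8 ]; then continue; fi
    counted=$((counted + 1))
  done
  printf '%d-%02d-15\n' "$y" "$m"
}

delivery_date 2014 12   # Locations complete in December 2014
delivery_date 2015 2    # Locations complete in February 2015
```

The two calls print 2015-06-15 and 2015-10-15, matching the on-time case and the February example in the plan.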
6.5.2 Project goals and success criteria
The project will deliver a base IaaS platform to form a building block for future IT infrastructure delivery in the
academic sector.
The project has defined the following activities:
1. Develop, document and deliver a base IaaS platform
2. Integration of authentication and authorization
3. Further develop and verify services to cover ‘traditional workloads’
4. Research and suggest a solution for PaaS
5. Research and suggest possible SaaS services
6. Research and specify a consumer-focused self-service portal
Activities 1 and 2 were approved by the UH-sky steering group in June 2014.
To describe the activities a format similar to user stories is used. The stories share a common set of definitions:
service The base IaaS platform, including all services layered below
user A person within the academic sector (with an identity record in FEIDE) given rights to administer instances and
services on behalf of a tenant.
tenant An organization or unit within the Norwegian academic sector
administrator A person given responsibility and access to all the components of the service. This does not extend to
access rights to the resources of a tenant.
small instance A compute instance defined as 1 vCPU, 4GB RAM, 10GB storage
large instance A compute instance defined as 4 vCPU, 16GB RAM, 100GB storage
1. Develop, document and deliver a base IaaS platform
This is the main project activity.
• The service must deliver capacity for ~750 small instances or ~275 large instances with a total of 100TB accessible storage. This capacity should be divided equally across three geo-dispersed sites.
• The project must deliver a proof-of-concept PaaS solution able to offer three standardized development environments.
• The project must deliver proof-of-concept operation of at least one common service, in a SaaS-like model.
• The service must enable and document an expansion of the base platform to include (existing or new) HPC
environments and workloads
• The service must deliver data that can be used for billing tenants. The data delivered must be usable to identify
users, organizations and organization units.
• A user must be able to start an instance immediately after first login. The instance must be available within 60
seconds.
• A user must be able to create, update and delete instances in the service from a graphical user interface in a
browser, using an API or by using command line tools.
• A user must be able to select if an instance should have a persistent boot volume or not.
• A user must be able to assign and use more storage as needed, within a quota. Billing of storage must be per
usage, not per quota.
• A user should be able to place or move an instance geographically across the available locations. The choice
should be possible to make according to the users need for redundancy, resilience, geographical distance or other
factors.
• A user should be able to choose that an instance is replicated to other locations automatically, thus potentially
increasing protection against service outages.
• A user must be given the ability to monitor service performance and quality continuously.
• An administrator must use two-factor authentication for any access to the service for systems management and
maintenance purposes.
• An administrator must be able to expand capacity, plan and execute infrastructure changes and fix errors in all
parts of the service by using version-controlled code and automation. This key point should cover all operational
tasks like discovery, deployment, maintenance, monitoring and troubleshooting.
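To make the capacity target in the first requirement concrete, the aggregate resources it implies follow directly from the small and large instance definitions. A minimal sketch; the `totals` helper is hypothetical and only multiplies out the figures given in the definitions.

```shell
#!/bin/sh
# Aggregate resources implied by the capacity target, using the instance
# definitions from the project plan:
#   small instance: 1 vCPU, 4GB RAM, 10GB storage
#   large instance: 4 vCPU, 16GB RAM, 100GB storage

# totals COUNT VCPU RAM_GB DISK_GB (hypothetical helper, illustration only)
totals() {
  count=$1; vcpu=$2; ram_gb=$3; disk_gb=$4
  echo "$((count * vcpu)) vCPU, $((count * ram_gb)) GB RAM, $((count * disk_gb)) GB disk"
}

echo "~750 small: $(totals 750 1 4 10)"
echo "~275 large: $(totals 275 4 16 100)"
```

This gives 750 vCPU and 3000GB RAM in the small case, or 1100 vCPU and 4400GB RAM in the large case; the instances' own disks account for 7.5TB or 27.5TB of the 100TB storage target.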
2. Integration of authentication and authorization
• A user must be able to authenticate via FEIDE and be authorized as belonging to a tenant in the service
• Any FEIDE user passwords should NOT be stored in the service
Before the service can be used in a production scenario it is necessary to integrate central authentication and authorization. Users in the service must be identified as belonging to an organizational entity with correct billing information.
This activity must research and document a model and solution that shows how user- and organization data from
FEIDE (and other sources) can be integrated to cover the needs of the service. The model must be detailed enough to
make it possible to estimate cost and resource constraints for the solution.
Limitations in the chosen solution and model must be described. Suggestions and cost estimates for more advanced
id/authN/authZ models, e.g users and billing across organizational boundaries, must be discussed. An analysis and
assessment of integration with the UNINETT project FEIDE Connect should be done as part of this.
3. Further develop and verify services to cover ‘traditional workloads’
The base IaaS platform is planned to be built using OpenStack, a framework for building modern scalable cloud-centric
infrastructure. Traditional enterprise workloads, defined as long-lived instances with critical data and state kept as part
of the boot filesystem, are not as easily integrated into this framework. We believe a lot of our potential users would
also like the service to cover this class of workloads.
This activity integrates a solution tailored for traditional workloads with the base IaaS platform. OpenStack and its
service APIs are used to unify the solution so that the consumer side of the service is kept uniform. The solution
can make use of existing infrastructure at each site/location, possibly by utilizing existing excess capacity, or later by
expansion.
A key value proposition for this activity is to confirm and further develop the requirement that any solution, knowledge
and people working in the project are part of a shared pool of resources. Existing systems and available free capacity
vary greatly between locations, but this must not prevent any party from participating.
Licensing is an important question that this activity must address.
4. Research and suggest a solution for PaaS
There is a definite interest in PaaS as a concept in our communities. Earlier discussions have revealed that we would very
likely want to deliver some form of PaaS solution on top of the IaaS platform. Today, as far as we know, only UNINETT
and its internal Nova project have experience with PaaS as an environment.
This activity must research and suggest a form and model for a PaaS service delivered on top of the base IaaS platform.
The suggested solution must be described and cost must be estimated.
5. Research and suggest possible SaaS services
Several of the common IT services in the sector are already today delivered in models that are close to SaaS. From our
UH-sky viewpoint it is natural to look at these services as possible future migrations to the IaaS platform. This activity
must actively approach the sector on multiple fronts to find use cases and needs that could possibly fit in a SaaS model.
Early examples of such services could be software used in labs or classrooms. Is SPSS as a service possible?
6. Research and specify a consumer-focused self-service portal
This activity will define goals to enable a uniform, consumer-focused, self-service portal for all IaaS, PaaS (SaaS?)
related services. A central point for consuming the services is needed.
Functional aspects we’d need solved are:
• Chargeback. Automatically generated billing based on usage.
• Support for several cloud and virt providers, both private and public
• Possibility for migrating workloads/instances and data between different infrastructure providers
• Overview and monitoring of allocated resources across providers
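Chargeback, the first aspect listed above, can be sketched as metering resource-hours per tenant and pricing them. The plan does not specify a pricing model, so the billing units (vCPU-hours and GB-RAM-hours), the rates and the usage-record fields below are all hypothetical, chosen only to show the shape of the calculation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical metering record for one instance over one billing period.
    tenant: str
    vcpus: int
    ram_gb: float
    hours: float


# Hypothetical prices per unit-hour (currency unspecified).
RATE_PER_VCPU_HOUR = 0.10
RATE_PER_GB_RAM_HOUR = 0.02


def chargeback(records):
    """Sum usage-based cost per tenant from a list of usage records."""
    totals = defaultdict(float)
    for r in records:
        hourly = r.vcpus * RATE_PER_VCPU_HOUR + r.ram_gb * RATE_PER_GB_RAM_HOUR
        totals[r.tenant] += r.hours * hourly
    return dict(totals)


usage = [
    UsageRecord("uio", vcpus=1, ram_gb=6, hours=720),
    UsageRecord("uio", vcpus=4, ram_gb=24, hours=100),
    UsageRecord("uib", vcpus=4, ram_gb=24, hours=720),
]
print(chargeback(usage))
```

Pricing allocated resources per hour, rather than measured CPU load, keeps the bill predictable for tenants; whether that fits the sector is a question for the portal activity.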
There are several products today that cover most, if not all, of the functional aspects described. A central customer-focused portal should be developed using one of them as a base. A development project formed around this activity
will be only loosely coupled to the IaaS project, but we think it would be beneficial to wait until the core functionality
of the IaaS platform is in place.
6.5.3 Project milestones and scheduling
The following describes planned progress and possible startup dates for the project activities.
Activity | Date
Startup of activities 1 and 2 | June 2014
Minimum viable product. Per activity 1, one of three physical sites installed and running. | October 2014
Locations complete. All sites up and running. No storage or instance uptime guaranteed. | December 2014
Functionally complete. All functional goals completed and operative. No storage or instance uptime guaranteed. | February 2015
Incubation period. Pre-production tuning, testing and verification. Early customers given access. Best-effort storage consistency and instance uptime. Documenting any further development needed. | Feb.-Jun. 2015
Project delivery. Activities 1 and 2 delivered as described. | 15.6.2015
6.5.4 Resources and budgeting
This part of the project plan is not public.
6.5.5 Project organization and management
Core development and engineering
Day-to-day activities are led by technical project lead Jan Ivar Beddari. A weekly planning meeting is held on Thursdays
at 13:00, and daily “morning meetings” to keep track of activities are held at 09:30. Both regular meetings are held
online using video conferencing.
Core development and engineering team
• Erlend Midttun, NTNU
• Tor Lædre, University of Bergen
• Mikael Dalsgard, University of Oslo
• Hege Trosvik, University of Oslo
• Hans-Henry Jakbosen, University of Tromsø
• Marte Karidatter Skadsem, University of Tromsø
Technical steering group
The project reports to a technical steering group with representatives from all the participating organizations. Its main
function is to coordinate communication and solve issues that could block progress in the project. This group
is given a mandate by the top-level project management to specify its roles and functions.
Its members are:
• Kjetil Otter Olsen, University of Oslo (group lead)
• Per Markussen, University of Tromsø
• Ola Ervik, NTNU Norwegian University of Science and Technology
• Raymond Kristiansen, University of Bergen
• Kristin Selvaag, UNINETT, UH-sky
• Jan Ivar Beddari, UH IaaS tech project lead
Top-level management and ownership
The UH-sky steering group represents the top level project management and project ownership. This group consists
of the IT Directors from the four larger universities and representatives from university colleges and UNINETT, the
Norwegian NREN organization.
• Håkon Alstad, IT Director, NTNU Norwegian University of Science and Technology
• Lars Oftedal, IT Director, University of Oslo
• Stig Ørsje, IT Director, University of Tromsø
• Tore Burheim, IT Director, University of Bergen
• Thor-Inge Næsset, IT Manager, NHH Norwegian School of Economics
• Vidar Solheim, IT Director, HiST Sør-Trøndelag University College
• Frode Gether-Rønning, Head of IT-dept., AHO The Oslo School of Architecture and Design
• Petter Kongshaug, CEO, UNINETT
• Tor Holmen, Deputy CEO, UNINETT
Meetings in the steering group are organized by the UNINETT UH-sky program manager, Kristin Selvaag.
6.5.6 Risks
• The planned hardware investments will have a lifetime of at least four years. The risk involved with the investment
is considered low: all acquired hardware will be usable to its full extent in the local organizations even if the
project fails.
• Delays in progress (3 months or more) due to lack of access to resources, unforeseen technical or organizational complexities, or problems coordinating efforts across the participants are very likely.
• The risk of inaccuracies in cost estimates for hardware (both current and future) is not considered high. However, the project
does not estimate costs for production usage of the finished platform.
6.5.7 Appendix
Questions and additions for the goals and criteria
1. Support for the Microsoft Windows operating system
A basic Windows-based instance requires substantial capacity from the service when compared to a basic Linux-based
instance.
The project aims to support Windows instances in the best way possible. Testing done within the project will determine
what the technical solution will be. Windows will be tested in the service as large instances and performance will be
measured and compared to our existing virtualization infrastructures.
2. Licensing of instances in the service
The project will not handle or research licensing of instances in the service. Tenants must ensure that they are properly
licensed for all instances they create using the service. Microsoft and Red Hat are examples of vendors with software
products and operating systems that require licensing.
In a future production service we recommend negotiating agreements with vendors for site licensing. This could
potentially be more cost effective than purchasing licenses per tenant or organization. The project has so far not
planned or set aside resources towards this.
3. Calculating needed capacity for development
Back-of-a-napkin assessment of development compute capacity
• Physical cores (non-hyperthreaded): 2x 12 cores per node, 3x nodes, 3x sites = 216 cores
• Virtual cores: 4x oversubscription = 864 vCPU, 3x oversubscription = 648 vCPU
• RAM (no oversubscription): 512 GB per node, 3x nodes, 3x sites = 4608 GB raw capacity
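The arithmetic above can be reproduced with a small script, which makes it easy to re-run the estimate if node counts or CPU models change. The default inputs are the figures from the list; the function and parameter names are illustrative only.

```python
def raw_capacity(sockets=2, cores_per_socket=12, nodes_per_site=3, sites=3,
                 ram_gb_per_node=512):
    """Development compute capacity summed over all sites (no hyperthreading)."""
    nodes = nodes_per_site * sites
    cores = sockets * cores_per_socket * nodes
    return {
        "physical_cores": cores,
        "vcpu_at_4x": cores * 4,
        "vcpu_at_3x": cores * 3,
        "ram_gb": ram_gb_per_node * nodes,  # RAM is not oversubscribed
    }


print(raw_capacity())
# {'physical_cores': 216, 'vcpu_at_4x': 864, 'vcpu_at_3x': 648, 'ram_gb': 4608}
```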
Instances
• Small instances: 1 vCPU, ~6 GB RAM, 10 GB disk: ~72 instances per compute node, 648 total (at 3x CPU
oversubscription)
• Large instances: 4 vCPU, ~24 GB RAM, 100 GB disk: ~18 instances per compute node, 162 total (at 3x CPU
oversubscription)
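The per-node instance counts follow from whichever resource runs out first. The sketch below checks both the vCPU and the RAM limit, assuming 24 physical cores and 512 GB RAM per compute node, 3x CPU oversubscription and no RAM oversubscription, as in the capacity figures. It uses exactly 6 GB and 24 GB for the approximate RAM sizes, and leaves disk out because the plan gives no per-node disk figure. Under these assumptions CPU is the binding constraint for both instance sizes.

```python
CORES_PER_NODE = 24        # 2 x 12 physical cores
RAM_GB_PER_NODE = 512      # not oversubscribed
CPU_OVERSUBSCRIPTION = 3
COMPUTE_NODES = 9          # 3 nodes x 3 sites


def instances_per_node(vcpus, ram_gb):
    """Largest instance count that fits both the vCPU and the RAM budget of one node."""
    by_cpu = (CORES_PER_NODE * CPU_OVERSUBSCRIPTION) // vcpus
    by_ram = RAM_GB_PER_NODE // ram_gb
    return min(by_cpu, by_ram)


for name, vcpus, ram_gb in [("small", 1, 6), ("large", 4, 24)]:
    per_node = instances_per_node(vcpus, ram_gb)
    print(name, per_node, per_node * COMPUTE_NODES)
# small 72 648
# large 18 162
```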