Gabe Westmaas Rackspace @westmaas

Monitoring OpenStack at Scale
What Scale?
Tens of Thousands of Hosts
Hundreds of Thousands of Instances
Why are we monitoring?
Uptime
99.99% API Availability
99.9% Build Success Rate
100% Data Plane Availability
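To put the availability percentages above in concrete terms, here is a minimal sketch that converts a percentage into a downtime budget. The 30-day window and the function name are assumptions made for illustration; the slides do not say over what period availability is measured.

```python
# Illustrative only: converts an availability percentage into a downtime budget.
# The 30-day measurement window is an assumption, not something the slides state.

def downtime_budget_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Return the minutes of downtime allowed in the window at the given availability."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Availability figures taken from the Uptime slide.
for target in (99.99, 100.0):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% availability -> {minutes:.2f} minutes of downtime per 30 days")
```

Under that assumed window, 99.99% leaves roughly 4.3 minutes of downtime per 30 days, and 100% leaves none.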
Performance
Build Times
API Latency
Capacity
Memory Utilization
Empty Hosts
IPv4 Addresses
Tools
nagios
Sampling
stacktach/ceilometer
E-mail
slinky*
*More on this later
graphite
Fixing the Cloud
Mean Time to Resolution
slinky
Performance and Uptime Improvements
45% Build Time Reduction
99.95% API Availability
99.5% Build Success Rate
99.99% Data Plane Availability
What’s next?
Thank You
https://gist.github.com/westmaas/7227895