1 Raw Data is Ugly

Transcription

1 Raw Data is Ugly
1
Raw Data is Ugly
Graphical representations of data always look pretty in newspapers, magazines
and books. What you haven’t seen is the blood, sweat and tears that it sometimes takes to get those results. But you will!
http://www.hockeywidgets.com/newblog/uploaded_images/graph2775937.jpg
1
http://seatgeek.com/blog/mlb/5-useful-charts-for-baseball-fans-mlbticket-prices-by-day-time
http://skepticalsports.com/?p=791
2
The previous examples are informative graphical displays of data. They all
started life as bland data sets.
3
2
Pie Charts
Pie charts are a visual method for displaying the categories of a collection of
data. The size of a slice of the pie chart is proportional to the percentage of
data in that category.
Example 1 Consider the distribution of hits over the career of Babe Ruth. We
visually represent this data in a pie chart. We can label the slices with either
the raw numbers of percentages for each category
Babe Ruth
Single
Double
Triple
HR
Out
At Bats
1517
506
136
714
5526
Babe Ruth
Statistics
Pie chart with raw numbers
Pie chart with percentages and wedge
labels
4
Example 2 Here we display the number of games played, broken down by team,
for Michael Jordan and Nolan Ryan.
Michael Jordan
Bulls
Wizards
Games Played
930
142
Nolan Ryan
Mets
Angels
Astros
Rangers
Games Played
105
291
282
129
Games played by Team
Pie chart of Michael Jordan
basketball teams
Pie chart of Nolan Ryan baseball
teams
5
Example 3 You must be careful not to overload a pie chart with too much
detail. The distribution of tiles at the start of a game of Scrabble is given below.
This seems like a natural concept to represent with a pie chart. However, once
you construct the pie chart for this distribution it is easy to see that too much
information is contained in the picture.
Letter Frequency Letter Frequency Letter Frequency
a
9
j
1
s
4
b
2
k
1
t
6
c
2
l
4
u
4
d
4
m
2
v
2
e
12
n
6
w
2
f
2
o
8
x
1
g
3
p
2
y
2
h
2
q
1
z
1
i
9
r
6
blank
2
Frequencies of Scrabble tiles
3
Importing Data into Excel 2007
It is easy to use Excel 2007 to grab tables of data o¤ the web. Let’s say we wish
to import the table of Passing statistics of Joe Namath from the pro-footballreference.com website.
Joe Namath page at
football-reference.com
6
In an Excel 2007 spreadsheet, select the location where you wish to place the
table of data. Select the Data tab. Select From Web.
Preparing to import data into Excel
A new window will pop up. Type in the address of web page that contains the
desired data. You can use this window to surf until you …nd your destination.
Click the yellow arrow beside the desired table.
Locating data on the web
Click Import. Click Ok. Now you have the table of data in an Excel spreadsheet, ready for use.
7
Ready to use
4
Turning Data into Pie Charts
This table includes a lot of statistics and we may only be concerned with certain
variables. Let’s keep only the variables for Year, Games Played (G), Passes
Completed (Cmp), Passes Attempted (Att), Yards (Yds), Touchdowns (TD)
and Interceptions (Int). Rather than delete the individual pieces of data, we
wish to delete the whole column. To remove the column B which contains Age
data, right click B and select delete. You will see that this technique shifts all
the data to the left so we are not left with empty columns. Repeat until we
have only the desired variables remaining.
After the removal
of unwanted data
Let’s construct a pie chart for the number of games Namath played per season.
Highlight the years and games played column. Click Insert. Click Pie.
8
Making a pie chart
Excel now asks for a style of pie chart. Let’s keep it simple and select the …rst
option from the 2-D category.
The initial picture
Right now we have a colorful but highly uninformative picture. Let’s add
labels to our pie slices. With the pie chart selected, click Layout. Click Data
Labels. Click Best Fit. This will label each wedge its representative label.
Click on the title and expand its description.
9
Adding info
Excel provides lots of opportunities for customizing. We might wish to label
the wedges with the percentage of the pie it occupies and or include the category
name. Right click on any wedge in the pie and select Format Data Labels.
Check the boxes for Category Name and Percentage. Our pie chart now
has more information. Of course, we may run the risk of a visual that is too
crowded to be attractive. Stylistic choices of the pie chart fall into the category
of art rather than science.
A …nished product
Next let’s create a 3-D chart of touchdowns over the years for Joe Namath.
The touchdown column is not beside the years column. It is not di¢ cult to
select just those two columns. Highlight the years column. Hold the Ctrl key
while highlighting the TD column. Release the Ctrl key and we have selected
the two desired columns. Select the 3-D pie. Format the labels with just the
legend and percentages.
10
A …nished 3-D pie graph
Joe Namath played for one team in every season of his career. He never
changed teams mid-season. Due to this, his career statistics are very easy to
read and well organized. Namath played in a di¤erent era of sports. Players
today frequently change teams in the middle of a season. Consider the career
stats of Cli¤ Lee as of May 3, 2011. If we want to created a games played per
season pic chart for Cli¤ Lee, we cannot simply import the data into Excel and
create the pie chart. If we did so, we would have three pie wedges each for 2009
and 2010. We need to modify the data for our chart.
11
5
Exercises
1. Based on the following pie chart, which KSU student spent the most years
playing professional major league baseball?
KSU Owls in MLB
2. Import data from www.hockey-reference.com and construct a pie chart for
the number of games played broken down by team for hockey great Wayne
Gretzky.
3. Import data from www.basketball-reference.com/ and construct a pie chart
for the number of games played broken down by team for the shortest man
to play in the NBA, Tyrone "Muggsy" Bogues.
4. In week one of the 2010 NFL season, The Houston Texans beat the Indianapolis Colts 34 to 24. Rushing and receiving stats for the game appear
below.
Rushing
Arian Foster
Steve Slaton
Matt Schaub
Attempts
Receiving
Andre Johnson
Jacoby Jones
Kevin Walter
Owen Daniels
Arian Foster
Attempts
Yards
33
6
3
TD
231
29
-3
Yards
3
2
2
1
1
Lg
3
0
0
TD
33
29
29
9
7
42
13
-1
Lg
0
0
1
0
0
21
23
22
9
7
i. Construct a pie chart for yards passing for the …ve Texan receivers.
ii. What di¢ culty should you encounter if you attempt to construct a pie
chart for yards rushing? What does Excel do?
5. Go to en.wikipedia.org/wiki/Poker_probability and construct a pie chart
of the frequencies of 5 card poker hands.
12
6. Up to and including the year 2010 create a pie chart by team for the number of championships won in the ABA. Visit www.basketball-reference.com/playo¤s/
for the data.
7. Up to and including the year 2010 create a pie chart by team for the number of championships won in the NBA. Visit www.basketball-reference.com/playo¤s/
for the data.
8. Some statistics on Baseball Hall of Fame pitcher Steve Carlton are given
below.
Steve Carlton
Uniform Number
Rookie Year
Number of Teams
Career Wins
Cy Young Awards
Complete Games
32
1965
6
329
4
254
Does it make sense to use a pie chart to represent this data?
Quantitative data about Steve Carlton
13