1 Raw Data is Ugly
Graphical representations of data always look pretty in newspapers, magazines and books. What you haven’t seen is the blood, sweat and tears that it sometimes takes to get those results. But you will!

Figure sources:
http://www.hockeywidgets.com/newblog/uploaded_images/graph2775937.jpg
http://seatgeek.com/blog/mlb/5-useful-charts-for-baseball-fans-mlbticket-prices-by-day-time
http://skepticalsports.com/?p=791

The previous examples are informative graphical displays of data. They all started life as bland data sets.

2 Pie Charts

Pie charts are a visual method for displaying the categories of a collection of data. The size of a slice of the pie chart is proportional to the percentage of data in that category.

Example 1
Consider the distribution of hits over the career of Babe Ruth. We visually represent this data in a pie chart. We can label the slices with either the raw numbers or the percentages for each category. For instance, home runs account for 714 of the 8399 at bats, so that slice covers about 8.5% of the pie.

Babe Ruth   Single   Double   Triple   HR    Out
At Bats     1517     506      136      714   5526

Table: Babe Ruth Statistics

[Figure: Pie chart with raw numbers]
[Figure: Pie chart with percentages and wedge labels]

Example 2
Here we display the number of games played, broken down by team, for Michael Jordan and Nolan Ryan.

Michael Jordan   Bulls   Wizards
Games Played     930     142

Nolan Ryan       Mets   Angels   Astros   Rangers
Games Played     105    291      282      129

Table: Games played by Team

[Figure: Pie chart of Michael Jordan basketball teams]
[Figure: Pie chart of Nolan Ryan baseball teams]

Example 3
You must be careful not to overload a pie chart with too much detail. The distribution of tiles at the start of a game of Scrabble is given below. This seems like a natural concept to represent with a pie chart. However, once you construct the pie chart for this distribution, it is easy to see that the picture contains too much information.
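To see why this pie is crowded, the proportionality rule from the start of this section can be applied to the Scrabble tile counts listed in the frequency table. The Python below is only an illustrative sketch and is not part of the Excel workflow: the function name `pie_slices` and the 3% "thin wedge" cutoff are assumptions chosen for the example.

```python
# Tile counts copied from the Scrabble frequency table in this example.
SCRABBLE_TILES = {
    "a": 9, "b": 2, "c": 2, "d": 4, "e": 12, "f": 2, "g": 3, "h": 2, "i": 9,
    "j": 1, "k": 1, "l": 4, "m": 2, "n": 6, "o": 8, "p": 2, "q": 1, "r": 6,
    "s": 4, "t": 6, "u": 4, "v": 2, "w": 2, "x": 1, "y": 2, "z": 1, "blank": 2,
}


def pie_slices(counts):
    """Map each category to (percent of total, wedge angle in degrees)."""
    total = sum(counts.values())
    return {
        label: (100 * count / total, 360 * count / total)
        for label, count in counts.items()
    }


slices = pie_slices(SCRABBLE_TILES)

# Assumption: wedges under 3% of the pie are treated as too thin to label.
thin = [label for label, (percent, _) in slices.items() if percent < 3]

print(f"{len(slices)} wedges, {len(thin)} of them under 3% of the pie")
for label, (percent, degrees) in slices.items():
    print(f"{label:>5}: {percent:5.1f}%  {degrees:6.1f} degrees")
```

With 100 tiles spread over 27 categories, more than half of the wedges fall under the 3% cutoff, which is one way to quantify the crowding.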
Letter   Frequency     Letter   Frequency     Letter   Frequency
a        9             j        1             s        4
b        2             k        1             t        6
c        2             l        4             u        4
d        4             m        2             v        2
e        12            n        6             w        2
f        2             o        8             x        1
g        3             p        2             y        2
h        2             q        1             z        1
i        9             r        6             blank    2

Table: Frequencies of Scrabble tiles

3 Importing Data into Excel 2007

It is easy to use Excel 2007 to grab tables of data off the web. Let’s say we wish to import the table of Passing statistics of Joe Namath from the pro-football-reference.com website.

[Figure: Joe Namath page at pro-football-reference.com]

In an Excel 2007 spreadsheet, select the location where you wish to place the table of data. Select the Data tab. Select From Web.

[Figure: Preparing to import data into Excel]

A new window will pop up. Type in the address of the web page that contains the desired data. You can use this window to surf until you find your destination. Click the yellow arrow beside the desired table.

[Figure: Locating data on the web]

Click Import. Click OK. Now you have the table of data in an Excel spreadsheet, ready for use.

[Figure: Ready to use]

4 Turning Data into Pie Charts

This table includes a lot of statistics, and we may only be concerned with certain variables. Let’s keep only the variables for Year, Games Played (G), Passes Completed (Cmp), Passes Attempted (Att), Yards (Yds), Touchdowns (TD) and Interceptions (Int). Rather than delete the individual pieces of data, we wish to delete whole columns. To remove column B, which contains the Age data, right-click the B column header and select Delete. This shifts all the remaining data to the left, so we are not left with empty columns. Repeat until only the desired variables remain.

[Figure: After the removal of unwanted data]

Let’s construct a pie chart for the number of games Namath played per season. Highlight the Year and Games Played columns. Click Insert. Click Pie.

[Figure: Making a pie chart]

Excel now asks for a style of pie chart. Let’s keep it simple and select the first option from the 2-D category.

[Figure: The initial picture]

Right now we have a colorful but highly uninformative picture.
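The column-trimming step described earlier in this section can also be sketched outside Excel. The Python below is a minimal illustration and not the handout's method. It assumes the imported table has been saved as CSV text with the column names used in the text. The helper names `keep_columns` and `pie_input` are hypothetical, and every number in `RAW_TABLE` is a made-up placeholder, not a real statistic for Joe Namath or any other player.

```python
import csv
import io

# Hypothetical stand-in for an imported table. Column names follow the text;
# all values are placeholders invented for illustration, NOT real statistics.
RAW_TABLE = """Year,Age,Tm,G,Cmp,Att,Yds,TD,Int
2001,22,AAA,10,100,200,1500,10,8
2002,23,AAA,12,120,240,1800,12,9
2003,24,AAA,8,80,160,1100,6,7
"""

# The variables the text chooses to keep, in the order they should appear.
KEEP = ["Year", "G", "Cmp", "Att", "Yds", "TD", "Int"]


def keep_columns(csv_text, wanted):
    """Return the rows of csv_text as dicts holding only the wanted columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in wanted} for row in reader]


def pie_input(rows, label_col, value_col):
    """Pair each label with its numeric value, as a pie chart would need."""
    return [(row[label_col], int(row[value_col])) for row in rows]


trimmed = keep_columns(RAW_TABLE, KEEP)
games_by_year = pie_input(trimmed, "Year", "G")

print(list(trimmed[0].keys()))
print(games_by_year)
```

As in Excel, dropping a column leaves the remaining columns adjacent, and the (Year, G) pairs correspond to the two columns highlighted before clicking Insert and Pie.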
Let’s add labels to our pie slices. With the pie chart selected, click Layout. Click Data Labels. Click Best Fit. This will label each wedge with its representative label. Click on the title and expand its description.

[Figure: Adding info]

Excel provides lots of opportunities for customizing. We might wish to label each wedge with the percentage of the pie it occupies, the category name, or both. Right-click on any wedge in the pie and select Format Data Labels. Check the boxes for Category Name and Percentage. Our pie chart now has more information. Of course, we run the risk of a visual that is too crowded to be attractive. Stylistic choices for a pie chart fall into the category of art rather than science.

[Figure: A finished product]

Next let’s create a 3-D chart of touchdowns over the years for Joe Namath. The touchdown column is not beside the years column, but it is not difficult to select just those two columns. Highlight the years column. Hold the Ctrl key while highlighting the TD column. Release the Ctrl key and we have selected the two desired columns. Select the 3-D pie. Format the labels with just the legend and percentages.

[Figure: A finished 3-D pie graph]

Joe Namath played for one team in every season of his career. He never changed teams mid-season. Because of this, his career statistics are well organized and very easy to read. Namath played in a different era of sports. Players today frequently change teams in the middle of a season. Consider the career stats of Cliff Lee as of May 3, 2011. If we want to create a games played per season pie chart for Cliff Lee, we cannot simply import the data into Excel and create the pie chart. If we did so, we would have three pie wedges each for 2009 and 2010. We need to modify the data for our chart.

5 Exercises

1. Based on the following pie chart, which KSU student spent the most years playing professional major league baseball?

[Figure: KSU Owls in MLB]

2. Import data from www.hockey-reference.com and construct a pie chart for the number of games played broken down by team for hockey great Wayne Gretzky.

3. Import data from www.basketball-reference.com/ and construct a pie chart for the number of games played broken down by team for the shortest man to play in the NBA, Tyrone "Muggsy" Bogues.

4. In week one of the 2010 NFL season, the Houston Texans beat the Indianapolis Colts 34 to 24. Rushing and receiving stats for the game appear below.

Rushing          Attempts   Yards   TD   Lg
Arian Foster     33         231     3    42
Steve Slaton     6          29      0    13
Matt Schaub      3          -3      0    -1

Receiving        Attempts   Yards   TD   Lg
Andre Johnson    3          33      0    21
Jacoby Jones     2          29      0    23
Kevin Walter     2          29      1    22
Owen Daniels     1          9       0    9
Arian Foster     1          7       0    7

i. Construct a pie chart for yards receiving for the five Texan receivers.
ii. What difficulty should you encounter if you attempt to construct a pie chart for yards rushing? What does Excel do?

5. Go to en.wikipedia.org/wiki/Poker_probability and construct a pie chart of the frequencies of 5 card poker hands.

6. Create a pie chart by team for the number of championships won in the ABA, up to and including the year 2010. Visit www.basketball-reference.com/playoffs/ for the data.

7. Create a pie chart by team for the number of championships won in the NBA, up to and including the year 2010. Visit www.basketball-reference.com/playoffs/ for the data.

8. Some statistics on Baseball Hall of Fame pitcher Steve Carlton are given below.

Steve Carlton
Uniform Number     32
Rookie Year        1965
Number of Teams    6
Career Wins        329
Cy Young Awards    4
Complete Games     254

Table: Quantitative data about Steve Carlton

Does it make sense to use a pie chart to represent this data?