§ 1: Organising data in a stem-and-leaf display, drawing box-and-whisker diagrams and working with measures of dispersion around the median

The material in this workshop covers some aspects of the following Core Assessment Standards:

AS 10.4.1 (a) Collect, organise and interpret univariate numerical data in order to determine:
• Measures of central tendency (mean, median and mode) of ungrouped data, and know which is the most appropriate under given conditions
• Measures of dispersion: range, quartile, interquartile range

AS 11.4.1 (a) Calculate and represent measures of central tendency and dispersion in univariate numerical data using:
• The five-number summary (maximum, minimum and quartiles)
• Box-and-whisker diagrams

MEASURES OF CENTRAL TENDENCY

Suppose you have two sets of data. It is often difficult to compare them just by looking at them. However, if you had a single value to represent each set, the two sets would be easier to compare. We call this single value the average, or the MEASURE OF CENTRAL TENDENCY.

The measure of central tendency gives you an impression of the data without you having to look at all the data items; it shows what is typical in the set of data.

There are several different types of averages or measures of central tendency used in statistics. The three most common ones are:
• The mean – the equal shares average;
• The median – the middle value;
• The mode – the value that occurs most often.

Which of these averages you use depends on the information you need your data to show.

An activity will help explain these terms. You will work out the mean, the median and the mode of some data sets.
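The three averages described above can also be worked out with a short program. The following is only a minimal sketch in Python: the list of marks is hypothetical (it is not Thandi's data), and the function names are chosen here for illustration.

```python
from collections import Counter


def mean(values):
    """Equal-shares average: the total divided by the number of items."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ranked data (average of the two middle
    values when the number of items is even)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def modes(values):
    """All values that occur most often (there may be more than one)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical marks, used only to illustrate the three averages.
marks = [40, 55, 55, 60, 70, 80]

print(mean(marks))    # 60.0
print(median(marks))  # 57.5
print(modes(marks))   # [55]
```

Notice that the three averages of the same data set need not be equal, which is why you have to decide which one suits your purpose.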
Written by Jackie Scheiber and Meg Dickson © RADMASTE Centre, University of the Witwatersrand – May 2007
§1: Stem-and-leaf display; box-and-whisker; interquartile range
FET Data Handling

Activity 1

Thandi’s test results at the end of grade 10 are as follows:
English 63%; Geography 63%; History 25%; Technology 37%; Maths 57%; Zulu 77%; Biology 31%; Life Orientation: no exam mark

Mean
When you work out the average of Thandi’s school marks you are actually finding the arithmetic mean. What you are doing is finding the total of all the marks and then sharing it equally amongst the 7 subjects. In general the mean is denoted by $\bar{x}$ (pronounced "x bar").
To calculate the mean you use a formula:

$\bar{x} = \frac{\text{total or sum of all items}}{\text{number of items}} = \frac{\sum x}{n}$

1) Calculate (correct to one decimal place) the mean of Thandi’s marks.
2) Write the marks in the table from lowest to highest, i.e. rank the marks.

Subject | Mark
[blank table to complete]

Median
The median is the middle score or item when the set of data is arranged in order from smallest to biggest. The median is not affected by extremes.

3) You wrote Thandi’s marks in the table in ascending order. Counting from each end, which mark is in the middle? Which subject is it?

Mode
The mode is the score or item that occurs most often. It is usually very easy to find, especially from a frequency table.

4) Study the list of marks in ascending order. Which mark occurs most often? Which subjects does this mark relate to?
5) Which of the three averages do you think is the most useful in informing you about how well Thandi is doing at school? Give a reason for your answer.

ORGANISING DATA USING A STEM-AND-LEAF DIAGRAM

In the 1960s John Tukey, an American mathematician and statistician, devised a new way of organising data. He called this method a stem-and-leaf diagram.
In a stem-and-leaf diagram the data is listed in intervals that depend on the place value of the digits of each data item.

Example: Suppose the 40 learners in your maths class scored the following marks in a maths test:

32 ; 56 ; 45 ; 78 ; 77 ; 59 ; 65 ; 54 ; 54 ; 39 ;
45 ; 44 ; 52 ; 47 ; 50 ; 52 ; 51 ; 40 ; 69 ; 72 ;
36 ; 57 ; 55 ; 47 ; 33 ; 39 ; 66 ; 61 ; 48 ; 45 ;
53 ; 57 ; 56 ; 55 ; 71 ; 63 ; 62 ; 65 ; 58 ; 55

This list of numbers has little meaning as it is.
• One way of organising the numbers would be to rearrange them in descending or ascending order.
• Another way would be to draw a stem-and-leaf diagram: the tens digit forms the stem and the units digits the leaves.

The first number is 32: the stem is 3 and the leaf is 2.
The second number is 56: the stem is 5 and the leaf is 6.

stem | leaves
3 | 2
4 |
5 | 6
6 |
7 |

Note:
• The leaf is the ‘units’ digit, i.e. furthest to the right in the number.
• The stem is the ‘tens’ digit, i.e. furthest to the left in a two-digit number.
• If the number includes ‘hundreds’ and ‘thousands’ digits then the stem includes these digits as well. So if the list of numbers included 120 ; 134 ; 127, then 12 and 13 would be the stems and 0 ; 4 and 7 would be leaves.

12 | 0 7
13 | 4

• If the list of numbers includes a single digit number then the stem must be 0. So if the list of numbers includes the numbers 2 ; 3 ; 7 you write them as 02 ; 03 ; 07. The stem is 0 and the leaves are 2 ; 3 and 7.

0 | 2 3 7

• Be careful how you write the ‘leaves’. Line up the leaves in each row underneath the leaves in the row above (squared paper will help you do this).
• When you have entered the stem and leaves on to the display, redraw the display with the leaves written in ascending order. This makes it easier to read.
• It is easy to find the median from a stem-and-leaf diagram. You can just count the leaves (i.e. each data item).
• Sometimes two data sets can be written as displays on either side of the same stem.

The list of maths marks given above looks like this as a stem-and-leaf diagram:

stem | leaves
3 | 2 3 6 9 9
4 | 0 4 5 5 5 7 7 8
5 | 0 1 2 2 3 4 4 5 5 5 6 6 7 7 8 9
6 | 1 2 3 5 5 6 9
7 | 1 2 7 8

Notice that the stem-and-leaf display looks like a horizontal histogram.

Activity 2

Thirty learners were asked in a survey to say how many hours in a week they spent watching TV. Their answers (correct to the nearest hour) are as follows:

12 20 13 15 22 3 6 24 20 15
9 12 5 6 8 30 7 12 14 25
2 6 12 20 20 18 3 18 8 9

1) Draw a stem-and-leaf display to organise this data.
2) Find
a. the mode of the data
b. the median of the data

SOLUTION

Step 1: Collect the data on the table

Stem | Leaves
[blank table to complete]

Step 2: Arrange the leaves in ascending order

Stem | Leaves
[blank table to complete]

KEY: 2/5 = 25

Step 3: Find the mode and the median of the data
Mode = ……………………………………………………..
Median = …………………………………………………...

RANGE

• When looking at data, measures of central tendency (or averages) give a general idea of the group that they represent.
• However, taken by themselves, averages do not give a complete picture.
• To get a better idea of the data it is necessary to know how the rest of the data is grouped around the average: whether it is closely grouped or scattered more widely. We need a measure of the spread, scattering or dispersion of the scores.

The range is the simplest measure of spread. It is the difference between the largest and smallest values in the data.
The range gives an indication of how much the values of data vary.

Range = highest value – lowest value

Two sets of data may have the same mean but may be very different. The range can help you compare them.

[Figure: two pairs of heights. First pair: 150 cm and 150 cm; mean height = 150 cm, range = 0 cm. Second pair: 200 cm and 100 cm; mean height = 150 cm, range = 100 cm.]

Note:
• The bigger the range, the more spread out the data. However, the range only uses the highest and the lowest data items; it does not take any other data value into account.

QUARTILES

In many experimental situations we tend to distrust extreme measures, as they may have resulted from poor measurement or from behaviour that is not usual. As a result, extreme measures are often discarded. For this reason the middle section of a data set, where most of the data lies, gives you the best description of the data.

The median divides the distribution of a data set into two halves. Each half can then be divided in half again:
○ the lower quartile (Q1) is the median of the first half of the data set
○ the upper quartile (Q3) is the median of the second half of the data set.

Quartiles therefore divide the distribution into four equal parts.

[Diagram: a set of data items divided into 4 equal parts, with the lower quartile (Q1), the median (M) and the upper quartile (Q3) marked.]

• The lower quartile (Q1) is a quarter of the way through the distribution.
• The middle quartile, which is the same as the median (M), is midway through the distribution.
• The upper quartile (Q3) is three quarters of the way through the distribution.

Activity 3

1) For each of the following sets of data, find the median, the lower quartile, Q1, and the upper quartile, Q3.

a) 3 4 5 6 6 7 8 9 9 10 11
Median = ……………..   Q1 = …………………   Q3 = ………………….

b) 20 20 22 24 28 29 30 35 36 36
Median = ……………..   Q1 = …………………   Q3 = ………………….

c) 12 12 12 14 15 16 17 18 18 20 21 25
Median = ……………..   Q1 = …………………   Q3 = ………………….

2) Two sets of data are given below.
Set 1: 57 ; 51 ; 60 ; 52 ; 63 ; 45 ; 51 ; 57 ; 72 ; 48 ; 66 ; 49 ; 73 ; 64 ; 67 ; 52 ; 55 ; 80
Set 2: 37 ; 90 ; 68 ; 38 ; 49 ; 95 ; 27 ; 79 ; 87 ; 19 ; 59 ; 76 ; 42 ; 48 ; 73 ; 85 ; 40 ; 50

a) Draw a back-to-back stem-and-leaf diagram for each data set.

Leaves for Set 1 | Stem | Leaves for Set 2
                 |  1   |
                 |  2   |
                 |  3   |
                 |  4   |
                 |  5   |
                 |  6   |
                 |  7   |
                 |  8   |
                 |  9   |

Leaves for Set 1 | Stem | Leaves for Set 2
                 |  1   |
                 |  2   |
                 |  3   |
                 |  4   |
                 |  5   |
                 |  6   |
                 |  7   |
                 |  8   |
                 |  9   |

KEY: 8/0 = 80

b) Find the median, lower quartile and upper quartile for each data set.

Set 1                               | Set 2
Median = ………………………….                | Median = ………………………….
Q1 = ………………………………..                 | Q1 = ………………………………..
Q3 = ………………………………..                 | Q3 = ………………………………...

FIVE-NUMBER SUMMARIES

The median and the upper and lower quartiles tell you about the middle of a distribution but not about the extremes. If you include the minimum and maximum values, along with the median and quartiles, you get the five-number summary of a set of data.

The five-number summary for a set of values consists of
1. Minimum: the smallest value in the set of data
2. Lower quartile (Q1): the median of the lowest half of the values
3. Median (M): the value that divides the data into halves
4. Upper quartile (Q3): the median of the upper half of the values
5. Maximum: the largest value in the set of data

With a stem-and-leaf diagram it is very easy to COUNT the data items to find the quartiles and the median; the minimum and maximum can be read from the beginning and end of the display.
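The five values listed above can be sketched as a short Python function. This is only an illustration under stated assumptions: it uses the "median of each half" rule given in this section, it leaves the median itself out of both halves when the number of items is odd, and it needs at least two data items. The function names and the ten-item data set are hypothetical.

```python
def median(ranked):
    """Median of a list that is already in ascending order."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the
    ranked data; for an odd number of items the middle item is left
    out of both halves (an assumption of this sketch).
    """
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower_half = ranked[:half]
    upper_half = ranked[half + 1:] if n % 2 == 1 else ranked[half:]
    return (ranked[0], median(lower_half), median(ranked),
            median(upper_half), ranked[-1])


# Hypothetical data set with an even number of items.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
print(five_number_summary(data))  # (2, 4, 7.5, 10, 15)
```

Applied to the data 1, 5, 7, 8, 8, 14, 17 from the box-and-whisker example in this section, the same function returns (1, 5, 8, 14, 17), which agrees with the five-number summary worked out there.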
Example: Eighteen numbers were listed on a stem-and-leaf plot as follows (n = 18).

[Stem-and-leaf diagram of the 18 values with stems 1 to 7. Q1 = 30 and Q3 = 42 are marked, and the median lies between the 9th and 10th data items (35 and 39). KEY: 3/5 = 35]

The five-number summary is:
Minimum = 11
Q1 = 30
$M = \frac{35 + 39}{2} = \frac{74}{2} = 37$
Q3 = 42
Maximum = 70

Activity 4

The following stem-and-leaf diagram shows the average lifespan of different mammals.

Stem | Leaves
0 | 13455567788
1 | 00022222222255555556
2 | 00005
3 | 5
4 | 1

KEY: 2/0 = 20

List the five-number summary.

BOX AND WHISKER DIAGRAMS

A box-and-whisker diagram is a way of representing data that shows the median, the quartiles and the maximum and minimum values of the data set. This way of displaying data was invented by John Tukey in the 1960s.

To draw the diagram
1. Draw a number line, starting at the minimum value of your data and ending at the maximum value of your data.
2. Draw a line at the median (M).
3. Draw lines at the lower and upper quartiles (Q1 and Q3).
4. Connect the lower and the upper quartiles to form a box. The median will fall somewhere within the box.
5. Plot a point at the minimum value and join this to the box – the left hand whisker.
6. Plot a point at the maximum value and join this to the box – the right hand whisker.
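The six drawing steps above can be imitated in plain text. The sketch below is a rough illustration, not a proper plotting tool: it assumes the five values are whole numbers, uses one character per unit on the number line, and the function name, the symbols and the five-number summary used are all hypothetical.

```python
def draw_box_and_whisker(minimum, q1, median, q3, maximum):
    """Return a one-line text picture of a box-and-whisker diagram.

    One character stands for one unit on the number line, which runs
    from the minimum to the maximum (step 1). Assumes whole numbers.
    """
    cells = []
    for value in range(minimum, maximum + 1):
        if value == median:
            cells.append("M")   # step 2: line at the median
        elif value == q1:
            cells.append("[")   # step 3: line at the lower quartile
        elif value == q3:
            cells.append("]")   # step 3: line at the upper quartile
        elif value in (minimum, maximum):
            cells.append("*")   # steps 5 and 6: end points of the whiskers
        elif q1 < value < q3:
            cells.append("=")   # step 4: the box
        else:
            cells.append("-")   # steps 5 and 6: the whiskers
    return "".join(cells)


# Hypothetical five-number summary: min 2, Q1 4, M 7, Q3 10, max 15.
print(draw_box_and_whisker(2, 4, 7, 10, 15))  # *-[==M==]----*
```

In the printed line the right hand whisker is longer than the left hand one, because the values in the top quarter of this made-up data set are more spread out than those in the bottom quarter.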
Example

A box-and-whisker diagram can be used to display the following data: 1, 5, 7, 8, 8, 14, 17

STEP 1: Work out the five-number summary:
Median = 8
Q1 = 5
Q3 = 14
Minimum = 1
Maximum = 17

STEP 2: Draw the box-and-whisker diagram

[Box-and-whisker diagram above a number line from 0 to 18: left hand whisker from 1 to 5, box from 5 to 14 with the median line at 8, right hand whisker from 14 to 17.]

Note:
• The lines for the first and third quartiles are joined to make a box corresponding to the central 50% of the data items.
• The whiskers are taken out from the box to the maximum and minimum values.
• Remember that the quartiles and the median divide the data items into four equal groups. Each section has the same number of data items in it. However, the box and the whiskers may be of varying lengths, as these are influenced by the actual values of the data items. Some box-and-whisker diagrams are symmetrical (showing a very even spread of data); others might be more squashed up (showing skewed data).

Activity 5

1) 30 students were each asked to pick a number at random between 0 and 20. Here are the results:

12 ; 7 ; 8 ; 3 ; 5 ; 7 ; 10 ; 13 ; 7 ; 10 ; 2 ; 1 ; 11 ; 12 ; 17
4 ; 11 ; 7 ; 6 ; 8 ; 4 ; 7 ; 11 ; 9 ; 1 ; 12 ; 10 ; 12 ; 2 ; 15

a) Draw a stem-and-leaf display of the data and find the five-number summary.
b) Draw a box-and-whisker diagram to illustrate the information.

2) Which data set matches this box diagram?
(More than one answer may be correct)

[Box-and-whisker diagram above a number line marked 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43]

a) 23 ; 25 ; 26 ; 28 ; 28 ; 28 ; 28 ; 30 ; 31 ; 33 ; 41 ; 43
b) 23 ; 23 ; 24 ; 25 ; 26 ; 27 ; 29 ; 30 ; 31 ; 33 ; 41 ; 43
c) 23 ; 27 ; 28 ; 28 ; 33 ; 43
d) 23 ; 27 ; 28 ; 28 ; 29 ; 32 ; 43

THE INTERQUARTILE RANGE

• You know that the range of a data set looks at the maximum and the minimum values.
• The measure of spread that relates to the middle section of the data is called the interquartile range (IQR). Approximately 50% of the data items lie within the interquartile range.

The difference between the upper and lower quartiles is called the interquartile range.

The Interquartile Range = Upper Quartile – Lower Quartile
IQR = Q3 – Q1

Example: Find the interquartile range of the following set of scores
12, 12, 3, 7, 9, 9, 5, 6, 3, 4, 5, 11

STEP 1: Rank the scores
3 3 4 5 5 6 7 9 9 11 12 12

STEP 2: Find the median and the quartiles
$\text{Median} = \frac{6 + 7}{2} = \frac{13}{2} = 6{,}5$ (Note: the median is not a data item)
$Q_1 = \frac{4 + 5}{2} = \frac{9}{2} = 4{,}5$
$Q_3 = \frac{9 + 11}{2} = \frac{20}{2} = 10$

3 3 4 | 5 5 6 | 7 9 9 | 11 12 12
      Q1      M       Q3

STEP 3: Calculate the IQR:
IQR = Q3 – Q1 = 10 – 4,5 = 5,5

The IQR shows that approximately half of the learners’ scores lie within a range of 5,5 marks around the median.

A box-and-whisker diagram of this data would be:

[Box-and-whisker diagram above a number line from 3 to 12: whiskers from 3 to 4,5 and from 10 to 12, box from 4,5 to 10 labelled IQR, median line at 6,5.]

Note: On a box-and-whisker diagram the ‘box’ represents the IQR.
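The three steps of the example can be checked with a short Python sketch. It is only an illustration: the function names are hypothetical, the quartiles are found as the medians of the two halves of the ranked data (as in STEP 2), and Python prints a decimal point where this worksheet writes a decimal comma.

```python
def median(ranked):
    """Median of a list that is already in ascending order."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the median-of-each-half rule.

    For an odd number of items the middle item is left out of both
    halves (an assumption of this sketch).
    """
    ranked = sorted(values)                # STEP 1: rank the scores
    n = len(ranked)
    half = n // 2
    lower_half = ranked[:half]
    upper_half = ranked[half + 1:] if n % 2 == 1 else ranked[half:]
    q1 = median(lower_half)                # STEP 2: find the quartiles
    q3 = median(upper_half)
    return q1, q3, q3 - q1                 # STEP 3: IQR = Q3 - Q1


# The ranked scores from STEP 1 of the example.
scores = [3, 3, 4, 5, 5, 6, 7, 9, 9, 11, 12, 12]
print(interquartile_range(scores))  # (4.5, 10.0, 5.5)
```

The printed values correspond to Q1 = 4,5, Q3 = 10 and IQR = 5,5 in the worked example.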
Activity 6

The following set of marks was obtained by 80 learners for a maths exam that was marked out of 60:

54 52 31 41 29 8 39 32
34 38 28 40 9 30 47 18
36 39 32 38 34 49 25 38
47 20 37 19 29 23 13 24
24 46 43 43 8 58 27 27
36 32 32 33 13 34 32 48
27 27 23 52 12 35 37 36
15 31 33 37 33 47 23 45
44 33 31 38 35 16 24 18
26 57 21 39 48 21 37 8

1) Draw up a stem-and-leaf diagram to organise the data.
2) Use the display to find
a) the median mark ………………………………………………………………………….
b) the upper and lower quartiles ……………………………………………………………
c) the IQR …………………………………………………………………………………………
3) Draw a box-and-whisker diagram to show the data.
4) How many learners passed if the pass mark is 40/60? ………………………………
5) Between which marks did the middle 50% of the learners lie? ..............................