§ 1: Organising data in a stem-and-leaf display, drawing box-and-whisker diagrams and working with measures of dispersion around the median
The material in this workshop covers some aspects of the following Core Assessment Standards:
AS 10.4.1 (a)
Collect, organise and interpret univariate numerical data in order to determine:
• Measures of central tendency (mean, median and mode) of ungrouped data, and know which is the most appropriate under given conditions
• Measures of dispersion: range, quartiles, interquartile range
AS 11.4.1 (a)
Calculate and represent measures of central tendency and dispersion in univariate numerical data using:
• The five-number summary (maximum, minimum and quartiles)
• Box-and-whisker diagrams
MEASURES OF CENTRAL TENDENCY
Suppose you have two sets of data. It is often difficult to compare these two sets of
data by just looking at them. However, if we had a single value to represent each of
these sets of data, the two sets would be easier to compare. We call this single
value the average, or the MEASURE OF CENTRAL TENDENCY.
The measure of central tendency gives you an impression of the data without you
having to look at all the data items; it shows what is typical in the set of data.
There are several different types of averages or measures of central tendency used
in statistics. The three most common ones are:
• The mean – the equal shares average;
• The median – the middle value;
• The mode – the value that occurs most often.
Which of these averages you use depends on what you want to look at in your data, that is, on the sort of information you need your data to show.
An activity will help explain these terms. You will work out the mean, the median
and the mode of some data sets.
Written by Jackie Scheiber and Meg Dickson
© RADMASTE Centre, University of the Witwatersrand – May 2007
§1: Stem-and leaf display; box-and whisker; interquartile range
FET Data Handling
Activity 1
Thandi’s test results at the end of grade 10 are as follows:
English 63%;
Geography 63%;
History 25%;
Technology 37%;
Maths 57%;
Zulu 77%;
Biology 31%;
Life Orientation – no exam mark.
Mean
When you work out the average of Thandi’s school marks, you are actually finding the arithmetic mean: you find the total of all the marks and then share it equally among the 7 subjects that have an exam mark. In general the mean is denoted by x̄ (pronounced ‘x bar’).
To calculate the mean you use a formula:
$$\bar{x} = \frac{\text{total or sum of all items}}{\text{number of items}} = \frac{\sum x}{n}$$
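To check the arithmetic for question 1, the formula can be written as a few lines of Python. This is only a sketch: the function name `mean` and the dictionary of marks are our own choices, and Life Orientation is left out because it has no exam mark.

```python
def mean(values):
    """Arithmetic mean: the sum of all items divided by the number of items."""
    return sum(values) / len(values)


# Thandi's seven exam marks; Life Orientation has no exam mark, so it is left out.
marks = {
    "English": 63,
    "Geography": 63,
    "History": 25,
    "Technology": 37,
    "Maths": 57,
    "Zulu": 77,
    "Biology": 31,
}

x_bar = mean(list(marks.values()))
print(round(x_bar, 1))  # correct to one decimal place: 50.4
```

Python prints decimals with a point (50.4) rather than the decimal comma used elsewhere in this workshop.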
1) Calculate the mean of Thandi’s marks, correct to one decimal place.
2) Write the marks in the table from lowest to highest, i.e. rank the marks.
Subject
Mark
Median
The median is the middle score or item when the set of data is arranged in order from smallest to biggest. The median is not affected by extreme values.
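The definition translates directly into code: sort the data, then take the middle item. The sketch below (the name `median` is our own) also covers an even number of items by averaging the two middle items, which is the rule this workshop uses later, so the median need not be a data item. The last line uses a hypothetical change to one mark to show that an extreme value does not move the median.

```python
def median(values):
    """Middle value once the data are sorted from smallest to biggest.

    With an even number of items there is no single middle item, so the
    two middle items are averaged (the result need not be a data item).
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


thandi = [63, 63, 25, 37, 57, 77, 31]
print(sorted(thandi))   # [25, 31, 37, 57, 63, 63, 77]
print(median(thandi))   # 57

# Hypothetical: replace the top mark 77 with an extreme 100; the median stays 57.
print(median([25, 31, 37, 57, 63, 63, 100]))
```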
3) You wrote Thandi’s marks in the table in ascending order. Counting from each
end, which mark is in the middle? Which subject is it?
Mode
The mode is the score or item that occurs most often. It is usually very
easy to find, especially from a frequency table.
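A frequency table makes the mode easy to read off, and Python’s `collections.Counter` builds one directly. The sketch below (the helper name `modes` is our own) returns every value tied for the highest frequency, because a data set can have more than one mode; the standard library’s `statistics.multimode` does a similar job.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(values)             # frequency table: value -> frequency
    highest = max(counts.values())
    return sorted(v for v, f in counts.items() if f == highest)


thandi = [63, 63, 25, 37, 57, 77, 31]
print(Counter(thandi))   # the frequency table
print(modes(thandi))     # [63]
```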
4) Study the list of marks in ascending order. Which mark occurs most often?
Which subjects does this mark relate to?
5) Which of the three averages do you think is the most useful in telling you how well Thandi is doing at school? Give a reason for your answer.
ORGANISING DATA USING A STEM-AND-LEAF DIAGRAM
In the 1960s John Tukey, an American mathematician and statistician, devised a new way of organising data. He called this method a stem-and-leaf diagram. In a stem-and-leaf diagram the data is listed in intervals that depend on the place value of the digits of each data item.
Example:
Suppose the 40 learners in your maths class scored the following marks in a maths
test:
32 ; 56 ; 45 ; 78 ; 77 ; 59 ; 65 ; 54 ; 54 ; 39 ;
45 ; 44 ; 52 ; 47 ; 50 ; 52 ; 51 ; 40 ; 69 ; 72 ;
36 ; 57 ; 55 ; 47 ; 33 ; 39 ; 66 ; 61 ; 48 ; 45 ;
53 ; 57 ; 56 ; 55 ; 71 ; 63 ; 62 ; 65 ; 58 ; 55
This list of numbers has little meaning as it is.
• One way of organising the numbers would be to rearrange them in descending or
ascending order.
• Another way would be to draw a stem-and-leaf diagram: the tens digit forms the stem and the units digit forms the leaf.
The first number is 32: the stem is 3 and the leaf is 2.
The second number is 56: the stem is 5 and the leaf is 6.
Stem | Leaves
3 | 2
4 |
5 | 6
6 |
7 |
Note:
• The leaf is the ‘units’ digit, i.e. the digit furthest to the right in the number.
• The stem is the ‘tens’ digit, i.e. the digit furthest to the left in a two-digit number.
• If the number includes ‘hundreds’ and ‘thousands’ digits, then the stem includes these digits as well. So if the list of numbers included 120 ; 134 ; 127, then 12 and 13 would be the stems and 0 ; 4 and 7 would be the leaves:
12 | 0 7
13 | 4
• If the list of numbers includes a single-digit number, then the stem must be 0. So if the list includes the numbers 2 ; 3 ; 7, you write them as 02 ; 03 ; 07. The stem is 0 and the leaves are 2 ; 3 and 7:
0 | 2 3 7
• Be careful how you write the ‘leaves’: write each leaf directly underneath the leaf in the same position in the row above, so that the columns line up (squared paper will help you do this).
• When you have entered the stems and leaves on to the display, redraw the display with the leaves written in ascending order. This makes it easier to read.
• It is easy to find the median from a stem-and-leaf diagram: you simply count the leaves (each leaf is one data item).
• Sometimes two data sets can be written as displays on either side of the same stem (a back-to-back stem-and-leaf diagram).
The list of maths marks on the previous page looks like this as a stem-and-leaf
diagram:
Stem | Leaves
3 | 2 3 6 9 9
4 | 0 4 5 5 5 7 7 8
5 | 0 1 2 2 3 4 4 5 5 5 6 6 7 7 8 9
6 | 1 2 3 5 5 6 9
7 | 1 2 7 8
Notice that the stem-and-leaf display looks like a horizontal histogram.
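The rules in the note above can be turned into a short program. This is a sketch under our own assumptions: the data are whole, non-negative numbers, the leaf is the units digit and the stem is every digit to its left, and the function name `stem_and_leaf` is hypothetical. Applied to the 40 maths marks, it rebuilds the ordered display.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return a stem-and-leaf display as text, with leaves in ascending order.

    Assumes whole, non-negative numbers: the leaf is the units digit and the
    stem is every digit to its left (so 7 has stem 0 and 134 has stem 13).
    """
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)

    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems so gaps show
        leaves = " ".join(str(leaf) for leaf in sorted(rows[stem]))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)


maths_marks = [
    32, 56, 45, 78, 77, 59, 65, 54, 54, 39,
    45, 44, 52, 47, 50, 52, 51, 40, 69, 72,
    36, 57, 55, 47, 33, 39, 66, 61, 48, 45,
    53, 57, 56, 55, 71, 63, 62, 65, 58, 55,
]
print(stem_and_leaf(maths_marks))
```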
Activity 2
Thirty learners were asked in a survey to say how many hours in a week they spent
watching TV. Their answers (correct to the nearest hour) are as follows:
12 20 13 15 22 3 6 24 20 15
9 12 5 6 8 30 7 12 14 25
2 6 12 20 20 18 3 18 8 9
1) Draw a stem-and-leaf display to organise this data.
2) Find
a. the mode of the data
b. the median of the data
SOLUTION
Step 1: Record the data in the table
Stem
Leaves
Step 2: Arrange the leaves in ascending order
Stem
Leaves
KEY: 2/5 = 25
Step 3: Find the mode and the median of the data
Mode = ……………………………………………………..
Median = …………………………………………………...
RANGE
• When looking at data, measures of central tendency (or averages) give a general
idea of the group that they represent.
• However, taken by themselves, averages do not give a complete picture.
• To get a better idea of the data it is necessary to know how the rest of the data is
grouped around the average – whether it is closely grouped or scattered more
widely. We need a measure of the spread, scattering or dispersion of the
scores.
The range is the simplest measure of spread. It is the difference between the
largest and smallest values in the data. The range gives an indication of how much
the values of data vary.
Range = highest value – lowest value
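[Figure: two sets of heights with the same mean height of 150 cm. In the first set every height is 150 cm, so the range is 0 cm. In the second set the heights run from 100 cm to 200 cm, so the range is 100 cm.]
Two sets of data may have the same mean but may be very different. The range can help you compare them.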
Note:
• The bigger the range, the more spread out the data. Note, however, that the range uses only the highest and the lowest data items; it does not take any other data value into account.
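The range formula is a one-liner. In this sketch the helper is called `data_range`, a name we chose so that it does not hide Python’s built-in `range`, and the two lists of heights are hypothetical values picked to match the picture, not measurements from the workshop.

```python
def data_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


# Hypothetical heights in cm: both sets have a mean of 150 cm.
set_a = [150, 150, 150]
set_b = [100, 150, 200]

for heights in (set_a, set_b):
    mean_height = sum(heights) / len(heights)
    print(mean_height, data_range(heights))
```

Both sets print the same mean, but the ranges (0 and 100) show how differently the heights are spread.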
QUARTILES
In many experimental situations we tend to distrust extreme measures, as they may have resulted from poor measurement or unusual behaviour. As a result, extreme measures are often discarded. For this reason the middle section of a data set, where most of the data lies, gives you the best description of the data.
The median divides the distribution of a data set into two halves.
Each half can then be divided in half again:
○ the lower quartile (Q1) is the median of the first half of the data set;
○ the upper quartile (Q3) is the median of the second half of the data set.
Quartiles therefore divide the distribution into four equal parts.
[Figure: a set of data items divided into 4 equal parts by the lower quartile (Q1), the median (M) and the upper quartile (Q3).]
• The lower quartile (Q1) is a quarter of the way through the distribution.
• The middle quartile, which is the same as the median (M), is midway through the distribution.
• The upper quartile (Q3) is three quarters of the way through the distribution.
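The “median of each half” rule can be sketched in Python as follows. The function names are our own, and we assume the convention used in this workshop’s worked examples: when n is odd, the middle item is the median and is left out of both halves. Other textbooks and software packages use other conventions (some include the median in both halves, some interpolate), so their quartiles can differ slightly.

```python
def median(ordered):
    """Median of a list that is already sorted from smallest to largest."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the 'median of each half' rule.

    If n is odd, the middle item is the median and is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]          # items below the median position
    upper_half = ordered[(n + 1) // 2:]     # items above the median position
    return median(lower_half), median(ordered), median(upper_half)


print(quartiles([1, 5, 7, 8, 8, 14, 17]))   # (5, 8, 14)
```

The data here are those of the box-and-whisker example in this workshop, and the result agrees with its Q1 = 5, M = 8 and Q3 = 14.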
Activity 3
1) For each of the following sets of data, find the median, the lower quartile (Q1) and the upper quartile (Q3).
a) 3 ; 4 ; 5 ; 6 ; 6 ; 7 ; 8 ; 9 ; 9 ; 10 ; 11
Median = ……………..
Q1 = …………………
Q3 = ………………….
b) 20 ; 20 ; 22 ; 24 ; 28 ; 29 ; 30 ; 35 ; 36 ; 36
Median = ……………..
Q1 = …………………
Q3 = ………………….
c) 12 ; 12 ; 12 ; 14 ; 15 ; 16 ; 17 ; 18 ; 18 ; 20 ; 21 ; 25
Median = ……………..
Q1 = …………………
Q3 = ………………….
2) Two sets of data are given below.
Set 1:
57 ; 51 ; 60 ; 52 ; 63 ; 45 ; 51 ; 57 ; 72 ;
48 ; 66 ; 49 ; 73 ; 64 ; 67 ; 52 ; 55 ; 80
Set 2:
37 ; 90 ; 68 ; 38 ; 49 ; 95 ; 27 ; 79 ; 87 ;
19 ; 59 ; 76 ; 42 ; 48 ; 73 ; 85 ; 40 ; 50
a) Draw a back-to-back stem-and-leaf diagram for the two data sets.
Leaves for Set 1 | Stem | Leaves for Set 2
                 |  1   |
                 |  2   |
                 |  3   |
                 |  4   |
                 |  5   |
                 |  6   |
                 |  7   |
                 |  8   |
                 |  9   |

Leaves for Set 1 | Stem | Leaves for Set 2
                 |  1   |
                 |  2   |
                 |  3   |
                 |  4   |
                 |  5   |
                 |  6   |
                 |  7   |
                 |  8   |
                 |  9   |
KEY: 8/0 = 80
b) Find the median, lower quartile and upper quartile for each data set.
Set 1: Median = …………………  Q1 = …………………  Q3 = …………………
Set 2: Median = …………………  Q1 = …………………  Q3 = …………………
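A back-to-back display can also be generated, which is a useful check on a hand-drawn one. This sketch assumes whole numbers below 100 (stem = tens digit, leaf = units digit) and follows the common layout in which the left-hand leaves are written from largest to smallest, so they grow outwards from the shared stem; the function name `back_to_back` is hypothetical.

```python
from collections import defaultdict


def back_to_back(left, right):
    """Return a back-to-back stem-and-leaf display as a string.

    Assumes whole numbers below 100: stem = tens digit, leaf = units digit.
    Left-hand leaves run from largest to smallest, away from the stem.
    """
    left_rows, right_rows = defaultdict(list), defaultdict(list)
    for value in left:
        left_rows[value // 10].append(value % 10)
    for value in right:
        right_rows[value // 10].append(value % 10)

    all_values = list(left) + list(right)
    stems = range(min(all_values) // 10, max(all_values) // 10 + 1)
    # Width of the widest left-hand row, so the stems line up in one column.
    width = max(2 * len(left_rows[s]) - 1 for s in stems)

    lines = []
    for s in stems:
        left_leaves = " ".join(str(d) for d in sorted(left_rows[s], reverse=True))
        right_leaves = " ".join(str(d) for d in sorted(right_rows[s]))
        lines.append(f"{left_leaves:>{width}} | {s} | {right_leaves}".rstrip())
    return "\n".join(lines)


set_1 = [57, 51, 60, 52, 63, 45, 51, 57, 72, 48, 66, 49, 73, 64, 67, 52, 55, 80]
set_2 = [37, 90, 68, 38, 49, 95, 27, 79, 87, 19, 59, 76, 42, 48, 73, 85, 40, 50]
print(back_to_back(set_1, set_2))
```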
FIVE-NUMBER SUMMARIES
The median and the upper and lower quartiles tell you about the middle of a
distribution but not about the extremes. If you include the minimum and
maximum values, along with the median and quartiles, you get the five-number
summary of a set of data.
The five-number summary for a set of values consists of
1. Minimum: the smallest value in the set of data
2. Lower quartile (Q1): the median of the lower half of the values
3. Median (M): the value that divides the data into halves
4. Upper quartile (Q3): the median of the upper half of the values
5. Maximum: the largest value in the set of data
With a stem-and-leaf diagram it is very easy to COUNT the data items to find the
quartiles and the median; the minimum and maximum can be read from the
beginning and end of the display.
Example:
Eighteen numbers (n = 18) were listed on a stem-and-leaf plot with stems 1 to 7 (KEY: 3/5 = 35).
[Stem-and-leaf plot: the median lies between the 9th and 10th data items (35 and 39); Q1 = 30 and Q3 = 42 are marked on the plot.]
The five-number summary is:
Minimum = 11
Q1 = 30
$M = \frac{35 + 39}{2} = \frac{74}{2} = 37$
Q3 = 42
Maximum = 70
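Once the data are sorted, the five-number summary is just the two ends of the list plus three medians. The sketch below uses the median-of-halves rule that matches this workshop’s worked examples; the function names are our own, and as a check it is applied to the 40 maths marks from the stem-and-leaf example.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves; when n is odd
    the middle item is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]


maths_marks = [
    32, 56, 45, 78, 77, 59, 65, 54, 54, 39,
    45, 44, 52, 47, 50, 52, 51, 40, 69, 72,
    36, 57, 55, 47, 33, 39, 66, 61, 48, 45,
    53, 57, 56, 55, 71, 63, 62, 65, 58, 55,
]
print(five_number_summary(maths_marks))   # (32, 46.0, 54.5, 61.5, 78)
```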
Activity 4
The following stem-and-leaf diagram shows the average lifespan of different mammals.
Stem | Leaves
0 | 13455567788
1 | 00022222222255555556
2 | 00005
3 | 5
4 | 1
KEY: 2/0 = 20
List the five-number summary.
BOX AND WHISKER DIAGRAMS
A box-and-whisker diagram is a way of representing data that shows the median,
the quartiles and the maximum and minimum values of the data set. This way of
displaying data was invented by John Tukey in the 1960s.
To draw the diagram:
1. Draw a number line that covers your data, from the minimum value to the maximum value.
2. Draw a line at the median (M).
3. Draw lines at the lower and upper quartiles (Q1 and Q3).
4. Connect the lower and upper quartile lines to form a box. The median will fall somewhere within the box.
5. Plot a point at the minimum value and join it to the box: this is the left-hand whisker.
6. Plot a point at the maximum value and join it to the box: this is the right-hand whisker.
Example
A box-and-whisker diagram can be used to display the following data:
1, 5, 7, 8, 8, 14, 17
STEP 1: Work out the five-number summary:
Median = 8
Q1 = 5
Q3 = 14
Minimum = 1
Maximum = 17
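STEP 2: Draw the box-and-whisker diagram:
[Box-and-whisker diagram above a number line from 0 to 18: left whisker from 1 to 5, box from Q1 = 5 to Q3 = 14 with the median line at 8, right whisker from 14 to 17.]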
Note:
• The lines for the first and third quartiles are joined to make a box corresponding to the central 50% of the data items.
• The whiskers extend from the box to the minimum and maximum values.
• Remember that the quartiles and the median divide the data items into four equal groups: each section has the same number of data items in it. However, the box and the whiskers may be of different lengths, because these depend on the actual values of the data items. Some box-and-whisker diagrams are symmetrical (showing an even spread of data); others are more squashed up on one side (showing skewed data).
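The drawing steps can also be imitated in plain text, which is handy for checking a hand-drawn diagram. The symbols (‘|’ for the whisker ends and the median, ‘-’ for the whiskers, ‘[’ and ‘]’ for the box edges, ‘=’ inside the box) and the function names are our own invention rather than a standard notation, and the sketch assumes the median-of-halves rule for the quartiles. With the default of 2 columns per unit, half-way values such as 4,5 fall exactly on a column.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    return (ordered[0], median(ordered[: n // 2]), median(ordered),
            median(ordered[(n + 1) // 2:]), ordered[-1])


def text_box_plot(values, start, end, cols_per_unit=2):
    """Draw a one-line text box-and-whisker diagram on a scale from start to end."""
    low, q1, m, q3, high = five_number_summary(values)

    def col(x):
        # Nearest text column for the value x on the scale.
        return int(round((x - start) * cols_per_unit))

    line = [" "] * (col(end) + 1)
    for i in range(col(low), col(high) + 1):
        line[i] = "-"                       # whiskers from minimum to maximum
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                       # box over the middle 50% (Q1 to Q3)
    line[col(low)] = line[col(high)] = "|"  # ends of the whiskers
    line[col(q1)] = "["                     # left edge of the box
    line[col(q3)] = "]"                     # right edge of the box
    line[col(m)] = "|"                      # median line inside the box
    return "".join(line)


data = [1, 5, 7, 8, 8, 14, 17]
print(five_number_summary(data))            # (1, 5, 8, 14, 17)
print(text_box_plot(data, start=0, end=18))
```

If the median coincides with a quartile, the median mark simply overwrites the box edge, which keeps the sketch short at the cost of a little detail.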
Activity 5
1) 30 students were each asked to pick a number at random between 0 and 20. Here are the results:
12 ; 7 ; 8 ; 3 ; 5 ; 7 ; 10 ; 13 ; 7 ; 10 ; 2 ; 1 ; 11 ; 12 ; 17
4 ; 11 ; 7 ; 6 ; 8 ; 4 ; 7 ; 11 ; 9 ; 1 ; 12 ; 10 ; 12 ; 2 ; 15
a) Draw a stem-and-leaf display of the data and find the five-number summary.
b) Draw a box-and-whisker diagram to illustrate the information.
2) Which data set matches this box diagram? (More than one answer may be correct.)
23
a)
b)
c)
d)
23
23
23
23
25
;
;
;
;
25
23
27
27
;
;
;
;
27
26
24
28
28
;
;
;
;
28
25
28
28
29
;
;
;
;
28
26
33
29
;
;
;
;
31
33
35
37
39
41
43
28 ; 28 ; 30 ; 31 ; 33 ; 41 ; 43
27 ; 29 ; 30 ; 31 ; 33 ; 41 ; 43
43
32 ; 43
THE INTERQUARTILE RANGE
• You know that the range of a data set looks at the maximum and the minimum values.
• The measure of spread that relates to the middle section of the data is called the interquartile range (IQR). Approximately 50% of the data items lie within the interquartile range.
The difference between the upper and lower quartiles is called the interquartile range:
The Interquartile Range = Upper Quartile – Lower Quartile
IQR = Q3 – Q1
Example :
Find the interquartile range of the following set of scores:
12, 12, 3, 7, 9, 9, 5, 6, 3, 4, 5, 11
STEP 1: Rank the scores
3, 3, 4, 5, 5, 6, 7, 9, 9, 11, 12, 12
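STEP 2: Find the median and the quartiles
$\text{Median} = \frac{6 + 7}{2} = \frac{13}{2} = 6{,}5$ (Note: the median is not a data item.)
$Q_1 = \frac{4 + 5}{2} = \frac{9}{2} = 4{,}5$
$Q_3 = \frac{9 + 11}{2} = \frac{20}{2} = 10$
3, 3, 4 | 5, 5, 6 | 7, 9, 9 | 11, 12, 12
Q1 lies between 4 and 5, M lies between 6 and 7, and Q3 lies between 9 and 11.
STEP 3: Calculate the IQR:
IQR = Q3 – Q1 = 10 – 4,5 = 5,5
The IQR shows that approximately half of the learners’ scores lie within a range of 5,5 marks around the median.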
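A box-and-whisker diagram of this data would be:
[Box-and-whisker diagram above a number line from 3 to 12: left whisker from 3 to 4,5, box (labelled IQR) from Q1 = 4,5 to Q3 = 10 with the median line at 6,5, right whisker from 10 to 12.]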
Note: On a box and whisker diagram the ‘box’ represents the IQR.
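The three steps of the example can be checked with a short sketch. The function names are our own, the quartiles use the same median-of-halves rule as the worked example, and the list is the ranked data from STEP 1. The last lines count how many scores fall inside the box, to make “approximately 50%” concrete. Python prints 4.5 where the text writes 4,5.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def interquartile_range(values):
    """Return (Q1, Q3, IQR), with Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2:])
    return q1, q3, q3 - q1


ranked = [3, 3, 4, 5, 5, 6, 7, 9, 9, 11, 12, 12]   # the ranked scores from STEP 1
q1, q3, iqr = interquartile_range(ranked)
inside = [x for x in ranked if q1 <= x <= q3]      # scores inside the box
print(q1, q3, iqr)                                 # 4.5 10.0 5.5
print(len(inside), "of", len(ranked), "scores lie between Q1 and Q3")
```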
Activity 6
The following set of marks was obtained by 80 learners in a maths exam that was marked out of 60:
54 ; 52 ; 31 ; 41 ; 29 ; 8 ; 39 ; 32 ; 34 ; 38
28 ; 40 ; 9 ; 30 ; 47 ; 18 ; 36 ; 39 ; 32 ; 38
34 ; 49 ; 25 ; 38 ; 47 ; 20 ; 37 ; 19 ; 29 ; 23
13 ; 24 ; 24 ; 46 ; 43 ; 43 ; 8 ; 58 ; 27 ; 27
36 ; 32 ; 32 ; 33 ; 13 ; 34 ; 32 ; 48 ; 27 ; 27
23 ; 52 ; 12 ; 35 ; 37 ; 36 ; 15 ; 31 ; 33 ; 37
33 ; 47 ; 23 ; 45 ; 44 ; 33 ; 31 ; 38 ; 35 ; 16
24 ; 18 ; 26 ; 57 ; 21 ; 39 ; 48 ; 21 ; 37 ; 8
1) Draw up a stem-and-leaf diagram to organise the data.
2) Use the display to find
a) the median mark ………………………………………………………………………….
b) the upper and lower quartiles ……………………………………………………………
c) the IQR…………………………………………………………………………………………
3) Draw a box and whisker diagram to show the data
4) How many learners passed if the pass mark is 40/60? ………………………………
5) Between which marks did the middle 50% of the learners lie?..............................