Module 1 – Proposed Rearrangements and Additions
New Order | Topic | Comment | Mathematics Required | Software? | Stage | Corrections, Changes to Existing Material
BN1.1 | Now Showing: Basic Numeracy | Five questions | None | No | Encounter | NA
BN1.2 | Background Bugaboos | Currently 1.1 | Percentages, addition, division | No | Encounter | None
BN1.3 | Times Table Troubles | Currently 1.2 | Multiplication | No | Encounter | Question 1 refers to Table 1.1 when it should refer to Table 1.2
BN1.4 | Now Showing: Computations, Benchmarks | Five questions | Percentages | No | Engage | NA (suggestion to update benchmarks on video)
BN1.5 | Perceptions, Pictures, Pcts | Currently 1.3 | Percentages | No | Engage | None
BN1.6 | Computation and Common Sense | Currently 1.4 | Division | No | Engage | None
BN1.7 | Really Random Reasoning | Currently 1.12 | Counting | No | Reflect | None
BN1.8 | Hardwired to Slippery Thinking | Currently 1.13 | None | No | Reflect | None
BN1.9 | Why Numeracy Matters | New | Division | No | Reflect | NA
BN1.10 | Mean versus Median | New | Addition, division | Yes | Extend | NA
BN1.11 | Variation Matters | New | Addition, division, counting | Optional | Extend | NA
BN1.12 | Computing the Standard Deviation | New | Addition, division, square root | Yes | Extend | NA
New Order | Topic | Comment | Mathematics Required | Software? | Stage | Corrections, Changes to Existing Material
BN1.13 | Now Showing: Expers - Introduction | Five questions | None | No | Encounter | NA
BN1.14 | Slippery Evidence and Confounding | Currently 1.6 | None |  | Encounter | None
BN1.15 | Confounding Confusion | New | None | No | Encounter | NA
BN1.16 | Now Showing: Compare and Rand | Five questions | Percentages, counting | No | Engage | NA
BN1.17 | Experimentation Takes Flight | Currently 1.8 | Addition, division, counting | No | Engage | None
BN1.18 | Catching on to Experimentation | New | None | No | Engage | NA
BN1.19 | Now Showing: Stat Sig | Five questions | None | No | Reflect | NA
BN1.20 | Questionable Evidence | New | None | No | Reflect | NA
BN1.21 | Random Reflections | New | None | No | Reflect | NA
BN1.22 | Assessing Statistical Significance | New | Addition, division, square root | Optional | Extend | NA
BN1.23 | Designer Thoughts | New | Addition, division | Yes | Extend | NA
BN1.24 | What to Believe? | New | None | No | Extend | NA
BN1.25 | Now Showing: Scatterplots | Five questions | None | No | Encounter | NA
BN1.26 | Scatterplot – Part I | New | None |  | Encounter | NA
BN1.27 | Scatterplot – Part II | New | None | Yes | Encounter | NA
BN1.28 | Now Showing: Corr Coef | Five questions | Addition, division, square root | No | Engage | NA
BN1.29 | Corr – Part I | New | Addition, division, square root | No | Engage | NA
BN1.30 | Corr – Part II | New | Addition, division, square root | Yes | Engage | NA
BN1.31 | Now Showing: Causation | Five questions | None | No | Reflect | NA
BN1.32 | Association and Causation | Currently 1.9 | None | No | Reflect | Replace the graph in Exhibit 2 with one I generated for BN 1.27
BN1.33 | Association and Causation Revisited | Currently 1.10 | None | No | Reflect | None
BN1.34 | Simpson | Currently 1.14 | Fractions, percentages | No | Extend | None
BN1.35 | Simpson Revisited | Currently 1.15 | Fractions, percentages | No | Extend | None
BN1.36 | Correlation and Outliers | New | Addition, division, square root | Yes | Extend | NA
BC1.1
A Very Lucky Project
None
No
Beyond the Class
None
BC1.2
I Got Your Simpson Right Here
Currently BC 1.1
Currently BC 1.2
Fractions
Yes
Beyond the Class
BC1.3
Watch My Slippery Evidence
Currently BC 1.3
None
No
Beyond the Class
Replace the first bullet link (no longer exists) with http://blog.revolutionanalytics.com/2013/10/an-interactive-tool-to-explain-simpsons-paradox.html
None
REMOVE
Now Showing: Number Sense
Currently BN1.5
Old single-page video questions; these are now being broken into many different pages.
BEYOND THE NUMBERS 1.1
LEARNING OUTCOME _
Name:
To be graded, all assignments must be completed and submitted on the original book page.
Section Number:
BEYOND THE NUMBERS 1.4
Exhibit 1
Nursing Knowledge Needed
Questions
2. The nurse had injected the patient four times with a full 0.9 milliliter syringe. What was the nurse’s mistake? Defend your answer.
Exhibit 2
Statistical Citizenship
Questions
1. Briefly list three “features of the Constitution that suggest a numerical approach to governance.”
2. Initially the government was reluctant to collect more than the most basic census information of race, sex, and age. Why? During which of the three time periods addressed by Cohen did this attitude change?
3. In the colonies, if you had 48 pounds of soap, how many firkins of soap did you have?
4. Cohen writes that “the post-Civil War era finally brought a full melding of statistical data with the functioning of representative government.” List three facts supporting this claim.
BEYOND THE NUMBERS 1.10
Exhibit 1
Winging It
Exhibit 2
A BIG Visit
Exhibit 3
Gates Proof Inference
BEYOND THE NUMBERS 1.11
Variation Matters
Exhibit 1
Uncommon Reach
The wingspans of 18 persons are recorded in the tables below, nine being ordinary folks and nine being current or former NBA players.

NBA Persons | Wingspan (in)
Ike Diogu | 88
Anthony Davis | 88
Shelden Williams | 88
Elton Brand | 90
Shawn Bradley | 90
Bismack Biyombo | 90
Saer Sene | 93
Gheorghe Muresan | 94
Manute Bol | 102

Non-NBA Persons | Wingspan (in)
P1 | 70
P2 | 73
P3 | 72
P4 | 69
P5 | 69
P6 | 68
P7 | 68
P8 | 64
P9 | 73

Questions
1. Use a software package (e.g., Excel or Numbers) to compute the mean and the median of the entire 18-person data set (“Data Set I”).
Mean ________
Median ________
2. How do the mean and median compare?
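To check Questions 1 and 2, here is a minimal Python sketch using the standard-library `statistics` module. The two lists are transcribed from the wingspan tables; the variable names are our own, hypothetical choices.

```python
from statistics import mean, median

# Wingspans (inches) transcribed from the Uncommon Reach tables
nba = [88, 88, 88, 90, 90, 90, 93, 94, 102]     # Ike Diogu ... Manute Bol
non_nba = [70, 73, 72, 69, 69, 68, 68, 64, 73]  # P1 ... P9
data_set_1 = nba + non_nba                       # "Data Set I", 18 values

ds1_mean = mean(data_set_1)
ds1_median = median(data_set_1)  # average of the 9th and 10th sorted values

print(f"Mean = {ds1_mean}, Median = {ds1_median}")  # Mean = 80.5, Median = 80.5
```

Both summaries come out to 80.5 inches, although no one in the table has a wingspan within 7 inches of that value.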
Exhibit 2
Middle Muddle
Questions
1. Construct a histogram of all 18 of the wingspans in the tables in Exhibit 1. Use the intervals shown in the table below, and plot the histogram on the axes below. Your instructor may require you to use a software package for this exercise, so follow her lead.

Interval (Bin) | Frequency
Wingspan ≤ 60 |
60 < Wingspan ≤ 65 |
65 < Wingspan ≤ 70 |
70 < Wingspan ≤ 75 |
75 < Wingspan ≤ 80 |
80 < Wingspan ≤ 85 |
85 < Wingspan ≤ 90 |
90 < Wingspan ≤ 95 |
95 < Wingspan ≤ 100 |
100 < Wingspan ≤ 105 |

The label “60” on the plot below denotes the first bin, Wingspan ≤ 60. The label “75” denotes the fourth bin, 70 < Wingspan ≤ 75, and so on.

[Blank axes for histogram “Wingspan Data”: x-axis Wingspan (inches), bins labeled 60 through 105 and More; y-axis Frequency, 0 to 7]
2. Locate the mean on the plot by drawing a vertical line segment there. How useful is the mean at describing these 18 wingspans? Explain.
3. Let’s add a second data set. Suppose you have a data set of 18 persons: 8 with a wingspan of 80.5 inches, 5 with a wingspan of 75.5 inches, and 5 with a wingspan of 85.5 inches. We’ll call this “Data Set II.” Find the mean and median of these 18 data points.
4. Construct a histogram of all 18 of the wingspans from Question 3. Use the intervals shown in the table below, and plot the histogram on the axes below. Your instructor may require you to use a software package for this exercise, so follow her lead.

Interval (Bin) | Frequency
Wingspan ≤ 60 |
60 < Wingspan ≤ 65 |
65 < Wingspan ≤ 70 |
70 < Wingspan ≤ 75 |
75 < Wingspan ≤ 80 |
80 < Wingspan ≤ 85 |
85 < Wingspan ≤ 90 |
90 < Wingspan ≤ 95 |
95 < Wingspan ≤ 100 |
100 < Wingspan ≤ 105 |

The label “60” on the plot below denotes the first bin, Wingspan ≤ 60. The label “75” denotes the fourth bin, 70 < Wingspan ≤ 75, and so on.

[Blank axes for histogram “Wingspan Data”: x-axis Wingspan (inches), bins labeled 60 through 105 and More; y-axis Frequency, 0 to 7]
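For instructors checking the histograms, a minimal sketch (Python standard library only) that tallies both data sets into consecutive 5-inch bins labeled as on the plots, from “60” (Wingspan ≤ 60) through “105” plus a “More” bin, and checks the mean and median asked for in Question 3. The bin boundaries are read off the plot labels; `bin_counts` and `UPPER_EDGES` are hypothetical names of our own.

```python
from statistics import mean, median

data_set_1 = [88, 88, 88, 90, 90, 90, 93, 94, 102,
              70, 73, 72, 69, 69, 68, 68, 64, 73]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5  # Question 3

# Bin labels follow the plots: "60" is Wingspan <= 60, "65" is
# 60 < Wingspan <= 65, ..., "105" is 100 < Wingspan <= 105,
# and "More" collects anything above 105.
UPPER_EDGES = list(range(60, 110, 5))  # 60, 65, ..., 105


def bin_counts(values):
    """Count values per bin, where each bin is (previous edge, edge]."""
    counts = {str(edge): 0 for edge in UPPER_EDGES}
    counts["More"] = 0
    for v in values:
        for edge in UPPER_EDGES:
            if v <= edge:  # first upper edge at or above v
                counts[str(edge)] += 1
                break
        else:  # no edge reached: value exceeds 105
            counts["More"] += 1
    return counts


counts_1 = bin_counts(data_set_1)
counts_2 = bin_counts(data_set_2)
for label in counts_1:
    print(f"{label:>5}: Data Set I = {counts_1[label]}, Data Set II = {counts_2[label]}")

print(mean(data_set_2), median(data_set_2))  # 80.5 80.5
```

Data Set I splits into two separated clumps (around 65 to 75 and 85 to 95), while Data Set II piles up in three adjacent bins around 80.5.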
Exhibit 3
The Spice of Life
Questions
1. Compare the data sets from Exhibit 1 and Exhibit 2. How close are their average values?
2. Compare the histograms from Exhibits 1 and 2. Specify at least two ways the histograms are notably different.
3. Let’s play a game. It costs you $1000 to play. Here are the rules. You get to pick two wingspans at random, eyes closed, out of either Data Set I or Data Set II; your choice. Call your choices \(x_1\) and \(x_2\). You will receive a reward of \((80.5 - x_1)^2 + (80.5 - x_2)^2\) dollars. Suppose you decided to pick from Data Set I and chose P2 and Saer Sene. How much money did you make?
4. Think back to the game in Question 3. If you truly get to pick which data set to draw your two wingspans from, which data set would be the safer choice (in terms of anticipated profit), and why?
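A sketch of the game in Questions 3 and 4, assuming each pick is equally likely to be any of the 18 entries in the chosen data set (the worksheet says “at random” but does not spell this out). `reward` and `expected_reward` are illustrative names, not part of the worksheet.

```python
from statistics import mean

TARGET = 80.5  # reference value in the reward formula
COST = 1000    # dollars to play

data_set_1 = [88, 88, 88, 90, 90, 90, 93, 94, 102,
              70, 73, 72, 69, 69, 68, 68, 64, 73]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5


def reward(x1, x2):
    """Dollar reward (80.5 - x1)^2 + (80.5 - x2)^2 for two picks."""
    return (TARGET - x1) ** 2 + (TARGET - x2) ** 2


# Question 3: P2 (73 in) and Saer Sene (93 in) drawn from Data Set I
profit_q3 = reward(73, 93) - COST
print(f"Question 3 profit: ${profit_q3:.2f}")  # Question 3 profit: $-787.50


def expected_reward(values):
    """Expected reward when each pick is equally likely to be any entry.

    Each term's expectation is the mean squared deviation from 80.5,
    whether or not the second pick may repeat the first, so the two add.
    """
    return 2 * mean((TARGET - v) ** 2 for v in values)


for name, data in (("Data Set I", data_set_1), ("Data Set II", data_set_2)):
    print(f"{name}: expected profit ${expected_reward(data) - COST:.2f}")
```

Under that assumption both data sets lose money on average, but Data Set I, whose wingspans sit farther from 80.5, loses less: its expected reward is 2384.5/9 ≈ $264.94 against 250/9 ≈ $27.78 for Data Set II.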
FYI – Some plots and facts.
Mean of Data Set I is 80.5.
Mean of Data Set II is 80.5.
Histograms are below.
[Histogram “Wingspan Data” for Data Set I: Frequency (0 to 7) versus Wingspan (inches), bins labeled 60 through 105 and More]
[Histogram “Wingspan Data” for Data Set II: Frequency (0 to 9) versus Wingspan (inches), bins labeled 60 through 105 and More]
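BEYOND THE NUMBERS 1.12
Background
For data values \(x_1, x_2, \ldots, x_n\) with mean \(\bar{x}\):
\[ \text{Variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad \text{Standard Deviation} = s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
Uncommon Reach Revisited
Data Set I: Variance = 140.2647059, Standard Deviation = 11.84334015
Data Set II: Variance = 14.70588, Standard Deviation = 3.834825
Ratio of standard deviations (Data Set I / Data Set II) = 3.088365219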
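The listed values can be reproduced with Python’s `statistics.variance` and `statistics.stdev`, which use the n − 1 (sample) divisor; dividing by n instead would give 132.47 rather than 140.2647059 for Data Set I. A minimal sketch, with our own variable names:

```python
from statistics import stdev, variance

data_set_1 = [88, 88, 88, 90, 90, 90, 93, 94, 102,
              70, 73, 72, 69, 69, 68, 68, 64, 73]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5

# variance() and stdev() divide the sum of squared deviations by n - 1
for name, data in (("Data Set I", data_set_1), ("Data Set II", data_set_2)):
    print(f"{name}: variance = {variance(data):.7f}, SD = {stdev(data):.8f}")

# How many times more spread out Data Set I is than Data Set II
ratio = stdev(data_set_1) / stdev(data_set_2)
print(f"Ratio of SDs = {ratio:.6f}")
```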
BEYOND THE NUMBERS 1.13
Exhibit 1
Brains and Beats
Exhibit 2
Fuzzy Quasi is a Bear
BEYOND THE NUMBERS 1.16
A Measured Response





Distance (cm) | Time (sec) | Distance (cm) | Time (sec)
1 | 0.045 | 16 | 0.181
2 | 0.064 | 17 | 0.186
3 | 0.078 | 18 | 0.192
4 | 0.090 | 19 | 0.197
5 | 0.101 | 20 | 0.202
6 | 0.111 | 21 | 0.207
7 | 0.119 | 22 | 0.212
8 | 0.128 | 23 | 0.217
9 | 0.135 | 24 | 0.221
10 | 0.143 | 25 | 0.226
11 | 0.150 | 26 | 0.230
12 | 0.156 | 27 | 0.235
13 | 0.163 | 28 | 0.239
14 | 0.169 | 29 | 0.243
15 | 0.175 | 30 | 0.247
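The worksheet does not say how the Time column was produced. The values appear consistent with the free-fall relation \(t = \sqrt{2d/g}\), the usual basis for a ruler-drop reaction test; this sketch assumes that relation with g = 9.81 m/s² (our assumption), and in our check it reproduces every entry in the table to three decimal places. `fall_time` is a hypothetical helper name.

```python
from math import sqrt

G = 9.81  # m/s^2, assumed value of gravitational acceleration


def fall_time(distance_cm):
    """Seconds for an object released from rest to fall distance_cm (no air resistance)."""
    distance_m = distance_cm / 100
    return sqrt(2 * distance_m / G)


for d in range(1, 31):
    print(f"{d:>2} cm -> {fall_time(d):.3f} s")
```

With g = 9.8 instead, the 7 cm entry would round to 0.120 rather than 0.119, which is why 9.81 is used here.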

Group R | Time (sec)
1 |
2 |
3 |
4 |
5 |
6 |

Group L | Time (sec)
1 |
2 |
3 |
4 |
5 |
6 |
4. Find the mean of both groups. Based on those two values, is there evidence of a difference between the reaction times of Group L and Group R? Defend your answer.
5. What role would the variance of the measurements in each group play in making the decision in Question 4 more precise? Explain.
BEYOND THE NUMBERS 1.19
Exhibit 1
Cancer Carafe
Exhibit 2
Of Mice and People
Exhibit 1
What's Random?
Exhibit 2
Random Opposition
B A B A B A A A B A B A A A B B A B A B B A B B
Badge of Big
Table B
Group R | Time (sec) | Group L | Time (sec)
1 | 0.090 | 1 | 0.111
2 | 0.119 | 2 | 0.181
3 | 0.143 | 3 | 0.090
4 | 0.169 | 4 | 0.186
5 | 0.064 | 5 | 0.045
6 | 0.150 | 6 | 0.143
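5. Compute \( Z = \dfrac{\bar{x}_L - \bar{x}_R}{\sqrt{s_L^2/6 + s_R^2/6}} \), where \(\bar{x}_L\) and \(\bar{x}_R\) are the mean reaction times of Group L and Group R in Table B, and \(s_L^2\) and \(s_R^2\) are their variances.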
6. It turns out that one can say that the difference between the left-hand and right-hand reaction times fails to be statistically significant if -2.23 < Z < 2.23 (that is, if |Z| < 2.23), where Z is what you computed in Question 5. Do the results shown in Table B support a statistically significant difference in reaction times? Why or why not?
Instructor’s Note: The notation Z was used instead of “t” to avoid confusion later in the workbook. Also, the degrees of freedom were computed using the simple estimate of 6 + 6 - 2 = 10.
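The formula in Question 5 is not fully legible in the source, so this sketch assumes Z is the difference in group means divided by \(\sqrt{s_L^2/6 + s_R^2/6}\). For two groups of equal size that equals the pooled two-sample t statistic with 6 + 6 - 2 degrees of freedom mentioned in the Instructor’s Note. Function and variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

group_r = [0.090, 0.119, 0.143, 0.169, 0.064, 0.150]  # Table B, Group R
group_l = [0.111, 0.181, 0.090, 0.186, 0.045, 0.143]  # Table B, Group L
CRITICAL = 2.23  # cutoff quoted in Question 6 (df = 6 + 6 - 2 = 10)


def two_sample_z(a, b):
    """Difference in means divided by its estimated standard error.

    With equal group sizes this equals the pooled-variance two-sample
    t statistic, which the worksheet calls Z.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


z = two_sample_z(group_l, group_r)
print(f"Z = {z:.4f}; significant: {abs(z) >= CRITICAL}")
```

With Table B this gives Z ≈ 0.13, well inside the ±2.23 band, so under these assumptions the difference is not statistically significant.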
BEYOND THE NUMBERS 1.23
Pairing Profits
Reaction Times (sec)
Left | Right
1 | 1.05
0.74 | 0.76
0.66 | 0.71
0.78 | 0.79
0.68 | 0.69
0.65 | 0.72
0.75 | 0.75
0.69 | 0.72
0.94 | 0.99
0.79 | 0.8
0.81 | 0.82
0.62 | 0.67
(24 - 1) × Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Left Unexplained
(24 - 1) × Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Explained by Pairing + Variance Left Unexplained
Model with Subject and Hand
Dependent Variable: Time
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 12 | 0.30148333 | 0.02512361 | 95.30 | <.0001
Error | 11 | 0.00290000 | 0.00026364 | |
Corrected Total | 23 | 0.30438333 | | |
R-Square | Coeff Var | Root MSE | Time Mean
0.990473 | 2.097337 | 0.016237 | 0.774167
Source | DF | Type I SS | Mean Square | F Value | Pr > F
Subject | 11 | 0.29608333 | 0.02691667 | 102.10 | <.0001
Hand | 1 | 0.00540000 | 0.00540000 | 20.48 | 0.0009
Source | DF | Type III SS | Mean Square | F Value | Pr > F
Subject | 11 | 0.29608333 | 0.02691667 | 102.10 | <.0001
Hand | 1 | 0.00540000 | 0.00540000 | 20.48 | 0.0009

Model with Hand only
Dependent Variable: Time
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 1 | 0.00540000 | 0.00540000 | 0.40 | 0.5350
Error | 22 | 0.29898333 | 0.01359015 | |
Corrected Total | 23 | 0.30438333 | | |
R-Square | Coeff Var | Root MSE | Time Mean
0.017741 | 15.05836 | 0.116577 | 0.774167
Source | DF | Type I SS | Mean Square | F Value | Pr > F
Hand | 1 | 0.00540000 | 0.00540000 | 0.40 | 0.5350
Source | DF | Type III SS | Mean Square | F Value | Pr > F
Hand | 1 | 0.00540000 | 0.00540000 | 0.40 | 0.5350
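A minimal sketch, using only Python’s standard library, of how pairing changes the test. For two groups, the square of the pooled two-sample t statistic equals the F value in the Hand-only model, and the square of the paired t statistic (computed on each subject’s Right minus Left difference) equals the F value for Hand in the model with Subject. The data are transcribed from the Reaction Times table, assuming each row is one subject; variable names are our own.

```python
from math import sqrt
from statistics import mean, variance

# Reaction times (sec) for 12 subjects, one row of the table per subject
left = [1.00, 0.74, 0.66, 0.78, 0.68, 0.65, 0.75, 0.69, 0.94, 0.79, 0.81, 0.62]
right = [1.05, 0.76, 0.71, 0.79, 0.69, 0.72, 0.75, 0.72, 0.99, 0.80, 0.82, 0.67]
n = len(left)

# Total sum of squares: (24 - 1) x variance of all 24 times
total_ss = (2 * n - 1) * variance(left + right)

# Unpaired: pooled two-sample t (equal group sizes)
pooled_var = (variance(left) + variance(right)) / 2
t_unpaired = (mean(right) - mean(left)) / sqrt(pooled_var * 2 / n)

# Paired: one-sample t on each subject's Right - Left difference
diffs = [r - l for l, r in zip(left, right)]
t_paired = mean(diffs) / sqrt(variance(diffs) / n)

print(f"Total SS     = {total_ss:.8f}")
print(f"Unpaired t^2 = {t_unpaired ** 2:.2f}")
print(f"Paired t^2   = {t_paired ** 2:.2f}")
```

The paired analysis moves the subject-to-subject variation (the Subject sum of squares, 0.29608333) out of the error term, which is why the same 0.0054 of Hand variation is significant with pairing and not without it.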
BEYOND THE NUMBERS 1.24
Exhibit 1
Piltdown Meltdown - 1912
Exhibit 2
Marker Mice - 1974
Exhibit 3
Doing the Dishes - 2010
1.
2.
3.
4.
5.
BEYOND THE NUMBERS 1.26
Scatterplots --- Part I
Exhibit 1
Anscombe's Activity
Obs | x1 | y1 | x4 | y4
1 | 10 | 8.04 | 8 | 6.58
2 | 8 | 6.95 | 8 | 5.76
3 | 13 | 7.58 | 8 | 7.71
4 | 9 | 8.81 | 8 | 8.84
5 | 11 | 8.33 | 8 | 8.47
6 | 14 | 9.96 | 8 | 7.04
7 | 6 | 7.24 | 8 | 5.25
8 | 4 | 4.26 | 19 | 12.5
9 | 12 | 10.84 | 8 | 5.56
10 | 7 | 4.82 | 8 | 7.91
11 | 5 | 5.68 | 8 | 6.89
These data were created by F.J. Anscombe* in 1973 to remind us of the importance of plotting our data. You will see these data again later on in this workbook.
Questions
1. Create a scatterplot of y1 vs. x1. Does the plot show a positive association or a negative association? How do you know?
2. Create a scatterplot of y4 vs. x4. Does the plot show a positive association or a negative association? How do you know?
Make sure you turn in your plots with this assignment.
[Blank axes for two scatterplots: Y1 versus X1 and Y4 versus X4; x-axes 0 to 20, y-axes 0 to 15]
*Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, Connecticut: Graphics Press, 1983), pp. 14-15. F.J. Anscombe, “Graphs in Statistical Analysis,” American Statistician, vol. 27 (Feb 1973), pp. 17-21.
Exhibit 2
Vaccines and Risk
There is an ongoing debate over possible links between vaccines containing thimerosal and the onset of autism. The data set below records the percentage of California children who had received 4 doses of DTP by their 2nd birthday and the number of autism cases in California’s Department of Developmental Services’ regional service center system*.

[Blank axes for scatterplot “California 1980-1994”: Autism Cases (0 to 1400) versus DTP Coverage (%) (40 to 80)]

Year | DTP Coverage (%) | Number of Autism Cases
1980 | 50.9 | 176
1981 | 55.4 | 201
1982 | 52.1 | 212
1983 | 47.7 | 229
1984 | 48.9 | 246
1985 | 54.3 | 293
1986 | 54.1 | 357
1987 | 55.3 | 347
1988 | 60.9 | 436
1989 | 62.2 | 522
1990 | 65.9 | 663
1991 | 67.3 | 823
1992 | 69.8 | 1042
1993 | 73.6 | 1090
1994 | 75.7 | 1182

Questions
1. Create a scatterplot of Autism Cases versus DTP Coverage. Does the plot show a positive association or a negative association? How do you know? Make sure you turn in your plot with this assignment.
2. Is the association weak or strong? Defend your reasoning.
*Dr. Loring Dales from the Immunization Branch, California Department of Health Service, made these data publicly available at http://www.putchildrenfirst.org/media/4.6.pdf. See also http://www.ncbi.nlm.nih.gov/pubmed/11231748

FYI – Plots, so you can see what the students should see.
Part 1, Exhibit 1: [Scatterplots “x1-y1” and “x4-y4”; x-axes 0 to 20, y-axes 0 to 14]
Part 1, Exhibit 2: [Scatterplot “California 1980-1994”: Autism Cases (0 to 1400) versus DTP Coverage (%) (40 to 80)]
BEYOND THE NUMBERS 1.27
Scatterplots --- Part II
Exhibit 1
Mortality and Global Warming
In this exercise we want you to construct a scatterplot of “Child Mortality” versus “CO2 Emissions” for 192 countries, from 2006 data archived by Dr. Hans Rosling*. These data are available at http://www.heretheyarenow . You must use a computer software package (e.g., Excel or Numbers) or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required. Make sure you label your axes and produce a professional-looking plot. Answer the questions below. Save your computer work; you may need it for another Beyond the Numbers later on.
Questions
1. What computer software did you use to construct your plot? Make sure you turn in your plot with this assignment.
2. Does the scatterplot show a positive association or a negative association? How do you know?
3. Is the association weak or strong? Defend your reasoning.
*Hans Rosling is Professor of International Health at Karolinska Institute and the co-founder and chairman of the Gapminder Foundation. Dr. Rosling is committed to making important public data available for easy plotting and analysis with his Gapminder software.
Exhibit 2
Mortality and Global Warming Transformed
Save your computer work for this Exhibit. You may need it for another Beyond the Numbers later on.
Questions
1. Redo the scatterplot from Exhibit 1. The same rules apply on the required use of a computer package and a professional-looking result. This time plot log10(Child Mortality) versus log10(CO2 Emissions). How does this plot compare to the one you did in Exhibit 1? Make sure you turn in your plot with this assignment.
2. Does the scatterplot show a positive association or a negative association? How do you know?
3. Is the association weak or strong? Defend your reasoning.

FYI – Plots, so you can see what the students should see.
Part 2, Exhibit 1 – I’d like this to replace the plot on page 20 of the current workbook. Same data; this is lin/lin scale and should not be proprietary. For reference, India (blue), China (red) and the US are highlighted dots.
[Scatterplot: Child mortality (0-5 year olds dying per 1,000 born), 0 to 200, versus CO2 emissions (tonnes per person), 0 to 70]
Part 2, Exhibit 2 – Nice linear outcome. Will compare r in Exhibit 1 and Exhibit 2 in the next BN.
[Scatterplot on log10 scales: Child mortality (0-5 year olds dying per 1,000 born), 0 to 3, versus CO2 emissions (tonnes per person), -2 to 2]
BEYOND THE NUMBERS 1.28
1.
2.
3.
4.
5.
BEYOND THE NUMBERS 1.29
Computing Correlations --- Part I
Exhibit 1
Anscombe's Activity Revisited
Recall Anscombe’s data from an earlier Beyond the Numbers. In this activity you will be asked to compute the correlation coefficient for each pair of variables and compare.

Obs | x1 | y1 | x1² | y1² | x1y1
1 | 10 | 8.04 | | |
2 | 8 | 6.95 | | |
3 | 13 | 7.58 | | |
4 | 9 | 8.81 | | |
5 | 11 | 8.33 | | |
6 | 14 | 9.96 | | |
7 | 6 | 7.24 | | |
8 | 4 | 4.26 | | |
9 | 12 | 10.84 | | |
10 | 7 | 4.82 | | |
11 | 5 | 5.68 | | |
Σx = ______   Σy = ______   Σx² = ______   Σy² = ______   Σxy = ______

Obs | x4 | y4 | x4² | y4² | x4y4
1 | 8 | 6.58 | | |
2 | 8 | 5.76 | | |
3 | 8 | 7.71 | | |
4 | 8 | 8.84 | | |
5 | 8 | 8.47 | | |
6 | 8 | 7.04 | | |
7 | 8 | 5.25 | | |
8 | 19 | 12.5 | | |
9 | 8 | 5.56 | | |
10 | 8 | 7.91 | | |
11 | 8 | 6.89 | | |
Σx = ______   Σy = ______   Σx² = ______   Σy² = ______   Σxy = ______

Questions
1. Compute r for the (x1, y1) pairs.
2. Compute r for the (x4, y4) pairs.
3. Compare the two r values you found in light of the scatterplots of these data (which you plotted earlier). What note of inferential caution does this exercise sound?
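The column sums on this page feed the computational formula \(r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}\); we assume that is the intended route. A Python sketch that evaluates it for both Anscombe pairs (the function name `pearson_r` is our own):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from the column sums Σx, Σy, Σx², Σy², Σxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

r1 = pearson_r(x1, y1)
r4 = pearson_r(x4, y4)
print(f"r(x1, y1) = {r1:.3f}")
print(f"r(x4, y4) = {r4:.3f}")
```

Both values round to 0.82 even though the two scatterplots look nothing alike.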
Exhibit 2
Vaccines and Risk Revisited
Dr. Loring Dales of the Immunization Branch, California Department of Health Service writes “here are the data we have on (a) percentages of California children who had received 4 doses of DTP by their 2nd

Year | X = DTP Coverage (%) | Y = Number of Autism Cases | x² | y² | xy
1980 | 50.9 | 176 | | |
1981 | 55.4 | 201 | | |
1982 | 52.1 | 212 | | |
1983 | 47.7 | 229 | | |
1984 | 48.9 | 246 | | |
1985 | 54.3 | 293 | | |
1986 | 54.1 | 357 | | |
1987 | 55.3 | 347 | | |
1988 | 60.9 | 436 | | |
1989 | 62.2 | 522 | | |
1990 | 65.9 | 663 | | |
1991 | 67.3 | 823 | | |
1992 | 69.8 | 1042 | | |
1993 | 73.6 | 1090 | | |
1994 | 75.7 | 1182 | | |
Σx = ______   Σy = ______   Σx² = ______   Σy² = ______   Σxy = ______
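A sketch that fills in the x², y², and xy columns row by row and then applies the computational formula for r to the DTP and autism data. Variable names are our own.

```python
from math import sqrt

years = list(range(1980, 1995))
dtp = [50.9, 55.4, 52.1, 47.7, 48.9, 54.3, 54.1, 55.3,
       60.9, 62.2, 65.9, 67.3, 69.8, 73.6, 75.7]
autism = [176, 201, 212, 229, 246, 293, 357, 347,
          436, 522, 663, 823, 1042, 1090, 1182]

# Fill in the x^2, y^2, and xy columns
for year, x, y in zip(years, dtp, autism):
    print(f"{year}  {x:5.1f}  {y:5d}  {x * x:9.2f}  {y * y:8d}  {x * y:9.1f}")

# Column sums, then the computational formula for r
n = len(dtp)
sum_x, sum_y = sum(dtp), sum(autism)
sum_x2 = sum(x * x for x in dtp)
sum_y2 = sum(y * y for y in autism)
sum_xy = sum(x * y for x, y in zip(dtp, autism))
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(f"Σx = {sum_x:.1f}, Σy = {sum_y}, r = {r:.4f}")
```

The result, about 0.96, indicates a strong positive association; whether that says anything about causation is a separate question.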
Questions
1. Fill out all the entries in the table that are missing. Your instructor may have you retype the table if you are not required to turn in this actual page.
2. Compute the correlation coefficient between DTP Coverage and the Number of Autism Cases.

FYI – Key results, so you can see what the students should see.
Part 1, Exhibit 1: [Scatterplots “x1-y1” and “x4-y4”; x-axes 0 to 20, y-axes 0 to 14] r = 0.82 for both (YIKES reaction).
Part 1, Exhibit 2: R = 0.9616552 (“strong” correlation, will address later in causation section)
Part 2, Exhibit 1: R = -0.438960093 (seems a bit small because of curvature)
Part 2, Exhibit 2: R = -0.802859716 (much more reasonable after transformation)
BEYOND THE NUMBERS 1.30
Computing Correlations --- Part II
Exhibit 1
Mortality and Global Warming Revisited
Refer to BN 1.27: Scatterplots – Part II. Hans Rosling is Professor of International Health at Karolinska Institute and the co-founder and chairman of the Gapminder Foundation. Dr. Rosling is committed to making important public data available for easy plotting and analysis with his Gapminder software. In this exercise we want you to compute the correlation coefficient between “Child Mortality” and “CO2 Emissions” for 192 countries, from 2006 data archived by Dr. Rosling. These data are available at http://www.heretheyarenow . You must use a computer software package such as Excel or Numbers, or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required.
Questions
1. What is the value of r?
2. Does the value of r suggest the association between Child Mortality and CO2 Emissions is strong or weak? How do you know?
3. Do you think the computation of r is appropriate for these data? Why or why not? (You constructed a scatterplot of these data in BN 1.27.)
Exhibit 2
Transformations Revisited
Refer to BN 1.27: Scatterplots – Part II. In this exercise we want you to compute the correlation coefficient between log10(Child Mortality) and log10(CO2 Emissions) for the 192 countries, from 2006, in the data archived by Dr. Rosling. The raw data are available at http://www.heretheyarenow . You must use a computer software package such as Excel or Numbers, or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required.
Questions
1. What is the value of r?
2. How does this value of r compare to the one found in Exhibit 1?
3. Does the value of r suggest the association between Child Mortality and CO2 Emissions is strong or weak? How do you know?
4. Do you think the computation of r is appropriate for these data? Why or why not? (You constructed a scatterplot of these data in BN 1.27. See page kjlkjl also.)

FYI – Key results, so you can see what the students should see.
Part 2, Exhibit 1: R = -0.438960093 (seems a bit small because of curvature)
Part 2, Exhibit 2: R = -0.802859716 (much more reasonable after transformation)
1.
2.
3.
4.
5.
BEYOND THE NUMBERS 1.36
Outliers and Leverage Points
Exhibit 1
Heptathletes
Finish data for two events from the 1992 Olympic heptathlon are shown below. A scatterplot of the data is shown just to the right of the table. Chouaa is the green data point and Barber is the red one.
Name | Hurdles (seconds) | Javelin (meters)
Joyner-Kersee | 12.85 | 44.98
Nastase | 12.86 | 41.3
Dimitrova | 13.23 | 44.48
Belova | 13.25 | 41.9
Braun | 13.25 | 51.12
Beer | 13.48 | 48.1
Court | 13.48 | 52.12
Kamrowska | 13.48 | 44.12
Wlodarczyk | 13.57 | 43.46
Greiner | 13.59 | 40.78
Kaljurand | 13.64 | 47.42
Zhu | 13.64 | 45.12
Skjaeveland | 13.73 | 35.42
Lesage | 13.75 | 41.28
Nazaroviene | 13.75 | 44.42
Aro | 13.87 | 45.42
Marxer | 13.94 | 41.08
Rattya | 13.96 | 49.02
Carter | 13.97 | 37.58
Atroshchenko | 14.03 | 45.18
Vaidianu | 14.04 | 49
Teppe | 14.06 | 52.58
Clarius | 14.1 | 45.14
Bond-Mills | 14.31 | 43.3
Barber | 14.79 | 0
Chouaa | 16.62 | 44.4

[Scatterplot “Heptathlon Results from 1992”: Javelin Distance (meters), 0 to 60, versus Hurdles Time (seconds), 12 to 17]
Questions
1. What kind of association do you see in the scatterplot: positive, negative, or neither? Support your answer.
2. Compute the correlation coefficient “r” for the entire data set. You should use a software package or an online applet as required by your instructor. Is this value of “r” consistent with what you answered in Question 1? Why or why not?
Exhibit 2
Language
“Outliers” in a scatterplot are data pairs that are not spatially close to the bulk of the data. Outliers are not necessarily a problem for the inferences we draw from a correlation coefficient. However, if removing a single outlier causes a distinct change in the correlation, then that outlier is called an “influence point,” and influence points can disguise the essence of an association.
Questions
1. Looking at the scatterplot above, which athletes are outliers?
2. Compute the correlation coefficient “r” for the data set with Barber removed. Is Barber an influence point? Why?
3. Compare the values of “r” that you computed for the entire data set and for the data set with Barber removed. Which one best reflects the association seen in the scatterplot? Why?
FYI – Easy computations show the following:
a) r overall is -0.25213, not large but notably incongruous with the plot.
b) r without Barber is 0.00061, definitely making her a leverage point.
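A sketch that computes r with and without Barber, using the deviations-about-the-mean form of the correlation coefficient (algebraically equivalent to the computational formula). The data are transcribed from the Exhibit 1 table; `pearson_r` and `r_for` are hypothetical helper names.

```python
from math import sqrt

# (name, hurdles time in seconds, javelin distance in meters)
results = [
    ("Joyner-Kersee", 12.85, 44.98), ("Nastase", 12.86, 41.30),
    ("Dimitrova", 13.23, 44.48), ("Belova", 13.25, 41.90),
    ("Braun", 13.25, 51.12), ("Beer", 13.48, 48.10),
    ("Court", 13.48, 52.12), ("Kamrowska", 13.48, 44.12),
    ("Wlodarczyk", 13.57, 43.46), ("Greiner", 13.59, 40.78),
    ("Kaljurand", 13.64, 47.42), ("Zhu", 13.64, 45.12),
    ("Skjaeveland", 13.73, 35.42), ("Lesage", 13.75, 41.28),
    ("Nazaroviene", 13.75, 44.42), ("Aro", 13.87, 45.42),
    ("Marxer", 13.94, 41.08), ("Rattya", 13.96, 49.02),
    ("Carter", 13.97, 37.58), ("Atroshchenko", 14.03, 45.18),
    ("Vaidianu", 14.04, 49.00), ("Teppe", 14.06, 52.58),
    ("Clarius", 14.10, 45.14), ("Bond-Mills", 14.31, 43.30),
    ("Barber", 14.79, 0.00), ("Chouaa", 16.62, 44.40),
]


def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_for(rows):
    """r between hurdles time and javelin distance for the given athletes."""
    return pearson_r([row[1] for row in rows], [row[2] for row in rows])


r_all = r_for(results)
r_without_barber = r_for([row for row in results if row[0] != "Barber"])
print(f"r (all 26 athletes) = {r_all:.5f}")
print(f"r (Barber removed)  = {r_without_barber:.5f}")
```

The results, about -0.252 with Barber and about 0.0006 without her, match the FYI values: a single javelin mark of 0 is enough to produce a modest negative correlation in data that otherwise show essentially none.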