Exploratory Analysis (Human Age and Fatness)

Part 2: Exploratory Data Analysis (20 points)
In Part 1 of this assignment, you created an SPSS data file. In this part, use that data file to
perform the following tasks (make charts) with the SPSS statistical software to explore and
understand the data. Label each chart that you include in your answer with a figure number and
title, and above each chart write a sentence or two describing what you see in it. (See the example
on page 5 of this document.)
1. Make a histogram for the Weight variable to display the distribution of this variable. (Use a
class width of 10.)
Figure 1: Histogram for Weight
Figure 1 above is a histogram of the quantitative variable weight. The distribution appears to be
skewed to the right.
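To illustrate what a class width of 10 means, the minimal Python sketch below groups values into classes [40, 50), [50, 60), and so on, and then prints a text histogram. The weights are hypothetical, not the assignment's data. The convention that each class includes its lower limit is also an assumption, because SPSS may place class boundaries differently.

```python
import math
from collections import Counter

def histogram_counts(values, width=10):
    """Count values per class [start, start + width); width must be an integer."""
    # Each value goes to the class whose lower limit is a multiple of width.
    counts = Counter(math.floor(v / width) * width for v in values)
    lo, hi = min(counts), max(counts)
    # Include empty classes so the horizontal axis has no gaps.
    return {start: counts.get(start, 0) for start in range(lo, hi + width, width)}

# Hypothetical weights in kg, NOT the assignment's data.
weights = [41.2, 45.0, 48.7, 52.3, 55.1, 58.9, 59.4, 61.0, 66.8, 72.5, 88.0, 101.3]

width = 10
for start, n in histogram_counts(weights, width).items():
    print(f"{start:>3}-{start + width:<3} {'#' * n}")
```

The long upper tail in this made-up sample (an empty 90-100 class, then one value above 100) is the kind of pattern that makes a histogram look skewed to the right.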
2. Make a frequency distribution table for the gender variable to see the frequency distribution
and then make a bar chart.
Table 1: Frequency Distribution Table for Gender
Gender    Frequency    Percent (%)
Female    16           53.3
Male      14           46.7
Total     30           100.0
Table 1 above shows the frequency distribution of the gender variable. There were slightly more
female subjects (16, or 53.3%) than male subjects (14, or 46.7%) in the sample.
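A short Python sketch can reproduce the Frequency and Percent columns of Table 1. It rebuilds the gender column from the counts in Table 1 (16 females, 14 males) and rounds the percentages to one decimal place, as the table does. The function name frequency_table is for illustration only.

```python
from collections import Counter

def frequency_table(labels):
    """Return ({category: (frequency, percent)}, total), percent rounded to 1 decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    table = {cat: (n, round(100 * n / total, 1)) for cat, n in sorted(counts.items())}
    return table, total

# Gender column rebuilt from the counts reported in Table 1.
gender = ["Female"] * 16 + ["Male"] * 14
table, total = frequency_table(gender)
for cat, (n, pct) in table.items():
    print(f"{cat:<7}{n:>4}{pct:>7.1f}")
print(f"{'Total':<7}{total:>4}{100.0:>7.1f}")
```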
Figure 2: Bar Chart for Gender
Figure 2 above is a bar chart of the qualitative variable gender. The bars show that the frequency
of females (16) is greater than the frequency of males (14).
3. Make a clustered bar chart to examine the relationship between the gender and Daily hours of TV
viewing variables. (Use the Daily hours of TV viewing variable as the category axis and the
gender variable as the cluster variable.)
Figure 3: Cluster Bar Chart for Gender versus Daily Hours of TV Viewing
Figure 3 above is a clustered bar chart of daily hours of TV viewing by gender. The distributions
of TV viewing are similar for males and females: in both groups, more students watched less than
2 hours of TV per day than 2 hours or more.
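A clustered bar chart draws the counts from a two-way table (a cross-tabulation). The sketch below builds that table in plain Python. It uses the TV-viewing category as the row (the category axis) and gender as the column (the cluster). The records and the function name crosstab are hypothetical and serve only as an illustration.

```python
from collections import Counter

def crosstab(rows, cols, row_levels, col_levels):
    """Two-way table of counts: one row per category, one column per cluster."""
    pairs = Counter(zip(rows, cols))
    return {r: {c: pairs.get((r, c), 0) for c in col_levels} for r in row_levels}

# Hypothetical records, NOT the assignment's data.
tv = ["Less than 2 hours", "Less than 2 hours", "2 hours or more", "Less than 2 hours",
      "2 hours or more", "Less than 2 hours", "Less than 2 hours", "2 hours or more"]
gender = ["Female", "Male", "Female", "Female", "Male", "Male", "Female", "Male"]

table = crosstab(tv, gender,
                 row_levels=["Less than 2 hours", "2 hours or more"],
                 col_levels=["Female", "Male"])
for category, clusters in table.items():
    print(category, clusters)
```

Each row of this table becomes one group of bars on the category axis, with one bar per gender.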
4. Make a scatter plot to examine the correlation between the weight and height variables, and
write a sentence to describe the trend you observed in the scatter plot.
Figure 4: Scatter Plot for Height versus Weight
Figure 4 above is a scatter plot of height versus weight. It shows a positive correlation: weight
tends to increase as height increases.
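The assignment only asks for a sentence describing the trend. Still, Pearson's correlation coefficient r condenses the sign and strength of a linear trend into a single number between -1 and 1. Below is a minimal sketch that uses hypothetical height and weight pairs, not the assignment data. On Python 3.10 or later, statistics.correlation computes the same quantity.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (height in cm, weight in kg) pairs, NOT the assignment's data.
height = [140, 148, 152, 160, 165, 171]
weight = [38, 45, 47, 55, 60, 68]
print(round(pearson_r(height, weight), 3))
```

A value of r close to +1 matches the pattern described for Figure 4, in which weight tends to rise with height.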
5. A quality control officer recorded the average length (in inches) of a random sample of 10 steel
frames from a production line. A sample was taken once every hour. Produce a time plot to display
the trend.
Time     Average Length (inches)
8:00     5.1
9:00     4.9
10:00    5.1
11:00    5.2
12:00    5.0
13:00    5.3
14:00    5.5
15:00    5.9
16:00    6.5
17:00    7.7
18:00    9.6
Figure 5: Time Plot for Average Length
Figure 5 above is a time plot of the average length over time. The average length stays between
4.9 and 5.3 inches from 8:00 to 13:00 and then rises increasingly quickly, reaching 9.6 inches at
18:00.
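The trend in Figure 5 can also be checked numerically with the eleven hourly readings from the table in question 5. The Python sketch below prints a rough text version of the time plot and computes the change from each hour to the next. It illustrates the calculation and does not replace the SPSS chart.

```python
# Hourly sample averages (inches) from the table in question 5.
times = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00",
         "14:00", "15:00", "16:00", "17:00", "18:00"]
lengths = [5.1, 4.9, 5.1, 5.2, 5.0, 5.3, 5.5, 5.9, 6.5, 7.7, 9.6]

# Rough text time plot: one row per hour, the bar grows with the average length.
for t, y in zip(times, lengths):
    print(f"{t:>5}  {y:4.1f}  " + "*" * round((y - 4.5) * 10))

# Change from each hour to the next, rounded to one decimal place.
changes = [round(b - a, 1) for a, b in zip(lengths, lengths[1:])]
print(changes)
```

Through 14:00, the hourly changes stay within 0.3 inch in either direction. After that they grow to 0.4, 0.6, 1.2, and 1.9 inches, which matches the sharp rise at the end of the day.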
Part 3: Descriptive Measures (20 points)
(Please fill in your answers in this document.)
1) Find the overall mean, median, and sample standard deviation of the weight variable in this data
set.
Sample Mean = 62.84 kg
Sample Standard Deviation = 17.44 kg
Sample Variance = 304.13 kg²
Sample Median = 59.44 kg
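For reference, the sample standard deviation s uses an n - 1 denominator. The sample variance is s², so its unit is kg² rather than kg. The sketch below uses Python's standard statistics module to compute the four measures for hypothetical weights. It then checks that the reported values agree with each other: the square root of 304.13 rounds to 17.44.

```python
import math
import statistics

# Hypothetical weights in kg, NOT the assignment's data.
weights = [41.2, 45.0, 48.7, 52.3, 55.1, 58.9, 59.4, 61.0, 66.8, 72.5, 88.0, 101.3]

mean = statistics.mean(weights)
median = statistics.median(weights)
sd = statistics.stdev(weights)       # sample SD, n - 1 denominator, in kg
var = statistics.variance(weights)   # sample variance, in kg squared
print(f"mean={mean:.2f} kg  median={median:.2f} kg  sd={sd:.2f} kg  var={var:.2f} kg^2")

# Consistency check on the reported answers: sqrt(304.13) should round to 17.44.
print(round(math.sqrt(304.13), 2))
```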
2) Judging from the histogram, is the distribution of the weight data for these children symmetric
and bell-shaped?
(Circle, underline, or color your answer red.)
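Yes
No
Answer: No. The histogram in Figure 1 is skewed to the right rather than symmetric and bell-shaped.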
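Pearson's second skewness coefficient, 3(mean - median)/s, gives a quick numerical companion to the visual check. The sketch below plugs in the overall mean, median, and standard deviation reported in question 1 and gets about 0.58. A positive value means the mean lies above the median, which is consistent with a right-skewed distribution. This is a rule of thumb, not a formal test, and the assignment does not require it.

```python
def pearson_median_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

# Overall weight statistics reported in question 1 (kg).
sk = pearson_median_skewness(62.84, 59.44, 17.44)
print(round(sk, 2))
```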
3) Report the percentage distribution of the Daily hours of TV viewing variable using valid
percentages, which exclude missing data.
Hours of TV Viewing    Relative Frequency
Less than 2 hours      58.62%
2 hours or more        41.38%
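A valid percentage divides each count by the number of non-missing responses rather than by the full sample. The raw responses are not shown here, so the counts in this Python sketch are inferred and should be treated as an assumption. With 30 subjects (Table 1), 17 and 12 valid answers out of 29, plus 1 missing response, reproduce the reported 58.62% and 41.38%. Missing values are represented as None. The name valid_percent is for illustration only.

```python
from collections import Counter

def valid_percent(responses):
    """Percent of each category among non-missing responses (None = missing)."""
    valid = [r for r in responses if r is not None]
    counts = Counter(valid)
    return {cat: round(100 * n / len(valid), 2) for cat, n in counts.items()}

# ASSUMED counts, inferred from the reported percentages (not from the data file).
tv = ["Less than 2 hours"] * 17 + ["2 hours or more"] * 12 + [None]
print(valid_percent(tv))
```

Dividing by all 30 subjects would instead give 56.67% and 40.00%, which is why the valid percentage is used.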
4) Report the percentage distribution of the Exercise Per Week variable using valid percentages,
which exclude missing data.
Exercise Per Week    Relative Frequency
0 Days               20.00%
1 Day                13.33%
2 Days               13.33%
3 Days               6.67%
4 Days               20.00%
5 Days               3.33%
6 Days               10.00%
7 Days               13.33%
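As in question 3, each relative frequency is a count divided by the number of valid responses. The counts in this sketch are inferred rather than read from the data file, so treat them as an assumption: 6, 4, 4, 2, 6, 1, 3, and 4 subjects out of 30 reproduce the reported percentages. That suggests this variable has no missing values. The sketch also shows why the rounded percentages add up to 99.99% rather than exactly 100%.

```python
# ASSUMED counts per number of exercise days, inferred from the reported
# percentages with n = 30 (Table 1); not taken from the data file.
counts = {0: 6, 1: 4, 2: 4, 3: 2, 4: 6, 5: 1, 6: 3, 7: 4}
n = sum(counts.values())

# Relative frequency of each category, rounded to two decimals as in the table.
pct = {days: round(100 * c / n, 2) for days, c in counts.items()}
print(pct)

# Rounding each entry separately makes the total fall slightly short of 100.
print(round(sum(pct.values()), 2))
```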
5) Do the weight data suggest that they came from a normally distributed population? Perform a
normality test and report its p-value, using 0.05 (5%) as the significance level for the decision.
The p-value from the Shapiro-Wilk normality test is: 0.003
Your conclusion on normality (type your answer using fewer than 30 words):
Since the p-value (0.003) is less than 0.05, we reject the null hypothesis of normality; the weight
data are unlikely to come from a normally distributed population.
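The decision rule behind this conclusion can be written out explicitly. The null hypothesis of the Shapiro-Wilk test is that the data come from a normal population, and it is rejected when the p-value falls below the chosen significance level. In Python, scipy.stats.shapiro(x) returns the test statistic and the p-value. The sketch below covers only the decision step, using the reported p-value of 0.003 and a hypothetical helper name.

```python
def normality_decision(p_value, alpha=0.05):
    """Decision rule for a normality test whose H0 is 'the population is normal'."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < alpha:
        return "reject H0: data unlikely to be from a normal population"
    return "fail to reject H0: no evidence against normality"

# Reported Shapiro-Wilk p-value for the weight variable.
print(normality_decision(0.003))
```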
6) Report the mean, median, and sample standard deviation of the weight variable for female
subjects in this data set.
Sample Mean = 55.60 kg
Sample Standard Deviation = 9.90 kg
Sample Median = 51.62 kg
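To get subgroup summaries like these, split the data by gender first and then apply the usual formulas. In SPSS this is typically done with Split File or by adding gender as a factor in Explore. The Python sketch below groups hypothetical records by gender and reports the mean, median, and sample SD for each group. The records and the function name summary_by_group are for illustration only.

```python
import statistics
from collections import defaultdict

def summary_by_group(groups, values):
    """Mean, median and sample SD of values within each group label."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    return {
        g: (statistics.mean(vs), statistics.median(vs), statistics.stdev(vs))
        for g, vs in buckets.items()
    }

# Hypothetical records, NOT the assignment's data.
gender = ["Female", "Male", "Female", "Male", "Female", "Male"]
weight = [48.0, 70.0, 52.0, 66.0, 62.0, 80.0]
for g, (m, med, sd) in summary_by_group(gender, weight).items():
    print(f"{g}: mean={m:.2f} kg, median={med:.2f} kg, sd={sd:.2f} kg")
```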