Exploratory Analysis (Human Age and Fatness)
Part 2: Exploratory Data Analysis (20 points)

In Part 1 of this assignment, you created an SPSS data file. This part of the assignment uses that data file to perform the following tasks (making charts) in SPSS to explore and understand the data. Label each chart in your answer with a figure number and title, and write a sentence or two describing what you see in the chart. (See the example on page 5 of this document.)

1. Make a histogram for the Weight variable to display its distribution. (Use a class width of 10.)

Figure 1: Histogram for Weight

Figure 1 above is a histogram of the quantitative variable weight. The distribution appears to be skewed to the right.

2. Make a frequency distribution table for the Gender variable to see its frequency distribution, and then make a bar chart.

Table 1: Frequency Distribution Table for Gender

Gender    Frequency    Percent (%)
Female    16           53.3
Male      14           46.7
Total     30           100.0

Table 1 above shows the frequency distribution of the gender variable. There were slightly more female subjects than male subjects in the sample.

Figure 2: Bar Chart for Gender

Figure 2 above is a bar chart for the qualitative variable gender. It shows that the frequency of females (16) is higher than the frequency of males (14).

3. Make a cluster bar chart to examine the relationship between the Gender and Daily Hours of TV Viewing variables. (Use the Daily Hours of TV Viewing variable as the category axis and the Gender variable as the cluster variable.)

Figure 3: Cluster Bar Chart for Gender versus Daily Hours of TV Viewing

Figure 3 above is a cluster bar chart of daily hours of TV viewing by gender. Male and female students show similar TV viewing distributions: for both genders, more students watched less than 2 hours of TV per day than 2 hours or more.

4. Make a scatter plot to examine the correlation between the Weight and Height variables, and write a sentence describing the trend you observe in the scatter plot.

Figure 4: Scatter Plot for Height versus Weight

Figure 4 above is a scatter plot showing the relationship between height and weight. There is a positive correlation: weight tends to increase as height increases.

5. A quality control officer recorded the average length (in inches) of a random sample of 10 steel frames from a production line. A sample was taken once every hour. Produce a time plot to display the trend.

Time     Average Length (inches)
8:00     5.1
9:00     4.9
10:00    5.1
11:00    5.2
12:00    5.0
13:00    5.3
14:00    5.5
15:00    5.9
16:00    6.5
17:00    7.7
18:00    9.6

Figure 5: Time Plot for Average Length

Figure 5 above is a time plot showing the trend in average length over time. The average length stays roughly stable, between 4.9 and 5.3 inches, from 8:00 to 13:00, then increases at an accelerating rate, reaching 9.6 inches at 18:00.

Part 3: Descriptive Measures (Please fill in your answers in this document.) (20 points)

1) Find the overall mean, median and sample standard deviation of the Weight variable in this data set.

Sample Mean = 62.84 kg
Sample Standard Deviation = 17.44 kg
Sample Variance = 304.13 kg²
Sample Median = 59.44 kg

2) Judging by the histogram, is the distribution of the weight data for these children symmetric and bell-shaped? (Circle, underline, or color your answer red.)

Yes    No

3) Report the percentage distribution of the Daily Hours of TV Viewing variable using the valid percentages, which exclude missing data.

Hours of TV Viewing    Relative Frequency
Less than 2 hours      58.62%
2 hours or more        41.38%

4) Report the percentage distribution of the Exercise Per Week variable using the valid percentages, which exclude missing data.
Exercise Per Week    Relative Frequency
0 days               20.00%
1 day                13.33%
2 days               13.33%
3 days               6.67%
4 days               20.00%
5 days               3.33%
6 days               10.00%
7 days               13.33%

5) Does the weight data suggest that it came from a normally distributed population? Perform a normality test and report its p-value, using 0.05 (5%) as the cutoff for the decision.

Shapiro-Wilk normality test p-value: 0.003

Your conclusion on normality (fewer than 30 words): Since the p-value (0.003) is less than 0.05, we reject the null hypothesis of normality; the weight data are unlikely to come from a normally distributed population.

6) Report the mean, median and sample standard deviation of the Weight variable for the female subjects in this data set.

Sample Mean = 55.60 kg
Sample Standard Deviation = 9.90 kg
Sample Median = 51.62 kg
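The descriptive measures in Part 3 use the sample (n − 1) formulas that SPSS reports. As a minimal sketch outside SPSS, assuming Python 3's standard `statistics` and `collections` modules, the hypothetical helpers below (`describe`, `valid_percentages`, `is_normal`; these names are not part of the assignment) show how the same quantities could be computed from a raw list of values. The original weight data are not reproduced in this document, so the usage lines only check figures reported here: the Table 1 gender counts, the variance and standard deviation pair, and the Shapiro-Wilk p-value.

```python
import math
import statistics
from collections import Counter


def describe(values):
    """Mean, median, sample SD and sample variance (n - 1 denominator)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),           # sample standard deviation
        "variance": statistics.variance(values),  # sample variance, in squared units (e.g. kg^2)
    }


def valid_percentages(values, decimals=2):
    """Percent of each category after dropping missing (None) entries,
    analogous to the 'Valid Percent' column in an SPSS frequency table."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    n = len(valid)
    return {k: round(100 * c / n, decimals) for k, c in counts.items()}


def is_normal(p_value, alpha=0.05):
    """Normality decision rule: reject normality when p < alpha."""
    return p_value >= alpha


# Table 1 counts: 16 female and 14 male subjects.
genders = ["Female"] * 16 + ["Male"] * 14
print(valid_percentages(genders, decimals=1))  # {'Female': 53.3, 'Male': 46.7}

# The reported variance and SD should agree, since SD = sqrt(variance).
print(round(math.sqrt(304.13), 2))  # 17.44

# Reported Shapiro-Wilk p-value for weight.
print(is_normal(0.003))  # False, so normality is rejected
```

Using `statistics.stdev` and `statistics.variance` rather than `statistics.pstdev` and `statistics.pvariance` matters here: the assignment asks for sample measures, which divide by n − 1.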