MATH30-6 Lecture 1

Introduction to Statistics and Data
Analysis
MATH30-6
Probability and Statistics
Objectives
At the end of the lesson, the students are expected to
• Define basic terms and phrases used in statistics;
• Identify the importance of statistics in everyday life;
• Compare and contrast descriptive and inferential
statistics; and
• Explain the concepts of methods of data collection and
presentation.
Statistics
The field of statistics deals with the collection,
presentation, analysis, and use of data to make decisions,
solve problems, and design products and processes. In
simple terms, statistics is the science of data.
Branches of Statistics
Descriptive Statistics (DS)
• Concerned with describing the characteristics and
properties of a group of persons, places or things.
• Based on easily verifiable facts or meaningful
information.
• Does not draw inferences or conclusions about a larger
set of data.
Descriptive Statistics
Examples
• How many passed in the recent Electrical Engineering
Licensure examination?
• In Applied Life Data Analysis (Wiley, 1982), Wayne
Nelson presents the breakdown time of an insulating
fluid between electrodes at 34 kV. The times, in
minutes, are as follows: 0.19, 0.78, 0.96, 1.31, 2.78,
3.16, 4.15, 4.67, 4.85, 6.50, 7.35, 8.01, 8.27, 12.06,
31.75, 35.52, 33.91, 36.71, and 72.89.
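The breakdown times above can be summarized with a few descriptive measures. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the lecture, and the values are typed in exactly as listed above.

```python
import statistics

# Breakdown times (minutes) of an insulating fluid at 34 kV, as listed above.
times = [0.19, 0.78, 0.96, 1.31, 2.78, 3.16, 4.15, 4.67, 4.85, 6.50,
         7.35, 8.01, 8.27, 12.06, 31.75, 35.52, 33.91, 36.71, 72.89]

n = len(times)                      # number of observations
mean = statistics.mean(times)       # arithmetic average
median = statistics.median(times)   # middle value of the sorted data
spread = max(times) - min(times)    # range: largest minus smallest
stdev = statistics.stdev(times)     # sample standard deviation

print(f"n = {n}, mean = {mean:.2f}, median = {median:.2f}")
print(f"range = {spread:.2f}, sample std. dev. = {stdev:.2f}")
```

These measures only describe the 19 observed values; no conclusion is drawn about any larger set of data.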
Branches of Statistics
Inferential Statistics (IS)
• Draws inferences about a population based on the data
gathered from the samples using the techniques of DS.
• Composed of those methods concerned with the
analysis of a smaller group of data leading to
predictions or inferences about the larger set of data.
• Generalizes about the whole group from an analysis of
a part of the group.
Inferential Statistics
Examples
• Is there a significant correlation between the amount
of time spent studying and the final grade in a computer
programming course?
• A study shows that ABET-accredited programs draw
more students to enrol at Mapúa Institute of Technology.
Population and Sample
Population
• Totality of all observations from which the dataset is
acquired
• All possible events should be considered.
• A value that describes a population is known as a
parameter.
Example:
There are 5,786 students enrolled in MATH10-1.
Population: Students of MATH10-1
Parameter: 5,786 (population size)
Population and Sample
Sample
• Small group taken from the population
• A group, as heterogeneous as possible, taken from the
large group to represent the population
• A value that describes a sample is known as a statistic.
Example:
Of the 5,786 students enrolled in MATH10-1, 3,456 are
females.
Sample: Female students in MATH10-1
Statistic: 3,456 (sample size)
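The difference between a parameter and a statistic can be sketched in a few lines. The population below is made up purely for illustration (it is not the MATH10-1 enrolment data), and the names `parameter_mean` and `statistic_mean` are assumptions of this sketch.

```python
import random
import statistics

# Hypothetical population: exam scores of 1,000 students (made-up values).
rng = random.Random(30)                      # fixed seed so the run is repeatable
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: a number computed from the whole population.
parameter_mean = statistics.mean(population)

# Statistic: the same kind of number computed from a sample only.
sample = rng.sample(population, 50)
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
```

The statistic is usually close to the parameter but rarely equal to it, and it changes from sample to sample.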
Variables
Variables are the characteristics being studied in statistics.
Qualitative Variables
• Also known as categorical data; the values are usually
non-numeric labels rather than quantities
• Examples are preferences, gender, civil status, and
location.
Variables
Quantitative Variables
• Also known as numerical data: information and
observations that are countable or measurable
quantities
• Examples are force, weight, height, voltage, current,
resistance, tensile strength, and grades.
Variables
Examples: Classify as Quantitative (QN) or Qualitative
(QL).
• Weekly allowance
• Income of parents
• Religion
• Age
• Address
• Educational attainment
• Jobs
• Schools attended
Categories of Quantitative Data
Continuous Data
• Measurable quantities that can take infinitely many
values within any interval
• Data measured by analog devices, with infinitely many
possible values obtained by interpolation
• Examples are height, weight, and ratio of persons.
Categories of Quantitative Data
Discrete Data
• Countable quantities that change in fixed, equal steps
• Data measured by digital measuring devices, which
tend to give exact values
• Examples are number of individuals and months of the
year.
Dependent vs Independent Variable
Independent Variable
• A naturally occurring phenomenon that can be altered
by increasing or decreasing its magnitude.
Dependent Variable
• A variable that is observed in response to the changes
applied to the independent variable.
Example: The number of hours spent studying is the
independent variable; the test score is the dependent
variable.
Dependent vs Independent Variable
Controlled Variable
• Kept constant so that it does not mask the effect of the
independent variable on the dependent variable
Extraneous Variable
• Not part of the study, and expected to have minimal
effect on how the dependent variable responds to the
independent variable
Scales of Measurement
• Nominal
- Assigning numbers to categorical data.
• Ordinal
- Assigning rank to the levels of data.
• Interval
- Assigning numeric values with a constant difference
between them.
• Ratio
- Assigning numeric values with a constant difference
and a non-arbitrary zero point.
Nominal Data
• Categorical data to which numbers are assigned as
labels.
• The applicable measurement is simply counting the
number of times the data fall into each category, for
example after assigning 1 for males and 2 for females.
• Other examples include course, civil status, color, and
preference.
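Counting per category, the one measurement that applies to nominal data, can be sketched as follows. The coded responses are hypothetical; only the 1 = male, 2 = female coding comes from the slide.

```python
from collections import Counter

# Hypothetical coded responses: 1 = male, 2 = female (codes are labels only).
codes = [1, 2, 2, 1, 2, 2, 1, 2]
labels = {1: "male", 2: "female"}

# The meaningful operation on nominal data is counting per category.
counts = Counter(labels[c] for c in codes)
print(counts)

# The most frequent category (the mode) is also meaningful.
mode_label, mode_count = counts.most_common(1)[0]
print(mode_label, mode_count)
```

Averaging the codes themselves would give a number with no meaning, since 1 and 2 are only names for the categories.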
Ordinal Data
• Quantities where the numbers are used to designate
the rank order of the data
• The correlation or effect of the ranking of a variable
can be measured. However, the interval between ranks
is not constant.
• Examples are results of a race, ranking in a beauty
pageant, and level of hardness of a material on the
Mohs scale.
Interval Data
• The range between the numeric values is constant.
• Addition and subtraction are applicable, but
multiplication and division are not.
• Multiplication and division can only be applied to the
differences between values.
• Zero point is arbitrary.
• Examples include years (1994, 2004, etc.), times (00:00,
20:00, etc.) and temperature in Celsius and Fahrenheit
scales.
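A short sketch of why ratios fail on an interval scale, using two Celsius readings. The Kelvin conversion is brought in here only for contrast, as a scale with a non-arbitrary zero; it is not part of the slide.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale temperature (Celsius) to the ratio-scale Kelvin."""
    return celsius + 273.15

t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = t2_c - t1_c                     # 10 degrees warmer

# Ratios are not: 20 °C is not "twice as hot" as 10 °C, because 0 °C is arbitrary.
naive_ratio = t2_c / t1_c
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(difference, naive_ratio, round(true_ratio, 3))
```

The naive ratio of 2 disappears once the same temperatures are expressed on a scale whose zero is not arbitrary.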
Ratio Data
• Widely used data in science and engineering
• Almost all the basic mathematical operations can be
performed on this data type.
• There is a non-arbitrary zero point.
• Examples include length, mass, angles, charge, and
energy.
Sampling
Sampling is the process of taking samples from the
population.
• Probability Sampling
- Every possible event is listed and each has a chance of
being selected as part of the sample. This eliminates
bias against events that would otherwise have no
chance of being selected.
• Non-Probability Sampling
- Some individuals are certain to be selected, while
others have no chance of being selected as part of the
sample.
Probability Sampling
• Simple Random Sampling
- Performed by arranging the population according to a
certain rule and numbering each element; the sample is
then taken by a randomizing method.
- Examples of randomizing methods are a table of random
numbers, a random number generator in a computer or
calculator, and the lottery or fishbowl technique.
- Each event in the population has an equal chance of
being selected as part of the sample.
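Simple random sampling with a computer's random number generator can be sketched as below. The function name `simple_random_sample` and the numbered population of 100 students are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical numbered population: students 1 to 100.
students = list(range(1, 101))
chosen = simple_random_sample(students, 10, seed=1)
print(sorted(chosen))
```

Fixing the seed only makes the run repeatable; leaving it as `None` gives a different sample each time.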
Probability Sampling
• Systematic Sampling
- Done by arranging the population according to a
certain order, dividing it into equal groups, and taking
the kth element of each group
Examples:
- Getting the temperature of the device every 4 hours
- Getting the voltage of a signal at a constant interval
and converting it to another signal
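The procedure can be sketched as follows, assuming the population is already ordered and the sample size does not exceed the population size. The function name `systematic_sample` and the hourly readings are hypothetical; choosing the position within the group at random is one common convention, not something the slide specifies.

```python
import random

def systematic_sample(ordered_population, n, seed=None):
    """Split the ordered population into n equal groups and take the same
    position from each group (a random start, then every k-th element)."""
    k = len(ordered_population) // n          # group size (sampling interval)
    start = random.Random(seed).randrange(k)  # random position within a group
    return [ordered_population[start + i * k] for i in range(n)]

# Hypothetical hourly temperature readings, labeled by hour 0..23.
hours = list(range(24))
picked = systematic_sample(hours, 6, seed=0)  # one reading every 4 hours
print(picked)
```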
Probability Sampling
• Stratified Sampling
- Done by grouping the population into strata,
subpopulations with generally homogeneous or similar
characteristics
- After the population is divided into strata, a random
sample is drawn from each stratum, with a size
proportional to the size of that stratum relative to the
population.
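A minimal sketch of drawing a proportional stratified sample, assuming the members of each stratum are already known. The function name `stratified_sample`, the strata, and the 10% sampling fraction are all hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum (the same fraction is taken from every stratum)."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = round(len(members) * fraction)
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical strata: student ID numbers grouped by year level.
strata = {
    "first_year": list(range(0, 200)),
    "second_year": list(range(200, 300)),
    "third_year": list(range(300, 350)),
}
picked = stratified_sample(strata, 0.10, seed=3)
print({name: len(ids) for name, ids in picked.items()})
```

Because the same fraction is taken from every stratum, larger strata contribute proportionally more respondents.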
Probability Sampling
• Stratified Sampling
Example: A survey will be conducted to find out whether
families living in a certain city are in favor of the
construction of a manufacturing plant. To ensure that all
income groups are represented, respondents will be
divided into:
Class A – high income
Class B – middle income
Class C – low income
Probability Sampling
• Stratified Sampling
Strata    Number of Families
A         1000
B         2500
C         1500
Total     N = 5000
- Using a 5% margin of error, how many families should
be included in the survey? Use Slovin’s formula:
$n = \dfrac{N}{1 + Ne^{2}}$
- Using proportional allocation, how many from each
group should be taken as samples?
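The two questions above can be worked through with a short sketch. Two conventions here are assumptions, not stated in the lecture: the sample size from Slovin's formula is rounded up to a whole family, and the allocation is rounded with the largest-remainder method so that the parts add up to n. The function names are hypothetical.

```python
import math

def slovin(N, e):
    """Slovin's formula n = N / (1 + N e^2), rounded up to a whole respondent."""
    return math.ceil(N / (1 + N * e ** 2))

def proportional_allocation(strata, n):
    """Split n across strata in proportion to size (largest-remainder rounding)."""
    N = sum(strata.values())
    exact = {name: n * size / N for name, size in strata.items()}
    alloc = {name: math.floor(x) for name, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

families = {"A": 1000, "B": 2500, "C": 1500}
n = slovin(sum(families.values()), 0.05)
allocation = proportional_allocation(families, n)
print(n, allocation)   # 371 {'A': 74, 'B': 186, 'C': 111}
```

With N = 5000 and e = 0.05, the formula gives 5000 / 13.5 ≈ 370.37, so 371 families under the round-up convention; rounding to the nearest whole number would give 370 instead.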
Probability Sampling
• Cluster Sampling
- Done by identifying groups called clusters,
subpopulations whose elements have characteristics as
heterogeneous or diverse as possible
- The clusters must be similar to each other with respect
to the parameter being examined.
- One or more clusters are selected as the sample.
- Preferred because it saves the time and money needed
to go to every cluster
- Example: Selection of a certain region.
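Cluster sampling can be sketched as picking whole clusters at random and keeping everyone inside them. The regions and household labels below are hypothetical, as is the function name `cluster_sample`.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # pick cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical regions (clusters) and the households in each.
regions = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2"],
    "East": ["E1", "E2", "E3", "E4"],
    "West": ["W1", "W2", "W3"],
}
chosen_regions, households = cluster_sample(regions, 2, seed=7)
print(chosen_regions, households)
```

Unlike stratified sampling, only the selected clusters are visited, which is where the savings in time and money come from.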
Non-Probability Sampling
• Convenience Sampling
- Based primarily on the availability of the respondents
- Used because of the convenience it offers to the
researcher
- Example: Gathering data through telephone.
• Quota Sampling
- There is a desired sample size, and respondents are
taken as they volunteer to become part of the
experiment.
- Similar to stratified random sampling
- Example: Phone call survey where the first 100 callers
are taken
Non-Probability Sampling
• Purposive Sampling
- The sample is obtained based on a certain premise.
- Example: A study about pregnant women where the
male population would have zero chance of being
selected as part of the survey
Summary
• There are two branches of statistics: Descriptive and
Inferential Statistics.
• Population is the totality of all observations from which
the dataset is acquired. Sample is a subset of
population.
• Variables are classified as quantitative or qualitative
and independent or dependent.
• The scales of measurement are nominal, ordinal,
interval, and ratio.
• Sampling techniques are classified as probability
(random, systematic, stratified, and cluster) and
non-probability (convenience, quota, and purposive).
References
• Montgomery and Runger. Applied Statistics and
Probability for Engineers, 5th Ed. © 2011
• http://en.wikipedia.org/wiki/Statistics
• http://writing.colostate.edu/guides/research/stats/index.cfm