Data Collection and Analysis - Software and Systems Engineering

Master seminar: Foundations in Empirical Software Engineering
Data Collection and Analysis: Introduction to Data Collection Methods
Madhura Kumaraswamy
DATA COLLECTION (1):
• Systematic approach to gathering information from various sources to answer research questions and test hypotheses
• Accurate data collection is essential to maintain the integrity of research
• Correct use of collected data is important to reduce errors
• The most time- and labor-intensive step in an empirical study
EXAMPLE STUDY:
Java, C/C++, Python, Actionscript, BASIC, PHP, ..., C#
DATA COLLECTION (2):
1. Set the goal of the research study
   • Describe the phenomenon being studied and the purpose of the study
2. Determine sources of data
   • Are the data sources reputable and reliable?
3. Collect data from sources using various methods
   • Qualitative, quantitative, mixed
4. Compare collected data with historical data
   • Assess quality and plausibility of collected data
5. Perform reliability and validity checks
   • Reduce bias, perform triangulation
6. Perform data analysis
Maxwell, J. A. (1998). Designing a qualitative study. Handbook of applied social research methods, pp. 69-100
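The comparison with historical data in step 4 can be sketched in code. This is a minimal illustration only: the function name `flag_implausible`, the threshold `k`, and the defect counts are assumptions made for this example and do not come from the seminar material. It flags collected values that lie far from the historical mean.

```python
from statistics import mean, stdev


def flag_implausible(collected, historical, k=3.0):
    """Return collected values more than k standard deviations
    away from the mean of the historical data."""
    mu = mean(historical)
    sigma = stdev(historical)  # sample standard deviation
    if sigma == 0:
        # All historical values are identical: any deviation is suspicious.
        return [x for x in collected if x != mu]
    return [x for x in collected if abs(x - mu) > k * sigma]


# Hypothetical defect counts per release (illustrative numbers only).
historical = [12, 15, 11, 14, 13, 16, 12]
collected = [13, 14, 95, 12]

suspicious = flag_implausible(collected, historical)
print(suspicious)
```

A flagged value is not necessarily wrong; it only marks a data point worth re-checking at its source before analysis.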
DATA COLLECTION (3):
Qualitative Methods
Collecting data based on:
• Experiences
• Emotions
• Opinions
• Social phenomena
Collected by direct contact with respondents
Quantitative Methods
Collecting numerical data used in:
• Aggregation
• Inferences
Historical data
QUALITATIVE DATA COLLECTION (1):
Qualitative Data Collection Methods:
• Interviews
• Participant Observation
• Focus Groups
• Questionnaires/Testing
QUALITATIVE DATA COLLECTION (2):
INTERVIEWS:
• Used to collect opinions and experiences from interviewees
• Data collected may not always be facts
• Structured, unstructured, semi-structured interviews
PARTICIPANT OBSERVATION:
• Social interaction between researcher and informants
• Observer not necessarily engaged
• Think aloud protocols: subject verbalizes his or her thoughts
• Fly on the wall: researcher must be unobtrusive
QUALITATIVE DATA COLLECTION (3):
FOCUS GROUPS:
• Combine interviewing and participant observation
• Group interactions generate new data
• Gain insights into the respondents’ behaviors, attitudes and language
QUESTIONNAIRES/TESTS:
• Sets of questions that require written answers
• Most common method: quick and easy
• Important: wording of the questions, layout of the forms, ordering of the questions
QUALITATIVE DATA COLLECTION (4):
Advantages:
• Rich, highly detailed data
• Contact with respondents
• Can explore topics in depth
• Behavioral aspects of responses
• Close to source of data
Disadvantages:
• Large volume of information
• Expensive and time-consuming
• Requires highly trained professionals
• Participants’ changing behavior
• Observer bias distorts data
QUANTITATIVE DATA COLLECTION (1):
Quantitative Data Collection Methods:
• Raw Data
• Surveys/Polls
• Aggregated Data
• Automatic methods
• Inferred Data
QUANTITATIVE DATA COLLECTION (2):
SURVEYS/POLLS:
• Series of questions that a person answers
• Provide descriptive data
• Collect data from a large sample of anonymous people
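As an illustration of the descriptive data a survey can provide, the sketch below summarizes answers to a single question. The 1 to 5 rating scale, the function name `describe_likert`, and the sample answers are assumptions for this example, not part of the seminar material.

```python
from collections import Counter
from statistics import median


def describe_likert(responses, scale=(1, 2, 3, 4, 5)):
    """Summarize answers to one survey question on an ordinal scale."""
    if not responses:
        raise ValueError("no responses to summarize")
    counts = Counter(responses)
    n = len(responses)
    return {
        "n": n,
        "median": median(responses),
        # Fraction of respondents choosing each level of the scale.
        "share": {level: counts.get(level, 0) / n for level in scale},
    }


# Hypothetical anonymous answers to one question (1 = disagree, 5 = agree).
answers = [4, 5, 3, 4, 2, 4, 5, 1, 4, 3]
summary = describe_likert(answers)
print(summary)
```

The median is used rather than the mean because rating scales of this kind are ordinal.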
AUTOMATIC METHODS:
SIMULATION:
• Abstraction of real-life process
• Based on experimental data
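One possible reading of a simulation based on experimental data is to resample observed measurements. The sketch below is an assumption-laden illustration: the function `simulate_total_time`, the task durations, and the number of runs are all hypothetical. It estimates the total time for a batch of tasks by drawing repeatedly from the observed durations.

```python
import random


def simulate_total_time(observed_durations, tasks, runs=1000, seed=0):
    """Simulate the total time for `tasks` tasks, drawing each task's
    duration at random (with replacement) from the observed durations."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    totals = []
    for _ in range(runs):
        total = sum(rng.choice(observed_durations) for _ in range(tasks))
        totals.append(total)
    return totals


# Hypothetical observed task durations in hours (illustrative only).
observed = [2.0, 3.5, 1.0, 4.0, 2.5]
totals = simulate_total_time(observed, tasks=10)
mean_total = sum(totals) / len(totals)
print(f"mean simulated total: {mean_total:.1f} hours")
```

The model abstracts away everything except task duration, which is exactly the kind of simplification that limits how far simulated results can be generalized.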
INSTRUMENTING SYSTEMS:
• Instrumentation built into software tools
• Record information automatically, system monitoring
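A minimal sketch of instrumentation, assuming the tool under study is written in Python: a decorator records each call and its duration without any action from the user. The in-memory list `EVENTS` and the function `run_refactoring` are hypothetical stand-ins; a real study would more likely write events to a file or database.

```python
import functools
import time

EVENTS = []  # in-memory event store (an assumption for this sketch)


def instrumented(func):
    """Record the name and duration of every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call succeeded or raised an exception.
            EVENTS.append({
                "name": func.__name__,
                "seconds": time.perf_counter() - start,
            })
    return wrapper


@instrumented
def run_refactoring(n):
    """Hypothetical tool operation used only to generate events."""
    return sum(range(n))


run_refactoring(1000)
run_refactoring(10)
print(len(EVENTS), "events recorded")
```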
ANALYSIS OF LOG FILES:
• Analyze log files generated by build tools
• Automatically extract information
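Extracting information from build logs might look like the sketch below. The log line format (`<date> BUILD <status> duration=<n>s`) is invented for this example; real build tools each have their own formats, so the regular expression would need to be adapted.

```python
import re
from collections import Counter

# Hypothetical log line format, assumed only for this sketch.
LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) BUILD "
    r"(?P<status>SUCCESS|FAILURE) duration=(?P<secs>\d+)s$"
)


def summarize_builds(lines):
    """Count builds and failures and compute the mean build duration."""
    statuses = Counter()
    total_secs = 0
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not build records
        statuses[match["status"]] += 1
        total_secs += int(match["secs"])
    builds = sum(statuses.values())
    return {
        "builds": builds,
        "failures": statuses["FAILURE"],
        "mean_secs": total_secs / builds if builds else 0.0,
    }


sample_log = """\
2024-01-05 BUILD SUCCESS duration=40s
2024-01-05 BUILD FAILURE duration=20s
some unrelated line
2024-01-06 BUILD SUCCESS duration=60s
""".splitlines()

report = summarize_builds(sample_log)
print(report)
```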
QUANTITATIVE DATA COLLECTION (3):
Advantages:
• Allow broader study, generalization of results
• Greater objectivity, accuracy of results
• Allow summarizing of information
• Personal bias can be avoided
Disadvantages:
• Limited results (numerical descriptions, no human perception)
• Laboratory environment not same as real world
• Cannot be applied for all phenomena/situations
MIXED METHODS:
Mixed Methods combine Qualitative Methods and Quantitative Methods:
• Merging: interview data + analysis of log files
• Sequencing: participant observation followed by simulation
• Coding
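Merging interview data with log file analysis can be sketched as a join on a shared participant identifier. The identifiers, the interview codes, and the `failed_builds` metric below are hypothetical, and the function `merge_by_participant` is only one simple way to combine the two sources.

```python
def merge_by_participant(interview_codes, log_metrics):
    """Combine qualitative codes and quantitative metrics for every
    participant who appears in both data sources."""
    merged = {}
    for pid in interview_codes.keys() & log_metrics.keys():
        merged[pid] = {"codes": interview_codes[pid], **log_metrics[pid]}
    return merged


# Hypothetical themes coded from interviews (qualitative).
interview_codes = {
    "P1": ["slow builds", "flaky tests"],
    "P2": ["tooling"],
    "P3": ["slow builds"],
}
# Hypothetical metrics extracted from build logs (quantitative).
log_metrics = {
    "P1": {"failed_builds": 7},
    "P2": {"failed_builds": 1},
}

merged = merge_by_participant(interview_codes, log_metrics)
for pid in sorted(merged):
    print(pid, merged[pid])
```

Participants missing from either source are dropped here; whether that is acceptable depends on the study design.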
RELIABILITY AND VALIDITY:
HAWTHORNE EFFECT (Ritch Macefield, 2007):
• People act differently when they are being observed
Reliability:
• Contingency checks: analyze dependent attributes
• Verification: using secondary data, assess quality
Validity:
• Sampling: subset must represent whole population
• Triangulation: data from multiple sources in multiple ways
• Controlling bias, types: interviewer, measurement, recall, time
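One way to keep a subset representative of the whole population is proportional stratified sampling, sketched below. The technique is chosen here for illustration and is not named in the slides; the function `stratified_sample`, the developer population, and the 10% fraction are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, fraction, seed=0):
    """Draw roughly `fraction` of each stratum so that the sample
    mirrors the composition of the population."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        # Take at least one member from every stratum.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 junior and 40 senior developers.
developers = [
    {"id": i, "role": "junior" if i < 60 else "senior"} for i in range(100)
]
sample = stratified_sample(developers, key=lambda d: d["role"], fraction=0.1)
juniors = sum(1 for d in sample if d["role"] == "junior")
print(len(sample), "sampled,", juniors, "junior")
```

A simple random sample of 10 developers could by chance contain only juniors; stratifying by role rules that out.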
SUMMARY:
• Data collection: gathering information from various sources
• Process of data collection: six steps
• Qualitative data collection: interviews, participant observation, focus groups, questionnaires/tests
• Quantitative data collection: surveys, automatic data collection
• Mixed data collection: merging, sequencing, combining
• Ensuring reliability: contingency checks, verification
• Ensuring validity: sampling, triangulation, controlling bias
THANK YOU
Questions?
