High-Stakes Testing and Mathematics Performance of Fourth Graders in North Cyprus
Author(s): Osman Cankoy and Mehmet Ali Tut
Source: The Journal of Educational Research, Vol. 98, No. 4 (Mar. - Apr., 2005), pp. 234-243
Published by: Taylor & Francis, Ltd.
Stable URL: http://www.jstor.org/stable/27548083 .
Accessed: 05/11/2014 10:56
This content downloaded from 128.192.114.19 on Wed, 5 Nov 2014 10:56:53 AM
All use subject to JSTOR Terms and Conditions
High-Stakes
and
Testing
Performance
of
Fourth
North
Teacher
MEHMET ALI TUT
North
Academy,
Training
Eastern
Cyprus
for
Grade
1,006
skills
taking
matics
that
the
who
better,
students
performed
than did
items,
schools.
Cyprus
time on test
spent more
in routine
mathe
especially
students
on
time
less
who
that
approach
as (a)
working
test
spent
was no difference
in test results
skills. There
observed
taking
did indi
from nonroutine
However,
story problems.
analysis
too much
cate that spending
time on test-taking
skills
led to
memorizing
and
procedures
on
cuing
surface
attributes
of
of
ways
a
problem.
skills
between
cation
generations
tant
education
of any
that
has
education
matics
critical
or
as well
negative
or
system
and
teaching
Researchers
testing.
have
on
give
reliable
producing
influences
unexpected
can have
on
A
crucial
dents
and
sort
of
in the
scored
test
because
answers
items,
the
sometimes
they
testing
in education
is standard
in the
are
for making
used
tests
high-stakes
complete
in comparison
with
that
usually
are
taken
standardized
classroom
weekly
ment
only
a limited
routine
of
the
items,
or
steps.
had
However,
did
students
procedure.
to imple
for nonrou
student
not
to apply
have
If there were an algorithm
the
follow,
that could be
problem
the
mathe
mathematics
algorithm
problems,
number
nonroutine
and
a
study,
a standard
mathematics
could
used
students
were
student
had
mathematics
for
in the
used
In this
stan
study
the
the
to examine
classroom
we
items,
routine
sought
that the
the math
the
mathematics
contexts
that
problems
and textbooks.
practices
answers
to the following
of
performance
routine
answering
the
often
questions.
and
the
students
nonroutine
in
and
solving
mathematics
prob
lems differ?
stu
all
2.
conditions
are high
Is there
of
stakes
a
gender
students
mathematics
decisions
when
effect
they
on
the mathematics
solve
routine
and
performance
nonroutine
problems?
1999). Students
tests
quizzes
routine
1. Does
major
about the students (Marcus, 1994; Popham,
usually
a year,
same
tests
of mathematics
influ
is high-stakes
sense
that
the
under
questions
same manner.
The
For
items
It is the unexpected
learners.
of
questions
nonroutine
solving
routine
this
with
to the nonroutine
overlook
test
that
consequences
test
ematics problem and apply the algorithm flexibly. Contrary
affect the quality of testing.
which
testing,
answer
the same
are
test
In
or answered
student
concentrate
and
importance
and
and
teachers
ences that ultimately
dardized
prime
testing
characteristics
indicators
good
to solve
graders
tine mathematics
system as a whole (Amrein & Berliner, 2002; Paris, 1995; Popham, 1999, 2001). Although educa
routinely
as
any formal algorithms.
and the education
tors
surface
emphasize
publications
problems.
solved
most
the
of
and
multiple-choice
problem represented a textbook-like
is not
poorly
objective
prepared,
as detrimental
on students
effects
when
testing,
One
systems.
is assessment,
process
learning
found
and
the
elements
important
is attracting
words
problems
ity of fourth
edu
scrutiny has led to the rethinking of impor
of
aspects
of future
the
test,
performance (e.g., National Council of Teachers of Mathematics [NCTM], 1989, 1991, 2000), so we analyzed the abil
and the growth
communities,
such
practices
or prior
a current
from
questions
answering
cited
mathematics
of competition
attention. That
for
Many
With the progress of globalization
classroom
rule memorizing.
non
and
test
for cue
seeking
problem)
and
routine
tests,
on
instructional
test-driven
several
(b) giving students test questions for drill, and (c) teaching
students standard algorithms and procedures
(especially
a
standardized
Key words:
high-stakes
routine mathematics
items,
test-taking
of
consisted
the mathe
that
considered
graders. We
an important
was
to
linkage
skills
standardized
high-stakes,
standard
high-stakes,
approach affected
fourth
spent
Cyprus
a
how
instructional
of
performance
on
test-taking
time
in 28 North
students
4
revealed
to determine
attempted
matics
North
University,
ized test-driven
of
Analysis
Mediterranean
We
to determine
The
authors
the
attempted
a high-stakes
standardized
instruc
testing-driven
on mathematical
tional
The
authors
approach
performance.
a multiple-choice
test
mathematics
developed
performance
ABSTRACT
effects
in
Graders
Cyprus
OSMAN CANKOY
Atatürk
Mathematics
once
or
or
twice
Address
end-of-unit
No:24,
or monthly.
to Osman
Cankoy,
correspondence
via Mersin
North
Cyprus,
Lefkosa,
[email protected])
12. Gelibolu
10, Turkey.
Sok.
(E-mail:
3. Is there any difference among the groups of students
(those who spent more and those who spent less time on
number
Is there
4.
any
of mean
operation
(those
spent
test-taking
skills)
story
problems?
surface
orization,
which
and
on
time
less
on
routine
and
rote mem
reveal
could
or
understanding,
inmathematical
words
on
time
routine
spent
scores
of mean
of distractors,
choice
on
the groups of students
who
those
and
nonroutine
Is the
students
less
spent
scores
among
more
in terms
6.
and
problems?
5. Is there any difference
who
of
groups
who
those
and
in terms
skills)
test-taking
routine
of the broader domain
understanding
is
A
familiar
with a state English test
teacher
who
(Kober).
could prepare students by drilling them in a few dozen
items, group dependent?
the
systems
standardized
such
are
testing
systems,
are
that
assessment
commonplace
in which
those
especially
high-stakes
In
the world.
around
are
tests
the
most
fundamental
oriented
education
students'
motivation
and
abilities.
Overstressing
subvert
students'
on
strategies
gies taught every day in the classroom (Paris, 1995).
Educators primarily use high-stakes standardized testing to
sort
of
large numbers
sible;
however,
answer
or
test
multiple-choice
construction,
objective
usually
questions.
Also
important
speaking, and creating, which
are
schools,
dardized
or
testing
also
might
teaching activities.
test
status
on
a heavy
reliance
on an examination-oriented
to the test,
Teaching
system,
with
raises
scores
important
produce
that
stan
high-stakes
to the
the
deal"
their
time
(Quality Counts,
duces
in
the
States, 79% of teachers said that they spent "a great
of
unproductive
When
misleading.
tion on a test,
the
students
instructing
2001). Teaching
and
teachers
resulting
uncritical
in test-taking
but
students
directly
scores
likely
to a
give
an
also
curriculum
is part
and
of
an
process
ongoing
least
some
change
could
teachers
of
aspects
change
the most
in the
skills,
concepts
test.
For
and manner
curriculum
contained
and
to the
teaching
the
important knowledge,
for a partic
standards
ular subject; (b) addressing standards for basic and higher
order skills; (c) using test data to diagnose areas in which
students are weak and focusing in those areas; and (d) giv
diverse
students
ing
teachers,
and Camilli
school
included
to
teachers
place
problem
discussion
(2001)
assessments
Firestone,
and
of
test-item
formats,
greater
than
emphasis
use of hands-on
did
solving,
explanation
Jersey
Monfil,
found that the state's ele
in mathematics
a mix
and
connect
and
apply
In a study of New
researchers
University
Rutgers
mentary
writing,
to
opportunities
they learn (Kober, 2002).
of
their
science,
encouraged
other
materials,
states
and
on
stu
thinking.
Standardized
High-Stakes
in North Cyprus
Testing-Oriented
Education
skills
to the test not only pro
teach
suc
lifelong
In one
conducted
surveys
representative
nationally
United
and
performance
of teaching by (a) teaching
dent
knowledge
and skills in the subject being tested (Kober, 2002).
of
on
effects
students'
increasing
at
which
education
negative
decision making,
students monitor
Mayrowetz,
1989).
(Bowers,
traditional
inconsistent
their personal progress (Corono, 1992). Although replacement of high-stakes standardized tests by other more positive tests may be difficult, educators
what
acting,
In general, any form of teaching
without
of
type
can and should be taught in
to second-class
relegated
in short
results
as writing,
such
skills
as pos
a manner
in as efficient
students
this narrow
classroom
example,
strate
learning
student
about
of
and
Positive features of high-stakes standardized testing might
be attained through replacement by performance testing or
portfolios of work samples in which assessment is linked to
can
might
examination
with
inconsistent
usually
student
also
results
because
inaccurate
Good Testing Practices
in which
the
overemphasizing
measures
of
by
and
examination
learning
are
strategies
taking
learning
are
cess (Dais, 1993).
the
distort
might
as a result
suffer
may
that
tests,
and students with
pre
examination
examinations
as outcomes
scores
of
importance
system
an
with
problem
is that
and
standardized
high-stakes
exist. Minorities,
to be used for prediction,
inferences
and
pared badly, educators might have difficulty producing critical and reflective students. Those students who spend too much time on high-stakes standardized tests might have problems integrating what they are learning or applying their knowledge and skills to real situations.
The
of
validity
practices
yet continue
toward
oriented
tests,
a valid
be
should
performances
in particular,
disabilities
Standardized Testing
earlier
of their knowledge and skills (Thurlow, Quenemoen, Thompson, & Lehr, 2001). Even if one can
other problems might
Problems With High-Stakes
Education
test
students'
Quenemoen,
Literature
on
appeared
reliable measure
guarantee
Related
often
out of the hundreds of vocabulary words that students are
expected to learn (Popham, 2001).
High-stakes standardized testing is not only a barrier for good teaching practices and the resulting higher order student skills but also might be a waste of money. Haney, Madaus, and Lyons (1993) estimated that "taxpayers in the USA are devoting as much as $20 billion annually in direct payments to testing companies" (p. 95). In such a testing
system,
cue
for only
searches
have
that
words
vocabulary
the
among
more
spent
nonroutine
on
scores
problems?
difference
who
(those
of mean
in terms
skills)
test-taking
nonroutine
ture of students'
can
specific
inflated
be
ques
pic
In North Cyprus, education at the elementary and secondary school level is highly centralized and under the control of the Ministry of Education. Students enter elementary school at age 6 and leave at age 11 (from first grade to
fifth grade). Most of the people in North Cyprus value education and educated people. As a result of that attitude,
to overemphasize
testing
entrance
examinations,
tend
people
sonant
with
and
can
be
working
called high-stakes standardized tests. In addition, North Cypriots value knowing mathematics and obtaining good
giving
in mathematics.
grades
At
instructional
general
the
classrooms
ries;
in the
the
and
approaches
with
aligned
one can
3 years,
first
activities
mathematics
school
elementary
are
also
observe
theo
learning
constructivist
learning perspectives. At the end of each school year, most fifth graders (nearly one third of all elementary school graduates) in North Cyprus take the Entrance Examination for the Middle Schools (EEMS), for which the general medium of instruction is English. The examination is considered by the majority of families in North Cyprus as the most important key in the future academic life of students. The EEMS is prepared and administered once a year by the Ministry of
in
the
students actual test questions
students
standard
and
algorithms
teachers
preservice
are
Cyprus
families
many
on
test.
Each
year,
spend a large amount of their family budget
to prepare
lessons
private
to the
to teaching
mostly
geared
of North
schools
children
their
for
test.
the
Although EEMS is considered the most important high-stakes standardized test in North Cyprus, the Ministry of Education has not gathered empirical evidence about its reliability and validity for the last 25 years. That lack of evidence could be problematic and misleading for the overall education system in North Cyprus.
In the academic year 2000-2001, data collected
interviews
vice
and
questionnaires
activities
training
with
during
and
fourth-
through
group
ities
without
and
prior
rest
the
spent
test
instruction.
items
The
ele
fifth-grade
at
spent
least
of
70%
class
their
time
working
on
actual
test questions from a current test, (b) 85% of the teachers
gave their students actual test questions for drill, and (c)
75% of the teachers taught their students test-taking skills
and had them practice with tests from prior years. The data
also
that
showed
of
nearly
70%
matics
because
semester
time
teachers
spent
fifth-grade
on
and mathe
language
the EEMS consisted
and
for mathematics
and
fourth-
their
language
of two batteries of tests
skills.
Participants
We randomly selected 28 schools out of 83 schools (n = 1,006) in North Cyprus. Then, from each selected school, we chose a number (ranging from 1 to 4) of fourth-grade
We
to determine
for observation
that
was
spent
on
28 volunteer
trained
perform
one
which
of
and
worksheets
students
most
of
from
in which
schools
nearly
skills
spent on test-taking
group (MEG; n = 207). That
on
its time
activ
current
used
generally
as the main
source
of
activities
noninstructional
The
practices.
students
who
in which
schools
attended
nearly 30% of the class time was spent on test-taking skills
formed the low-emphasis group (LEG; n = 448). That group
the
spent
on
its time
of
remainder
guided by the
of Education.
instructional
ities, along with
textbook recommended
by the Ministry
That
group
with
spent
the MEG.
the
groups,
more
on
time
cally
multiple-choice-type
and
the
activities
students
"teaching
compared
to be applied
for all
standard
in answering
was
test
"rule
category
those
to the observations
According
category
activ
noninstructional
activities
questions"
was
memorizing"
algo
specifi
used
used
most,
The
least.
of Education was
book recommended
by the Ministry
instructional
based primarily on traditional
approaches
and had few indirect relations with conceptual
under
and
standing
nonroutine
problem
solving.
Instrument
We developed a 36-item, multiple-choice Mathematical Performance Test (MPT) and adapted it for this study to measure the mathematical performance of fourth graders. The test included three subtests (Number Skills, Operations With Numbers, and Story Problems) and six dimensions: (a) Routine Number Items (RNI), (b) Nonroutine Number Items (NRNI), (c) Routine Operation Items (ROI), (d) Nonroutine Operation Items (NROI), (e) Routine Story Problems (RSP), and (f) Nonroutine Story Problems in
Method
time
answering
and (d) memorizing
and instructional activities guided by the textbook recommended by the Ministry of Education for regular classroom
measure
classes
(b)
(c) teaching
for
noninstructional
Teachers
time was
spent
on
its time
of
a textbook.
rithms and procedures
inser
the
school teachers showed that (a) 65% of the teach
mentary
ers
structured
(a)
test,
the listed categories had occurred in each 1-min period.
Each class was observed for at least 6 class hr (each class
hour was nearly 40 min). The students from schools in
which nearly 70% of class time was spent on test-taking
skills formed the high-emphasis group (HEG; n = 351). That
group
in elementary
for drill,
to observe
tried
Education.
approaches
sheet were
rules.
which
instructional
test-taking
or prior
procedures
questions,
especially multiple-choice
teachers'
The
a current
from
questions
Research
coded
classrooms.
listed on the observation
test
50% of the class
formed the moderate-emphasis
Because of this high-stakes standardized testing,
usually begins at the fourth through the fifth grade,
teacher
preservice
activities
on
The
thought-provoking
with
congruent
in
used
techniques
behaviorist
the
level,
each
sheet,
instructional
skills categories
con
preparation
which
observation
of Educational
Journal
the
observations.
the
skills
test-taking
a
structured
teachers
time
Nonroutine
five,
Contexts
seven,
eight,
sion (see Table
in mathematics.
elementary
preservice
Using
percentage
of class
to
unit
rienced
elementary
the Ministry
and
1).We
items,
were
There
(NRSP).
six
seven,
in each
respectively,
three,
dimen
previously gave the items to 13 expe
school
of Education
teachers
and
5
inspectors
and asked them to categorize
from
the
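TABLE 1. Selected Mathematics Test Problems

Number skills
  Routine question
    Item 1: How is "three hundred thousand and fifteen" represented by numerals?
    (A) 315   (B) 3,015   (C) 30,015   (D) 300,015
  Nonroutine question
    Item 9: Which one of the following numbers cannot be written in the place of "a" if "13a" is a three-digit even number?
    (A) 4   (B) 3   (C) 2   (D) 0

Operations with numbers
  Routine question
    Item 15: What is the result of "2,500 x 26"?
    (A) 65,000   (B) 62,000   (C) 55,500   (D) 6,500
  Nonroutine question
    Item 14: Which operation can be used in order to find the value of "a" with the shortest way in "524 + a = 816?"
    (A) Addition   (B) Subtraction   (C) Multiplication   (D) Division

Story problems
  Routine question
    Item 24: A factory can produce 2,007 toys daily. How many toys can be produced in 2 weeks?
    (A) 4,004   (B) 4,014   (C) 10,035   (D) 28,098
  Nonroutine question
    Item 36: Ali, Mehmet, Oya, and Cem completed a race in 10, 8, 9, and 7 seconds respectively. Who is the first?
    (A) Ali   (B) Mehmet   (C) Oya   (D) Cem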
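As a rough illustration of how the answer options of Item 24 relate to the numbers in the problem, the sketch below multiplies the daily output by a few candidate multipliers and checks which option each product reproduces. The variable names and the list of candidate multipliers are assumptions made for illustration; the study does not state how the distractors were built.

```python
# Item 24 (Table 1): a factory can produce 2,007 toys daily;
# how many toys can be produced in 2 weeks?
DAILY_OUTPUT = 2007
OPTIONS = {"A": 4004, "B": 4014, "C": 10035, "D": 28098}

# Candidate multipliers (an assumption for illustration only):
# the surface number 2, a 5-day week, a 7-day week, and 14 days.
CANDIDATE_MULTIPLIERS = (2, 5, 7, 14)


def match_options(daily, options, multipliers):
    """Map each option letter to the multiplier that reproduces it, if any."""
    matches = {}
    for letter, value in options.items():
        matches[letter] = next(
            (m for m in multipliers if daily * m == value), None
        )
    return matches


matches = match_options(DAILY_OUTPUT, OPTIONS, CANDIDATE_MULTIPLIERS)
for letter, multiplier in matches.items():
    print(letter, OPTIONS[letter], multiplier)
```

Under these assumptions, option D equals 2,007 x 14 (the 14 days in 2 weeks), option B equals 2,007 x 2 (the number that appears on the surface of the problem), option C equals 2,007 x 5, and option A matches none of the candidates.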
as
items
routine
and
nonroutine
by
the
considering
teach
.71, and
.61, respectively. Alpha
including
the
and
evaluation
36
items,
was
and
expert
for the test,
reliability
.86. We
asked
two mathematics
a measurement
Middle East Technical University in Turkey to judge the content validity of the test. We provided each of the experts with information that included the definition of what we
intended
to be measured,
the
instrument,
and
the
surface
memorization,
understanding,
or
36: Ali, Mehmet,
and
Oya,
a race in 10, 8, 9,
completed
is
and 7 seconds
Who
respectively.
the first?
a
(A) Ali
(C) Oya
search
for
only cue words. All the experts concluded that the content
and format of the items were consistent with the definition
of the variable and the study participants.
became
School
us
for
we
conducted
this
study,
we
sought
permission
to
the
conduct
in the 28 schools
preservice
present
provide
for
a week
observation
the
results,
trained 28
so
the live classroom observations. We
into three groups (HEG, MEG,
participants
to the
administer
simultaneously. We
teachers
elementary
and
study
conduct
could
they
divided
LEG)
observers
the
and,
adminis
tered the MPT.
of Data
Analysis
We
methods.
based
We
observe
terms
Before
to
agreed
to the Ministry
of Education,
the Elementary
Education Division
offered permission and assis
the MPT
to
of the
from the Elementary School Education Division
Ministry of Education to develop and administer the EEMS
within a high-stakes context for the 2000-2001 academic year. The Elementary School Education Division noted
would
be
for
the
data
that
could
that permission
given
only
we
after
However,
impossible.
feedback
multivariate
Procedures
(B)Mehmet
(D) Cem
be collected from the EEMS. Unfortunately, because of the lack of evidence for the reliability and validity of the EEMS, conducting the research with that requirement
according
description of the intended sample. We also asked the experts to judge whether the distractors of the items could enlighten
rote
Item
tance
from
educators
(B) Subtraction
(D) Division
Cem
(B) 4,014
(D) 28,098
ing and learning practices in North Cyprus elementary
schools. The fourth graders had 1 hr to answer the items on
the MPT. Alpha reliability of the subtests (Number Skills, Operations With Numbers, and Story Problems) was .75,
in the
is a three-digit
place of "a" if "13a"
even number?
by numerals?
number
detailed
given
variable
the
choice
problems
with
operations
chi-square
to determine
and
of
(HEG, MEG,
items
and
answering
used
analysis
to
the
among
in
mathematics
skills,
We
problems.
differences
the
performance
nonroutine
and
to
of
on quantitative
the data analysis mainly
used analysis of variance
and
(ANOVA)
analysis of variance (MANOVA)
procedures
LEG). We
routine
whether
based
related
and
numbers,
dependent
in
groups
solving
that were
procedures
were
three
to
story
conduct
the
responses
on
the
the
group
interpreta
a
tions of the effect
Cohen
the
(1988);
study
was
sizes observed
on
the level of significance
MEG,
.05.
in routine
was
Results
were
one
The
intercorrelated.
related
to nonroutine
story
was
exception
problems.
the dimension
of
the
the
difference, we used MANOVA procedures to analyze the first five dimensions (RNI, NRNI, ROI, NROI, RSP). When we analyzed the performances of the three groups (HEG,
However,
scores
TABLE
2.
Between
Intercorrelations
Subdimensions
of
ear
Subdimension
.41*
?
3. ROI
6. NRSP
.60*
.62*
-.02
.48*
.46*
-.05
.64*
?
.61*
-.02
.66*
-.02
routine
Note. MPT = Mathematical Performance Test; RNI = routine number items; NRNI = nonroutine number items; ROI = routine operation items; NROI = nonroutine operation items; RSP = routine story problems; NRSP = nonroutine story problems in nonroutine contexts.
that
*p < .05.
for Routine
of mean
had
gender
and Nonroutine
Number
Group
Number
no
Low
scores
significant
Problems
problem
Nonroutine
MSD
SD n
TABLE 3. Descriptive Statistics for Routine and Nonroutine Number Problems

                        Routine          Nonroutine
Group                   M       SD       M       SD       n
High emphasis
  Girls                 4.56    1.61     1.39    0.85     172
  Boys                  4.27    1.66     1.23    0.90     179
  Total                 4.41    1.63     1.30    0.88     351
Moderate emphasis
  Girls                 4.32    1.77     1.21    0.83     108
  Boys                  4.45    1.59     1.36    0.83     99
  Total                 4.38    1.68     1.29    0.83     207
Low emphasis
  Girls                 3.69    1.73     1.06    0.79     233
  Boys                  3.56    1.85     0.94    0.89     215
  Total                 3.63    1.79     1.00    0.84     448
Total
  Girls                 4.11    1.74     1.20    0.83     513
  Boys                  4.00    1.77     1.13    0.89     493
  Girls and boys        4.06    1.76     1.17    0.86     1,006
and boys
effect
on mean
problems.
that
resulted
from
RNI,
small effect sizes.
emphasis
Girls
main
mathematics
(see Table 5).
High emphasis
Moderate
items,
Skills
Routine
M
than
higher
mathematics
significant
and RSP with
RNI and NRNI
Statistics
significantly
Table 5 shows that group had significant effects on mean
scores of RNI and NRNI with small effect sizes. However,
group was more effective on RNI compared with NRNI
(see effect sizes in Table 5). Bonferroni post-hoc tests (p <
.05) revealed no significant difference between the HEG
and the MEG in terms of mean scores on RNI and NRNI.
However, both groups had significantly higher mean scores
than did the LEG on RNI and NRNI. Results also revealed
-.03
3. Descriptive
was
nonroutine
nonroutine
and
combination
Number
?
TABLE
no
had
gender
for
NRNI, NROI,
.45*
NROI
?
5
.68*
?
4.
5. RSP
4 6
3
?
l.RNI
2. NRNI
12
on
score
MANOVA results in Table 4 revealed that group and gender and their interaction had significant effects on the lin
for Fourth Graders (N = 1,006)
theMPT
items
mathematics
mean
t(1,005) = 43.285, p < .05. That finding also was confirmed by a large effect size (d = 1.36). Later analysis showed significant but small group-dependent main effects (considering small effect sizes) on mean scores of routine, F(2, 1,006) = 35.239, p < .01, η² = .07, and on nonroutine mathematics problems, F(2, 1,006) = 24.644, p < .05, η² = .05, favoring the HEG and MEG compared with the LEG.
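The reported effect size for the paired-samples t test can be checked with a short calculation. The sketch below assumes the convention d = t / sqrt(n) for paired designs, with n = 1,006 (df + 1); the article does not say which formula was used, so this is one plausible reading that happens to reproduce the published value. The function name is ours.

```python
import math


def cohens_d_from_paired_t(t_value, n_pairs):
    """Effect size for a paired-samples t test, assuming d = t / sqrt(n)."""
    return t_value / math.sqrt(n_pairs)


# t(1,005) = 43.285 implies 1,006 paired observations.
d = cohens_d_from_paired_t(43.285, 1006)
print(round(d, 2))  # 1.36
```

The result agrees with the d = 1.36 stated in the text, which supports the assumption that the effect size was derived from the t statistic and the sample size.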
of the MPT
Because
Research
LEG) in solving nonroutine story problems, we used one-way ANOVA procedures (see Table 3).
A paired samples t test showed that mean score on items
the suggestions of
used throughout
Table 2 shows that five of the six dimensions
1,006
effect
on
mean
scores
on
Operations With Numbers
Results showed that group had significant effects on mean scores of ROI and NROI with small effect sizes (see Table 5). Bonferroni post-hoc tests (p < .05) revealed no significant difference between the HEG and the MEG in terms of mean scores on ROI and NROI. However, both of those groups had significantly higher mean scores than did the LEG on ROI and NROI. Results also revealed that gender had no significant effect on mean scores on ROI (see Table 6). However, results showed that girls had a significantly higher mean score than did boys on NROI, t(1,004) = 3.33, p < .05, with an effect size near to a medium degree (d = 0.3).

TABLE 4. Wilks's Lambda Test of Significance for Effects of Level of Preparing for High-Stakes Standardized Test on Fourth Graders' Mathematics Performances

Effect      Wilks's Λ    F       Hypothetical df    Error df    p        η²
Group A     .93          7.53    10                 1,992.00    .000*    .036
Group B     .98          3.43    5                  996.00      .004*    .017
A x B       .02          1.99    10                 1,994.00    .031*    .010

Note. Fourth graders' performances involved answering routine and nonroutine number and operation problems and solving routine story problems.
*p < .05.

TABLE 5. Tests of Between-Subjects Effects for Performance in Answering Routine and Nonroutine Number and Operation Items and Solving Routine Story Problems

Variable              SS          df       MS       F        p        η²
Group A
  RNI                 147.11      2        73.56    25.02    .000*    .048
  NRNI                21.77       2        10.88    15.10    .000*    .029
  ROI                 141.94      2        70.97    31.15    .000*    .059
  NROI                143.04      2        71.52    20.21    .000*    .039
  RSP                 151.06      2        75.53    21.47    .000*    .041
Group B
  RNI                 2.13        1        2.13     0.72     .395     .001
  NRNI                0.46        1        0.46     0.63     .427     .001
  ROI                 5.82        1        5.82     2.56     .110     .003
  NROI                27.92       1        27.92    7.89     .005*    .008
  RSP                 0.49        1        0.49     0.14     .709     .000
Group A x Group B
  RNI                 5.17        2        2.58     0.88     .416     .002
  NRNI                3.54        2        1.77     2.45     .086     .005
  ROI                 4.62        2        2.31     1.01     .363     .002
  NROI                10.41       2        5.20     1.47     .230     .003
  RSP                 30.82       2        15.41    4.38     .013*    .009
Error
  RNI                 2,940.13    1,000    2.94
  NRNI                721.00      1,000    0.72
  ROI                 2,278.38    1,000    2.28
  NROI                3,538.54    1,000    3.54
  RSP                 3,517.86    1,000    3.52

Note. RNI = routine number items; NRNI = nonroutine number items; ROI = routine operation items; NROI = nonroutine operation items; RSP = routine story problems.
*p < .05.
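The "small effect size" wording for the group effects can be checked against the sums of squares in Table 5. The sketch below assumes the effect-size column of Table 5 is partial eta squared, computed as SS_effect / (SS_effect + SS_error); the article does not name the formula, and the pairing of each sum of squares with its dimension follows our reading of the table. Dictionary and function names are illustrative.

```python
# Sums of squares as read from Table 5 (Group A effect and error terms).
SS_GROUP = {"RNI": 147.11, "NRNI": 21.77, "ROI": 141.94,
            "NROI": 143.04, "RSP": 151.06}
SS_ERROR = {"RNI": 2940.13, "NRNI": 721.00, "ROI": 2278.38,
            "NROI": 3538.54, "RSP": 3517.86}


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: share of effect-plus-error variation due to the effect."""
    return ss_effect / (ss_effect + ss_error)


eta = {
    name: round(partial_eta_squared(SS_GROUP[name], SS_ERROR[name]), 3)
    for name in SS_GROUP
}
for name, value in eta.items():
    print(name, value)
```

Under that assumption the computed values match the tabled ones (.048, .029, .059, .039, .041), and all fall below the .06 threshold often used, following Cohen (1988), to separate small from medium effects.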
TABLE 6. Descriptive Statistics for Routine and Nonroutine Operation Problems

                        Routine          Nonroutine
Group                   M       SD       M       SD       n
High emphasis
  Girls                 3.50    1.46     4.37    1.79     172
  Boys                  3.35    1.48     3.97    1.81     179
  Total                 3.42    1.47     4.16    1.81     351
Moderate emphasis
  Girls                 3.45    1.46     4.17    1.96     108
  Boys                  3.46    1.59     4.11    1.98     99
  Total                 3.46    1.52     4.14    1.97     207
Low emphasis
  Girls                 2.85    1.59     3.70    1.87     233
  Boys                  2.51    1.47     3.10    1.94     215
  Total                 2.68    1.54     3.41    1.92     448
Total
  Girls                 3.19    1.55     4.02    1.88     513
  Boys                  3.00    1.56     3.62    1.95     493
  Girls and boys        3.10    1.56     3.83    1.93     1,006

Story Problems

TABLE 7. Descriptive Statistics for Routine and Nonroutine Story Problems

                        Routine          Nonroutine
Group                   M       SD       M       SD       n
High emphasis
  Girls                 3.99    1.90     0.65    0.70     172
  Boys                  3.56    2.02     0.75    0.71     179
  Total                 3.77    1.97     0.70    0.71     351
Moderate emphasis
  Girls                 3.30    1.86     0.85    0.71     108
  Boys                  3.84    1.91     0.79    0.80     99
  Total                 3.56    1.90     0.82    0.75     207
Low emphasis
  Girls                 2.92    1.81     0.70    0.69     233
  Boys                  2.94    1.79     0.74    0.70     215
  Total                 2.93    1.80     0.72    0.70     448
Total
  Girls                 3.36    1.90     0.71    0.70     513
  Boys                  3.35    1.93     0.75    0.72     493
  Girls and boys        3.35    1.92     0.73    0.71     1,006

Table 5 shows that mean scores of RSP groups were significantly different with a small effect size. Bonferroni post-hoc tests (p < .05) revealed that no significant difference existed between the HEG and the MEG in terms of mean scores on RSP. Both groups had significantly higher mean scores on RSP than did the LEG (see Table 7). Results also revealed that gender and group had significant interaction effects on mean scores on RSP with a small effect size. In the LEG, there was no significant difference between girls and boys in terms of mean scores on RSP. In the HEG, girls had significantly higher mean scores than did boys on RSP, t(349) = 2.02, p < .05, but the difference was insignificant because of the small effect size (d = .01). In the MEG, however, boys had a significantly higher mean score than did girls on RSP, t(205) = -2.067, p < .05. The effect size for the MEG (d = .2) was not as small as for the HEG. Results also showed that group had no significant effect on mean scores on NRSP (see Table 8). In other words, mean scores of
the
items,
surface
rization,
were
wrords,
distractors
or
group
dependent.
for
search
Chi-square
cue
only
statistics
for each
1) showed that in 25 items (16 of 25 were
out
contexts)
in the
reported
of
36
the
items,
choice
was
section
Instruments
of
instructional
nonroutine
small
relatively
no
differences
mances
were
That
in
depen
students from
the typical distractor, A.
their
results
routine
show
is reasonable
finding
routine
fourth
the
education
surface
than
in
characteristics,
students
will
be
dardized
that
students
performances,
were
instructed
their
story
spent
test was
answering
problems.
an
the
preparing
espe
problems,
reason
important
with
The
favoring
a test-driven
routine
mostly
who
students
TABLE
8. Tests
problems
ducted
approach
number,
were
mathematics
this
test
who
resulted
from
instructed
of
of Between-Subjects
Group (A)
Group (B)
AxB
is an
505.524
no
untested
that
on
work
that
the
proce
we
problems.
in non
differences
are
concepts
less
the high-stakes
analyzed
likely
2001). Before we con
(Quality Counts,
nonroutine
show
that
story
nonroutine
standard
The
problems.
problem
solving
activities
of mathematics
dimension
standards.
solution
Unlike
was
a solution,
not
that
standards,
previous
solving as engagement
method
to find
Many
& Levi,
although
that
must
version
in a task for which
known
students
notice
researchers
in advance
and
on
knowl
draw
their
in
Fennema,
(e.g.,
1998; Gallagher
gender
was
not
as effective
did
items. When
tests
boys.
the
was
for Nonroutine
Story
Problems
F
MS
on
students'
than did
placed,
the
performed
moderate
girls. In North
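TABLE 8. Tests of Between-Subjects Effects for Nonroutine Story Problems

Source        SS         df       MS       F        p       η²
Group (A)     2.093      2        1.046    2.070    .127    .004
Group (B)     0.196      1        0.196    0.387    .534    .000
A x B         0.913      2        0.456    0.903    .406    .002
Error         505.524    1,000    0.506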
girls
in
0.196
0.506
routine
solving
a large emphasis on
contrary,
p
Franke,
In this study,
girls responded better
operation
On
Jacobs,
1994).
as group
in the group in which
standardized
than
Carpenter,
& De Lisi,
performance,
to nonroutine
in mathemat
differences
gender
emphasis group, boys outperformed
1.046
1,000
to prob
should be emphasized during mathematics instruction. For example, in 2000, the NCTM released Principles and Standards for School Mathematics, a revised version of the
better
a
2
observed
study
important
story problems,
Effects
2.0932
0.1961
0.913
of
types
guarantee
nonroutine
includes
that we
emphasized
this
high-stakes
and
with
teaching
to use
when
which
research,
questions
results
boys
students
operation,
of perfor
ized tests used at the elementary school level in North
Cyprus for the last 10 years; we found that only 5% of the
dfSS
Source
Error
the
classroom,
was
to be emphasized
mathematics
stan
high-stakes
in their
factor
differences
a
for
not
it does
to determine
able
further
ics performances
cially those of problems, are generally emphasized (Davidson,
Deuser, & Sternberg, 1994). Although the amount of
time
terms
in
groups
edge, and through this process, they will often develop new
mathematical understandings" (NCTM, 2000, p. 51).
overall
Cyprus
three
learning algorithms might
in the
occasions
order
items.
the
in North
system
and
procedures
considers
How
a test-driven
that
standard
lem solve. Although
some
"the
perfor
better
mathematics
one
when
graders'
were
items
in nonroutine
performances
examination-oriented
which
that
mathematics
indicates
Therefore,
story problems.
to solve
different
procedures
not
to teach
is obviously
the way
them
students
1989
overall
the
among
defines problem
The
performed
items.
operation
in nonroutine
routine
Discussion
mances
and
sizes
effect
also
approach
number
instructional approach has little to do with improving
higher order skills like solving nonroutine story problems
(e.g., Kenney & Silver, 1997; Lindquist, 1989). There were
A
ever,
ing word
the
dent on the group (see Table 2). For example, in Item 14,
the typical distractor "A" was selected by fewer students
from LEG (44.2%) compared with students from HEG
(62.8%) and MEG (57.3%), χ²(8, N = 1,006) = 32.114,
p < .01. There was a similar finding in Item 36 (see Table
1) in which the typical distractor was A. For Item 36, fewer
students from the LEG (60%) compared with
HEG (67%) and MEG (64%), χ²(8, N = 1,006) = 16.183,
p < .01, selected
on
dures and how to use them correctly. Kantowski (1981)
stated that problem solving means different things to
different people; the majority of people believe that it is solv
rote memo
enhance
could
understanding,
item (see Table
in nonroutine
which
test-driven
better
problems
of the three groups were not significantly different on
NRSP.
We also wanted to determine whether the choice of the
distractors
more
Cyprus,
families have stricter rules for girls than for boys because of
the country's social and ethical perspectives. That rigidity
might cause girls to follow certain rules and prescriptions
more precisely than boys do. Therefore,
it is likely that in a
more
examination-oriented
environment,
more
benefit
girls
boys
We
the
elementary
found
that
skills did not
We
school
level.
more
spending
considered
and
conceptual
of nonroutine
story
in a nonconstructivist
dents
solving.
stu
educating
not
guarantee
may
perspective
test-driven
standardized,
high-stakes
lead to higher
educational
too much
spending
instruction
The
tend
may
that
groups
to
one
with
not
should
the
conclude
in which
group
that emphasized
that
no
observations
showed
that
was
there
use
limited
nontraditional
Therefore,
practices.
teaching
that foster
selection of instructional approaches
is
mathematics
skills
essential.
higher order
likely
careful
Moreover,
groups
that
ly performed
better
on
a test-driven
used
small effect sizes throughout
groups
on
were
test-taking
skills. Small
instructional
test-driven
to better
much
from
that
those
who
experts
all
understanding.
of
not
contribute
interesting
findings of this study was the dependency
of
items
that
could
or
understanding,
rote
enlighten
search
aspects
about
offer
change
the
overall
and
education.
recommendations
following
to eliminate
3. Tests
and
its con
and
knowledge
assessed.
instruction
classroom
skills,
problem-solving
should
especially
should be
and
emphasize
are
that
those
nonroutine.
4. Students should be trained and encouraged to become
skilled problem solvers with the ability to conduct
qualitative analyses
tative
solutions.
5. Preservice
and
to view
tunity
structivist
assessments
Surprisingly,
and MEG were distracted more than
were low-emphasis groups. That finding implies that more
time spent focusing on procedural skills such as drills, test
standardized
high-stakes
of mathematics
be
might
surface
in group.
words
decisions
we
of problems
inservice
before
and
teach
teachers
they
should
mathematics
quanti
perform
have
the
oppor
in a more
con
way.
should
use
open-ended
age the use and integration
of the distractors
memorization,
cue
for only
that
implied
a
doubt
views
the
2. Multiple sources of assessment information
used in making high-stakes decisions.
6. Teachers
One
attempt
is no
there
Also,
to
time
less
the most
did
approaches
make
summarize,
but
that these
spent
sizes also
effect
large
problems,
the study showed
so different
not
approach
mathematics
many
should
to maintain
of mathematics
knowledge
perspective.
at the
especially
teachers
prepare
to
need
beliefs (in line with changing positive approaches in the
education field) of society as a whole, beginning with the
foster
structivist,
and
routine
and
Researchers
changed.
and pedagogical
educators
nections
con
of
that
tests,
solely because we found no differences among
in higher order mathematics
skills. Classroom
emphasized
the groups
rules,
understanding
computation
based activities
level
1. If it is not possible
problem
was
least
instruction
test-driven
per
algorithms,
was
there
require
in light of our findings and current practice.
scores.
inflated
be
conceptually
school
as
school
a constructivist
with
To
test-driven
instruction
test-driven
formed better on the problems
but
some
only
produce
used more
or
skills
test-taking
not
does
problem
strong content
to the test or
quality. Teaching
on
time
education
in elementary
must
solving
elementary
skills that are
improvement in higher order mathematics
for
scientific
important
thinking and human life. Kenney
and Silver (1997) and Lindquist (1989) have shown that
of basic
discipline
do not
that
algorithms
a
only
and reasoning (B. J. Reys & R. E. Reys, 1998). Therefore,
Reys and Nohda suggested that the narrow view of mathe
emphasize
aspects
reasoning
that
found
as
mathematics
and
formulas,
matics
test-taking
story problem
qualitative
and
problems
on
time
class
influence nonroutine
view
might
than boys. For example, Cai (1995) and Hyde, Fennema,
and Lamon (1990) stated that girls performed better than
at
2001; Sowder,
1992). If the instructional emphasis is mainly on
procedural skills (rote memorization) and not conceptually
based, then the students not only suffer from (meta)cognitive
disadvantages but also
Tchoshanov,
and
classroom
problems
of conceptual
that
encour
knowledge
in
practices.
REFERENCES
the HEG
or
taking,
with
reasoning,
can
memorize
tests
with
practice
connection
students
to
and
this
Unfortunately,
that
elementary
introducing,
ciency
without
2000,
with
written
total
algorithms
E. Reys
little
qualitative
single
as
current
up
and
and
& Nohda,
a
to
path
1989;
practice,
in
instruction
to
80%)
establishing
solving
understanding
to
them
1997; Lindquist,
range
practicing,
conceptual
1991; R.
a
of mathematics
(estimates
developing,
for
search
with
encourage
as well
study,
the majority
schools
and
and
single answer (Kenney & Silver,
Mestre, 1991; Sacks, 2000).
indicates
years,
prior
understanding
conceptual
distract
procedures
from
is spent
profi
computations
(e.g.,
America
1992;
Pape
&
America 2000. (1991). An educational strategy to move the American educational system ahead to meet the needs of the 21st century. Washington, DC: U.S. Department of Education.
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Archives, 10(18), 1-70.
Bowers, B. C. (1989). Alternatives to standardized educational assessment. Eugene, OR: ERIC Clearinghouse on Educational Management. (ERIC Document Reproduction Service No. ED312773)
Cai, J. (1995). A cognitive analysis of U.S. and Chinese students' mathematical performance on tasks involving computation, simple problem solving, and complex problem solving. Journal for Research in Mathematics Education (Monograph Series 7). Reston, VA: National Council of Teachers of Mathematics.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Corono, L. (1992). Encouraging students to take responsibility for learning and performance. Elementary School Journal, 93, 69-83.
Dais, T. A. (1993). An analysis of transition assessment practices: Do they recognize cultural differences? Washington, DC: Vol. 2, pp. 1-19. (ERIC Document Reproduction Service No. ED372519)
Davidson, J. E., Deuser, R., & Sternberg, R. J. (1994). The role of metacognition in problem solving. In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition (pp. 207-226). Cambridge, MA: MIT Press.
Fennema, E., Carpenter, T. P., Jacobs, V. R., Franke, M. L., & Levi, L. W. (1998). A longitudinal study of gender differences in young children's mathematical thinking. Educational Researcher, 27, 6-11.
Firestone, W. A., Monfil, L., Mayrowetz, D., & Camilli, G. (2001, April). The ambiguity of teaching to the test: A multiple method analysis. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Gallagher, A. M., & De Lisi, R. (1994). Gender differences in scholastic aptitude test-mathematics problem solving among high-ability students. Journal of Educational Psychology, 86, 204-211.
Haney, W., Madaus, G., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston: Kluwer.
Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107(2), 139-155.
Kantowski, M. G. (1981). Problem solving. In E. Fennema (Ed.), Mathematics education research (pp. 111-126). Reston, VA: National Council of Teachers of Mathematics.
Kenney, P. A., & Silver, E. A. (1997). Results from the seventh mathematics assessment of the NAEP. Reston, VA: National Council of Teachers of Mathematics.
Kober, N. (2002). Teaching to the test: The good, the bad, and who's responsible. Test talk for leaders, no. 1. Center on Education Policy. Retrieved January 20, 2003, from http://www.ctredpol.org/pubs/testtalkjune2002.html
Lindquist, M. M. (1989). Results from the fourth mathematics assessment of the NAEP. Reston, VA: National Council of Teachers of Mathematics.
Marcus, E. (1994). Rite of passage: French teachers demand reasoning. Educational Vision, 2(2), 18-19.
Mestre, J. P. (1991). Learning and instruction in pre-college physical science. Physics Today, 44(9), 56-62.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (1991). Professional standards for teaching mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.
Pape, S. T., & Tchoshanov, M. A. (2001). The role of representation(s) in developing mathematical understanding. Theory Into Practice, 40(2), 118-127.
Paris, S. G. (1995). Why learner-centered assessment is better than high-stakes testing. In N. M. Lambert & B. L. McComb (Eds.), How students learn: Reforming schools through learner-centered education (pp. 189-210). Washington, DC: American Psychological Association.
Popham, W. J. (1999). Why standardized tests don't measure educational quality. Educational Leadership, 56(6), 8-15.
Popham, W. J. (2001). The truth about testing. Alexandria, VA: Association for Supervision and Curriculum Development.
Quality Counts. (2001). A better balance. Education Week, 20(17), 36.
Reys, B. J., & Reys, R. E. (1998). Computation in the elementary curriculum: Shifting the emphasis. Teaching Children Mathematics, 5, 236-241.
Reys, R. E., & Nohda, N. (1992). Computational alternatives for the twenty-first century. Reston, VA: National Council of Teachers of Mathematics.
Sacks, P. (2000). Standardized minds: The high price of America's testing culture and what we can do to enhance it. Cambridge, MA: Perseus Books.
Sowder, J. T. (1992). Estimation and number sense. In D. C. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 371-389). New York: Macmillan.
Thurlow, M., Quenemoen, R., Thompson, S., & Lehr, C. (2001). Principles and characteristics of inclusive assessment and accountability systems (Synthesis Rep. No. 40). Minneapolis, MN: National Center on Educational Outcomes.