Statistik und Wissenschaft, Bd. 13 / Statistics and Science, Vol. 13

Transcription

Statistik und Wissenschaft, Bd. 13 / Statistics and Science, Vol. 13
Statistics and
Science
Handbook on the application of quality
adjustment methods in the Harmonised Index
of Consumer Prices
Developed within the European project
”CENEX HICP Quality Adjustment“
by Stefan Linz, Alexandra Beisteiner, Michaela Böttcher, Katrin Dorka,
Rui Evangelista, Jan de Haan, Peter Handmann, Alexandra Lohn,
Daniel Mota, Peter Hein van Mulligen, Andrew Murray, Martin Ribe,
Peter Taschowsky, Marc Vos
Volume 13
Federal Statistical Office of Germany
Bibliographic information of the German National Library
The German National Library has registered this publication
in the German National Bibliography; detailled bibliographic
data are available on the internet at www.dnb.ddb.de
Published by: Statistisches Bundesamt (Federal Statistical Office), Wiesbaden
Homepage: www.destatis.de
Information service
Phone: + 49 (0) 611 / 75 24 05
Fax:
+ 49 (0) 611 / 75 33 30
www.destatis.de/kontakt
Information on this publication
Phone: + 49 (0) 611 / 75 26 59
Fax:
+ 49 (0) 611 / 72 40 00
[email protected]
Published in September 2009
Price: EUR 24,80 [D] (Print)
Order number: 1030813-09900-1 (Print)
ISBN: 978-3-8246-0835-5 (Print)
Price: EUR 0,– [D] (Download)
Order number: 1030813-09900-4 (Download)
ISBN: 978-3-8246-0854-6 (Download)
Distribution partner: SFG Servicecenter Fachverlage
Part of the Elsevier Group
P.O.B. 43 43
72774 Reutlingen
GERMANY
Phone: + 49 (0) 70 71 / 93 53 50
Fax:
+ 49 (0) 70 71 / 93 53 35
[email protected]
© Statistisches Bundesamt, Wiesbaden 2009
Reproduction and distribution, also of parts, are permitted provided that the source is
mentioned.
Preface
This handbook is the result of a new form of co-operation within the European Statistical
System. The basic idea of that new way of co-operation is that statistical experts from
various countries co-operate in developing specific methods and then make their
results available to all statistical institutes in the European Union. The present
handbook shows that this form of co-operation can be very successful.
The idea underlying such forms of co-operation is contained already in the Palermo action
plan of the European Statistical System adopted in 2002. Based on it, and chaired by the
Federal Statistical Office, a Task Force was set up which proposed the “Centres and
Networks of Excellence (CENEX)” as modern co-operation networks. The goal was to
concentrate the development of statistical methods in various areas in a few member
states which are particularly competent in the relevant area. The Task Force had
proposed price statistics for a CENEX pilot project; in 2006 it was implemented under the
chairmanship of the Federal Statistical Office.
The principles of co-operation have in the meantime been developed further. CENEX has
been renamed ESSnet (European Statistical System networks) and has been included as a
model into the European Communities Statistical Programme for 2008 – 2012. With the
“Regulation of the European Parliament and of the Council on European statistics”, which
took effect on 1 April 2009, co-operation networks such as ESSnet have been laid down in
a European legal basis and have thus gained even more importance in the European
Statistical System. The Regulation contains comprehensive framework conditions for the
development, production and publication of European statistics.
The Federal Statistical Office is proud of having actively participated in that promising
form of co-operation from the start. I would like to thank all experts involved for their
commitment and contribution. Their work has advanced price statistics and has
contributed to shaping the way towards a future in the spirit of partnership in European
Statistics.
Yours,
Roderich Egeler
President of the Federal Statistical Office
Federal Statistical Office, Statistics and Science, Vol. 13/2009
3
Preliminary remarks
This “CENEX Handbook” is a result of the project “CENEX HICP Quality Adjustment”. It is
dedicated to the price statisticians within the European Statistical System (ESS) and shall
assist them with the introduction and development of quality adjustment methods. At the
same time it aims at the harmonisation of quality adjustment practices within the ESS.
The ambition is to provide as concrete examples as possible to supplement the general
training of price statisticians in a useful way.
The principle of CENEX (Centre and Network of Excellence) is that statistics experts from
different countries join forces to elaborate practicable approaches to the solution of a certain methodological problem and afterwards make their findings available to all national
statistical institutes (NSIs) of the European Union.
From October 2006 until September 2008 price statistics experts from Austria, Belgium,
Germany, Ireland, the Netherlands, Portugal and Sweden have worked together and discussed practical approaches to quality adjustment in consumer price statistics. Various
practical issues have been intensely discussed and practical solutions were tested on authentic examples from different product areas.
The project has been coordinated by the Federal Statistical Office of Germany. It was cofinanced by Eurostat 1) and was accompanied by a Steering Group which included members from Eurostat, the European Central Bank, and from the national statistical institutes
of Finland, Italy, and the United Kingdom.
The Handbook is meant as an offer for the NSIs helping them fulfil the existing standards.
The proposals in the Handbook do not impose any obligations on the NSIs to use them
and thus do not lead to additional statistical requirements. In this way, the Handbook can
be seen as a self-helping instrument of the member states to reach the requirements of
the ESS.
Even if the examples in the Handbook are meant to be as concrete and instructive as possible it may not always be possible to copy them exactly in regular practice of NSIs for the
following reasons: First, there are country-specific surrounding conditions which may require individual solutions in some cases. Second, examples may become obsolete over
time which especially applies to the products that underlie technical progress which are
dealt with in this Handbook. Third, in the field of quality adjustment methods the methodological discussions have not yet come to an end in all respects. Thus the Handbook will
likely have to be updated by time. This does not only apply to the product-specific examples but also to the more general parts of the Handbook in which the basic ideas of the
product-specific proposals are summarised. The present text can therefore only give a current snapshot of the ongoing discussion on the application of quality adjustment methods
in practice.
1) The CENEX project on HICP Quality Adjustment was co-financed by the European Community, represented
by the Commission of the European Communities (“the Commission”). The sole responsibility for this
Handbook lies with the authors. The Commission is not responsible for any use that may be made of the
information contained therein.
4
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Preliminary remarks
The CENEX team would like to thank the members of the HICP Working Group for their very
helpful comments and advice throughout the duration of the CENEX project.
Wiesbaden, November 2008
Stefan Linz, Alexandra Beisteiner, Michaela Böttcher, Katrin Dorka, Rui Evangelista, Jan
de Haan, Peter Handmann, Alexandra Lohn, Daniel Mota, Peter Hein van Mulligen,
Andrew Murray, Martin Ribe, Peter Taschowsky, Marc Vos
Federal Statistical Office, Statistics and Science, Vol. 13/2009
5
Contents
Page
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
Preliminary remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
Standard procedures for quality adjustment
1
Basic concepts and definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
1.1
1.2
The HICP as a Laspeyres-type index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
12
2
Identification of consumption segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
2.1
2.2
2.3
The role of consumption segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Identification of consumption segments in practice . . . . . . . . . . . . . . . . . . . . . . . .
Consumption segments with or without weighting . . . . . . . . . . . . . . . . . . . . . . . . . .
14
15
16
3
Construction of target samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
3.1
3.2
3.3
3.3.1
3.3.2
3.3.3
3.4
3.5
HICP as sample statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Dimensions to be considered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Selection models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cut-off sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Purposive sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Random sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Stratification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Sample size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
17
19
19
20
21
21
23
4
Replacement strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
4.1
4.2
4.3
4.3.1
4.3.2
4.4
4.4.1
4.4.2
Replacements and revisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Focus on the dimension of models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Representativity at any point of time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The problem with sticking to similar or identical models by all means . . . .
Replacement to ensure representativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Product specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Tight or loose product specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Several product specifications within one consumption segment . . . . . . . . .
24
24
25
25
27
28
28
29
5
Quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
5.1
5.1.1
5.1.2
5.1.3
5.1.4
Creating continuous price series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Link-to-show-no-price-change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Direct price comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Estimating the value of quality differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
30
31
33
33
6
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Contents
Page
5.2
5.2.1
5.2.2
5.2.3
5.2.4
5.2.5
5.3
Applying quality adjustment methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Explicit quality adjustment methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hedonic methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Option pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Supported judgemental quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Implicit quality adjustment – bridged overlap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Combining quality adjustment methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
36
36
37
37
38
39
43
Product-specific guidance and examples
not using hedonics
1
1.1
1.2
1.3
1.4
1.5
1.6
Television sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
45
45
45
48
49
50
51
2
New cars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
56
2.1
2.2
2.3
2.4
2.5
2.6
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
56
59
61
62
66
67
3
Used Cars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75
3.1
3.2
3.3
3.4
3.5
3.6
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75
77
83
84
88
89
4
Computers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
98
4.1
4.2
4.3
4.4
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
98
98
100
101
Federal Statistical Office, Statistics and Science, Vol. 13/2009
7
Contents
Page
4.5
4.6
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
105
105
5
Washing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
108
5.1
5.2
5.3
5.4
5.5
5.6
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
108
108
110
112
113
113
6
Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
119
6.1
6.1.1
6.1.2
6.1.3
6.1.4
6.1.5
6.1.6
6.2
6.2.1
6.2.2
6.2.3
6.2.4
6.2.5
Rapidly-changing market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples for the rapidly-changing market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Long-selling market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
119
120
120
125
126
126
127
130
130
131
133
134
135
7
Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
136
7.1
7.1.1
7.1.2
7.1.3
7.1.4
7.1.5
7.2
7.2.1
7.2.2
7.2.3
7.2.4
7.2.5
Application programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Video games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
136
136
137
139
140
142
142
142
142
145
146
146
8
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Contents
Page
Hedonic regression
1
Applying hedonic regression – concepts and guidance for use . . . . . . . . . . . .
147
1.1
1.2
1.3
1.3.1
1.3.2
1.3.3
1.3.4
1.3.5
1.4
An introductory example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Types of hedonic methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Topics for consideration in applying hedonic methods . . . . . . . . . . . . . . . . . . . . .
Topic 1 – Data needed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Topic 2 – Data editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Topic 3 – Creating the hedonic function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Topic 4 – Index calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Topic 5 – Refreshing the hedonic function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
147
150
150
151
154
157
168
170
172
2
Television sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
174
2.1
2.2
2.3
2.4
2.5
2.6
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
174
174
177
178
182
183
3
Used cars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
190
3.1
3.2
3.3
3.4
3.5
3.6
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
190
190
192
192
199
200
4
Computers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
211
4.1
4.2
4.3
4.4
4.5
4.6
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
211
211
214
214
219
220
5
Washing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
226
5.1
5.2
5.3
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
226
226
229
Federal Statistical Office, Statistics and Science, Vol. 13/2009
9
Contents
Page
5.4
5.5
5.6
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
229
233
234
6
Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
241
6.1
6.2
6.3
6.4
6.5
6.6
Proposal for the identification of consumption segments . . . . . . . . . . . . . . . . . .
Proposal for the construction of a target sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the replacement strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Proposal for the quality adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cost estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
241
242
243
245
250
250
7
Annexes and references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
259
7.1
Annexes
Annex 1: Details on different types of hedonic methods . . . . . . . . . . . . . . . . . . . .
Annex 2: Brief theoretical digression on number of price observations . . . .
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
259
264
265
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
267
7.2
10
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
1
Basic concepts and definitions
1.1
The HICP as a Laspeyres-type index
An important regulation for European price statistics, Council Regulation No. 2494/95,
states that the HICP shall be a Laspeyres-type index. When sticking to the Laspeyres principle the price development is calculated for a previously established expenditure pattern
which is fixed for a certain time period – one year in many cases.
In order to apply the Laspeyres principle to the HICP, so-called consumption segments are identified.
A consumption segment embraces
a set of product-offers, which are
deemed to serve a common consumption purpose. The identification of consumption segments remains constant for the whole
index period.
Definition of the HICP as a Laspeyres-type
index
Council Regulation No. 2494/95 forms the legal
basis for the HICP. According to Article 9 of this
Regulation, “Member States shall process the
data collected in order to produce the HICP,
which shall be a Laspeyres-type index . . .”
The Laspeyres principle does not mean that the different product-offers, which are representing the consumption segments, need to be kept constant with their quality of
models, outlet types and locations. Instead, the product-offers which are selected for
price observation should be chosen in order to represent the consumption segments
with respect to models, outlet types and locations they are belonging to as accurately as
possible. This does not only apply for the initially chosen product-offers, but must rather
be fulfilled at any point of time.
From this follows an important implication: The price statistician faces the challenge of
choosing such product-offers for price observation, which adequately represent the consumption segments, outlet types and locations in question at any time. Markets and
products continuously change. This dynamic requires that models losing their representativity have in due time to be replaced by new ones in order to ensure the sample of
product-offers being representative of their respective consumption segments at any
point of time. This is especially true for markets showing significant technical progress.
However, introducing a more representative product-offer or model as replacement for
an obsolete one which has lost its representativity violates another important goal, the
comparability of subsequent product-offers. As soon as there is any important change in
product quality, in outlet types or in the location of price observation, two subsequent
product-offers are no longer comparable. The difference of the last price from the replaced and the first price of the replacement product-offer may not only consist in the
so-called “pure” price change but may, for example, also include price deviations which
are due to quality differences. Another challenge for the price statistician therefore consists in elaborating the “pure” price change, unbiased by changes in product quality or
other factors, when comparing the last and first prices of subsequent product-offers.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
11
Basic concepts and definitions
The handling of methods which are suitable to reach this end – against the background
of maintaining the representativity of the sampled product-offers – is the focus of this
handbook.
1.2
Basic definitions
The consumption segments can be seen as a starting point for the construction of quality adjusted price indices. They can be subcategories of the 4-digit COICOP classes. The
notion of consumption segments is introduced by Regulation No. 1334/2007. According
to this concept, individual product-offers are categorised on the basis of a common basic purpose of consumption. Product-offers belong to the same consumption segment if
they refer to products which are used in similar situations, which can largely be described
by a common specification, and may be considered by consumers as equivalent.
A product-offer is assigned to one series of observed prices for a model at one period,
location and outlet type. So, a product-offer defines a unique entity for a certain time
span. The term “model” is in this text used to refer to one specific variant of a product. The
hierarchy of consumption segments, models and product-offers is illustrated in Chart 1.
Chart 1
Basic terms
product-offers
(P = price)
Pt=1 Æ Pt=2 Æ Pt=3
model
a
…
consumption
segment I
Pt=1 Æ Pt=2
model
b
4-DigitCOICOP
CLASS
Pt=1 Æ Pt=2 Æ Pt=3 Æ Pt=4
Pt=1 Æ Pt=2 Æ Pt=3
…
…
model
a
consumption
segment II
model
b
…
…
12
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
The set of product-offers initially selected by the price statistician in order to observe
prices is called the “target sample”. It is vital for the construction of the HICP that these
product-offers should be chosen in order to represent the consumption segments they
belong to. The product-offers of a target sample should be representative in terms of the
dimensions model type, outlet type and region.
The price-series belonging to one product-offer is by definition uninterrupted, whereas
the uninterrupted price-series can have different durations. An interruption would for
example be a change in the model’s quality or in the outlet type or a change in the location of price observation, and causes a replacement situation: As soon as such an interruption occurs, the current product-offer ends and should be replaced by a successor. Regulation No. 1334/2007 uses the terms “replaced” and “replacement” product-offer and
states that replacement product-offers shall be selected “so as to maintain the representation of consumption segments” (Article 7). That means that the product-offers should
be representative at any point of time.
Chart 2
Notation of subsequent product-offers (p = price)
“replaced” product-offer
Pt=1 Æ Pt=2 Æ Pt=3
“replacement” product-offer
Æ
Pt =4 Æ Pt=5 Æ Pt=6 Æ Pt=7
The regulation also states that the replacement product-offer should be selected from
the same consumption segment as the replaced one. Normally, samples are managed in
the way that every obsolete product-offer should have one explicit successor. So, within
one consumption segment, a general practice is one-to-one replacement – the initially
chosen product-offers are replaced one-to-one by subsequent replacement product-offers
seeking to represent the universe of transactions taking place in reality within this consumption segment, whereas this universe is called “target universe”.
In the case of a replacement, the price-series of the replaced product-offer has to be
connected to the price-series of the replacement product-offer in order to maintain a
continuing series of price observations. Two cases can be distinguished here:
Either the replacement product-offer is essentially equivalent to its predecessor in terms
of quality. In this case the last price of the replaced offer can be directly compared to the
first price of the replacement product-offer in order to calculate the price development.
Quality means in this context the degree to which products fit to serve the purpose of
the consumption segment to which they belong.
Or the replacement product-offer differs significantly in quality from its preceding offer. In
this case methods of quality adjustment have to be applied. This can for example be done
by estimating the monetary value of the quality difference between the replaced and replacement product-offer and adjusting the last price of the replaced product-offer respectively. According to the terminology of Regulation No. 1334/2007 – which amended Regulation No. 1749/1996 – the product-offers are then “equivalent by quality adjustment”.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
13
Identification of consumption segments
Chart 3
Equivalent replacements
product-offer
replaced product-offer
replacement product-offer
essentially equivalent
equivalent by QA
In the following, a description is given of how to calculate a quality-adjusted price index
for a type of good in, generally, four steps. The four steps are:
•
•
•
identify consumption segments,
construct target samples for each consumption segment,
decide on replacement strategies,
•
apply quality adjustment methods where required.
2
Identification of consumption segments
2.1
The role of consumption segments
As stated above, the Laspeyres principle means
for the HICP that the identified consumption
segments remain constant for the whole index
period (one year in many cases). So the overall price development is represented continually by the same consumption segments throughout one index period. Once the definition of the
consumption purpose is established it should
not be changed during the index period, not
even to react to market changes.
Role of consumption segments
“Consumption segments shall form
the fixed objects in the index basket
to be followed by the HICP.”
Regulation 1749/1996 as amended
by 1334/2007, Article 2a (4).
The consumption segments are linked to the consumers’ usage of the products and to
their perception thereof. This in turn implies that over time consumption segments tend
be about as stable as people’s habits. Thus they are usually much more stable than specific models, but on the other hand they are far from being perfectly constant. Sometimes
new ones are born and subsequently grow, while others fade away and die. For example
when mobile telephones were introduced some years ago, they led to drastic changes in
people’s telephoning habits, taking the phones outdoors etc. Hence, mobile telephones
placed themselves in at least one newly born consumption segment within a short time
period.
14
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
However, normally, during just one year changes in consumption segments will be rather
limited. As HICPs are annually chained it is thus an acceptable Laspeyres approximation
to compute the index with the consumption segments treated as fixed during each year.
If the principle of one-to-one replacements as described in the previous section is considered within the consumption segment, as a result the number of price-series per consumption segment normally stays the same during the index period – obsolete productoffers are each replaced one-to-one by a more representative successor. The replacement
shall always be carried out within the same consumption segment, and not beyond the
boundaries of a consumption segment. On the other hand, quality adjustment methods
should be applied in general when replacing product-offers within a consumption segment – except if the replacement is essentially equivalent to the replaced offer.
2.2
Identification of consumption segments in practice
According to Regulation 1334/2007, a consumption segment means a set of transactions
relating to product offers which, on the grounds of common properties, are deemed to
serve a common purpose, in the sense that they:
•
•
•
are marketed for predominant use in similar situations,
can largely be described by a common specification, and
may be considered by consumers as equivalent.
From the definition in the Regulation it follows that consumption segments are basically
determined by consumers' habits. They should thus be seen as an aspect of consumer
reality, not primarily as an administrative device for price collection. Trying to identify consumption segments should be essential in the index production, in order to make the
index reflect consumer reality appropriately. Consumption segments are thus basically
defined by consumer habits in these respects.
To identify consumption segments practically, proxy variables can be useful. For example,
for new cars the size of the car can serve as a proxy variable: Small new cars are predominantly used for short distance rides in urban areas, e.g. for shopping, whereas large new
cars may also be appropriate for long distances such as vacation trips. Furthermore, the
consumption purpose to transport persons and luggage (e.g. the family cars) also depends
on the size of the car. Using proxy variables as the size of a car to define consumption
segments is (of course) only an imperfect solution. The size of the car can have multiple
meanings, it could refer to the number of doors, size of engine, gross tare weight, etc.
The example does also not implicate that the consumer uses different cars for different
consumption purposes. Furthermore the understanding of short- or long-distance rides
can be different depending on country-specific surroundings. However, it illustrates one
possible approach to the problem of identifying segments with respect to consumption
purpose.
As another example serve TV sets for which the size may, for example, provide an indication of the consumption purpose. Thus, small TV sets could, for example, be situated in
the bedroom or in the children’s room or, more generally, be interesting for less-frequent
users (e.g. for watching the news), whereas large TV sets tend to be bought for usage in
the living room and for watching motion pictures. Here again, the size of the TV set can-
Federal Statistical Office, Statistics and Science, Vol. 13/2009
15
Identification of consumption segments
not be a definitive indicator of usage and the example once again demonstrates the
unavoidable subjective element in the identification of consumption segments. One could
also argue that TV sets can be used for different purposes at different stages of their lifecycle – e.g. the new flat screen TV being placed in the living room whereas the old tube
TV set is used in the bedroom. For the time being, an objective theory-based procedure to
identify consumption segments is not in sight. So NSIs will have to agree on conventions
here. This example also demonstrates the not avoidable subjective element in the identification of consumption segments. Finally, a totally automatic procedure to identify consumption segments will not be possible and one has to agree on some conventions.
As there are so far no general guidelines on how to identify consumption segments, proposals on identification of these are included in the product-specific guidance of this
handbook on a case-by-case basis. These specifications are in general open to countryspecific realisations.
In practice price collection generally has to encompass only a sample of consumption segments. An example: Different kinds of hand-held tools such as hammers and screwdrivers
have different uses, and by the definition they should thus be considered as belonging
to different consumption segments. Now, it is not practically necessary to collect prices for
all those consumption segments of different kinds of tools. Instead one or a few of them
may be sampled for price collection, to represent them all. If a weight is applied here (see
next section), it shall pertain to the consumption in the entire set of consumption segments represented (such as the set of all hand-held tools), not just those in the sample
(such as hammer or screwdriver).
2.3
Consumption segments with or without weighting
Consumption segments can be weighted implicitly or explicitly. A weighting is, however,
not prescribed by Regulation No. 1334/2007. An implicit weighting can result from the
number of price-series if not a mean value per consumption segment but a mean value
of all price-series of a COICOP 4-digit class is computed when aggregating the individual
price developments. Then a consumption segment for which a greater amount of priceseries has been collected enters the sub-index of the respective COICOP 4-digit class with
a higher weight. When weighting explicitly on the other hand, the mean price development
per consumption segment would be calculated and this mean value would obtain a weight
corresponding to the household expenditure on this segment just as the COICOP 4-digit
class.
3
Construction of target samples
3.1
HICP as sample statistics
Price statistics is a sample survey. The following section therefore deals with how productoffers can be selected for price observation. An initial “target sample” has to be constructed which represents the relevant product-offers within the consumption segment.
In doing so, not only the issue of which models to choose plays a role but also which outlets and which regions or locations are relevant.
16
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
For a more thorough treatment on the construction of target samples, please refer to the
“HICP Draft Manual” 1). The inclusion of a sampling section into this Handbook has the
limited scope of setting the stage for the treatment of quality adjustment.
3.2
Dimensions to be considered
In principle, when planning the target sample, three dimensions have to be considered,
the dimensions of models, outlets and regions, which are building up the target universe.
However, that does not mean that all three dimensions always have to be covered by price
collection. If, for example, a retail firm runs several outlets on its own and establishes
the same prices in all outlets it is sufficient to collect prices in one of these outlets to represent the entire chain. Just as well local price collection can be replaced by central price
collection from the internet under certain preconditions. The precondition – namely that
the price development at internet retailers can be regarded as representative of the total
market – should, however, be checked beforehand. The price statistician should be able
to provide a reason for confining price collection to only one or two dimensions.
The target universe puts the identification of the consumption segment in concrete terms
of models, outlets and regions.
Chart 4
Dimensions of the target universe
Consumption
segment
models
outlets
regions
When selecting particular models, the challenge is to compile a representative sample
from a sometimes wide variety of different models within the consumption segment. Multistage sampling approaches are often used in practice, for which different approaches
will be described as examples here. When following the first one, relevant models are
chosen first and then the question is to be answered in which regions and outlets the
prices of these models can best be observed. With the second example, regions and outlets are chosen first and then the relevant models are chosen within the outlets.
The first example is more of a theoretical nature. It can only be used in its pure form if a
register of all available models exists and if the particular market shares are known. To
some extent this approach is suitable in the case that a bestseller list is available for a
1) Eurostat (2008).
Federal Statistical Office, Statistics and Science, Vol. 13/2009
17
Construction of target samples
product, as it is sometimes the case for books, for example. But above that, the example is
suitable to clarify the basic concept that is outlined in the following:
Chart 5
Multistage sampling, Example A
Example A: models > outlets
(1) Choose the relevant models.
(2) Choose representative regions.
(3) Choose representative outlets there.
The second example is helpful in reducing the complexity of the task to find the relevant
models. In this case first outlets are chosen and then the relevant models are determined
within the outlets.
Chart 6
Multistage sampling, Example B
Example B: outlets > models
(1) Choose representative regions.
(2) Choose outlets there which are relevant for this type of good.
(3) Choose representative models in the outlets.
By pre-selecting the outlets, the set of models that have to be considered is reduced:
only models that are offered in the pre-selected outlets are considered. This simplifies
the compilation of the model sample. From a theoretical point of view Example A and B
can yield the same results. From a practical point of view Example B facilitates the selection of particular models.
These two approaches rather serve as examples and can be combined or diversified in
different respects. For example in some cases information is available on market shares
of model types, but not on the share of concrete models. Data from market research institutes might show how market shares for example of LCD and tube TVs are distributed,
but not which market shares can be attributed to specific models in the consumption segment. In this case the information on model types can be used to pre-select model types
and concrete models can be chosen within the outlets. Such an approach will be described
below in the section on “stratification”.
Please note: The selection of outlets and regions is a very comprehensive issue in connection with the calculation of HICPs. This Handbook basically aims at exemplifying that the
18
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
dimensions models, outlets, and regions can be closely connected when compiling a target sample. However, the focus of this Handbook lies on the dimension of “models”.
Please be aware that the expression “choose outlets which are relevant for this type of
good” shall not be restricted to the selection of product-specific shops. The selected outlets may rather be relevant for other products of the HICP as well.
3.3
Selecting models
In the following, selection procedures are described that can be applied in order to compile a representative target sample. The selection procedure should assure that the price
development of the sample reflects the real price development within the consumption
segment to a high degree.
3.3.1
Cut-off sampling
A frequent approach when choosing relevant models is cut-off sampling. In the case of
Example A (models > outlets), the first step is to compile a picture of the models which are
offered in the respective consumption segment, including information on the sales volumes of these models.
In the next step, the most important models according to their turnover can subsequently
be selected for inclusion into the price observation – until a given cumulated market share
of the included models is reached, for example a share of 60 %. Chart 6 illustrates the process of selecting models by cut-off sampling.
Chart 7
Selecting models by cut-off sampling
target universe
market information
Consumption
segment
models
models
M
60% market share
M
target sample
sales volumes
A precondition for applying the method is that market information on the sales volumes 2)
of the models is available – at least for the most important models in the consumption
2) Total sales are used as a proxy for the expenditure by households. Market share and sales volume are used as
proxy for relative expenditure share.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
19
Construction of target samples
segment. If there is a market which is mostly made up of a small amount of top-selling
models and the remainder of the market is made up of a large amount of models each
with a small market share, this remainder of the market does not necessarily need to be
observed. It would suffice to include the top-selling models into the sample as long as
they make up the desired cumulative market share.
A drawback of cut-off sampling is that the price developments of the remaining models,
which are not selected for price observation, may be significantly different from the price
trend of the included models. In this case, the sample would not be representative for the
whole target universe. It is thus not only the market share which has to be considered
when selecting models for the sample – the different price developments within the universe should also be taken into consideration.
In the case of Example B (outlets > models) cut-off sampling can be realised by instructing
the price collector to choose the best-sold models within the selected outlet. The models
can be selected by consulting the sales staff in the particular outlet. However, although
the choice of potentially selected models is in this case reduced to those which are available in the respective shop, the identification of the “most sold” model within a shop may
in many cases be subject to a judgemental decision (see next section on purposive sampling). Moreover, the second disadvantage of cut-off sampling persists – the price development of the best-sold models might not necessarily be representative for the total market. In this case it might become necessary to also consider other models in order to mirror
the price development of the entire consumption segment in a more representative way.
3.3.2
Purposive sampling
Another way to compile a target sample would be to conduct “purposive sampling”, which
is sometimes also referred to as “judgemental sampling” or “typical selection”. When
using purposive sampling, models are chosen which are considered to show a typical
price development. The decision on the inclusion or exclusion of models is based on
judgements from the statistician’s subjective point of view. The advantage shows in the
circumstance that one can deliberately choose other than the most sold models if they
are considered to show more typical price developments and are therefore better suited
for representing the whole market. Furthermore, in the case of judgemental sampling
one does not require a list of models that are for sale and their corresponding sales volumes. However, with Example A (models > outlets), the price statistician still has to get a
general idea of the models offered in the respective consumption segment including a
rough picture of their sales volumes.
For Example B (outlets > models) the selection of a model which shows a price development which is typical for all models sold within this outlet will again be subject to judgemental decisions. The subjective element will even be stronger than in the case of applying
cut-off sampling for Example B: where it may be difficult, but still possible, to identify
the most sold models within the shop, it will be even more difficult to identify one which
shows a rather typical price development.
The disadvantage of judgemental sampling shows in the circumstance that the decisions
on the model selection strongly depend on the decision maker. It is therefore not always
intersubjectively verifiable whether the decision of including a certain model is really
20
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
based on the criterion of representativity – or whether the sample is rather a result of “convenience sampling”, what means to select those models which can be included into price
observation with the least effort or costs.
3.3.3
Random sampling
Random sampling is a way to avoid subjective decisions and at the same time not to confine the sample to the most sold models only. In this case, the models are chosen via a
random process whereby top-selling models shall be selected with a higher probability
than slow-selling models (“probability sampling proportional to size”). This procedure has
the advantage that the selection follows clearly stipulated rules and is therefore not subjective. Furthermore, top-selling models are still considered more than less relevant models. However, the selection is not from the beginning limited to top-selling models. 3)
A practical problem arising is that a complete list of all models including their market
shares must be available – which will for many products not be the case. The approach is
therefore normally used only for parts of the selection process and in combination with
other selection approaches – for example when primary car models are pre-selected as
model types for a new car price index, where information on the market shares of primary car models are normally available from the national car registers. The selection of concrete car models within the definition of the pre-selected model types can then be done by
applying cut-off sampling for example.
Another disadvantage of random sampling may be that the selected models or model
types must be included into price observation, disregarding the practical circumstances
of price observation. The selected units may not be available in the outlets or internet
shops which are normally used for price collection and it can be time consuming or even
not desired to switch to other shops for the price observation (procedures have been invented to cope with this problem by allowing the price collector to switch to evasion products, however this again means watering down the principle of random selection).
3.4
Stratification
In practice, mixtures of different sampling approaches are used often. These mixtures do
in many cases include a so called stratification, where the target universe is first divided
into several strata and the units are then selected within each or within the most important strata 4). As mentioned above, the sample could for example at first be stratified according to model types. For the selection of concrete models representing the strata, one
can for example again first choose relevant outlets and then within the outlets select representative models which fit into the definition of the respective stratum.
3) Random sampling is in principle the only form of sampling that is fully justified by scientific statistical theory. This means that correct use of random sampling is in principle the only way of ensuring absence of
potentially disturbing uncontrolled selection bias due to sampling. The use of random sampling in consumer price statistics is however strongly limited by practical problems.
4) In the theory of random sampling stratification is known as a technique for reducing random errors at a
given sample size. However in price statistics stratification is particularly valuable in attempting to achieve
representativity in use of other sampling methods than random sampling.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
21
Construction of target samples
Chart 8
Multistage sampling, Example C
Example C (with stratification): model types > outlets > models
(1) Stratify the target universe according to model types.
(2) Pre-select relevant model types.
(3) Choose outlets which are relevant for the type of good, accounting for the
regional dimension.
(4) In the outlets, select representative models which belong to the pre-selected
model types.
Information about the market significance of the model types may be gained from market
research institutes, associations or from a small pilot-survey of the national statistical institute for which in some outlets a short description of all available models including their
order of market significance needs to be collected.
Chart 9
Target sample with stratification
target universe
market information
Consumption segment
models
M
M
models
stratum I
stratum II
M
M
stratum III
sales volumes
Ideally, stratification should be accompanied by an implicit weighting of the strata in order
to integrate the information about the market relevancy of the strata. This means, the
higher the relevance of the stratum with respect to market share, the more concrete models should be selected from the stratum in order to represent the entire consumption segment. For reasons of practicability, one often has to cope with a rough approximation of
the number of observations per stratum to the market relevancy of the respective stratum.
22
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
Implicit weights have the advantage that the principle of one-to-one replacement can be
maintained and that in the case of replacements the weights are adapted automatically
(as will be seen in the next section).
Note that it is not necessary to include every stratum into price observation. Analogously
to cut-off sampling, one could pre-select the most important strata and choose the concrete models within these strata only. In this case, the number of models chosen within
each stratum should reflect the market importance of the pre-selected strata only.
The advantages of stratification are that a good cross-sectional representation of the market can be achieved and that even in the case of cut-off sampling, the sample is not confined to the most important models from the beginning. By stratification, it can be assured
that also other important model types enter the sample in any case.
The approach can be expanded to the outlet dimension if information on outlet-types is
available. In this case, one stratum would contain the product offers belonging to one certain model type and one certain outlet type.
3.5
Sample size
In this section a few hints concerning the sample size of products or product groups are
given. Minimum standards for sampling are stated in Article 8 of Commission Regulation
No. 1749/96. The regulation states that the weight of the product category, the diversity of
items within the category, and the variation of price movements in the population should
be taken into account when a sample is compiled (see box).
First of all the weight of a product category could be taken as a rough estimate for the optimal sample size. Second, the diversity of items has to be taken into account in order to
include important market segments. Third, enough prices for categories with volatile price
movements have to be collected taking into consideration resource constraints concerning the maximum number of monthly collected prices.
Theoretically, the optimum
Minimum standards for sampling
allocation of the number of
prices can be calculated by
“HICPs constructed from target samples which, for each
the so-called Neyman allocategory of COICOP/HICP and taking into account the
cation. The standard form
weight of the category, have sufficient elementary agof the Neyman allocation
gregates to represent the diversity of items within the
takes into account the tocategory and sufficient prices within each elementary
tal sample size, the standaggregate to take account of the variation of the price
ard deviations of the price
movements in the population shall be deemed reliable
relatives in the product
and comparable.”
groups and the weight of
Regulation 1749/96, Article 8 (emphasis added)
the product groups. (For
more details see HICP Draft
Manual.) In practice the implementation of such allocation formulae is difficult especially
as the variance of the price development would have to be measured in advance. Furthermore, the diversity of items within the respective categories can be country-specific. This
Handbook can therefore not suggest concrete benchmarks regarding the sample size.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
23
Construction of target samples
Expert knowledge of the particular markets and conditions is necessary especially concerning weights of the product groups, unbiased estimates of the standard deviation, and
detailed data on cost structures regarding the price observations. If detailed data are onhand allocation formulae could serve as tool for optimising the cost-effectiveness.
4
Replacement strategies
4.1
Replacements and revisions
This section deals with the question of when product-offers have to be replaced and
which replacement offers are most suitable as a successor. The replacement of productoffers has consequences for the task of quality adjustment, which will be addressed in
the section on quality adjustment.
Replacements are needed to maintain the representation of a consumption segment over
time – between the comprehensive annual or less frequent revision of the HICP samples.
The fundamental difference between replacement and revision shows in the fact that in
the case of replacements the price difference between the replaced and the replacement
model enters the price index. In the case of a revision, models are also replaced but the
price difference between replaced and replacement model is eliminated through chain
linking and does therefore not enter the price index.
Many countries combine replacement and revision so that a yearly comprehensive revision
is performed in December whereby a sampling procedure is applied, for example, as described in the preceding section on the target sample. Between these revision dates the
sample is sustained representative with the help of replacements. There are, however,
countries that do not revise the sample (at least as regards models) in certain time intervals but rather whenever it seems necessary.
4.2
Focus on the dimension of models
The regulations on the HICP do not specify, whether replacement should only refer to the
model dimension, or also to the outlet dimension or even to the regional dimension of
the target sample. First there is no regulation requirement for replacement and quality
adjustment in the outlet and regional dimensions. Forced replacements between outlets
may be an occurring practice, but for the sake of maintaining representativity of the outlet
sample, frequent enough comprehensive sample revisions can be seen as a more adequate method. Thus, prices would be observed in a fixed choice of outlets and models
would always be replaced within the same outlet.
On the other hand one can take the view that the type of outlet, in which the models are
offered, does also constitute a price-determining characteristic, just as a model’s brand,
material or make. Outlet types could for example be retail warehouses or online-shops.
Thus, outlet types which have lost consumption relevance would have to be replaced by
more representative outlet types and a quality adjustment would also have to be applied if
the quality of the outlet type has been identified as a significant price-determining characteristic.
However, the statement applies that the dynamic of the dimension “model” is usually considerably higher than that of the dimensions “outlet” and “region”. Therefore, the scope
24
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
of this Handbook is confined to the model dimension. In the following, the term “replaced” and “replacement model” is therefore used instead of “replaced” and “replacement product-offers”:
Chart 10
Notation of subsequent models within the remaining text (p = price)
“replaced” model”
Pt=1 Æ Pt=2 Æ Pt=3
“replacement” model
Æ
Pt =4 Æ Pt=5 Æ Pt=6 Æ Pt=7
4.3
Representativity at any point of time
4.3.1
The problem with sticking to similar or identical models by all means
One way of managing a sample of models over time between two revisions may be to stick
to identical or similar models when replacing expiring ones. By replacing unrepresentative models with similar successors, the sample is as far as possible not changed and it is
assured that like is compared with like over time. This approach measures what may be
called the “pure” price change, undisturbed by quality changes.
But the matching of prices of similar models over time will for a range of product categories
lead to the monitoring of a sample of models increasingly unrepresentative of the population of transactions. For some models, it will even not be possible to follow this strategy,
because neither the initially chosen nor similar models will after a while any longer be
available on the market. To stick to identical or similar models is in particular not appropriate for products with significant technical progress. Aiming at replacing obsolete models
by identical or even by similar models in a market with rapid technical progress means
that these replacements will soon be obsolete, too. Obsolete models are by their nature
at the end of their lifecycle and identical or similar replacement models, to be comparable,
must also be near or at the end of their lifecycle.
There is another danger of continuously monitoring similar models of products until they
are out-of-time or even no longer supplied in the shops. Such models may exhibit unusual
price changes as they near the end of their lifecycle, because of the marketing strategies
of the firms. The price development starts to be determined by sell-out prices. Keeping the
model in the sample can lead to declining prices which do not reflect the price trend which
is true for most of the commonly bought up-to-date models.
Another important problem of sticking to similar or identical models arises when the index
is chain linked in the course of the comprehensive revisions. If the sample is only renewed
by applying chain linking then an important part of the price development may get lost,
namely that part of the price development which comes with the introduction of replacement models (see following box).
Federal Statistical Office, Statistics and Science, Vol. 13/2009
25
Replacement strategies
Two types of price changes
The HICP aims at measuring the “pure” price change, unbiased by changes in product
quality or other factors (e.g. the conditions of sale). However, there are two types of
pure price changes and both should be captured. The first occurs when the prices of
a non-replaced model simply change (a price change within one uninterrupted price
series). For instance, a seller changes the price label of a specific computer model in
a specific outlet and location from one period to another. The measurement of this kind
of price effect does not cause any trouble because the pure price change is directly
observable from one period to another.
However, when for example a modified model is launched by a manufacturer this model
may be introduced with a price change, which also needs to be measured. The observed price difference between the preceding and the modified model can in this case
be composed of two effects: (1) The price effect which is due to quality differences between the two models and (2) the price effect which comes with the introduction of the
newly issued model. It is only the price effect of quality changes that should be excluded in order to calculate the pure price change, not the price effect coming with the
model’s launch.
Car models can be taken as an example. Suppose a modified car model is introduced
which includes improved equipment such as a navigation system as standard feature.
Assume that this new equipment has been available before as an option – previously
available at an additional cost of about 400 Euros. Assume that business is doing well
and the dealer decides to seize the opportunity and to increase the price of the new car
model (including the new equipment) by 1 000 Euros compared to the replaced model.
So it could be estimated that the price effect, which is due to quality improvements, may
be at maximum 400 Euros. The pure price change however comes to at least 600 Euros
– the car is more expensive now, even if accounting for the quality improvement.
The price change coming with the introduction of a replacement model should
Revisions and replacements in the HICP
not be excluded from the price measure“Revisions of the HICP sample do not rement. It is an important part of the price
move the need to introduce replacement
development to be captured by price staproduct-offers without delay in between
tistics. The appropriate way to do this is
two revisions.”
to replace models which are becoming
unrepresentative by more representative
Regulation 1749/1996 as amended by
ones. Replacement procedures are espe1334/2007, Article 2a (8).
cially important in the case of technical
progress, because they allow including the quality adjusted price difference between the
replaced and the replacement model into the price measurement and thereby to capture
the price development induced by technical improvements.
On the other hand, chain linking of the index is not a problem when the sample of models
is (by and large) up-to-date and the models within the sample are reflecting the state of
the technology. In this case, price trends due to the technical progress will be captured
26
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
well by replacements between the annual revisions and the index will not systematically
be biased by “chain linking out” the price development in the course of a revision.
4.3.2
Replacement to ensure representativity
The previous sections showed that, especially for products with technical progress, the
sample should be maintained continuously by replacing unrepresentative models. The
models in the sample have to be replaced when they have lost their representativity and
not only when they cannot be found in the outlets any longer.
The chart illustrates this exemplarily. The first step has been to select a target sample according to the available information on sales volumes. The prices of a selected model can
then be traced as long as the model is still representative for the respective consumption segment. This has to be checked at regular intervals.
However, the model distribution which is used in the chart for a regular representativity
check is meant to be hypothetical. Usually, information on sales volumes is not available
for time intervals shorter than one year. Instead, judgemental estimations (relying on supportive evidence, if available) must be used. For example, the products sold most frequently within an outlet also tend to be chosen. The described procedure may be called
“replacement to ensure representativity”.
Chart 11
Replacement to ensure representativity
replacement to ensure
representativity
target universe
models
models
Consumption
segment
M
M
regular judgemental
representativity checks
M
M
most sold
in t = 4
most sold
in t = 1,2,3
estimated sales
volumes
Federal Statistical Office, Statistics and Science, Vol. 13/2009
27
Replacement strategies
4.4
Product specifications
4.4.1
Tight or loose product specifications
Even when striving for representativity, it does not make sense to only follow representativity with replacement. Every replacement implies a more or less reliable judgement on
the quality difference between the replaced and the replacement model. Even if elaborated
quality adjustment methods are applied, there are always uncertainties connected to the
estimation of the quality difference. Additionally, high frequencies of replacement can
cause high costs due to the need of quality adjustment. Therefore, it is meaningful to use
the models that were in the sample in the previous month as a starting point for the model
selection in the current month in order to keep the share of matched models as high as
possible. But if the representativity of an initially chosen model decreases significantly,
it should be replaced by a more representative successor as soon as possible. Thus, the
main criterion for the selection of replacements should be representativity whereas a secondary criterion could be to avoid unnecessary large changes with the selection of replacement models.
A slightly more formal way to account for both criteria would be to define a range of model
variants within which replacement is carried between two revisions. For example for TV
sets, the replacement strategy could be to choose replacements which are of the same TV
type, belong to the same brand cluster and to the same range of screen sizes. A replacement model, for example, might be chosen to fit into the description “LCD TV, screen size
about 32 inches, higher/medium brand”. Those descriptions are often referred to as
“product specification”.
In this context, different market circumstances may be distinguished:
(a) The market of the product is characterised by continuous quality changes over time
that are caused e.g. by technical progress and by which more or less all models are
affected that are offered a similar way.
(b) Product variants of different quality are available in the market that show relevance
in the market at the same point in time, but there is no significant technical progress
which affects the evolution of the product variants in a common way.
In the first case (a), the range of model variants which come into consideration for replacement should be defined loosely so that continuous quality changes due to technical progress can be accounted for. For the example of TV sets, this would for example mean that
the range of screen sizes should be broad enough so that the trend towards larger TV sets
can be reflected in the choice of replacements.
In the second case (b), it would be justifiable to lower the range of model variants coming
into consideration for replacement in order to reduce the scale of quality changes connected with the replacement. This corresponds to a tight product specification.
Tight product specifications can have the further advantage that the price collectors receive more guidance for the selection of replacement models whereas loose specifications
might demand too much from the price collector.
28
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
4.4.2
Several product specifications within one consumption segment
So far, we have assumed that there is only one (tight or loose) product specification for the
replacement within one consumption segment which forms the basis for the selection of
replacement models for price collectors.
One problem shows in the fact that the tighter these product specifications are defined
the smaller is the part of the market which is covered by the range of models of this tight
product specification. This problem may already occur for middle tightness of the product
specification. Referring to the above mentioned example for TV sets, we find a relatively
loose product specification – however, tube and plasma TV sets as well as very small TV
sets are not included in the range of models that may be selected as replacement models. This problem can be solved by applying several product specifications in parallel (see
following chart 12).
Chart 12
Replacement within several product specifications
replacement
target
universe
regular judgemental
representativity checks
models
models
Consumption segment
product specification I
M
M
M
M
M
M
M
most sold within
specification in t = 4
most sold within
specification in t = 3
…
M
product specification II
If a stratification has been applied to compile the target sample, then this stratification
may serve as a first orientation for the definition of several product descriptions in parallel.
5
Quality adjustment
5.1
Creating continuous price series
The next task after having decided on replacement is to connect the prices of the replaced
and the replacement models in order to obtain a continuous series of comparable prices.
In the following, two strategies of connecting models are described which should in no
case be applied automatically. The two methods are based on simplistic assumptions –
Federal Statistical Office, Statistics and Science, Vol. 13/2009
29
Quality adjustment
the first assuming that the entire price difference is due to quality changes and the second
presuming that non of the price difference arises from quality variations.
5.1.1
Link-to-show-no-price-change
With applying “Link-to-show-no-price change” it is assumed that any price difference between the replacement model and the replaced model is caused by their difference in
quality. This strategy should not be used automatically because it rules out any pure price
change possibly coming with the introduction of replacement models on the market.
The following chart 13 shows an example where model “a” exists in the market in period 4 and its successor model “b” is introduced in period 5. With linking the price development of model “a” to the new one of model “b”, it is assumed that the price difference is
completely caused by quality improvements of model “b”. Thus, the price development of
model “b” is directly linked to the one of its predecessor.
Chart 13
Graphical illustration of link-to-show-no-price-change
Assumption:
Price difference = quality difference
Price index
Price
link-to-show-no-price-change
100.0
60
58
model “a”
model “b”
50
83.3
t=4
t=5
The following example illustrates this strategy numerically. Although an actual price difference of + 16 % is observable between period 4 and 5, this is ruled out by the assumed difference in quality. Thus, a price change of 0 % is assumed between periods 4 and 5 instead
of the actual increase of + 16 %.
30
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
Chart 14
Numerical illustration of link-to-show-no-price-change
Period
Price model a
t1
t2
t3
t4
60
54
53
50
t5
Price model b
Price index
Price change
58
100.0
– 10.0 %
90.0
88.3
– 1.9 %
83.3
– 5.7 %
83.3
0%
The main disadvantage of link-to-show“In no case shall a quality change be estino-price-change is that the real price
mated as the whole of the difference in
changes coming with the introduction
price between the two product-offers, unof replacement models cannot be
less this can be justified as an appropriate
tackled. Regulation No. 1334/2007 also
estimate.”
highlights that the method of link-toshow-no-price change should normally
Commission Regulation 1749/1996 as anot be applied. Consequentially, for
mended by 1334/2007, Article 5 (5).
products which improve in quality over
time, the value of quality differences should explicitly be assessed in a replacement
situation in order not to miss price changes coming with the introduction of replacement
models.
5.1.2
Direct price comparison
A second strategy of connecting the prices of replaced and replacement models is direct
comparison. Direct comparison plainly means to assume that there is no relevant quality difference between the replaced model and the replacement. Direct comparison of a
replacement’s price with the one of its predecessor is thus only an adequate method if
both models are essentially equivalent, that means of the same or very similar quality.
If for example the introductory price of the replacement model lies above the exit price
of the replaced model, the price difference is interpreted as a full price increase in the
period in which the replacement takes place. A price increase occurs within the priceseries as shown below.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
31
Quality adjustment
Chart 15
Graphical illustration of direct comparison
Price
Price index
Assumption: No quality change
60
100.0
58
model “a”
model “b”
50
83.3
t=4
t=5
While the price of the replaced model dropped towards its market exit in period 4, the introduction of the replacement model causes a price increase of 16 % from period 4 to period 5. Using direct comparison, it must be assured that this increase is not caused by
quality improvements of the replacement model.
Chart 16
Numerical illustration of direct comparison
Period
Price model a
t1
t2
t3
t4
60
54
53
50
t5
Price model b
Price index
Price change
32
58
100.0
90.0
– 10.0 %
88.3
– 1.9 %
83.3
– 5.7 %
96.7
+ 16 %
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
Direct comparison may in many
“Where no appropriate estimates are availcases have to be accepted as a
able, price changes shall be estimated as the
convention if there is (with reasondifference between the replacement price
able effort) not enough information
and that of the product-offer it has replaced”
available for the estimation of the
value of the quality difference beRegulation 1749/1996 as amended by
tween the replaced and the replace1334/2007, Article 5 (6).
ment model (see next section). A
small excuse for this practice may in some cases be that if the quality difference is
difficult to notice for the price statistician it may be difficult to be noticed by consumers
as well – a quality difference that is not even valued by consumers should not be
considered by statisticians.
Direct comparison may in many cases have to be accepted as a convention if there is
(with reasonable effort) not enough information available for the estimation of the value
of the quality difference between the replaced and the replacement model (see next
section). A small excuse for this practice may in some cases be that if the quality
difference is difficult to notice for the price statistician it may be difficult to be noticed by
consumers as well – a quality difference that is not even valued by consumers should not
be considered by statisticians.
5.1.3
Estimating the value of quality differences
How to include the real price effect of changed models’ launches, but at the same time
exclude the disturbing price effect of quality changes? A consistent solution is to estimate the monetary value of quality changes and to take only this value for the correction
of the price comparison between the replaced and the replacement model. This way, only
price differences which exceed or fall below the monetary value of quality changes are
measured as price changes.
For instance, starting 2004 a certain car model of a certain brand was in Germany offered
with different equipment. The car was amongst other things equipped with a new drive
unit and a stronger engine and petrol consumption was reduced. The air conditioning was
improved, and head airbags were now standard. At the same time, however, the new
price of the model increased by roughly 8 %, and it afterwards cost about 20,000 Euros.
What is the value of the additional equipment which turned standard? Many of these
equipment items were already available as options. In order to estimate the value of the
additional equipment which is now standard, it is possible to use the previous list prices
of the equipment items. The quality-adjusted price trends are then calculated as a change
in the sales price, while ruling out the value of the improved equipment.
If the price of a replacement model is quality adjusted compared to the replaced model,
it is labelled “equivalent by quality adjustment”.
5.1.4
Examples
Suppose there is a product with models newly issued in regular time intervals and that the
models are very expensive when their lifecycles start. Each replacement model is intro-
Federal Statistical Office, Statistics and Science, Vol. 13/2009
33
Quality adjustment
duced with improved quality features. The first model that is introduced is called model “a”.
After a certain time interval, model “a” is replaced by a changed model with improved
quality, which is called model “b”.
Model “a” was introduced at a relatively high price. After its launch, it is getting increaseingly cheaper until it finally disappears from the market. Model “b” is introduced one
month after model “a” has disappeared from the market.
In Chart 17, an artificial example is given with three possibilities of linking the price development of model “a” to that of model “b”.
When “direct comparison” is applied the price of replacement model “b” in t5 is directly
compared to the sell-out price of replaced model “a” in the previous period t4. The price
development thus leaps up with the introduction of replacement model “b”. In this case,
the fact that the quality of replacement model “b” has improved is not accounted for.
If “link-to-show-no-price-change” is used, the introductory price of model “b” is put on
one level with the sell-out price of model “a”. This way, the price-series of model “b” is
tied to the preceding price-series of model “a” and the price increase coming with the
introduction of model “b” is eliminated. The price development is therefore continuous
and the resulting price index decreases more and more. The price declines measured in
this way do not reflect the price trend in the whole market because prices are not really
decreasing to that extent. In this case, the “linking” strategy is clearly misleading.
Chart 17
Direct comparison, link-to-show-no-price-change and quality adjusted price
Price
quality difference
Price index
direct comparison
quality adjusted
price index
(hypothetical)
100.0
model “a”
60
58
50
83.3
model “b”
t=4
t=5
link-to-show-no-price-change
34
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
The third solution shown by the chart is a “quality adjusted” price development. For explanation, the graph shows an example where the quality of the replacement model “b”
has considerably improved. The replacement model could for example include new useful
features. So if the value of the quality difference is considered, the price of the replacement model “b” appears to be lower as when measured by direct comparison.
Chart 18
Numerical illustration of a quality adjusted price
Period
t1
t2
t3
t4
Price model a
60
54
53
50
t5
Price model b
Price index
Price change
58
100.0
90.0
– 10.0 %
88.3
– 1.9 %
83.3
– 5.7 %
90.0
+8%
In the above example the index changes by 8 percent from 83.3 to 90.0. In this example
it is assumed for illustrative reasons that only half of the observed price increase from
50 to 58 (an increase by 16 percent) is regarded as quality improvement. Therefore, the
pure price change amounts to only 8 percent by which the index rises.
The lesson is that “direct comparison” and “link-to-show-no-price-change” can lead to errors in measuring the price trend for the respective product. A consistent solution when
facing quality differences is achieved by estimating the value of the quality difference and
correcting for this value.
Comparing models’ prices with the help of quality adjustment also increases the flexibility of which replacement to choose as a successor model. It enables price statisticians
to a certain degree to choose successor models which are currently most representative
of the consumption segment as a replacement, regardless of whether they differ in quality from the replaced model or not.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
35
Quality adjustment
5.2
Applying quality adjustment methods
In this section, methods of estimating the value of quality differences will be shortly described:
Hedonics, Option Pricing, Supported
Judgement and Bridged Overlap.
Regarding Hedonics, there will be an
extra chapter in this handbook. The
section aims at describing the basic
procedures which apply for most
technical durable goods in more or
less the same way. Detailed product
specific proposals on the application
of the above mentioned methods are
provided in the second chapter.
5.2.1
Quality changes in the HICP
“Concerning quality change, the judgement
shall be based on due evidence of a difference between the specification of a replacement product-offer and the product-offer it
replaced in the sample; That is, a difference
in the product offers’ significant price-determining characteristics, such as brand, material or make, that are relevant to the consumer purpose in question.”
Regulation 1749/1996 as
1334/2007, Article 2a (8).
amended
by
Explicit quality adjustment methods
Explicit methods serve to identify a so-called monetary value of the quality difference between a replacement product and its predecessor. This value can then be purged from
the directly observed price difference between the two subsequent products and what
remains is the pure price change, which should finally enter the index calculation.
Quality adjustment methods as hedonic re-pricing, option pricing, and supported expert
judgement estimate the value of the quality difference between a replaced and a replacement product explicitly – contrary to implicit methods. Two steps are necessary for the
quality adjustment:
(1) Estimate the value of the quality difference.
(2) Adjust the price difference between replaced and replacement model by the value of
the quality difference. This can be done by an additive or multiplicative operation.
The two-steps-structure applies each for hedonic re-pricing, option pricing and supported
expert judgement. Different is the way of estimating the value of the quality difference: In
the case of hedonic re-pricing the estimate of the value of the quality difference is calculated with the values of the variables according to the regression equation. Regarding option pricing the estimate of the value of the quality difference is calculated by means of
prices for the respective options. In this case the value of the quality difference corresponds to the prices of additional features (options), which can be taken from price lists.
Regarding supported expert judgement the value of the quality difference is estimated
whereby additional information may be used. The second step consists of deducting the
estimated value of the quality difference from the price difference between replaced and
replacement model. In the case of hedonic re-pricing it is more convenient to calculate the
quality adjusted price directly by multiplying the observed price by a quality adjustment
factor g , P adj ( B )t =0 = P ( A )t =0 ⋅ gt =0 . Whereas the quality adjustment factor is the ratio
of the (by means of the determining characteristics) estimated prices of the replaced and
the replacement product.
36
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
5.2.2
Hedonic methods (see Chapter Three for more details)
The application of hedonic methods is one way to explicitly estimate the value of the
quality difference between a replaced and a replacement model. A regression is set up
to express the market prices of different products as a function of their characteristics.
This gives a market valuation of the quality difference of the models, which is used as
the ‘monetary value of the change in quality’ when comparing prices.
Different variants exist of how to compute the value of changes of characteristics, but all
hedonic methods are based on a regression analysis. The functional form of the regression
can vary depending on the data. Linear, semi-logarithmic and double log expressions are
possible.
The relation between prices and characteristics may change rapidly from month to month
(e.g. for computers) or less rapidly, e.g. over a time period of several months (e.g. for TVs).
For some products it might therefore be advisable to estimate hedonic regressions on a
regular basis.
A general description of hedonic methods will be provided in Chapter Three of this handbook, as well as examples and product specific guidance on the application of hedonics.
5.2.3
Option pricing
With option pricing, the estimation of the value of the quality difference between two models is done by using listed prices of their characteristics (options). For most products, option pricing is used when a certain feature of a product, like air conditioning in a car, previously used to be an option but has now become standard. Hence the name “option
pricing”.
For some other products, e.g. computers, it is used in a slightly different way. Then, option
pricing is used to adjust for the price differences of a certain feature, when the replacement model has a different variant of that feature than the replaced model. For example,
when the replacement model has a hard disk of 200 GB and the old model one of 100
GB and if the option prices of these features are available, option pricing adjusts for the
price difference of these hard disks.
For option pricing, the following aspects need to be considered:
•
A specific replaced model has to be linked with a specific replacement model. Replacement models are called successors in this case. The differences between replaced models and their successors need to be determined, both qualitatively and quantitatively.
•
The option prices of these particular differences have to be determined.
Given this information, the quality-adjusted price of either the replacement models or the
successors can be calculated. For product groups with a high rate of turnover, like computers, it is generally not known beforehand when a model will be replaced. In practice, option prices can then only be collected in the period when the successor is introduced. In
that case, the quality adjustment is applied to the price of the successor and this price is
compared with the unadjusted price of the replaced item (see following chart). It is also
possible to compare the unadjusted price of the replaced item with the adjusted price of
the replacement.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
37
Quality adjustment
Chart 19
Example for the application of option pricing
Observed price of the replacement
+
Option prices of the characteristics in which the replaced model
differs from the replacement
–
Option prices of these characteristics of the replacement
=
Quality adjusted price of the replacement
Or more formally:
§
·
pˆiy∈,Nm = piy∈,Nm + ¨¨ ¦ bkjy ,m − ¦ bkiy ,m ¸¸ .
k
©k
¹
Here i is the successor and j the replaced model. Furthermore, p̂ denotes the estimated
price, p the observed price in year y and month m . The characteristics are denoted by k
and their values, as expressed in option prices, are denoted by bk .
One drawback of option pricing lies in the fact that a monetary value of the characteristics
in which replaced and replacement model differ from each other has to be collected. With
option pricing, one chooses explicitly to collect additional prices to determine these values. This places a high burden on data collection, as one needs not only information on
prices and characteristics of the models in the sample, but also on the prices of these
characteristics. Moreover, for some characteristics it will not be possible at all to collect
separate prices. Especially for replaced models, the option prices may not be available
because these particular options are not available anymore.
Another drawback of option pricing is that one cannot be sure whether the option prices
correctly reflect the price differences of these characteristics when sold as part of a product as opposed to bought separately. For example possibly only some buyers of a new car
want a particular feature that became standard and are willing to pay the full former option price for it. To capture these aspects in some cases a “reduction factor” by which the
value of the added or subtracted features is reduced from their observed market values
is introduced. In the case of option pricing for cars the HICP standards recommend to apply a reduction factor of 50 per cent.
5.2.4
Supported judgemental quality adjustment
The supported judgemental approach shows many similarities with option pricing. The
monetary value of the quality difference between two subsequent products is calculated
by using supplementary information sources – beyond the prices and quality features of
the products sampled for the price index. Other than for option pricing, the supplementary
information is in this case not restricted to the prices for options taken from real existing
price lists. The value of the quality difference can rather be calculated in more flexible
ways – using a broader source of supplementary information. Three examples are provided
here:
38
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
First example: Energy consumption
One example of the calculation of the benefit in money’s worth is if the replacement model, e.g. an electric appliance, has lower power consumption than its predecessor. The monetary benefit caused by the lower power consumption can be calculated by multiplying a
current average price of the energy saved per year by the assumed lifespan of the appliance.
Second example: Comparable models
An example of the use of comparable models to determine the value of the quality change
may be the case that a manufacturer offers two models that only differ in one characteristic. This characteristic could, for example, be the net capacity of two otherwise identical
freezers. If it can be assumed that prices have been set in a competitive market it would
in this case be justified to take the price difference between the two freezers as an estimation of the value of the quality difference – because the price difference can clearly be
assigned to the respective characteristic. Thus, the price for a certain amount of additional capacity can be determined. The assumption of a linear relation between capacity and
price may be strong but acceptable.
Third example: Fuel efficiency of new cars
A further example of the application of supported judgemental quality adjustment is the
calculation of the quality difference regarding fuel efficiency of new cars. In this case hypothetical values of some parameters are assumed: the amount of kilometres driven per year
is set to 15,000, the number of years a car will be in operation is set at 5 years. Given this
assumption the value of the difference between two in all other characteristics equal cars
can be calculated in an easy way. Model A consumes 8.0 l/100 km Model B 7.5 l/100 km.
Suppose that the actual fuel price is 1.00 €/l. Then the option price for enhanced fuel
efficiency can be calculated as:
(8 l – 7.5 l) • 75,000 km • 1.00 € = 375.00 €.
100 km
l
5.2.5
Implicit quality adjustment – bridged overlap
It has been shown that linking in the form of “link-to-show-no-price-change” is problematic, because the price change coming with the introduction of the replacement product
will not be reflected in the measured price development. Bridged overlap now serves to
link the price-series of a replacement model to the one of a replaced model in a more
meaningful way. The price developments of other models of the same consumption segment, which are not replaced, build the bridge between the replaced and the replacement
model. In this way, the average price development of a set of comparable models can be
used as an estimation for the price change coming with the introduction of the replacement product. Chart 20 (see p. 40) illustrates this procedure.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
39
Quality adjustment
Chart 20
Bridged overlap with a real price increase
Price
bridge
60
quality difference
model “b”
model “a”
50
P of reference models
model “c”
40
model “d”
30
model “e”
20
t=4
t=5
Model “a” disappears from the market and shall be replaced by model “b”. Models “c”,
“d” and “e” are comparable models, that is, models with similar values for their quality
characteristics, compared to models “a” and “b”. They may be called “reference models”
in the context of this method.
When the replacement “b” is introduced to the market, the question arises if the price difference to its predecessor, model “a”, is fully due to its improved quality. The average
price increase of the comparable models “c”, “d” and “e” indicates that this is not the
case. Because the price of other models has risen in the period of comparison, it may be
the case that also the price difference between replaced and replacement model underlies
this trend. With bridged overlap, the same price development is assumed between the replaced and replacement model. The numerical example additionally displays this logic.
The last observed price of model “a” has been 50 in the fourth period, the price of the
replacement in the following period is 58. Using bridged overlap, the average price development of the reference models “c”, “d” and “e” is taken as a proxy for the missing link
between models “a” and “b”. As a result, the average price development of + 10 % is used
to link the price-series of model “a” to that of the replacement model “b” instead of the
actual price increase of + 16 %.
40
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
Chart 21
Numerical illustration of bridged overlap
Period
Price model “a”
t1
t2
t3
t4
60
54
53
50
t5
Price model “b”
Price change
58
– 10.0 %
– 1.9 %
+ 8.9 %
– 5.7 %
Price model “c”
50
45
40
40
42
Price model “d”
40
32
30
30
33
Price model “e”
35
30
25
20
23
Average price
change of reference models
“c”, “d”, “e”
– 14.4 %
– 11.2 %
– 5.3 %
+ 8.9 %
The following Chart 22 provides another example of bridged overlap. It shows a situation,
in which the quality adjusted price difference between replaced and replacement model
is zero. In contrast to the example given above, the price of the reference models “c”,
“d” and “e” remained constant during the bridged period. As a result, no real price change
is assumed between the replaced and the replacement model.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
41
Quality adjustment
Chart 22
Bridged overlap
Price
bridge
60
quality difference
observed price of model “b”
model “a”
50
quality-adjusted price of
model “b”
P of reference models
40
model “c”
model “d”
30
model “e”
t=4
t=5
t
The choice of models which are used as reference models is an important task with this
method. Basically, the reference models must compete with the replaced and replacement
models. However, competition is difficult to observe empirically. Nevertheless, the following conditions should be met:
First, bridged overlap should only be applied if within the respective market the prices of
the products react to each other immediately. This means that the price development of
the reference models in the same month should reflect the price development coming with
the introduction of the replacement model. In other words, it is presumed that the manufacturers or sellers of the reference models react instantly to the introduction of a new
model with price adjustments of their models. Bridged overlap would for instance be senseless in the case of books in Germany. Due to the fixed retail prices of books, sellers can not
react to the introduction of new and possibly competing books and adjust prices accordingly.
Second, the price-series of the reference models should not be influenced by unusual
price fluctuations, e.g. caused by special price strategies of individual sellers. Such fluctuations can bias the price development of the reference products. Thus, replacement model, replaced model and comparable models should exhibit homogenous price changes not
only in the period of comparison, but also prior to it.
42
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Standard procedures for quality adjustment
5.3
Combining quality adjustment methods
Oftentimes several quality adjustment methods are combined in practice. A frequently
used way of distinguishing between the application of two different quality adjustment
methods is to make a distinction between minor and major quality changes. For example, there might be a product for which option pricing is applied in the case of minor
changes and bridged overlap in the case of major changes. The definition of minor and
major changes is product-dependent and will be provided in the product specific proposals (see next chapter).
Chart 23
Combining quality adjustment methods
target
universe
replacement and quality
adjustment
Consumption
segment
models
M
M
M
M
Quality adjustment method
for major changes
Quality adjustment method
for minor changes
Federal Statistical Office, Statistics and Science, Vol. 13/2009
43
Product-specific guidance and examples not using hedonics
The following table gives an overview of the CENEX products and the quality adjustment
methods recommended.
Chart 24
Overview of CENEX products and recommended quality adjustment methods
Non-hedonic
methods
Product
Hedonic methods 1)
TV sets
Bridged overlap
Hedonics
New cars
Option pricing
–
Used cars
Combination of direct
comparison, bridged
overlap and supported
judgement
Hedonics
Notebooks
Desktops
Bridged overlap
Option pricing
Hedonics
Hedonics
Washing machines
Bridged overlap
Hedonics
Computers
•
•
Books
•
Long-selling market
Combination of direct
comparison and expert
judgement
–
•
Rapidly-changing market
Direct comparison
of bestseller lists
Hedonics
Software
•
Applications
Direct comparison with
consumer profile approach
–
•
Games
Direct comparison
–
_____________
1) See next chapter “Hedonic Regression”
All recommendations for quality adjustment are based on the respective Task Force results and HICP standards. Additionally, empirical studies have been performed within the
CENEX HICP Quality Adjustment project. The role of these studies was not to assess the
suitability of different quality adjustment methods – this was already undertaken by the
respective task forces. Rather, the intention was to gain practical experience in the implementation of the suggested methods.
This chapter focuses on the product-specific recommendations not using hedonics. Subsequently, Chapter Three will give a general introduction into the hedonic quality adjustment methods followed by an additional product-specific chapter for TV sets, used cars,
computers, washing machines and books.
44
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
1
Television sets
•
Television sets belong to a market with a low replacement rate and high technological
change (e.g. large screen television sets). The Task Force on Quality Adjustment and
Sampling assessed explicit methods for this market as A-methods and implicit methods (bridged overlap) as B-methods. Expert judgement is rated as a C-method. For
the television market with a low replacement rate and low technological change (e.g.
13 inch television sets) implicit methods (bridged overlap and direct comparison) are
rated as A-methods. Unsupported judgement is assessed as a C-method.
•
The empirical studies of the CENEX members showed that hedonics and bridged overlap with appropriate stratification yield similar results. Expert judgement is not suitable (even with stratification).
•
The following section deals with the application of bridged overlap for television sets.
1.1
Proposal for the identification of consumption segments
•
Television sets are covered by COICOP class 09.1.1.
•
Regarding the identification of consumption segments, different solutions may be possible.
–
–
–
•
1.2
•
For TV sets the screen size may provide an indication of different consumption
purposes. As already mentioned in Chapter One, small TV sets could, for example, be situated in the bedroom or in the children’s room or, more generally, be
interesting for less-frequent users (e.g. for watching the news), whereas large TV
sets tend to be bought for usage in the living room and for watching motion pictures.
An alternative approach might be that all television sets serve the same consumption purpose, namely entertainment, so that there would only be one consumption
segment.
Furthermore, another criterion for the distinction of consumption segments might
be the “mobility” of television sets. If, one day, portable television sets become
more popular it may be necessary to identify individual consumption segment for
them. This criterion is, for example, used for the distinction of desktop and notebook computers.
For practical reasons only one consumption segment is considered in the following.
This pragmatic proposal is, however, not based on convention. For the time being,
the issue remains open.
Proposal for the construction of a target sample
A stratification of the target universe according to model types is suggested, followed
by a pre-selection of the relevant model types. Subsequently, outlets should be chosen
which sell models belonging to the pre-selected model types. Market information on
the sales figures of models can be obtained either from market research institutes
or through a small field survey. In the selected outlets, representative models which
belong to the pre-selected model types should be chosen for price observation.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
45
Television sets
Chart 25
Proposal for the construction of the target sample
model types > outlets > models
(1) Stratify the target universe according to model types.
(2) Pre-select relevant model types.
(3) Choose outlets which are relevant for the type of good, accounting for the regional dimension.
(4) In the outlets, choose representative models which belong
to the pre-selected model types.
(1) Stratify the target universe according to model types
•
TV sets can be characterised by the type of TV (e.g. LCD, tube, plasma) and by the
screen size.
•
Carry out an evaluation on which types of TV sets are available (e.g. tube, plasma,
LCD TV sets) and on how large their market share is.
•
Additionally, evaluate the available screen sizes so that stratification by screen sizes
can be applied.
•
Stratify for example by
a) TV type: LCD, plasma, tube (however, in many countries tube TV sets do not play
a role anymore.)
b) Screen size: Identify the market share of the screen sizes available either with the
help of data from market research institutes (e.g. Gesellschaft für Konsumforschung
International [GfK], AC Nielsen) or with the help of a small field survey. The field
survey can either be conducted on the internet or in a few outlets. One example
would be to establish three screen size strata: “small” with a screen size of about
14 inches, “medium” with a screen size of about 20 inches, and “large” with a
screen size of about 32 inches. The values given in the example might differ for
the TV sets available in your national TV market. Choose the significant screen
sizes of your national market.
Chart 26
Example of straticfication by screen size and TV type
Overview of TV market
LCD
Tube
Plasma
...
~ 14 inches
...
~ 20 inches
...
~ 32 inches
46
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
(2) Pre-select relevant model types
•
The strata with the biggest market shares are successively selected until 50 to 80 %
of the total market are covered. This means that the strata with a small market share
will not be included in the sample (cut-off sampling).
•
The result of the pre-selection will be a list containing the pre-selected TV types plus a
guiding value for the respective screen size. This list can be passed to the price collector for the selection of concrete models.
Chart 27
Example of pre-selection of strata
Overview of TV market
...
~ 14 inches
...
~ 20 inches
...
~ 32 inches
...
LCD
Tube
Plasma
selected
selected
selected
(3) Choose outlets which are relevant for the type of good,
accounting for the regional dimension
•
An idea for sampling outlets could be as follows:
–
–
–
–
–
Distinguish between large/small retailers and mail order/stationary retailers.
Include the largest retailers in any case. Check whether or not the large retail
chains have the same pricing policy in the entire country. If yes, prices can be centrally observed in big retail chains. If not, prices have to be observed locally taking
into account the regional variance of price developments.
Include a sample of small retailers. This could be done by probability sampling if
data can be obtained from the business register of your respective national statistical office. If information from the business register is not available, choose relevant regions and within the regions select typical outlets (just as you do for other
products of the HICP).
Prices from small retailers should be observed locally in general (under consideration of regional variations in price development).
If mail order selling is relevant in the market it should be taken into account by collecting prices from the internet or catalogues additionally.
(4) In the outlets, choose representative models which belong
to the pre-selected model types
•
Within the chosen outlets, select TV models which are best sold and correspond to
the pre-selected strata.
•
Which TV sets are best sold in the respective strata should be determined by the price
collector after consulting a shop assistant. If this is not practicable the price collector
Federal Statistical Office, Statistics and Science, Vol. 13/2009
47
Television sets
can also revert to indicators such as the way the product is advertised or presented
in the shop or he might also revert to his experience. A model that is clearly visible
in the store could, for example, be judged to be representative.
Minimum size of the target sample
•
A number of 25 to 30 product-offers may be suggested as a normal rough minimum
bound for sample size of TV sets, where no more specific indications are present.
1.3
Proposal for the replacement strategy
•
The stratification from the target sample is used to set up the product specification for
replacement, i.e. the replacement should be chosen from the same stratum. Additionally, the replacement model should normally be selected from the same brand cluster.
•
A classification of brand clusters for TV sets can be obtained from the PPP statistics
(Purchasing Power Parities – price indices comparing the national price levels). The
available brand cluster lists of the PPP classify TV sets into high/medium and low
brands. In these lists, not all available brands are classified. Missing brands should
be assigned to one of the brand clusters by personal judgement.
Chart 28
Extract from the PPP classification for TV sets
Brand cluster
Assigned brands
Higher and Medium
HITACHI, JVC, LG, LOEWE, PANASONIC, PHILIPS,
PIONEER, SAMSUNG, SHARP, SONY
Lower
BEKO, BIGA, DAEWOO, DIBOSS, ELEMIS, EVELUX,
FUEGO, LENCO, MATRIX, QUADRO, SENCOR,
SCHNEIDER, SILVA, VIVAX
Source: Eurostat (Lot C), European Comparison Programme: Consummer price survey E07-1
“House and garden” Particular survey guidelines
•
The result is a product specification that is characterised by three dimensions, namely
TV type, screen size and brand cluster. A replacement might, for example, be chosen
so that it fits the description “LCD TV, screen size 32 inches, higher/medium brand”.
Representativity check within the scope of the monthly price collection in the shop
•
•
48
Use the concrete TV models that were in the sample in the previous month as a starting point for the price collection of the current month in order to keep the share of
matched models reasonably high. However, the main criterion for the selection of replacements should be representativity.
The price collector should check whether or not the models of the previous month
are still representative of the corresponding stratum and brand cluster in the current
month. The price collector should assess this monthly with the help of the shop assistant. If this is not practicable the price collector can also revert to indicators such as
the way the product is advertised or presented in the shop or he might also revert to
his experience.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
•
If the model of the previous month is still representative of the stratum and brand
cluster, continue observing the price of this model.
•
If the model is no longer representative or if it is no longer available replace it by a
representative one from the same stratum and brand cluster – if available. The new
model would then be
–
–
–
of the same TV type (LCD, plasma, tube),
of roughly the same screen size,
of the same brand cluster.
•
If possible, try to find a replacement that is still representative but is so similar to the
replaced model that the changes in the characteristics between replaced and replacement model do not constitute a major change. Those characteristics that cause a major change in quality are for example frequency, aspect ratio, sound system (see section “proposal for quality adjustment”).
•
If such a replacement cannot be found for lack of representativity, a replacement has
to be selected which differs in the characteristics causing a major change within the
product specification. The selection of a replacement of the same screen size is thereby more important than selecting a replacement of the same brand.
•
If no model is available from the same stratum the replacement has to be chosen from
the target universe, making a compromise between representativity and similarity to
the replaced model. The replacement could, for example, be of the same TV type but
of a clearly different screen size. In this case, the initial product specification must be
adjusted by the national statistical office to current developments in the TV market as
soon as possible (see below).
Representativity check in the office
•
Additionally to the representativity checks conducted by the price collector a representativity check should be conducted regularly in the office, especially for large TV sets.
•
Check whether the product specifications are still representative of the market. Therefore, the market has to be observed regularly. This regular observation can also be
used to determine general changes (e.g. new equipment, new brands, technological
progress, trend towards larger screen size) which are also important for quality adjustment (see next section).
•
If the product specifications are still representative, carry on using them.
•
If not, adjust the product specifications to the current market circumstances by resampling.
1.4
Proposal for the quality adjustment
•
A distinction in minor and major changes helps to decide whether a quality adjustment is necessary. In general, major changes require a quality adjustment whereas
minor changes do not.
•
Minor quality changes would for example be changes in resolution, brightness, contrast ratio, HD ready, number of scart connectors, station memory, teletext memory,
picture in picture. If minor changes occur the prices can be directly compared.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
49
Television sets
•
Major changes within the same product specification would be changes in:
–
–
–
–
–
Frequency (50 Hz – 100 Hz).
Aspect ratio (4:3 – 16:9).
Sound system (Mono – Stereo – Surround Sound).
Brand cluster (e.g. from medium to low).
Small change of the screen size.
If major changes within the same product specification occur, bridged overlap shall be
applied. The price development of the remaining models in the same product specification is used as the bridge. (If the price collector was forced to choose a replacement from outside the product specification because there was no representative
model available in the initial product specification, the price development of the models in the initial product specification is used as the bridge. Re-sampling should then
be carried out as soon as possible.)
•
A major change normally accompanied by re-sampling would be:
–
–
Big change of the screen size.
Change in the TV type (e.g. from tube TV to LCD TV).
Chart 29
Quality adjustment for minor and major changes
M
M
M
M
TV sets
Consumption segment
models
M
M
M
M
Minor change within
product specification:
Direct comparison
1.5
Major change within
product specification:
Bridged overlap
Cost estimation
One-time effort for the introduction of option pricing for television sets
• The one-time implementation effort amounts to approximately one month.
Current effort of price collection and index calculation
•
50
The monthly operational costs involve approximately 1.5 working days for the product
expert.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
1.6
Examples
Example of the construction of the target sample
The field survey on available TV types and screen sizes in one exemplary country has
yielded the following stratification results:
Chart 30
Market share in percentage of sales volume
Overview of TV market
...
~ 14 inches
...
~ 20 inches
...
~ 32 inches
...
LCD
1
16
33
Tube
...
11
...
<1
...
2
...
Plasma
<1
<1
2
The stratum with the largest market share of 33 % is LCD TV sets with a screen size of 32
inches. This is the first stratum to be pre-selected. The second stratum that is pre-selected
is LCD TV sets with a screen size of 20 inches. The two strata pre-selected so far add up
to 49 % of the total sales. In order to reach the cut-off border of at least 50 %, another stratum has to be selected. The third largest stratum is tube TV sets with a screen size of 14
inches with a market share of 11 %. Finally, three strata are pre-selected, which together
cover 60 % of the market:
•
•
•
Tube TV sets with a screen size of ~ 14 inches,
LCD TV sets with a screen size of ~ 20 inches,
LCD TV sets with a screen size of ~ 32 inches.
The pre-selection of model-types is finished. We now have to pre-select outlet types. We
distinguish between large and small sellers. Once this distinction is finished we can determine concrete outlets. The distribution of the sales across points of sales might look like
this:
Chart 31
Pre-selection of outlet types
Large seller 1
Large seller 2
Small sellers
50 %
30 %
20 %
Two large sellers cover 80 % of the market while the sales by small sellers make up 20 %
of the market. Large sellers should be included in any case. Furthermore, a probability
sample should be taken from all small sellers.
The next step is to determine how many TV sets will have to be observed in which outlets.
First it has to be calculated how large the shares of the three pre-selected segments are
in the sample.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
51
Television sets
Chart 32
Shares of pre-selected strata
Overview
of TV market
Share
of total sales
Share
in the sample
Tube TV (screen size 14 inches)
11%
Æ
18%
LCD TV (screen size 20 inches)
16%
Æ
27%
LCD TV (screen size 32 inches)
33%
Æ
55%
60%
Æ
100%
Then, it has to be determined which share of the sample has to be observed in which
outlet-types. The following matrix will help to do so. It refers to the tables above.
Chart 33
Matrix of shares
Overview
of TV market
Share
in the sample
Tube (14 inches)
18 %
LCD (20 inches)
27 %
LCD (32 inches)
55 %
Total
Large seller 1
Large seller 2
Small sellers
50 %
30 %
20 %
100 %
For example, in order to find out how many tube TV sets with a screen size of 14 inches
to observe in the outlet of large seller 1, multiply 18 % by 50 % and you obtain a value
of 9 %. The resulting table looks like this:
Chart 34
Completed matrix of shares
Overview
of TV market
Share
in the sample
Large seller 1
50 %
Large seller 2
30 %
Small sellers
20 %
Tube (14 inches)
18 %
9.0 %
5.4 %
3.6 %
LCD (20 inches)
27 %
13.5 %
8.1 %
5.4 %
LCD (32 inches)
55 %
27.5 %
16.5 %
11.0 %
Total
100 %
For our example we assume that 30 price observations are required. The following table
shows how many TV sets of which screen size have to be observed in which shop (numbers rounded):
52
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 35
Matrix containing numbers of observations to be included
Share
in the
sample
Overview
of TV market
Large seller 1
Large seller 2
50 %
Small sellers
(prob. sample)
30 %
20 %
Tube (14 inches)
18 %
3
2
1
LCD (20 inches)
27 %
4
2
2
LCD (32 inches)
55 %
8
5
3
100 %
15
9
6
Total
In this example it is assumed that both large sellers pursue a pricing strategy that is identical across all regions within one country. Therefore, it is does not matter in which outlets of the large retailers the prices are finally observed.
In a next step the price collector would be instructed to collect prices of e.g. three TV sets
of the type “LCD, ~ 32 inches, higher or medium brand” within one outlet of large seller 1.
He might be provided with the following price collection form:
Chart 36
Example of a price collection form
Television, screen size ~ 32 inches, LCD, higher or medium brand
Price (€):
929.–
Brand and model description:
Sony KDL-32D3000
Screen size (inch):
32
Resolution:
1366 x 768 pixels
Contrast ratio:
8000:1
Reaction time:
8 ms
Brightness:
450 cd/m²
Frequency:
: 50 Hz
… 100 Hz
Aspect ratio:
: 16:9
… 4:3
Sound system:
: Surround
… Stereo
… Mono
… Others:_________
Connectors:
… VGA
… DVI
: HDMI
: Others: Scart, PC
… Picture in Picture
: HD ready
Shop: ___________________________
Date: ___________________________
In case of bridged overlap being used it may be unnecessary to collect all the mentioned
variables from the price collection form above.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
53
Television sets
Example: Minor change within the same product specification
In month 3, Model 4a was no longer representative and has therefore been replaced by
model 4b (see chart 37).
Chart 37
TV sets collected within the product specification “LCD, ~ 20 inches,
higher and medium brands” – minor changes
Model
Brand
Type
Screen
Size
Price
in month 1
Price
in month 2
Price
in month 3
1
LG
RE20LA30
20
799.–
799.–
699.–
2
PHILIPS
20PF8846
20
599.–
599.–
599.–
3
SEG
MONTECARLO
20
599.–
599.–
599.–
4a
TOSHIBA
20VL33G
20
810.–
810.–
4b
SONY
KLV20SR3
20
Average price (geometric mean, rounded)
710.–
694.–
Price change (in %)
Index value
100.0
694.–
650.–
0
– 6.4
100.0
93.6
The change between models 4 a and 4 b is evaluated as a minor change in the model characteristics, therefore no quality adjustment is necessary. The price of the replacement TV
set Sony is compared directly to the price of the replaced Toshiba.
The seventh line of the matrix above shows the average price (geometric mean). The average prices for month 1 and month 2 are the same and therefore the average price change
is 0.0 %. In month 3, if prices are compared directly we can see a price decline for the LG
from 799.– € to 699.– € and for the Sony replacing the Toshiba from 810.– € to 710.– €.
This results in a decrease of the average price from 694.– € to 650.– € and in a price
change of – 6.4 %.
Example: Major change within the same product specification
In this example the change in the model specification from model 4 a to model 4 b is assessed to be a major quality change. A quality adjustment should be done. In the following
example the Toshiba model is replaced by a larger Sharp TV. A rise in the screen size from
20 to 21 inches is observed. To adjust for this difference bridged overlap is applied.
54
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 38
TV sets collected within the product specification “LCD, ~ 20 inches,
higher and medium brands” – major changes
Model
Brand
Type
Screen
size
Price
in month 1
Price
in month 2
Price
in month 3
1
LG
RE20LA30
20
799.– €
799.– €
699.– €
2
PHILIPS
20PF8846
20
599.– €
599.– €
599.– €
3
SEG
MONTECARLO
20
599.– €
599.– €
599.– €
659.– €
659.– €
631.– €
Average price of models 1 – 3 without QA
(geometric mean)
Average price change of models 1 – 3
without QA (in %)
0.0
4a
TOSHIBA
20VL33G
20
4b
SHARP
LC22SV2E
21
810.– €
Average of directly observed prices
694.– €
Quality adjusted average price
– 4.4
810.– €
680.– €
650.– €
694.– €
635.– €
664.– €
Price change (in %)
Monthly indexes
100.0
0.0
– 4.4
100.0
95.6
A bridge is needed to reflect and apply the price change of all unchanged TV models to the
new Sharp model. The bridge is taken from the average prices of models 1 to 3. The price
change of these models from month 2 to month 3 is – 4.4 %. The Sharp TV has a price of
650.– € and the imputed price for the previous month can now be calculated:
650.– € / 0.956 = 680.– €
with 0.956 = 1– 0.044
The quality adjusted average price in the previous month amounts to 664.– €. The result on
the basis of this quality adjusted average price is a quality adjusted price change of – 4.4 %
and a decline of the index from 100.0 to 95.6.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
55
New cars
2
New cars
•
As the first step, the COICOP four-digit-level: 07.1.1 Motor Cars (containing motor cars,
passenger vans, station wagons, estate cars and the like with either two-wheel drive
or four-wheel drive) is divided into new cars and used cars. This section only deals
with new cars.
•
The recommended quality adjustment method for new cars is Option Pricing. This is in
line with the results of the Task Force on Quality Adjustment and Sampling and the
HICP standards which propose option pricing as a B-method. No method was assessed as an A-method for cars.
•
The following section deals with the application of option pricing for new cars.
2.1
Proposal for the identification of consumption segments
•
It is suggested to divide the universe of new cars into different consumption segments
according to the different consumption purposes.
•
The official EU car classification system, Euro-NCAP, contains nine market segments
and can serve as a starting point for inspiration:
–
–
•
Superminis (e.g. Renault Twingo), Small family cars (e.g. Ford Focus), Large Family
cars (e.g. VW Passat), Executive cars (e.g. BMW 5 Series), Roadsters (e.g. BMW
Z4), Small multi-purpose vehicles (e.g. Opel Meriva), Multi-purpose vehicles (e.g.
Toyota Previa), Small off-road vehicles (e.g. Honda CR-V), Large off-road vehicles
(e.g. Jeep Grand Cherokee).
This Euro-NCAP classification is not exhaustive since luxury cars (e.g. MercedesBenz S-class) are excluded.
According to Regulation 1334/2007 consumption segments are characterised by consumption purposes and pre-dominantly usage in similar situations. Since the consumption purpose of a car can not be measured, the size of the car serves as proxy variable.
The Euro-NCAP classification seems to be too tight for a clear distinction of consumption purposes. Therefore it is suggested to aggregate the nine segments into four consumption segments according to the size of the car:
–
–
–
–
small new cars,
medium-sized new cars,
large new cars,
special new cars.
•
This consumption segment identification should be done country-specifically in accordance with the national market circumstances.
•
Small new cars are predominantly used for short distance rides in urban areas, e.g.
for shopping, whereas large new cars may also be appropriate for long distances such
as vacation trips. Furthermore the consumption purpose to transport persons and luggage (e.g. family cars) also depends on the size of the car. A further segment of cars
serves individual consumption purposes; these are for example convertibles, off-road
vehicles and roadsters. For these cars a separate consumption segment should be
identified.
56
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 39
Proposal for the identification of consumption segments for new cars
“small new cars”
“medium-sized new cars”
New cars
“large new cars”
COICOP 07.1.1
Motor cars
“special new cars”
Used cars
•
The next step is to assign the universe of the new car models available in the country
to the different consumptions segments.
•
This should be done on the basis of primary models. A primary model is an umbrella
term for a model of a passenger car (e.g. VW Golf) whereas the term “sub-model” refers to a specific engine – and equipment version of the primary model (e.g. VW Golf
V 1.9 TDI Trendline).
•
The universe of primary models consists of approximately 300 models for a country.The
information which primary models are currently sold in the particular country can be
obtained from the national central car register.
•
The assignment of primary models to the mentioned consumption segments can be
conducted individually by each country and may to a certain degree depend on national circumstances (possibly, the consumption segment “large new cars” contains different primary models in different countries).
Federal Statistical Office, Statistics and Science, Vol. 13/2009
57
New cars
Chart 40
Example for the assignment of concrete models to the consumption segments
Central car register
Primary models
Small new cars
Medium-sized new cars
Large new cars
Special new cars
Registrations
per year
VW Golf
...
Opel Corsa
...
Mercedes-Benz SLK
...
Mazda 323
...
...
...
•
The central car register may already contain a variable which divides the primary models into different segments. In many cases this segmentation has only to be adjusted
to the segmentation of the four consumption segments. Altogether, the assignment
to segments is not very extensive. As a result all primary models that are sold in the
particular country are assigned to the consumption segments mentioned above.
•
The exclusion of business cars:
–
–
–
–
–
–
In correspondence to the HICP regulation framework, business cars shall not be
covered.
The practical feasibility depends on whether the national central car register contains a distinction between business cars and privately used cars.
The term “business cars” consists of two types: Company cars (that may also be
privately used) and cars of self-employed persons.
A complete exclusion of business cars could result in a declined quality of the car
sample since such cars are normally concentrated in specific consumption segments and specific brands.
The cars of self-employed persons are rarely used for business purposes only and
even for company cars, white collar workers may have to finance a significant part
of the car which legitimates a private usage of the company car.
In case the central car register contains a distinction between business car types,
only exclude company cars. If the central car register does not contain any distinction of business cars, it is suggested to keep the business cars in the sample.
Weighting of the consumption segments
•
58
It is suggested to weight the mentioned consumption segments explicitly according
to the expenditure shares.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
•
Generally, the expenditure shares will not be available but could be obtained by multiplying the total number of sales from the previous year by an average price for each
consumption segment.
•
The total number of sold cars can be derived from the information in the central car
register. This contains the number of initial registrations (from the previous year) per
primary model which can serve as a proxy for the sales frequencies of the primary
models.
•
An average price for the consumption segment could be calculated on the basis of the
sample of list prices, which are collected anyway. The collection of list prices will be
described in the next section.
2.2
•
Proposal for the construction of a target sample
It is suggested to first select primary models. After that, precise sub-models for each
primary model have to be selected which represent the corresponding primary model.
Subsequently, the list prices of these particular cars are observed.
Chart 41
Proposal for the construction of the target sample
primary models > sub-models > manufacturer’s list prices
(1) Select primary models by probability or cut-off sampling.
(2) Select sub-models by purposive sampling.
(3) Collect list prices for the chosen sub-models.
(1) Select primary models by probability or cut-off sampling
•
Two possible sampling methods can be applied: Probability sampling proportional to
size could, in the case of new cars, be applied without much effort because of the limited complexity of the universe of car models (only approximately 300 primary models). Another possibility would be cut-off sampling within the consumption segments.
The selection criterion for both sampling methods is the number of initial registrations
from the previous year for each particular primary model.
•
When applying probability sampling the central car register can serve as a sampling
frame. It is likely to be feasible to let the sampling of models be based on the number of initial registrations of car models in the previous year.
•
The suggested procedure is similar for cut-off sampling. Here, the models with the
highest number of registrations in the previous year according to the car register are
selected successively until a defined cumulative market share is reached. However,
there is a risk of bias that is caused by the exclusion of primary models in market
niches.
•
Therefore, probability sampling should be preferred to cut-off sampling because when
using probability sampling all primary models available have a chance of being in-
Federal Statistical Office, Statistics and Science, Vol. 13/2009
59
New cars
cluded in the sample whereas, when using cut-off sampling, it is accepted right from
the beginning that the sample does not mirror the entire market. In the case of cars
it cannot be excluded that a significant number of left-out models will have a diverging
price development (e.g. expensive luxury models, models that are important in a specific market niche, new models at the start of the life cycle or old models that will soon
disappear from the market).
•
The risk of bias in the case of cut-off sampling can be alleviated by means of stratification according to consumption segments. In this case however an adequate number
of consumption segments should be precisely defined so that they effectively represent the different market niches. A good starting point for this exercise would be the
already mentioned Euro-NACP classification plus an extra segment for luxury cars.
(2) Select sub-models by purposive sampling
•
After the selection of primary models, sub-models of the selected primary models
have to be chosen. For this sampling stage, purposive sampling is suggested.
•
The sub-models may be selected according to the following criteria:
–
–
–
The sub-model should be well sold (e.g. a well sold engine version such as VW
Golf 1.9 TDI).
The sub-model might be of the least extensive or a very popular equipment level
(e.g. the least extensive equipment level of Volkswagen is called “Trendline”).
The sub-model must not be a special edition (e.g. VW Golf equipment version
“Goal”).
•
The higher the number of registrations of a primary model, the higher the number of
price observations (i.e. implicit weighting). Each collected list price has to refer to
one precise sub-model.
•
If a high number of list prices is collected, which means that a corresponding high
number of sub-models has to be observed, it might be necessary to diverge from the
sub-model selection criteria mentioned above. Especially, the restriction on the equipment level could be relaxed.
•
For the selection of precise sub-models, consider a representative relation of diesel
and petrol engines. It may be determined in advance how many petrol engine and
diesel engine cars should be in the sample for each consumption segment.
(3) Collect list prices for the chosen sub-models
•
According to the HICP standards for cars and other vehicles, list prices can be used
as an approximation of real transaction prices in the case of new cars. The term “list
price” in the HICP standard can be interpreted in two ways: either as the recommended retail price of the manufacturer or as the offer price (asking price) of a certain retailer. For ease of expression it is here interpreted as the manufacturer’s recommended retail price.
•
This allows for a central price collection on the websites of the manufacturers. The
regional and outlet dimensions are then be disregarded.
60
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Minimum size of the target sample
•
The sample size should be established country-specifically.
•
A pragmatic proposal is to apply a minimum sample size of 100 observations monthly
for the index of new cars. That means, for four consumption segments a minimum of
25 price observations per consumption segment is sufficient to ensure representativity.
Weighting of the selected primary models
•
2.3
The primary models which are chosen to represent the consumption segment are
weighted implicitly according to the respective number of registrations (see section
(2) Select sub-models). That means the sample size of approximately 25 price observations is distributed over 25 sub-models which refer to primary models.
Proposal for the replacement strategy
•
The primary models which were selected for the target sample serve as a product specification for the replacement.
•
A replacement can be carried out by selecting a new sub model within the range of the
corresponding primary model (product specification) or the replacement can be selected by changing the primary model. The decisive criterion for the selection of a replacement is representativity. For practice in the following a distinction is made between replacements within the scope of the monthly price collection and a regular
re-sampling.
Replacements within the scope of the monthly price collection
•
Use the concrete primary model and the corresponding selected sub-models that
were in the sample in the previous month as a starting point for the price collection
of the current month in order to keep the share of matched models reasonably high.
However, if an observation has to be replaced the main criterion for the selection of
the successor model should be representativity.
•
If the sub-model dramatically loses its market share within the primary model or if the
primary model dramatically loses its market share within the consumption segment,
it has to be replaced by a more representative sub- or primary model that is most similar to the old one (non-forced replacement). In the case of the non-forced replacement
of a primary model its successor can belong to the same or to an alternative brand.
If the successor belongs to an alternative brand, it should belong to the same “precisely defined” consumption segment (e.g. same Euro-NCAP class). For the newly
chosen primary model precise sub-models and their corresponding list prices have
to be assigned again. This sub-model selection should correspond to the procedure
as mentioned in the preceding section.
•
A forced replacement occurs if the sub-model or primary model disappears. In this
case the same rules apply as in the case of non-forced replacements.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
61
New cars
Periodical re-sampling
•
The periodical re-sampling should be conducted regularly (e.g. yearly). This periodical re-sampling depends on the publication of the statistics of the initial registrations
(e.g. from the previous year) by the central car register.
•
Repeat the pre-selection process as described for the compilation of the target sample. Corresponding to the sales figures (or the number of initial registrations) the preselection of primary models has to be renewed regularly. This way current trends in
characteristics, e.g. towards larger size of the cars within the consumption segments
are included.
•
Within the scope of the re-sampling the implicit weights of the selected primary models are renewed. Therefore, the number of selected sub-models and the corresponding number of price observations for each primary model can change.
2.4
Proposal for the quality adjustment
•
In the case of new cars, minor changes constitute changes in the equipment of an
essentially equivalent model which can be adjusted explicitly. Major changes in the
quality of new cars are too complex for an explicit quality adjustment. In these cases
an implicit method is applied.
•
Minor changes in quality can occur in three situations:
–
–
–
•
The equipment of a precise sub-model changes.
The replacement of a precise sub-model within the range of the same primary
model only contains changes in the equipment.
The succession of an old by a new version of a primary model is judged to be essentially equivalent (expert knowledge necessary). That means the only difference
between the old and the new version of the same primary model is equipment
features. A hint for the essential equivalence of an old and a new version can be
that the old version had a rather short life span and the follow-up model looks
rather similar.
Major changes in quality occur in three situations:
–
–
–
The primary model is replaced within the scope of monthly price collection.
The succession of an old by a new version of a primary model contains too many
complex technological modifications.
The replacement of a precise sub-model within the range of the same primary
model contains too many complex technological modifications.
Quality adjustment in the case of minor changes: Option pricing
•
Option pricing should be applied whenever it is possible.
•
Minor quality changes could for example be changes in the extension of the standard
equipment package, e.g. the inclusion of:
–
–
–
62
automatic bi-zone air-conditioning,
parking sensor,
...
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
•
For the application of option pricing it is important that the considered equipment
items were already available as options prior to the inclusion into the standard equipment package. A description of the calculation of a quality adjusted price when applying option pricing can be found in the first chapter of the handbook. The price of the
new model is adjusted for 50 % of the value of the changed equipment that was observable as an option in the previous period. The 50 % rule is confirmed by the HICP
standards on cars.
•
It is possible to adjust the price of the replaced model as well as to adjust the price
of the replacement model with regards to the same option prices.
•
Option pricing is applied in the case of the extension as well as in the case of a reduction of the equipment level.
•
There are two cases where option pricing should be applied even though no particular option price is available. In these cases indirect option prices are calculated.
–
–
A change in the fuel consumption of a car.
A change in the engine power of a car.
•
If the presumed fuel consumption of a car changes this could be adjusted by applying
supported expert judgement. This expert judgement calculates an indirect option price
for a hypothetical amount of mileage. It is suggested to calculate the value of changing
fuel consumption for an assumed mileage of 75,000 kilometres. (The car is presumed
to be driven 75,000 kilometres in the following 5 years). That means to multiply the
difference in fuel consumption with 75,000 kilometres and the current fuel price. This
indirect option price should be adjusted for 100 %. Applying this indirect option price
aims at adjusting for changing running expenses from the consumer’s perception. It
is assumed that the consumer evaluates the value of changed fuel consumption for
the following five years on the basis of the recent purchase price of petrol or diesel.
Furthermore it is assumed that the average consumer might drive 15,000kilometres
per year. NSIs may establish conventions for the assumptions involved.
•
If the engine power of a car changes this should be adjusted by applying supported
expert judgement calculating an indirect option price. This indirect option price should
be based on the former price range of sub-models that only differ with regard to engine power. Suppose a range of three sub-models with an engine power of 80, 100 and
120 hp and that their list price is 12,000, 13,500 and 15,000 €. The new cheapest
sub-model has 90 hp and costs 12,500 €. In this case the indirect option price would
be based on the difference between the list prices of the former sub-models with 80
and 100 hp. A complication occurs when differences in engine power are coupled with
differences in other relevant technical characteristics (e.g. better brakes) or enhanced
standard equipment (e.g. ESP is standard in case of the higher engine versions). The
price differences between different sub-models with different engine versions can
serve as a basis for option pricing if such combinations are taken into account. For example if the sub-model with the higher engine power has some extra equipment standard that costs 1,000 €, then this 1,000 € should be subtracted from the price difference between the two sub-models.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
63
New cars
Discussion box
It has to be considered that the application of an indirect option price for engine power is a completely new topic in the field of quality adjustment for new
cars. The proposal is based on a convention of the CENEX consortium and was
in the focus of discussion. Nevertheless, this proposal should be subject of practical advanced empirical investigations in future.
•
The option prices should ideally stem from the particular manufacturer for the particular primary model (from the previous period). If such an option price is not available
option price lists of alternative brands for similar primary models from the same consumption segments could also be used.
Discussion box
The price range of identical options over primary models of different brands may
differ to a large extent although the respective cars refer to the same consumption segment. In this case it could be more appropriate to use option prices from
a different primary model of the same brand, even if the model refers to a different consumption segment.
Quality adjustment in the case of major changes: Bridged overlap
•
In the case of major changes in quality, no explicit quality adjustment is possible since
the replaced and the replacement product are in principle not comparable at all.
•
If major quality changes occur, bridged overlap shall be applied. The price development of all other models of the same consumption segment which were not replaced
build the bridge between the replaced and the replacement model. In this way the average price development of a set of comparable models is used.
64
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Discussion box
It might be argued that the assumptions concerning the application of bridged
overlap (see Chapter One) might not be entirely met in the case of new cars.
Especially, the assumption that manufacturers and sellers react to the introduction of a new model in the same period might be questionable. Nevertheless, the
HICP standards on cars and other vehicles classify the application of bridged
overlap for fundamental changes in quality (major changes) as a B method.
As a bridge the price development of all other models in the same consumption segments is used. To apply an alternative bridge as the price development
of models from the same manufacturer is impractical due to the sampling strategy (since it is not secured that other models of the same manufacturer are in
the sample).
Explicit quality adjustment methods as option pricing or hedonic methods may
fail in the case of fundamental changes between car models which contain too
complex technical developments.
In summary, in the case of major changes there might be no alternative to bridge
overlap due to practical reasons. This decision was made by convention due to
practical reasons since fundamental changes between car models might be too
complex for explicit quality adjustment.
The appropriate number of consumption segments in the case of cars depends
on the use of such segments. In the case of cut-off sampling, combined with
stratification according to consumption segments, the appropriate number would
be about 10. The same applies in the case of non-forced replacements where the
successor belongs to a different brand.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
65
New cars
Chart 42
Regular re-sampling
Re-sampling
È
Primary model, e.g. VW Golf
M
M
“Medium-sized new cars”
Consumption segment
M
M
M
M
M
M
Primary model, e.g. Opel Astra
M
M
M
M
M
M
M
M
Primary model, e.g. MB A-class
M
M
M
Primary model is no
longer representative
M
Change to a different primary model:
Bridged overlap
2.5
Cost estimation
One-time effort for the introduction of option pricing for new cars
•
The start-up costs stem from the creation a new car index system. This involves:
(a) the setting up of a sample structure,
(b) the selection of the data sources and the creation of a documentation system,
(c) the device of new QA-procedures – option pricing in the case of new cars.
•
The setting up of a sample structure involves a thorough investigation of the sample
frame, the creation of the sample structure itself and the identification of market segments for all car models. This work can easily involve 15 full working days (mostly of
an academic level except for the identification of the market segments).
•
The selection of the data sources, the creation of the documentation system and the
device of the new QA procedures can be performed by the product expert. All this
work should however be tightly supervised – especially the device of the new QAsystem.
•
The work on parts (a) and (b) amount to approximately 15 full working days.
66
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Current effort of price collection and index calculation
•
The monthly operational costs involve approximately 5 full working days for the product expert.
Effort of periodical re-sampling
•
The cost of the yearly re-sampling will depend on the thoroughness of the operation.
In Belgium, the sample structure is totally overhauled every two years. This process
involves up to 10 full working days. In the years in between the sample structure gets
a less complete update (the weights and number of observations for each consumption segment will remain constant). A semi-update of the sample structure will cost
approximately 5 full working days.
2.6
Examples
Example: Generation of consumption segments
•
In the first step, the new car market has to be divided into different consumption segments according to the size of the car. The pragmatic proposal is to define four consumption segments:
–
–
–
–
•
small new cars,
medium-sized new cars,
large new cars,
special new cars.
The universe of primary models consists of approximately 300 cars in each country.
Each of these primary models has to be assigned to the identified consumption segments. The distinguishing criterion is the size of the car.
–
The data on the particular primary models that are registered in the country can
be obtained from the national central car register.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
67
New cars
Chart 43
Example for the assignment of concrete models to the consumption segments
Central car register
Registrations
per year
Primary models
Small new cars
VW Golf
...
Opel Corsa
...
Mercedes Benz SLK
...
Mazda 323
...
...
...
Medium-sized new cars
Large new cars
Special new cars
•
As a result, each consumption segment consists of precise primary models.
Chart 44
Consumption segments and corresponding primary models
Small new cars
Primary
model
Initial
registrations
Opel
Corsa
26,082
VW
Polo
16,332
Renault
Twingo
11,094
...
sum
...
192,684
Consumption segments
Medium-sized
Large new cars
new cars
Primary
Initial
Primary
Initial
model
registrations
model
registrations
Mercedes
VW
73,692
Benz
18,631
Golf
C class
BMW
Opel
16,247
36,889
5 series
Astra
Mercedes
Benz
34,851
Audi A 6
12,937
C class
...
...
...
...
sum
440,878
sum
73,504
Special new cars
Primary
model
Opel
Zafira
Initial
registrations
14,620
VW
Caravelle
8,305
VW
Sharan
7,162
...
sum
...
124,923
•
The consumption segments have to be weighted according to their expenditure share.
•
To obtain the total expenditure of a consumption segment, multiply the total number
of initial registrations from the previous year for the particular consumption segment
by its average price.
– The total number of initial registrations for each consumption segment can be taken from the central car register. The average price for a consumption segment can
be calculated on the basis of the sample of list prices for the selected sub-models (see next section).
68
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 45
Calculation of expenditure weights for the consumption segments
Weighting the consumption segments
Small
new cars
Medium-sized
new cars
Large
new cars
Special
new cars
Total number
of initial
registrations
Average
price
(in €)
Total number
of initial
registrations
Average
price
(in €)
Total number
of initial
registrations
Average
price
(in €)
Total number
of initial
registrations
Average
price
(in €)
192,684
15,000
440,878
20,000
73,504
30,000
124,923
25,000
Total expenditure (in billion €):
2.89
8.81
2.21
3.12
Total expenditure of all consumption segments: 17.03 billion €
Expenditure share (in %):
17.0
51.8
12.9
18.3
Example: Construction of a target sample
•
Select primary models by probability sampling (cut-off sampling also possible):
–
–
The primary models are sampled in accordance to their market shares within the
consumption segment.
An example of sampled primary models within the consumption segment “medium-sized new cars” could look like this:
Chart 46
Exemplary target sample for the consumption segment “medium-sized new cars”
Target sample for “medium-sized new cars”
Initial registrations
(from the previous year)
Market share
(in %)
VW Golf
73,692
16.7
Opel Astra
36,889
8.4
Mercedes-Benz C-class
34,851
7.9
BMW 3 series
33,548
7.6
Ford Focus
32,264
7.3
Audi A4
30,123
6.8
Audi A3
26,007
5.8
Primary model
–
The selected primary models represented in the table make up 60.5 % of the market for “medium-sized new cars”.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
69
New cars
•
Select sub-models by purposive sampling:
–
–
The selected primary models have to be implicitly weighted according to the relation of their sales figures. Since in our proposal the decision was made to collect 25 price observations per consumption segment, we have to determine how
many price observations to collect for each primary model.
The distribution of the price observations might look like this (implicit weighting):
Chart 47
Number of observations to be included
Distribution of price observations
Primary model
Share in the
sample (ratio)
(in %)
Market share
(in %)
Number of price
observations
16.7
Æ
27.6
7
Opel Astra
8.4
Æ
13.9
4
Mercedes-Benz C class
7.9
Æ
13.1
3
BMW 3 series
7.6
Æ
12.6
3
Ford Focus
7.3
Æ
12.1
3
Audi A4
6.8
Æ
11.2
3
Audi A3
5.8
Æ
9.6
2
60.5
Æ
VW Golf
Total
–
–
–
70
100
25
In this example seven sub-models for the primary model VW Golf have to be selected in order to reflect the weight of its market share within the consumption
segment “medium-sized new cars”.
The sub-model selection has to be conducted by purposive sampling according
to the following criteria:
– Should be a well sold sub-model within the range of the corresponding primary model.
– Should be of the least extensive equipment level.
– Must not be a special edition.
The following chosen sub-models meet these criteria. Additionally, the approximate ratio of diesel and petrol engines is considered.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 48
Precisely selected sub-models for the primary model “VW Golf”
Sub-model selection for VW Golf
Observation
•
Sub-model
Manufacturers’
list price
(in €)
1
Golf V 1.4 Trendline
16,300
2
Golf V 1.6 Trendline
17,600
3
Golf V 1.4 TSI Trendline
18,975
4
Golf V 1.9 TDi Trendline
19,150
5
Golf V 2.0 TDi Trendline
22,000
6
Golf V 1.4 Comfortline
17,800
7
Golf V 1.9 TDi Comfortline
20,650
Collect list prices for the chosen sub-models:
–
The list prices for the corresponding sub-models are collected on the national website of the manufacturer.
Example: Replacement strategy
•
Replacements within the scope of the monthly price collection:
–
•
Representativity check for the periodical re-sampling
–
–
•
The price collector (product expert) should assess whether replacements are called
for. Therefore it should be checked:
– Whether the sub-model is still available. Even if the sub-model can still be
found on the manufacturer’s website it might in fact not be available anymore because maybe the manufacturer’s website is not updated.
– Whether the introduction of a new sub-model within the range of the corresponding primary model results in declining representativity of the formerly
selected sub-models.
– Whether the considered primary model is followed by a new version: The follow-up model for VW Golf IV is the VW Golf V.
The target sample should be updated regularly, maybe yearly, depending on the
publication of the statistics of the initial registrations by the central car register.
The procedure is the same as described in section 2.2.
The following table (see chart 49) shows the replacement of the sub-model 1.4 TSI
Trendline by 1.4 TSI Comfortline.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
71
New cars
Chart 49
Exemplary replacement situation for concrete sub-models
Exemplary replacement situation
Selected sub-models for
VW Golf V in period 1
Price in period 1
(in €)
Selected sub-models for Price in period 2
VW Golf V in period 2
(in €)
Golf V 1.4 Trendline
16,300
Æ
Golf V 1.4 Trendline
16,300
Golf V 1.6 Trendline
17,600
Æ
Golf V 1.6 Trendline
17,600
Golf V 1.4 TSI Trendline
18,975
Æ
Golf V 1.4 TSI Comfortline
20,475
Golf V 1.9 TDi Trendline
19,150
Æ
Golf V 1.9 TDi Trendline
19,150
Golf V 2.0 TDi Trendline
22,000
Æ
Golf V 2.0 TDi Trendline
22,000
Golf V 1.4 Comfortline
17,800
Æ
Golf V 1.4 Comfortline
17,800
Golf V 1.9 TDi Comfortline
20,650
Æ
Golf V 1.9 TDi Comfortline
20,650
Example: Quality adjustment
•
Minor changes in quality can occur in three situations:
–
–
–
•
Major changes in quality can occur in three situations:
–
–
–
72
The basic equipment of the sub-model VW Golf V 1.9 TDi Trendline is for instance
extended by a parking sensor.
The sub-model VW Golf 1.4 TSI Trendline is replaced by the VW Golf 1.4 TSI
Comfortline. Caution, the extension of the equipment version is considered to include additional equipment features rather than complex technological modifications.
The succession of an old by a new version of the primary model is judged to be
a minor change that means it is considered to include changed equipment features rather than complex technological modifications. E.g. the planned succession of the Peugeot 307 by the Peugeot 308. The Peugeot 307 has had a rather
short life span and the follow-up version hardly contains technological developments and even looks very similar.
The primary model is replaced due to the representativity check within the scope
of monthly price collection. For example, when the production of the VW Lupo was
stopped in 2005 it had to be replaced by another representative primary model.
The succession of an old by a new version of a primary model contains many technological modifications. For example, the primary model VW Golf IV was followed
by the VW Golf V including technical progress in engine, gearbox and chassis
suspension.
The replacement of a precise sub-model within the range of the same primary
model: This should be an exemption in case the replacement sub-model contains complex technological modifications and no corresponding option prices
exist. E.g. The replacement of a VW Golf V 1.9 TDi Trendline by a VW Golf V GTI
2,0 FSI.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
•
Option pricing for minor changes:
–
–
–
Option pricing matches the price of the new observation and the price of the old
observation by adjusting for 50 % of the value for all equipment features that differ between the replaced and the replacement product.
In the example below (see chart 50), a VW Golf V 1.4 TSI Trendline is replaced by
the same model but with the equipment level “Comfortline”. These equipment levels differ for instance in three features: an air conditioning, light-alloy wheels and
an airbag system.
It is possible to adjust the price of the replaced model as well as to adjust the
price of the replacement model with regards to the same option prices. In the example provided below the price of the replacement model is adjusted. The price
of the new car is adjusted for 50 % of the value of these changed features.
Chart 50
Option pricing for minor changes
Option pricing for minor changes
Sub-model in period 1
Price
in period 1
(in €)
Golf V 1.4 TSI Trendline
18,975
Price
in period 2
(in €)
Sub-model
in period 2
Æ
Golf V 1.4 TSI Comfortline 5)
Air conditioning
20,475
1,610 €
– 805
340 €
– 170
800 €
– 400
Light-alloy
wheels
Head-airbag
system
Adjusted price
•
19,100
Option pricing for fuel efficiency:
–
–
The information on fuel efficiency can be gained from manufacturer information,
which is established in accordance with regulation 1999/94/EG. The fuel efficiency has to be measured by the fuel consumption in the case of “combined drive
cycle” (= mix of urban and extra urban drive cycle).
To calculate an indirect option price for the fuel consumption one has to presume
a precise amount of mileage for which the value of fuel consumption can be calculated. This can be illustrated by the following example:
5) Fictitious example.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
73
New cars
–
–
74
Sub-model A is replaced by the similar sub-model B. The fuel efficiency has
enhanced because the fuel consumption in the case of “combined drive cycle”
has diminished from 8.0 to 7.5 l/100 km. Suppose that the actual fuel price
is 1.50 € (price of diesel). Then the option price for enhanced fuel efficiency
can be calculated as:
(8.0 l – 7.5 l) / 100 km) · 75,000 km · 1.50 €/l = 562.50 €.
In the case of fuel consumption the indirect option price equals 100 % of the
value of the difference in fuel consumption of the presumed mileage.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
3
Used cars
•
In the first step, the COICOP four-digit-level: 07.1.1 Motor cars (containing motor cars,
passenger vans, station wagons, estate cars and the like with either two-wheel drive
or four-wheel drive) is divided into new cars and used cars. This section only deals
with used cars.
•
The Task Force on Quality Adjustment and Sampling did not establish an explicit classification of A, B, C-methods for used cars.
•
The recommended quality adjustment methods for used cars depend on the available
data source. For data from internet platforms or car magazines it is recommended to
apply the regression approach or supported expert judgement and in the case of data
from a market research institute a combination of direct comparison and bridged overlap should be applied.
•
With regard to the data collected on internet platforms and car magazines, the empirical studies of the CENEX members showed that hedonic re-pricing is more suitable
than supported expert judgement. The empirical studies showed that the hedonic repricing method yields more reliable results with less effort.
•
The following section deals with the application of a combination of direct comparison and bridged overlap for market research data and with supported expert judgement in the case of data from internet platforms and car magazines.
3.1
Proposal for the identification of consumption segments
•
The identification of appropriate consumption segments for used cars equals the one
for new cars (see section 2.1).
•
Therefore, the pragmatic proposal is to divide the used car market into four consumption segments according to the size of the car:
Chart 51
Proposal for the identification of consumption segments for used cars
New cars
“small used cars”
COICOP 07.1.1
Motor cars
“medium-sized used cars”
Used cars
“large used cars”
“special used cars”
Federal Statistical Office, Statistics and Science, Vol. 13/2009
75
Used cars
•
The next step is to assign the universe of the used car models registered in the country
to the different consumptions segments.
•
This should be done on the basis of primary models. A primary model is an umbrella
term for a model of a passenger car (e.g. VW Golf) whereas the term “sub-model” refers to a specific engine – and equipment version of the primary model (e.g. VW Golf
V 1.9 TDI Trendline).
•
The universe of primary models within the used car market might be bigger than the
new car universe (consisting of approximately 300 models for a country). The larger
universe is due to the fact that the used car universe goes back in time and therefore also contains primary models that have been sold years ago. The information
which primary models are currently registered in the particular country can be obtained from the national central car register or from a national car seller’s association.
•
The assignment of primary models to the mentioned consumption segments can be
conducted individually by each country and may to a certain degree depend on national circumstances (possibly, the consumption segment “large used cars” contains different primary models in different countries).
Chart 52
Example for the assignment of concrete models to the consumption segments
Central car register
Primary models
Small used cars
Medium-sized used cars
Large used cars
Special used cars
•
76
Registrations
per year
VW Golf
...
Opel Corsa
...
Mercedes-Benz SLK
...
Mazda 323
...
...
...
The central car register (or the national car seller’s association) usually already contains a variable which divides the primary models into different segments. In many
cases this segmentation has only to be adjusted to the segmentation of the four consumption segments. Altogether, the assignment to segments is not very extensive. As
a result all primary models that are registered in the particular country are assigned
to the consumption segments.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Weighting of the consumption segments
•
The COICOP four-digit-level 07.1.1 Motor cars is divided into new cars and used cars
with individual weights. The weight for used cars in general has to conform to the netprinciple of the HICP. This net concept takes into account the result of all interaction
in purchases between the business sector and the private sector. This means that the
weight shall reflect all purchases from business to private households minus the purchases from private households to the business sector – but not transactions between private households.
•
As the weight for used cars (on the COICOP five-digit-level) is established, the weight
may be split over the consumption segments. It is suggested to weight the mentioned
consumption segments explicitly according to the expenditure shares.
•
Generally, the expenditure shares will not be available but could be obtained by multiplying the number of ownership changes from the previous year by an average price
for the cars of the respective consumption segment.
•
The total number of ownership changes for each consumption segment can be obtained from the central car register and should refer to the previous year. In order to
ensure the net concept of the HICP, it is important to only consider sales from the
business sector to private households. If the central car register contains the information whether the car was registered by a private person or by business cooperation
prior to the transaction, only ownership changes from business retailers have to be
considered. If the information on the ownership status prior to the transaction is not
contained in the central car register, only those ownership changes with a temporary
vehicle deregistration prior to the transaction should be considered assuming that
only business sales are temporarily deregistered.
•
An average price for the consumption segment could be calculated on the basis of the
sample of list prices, which are collected anyway. The collection of list prices will be
described in the next section.
3.2
•
Proposal for the construction of a target sample
It is suggested (for each consumption segment) to first stratify the universe of used
cars according to age classes. The second step is to select relevant primary models
within these age classes for price observation with the help of information on the market shares from the central car register. The next step is to assign precise sub-models
to each primary model. For the price collection, the regional dimension should only
be considered roughly. The outlet dimension can be neglected.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
77
Used cars
Chart 53
Proposal for the construction of a target sample
Stratification according to age classes > primary models > sub-models > offer prices
(1)
(2)
(3)
(4)
Stratification by age classes.
Selection of relevant primary models.
Assignment of precise sub-models.
Collection of offer prices.
(1) Stratification by age classes
•
The consumption segments should be stratified according to country-specific age
classes following “peaks” in the amount of ownership changes in the national used
car market. This information can be obtained from the central car register.
•
For practical reasons, there should be about three age classes.
•
In many European markets these peaks will occur for up to two-year-old cars (which
are often formerly leased or rental cars). An age class of ten years and older should
be avoided.
•
As a pragmatic proposal the used car market could be stratified according to the following age classes:
– ~ 2 years old cars,
– ~ 3 years old cars,
– ~ 4 years old cars.
Chart 54
Visualisation of the stratified used car market
Used cars
CS
“small used
cars”
CS
“medium-sized
used cars”
CS
“large used
cars”
CS
“special used
cars”
• ~ 2 years
• ~ 3 years
• ~ 4 years
• ~ 2 years
• ~ 3 years
• ~ 4 years
• ~ 2 years
• ~ 3 years
• ~ 4 years
• ~ 2 years
• ~ 3 years
• ~ 4 years
(2) Selection of relevant primary models
•
78
Two possible sampling methods can be applied: The first is to apply cut-off sampling
within the consumption segments and age classes. If a small number of models represents a large share of the market this sampling method works well. If, on the other
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
hand, the market shares are well spread among primary models, probability sampling
proportional to size could be a better choice. In general, probability sampling in the
case of cars can be applied without much effort because of the limited complexity of
the universe of car models (only approximately 300 primary models). The selection
criterion for both sampling methods is the number of ownership changes (of the particular primary model) for the considered age class from the previous year. As there
is data on model-specific expenditure shares available, those should be applied as
the selection criterion.
•
When applying probability sampling the central car register can serve as a sampling
frame. The sampling of models may be based on the number of ownership changes of
car models from the previous year. Please ensure that only ownership changes from
business retailers to private households or alternatively with a deregistration prior to
the transaction, are considered.
•
The procedure is quite similar for cut-off sampling. Here, the models with the highest
number of ownership changes in the age class are selected successively until a defined cumulative share of ownership changes is reached.
(3) Assignment of precise sub-models
•
After the selection of primary models, sub-models of the selected primary models
have to be chosen. For this selection process, purposive sampling is suggested.
•
The sub-models may be selected according to the following criteria:
– The sub-model should be well sold (e.g. a well sold engine power alternative such
as VW Golf 1.9 TDI).
– The sub-model should be of the least extensive equipment level (e.g. the least
extensive equipment level of Volkswagen is called “Trendline”).
– The sub-model must not be a special edition (e.g. VW Golf equipment version
“Goal”).
•
The higher the number of ownership changes of a primary model, the higher is the
number of price observations (i.e. implicit weighting). Each collected offer price has
to refer to one precise sub-model.
•
If a high number of offer prices is collected, which means that a corresponding high
number of sub-models has to be observed, it might be necessary to diverge from the
sub-model selection criteria mentioned above. Especially, the restriction on the equipment level could be relaxed.
•
For the selection of precise sub-models, consider a representative relation of diesel
and petrol engines. It should therefore be determined in advance how many petrol engine and diesel engine cars should be in the sample for each consumption segment.
•
Select sub-models that have similar engine power. That means no engine versions
with very high or very low engine performance should be chosen.
(4) Collection of offer prices
•
According to the HICP standards for cars and other vehicles, list prices can be used
as an approximation of real transaction prices. For used cars, the term “list price” in
Federal Statistical Office, Statistics and Science, Vol. 13/2009
79
Used cars
the HICP standards can be interpreted as an offer price since it is often not possible
to gain information on real transaction prices in the scope of price collection.
•
The regional dimension should only be considered roughly since it is not assumed
to be a major issue.
•
This allows for a central price collection.
•
In general, there are three approaches for the price observation:
–
–
–
Observation of transaction prices from a market research institute.
Observation of offer prices from a market research institute.
Observation of offer prices from internet platforms or car magazines.
•
The usage of data from a market research institute may often be preferable if available. Generally, market research institutes have available many price observations
which they can use as a basis for the calculation of specific used car values. For this
reason, the exact used car price for a certain sub-model with exactly presumed age
in months and mileage can be observed.
•
The first choice is to use data from a market research institute that provides transaction prices if available. In many European countries, transaction prices are provided
for example by DAT.
•
The second choice is to use data from a market research institute that provides offer
prices. In many European countries data is available for example from Eurotax
Schwacke.
•
If data provided by a market research institute are not available for the national used
car market, an own survey on internet platforms, such as www.autoscout24.xx, and
car magazines has to be performed. Keep in mind to collect only offer prices from business retailers.
– In the case of price collection on internet platforms and car magazines, the following issues are of importance:
– It is most important to refer to the same sub-model over time.
– The age in months should be held constant as far as possible. In practice,
there will be variations in the precise age which have to be adjusted in the following (see section 3.4 “Proposal for the quality adjustment”).
– The mileage should be considered roughly. That means the mileage should be
of the same dimension from period to period. The variations in mileage also
have to be adjusted in the following.
– It is very important not to consider product-offers from private persons, cars
that have been involved in an accident and cars with a lot of obvious extra
equipment or modifications.
– If there is a large range of product-offers that fit all the mentioned criteria,
any concrete product-offer among the price variance may be selected.
– For used cars, the characteristics which are relevant for the price and thus have
to be collected are: Brand, primary model, sub-model, age and mileage.
– An appropriate price collection form should precisely ask for the considered characteristics and might look like this:
80
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 55
Example of a price collection form
Used car: Consumption segment “medium-sized cars” and age class ~ 3 years
Brand/Make:
Primary model:
Sub-model (with equipment level):
Price (€):
Date of initial registration [year and month]:
Mileage (km):
Retailer: ABC retailer
Date: 2008/07/20
Volkswagen
Golf V
1.9 TDi Trendline
13,450
2005/05
42,400
Minimum size of the target sample
•
The sample size should be established country-specifically since the number of observations needed depends on the number of generated consumption segments and age
classes.
•
A pragmatic proposal may be a minimum bound of normally 100 product-offers monthly
for the sample of used cars where no more specific indications are present.
•
That means, for four consumption segments a minimum of 25 to 30 price observations
are spread over three age classes resulting in a sample size of approximately 8 price
observations per age class within the consumption segment.
Chart 56
Distribution of price observations
Used cars
~ 100 observations
CS
“small used
cars”
CS
“medium-sized
used cars”
CS
“large used
cars”
CS
“special used
cars”
25 – 30 observations per consumption segment
•
•
•
~ 2 years
~ 3 years
~ 4 years
•
•
•
~ 2 years
~ 3 years
~ 4 years
•
•
•
~ 2 years
~ 3 years
~ 4 years
•
•
•
~ 2 years
~ 3 years
~ 4 years
Approx. 8 observations per age class and consumption segment
Federal Statistical Office, Statistics and Science, Vol. 13/2009
81
Used cars
Weighting of the selected primary models
•
The primary models which are chosen to represent the age class in the consumption
segment are weighted implicitly according to the respective market share (see subsection (3) Assignment of precise sub-models). The criterion for implicit weighting is
the share of ownership changes of the respective primary model within the age class
and consumption segment.
Alternative proposal for the construction of a target sample
•
As the procedure described above may cause some work load and costs it could be
useful to apply a rather simple approach for the target sample: Use the same target
sample for used cars as you have used for new cars, for example from two years ago.
•
This simple approach might not be recommendable for every country. It is important
that the structure of the new car market and the used car market is rather similar. That
means, the same primary models which are sold as new cars are also sold as used
cars two years later. This assumption could be violated when the used cars are imported to a large extent prior to their purchase.
•
If the new car sample from two years ago can be applied, this means there is only
one age class: 2-year-old cars. The price development of this age class represents the
whole range of ages in the used car market. The relevant primary models have been
selected with the help of information on the initial registrations from the central car
register. The corresponding sub-models that have been assigned to the primary models could be the same as for new cars again. The price collection would be the same as
described above, the regional dimension should only be considered roughly and the
outlet dimension can be neglected.
•
So, the construction of the target sample consists of only two steps:
Chart 57
Proposal for the construction of a target sample
New car sample from two years ago > offer prices
(1) Use the new car sample from two years ago
(2) Collection of offer prices
•
Regarding the minimum size of the target sample, the sample size corresponds to the
sample size of new cars two years ago. A sample size of 100 observations monthly
should be ensured. That means, for four consumption segments a minimum of 25 price
observations per consumption segment is sufficient to ensure representativity.
•
The weights of the primary models should correspond to the weights that have been
applied in the new car sample two years ago. That means the sample size of approximately 25 price observations is distributed over 25 sub-models which refer to primary
models (implicit weighting).
82
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
3.3
Proposal for the replacement strategy
•
A peculiarity of the used car market is that no observation can be assumed to be identical over time since the characteristics “age” and “mileage” change continuously.
Thus, no used car can be considered to be “identical” over periods. This means, in every period each observed used car has to be replaced minimising differences in the
particular model (as an approximation for similar equipment), precise age and mileage.
Note that this peculiarity is solved in the case that data from a market research institute are used where the characteristics of the specific cars are kept fix. In this case one
is comparing an average price of sold cars with the same characteristics in consecutive months.
•
The primary models within the age classes which were selected for the target sample
serve as a product specification for the replacement. To keep the sample fix means
that the replacement model is characterised by the same primary model and exactly
the same sub-model with identical age (in months) and mileage. To hold the precise
age and mileage constant will only be possible if market research data are used, otherwise the observed sub-models will vary in these characteristics.
•
A replacement can be carried out by selecting:
– the same sub-model (with small variations in age and mileage in case of data collected on internet platforms or in car magazines)
– a new sub-model within the range of the corresponding primary model (with variations in age and mileage in case of data collected on internet platforms or in car
magazines).
– or the replacement can be selected by changing the primary model.
•
In the following a distinction is made for practical reasons between a representativity
check within the scope of the monthly price collection and a regular representativity
check in the office.
Representativity check within the scope of the monthly price collection
•
In the case of used cars it is suggested to evaluate the sample based on yearly figures.
This is due to practical reasons like the issue of seasonality and the availability of
monthly data from the central car register.
•
Use the concrete primary model and the corresponding selected sub-models that
were in the sample in the previous month as a starting point for the price collection
of the current month in order to keep the share of matched models reasonably high.
•
If the sub-model of the previous month is no longer available it should be replaced.
The replacement sub-model has to correspond to the same primary model maintaining
the basic characteristics so that the effect on the sample characteristics will be minimal. The new sub-model would then be
– of the same primary model,
– of the same age class,
– of the same mileage dimension.
•
If the primary model significantly loses its market share within the consumption segment and age class, maybe due to a former production stop, the primary model has
Federal Statistical Office, Statistics and Science, Vol. 13/2009
83
Used cars
to be replaced by a more representative primary model that is most similar to the old
one. To give an example, the production of the VW Lupo was stopped (therefore this
primary model became unrepresentative) and it was replaced by the VW Fox. This
replacement primary model could be a different primary model (with similar size) of
the same or an alternative brand. For the newly chosen primary model precise submodels in the corresponding age class have to be assigned again as described in the
preceding section.
Representativity check in the office for the periodical re-sampling
•
Additionally to the representativity checks conducted within the scope of price collection a representativity check should be conducted regularly (e.g. yearly). This periodical renewal depends on the publication of the statistics of the ownership changes
(e.g. from the previous year) by the central car register.
•
Check whether the pre-selected primary models are still representative of the market
in the particular consumption segments and age classes. Therefore, the number of
ownership changes has to be observed regularly. This regular observation can also
be used to determine general changes in the market, e.g. new primary models, new
sub-models, new brands.
•
If the selected primary models are still representative, continue the observation of the
corresponding sub-models.
•
If not, repeat the pre-selection process as described for the compilation of the target
sample. Corresponding to the sales figures (number of ownership changes) the preselection of primary models has to be renewed regularly (yearly). If the formerly preselected primary models are no longer representative they will be replaced by more
representative primary models.
3.4
Proposal for the quality adjustment
•
A distinction of minor and major changes helps to decide whether an explicit or an implicit quality adjustment is necessary. Both, major and minor changes require a quality
adjustment. In general, major changes in the quality of used cars are too complex for
an explicit quality adjustment whereas minor changes can be adjusted explicitly.
•
Minor changes in quality of used cars can only occur in case of observing the same
sub-model over time:
– changes in the equipment of a precise sub-model,
– the age of the car [in months] differs,
– the mileage of the car differs.
•
Major changes in the quality of used cars are:
– the change of the primary model,
– the introduction of a new generation of the primary model (the old version of the
primary model is followed by a new version),
– the change of the sub-model.
84
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Quality adjustment in the case of major changes: Bridged overlap
•
In the case of major changes in quality, no explicit quality adjustment is possible since
the replaced and the replacement product are in principle not comparable at all.
•
If major quality changes occur, bridged overlap shall be applied. The price development of all other models of the same consumption segment and age class which were
not replaced build the bridge between the replaced and the replacement model. In
this way the average price development of a set of comparable models is used.
Quality adjustment in the case of minor changes: Supported expert judgement
•
Minor quality changes in the equipment of a precise sub-model can not be adjusted
explicitly since the value of a singular equipment feature which is a component of
the used car is unknown. The depreciation rate of equipment features in a used car
is different from the depreciation rate of the whole car itself. Therefore it is not
possible to adjust for changed equipment which is already of used condition. In
order to alleviate the problem the equipment should not significantly differ from
month to month. In practice this means, the observed used car should not contain
too many additional equipment features or any modifications. Direct comparison
should be applied.
•
Changes in the age (in months) or in the mileage of the car should be adjusted by supported expert judgement.
•
Note that these changes in age and mileage between two periods can only occur if the
used cars prices are collected on internet platforms or in car magazines. If market
research data is used, no supported judgement adjustment would be necessary.
•
The supported expert judgement approach calculated for each considered primary
model in each age class a separate depreciation rate for age and for mileage. These
depreciation rates are calculated on a yearly basis.
•
That means, for each primary model in each relevant age class two additional samples
of approximately 10 observations have to be collected: One sample to calculate a depreciation rate for age and one sample for mileage. The price observations of these
two samples should refer to one specific sub-model. The samples should only contain
product-offers from business retailers. Cars that have been involved in an accident
and cars with a lot of obvious extra equipment or modifications should be skipped.
•
To calculate a depreciation rate for the age of a particular primary model and in a particular age class, 10 used car prices should be observed. These 10 price observations
consist of 5 pairs, each should be of the same sub-model, very similar mileage and
of different ages within the range of the corresponding age class. It is important that
the mileage of the observation pairs is very similar (nearly constant) whereas the age
in months can vary within the interval of the considered age class.
•
The sample for the “age depreciation” might look like this (see following chart 58):
Federal Statistical Office, Statistics and Science, Vol. 13/2009
85
Used cars
Chart 58
Sample for the calculation of the depreciation rate for age for the “VW Golf V”
Primary model: Volkswagen Golf V
Sub-model: 1.9 TDi Trendline
Age class ~ 2 years
Age
in month
18
26
17
24
15
29
19
24
20
26
Observation
1
2
3
4
5
6
7
8
9
10
•
Mileage
in km
32,000
31,000
13,000
13,800
33,000
33,000
40,000
39,000
31,000
33,000
Used car price
in €
15,000
14,000
14,800
14,000
16,000
13,000
15,500
14,000
15,000
14,000
On the basis of the first sample, a depreciation rate for the age in months can be calculated. Therefore, five depreciation rates for the five observation pairs have to be
computed. These depreciation rates are averaged by the arithmetic mean. Thus, the
calculation follows the equation:
§
( P9 − P10 ) ·
( P3 − P4 )
( P1 − P2 )
...
¸
+
+ +
5 © ( Age2 − Age1 ) ( Age4 − Age3 ) ... ( Age10 − Age9 ) ¸¹
1
Model _ A, Age _ class
δ Age
= ¨¨
•
A, Age _ class
can be interpreted to be the absolute monetary value of one month
The δ Age
of age for the considered primary model in the considered age class.
•
Analogously, to calculate a depreciation rate for the mileage of a particular primary
model and in a particular age class, a second sample of 10 used car prices should be
observed. These 10 price observations consist of 5 pairs again, each should be of the
same sub-model, identical age in months but of different mileages. Thereby the age
between the pairs can differ within the range of the age class.
•
On the basis of the second sample, a depreciation rate for the mileage can be calculated following the equation:
( P3 − P4 )
( P1 − P2 )
1§
...
+
+ +
5 © ( Mileage2 − Mileage1 ) ( Mileage4 − Mileage3 ) ...
Model _ A, Age _ class
= ¨¨
δ Mileage
+
86
·
( P9 − P10 )
¸
( Mileage10 − Mileage9 ) ¸¹
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
A, Age _ class
can be interpreted to be the absolute monetary value of an addiThe δ Mileage
•
tional mileage of 1,000 kilometres for the considered primary model in the considered
age class.
These model- and age-class-specific depreciation rates can be used to adjust observed used cars with subsequently different ages and mileages from month to month.
This would especially be the case when the data is collected on internet platforms and
in car magazines.
Chart 59
Quality adjustment when using data from internet platforms and car magazines
“medium-sized used
cars”
Consumption segment
24 months
M
26 months
Product specification:
VW Golf IV 1.9 TDi Trendline
• Age: ~ 2 years
• Mileage: ~ 30.000 km
M
M
Further selected models
• Age: ~ 2 years
• Mileage: ~ 30.000 km
...
Minor change (age):
Supported judgement
Golf VI 1.9 TDi Trendline
• Age: ~ 2 years
• Mileage: ~30.000 km
Major change (new sub-model):
Bridged overlap
Drawbacks of supported expert judgement
•
•
•
The empirical study on used cars conducted within the scope of the CENEX project
showed that supported expert judgement should be assessed to be the second-best
approach for used cars in the case that the hedonic re-pricing approach is not applicable.
There are several concerns against supported expert judgement:
– The sub-samples collected for the calculation of the depreciation rates for age
and mileage refer to the specific primary model and age class. Therefore these
sub-samples are rather small (10 observations) possibly resulting in unreliable
depreciation rates. The reason is that the collected offer prices can vary to a large
extent even for cars with very similar characteristics.
– Furthermore, the availability of the observations with precisely defined characteristics (sub-model, equipment version, age and mileage) can cause some difficulties.
Especially in small countries there might not be enough observations available.
In case of data obtained by a market research institute, no variations of the characteristics age and mileage occur. Therefore, no supported expert judgement has to be
applied.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
87
Used cars
Chart 60
Quality adjustment when using data from market research institutes
“medium-mized used
cars”
Consumption segment
24 months
24 months
M
VW Golf IV 1.9 TDi Trendline
• Age: ~ 2 years
• Mileage: ~ 30.000 km
M
M
...
No change in age
and mileage:
Direct comparison
3.5
Product specification:
VW Golf VI 1.9 TDi Trendline
• Age: ~ 2 years
• Mileage: ~ 30.000 km
Further selected models
• Age: ~ 2 years
• Mileage: ~ 30.000 km
Major change (new sub-model):
Bridged overlap
Cost estimation
One-time effort for the introduction
•
For the implementation of the non-hedonic method one staff member with market
knowledge and economic expertise is required. Up to three months seem to be necessary to set up an adequate sample and to perform the suggested non-hedonic approach. The time effort also depends on the current practice in use. For example, if
there is already an appropriate sampling procedure implemented the time effort for
the introduction might be lower. Market research seems to be unnecessary since the
used car market is well determined in the statistics of the central car registers.
Current effort of price collection
•
The time effort for the price collection depends on the data source that is used. In
the case of data provided by a market research institute an own price collection is
not required. In the case that the prices are collected by an own central survey on
internet platforms and in car magazines, up to 4 man-days seem to be sufficient to
collect the observations. To include the observed prices and characteristics into a
database might additionally last one day. The price collector should be a trained staff
member. There is no need for academic skills or expert knowledge in the used car
market.
•
In case that the observations have to be collected on internet platforms and in car
magazines, for each considered primary model in each age class two additional samples for the calculation of the specific depreciation rates have to be collected. This
might be a work load of up to ten days for a non-academic staff member. These depreciation factors are compiled yearly.
88
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Current effort of index calculation
•
Regarding the index calculation, the price statistician should have econometric knowledge, especially when the index calculation is conducted manually. In the case of an
automatic or semi-automatic process only moderate economic skills are required. The
index calculation might be done quickly; an effort of approximately 1 man-day seems
to be sufficient.
•
The effort of the yearly re-sampling amounts to up to 5 days for a staff member with
non-academic qualification.
3.6
Examples
Example: Generation of consumption segments
•
In the first step, the used car market has to be divided into different consumption segments in according to the size of the car. This step should correspond to the procedure
for new cars where the pragmatic proposal is to define four consumption segments:
– small used cars,
– medium-sized used cars,
– large used cars,
– special used cars.
•
As in the case of new cars, the universe of primary models consists of approximately
300 cars in each country. Each of these primary models has to be assigned to the identified consumption segments. The distinguishing criterion is the size of the car.
– The data on the particular primary models that are registered in the country can
be obtained from the national central car register.
Chart 61
Example for the assignment of concrete models to the consumption segments
Central car register
Primary models
Small used cars
Medium-sized used cars
Large used cars
Special used cars
•
Registrations
per year
VW Golf
...
Opel Corsa
...
Mercedes-Benz SLK
...
Mazda 323
...
...
...
As a result, each consumption segment consists of precise primary models.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
89
Used cars
Chart 62
Consumption segments and the corresponding primary models
Consumption segments
Small used cars
Primary
model
Medium-sized used cars
Ownership
changes
Primary
model
Ownership
changes
Large used cars
Primary
model
Opel Corsa
26,082
VW Golf
73,692
MercedesBenz
E class
18,631
VW Polo
16,332
Opel Astra
36,889
BMW 5
series
16,247
Audi A6
12,937
Renault
Twingo
11,094
MercedesBenz
C class
...
...
...
sum
192,684
sum
Special used cars
Ownership
changes
Primary
model
Ownership
changes
Opel Zafira
14,620
VW Caravelle
8,305
VW Sharan
7,162
34,851
...
440,878
...
...
...
sum
73,504
sum
...
124,923
•
The consumption segments have to be weighted in accordance to their expenditure
share.
•
To obtain the total expenditure of a consumption segment, multiply the total number
of ownership changes from the previous year for the particular consumption segment
by its average price.
– The total number of ownership changes for each consumption segment can be taken from the central car register. The average price for a consumption segment can
be obtained from the section on price collection below. It should refer to one specific arbitrary age class which has to be the same for all consumption segments.
Chart 63
Calculation of expenditure weights for the consumption segments
Weighting the consumption segments
Small used cars
Medium-sized used cars
Large used cars
Special used cars
Total number
Average
Total number
Average
Total number
Average
Total number
of ownership
price
of ownership
price
of ownership
price
of ownership
price
changes
(in €)
changes
(in €)
changes
(in €)
changes
(in €)
192,684
7,500
440,878
10,000
73,504
15,000
124,923
12,500
Average
Total expenditure (in billion €):
1.44
4.40
1.11
1.56
Total expenditure of all consumption segments: 8.51 billion €
Expenditure share (in %):
17.0
90
51.8
12.9
18.3
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Example for the construction of a target sample
•
(1) Stratify the universe of ownership changes by age classes:
–
Two or three age classes should be generated following “peaks” of ownership
changes in the national market. Therefore, the distribution of ownership changes
according to the age of the cars should be checked. In the example provided below, the number of ownership changes according to age exhibits three “peaks”
in two, three and four-year-old cars.
Chart 64
Number of ownership changes for different ages of the car
Number of
ownership
changes
Selected age classes
1
–
•
3
5
Age (in years)
The pragmatic proposal is to form three age classes: Two-year-old, three-year-old
and four-year-old cars within each consumption segment. In the following this
recommendation will be considered.
(2) Select primary models by cut-off sampling (probability sampling is also possible):
– In each consumption segment and age class, include all primary models until a
cumulative market share of 50 % to 60 % is reached.
– This means to draw a sample for the recommended twelve segments: Four consumption segments each contain three age classes.
– An example for the consumption segment “medium-sized used cars” and the age
class of two-year-old cars (see following chart 65):
Federal Statistical Office, Statistics and Science, Vol. 13/2009
91
Used cars
Chart 65
Exemplary target sample for the consumption segment “medium-sized used cars”
Target sample for “medium-sized used cars”and two-year age class
Primary model
Ownership changes
(from the previous year)
Market
share
Cumulative
market share
in %
VW Golf
32,040
16.7
16.7
Opel Astra
16,039
8.4
25.1
Mercedes-Benz C class
15,153
7.9
33.0
BMW 3 series
14,586
7.6
40.6
Ford Focus
14,028
7.3
47.9
Audi A4
13,097
6.8
54.7
–
•
The selected primary models represented in the table make up 54.7 % of the market for “medium-sized used cars” within the considered age class.
Assignment of precise sub-models by purposive sampling
– The selected primary models have to be implicitly weighted according to the relation of their sales figures. The proposal is collect 25 to 30 price observations per
consumption segment which spread over the three age classes. This results in a
sample size of approximately 10 per stratum. Now it is to determine how many
price observations to collect for each primary model.
– The distribution of the price observations might look like this (implicit weighting):
Chart 66
Number of observations to be included
Distribution of price observations
“medium-sized used cars” and two-year age class
Primary model
Market share
Share in the
sample (ratio)
in %
in %
Number of price
observations
16.7
Æ
30.5
3
Opel Astra
8.4
Æ
15.4
1
Mercedes-Benz C class
7.9
Æ
14.4
1
BMW 3 series
7.6
Æ
13.9
1
Ford Focus
7.3
Æ
13.3
1
Audi A4
6.8
Æ
12.5
1
54.7
Æ
VW Golf
Total
92
100
8
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
–
–
–
In this example, three sub-models for the primary model VW Golf have to be selected in order to reflect its market share of the consumption segment “mediumsized used cars” within the age class of two-year-old cars.
The sub-model selection has to be conducted by purposive sampling according
to the following criteria:
– Should be a well sold sub-model within the range of the corresponding primary model.
– Should be of the least extensive equipment level.
– Must not be a special edition.
The following chosen sub-models meet these criteria. Additionally, the approximate ratio of diesel and petrol engines is considered.
Chart 67
Precisely selected sub-models for the primary model “VW Golf”
Sub-model selection for VW Golf V
•
Observation
Sub-model
1
Golf V 1.6 Trendline
2
Golf V 1.9 TDi Trendline
3
Golf V 2.0 TDi Trendline
Collect offer prices for the chosen sub-models
– The prices for the selected sub-models can ideally be obtained from market research institutes. Otherwise, collect prices on internet platforms and in car magazines.
Example: Replacement strategy
•
Representativity check within the scope of the monthly price collection:
– The price collector (product expert) should refer to the selected sub-models of
the previous month. In the case that a specific sub-model of a precise age is no
longer observable it should be replaced. Maybe, the considered primary model
is followed by a new version: For example, the follow-up model for VW Golf IV is
the VW Golf V.
•
Representativity check for the periodical renewal of the sample
– The target sample should be updated regularly, maybe yearly, depending on the
publication of the statistics of the ownership changes by the central car register.
Example: Quality adjustment
•
It is important to distinguish between minor and major changes
•
Major changes can occur in three situations:
– Change of the primary model due to a former production stop.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
93
Used cars
–
–
The old version of a primary model is followed up by a new generation: e.g. VW
Golf IV is followed by a VW Golf V.
The sub-model is replaced since it is no longer available.
•
Minor changes:
– The equipment of a particular sub-model is, for example, extended by a parking
sensor.
– The product observations of a particular sub-model differ in age and/or in mileage.
•
Differences in the equipment of the used cars are not adjusted.
•
As regards data from market research institutes: Minor changes due to variations in
age and/or mileage do not occur. Therefore, direct comparison is applied in the case
of changed equipment and bridged overlap in case of major changes.
•
When prices are observed on internet platforms or car magazines, differences in age
and/or mileage can occur. In this case, supported expert judgement is applied.
•
For each considered primary model in each relevant age class two additional samples of approximately 10 observations have to be collected yearly.
•
Collect a first sample with (nearly) constant mileages and varying age in months within
the range of the age class:
Chart 68
Sample for the calculation of the depreciation rate for age for the “VW Golf V”
Primary model: Volkswagen Golf V
Sub-model: 1.9 TDi Trendline
Age class ~ 2 years
Age
Mileage
Used car price
(in months)
(in km)
(in €)
1
18
32,000
15,000
2
26
31,000
14,000
3
17
29,000
14,800
4
24
30,000
14,000
5
15
33,000
16,000
6
29
33,000
13,000
7
19
32,000
15,500
8
24
30,000
14,000
9
20
31,000
15,000
10
26
33,000
14,000
Observation
•
94
The depreciation rate for age in months for the VW Golf V 1.9 TDi Trendline in the age
class of ~ 2 years can be calculated as follows:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
1§
(P − P )
(P − P )
(P − P )
...
·
Model _ A, Age _ class
9
10
1
2
3
4
¸
= ¨¨
+
+ +
δ Age
5 © ( Age2 − Age1 ) ( Age4 − Age3 ) ... ( Age10 − Age9 ) ¸¹
VWGolf ,~2years
δ Age
=
•
1 § (1,000€) ( 800€) (3000€) (1500€) (1000€) ·
+
+
+
+
¸ = 184.05 €
¨
5 ¨© ( 8 )
(7 )
(14)
(5 )
( 6 ) ¸¹
A, Age _ class
can be interpreted as follows: If a replacement is chosen that is
The δ Age
1 month older, the price of the replacement car has to be adjusted for 184.05 €. The
applied depreciation rate corresponds to the primary model and the considered age
class.
•
Analogously, to calculate a depreciation rate for the mileage of a particular primary
model and in a particular age class, a second sample of 10 used car prices should
be observed. These 10 price observations consist of 5 pairs again, each should be of
the same sub-model, identical age in months but of different mileages.
Chart 69
Sample for the calculation of the depreciation rate for mileage for the “VW Golf V”
Primary model: Volkswagen Golf V
Sub-model: 1.9 TDi Trendline
Age class ~ 2 years
Age
Mileage
Used car price
in months
in 1,000 km
in €
1
24
15
15,000
2
24
45
13,500
3
22
20
14,500
4
22
30
14,800
5
25
20
15,500
6
25
60
13,200
7
20
15
15,500
8
20
35
14,000
9
26
30
14,500
10
26
50
13,800
Observation
•
On the basis of the second sample, a depreciation rate for the mileage in kilometres
can be calculated following the equation:
1§
(P − P )
(P − P )
...
Model _ A, Age _ class
3
4
1
2
+
= ¨¨
+ +
δ Mileage
5 © ( Mileage2 − Mileage1 ) ( Mileage4 − Mileage3 ) ...
+
·
( P9 − P10 )
¸
( Mileage10 − Mileage9 ) ¸¹
Federal Statistical Office, Statistics and Science, Vol. 13/2009
95
Used cars
1 § (1,500€) ( −300€) (2.300€) (1,500€) (700€) ·
+
+
+
+
¸ = 37.50 €
5 © (30 )
(10 )
( 40 )
(20 )
(20 ) ¹
VWGolf ,~2years
δ Mileage
= ¨
•
A, Age _ class
The δ Mileage
can be interpreted as follows: If a replacement is chosen that ex-
hibits a mileage of 1,000 kilometres more, the price of the replacement car has to
be adjusted for 37.50 €. The applied depreciation rate corresponds to the primary
model and the considered age class.
•
In case of data collection on internet platforms and in car magazines, variations of the
characteristics age and mileage of two subsequent product observations are adjusted
by applying the model-specific depreciation values.
Chart 70
Example of quality adjustment for minor and major changes
“medium-sized used
cars”
Consumption segment
24 months,
30,000 km
26 months,
35,000 km
M
Product specification:
VW Golf IV 1.9 TDi Trendline
• Age: ~2 years
• Mileage: ~ 30.000 km
M
M
...
Supported judgement
VW Golf VI 1.9 Tdi Trendline
• Age: ~ 2 years
• Mileage: ~ 30.000 km
Further selected models
• Age: ~ 2 years
• Mileage: ~ 30.000 km
Bridged overlap
•
Assume that the observed used car in the first period (24 months old and 30,000 km)
has a price of 16,000 € and the observed used car in the second period (26 months
old and 35,000 km) has a price of 15,000 €.
•
Since the two product offers differ in age and mileage the calculated model-specific
and age class-specific depreciation rates for age and mileage have to be applied to
match the price observations.
– Price observation in the first period: P124 months, 30,000 km = 16,000 €.
– Price observation in the second period: P226 months, 35,000 km = 15.000 €.
– The price of the second observation has to be adjusted for 2 months and 5,000
kilometres:
A, Age _ class
A, Age _ class
– P2adj = P2 + δ Age
· (age2 – age1) + δ Mileage
· (mileage2 – mileage1)
= 15,000 € + (184.05 €) (26 mo. – 24 mo.) + (37.50 €) · (35 km – 30 km)
= 15,555.60 €.
96
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Discussion box
For various reasons used cars is a tricky topic in the price index and may have deceptive pitfalls. Reservation must therefore be made that optimal advice may not yet
have been found.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
97
Computers
4
Computers
•
The Task Force on Quality Adjustment and Sampling rated hedonics as an A-method.
100 % option pricing (for desktops), supported judgement as a proxy to 100 % option
pricing, and bridged overlap (applied under certain conditions) were assessed as
B-methods.
•
The empirical studies of the CENEX members showed that hedonic re-pricing is more
suitable than bridged overlap for desktop computers, whereas both methods yield
similar results for notebooks. Furthermore, the empirical studies showed some difficulties regarding the application of 100 % option pricing for both desktops and
notebooks.
•
The following section deals with the application of bridged overlap for notebooks
and 100% option pricing for desktops.
4.1
Proposal for the identification of consumption segments
•
Computers are covered by COICOP class 09.1.3. The proposal is to form two consumption segments:
– “desktops” and
– “notebooks”.
•
Reason: If the criterion of mobility is applied, desktops and notebooks serve two different consumption purposes. Desktops are used mostly at home whereas notebooks
can be used anywhere. Another criterion stated in Regulation (EC) No. 1334/2007 is
that a consumption segment is formed by product-offers which “can largely be described by a common specification”. This can be applied in the case of computers to
form the consumption segments: notebooks combine display and computer, desktops
do not.
•
This identification is also in alignment with the Draft Report of the Task Force Classification COICOP/HICP (November 2005).
Weighting of the consumption segments
•
The consumption segments should be explicitly weighted. Weights should ideally be
based on the relative expenditure shares of desktops and notebooks. Information
about the expenditure share can be obtained for example from market research institutes, from retail trade organisations or by an own (internet) survey. Another source
could be the regular official survey based on Regulation (EC) No. 808/2004 concerning the Community statistics on the information society.
•
If only sales frequencies are available the expenditure share could be obtained by multiplying the respective sales figures by a roughly estimated average price of both desktops and notebooks. A roughly estimated average price could stem from a small market survey from the collected prices of preceding periods.
4.2
•
98
Proposal for the construction of a target sample
It is suggested to first select outlets, since usually information on the sales frequencies of specific models is either not available or changes too fast to be used as a starting point. Subsequently, representative models are chosen within the outlets.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 71
Proposal for the construction of a target sample
outlets > models
(1) Choose outlets which are relevant for computers, accounting for
the regional dimension.
(2) Choose representative models in the outlets.
•
In the following it will be described how to proceed with these two steps for computers.
(1) Choose outlets which are relevant for computers,
accounting for the regional dimension
•
In the case of computers it is likely that in such a highly competitive market the price
development is identical across the country. Generally, regional market power of a certain outlet is not given. The market is highly transparent as consumers can use the
internet for price comparisons. Computers are standardised products which are available from different outlet types. A central price collection may therefore be suitable.
•
Online price collection is suggested as this is a very efficient way of collecting prices
and characteristics. Computer stores, especially the larger ones, usually have websites advertising the computers they have on sale. Internet is also the only available
source for computer sellers that operate online through their websites such as Dell.
A practical way is to choose outlets first and then to check whether the prices in these
outlets can be observed online. In each outlet, several observations should be made
in order to increase efficiency.
•
If not enough models can be observed using online sources, additional price collection in physical stores is necessary. In that case, again several observations should be
made in individual outlets.
•
If, in a certain country, in spite of the transparent market price developments should
differ significantly throughout regions prices should be collected locally: A procedure
like the following may be used.
– Distinguish between large/small retailers and mail order/stationary retailers.
– Include the largest retailers in any case. Check whether or not the large retail chains
have the same pricing policy in the entire country. If yes, prices can be centrally
observed in big retail chains. If not, prices have to be observed locally taking into
account the regional variance of price developments.
– Include a sample of small retailers. This could be done by probability sampling if
data can be obtained from the business register of your respective national statistical office. If information from the business register is not available, choose relevant regions and within the regions select typical outlets (just as you do for other
products of the HICP).
– If prices are collected locally check whether mail order selling is relevant in the
market. If this is the case prices should be collected from the internet or
catalogues additionally.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
99
Computers
(2) Choose representative models in the outlets
•
For the target sample, those models should be observed that are representative of the
sales of the outlet where they are observed. A good rule of thumb is that any model
that is fairly new or clearly visible in the outlet (on display or on its webpage) is very
likely representative. Sales staff can also help select the most sold computers.
Minimum size of the target sample
•
A number of 25 to 30 product-offers may be suggested as a normal rough minimum
bound for sample size of computers, where no more specific indications are present.
Additional aspects when applying option pricing (desktops only)
•
When applying option pricing it is necessary to collect not only the prices of models.
Additionally, characteristics of the chosen models and their respective prices (= option
prices) have to be collected in order to apply option pricing.
•
The following characteristics are may be suggested for observation. (These characteristics are examples resulting from the empirical studies by the CENEX members.)
– Processor type (e.g. Celeron, Pentium IV, . . .).
– Processor speed (in MHz).
– Hard disk capacity (in MB/GB).
– Working memory (in MB/GB).
– Operating system (Windows XP, Linux, none, . . .).
– Other software (MS Office, . . .).
– Type of drive (CD-rewriter, DVD player/rewriter, . . .).
•
Other characteristics may be relevant as well.
4.3
Proposal for the replacement strategy
General remarks
•
Given the rapid technological change of computers, replacing old models will be a frequent task. Many of the replacements will be voluntary: when a model is judged not
to be representative anymore it should be replaced, even if it is still available. Forced
replacements will probably be frequent as well.
•
A replacement model should be observed in the same outlet as the replaced model
and it should be representative of computer sales in that outlet. A model that is a suitable replacement in one outlet may not be suitable in another one, because not every
outlet will have the same models available.
•
Note that similarity of the replacement model with the old model is not relevant. Only
its representativity for the sales of the outlet where it is selected is important. Also,
the reason for its disappearance is not relevant.
•
Similarity may only play a role when there are several equally representative replacement models available.
•
In the case of online price collection the price statistician from the office will often also
be the one who collects the prices. The price statistician will thus be constantly up to
100
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
date regarding current developments in the computer market. A representativity check
in the office is therefore not explicitly mentioned here. Additionally, when using bridged
overlap for notebooks it is not relevant to explicitly observe characteristics.
Representativity check within the scope of the monthly price collection
•
Ideally, there could be reason to have the sample refreshed every month. As it would
not be desirable to have no or only a few models with a match in the previous month it
is suggested to use the models that were in the sample in the previous month as a
starting point for the price collection of the current month in order to keep the share of
matched models reasonably high.
•
During the collection process, the price collector should ask him or herself for each
model that was observed in the previous month, whether this model is still representative. If the model is still available, the answer will usually be “yes”, as models that are
out of date will disappear very quickly.
•
If the answer is “yes”, the collector should quote the current price. If the answer is “no”,
the collector should select a new model that is judged to be representative. No effort
should be made to try to find a replacement model that is similar to the old one. Similarity to disappearing models is of no relevance at all. This is a very important point.
•
Similarity to the predecessor may only play a role when the price collector can choose
between multiple equally representative replacements.
Additional aspects when applying option pricing
•
Given the rapid technological change of computers, new features are included with
certain regularity. This is not so frequent that it would be necessary to select a new set
of relevant characteristics every month, but the price statistician should be aware of
the possibility. Whenever a new characteristic (say, a new kind of processor or a new
drive) is introduced, the price statistician should judge whether the new characteristic
is important enough to be included. Ideally, this would happen only a few times each
year.
•
Of course, a new characteristic can only be considered for quality adjustment if this
characteristic is collected in both the previous and the current month.
4.4
Proposal for the quality adjustment
I.
Bridged overlap (notebooks only)
•
Matched models can be directly compared.
•
If an old model is replaced by a new one bridged overlap shall be applied. The price
development of all other (matched) notebooks is used as the bridge.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
101
Computers
Replacement to ensure representativity
Chart 72
Replacement to ensure representativity
regular judgemental
representativity checks
replacement to ensure
representativity
target universe
models
models
M
M
M
most sold
in t = 1, 2, 3
M
An unrepresentative
model is replaced
by a representative
one in t = 4.
•
most sold
in t = 4
estimated sales
frequency
In order to adjust for the quality change between replaced and replacement model
the price development of all other observed models is used as bridge (see below).
Chart 73
Bridged overlap with a real price increase
Price
bridge
60
quality difference
model “b”
model “a”
50
P of reference models
model “c”
40
model “d”
30
model “e”
20
t=4
102
t=5
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
II. Option pricing (desktops only)
•
For most products, option pricing is used when a certain feature of a product, like air
conditioning in a car, used to be an option but has now become standard. Hence the
name “option pricing”. For computers, it is used in a slightly different way. Here, we
use option pricing to adjust for the price differences of a certain feature, when the replacement model has a different variant of that feature than the old model. For example, when the replacement model has a hard disk of 200 GB and the old model one of
100 GB, option pricing adjusts for the price difference of these hard disks.
•
The following characteristics are suggested for observation:
– Processor type (e.g. Celeron, Pentium IV, . . .).
– Processor speed (in MHz).
– Hard disk capacity (in MB/GB).
– Working memory (in MB/GB).
– Operating system (Windows XP, Linux, none, . . .).
– Other software (MS Office, . . .).
– Type of monitor, including size (TFT or not, . . .).
– Type of drive (CD-rewriter, DVD player/rewriter, . . .).
•
If a model can be observed in two subsequent periods (matched model) the prices of
the previous and the current period can be directly compared.
•
If a change in one of the characteristics above occurs option pricing shall be applied.
For desktops, 100 % option pricing is suggested. Let us assume that an optional hard
disk capacity of 100 GB is available at a price of 40 Euros. For the above mentioned
example, this means that the price of the replacement model is adjusted by 40 Euros.
•
For option pricing, we need information on three aspects:
– We need to link specific replaced models with specific replacement models. Replacement models are called successors in this case.
– We need to know the differences between replaced models and their successors,
both qualitatively and quantitatively.
– We need to know the option prices of these particular differences.
Given this information, we can calculate the quality-adjusted price of either the replaced models or the successors. For product groups with a high rate of turnover, like
computers, we generally do not know beforehand when a model will be replaced. In
practice, option prices can then only be collected in the period when the successor is
introduced. It is possible to adjust the price of the replaced model as well as to adjust
the price of the replacement model with regards to the same option prices.
•
•
In the example provided below the price of the replacement model is adjusted. In that
case, the quality adjustment is applied to the price of the successor and this price is
compared with the unadjusted price of the replaced item.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
103
Computers
•
Therefore:
Observed price of the replacement
Option prices of the characteristics in which the replaced model
differs from the replacement
– Option prices of these characteristics of the replacement
__________________________________________________________
= Quality adjusted price of the replacement
+
•
Or more formally:
§
pˆiy∈,Nm = piy∈,Nm + ¨
¨
©
¦b
y ,m
kj
k
−
¦b
k
·
y ,m ¸
ki ¸
¹
Here i is the successor and j the replaced model. Furthermore, p̂ denotes the estimated price, p the observed price in year y and month m . The characteristics are denoted by k and their values, as expressed in option prices, are denoted by bk.
•
Example of the calculation:
Suppose i and j are two desktops that are identical in every aspect except for their
hard disk. Computer I has a hard disk of 200 GB and computer j a hard disk of 100 GB.
When bought separately in the current month, a 200 GB hard disk would cost 100 €,
whereas a 100 GB hard disk would cost 40 €.
Observed price of the replacement
+ 40 €
– 100 €
__________________________________________________________
= Quality adjusted price of the replacement
Æ Subtract 60 € from the observed price of the successor to obtain the quality ad
justed price of the replacement.
Drawbacks of 100 % option pricing for desktops
•
The empirical study on computers conducted within the scope of the CENEX project
did not necessarily yield reliable results for option pricing.
•
There are several other problems with option pricing:
– We need to be able to determine in which respects the replacement and the successor differ.
– We need a monetary value of the characteristics in which these models differ from
each other. With option pricing, one chooses explicitly to collect additional prices
to determine these values. This places a huge burden on data collection, as one
needs not only information on prices and characteristics of the models in the sample, but also on the prices of these characteristics.
•
The biggest problem with the option pricing method is that one cannot be sure whether
the option prices correctly reflect the price differences of these characteristics when
sold as part of a computer as opposed to bought separately. Moreover, on some char-
104
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
acteristics it will not be possible at all to collect these separate prices. Especially for
replaced models, the option prices may not be available because these particular options are not available anymore.
•
4.5
Since we lack any information on the reliability of these additional option prices, we
cannot assume that option pricing leads to a reliable quality adjustment in the case
of computers. In that case, an implicit quality-adjustment method like bridged overlap
may result in an index that is as good as an index that was quality-adjusted using option pricing.
Cost estimation
One-time effort for the introduction
•
With regard to bridged overlap and option pricing, there is no more effort than setting
up an initial sample (which should be done anyway, regardless of quality adjustment).
Current effort of price collection
•
With regard to bridged overlap, there is the same effort as for other product groups or
segments without explicit quality adjustment.
•
With regard to option pricing, besides the prices of the product-offers, also characteristics need to be observed. This effort is the same as for the hedonic approach and
requires the same expertise. In the month where a model is replaced, the option prices
for the features in which the new and old models differ need to be collected in the same
store. This may be the task of the price collector and therefore requires some technical
expertise. The amount of extra time depends on the number of replacements and thus
on the sample size.
Current effort of index calculation
•
With regard to bridged overlap, there is the same effort as for other product groups or
segments without explicit quality adjustment.
•
With regard to option pricing, price changes of replacements are adjusted using option
prices. This requires knowledge about the option price method. Since this method
is fairly widespread among NSIs, this probably does not require additional expertise
from the price statistician. The amount of extra time necessary for quality adjustment
depends on the number of replacements and thus on the sample size. It should not
exceed one day.
4.6
•
Examples
In the following an example for the application of bridged overlap is described which
is only relevant for notebooks.
Example of the construction of the target sample
•
For the pre-selection of outlets where relevant notebooks are sold we distinguish
between large and small sellers. Once this distinction is finished we can determine
concrete outlets. The distribution of the sales across point of sales might look like
this:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
105
Computers
Chart 74
Pre-selection of outlet types
Large seller 1
Large seller 2
Small sellers
40 %
40 %
20 %
•
Two large sellers cover 80 % of the market while the sales by small sellers make up
20 % of the market. Large sellers should be included in any case. Furthermore, a probability sample should be taken from all small sellers. The regional dimension has to
be considered.
•
Then, it has to be determined which share of the sample has to be observed in which
outlet types. The following table will help to do so. For our example we assume that
30 price observations are required. The following table shows how many notebooks
have to be observed in which shop (numbers rounded):
Chart 75
Matrix containing numbers of observations to be included
Notebooks
•
Large seller 1
Large seller 2
40 %
12
40 %
12
Small sellers
(prob. sample)
20 %
6
In a next step the price collector would be instructed to collect prices in an outlet of
large seller 1. He might be provided with the following price collection form:
Chart 76
Price collection form
Notebook in outlet x of large seller 1
Price (€):
Processor type:
Processor speed (GHz):
Hard disk drive capacity (GB):
Working memory (MB):
Operating system:
Other software:
Type of display:
Size of display (inch):
Type of drive:
Graphic card:
Brand and model description:
Other features:
Shop: ___________________________
Date: ___________________________
106
799.–
Intel Core2 Duo T8100
2,200
60
1,024
Windows XP
–
LCD
15.4
DVD writer
nVidia GeForce 8000M GS
xxx
–
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Example for quality adjustment
In this example model D1 is replaced by model D2 in month 3. Therefore, a quality adjustment should be done by applying bridged overlap.
Chart 77
Bridged overlap for notebooks
Model
Brand
Price
in month 1
Price
in month 2
Price
in month 3
A
DELL
799.– €
799.– €
699.– €
B
DELL
599.– €
599.– €
599.– €
C
DELL
599.– €
599.– €
599.– €
659.– €
659.– €
631.– €
Average price of models 1 – 3 without
quality adjustment (geometric mean)
Average price change of models 1 – 3
without qualitiy adjustment (in %)
D1
DELL
D2
DELL
0.0
Average of directly observed prices
– 4.4
810.– €
810.– €
680.– €
650.– €
694.– €
694.– €
635.– €
Quality adjusted average price
664.– €
Price change (in %)
Monthly indexes
100.0
0.0
– 4.4
100.0
95.6
A bridge is needed to reflect and apply the price change of all unchanged notebooks to
the new model D2. The bridge is taken from the average prices of models A to C. The
price change of these models from month 2 to month 3 is – 4.4 %. Model D2 has a price
of 650.– € in month 3 and the imputed price for the previous month can now be calculated:
650.– € / 0.956 = 680.– €
with 0.956 = 1 – 0.044
The quality adjusted average price in the previous month amounts to 664.– €. The result on
the basis of this quality adjusted average price is a quality adjusted price change of – 4.4 %
and a decline of the index from 100.0 to 95.6.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
107
Washing machines
5
Washing machines
•
For washing machine market, the Task Force on Quality Adjustment and Sampling
rated explicit methods (hedonics and option pricing) as A-methods. Implicit methods
(bridged overlap) are assessed as B-methods.
•
The following section deals with the application of bridged overlap for washing machines.
5.1
Proposal for the identification of consumption segments
•
Household appliances are covered by COICOP group 05.3. Among others, washing
machines are one major item in this category.
•
One consumption segment should be built for each of the products relevant in the category household appliances (e.g. fridge, freezer, cooker, microwave, [vacuum] cleaner,
washer-dryer etc.). In the following a detailed guidance for washing machines is described.
•
One consumption segment named “washing machines” is suggested, including all
washing machines but excluding washer-dryers, which should be treated separately in
another consumption segment.
•
Reason: All washing machines are marketed for predominant use in similar situations,
serving the same consumption purpose. Washer-dryers have an additional purpose
(drying).
5.2
•
Proposal for the construction of a target sample
A stratification of the target universe according to model types is suggested, followed
by a pre-selection of the relevant model types. Subsequently, outlets should be chosen
which are relevant for the type of good. Market information on the sales figures of models can be obtained either from market research institutes or through a small field
survey. In the selected outlets, representative models which belong to the pre-selected
model types should be chosen for price observation.
Chart 78
Proposal for the construction of a target sample
model types > outlets > models
(1) Stratify the target universe according to model types.
(2) Pre-select relevant model types.
(3) Choose outlets which are relevant for the type of good, accounting for the
regional dimension.
(4) In the outlets, choose representative models which belong to the preselected model types.
108
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
(1) Stratify the target universe according to model types
•
For washing machines, model types might be characterised by the type of washing
machine (e.g. front or top loader) and by their capacity.
•
Carry out an evaluation on which types of washing machines are available (e.g. front
loader, top loader) and on how large their market share is.
•
Additionally, evaluate the available capacities so that stratification by capacity classes
can be applied.
•
Stratify for example by
a) Washing machine type: top loader – front loader.
b) Capacity: up to 5 kg – between 5 kg and 7 kg – more than 7 kg.
Chart 79
Example of stratification: Market shares in percentage of sales volume
Overview of washing machine market
Front loader
Top loader
Capacity: up to 5 kg
22
8
Capacity: 5 – 7 kg
53
7
Capacity: more than 7 kg
10
0
(2) Pre-select relevant model types
•
The strata with the biggest market shares are successively selected until 50 to 80 %
of the total market are covered. This means that the strata with a small market share
will not be included in the sample (cut-off sampling).
•
Another possibility to pre-select model types would be probability sampling. In this
case the market shares of all strata are reflected by the number of observations of the
respective strata.
•
The result of the pre-selection will be a list containing the pre-selected washing machine types plus a guiding value for the respective capacity class. This list can be
passed to the price collector for the selection of concrete models.
(3) Choose outlets which are relevant for this type of good, considering
the regional dimension
•
An idea for sampling outlets could be as follows.
– Distinguish between large/small retailers and mail order/stationary retailers.
– Include the largest retailers in any case. Check whether or not the large retail chains
have the same pricing policy in the entire country. If yes, prices can be centrally
observed in big retail chains. If not, prices have to be observed locally taking into
account the regional variance of price developments.
– Include a sample of small retailers. This could be done by probability sampling if
data can be obtained from the business register of your respective national statis-
Federal Statistical Office, Statistics and Science, Vol. 13/2009
109
Washing machines
–
–
tical office. If information from the business register is not available, choose relevant regions and within the regions select typical outlets (just as you do for other
products of the HICP).
Prices from small retailers should be observed locally in general (under consideration of regional variations in price development).
If mail order selling is relevant in the market it should be covered by collecting
prices from the internet or catalogues additionally.
(4) In the outlets, choose representative models which belong to the pre-selected
model types
•
Within the chosen outlets, select washing machines which are best sold corresponding to the pre-selected strata.
•
Which washing machines are best sold in the respective strata should be determined
by the price collector after consulting a shop assistant. If this is not practicable the
price collector can also revert to indicators such as the way the product is advertised
or presented in the shop or he might also revert to his experience. A model that is clearly visible in the store could, for example, be judged to be representative.
Minimum size of the target sample
•
A number of 25 to 30 models seems reasonable to represent the consumption segment. Large countries may, however, need a higher number of price observations depending on their regional variance of price development.
5.3
Proposal for the replacement strategy
•
The stratification from the target sample is used to set up the product specification for
replacement, i.e. the replacement should be chosen from the same stratum. Additionally, the replacement model should always be selected from the same brand cluster.
•
A classification of brand clusters for washing machines can be obtained from the PPP
statistics (Purchasing Power Parities – price indices comparing the national price
levels). The available brand cluster lists of the PPP classify washing machines into
higher, medium, and lower brands. In these lists, not all available brands are classified. Missing brands should be assigned by personal judgement.
Chart 80
Extract from the PPP classification (adapted for washing machines)
Brand cluster
Assigned brands
Higher
AEG-ELECTROLUX, BOSCH, DYSON, MIELE, SIEMENS
Medium
ARISTON, BAUKNECHT, CANDY, ELEKTRA BREGENZ,
ELECTROLUX, EUDORA, GORENJE/KÖRTING/MORA,
HOOVER, LG, SAMSUNG, WHIRLPOOL, ZANUSSI
Lower
BEKO, BRANDT, DAEWOO, EUROTECH, FAGOR, KONCAR,
HOTPOINT, INDESIT, SANYO
Source: Eurostat (Lot C), European Comparison Programme: Consumer price survey E07-1
“House and garden” Particular survey guidelines
110
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
•
The result is a product specification that is characterised by three dimensions, namely
washing machine type, capacity and brand cluster. A replacement might, for example,
be chosen so that it fits the description “front loader, 6 kg capacity, higher brand”.
Representativity check within the scope of the monthly price collection in the shop
•
Use the concrete washing machine models that were in the sample in the previous
month as a starting point for the price collection of the current month in order to keep
the share of matched models reasonably high. However, the main criterion for the selection of replacements should be representativity.
•
The price collector should check whether or not the models of the previous month are
still representative of the corresponding stratum and brand cluster in the current month.
The price collector should assess this with the help of the shop assistant. If this is not
practicable the price collector can also revert to indicators such as the way the product is advertised or presented in the shop or he might also revert to his experience.
•
If the model of the previous month is still representative of the stratum and brand cluster, continue to observe the price for this model.
•
If the model is no longer representative or if it is no longer available replace it by a representative one from the same stratum and brand cluster – if available. The new model
would then be
– of the same washing machine type (front loader, top loader),
– of the same capacity (up to 5 kg – between 5 and 7 kg, more than 7 kg),
– of the same brand cluster (higher – lower brand).
•
If possible, try to find a replacement that is still representative but is so similar to the
replaced model that the changes in the characteristics between replaced and replacement model do not constitute a major change. Those characteristics that cause a major change in quality are for example changes in brand cluster, waterproofing, energy
label, revolutions per minute (see section “proposal for quality adjustment”).
•
If such a replacement cannot be found for lack of representativity, a replacement has
to be selected which differs in the characteristics causing a major change within the
product specification.
•
If no model is available from the same stratum the replacement has to be chosen from
the target universe, making a compromise between representativity and similarity to
the replaced model. The replacement could, for example, be of the same washing machine type but of a different capacity. In this case, the initial product specification
must be adjusted by the national statistical office to the current developments in the
washing machine market as soon as possible.
Representativity check in the office
•
As the market for washing machine is relatively stable with regards to technological
progress, a yearly representativity check might be sufficient. Nevertheless, the market
should be observed for general changes such as new equipment features, new brands,
and technological trends in general.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
111
Washing machines
•
5.4
If the pre-selected strata and product specifications are still representative, carry on
using them. If not, adjust the product specifications to the current market circumstances by the annual re-sampling.
Proposal for the quality adjustment
•
A distinction in minor and major changes helps decide whether a quality adjustment
is necessary. In general, major changes require a quality adjustment whereas minor
changes do not.
•
Minor quality changes could for example be:
– construction type (freestanding – built-in possibility),
– display of remaining time (yes – no),
– time manager (yes – no).
If minor changes occur the prices can be directly compared.
•
Major changes within the same product specification would be changes in:
– change in the brand cluster (e.g. from medium to low),
– waterproofing (high – low),
– energy label (high – low, with respect to power consumption, water consumption,
spin efficiency),
– revolutions per minute (e.g. up to 1400 rpm – more than 1400 rpm).
If major changes within the same product specification occur, bridged overlap shall
be applied. The price development of the remaining models in the same product specification is used as the bridge. (If the price collector was forced to choose a replacement from outside the product specification because there was no representative
model available in the initial product specification, the price development of the models in the initial product specification is used as the bridge. Re-sampling should then
be carried out as soon as possible.)
•
112
Major changes normally accompanied by re-sampling would be:
– change of the capacity class,
– change in the washing machine type (e.g. from top loader to front loader).
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 81
Quality adjustment for minor and major changes
models
„washing machines“
Consumption segment
M
M
M
M
M
M
M
M
Minor change within
product specification:
Direct comparison
5.5
Major change within
product specification:
Bridged overlap
Cost estimation
One-time effort for the introduction
•
There is no more effort than setting up an initial sample (which should be done anyway, regardless of quality adjustment). Up to three months seem necessary to perform
an initial design study on data requirements and market research as well as testing
new price collector forms etc. in order to set up an initial sample.
Current effort of price collection
•
No expert knowledge necessary, instruction with concrete guidelines is sufficient.
•
Monthly collection of prices and characteristics: 2 days.
•
Entry of prices and quality-adjusted prices into database: 1 hour.
Current effort of index calculation
•
Qualification: market knowledge and economic expertise
•
Index calculation: 1 hour per month
5.6
Examples
Example of the construction of the target sample
The field survey on available washing machine types and capacities in one exemplary
country has yielded the following stratification results:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
113
Washing machines
Chart 82
Example of stratification: Market shares in percentage of sales volume
Overview of washing machine market
Front loader
Top loader
Capacity: up to 5 kg
22 %
8%
Capacity: 5 – 7 kg
53 %
7%
Capacity: more than 7 kg
10 %
0%
The stratum with the largest market share of 53 % is front loaders with a capacity of
5 – 7 kg. This is the first stratum to be pre-selected. The second stratum that is preselected is front loaders with a capacity of up to 5 kg. The two strata pre-selected so far
add up to 75 % of the total sales. This is sufficient. Only these two strata are considered
in the following.
The pre-selection of model-types is finished. We now have to pre-select outlet types. We
distinguish between large and small sellers. Once this distinction is finished we can determine concrete outlets. The distribution of the sales across points of sales might look
like this:
Chart 83
Pre-selection of outlet types
Large seller 1
Large seller 2
Small sellers
40 %
40 %
20 %
Two large sellers cover 80 % of the market while the sales by small sellers make up 20 %
of the market. Large sellers should be included in any case. Furthermore, a probability
sample should be taken from all small sellers.
The next step is to determine how many washing machines will have to be observed in
which outlets. First it has to be calculated how large the shares of the two pre-selected
segments are in the sample.
Chart 84
Shares of pre-selection strata
Overview of washing
machine market
114
Share
of total sales
(in %)
Share
in the sample
(in %)
Front loader
(capacity up to 5 kg)
22
Æ
29
Front loader
(capacity 5 – 7 kg)
53
Æ
71
Total
75
Æ
100
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Then, it has to be determined which share of the sample has to be observed in which
outlet-types. The following matrix will help to do so. It refers to the tables above.
Chart 85
Shares matrix
Large seller 1 Large seller 2
Share
in the sample
in %
(in %)
40
40
Front loader (up to 5 kg)
29
Front loader (5 – 7 kg)
71
Total
Small sellers
20
100
For example, in order to find out how many front loaders with a capacity of 5 – 7kg to observe in the outlet of large seller 1, multiply 71 % by 40 % and you obtain a value of 28.4 %.
The resulting table looks like this:
Chart 86
Completed shares matrix
Large seller 1 Large seller 2
Share
in the sample
in %
(in %)
40
40
Small sellers
20
Front loader (up to 5 kg)
29
11.6
11.6
5.8
Front loader (5 – 7 kg)
71
28.4
28.4
14.2
Total
100
For our example we assume that 30 price observations are required. The following table
shows how many front loaders of which capacity class have to be observed in which shop
(numbers rounded):
Chart 87
Matrix containing numbers of observations to be included
Share
in the sample
(in %)
Large seller 1 Large seller 2
Small sellers
in %
40
40
20
Front loader (up to 5 kg)
29
3
3
2
Front loader (5 – 7 kg)
71
9
9
4
100
12
12
6
Total
Federal Statistical Office, Statistics and Science, Vol. 13/2009
115
Washing machines
In this example it is assumed that both large sellers pursue a pricing strategy that is identical across all regions within one country. Therefore, it is does not matter in which outlets
of the large retailers the prices are finally observed.
In a next step the price collector would be instructed to collect prices of e.g. four washing
machines of the type “Front loader, capacity of 5 – 7 kg, higher or medium brand” within
one outlet of large seller 1. He might be provided with the following price collection form:
Chart 88
Price collection form
Washing machine, front loader, capacity of 5 – 7 kg, higher brand
Price (€):
649.–
Brand and model description:
Bosch WAS 28440
Capacity (kg):
7
Waterproofing:
:Yes
Maximum spin speed (rpm):
1400
… No
Shop: ___________________________
Date: ___________________________
Example: Minor change within the same product specification
In month 3, Model 3a was no longer representative and has therefore been replaced by
model 3b (see chart 89).
Chart 89
Washing machines collected within the product specification
“front loader, 5 – 7 kg, higher brands” – minor changes
Model
Brand
Type
Capacity
Price
Price
Price
in month 1 in month 2 in month 3
1
Bosch
WAS 28440
7 kg
649.– €
649.– €
629.– €
2
Miele
W1614
6 kg
819.– €
819.– €
819.– €
3a
AEGElectrolux
Lavamat 64840
6 kg
599.– €
599.– €
3b
Siemens
WM 14 E 421
7 kg
Average price (geometric mean, rounded)
499.– €
683.– €
Price change (in%)
Index value
116
100.0
683.– €
636.– €
0
– 6.9
100.0
93.1
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
The change between models 3a and 3b is evaluated as a minor change in the model characteristics as the only difference is, for example, the existence of a time manager. Therefore no quality adjustment is necessary. The price of the replacement washing machine
Siemens is compared directly to the price of the replaced AEG-Electrolux.
The sixth line of the matrix above shows the average price (geometric mean). The average
prices for month 1 and month 2 are the same and therefore the average price change is
0.0 %. In month 3, if prices are compared directly we can see a price decline for the Bosch
from 649.– € to 629.– € and for the Siemens replacing the AEG-Electrolux from 599.– €
to 499.– €. This results in a decrease of the average price from 683.– € to 636.– € and
thus in a price change of – 6.9 %.
Example: Major change within the same product specification
In this example the change in the model specification from model 3a to model 3b is assessed to be a major quality change. A quality adjustment should be done. In the following
example the AEG-Electrolux model is replaced by a Samsung which is assigned to the
medium brand cluster. To adjust for this difference in brand class cluster bridged overlap
is applied.
Chart 90
Washing machines collected within the product specification
“front loader, 5 – 7 kg, higher brands” – major changes
Model
Brand
Type
Capacity
Price
Price
Price
in month 1 in month 2 in month 3
1
Bosch
WAS 28440
7 kg
649.– €
649.– €
629.– €
2
Miele
W1614
6 kg
819.– €
819.– €
819.– €
729.– €
729.– €
718.– €
Average price of models 1 – 2 without quality
adjustment (geometric mean)
Average price change of models 1 – 2
without quality adjustment (in %)
3a
3b
0.0
– 1.6
AEGElectrolux
Lavamat 64840
6 kg
Samsung
WFB 146 NV White
6 kg
Average of directly observed prices
599.– €
683.– €
Quality adjusted average price
Federal Statistical Office, Statistics and Science, Vol. 13/2009
456.– €
449.– €
683.– €
614.– €
624.– €
Price change (in %)
Monthly indexes
599.– €
100.0
0.0
– 1.6
100.0
98.4
117
Washing machines
A bridge is needed to reflect and apply the price change of all unchanged washing machine models to the new Samsung model. The bridge is taken from the average prices of
models 1 and 2. The price change of these models from month 2 to month 3 is – 1.6 %.
The Samsung washing machine has a price of 449.– € and the imputed price for the previous month can now be calculated:
449.– € / 0.984 = 456.– €
with 0.984 = 1 – 0.016
The quality adjusted average price in the previous month amounts to 624.– €. The result on
the basis of this quality adjusted average price is a quality adjusted price change of – 1.6 %
and a decline of the index from 100.0 to 98.4.
118
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
6
Books
•
As a first step the COICOP four-digit-level 09.5.1 Books is divided into two headlines:
“rapidly-changing market” and “long-selling market”. In the following, both headlines
are treated separately as they require different approaches:
– Rapidly-changing market (“changing-product principle”): Applicable to products
which have relatively short commercial life-spans and high rates of introduction
of new products (books for children, books for leisure and culture including recently published books).
– Long-selling market (“constant product principle”): Applicable to products that
have relatively long commercial life-spans and low rates of introduction of new
products (schoolbooks, dictionaries, encyclopaedias, evergreens and the like).
•
The HICP standards rate hedonics as an A-method in the rapidly-changing market.
Direct comparison and single variable adjustment (using the logarithm or square root
of the number of pages) are rated as B-methods. For the long-selling market direct
comparison is rated as an A-method, whereas a careful application of overlap and
bridged overlap is rated as a B-method. If applied automatically, overlap and bridged
overlap must be regarded as C-methods.
•
For the rapidly-changing market the empirical studies of the CENEX showed that the
index with hedonic adjustment exhibits the smallest volatility followed by the index
with adjustment by the natural logarithm of the number of pages and an index based
on average prices. The index with linear adjustment by the number of pages exhibits
the highest volatility. This result stresses once more that a single variable adjustment
has to be treated with some caution. The assumption that the price of one page of a
small book is equal to the price of one page of a thick book does not seem to hold.
Therefore, an adjustment with the natural logarithm is more sensible.
•
The following section deals with the application of
– direct comparison of bestseller lists in the rapidly-changing market, and
– a combination of direct comparison and expert judgement in the long-selling
market.
6.1
•
Rapidly-changing market
In the rapidly-changing market, the price collection should be focused on individual
titles that are representative in a given time frame. In the case of individual titles such
a time frame is rather short. Therefore the subset of individual book titles that is relevant for price collection can be very unstable. Suppose that, in the book market, the
universe of price movements could be followed directly. Even then it would not be
easy to make an objective assessment of the price movements in a very volatile market where most of the products have a very short life cycle. The reason is that in such a
market the detection of price movements will probably be more a matter of comparison between different books in subsequent points of time, than instead a matter of
observing the price evolution of the same book during a certain time span. The strategy of price collection should take account of this.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
119
Books
6.1.1
•
Proposal for the identification of consumption segments
Consumption segments can be built following the Task Force “Classification COICOP/
HICP” (2005) with slight modifications:
Chart 91
Consumption segments following the Task Force Classification COICOP/HICP
for the rapidly-changing market
09.5.1
Books
09.5.1.1
Rapidly-changing market
09.5.1.1.1
Fiction
09.5.1.1.2
Non-fiction (including biographies)
Children’s books
•
Alternatively, a different identification of consumption segments is possible. The required information could stem from a research study, which might be available at a
national bookstore association or at a market research institute.
•
Note: The application of a bestseller approach is may be more suitable if a bestseller
list exists. Thus, if a bestseller list is available for children’s books this consumption
segment should be included in the rapidly-changing market. Otherwise it is part of the
long-selling market.
Weighting of the consumption segments
•
Explicit weights are assigned to the consumption segments. Information about weights
can be obtained from the national association of bookstores, market research institutes (e.g. Media Control GfK International) or a big book store. Either detailed data
can be provided or at least purposive weights can be assigned to the consumption
segment based on the experience of the booksellers.
6.1.2
Proposal for the construction of a target sample
•
The suggested sampling method is a bestseller approach.
•
The way best known of keeping in touch with the most dynamic parts of the book
market is the mirroring of readily available bestseller lists. Bestseller lists can take
several specific forms. The price collector will have to decide if a specific bestseller
list is suitable for the task. In any case, check the reliability of the bestseller list provider.
•
The following chart shows the different possibilities for the construction of a target
sample for books in the rapidly-changing market. The choice of the procedure thereby
depends on the availability of bestseller lists at a national or local level.
•
Please note that national bestseller lists are not generally preferred to local bestseller
lists. Both approaches can be used depending on the index to be published. If only
120
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
a nationwide index is to be published then a national bestseller list of all book sales
may be the more appropriate choice. If additionally regional indexes are to be published regional bestseller lists may the better choice.
Chart 92
Different forms of bestseller lists at several levels
bestseller
central
shops
List at a national level
shops
Shop Î list of bestselling books
regional
Shop Î list of bestselling books
Shop Î information provided by shop employees
Shop Î books placed prominently (space)
approximation
If a national bestseller list is used
•
First, relevant models have to be chosen. If there is a national bestseller list available
for the rapidly-changing market then the relevant models are known. Representative
outlet types and particular outlets have to be identified under consideration of the regional dimension in the next step.
Chart 93
When using a national bestseller list
models > outlets
(1) Choose relevant models.
(2) Choose representative outlets, accounting for the regional dimension.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
121
Books
(1) Choose relevant models
•
Select the specific book titles from the national bestseller list by selecting the first n
bestsellers from the list.
(2) Choose representative outlets, accounting for the regional dimension
•
An idea for sampling outlets could be as follows.
– Distinguish between large/small retailers and mail order/stationary retailers.
– Include the largest retailers in any case. Check whether or not the large retail
chains have the same pricing policy in the entire country. If yes, prices can be
centrally observed in big retail chains. If not, prices have to be observed locally
taking into account the regional variance of price developments.
– Include a sample of small retailers. This could be done by probability sampling
if data can be obtained from the business register of your respective national statistical office. If information from the business register is not available, choose
relevant regions and within the regions select typical outlets (just as you do for
other products of the HICP).
– Prices from small retailers should be observed locally in general (under consideration of regional variations in price development).
– If mail order selling is relevant in the market it should be taken into account by
collecting prices from the internet or catalogues additionally.
– In several countries, the prices for books are fixed. In these cases a central price
collection is suitable and it is sufficient to select one outlet. The situation may, of
course, differ from country to country so that a country-specific evaluation should
be conducted before deciding on whether to collect prices centrally or locally.
Chart 94
Decision on central or local price collection
Fixed book prices
or similar price development throughout regions
Yes
Central price collection
No
Local price collection
If a national bestseller list is not used
•
122
If there is no national bestseller list available proceed in reversed order. First, select
representative outlet types as well as particular outlets under consideration of the regional dimension. In these outlets, the most representative books have to be chosen.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 95
When a national bestseller list is unavailable
outlets > models
(1) Choose representative outlets, accounting for the regional dimension.
(2) Choose representative models in the outlets.
(1) Choose representative outlets, accounting for the regional dimension
•
An idea for sampling outlets could be as follows.
– Distinguish between large/small retailers and mail order/stationary retailers.
– Include the largest retailers in any case. Check whether or not the large retail
chains have the same pricing policy in the entire country. If yes, prices can be centrally observed in big retail chains. If not, prices have to be observed locally taking
into account the regional variance of price developments.
– Include a sample of small retailers. This could be done by probability sampling if
data can be obtained from the business register of your respective national statistical office. If information from the business register is not available, choose relevant regions and within the regions select typical outlets (just as you do for other
products of the HICP).
– Prices from small retailers should be observed locally in general (under consideration of regional variations in price development).
– If mail order selling is relevant in the market it should be taken into account by collecting prices from the internet or catalogues additionally.
(2) Choose representative models in the outlets
•
Within the shop, the selection of specific book titles may be carried out in three different ways, depending on the availability of information:
– In some cases shops provide local bestseller lists. The specific book titles can
then be selected according to these local bestseller lists by choosing the first n
bestsellers.
– In other cases information can be obtained from shop employees.
– Alternatively, prominently placed books (e.g. books placed on tables) can serve
as an indicator for how well these books are sold. The rationale for this is that only
the most recent titles with a significant market share are placed prominently. However, please be careful not to select special offers.
Minimum size of the target sample
•
In the rapidly-changing market, a relatively high number of price observations is required in order to cope with the relatively high variance.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
123
Books
Chart 96
Rough guidance for the minimum size of the target sample
National
bestseller list
available
Fixed book prices
throughout the country
one small sample in one outlet
is sufficient
small sample size
unavailable
different bestselling items
have to be collected in
different outlets
large sample size
No fixed book prices
the same items from the
bestseller list are collected in
different outlets
large sample size
different bestselling items have
to be collected in different
outlets
large sample size
Additional aspects
•
The application of bestseller lists can have problematic aspects of its own.
– Firstly, there is the fact that particular bestseller lists can be very heterogeneous.
You may experience an inherent matching problem of individual items caused,
e.g., by different size, prestige or covering of the books. This may, however, not be
a problem on the aggregate level.
– Secondly, particular bestseller lists are representative of a specific sub-sector of
the market. Sometimes such a sub-sector will be representative of the wider general book market, otherwise it will not. For instance it is clear that one has to be
careful in the case that a bestseller list originates from a not well-known internetbookstore.
•
There are some possibilities to solve the variance problem. Please remember that they
need to be handled with care under consideration of their inherent disadvantages:
– Use bestseller lists with a relatively large number of items. A larger sample can decrease variability. The number of observed prices depends on different national
circumstances. For example in the case of countries with regulated prices, less
price observations are needed compared to countries with a fully competitive market.
– Apply appropriate stratification procedures and create several less heterogeneous
sub-samples. Homogenise the bestseller list by applying certain criteria (e.g. hardcover/paperback, pocket/non-pocket, page intervals). This strategy can of course
in principle reduce the representativeness of the sample. Therefore one should
be careful. The following chart shows some possibilities for stratification of the
rapidly-changing market:
124
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 97
Examples for product descriptions for different consumption segments
with stratification
Rapidly changing market
Soft covered books, between 300 and
500 pages of a “normal” size
Bestseller Fiction; belonging to 5 most
sold books, Novels, several types
Bestseller Non-fiction; belonging
to 5 most sold books
a: Paperback, b: Hardback, c: Special
edition
a: Paperback, b: Hardback, c: Special
edition
No. of pages; Length in cm
No. of pages; Length in cm
Bestseller Children’s Book; belonging
to 4 most sold books, several types
a: Paperback, b: Hardback, c: Special
edition
No. of pages; Length in cm
–
–
Apply simple quality adjustment formulas (e.g. division of the price through logarithm of number of pages). Although such strategy can be effective in some cases,
it is in general not as effective as the application of hedonic QA-procedures.
The stability could be further enhanced through the systematic exclusion of outliers. This strategy is however even more risky (can create bias) than the artificial
homogenisation of bestseller lists, and is therefore not advisable.
•
In case that stratification is used and that local price collection is necessary, the local price collectors need to be carefully instructed. They need information on the concrete product specifications for their observations and on the amount of bestsellers
and prices that need to be observed.
•
In order to gain information on characteristics of a certain book the ISBN is a unique
identifier. It can be used to centrally collect the characteristics of a certain book title
in databases rather than locally in the stores. They can, for example, be found on the
internet (e.g. at www.amazon.xx).
6.1.3
Proposal for the replacement strategy
•
Replacements occur quite often – sometimes so often that the sample has been completely renewed after a relatively short time span. The vast majority of replacements
will therefore have an unforced character which means that in most cases the replaced book is still available in the market but no longer a representative bestseller.
•
In the case of bestseller lists one can not speak of one-to-one replacements. Instead,
each month the first n books are chosen from the same bestseller list that was already
considered in the previous month.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
125
Books
•
If no bestseller lists are available the price collectors have to check every month
whether the books of the previous month are still representative by applying the same
criteria as in the previous month (e.g. books placed prominently in the outlet). If they
are no longer representative they have to be replaced by representative ones.
Chart 98
Monthly replacement of bestseller lists (with stratification)
M
M
M
M
...
e.g. paperback
bestseller list
...
e.g. hardback
bestseller list
M
“Fiction”
Consumption segment
target sample
M
M
M
M
M
M
M
M
M
M
change to updated
bestseller list
6.1.4
Proposal for quality adjustment
Direct comparison of bestseller lists
•
The prices of the books from the same bestseller list are directly compared.
•
Calculate the arithmetic or geometric mean of all prices monthly.
•
Compare the arithmetic or geometric mean values over time to calculate the index.
•
The choice between a Jevons- or Dutot-Index is left to the national NSI, since both formulas are allowable and allowed.
6.1.5
Cost estimation
One-time effort for the introduction
•
Approximately one month may be sufficient for the implementation of the method.
Current effort of price collection
•
One to two man-days may be sufficient to collect prices.
Current effort of index calculation
•
126
The index calculation should not take more than one day.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
6.1.6
Examples for the rapidly-changing market
Example for the identification of consumption segments
•
Identify the relevant consumption segments for your country. Use the recommendation of the Task Force on “Classification COICOP/HICP” as a starting point.
•
It is assumed that three consumption segments are relevant for your country:
– fiction books,
– non-fiction books,
– children’s books.
•
In the following, the focus lies on fiction books. In order to deal with the two remaining consumption segments you would need to proceed analogously.
Example for the construction of the target sample
•
In a first step, check whether a national bestseller list is available for fiction books for
your country. If a bestseller list is unavailable, please proceed as described above in
the section on sampling.
•
For this example, a homogeneous bestseller list for fiction books is available. It is,
furthermore, assumed that prices are fixed throughout the country.
•
The top five fiction titles are exemplarily considered from the bestseller list for price
collection.
•
As the price development is similar throughout regions due to fixed prices, the prices
can be collected centrally, for example from a big bookstore.
Chart 99
Fictitious example of the sample of bestseller fiction books
in the base period
Position
Author
Titel
Price
1
Danielle Steel
Roque
17.95 €
2
Paulo Coelho
The Witch of Portobello
20.99 €
3
Robert Menasse
Don Juan de la Mancha
20.99 €
4
Catherine Coulter
Tailspin
16.45 €
5
Henning Mankell
Italian Shoes
24.99 €
Digression on stratification
•
In case that there is no (homogeneous) bestseller list, stratification should be applied
in order to avoid high volatility. The following chart shows some possibilities for stratification of the rapidly-changing market:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
127
Books
Chart 100
Examples for product descriptions for different consumption segments
with stratification
Rapidly changing market
Soft covered books, between 300 and
500 pages of a “normal” size
Bestseller Fiction; belonging to 5 most
sold books, Novels, several types
Bestseller Non-fiction; belonging
to 5 most sold books
a: Paperback, b: Hardback, c: Special
edition
a: Paperback, b: Hardback, c: Special
edition
No. of pages; Length in cm
No. of pages; Length in cm
Bestseller Children’s Book; belonging
to 4 most sold books, several types
a: Paperback, b: Hardback, c: Special
edition
No. of pages; Length in cm
•
In case that stratification is used and that local price collection is necessary, the local
price collectors need to be carefully instructed. They need information on the concrete
product specifications for their observations and on the amount of bestsellers and
prices that need to be observed. The price collectors might be provided with the following price collection form:
Chart 101
Example of a price collection form
Fiction book, hardback, 300 – 500 pages
Author: Paulo Coelho
Title: The Witch of Portobello
ISBN: 978-0007251865
Price: 20.99 €
Number of pages: 320
length: 21.6 cm
Binding:
… binds in board
: hardback
… special binding
Shop: _______________
Date: _______________
•
128
The ISBN is needed for a clear identification of different book titles and its respecttive editions.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Example for the replacement strategy
•
In the case of bestseller lists one can not speak of one-to-one replacements. Instead,
for our example, in each month the first five books are chosen from the bestseller
list originating from the same source that was already considered in the previous
month. This way, the sample remains representative over time.
Chart 102
Fictitious example of the sample of bestseller fiction books
in three sequent months
Position
1
2
3
4
5
•
Month 1
Author / Title
Danielle Steel
Rogue
Paulo Coelho
The Witch of
Portobello
Month 2
Price
Author / Title
17.95 €
Joanne K. Rowling
Harry Potter and the
deathly hallows
20.99 €
Month 3
Price
Author / Title
Price
18.95 €
Joanne K. Rowling
Harry Potter and the
deathly hallows
18.95 €
Paulo Coelho
The Witch of
Portobello
20.99 €
Paulo Coelho
The Witch of
Portobello
20.99 €
Danielle Steel
Rogue
17.95 €
Danielle Steel
Rogue
17.95 €
Robert Menasse
Don Juan de la
Mancha
20.99 €
Catherine Coulter
Tailspin
16.45 €
Cornelia Funke
Inkdeath
17.99 €
Stephenie Meyer
The Host
18.95 €
Henning Mankell
Italian Shoes
19.99 €
Stephenie Meyer
The Host
18.95 €
Henning Mankell
Italian Shoes
19.99 €
The chart above demonstrates the dynamics of bestseller lists:
– Two books remain in the sample in all three months: Rogue by Danielle Steel and
The Witch of Portobello by Paulo Coelho.
– The book Italian Shoes by Henning Mankell is in the sample in the first month,
loses representativity in the second month so that it needs to be excluded and regains representativity in the third month when it is again among the five top selling
books and therefore re-included in the sample.
– The remaining books stayed in the sample either only for one month or for two sequent months.
Example for quality adjustment
•
For the calculation of the index for fiction books the prices from the fiction bestseller
list are directly compared.
– In a first step the arithmetic mean is calculated for months one, two, and three as
follows:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
129
Books
Chart 103
Calculation of arithmetic mean prices, month 1 – 3
–
Month 1:
17.95 + 20.99 + 20.99 + 16.45 + 19.99 = 19.27 €
5
Month 2:
18.95 + 20.99 + 17.95 + 17.99 + 18.95 = 18.97 €
5
Month 3:
18.95 + 20.99 + 17.95 + 18.95 + 19.99 = 19.37 €
5
In the next step, the arithmetic mean values are directly compared to calculate
the sub-index for fiction books for month one, two, andf three:
Chart 104
Direct comparison of arithmetic means for months 1 – 3
Index month 1
(= base period)
=
100.0
Index month 2
= 18.97/19.27
=
98.4
Index month 3
= 19.37/19.27
=
100.5
6.2
Long-selling market
6.2.1
Proposal for the identification of consumption segments
•
Consumption segments can be built following the Task Force “Classification COICOP/
HICP” (2005) with slight modifications:
Chart 105
Consumption segments following the Task Force Classification COICOP/HICP
for the long-selling market
09.5.1
Books
09.5.1.2
Long-selling market
09.5.1.2.1
Dictionaries
09.5.1.2.2
Schoolbooks
09.5.1.2.3
Children’s books (including scrapbooks and albums for children)
09.5.1.2.4
Art books
09.5.1.2.5
Travel guides, reference books
09.5.1.2.6
Other books (includes book binding)
Book series
130
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
•
Alternatively, a different identification of consumption segments is possible. The required information could stem from a research study, which might be available at a
national bookstore association or at a market research institute.
•
In practice, not all consumption segments need to be equally relevant in all countries.
For example, in countries with governmental subsidies for schoolbooks the consumption segment “schoolbooks” might be empty due to low relevance for private consumption.
•
Note: The application of a bestseller approach may be more suitable if a bestseller list
exists. Thus, if a bestseller list is available for children’s books this consumption segment should be included in the rapidly-changing market. Otherwise it is part of the
long-selling market.
Weighting of the consumption segments
•
Explicit weights are assigned to the consumption segments. Information about weights
can be obtained from the national association of bookstores, market research institutes (e.g. Media Control GfK International) or a big book store. Either detailed data can
be provided or at least purposive weights can be assigned to the consumption segment based on the experience of the booksellers.
6.2.2
•
Proposal for the construction of a target sample
Usually, bestseller lists do not exist for books of the long-selling market. As a consequence, in a first step outlets should be chosen which are relevant for the long-selling
market under consideration of the regional dimension. Subsequently, representative
models should be chosen within the outlets for price collection.
Chart 106
Proposal for the construction of a target sample
outlets > models
(1) Choose outlets which are relevant for the long-selling market, accounting
for the regional dimension.
(2) Choose representative models in the outlets.
(1) Choose outlets which are relevant for the long-selling market,
considering the regional dimension
•
Distinguish between large/small retailers and mail order/stationary retailers.
•
Include the largest retailers in any case. Check whether or not the large retail chains
have the same pricing policy in the entire country. If yes, prices can be centrally observed in big retail chains. If not, prices have to be observed locally taking into account the regional variance of price developments.
•
Include a sample of small retailers. This could be done by probability sampling if data
can be obtained from the business register of your respective national statistical of-
Federal Statistical Office, Statistics and Science, Vol. 13/2009
131
Books
fice. If information from the business register is not available, choose relevant regions
and within the regions select typical outlets (just as you do for other products of the
HICP).
•
Prices from small retailers should be observed locally in general (under consideration
of regional variations in price development).
•
If mail order selling is relevant in the market it should be taken into account by collecting prices from the internet or catalogues additionally.
(2) Choose representative models in the outlets
•
For the long-selling market, the “matched model approach” is suitable. This means
that price collection is not focused on e.g. one specific book title but rather on a representative of a particular series or edition. These representatives are deemed interchangeable. This means that the series or edition may serve as product specification
for replacements. The criterion for inclusion in the sample is that the market share is
stable and relevant.
•
The matched model approach is especially suitable for:
– Book series: Series are a numbered sequence of books of the same publisher that,
except for their particular content, have most or all of their objective characteristics in common. Series can be defined around a specific author (Karl May, Agatha
Christie) or around a specific theme (SF-stories, the French science pocket series
Que sais-je?).
– Dictionaries,
– schoolbooks,
– children’s books,
– art books,
– travel guides,
– . . . and the like.
•
Within the shop, the selection of specific book titles may be carried out in different
ways, depending on the availability of information. In most cases, information on longsellers can be obtained from shop employees. In the following, the selection of specific book titles is exemplified:
– For the observation of a book series, select one specific representative of the series. From the point of view of price statistics different titles of the same series
are normally interchangeable: they are (quasi-)perfect substitutes. As long as the
market share of the series is stable and relevant, it will be a good candidate to
be included in a book (sub-)index. It can be expected that pockets have a higher
probability to be included in series than normal paper- or hardbacks.
– For travel guides, for example, select the most sold travel guides. If possible (i.e. if
information is available) use the most recent representatives of a large and well
sold series as for example guides from the Rough Guide Travel Guides series or
the Lonely Planet Country Guides series. If there is no information about this matter then travel guides of those countries could be selected which are most visited
by the citizens of your country.
132
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
–
•
Concerning art books evaluate relevant and stable series, for example the Taschen
Basic Art Series. The latest edition in a series could be considered for price observation.
The selection of price representatives is done in the base period. Once selected the
sample is kept constant for one year or longer – if the chosen books are still representative. In each reporting period, the price of the latest edition or issue shall be reported.
Minimum size of the target sample
•
The sample for the long-selling market can be smaller than the one for the rapidlychanging market as in the latter a bigger sample is needed to reduce variance of the
index numbers.
•
For each consumption segment around three models might be observed in each selected outlet.
Additional aspects
•
If the books selected for one consumption segment within the long-selling market are
very heterogeneous then it may be necessary to stratify the market for example according to their binding. The following chart shows some possibilities for stratification:
Chart 107
Examples for product descriptions for different consumption segments
with stratification
Long selling market
6.2.3
Textbook; German dictionary,
best sold
Textbook; English-German dictionary,
best sold
a: Paperback, b: Hardback, c: Special
edition
a: Paperback, b: Hardback, c: Special
edition
No. of pages; Length in cm
No. of pages; Length in cm
Textbook; Travel guide,best sold
Textbook; Cookbook,best sold
a: Paperback, b: Hardback, c: Special
edition
a: Paperback, b: Hardback, c: Special
edition
No. of pages; Length in cm
No. of pages; Length in cm
Proposal for the replacement strategy
•
The sample will be kept constant during a fixed period (for instance a full calendar
year). During this period the sample is considered to be representative.
•
As already mentioned the latest edition or issue of a book series should be observed;
sequent editions or issues are considered interchangeable. By definition, unforced replacements (a certain book that has lost too much market share to remain representative) do not occur.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
133
Books
•
Forced replacements (the book has disappeared from the market) are possible, but
are not expected to happen very often. A replacement situation occurs only in the case
that a model disappears from the market during the fixed period.
•
Especially for series an update to new series has to be made on a regular basis, approximately once a year. Series or editions of books that are no longer representative
are then replaced by representative ones.
Chart 108
Forced replacement – long selling market
M
“Dictionaries”
Consumption segment
e.g. Langenscheidt English-German
M
...
e.g. Pons English-German
M
M
M
M
no longer available
Forced replacement
6.2.4
Proposal for the quality adjustment
•
Within the matched model approach, different titles of the same series and different
editions are deemed interchangeable. In this case, the prices of the different titles are
directly compared.
•
In case of forced replacements two types of changes can occur within the matched
model approach:
– Minor changes: If there is a forced replacement the obvious strategy would be to
search for a replacement that can be deemed comparable with the disappeared
item and to apply the direct comparison procedure in this case. For example, one
English-German dictionary is replaced by a new edition of the same quality. As the
dictionaries are of the same quality they are deemed interchangeable and direct
comparison may be applied.
– Major changes: Only in those cases where the incomparability between the disappeared item and its replacement is obvious an automatic application of direct
comparison cannot be maintained. In this case, expert judgement needs to be applied, meaning here that by expert judgement an appropriate quality adjustment
has to be chosen.
– Suppose that the sample of a book sub-index contains the cheap paperback
version of a vocabulary. At a certain moment this paperback version isn’t available any more in the market and therefore has to be replaced in the sample,
e.g. by a hardcover version. Other publishers still offer paperback versions,
134
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
–
but due to practical reasons these items will not be used as a replacement
(suppose that the local price collection is based on one of the three book
stores in town and that this book store sells vocabularies of only one publisher). In this case it is obvious that the most appropriate quality adjustment
procedure would be bridged overlap.
Suppose that the only cheap paperback version of a vocabulary on the market
isn’t available any more and therefore all consumers have no choice but to buy
a significantly more expensive hardback version. The application of a bridged
overlap is not straightforward in this situation. After all a significant part of the
consumers would have preferred the cheaper paperback version that has disappeared from the market. At first sight a direct comparison is not straightforward either: The more expensive version has (at least in some dimensions)
more quality than the cheaper version (e.g. it is more suitable to give it a
prominent place in the bookcase in the living room or it can serve as a nice
present) and a relevant part of the consumers will value this. However in this
case one should adopt a pragmatic perspective. Since the consumers have no
choice in the matter, direct comparison is still a sound quality adjustment strategy in this case.
Chart 109
Quality adjustment – long-selling market
M
“Dictionaries“
Consumption segment
e.g. Langenscheidt English-German
...
e.g. Pons English-German
M
M
M
Minor change:
Direct comparison
6.2.5
M
M
no longer available
Major change:
Expert judgement
(e.g. direct comparison
or bridged overlap)
Cost estimation
One-time effort for the introduction
• Approximately one month may be sufficient for the implementation of the method.
Current effort of price collection
• One to two man-days may be sufficient to collect prices.
Current effort of index calculation
• The index calculation should not take more than one day.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
135
Software
7
Software
•
Software appears in two COICOP-classes:
– Application programs belong to COICOP 09.1.3 “Information processing equipment”.
– Video games belong to COICOP 09.3.1 “Games, toys, and hobbies”.
•
Therefore, the approaches for application programs and video games will be described
separately in the following.
•
Regarding application programs, the Task Force results are only tentative recommendations proposing a monthly chaining and re-sampling approach.
•
Concerning video games, the HICP standards rate direct comparison in combination
with bestseller lists as a B-method. No method is classified as an A-method.
•
In the following only pre-packaged software is considered. Pre-packaged software products are purchased ready-made and not customised. Additionally, programs which are
pre-installed on the hardware are not considered, since there are different pricing
strategies of the producers for pre-installed and separately purchased software. As
a general rule, pre-installed software is regarded to be a component of the computer
hardware package.
•
The following section deals with the application of
– direct comparison of bestsellers within “consumer profiles” for application programs,
– direct comparison of bestseller lists for video games.
7.1
Application programs
7.1.1
Proposal for the identification of consumption segments
•
The term “application program” comprises a large variety of computer programs for
different consumption purposes. For the HICP only those application programs should
be considered that are relevant for the private consumer. The relevance of the different application programs for private consumption can differ from country to country.
Hence, the decision on which application programs to consider should be made individually by each NSI, following the circumstances of the national market.
•
The relevant consumption segments can differ from country to country. Information on
which consumption segments are relevant in a country can be obtained from market
research institutes or by performing a small own survey in relevant outlets. As a pragmatic proposal, approximately four groups of software applications could be considered. For example, the following consumption segments could be included:
– home office applications (e.g. MS Office),
– security software (e.g. anti-virus programs),
– education (e.g. language learning software),
– personal finance (e.g. tax preparation programs),
– ...
136
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
•
Consumption segments are identified according to the consumption purpose of the
application programs. For example, the consumption segment “home office applications” includes word processing, spreadsheet applications and the like for managing
a household’s private affairs. Security software serves the protection of the computer
system.
•
Operating systems are not considered as a separate application since the hardware
does not work without an appropriate operating system which is in most cases preinstalled on the computer. Therefore, operating systems are regarded as a part of the
computer hardware package.
7.1.2
Proposal for the construction of a target sample
•
As a sampling method a bestseller approach is suggested as this is a suitable method
to select representative application programs for the target sample.
•
For the compilation of the target sample two approaches are possible, depending on
whether national bestseller lists are used:
– choose application programs first (according to a bestseller list), then sample outlets, or
– sample outlets first, then choose representative application programs in the chosen outlets.
Chart 110
Different forms of bestseller lists at several levels
bestseller
central
shops
List at a national level
shops
Shop Î list of bestselling software applications
regional
Shop Î list of bestselling software applications
Shop Î information provided by shop employees
Shop Î software placed prominently (space)
approximation
Federal Statistical Office, Statistics and Science, Vol. 13/2009
137
Software
•
In the following, both sampling approaches are described in more detail.
If national bestseller lists are used
•
If there are national bestseller lists available then the relevant application programs
are known. Representative outlet types and particular outlets have to be identified in
consideration of the regional dimension in the next step.
Chart 111
When using a national bestseller list
models > outlets
(1) Choose relevant application programs.
(2) Choose representative outlets, accounting for the regional dimension.
(1) Choose relevant application programs
•
Depending on the kind of available bestseller lists, application programs have to be
chosen.
– If there are bestseller lists for each considered consumption segment select the
first n bestsellers from the list.
– If there is a heterogeneous bestseller list containing models of different consumption segments the models from the bestseller list have to be assigned to the respective consumption segments first.
•
Bestseller lists are compiled and published by market research institutes and/or by
computer magazines.
(2) Choose relevant outlets, accounting for the regional dimension
•
Distinguish between large/small retailers and mail order/stationary retailers.
•
Include the largest retailers in any case. Check whether or not the large retail chains
have the same pricing policy in the entire country. If yes, prices can be centrally observed in big retail chains. If not, prices have to be observed locally taking into account the regional variance of price developments.
•
Include a sample of small retailers. This could be done by probability sampling if data
can be obtained from the business register of your respective national statistical office. If information from the business register is not available, choose relevant regions
and within the regions select typical outlets (just as you do for other products of the
HICP).
•
Prices from small retailers should be observed locally in general (under consideration
of regional variations in price development).
•
If mail order selling is relevant in the market it should be taken into account by collecting prices from the internet shops or catalogues of the particular retailers additionally.
138
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
If a national bestseller list is not used
•
If no national bestseller list is used proceed in reversed order. First, select representative outlet types as well as particular outlets under consideration of the regional dimension. In these outlets, the most representative application programs have to be
chosen.
Chart 112
When a national bestseller list is not used
outlets > models
(1) Choose representative outlets, accounting for the regional dimension.
(2) Choose representative application programs in the outlets.
(1) Choose representative outlets, accounting for the regional dimension, as described
directly above
(2) Choose representative application programs in the outlets
•
Within the shop, the selection of specific application programs may be carried out in
three different ways, depending on the availability of information:
– In some cases shops provide local bestseller lists. The specific application programs can then be selected according to these local bestseller lists by choosing
the first n bestsellers.
– In other cases information can be obtained from shop employees.
– Alternatively, prominently placed application programs can serve as an indicator
of how well these application programs are sold. The rationale for this is that only
the most recent programs with a significant market share are placed prominently.
However, please be careful not to select special offers.
Minimum size of the target sample
•
Regardless of the chosen approach, the number of relevant bestsellers might be country-specific. However, in general we can assume that a few models per consumption
segment (e.g. the top two or the top three bestsellers) in each selected outlet are sufficient since the software market in general has an oligopolistic structure and thus the
products within the bestseller lists exhibit a relatively high market share.
7.1.3
•
Proposal for the replacement strategy
A bestseller list (whether a national one or one from a local shop) was used to select
initially relevant products. Those products that are still available and on the bestseller
list in the following month are not replaced. Those products that are not available anymore or have become clearly unrepresentative have to be replaced. Each month comparable replacements of these products have to be found. For this purpose – and only
for this purpose – consumer profiles are introduced.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
139
Software
•
The consumer profiles are based on different levels of the consumers’ consumption
purpose within one consumption segment. In other words, the consumer profiles divide the universe of software users in different sub-categories according to the range
of use of the programs. For example, regarding home office applications the overall
consumption purpose is managing the household’s correspondence and expenses.
But not every household needs a database management system which could be included in a premium version for managing the household’s private affairs. However,
some households want to exploit the whole range of an application program bundle.
Consumer profiles are introduced in order to catch the different “quality levels” in a
developing market.
•
Two consumer profiles are suggested in the following:
– Occasional user: This user is interested in the basic feature(s) of a product.
– Power user: This user is interested in the full scope of a program or software bundle (or product bundle).
These consumer profiles are only relevant for replacement, not for the target sample.
Therefore, the distribution of the market over the profiles does not play a role. The representativity of the sample is ensured by the bestseller list, not by the consumer profiles.
•
•
As a general rule, the replacement should be chosen so that it fits the consumer profile. By this, the comparability of replaced and replacement model is ensured. In order
to also ensure representativity, the first product on the bestseller list fitting into the
consumer profile should be chosen as a replacement.
•
In a replacement situation the following question has to be answered: “Which product
would the respective user buy instead of the one to be replaced?” Usually manufacturers offer an overview which can help assign the different versions to the respective
consumer profiles. The program’s name extensions like “plus”, “premium”, “extended
version”, “ultimate”, “standard”, “home”, “deluxe”, “pro”, etc. can be used as hints
for the assignment.
•
However, specific product knowledge is necessary, since name extensions are only
hints that are not used in a systematic way. Each manufacturer uses different strategies to mark different software bundles. In some cases the “power user”-package
may consist not only of a program (bundle), but of a bundle that offers additional services such as free online support, free updates etc. Information on the disposable versions or editions is available on the internet sites of the respective manufacturers.
•
For example: An unrepresentative deluxe version of an anti-virus program is replaced
by a now representative deluxe version of another anti-virus program. The replacement
is chosen by answering the question “Which program would the respective consumer
buy instead of the one to be replaced?”. This new program should again be chosen
from a bestseller list in order to maintain representativity of the models in the sample.
7.1.4
•
140
Proposal for the quality adjustment
With regard to quality adjustment several reasons suggest that the following quality
adjustment methods are not applicable for software:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
–
–
–
–
•
Hedonics: The quality of application programs is not measurable in physical units,
e.g. measuring operator convenience or the value of a single (new) software feature.
Bridged overlap: The assumption of a competitive market is violated in the field
of software. The small number of suppliers with extensive market shares results in
an oligopolistic market structure.
Option pricing: In general, options are not available in the field of application programs.
Expert judgement: The quality of application programs is difficult to capture. Comprehensive estimations of quality changes would be highly subjective and complex.
Since the above mentioned methods are not applicable in the field of application programs, direct comparison of bestsellers is suggested. The prices of the selected products are directly compared since none of the above mentioned quality adjustment
methods are adequate. In order to compare more or less homogenous models direct
comparison is applied within the consumer profiles. When applying this approach, the
price of a replaced basic version is directly compared with the price of a now representative replacement basic version.
Chart 113
Replacement within consumer profiles according to bestseller lists
Bestseller list
(Month 1)
Bestseller list
(Month 2)
1. MS Office Home and
Student
2. Norton 360 Version
2.0
1. Norton Antivirus 2008
2. MS Office Home and
Student
3. Kaspersky Internet
Security 2009
3. Norton Antivirus 2008
...
“Security software”
Consumption segment
...
M
M
M
M
Consumer profile 1
“Occasional user”
Consumer profile 2
“Power user”
Model change within consumer profile 2:
Direct comparison
Federal Statistical Office, Statistics and Science, Vol. 13/2009
141
Software
7.1.5
Cost estimation
One-time effort for the introduction
•
For the implementation of the method one staff member with some market expertise
is required.
•
One month seems to be necessary to set up an adequate sample and to perform the
suggested method.
Current effort of price collection
•
The price collector should be a trained staff member. There is no need for academic
skills. However, the price collector should have some market expertise.
•
One to three man-days may be sufficient to collect prices.
Current effort of index calculation
•
Regarding the index calculation, the price statistician does not necessarily need broad
econometric knowledge, only moderate skills are required.
•
The index calculation might be done quickly; an effort of approximately 1 man-day
seems to be sufficient.
7.2
Video games
7.2.1
Proposal for the identification of consumption segments
•
Two consumption segments of video games are suggested, according to the required
hardware systems:
– PC games,
– console games.
•
The consumption purpose of all video games is the same, namely entertainment. However, different specifications according to different hardware systems exist, which can
be used as a segmentation criterion. Regulation 1334/2007 states that consumption
segments can be formed around models that are largely described by the same specifications. (For console games a further stratification is suggested [see below]. However, it is not suitable to identify consumption segments by a particular console such
as Xbox or Playstation since the market changes quite rapidly.)
•
An explicit weighting for the two consumption segments is suggested. Information about market shares is available from market research institutes, game magazines or
from a small own survey.
7.2.2
•
142
Proposal for the construction of a target sample
As a sampling method a bestseller approach is suggested as this is a suitable method
to select representative models each month. As for application programs, the construction of the target sample depends on whether or not national bestseller lists are
available.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
Chart 114
Different forms of bestseller lists at several levels
bestseller
central
shops
List at a national level
shops
Shop Î list of bestselling games
Shop Î list of bestselling games
regional
Shop Î information provided by shop employees
Shop Î games placed prominently (space)
approximation
If national bestseller lists are used
•
First, relevant games have to be chosen. If there are national bestseller lists available
then the respective relevant games are known. Representative outlet types and particular outlets have to be identified in consideration of the regional dimension in the next
step.
Chart 115
When using national bestseller lists
models > outlets
(1) Choose relevant video games.
(2) Choose representative outlets, accounting for the regional dimension.
(1) Choose relevant video games
•
Depending on the kind of available bestseller lists, video games have to be chosen.
– If there are bestseller lists for each considered consumption segment select the
first n bestsellers from the lists.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
143
Software
–
•
If there is a heterogeneous bestseller list containing models of different consumption segments the models from the bestseller list have to be assigned to the respective consumption segments first.
Bestseller lists are compiled and published by market research institutes and/or by
computer magazines.
(2) Choose relevant outlets, accounting for the regional dimension
•
Distinguish between large/small retailers and mail order/stationary retailers.
•
Include the largest retailers in any case. Check whether or not the large retail chains
have the same pricing policy in the entire country. If yes, prices can be centrally observed in big retail chains. If not, prices have to be observed locally taking into account the regional variance of price developments.
•
Include a sample of small retailers. This could be done by probability sampling if data
can be obtained from the business register of your respective national statistical office. If information from the business register is not available, choose relevant regions
and within the regions select typical outlets (just as you do for other products of the
HICP).
•
Prices from small retailers should be observed locally in general (under consideration
of regional variations in price development).
•
If mail order selling is relevant in the market it should be taken into account by collecting prices from the internet shops or catalogues of the particular retailers additionally.
If national bestseller lists are not used
•
If no national bestseller list used proceed in reversed order. First, select representative
outlet types as well as particular outlets under consideration of the regional dimension. In these outlets, the most representative video games have to be chosen.
Chart 116
When national bestseller lists are not used
outlets > models
(1) Choose representative outlets, accounting for the regional dimension.
(2) Choose representative video games in the outlets.
(1) Choose representative outlets, accounting for the regional dimension, as described
directly above
(2) Choose representative video games in the outlets
•
144
Within the shop, the selection of specific video games may be carried out in three different ways, depending on the availability of information:
– In some cases shops provide local bestseller lists. The specific video games can
then be selected according to these local bestseller lists by choosing the first n
bestsellers.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Product-specific guidance and examples not using hedonics
–
–
In other cases information can be obtained from shop employees.
Alternatively, prominently placed PC and console games can serve as an indicator
of how well these games are sold. The rationale for this is that only the most recent games with a significant market share are placed prominently. Please be careful not to select special offers.
Minimum size of the target sample
•
The number of relevant bestsellers might be country-specific. However, in general we
can assume that a few models per consumption segment and stratum (e.g. the top two
or the top three bestsellers) in each selected outlet should be sufficient.
Additional aspects concerning console games
•
Concerning console games it is appropriate to further stratify according to the particular console:
– GameCube,
– Xbox,
– Playstation.
•
This stratification should be adjusted when the market circumstances change, e.g. a
new popular console is introduced in the market. The stratification has to be adapted
when new consoles (as for example “Wii”) are gaining relevant market shares.
•
If available, bestseller lists of console games according to the type of console can be
used for sampling. Bestseller lists according to console types can often be found in
magazines or in large outlets. For example, in a specific outlet games available for the
Xbox may be sorted according to their respective sales rank in that outlet. If there is
only an overall bestseller list for all console games the bestsellers have to be assigned
to the respective strata.
•
The console strata are roughly weighted according to their market shares. The weighting
information is available from market research institutes, game magazines or has to be
obtained from an own small survey.
•
For PC games no further stratification is suggested.
7.2.3
Proposal for the replacement strategy
•
When using a bestseller list, choose each month the first (approximately) two to three
games for console games from the same bestseller list in each stratum (depending on
the particular market structure in the specific country). Regarding PC games choose
the two to three bestsellers from the bestseller list. Due to this procedure, there is no
exact one-to-one replacement.
•
If no bestseller lists are available the price collectors have to check every month
whether the games observed in the previous month are still representative by applying the same criteria as in the previous month (e.g. games placed prominently in
the outlet). If they are no longer representative they have to be replaced by representative ones. In this case, the console strata from the target sample should be used as
product specification so that a Playstation game is replaced by another Playstation
game and not by an Xbox game.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
145
Software
7.2.4
Proposal for the quality adjustment
•
Calculate the arithmetic or geometric mean of the prices from one bestseller list or
from the compiled list of the outlets.
•
Directly compare the arithmetic or geometric means over time. The weighted mean
price changes are forming the index.
Chart 117
Replacement within strata according to bestseller lists
“Console games”
Consumption segment
M
M
M
M
M
Bestseller list of stratum 1
(e.g. Game Cube)
M
M
M
M
M
M
M
Bestseller list of stratum 2
(e.g. Playstation)
New game within stratum 2:
Direct comparison
7.2.5
Cost estimation
One-time effort for the introduction
•
For the implementation of the method one staff member with some market expertise
is required.
•
Up to one month seems to suffice to set up the sample and to perform the suggested
method.
Current effort of price collection
•
The price collector should be a trained staff member. There is no need for academic
skills or comprehensive market expertise.
•
One to two man-days should be sufficient to collect prices.
Current effort of index calculation
•
Regarding the index calculation, the price statistician does not necessarily need
broad econometric knowledge, only moderate skills are required.
•
The index calculation might be done quickly; an effort of approximately 1 man-day
seems to be sufficient.
146
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
1
Applying hedonic regression – concepts and guidance for use
The hedonic approach has basically the same aim as other approaches to quality adjustment. That is, the quality adjustment aims at making the price index show pure price
change, unaffected by quality change. The hedonic methods handle this task by using statistical regression analysis. A simple example will show how it works.
1.1
An introductory example
Consider a product group with a sample of 6 models. For times t = 1 and t = 2, prices for
each of the 6 models have been collected as in the following table.
Chart 118
t=1
t=2
Price
Size
Trait_A
Price
Size
Trait_A
390
480
700
550
520
490
23
39
51
39
35
43
0
0
1
0
1
0
290
519
700
550
520
698
23
39
51
39
35
53
0
0
1
0
1
1
Here “Size” is a size measure, while “Trait_A” is a variable with values 1 = “yes” and
0 = “no”, telling whether the model has a particular feature. “Size” and “Trait_A” are
called characteristics variables, as they express characteristics of the models.
In the last line of the table, the model was replaced between time t = 1 and t = 2, as the
model disappeared or became unimportant. By the replacement, the characteristics variables changed, as you can see.
Hedonic regression equation and hedonic function
We make a computer run with statistical analysis software, to fit a regression equation
to the data for time t = 1. The result is this equation:
(1)
ln Price = 5.604 + 0.0155 ⋅ Size + 0.1331 ⋅ Trait_A + ε
Here “ln” stands for natural logarithm. This equation is known as a hedonic regression
equation, and it expresses how the prices of the models depend on characteristics for
the models. The last term ε is called the residual term and is needed as prices are not
exactly determined by characteristics.
The hedonic regression equation may be re-written in this form:
(2)
Pric e = h ( Size,Trait _ A ) + r
= e 5.604 + 0.0155 ⋅ Size + 0.1331 ⋅ Trait _ A + r
Federal Statistical Office, Statistics and Science, Vol. 13/2009
147
Applying hedonic regression – concepts and guidance for use
Here h is a mathematical function, known as hedonic function, which expresses price as
a function of characteristics variables. Further e ≈ 2.718 is the base of the natural logarithm. Again, the last term r is a residual term, with a role similar to that of ε in Eq. (1).
Index calculation – concluding the example
Without quality adjustment, a price index with reference period t = 1 and current period
t = 2 is calculated from the prices in the table as:
(3)
§ 290 519 700 550 520 698 ·
I = ¨
⋅
⋅
⋅
⋅
⋅
¸
© 390 480 700 550 520 490 ¹
1/ 6
⋅ 100 = 102.29
Here the Jevons index formula, or geometric mean index, was used.
Now to apply hedonic quality adjustment, consider the replacement shown in the last line
of the table. We shall use the hedonic function h to compute a quality adjustment factor as
(4)
g =
h (Size of replacement model,Trait_A of replacement model)
h (Size of replaced model,Trait_A of replaced model)
Here plug in the computed expression Eq. (2) for h, and values from the table, to get
(5)
g = e 0.0155 ⋅ ( 53 − 43)
+ 0.1331 ⋅ ( 1 − 0 )
= 1.3339
The price index with quality adjustment by hedonics is calculated similarly to that without
quality adjustment in Eq. (3). But this time, multiply the reference price in the replacement by the quality adjustment factor g = 1.3339. The calculation thus becomes:
1/ 6
(6)
§ 290 519 700 550 520
·
698
¸¸
⋅
⋅
⋅
⋅
⋅
I = ¨¨
© 390 480 700 550 520 490 ⋅ 1.3339 ¹
⋅ 100 = 97.49
This completes the example.
Characteristics variables – continuous variables and dummy variables
As is seen in the example, characteristics variables play a crucial role in hedonics. These
variables serve to express quality of the models, so that quality change can be mathematically adjusted for. The idea of hedonics is to quality adjust the price index, so that the index shows price change corresponding to unchanged characteristics.
The example demonstrates that characteristics variables may be of two main types:
•
“Size” is a continuous variable, that is, a measure of some kind which may take various numerical values. Example: Screen size of a TV set.
•
“Trait_A” is a dummy variable, that is, a variable with only two possible values, 0 or 1,
indicating “no” or “yes”. Such a variable typically tells whether or not the product
model in question has some particular feature. Example: Stereo sound in a TV set, yes
or no.
148
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Some more terminology
The numbers 0.0155 and 0.1331 on the right of Eq. (1) are known as regression coefficients. They express the impact of the characteristics variables on the price. This will be
further described in Section 1.3.3, subsection “Assess the hedonic function at design
stage – plausibility of results”.
The characteristics variables used in hedonic regression are regressors, in the wording of
general regression analysis textbooks. Alternatively they are often called explanatory variables, as the regression equation “explains” price variations in terms of their variations.
In general regression theory, a regression equation is usually said to express a “regression
model”. However, in this handbook the expression regression equation will be preferred,
and “regression model” will be avoided. This is in order not to confuse with the priced
product models considered all the time, such as TV models, dishwasher models etc.
A synonym for “residual term” is “error term”, not to be confused with other uses of the
word “error” in statistics.
Role and meaning of the hedonic function
To summarise the above example in one sentence: The procedure is that first a hedonic
function h is computed from collected data, and then the computed hedonic function is
used for quality adjustment in the index calculation.
The hedonic function h thus serves as an intermediary step in the process, to yield the
quality adjustment factor g as defined by Eq. (4). It may be mentioned that the hedonic
function h also has an independent meaning in itself. As is shown by Eq. (2) the hedonic
function h reproduces the prices up to an “error”, measured by the residual term r.
The value of the hedonic function h for a given set of characteristics may then itself be interpreted as a price. This is a theoretically expected, or “predicted”, price for a model with
that set of characteristics. However, in using the type of hedonic method used in the example (hedonic re-pricing, cf. Section 1.2 below), we actually do not have to think much
about this fact in the practical calculations.
It should also be noted that the fictitious example given here is simplified, to demonstrate
the basic principles of hedonics. In actual applications of hedonics one uses considerably
larger sample sizes than just 6 models as in the example. Also, mostly more than two characteristics variables are used, although sometimes in actual practice using just two is relevant.
An authentic example of a hedonic regression equation
The following regression equation is an authentic example, expressing how prices P of
computer models depend on the characteristics “speed” and “memory”:
(7)
lnP = a + 0.783 ln( speed ) + 0.219 ln( memory ) + ε
The interpretation of this equation is similar to that of Eq. (1) in the example above. The
computer example is a simplified form of the hedonic regression equation found by
Dulberger (1989) and cited by Triplett (2006).
Federal Statistical Office, Statistics and Science, Vol. 13/2009
149
Applying hedonic regression – concepts and guidance for use
1.2
Types of hedonic methods
There are several technically different possible ways to use hedonic functions for quality
adjustment. The basic aim is the same for the types of hedonic methods, namely, to give
an index for price change at unchanged quality in terms of product characteristics. However the methods differ technically in the way they accomplish this.
Four main types of hedonic methods can be identified, namely the following (Triplett 2006;
cf. National Research Council 2002):
•
The hedonic time dummy variable method.
•
The hedonic characteristics price index method.
•
The hedonic price imputation method.
•
The hedonic re-pricing method.
In practice it should mostly be suitable to try to keep to one of the methods throughout.
The hedonic re-pricing method is here suggested first choice among the four methods.
This is the method used in the introductory example above. This method works by modifying observed prices in proportion to a hedonic function. Important advantages of the
Hedonic re-pricing method are that it works transparently, and that it can be applied flexibly to meet practical conditions.
The four types of hedonic methods and the choice between them are described and discussed more in detail in Annex 1.
1.3
Topics for consideration in applying hedonic methods
To apply the hedonic approach, one needs to consider and decide on several issues. Those
issues may be grouped into five topics:
•
Topic 1 – Data needed.
•
Topic 2 – Data editing.
•
Topic 3 – Creating the hedonic function.
•
Topic 4 – Index calculation.
•
Topic 5 – Refreshing the hedonic function.
In the following these five topics will be dealt with one by one. It is assumed that Hedonic
re-pricing is the method to be used.
Design stage and operative stage
Normally one should at least attempt to distinguish between design stage and operative
stage of the work. The design stage comes before the operative stage. It is at the design
stage that decisions are made on methods, variables etc. The work at the design stage
results in a plan and process instruction for the work at the operative stage. The operative
stage is when index numbers for presentation are actually produced from currently or recently collected price data.
150
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
The considerations according to the five topics should as far as possible be completed
at the design stage. The plan produced at the design stage should be thoroughly considered and tested, so that it can be followed without problems at the operative stage.
A distinction between design stage and operative stage is essential not least to the credibility of the index. There should be no room for suspicion that the choice of methods was
unduly influenced by desirability of results. Statistical theory is largely valid under the
condition that methods are fixed before one encounters actual data on which statistical
results are to be based.
1.3.1
Topic 1 – Data needed
Data are needed for two uses:
•
fitting the hedonic regression equation,
•
monthly index calculation.
Data needed for fitting the regression equation – price and characteristics variables
For fitting the hedonic regression equation, data on both price and characteristics variables are needed for a sufficiently large set of models of the product in question. This set of
data is to be used in a statistical regression analysis which yields the estimated hedonic
regression equation. The result is a hedonic function which is to be used in monthly index
calculation.
The regression analysis and re-computation of the hedonic function is typically not performed monthly but with longer time intervals in between. In the periods when this is to
be done, a larger sample may have to be used than in regular monthly index calculation,
due to the requirements of the regression analysis.
Data needed for monthly index calculation – price and characteristics variables
Also in the monthly data collection for index calculation, values of the characteristics variables have to be collected along with prices. Specifically, values of characteristics variables have to be collected when product models are replaced in the price collection, such
as when a model in the sample is no longer for sale or is no longer chosen by many consumers. In such replacement situations, characteristics of the replacement model are collected for use in the quality adjustment by hedonics.
How to select characteristics variables
A crucial question is what characteristics variables to collect and use in the hedonic function. Some initial selection is called for, to avoid overambitious data collection which
would be unnecessarily costly to the index producer and to respondents. Notably, not all
kinds of variables would be relevant anyhow.
The selection of characteristics variables has to be considered with regard to the features
of the particular product area and market in question. Differences in market conditions
could potentially lead to differences between countries regarding what choice is most adequate.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
151
Applying hedonic regression – concepts and guidance for use
However, generally the following criteria are relevant for the characteristics variables to
be selected:
•
They should reflect quality in a consumer perspective.
•
They should be perceived by consumers.
•
They should be related to the price.
•
They should not reflect fashion.
•
They should not reflect production cost not giving consumer value.
To keep fashion variables out is necessary to avoid a long-term bias. Fashion features
such as skirt length can oscillate forth and back over time, and changes in both directions
cannot be considered lasting quality improvements (cf. Guédès 2007). The borderline between quality and fashion may not always be entirely clear, so here again some sound
judgement is required. To take another example, if the colours of laptop computers change
over time, this should probably be seen as due to fashion – unless it can really be shown
that certain colours have a lasting higher value to consumers.
It is not the aim to find variables to account for all price differences. It is only the qualityrelated differences that are to be captured by the characteristics variables.
Variables expressing production cost
Variables expressing production cost do not necessarily reflect quality to the consumer.
Where they do not, they should not be quality adjusted for, even if they may influence the
price. For instance transport of raw materials needed for a product may influence the price
but does not constitute quality to the consumer.
Nevertheless it may occur that variables related to production cost are also related to quality to the consumer, such as the kind of material in a garment. Such variables should of
course be considered to be quality adjusted for. Some support for sometimes using production cost variables is supplied by Gordon (1990, Sect. 2.5). He shows theoretically
that under certain conditions, the user-value and resource-cost criteria will tend to give
equivalent results.
Omitted variables bias
The set of characteristics variables should be complete enough to reasonably fully account
for occurring quality differences between models. An incomplete set of variable may entail
omitted variables bias.
An example could occur for computers. Suppose the hedonic function accounts for clock
rate in gigahertz (GHz) but not for processor type. Now, a computer with a processor of
more advanced type may have higher performance than a computer with a processor of
less advanced type, even if the clock rate is not higher. Not recognising this, and then
omitting indication of processor type, can make the quality adjustment go wrong.
Knowledge about products and market conditions may generally be essential to avoid
omitted variables bias. Relevant product features are often likely to be announced in advertising etc. which should be followed.
152
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Intangible product properties corresponding to “subjective” consumer value may present
some difficulty. They may be essential to consumer perception of quality, but they may not
so easily be measured by some characteristics variables.
Brand dummy variables
In some cases it may be motivated to include dummy variables for groups of brands as
characteristics variables. This could be the case if one of these conditions applies:
•
Brand may be a proxy for intangible aspects of quality, related to “subjective” consumer perception.
•
It may sometimes be necessary to control for brand to make the effect of other variables show up correctly.
Use of dummy variables for brands is somewhat controversial among experts. An argument against such use is that brand does not in itself express useful quality, and then accounting for brand may disturb the impact of other variables. However, a conclusion for
HICP practice is that dummy variables for brand groups may be motivated under each of
the two conditions just mentioned.
The price variable
There is a somewhat fine point on the price variable, regarding the treatment of price reductions. The ideal requirements on the price variable are slightly different between the
use for fitting the hedonic regression equation and the use in monthly price calculation.
For the monthly index calculation, there are rules for the treatment of reduced prices in the
Regulation on price reductions (Commission of the European Communities 2000). The rules
there state that the prices used shall be as after deduction of such price reductions that
are available to all consumers. This applies to prices reduced due to sales, stock clearances etc.
For fitting the hedonic regression equation, the matter is somewhat different. It could reasonably be argued that here it might be preferable to take prices as they are before such
short-lasting price reductions that are due to temporary promotion offers, stock clearances, going out of season or out of fashion, or similarly. The reason is that in order to accurately reflect market value of quality, prices should reflect a state of equilibrium, that
is, a state of balance between supply and demand. Stock clearance prices etc. do not.
Or to put it in other words, retailers and producers occasionally have logistic reasons
(aging overstock, large deliveries etc.) to speed up business by offering bargains, with
prices considerably below a sustainable market value. It may then be preferable to exclude those bargain prices, as it seems at least questionable if they duly reflect the quality
of the products.
However this may be a somewhat subtle point, and there may be special cases depending
on the kind of price reduction. Also, in practice there may be limitations on what can be
done in price collection here. Nevertheless it may under some conditions be possible to
collect the desired information on causes of price reductions.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
153
Applying hedonic regression – concepts and guidance for use
Number of price observations needed
Typically more price observations are needed for fitting the hedonic regression equation
than for the monthly index calculation. In the index calculation the law of large numbers
ensures that sampling errors tend to cancel out on higher levels of aggregation. In the regression analysis one cannot be sure it works quite that way. If the sample for the regression analysis is too small, the hedonic function will be dominated by noise and cannot
reflect quality differences.
A general rule of thumb may be stated as:
For regression analysis use a sample so large that there are at least some 15 to 20 price
observations per characteristics variable in the regression equation.
This rule of thumb assumes that the sample is statistically fairly efficient, as would be the
case for a simple random sample in a homogenous stratum of models. If the sample is
less efficient a correspondingly larger sample is called for.
There is some basis in statistical theory for the rule of thumb; cf. Annex 2.
1.3.2
Topic 2 – Data editing
Data have to be edited and prepared in some ways to be rendered suitable for use in the
computations: Fitting the hedonic regression equation, and calculating the index.
Identify spurious outliers (as usual)
The usual data cleaning procedures used in index calculations apply as well when the
hedonic approach is used, of course. Indeed the data cleaning should in principle be
performed at least as carefully as otherwise.
This applies to the treatment of outliers, in the sense of observations with unusually high
or low prices or price changes. Usually in index calculation, such outliers are picked out
for verification, as they may be due to mistake or error in observation or recording. This
cleaning is essential not least before fitting the hedonic regression equation, as regression
analysis in its usual form is notably sensitive to outliers.
Transform variables by logarithm
The hedonic regression equation usually contains the logarithm of the price. Sometimes it
may also contain the logarithm of other variables. For that reason the price, and possibly
other variables, have to be transformed by logarithm before the regression analysis.
Transform category variables into dummy variables
The original data sets are likely to contain variables expressing category or type in some
respect. Examples are type of screen on a TV set, and type of cd-rom or dvd drive in a computer.
To facilitate the application in the hedonic approach, it is convenient to transform such
category variables into sets of dummy variables. Then define one dummy variable for each
category except one of them. The excluded category serves as a reference to which the
other categories are compared.
154
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Example: Type of drive on a computer. If there are n drive types occurring in current data,
then define n – 1 dummy variables: One dummy variable for each drive type, except for
one drive type. Each dummy variable indicates presence of the drive type in question.
Tricks to edit down the number of characteristics variables
It is often a good idea to try to have relatively few characteristics variables in the hedonic
regression equation. Some tricks for achieving this by data preparation are:
•
Group together alternative features into groups with one dummy variable per group,
instead of having one dummy variable for each single alternative.
Example: Brands. There may be tens or hundreds of brands, so normally do not define individual dummy variables for single brands. Instead define a few broad brand
groups. Perhaps it could suffice with just three brand groups, such as brands of low,
medium or high status, respectively.
•
Select among or take average of variables that are closely related to each other.
Example: Size measures. The size of a book could be measured in several ways, such
as number of pages, physical weight, height of cover, width of cover, area of cover,
physical volume etc. Do not use all those variables in the hedonic regression equation, but instead choose perhaps two of them, or perhaps the geometric mean of two
of them.
•
Consider representing the rank of ordered categories as one continuous variable, instead of using separate dummy variables.
Example: Energy economy rating for appliances. Instead of defining dummy variables
for rating categories, consider treating the rank of the rating category as one single
continuous variable. Some care should be taken to check by judgement that the stepsize does not become vastly varying over the range.
Generally, do not include variables that are actually constant or practically constant.
The desirable feature of using few variables is sometimes known as parsimony. This may
enhance the accuracy and keep down the necessary number of observations in the estimation of the hedonic function.
Missing values in fitting the hedonic regression equation
In practice it may occur that for some of the product models in the sample, information
is not available on all of the characteristics variables. For instance, a cheap TV set sold in a
hypermarket might not be accompanied by a full technical specification with all details
such as screen resolution etc. Hence some values of characteristics variables may be missing. However, regression analysis in its usual form can use only observations that have
values on all the variables occurring in the regression equation.
Dropping all observations (product models) with some value missing is of course a potential way to deal with the problem. This could perhaps do if observations with missing values occur only exceptionally. But usually this is not a desirable solution, as it may lead to
selection bias and bad precision.
Imputation in a simple pragmatic form should mostly be a better practice. It could be carried out in this way:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
155
Applying hedonic regression – concepts and guidance for use
•
For each continuous variable that may have missing values, take two actions: (a) Define a dummy variable indicating whether or not the value is missing; and (b) For the
continuous variable itself, impute (insert) a constant for all missing values. (Note 1:
The choice of the constant is not critical but should be in the natural range of the variable. Note 2: If there are two or more continuous variables which can be missing only
together, use just one dummy variable for their joint missingness.)
•
For each dummy variable that may have missing values: Impute a default value for all
missing values. (Note: The default value chosen for imputation should normally correspond to a “plain” model of the product, without more luxurious or unusual features.)
•
Optionally, one may consider defining dummy variables indicating missing values also
in sets of dummy variables. Example: Brand groups. A separate dummy variable could
be defined to indicate “no brand”.
This simple approach may often be adequate. An advantage is that the process is fairly
transparent, and one has some common sense control of what happens here. The choice
of default values for dummy variables should be done using product knowledge.
Imputation by more advanced statistical techniques may also be considered, although they
are probably not yet so much corroborated for use in hedonic regression. Such techniques
may rely on assumptions which the potential implementer should be well aware of and
should then assess in the light of product knowledge.
Missing values in index calculation
In the monthly index calculation, missing values for characteristics variables may often
be less of a problem than in fitting the hedonic regression equation. In the index calculation characteristics variables are used for quality adjustment when models are replaced.
Thus imputation should often not need to be considered in index calculation. Nevertheless,
under some conditions the same imputation as for the regression may have to be used
also in index calculation.
Remember to update the editing process
The development in the markets alters the conditions also for the data editing process.
For instance, category variables may take more possible values than before, as models
with newly developed features show up on the market. Then one has to decide on how to
handle this in the transformation to dummy variables. New dummy variables may have
to be created, while others may have become obsolete and can be dropped. Or perhaps
new features may be grouped together with existing ones, and be represented by the same
dummy variable.
Brand groups (cf. Section 1.3.1, subsection “Brand dummy variables”) are likely to deserve particular attention, if they are used. In some markets new brands emerge all the
time, and then the new brands have to be classified into appropriate brand groups. This
classification is a judgemental task which requires knowledge of markets and products.
156
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
1.3.3
Topic 3 – Creating the hedonic function
Creating the hedonic function is naturally a key issue. The hedonic function has to be designed and assessed to ensure that it can adequately reflect quality differences. In this
work it is essential to have a reasonably thorough understanding of statistical regression
theory, as it is treated in textbooks such as those by Ryan (1997); Draper & Smith (1998);
Kutner et al. (2004). The following treatment can only highlight a few essential points, and
for further advice and explanations the reader is referred to the textbooks.
Again – how to choose characteristics variables
Again, the choice of characteristics variables is a crucial issue. In the treatment of Topic 1
in Section 1.3.1 above, principles for choosing characteristics variables were discussed.
And in the treatment of Topic 2, it was shown how the characteristics variables may be
modified by editing.
For the design of the hedonic function it has to be more definitely decided on which variables to include in the hedonic function, and in which form. And again a primary concern
should be to select the variables so that they reflect essential quality differences between
product models.
These important considerations must to a large extent rely on knowledge of the product
area. Also, specific advice for such considerations has to be given separately for each
product area and take account of the particular product aspects there. Nevertheless, the
view should be that of consumers rather than that of product engineers. Thus the characteristics variables in the hedonic function should at least indirectly reflect quality as it
makes sense to consumers.
Choosing the function form – normally choose a log-linear form
The choice of function form for the hedonic regression equation is usually likely to be less
critical than the choice of characteristics variables. A standard choice should mostly be
adequate.
A semi-logarithmic hedonic regression equation looks like this:
(8)
ln P = b0 + b1 ⋅ z 1 + b2 ⋅ z 2 + ... + bk ⋅ z k + ε .
Here ln stands for natural logarithm, P is the price, and z1 , z2 , ... , z k are characteristics variables, which may each be either continuous or a dummy variable. Further
b1 , b2 , ... , bk are regression coefficients to be estimated by fitting of the regression
equation to observed data, and ε is a residual term.
A double-logarithmic hedonic regression equation is similar and can alternatively be used
when the characteristics variables are continuous. In this form logarithms are used also
for the characteristics variables, so the equation looks like:
(9)
ln P = b0 + b1 ⋅ ln z 1 + b2 ⋅ ln z 2 + ... + bk ⋅ ln z k + ε .
The double-logarithmic form has to be modified if dummy variables are present, as the logarithm is then not taken for dummy variables. For example, with two continuous characteristics variables and one dummy variable, the hedonic regression equation would be like:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
157
Applying hedonic regression – concepts and guidance for use
(10)
ln P = b0 + b1 ⋅ ln z 1 + b2 ⋅ ln z2 + b3 ⋅ z 3 + ε .
The semi-logarithmic and the double-logarithmic forms, together known as log-linear
forms, may generally both be seen as recommended choices. They seem to be the forms
most commonly used in hedonic regression, and there seems to be good reasons for this,
for instance considering the behaviour of the residual term (cf. end of next subsection).
The semi-logarithmic and double-logarithmic forms both yield regression coefficients that
have meaningful independent interpretations. This is an advantage for transparency and
in assessment of the performance in practice.
A further notable property of the semi-logarithmic and double-logarithmic forms is this:
If all prices are multiplied by the same constant factor, then all the regression coefficients
b1 ,..., bk are unaffected, and only the constant term b0 changes. This should be a desirable property, as the relative impact of quality characteristics on the price should not have
anything to do with the purchasing power of the Euro.
The choice between the semi-logarithmic and double-logarithmic forms may be made case
by case. The semi-logarithmic form may have a slight transparency advantage, in treating
continuous variables and dummy variables equally (cf. ILO et al. 2004, Sect. 7.98).
On the other hand, the double-logarithmic form may be more suitable when continuous
variables may vary over wide ranges, with high ratios between upper and lower bounds.
In use of the double-logarithmic form, be careful if some continuous characteristics variables may assume zero value, as logarithm of zero is undefined. A possible remedy is
then to do as suggested for missing values above (Section 1.3.2, subsection “Missing values in fitting the hedonic regression equation"): For each continuous variable where zeros
may occur, define a dummy variable to indicate zeros, and impute a positive constant for
the zeros.
Other function forms
As the log-linear function forms just described should be adequate in most cases, other
function forms should not often need to be considered. However potentially several alternatives would be possible.
In a plain linear hedonic regression equation, logarithms are not used, and it looks like:
(11)
P = b0 + b1 ⋅ z 1 + b2 ⋅ z2 + ... + bk ⋅ z k + ε .
This form has found some use, but it may generally be seen as less adequate than the
log-linear forms described above. An indication of inadequacy is that the hedonic function may here turn out to predict negative prices, which does not make sense.
Using a Box-Cox transformation for the price on the left-hand side could be seen as a generalisation of the semi-logarithmic and plain linear forms. This transformation is defined
by a control parameter λ as:
158
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
(12)
­P λ − 1
°
°° λ
®
°lnP
°
°¯
for λ > 0
for λ = 0 .
For λ = 0 using this transformation would give the semi-logarithmic form, and for λ = 1 essentially the plain linear form. Those two alternatives now appear as opposite extremes
of a continuous range of alternative function forms, obtained by varying λ . On transformations, cf. further Ryan (1997), Ch. 6; Davidson & MacKinnon (1993), Ch. 14; and Emerson
& Stoto (1983).
Box-Cox transformations are probably of at most secondary relevance for hedonics, although the control parameter λ could be optimised for best fit to data. A reason is that
on intervals of relevant length, correlations tend to be high even between functions of notably different form. Then probably not so much is to be gained by fine-tuned transformations.
In other applications of regression analysis than hedonics, such as econometrics, different
criteria are in use for choosing function form (cf. Kennedy 2008). One such criterion is to
stabilise the variance of the residual term, or in other words, to eliminate heteroskedasticity; see subsection “Residual analysis – interpretation” a few pages below. In
practice it may be difficult to formally apply such criteria to hedonic applications, and normally this should not be urgently needed. Nevertheless, it may be noted that from general
theoretical plausibility arguments, the log-linear function forms should often be expected
to be suitable for hedonics also in view of the mentioned criterion.
Estimating the hedonic function
The actual fitting of the hedonic regression equation, yielding the estimated hedonic
function, is carried out by a computer run using some standard software package for statistical analysis. Regression analysis is available in several well-known such program
packages, such as Minitab, R, SAS, SPSS, STATA, and SYSTAT.
In practice this means the following. For example, suppose that the hedonic regression
equation to be fitted is of the semi-logarithmic form,
(8)
ln P = b0 + b1 ⋅ z 1 + b2 ⋅ z 2 + ... + bk ⋅ z k + ε .
A data table (computer file) is prepared containing one row for each product model in
the sample. The table has columns containing the values of ln P and the values of the
characteristics variables z1 , z2 , ... , z k . The computer program is set up to perform a
linear regression analysis on the data in the table, with the column containing ln P as
“dependent” variable and” z1 , z2 , ... , z k as “independent” variables. The output of
the computer run consists in the estimated values of the regression coefficients
b1 , b2 , ... , bk . And along with them there is usually a lot of additional information
produced.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
159
Applying hedonic regression – concepts and guidance for use
The computer run for the regression analysis is in a way the easiest part of it all. The user
just has to feed in the data, and the computer does the work. The more difficult part consists in the considerations needed before and after, as is described in this chapter, to ensure that it all makes sense.
The computational method used by the computer program is normally a classical method
known as ordinary least squares (OLS). The guiding principle of that method is to determine the regression coefficients so that the sum of squares of the values of the residual
term becomes as small as possible (cf. e.g. Ryan 1997, Sect. 1.4). But that all takes place
inside the computer. For deeper views on the estimation theory, see McCullagh & Nelder
(1989).
Weighted regression?
Occasionally some kind of weighting information may be available for the product models.
For instance sales volumes may be known for the individual models. In index calculation it
is often seen as advantageous to use such weighting information where it is available.
On the other hand, for the regression analysis the first choice should be not to use
weighting information.
Some explanation is called for. There are forms of regression analysis that can use
weighting information for the observations, available in standard software packages. The
question is whether to use this option or not.
A basic fact from the classical Gauss-Markov theorem (Wilks 1962, Sect. 10.3; Davidson &
MacKinnon 1993, Sect. 5.5; Kutner et al. 2004) is this: Under the condition that the regression equation is “correct” in the sense that prices truly tend to follow a law of the
same form (but with unknown true coefficient values), then un-weighted estimation of
the regression coefficients is unbiased. So under that condition weighting is not needed
to yield unbiased coefficient estimates.
A regression equation is never perfectly correct in the mentioned sense. But if it has been
deemed correct enough to be applied at all, it may be correct enough also for un-weighted
coefficient estimation. Furthermore it follows from theoretical work by Nordberg (1989)
that un-weighted estimation of regression coefficients is unbiased also under somewhat
more general conditions. Essentially such a condition is zero correlation between residuals
and weights.
Regarding precision, again the Gauss-Markov theorem just mentioned asserts that under
the condition of the regression equation being correct, un-weighted regression gives the
best precision. An intuitive way to realise this is to note that in a weighted analysis, a few
observations with large weights may dominate, so that the sample size becomes effectively small there.
But there is still an issue of robustness. Sometimes observations with small weights may
be likely to be exceptional outliers. Those may disturb unduly if they are used without account for their low weights.
A conclusion could be this: Normally do not use weights. However, if models with very low
weights are likely to be exceptional (outliers in price), then possibly consider using
160
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
weights with a suitably chosen low upper limit. In the latter case, retain weights below
the limit, and replace those above the limit by a constant equal to the limit.
It should be mentioned that this advice may not be quite uncontroversial, as some experts
advocate use of weighting. Also, the advice given here to avoid weighting is restricted to
the hedonic re-pricing method. For the Hedonic time-dummy variable method (cf. Annex 1),
probably available weights should normally be used.
Assess the hedonic function at design stage – principles
At the design stage one should perform test runs of the regression analysis, using available data from previous periods, from pilot surveys or so. This should be done to assess
whether the hedonic regression equation can be considered to work adequately: whether
it fairly fits data, is not disrupted by disturbances, etc.
A lot of information is usually produced in regression runs. In the test runs, this information should be examined carefully, as it could reveal possible problems with the choice
of variables, impurities in data, or other design aspects. Some primary issues on this will
be discussed in the following.
The assessment should essentially be completed at the design stage. Nevertheless some
monitoring should be in place also at the operative stage, in the production of actual indices to be presented. The latter efforts may be seen as a complement to data editing, to
ensure that unexpected deficiencies in data or processing do not go undetected.
The R-square (R2)
The R-square, or R2, is a number produced in the regression analysis, along with the regression coefficients. R-square is a number between 0 and 1. A high R-square indicates that
the regression equation fits closely to the data, as then the residual term is generally
small. Then the hedonic function closely follows the prices.
An R-square equal to 0 would mean that the hedonic function is constant. Then the chosen
characteristics variables are not at all related to the price, and so the hedonic function
can say nothing about price variation. The opposite extreme, an R-square equal to 1,
would mean that the hedonic function is always equal to the price. This would be as if
one monopolist sets all prices according to one simple tariff, and it should normally not
happen in real markets.
The R-square is among the first things to look at to assess a regression run. The Rsquare should be fairly high, to indicate that the hedonic application may work properly.
How high R-square should be expected may vary between product areas, and this has to
some extent to be based on experience. General rules of thumb could hardly be given. In
some cases even an R-square as low as 0.25 may be adequate, while in other cases one
would rather expect a figure around perhaps 0.75.
It should be stressed that a higher R-square is not always better than a lower R-square.
For instance, it may in some cases be more adequate to run separate regressions for each
of two strata, rather than one regression on both strata pooled together, even if the Rsquares then become substantially lower.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
161
Applying hedonic regression – concepts and guidance for use
Which R-square – take the “adjusted” one
Actually two R-square numbers are presented in parallel: The “unadjusted” R-square, and
the lower “adjusted” one (see, e.g., Ryan 1997, Sect. 7.4.1).
Suggestion: Mainly use the adjusted R-square, rather than the unadjusted one. The adjustment corrects for the fact that due to finiteness of the sample, the “raw” R-square
tends to be higher than what would be expected for a very large sample. The adjusted Rsquare may potentially even become negative, and just like a zero or very low R-square,
that outcome indicates that the characteristics are practically unrelated to the price.
Nevertheless, one may quickly glance at the unadjusted R-square as well, just to check
that it is not unexpectedly extremely close to 1, with a value of 0.99 or so. Such a high
value could possibly indicate some kind of degeneracy in data, leading to invalid results.
Assess the hedonic function at design stage – plausibility of results
Assess the plausibility of the regression coefficients. The regression coefficients have
an independent interpretation and could thus be assessed in view of product
knowledge and common sense.
Consider again the hedonic regression equation obtained in the introductory example
above.
(1)
ln Price = 5.604 + 0.0155 ⋅ Size + 0.1331 ⋅ Trait_A + ε .
This hedonic regression equation is of semi-logarithmic form (as defined above). For such
an equation, the regression coefficients can be interpreted in the following simplified approximate way:
•
The regression coefficient 0.0155 for the continuous variable Size means this: If
Size increases by one unit, then the expected price increases by approximately
0.0155 . 100 = 1.55 %.
•
The regression coefficient 0.1331 for the dummy variable Trait_A means this: If the
feature indicated by Trait_A is added, then the expected price increases by approximately 0.1331 . 100 = 13.31 %.
So it is possible to assess whether the results make sense from a point of view of product
knowledge. However some estimation error must be allowed for; see next subsection.
Particularly, beware of coefficients with “wrong sign”. If increment in a variable should
mean higher quality and higher expected price, then the estimated coefficient should be
larger than zero. If instead it has a minus sign in front of it and is thus smaller than zero,
then this should be taken as a warning. Possibly there could be some disturbance from
omitted variables bias (cf. Section 1.3.1, subsection “Omitted variables bias”), or from
multicollinearity (cf. subsection “Use diagnostics for multicollinearity”, shortly below), etc.
For further discussion, see Kennedy (2008, Sect. 22.3 [with Notes]).
It may be noted that the interpretation of coefficients just described involves rather rough
approximations, with the limited aim of facilitating quick assessments of plausibility. For a
closer discussion on interpretation of coefficients of dummy variables, see Kennedy (2008,
Sect. 15.1 [with Notes]).
162
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Test statistics – use with care
The regression run usually also provides estimated standard errors for the estimated regression coefficients. There may as well be “t-values” which are in principle ratios between
the regression coefficients and their standard errors.
The standard errors indicate the uncertainty or “estimation error” in the estimation of the
regression coefficients. A “margin of error” by usual standards corresponds to a double
standard error. Thus under certain conditions, the true coefficient should mostly lie within
the interval given by the estimated coefficient plus/minus two times the standard error.
The t-values are used to test whether the coefficients are significantly distinct from zero;
that is, whether the variable in question is significantly related to the price. Very simply
stated, by usual standards the variable in question is deemed to be significantly related
to the price if the t-value is either below – 2 or above + 2.
However, it should be stressed that one should be a little careful here and perhaps take
the standard errors and t-values as merely indicative, not too literally.
The reason is that the computations rely on a mathematical theory using assumptions that
may be problematic in the price statistics context. There are assumptions of stochastic
mutual independence between observations, and these may in price-statistics be disrupted as observations are clustered by manufacturer etc.
If a weighted regression is used, standard errors and t-values may be grossly invalid and
should not be used at all, unless particular precautions are taken.
Criteria for variable selection
There are actually several criteria for variable selection, and for instance the adjusted Rsquare can be useful here. It may turn out that some variables could be omitted from the
regression equation without diminishing the adjusted R-square very much. Then it could
be worth considering doing so – even if the t-values suggest that the variables are “signifycant”.
Other criteria are for instance the “information criteria” AIC and BIC (see e.g. Kennedy
2008, Notes to Sect. 6.2). Like the adjusted R-square these can be used to compare the
fit of models with fewer or more variables. In doing so they put a penalty on the number
of variables, so as to somewhat favour having fewer variables in the regression equation.
Data-driven variable selection?
It is sometimes suggested that decisions on design issues such as variable selection
should be data-driven. The idea is then to let decisions on variable selection be guided by
assessment on data, usually aiming at a regression equation with good fit to the data.
However, there may be reasons to be a little careful here. A primary concern should always be that the characteristics variables chosen have to be relevant from points of view
of consumer perception and product knowledge. A too automatic hunt for optimal fit could
then be misleading, for instance by the difficulty to identify variables reflecting fashion
which should not be quality adjusted for.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
163
Applying hedonic regression – concepts and guidance for use
In the regression equation, should one exclude variables with regression coefficients not
significantly distinct from zero in test runs? (Cf. preceding subsection.) Often this may
be the right thing to do, as those “insignificant” variables do not prove to be important.
An appealing reason for excluding such variables is that their coefficients may be likely
to occasionally come out with the “wrong sign” just by chance, which may not seem to
make good sense (cf. subsection “Assess the hedonic function at design stage – plausibility of results”, recently above).
However there is some reason to think twice here. There may be a possibility that separately insignificant variables may be significant together. There may also be a possibility
that an insignificant variable would have turned up as significant if the sample had been
larger. Thus occasionally insignificant variables should perhaps not be thrown out
immediately, as they may have a role to play in the larger picture.
Another complication is that significance tests should be taken as merely indicative, as
was mentioned in the preceding subsection. Further, when a lot of significance tests are
considered simultaneously there is a problem of “mass significance” or “multiple comparisons” which would have to be tackled so that control is not lost.
However, even if one should be a little careful in interpreting the t-values and significance
tests, one should of course not ignore them. Their message is indeed important. A particular issue may be noted here. It may occur that a variable turns out to be persistently
insignificant, although it would be believed to be important from a perspective of market
knowledge. This would seem to indicate that the variables are actually too many, for the
regression analysis to fully sort out their impacts, at least in the sample used (due to
“collinearity”, or co-variation of variables; cf. the subsection “Use diagnostics for multicollinearity” shortly below). Then some variable should be excluded, probably the persistently insignificant one.
It may be advisable to use a compromise approach, and take guidance from both market
knowledge and outcomes of test runs. The process could be roughly as follows:
Start with an initial idea from product knowledge. Then assess the initial idea in view of
test runs, but be a little conservative. Do not modify the variable set immediately after a
single test run. Instead modify it if a persistent pattern suggesting this emerges after
several test runs and informed interpretation of their outcome.
More in passing it may be noted that modern statistical theory provides more systematic
approaches to variable selection and multiple testing, as developed by Miller (2002),
Miller (1981) and Savin (1984). Possible application of such techniques to hedonics may
need more methodological research.
Use diagnostics for influential observations
The software for regression analysis usually produces a lot of further information, “regression diagnostics” which may give useful indications on potential problems; see, e.g.,
Atkinson & Riani (2000); Belsley et al. (1980); Donald & Maddala (1993); Kennedy
(2008); Ryan (1997); SAS (2004); cf. also Bartlett & Lewis (1994).
Some of these diagnostics warn about “influential observations”. There are several such
diagnostics, known as leverage, Cook's distance, DFFITS etc. “Influential observations” are
164
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
observations with particularly strong impact on the estimated regression coefficients. The
diagnostics show if there are single observations which on their own have a notable impact on the regression coefficients or on the hedonic function. If this is the case, it may
indicate a too great sensitivity to disturbances.
Solving such a problem, if it turns up, may need some consideration. One could have to
re-consider variable selection, sample size, the process of data editing etc. But do not arbitrarily use these tools in the operative index production just to throw away problematic observations, as that could be bias-prone.
Use diagnostics for multicollinearity
Multicollinearity is said to occur when there is some close relationship between characteristics variables. When multicollinearity occurs that may be problematic, as the regression equation then becomes unstable and the regression coefficients uncertain.
There are regression diagnostics to detect multicollinearity. For instance, one can use
variance inflation factors (VIF), eigenvalues and condition number of the covariance matrix, and correlations between variables and between estimated regression coefficients.
It should be good practice to take a quick look at those diagnostics, although, as Triplett
(2006) notes, the problem of multicollinearity should perhaps not be exaggerated for
hedonic regression.
The variance inflation factors (VIF) are computed for each characteristics variable. They are
factors by which the variance of the coefficient estimates are inflated due to collinearity. It
has been suggested that a VIF of 10 or above may be taken as a warning (cf. Kutner et al.
2004).
The condition number of the matrix XTX (see Annex 2) is equal to the ratio between the
largest and the smallest eigenvalues of that matrix. Normally exclude the constant column
(corresponding to the intercept) of the X matrix in this consideration; this is achieved by
for instance option COLLINOINT of the usual regression procedure PROC REG in the SAS
System (cf. SAS 2004). Limits between 10 and 100 have been suggested for considering
the condition number “high” (Belsey et al. 1980; SAS 2004).
When multicollinearity occurs in hedonic applications, it may often be expected in advance, by the nature of the products in question. For instance, for some kinds of products such as cars and books, there are different variables that are each strongly related
to the size of the model. Then diagnostics may not have much to add. The problem should
then preferably be handled upfront in selection of variables, rather than by trying to cope
with too many variables as long as diagnostics remain just modestly alarming.
Examine residuals
Consider again the hedonic regression equation, for instance in the semi-logarithmic form,
(8)
ln P = b0 + b1 ⋅ z 1 + b2 ⋅ z 2 + ... + bk ⋅ z k + ε .
The last term ε on the right-hand side here, known as residual term (or error term), may
be seen as a little more interesting than just a disturbance. The behaviour of the residuals
could reveal a little more than the R-square about how well the regression equation fits
to the data.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
165
Applying hedonic regression – concepts and guidance for use
When the regression equation is designed to fit observed data, it may therefore be useful to examine the values of the residual term. The residuals are then computed for each
of the observations used to fit the regression equation, and this can often be practically
accomplished by the software used to fit the regression equation.
The residuals may be plotted against characteristics variables or background variables, or
they could be analysed in statistical tables; see, e.g., Goodall (1983); Ryan (1997); Kutner
et al. (2004). Such plots or tables could reveal presence of outliers, skewed outlier distributions, deviating outlier distributions in some cells, etc. This in turn could indicate risk of
omitted variable bias, need of enhanced data editing etc.
Residual analysis – an example of setup
The following Chart 119 shows an example of a residual analysis. A hedonic regression
was run on a set of German price data for notebook computers in April 2006. The regression equation was of semi-logarithmic type, with continuous characteristics variables for
clock rate (“speed”), RAM capacity and screen resolutions, and four dummy variables for
processor types and one for type of screen. The number of observations was 199, and the
adjusted R-square was 0.59.
The chart is a scatter plot, where the computer models in the sample are plotted by residual on the vertical axis and clock rate in megahertz (MHz) on the horizontal axis. Each
“x” or “c” denotes one or more computer models, and the models denoted by “c” are
those of a particular brand. The scatter plot was produced by the regression program (here
PROC REG of the SAS System; cf. SAS 2004).
The scale of the vertical axis refers to differences in the natural logarithm of the price. This
so since the regression equation is of the semi-logarithmic form like Eq. (8). This means
that for instance a residual value of 0.1 corresponds to an observed price that is roughly
10 % higher than the price given by the hedonic function (as 0.1⋅100 = 10).
The idea of the scatter plot is to give a useful impression how the residuals are distributed. Particularly it can be seen how the distribution of residuals depends on the clock
rate, which was chosen for the horizontal axis as it is a key characteristics variable for
computers. Namely, the x's and c's along a given vertical line correspond to computer
models with a given clock rate.
166
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 119
Scatter plot of residuals by clock rate (MHz) for a sample of notebook computers
Residual analysis – interpretation
A residual analysis such as the one just shown (Chart 118) can be interpreted as follows.
What one would like to find here, if the regression equation works as assumed, is fulfilment of these conditions:
(1) The residuals should be fairly symmetrically distributed around the zero level, everywhere along the horizontal axis.
(2) The residuals should be fairly symmetrically distributed around the zero level, also for
important subgroups of observations, such as brand “c” in this example.
(3) The dispersion of the residuals should not vary much between different values on the
horizontal axis.
(4) There should essentially not be exceptional outliers among residuals.
If conditions (1) or (2) would be violated, then this could indicate that there are some
essential characteristics variables that have been omitted and would have to be included
in the regression equation. Not doing so could result in omitted variables bias (cf. Section 1.3.1, subsection “Omitted variables bias”). Violations of condition (3), known as
heteroskedasticity, may normally be a little less worrying in hedonic regression. It should
Federal Statistical Office, Statistics and Science, Vol. 13/2009
167
Applying hedonic regression – concepts and guidance for use
normally not lead to bias but is likely to affect precision and significance tests (cf. Davidson
& MacKinnon 1993, Sect. 16.5). In exceptional cases, the function form of the hedonic
equation might have to be re-considered, or weighting considered, in order to deal with
heteroskedasticity. Outliers (condition [4]) could be disturbing if they are prevalent, and
thus they then need consideration.
The scatter plot could be supplemented by computations, such as computation of the
mean residuals for chosen subgroups. Those means should be close to zero, to verify condition (2).
For the actual data shown in Chart 119, it is seen that there is not so much to worry about
regarding conditions (1) – (4). It is seen that the residuals are indeed fairly symmetrically distributed around the zero level. There is no clear evidence of heteroskedasticity,
as the variation in dispersion does not seem clearly larger than may correspond to variation in number of observations. There are some outliers but they seem notably few, in this
example.
Particularly, the residuals for models of brand “c” are also seen to be symmetrically distributed around zero. Actually a computation shows that the residuals for brand “c” have
mean – 0.02, thus close enough to zero, and standard deviation 0.20.
Potential of advanced regression methods
Nowadays there are several more advanced methods for regression analysis. Far from all
of them, but some of them may be of potential value for hedonic regression, for instance
in handling outliers, or interdependence between observations. The latter might make statistical significance testing better valid. Examples could be “forward search” (of observations), and generalised linear models, both described by e.g. Atkinson & Riani (2000),
and further robust regression, as described by e.g. Maronna et al. (2006). Here further
methodological research would be called for.
1.3.4
Topic 4 – Index calculation
When the method of hedonic re-pricing is used, the index calculation is basically carried
out as it is without quality adjustment, except that a hedonic function is used to modify
prices.
Index calculation usually straightforward
The monthly current index calculation is usually straightforward when the method of
hedonic re-pricing is used. The hedonic function h (.) should usually have been computed
earlier, so it can be assumed to be available for use in the index computation.
Then the monthly process for index calculation is as follows:
1. For each product-offer with a model replacement since the preceding month, compute
a quality adjustment factor as:
(13)
168
g =
h ( Characteristics of replacement model)
h ( Characteristics of replaced model)
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
2. For each of the mentioned product-offers, modify the reference price by multiplying
it with the quality adjustment factor g, computed for that product-offer.
3. Calculate the index by the usual index formula.
Hedonic function form vs index formula – don't worry
Normally one should not have to worry whether the hedonic function form and the index
formula go together well. In practice a log-linear hedonic function form should be useful
irrespectively of which elementary index formula is used: geometric mean index (Jevons
index), or ratio of mean prices (Dutot index). The approximation possibly involved should
usually be quite harmless.
The process just described for monthly index calculation applies to both index formulas.
The quality adjustment factor g in each replacement is computed in the same way for both
cases, and in both cases the reference price is multiplied by it.
Application for non-matched replacement
The hedonic re-pricing method can be used also when model replacements in the price
collection are not made one-to-one.
In the usual price-collection practice, there is a one-to-one matching between observed
product-offers in the price reference period and those in the comparison period. Many of
the product-offers are the same in both periods, while some may be replaced one-toone as they have gone out of the market or become unimportant.
In some cases such one-to-one matching between the periods is not possible or not suitable. For instance, novels or newly built houses are more or less individual objects and
cannot suitably be matched between periods.
To apply hedonic re-pricing to such a situation without matching, modify all prices of both
periods, by multiplying them with a quality adjustment factor obtained as:
(14)
g =
h ( Standard set of characteristics)
h ( Characteristics of actual model)
The numerator here is the value of the hedonic function for some hypothetical “standard”
model with given characteristics.
The idea is that all prices, of both the price reference period and the current period, are
modified so as to correspond to the same standard quality.
Choice of domains for hedonic functions
A decision to be made at the design stage is to determine the domains of products that
shall have their own hedonic functions, that is, their own hedonic regression equations.
A probably often useful default is to choose consumptions segments as domains for the
hedonic functions, and thus use separate hedonic functions throughout separate consumption segments. There are conceptual reasons for this choice, as according to the Regulation (1334/2007) consumption segments shall form the fixed objects of the index basket to be followed.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
169
Applying hedonic regression – concepts and guidance for use
However, the default idea of having one hedonic function per consumption segment
should not be taken as a strict rule. In some cases the accuracy of the index might be
deemed to be better served by some other choice. It could occur either that one common
hedonic function could be used for two or more consumption segments, or conversely
that separate hedonic functions could be used for different strata within the same consumption segment.
Using one hedonic function over two or more consumption segments could be applicable
under some conditions: First, the relevant characteristics variables should in principle be
the same in the consumption segments in question. Second, these characteristics variables should also have a similar role for the quality in the different consumption segments.
If one hedonic function is used over two or more consumption segments, a dummy variable should be used to indicate the consumption segment. A potential advantage of the
approach could be better accuracy, by better precision in the estimated regression coefficients.
On the other hand, using separate hedonic functions for different strata within a consumption segment could be motivated if different strata have very different sets of relevant characteristics variables. This could occur in a technology shift, such as that from tube TV sets
to flat-screen TV sets. However, an alternative way of handling such a situation could be
using the same hedonic function throughout the consumption segment anyhow, and allowing for missing values by the imputation technique described in Section 1.3.2.
1.3.5
Topic 5 – Refreshing the hedonic function
The hedonic function has to be re-computed, that is “refreshed”, often enough to stay
up-to-date.
Need of frequent refreshing due to product development
Product development causes a need to refresh the hedonic function. Product characteristics get changed valuation, and newly developed features become important. Then the
hedonic function needs to be re-computed to stay adequate.
Example: Assume computer producers benefit from a sharp drop in prices of processor
chips used in computers. Then prices of computers are likely to become less strongly dependent on the processor clock rate. If yet the existing hedonic function continues to be
used, this will lead to an over-adjustment for improvements in clock rate. For the retained coefficient for clock rate in the hedonic function is higher than it currently should be.
Frequent enough refreshing of the hedonic functions should be most urgent in areas with
rapid technological development, such as computers and video cameras.
How about other areas, with less rapid technological development, should frequent refreshing be considered urgent there too? Here there is a lack of clear evidence, and views
differ somewhat. It does not seem quite as evident to what extent frequent refreshing may
be urgent for e.g. TV sets, where the consumer visible technology is relatively more stable.
170
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
A trade-off
There may be some trade-off to be made here. There may be some danger in refreshing
more frequently than as is motivated, as it may then be more problematic to prevent fashion from getting an undue influence.
Example: If, say, metallic blue TV sets would become fashionable, they may get a higher
price than other TV sets. But when they go out of fashion next year or so, they get a lower
price than other TV sets. Now, suppose that metallic blue colour becomes accounted for
in a frequently refreshed hedonic function. Then metallic blue colour would be accounted
for first as good quality when it comes in, and then as bad quality when it goes out.
And so, in the end TV sets would be treated as having become permanently better just by
having been metallic blue for a while. Of course that would not make sense and would
produce a long-term bias in the index.
This hypothetical example may seem somewhat extreme. A feature such as colour of a TV
set would likely be recognised from start as a fashion feature and thus not considered
for quality adjustment. But the point is that the borderline between quality and fashion
may not always be very clear. Some new feature highly valued today may or may not be
highly valued tomorrow, perhaps no-one knows yet. To complicate further, possibly some
new features may at start have both a temporary fashion value and a lasting quality value.
The difficulty to recognise fashion features vs quality features may entail a risk that frequent refreshing of the hedonic function leads to bias as in the example.
When to refresh – monitor the market and run control regressions
Usually it is probably not feasible to refresh the hedonic function every month. So there is
need for some indications on when refreshing is called for. Two main possibilities here are:
•
•
Monitoring the market for information on changes in product features and their valuation.
Control regressions to check for change in regression coefficients.
Probably a combination of these two approaches is advisable. A fully automatic criterion
based on control regressions may not be feasible, as the tools provided by statistical theory may not fully comply with the conditions here. There is not a full consensus on this matter. For evaluation of the suggested control regressions, significance tests on regression
coefficients can be used as indicative evidence whether the impact on the price of the
characteristics variables has changed (cf. Section 1.3.3, Subsection “Test statistics – use
with care”). A suitable option could be a form of Chow test as described by Kennedy (2008,
Notes to Sect. 15.4).
By the latter approach one would, just for the sake of the control regression, formally apply
the computations of the hedonic time dummy variable method (see Section 7.1.1) in two
versions: In one version the coefficients for the characteristics variables are the same for
both periods compared (as in Eq. [15]), and in the other they are allowed to differ between
the periods. It can then be tested by an F test whether the latter regression equation fits
the data significantly better than the former does. (It may be noted that the second regression equation mentioned would be useless for directly computing an index, and it is of
interest only for the comparison of fit described.)
Federal Statistical Office, Statistics and Science, Vol. 13/2009
171
Applying hedonic regression – concepts and guidance for use
Suggested convention
In the lack of clear evidence on appropriate frequency of refreshing the hedonic function,
the following convention may be suggested:
•
Perform scheduled control regressions to check the constancy of the regression coefficients.
•
Switch to using re-estimated regression coefficients if they deviate significantly from
those in use.
•
Assign a person or a group with product knowledge, to monitor the market for signs of
need of refreshing.
1.4
Cost aspects
Applying hedonic regression in index production incurs costs due to:
•
competence needs,
•
data processing amendment,
•
extended data collection.
Competence needs
In the index computation staff, competence is needed both on the markets, product areas
in question, and on the statistical theory and practice of regression analysis. This competence is needed to provide a sound basis for decisions on design of process details, including variable selection, stratification, refreshing plan, etc., as has been discussed in
this chapter. In the staff there should be persons with a thorough background in statistical theory of regression analysis.
The competence needs are most prominent at the design stage. But to a notable extent
they are needed also at the operative stage, to monitor diagnostics etc. in re-computations
of the hedonic function.
Data processing amendment
Some effort of system design is probably needed. The data processing environment for the
index production has to be enhanced to provide for:
•
Flexible management of characteristics variables.
•
Smooth extraction and transfer of data needed for hedonic regression by standard
software for statistical analysis.
•
Management (storage and presentation) of hedonic coefficients etc.
•
Automatic process in monthly index production, for modifying prices by hedonic function.
•
For monthly index production also enhanced tools for data editing, particularly for
micro-level tracing of impact of quality change.
These enhancements should primarily be a one-time effort when the system is (re-)designed for hedonics.
172
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Extended data collection
Data collection has to be extended to include the characteristics variables needed. Also,
the sample size may have to be increased in periods of refreshing the hedonic function.
The need to collect characteristics values of course increases the demand on monthly
price collection. There may also be an increased demand on manual efforts in monthly
data editing, as more suspected measurement errors may be found and have to be followed up for verification of the measurements.
As far as the data needed can be collected from web sites, that should be a cost advantage. For representativity, monthly price collection may still have to be made in conventional shops, and then characteristics variables have to be included. But for refreshing
the hedonic function, it might possibly suffice to use web sources. These considerations
have to be carried out in a skilled way for each situation, in view of relevant conditions
regarding index accuracy, market features and cost.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
173
Television sets
2
Television sets
•
Television sets belong to a market with a low replacement rate and high technological change (e.g. large screen television sets). The Task Force on Quality Adjustment
and Sampling concluded that the hedonic method is highly recommended for television sets (A-method) besides some implicit methods like bridged overlap which are
subject of Chapter Two.
•
The empirical studies of the CENEX members showed that hedonics and bridged overlap with appropriate stratification yield similar results.
•
The following section deals with the application of hedonic re-pricing for television
sets.
2.1
Proposal for the identification of consumption segments
•
The suggested consumption segments are identified as described in the non-hedonic
section for television sets.
2.2
Proposal for the construction of a target sample
•
For the application of hedonic re-pricing, usually two samples have to be collected.
The first sample is called “regression sample” which serves the calculation of the
hedonic regression equation. The second sample is called “index sample” and is used
to calculate the price index for TV sets. The sampling procedure for the index sample
is described in non-hedonic section for television sets. The index sample may or may
not be a sub-sample of the regression sample.
The regression sample
•
To apply the hedonic method, a regression sample has to be collected. This sample
is needed for the calculation of a systematic relationship between the prices of the
TV sets and their particular characteristics. Resulting from this relationship the hedonic
equation can be established.
•
In the case that different types of TV sets are considered it is suggested to collect
separate regression samples which might be used to run separate regressions.
•
The regression sample can be collected according to the same procedure as the index
sample. The index sample may also be a sub-sample of the regression sample or it
can be independent from the regression sample.
•
To find out which characteristics are relevant to quality adjust TV sets for, if possible
try several approaches:
– Use publicly available information from advertising, shop displays, TV market magazines etc.
– Use experience from other national statistical offices, which may be further ahead
in implementation of hedonics for TV sets, or from other research.
– Consult product experts in the TV set business – but remember that the consumer
perspective is what is relevant, not engineering effort or so as such.
174
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
–
A more ambitious approach could be to conduct a pilot study with a rather broadminded collection of characteristics. But do not do that if you do not already have
a rather clear mind about what you are looking for, as else you would likely just
waste your and respondents' resources.
•
The same set of characteristics needs to be collected for each television set, regardless of whether it belongs to the regression sample or to the index sample. Since there
are different TV types (LCD, tube, plasma – see below) which differ in their characteristics different sets of characteristics need to be collected for each TV type.
•
It is important not to consider those variables that reflect fashion. This could be for
example the colour of the TV set.
•
The following characteristics may serve as orientation for LCD TVs:
– brand cluster,
– TV type,
– screen size,
– number of pixels,
– brightness,
– contrast ratio,
– reaction time,
– frequency,
– picture format (aspect ratio),
– sound system,
– picture in picture,
– types and number of connectors (scart, HDMI, etc.),
– outlet type,
– ...
•
The following characteristics may serve as orientation for tube TVs:
– brand cluster,
– screen size,
– picture format (aspect ratio),
– sound system,
– types and number of connectors (scart, HDMI, etc.),
– outlet type,
– frame rate,
– ...
•
The brands of the selected TV sets should normally be assigned to brand clusters. A
classification of brand clusters can be obtained from the PPP statistics (Purchasing
Power Parities – price indices comparing the national price levels). The available
brand cluster lists of the PPP classify brands into high/medium and low brands. In
these lists, not all available brands are classified. Missing brands should be assigned
to one of the brand clusters by informed judgement.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
175
Television sets
Chart 120
Extract from the PPP classification (adjusted for TV sets)
Brand cluster
Assigned brands
Higher and Medium
HITACHI, JVC, LG, LOEWE, PANASONIC, PHILIPS,
PIONEER, SAMSUNG, SHARP, SONY
Lower
BEKO, BIGA, DAEWOO, DIBOSS, ELEMIS, EVELUX,
FUEGO, LENCO, MATRIX, QUADRO, SENCOR,
SCHNEIDER, SILVA, VIVAX
Source: Eurostat (Lot C), European Comparison Programme: Consummer price survey E07-1
“House and garden” Particular survey guidelines
•
Given the technological change of TV sets, new features should be included regularly.
This is not so frequent that it would be necessary to select a new set of relevant characteristics every month, but the price statistician should be aware of the possibility.
•
Whenever a new characteristic is introduced, the price statistician should judge
whether the new characteristic is important enough to be included. Therefore, a continuous market observation is necessary.
•
Of course, new characteristics can only be used to adjust for quality differences if
this characteristic is collected in both the price reference month and the current month.
•
An appropriate form which could be distributed to the price collectors should precisely ask for the considered characteristics and might look like this:
Chart 121
Example of a price collection form
Television, screen size ~ 32 inches, LCD, higher or medium brand
Price (€):
929.–
Brand and model description:
Sony KDL-32D3000
Screen size (inch):
32
Resolution:
1366 x 768 pixels
Contrast ratio:
8000:1
Reaction time:
8 ms
Brightness:
450 cd/m²
Frequency:
: 50 Hz
… 100 Hz
Aspect ratio:
: 16:9
… 4:3
Sound system:
: Surround
… Stereo
Connectors:
… VGA
… DVI
… Picture in Picture
… Mono
… Others:_________
: HDMI
: Others: Scart, PC
: HD ready
Shop: ___________________________
Date: ___________________________
176
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
For the estimation of the regression equation it is desirable that price observations
used reflect a state of equilibrium (balance between supply and demand). That means
that if it is practically possible, it would normally be preferable that temporarily reduced prices due to promotion offers or stock clearances can be identified in the
price collection and not used for estimating the regression equation. (However, note
that in the index calculation, on the other hand, reduced prices shall not be excluded.)
Minimum size of the regression sample and the index sample
•
2.3
The regression sample should contain 100 to 300 observations depending on the number of characteristics that are used in the regression equation (see section on quality
adjustment). As a rule of thumb, 15 to 20 observations per characteristic which is finally included in the regression equation should be collected.
Proposal for the replacement strategy
General remarks
•
The general remarks for the replacement strategy are the same as described in the
non-hedonic section for television sets. Please consider that the replacement strategy only refers to the index sample. The regression sample is a cross-sectional sample where no “replacements” can occur.
•
In order to check whether the chosen products are still representative of the market
the market has to be observed regularly. This regular observation can also be used
to determine general changes (e.g. new equipment, new brands, technological progress, new characteristics) which might have to be included into the regression equation (see section on quality adjustment below).
•
If a model is no longer available or no longer well sold it should be replaced. The successor model should be best sold and representative.
•
It is necessary to observe the market in order to account for general changes. Therefore, the representativity of the models in the index sample should be checked within
the scope of monthly price collection and on a regular basis in the central office.
Representativity check within the scope of the monthly price collection in the shop
•
Use the concrete TV models that were in the sample in the previous month as a starting point for the price collection of the current month in order to keep the share of
matched models reasonably high. However, the main criterion for the selection of replacements should be representativity.
•
The price collector should check whether or not the models of the previous month
are still representative. The price collector should assess this each month with the
help of the shop assistant. If this is not practicable the price collector can also revert
to indicators such as the way the product is advertised or presented in the shop or he
might also revert to his experience.
•
If the model of the previous month is still representative continue observing the price
of this model. If the model is no longer representative or if it is no longer available replace it by a representative one.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
177
Television sets
Representativity check in the office
•
Additionally to the representativity checks conducted by the price collector a representativity check should be conducted regularly in the office.
•
Check whether the chosen products are still representative of the market. Therefore,
the market has to be observed regularly. This regular observation can also be used to
determine general changes (e.g. new equipment, new brands, technological progress,
new characteristics, trend towards larger screen size) which might have to be included
into the regression equation (see section on quality adjustment below).
2.4
•
Proposal for the quality adjustment
As stated in the section on the construction of the target sample, for the application
of hedonic re-pricing it is necessary to collect two data sets. In order to calculate the
hedonic equation a comprehensive database is necessary which is called “regression
sample”. The hedonic equation is used to apply quality adjustment in the case of replacement situations which can only occur in the index sample. In the following sections the individual steps are described in great detail.
Topic 1: Data needed
Collection of quality characteristics
•
The compilation of the regression sample as well as of the index sample is described
in the section on the target sample.
•
The regression sample could look like this:
Brand
Version
Price
Brand cluster
Size (cm)
Brightness
Picture format
Contrast ratio
Sound
A
LG
RE20LA30
799
High
51
450
4:3
450:1
Stereo
1
B
Philips
20PF8846
599
High
51
450
4:3
350:1
...
...
...
...
...
...
...
...
... ...
•
Virtual
Sound
...
...
Model
1
Number
of pixels
t
Chart 122
Example of a data base of a regression sample for LCD TV sets
640x480 . . .
640x480 . . .
...
...
The variable “Version” is not needed for the calculation but makes the identification
of individual TV models easier.
Topic 2: Data editing
•
178
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
Reformat category variables to dummy variables. These can only take values of 0 or 1
depending on whether the model is equipped with a specific feature or not.
•
Numeric variables like the price or the screen size are transformed by logarithm prior
to the regression.
t
Model
Price (ln)
Brand
Size (ln)
Picture format
Brightness (ln)
Picture format
Contrast ratio (ln)
Surround Sound
Stereo sound
...
Chart 123
Example for the modified data base of the regression sample for LCD TV sets
1
LG
6.6834
1
3.9318
0
6.1092
0
6.1092
1
0
...
1
Philips
6.3953
1
3.9318
0
6.1092
0
5.8579
0
1
...
...
...
...
...
...
...
...
...
...
...
...
...
Topic 3: Creating the hedonic function
•
After the data has been arranged, the hedonic regression equation can be set up. In
this regard, the price (as dependent variable) should be explained as a function of
the model’s characteristics (independent variables).
•
Statistical programs like SPSS, Stata or SAS are helpful to calculate the regression.
The computer program is set up to perform a linear regression analysis on the data
of the regression sample. The price is to be considered as the “dependent” variable
and the characteristics (brand dummies, screen size, etc.) are considered as “independent” variables.
•
With regard to the general function form of the hedonic regression equation a doublelogarithmic function form is suggested. For dummy variables the logarithm is not taken
whereas continuous variables have to be transformed by logarithms.
ln P = b0 + b1 ⋅ ln ( screen size ) + b2 ⋅ picture format + b3 ⋅ specialised store +
+ b4 ⋅ warehouse + b5 ⋅ brand cluster + + ε
•
The hedonic regression equation contains dummy variables (brand, picture format,
specialised store, warehouse) and continuous variables (screen size). For each dummy variable, one characteristic value has to be defined as default (reference) value,
e.g. the reference value for picture format may be “4:3” and for brand cluster it may
be “lower brands”.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
179
Television sets
•
The statistical program computes the coefficients (corresponding to the particular
characteristics) and further statistical values which contain information on the significance of the variables.
•
The result of the regression analysis could look like this:
Chart 124
Example for the results of the hedonic regression
R² = 0.8172
N = 91
Regression coefficients
t-value
Variance inflation
b0
Constant
1.01320
1.75
2.2574
b1
ln (screen size)
1.4052
9.29
2.26994
b2
Picture format
0.3111
3.91
1.14571
b3
Specialised store
0.2986
3.94
1.22207
b4
Warehouse
0.1783
2.76
1.24657
b5
Brand cluster “high”
0.2338
3.15
1.17197
•
The variables that were deemed relevant to quality adjust for have been collected as
discussed under Topic 1. These were included into the regression equation. Potentially, some of the considered variables might not be significantly related to the price.
Those “insignificant” variables could be identified by their t-values. At the design
stage when the regression equation is tested, variables with t-values tending to be
within the interval of – 2 to + 2 could be considered for skipping from the regression
equation to be used operatively. One may have to be a little careful here, and some
skilled judgement is called for. It should be noted that the t-test is approximate, as
strictly seen it depends on assumptions that are usually not under perfect control.
•
Particularly in the testing of the regression equation at the design stage, it should be
checked whether the algebraic signs of the coefficients are as expected in view of
the meaning of these. If a “wrong” sign occurs, the possible cause should be searched:
Whether it is just expected random variation, or a misspecification of the variable, or
contaminated data, or something else. Possibly a variable may then be identified as
problematic and ultimately have to be skipped from the regression equation. But
DON'T skip a variable automatically as soon as a “wrong” sign turns up, as doing so
may just sweep problems under the carpet and produce invalid results.
•
The particular regression coefficients can be interpreted as follows: The coefficient for
a continuous variable like screen size (b1 = 1.4052) indicates that a larger screen size
of one percent results in an expected price increase of approximately 1.4052 %. The
coefficient for a dummy variable like picture format (b2 = 0.3111) means the inclusion
of that feature (from 4:3 to 16:9) results in an expected price increase of approximately 31.11 %.
•
In order to assess the adequacy of the regression estimation, consider the adjusted
R² value. For TV sets this value should probably be expected to be around 0.5 – 0.6
180
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
or higher, as in principle a persistent relation of technological characteristics and the
price should be expected. An R² very close to 1 should not normally occur and may
indicate severely contaminated data. On the other hand, an adjusted R² considerably below 0.5 indicates a relatively low explanatory ability of the regression equation.
A possible cause may be that the sample is taken from a fairly narrow and homogeneous stratum, where large characteristics differences do not occur. If so, the results
may possibly be useful anyhow, in spite of the low R²; here some skilled judgement
may be called for. Another possible cause for a low R² may be that some essential characteristics variable has been left out and would have to be included in the regression
equation.
Topic 4: Index calculation
•
In the following the index sample is considered which contains 25 to 30 price observations for TV sets.
•
At first those situations where a replacement takes place have to be detected. In these
cases the sampled TV models change from one month to the next which implies different characteristic values and thus different quality levels of the two products. Those
quality differences have to be adjusted as described in the following.
•
In each replacement situation the values of the hedonic function h for the replaced
and the replacement model has to be calculated. Therefore, insert the variable values
of the respective model in the hedonic equation.
•
The coefficients of the hedonic equation have been calculated in Topic 3 and the variable values of the respective models are collected within the scope of price collection.
In the example, the regression coefficients are:
Chart 125
Example of the regression coefficients
Regression coefficients
•
b0
Constant
1.01320
b1
ln (screen size)
1.4052
b2
Picture format
0.3111
b3
Specialised store
0.2986
b4
Warehouse
0.1783
b5
Brand cluster “high”
0.2338
In the next step, the quality adjustment factor has to be computed. The quality adjustment factor (“g”) is the relation of the values of the hedonic function h of replaced
and replacement model. In other words, it is the mathematical ratio of quality for the
new and the old model and refers to only this particular replacement situation:
g=
h ( X tB )
h ( X tA−1 )
=
Market value of the replacement
Market value of the replaced model
Federal Statistical Office, Statistics and Science, Vol. 13/2009
181
Television sets
•
This quality adjustment factor is used to adjust the quality of the replaced model by
multiplying its price with the quality adjustment factor:
Pt A−1 ⋅ g = quality adjusted price of the replaced m odel
•
To calculate the quality adjusted price development, divide the price of the replacement product by the quality adjusted price of the replaced product:
PtB
Pt A−1
•
⋅g
=
observed price of the replacement
observed price of the replaced model ⋅ g
For the index calculation, insert the quality adjusted price development in cases of replacements instead of the normal term. Here model A has been replaced by model B:
§ PB
PC
PD PE
PF
I = ¨ At =1 ⋅ tC=1 ⋅ tD=1 ⋅ tE=1 ⋅ tF=1
¨P ⋅g P
t =0 Pt =0 Pt =0 Pt = 0
© t =0
·
¸
¸
¹
1/ 5
, with
PtB=1
Pt A=0
being replaced by
PtB=1
Pt A=0 ⋅ g
Topic 5: Refreshing the hedonic function
•
Monitor the market in order to consider technological progress and market changes.
•
As the product-specific market develops the validity of the calculated linear regression
can change. Especially the significance of the variables as well as the regression coefficients themselves can change since the product undergoes technological progress.
In order to reflect those technological changes, the regression equation has to be
updated regularly. The empirical studies conducted within the scope of the CENEX
showed that the regression equation for TV sets stayed valid for two to three months.
That means every two or three months a new regression has to be calculated. Therefore, a new regression sample may have to be collected every three months, following
the procedure described for the target sample.
•
Update the regression coefficients when significant changes appear.
2.5
Cost estimation
One-time effort for the introduction
•
The following steps are required for the introduction: Initial design study on data requirements; market research; reading, understanding and contemplating Handbook
advice; cognitive testing etc. of new price collector forms; initial construction with testing and assessment of regression equation; methodology reporting, approval and documentation; systems design for smooth regular production. If an initial pilot study with
extended data collection is needed, this of course requires additional resources.
•
The following steps are necessary for the operative introduction: Collection of data
for the estimation of the regression equation (incl. price collection in the shop, entry
into data sheet, online collection of characteristics).
•
All in all up to four months may be necessary for the introduction.
182
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Current effort of price collection and index calculation
•
Qualification for price collection: No expert knowledge necessary, instruction with concrete guidelines is sufficient.
•
Qualification for index calculation: academic.
•
Approximately four days may be necessary for price collection and index calculation.
2.6
Examples
Example for the identification of consumption segments:
•
In this example we assume that only one consumption segment exists.
Example for the construction of a target sample:
•
As described for the compilation of the target sample a market evaluation yielded
that three screen size strata are important to be considered:
– Tube TV sets with a screen size of ~ 14 inches (~ 36 cm).
– LCD TV sets with a screen size of ~ 20 inches (~ 51 cm).
– LCD TV sets with a screen size of ~ 32 inches (~ 81 cm).
•
In the following example, the screen size is given in centimetres. Only the proceeding
for LCD TV sets with a screen size of ~ 51 cm is described. For other TV sets proceed
analogously.
•
In the first step representative outlets have to be chosen. In our example, let us assume that there is no difference in the price development throughout regions, so that
the regional dimension is no major issue.
•
The next step is to select representative models in the chosen outlets. For the index
sample, at least 25 prices should be collected from the retailers. For the regression
sample, 100 to 300 observations are necessary.
•
Besides the price a set of characteristics has to be collected for each TV set model. The
price collection form below shows some exemplary characteristics that are deemed
to impact the price of LCD TV sets. The set of variables could be different in reality,
of course.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
183
Television sets
Chart 126
Example of a price collection form
Television, screen size ~ 51 cm, LCD, higher or medium brand
Price (€):
Brand and model description:
Screen size (cm):
Resolution:
Contrast ratio:
Reaction time:
Brightness:
Frequency:
: 50 Hz
Aspect ratio:
… 16:9
Sound system:
… Surround
Connectors:
… VGA
799.–
LG RE20LA30
51
640 x 480 pixels
45:1
8 ms
450 cd/m²
… 100 Hz
: 4:3
: Stereo
… Mono
… DVI
… HDMI
… Picture in Picture
Shop: ___________________________
Date: ___________________________
•
… Others:_________
: Others: Scart, S-video,
RCA
… HD ready
Consider in month 1, the price collector collects a number of representative TV models
for the stratum LCD TV with a screen size of ~ 32 inches. The price collection table
could look like this:
Brand
Version
Price
Brand cluster
Size (cm)
Brightness
Picture format
Contrast ratio
Sound
A
LG
RE20LA30
799
High
51
450
4:3
450:1
Stereo
1
B
Philips
20PF8846
599
High
51
450
4:3
350:1
...
...
...
...
...
...
...
...
... ...
Virtual
Surround
...
...
Model
1
Number
of pixels
t
Chart 127
Example of a price collection table for the index sample for LCD TV sets
640x480 . . .
640x480 . . .
...
...
Example for the replacement strategy:
•
The following replacement situation refers to the index sample only.
•
Suppose that in month 3 model “D” is no longer available or no longer well sold and
it is replaced by model “E” (see next chart).
•
Those replacement situations are marked with a “Q” in the price collection table.
184
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
51
...
799.– €
799.– €
699.– €
B
PHILIPS
20PF8846
51
...
599.– €
599.– €
599.– €
C
TOSHIBA
20VL33G
51
...
999.– €
999.– €
999.– €
D SEG
MONTECARLO
51
...
999.– €
799.– €
E
27LCDB03B
67
...
•
Q
Q
Brand
THOMSON
Price
in month 3
Price
in month 1
RE20LA30
Price
in month 2
...
LG
Version
A
Model
Screen size
(cm)
Chart 128
Example of the price collection table
Q
Q
899.– €
The different models exhibit different characteristics with inherent different qualities.
To match the products quality adjustment has to be applied.
Example for quality adjustment by applying hedonic re-pricing:
Topic 1: Data needed
•
The compilation of the regression sample is described in the section on the target
sample.
•
The regression sample could look like this:
Brand
Version
Price
Brand cluster
Size (cm)
Brightness
Picture format
Contrast ratio
Sound
A
LG
RE20LA30
799
High
51
450
4:3
450:1
Stereo
1
B
Philips
20PF8846
599
High
51
450
4:3
350:1
...
...
...
...
...
...
...
...
... ...
Virtual
Surround
...
...
Model
1
Number
of pixels
t
Chart 129
Example of a data base for the regression sample for LCD TV sets
640x480 . . .
640x480 . . .
...
...
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
•
Since a double-logarithmic function form is suggested, the price, number of pixels,
contrast ratio, brightness, and the screen size are transformed by logarithm.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
185
Television sets
•
The category variables brand, connectors, picture format, sound system, outlet type
are transformed into dummy variables. These can only take values of 0 or 1 (depending on whether the model is equipped with a specific feature or not).
•
Now, one characteristic value has to be chosen as reference value for each dummy
variable.
– The reference value for the brand dummy variables is lower brand. That means, in
the case of lower brand LCD TV sets, the alternative brand dummy (higher) brand
has a value of 0.
– The reference value for the connectors variable is “scart”. That means, in a case of
a LCD TV set equipped with only a scart connector, the alternative connector dummies are 0.
– The reference value for the picture format variable is 4:3. That means, in the case
of LCD TV set equipped with a 16:9 picture format, the dummy has a value of 1.
– The reference value for the sound system variable is mono sound. That means,
in the case of LCD TV set equipped with a surround sound system, the surround
sound system dummy has a value of 1; if it is equipped with stereo sound system, the stereo sound system dummy has a value of 1.
– The reference value for the store type variable is self service store. That means,
in the case of a LCD TV set observed in a self service store, the alternative dummies have a value of 0.
t
Model
Price (ln)
Brand
Size (ln)
Picture format
Brightness (ln)
Picture format
Contrast ratio (ln)
Surround Sound
Stereo sound
...
Chart 130
Example of the modified data base of the regression sample for LCD TV sets
1
LG
6.6834
1
3.9318
0
6.1092
0
6.1092
1
0
...
1
Philips
6.3953
1
3.9318
0
6.1092
0
5.8579
0
1
...
...
...
...
...
...
...
...
...
...
...
...
...
Topic 3: Creating the hedonic function
•
The procedure of data collection for the regression sample and the subsequent creation of the hedonic function are exemplarily described in the section on quality adjustment. The coefficients of the hedonic equation can be seen in the table below.
•
In this example, all the characteristics in the table below impact the price significantly.
All other characteristics as described in the example construction of a target sample
impact the price insignificantly and are not used to calculate the values of the hedonic
function.
186
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 131
Example of the regression coefficients
Regression coefficients
b0
Constant
1.01320
b1
ln (screen size)
1.4052
b2
Picture format
0.3111
b3
Specialised store
0.2986
b4
Warehouse
0.1783
b5
Brand cluster “high”
0.2338
Topic 4: Index calculation
•
In the following table you can find four price series for three periods containing the
data that is required for the index calculation. At first, all replacement situations have
to be marked since only in those cases quality adjustment has to be applied. In this
example from month 2 to month 3 there is one replacement situation.
...
799.– €
799.– €
699.– €
B
PHILIPS
20PF8846
51
...
599.– €
599.– €
599.– €
C
TOSHIBA
20VL33G
51
...
999.– €
999.– €
999.– €
D SEG
MONTECARLO
51
...
999.– €
799.– €
E
27LCDB03B
67
...
•
Q
Q
899.– €
The index calculation from month 1 to month 2 follows the usual formula:
§ P A P B PC PD
I2 = ¨ t A=2 ⋅ t B=2 ⋅ tC=2 ⋅ tD=2
¨P
© t =1 Pt =1 Pt =1 Pt =1
•
Q
Brand
THOMSON
Price
in month 3
51
Price
in month 2
Price
in month 1
RE20LA30
Q
...
LG
Version
A
Model
Screen size
(cm)
Chart 132
Example of the price collection table
·
¸
¸
¹
1/ 4
§ 799 € 599 € 999 € 799 € ·
¸¸
= ¨¨
⋅
⋅
⋅
© 799 € 599 € 999 € 999 € ¹
1/ 4
= 0.946
The index from month 2 to month 3 contains a replacement situation. Model “D” is replaced by model “E”. For both price observations, the replaced and the replacement
model, the values of the hedonic function have to be calculated. As you know the
hedonic regression equation is a double-logarithmic function. Hence, the exponential
function is needed to obtain the values of the hedonic function which are needed for
Federal Statistical Office, Statistics and Science, Vol. 13/2009
187
Television sets
the quality adjusted price. This is done by taking the anti-log of the sum of the constant and the respective products of characteristics (or logs of the respective characteristics) and coefficients:
ht = e ( b0 + b1 ⋅ v1 + ... + b5 ⋅ v 5 )
•
To calculate the values of the hedonic function h for both models, it is necessary to
know the particular characteristics, which can be seen in the chart below:
Aspect Ratio 16:9
Specialised store
Warehouse
Brand cluster
SEG
MONTECARLO
3.9318
0
1
0
0
E
THOMSON
27LCDB03B
4.2047
1
0
1
0
•
Brand
Model
D
Model
Screen Size (ln)
Chart 133
Example of the characteristics of replaced and replacement models
Inserting the variables into the hedonic equation yields the values of the hedonic function h for the particular models:
h ( Model D )t =2 = e(1.0132 + 1.4052 ⋅ 3.9318 + 0.3111 ⋅ 0 + 0.2986 ⋅ 1 + 0.1783 ⋅ 0 + 0.2338 ⋅ 0 ) = 931.47 €
h ( Model E )t =3 = e(1.0132 + 1.4052 ⋅ 4.2047 + 0.3111⋅ 1 + 0.2986 ⋅ 0 + 0.1783 ⋅ 1 + 0.2338 ⋅ 0 ) = 1,654.15 €
•
The mathematical ratio of the values of the hedonic function from the replacement and
the replaced model yields the quality adjustment factor “g” which can be calculated
as followed (the subscript r = 1 indicates that the regression was run in month 1):
grD=,1E =
•
h ( replacement )t =3
h ( Model E )t =3 1,654.15 €
=
=
= 1.7758
931.47 €
h ( replaced model )t =2 h ( Model D )t =2
To calculate the index from month 2 to month 3 the following formula is applied again:
B
C
E
§ PA
¨ t =3 P t =3 P t =3 P t =3
⋅
⋅
⋅
I3 =¨
B
C
D
¨ PA
© t =2 Pt =2 Pt =2 Pt =2
•
1/ 4
The index contains one replacement from model “D” in month 2 to model “E” in
month 3. As hedonic re-pricing is used for quality adjustment
PtE=3
PtD=2
188
·
¸
¸
¸
¹
is replaced by
PtE=3
PtD=2 ⋅ g rD=,E1
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
The index for month 3 is calculated in the following (including the application of
hedonic re-pricing):
§ P A PB PC
PD
I3 = ¨ t A=3 ⋅ tB=3 ⋅ tC=3 ⋅ D t =3 D,E
¨P
© t =2 Pt =2 Pt =2 Pt =2 ⋅ g r =1
·
¸
¸
¹
1/ 4
§ 699 € 599 € 999 €
·
899 €
¸¸
= ¨¨
⋅
⋅
⋅
© 799 € 599 € 999 € 799 € ⋅ 1.7758 ¹
= 0.863
1/ 4
RE20LA30
51
799.– €
799.– €
699.– €
B
PHILIPS
20PF8846
51
599.– €
599.– €
599.– €
C
TOSHIBA
20VL33G
51
999.– €
999.– €
999.– €
D
SEG
MONTECARLO
51
999.– €
799.– €
E
THOMSON
27LCDB03B
67
Type
Brand
Price
in month 3
Price
in month 2
LG
Q
Price
in month 1
A
Model
Screen Size
(in cm)
Chart 134
Monthly indexes
Q
Q
899.– €
Quality adjusted prices
D
SEG
Monthly indexes
MONTECARLO
67
1 418.86 €
100.0
Federal Statistical Office, Statistics and Science, Vol. 13/2009
94.6
86.3
189
Used Cars
3
Used cars
•
As the first step, the COICOP four-digit-level: 07.1.1 Motor cars (containing motor cars,
passenger vans, station wagons, estate cars and the like with either two-wheel drive
or four-wheel drive) is divided into new cars and used cars. This section only deals
with used cars.
•
The Task Force on Quality Adjustment and Sampling did not establish an explicit classification of A-, B-, C-methods for used cars.
•
The following section deals with the application of hedonic re-pricing for used cars.
This approach is recommended for the case that no appropriate market research data
is available and the data hat to be collected on internet platforms and in car magazines.
3.1
•
3.2
Proposal for the identification of consumption segments
The identification of consumption segments is described in the non-hedonic section
for used cars.
Proposal for the construction of a target sample
•
For the application of hedonic re-pricing, usually two samples have to be collected.
The first sample is called “regression sample” which serves the calculation of the
hedonic regression equation.
•
The second sample is called “index sample” and is used to calculate the price index
for used cars. The proposal for the construction of an index sample can be found in the
non-hedonic section for used cars. The index sample may or may not be a sub-sample
of the regression sample.
•
Unlike the non-hedonic approach for used cars, only two data sources could be used:
– Offer prices on internet platforms.
– Offer prices from car magazines.
•
If data provided from market research institutes are available, it is not necessary to
apply the hedonic approach for used cars since the observed cars do not differ in age
and mileage from month to month.
•
The recommended data source for the sample collection (regarding the index sample
as well as the regression sample) is internet platforms. By entering the criteria of a
precise primary and sub-model in the screen mask the used car prices are easily observable. Besides the price further important characteristics can be collected, especialy “age” and “mileage”. The product-offers also contain additional explicit information. Keep in mind to collect only offer prices from business retailers from your
own national market. A possible internet platform for used cars is for example
www.autoscout24.xx which is available in several European countries. If no internet
platform is available, the prices and characteristics of used cars should be collected in
car magazines.
•
In the case of price collection on internet platforms and car magazines, the following
issues are of importance:
190
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
–
–
–
–
–
It is most important to refer to the same sub-model over time.
The age in months should be held constant as far as possible. In practice, there
will be variations in the precise age which have to be adjusted in the following
(see section 3.4 Proposal on quality adjustment).
The mileage should be considered roughly. That means the mileage should be of
the same dimension from period to period. The variations in mileage also have
to be adjusted in the following.
It is very important not to consider product-offers from private persons, cars that
have been involved in an accident and cars with a lot of obvious extra equipment
or modifications.
If there is a large range of product-offers that fit all the mentioned criteria, any concrete product-offer among the price variance may be selected.
The regression sample
•
To apply the hedonic method, a regression sample has to be collected. This sample is
needed for the calculation of a systematic relationship between the prices of the used
cars and their particular characteristics. Resulting from this relationship the hedonic
equation can be established.
•
The regression sample can be collected according to the same procedure as the index
sample. In the regression period, the index sample may also be a sub-sample of the
regression sample or it can be independent from the regression sample.
Discussion box
Regarding the regression sample, it is suggested that the data refer to one
particular month only.
Alternatively, a regression sample with pooled data which refer to observations
from several previous months might be possible. An advantage is that seasonal
effects are avoided which otherwise might appear on the estimates. Furthermore
pooling the data alleviates the burden of data collection in order to have enough
reliable observations for the regression. Besides that, the “depreciation factor”
one is estimating can be seen as an average one, not specific to a particular
month.
However there are some counter arguments against the approach to use pooled
data. Especially the assumptions that there is no price change over time and the
parameters are constant over periods might be a serious issue.
Minimum size of the regression sample
•
The sample sizes of the regression and the index sample should be established country-specifically since the number of observations depends on the number of generated
consumption segments and age classes.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
191
Used Cars
•
The regression sample should contain at minimum 135 observations. This minimum
sample size results from the number of characteristics that are used in the regression
equation (see section on quality adjustment). As a rule of thumb, 15 to 20 observations per characteristic which is finally included in the regression equation should be
collected.
•
The regression sample should contain all the selected primary models from the different consumption segments. There is only one regression sample for all consumption
segments.
3.3
Proposal for the replacement strategy
General remarks
•
Please consider that the replacement strategy only refers to the index sample. The regression sample is a cross-sectional sample where no “replacements” can occur.
•
The proposal for the replacement strategy is the same as for the non-hedonic approach which is already described in the non-hedonic section for used cars. Only regarding the selection of the replacement model, the restriction holding the characteristic values age and mileage constant can be relaxed.
3.4
Proposal for the quality adjustment
•
A distinction of minor and major changes helps to decide whether an explicit or an implicit quality adjustment is necessary. Both, major and minor changes require a quality
adjustment. In general, major changes in the quality of used cars are too complex for
an explicit quality adjustment whereas minor changes can be adjusted explicitly.
•
Minor changes in quality of used cars can only occur in case of observing the same
sub-model over time:
– changes in the equipment of a precise sub-model,
– the age of the car (in months) differs,
– the mileage of the car differs.
•
Major changes in the quality of used cars are:
– the change of the primary model,
– the introduction of a new generation of the primary model (the old version of the
primary model is followed by a new version),
– the change of the sub-model.
Quality adjustment in the case of major changes: Bridged overlap
•
In the case of major changes in quality, no explicit quality adjustment is possible since
the replaced and the replacement product are in principle not comparable at all.
•
If major quality changes occur, bridged overlap shall be applied. The price development of all other models of the same consumption segment and age class which were
not replaced build the bridge between the replaced and the replacement model. In
this way the average price development of a set of comparable models is used.
192
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Quality adjustment in the case of minor changes: Hedonic re-pricing
•
Minor quality changes in the equipment of a precise sub-model can not be adjusted
explicitly since the value of a singular equipment feature which is a component of the
used car is unknown. The depreciation rate of equipment features in a used car is different of the depreciation rate of the whole car itself. Therefore it is not possible to adjust for changed equipment which exhibits a used condition. In order to alleviate the
problem the equipment should not significantly differ from month to month. In practice this means, the observed used car should not contain an amount of additional
equipment features or any modifications. Direct comparison should be applied.
•
Changes in the age (in months) or in the mileage (in kilometres) of the car should be
adjusted by hedonic re-pricing.
•
Note that those variations in age and mileage between two periods can only occur if
the used cars prices are collected on internet platforms or in car magazines.
Chart 135
Example for replacement situations
26 months,
30,000 km
Product specification:
VW Golf V
“Medium-sized used
cars”
Consumption segment
24 months,
25,000 km
M
•
M
•
Age: ~ 2 years
Mileage: ~ 30.000 km
VW Golf VI 1.9 TDi Trendline
M
•
•
Age: ~ 2 years
Mileage: ~ 30.000 km
Further selected models
•
•
...
Minor change (age):
Hedonic re-pricing
Age: ~ 2 years
Mileage: ~ 30.000 km
Major change (new sub-model):
Bridged overlap
Federal Statistical Office, Statistics and Science, Vol. 13/2009
193
Used Cars
Discussion box
It might seem paradox that minor changes are treated with explicit quality adjustment whereas major changes are treated implicitly. However, the car market implicates very complex technological developments which complicate the explicit
measurement of quality changes over time. This applies especially to used cars,
where the constitution of the product appears as a further dimension of quality.
It may be argued that applying bridged overlap for major changes implicate the
risk of a so-called „vintage effect”: Cars of older model types tend to decrease in
price more than models which are at the beginning of their product-life-circle.
This is especially relevant for the case that an old version of a primary model is
followed up by a new version (generation). This could result in a downward bias
when indexes for these models are just linked in using bridged overlap.
But it seems that the assumptions of bridged overlap are met very well in the case
of used cars since the used car market is very competitive, transparent and the
transaction prices are (nearly) equilibrium prices. Therefore, the risk of vintage effects may be small.
•
As stated in the section on the construction of the target sample, for the application
of hedonic re-pricing it is necessary to collect two data sets. In order to calculate the
hedonic equation a comprehensive database is necessary which is called “regression
sample”. The hedonic equation is used to apply quality adjustment in the case of replacement situations which can only occur in the index sample. In the following sections the individual steps are described in detail.
Topic 1: Data needed
•
The compilation of the regression sample as well as of the index sample is described
in the section on the target sample.
•
In order to ensure a reliable regression, additional variables have to be added which
specify the selected primary models:
– brand dummies (for brand cluster),
– size dummies (for size cluster).
•
The variable “brand cluster” can exhibit a value of 1, 2 or 3 and divides the selected
primary models according to the prestige or image of the corresponding make. This
variable serves as a proxy variable for the overall quality and the prestige of the models in order to reflect the difference between makes, like Citroën (brand cluster = 3),
Ford (2) and Mercedes-Benz (1).
•
The variable “size class” can exhibit a value of 1, 2 or 3 and divides the selected primary models within one consumption segment according to the size of corresponding the cars. This variable is important since the models within one consumption segment are still heterogeneous with regards to their size. This variable should reflect the
relation of the sizes of the considered primary models within the consumption segment. For orientation the length of the car can serve which can be obtained from
194
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
www.wikipedia.com for each primary model. In the consumption segment “small used
cars”, this variable serves for the distinction of e.g. a VW Lupo (size class = 1) and a
VW Polo (size class = 3). In the consumption segment “large cars”, this variable
serves for the distinction of e.g. a VW Passat (size class = 1) and a BMW 7 series (size
class = 3). The variable CS is a dummy variable that indicates the consumption segment.
•
The regression sample could look like this:
Brand cluster
Size cluster
Mileage
(1,000 km)
Age (months)
Price (€)
VW
Golf V
1.9 TDi
Trendline
Med.
2
2
42,400
26
13,400
5-2008
MercedesBenz
C-class
180 classic
Med.
1
3
45,800
22
18,790
...
...
...
Month
...
...
Sub-model
7-2008
Brand/Make
CS
Primary model
Chart 136
Example of a data base of a regression sample for used cars
...
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
•
Reformat category variables to dummy variables. These can only take values of 0 or 1
depending on whether the model contains the characteristic value or not. This is relevant for the variables “CS”, “brand_cluster” and “size_cluster”.
•
As a semi-logarithmic function form is applied. Therefore, the numeric variable price
is transformed by logarithm prior to the regression.
•
The variable mileage should be transformed into a relative mileage per year.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
195
Used Cars
CS 2
CS 3
CS 4
Brand cluster 1
...
ln Price
Age (months)
Relative
mileage
(1,000 km p.a.)
0
1
0
0
0
...
9.503
26
19.570
. . . C-class
180 classic
0
1
0
0
1
...
9.841
22
24.981
...
...
...
...
...
...
...
...
...
...
...
...
Sub-model
1.9 TDi
Trendline
Primary model
. . . Golf V
...
CS 1
Chart 137
Example for the modified data base of the regression sample for used cars
Topic 3: Creating the hedonic function
•
After the data has been arranged, the hedonic regression equation can be set up. In
this regard, the price (as dependent variable) should be explained as a function of the
model’s characteristics (independent variables).
•
Statistical programs like SPSS, Stata or SAS are helpful to calculate the regression.
The computer program is set up to perform a linear regression analysis on the data of
the regression sample. The price is to be considered as the “dependent” variable and
the characteristics (age, brand dummies, size dummies, etc.) are considered as “independent” variables.
•
With regard to the general function form of the hedonic regression equation a semilogarithmic function form is suggested. Only the logarithm of the price is taken. If all
the mentioned variables are considered the hedonic regression equation would look
like this:
ln P = b0 + b1 ⋅ age + b2 ⋅ rel. mileage + b3 ⋅ size _ 2 + b4 ⋅ size _ 3 + b5 ⋅ brand _ 2 +
+ b6 ⋅ brand _ 3 + b7 ⋅ CS2 + b8 ⋅ CS3 + b9 ⋅ CS4 + ε
•
The hedonic regression equation contains two continuous variables, age and mileage,
and dummy variables for the consumption segments, the brand clusters and the size
clusters.
•
The category variables brand_cluster and size_cluster might alternatively be treated
as continuous variables since it is possible to treat an “ordinal scale” as if it was an
“interval scale”. The advantage might be to bring down the number of coefficients to
be estimated. The assumption is that there exists an (hypothetical) linear relationship
between the different states of each one of the two categorical variables. For the
brand_cluster, for example, this means, one is assuming the difference between
brands 1 and 2 is exactly the same as the difference between brands 2 and 3.
•
The reference size class is size class 1, the reference brand cluster is brand cluster 1
and the reference consumption segment is consumption segment 1.
196
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
There should be only one common regression equation for the complete used car regression sample (containing all different consumption segments).
•
The computational method used by the computer program should be the ordinary
least squares (OLS).
•
The statistical program computes the coefficients (corresponding to the particular
characteristics) and further statistical values which contain information on the significance of the variables.
•
The regression may yield the following coefficients:
Chart 138
Example for the results of the hedonic regression (fictitious)
Adj. R² = 0.70
Regression coefficients
t-value
b0
Constant
b1
Age
– 0.010
9.646
<–2
>2
b2
Relative mileage
– 0.008
<–2
b3
Size class 2
0.130
>2
b4
Size class 3
0.205
>2
b5
Brand cluster 2
– 0.060
<–2
b6
Brand cluster 3
– 0.217
<–2
b7
CS2
0.229
>2
b8
CS3
0.494
>2
b9
CS4
0.424
>2
•
The variables that were deemed relevant to quality adjust for have been collected as
discussed under Topic 1. These were included into the regression equation. Potentially, some of the considered variables might not be significantly related to the price.
Those “insignificant” variables could be identified by their t-values. At the design
stage when the regression equation is tested, variables with t-values tending to be
within the interval of – 2 to + 2 could be considered for skipping from the regression
equation to be used operatively. One may have to be a little careful here, and some
skilled judgement is called for. It should be noted that the t-test is approximate, as
strictly seen it depends on assumptions that are usually not under perfect control.
•
Particularly in the testing of the regression equation at the design stage, it should be
checked whether the algebraic signs of the coefficients are as expected in view of the
meaning of these. If a “wrong” sign occurs, the possible cause should be searched:
Whether it is just expected random variation, or a misspecification of the variable, or
contaminated data, or something else. Possibly a variable may then be identified as
problematic and ultimately have to be skipped from the regression equation. But
DON'T skip a variable automatically as soon as a “wrong” sign turns up, as doing so
may just sweep problems under the carpet and produce invalid results.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
197
Used Cars
•
In the table presented above, the variables age and relative mileage have a negative
impact on the price. This seems to be highly reasonable. The size class variable exhibits a positive coefficient. This also seems to be reasonable since this indicates a higher
price for larger cars. The negative algebraic sign of the coefficient for brand clusters
also meets the expectations since prestigious brands are indicated with a variable
value of one and other brands with values of two or three.
•
The particular regression coefficients can be interpreted as follows: The coefficient for
a continuous variable like age (b1 = – 0.010) indicates that increasing age of one month
results in an expected price decrease of approximately – 1.0 %.
•
In order to assess the adequacy of the regression estimation, consider the adjusted R²
value. For used cars this value should probably be expected to be around 0.5 – 0.6 or
higher, as in principle a persistent relation of technological characteristics and the
price should be expected. An R² very close to 1 should not normally occur and may indicate severely contaminated data. On the other hand, an adjusted R² considerably
below 0.5 indicates a relatively low explanatory ability of the regression equation. A
possible cause may be that the sample is taken from a fairly narrow and homogeneous stratum, where large characteristics differences do not occur. If so, the results may
possibly be useful anyhow, in spite of the low R²; here some skilled judgement may be
called for. Another possible cause for a low R² may be that some essential characteristics variable has been left out and would have to be included in the regression equation.
Topic 4: Index calculation
•
In the following the index sample is considered.
•
At first those situations where a replacement takes place including a change in the
precise age or mileage have to be detected. Those changes in the characteristic values
constitute quality differences have to be adjusted as described in the following. In the
case of used cars which are observed on internet platforms and car magazines this will
be the usual case since the variables age and mileage might change from observation
to observation.
•
In each replacement situation the values of the hedonic function h for the replaced
and the replacement model has to be calculated. Therefore, insert the variable values
of the respective model in the hedonic equation.
•
The coefficients of the hedonic equation have been calculated in Topic 3 and the variable values of the respective models are collected within the scope of price collection.
•
In the next step, the quality adjustment factor has to be computed. The quality adjustment factor (“g”) is the relation of the values of the hedonic function of replaced and
replacement model. In other words, it is the mathematical ratio of quality for the replacement and the replaced model and refers to only this particular replacement situation:
g=
198
h ( X tB )
h ( X tA−1 )
=
Market value of the replacement
Market value of the replaced model
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
This quality adjustment factor is used to adjust the quality of the replaced model by
multiplying its price with the quality adjustment factor:
Pt A−1 ⋅ g = quality adjusted price of the replaced model
•
To calculate the quality adjusted price development, divide the price of the replacement product by the quality adjusted price of the replaced product:
PtB
Pt A−1
•
⋅g
=
observed price of the replacement
observed price of the replaced model ⋅ g
For the index calculation, insert the quality adjusted price development in cases of replacements instead of the normal term. Here model A has been replaced by model B:
§ PB
PC
PD PE
PF
I = ¨ At =1 ⋅ tC=1 ⋅ tD=1 ⋅ tE=1 ⋅ tF=1
¨P ⋅g P
t =0 Pt =0 Pt =0 Pt = 0
© t =0
·
¸
¸
¹
1/ 5
, with
PtB=1
Pt A=0
being replaced by
PtB=1
Pt A=0 ⋅ g
Topic 5: Refreshing the hedonic function
•
Regarding the periodical refreshing of the hedonic function, no general rule can be recommended.
•
As a pragmatic proposal it seems to be sufficient to re-estimate the hedonic equation
yearly as there is no rapid technological development from month to month in existing
old cars. This proposal has a practical component since many data bases (e.g. the figures of the ownership changes) might be updated yearly.
•
Furthermore a person or a group with product knowledge should be assigned, to monitor the market for signs of the necessity for updating the hedonic function. In the case
that there are hints for a need to refresh the hedonic function, a new regressions should
be compiled. Otherwise, an annual re-estimation seems to be sufficient.
3.5
Cost estimation
One-time effort for the introduction
•
A staff member with moderate knowledge of econometric methods is required to implement the method. Specification of the regression equation is basically standard
procedure in this particular market and not much can be changed or tested. Most work
will be in software development and creation (if needed) and management of new databases.
•
Collection of data for the estimation of the regression equation: about 1 to 3 manmonths, depending on the quality and availability (or access to) of the new databases.
An independent survey on European NSI was conducted and, on average, 4 manmonths are generally suggested as necessary to implement the method. If there are
members with econometric skills, the estimated time needed drops to 2 months; on
the contrary, if there is none expert on the methods, the estimated time needed increases to 6 months.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
199
Used Cars
Current effort of price collection
•
No expert knowledge seems necessary. Price collection is still an issue to be discussed
regarding this particular market, especially due to wide cultural differences among
European countries. Also, no transaction prices are usually available so the quality
of proxy variables should be thoroughly checked and only when deemed good should
they be used.
•
It depends primarily on data sources. Price collection can be done either centrally or
by standard procedures. Central collection can be done quite rapidly, 1 man-day
should be enough. In what concerns standard price collection, 3 man-days seems a
good estimate. These estimates are based on a survey.
Current effort of index calculation
•
The price statistician should have moderate econometric knowledge when setting up
the system. On a regular production system, any price statistician is able to handle it
(under an automatic or semi-automatic process). If the index calculation is done manually, then the price statistician should have some econometric skills.
•
As long as the databases are available, the index calculation is quite fast. Around 4 to
8 man-hours in an automatic environment (in order to check inputs and results) might
be sufficient. Multiply by a factor of 3 if done manually.
3.6
Examples
Example: Generation of consumption segments
•
In the first step, the used car market has to be divided into different consumption segments according to the size of the car. This step should correspond to the procedure
for new cars where the pragmatic proposal is to define four consumption segments:
– small used cars,
– medium-sized used cars,
– large used cars,
– special used cars.
•
As in the case of new cars, the universe of primary models consists of approximately
300 cars in each country. Each of these primary models has to be assigned to the
identified consumption segments. The distinguishing criterion is the size of the car.
– The data on the particular primary models that are registered in the country can
be obtained from the national central car register.
200
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 139
Example for the assignment of concrete models to the consumption segments
Central car register
Registrations
per year
Primary models
Small used cars
Medium-sized used cars
Large used cars
Special used cars
•
VW Golf
...
Opel Corsa
...
Mercedes-Benz SLK
...
Mazda 323
...
...
...
As a result, each consumption segment consists of precise primary models.
Chart 140
Consumption segments and the corresponding primary models
Consumption Segments
Small used cars
Primary
model
Ownership
changes
Medium-sized used cars
Primary
model
Ownership
changes
Large used cars
Primary
model
Ownership
changes
Special used cars
Primary
model
Ownership
changes
Opel Corsa
26,082
VW Golf
73,692
MercedesBenz
C-class
18,631
Opel Zafira
14,620
VW Polo
16,332
Opel Astra
36,889
BMW 5
series
16,247
VW Caravelle
8,305
Renault
Twingo
11,094
MercedesBenz
C-class
34,851
Audi A6
12,937
VW Sharan
7,162
...
...
...
...
...
...
...
...
sum
192,684
sum
440,878
sum
73,504
sum
124,923
•
The consumption segments have to be weighted in accordance to their expenditure
share.
•
To obtain the total expenditure of a consumption segment, multiply the total number
of ownership changes from the previous year for the particular consumption segment
by its average price.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
201
Used Cars
–
The total number of ownership changes for each consumption segment can be
taken from the central car register. The average price for a consumption segment
can be obtained from the section on price collection below. It should refer to one
specific arbitrary age class which has to be the same for all consumption segments.
Chart 141
Calculation of expenditure weights for the consumption segments
Weighting the consumption segments
Small used cars
Total number
Medium-sized used cars
Average
Total number
of ownership
price
changes
(in €)
192,684
7,500
Large used cars
Average
Total number
of ownership
price
changes
(in €)
440,878
10,000
Special used cars
Average
Total number
Average
of ownership
price
of ownership
price
changes
(in €)
changes
(in €)
73,504
15,000
124,923
12,500
Total expenditure (in billion €):
1.44
4.40
1.11
1.56
Total expenditure of all consumption segments: 8.51 billion €
Expenditure share (in %):
17.0
51.8
12.9
18.3
Example for the construction of a target sample
•
The following example refers to the index sample.
•
(1) Stratify the universe of ownership changes by age classes:
– Two or three age classes should be generated following “peaks” of ownership
changes in the national market. Therefore, the distribution of ownership changes
according to the age of the cars should be checked. In Germany, the number of
ownership changes according to age exhibits two specific “peaks” in one- and
three-year-old cars. For the German market, it would be recommendable to generate these two age classes.
202
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 142
Number of ownership changes for different ages of the car
Number of
ownership
changes
Selected age classes
1
–
•
3
5
Age (in years)
The more general pragmatic proposal is to form three age classes: Two-year-old,
three-year-old and four-year-old cars within each consumption segment. In the
following this recommendation will be considered.
(2) Select primary models by cut-off sampling (probability sampling is also possible):
– In each consumption segment and age class, include all primary models until a
cumulative market share of 50 % to 60 % is reached.
– This means to draw a sample for the recommended twelve segments: Four consumption segments each contain three age classes.
– An example for the consumption segment “medium-sized used cars” and the age
class of two-year-old cars:
Chart 143
Exemplary target sample for the consumption segment “medium-sized used cars”
Target sample for “medium-sized used cars”and two-year age class
Primary model
Ownership changes
(from the previous year)
Market
share
Cumulative
market share
in %
VW Golf
32,040
16.7
16.7
Opel Astra
16,039
8.4
25.1
Mercedes-Benz C-class
15,153
7.9
33.0
BMW 3 series
14,586
7.6
40.6
Ford Focus
14,028
7.3
47.9
Audi A4
13,097
6.8
54.7
Federal Statistical Office, Statistics and Science, Vol. 13/2009
203
Used Cars
–
•
The selected primary models represented in the table make up 54.7 % of the market for “medium-sized used cars” within the considered age class. In this example quantity shares are applied for sampling, but if available it would be better to
apply expenditure shares.
Assignment of precise sub-models by purposive sampling
– The selected primary models have to be implicitly weighted according to the relation of their sales figures. The proposal is collect 25 to 30 price observations per
consumption segment which spread over the three age classes. This results in a
sample size of approximately 10 per stratum. Now it is to determine how many
price observations to collect for each primary model.
– The distribution of the price observations might look like this (implicit weighting):
Chart 144
Number of observations to be included
Distribution of price observations
“medium-sized used cars” and two-year age class
Primary model
Market share
Share in the
sample (ratio)
in %
in %
Number of price
observations
16.7
Æ
30.5
3
Opel Astra
8.4
Æ
15.4
1
Mercedes-Benz C-class
7.9
Æ
14.4
1
BMW 3 series
7.6
Æ
13.9
1
Ford Focus
7.3
Æ
13.3
1
Audi A4
6.8
Æ
12.5
1
54.7
Æ
VW Golf
Total
–
–
–
204
100
8
In this example, three sub-models for the primary model VW Golf have to be selected in order to reflect its market share of the consumption segment “mediumsized used cars” within the age class of two-year-old cars.
The sub-model selection has to be conducted by purposive sampling according
to the following criteria:
– Should be a well sold sub-model within the range of the corresponding primary model.
– Should be of the least extensive equipment level.
– Must not be a special edition.
The following chosen sub-models meet these criteria. Additionally, the approximate ratio of diesel and petrol engines is considered.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 145
Precisely selected sub-models for the primary model “VW Golf”
Sub-model selection for VW Golf V
Observation
•
Sub-model
1
Golf V 1.6 Trendline
2
Golf V 1.9 TDi Trendline
3
Golf V 2.0 TDi Trendline
Collect offer prices for the chosen sub-models on internet platforms and in car magazines. The price collection is described in the section on the construction of the target sample.
Example: Replacement strategy
•
Representativity check within the scope of the monthly price collection:
– The price collector (product expert) should refer to the selected sub-models of the
previous month. In the case that a specific sub-model of a precise age is no longer
observable it should be replaced. Maybe, the considered primary model is followed by a new version: For example, the follow-up model for VW Golf IV is the VW
Golf V.
•
Representativity check for the periodical renewal of the sample.
–
The target sample should be updated regularly, maybe yearly, depending on the
publication of the statistics of the ownership changes by the central car register.
Example: Quality adjustment applying hedonic re-pricing
•
It is important to distinguish between minor and major changes
•
Major changes can occur in three situations:
– Change of the primary model due to a former production stop.
– The old version of a primary model is followed up by a new generation: e.g. VW
Golf IV is followed by a VW Golf V.
– The sub-model is replaced since it is no longer available.
•
Minor changes:
– Small variations in the equipment of the used cars.
– The product observations of a particular sub-model differ in age and/or in mileage.
•
Differences in the equipment of the used cars can not be considered.
•
Variations in age and mileage of the corresponding used cars might be the usual case
since prices are observed on internet platforms or car magazines. In this case, hedonic
re-pricing is applied.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
205
Used Cars
Example for quality adjustment by applying hedonic re-pricing:
Topic 1: Data needed
•
The compilation of the regression sample is described in the section on the target
sample.
•
The regression sample could look like this:
Brand cluster
Size cluster
Mileage
(1,000 km)
Age (months)
Price (€)
VW
Golf V
1.9 TDi
Trendline
Med.
2
2
42,400
26
13,400
5-2008
MercedesBenz
C-class
180 classic
Med.
1
3
45,800
22
18,790
...
...
...
Month
...
...
Sub-model
7-2008
Brand/Make
CS
Primary model
Chart 146
Example of a data base of a regression sample for used cars
...
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
•
Since a semi-logarithmic function form is applied, the price is transformed by logarithm.
•
The category (alphanumerical) variable consumption segment is transformed into dummy variables. It can only take values of 0 or 1 depending on whether the model fulfils the characteristic or not.
CS 2
CS 3
CS 4
Brand cluster 1
...
ln Price
Age (months)
Relative
mileage
(1,000 km p.a.)
0
1
0
0
0
...
9.503
26
19.570
. . . C-class
180 classic
0
1
0
0
1
...
9.841
22
24.981
...
...
...
...
...
...
...
...
...
...
...
206
...
Sub-model
1.9 TDi
Trendline
Primary
model
. . . Golf V
...
CS 1
Chart 147
Example for the modified data base of the regression sample for used cars
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Topic 3: Creating the hedonic function
•
The procedure of data collection for the regression sample and the subsequent creation of the hedonic function is exemplarily described in the section on quality adjustment. The coefficients of the recent hedonic equation can be seen in the table below
again:
Chart 148
Example for the results of the hedonic regression (fictitious)
Adj. R² = 0.70
Regression coefficients
t-value
•
b0
Constant
9.646
>2
b1
Age
– 0.010
<–2
b2
Relative mileage
– 0.008
<–2
b3
Size class 2
0.130
>2
b4
Size class 3
0.205
>2
b5
Brand cluster 2
– 0.060
<–2
b6
Brand cluster 3
– 0.217
<–2
b7
CS2
0.229
>2
b8
CS3
0.494
>2
b9
CS4
0.424
>2
All the characteristics in the table above are deemed to impact the price signifycantly.
Topic 4: Index calculation
•
The example provided here focuses on the index calculation.
•
The following chart shows a replacement situation from the month 1 to the month 2
which includes a change in the characteristics age and mileage. In those cases quality
adjustment should be applied. The replacement situation from a VW Golf V to the VW
Golf VI constitutes a future case where the next generation of the primary model is
becomes relevant. In this case, bridged overlap should be applied.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
207
Used Cars
Chart 149
Example for replacement situations
24 months,
25,000 km
26 months,
30,000 km
“Medium-sized used
cars”
Consumption segment
Product specification:
A
M
VW Golf VI 1.9 TDi Trendline
• Age: ~ 2 years
• Mileage: ~ 30.000 km
Further selected models
...
t=1
Hedonic re-pricing
•
VW Golf V
• Age: ~ 2 years
• Mileage: ~ 30.000 km
A
•
•
t=2
Age: ~ 2 years
Mileage: ~ 30.000 km
t=3
Bridged overlap
Regarding the replacement from month 1 to month 2, the values of the hedonic function have to be calculated by applying the recent hedonic function for the observations, both, the replaced and the replacement model. As you know the hedonic regression equation is a semi-logarithmic function. Hence, the exponential function is
needed to obtain the values of the hedonic function which are needed for the calculation of the quality adjusted price. This is done by taking the anti-log of the sum
of the constant and the respective products of characteristics and coefficients:
ht = e ( b0 + b1 ⋅ v1 + ... + b9 ⋅ v 9 )
•
208
To calculate the values of the hedonic function for both models, it is necessary to
know the particular characteristics, which can be seen in the following table.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Opel
Astra H
1.6
1
MercedesC-class
Benz
...
2
VW
...
•
...
...
Golf V
...
180 edition
...
1.9 TDi
Trendline
...
...
Price (€)
1
Age (months)
1.9 TDi
Trendline
Mileage
(1,000 km)
Golf V
Size cluster
VW
Brand cluster
1
CS
Sub-model
Primary
model
t
Brand/Make
Chart 150
Example of the index sample
Med.
2
2
30,000
24
13,000
2
2
33,000
25
11,200
Med.
1
3
28,000
23
16,000
...
...
...
...
...
...
Med.
2
2
38,000
26
12,000
...
...
...
...
...
...
The table containing the transformed characteristics might look like this:
Primary model
CS 1
CS 2
CS 3
CS 4
Brand cluster
Size cluster
Age (months)
ln Price
Golf V
1.9 TDi
Trendline
0
1
0
0
2
2
15
24
9.4727
2
Golf V
1.9 TDi
Trendline
0
1
0
0
2
2
17.538
26
9.3927
...
...
...
...
...
...
...
...
...
...
Sub-model
t
1
...
•
Relative mileage
(1,000 km p.a.)
Chart 151
Example of the characteristics of replaced and replacement model
...
Inserting the variable values into the recent hedonic equation yields the values of
the hedonic function for the particular models:
htA=1 = e ( 9.646 + ( −0.010 ) ⋅ 24 + ( −0.008 ) ⋅ 15 + 0.130 − 0.060 + 0.229 ) = 14,544.92 €
htA=2 = e( 9.646 + ( −0.010 ) ⋅ 26 + ( −0.008 ) ⋅ 17.538 + 0.130 − 0.060 + 0.229 ) = 13,974.61 €
•
The mathematical ratio of the values of the hedonic function h of the replacement
and the replaced model yields the quality adjustment factor “g” which can be calculated as follows (the subscript r = 1 indicates that the regression was run in month 1):
Federal Statistical Office, Statistics and Science, Vol. 13/2009
209
Used Cars
grA=,t1=2 =
•
htA=2 13,974 .61€
=
= 0.9608
htA=1 14,544.92 €
To calculate the index from month 1 to month 2 the following formula is applied:
§ P A PB PC
I2 = ¨ t A=2 ⋅ t B=2 ⋅ tC=2
¨P
© t =1 Pt =1 Pt =1
•
·
¸
¸
¹
1/ 3
Suppose, for each observed model the variables age and relative mileage change
from month to month. As hedonic re-pricing is applied for quality adjustment, for each
model the price development has to be calculated adjusting for the difference in the
particular variable values. For the price development of the model A (VW Golf 1.9 TDi
Trendline) this means:
Pt A=2
Pt A=1
•
is replaced by
Pt A=2
Pt A=1 ⋅ g rA=,t1=2
The price index for the particular sub-model would be calculated as:
I tA=2 =
•
Pt A=2
Pt A=1
⋅ g rA=,t1=2
=
12,000 .00 €
= 0.9607
13,000 .00 € ⋅ 0.9608
The index for month 2 is calculated in the following (including the application of
hedonic re-pricing):
§
PA
PB
PC
I2 = ¨ A t =2A,t =2 ⋅ B t =2B,t =2 ⋅ C t =2C ,t =2
¨P ⋅g
Pt =1 ⋅ g r =1 Pt =1 ⋅ g r =1
© t =1 r =1
·
¸
¸
¹
1/ 3
Discussion box
For various reasons used cars is a tricky topic in the price index and may have deceptive pitfalls. Reservation must therefore be made that optimal advice may not yet
have been found.
210
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
4
Computers
•
The Task Force on Quality Adjustment and Sampling rated hedonics as an A-method.
100 % option pricing (for desktops), supported judgement as a proxy to 100 % option pricing, and bridged overlap (applied under certain conditions) were assessed as
B-methods.
•
The empirical studies of the CENEX members showed that hedonic re-pricing is more
suitable than bridged overlap for desktop computers, whereas both methods yield
similar results for notebooks.
•
The following section deals with the application of hedonic re-pricing for both desktops and notebooks.
4.1
•
4.2
Proposal for the identification of consumption segments
The suggested consumption segments are described in the non-hedonic section for
computers.
Proposal for the construction of a target sample
•
The proposal for the sampling procedure is valid for both the regression and the index
sample. It is already described in the non-hedonic section for computers.
•
For the application of hedonic re-pricing, usually two samples have to be collected.
The first sample is called “regression sample” which serves the calculation of the
hedonic regression equation. The second sample is called “index sample” and is used
to calculate the price index. These two samples need to be collected for each desktops
and notebooks separately.
•
For each of desktops and notebooks, it is suggested to compile one (representative)
regression sample every month. Therefore, there is no reason to compile a separate
index sample. In this case, the regression sample can also be used as index sample
using all collected information cost effectively.
The regression sample
•
To apply the hedonic method, a regression sample has to be collected. This sample
is needed for the calculation of a systematic relationship between the prices of the
desktops and notebooks respectively and their particular characteristics. Resulting
from this relationship the hedonic equation can be established.
•
The regression sample can be collected according to the same procedure as the index sample which is described in the non-hedonic part. The index sample may also
be a sub-sample of the regression sample or it can be independent from the regression sample.
•
To find out which characteristics are relevant to quality adjust computers for, if possible try several approaches:
– Use publicly available information from advertising, shop displays, computer magazines etc.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
211
Computers
–
–
–
Use experience from other national statistical offices, which may be further ahead
in implementation of hedonics for computers, or from other research.
Consult product experts in the computer business – but remember that the consumer perspective is what is relevant, not engineering effort or so as such.
A more ambitious approach could be to conduct a pilot study with a rather broadminded collection of characteristics. But do not do that if you do not already have
a rather clear mind about what you are looking for, as else you would likely just
waste your and respondents’ resources.
•
For desktops and notebooks different sets of characteristics need to be collected.
However, the same set of characteristics needs to be collected for each desktop, regardless of whether it belongs to the regression sample or to the index sample. Also
for each notebook the same set of characteristics has to be collected regardless of
which sample it belongs to.
•
It is important not to consider those variables that reflect fashion. This could be for
example the colour of a notebook.
•
The following characteristics may serve as orientation for desktops:
– Brand (e.g. Dell, LG, none, . . .).
– Processor type (e.g. Intel Pentium4, AMD Athlon, . . .).
– Processor speed (in GHz).
– Hard-disk drive capacity, abbreviated: HDD (in GB).
– Working memory, abbreviated: RAM (in GB).
– Operating system (Windows XP, Linux, none, . . .).
– Other software (MS Office, . . .).
– Type of drive (CD-rewriter, DVD player/rewriter, . . .).
– Graphical card.
– ...
•
The above mentioned characteristics only serve as orientation. More variables may
be important, such as soundcards or TV cards which may be included in the desktop
computer.
•
For notebooks, the same set of characteristics can be collected. Additionally, two
very important characteristics are the size and type of the display, as the display of
a notebook is one of the most price-determining characteristics. Therefore, the following characteristics may serve as orientation for notebooks:
– Type of display (SVGA, VXA, . . .).
– Size of the display.
– Brand (e.g. Dell, LG, none, . . .).
– Processor type (e.g. Intel Pentium4, AMD Athlon, . . .).
– Processor speed (in GHz).
– Hard-disk drive capacity, abbreviated: HDD (in GB).
– Working memory, abbreviated: RAM (in GB).
– Operating system (Windows XP, Linux, none, . . .).
– Other software (MS Office, . . .).
212
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
–
–
–
Type of drive (CD-rewriter, DVD player/rewriter, . . .).
Graphical card.
...
•
Given the rapid technological change of computers, new features are included with
certain regularity. This is not so frequent that it would be necessary to select a new
set of relevant characteristics every month, but the price statistician should be aware
of the possibility and at least check this once a year.
•
Whenever a new characteristic (say, a new kind of processor or a new drive) is introduced, the price statistician should judge whether the new characteristic is important
enough to be included. Therefore, a continuous market observation is necessary.
•
Of course, new characteristics can only be used to adjust for quality differences if this
characteristic is collected in both the price reference month and the current month.
•
An appropriate form which could be distributed to the price collectors should precisely
ask for the considered characteristics and might look like this:
Chart 152
Example of a price collection form
Desktop Computer
Price (€):
Brand and model description:
Processor type
Processor speed (GHz):
HDD capacity (GB):
Working memory (GB):
Type of working memory
Operating system
Further pre-installed software
Type of drive
Number of USB interfaces
Graphic card
Graphic memory (MB):
TV card
Sound card
Shop: _____________________________
Date: _____________________________
•
299.–
M&M Value HD
AMD AM2 Athlon64X2 5200 + EE
2.6
320
2.048
DDR2
Windows Vista
Antivirus
Dual Format DVD writer
6
ASUS EAH 3450 HTP
256
X
X
For the estimation of the regression equation it is desirable that price observations
used reflect a state of equilibrium (balance between supply and demand). That means
that if it is practically possible, it would normally be preferable that temporarily re-
Federal Statistical Office, Statistics and Science, Vol. 13/2009
213
Computers
duced prices due to promotion offers or stock clearances can be identified in the price
collection and not used for estimating the regression equation. (However, note that
in the index calculation, on the other hand, reduced prices shall not be excluded.)
Minimum size of the target sample
•
The regression samples should contain 100 to 300 observations for each desktops
and notebooks, depending on the number of characteristics that are used in the regression equation (see section on quality adjustment). As a rule of thumb, 15 to 20
observations per characteristic which is finally included in the regression equation
should be collected.
•
The empirical studies conducted within the CENEX HICP Quality adjustment showed
that for each index sample, i.e. one for desktops and one for notebooks, a number
of around 25 observations seems to be reasonable as long as representativity is ensured. The index sample may or may not be a sub-sample of the regression sample.
The size of the index sample needs to remain constant.
4.3
Proposal for the replacement strategy
General remarks
•
The general remarks for the replacement strategy are the same as described in the
non-hedonic section for computers. Please consider that the replacement strategy only refers to the index sample. The regression sample is a cross-sectional sample where
no “replacements” can occur.
•
In order to check whether the chosen products are still representative of the market
the market has to be observed regularly. This regular observation can also be used to
determine general changes (e.g. new equipment, new brands, technological progress,
new characteristics) which might have to be included into the regression equation
(see section on quality adjustment below).
4.4
•
Proposal for the quality adjustment
As stated in the section on the construction of the target sample, for the application
of hedonic re-pricing it may be necessary to collect two data sets for each desktops
and notebooks. That means that there is one regression sample and one index sample
for desktops, and one regression sample and one index sample for notebooks. If a
regression is carried out monthly the regression sample and the respective index
sample may be identical. In order to calculate the hedonic equation a comprehensive
database is necessary which is called “regression sample”. The hedonic equation is
used to apply quality adjustment in the case of replacement situations which can only
occur in the index sample. In the following sections the individual steps are described
in great detail.
Topic 1: Data needed
•
214
The compilation of the regression sample as well of the index sample is described in
the section on the target sample.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
The regression sample could look like this:
t
Model
Price (in €)
Brand
Processor
type
Processor
speed (GHz)
RAM (GB)
HDD (GB)
OS
Type of
drive
...
Chart 153
Example of a data base of a regression sample for desktops
1
A
599.–
DELL
ATH
2.200
1.024
60
WinXP
AA
...
1
B
799.–
DELL
ATH
3.000
2.048
80
WinXP
AA
...
1
C
599.–
DELL
P4
1.900
1.024
40
WinXP
AA
...
...
...
...
...
...
...
...
...
...
...
•
The variable “Model” is not needed for the calculation but makes the identification
of individual desktop models easier.
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
•
Also for computers, brands might be “catch-all” variables that contain price effects not
included in the other variables. If the number of different brands in the sample is not
too big, they might each get a separate dummy. But if the number of brands is big,
grouping them might be better, as that would keep the number of variables lower.
•
Reformat category variables to dummy variables. These can only take values of 0 or 1
depending on whether the model is equipped with a specific feature or not.
•
As a semi-logarithmic function form is applied, only the price is transformed by logarithm prior to the regression.
t
Model
Price (ln)
Brand A
Brand C
Processor
type ATH
Processor
type Cel
Processor
speed (GHz)
RAM
HDD (GB)
OS: Win2000
...
Chart 154
Example of the modified data base of a regression sample for desktops
1
A
6.3953
0
0
1
0
2.200
1.024
60
0
...
1
B
6.6834
0
0
1
0
3.000
2.048
80
0
...
1
C
6.3953
0
0
0
0
1.900
1.024
40
0
...
...
...
...
...
...
...
...
...
...
...
...
...
Federal Statistical Office, Statistics and Science, Vol. 13/2009
215
Computers
Topic 3: Creating the hedonic function
•
After the data has been arranged, the hedonic regression equation can be set up. In
this regard, the price (as dependent variable) should be explained as a function of
the model’s characteristics (independent variables).
•
Statistical programs like SPSS, Stata or SAS are helpful to calculate the regression.
The computer program is set up to perform a linear regression analysis on the data
of the regression sample. The price is to be considered as the “dependent” variable
and the characteristics (brand dummies, processor speed, etc.) are considered as “independent” variables.
•
With regard to the general function form of the hedonic regression equation, for computers a semi-logarithmic function form is suggested. Only the logarithm of the price
is taken. The hedonic regression equation could, for example, look like this:
ln P = b0 + b1 ⋅ brand a + b2 ⋅ brand c + b3 ⋅ processor ATH + b4 ⋅ processor Cel +
+ b5 ⋅ processor speed + b6 ⋅ RAM + b7 ⋅ HDD + b8 ⋅ OS + ε .
Since the computer market is developing very rapidly, this regression equation contains only exemplary variables which are relevant at the time being.
•
The hedonic regression equation contains dummy variables – brand a, brand c, processor ATH, processor Cel and operating system (OS) – and continuous variables –
processor speed, working memory (RAM), hard disk drive capacity (HDD). For each
dummy variable, one characteristic value has to be defined as default (reference) value. In the example above, for the brands, Brand b serves as the default value, for the
processor types, the P4 processor serves as a reference, and for the operating systems, Windows XP serves as default value. No dummies are included in the regression
equation for the default values.
•
The statistical program computes the coefficients (corresponding to the particular characteristics) and further statistical values which contain information on the signifycance of the variables.
•
The regression may yield the following coefficients:
216
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 155
Example of the regression coefficients
Regression coefficients *
b0
Constant
3.5115
b1
Brand a
– 0.3100
b2
Brand c
– 0.3141
b3
Processor: Athlon
– 0.1440
b4
Processor: Celeron
– 0.3488
b5
Processor speed
0.1439
b6
Working memory
0.0732
b7
Hard disk drive capacity
0.0650
b8
Operating system: Windows 2000
– 0.7930
* Default values: brand b, P4 processor, Windows XP.
•
The variables that were deemed relevant to quality adjust for have been collected as
discussed under Topic 1. These were included into the regression equation. Potentially, some of the considered variables might not be significantly related to the price.
Those “insignificant” variables could be identified by their t-values. At the design
stage when the regression equation is tested, variables with t-values tending to be
within the interval of – 2 to + 2 could be considered for skipping from the regression
equation to be used operatively. One may have to be a little careful here, and some
skilled judgement is called for. It should be noted that the t-test is approximate, as
strictly seen it depends on assumptions that are usually not under perfect control.
•
Particularly in the testing of the regression equation at the design stage, it should be
checked whether the algebraic signs of the coefficients are as expected in view of the
meaning of these. If a “wrong” sign occurs, the possible cause should be searched:
Whether it is just expected random variation, or a misspecification of the variable,
or contaminated data, or something else. Possibly a variable may then be identified
as problematic and ultimately have to be skipped from the regression equation. But
DON'T skip a variable automatically as soon as a “wrong” sign turns up, as doing so
may just sweep problems under the carpet and produce invalid results.
•
The particular regression coefficients can be interpreted as follows: The coefficient for
a continuous variable like processor speed (b5 = 0.1439) indicates that a faster processor of 1 GHz results in an expected price increase of approximately 14.39 %. The
coefficient for a dummy variable like “Brand a” (b1 = – 0.3100) means that the price
difference of “Brand b” (as reference brand) and “Brand a” is approximately – 31.00 %.
That means, a replacement of a “Brand b” computer by a “Brand a” computer inherits
a price decline of approximately 31.00 %.
•
In order to assess the adequacy of the regression estimation, consider the adjusted
R² value. For computers this value should probably be expected to be around 0.5 – 0.6
or higher, as in principle a persistent relation of technological characteristics and
the price should be expected. An R² very close to 1 should not normally occur and
may indicate severely contaminated data. On the other hand, an adjusted R² con-
Federal Statistical Office, Statistics and Science, Vol. 13/2009
217
Computers
siderably below 0.5 indicates a relatively low explanatory ability of the regression
equation. A possible cause may be that the sample is taken from a fairly narrow and
homogeneous stratum, where large characteristics differences do not occur. If so, the
results may possibly be useful anyhow, in spite of the low R²; here some skilled judgement may be called for. Another possible cause for a low R² may be that some essential characteristics variable has been left out and would have to be included in the regression equation.
Topic 4: Index calculation
•
In the following the index sample is considered which contains at least around 25 price
observations for each desktops and notebooks.
•
At first those situations where a replacement takes place have to be detected. In these
cases the sampled models change from one month to the next which implies different
characteristic values and thus different quality levels of the two products. Those quality differences have to be adjusted as described in the following.
•
In each replacement situation the values of the hedonic function h for the replaced
and the replacement model has to be calculated. Therefore, insert the variable values
of the respective model in the hedonic equation. The coefficients of the hedonic equation have been calculated in Topic 3 and the variable values of the respective models
are collected within the scope of price collection.
•
In the next step, the quality adjustment factor has to be computed. The quality adjustment factor (“g”) is the relation of the values of the hedonic function h of replaced
and replacement model. In other words, it is the mathematical ratio of quality for the
new and the old model and refers to only this particular replacement situation:
g=
•
h ( X tB )
Market value of the replacement
=
A
h ( X t −1 ) Market value of the replaced model
This quality adjustment factor is used to adjust the quality of the replaced model by
multiplying its price with the quality adjustment factor:
Pt A−1 ⋅ g = quality adjusted price of the replaced m odel
•
To calculate the quality adjusted price development, divide the price of the replacement product by the quality adjusted price of the replaced product:
PtB
Pt A−1
•
⋅g
=
observed price of the replacement
observed price of the replaced model ⋅ g
For the index calculation, insert the quality adjusted price development in cases of replacements instead of the normal term. Here model A has been replaced by model B:
§ PB
PC
PD PE
PF
I = ¨ At =1 ⋅ tC=1 ⋅ tD=1 ⋅ tE=1 ⋅ tF=1
¨P ⋅g P
t =0 Pt =0 Pt =0 Pt = 0
© t =0
218
·
¸
¸
¹
1/ 5
, with
PtB=1
Pt A=0
being replaced by
PtB=1
Pt A=0 ⋅ g
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Topic 5: Refreshing the hedonic function
•
Monitor the market in order to consider technological progress and market changes.
•
As the product-specific market develops the validity of the calculated linear regression
can change. Especially the significance of the variables as well as the regression coefficients themselves can change since the product undergoes technological progress.
In order to reflect those technological changes, the regression equation has to be
updated regularly. The empirical studies conducted within the scope of the CENEX
showed that for computers the relation between prices and characteristics changes
rapidly from month to month. That means every month a new regression has to be calculated. Therefore, a new regression sample has to be collected monthly, following
the procedure described for the target sample. In this case, the same characteristics
variables are used as before, only their respective coefficients are updated.
•
Approximately once a year, the set of relevant characteristics should be reviewed and
possibly adjusted. After that, the same set of characteristics is used in the monthly
regressions.
4.5
Cost estimation
One time effort for introduction
•
The introduction of the hedonic method requires more effort and expertise than upkeeping it. The introduction requires:
– setting up a sample and determining where and how many models should be observed,
– setting up a system of smoothly processing collected data on prices and characteristics,
– setting up a system that automatically calculates the hedonic regression and the
resulting indices.
•
Both require expertise in statistical analysis methods or econometrics for the specification of the regression function, product expertise for the selection of the relevant
characteristics and the setting up of the sample and IT-expertise to design a system
to process the data to yield the index. The monthly calculation of the index does not
require much econometric or IT-expertise, barring potential catastrophes. Only when
the set of characteristics is re-examined once each year, additional econometric and
IT-expertise is required to some extent.
•
Definition of a sample and determination of relevant characteristics: ca. 2 man-months
with statistical/econometric expertise. Setting up a system of efficiently processing
data and calculating the index: ca. 3 man-months with adequate programming expertise.
Current effort of price collection
•
Ideally, the price collector should have sufficient knowledge of computers to be able
to determine which computers should be included in the sample (e.g. those that are
representative). In the case of observation through the internet, the price statistician
Federal Statistical Office, Statistics and Science, Vol. 13/2009
219
Computers
will often also be the one who collects prices. In that case, expert knowledge should
be guaranteed.
•
If the price collector does not have much technical expertise on computers, clear
guidelines for selecting replacement items are necessary. The ONS currently uses a
clear and exhaustive step-by-step procedure, so this approach can work in practice.
•
Based on Dutch experience, where computer prices and characteristics are only collected online, the collection of prices and characteristics of 100 observations takes
about two days. Having two samples (desktops and notebooks) of 150 models each
would thus take about one week each month for price collection. This includes both
collecting and processing the data on prices and characteristics.
Current effort of index calculation
•
Once the systems for processing observed prices and characteristics and for the calculation of the index have been implemented, the monthly index calculation should
not take too much time and expertise. Software skills (MS Access, MS Excel, SPSS or
any other statistical application) are sufficient.
•
Even if the index calculation takes place using a less advanced system, a price statistician with adequate skills in the relevant applications should be able to perform the index calculation.
•
If a system has been developed to calculate the index automatically, the index calculation is a matter of seconds. When done manually using e.g. SPSS and Excel, one day
should be enough for the index calculation (data processing is already included in the
above section).
4.6
Examples
Weighting of consumption segments:
•
Explicit weights of the consumption segments should stem from a reliable data source.
For example, an external agency could report the following relative expenditure shares
of desktops and notebooks:
– desktops: 40 %,
– notebooks: 60 %.
•
In the following, we will only refer to desktops.
Example for the construction of a target sample:
•
In the first step representative outlets have to be chosen. In our example, let us assume that there is no difference in the price development throughout regions, so that
the regional dimension is no major issue.
•
The next step is to select representative models on the internet sites of the chosen
computer stores where the prices can be collected centrally. For the index sample,
at least 25 prices should be collected from the retailers for each desktops and notebooks. For the regression sample, 100 to 300 observations are necessary.
220
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
Besides the price a set of characteristics has to be collected for each computer model.
The table below shows some exemplary characteristics that are deemed to impact the
price of desktop computers. The set of variables could be different in reality, of course.
•
Consider in month 1, the price collector collects a number of representative desktop
computers. The data collection table for desktop computers could look like this:
Model
Brand
Processor
type
Processor
speed (GHz)
RAM (GB)
HDD (GB)
OS
Type of drive
Price
in t = 1
...
Chart 156
Example of the price collection table for the index sample for desktops (t = 1)
A
DELL
ATH
2.200
1.024
60
WinXP
AA
599.– €
...
B
DELL
ATH
3.000
2.048
80
WinXP
AA
799.– €
...
C
DELL
P4
1.900
1.024
40
WinXP
AA
599.– €
...
...
...
...
...
...
...
...
...
...
...
Example for the replacement strategy:
•
The following replacement situation refers to the index sample only.
•
Suppose that in month 2 the selected desktop computer model “C” is no longer available or no longer well sold and it is replaced by a more representative alternative desktop model “D”.
•
Model
Brand
Proessor
type
Processor
speed (GHz)
RAM (GB)
HDD (GB)
OS
Type of drive
Price
in t = 1
Price
in t = 2
Chart 157
Example of the price collection table for the index sample for desktops (t = 2)
A
DELL
ATH
2.200
1.024
60
WinXP
AA
599.– €
599.– €
B
DELL
ATH
3.000
2.048
80
WinXP
AA
799.– €
749.– €
C
DELL
P4
1.900
1.024
40
WinXP
AA
599.– €
D
DELL
P4
2.660
1.024
40
WinXP
AA
649.– €
The different models exhibit different characteristics with inherent different qualities.
To match the products quality adjustment has to be applied.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
221
Computers
Example for quality adjustment by applying hedonic re-pricing:
Topic 1: Data needed
•
The compilation of the regression sample is described in the section on the target
sample.
•
The regression sample could look like this:
t
Model
Price (in €)
Brand
Processor
type
Processor
speed (GHz)
RAM (GB)
HDD (GB)
OS
Type of drive
...
Chart 158
Example of a data base of a regression sample for desktops
1
A
599.–
DELL
ATH
2.200
1.024
60
WinXP
AA
...
1
B
799.–
DELL
ATH
3.000
2.048
80
WinXP
AA
...
1
C
599.–
DELL
P4
1.900
1.024
40
WinXP
AA
...
...
...
...
...
...
...
...
...
...
...
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
•
Since a semi-logarithmic function form is applied, the price is transformed by logarithm.
•
The category variables brand, processor type and operating system are transformed
into dummy variables. These can only take values of 0 or 1 depending on whether
the model is equipped with a specific feature or not.
•
Now, one characteristic value has to be chosen as reference value for each dummy
variable.
– The reference value for the brand dummy variables is DELL. That means, in the
case of DELL computers, the alternative brand dummies have a value of 0.
– The reference value for the processor type variable is P4 (indicating Intel Pentium4).
That means, in the case of a P4 processor, the alternative processor dummies have
a value of 0.
– The reference value for the operating system variable is WinXP (indicating Windows XP). That means, in the case of computer equipped with a Windows XP operating system the OS dummy has a value of 0 and in the case of a computer equipped with a Windows2000, the OS dummy has a value of 1.
222
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
t
Model
Price (ln)
Brand A
Brand C
Processor
type ATH
Processor
type Cel
Processor
speed (GHz)
RAM
HDD (GB)
OS: Win2000
...
Chart 159
Example of the modified data base of the regression sample for desktops
1
A
6.3953
0
0
1
0
2.200
1.024
60
0
...
1
B
6.6834
0
0
1
0
3.000
2.048
80
0
...
1
C
6.3953
0
0
0
0
1.900
1.024
40
0
...
...
...
...
...
...
...
...
...
...
...
...
...
Topic 3: Creating the hedonic function
•
The procedure of data collection for the regression sample and the subsequent creation of the hedonic function is exemplarily described in the section on quality adjustment. The coefficients of the recent hedonic equation can be seen in the chart below
again:
Chart 160
Example of the regression coefficients
Regression coefficients
b0
Constant
3.5115
b1
Brand a
– 0.3100
b2
Brand c
– 0.3141
b3
Processor: ATH
– 0.1440
b4
Processor: Celeron
– 0.3488
b5
Processor speed
0.1439
b6
Working memory (RAM)
0.0732
b7
HDD capacity
0.0650
b8
Operating system
– 0.7930
•
All the characteristics in the chart above are deemed to impact the price signifycantly.
•
The reference brand is DELL, the reference processor type is P4, the reference operating system is Windows XP.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
223
Computers
Topic 4: Index calculation
•
The index from month 1 to month 2 contains a replacement situation. Model “C” is
replaced by model “D”. For both price observations, the replaced and the replacement model, the values of the hedonic function h have to be calculated by applying
the recent hedonic function. As you know the hedonic regression equation is a semilogarithmic function. Hence, the exponential function is needed to obtain the values
of the hedonic function which are needed for the calculation of the quality adjusted
price. This is done by taking the anti-log of the sum of the constant and the respective products of characteristics and coefficients:
ht = e ( b0 + b1 ⋅ v1 + ... + b8 ⋅ v 8 )
•
To calculate the values of the hedonic function for both models, it is necessary to
know the particular characteristics, which can be seen in the chart below:
t
Model
Brand A
Brand C
Processor
type ATH
Processor
type Cel
Processor
speed (GHz)
RAM
HDD (GB)
OS: Win2000
Chart 161
Example of the characteristics of replaced and replacement model
1
C
0
0
0
0
1.900
1.024
40
0
2
D
0
0
0
0
2.660
1.024
40
0
•
Inserting the variable values into the recent hedonic equation yields the values of the
hedonic function for the particular models:
h ( Model _ C )t =2 = e( 3.5115 + 0.1439 ⋅ 1.900 + 0.0732 ⋅ 1.024 + 0.0650 ⋅ 40 ) = 638.98 €
h ( Model _ D )t =2 = e( 3.5115 + 0.1439 ⋅ 2.660 + 0.0732 ⋅ 1.024 + 0.0650 ⋅ 40 ) = 712.82 €
•
The mathematical ratio of the values of the hedonic function h of the replacement
and the replaced model yields the quality adjustment factor “g” which can be calculated as follows (the subscript r = 2 indicates that the regression was run in month 2):
grC=,D2 =
•
h ( Model _ D )t =3 712.82 €
=
= 1.1156
h ( Model _ C )t =3 638.98 €
To calculate the index from month 1 to month 2 the following formula is applied:
§ P A PB PD
I2 = ¨ t A=2 ⋅ t B=2 ⋅ tC=2
¨P
© t =1 Pt =1 Pt =1
224
·
¸
¸
¹
1/ 3
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
The index contains one replacement from model C in month 1 to model D in month 2.
As hedonic re-pricing is used for quality adjustment
PtD=2
PtC=1
•
is replaced by
PtD=2
PtC=1 ⋅ g rC=,D2
The index for month 2 is calculated in the following (including the application of
hedonic re-pricing):
§P A PB
PD
I2 = ¨ t A=2 ⋅ t B=2 ⋅ C t =2 C ,D
¨P
© t =1 Pt =1 Pt =1 ⋅ g r =2
·
¸
¸
¹
1/ 3
§ 599 € 749 €
·
649 €
¸¸
= ¨¨
⋅
⋅
© 599 € 799 € 599 € ⋅ 1.1156 ¹
13
= 0.969
Chart 162
Monthly indexes
Model
Brand
Price
in month 1
Price
in month 2
A
DELL
599.– €
599.– €
B
DELL
799.– €
749.– €
C
DELL
599.– €
D
DELL
Q
Q
649.– €
Quality adjusted price
C
DELL
Monthly indexes
668.24 €
100.0
Federal Statistical Office, Statistics and Science, Vol. 13/2009
96.9
225
Washing machines
5
Washing machines
•
For washing machines, the Task Force on Quality Adjustment and Sampling rated explicit methods (hedonics and option pricing) as A-methods. Implicit methods (bridged
overlap) are assessed as B-methods.
•
The following section deals with the application of hedonic re-pricing for washing machines.
5.1
•
5.2
•
Proposal for the identification of consumption segments
The suggested consumption segments are identified as described in the non-hedonic
section for washing machines.
Proposal for the construction of a target sample
For the application of hedonic re-pricing, usually two samples have to be collected.
The first sample is called “regression sample” which serves the calculation of the
hedonic regression equation. The second sample is called “index sample” and is used
to calculate the price index for washing machines.
The regression sample
•
To apply the hedonic method, a regression sample has to be collected. This sample
is needed for the calculation of a systematic relationship between the prices of the
washing machines and their particular characteristics. Resulting from this relationship the hedonic equation can be established.
•
The regression sample can be collected according to the same procedure as the index sample which is described in the non-hedonic part. In the regression period, the
index sample may also be a sub-sample of the regression sample or it can be independent from the regression sample.
•
To find out which characteristics are relevant to quality adjust washing machines for, if
possible try several approaches:
– Use publicly available information from advertising, shop displays, magazines etc.
– Use experience from other national statistical offices, which may be further ahead
in implementation of hedonics for washing machines, or from other research.
– Consult ptoduct experts in washing machine business – but remember that the
consumer perspective is what is relevant, not engineering effort or so as such.
– A more ambitious approach could be to conduct a pilot study with a rather broadminded collection of characteristics. But do not do that if you do not already have
a rather clear mind about what you are looking for, as else you would likely just
waste your and respondents’ resources.
•
The same set of characteristics needs to be collected for each washing machine, regardless of whether it belongs to the regression sample or to the index sample.
•
It is important not to consider those variables that reflect fashion. This could be for
example the colour of the washing machine.
226
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
The following characteristics may serve as orientation for washing machines:
– brand cluster,
– washing machine type (front loader, top loader),
– electricity consumption (in kWh),
– maximum spin speed (in rpm),
– water consumption (in litres),
– capacity (in kg),
– built-in possibility,
– switch-on delay,
– waterproof systems (aqua-stop, . . .),
– ...
•
The brands of the selected washing machines should normally be assigned to brand
clusters. A classification of brand clusters can be obtained from the PPP statistics (Purchasing Power Parities – price indices comparing the national price levels). The available brand cluster lists of the PPP classify brands into high, medium and low brands.
In these lists, not all available brands are classified. Missing brands should be assigned to one of the brand clusters by informed judgement.
Chart 163
Extract from the PPP classification (adapted for washing machines)
Brand cluster
Assigned brands
Higher
AEG-ELECTROLUX, BOSCH, DYSON, MIELE, SIEMENS
Medium
ARISTON, BAUKNECHT, CANDY, ELEKTRA BREGENZ,
ELECTROLUX, EUDORA, GORENJE/KÖRTING/MORA,
HOOVER, LG, SAMSUNG, WHIRLPOOL, ZANUSSI
Lower
BEKO, BRANDT, DAEWOO, EUROTECH, FAGOR, KONCAR,
HOTPOINT, INDESIT, SANYO
Source: Eurostat (Lot C), European Comparison Programme: Consumer price survey E07-1
“House and garden” Particular survey guidelines
•
Given the technological change of washing machines, new features appear from time
to time. This is not so frequent that it would be necessary to select a new set of relevant characteristics every month, but the price statistician should be aware of the possibility.
•
Whenever a new characteristic is introduced, the price statistician should judge
whether the new characteristic is important enough to be included. Therefore, a continuous market observation is necessary.
•
Of course, new characteristics can only be used to adjust for quality differences if this
characteristic is collected in both the price reference month and the current month.
•
An appropriate form which could be distributed to the price collectors should precisely
ask for the considered characteristics and might look like this:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
227
Washing machines
Chart 164
Price collection form
Washing machine, front loader, capacity of 5 – 7 kg, higher brand
Price (€):
649.–
Brand and model description:
Bosch WAS 28440
Capacity (kg):
7
Electricity consumption (kWh):
1.19
Water consumption (litres):
49
Maximum spin speed (rpm):
1400
Waterproofing:
Aqua-Stop
Built-in possibility:
: Yes
… No
Switch-on delay:
: Yes
… No
Shop: ___________________________
Date: ___________________________
•
In practice it would be more convenient to collect only the respective prices as well
as the brand and model descriptions in the shop. The characteristics of the particular washing machines can be collected from internet sites.
•
For the estimation of the regression equation it is desirable that price observations
used reflect a state of equilibrium (balance between supply and demand). That means
that if it is practically possible, it would normally be preferable that temporarily reduced prices due to promotion offers or stock clearances can be identified in the price
collection and not used for estimating the regression equation. (However, note that
in the index calculation, on the other hand, reduced prices shall not be excluded.)
Minimum size of the target sample
•
The regression sample should contain 100 to 300 observations depending on the
number of characteristics that are used in the regression equation (see section on
quality adjustment). As a rule of thumb, 15 to 20 observations per characteristic which
is finally included in the regression equation should be collected.
•
In the empirical studies conducted within the CENEX project a relatively large index
sample was necessary to yield reliable results. Conclusions on a concrete value for
the minimum sample size of the index sample could not be drawn.
•
In the regression period, the index sample may or may not be a sub-sample of the
regression sample.
228
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
5.3
Proposal for the replacement strategy
General remarks
•
The general remarks for the replacement strategy are the same as described in the
non-hedonic section for washing machines. Please consider that the replacement
strategy only refers to the index sample. The regression sample is a cross-sectional
sample where no “replacements” can occur.
•
If a model is no longer available or no longer well sold it should be replaced. The successor model should be best sold and representative.
•
It is necessary to observe the market in order to account for general changes. Therefore, the representativity of the models in the index sample should be checked within
the scope of monthly price collection and on a regular basis in the central office.
Representativity check within the scope of the monthly price collection in the shop
•
Use the concrete washing machine models that were in the sample in the previous
month as a starting point for the price collection of the current month in order to
keep the share of matched models reasonably high. However, the main criterion for
the selection of replacements should be representativity.
•
The price collector should check whether or not the models of the previous month
are still representative. The price collector should assess this each month with the
help of the shop assistant. If this is not practicable the price collector can also revert
to indicators such as the way the product is advertised or presented in the shop or he
might also revert to his experience.
•
If the model of the previous month is still representative continue observing the price
of this model. If the model is no longer representative or if it is no longer available
replace it by a representative one.
Representativity check in the office
•
As the market for washing machine is relatively stable with regards to technological
progress, a yearly representativity check might be sufficient. Nevertheless, the market should be observed for general changes such as new equipment features, new
brands, and technological trends in general.
•
If the pre-selected strata and product specifications are still representative, carry on
using them. If not, adjust the product specifications to the current market circumstances by the annual re-sampling.
5.4
•
Proposal for the quality adjustment
As stated in the section on the construction of the target sample, for the application
of hedonic re-pricing it may be necessary to collect two data sets. In order to calculate the hedonic equation a comprehensive database is necessary which is called
“regression sample”. The hedonic equation is used to apply quality adjustment in
the case of replacement situations which can only occur in the index sample. In the
following sections the individual steps are described in great detail.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
229
Washing machines
Topic 1: Data needed
•
The compilation of the regression sample as well as of the index sample is described
in the section on the target sample.
•
The regression sample could look like this:
t
Model
Price (in €)
Brand
Maximum
spin speed
Waterproofing
Capacity (kg)
Type
Water
consumption (l)
Electricity
consumption (kWh)
Built-in possibility
Switch-on delay
Chart 165
Example of a data base of a regression sample for washing machines
1
1
1
...
A
B
C
...
649.–
819.–
699.–
...
Bosch
Miele
AEG-Electrolux
...
1400
1400
1400
...
High
High
High
...
7
6
6
...
Top
Top
Top
...
49
49
49
...
1.19
1.02
1.02
...
Yes
Yes
Yes
...
Yes
Yes
Yes
...
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
•
Reformat category variables to dummy variables. These can only take values of 0 or 1
depending on whether the model is equipped with a specific feature or not.
•
As a semi-logarithmic function is applied only the price is transformed by logarithm
prior to the regression.
t
Model
Price (ln)
Brand cluster (high)
Brand cluster (low)
Maximum
spin speed
Waterproofing
(high)
Capacity (kg)
Top loader
Water
consumption (l)
Electricity
consumption (kWh)
Built-in possibility
Switch-on delay
Chart 166
Example of a modified data base of a regression sample for washing machines
1
1
1
...
A
B
C
...
6.475
6.708
6.550
...
1
1
1
...
0
0
0
...
1400
1400
1400
...
1
1
1
...
7
6
6
...
0
0
0
...
49
49
49
...
1.19
1.02
1.02
...
1
1
1
...
1
1
1
...
230
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Topic 3: Creating the hedonic function
•
After the data has been arranged, the hedonic regression equation can be set up. In
this regard, the price (as dependent variable) should be explained as a function of
the model’s characteristics (independent variables).
•
Statistical programs like SPSS, Stata or SAS are helpful to calculate the regression.
The computer program is set up to perform a linear regression analysis on the data
of the regression sample. The price is to be considered as the “dependent” variable
and the characteristics (brand dummies, maximum spin speed, etc.) are considered
as “independent” variables.
•
With regard to the general function form of the hedonic regression equation, for washing machines a semi-logarithmic function form is suggested. Only the logarithm of
the price is taken. The hedonic regression equation could, for example, look like this:
ln P = b0 + b1 ⋅ brand cluster high + b2 ⋅ brand cluster low + b3 ⋅ top loader +
+ b4 ⋅ electricit y consumptio n + b5 ⋅ water consumptio n + + ε
•
The hedonic regression equation contains dummy variables (here: brand cluster a,
brand cluster c, washing machine type) and continuous variables (here: electricity consumption, water consumption). For each dummy variable, one characteristic value
has to be defined as default (reference) value. In the example above, for the brand
clusters, brand cluster “medium” serves as the default value. For the washing machine
type, front loaders were chosen as default value.
•
The statistical program computes the coefficients (corresponding to the particular
characteristics) and further statistical values which contain information on the significance of the variables.
•
The regression may yield the following significant coefficients:
Chart 167
Example of the regression coefficients
Regression coefficients
•
b0
Constant
5.66329
b1
Brand cluster (high)
0.24642
b2
Brand cluster (low)
– 0.1428
b3
Maximum spin speed
0.0005339
b4
Waterproofing (high)
0.05264
b5
Capacity
0.10508
b6
Top loader
0.18836
The variables that were deemed relevant to quality adjust for have been collected as
discussed under Topic 1. These were included into the regression equation. Potentially, some of the considered variables might not be significantly related to the price.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
231
Washing machines
Those “insignificant” variables could be identified by their t-values. At the design
stage when the regression equation is tested, variables with t-values tending to be
within the interval of – 2 to + 2 could be considered for skipping from the regression
equation to be used operatively. One may have to be a little careful here, and some
skilled judgement is called for. It should be noted that the t-test is approximate, as
strictly seen it depends on assumptions that are usually not under perfect control.
•
Particularly in the testing of the regression equation at the design stage, it should
be checked whether the algebraic signs of the coefficients are as expected in view
of the meaning of these. If a “wrong” sign occurs, the possible cause should be
searched: Whether it is just expected random variation, or a misspecification of the
variable, or contaminated data, or something else. Possibly a variable may then be
identified as problematic and ultimately have to be skipped from the regression equation. But DON’T skip a variable automatically as soon as a “wrong” sign turns up, as
doing so may just sweep problems under the carpet and produce invalid results.
•
The particular regression coefficients can be interpreted as follows: The coefficient
for a continuous variable like capacity (b5 = 0.10508) indicates that a capacity of one
additional kilogram results in an expected price increase of approximately 10.508 %.
The coefficient for a dummy variable like “Brand cluster (high)” (b1 = 0.24642) means
that the price difference of “Brand cluster (medium)” (as reference brand cluster)
and “Brand cluster (high)” is approximately 24.642 %. That means, a replacement of a
washing machine of a low brand by a washing machine of a high brand inherits a price
increase of approximately 24.642 %.
•
In order to assess the adequacy of the regression estimation, consider the adjusted
R² value. For washing machines this value should probably be expected to be around
0.5-0.6 or higher, as in principle a persistent relation of technological characteristics and the price should be expected. An R² very close to 1 should not normally
occur and may indicate severely contaminated data. On the other hand, an adjusted
R² considerably below 0.5 indicates a relatively low explanatory ability of the regression equation. A possible cause may be that the sample is taken from a fairly narrow
and homogeneous stratum, where large characteristics differences do not occur. If so,
the results may possibly be useful anyhow, in spite of the low R²; here some skilled
judgement may be called for. Another possible cause for a low R² may be that some
essential characteristics variable has been left out and would have to be included in
the regression equation.
Topic 4: Index calculation
•
In the following the index sample is considered.
•
At first those situations where a replacement takes place have to be detected. In
these cases the sampled models change from one month to the next which implies
different characteristic values and thus different quality levels of the two products.
Those quality differences have to be adjusted as described in the following.
•
In each replacement situation the values of the hedonic function h for the replaced
and the replacement model has to be calculated. Therefore, insert the variable values of the respective model in the hedonic equation. The coefficients of the hedonic
232
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
equation have been calculated in Topic 3 and the variable values of the respective
models are collected within the scope of price collection.
•
In the next step, the quality adjustment factor has to be computed. The quality adjustment factor (“g”) is the relation of the values of the hedonic function h of replaced
and replacement model. In other words, it is the mathematical ratio of quality for the
new and the old model and refers to only this particular replacement situation:
g=
•
h ( X tB )
Market value of the replacement
=
A
h ( X t −1 ) Market value of the replaced model
This quality adjustment factor is used to adjust the quality of the replaced model by
multiplying its price with the quality adjustment factor:
Pt A−1 ⋅ g = quality adjusted price of the replaced model
•
To calculate the quality adjusted price development, divide the price of the replacement product by the quality adjusted price of the replaced product:
PtB
Pt A−1
•
⋅g
=
observed price of the replacement
observed price of the replaced model ⋅ g
For the index calculation, insert the quality adjusted price development in cases of replacements instead of the normal term. Here model A has been replaced by model B:
§ PB
PC
PD PE
PF
I = ¨ At =1 ⋅ tC=1 ⋅ tD=1 ⋅ tE=1 ⋅ tF=1
¨P ⋅g P
t =0 Pt =0 Pt =0 Pt = 0
© t =0
·
¸
¸
¹
1/ 5
, with
PtB=1
Pt A=0
being replaced by
PtB=1
Pt A=0 ⋅ g
Topic 5: Refreshing the hedonic function
•
Monitor the market in order to consider technological progress and market changes.
•
As the product-specific market develops the validity of the calculated linear regression
can change. Especially the significance of the variables as well as the regression coefficients themselves can change since the product undergoes technological progress.
In order to reflect those technological changes, the regression equation has to be
updated regularly. In the empirical studies conducted within the scope of the CENEX
two indices were compared, one with a monthly re-estimation and one with an infrequent re-estimation. The indices showed a small difference. However, the difference
was so small that for washing machines it is probably sufficient to re-estimate the
hedonic function twice a year (e.g. in January and July) or yearly. Therefore, a new regression sample may have to be collected twice a year or yearly, following the procedure described for the target sample.
5.5
Cost estimation
One-time effort for the introduction
•
Two members of staff are needed to introduce the method. Of these, one should have
sufficient hedonic expertise and the other a knowledge of washing machine variables.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
233
Washing machines
One should be responsible for implementing hedonic calculations, the other staff member should have responsibility for identifying variables and models for the sample.
•
Up to six months seem to be necessary for the introduction. This can include an initial design study on data requirements, market research, testing new price collector
forms, testing and assessment of the regression equation, methodology reporting,
approval and documentation as well as system designs for smooth regular production.
•
If an initial pilot study with extended data collection is needed, this of course requires
additional resources.
Current effort of price collection
•
There are two aspects to price collection. The prices can be collected by regular price
collectors which also conduct the model selection. The variables can be obtained from
the internet by one central office expert on washing machines.
•
Estimated amount of work:
– price collection: 50 hours,
– variable collection and model selection: 15 hours,
– total: 65 hours per collection.
Current effort of index calculation
•
In order to set up the index calculation the price statistician would have to be competent in hedonic modelling. Once created the required expertise to maintain the system
is minimal.
•
Estimated amount of work: This should be automatic following data entry.
5.6
Examples
Example for the construction of a target sample:
•
As described for the compilation of the target sample a market evaluation yielded that
capacity and washing machine type are important to be considered. Let us assume
that in our example the following three strata are relevant:
– front loaders (capacity up to 5 kg),
– front loaders (capacity 5 – 7 kg),
– top loaders (capacity up to 5 kg).
•
In our example, we only cover front loaders with a capacity of 5 to 7 kg. Consider in
month 1, the price collector observes a number of representative washing machine
models for the stratum front loader (capacity 5 – 7 kg):
234
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Capacity (kg)
Price
in month 1 (€)
...
A
Bosch
WAS 28440
7
649.–
...
B
Miele
W1624
6
819.–
...
C
AEG-Electrolux
Lavamat 64840
6
599.–
...
...
...
...
Brand
Model
Type
Chart 168
Example of the price collection table
...
•
...
...
Besides the price and the capacity, a set of further relevant characteristics is also
collected corresponding to the applied price collection form, e.g.:
Chart 169
Price collection form
Washing machine, front loader, capacity of 5 – 7 kg, higher brand
Price (€):
649.–
Brand and model description:
Bosch WAS 28440
Capacity (kg):
7
Electricity consumption (kWh):
1.19
Water consumption (litres):
49
Maximum spin speed (rpm):
1400
Waterproofing:
Aqua-Stop
Built-in possibility:
: Yes
… No
Switch-on delay:
: Yes
… No
Shop: ___________________________
Date: ___________________________
Example for the replacement strategy:
•
Suppose that in month 3 the washing machine version AEG-Electrolux Lavamat 64840
is no longer available or no longer well sold and it is replaced by the washing machine
version Siemens WM 14 E 421.
•
Those replacement situations can be marked with a “Q” in the price collection table.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
235
Washing machines
7
649.–
629.–
629.–
W1624
6
819.–
799.–
799.–
C
AEGElectrolux
Lavamat 64840
6
699.–
699.–
D
Siemens
WM 14 E 421
7
•
Q
Q
Brand
Price
in month 3 (€)
WAS 28440
Price
in month 2 (€)
Price
in month 1 (€)
Bosch
Miele
Version
A
B
Model
Capacity (kg)
Chart 170
Example of the price collection table
Q
749.–
The different models could exhibit different characteristics with inherent different qualities. To match the products quality adjustment has to be applied.
Example for quality adjustment by applying hedonic re-pricing:
Topic 1: Data needed
•
The compilation of the regression sample is described in the section on the target
sample.
•
The regression sample could look like this:
Maximum
spin speed
Waterproofing
Capacity (kg)
Type
Water
consumption (l)
Electricity
consumption (kWh)
Built-in possibility
Switch-on delay
High
7
Top
49
1.19
Yes
Yes
819.–
Miele
1400
High
6
Top
49
1.02
Yes
Yes
1400
High
6
Top
49
1.02
Yes
Yes
...
...
...
...
...
...
...
...
Price (in €)
1400
Model
Bosch
t
Brand
Chart 171
Example of a data base of a regression sample for washing machines
1
A
649.–
1
B
1
C
699.–
AEGElectrolux
...
...
...
...
Topic 2: Data editing
•
236
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
Reformat category (alphanumerical) variables to dummy variables. These can only
take values of 0 or 1 depending on whether the model is equipped with a specific feature or not. In this example the characteristics “waterproofing”, “type”, and “brand
cluster” have to be reformatted to a dummy variable. The waterproofing system – which
was collected via the price collection form (here: aqua stop) – has to be assigned to
a waterproofing level cluster. In this example we defined “high waterproofing” and
“low waterproofing”. In this example, “aqua stop” is assigned to “high waterproofing”.
For each dummy variable, one characteristic value has to be defined as default (reference) value.
– The reference value for the brand dummy variables is the medium brand cluster.
That means, in the case of a medium brand washing machine, the alternative
brand dummies have a value of 0.
– The reference value for the washing machine type variable is “front loader”. That
means, in the case of a front loader washing machine, the top loader dummy has
a value of 0.
– The reference value for the waterproofing variable is “waterproofing (low)”. That
means, in the case of washing machines equipped with a higher level waterproofing system the “waterproofing (high)” dummy has a value of 1. In this example we assume that there are only two waterproofing levels. In other cases it might
be useful to define more waterproofing levels.
Switch-on delay
Built-in possibility
Electricity
consumption (kWh)
Water
consumption (l)
Top loader
Capacity (kg)
Waterproofing
(high)
Maximum
spin speed
Brand cluster (low)
Brand cluster (high)
Price (ln)
t
Model
Chart 172
Example of a modified data base of a regression sample for washing machines
1
A
6.475
1
0
1400
1
7
0
49
1.19
1
1
1
B
6.708
1
0
1400
1
6
0
49
1.02
1
1
1
C
6.550
1
0
1400
1
6
0
49
1.02
1
1
...
...
...
...
...
...
...
...
...
...
...
...
...
Topic 3: Creating the hedonic function
•
The procedure of data collection for the regression sample and the subsequent creation of the hedonic function are exemplarily described in the section on quality adjustment. The significant coefficients of the hedonic equation which are needed for the
example can be seen in the following chart:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
237
Washing machines
Chart 173
Example of the regression coefficients
Regression coefficients
b0
Constant
5.66329
b1
Brand cluster (high)
0.24642
b2
Brand cluster (low)
– 0.1428
b3
Maximum spin speed
0.0005339
b4
Waterproofing (high)
0.05264
b5
Capacity
0.10508
b6
Top loader
0.18836
Topic 4: Index calculation
•
In the following tables you find three price series for three periods containing the data
that are required for the index calculation. At first, all replacement situations (marked
with a “Q”) have to be detected since only in those cases quality adjustment has to
be applied. In this example from month 2 to month 3 there is one replacement situation.
629.–
629.–
819.–
799.–
799.–
C
AEGElectrolux
6
699.–
699.–
D
Siemens
•
Brand
Lavamat 64840
WM 14 E 421
238
Q
7
749.–
The index calculation from month 1 to month 2 follows the formula below:
§ P A P B PC
I2 = ¨ t A=2 ⋅ tB=2 ⋅ tC=2
¨P
© t =1 Pt =1 Pt =1
•
Price
in month 3 (€)
649.–
6
Q
7
W1624
Price
in month 2 (€)
WAS 28440
Q
Price
in month 1 (€)
Bosch
Miele
Version
A
B
Model
Capacity (kg)
Chart 174
Example of the price collection table
·
¸
¸
¹
1/ 3
§ 629 € 799 € 699 € ·
¸¸
= ¨¨
⋅
⋅
© 649 € 819 € 699 € ¹
1/ 3
= 0.9815
The index from month 2 to month 3 contains replacement cases (marked with “Q”).
The AEG-Electrolux model is replaced by a Siemens model. For both price observations, the replaced and the replacement model, the values of the hedonic function h
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
have to be calculated. In this example the hedonic regression equation is a semilogarithmic function. Hence, the exponential function is needed to obtain the values
of the hedonic function which are needed for the quality adjusted price. This is done
by taking the anti-log of the sum of the constant and the respective products of
characteristics and coefficients:
ht = e ( b0 + b1 ⋅ v1 + ... + b10 ⋅ v10 )
•
To calculate the values of the hedonic function for both models, it is necessary to
know the particular characteristics, which can be seen in the table below:
•
t
Model
Brand cluster (high)
Brand cluster (low)
Maximum
spin speed
Waterproofing
(high)
Capacity (kg)
Top loader
Water
consumption (l)
Electricity
consumption (kWh)
Chart 175
Example of the characteristics of replaced and replacement model
2
C
1
0
1400
1
6
0
49
1.02
3
D
1
0
1400
1
7
0
49
1.19
Inserting the variables into the hedonic equation yields the values of the hedonic function for the particular models:
h Ct=2= e( 5.66329 + 0.24642 ⋅ 1 + 0.0005339 ⋅ 1400 + 0.05264 ⋅ 1 + 0.10508 ⋅ 6 + ( −0.0144 ) ⋅ 49 + ( −0.0473 ) ⋅ 1.02 )
= 725.20 €
h
D
( 5.66329 + 0.24642 ⋅ 1 + 0.0005339 ⋅ 1400 + 0.05264 ⋅ 1 + 0.10508 ⋅ 7 + ( −0.0144 ) ⋅ 49 + ( −0.0473 ) ⋅ 1.19 )
t =3 = e
= 799.10 €
•
The mathematical ratio of the values of the hedonic function from the replacement
and the replaced model yields the quality adjustment factor “g” which can be calculated as follows (the subscript r = 1 indicates that the regression was run in month 1):
g rC=,D1 =
•
D
t =3
htC=2
h
=
799.10 €
= 1.1019
725.20 €
To calculate the index from month 2 to month 3 the following formula is applied:
1/ 3
§ P A PB PD ·
I3 = ¨ t A=3 ⋅ tB=3 ⋅ tC=3 ¸
¨P
¸
© t =2 Pt =2 Pt =2 ¹
Federal Statistical Office, Statistics and Science, Vol. 13/2009
239
Washing machines
•
The index contains one replacement from model C in month 2 to model D in month 3.
As hedonic re-pricing is used for quality adjustment
PD
PtD=3
is replaced by C t =3 C ,D
C
Pt =2
Pt =2 ⋅ g r =1
•
The index for month 3 is calculated in the following (including the application of
hedonic re-pricing):
I 3QA
§P A PB
PD
= ¨ t A=3 ⋅ t B=3 ⋅ C t =3 C ,D
¨P
© t =2 Pt =2 Pt =2 ⋅ g r =0
·
¸
¸
¹
1/ 3
§ 629 € 799 €
·
749 €
¸¸
= ¨¨
⋅
⋅
© 649 € 799 € 699 € ⋅ 1.1019 ¹
1/ 3
= 0.9804
Price
in month 3 (€)
7
649.–
629.–
629.–
B
Miele
W1624
6
819.–
799.–
799.–
C
AEG-Electrolux Lavamat 64840
6
699.–
699.–
D
Siemens
7
Brand
WM 14 E 421
Q
WAS 28440
Q
Price in
month 1 (€)
Bosch
Version
A
Model
Capacity (kg)
Price
in month 2 (€)
Chart 176
Monthly indexes
Q
749.–
Quality adjusted Price
C
AEG-Electrolux Lavamat 64840
Monthly indexes
240
6
770.22
100
98.2
98.0
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
6
Books
•
As a first step the COICOP four-digit-level 09.5.1 Books is divided into two headlines: “rapidly changing market” and “long-selling market”.
– Rapidly changing market (“changing-product principle”): Applicable to products
which have relatively short commercial life-spans and high rates of introduction
of new products (books for children, books for leisure and culture including recently published books).
– Long-selling market (“constant product principle”): Applicable to products that
have relatively long commercial life-spans and low rates of introduction of new
products (schoolbooks, dictionaries, encyclopaedia, evergreens and the like).
•
The hedonic method is deemed most reliable in the HICP standards for the rapidlychanging market (A-method). Furthermore, the empirical studies of the CENEX showed
that the index with hedonic adjustment exhibits the smallest volatility for the rapidlychanging market.
•
Neither the empirical studies nor the HICP Standards propose the application of the
hedonic method for the long-selling market. For this reason, the following section
deals with the application of hedonic re-pricing for books in the rapidly changing
market only.
Note:
•
The quality adjustment method depends on the homogeneity of the available bestseller list:
– A homogeneous bestseller list is, for example, a list only containing pocket fiction books. The books on the list are similar in the type of binding and size and
in many cases the number of pages varies between certain ranges. In such circumstances where homogeneous bestseller lists are available there is no further need
for quality adjustment as all books can be seen as essentially equivalent. The
prices of these books may then be directly compared.
– Another case is a national bestseller list which does e.g. not account for the binding but instead any book that is well sold can appear on the list. In this case
there is a need for quality adjustment and then the method which is regarded as
superior in the HICP Standards is the hedonic method.
•
Bestseller lists may be either nation-wide or collected locally in stores, e.g. only books
that are prominently placed on tables are observed. Please note that national bestseller lists are not generally preferred to local bestseller lists. Both approaches can
be used depending on the index to be published. If only a nationwide index is to be
published then a national bestseller list of all book sales may be the more appropriate choice. If additionally regional indexes are to be published regional bestseller lists
may the better choice. Books on bestseller lists can differ in their characteristics from
month to month.
6.1
•
Proposal for the identification of consumption segments
The suggested consumption segments are identified as described in the non-hedonic
section for books.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
241
Books
6.2
Proposal for the construction of a target sample
•
For the application of hedonic re-pricing, usually two samples have to be collected.The
first sample is called “regression sample” which serves the calculation of the hedonic
regression equation. The second sample is called “index sample” and is used to
calculate the price index for books. The index sample may also be a sub-sample of
the regression sample or it can be independent from the regression sample.
•
It is suggested to collect one regression sample for each consumption segment.
•
The procedure suggested for the construction of a sample is valid for both the regression and the index sample which is already described in the non-hedonic section for
books.
The regression sample
•
To apply the hedonic method, a regression sample has to be collected. This sample
is needed for the calculation of a systematic relationship between the prices of the
books and their particular characteristics. Resulting from this relationship the hedonic
equation can be established.
•
To find out which characteristics are relevant to quality adjust books for:
– Use experience from other national statistical offices, which may be further ahead
in implementation of hedonics for books, or from other research.
– A more ambitious approach could be to conduct a pilot study with a rather broadminded collection of characteristics. But do not do that if you do not already have
a rather clear mind about what you are looking for, as else you would likely just
waste your and respondents' resources.
•
The same set of characteristics needs to be collected for each book, regardless of
whether it belongs to the regression sample or to the index sample.
•
The following characteristics may serve as orientation for books:
– number of pages,
– format (length in cm),
– binding (hardback, binds in board, special binding),
– weight,
– ...
•
Record two or three characteristics. In general the number of pages, the format
(length) and the kind of binding are available.
•
It should be stressed that there is no reason to overdo things here and collect lots of
variables, as doing so would not be worth the effort. Actually, for books the
application of hedonics has a more limited role than for most other products. For
books the use of bestseller lists is the main device to control for quality, and
hedonics is there merely a supplementary device to reduce disturbing variation
which may occur in bestseller lists.
•
Distinguish between different versions of a title since different versions of a title (for
example different bindings) could be on the market at the same time. If so, include
more than one version of a book title in the sample.
242
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
An appropriate form which could be distributed to the price collectors should precisely
ask for the considered characteristics and might look like this:
Chart 177
Example of a price collection form
Author: Paulo Coelho
Title: The Witch of Portobello
ISBN: 978-0007251865
Price: 20.99 €
Number of pages: 320
Length: 21.6 cm
Binding:
… binds in board : hardback …
special binding
Shop: _______________
Date: ______________
•
The ISBN is needed for a clear identification of different book titles and its respective editions.
•
For the estimation of the regression equation it is desirable that price observations
used reflect a state of equilibrium (balance between supply and demand). That means
that if it is practically possible, it would normally be preferable that temporarily reduced prices due to promotion offers or stock clearances can be identified in the price
collection and not used for estimating the regression equation. (However, note that
in the index calculation, on the other hand, reduced prices shall not be excluded.)
Minimum size of the target sample
•
The regression sample should contain 50 to 100 observations depending on the number of characteristics that are used in the regression equation (see section on quality
adjustment). As a rule of thumb, 15 to 20 observations per characteristic which is finally included in the regression equation should be collected. In case that the suggested characteristics variables are length, number of pages and binding the regression sample should contain at least 45 to 60 observations.
•
The determination of the number of observations in the index sample depends on
the level of precision that is to be attained on the one side and cost calculations on
the other side. A rapidly changing market approach will however be rather demanding with regard to sampling issues. The reason is that book-samples that are based
on the most representative items of the moment can be quite heterogeneous and
therefore probably will have an instable structure over time. A sufficient sample size
will be a necessary (but not sufficient) condition to ease this problem.
6.3
•
Proposal for the replacement strategy
Please consider that the replacement strategy refers only to the index sample. The
regression sample is a cross-sectional sample where no “replacements” can occur.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
243
Books
•
The general remarks for the replacement strategy are the same as described in the
non-hedonic section for books.
Chart 178
Monthly replacement of bestseller lists
target sample
Consumption Segment
Fiction
M
M
M
M
M
M
M
Fiction
bestseller
list
M
M
M
M
M
M
M
M
Change to updated
bestseller list
Reorganisation of price observations
•
In case of using the hedonic method in connection with a bestseller list it is advisable
to use the following approach to organise your price observations. The chart below
gives an overview of the recommended approach which was originally presented in
the final report of TF QAS I (p. 10, with modifications):
Chart 179
Identification of each book on the bestseller list
Top five bestsellers of a specific bestseller list and their ranks
Price
series
period 1
period 2
period 3
period 4
period 5
rank
model
rank
model
rank
model
rank
model
rank
model
1
(1)
A
(1)
A
(3)
F
(2)
F
(2)
K
2
(2)
B
(2)
B
(2)
B
(5)
H
(4)
H
3
(3)
C
(3)
C
(1)
G
(3)
I
(3)
L
4
(4)
D
(4)
D
(4)
D
(4)
D
(5)
D
5
(5)
E
(5)
E
(5)
E
(1)
J
(1)
M
244
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
In the chart above, each alphabetic character stands for one specific book. No matter
on which rank a certain book title on the bestseller list is as long as it remains among
the “top 5” it stays on the same price series to minimize the need for quality adjustment procedures.
•
Example:
– In the example above, in price series 1 book A is observed in periods 1 and 2.
– In period 3, book A is no longer among the five bestsellers and therefore drops
out of the sample. Instead, book F enters the “top 5” of the bestseller list on
rank (3) and is placed in price series 1.
– Book F remains in the same price series in period 4 even after its rank on the bestseller list changes from (3) to (2).
– In period 5, book F is no longer among the five bestsellers and therefore drops
out of the sample. It is replaced by book K which for the first time enters the bestseller list on rank (2).
•
Furthermore, the chart indicates the duration of each book in the sample (e.g. in period 5 only the initial book D remains in the sample).
6.4
•
Proposal for the quality adjustment
As stated in the section on the construction of the target sample, for the application
of hedonic re-pricing it may be necessary to collect two data sets. In order to calculate the hedonic equation a comprehensive database is necessary which is called
“regression sample”. The hedonic equation is used to apply quality adjustment in the
case of replacement situations which can only occur in the index sample. In the following sections the individual steps are described in great detail.
Topic 1: Data needed
•
The compilation of the regression sample and of the index sample is described in
the section on the target sample.
•
The regression sample could look like this:
Chart 180
Example of a data base of a regression sample for books
Author
ISBN
Number
of pages
Length
Binding
Steel
Rogue
978-0593056752
320
23.6
hardback
Coelho
The Witch of
Portobello
978-0007251865
320
21.6
hardback
...
...
...
...
...
•
Title
...
The variables “Author”, “Title” and “ISBN” are not needed for the calculation but enable the identification of individual books.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
245
Books
•
In order to gain information on characteristics of a certain book the ISBN is a unique
identifier. It can be used to centrally collect the characteristics of a certain book title
in databases rather than locally in the stores. They can, for example, be found on the
internet (e.g. at www.amazon.xx ) or in the respective national library.
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
•
Reformat category variables to dummy variables. These can only take values of 0 or 1
depending on whether the model is equipped with a specific feature or not. In the
case of books the characteristic “binding” has to be transformed to a dummy variable. For each dummy variable, one characteristic value has to be defined as default
(reference) value.
•
The price, the number of pages and the length are continuous variables. They can
be transformed to their natural logarithm if a semi-log or double-log formulation is
desired. In the present example a linear formulation is chosen. This results from the
final report of the Task Force on Quality Adjustment and Sampling where the linear,
the semi-log, and the double-log functions showed similar results. To simplify matters, the linear function is taken as example here.
Chart 181
Example for the modified database of the regression sample for books
Author
Title
ISBN
Length
Number
of pages
Binds in
board *
Steel
Rogue
978-0593056752
23.6
320
0
Coelho
The Witch of
Portobello
978-0007251865
21.6
320
0
...
...
...
...
...
...
* Default value: hardback binding.
Topic 3: Creating the hedonic function
•
After the data has been arranged, the hedonic regression equation can be set up. In
this regard, the price (as dependent variable) should be explained as a function of the
model’s characteristics (independent variables).
•
A number of 50 to 100 representative observations should be sufficient for establishing a regression equation.
•
Statistical programs like SPSS, Stata or SAS are helpful to calculate the regression.
The computer program is set up to perform a linear regression analysis on the data
of the regression sample. The price is to be considered as the “dependent” variable
and the characteristics (binding dummies, number of pages, size, etc.) are considered
as “independent” variables.
246
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
•
With regard to the general function form of the hedonic regression equation a linear
function form is suggested.
P = b0 + b1 ⋅ length + b2 ⋅ number of pages + b3 ⋅ binds in board + ... + ε
•
The hedonic regression equation contains continuous variables (length, number of
pages) and one dummy variable (binds in board). For each dummy variable, one
characteristic value has to be defined as default (reference) value. In the example
above, for the binding the hardback binding serves as a default value. That means
that for all hardback books the characteristic variable binds in board has a value of
zero.
•
Alternatively, as mentioned above under the Topic 2 data editing, a double-logarithmic
form of the hedonic regression equation can be used. Especially, the number of pages
variable can be transformed to the natural logarithm since the double logarithmic
form may be more suitable when continuous variables may vary over wide ranges,
with high ratios between upper and lower bound.
•
The statistical program computes the coefficients (corresponding to the particular
characteristics) and further statistical values which contain information on the significance of the variables.
•
The result of the regression analysis could look like this:
Chart 182
Example for the results of the hedonic regression
Regression coefficients
t
Year 2007 (R = 0.869; n = 245)
b0
Constant
– 11.625
– 4.853
b1
Length (cm)
1.416
12.837
b2
Number of pages
0.004
4.022
b3
Binds in board *
– 6.743
– 15.906
* Default value: hardback binding.
•
The variables that were deemed relevant to quality adjust for have been collected as
discussed under Topic 1. These were included into the regression equation. Potentially, some of the considered variables might not be significantly related to the price.
Those “insignificant” variables could be identified by their t-values. At the design
stage when the regression equation is tested, variables with t-values tending to be
within the interval of – 2 to + 2 could be considered for skipping from the regression
equation to be used operatively. One may have to be a little careful here, and some
skilled judgement is called for. It should be noted that the t-test is approximate, as
strictly seen it depends on assumptions that are usually not under perfect control.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
247
Books
•
Particularly in the testing of the regression equation at the design stage, it should
be checked whether the algebraic signs of the coefficients are as expected in view
of the meaning of these. If a “wrong” sign occurs, the possible cause should be
searched: Whether it is just expected random variation, or a misspecification of the
variable, or contaminated data, or something else. Possibly a variable may then be
identified as problematic and ultimately have to be skipped from the regression equation. But DON'T skip a variable automatically as soon as a “wrong” sign turns up, as
doing so may just sweep problems under the carpet and produce invalid results.
•
The particular regression coefficients can be interpreted as follows: The coefficient for
a continuous variable like number of pages (b2 = 0.004) indicates that one more page
within a books results in an expected price increase of approximately 0.004 EUR. The
coefficient for a dummy variable like binds in board (b3 = – 6.743) means that a book
with binds in board results in an expected price decrease of approximately 6.74 EUR.
•
In order to assess the adequacy of the regression estimation, consider the adjusted
R² value. For books this value should probably be expected to be around 0.5 – 0.6
or higher, as in principle a persistent relation of technological characteristics and the
price should be expected. An R² very close to 1 should not normally occur and may
indicate severely contaminated data. On the other hand, an adjusted R² considerably below 0.5 indicates a relatively low explanatory ability of the regression equation. A possible cause may be that the sample is taken from a fairly narrow and homogeneous stratum, where large characteristics differences do not occur. If so, the results
may possibly be useful anyhow, in spite of the low R²; here some skilled judgement
may be called for. Another possible cause for a low R² may be that some essential
characteristics variable has been left out and would have to be included in the regression equation.
Topic 4: Index calculation
•
This step describes the procedure for the index calculation. This procedure needs to
be repeated for each consumption segment.
•
Select the price observations for the HICP sample (e.g. include the five bestsellers of
each bestseller list). Restructure the list so that a certain book is on a single price series as long as it is within the “top 5” in order to minimise quality adjustment cases
and collect the corresponding characteristics.
•
At first those situations where a new book has appeared in the “top 5” have to be detected. In these cases the sampled books change from one month to the next which
implies different characteristic values and thus different quality levels of the two
books. Those quality differences have to be adjusted as described in the following.
•
In each replacement situation the values of the hedonic function h for the replaced
and the replacement model has to be calculated. Therefore, insert the variable values
of the respective model in the hedonic equation.
•
The coefficients of the hedonic equation have been calculated in Topic 3 and the variable values of the respective models are collected within the scope of price collection.
In the example, the regression coefficients are:
248
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 183
Example of the regression coefficients
Regression coefficients
•
b0
Constant
– 11.625
b1
Length (in cm)
1.416
b2
Number of pages
0.004
b3
Binds in board
– 6.743
In the next step, the quality adjustment factor has to be computed. The quality adjustment factor (“g”) is the relation of the values of the hedonic function h of the replaced model A and the replacement model B. In other words, it is the mathematical
ratio of quality for the new and the old model and refers to only this particular replacement situation:
g=
h ( X tB )
h ( X tA−1 )
=
Market value of the replacement
Market value of the replaced model
In the following this functional form is used. Alternatively, as in the case of books
there is no one-to-one matching (books are more or less individual objects and cannot be matched between periods) the non-matching form of the hedonic re-pricing
method can be considered:
g=
h ( standard set of characteristics )
h ( characteristics of actual model )
The numerator here is the value of the hedonic function for some hypothetical “standard” model with given characteristics. The idea is that all prices of both the price reference period and the current period, are modified so as to correspond to the same
standard quality.
•
This quality adjustment factor is used to adjust the quality of the replaced model by
multiplying its price with the quality adjustment factor:
Pt A−1 ⋅ g = quality adjusted price of the replaced model
•
To calculate the quality adjusted price development, divide the price of the replacement product by the quality adjusted price of the replaced product:
PtB
observed price of the replacement
=
A
Pt −1 ⋅ g observed price of the replaced model ⋅ g
•
For the index calculation, insert the quality adjusted price development in cases of replacements instead of the normal term. Here model A has been replaced by model B:
Federal Statistical Office, Statistics and Science, Vol. 13/2009
249
Books
§ PB
PC
PD PE
PF
I = ¨ At =1 ⋅ tC=1 ⋅ tD=1 ⋅ tE=1 ⋅ tF=1
¨P ⋅g P
t =0 Pt =0 Pt =0 Pt = 0
© t =0
·
¸
¸
¹
1/ 5
, with
PtB=1
Pt A=0
being replaced by
PtB=1
Pt A=0 ⋅ g
Topic 5: Refreshing the hedonic function
•
6.5
Given the slow evolution on the book market a yearly up-date of the regression equations is sufficient.
Cost estimation
One time effort for introduction
•
The implementation, including for example the collection of data for the estimation
of the regression, may require up to four months.
Current effort of price collection and index calculation
•
6.6
For the price collection and the index calculation all in all up to five days may be required.
Examples
Example for the identification of consumption segments:
•
It is assumed that three consumption segments are relevant in our example:
– fiction books,
– non-fiction books,
– children’s books.
•
In the following, the focus lies on fiction books. In order to deal with the two remaining consumption segments you would need to proceed analogously.
Example for the construction of a target sample:
•
In a first step, check whether a national bestseller list is available for fiction books for
your country. If a bestseller list is unavailable, please proceed as described above
in the section on sampling.
•
For this example, a bestseller list for fiction books is available. It is, furthermore, assumed that prices are fixed throughout the country.
•
The top five fiction titles from the bestseller list are exemplarily considered for price
collection.
250
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 184
Fictitious example of the sample of bestseller fiction books in the base period
Position
•
Author/Title
Price
(€)
1
Danielle Steel/Rogue
17.95
2
Paulo Coelho/The Witch of Portobello
20.99
3
Robert Menasse/Don Juan de la Mancha
20.99
4
Catherine Coulter/Tailspin
16.45
5
Henning Mankell/Italian Shoes
19.99
As the price development is similar throughout regions due to fixed prices, the prices
can be collected centrally, for example from a big bookstore. The characteristics of
the books are also collected.
Chart 185
Example of a price collection form
Author: Paulo Coelho
Title: The Witch of Portobello
ISBN: 978-0007251865
Price: 20.99 €
Number of pages: 320
Length: 21.6 cm
Binding:
… binds in board : hardback …
special binding
Shop: _____________
Date: _____________
Example for the replacement strategy
•
The following replacement situation refers to the index sample only.
•
In the case of bestseller lists one can not speak of one-to-one replacements. Instead,
for our example, in each month the first five books are chosen from the bestseller
list originating from the same source that was already considered in the previous
month. This way, the sample remains representative over time.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
251
Books
Chart 186
Fictitious example of the sample of bestseller fiction books in three sequent months
Month 1
Position
Rogue
The Witch of Portobello
Don Juan de la Mancha
Tailspin
Italian Shoes
deathly hallows
The Witch of Portobello
Rogue
Inkdeath
The Host
deathly hallows
18.95
Paulo Coelho/
20.99
The Witch of Portobello
20.99
Danielle Steel/
17.95
Rogue
17.95
Stephenie Meyer/
17.99
Stephenie Meyer/
19.99
Price
(€)
Harry Potter and the
18.95
Cornelia Funke/
16.45
Author/Title
Joanne K. Rowling/
Harry Potter and the
Danielle Steel/
20.99
Henning Mankell/
5
Price
(€)
Paulo Coelho/
20.99
Catherine Coulter/
4
The Host
18.95
Henning Mankell/
18.95
Italian Shoes
19.99
The chart above demonstrates the dynamics of bestseller lists:
–
–
–
252
17.95
Robert Menasse/
3
Author/Title
Month 3
Joanne K. Rowling/
Paulo Coelho/
2
•
Price
(€)
Danielle Steel/
1
•
Author/Title
Month 2
Two books remain in the sample in all three months: Rogue by Danielle Steel
and The Witch of Portobello by Paulo Coelho.
The book Italian Shoes by Henning Mankell is in the sample in the first month,
loses representativity in the second month so that it needs to be excluded and
re-gains representativity in the third month when it is again among the five top
selling books and therefore re-included in the sample.
The remaining books stayed in the sample either only for one month or for two
sequent months.
As long as a book title is among the top five of the bestseller list it is not replaced.
This book stays in the same price series. In order to keep the number of quality adjustment situations as low as possible you need to re-organise your observations as
illustrated by the following chart.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 187
Re-arrangement of the bestseller list to obtain price series
Month 1
Price
series
1
2
3
4
5
Month 2
Price
(€)
Author/Title
(1) Danielle Steel/
Rogue
(2) Paulo Coelho/
20.99
(5) Henning Mankell/
Inkdeath
The Host
The Witch of Portobello
20.99
Harry Potter and the
18.95
deathly hallows
18.95
(5) Henning Mankell/
17.99
(5) Stephenie Meyer/
19.99
17.95
(1) Joanne K. Rowling/
Harry Potter and the
deathly hallows
Rogue
(2) Paulo Coelho/
20.99
(4) Cornelia Funke/
16.45
Italian Shoes
The Witch of Portobello
Price
(€)
Author/Title
(3) Danielle Steel/
17.95
(1) Joanne K. Rowling/
(4) Catherine Coulter/
Tailspin
Rogue
(2) Paulo Coelho/
20.99
(3) Robert Menasse/
Don Juan de la Mancha
Price
(€)
(3) Danielle Steel/
17.95
The Witch of Portobello
Author/Title
Month 3
Italian Shoes
19.99
(4) Stephenie Meyer/
18.95
The Host
18.95
•
In our example in month 2 the book by Robert Menasse is no longer available or no
longer well sold and it is replaced for example by the book by Joanne K. Rowling.
•
The two books exhibit different characteristics with inherent different qualities. To
match the products quality adjustment has to be applied.
Example for quality adjustment by applying hedonic re-pricing:
Topic 1: Data needed
•
The compilation of the regression sample is described in the section on the target
sample.
•
The regression sample could look like this:
Chart 188
Example of a data base of a regression sample for books
Author
Title
ISBN
Number
of pages
Length
(cm)
Binding
Steel
Rogue
978-0593056752
320
23.6
hardback
Coelho
The Witch of
Portobello
978-0007251865
320
21.6
hardback
...
...
...
...
...
...
Topic 2: Data editing
•
After the regression sample is established, the data should be put in shape. That
means, the variables should be defined as numerical or dummy variables.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
253
Books
•
Reformat category variables to dummy variables. These can only take values of 0 or 1
depending on whether the model is equipped with a specific feature or not. In the
case of books the characteristic “binding” has to be transformed to a dummy variable.
For each dummy variable, one characteristic value has to be defined as default (reference) value.
•
The price, the number of pages and the length are continuous variables. In the present
example a linear formulation is chosen.
Chart 188
Example for the modified database of the regression sample for books
Author
Title
ISBN
Number
of pages
Length
(cm)
Binds in
board *
Steel
Rogue
978-0593056752
320
23.6
0
Coelho
The Witch of
Portobello
978-0007251865
320
21.6
0
...
...
...
...
...
...
* Default value: hardback binding.
Topic 3: Creating the hedonic function
•
The procedure of data collection for the regression sample and the subsequent creation of the hedonic function are exemplarily described in section on quality adjustment. The coefficients of the hedonic equation can be seen in the table below again:
Chart 189
Example of the regression coefficients
Regression coefficients
•
b0
Constant
– 11.625
b1
Length (in cm)
1.416
b2
Number of pages
0.004
b3
Binds in board
– 6.743
The reference binding is hardback.
Topic 4: Index calculation
•
254
In the following chart you can find five price series for three periods containing the
data that is required for the index calculation. At first, all replacement situations
(marked with a “Q”) have to be detected since only in those cases quality adjustment has to be applied. In this example from month 1 to month 2 there are three replacement situations, from month 2 to month 3 there is one replacement situation.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
Chart 190
Example of the price collection table
Model
•
Author
Title
Price
Price
Price
in month 1 Q in month 2 Q in month 3
(€)
(€)
(€)
A
Danielle Steel
Rouge
17.95
17.95
17.95
B
Paulo Coelho
The Witch of Portobello
20.99
20.99
20.99
C
Robert Menasse
Don Juan de la Mancha
20.99
Q
D
Catherine Coulter
Tailspin
16.45
Q
E
Henning Mankell
Italian Shoes
19.99
Q
F
Joanne K. Rowling Harry Potter and
the deathly hallows
18.95
G
Cornelia Funke
Inkdeath
17.99
H
Stephenie Meyer
The Host
18.95
19.99
18.95
Q
18.95
The index from month 1 to month 2 contains replacement cases (marked with “Q”).
Model C is replaced by Model F, Model D by Model G, and Model E by Model H. As
long as a book stays in the top five no quality adjustment is necessary regardless of
the exact rank on the list. For the price observations of the replaced and the replacement model the values of the hedonic function h have to be calculated. For this example a linear function is:
ht = b0 + b1 ⋅ v 1 + b2 ⋅ v 2 + b3 ⋅ v 3
•
To calculate the values of the hedonic function h for both the replaced and the replacement model, it is necessary to know the particular characteristics, which can be
seen in the following chart :
Chart 191
Example of the characteristics of replaced and replacement models
Model
Author
Title
Number
of pages
Length
(in cm)
Binds
in board
C
Robert Menasse
Don Juan de la Mancha
274
20.2
0
D
Catherine Coulter
Tailspin
416
23.0
0
E
Henning Mankell
Italian Shoes
448
20.8
0
F
Joanne K. Rowling Harry Potter and
the deathly hallows
608
20.4
0
G
Cornelia Funke
Inkdeath
656
21.6
0
H
Stephenie Meyer
The Host
617
22.0
0
Federal Statistical Office, Statistics and Science, Vol. 13/2009
255
Books
•
Inserting the variables into the hedonic equation yields the values of the hedonic
function h for the particular models:
h ( Menasse ) t =1 = −11.625 + 0.004 ⋅ 448 + 1.416 ⋅ 20.2 = 18.77 €
h ( Rowling )t =2 = −11.625 + 0.004 ⋅ 608 + 1.416 ⋅ 20.4 = 19.69 €
•
The mathematical ratio of the values of the hedonic function from the replacement F
(Rowling) and the replaced model C (Menasse) yields the quality adjustment factor
“g” which can be calculated as follows (the subscript r = 0 indicates that the regression was run in month 0):
C ,F
g r =0 =
•
h ( new ) t = 2
h ( old ) t =1
=
h ( Rowling ) t = 2
h ( Menasse ) t =1
=
18.77 €
19.69 €
= 0.95327
Concerning the other replacement cases proceed in the same way. For the replacement of Coulter by Funke, calculate
h (Coulter )t =1 = −11.625 + 0.004 ⋅ 416 + 1.416 ⋅ 23 = 22.61 €
h ( Funke )t =2 = −11.625 + 0.004 ⋅ 656 + 1.416 ⋅ 21.6 = 21.58 €
grD=,G0 =
h ( new )t =2
h ( Funke )t =2
21.58 €
=
=
= 0.95444
h ( old )t =1 h (Coulter )t =1 22.61 €
and for the replacement of Mankell by Meyer, calculate
h ( Mankell ) t =1 = −11.625 + 0.004 ⋅ 448 + 1.416 ⋅ 20.8 = 19.62€
h ( Meyer )t =2 = −11.625 + 0.004 ⋅ 617 + 1.416 ⋅ 22 = 22.00 €
grE=,H0 =
•
h ( new )t =2
h ( Meyer )t =2
22.00 €
=
=
= 1.12130
h ( old )t =1 h ( Mankell )t =1 19.62 €
To calculate the index from month 1 to month 2 the following formula is applied:
1/ 5
§ P A P B P F PG P H ·
I2 = ¨ t A=2 ⋅ tB=2 ⋅ tC=2 ⋅ tD=2 ⋅ tE=2 ¸
¨P
¸
© t =1 Pt =1 Pt =1 Pt =1 Pt =1 ¹
•
The index contains three replacements: model C is replaced by model F, model D by
model G, and model E by model H. As hedonic re-pricing is used for quality adjustment
PtF=2
PF
is replaced by C t =2 C , F
C
Pt =1
Pt =1 ⋅ gr =0
PtG=2
PtG=2
is
replaced
by
PtD=1
PtD=1 ⋅ grD=,0G
256
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
PtH=2
PtH=2
is
replaced
by
PtE=1
PtE=1 ⋅ grE=,0H
•
The index for month 2 is calculated in the following (including the application of
hedonic re-pricing):
1/ 5
I2QA
§ P A PB
·
PF
PG
PH
= ¨ t A=2 ⋅ tB=2 ⋅ C t =2C ,F ⋅ D t =2G,D ⋅ E t =2E ,H ¸
¨P
¸
© t =1 Pt =1 Pt =1 ⋅ gr =0 Pt =1 ⋅ gr =0 Pt =1 ⋅ gr =0 ¹
1/ 5
§ 17.95 € 20.99 €
·
18.95 €
17.99 €
18.95 €
¸¸
= ¨¨
⋅
⋅
⋅
⋅
© 17.95 € 20.99 € 20.99 € ⋅ 0.95327 16.45 € ⋅ 0.95444 19.99 € ⋅ 1.12130 ¹
= 0.9829
•
In month 3 only one replacement occurs: model G is replaced by model E which was
in the “top 5” in month 1 lost representativity in the second month so that it needed
to be excluded and re-gained representativity in the third month when it is again
among the five top selling books and therefore re-included in the sample. The values
of the hedonic function h for both models are calculated:
h ( Funke )t =2 = −11.625 + 0.004 ⋅ 656 + 1.416 ⋅ 21.6 = 21.58 €
h ( Mankell )t =3 = −11.625 + 0.004 ⋅ 448 + 1.416 ⋅ 20.8 = 19.62 €
grG=,E0 =
•
h ( new )t =2 h ( Mankell )t =3 19.62 €
=
=
= 0.90918
21.58 €
h ( old )t =1
h ( Funke )t =2
As in month 2 the index for month 3 is calculated in the following way:
1/ 5
I3QA
·
§ P A PB PF PH
PE
= ¨ t A=3 ⋅ tB=3 ⋅ tF=3 ⋅ tH=3 ⋅ G t =3G,E ¸¸
¨P
© t =2 Pt =2 Pt =2 Pt =2 Pt =2 ⋅ gr =0 ¹
§ 17.95 € 20.99 € 18.95 € 18.95 €
·
19.99 €
¸¸
= ¨¨
⋅
⋅
⋅
⋅
© 17.95 € 20.99 € 18.95 € 18.95 € 17.99 € ⋅ 0.90918 ¹
= 1.0409
Federal Statistical Office, Statistics and Science, Vol. 13/2009
1/ 5
257
Books
Chart 192
Monthly indexes
Model
Author
Price
in month 1 Q
(€)
Title
Price
in month 2 Q
(€)
Price
in month 3
(€)
A
Danielle Steel
Rouge
17.95
17.95
17.95
B
Paulo Coelho
The Witch of Portobello
20.99
20.99
20.99
C
Robert Menasse
Don Juan de la Mancha
20.99
Q
D
Catherine Coulter Tailspin
16.45
Q
E
Henning Mankell Italian Shoes
19.99
Q
F
Joanne K. Rowling Harry Potter and the
deathly hallows
18.95
G
Cornelia Funke
Inkdeath
17.99
H
Stephenie Meyer The Host
18.95
19.99
18.95
Q
18.95
Quality adjusted prices
C
Robert Menasse
D
Catherine Coulter Tailspin
15.70
E
Henning Mankell Italian Shoes
22.41
G
Cornelia Funke
Monthly indexes
258
Don Juan de la Mancha
20.01
Inkdeath
16.36
100
98.3
104.1
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
7
Annexes and references
7.1
Annexes
Annex 1: Details on different types of hedonic methods
Four main types of hedonic methods can be identified, namely the following (Triplett 2006;
cf. National Research Council 2002):
•
The hedonic time dummy variable method.
•
The hedonic characteristics price index method.
•
The hedonic price imputation method.
•
The hedonic re-pricing method.
These four types of hedonic methods will now be described and discussed. For each of the
types there are different variants.
The hedonic time dummy variable method
Alias: The direct method.
The hedonic time dummy variable method computes the price index as a transformed
regression coefficient from analysis on pooled data from more than one time-period.
Index computation with this method. Consider an example with three characteristics variables, z1 , z2 , z 3 . In the case of two considered periods, the reference period t = 0
and the comparison period t = 1 , the hedonic regression equation may have this form,
(15)
ln P = b0 + b1 ⋅ z1 + b2 ⋅ z2 + b3 ⋅ z3 + γ ⋅ Dt =1 + ε
where Dt =1 is a time dummy variable taking the value 1 for the period t = 1 and 0 otherwise.
The price index, for period 1 with reference period 0, now appears as the exponentiated
value of a regression coefficient in the regression equation,
It =1 = e γ .
The computation procedure is that a regression analysis is run on data from both periods
t = 0 and t = 1 simultaneously, and then the index It =1 is obtained from one of the estimated regression coefficients.
Variants: The same method could be applied directly in a setting with more than two periods, by taking several time dummy variables, one for each period after the reference period. However, another alternative is to fit the regression equation (15) for each pair of
adjacent periods, and then chain.
Remark: The index number is here obtained by another way than a familiar “index formula” such as Jevons index or Dutot index. Nevertheless, if there are no quality changes,
the index number It =1 obtained by fitting Eq. (15) to the observed prices turns out to be
equal to the usual Jevons index.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
259
Annex 1: Details on different types of hedonic methods
The hedonic characteristics price index method
Alias: The direct characteristics method.
The hedonic characteristics price index method computes a price index for a fixed basket
of characteristics, priced by currently updated hedonic functions.
Index computation with this method. Consider an example with five sampled productoffers, labelled A, C, D, E, F, in the reference period t = 0 . In the comparison period t = 1
the product-offer A has been replaced by another product-offer labelled B . The Jevons
index unadjusted for quality change is then computed as
(16)
§ PB PC PD PE
PF
I = ¨ tA=1 ⋅ tC=1 ⋅ tD=1 ⋅ tE=1 ⋅ tF=1
¨P
© t =0 Pt =0 Pt =0 Pt =0 Pt =0
·
¸
¸
¹
1/ 5
.
The hedonic characteristics price index method then means that the index is computed
as in eq. (16) but with this modification:
(17)
Use Eq. (16) but in place of
PtB=1
Pt A=0
take
PtB=1
Pt A=0 ⋅ h( z B )/ h( z A )
.
Here ht =0 (.) and ht =1(.) stand for hedonic functions estimated on data for each of the two
periods separately. Furthermore z A is the set of characteristics values of product-offer A.
Variants: Using the characteristics z A of the earlier product-offer A in Eq. (17) is in the
vein of a Laspeyres index, by implying a characteristics basket from the earlier period.
Similarly there is a Paasche-style alternative, taking in the place of z A in (17) instead
the characteristics z B of the later product-offer B, and a Fisher-style alternative with an
index equal to the geometric mean of the latter two indices.
The hedonic price imputation method
Alias: An indirect method.
The hedonic price imputation method computes a price index with some prices given by
a hedonic function.
Index computation with this method. In the example described in connection to eq. (16),
the hedonic price imputation method in a simple form could mean that the index is computed as in eq. (16) but with this modification:
(18)
Use Eq. (16) but in place of
PtB=1
Pt A=0
take
PtB=1
ht =0 ( z A )
.
Like in Eq. (17) for the Hedonic characteristics price index method just described,
ht =0 (.) here stands for a hedonic function, ideally pertaining to the period t = 0 .
Variants: By Eq. (18) a price given by a hedonic function is imputed for the newly observed
product-offer B in the earlier period, when it did not exist or its price was not collected.
On the other hand the vanishing product-offer A, not available in the comparison period, is
260
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
dropped from the index computation. An alternative way is to do it the other way around,
to impute for the vanishing product-offer A in the comparison period and drop the newly
observed product-offer A from the index computation.
Imputing for both vanishing and newly observed product-offers is a further alternative.
Then both the vanishing product-offer A and the newly observed product-offer B can be
included, and the index computation becomes
(19)
§ h (z A )
PtB=1
PtC=1 PtD=1 PtE=1 PtF=1
⋅
⋅
⋅
⋅
⋅
I = ¨ t =1 A
¨ P
ht =0 ( z B ) PtC=0 PtD=0 PtE=0 PtF=0
t =0
©
·
¸
¸
¹
1/ 6
Actually the number of observed prices here does not necessarily have to be the same
in the two periods, as one just imputes a price whenever a product-offer is absent in
either of the two periods.
Double imputation is yet another alternative, meaning that imputation is made also to
avoid comparison between observed and hedonically estimated prices, by an index computation like
(19')
§ h (z A ) h (zB ) PC
PD PE
PF
I = ¨ t =1 A ⋅ t =1 B ⋅ tC=1 ⋅ tD=1 ⋅ tE=1 ⋅ tF=1
¨ h (z ) h (z ) P
t =0
t =0 Pt =0 Pt =0 Pt =0
© t =0
·
¸
¸
¹
1/ 6
.
The hedonic re-pricing method
Alias: “The hedonic quality adjustment method”; the hedonic method of pricing of characteristics; an indirect method.
The hedonic re-pricing method computes a price index from observed prices that may be
modified in proportion to a hedonic function.
Note on terminology: “Hedonic re-pricing” is here suggested as name for the method that
Triplett (2006) and Eurostat (2005) refer to as “The hedonic quality adjustment method”.
The reason for this suggestion is that the name “The hedonic quality adjustment method”
may be very likely to be mistaken for “quality adjustment by hedonic methods” in general. The latter would include also the time dummy method etc. It seems advisable to avoid
that evidently misunderstanding-prone wording.
Index computation with this method. In the example described in connection to eq. (16),
the hedonic re-pricing method means that the index is computed as in eq. (16) but with
this modification:
(20)
Use Eq. (16) but in place of
PtB=1
Pt A=0
take
PtB=1
Pt A=0 h( z B )/ h( z A )
.
Again h(.) stands for a hedonic function, which should ideally be estimated fairly recently,
but the timing of its estimation is far less critical in this method than in those just described.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
261
Annex 1: Details on different types of hedonic methods
Variants: By Eq. (20) the reference price is adjusted for the quality change, and this should
often be a convenient practice. However it would in principle be possible to invert the adjustment factor and put it in the numerator and not in the denominator, meaning that instead the comparison price is adjusted to the same effect.
More interestingly, the method can be generalised to settings where there is no matching of product-offers between the reference period and the comparison period. Then each
price in both periods is increased or decreased by a factor corresponding to how the value
of the product-offer on account of its set of characteristics relates to that of a standardlevel set
z S of characteristics. The computation goes like this:
I =
(21)
(P
(P
B
S
t =1h( z
A
S
t =0 h( z
)/ h( z B ) ⋅ PtG=1h( z S )/ h( z G ) ⋅ PtH=1h( z S )/ h( z H ) ⋅
)/ h( z A ) ⋅ PtC=0 h( z S )/ h( z C ) ⋅ PtD=0 h( z S )/ h( z D ) ⋅
)
))
⋅ Pt J=1h( z S )/ h( z J ) ⋅ PtK=1h( z S )/ h( z K )
1/ 5
⋅ PtE=0 h( z S )/ h( z E ) ⋅ PtF=0 h( z S )/ h( z F
1/ 5
Here all the product-offers observed in the comparison period, labelled B, G, H, J, K, are
newly observed, not observed in the reference period. Actually the number of observed
prices here does not necessarily have to be the same in the two periods.
The non-matching form by Eq. (21) of the hedonic re-pricing method is useful in combination with the bestseller list approach for products like books and recorded media (cf.
Eurostat 2002).
Remark: It may be noted that if double imputation, like in Eq. (19), is applied in the same
setting as Eq. (21), the formula becomes very similar to Eq. (21). Effectively it differs from
Eq. (21) only in its more specified timing of the hedonic functions, and in the exponent
becoming 1/10 instead of 1/5. The difference in exponents may at first seem paradoxical
but is as it should. This is indicated by the fact that the mentioned exponents make both
indices satisfy the proportionality axiom (cf. ILO et al. 2004, Sect. 16.37). That is, the indices are such that if all prices change in the same proportion (and quality is unchanged),
then the index changes in the same proportion.
It may be illuminating to notice that the index by double imputation becomes identical to
the geometric mean of the hedonic characteristic price index method (as by Eq. (17)) and
the index by hedonic re-pricing of Eq. (21), if the hedonic function in Eq. (21) is taken as
the geometric mean of the two period-specific hedonic functions.
Choice of which type of hedonic method – normally use the hedonic re-pricing method
The question of which of the described types of hedonic method to choose will now be
dealt with.
The hedonic time dummy variable method appears to have been fairly much used in academic studies but not so much in official statistics, as Triplett (2006) notes. That method
has some difficulty in transparency. Quality adjustment and index calculation here take
262
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
place in the same “black box”, in a way hiding what is going on. As Melser (2004) notes,
the method also leads to non-compliance with some usually expected axiomatic index
properties.
The hedonic re-pricing method seems to be the one most often used in hedonic applications to official statistics, and there are apparently good reasons for this, as Triplett (2006)
notes. The updating of the hedonic function is not so critical with this method. The hedonic
regression may thus where suitable be performed on a larger data material than the monthly index computations. This may often be an important practical advantage, as the number
of monthly price observations may in many cases be rather small for a monthly regression
analysis on fresh data.
The hedonic re-pricing method also affects only such prices that are associated with quality change, which is in line with usual practice on quality adjustment generally and supports transparency. This feature is not least an advantage to effective data editing, as it
makes it feasible to search for errors in data which may lie behind suspicious results.
Furthermore the generalised form of the Hedonic re-pricing method, as described in connection to Eq. (10), makes this method suitably applicable also to situations where there
is no effective matching over time, such as for book titles or newly built houses.
It may also be noted that the HICP “Implementation Regulation”, as well as a later amendment to it (Commission of the European Communities 1996, 2007), contains the following
definition:
“Quality adjustment” means the procedure of making an allowance for an observed quality change by increasing or decreasing the observed current or reference price by a factor or an amount equivalent to the value of that quality change.
Apparently the hedonic re-pricing method is the hedonic method that is most clearly compatible with this formulation, although the formulation should probably not be seen as ruling out other hedonic methods.
The hedonic re-pricing method should be regarded as a default choice among hedonic
methods, to be used except possibly where there are very particular reasons to choose
some of the other hedonic methods.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
263
Annex 2: Brief theoretical digression on number of price observations
Annex 2: Brief theoretical digression on number of price observations
In the treatment of “Topic 1” in Applying hedonic regression – concepts and guidance for
use, the following rule of thumb was stated on the number of price observations needed:
For regression analysis use a sample so large that there are at least some 15 to 20 price
observations per characteristics variable in the regression equation.
The rule of thumb has some basis in statistical theory. A heuristic argument may be stated
briefly. Following Atkinson & Riani (2000, Sect. 2.1.2) and using the notation there, one
may express essentially the variances of the values of the hedonic function for observations number i = 1,…, n as
(22)
var yˆ i = σ 2 hi
where hi = x iT ( X T X ) −1 x i
Here σ 2 is the variance of the residual, and X is the n×p-matrix and x i are column pvectors, made up of the observed values of the characteristics variables. (Sorry, just here h
is used in quite another sense than hedonic function.)
Strictly the cited variance relation assumes certain ideal conditions, of independent identically distributed residuals etc. If those conditions are not quite met the equation may not
hold strictly, as neglected covariance terms may then interfere. Furthermore variances are
here to be understood in the sense of theory of model-based statistical analysis, which
means that variability is due to the residual term only and conditional on the characteristics variables.
For the number of observations n and the number of characteristics variables p, the following fact holds:
(23)
1
n
n
¦h
i
= p/n
i =1
As the numbers hi express proportions of variances, corresponding proportions of standard deviations are given by square roots. It may then be concluded that a condition for the
hedonic function not to be heavily disturbed by noise is that the square root of the number p/n is well below one, perhaps at most around 1/4 or so. This entails the above rule
of thumb.
264
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Hedonic regression
7.2
References
Atkinson, A. & Riani, M. (2000): Robust Diagnostic Regression Analysis, New York,
Springer.
Bartlett, V. & Lewis, T. (1994): Outliers in Statistical Data, Third Edition, Chichester, Wiley.
Commission of the European Communities (1996): Commission Regulation (EC) No
1749/1996.
Commission of the European Communities (2000): Commission Regulation (EC) No
2602/2000.
Commission of the European Communities (2007): Commission Regulation (EC) No
1334/2007 amending Commission Regulation (EC) No 1749/1996.
Belsley, D. A.; Kuh, E. & Welsch, R. E. (1980): Regression Diagnostics, New York, Wiley.
Davidson, R. & MacKinnon, J. G. (1993): Estimation and Inference in Econometrics, New
York, Oxford University Press.
Donald, S. G. & Maddala, G. S. (1993): Identifying outliers and influential observations
in econometric models, in: Handbook of Statistics, Vol. 11 (Econometrics), ed. by G. S.
Maddala et al., Amsterdam, Elsevier, Ch. 24, pp. 663 – 701.
Draper, N. R. & Smith, H. (1998): Applied Regression Analysis, Third Edition, Chichester,
Wiley.
Dulberger, E. R. (1989): The application of a hedonic model to a quality-adjusted price
index for computer processors, in: D. W. Jorgensen & R. Landau (eds.): Technology and
Capital Formation, Cambridge, MA, MIT Press, pp. 37 – 75.
Emerson, J. D. & Stoto, M. A. (1983): Transforming data, in: Understanding Robust and
Exploratory Data Analysis, ed. by D. C. Hoaglin et al., New York, Wiley (reprinted 2000),
Ch. 4, pp. 97 – 128.
Eurostat (2002): Action 4: Classification of QA methods on a case by case basis – Final
reports from Task Force on QAS, to WP HICP Meeting October 2002, Document HCPI
02/428, Parts A-E.
Eurostat (2005): Draft final report on computers and software, from Task Force Quality
Adjustment and Sampling, to WG HICP Meeting June 2005, Document HCPI 04/511-rev. 1.
Eurostat (2007): Lot C, European Comparison Programme: Consumer Price Survey E07-1
”House and garden” Particular survey guidelines
Eurostat (2008): HICP Manual (forthcoming).
Goodall, C. (1983): Examining residuals, in: Understanding Robust and Exploratory Data
Analysis, ed. by D. C. Hoaglin et al., New York, Wiley (reprinted 2000), Ch. 7, pp. 211 – 246.
Gordon, R. J. (1990): The Measurement of Durable Goods Prices, Chicago: Chicago University Press.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
265
References
Guédès, D. (2007): Fashion and consumer price index, 10 th meeting of the UN International Working Group on Price Indices (The Ottawa Group), in Ottawa.
International Labour Organisation – ILO(2003): Amended draft resolution concerning consumer price indices, Report of the Seventeenth International Conference of Labour Statisticians in Geneva,
ILO/IMF/OECD/UNECE/Eurostat/The World Bank (2004): Consumer price index manual:
Theory and practice. International Labour Office, Geneva (available on www.ilo.org).
Kennedy, P. (2008): A Guide to Econometrics, Sixth Edition, Malden, Blackwell.
Kutner, M. H.; Nachtsheim, C. J. & Neter, J. (2004): Applied Linear Regression Models,
Fourth Edition, Boston, McGraw Hill.
Maronna, R. A.; Martin, R. D. & Yohai, V. J. (2006): Robust Statistics, Chichester, Wiley.
McCullagh, P. & Nelder, P. (1989): Generalized Linear Models, London, Chapman & Hall.
Melser, D. (2004): The hedonic regression time-dummy method and the monotonicity
axioms, SSHRC International Conference on Index Number Theory and the Measurement
of Prices and Productivity in Vancouver.
Miller, A. (2002): Subset Selection in Regression, Second Edition, Boca Raton, Chapman
& Hall/CRC.
Miller, R. G. (1981): Simultaneous Statistical Inference, Second Edition, New York:
Springer.
National Research Council (2002): At What Price? Panel on Cost-of-Living Indexes, ed. by
C. L. Schultze et al., Washington DC, National Academy Press.
Nordberg, L. (1989): Generalized linear modeling of sample survey data, Journal of Official
Statistics, 5, pp. 223 – 239.
Ryan, T. P. (1997): Modern Regression Methods, New York, Wiley.
SAS (2004): SAS/STAT® 9.1 User's Guide, Cary NC, SAS Institute Inc.
Savin, N. E. (1984): Multiple hypothesis testing, in: Handbook of Econometrics, Vol. II,
ed. by Z. Griliches & M. D. Intriligator, Amsterdam, Elsevier, Ch. 14, pp. 827 – 879.
Triplett, J. (2006): Handbook on Quality Adjustment of Price Indexes for Information and
Communication Technology Products, OECD Directorate for Science, Technology and Industry, Paris, OECD.
Wilks, S. S. (1962): Mathematical Statistics, New York, Wiley.
Note:
SAS and SAS/STAT are registered trademarks of SAS Institute Inc., Cary, NC, USA.
Minitab, SPSS, STATA, and SYSTAT are trademarks of their proprietors.
266
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Further reading
Ahnert, H.; Kenny, G. (2004): Quality Adjustment of European Price Statistics and the Role
for Hedonics, ECB, Occasional Paper Series, No. 15.
Bascher, J.; Lacroix, T. (1999): Dish-washer and PCs in the French CPI: Hedonic Modelling,
from Design to Practice, 5th meeting of the UN International Working Group on Price Indices (The Ottawa Group), reprint available at www.ottawagroup.org.
Beer, M. (2005): Bootstrapping a Hedonic Price Index: Experience from Used Cars Data,
Paper presented at the “Statistische Woche 2005”, Braunschweig, 26 – 29 September
2005. (especially relevant for “used cars”)
Beer, M. (2007): Hedonic Elementary Price Indices: Axiomatic Foundation and Estimation
Techniques, PhD thesis, Fribourg.
Books in Europe (2003): What future for European Books? Books and the book market in
the enlarged European Union, Resolution, Athens, 10 – 11 April 2003, reprint available
at www.booksineurope.org. (especially relevant for “books”)
Boskin, M.; Dulberger, E.; Gordon, R.; Griliches, Z. and Jorgensen, D. (1996): Toward a
More Accurate Measure of the Cost of Living, Final Report to the Senate Finance Committee from the Advisory Commission to Study the Consumer Price Index.
Dalén, J. (2002): Criteria for the acceptability of quality adjustment methods, Statistik Austria, HICP Research, Final Report.
Dalén, J.; Ingelsson, U. (2004): Measuring price changes for books based on hedonic models, paper for the SSHRC International Conference on Index Number Theory and the Measurement of Prices and Productivity, Vancouver, 30 June – 3 July 2004. (especially relevant
for “books”)
Diewert, E. (2001): Hedonic Regressions: A consumer theory approach, Discussion
Paper 01-12, Department of Economics, University of British Columbia.
Diewert, E. (2002): Hedonic Producer Price Indices and Quality Adjustment, Discussion
Paper 02-14, Department of Economics, University of British Columbia.
Eurostat (2002): Classification of QA methods on a case by cas basis – Final reports
from from Task Force on QAS. HCPI 02/428-A to E, Luxembourg.
Eurostat (2005): HICP standards for cars and other vehicles, HCPI 05/520 Rev. 1 Final,
Luxembourg. (especially relevant for “cars”)
Fenwick, D.; Wingfield, D. (2005): Methodological Improvements to the Retail Prices Index
and Consumer Prices Index from February 2005, Economic Trends, No. 616, pp. 74 – 80.
Godfray, T. (2006): Opportunities for booksellers created by the Digitisation of Book Content, Presentation for the European Bookseller Federation, Brussels, 4th December 2006.
(especially relevant for “books”)
Federal Statistical Office, Statistics and Science, Vol. 13/2009
267
Further reading
van der Grient, H. A. (2004): Scanner Data on Durable Goods: Market Dynamics and
Hedonic Time Dummy Indexes, Mimeo, Voorburg, Statistics Netherlands. (especially relevant for “televison sets”)
de Haan, J. (2007): Hedonic Price Indexes: A comparison of imputation, time dummy and
other approaches, paper presented at the EMG workshop on Productivity, Prices and R&D
Capitalisation Sydney, 12 – 14 December 2007.
ILO/IMF/OECD/UNECE/Eurostat/The World Bank (2004): Consumer price index manual:
Theory and practice, Geneva, International Labour Office, reprint available at www.ilo.org.
Linz, S.; Dexheimer, V. (2005): Dezentrale hedonische Indizes in der Preisstatistik, in: Wirtschaft und Statistik, 3/2005, Wiesbaden, Statistisches Bundesamt. (especially relevant
for “televison sets” and “washing machines”)
Lowe, R. (1998): Televisions: Quality Changes and Scanner Data. 4th meeting of the UN
International Working Group on Price Indices (The Ottawa Group), reprint available at
www.ottawagroup.org. (especially relevant for “televison sets”)
Manninen, K. (2005): The Effects of the Quality Adjustment Method on Price Indices for
Digital Cameras, WP 2005-01, U.S. Department of Commerce, BEA, Bureau of Economic
Analysis.
Moulton, B.; LaFleur, T.; Moses, K. (1998): Research on improved quality adjustment in
the CPI: The case of televisions, 4th meeting of the UN International Working Group on
Price Indices (The Ottawa Group), reprint available at www.ottawagroup.org. (especially
relevant for “televison sets”)
van Mulligen, P. H. (2003): Quality aspects in price comparisons and international comparisons: Applications of the hedonic method, Voorburg, Statistics Netherlands. (especially relevant for “computers”)
National Research Council (2002): At What Price? Conceptualizing and Measuring Costof-Living and Price Indexes, C. L. Schultze and C. Mackie (eds.), Washington DC, National Academy Press.
Pakes, A. (2003): A Reconsideration of Hedonic Price Indexes with an Application to PCs,
American Economic Review, 93(5), pp. 1578 – 96. (especially relevant for “computers”)
Pashigian, B. (2001): The Used Car Price Index: A Checkup and Suggested Repairs,
WP 338, BLS Working Papers. (especially relevant for “used cars”)
Peach, R.; Alvarez, K. (1996): Core CPI: Excluding Food, Energy . . . and Used Cars, Current Issues in Economics and Finance, 2(4), Federal Reserve Bank of New York. (especially relevant for “used cars”)
Ramsey, J. B. (1969): Tests for Specification Errors in Classical Linear Least Squares Regression Analysis, Journal of the Royal Statistical Society B, 31/2, pp. 350 – 371.
268
Federal Statistical Office, Statistics and Science, Vol. 13/2009
Further reading
Reese, M. (2001): Hedonic Quality Adjustment Methods for College Textbooks in the
U.S. CPI, Washington DC, Bureau of Labor Statistics, reprint available at www.bls.gov/cpi/
cpictb.htm. (especially relevant for “books”)
Reis, H.; Santos Silva, J. M. C. (2006): Hedonic Prices Indexes for New Passenger Cars in
Portugal (1997 – 2001), Economic Modelling, 23(6), pp. 890 – 908. (especially relevant
for “new cars”)
Silver, M.; Ioannidis, C.; Haworth, M. (1997): Hedonic Quality Adjustments for Non-Comparable Items for Consumer Price Indices, 3rd meeting of the UN International Working
Group on Price Indices (The Ottawa Group), reprint available at www.ottawagroup.org.
(especially relevant for “television sets”)
Triplett, J. E. (2006): Handbook on hedonic indexes and quality adjustments in price indexes: special application to information technology products, Paris, OECD. (especially
relevant for “computers”)
White, H. (1980): A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct
Test for Heteroskedasticity, Econometrica, 48/4, pp. 817 – 838.
Federal Statistical Office, Statistics and Science, Vol. 13/2009
269