How to revise the GUM? Walter Bich
Accred Qual Assur (2008) 13:271–275
DOI 10.1007/s00769-008-0357-y

DISCUSSION FORUM

Received: 22 November 2007 / Accepted: 5 January 2008 / Published online: 26 January 2008
© Springer-Verlag 2008

Abstract The announcement of a revision of the Guide to the expression of uncertainty in measurement has renewed the debate about the topic of measurement uncertainty. In this paper the author, chairman of Working Group 1 of the Joint Committee for Guides in Metrology, replies to the theses given in two recent papers by Semion Rabinovich. His opinions are personal, and are not necessarily shared by the JCGM/WG1. They are intended as a further contribution to the present discussion.

Keywords Metrology · Measurement · Measurement uncertainty

W. Bich, Istituto Nazionale di Ricerca Metrologica, 10135 Torino, Italy

Papers published in this section do not necessarily reflect the opinion of the Editors, the Editorial Board and the Publisher.

Introduction

The Guide to the expression of uncertainty in measurement, GUM [1], was published in 1993 by the International Organization for Standardization (ISO) in the name of seven international organizations, namely: ISO itself, the Bureau International des Poids et Mesures (BIPM), the International Electrotechnical Commission (IEC), the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC), the International Unions of Pure and Applied Chemistry (IUPAC) and Physics (IUPAP), and the International Organization of Legal Metrology (OIML). It was reprinted with minor corrections in 1995. In 1997, the same organizations established the Joint Committee for Guides in Metrology (JCGM). The International Laboratory Accreditation Cooperation (ILAC) joined in 1998.

The JCGM has two working groups. Working group 1, “Expression of uncertainty in measurement”, has the task “to promote the use of the GUM and to prepare supplements for its broad application”. Working group 2, “on International vocabulary of basic and general terms in metrology”, has the task “to revise and promote the use of the VIM”.

The first meeting of JCGM-WG1 was in March 2000. It was then decided that the GUM would not be revised in the short term, despite some identified limitations and drawbacks. Supplements covering these limitations and drawbacks would instead be produced [2]. The main reason for this decision was that the GUM was by that time becoming the authoritative document in the field of measurement uncertainty; as its elaboration had been a fragile compromise between several different views, it was not deemed timely to reopen a debate that had never really been settled.

Recently, the JCGM decided that in the future the GUM would be revised, and the first contributions have started to appear. Specifically, in two recent papers [3, 4] Rabinovich explains his views on improvement of the GUM. In Ref. [3], the main criticism of the present GUM is that it is intended only for repeated measurements and says nothing about single measurements, which constitute the large majority. In the second [4], various other issues are addressed, among which the most important is probably that concerning the old debate about the true value of a quantity. In this paper I will try to explain the principles underpinning the GUM, and discuss the lines along which I think it should be revised and updated.

Generalities

In 1980 the working group established to address the problem of unified treatment of uncertainty in measurement clearly stated, in its recommendation INC-1, that uncertainties should be expressed as variances (or, better, as their positive square roots) both for random and systematic components. This recommendation was reaffirmed in the CIPM recommendations CI-1981 (GUM, A.2) and CI-1986 (GUM, A.3).
The decision to adopt variances ran counter to a different view, according to which error limits were to be preferred. There are good reasons for preferring variances to error limits. Variances are well-defined quantities in the theory of random variables, and their properties, especially as concerns their “propagation”, have been known for a long time. In contrast, error limits, or maximum errors, are difficult to define, and since their properties depend on their definition, they are not good candidates for unique treatment of uncertainties. The GUM was developed according to the CIPM recommendations, and the best one can do presently, almost 15 years since its publication, is to try to further develop its framework, and to resolve, within that framework, its inconsistencies. This is precisely the mission of the JCGM/WG1.

The GUM framework

In this section I will review the main steps of the GUM uncertainty framework, commenting on their merits and limitations.

Formulation stage

The framework of the GUM is now well-known, and is based on the assumption that the measurand Y (or output quantity) is not observed directly, but is obtained via a number of, say, N other quantities Xi (the input quantities) to which it is related through a known functional relationship, the model Y = f(X1, X2, …, XN). Even the simplest, seemingly direct measurements, such as those mentioned by Rabinovich, fall into this categorization. For example, the indication of a bathroom balance, which is expressed in divisions of the scale, is not the measurand Y (which is the mass of the person in kilograms), but simply one of the input quantities, say, X1. The measurand is obtained from the indication X1, perhaps repeated two or three times, and a series of corrections X2, X3, …, XN (the zero and the span of the scale, and perhaps its linearity, or the deviation of the local acceleration due to gravity from that of the place in which the balance was manufactured and adjusted). Although in many practical cases the corrections are negligible for the purpose of obtaining an estimate, the model must still be used to evaluate the uncertainty of that estimate. In fact, the uncertainty associated with a negligible correction is not, in general, itself negligible.

The phase of deciding a model relating the measurand to the input quantities is part of what is sometimes denoted as the “formulation stage”. The GUM approach to this specific stage is quite general, the only prescription being that a model must be available. The case treated in Ref. [3] concerns a specific model claimed to be suitable for single measurements and is compliant with the general framework of the GUM. Some guidance, mainly through examples, is given in the GUM about the formulation stage (GUM, Annex H). In addition, a model is given for the cosine error, as a case representative of a class of experimental situations in which highly asymmetric distributions are involved for the input quantities (GUM, F.2.4.4). Examples of such experimental situations in the chemical field are titration or measurement of the concentration of impurities. Encoding an experimental procedure in a suitable model can be a difficult task. Recognizing this fact and with the aim of enriching the treatment of modelling, the JCGM/WG1 will prepare a specific supplement devoted to this topic [2]. The modelling of single, or “direct”, measurements might be considered a useful example.

Input evaluation stage

The subsequent step might be called the “input evaluation stage”. The experimenter now has to assign estimates to the input quantities, as well as uncertainties associated with these estimates. Assignment of estimates is rather intuitive. Some may come from indications (repeated or not) of instruments; in this case one usually takes the average of the indications.
Others may be constant values taken from the literature or from prior experience (for example, the value of a reference standard taken from a certificate of calibration, or a coefficient of thermal expansion taken from a textbook). In the simplest cases one has an indication from the reading of an instrument and a number of corrections expected to be equal to zero or to unity (for additive and multiplicative models, respectively).

The assignment of the uncertainties associated with input quantities is less obvious. Clause 4 of the GUM is entirely devoted to this task, and Annex F gives practical guidance on several cases, including that of a single estimate, of which a single indication from an instrument is a particular case. Therefore, it is unfair to claim that the GUM “does not mention single measurements” [3]. The procedures given in the GUM are well defined, and the well-known classification of evaluations of uncertainty in Types A and B has been introduced to distinguish clearly between two procedures adopting statistics and probability theory, respectively. The distinction is not only formal, and deserves some discussion.

Type A evaluations

If a quantity is repeatedly sampled during the experiment, so that a set of indications is available and its average is used to estimate the quantity value, the experimental standard deviation of the mean of the sample is typically (but not invariably) assigned as the uncertainty associated with the estimate of that quantity (GUM, 4.2.3) (Type A evaluation). In the GUM, this is considered as an estimate of the “true” standard deviation, just as the sample average is viewed as an estimate of the “true” quantity value. A number of degrees of freedom is attached to the estimate of the standard deviation, as a measure of its reliability (or uncertainty). Therefore, uncertainties obtained by Type A evaluations are uncertain themselves (see GUM, E.4).
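The Type A evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `type_a_evaluation` and the balance readings are hypothetical and do not come from the GUM or from this paper.

```python
import math
import statistics


def type_a_evaluation(indications):
    """Return (estimate, standard uncertainty, degrees of freedom).

    The estimate is the average of the indications, the uncertainty is the
    experimental standard deviation of the mean, and the degrees of freedom
    are n - 1 for n indications.
    """
    n = len(indications)
    if n < 2:
        raise ValueError("a Type A evaluation needs at least two indications")
    estimate = statistics.fmean(indications)
    s = statistics.stdev(indications)  # experimental standard deviation (n - 1 divisor)
    u = s / math.sqrt(n)               # experimental standard deviation of the mean
    return estimate, u, n - 1


# Hypothetical repeated indications of a balance, in kilograms.
readings = [70.2, 70.4, 70.3, 70.1, 70.5]
estimate, u, dof = type_a_evaluation(readings)
print(f"estimate = {estimate:.2f} kg, u = {u:.3f} kg, degrees of freedom = {dof}")
```

With only five indications the degrees of freedom are few, which is the sense in which the uncertainty obtained in this way is itself uncertain.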
Type B evaluations

If a set of indications is not available for a quantity, a subjective probability density function (PDF) is assigned, embodying the available knowledge of that quantity (Type B evaluation). This approach is based on interpretation of probability as degree of belief. The mean (expectation) of the PDF is the estimate assigned to the quantity and its variance is calculated in the appropriate way depending on the available information (GUM, 4.3). Therefore, mean and variance of the PDF in Type B evaluations are the true distribution parameters, and should be viewed as exact, i.e., with no uncertainty. This concept is not fully implemented in the GUM, as degrees of freedom (although typically very high) are attached to them. The assignment of subjective degrees of freedom in Type B evaluations is not convincing and looks like an ad hoc procedure to align Type B with Type A evaluations (GUM, G.4.2), in view of the determination of the expanded uncertainty (see below). In any case, this internal inconsistency represents, not only in my opinion, the main drawback of the GUM [5, 6]. However, as far as the discussion concerns the uncertainties associated with input estimates, the inconsistency is only conceptual and does no harm.

Propagation stage

The subsequent stage is the “propagation stage”. The problem here is to obtain an estimate of the measurand and its associated uncertainty, given the model, the input estimates and the uncertainties associated with them. Here too the GUM does not suggest anything new, but simply adopts a well-known property of random variables [7], that is, a random variable which is a function of other random variables has a variance (and an expectation) which can be obtained from those of the independent variables upon which it depends, according to a comparatively simple (elementary for the expectation) formula. This result is based on a first-order, or linear, approximation.
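For reference, the first-order formula alluded to above is usually written as follows (GUM, Clause 5). The notation is an assumption made here for illustration: lower-case x_i and y denote the estimates of X_i and Y, u(x_i, x_j) the covariance associated with two input estimates, and c_i the sensitivity coefficients.

```latex
y = f(x_1, \dots, x_N), \qquad
c_i = \left.\frac{\partial f}{\partial X_i}\right|_{x_1, \dots, x_N},
\qquad
u^2(y) = \sum_{i=1}^{N} c_i^2 \, u^2(x_i)
       + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} c_i \, c_j \, u(x_i, x_j)
```

When the input estimates are uncorrelated the double sum vanishes and only the first term remains.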
The approximation therefore improves the closer the relationship is to linearity on the scale set by the magnitude of the variances. If the nonlinearity is appreciable, higher-order terms can be added, subject to some conditions, to improve the approximation. Therefore, as far as only standard uncertainties are concerned, the framework of the GUM is satisfactory, at least from the practical viewpoint, in many experimental situations.

However, a word of caution is necessary here, especially in connection with Type A evaluations. The point is that the formula is valid for parameters of a PDF, that is, for the “true” variances (and expectations), whereas, in the present GUM framework, both the input quantity values and variances are considered as estimates of the corresponding parameters. This implies that the estimates must be close to the corresponding parameters for the formula to be (approximately) valid.

Expanded uncertainty

The concept of expanded uncertainty was introduced to meet the need for greater confidence in the possible value of the measurand than that given by the standard uncertainty. The GUM solution is to multiply the standard uncertainty u(y) by a numerical factor k. “In general, k will be in the range 2 to 3” (GUM, 6.3.1). However, it should be realized that simple multiplication by k does not add value to the amount of information given by the standard uncertainty (GUM, 6.2.3), unless a measure of the confidence is at hand, that is, the coverage probability is known. This expanded uncertainty at a prescribed coverage probability, U_p, is a measure of uncertainty which really adds value with respect to standard uncertainty, and is what is almost invariably required, for example, in metrology, in the declaration of the Calibration and Measurement Capabilities [8]. In that case, as in many others, an interval is required within which the value of the measurand lies with a known (typically high, say, 0.95) degree of belief, or probability.
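The propagation and expansion steps discussed above can be sketched in Python for a toy version of the bathroom-balance model. Everything specific here is an assumption made for illustration: the model `balance_model`, the numerical values, the absence of correlation between inputs, and the use of central finite differences in place of analytical derivatives.

```python
import math


def propagate_first_order(model, estimates, uncertainties, rel_step=1e-6):
    """First-order propagation for uncorrelated inputs (illustrative sketch).

    Sensitivity coefficients are approximated by central finite differences.
    Returns the output estimate and its standard uncertainty.
    """
    y = model(*estimates)
    variance = 0.0
    for i, (x, u) in enumerate(zip(estimates, uncertainties)):
        h = rel_step * max(abs(x), 1.0)
        upper = list(estimates)
        lower = list(estimates)
        upper[i] = x + h
        lower[i] = x - h
        c = (model(*upper) - model(*lower)) / (2 * h)  # sensitivity coefficient
        variance += (c * u) ** 2
    return y, math.sqrt(variance)


def balance_model(indication, zero, span):
    """Hypothetical model: mass = (indication - zero correction) * span correction."""
    return (indication - zero) * span


# Hypothetical inputs: the corrections are 0 and 1, so they do not change the
# estimate, yet their uncertainties still contribute to u(y).
estimates = [70.3, 0.0, 1.0]
uncertainties = [
    0.07,                 # Type A, from repeated indications
    0.1 / math.sqrt(3),   # Type B, uniform PDF of half-width 0.1 kg
    0.001,                # Type B, relative uncertainty of the span
]
y, u_y = propagate_first_order(balance_model, estimates, uncertainties)
U = 2 * u_y  # expanded uncertainty with k = 2
print(f"y = {y:.2f} kg, u(y) = {u_y:.3f} kg, U (k = 2) = {U:.3f} kg")
```

Note that the choice k = 2 is merely conventional in this sketch: as the text points out, it conveys a coverage probability only if the PDF of the output quantity is known.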
In the framework of the present GUM, this task is very difficult to fulfill, even approximately, for two reasons. First, the shape of the PDF for the output quantity Y is not known, especially in the tails (which are the interesting part in a coverage interval); second, the standard uncertainty itself is uncertain, which makes uncertain not only the shape, but also the size of the PDF.

Drawbacks and remedies

Drawbacks

From the above discussion, the main drawback of the present GUM is the following internal inconsistency: on the one hand PDFs are interpreted as pictures of the available knowledge; on the other, degrees of freedom are attached to their parameters, which are viewed as estimates affected by an uncertainty. In other words, implementation of the view of probability as degree of belief is incomplete. This has important consequences. If the input uncertainties are uncertain, so is the output uncertainty, and thus the size of the output PDF. The way out is suggested in Annex G of the GUM, on the basis that in most situations the output PDF is a scaled-and-shifted Student’s t distribution. Guidance is given on how to determine the degrees of freedom of such a PDF, the so-called effective degrees of freedom, from the degrees of freedom of the input PDFs and their variances, by means of the Welch–Satterthwaite formula (GUM, G.4). In this case, a coverage interval can be constructed comparatively easily. The objection to this scheme is that the conditions for the output PDF being a scaled-and-shifted Student’s t distribution are quite strong and not likely to be met in many practical cases. For example, the input PDFs should be several, independent, and all more or less of the same size or, alternatively, a Gaussian should dominate. This requirement is not met for a simple measurement in which a dominant uniform (say, a value of the reference standard taken from a calibration certificate) is superposed on a small Gaussian (representing the comparison noise).
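As an illustration of the objection, the Welch–Satterthwaite formula can be evaluated for a dominant uniform contribution and a small Gaussian one. The following Python sketch uses hypothetical numbers, a hypothetical function name, and the common convention of treating the Type B contribution as having infinite degrees of freedom.

```python
import math


def effective_degrees_of_freedom(contributions, dofs):
    """Welch-Satterthwaite effective degrees of freedom (illustrative sketch).

    contributions: the terms |c_i| * u(x_i) of the output standard uncertainty.
    dofs: the degrees of freedom attached to each term (math.inf is allowed).
    """
    u_c4 = sum(u ** 2 for u in contributions) ** 2
    denominator = sum(u ** 4 / nu for u, nu in zip(contributions, dofs))
    return math.inf if denominator == 0.0 else u_c4 / denominator


# Dominant uniform term (Type B, taken as exact) plus a small Gaussian term
# from a Type A evaluation with 4 degrees of freedom.
nu_eff = effective_degrees_of_freedom([1.0, 0.1], [math.inf, 4])
print(f"effective degrees of freedom = {nu_eff:.0f}")
```

The effective degrees of freedom come out so large that the Student's t distribution is practically Gaussian, whereas the actual output PDF is close to rectangular. The formula itself gives no warning of this in the dominant-uniform case just described.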
This specific case can be treated analytically in a straightforward way, but in many other cases the treatment is difficult, or impossible.

A second limit of the present GUM is that it does not cover exhaustively the case of an arbitrary number of measurands determined from a common set of input quantities. This case is frequent, for example when measuring complex quantities typical of electricity.

Remedies

A full implementation of the concept of probability as degree of belief would greatly help. In this view, the PDFs are assigned to the input quantities on the basis of available knowledge and are viewed as a way to encode the latter in a rigorous mathematical language. This view, intuitive for Type B evaluations, can easily be adopted for Type A evaluations also. Although in the former case knowledge comes from non-statistical sources and in the latter it comes from a set of indications, both types can be encoded in the same way, using the appropriate tools. In the former case, the principle of maximum entropy would be used [9], in the latter, Bayes’ theorem [10], yielding as a result a scaled-and-shifted Student’s t distribution. As a consequence, the input uncertainties would have no uncertainty other than the uncertainty associated with the measurand.

This modification would remove one of the difficulties in the propagation of uncertainties. The law would be applied to expectations and variances, rather than to their estimates, and therefore it would be valid, within the limits of a first-order approximation. The construction of a coverage interval would also be simplified, being reduced to a mathematically clear problem, namely that of determining the PDF of a random variable, given the PDFs of those on which it depends. This problem has a well-known formal solution [11], although calculation difficulties prevent its application except in simple cases.
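The problem of determining the PDF of the output from the PDFs of the inputs can be approximated numerically by sampling. The following Python sketch applies plain random sampling to the dominant-uniform case mentioned earlier. The function name, the numerical values, the number of trials and the choice of a probabilistically symmetric interval are all assumptions made for illustration, not a prescribed procedure.

```python
import math
import random


def monte_carlo_interval(model, samplers, n_trials=100_000, p=0.95, seed=1):
    """Sample the output quantity and return (mean, std dev, coverage interval).

    samplers: one function per input quantity, each drawing a value from the
    PDF assigned to that input, given a random.Random instance.
    The interval is the probabilistically symmetric one for probability p.
    """
    rng = random.Random(seed)
    ys = sorted(model(*(draw(rng) for draw in samplers)) for _ in range(n_trials))
    mean = sum(ys) / n_trials
    std = math.sqrt(sum((v - mean) ** 2 for v in ys) / (n_trials - 1))
    low = ys[round((1 - p) / 2 * (n_trials - 1))]
    high = ys[round((1 + p) / 2 * (n_trials - 1))]
    return mean, std, (low, high)


a = 1.0       # half-width of the dominant uniform PDF (hypothetical)
sigma = 0.05  # standard deviation of the small Gaussian PDF (hypothetical)
mean, u, (lo, hi) = monte_carlo_interval(
    lambda reference, noise: reference + noise,
    [lambda rng: rng.uniform(-a, a), lambda rng: rng.gauss(0.0, sigma)],
)
print(f"u = {u:.3f}, 95 % half-width = {(hi - lo) / 2:.3f}, 2u = {2 * u:.3f}")
```

In this example the half-width of the 95 % interval is noticeably smaller than 2u, so an expanded uncertainty with k = 2 would overstate the interval for a nearly rectangular output PDF.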
A numerical method is therefore preferred, yielding a numerical approximation of the PDF for the measurand, from which the required coverage interval can easily be constructed. This topic is treated in the first supplement produced since the publication of the GUM, now in print [12]. As to the case of an arbitrary number of measurands, it will be the subject of a second specific supplement, now at an advanced stage of drafting. It is worth noting that the supplements are to be considered as a complement to the GUM, and must be used in conjunction with it. Whether to implement them in the revision of the main document is still a matter of debate.

Further issues

There are other reasons to revise the GUM. The most important is to make it compliant with VIM 3 [13], which introduced important modifications in terminology, and in general to review the document carefully and eliminate some minor ambiguities. This takes us to the questions raised by Rabinovich in his other paper [4] concerning terminology and, especially, the terms “true value” and “error”, carefully avoided in the present GUM. As concerns the former term, the GUM framework implies a measurand which, during measurements, is considered “essentially unique” (GUM, 1.2), although there would be no difficulty modifying the framework to encompass more general measurands, for example, those having an intrinsic uncertainty, be it definitional or other. As regards the latter term, “error”, it is, at least in part, connected with the former, so that if one becomes alive again there is no reason to demonize the other. In any case, the VIM 3 has new (!) definitions with which a revised GUM should be fully compliant. I personally would have appreciated in VIM 3 a term such as “estimate”, which has a precise meaning in probability and would be the first choice for an experimenter to indicate a value for the measurand obtained from a measurement.
In any case, the decision to distinguish in the VIM 3 between true value and a value of the measurand is welcome and will contribute to clarification of GUM language. This, in general, has to be in equilibrium between VIM definitions and terms that have a precise meaning in the language of probability theory, which underpins the whole GUM approach.

The existence in the GUM of two uncertainties, although with different qualifiers, “standard” and “expanded”, might also be a source of ambiguity. In the future GUM, it is likely that the role of expanded uncertainty, viewed as a coverage interval for symmetric PDFs, will be de-emphasized, in line with the approach of Supplement 1. Also the classification of the methods of evaluation of input uncertainties as Types A and B is likely to follow a similar destiny.

From the notational viewpoint, a weak point of the present GUM is the adoption of the same symbol for both a quantity and the corresponding random variable; this should be corrected in a future revision.

Conclusion

I have tried in this paper to convey some ideas on how the GUM should be revised. They represent a personal viewpoint, although they are in a sense the elaboration of long discussions with my colleagues on the JCGM-WG1, whom I thank.

References

1. BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, and OIML (1995) Guide to the expression of uncertainty in measurement, 2nd edn. ISBN 92-67-10188-9
2. Bich W, Cox MG, Harris PM (2006) Metrologia 43:S161–S166
3. Rabinovich SG (2007) Accred Qual Assur 12:419–424
4. Rabinovich SG (2007) Accred Qual Assur 12:603–608
5. Kacker RN (2006) Metrologia 43:1–11
6. Kacker RN, Jones AT (2003) Metrologia 40:235–248
7. Lee PM (1992) Bayesian statistics: an introduction. Edward Arnold, London, 294 p
8. Comité International des Poids et Mesures (CIPM) (1999) Mutual recognition of national measurement standards and of calibration and measurement certificates issued by national metrology institutes, BIPM, Paris. http://www.bipm.org/utils/en/pdf/mra_2003.pdf
9. Weise K, Wöger W (1992) Meas Sci Technol 3:1–11
10. Sivia DS (2004) Data analysis: a Bayesian tutorial. Oxford University Press, Oxford, 189 p
11. Bickel PJ, Doksum KA (1977) Mathematical statistics. Prentice-Hall, Englewood Cliffs, 492 p
12. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML (2008) Evaluation of measurement data. Supplement 1 to the Guide to the expression of uncertainty in measurement. Propagation of distributions using a Monte Carlo method, ISO/IEC Guide 98-3/Supplement 1. ISO, Geneva
13. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML (2007) International vocabulary of metrology: basic and general concepts and associated terms, VIM, 3rd edn. International Organization for Standardization, Geneva