Proficiency testing in analytical laboratories: how to make it work
Accred Qual Assur (1996) 1:49–56 © Springer-Verlag 1996

R. Wood · M. Thompson

Received: 3 November 1995 / Accepted: 20 November 1995

R. Wood, Food Labelling and Standards Division, Ministry of Agriculture, Fisheries and Food, CSL Food Science Laboratory, Norwich Research Park, Colney, Norwich NR4 7UQ, UK

M. Thompson, Department of Chemistry, Birkbeck College (University of London), Gordon House, 29 Gordon Square, London WC1H 0PP, UK

REVIEW PAPER

Abstract

This paper covers the role of proficiency testing schemes in providing an occasional but objective means of assessing and documenting the reliability of the data produced by a laboratory, and in encouraging the production of data that are "fit-for-purpose". A number of aspects of proficiency testing are examined in order to highlight features critical for their successful implementation. Aspects that are considered are: accreditation, the economics and scope of proficiency testing schemes, methods of scoring, assigned values, the target value of standard deviation sp, the homogeneity of the distributed material, proficiency testing in relation to other quality assurance measures, and whether proficiency testing is effective. Stress is placed on the importance of any proficiency testing scheme adhering to a protocol that is recognised, preferably internationally. It is also important that the results from the scheme are transparent to both the participating laboratory and its "customer".

Key words: Proficiency testing · Fit for purpose · Internal quality control · Harmonised international protocol

Introduction

It is now universally recognised that for a laboratory to produce consistently reliable data it must implement an appropriate programme of quality assurance measures. Amongst such measures is the need for the laboratory to demonstrate that its analytical systems are under statistical control, that it uses methods of analysis that are validated, that its results are "fit-for-purpose" and that it participates in proficiency testing schemes. These requirements may be summarised as follows.

Internal quality control

Internal quality control (IQC) is one of a number of concerted measures that analytical chemists can take to ensure that the data produced in the laboratory are under statistical control, i.e. of known quality and uncertainty. In practice it is effected by comparing the quality of results achieved in the laboratory at a given time with a standard of performance. IQC therefore comprises the routine practical procedures that enable the analytical chemist to accept a result or group of results, or to reject the results and repeat the analysis. IQC is undertaken by the inclusion of particular reference materials, "control materials", in the analytical sequence and by duplicate analysis.

Analytical methods

Analytical methods should be validated as fit for purpose before use by a laboratory. Laboratories should ensure that, as a minimum, the methods they use are fully documented, that laboratory staff are trained in their use and that a satisfactory IQC system has been implemented.

Proficiency testing

Proficiency testing is the use of results generated in interlaboratory test comparisons for the purpose of a continuing assessment of the technical competence of the participating laboratories [1].
With the advent of "mutual recognition" on both a European and a worldwide basis, it is now essential that laboratories participate in proficiency testing schemes that provide an interpretation and assessment of results which is transparent to the participating laboratory and its "customer". Participation in proficiency testing schemes provides laboratories with an objective means of assessing and documenting the reliability of the data they are producing. Although there are several types of proficiency testing scheme, they all share a common feature: test results obtained by one laboratory are compared with an external standard, frequently the results obtained by one or more other laboratories in the scheme. Laboratories wishing to demonstrate their proficiency should seek out and participate in proficiency testing schemes relevant to their area of work. However, proficiency testing is only a snapshot of performance at infrequent intervals; it will not be an effective check on general performance, or an inducement to achieve fitness for purpose, unless it is used in the context of a comprehensive quality system in the laboratory.

The principles of proficiency testing are now well established and understood. Nevertheless, there are some aspects of practice that need further amplification and comment. This paper aims to highlight some of these.

Proficiency testing and accreditation

Despite the primary self-help objectives described above, an acceptable performance in a proficiency testing scheme (where available) is increasingly expected as a condition for accreditation. Indeed, in the latest revision of ISO Guide 25 it is a requirement that laboratories participate in appropriate proficiency testing schemes whenever these are available [2]. Fortunately, both the accreditation requirements and the "self-help" intentions can be fulfilled by the same means at the same time.

Elements of proficiency testing

In analytical chemistry, proficiency testing almost invariably takes the form of a simultaneous distribution of effectively identical samples of a characterised material to the participants for unsupervised blind analysis by a deadline. The primary purpose of proficiency testing is to allow participating laboratories to become aware of unsuspected errors in their work and to take remedial action. This it achieves by allowing a participant to make three comparisons of its performance: with an externally determined standard of accuracy; with that of peer laboratories; and with its own past performance. In addition to these general aims, a proficiency testing scheme should specifically address fitness for purpose, the degree to which the quality of the data produced by a participant laboratory can fulfil its intended purpose. This is a critical issue in the design of proficiency testing schemes and is discussed below.

History of proficiency testing: International Harmonised Protocol

Proficiency testing emerged from the early generalised interlaboratory testing that was used, in different degrees, to demonstrate proficiency (or rather the lack of it), to characterise analytical methods and to certify reference materials. These functions have now been separated to a large degree, although it is still recognised that proficiency testing, in addition to its primary function, can sometimes be used to provide information on the relative performance of different analytical methods for the same analyte, or to provide materials sufficiently well characterised for IQC purposes [3].
The systematic deployment of proficiency testing was pioneered in the United States in the 1940s, and in the 1960s in the United Kingdom, by clinical biochemists, who clearly need reliable results within institutional units and comparability between institutions. However, proficiency testing is now represented in most sectors of analysis where public safety is involved (e.g. in the clinical chemistry, food analysis, industrial hygiene and environmental analysis sectors) and is increasingly used in the industrial sector. Each of these sectors has developed its own approach to the organisation and interpretation of proficiency testing schemes, with any commonality of approach being adventitious rather than the result of collaboration.

To reduce differences in the approach to the design and interpretation of proficiency testing schemes, the three international organisations ISO, IUPAC and AOAC INTERNATIONAL have collaborated to bring together the essential features of proficiency testing in the form of the International Harmonised Protocol for the Proficiency Testing of (Chemical) Analytical Laboratories [4, 5]. This protocol has now gained international acceptance, most notably in the food sector, where it is now accepted that proficiency testing schemes must conform to the International Harmonised Protocol; this has been endorsed as official policy by the Codex Alimentarius Commission, AOAC INTERNATIONAL and the European Union.

Studies on the effectiveness of proficiency testing have not been carried out in a systematic manner in most sectors of analytical chemistry, although recently a major study of proficiency testing under the auspices of the Valid Analytical Measurement (VAM) programme has been undertaken by the Laboratory of the Government Chemist in the United Kingdom. However, the results have yet to be published (personal communication). This paper comments on some critical aspects of proficiency testing, identified as a result of experience in applying the International Harmonised Protocol to operational proficiency testing schemes.

Economics of proficiency testing schemes: requirement for laboratories to undertake a range of determinations offered within a proficiency testing scheme

Proficiency testing is in principle adaptable to most kinds of analysis and laboratories, and to groups of laboratories of all sizes. However, it is most effectively and economically applied to large groups of laboratories conducting large numbers of routine analyses. Setting up and running a scheme incurs a number of overhead costs which are best distributed over a large number of participant laboratories. Moreover, if only a small range of activities is to be subject to test, then proficiency testing can address all of them. If a laboratory may be called upon to carry out an extremely wide range of analyses (e.g. a food control laboratory), it will not be possible to provide a proficiency test for each of them individually. In such a case it is necessary to apply proficiency testing to a proportion of the analyses that can be regarded as representative. It has been suggested that for laboratories undertaking many different analyses a "generic" approach should be taken wherever possible.
Thus, general food analysis laboratories should participate in, and achieve a satisfactory performance in, series dealing with the testing of GC, HPLC, trace element and proximate analysis procedures, rather than for every analyte that they may determine (always assuming that an appropriate proficiency testing scheme is available). However, this basic participation should be supplemented by participation in specific areas where regulations are in force and where the analytical techniques applied are judged to be sufficiently specialised to require an independent demonstration of competence. In the food sector, examples of such analytes are aflatoxins (and other mycotoxins), pesticides, and overall and specific migration from packaging into food products.

However, it is necessary to treat with caution the inference that a laboratory that is successful in a particular proficiency scheme for a particular determination will be proficient for all similar determinations. In a number of instances it has been shown that a laboratory proficient in one type of analysis may not be proficient in a closely related one. Two examples where the ability of laboratories to determine similar analytes is very variable are described here.

Example 1: Total poly- and (cis) mono-unsaturated and saturated fatty acids in oils and fats

Results from proficiency testing exercises that include such tests indicate that the determinations are of variable quality. In particular, the determination of poly-unsaturated and saturated fatty acids is generally satisfactory, but the determination of mono-unsaturated fatty acids is unduly variable, with a bi-modal distribution of results sometimes being obtained. Bi-modality might be expected on the grounds that some participant laboratories were able to separate cis- from trans- mono-unsaturated fatty acids. However, examination of the methods of analysis used by participants did not substantiate this: some laboratories reported results as if they were separating cis- and trans- fatty acids even though the analytical systems employed were incapable of such a separation. This is clearly demonstrated in reports from the UK Ministry of Agriculture, Fisheries and Food's Food Analysis Performance Assessment Scheme [6].

Example 2: Trace nutritional elements (zinc, iron, calcium, etc.)

Laboratories have been asked to analyse proficiency test material which contains a number of trace elements of nutritional significance, e.g. zinc, calcium and iron. It has been observed that the number of laboratories achieving "satisfactory" results differs markedly from analyte to analyte within the same test material. This suggests that the assumption that a satisfactory determination of one such analyte indicates that a satisfactory determination would be obtained for all similar analytes is not valid. This conclusion holds even when the elements are determined in a "difficult" matrix, such as a foodstuff, where many of the problems may be attributed to a matrix effect rather than to the end-point determination.

Other limitations are apparent in proficiency testing. For example, unless the laboratory uses typical analytical conditions to deal with the proficiency testing materials (and this is essentially out of the control of the organiser in most schemes), the result will not enable participants to take remedial action in case of inaccuracy. This gives rise to a potential conflict between the remedial and the accreditation roles of proficiency testing.
It is unfortunate that successful participation in proficiency testing schemes has become a "qualification" (or at least poor performance a "disqualification") factor in accreditation. Nevertheless, it is recognised by most proficiency testing scheme organisers that their primary objective is to provide help and advice, not to "qualify" or "accredit" participants. Finally, it must be remembered that extrapolation from success in proficiency tests to proficiency in everyday analytical work is an assumption; in most circumstances it would be prohibitively expensive and practically difficult for a proficiency testing organiser to test the proposition experimentally by using undisclosed testing. However, most customers would anticipate that performance in a proficiency testing exercise represents the "best" that is achievable by a laboratory, and that repeated poor performance in a proficiency testing scheme is not acceptable.

Scoring

Converting the participants' analytical results into scores is nearly always an essential aid to the interpretation of the results. Those scores must be transparent to both the laboratory and its "customer"; that customer may be either a customer in the conventional sense or an accreditation agency. Raw analytical results are expressed in a number of different units, cover a large range of concentrations and stem from analyses that may need to be very accurate or may require only "order-of-magnitude" accuracy. An effective scoring system can reduce this diversity to a single scale on which all results are largely comparable and which any analytical chemist or his client can interpret immediately. Such a scoring system (the z-score) has been recommended in the International Harmonised Protocol. A number of other scoring systems have evolved in the various proficiency testing schemes presently operating; many of these systems incorporate arbitrary scaling, the main function of which is to avoid negative scores and fractions. However, all of these scores can be derived from two basic types of score, the z-score and the q-score [4, 5].

The first action in converting a result into a score is to consider the error, the difference x − X̂ between the result x and the assigned value X̂ (X̂ being the best available estimate of the true value). This error can then be scaled by two different procedures.

q-scores

The q-score results from scaling the error to the assigned value, i.e. q = (x − X̂)/X̂. Values of q will be nearly zero-centred (in the absence of overall bias among the participants). However, the dispersion of q will vary among analytes, often by quite large amounts, and so needs further interpretation. Thus a "stranger" to the scheme would not be able to judge whether a score represented fitness for purpose: the scheme is not transparent.

z-scores

The z-score results from scaling the error to a target value for standard deviation, sp, i.e. z = (x − X̂)/sp. If the participating laboratories as a whole are producing data that are fit for purpose and are close to normally distributed (as is often the case), the z-score can be interpreted roughly as a standard normal deviate, i.e. it is zero-centred with a standard deviation of unity. Only relatively few scores (about 0.1%) would fall outside bounds of ±3 in "well-behaved" systems. Such bounds (normally ±3 or ±2) are used as decision limits for the instigation of remedial action by individual laboratories.
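As an illustration of the two scores, the short Python sketch below computes q and z for a few hypothetical results; the assigned value, target standard deviation and laboratory results are invented for the example, and the ±2/±3 decision limits are those discussed above.

```python
def q_score(x, x_assigned):
    """q-score: error scaled to the assigned value, q = (x - X_hat) / X_hat."""
    return (x - x_assigned) / x_assigned

def z_score(x, x_assigned, sigma_p):
    """z-score: error scaled to the target standard deviation, z = (x - X_hat) / sp."""
    return (x - x_assigned) / sigma_p

# Hypothetical example: assigned value 2.50 mg/kg, target standard deviation 0.25 mg/kg.
x_assigned = 2.50
sigma_p = 0.25

for lab, result in [("A", 2.61), ("B", 2.02), ("C", 3.40)]:
    z = z_score(result, x_assigned, sigma_p)
    # Commonly used descriptors against the +/-2 and +/-3 decision limits.
    flag = "satisfactory" if abs(z) <= 2 else ("questionable" if abs(z) <= 3 else "unsatisfactory")
    print(f"Lab {lab}: result={result:.2f}, q={q_score(result, x_assigned):+.3f}, z={z:+.2f} ({flag})")
```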
The ±3 boundary has already been prescribed in the UK Aflatoxins in Nuts, Nut Products, Dried Figs and Dried Fig Products Regulations [7]. If participants as a whole were performing worse than the fitness-for-purpose specification, then a much larger proportion of the results would give z-scores outside the action limits. Because the error is scaled to the parameter sp, the score is immediately interpretable by both the participating laboratory and its customers.

Combining scores

Many scheme organisers and participants like to summarise scores from different rounds, or from various analytes within a single round, as some kind of average; various possibilities are suggested in the International Harmonised Protocol. Such combinations could be used within a laboratory or by a scheme organiser for review purposes. Although it is a valid procedure to combine scores for the same analyte within or between rounds, it has to be remembered that combination scores can mask a proportion of moderate deviations from acceptability. Combining scores from different analytes is more difficult to justify. Such a combination could, for instance, hide the fact that the results for a particular analyte were always unsatisfactory. Use of such scores outside the analytical community might therefore give rise to misleading interpretations. Thus it must be emphasised that the individual score is the most informative; it is the score that should be used for any internal or external "assessment" purposes, and combination scores may, in some situations, disguise unsatisfactory individual scores.

The selection of assigned values, X̂

The method of determining the assigned value and its uncertainty is critical to the success of a proficiency testing scheme. An incorrect value will affect the scores of all of the participants. For scheme co-ordinators the ideal proficiency testing materials are certified reference materials (CRMs), because they already have assigned values and associated uncertainties. In addition, participants cannot reasonably object to the assigned value of a CRM, as such values have normally been derived from careful (and expensive) certification exercises, usually carried out on an international basis. Unfortunately the use of CRMs as proficiency test materials is relatively limited, as it is comparatively seldom that an appropriate CRM can be obtained at sufficiently low cost for use in a proficiency testing exercise. In addition, use in proficiency testing schemes is not the primary objective in the preparation of CRMs. Considerable thought is therefore given to the validation of materials specially prepared by the organisers of proficiency testing schemes. There are essentially three practical ways in which the assigned value can be determined: through test material formulation; from the consensus mean of all participants; or from the results of "expert laboratories".

Test material formulation

An inexpensive and simple method is available when the distributed material is prepared by formulation, i.e. by mixing the pure analyte with a matrix containing none. The assigned value is then simply calculated as the concentration or mass of analyte added to the matrix. The uncertainty can readily be estimated by consideration of the gravimetric and volumetric errors involved in the preparation, and is usually small.
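For a formulated material the assigned value and its uncertainty follow directly from the preparation records. The following sketch illustrates the idea for a simple two-component (analyte plus blank matrix) formulation; the masses and their standard uncertainties are hypothetical, and the propagation is a first-order combination of the two weighing uncertainties.

```python
import math

# Hypothetical formulation: a known mass of pure analyte mixed into a blank matrix.
m_analyte = 0.1250      # g of pure analyte added
u_m_analyte = 0.0002    # standard uncertainty of that weighing, g
m_matrix = 499.875      # g of blank matrix
u_m_matrix = 0.010      # standard uncertainty of that weighing, g

# Assigned value as a mass fraction (g analyte per g of total material).
total = m_analyte + m_matrix
x_assigned = m_analyte / total

# First-order propagation of the two weighing uncertainties (assumed uncorrelated).
dfdm_a = m_matrix / total**2      # partial derivative with respect to m_analyte
dfdm_m = -m_analyte / total**2    # partial derivative with respect to m_matrix
u_assigned = math.sqrt((dfdm_a * u_m_analyte)**2 + (dfdm_m * u_m_matrix)**2)

print(f"assigned value = {x_assigned:.6f} g/g, standard uncertainty = {u_assigned:.2e} g/g")
```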
A typical example where the technique applies is the preparation of materials for alcoholic strength determinations, where alcohol of known strength can be added to an appropriate aqueous medium. However, several factors prevent this technique from being widely used. Unless the material is a true solution it is difficult to distribute the analyte homogeneously in the matrix. In addition, there is often a problem in obtaining the added analyte in the same chemical form as the native analyte and, in many instances, the nature of the material itself would prevent its preparation by formulation. Generally, however, participants in a proficiency testing scheme have confidence in this method of using the test material formulation to obtain the assigned value.

Consensus mean of results from all participants

The most frequently used procedure is probably that of taking a consensus mean of all the participants. In many sectors this is particularly appropriate where the analyses under consideration are considered "simple" or "routine", where the determination is well understood chemically, or where there is a widely used standard method. The procedure applies specifically where the method used is empirical or defining. In such instances the consensus of the participants (usually a robust mean of the results) is, by definition, the true value within the limits of its uncertainty, which itself can be estimated from a robust standard deviation of the results. When several distinct empirical methods are in use in a sector for the determination of the same nominal analyte, it is important to recognise that they may well provide results that are significantly different. This would give rise to a problem in identifying a consensus value from a mixture of results from the methods. It is therefore sometimes useful to prescribe the particular empirical method that is to be used for obtaining the consensus. Examples of empirical determinations are the determination of "extractable copper" in a soil or the proximate analysis (moisture, fat, ash and crude protein determinations) of a foodstuff.

Usually, although participants have confidence in the consensus mean of the all-participants procedure, there are instances where it is much less appropriate to use the consensus mean, e.g. when the analysis is regarded as difficult and where the analyte is present at low trace levels. Under those circumstances it is by no means rare for the consensus to be significantly biased or, in some instances, for there to be no real consensus at all. Organisers should recognise that such an approach encourages consistency among the participants but not necessarily trueness: it can easily institutionalise faulty methods and practices. An example of this is given in Fig. 1, where the results are displayed as recommended in the Harmonised Protocol. However, the mean has been calculated as a consensus mean for all participants. As a result, laboratories 7865, 2504 and 3581 appear to be different from the other participants. However, in this example it is generally recognised that these laboratories are "expert" in the particular technique being assessed. In such situations the "expert laboratory" procedure should be adopted.

Fig. 1

Results from "expert laboratories"

Should the above methods of obtaining an assigned value be inappropriate, the organiser may resort to the use of analysis by expert laboratories using definitive or standard methods.
In principle one such laboratory could be used if the participants felt confident in the result. However, in many instances it would be better if concurrent results from several expert laboratories were used. This is obviously an expensive option unless the experts are normal participants in the scheme. A possible modification of this idea would be the use of the consensus of the subset of the participants regarded as completely satisfactory performers. Organisers using this strategy must be vigilant to avoid the drift into a biased consensus mentioned previously.

The uncertainty ua of the assigned value is an important statistic. It must not be too large in relation to the target value for standard deviation (discussed below); otherwise the usefulness of the proficiency test is compromised. This problem occurs if the possible variability of the assigned value becomes comparable with the acceptable variability defined by sp. As a rule of thumb, ua must be less than 0.3 sp to avoid such difficulties [8].

The selection of the target value of standard deviation, sp

This is another critical parameter of the scoring system. Ideally the value of sp should represent a fitness-for-purpose criterion, a range of interlaboratory variation that is consistent with the intended use of the resultant data. Hence a satisfactory performance by a participant should result in a z-score within the range ±2 (although ±3 may be used in some situations). Obviously the value of sp should be set by the organisers, by considering what is in fact fit for purpose, before the proficiency testing begins. This precedence is essential so that the participants know in advance what standard of performance is required. A number of proficiency testing schemes do not take this approach and instead use the value of sp generated from the reported results for any one distribution of test material. This is inappropriate because, by definition, about 95% of participants will automatically be identified as satisfactory, even if by any objective consideration they were not. The use of an external standard of performance is therefore essential. As the concentration of the analyte is unknown to the participants at the time of analysis, it may be necessary to express the criterion as a function of concentration rather than as a single value applicable over all concentrations. It is also important that the value of sp used for an analysis should remain constant over extended periods of time, so that z-scores of both individual participants and groups of participants remain comparable over time.

As stressed above, the foregoing excludes the possibility of using the actual robust standard deviation of a round of the test as the denominator in the calculation of z-scores. It also excludes the use of criteria that merely describe the current state of the art. Such practice would undoubtedly serve to identify outlying results but would not address fitness for purpose. It could easily seem to justify results that were in fact not fit for purpose. Moreover, it would not allow comparability of scores over a period of time. The question of how to quantify fitness for purpose remains incompletely answered. A general approach based on the minimisation of cost functions has been suggested [9], but it has yet to be applied to practical situations. Specific approaches based on professional judgements are used in various sectors.
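Where the consensus of participants is used as the assigned value, a robust location and scale estimate can be computed and the resulting uncertainty checked against the rule of thumb ua < 0.3 sp given above. The sketch below uses a simple median/MAD estimate as a stand-in for the more elaborate robust procedures that schemes actually employ; the participant results and the value of sp are hypothetical.

```python
import statistics

def robust_consensus(results):
    """Median/MAD-based robust location and scale; a simple stand-in for the
    robust procedures (e.g. Huber-type estimators) used by many schemes."""
    med = statistics.median(results)
    mad = statistics.median(abs(x - med) for x in results)
    s_rob = 1.4826 * mad          # scaled to be consistent with sigma for normal data
    return med, s_rob

# Hypothetical participant results for one analyte (same units as sp).
results = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 6.4, 5.1, 4.95, 5.05]
sigma_p = 0.5                     # target standard deviation (fitness for purpose)

x_assigned, s_rob = robust_consensus(results)
# Approximate standard uncertainty of the consensus value; the factor 1.25
# reflects the efficiency of a median relative to a mean for normal data.
u_a = 1.25 * s_rob / len(results) ** 0.5

print(f"assigned value = {x_assigned:.3f}, robust SD = {s_rob:.3f}, u_a = {u_a:.3f}")
print("u_a acceptable (< 0.3 * sp)?", u_a < 0.3 * sigma_p)
```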
In the food industry the Horwitz function [10] is often taken as a fitness-for-purpose (acceptability) criterion, whereas in others, e.g. clinical biochemistry, criteria based on the probabilities of false positives and false negatives have evolved [11]. In some areas fitness for purpose may be determined by statutory requirements, particularly where method performance characteristics are prescribed, as by the European Union [12] and the Codex Alimentarius Commission for veterinary drug residue methods.

Homogeneity of the distributed material

As most chemical analysis is destructive, it is essentially impracticable to circulate a single specimen among the participants as the proficiency testing material. The alternative is to distribute simultaneously to all participants samples of a characterised bulk material. For this to be a successful strategy, the bulk material must be essentially homogeneous before the subdivision into samples takes place. This is simple where the material is a true solution. In many instances, however, the distributed material is a complex multiphase substance that cannot be truly homogeneous down to the molecular level. In such a case it is essential that the samples are at least so similar that no perceptible differences between the participants' results can be attributed to the proficiency testing material. This condition is called "sufficient homogeneity". If it is not demonstrated, the validity of the proficiency test is questionable.

The International Harmonised Protocol recommends a method for establishing sufficient homogeneity. (More strictly, the test merely fails to detect significant inhomogeneity.) After the bulk material has been homogenised it is divided into the test materials for distribution. Ten or more of the test materials are selected at random and analysed in duplicate, under randomised repeatability conditions, by a method of good precision and appropriate trueness. The results are treated by analysis of variance, and the material is deemed to be sufficiently homogeneous if no significant variation between samples is found, or if the between-sample standard deviation is less than 0.3 sp.

There is a potential problem with the test for homogeneity: it may be expensive to execute because it requires at least 20 replicate analyses. For a very difficult analysis dependent on costly instrumentation and extraction procedures, e.g. the determination of dioxins, the cost of the homogeneity test may be a major proportion of the total cost of the proficiency test. Moreover, if the material is found to be unsatisfactory, the whole procedure of preparation and testing has to be repeated. Some organisers are so confident of their materials that they do not conduct a homogeneity test. However, experience in some sectors has shown that materials found to be satisfactory in some batches are decidedly heterogeneous in other batches prepared by the same procedures. Another complication of such testing is that a single material may prove to be acceptable for one analyte and heterogeneous for another. A possible strategy that could be used with care is to store the random samples selected before distribution, but to analyse them only if the homogeneity of the material is called into question after the results have been examined.
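The sufficient-homogeneity check described above can be expressed as a one-way analysis of variance on the duplicate results from the randomly selected test materials. The sketch below is a simplified version of that calculation, using hypothetical duplicate data and the 0.3 sp criterion; the Harmonised Protocol itself should be consulted for the full statistical procedure.

```python
import statistics

def homogeneity_check(duplicates, sigma_p):
    """One-way ANOVA on duplicate results from >= 10 randomly selected test
    materials: estimate the between-sample standard deviation and compare it
    with 0.3 * sp. A simplified sketch of the recommended approach."""
    means = [(a + b) / 2 for a, b in duplicates]
    diffs = [a - b for a, b in duplicates]
    # Within-sample (analytical) variance estimated from the duplicate differences.
    s_within_sq = sum(d * d for d in diffs) / (2 * len(duplicates))
    # The variance of the sample means contains the between-sample variance
    # plus half of the analytical variance.
    s_means_sq = statistics.variance(means)
    s_between_sq = max(s_means_sq - s_within_sq / 2, 0.0)
    s_between = s_between_sq ** 0.5
    return s_between, s_between < 0.3 * sigma_p

# Hypothetical duplicate results (mg/kg) for 10 test materials.
dups = [(10.2, 10.4), (10.1, 10.0), (10.3, 10.5), (10.0, 10.2), (10.4, 10.3),
        (10.1, 10.3), (10.2, 10.1), (10.5, 10.4), (10.0, 10.1), (10.3, 10.2)]
s_between, ok = homogeneity_check(dups, sigma_p=0.8)
print(f"between-sample SD = {s_between:.3f}; sufficiently homogeneous: {ok}")
```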
If heterogeneity were detected in that way, however, no remedial action could make the round of the proficiency test usable, so the whole round would have to be repeated to provide the proficiency information for the participants. In general, it seems that homogeneity tests are a necessary expense, unless the distributed material is a true solution that has been adequately mixed before subdivision.

Proficiency testing and other quality assurance measures

While proficiency testing provides information for a participant about the presence of unsuspected errors, it is completely ineffectual unless it is an integral part of the formal quality system of the laboratory. For example, proficiency testing is not a substitute for IQC, which should be conducted in every run of analysis to detect short-term failures of the analytical system to produce data that are fit for purpose. It seems likely that the main way in which proficiency testing benefits the participant laboratory is in compelling it to install an effective IQC system. This actually enhances the scope of the proficiency testing scheme: the IQC system installed should cover all analyses conducted in the laboratory, and not just those covered by the proficiency testing scheme. In one scheme it was shown that laboratories with better-designed IQC systems showed considerably better performance in proficiency testing [13]. A crucial feature of a successful IQC scheme was found to be a control material that was traceable outside the laboratory itself.

An important role of proficiency testing is the triggering of remedial action within a laboratory when unsatisfactory results are obtained. Where possible, the specific reason for a bad result should be determined by reference to documentation. If consistently bad results are obtained, then the method used (or the execution of the method protocol) must be flawed, or perhaps applied outside the scope of its validation. Such interaction encourages the use of properly validated methods and the maintenance of full records of analysis.

Does proficiency testing work?

Two aspects of proficiency tests need consideration in judging their efficacy. These aspects concern the "inliers" and the "outliers" among the results of the participants in a round. The inliers represent the laboratories that are performing consistently as a group but may need to improve certain aspects of their performance by attention to small details of the method protocol. The outliers are laboratories that are making gross errors, perhaps by committing major deviations from a method protocol, by using an improperly validated method, by using a method outside the scope of its validation, or by other comparable faults. In an effective scheme the dispersion of the results of the inliers (as quantified by a robust standard deviation) would be expected at first to move, round by round, towards the value of sp and then to stabilise close to that value. Ideally, then, in a mature scheme the proportion of participants falling outside defined z-score bounds should be roughly predictable from the normal distribution. In practice, in a new scheme the dispersion will usually be considerably greater than sp in the first round but improve rapidly and consistently over the subsequent few rounds; the rate of improvement then decreases towards zero.
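One way for an organiser (or a participant) to follow the behaviour described above is to compare, round by round, a robust estimate of the dispersion of the results with sp, and the observed fraction of z-scores within ±2 with the roughly 95% expected for a well-behaved group. A minimal sketch with hypothetical round data:

```python
import statistics

def round_summary(results, x_assigned, sigma_p):
    """Round-by-round monitoring: robust dispersion of the results compared
    with sp, and the observed fraction of |z| <= 2 compared with the roughly
    95% expected for a normally distributed, fit-for-purpose group."""
    med = statistics.median(results)
    mad = statistics.median(abs(x - med) for x in results)
    s_rob = 1.4826 * mad
    z = [(x - x_assigned) / sigma_p for x in results]
    frac_ok = sum(abs(v) <= 2 for v in z) / len(z)
    return s_rob, frac_ok

# Hypothetical round: assigned value 12.0, sp 1.0 (arbitrary units).
results = [11.2, 12.4, 12.0, 13.1, 10.5, 12.2, 11.8, 14.9, 12.6, 9.0, 12.1, 11.5]
s_rob, frac_ok = round_summary(results, x_assigned=12.0, sigma_p=1.0)
print(f"robust SD of round = {s_rob:.2f} (target sp = 1.0); fraction |z| <= 2 = {frac_ok:.0%}")
```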
If the incentives to perform well are not stringent, the performance of the group of laboratories may stabilise at a level that does not meet the fitness-for-purpose requirement. Examples of this may be found in some schemes where the proportion of participants obtaining satisfactory results in, say, the determination of pesticides has increased over time but has now stabilised at about 70% rather than the 95% which is the ultimate objective. However, where there are external constraints and considerations (e.g. accreditation), the proportion of outliers declines rapidly round by round (discounting the effects of late newcomers to the scheme), as the results from the scheme markedly penalise such participants. In addition, in view of the importance of proficiency testing schemes to the accreditation process, the question of whether proficiency testing schemes should themselves become accredited or certified needs to be addressed in the future.

Conclusion

Although it is now a formal requirement in many sectors that analytical laboratories participate in proficiency testing schemes, such participation is not always without problems. Particular issues that proficiency testing scheme organisers should address include the selection of the procedure for determining the assigned value and of an external standard for the target standard deviation. It is also important that they adhere to a protocol that is recognised, preferably internationally, and that the results from the scheme are transparent to both the participant laboratory and its "customer".

References

1. ISO Guide 43 (1993) 2nd edn. ISO, Geneva
2. ISO Guide 25 (1993) 2nd edn. ISO, Geneva
3. Thompson M, Wood R (1995) Pure Appl Chem 67:649–666
4. Thompson M, Wood R (1993) Pure Appl Chem 65:2123–2144
5. Thompson M, Wood R (1993) J AOAC International 76:926–940
6. Report 0805 of the MAFF Food Analysis Performance Assessment Scheme, FAPAS Secretariat, CSL Food Laboratory, Norwich, UK
7. Statutory Instrument 1992 No. 3326, HMSO, London
8. Statistics Sub-Committee of the AMC (1995) Analyst 120:2303–2308
9. Thompson M, Fearn T, Analyst (in press)
10. Horwitz W (1982) Anal Chem 54:67A–76A
11. IFCC approved recommendations on quality control in clinical chemistry, part 4: internal quality control (1980) J Clin Chem Clin Biochem 18:534–541
12. Official Journal of the European Union, No. L118 of 14.5.93, p 64
13. Thompson M, Lowthian PJ (1993) Analyst 118:1495–1500