ASTM E2935-2013 Standard Practice for Conducting Equivalence Testing in Laboratory Applications


Designation: E2935-13
An American National Standard

Standard Practice for Conducting Equivalence Testing in Laboratory Applications¹

This standard is issued under the fixed designation E2935; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This practice provides statistical methodology for conducting equivalence testing on numerical data from two sources to determine if their true means are similar within predetermined limits.

1.2 Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining equivalence of two test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, and (3) equivalence of two laboratories in a method transfer.

1.3 The current guidance in this standard applies only to experiments conducted on a single material. Guidance is given for determining the amount of data required for an equivalence trial.

1.4 The statistical methodology for determining equivalence used is the “Two one-sided t-test” (TOST). The control of risks associated with the equivalence decision is discussed.

1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.

1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Referenced Documents

2.1 ASTM Standards:²
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
E2586 Practice for Calculating and Using Basic Statistics

3. Terminology

3.1 Definitions: See Terminology E456 for a more extensive listing of statistical terms.

3.1.1 accepted reference value, n: a value that serves as an agreed-upon reference for comparison, and which is derived as: (1) a theoretical or established value, based on scientific principles, (2) an assigned or certified value, based on experimental work of some national or international organization, or (3) a consensus or certified value, based on collaborative experimental work under the auspices of a scientific or engineering group. E177

3.1.2 bias, n: the difference between the expectation of the test results and an accepted reference value. E177

3.1.3 confidence interval,
n: an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with confidence level 1 − α, where Pr(L ≤ θ ≤ U) ≥ 1 − α. E2586

3.1.3.1 Discussion: The confidence level, 1 − α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In this sense “confidence” applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.

3.1.4 confidence level, n: the value, 1 − α, of the probability associated with a confidence interval, often expressed as a percentage. E2586

3.1.4.1 Discussion: α is generally a small number. Confidence level is often 95 % or 99 %.

3.1.5 confidence limit, n: each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval. E2586

3.1.6 degrees of freedom, n: the number of independent data points minus the number of parameters that have to be estimated before calculating the variance. E2586

3.1.7 equivalence, n: similarity between two population parameters within predetermined limits.

3.1.8 intermediate precision conditions, n: conditions under which test results are obtained with the same test method using test units or test specimens taken at random from a single quantity of material that is as nearly homogeneous as possible, and with changing conditions such as operator, measuring equipment, location within the laboratory, and time. E177

¹ This test method is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.20 on Test Method Evaluation and Quality Control. Current edition approved Aug. 1, 2013. Published August 2013. DOI: 10.1520/E2935-13.

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.1.9 mean, n: of a population, µ, average or expected value of a characteristic in a population; of a sample, X̄, sum of the observed values in the sample divided by the sample size. E2586

3.1.10 population, n: the totality of items or units of material under consideration. E2586

3.1.11 population parameter, n: summary measure of the values of some characteristic of a population. E2586

3.1.12 precision, n: the closeness of agreement between independent test results obtained under stipulated conditions. E177

3.1.13 repeatability, n: precision under repeatability conditions. E177

3.1.14 repeatability conditions, n: conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time. E177

3.1.15 repeatability standard deviation (s_r), n: the standard deviation of test results obtained under repeatability conditions. E177

3.1.16 sample, n: a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection. E2586

3.1.17 sample size, n, n: number of observed values in the sample. E2586

3.1.18 sample statistic, n: summary measure of the observed values of a sample. E2586

3.1.19 test result, n: the value of a characteristic obtained by carrying out a specified test method. E2282

3.1.20 test unit, n: the total quantity of material (containing one or more test specimens) needed to obtain a test result as specified in the test method. See test result. E2282

3.2 Definitions of Terms Specific to This Standard:

3.2.1 bias equivalence, n: equivalence of a population mean with an accepted reference value.

3.2.2 equivalence limit, E, n: in equivalence testing, a limit on the difference between two population parameters.

3.2.2.1 Discussion: In certain applications, this may be termed practical limit or practical difference.

3.2.3 equivalence test, n: a statistical test conducted within predetermined risks to confirm equivalence of two population parameters.

3.2.4 means equivalence, n: equivalence of two population means.

3.2.5 power, n: in equivalence testing, the probability of accepting equivalence, given the true difference between two population means.

3.2.5.1 Discussion: In the case of testing for bias equivalence the power is the probability of accepting equivalence, given the true difference between a population mean and an accepted reference value.

4. Significance and Use

4.1 Laboratories conducting routine testing have a continuing need to evaluate test result bias, to evaluate changes for improving the test process performance, or to validate the transfer of a test method to a new location or apparatus. In all situations it must be demonstrated that any bias or innovation will have negligible effect on test results for a characteristic of a material. This standard provides statistical methods to confirm that the mean test results from a testing process are equivalent to those from a reference standard or another testing process, where equivalence is defined as agreement within prescribed limits, termed equivalence limits.

4.1.1 The intra-laboratory applications in this practice include, but are not limited to, the following:
(1) Evaluating the bias of a test method with respect to a certified reference material,
(2) Evaluating bias due to a minor change in a test method procedure,
(3) Qualifying new instruments, apparatus, or operators in a laboratory, and
(4) Qualifying new sources of reagents or other materials used in the test procedure.

4.1.2 This practice also supports evaluating bias in a method transfer from a developing laboratory to a receiving laboratory.

4.2 This practice currently deals only with the equivalence of population means. In this standard, a population refers to a hypothetical set of test results arising from a stable testing process that measures a characteristic of a single material.

NOTE 1: The equivalence concept can also apply to population parameters other than means, such as precision, stated as variances, standard deviations, or relative
standard deviations (coefficients of variation), linearity, sensitivity, specificity, etc.

4.3 The data analysis for equivalence testing of population means in this practice uses a statistical methodology termed the “Two one-sided t-test” (TOST) procedure which shall be described in detail in this standard (see X1.1). The TOST procedure will be adapted to the type of objective and experiment design selected.

4.3.1 Historically, this procedure originated in the pharmaceutical industry for use in bioequivalence trials (1, 2),³ denoted as the Two One-Sided Test, and has since been adopted for other applications, particularly in testing and measurement applications (3, 4).

4.3.2 The conventional Student's t test used for detecting differences is not recommended for equivalence testing as it does not properly control the consumer's and producer's risks for this application (see X1.3).

4.4 This practice provides recommendations for the design of an equivalence experiment, and two basic designs are discussed. Guidance is provided for determining the amount of data required to control the risks of making the wrong decision in accepting or rejecting equivalence (see X1.2).

4.4.1 The consumer's risk is the probability of accepting equivalence when the actual bias or difference in means is equal to the equivalence limit. This probability is controlled to a low level so that accepting equivalence gives a high degree of assurance that differences in question are less than the equivalence limit.

4.4.2 The producer's risk is the risk of falsely rejecting equivalence. If improvements are rejected this can lead to opportunity losses to the company and its laboratories (the producers) or cause additional unnecessary effort in improving the testing process.

³ The boldface numbers in parentheses refer to the list of references at the end of this standard.

5. Planning the Equivalence Study

5.1 Objectives and Design Selection: This practice supports two equivalence study objectives: (1) determining the bias equivalence of a test method or (2) determining the means equivalence of test results from two testing processes.
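As an illustration of the decision rule that underlies both objectives, the following is a minimal Python sketch of a generic two-sample TOST with a pooled standard deviation. It is not the procedure of X1.1, which is not reproduced in this excerpt. The function names `t_cdf` and `tost_two_sample`, the pooled-variance assumption, the symmetric limits ±E, and the example data are all hypothetical choices made for this sketch.

```python
import math
from statistics import mean, variance


def t_cdf(x, df, steps=2000):
    """Student's t CDF, by Simpson integration of the density from 0 to |x|."""
    if x == 0:
        return 0.5
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + math.copysign(area, x)


def tost_two_sample(x1, x2, e_limit, alpha=0.05):
    """Generic TOST for two independent samples (assumed equal variances).

    Equivalence is accepted only if BOTH one-sided tests reject:
      H0: difference <= -E   and   H0: difference >= +E.
    Returns (equivalent, p_lower, p_upper).
    """
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(x2) - mean(x1)

    p_lower = 1.0 - t_cdf((diff + e_limit) / se, df)  # test against -E
    p_upper = t_cdf((diff - e_limit) / se, df)        # test against +E
    return max(p_lower, p_upper) < alpha, p_lower, p_upper


# Hypothetical data: current process vs. modified process, same material.
current = [10.1, 9.9, 10.0, 10.2, 9.8]
modified = [10.0, 10.1, 9.9, 10.3, 9.9]
print(tost_two_sample(current, modified, e_limit=0.5))
```

Here `alpha` plays the role of the consumer's risk: equivalence is accepted only when the larger of the two one-sided p-values falls below it. For the bias-equivalence objective the same idea would apply with a one-sample standard error, since the accepted reference value is treated as having zero variability.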

In both objectives, two population means are compared for equivalence.

5.1.1 Bias Equivalence: This study requires a suitable quantity of a certified reference material having an accepted reference value (ARV) for the material characteristic of interest. The ARV is considered as a known population mean with zero variability for the purpose of the equivalence study. The average of the test results conducted on the reference material is the population mean estimate to be compared with the ARV (see X1.4).

5.1.2 Means Equivalence: This study compares the average test result from the current testing process with the innovated process. A single material is selected, subdivided into test samples, and distributed for testing by each process. The material should be reasonably homogeneous, because inhomogeneity in the material will decrease the test precision.

5.2 Design Requirements: Inputs for carrying out the statistical test of equivalence are the equivalence limits and the consumer's risk. Additional inputs for designing the equivalence study are an estimate of the test method precision and the producer's risk profile over selected differences in the means.

5.2.1 The equivalence limits to be used in the TOST procedure are selected as the worst-case differences between the two population means and are determined by the subject matter expert or by industry consensus. These limits are usually symmetrical around zero and then are denoted as −E and E.

5.2.1.1 In certain cases the limits may be asymmetrical and are then denoted by E1 and E2, where E1 is usually a negative value. The producer's risk profile for this situation will not be treated in this practice.

5.2.2 The consumer's risk is the probability of falsely declaring equivalence and is usually set at a value of 0.05, representing a 5 % risk. Other risk levels may be selected, depending on circumstances.

5.2.3 The test method precision, σ, is stated as the standard deviation of the test method, or methods, used in the equivalence study. An estimate may be available from a method validation, an interlaboratory study, or other sources.

5.3 Sample Size Determination: The number of test results, n, from each population controls the producer's risk of falsely rejecting equivalence at a given true mean difference, Δ. The producer's risk may be alternatively stated in terms of the power, or probability 1 − β, of properly accepting equivalence at a given value of Δ.

5.3.1 For symmetric equivalence limits, the power profile plots the probability of properly declaring equivalence versus the absolute value of Δ, due to the symmetry of the equivalence limits. This calculation can be performed using a spreadsheet computer package (see X1.5).

5.3.2 An example of a set of power profiles is shown in Fig. 1. The probability scale for power on the vertical axis varies from 0 to 1. The power profile, a reversed S-shaped curve, should be close to a probability of 1 at zero absolute difference and will decline to the consumer's risk probability at an absolute difference of E. Power for absolute differences greater than E is less than the consumer's risk and declines asymptotically to zero as the absolute difference increases.

FIG. 1 Multiple Power Curves for Lab Transfer Example

5.3.2.1 In Fig. 1 power profiles are shown for three different sample sizes. Increasing the sample size moves the power curve to the right, giving a greater chance of accepting equivalence for a given true difference Δ.

5.3.3 Power curves are evaluated by entering different values of n and evaluating the curve shape. A practical solution is to choose n such that the power is above a 0.9 probability out to about half to two-thirds of the distance to E, thus giving a high probability that equivalence will be demonstrated for a range of true absolute differences that are deemed of little or no scientific import in t
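The power profile and the sample-size rule of thumb in 5.3.3 can be sketched in a few lines of Python. This is a hedged illustration and not the spreadsheet calculation of X1.5, which is not reproduced in this excerpt. It assumes a two-sample design with n results per process, symmetric limits ±E, and a known standard deviation σ, so it uses a normal approximation in place of the t distribution and will be slightly optimistic for small n. The names `tost_power` and `smallest_n` and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def tost_power(delta, e_limit, sigma, n, alpha=0.05):
    """Approximate probability of accepting equivalence at true difference delta.

    Assumes known sigma (normal approximation), n results per process,
    and symmetric equivalence limits -E and +E.
    """
    se = sigma * sqrt(2.0 / n)             # std. error of the mean difference
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)   # one-sided critical value
    power = (_STD_NORMAL.cdf((e_limit - delta) / se - z)
             + _STD_NORMAL.cdf((e_limit + delta) / se - z) - 1.0)
    return max(0.0, power)                 # acceptance region may be empty


def smallest_n(e_limit, sigma, alpha=0.05, target=0.9, fraction=0.5, n_max=10000):
    """Smallest n per process with power >= target at delta = fraction * E."""
    for n in range(2, n_max + 1):
        if tost_power(fraction * e_limit, e_limit, sigma, n, alpha) >= target:
            return n
    return None  # target not reachable within n_max


# Hypothetical inputs: E = 1.0 and sigma = 1.0 in the units of the test result.
for n in (10, 20, 40):
    profile = [round(tost_power(d / 4, 1.0, 1.0, n), 3) for d in range(0, 7)]
    print(n, profile)  # power at |delta| = 0, 0.25, ..., 1.5
print(smallest_n(1.0, 1.0))
```

Under these assumptions the profile starts near 1 at zero difference when n is large enough, falls to roughly the consumer's risk at a difference of E, and keeps declining beyond it. The `fraction` argument corresponds to the "half to two-thirds of the distance to E" guideline, and `target` to the 0.9 probability mentioned in 5.3.3.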
