Designation: E2935-15
An American National Standard

Standard Practice for Conducting Equivalence Testing in Laboratory Applications^1

This standard is issued under the fixed designation E2935; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope
1.1 This practice provides statistical methodology for conducting equivalence testing on numerical data
from two sources to determine if their true means differ by no more than predetermined limits.
1.2 Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining equivalence of two test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, and (3) equivalence of two laboratories in a method transfer.
1.3 The current guidance in this standard applies only to experiments conducted on a single material. Guidance is given for determining the amount of data required for an equivalence trial.
1.4 The statistical methodology used for determining equivalence is the two one-sided tests (TOST) procedure. The control of risks associated with the equivalence decision is discussed.
1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Referenced Documents
2.1 ASTM Standards:^2
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
E2586 Practice for Calculating and Using Basic Statistics

3. Terminology
3.1 Definitions: See Terminology E456 for a more extensive listing of statistical terms.
3.1.1 accepted reference value, n: a value that serves as an agreed-upon reference for comparison, and which is derived as: (1) a theoretical or established value, based on scientific principles, (2) an assigned or certified value, based on experimental work of some national or international organization, or (3) a consensus or certified value, based on collaborative experimental work under the auspices of a scientific or engineering group. E177
3.1.2 bias, n: the difference between the expectation of the test results and an accepted reference value. E177
3.1.3 confidence interval, n: an interval estimate $[L, U]$ with the statistics $L$ and $U$ as limits for the parameter $\theta$ and with confidence level $1 - \alpha$, where $\Pr(L \le \theta \le U) \ge 1 - \alpha$. E2586
3.1.3.1 Discussion: The confidence level, $1 - \alpha$, reflects the proportion of cases that the confidence interval $[L, U]$ would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once $L$ and $U$ are given values, the resulting confidence interval either does or does not contain it. In this sense "confidence" applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.
3.1.4 confidence level, n: the value, $1 - \alpha$, of the probability associated with a confidence interval, often expressed as a percentage. E2586
3.1.4.1 Discussion: $\alpha$ is generally a small number. Confidence level is often 95 % or 99 %.
3.1.5 confidence limit, n: each of the limits, $L$ and $U$, of a confidence interval, or the limit of a one-sided confidence interval. E2586
3.1.6 degrees of freedom, n: the number of independent data points minus the number of parameters that have to be estimated before calculating the variance. E2586
3.1.7 equivalence, n: condition that two population parameters differ by
no more than predetermined limits.
3.1.8 intermediate precision conditions, n: conditions under which test results are obtained with the same test method using test units or test specimens taken at random from a single quantity of material that is as nearly homogeneous as possible, and with changing conditions such as operator, measuring equipment, location within the laboratory, and time. E177

^1 This test method is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.20 on Test Method Evaluation and Quality Control. Current edition approved Oct. 1, 2015. Published October 2015. Originally approved in 2013. Last previous edition approved in 2014 as E2935-14. DOI: 10.1520/E2935-15.
^2 For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.1.9 mean, n: of a population, $\mu$, average or expected value of a characteristic in a population; of a sample, $\bar{X}$, sum of the observed values in the sample divided by the sample size. E2586
3.1.10 population, n: the totality of items or units of material under consideration. E2586
3.1.11 population parameter, n: summary measure of the values of some characteristic of a population. E2586
3.1.12 precision, n: the closeness of agreement between independent test results obtained under stipulated conditions. E177
3.1.13 repeatability, n: precision under repeatability conditions. E177
3.1.14 repeatability conditions, n: conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time. E177
3.1.15 repeatability standard deviation ($s_r$), n: the standard deviation of test results obtained under repeatability conditions. E177
3.1.16 sample, n: a group of observations or test results, taken from a larger
collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection. E2586
3.1.17 sample size, $n$, n: number of observed values in the sample. E2586
3.1.18 sample statistic, n: summary measure of the observed values of a sample. E2586
3.1.19 test result, n: the value of a characteristic obtained by carrying out a specified test method. E2282
3.1.20 test unit, n: the total quantity of material (containing one or more test specimens) needed to obtain a test result as specified in the test method. See test result. E2282
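
The definitions of sample mean, degrees of freedom, confidence interval, and confidence level in 3.1 can be illustrated with a short simulation. The following is a minimal sketch, not part of the standard: the function names `t_confidence_interval` and `coverage` are hypothetical, the data are simulated from a normal distribution, and the Student's t critical value is assumed to be read from a published t table rather than computed. It shows the long-run interpretation given in 3.1.3.1: over many repeated samples, the fraction of 95 % intervals that contain the true mean should be close to 0.95.

```python
import math
import random
import statistics

# Two-sided 95 % critical value of Student's t for f = 9 degrees of freedom,
# taken from a standard t table (an assumed input, not computed here).
T_975_DF9 = 2.262


def t_confidence_interval(sample, t_crit):
    """Return the limits (L, U) of a two-sided interval for the population mean."""
    n = len(sample)                    # sample size
    mean = statistics.mean(sample)     # sample mean
    s = statistics.stdev(sample)       # sample standard deviation, n - 1 degrees of freedom
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(true_mean, sigma, n, trials, t_crit, seed=0):
    """Fraction of repeated-sample intervals that contain true_mean."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lower, upper = t_confidence_interval(sample, t_crit)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Simulated example values only; they do not come from the standard.
    print(coverage(true_mean=10.0, sigma=0.5, n=10, trials=2000, t_crit=T_975_DF9))
```

Any single interval returned by `t_confidence_interval` either contains the true mean or does not; only the proportion reported by `coverage` corresponds to the confidence level.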
3.2 Definitions of Terms Specific to This Standard:
3.2.1 bias equivalence, n: equivalence of a population mean with an accepted reference value.
3.2.2 equivalence limit, $E$, n: in equivalence testing, a limit on the difference between two population parameters.
3.2.2.1 Discussion: In certain applications, this may be termed practical limit or practical difference.
3.2.3 equivalence test, n: a statistical test conducted within predetermined risks to confirm equivalence of two population parameters.
3.2.4 means equivalence, n: equivalence of two population means.
3.2.5 paired samples design, n: in means equivalence testing, single samples are taken from the two populations at a number of sampling points.
3.2.5.1 Discussion: This design is termed a randomized block design for a general number of populations sampled, and each group of data within a sampling point is termed a block.
3.2.6 power, n: in equivalence testing, the
probability of accepting equivalence, given the true difference between two population means.
3.2.6.1 Discussion: In the case of testing for bias equivalence the power is the probability of accepting equivalence, given the true difference between a population mean and an accepted reference value.
3.2.7 two independent samples design, n: in means equivalence testing, replicate test results are determined independently from two populations at a single sampling time for each population.
3.2.7.1 Discussion: This design is termed a completely randomized design for a general number of populations sampled.
3.2.8 two one-sided tests (TOST) procedure, n: a statistical procedure used for testing the equivalence of the parameters from two distributions (see equivalence).
3.3 Symbols:
$B$ = bias (7.1.1)
$d_j$ = difference between a pair of test results at sampling point $j$ (7.1.1)
$\bar{d}$ = average difference (7.1.1)
$D$ = difference in
sample means (6.1.2)(X1.1.2)
$E$ = equivalence limit (5.2.1)
$E_1$ = lower equivalence limit (5.2.1.1)
$E_2$ = upper equivalence limit (5.2.1.1)
$H_0$: = null hypothesis (X1.1.1)
$H_A$: = alternate hypothesis (X1.1.1)
$f$ = degrees of freedom for $s$ (8.1.1)(X1.1.2)
$f_i$ = degrees of freedom for $s_i$ (6.1.1)
$f_p$ = degrees of freedom for $s_p$ (6.1.2)
$n$ = sample size (number of test results) from a population (5.3)(6.1.3)(7.1.1)(8.1.1)
$n_i$ = sample size from $i$th population (6.1.1)
$n_1$ = sample size from population 1 (6.1.2)
$n_2$ = sample size from population 2 (6.1.2)
$s$ = sample standard deviation (8.1.1)
$s_B$ = sample standard deviation for bias (8.1.2)
$s_d$ = standard deviation of the difference between two test results (7.1.1)
$s_D$ = sample standard deviation for mean difference (6.1.3)(X1.1.2)
$s_i$ = sample standard deviation for $i$th population (6.1.1)
$s_i^2$ = sample variance for $i$th population (6.1.1)
$s_1^2$ = sample variance for population 1 (6.1.2)
$s_2^2$ = sample
variance for population 2 (6.1.2)
$s_p$ = pooled sample standard deviation (6.1.2)
$s_r$ = repeatability sample standard deviation (6.2)
$t$ = Student's t statistic (6.1.4)(7.1.3)(8.1.3)
$t_{1-\alpha,f}$ = $(1-\alpha)$th percentile of the Student's t distribution with $f$ degrees of freedom (X1.1.2)
$X_{ij}$ = $j$th test result from the $i$th population (6.1)
$\bar{X}$ = test result average (8.1.1)
$\bar{X}_i$ = test result average for the $i$th population (6.1.1)
$\bar{X}_1$ = test result average for population 1 (6.1.3)
$\bar{X}_2$ = test result average for population 2 (6.1.3)
$Z_{1-\alpha}$ = $(1-\alpha)$th percentile of the standard normal distribution (X1.5.1)
$\alpha$ = consumer's risk (5.2.2)(6.2)(7.2)
$\beta$ = producer's risk (5.3)
$\Delta$ = true mean difference between populations (5.3)
$\mu$ = population mean (X1.4.1)
$\mu_i$ = $i$th population mean (X1.1.1)
$\nu$ = approximate degrees of freedom for $s_D$ (X1.1.4)
$\sigma$ = standard deviation of the test method (5.2.3)
$\sigma_d$ = standard deviation of the true difference between two populations (7.2)
$\Phi(\cdot)$ = standard normal cumulative distribution function (X1.5.1)
3.4 Acronyms:
3.4.1 ARV, n: accepted reference value (5.1.2)(8.1)(X1.4)
3.4.2 CRM, n: certified reference material (5.1.2)(8.1)
3.4.3 ILS, n: interlaboratory study (6.2)
3.4.4 LCL, n: lower confidence limit (6.2.5)(7.2.3)
3.4.5 TOST, n: two one-sided tests (4.3) (Section 6) (Section 7) (Section 8) (Appendix X1)
3.4.6 UCL, n: upper confidence limit (6.2.5)(7.2.3)

4. Significance and Use
4.1 Laboratories conducting routine testing have a continuing need to evaluate test result bias, to evaluate changes for improving the test process performance, or to validate the transfer of a test method to a new location or apparatus. In all situations it must be demonstrated that any bias or innovation will have negligible effect on test results for a characteristic of a material. This standard provides statistical methods to confirm that the mean test results from a testing process are equivalent to those from a reference standard or another testing process, where equivalence is defined as agreement within prescribed limits, termed equivalence limits.
4.1.1 The intra-laboratory applications in this practice include, but are not limited to, the following:
(1) Evaluating the bias of a test method with respect to a certified reference material,
(2) Evaluating bias due to a minor change in a test method procedure,
(3) Qualifying new instruments, apparatus, or operators in a laboratory, and
(4) Qualifying new sources of reagents or other materials used in the test procedure.
4.1.2 This practice also supports evaluating systematic differences in a method transfer from a developing laboratory to a receiving laboratory.
4.2 This practice currently deals only with the equivalence of population means. In this standard, a population refers to a hypothetical set of test results arising from a stable testing process that measures a characteristic of a single material.
NOTE 1: The equivalence concept can also apply to population parameters other than means, such as precision, stated as variances, standard deviations, or relative standard deviations (coefficients of variation),
linearity, sensitivity, specificity, etc.
4.3 The data analysis for equivalence testing of population means in this practice uses a statistical methodology termed the two one-sided tests (TOST) procedure, which is described in detail in this standard (see X1.1). The TOST procedure is adapted to the type of objective and experiment design selected.
4.3.1 Historically, this procedure originated in the pharmaceutical industry for use in bioequivalence trials (1, 2),^3 denoted as the Two One-Sided Tests Procedure, and has since been adopted for other applications, particularly in testing and measurement applications (3, 4).
4.3.2 The conventional Student's t test used for detecting differences is not recommended for equivalence testing as it does not properly control the consumer's and producer's risks for this application (see X1.3).
4.4 Risk Management: Guidance is provided for determining the amount of data required to control the risks of making the wrong decision in accepting or rejecting equivalence (see X1.2).
4.4.1 The consumer's risk is the probability of accepting equivalence when the actual bias or difference in means is equal to the equivalence limit. This probability is controlled to a low level so that accepting equivalence gives a high degree of assurance that differences in question are less than the equivalence limit.
4.4.2 The producer's risk is the risk of falsely rejecting equivalence. If improvements are rejected this can lead to opportunity losses to the company and its laboratories (the producers) or cause additional unnecessary effort in improving the testing process.

5. Planning the Equivalence Study
5.1 Objectives and Design Selection: This practice supports two equivalence study objectives: (1) determining the means equivalence of test results from two testing processes or (2) determining the bias equivalence of a test method. In both objectives, two population means are compared for equivalence.
5.1.1 Means Equivalence: This study compares the average test result from the current testing process with that from the innovated process. A single material is selected, subdivided into test samples, and distributed for testing by each process. The material should be reasonably homogeneous, because inhomogeneity in the material will reduce the test precision.
5.1.1.1 Design Types: This practice provides recommendations for the design of a means equivalence experiment, and two basic designs are discussed. Section 6 discusses the two independent samples design, in which each population is sampled independently. Section 7 discusses the paired samples design in which pairs of s