Designation: D6708-16b
An American National Standard

Standard Practice for
Statistical Assessment and Improvement of Expected Agreement Between Two Test Methods that Purport to Measure the Same Property of a Material¹

This standard is issued under the fixed designation D6708; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope*
1.1 This practice covers statistical methodology for assessing the expected agreement between two standard test methods that purport to measure the same property of a material, and deciding if a simple linear bias correction can further improve the expected agreement. It is intended for use with results collected from an
interlaboratory study meeting the requirement of Practice D6300 or equivalent (for example, ISO 4259). The interlaboratory study must be conducted on at least ten materials that span the intersecting scopes of the test methods, and results must be obtained from at least six laboratories using each method.
1.2 The statistical methodology is based on the premise that a bias correction will not be needed. In the absence of strong statistical evidence that a bias correction would result in better agreement between the two methods, a bias correction is not made. If a bias correction is required, then the parsimony principle is followed whereby a simple correction is to be favored over a more complex one.
NOTE 1: Failure to adhere to the parsimony principle generally results in models that are over-fitted and do not perform well in practice.
1.3 The bias corrections of this practice are limited to a constant
correction, proportional correction, or a linear (proportional + constant) correction.
1.4 The bias-correction methods of this practice are method symmetric, in the sense that equivalent corrections are obtained regardless of which method is bias-corrected to match the other.
1.5 A methodology is presented for establishing the 95 % confidence limit (designated by this practice as the between methods reproducibility) for the difference between two results where each result is obtained by a different operator using different apparatus and each applying one of the two methods X and Y on identical material,
where one of the methods has been appropriately bias-corrected in accordance with this practice.
NOTE 2: In earlier versions of this standard practice, the term “cross-method reproducibility” was used in place of the term “between methods
reproducibility.” The change was made because the “between methods reproducibility” term is more intuitive and less confusing. It is important to note that these two terms are synonymous and interchangeable with one another, especially in cases where the “cross-method reproducibility” term was subsequently referenced by name in methods where a D6708 assessment was performed, before the change in terminology in this standard practice was adopted.
NOTE 3: Users are cautioned against applying the between methods reproducibility as calculated from this practice to materials that are significantly different in composition from those actually studied, as the ability of this practice to detect and address sample-specific biases (see 6.8) is dependent on the materials selected for the interlaboratory study. When sample-specific biases are present, the types and ranges of samples may need to be expanded significantly from the minimum of ten as specified in this practice in order to obtain more comprehensive and reliable 95 % confidence limits for between methods reproducibility that adequately cover the range of sample-specific biases for different types of materials.
1.6 This practice is intended for test methods which measure quantitative (numerical) properties of petroleum or petroleum products.
1.7 The statistical methodology outlined in this practice is also applicable for assessing the expected agreement between any two test methods that purport to measure the same property of a material, provided the results are obtained on the same comparison sample set, the standard error associated with each test result is known, and the sample set design meets the requirements of this practice, in particular that the statistical degrees of freedom associated with all standard errors are 30 or greater.

2. Referenced Documents
2.1 ASTM Standards:²
D5580 Test Method for Determination of Benzene, Toluene, Ethylbenzene, p/m-Xylene, o-Xylene, C9 and Heavier Aromatics, and Total Aromatics in Finished Gasoline by Gas Chromatography
D5769 Test Method for Determination of Benzene, Toluene, and Total Aromatics in Finished Gasolines by Gas Chromatography/Mass Spectrometry
D6299 Practice for Applying Statistical Quality Assurance and Control Charting Techniques to Evaluate Analytical Measurement System Performance
D6300 Practice for Determination of Precision and Bias Data for Use in Test Methods for Petroleum Products and Lubricants
D7372 Guide for Analysis and Interpretation of Proficiency Test Program Results
2.2 ISO Standard:³
ISO 4259 Petroleum Products - Determination and application of precision data in relation to methods of test.

¹ This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics. Current edition approved June 15, 2016. Published August 2016. Originally approved in 2001. Last previous edition approved in 2016 as D6708-16a. DOI: 10.1520/D6708-16B.
² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
* A Summary of Changes section appears at the end of this standard.
Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3. Terminology
3.1 Definitions:
3.1.1 between ILCP method-averages reproducibility (R_ILCP_X, ILCP_Y), n: a quantitative expression of the random error associated with the difference between the bias-corrected ILCP average of method X versus the ILCP average of method Y from a Proficiency Testing program, when the method X has been assessed versus method Y, and an appropriate bias-correction has been applied to all method X results in accordance with this practice; it is defined as the 95 % confidence limit for the difference between two such averages.
3.1.2 between-method bias, n: a quantitative expression for the mathematical correction that can statistically improve the degree of agreement between the expected values of two test methods which purport to measure the same property.
3.1.3 between methods reproducibility (R_XY), n: a quantitative expression of the random error associated with the difference between two results obtained by different operators using different apparatus and applying the two methods X and Y, respectively, each obtaining a single result on an identical test sample, when the methods have been assessed and an appropriate bias-correction has been applied in accordance with this practice; it is defined as the 95 % confidence limit for the difference between two such single and independent results.
3.1.3.1 Discussion: A statement of between methods reproducibility must include a description of any bias correction used in accordance with this practice.
3.1.3.2 Discussion: Between methods reproducibility is a meaningful concept only if there are no statistically observable sample-specific relative biases between the two methods, or if such biases vary from one sample to another in such a way that they may be considered random effects (see 6.7).
3.1.4 closeness sum of squares (CSS), n: a statistic used to quantify the degree of agreement between the results from two test methods after bias-correction using the methodology of this practice.
3.1.5 Interlaboratory Crosscheck Program (ILCP), n: ASTM International Proficiency Test Program sponsored by Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants; see ASTM website for current details. D7372
3.1.6 total sum of squares (TSS), n: a statistic used to quantify the information content from the interlaboratory study in terms of total variation of sample means relative
to the standard error of each sample mean.
3.2 Symbols:
X, Y = single X-method and Y-method results, respectively
X_ijk, Y_ijk = single results from the X-method and Y-method round robins, respectively
X_i, Y_i = means of results on the ith round robin sample
S = the number of samples in the round robin
L_Xi, L_Yi = the numbers of laboratories that returned results on the ith round robin sample
R_X, R_Y = the reproducibilities of the X- and Y-methods, respectively
R_Xi, R_Yi = the reproducibilities of methods X and Y, evaluated at the method X and Y means of the ith round robin sample, respectively
R_ILCP_X, ILCP_Y = estimate of between ILCP method-averages reproducibility
s_RXi, s_RYi = the reproducibility standard deviations, evaluated at the method X and Y means of the ith round robin sample
s_rXi, s_rYi = the repeatability standard deviations, evaluated at the method X and Y means of the ith round robin sample
s_Xi, s_Yi = standard errors of the means of the ith round robin sample
X̄, Ȳ = the weighted means of round robins (across samples)
x_i, y_i = deviations of the means of the ith round robin sample results from X̄ and Ȳ, respectively
TSS_X, TSS_Y = total sums of squares, around X̄ and Ȳ
F = a ratio for comparing variances; not unique (more than one use)
v_X, v_Y = the degrees of freedom for reproducibility variances from the round robins
w_i = weight associated with the difference between mean results (or corrected mean results) from the ith round robin sample
CSS = weighted sum of squared differences between (possibly corrected) mean results from the round robin
a, b = parameters of a linear correction: Ŷ = a + bX
t_1, t_2 = ratios for assessing reductions in sums of squares
R_XY = estimate of between methods reproducibility
Ŷ = predicted Y-method value for a sample by applying the bias correction established from this practice to an actual X-method result for the same sample
Ŷ_i = predicted ith round robin sample Y-method mean, by applying the bias correction established from this practice to its corresponding X-method mean
ε_i = standardized difference between Y_i and Ŷ_i
L_X, L_Y = harmonic mean numbers of laboratories submitting results on round robin samples, by X- and Y-methods, respectively
R_XŶ = estimate of between methods reproducibility, computed from an X-method result only

³ Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036.

4. Summary of Practice
4.1 Precisions of the two methods are quantified using interlaboratory studies meeting the requirements of Practice D6300 or equivalent, using at least ten samples in common that span the intersecting scopes of the methods. The arithmetic means of the results for each common sample obtained by each method are calculated. Estimates of the standard errors of these means are computed.
NOTE 4: For established standard test methods, new precision studies generally will be required in order to meet the common sample requirement.
NOTE 5: The two test methods need not be run by the same laboratory. If they are, care should be taken to ensure the independent test result requirement of Practice D6300 is met (for example, by double-blind testing of samples in random order).
4.2 Weighted sums of squares are computed for the total variation of the mean results across all common samples for each method. These sums of squares are assessed against the standard errors of the mean results for each method to ensure that the samples are sufficiently varied before continuing with the practice.
4.3 The closeness of agreement of the mean results by each method is evaluated using appropriate weighted sums of squared differences. Such sums of squares are computed from the data first with no bias correction, then with a constant bias correction, then, when appropriate, with a proportional correction, and finally with a linear (proportional + constant) correction.
4.4 The weighted sum of squared differences for the linear correction is assessed against the total variation in the mean results for both methods to ensure that there is sufficient correlation between the two methods.
4.5 The most parsimonious bias correction is selected.
4.6 The weighted sum of squares of differences, after applying the selected bias correction, is assessed to determine whether additional unexplained sources of variation remain in the residual (that is, the individual Y_i minus bias-corrected X_i) data. Any remaining, unexplained variation is attributed to sample-specific biases (also known as method-material interactions, or matrix effects). In the absence of sample-specific biases, the between methods reproducibility is estimated.
4.7 If sample-specific biases are present, the residuals (that is, the individual Y_i minus bias-corrected X_i) are tested for randomness. If they are found to be consistent with a random-effects model, then their contribution to the between methods reproducibility is estimated, and accumulated into an all-encompassing between methods reproducibility estimate.
4.8 Refer to Fig. 1 for a simplified flow diagram of the process described in this practice.

5. Significance and Use
5.1 This practice can be used to determine if a constant, proportional, or linear bias correction can improve the degree of agreement between two methods that purport to measure the same property of a material.
5.2 The bias correction developed in this practice can be applied to a single result (X) obtained from one test method (method X) to obtain a predicted result (Ŷ) for the other test method (method Y).
NOTE 6: Users are cautioned to ensure that Ŷ is within the scope of method Y before its use.
5.3 The between methods reproducibility established by this practice can be used to construct an interval around Ŷ that would contain the result of test method Y, if it were conducted, with about 95 % confidence.
5.4 This practice
can be used to guide commercial agreements and product disposition decisions involving test methods that have been evaluated relative to each other in accordance with this practice.
5.5 The magnitude of a statistically detectable bias is directly related to the uncertainties of the statistics from the
experimental study. These uncertainties are related to both the size of the data set and the precision of the processes being studied. A large data set, highly precise test method(s), or both can reduce the uncertainties of experimental statistics to the point where the “statistically detectable” bias can become “trivially small,” or be considered of no practical consequence in the intended use of the test method under study. Therefore, users of this practice are advised to determine in advance the magnitude of bias correction below which they wou