BS ISO 16269-4:2010 Statistical interpretation of data, Part 4: Detection and treatment of outliers (PDF)

Format: PDF, 66 pages, 1.86 MB

raising standards worldwide
NO COPYING WITHOUT BSI PERMISSION EXCEPT AS PERMITTED BY COPYRIGHT LAW

BSI Standards Publication
BS ISO 16269-4:2010
Statistical interpretation of data
Part 4: Detection and treatment of outliers

BS ISO 16269-4:2010
BRITISH STANDARD

National foreword

This British Standard is the UK implementation of ISO 16269-4:2010.

The UK participation in its preparation was entrusted to Technical Committee SS/2, Statistical Interpretation of Data.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all
the necessary provisions of a contract. Users are responsible for its correct application.

© BSI 2010
ISBN 978 0 580 65939 3
ICS 03.120.30

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 October 2010.

Amendments issued since publication
Date    Text affected

BS ISO 16269-4:2010
Reference number ISO 16269-4:2010(E)
© ISO 2010

INTERNATIONAL STANDARD ISO 16269-4
First edition 2010-10-15

Statistical interpretation of data
Part 4: Detection and treatment of outliers

Interprétation statistique des données
Partie 4: Détection et traitement des valeurs aberrantes

BS ISO 16269-4:2010
ISO 16269-4:2010(E)

PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces
which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area. Adobe is a trademark of Adobe Systems
Incorporated. Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

COPYRIGHT PROTECTED DOCUMENT
© ISO 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56
CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org

Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Terms and definitions
3 Symbols
4 Outliers in univariate data
4.1 General
4.1.1 What is an outlier?
4.1.2 What are the causes of outliers?
4.1.3 Why should outliers be detected?
4.2 Data screening
4.3 Tests for outliers
4.3.1 General
4.3.2 Sample from a normal distribution
4.3.3 Sample from an exponential distribution
4.3.4 Samples taken from some known non-normal distributions
4.3.5 Sample taken from unknown distributions
4.3.6 Cochran's test for outlying variance
4.4 Graphical test of outliers
5 Accommodating outliers in univariate data
5.1 Robust data analysis
5.2 Robust estimation of location
5.2.1 General
5.2.2 Trimmed mean
5.2.3 Biweight location estimate
5.3 Robust estimation of dispersion
5.3.1 General
5.3.2 Median-median absolute pair-wise deviation
5.3.3 Biweight scale estimate
6 Outliers in multivariate and regression data
6.1 General
6.2 Outliers in multivariate data
6.3 Outliers in linear regression
6.3.1 General
6.3.2 Linear regression models
6.3.3 Detecting outlying Y observations
6.3.4 Identifying outlying X observations
6.3.5 Detecting influential observations
6.3.6 A robust regression procedure
Annex A (informative) Algorithm for the GESD outliers detection procedure
Annex B (normative) Critical values of outliers test statistics for exponential samples
Annex C (normative) Factor values of the modified box plot
Annex D (normative) Values of the correction factors for the robust estimators of the scale parameter
Annex E (normative) Critical values of Cochran's test statistic
Annex F (informative) A structured guide to detection of outliers in univariate data
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of
national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 16269-4 was prepared by Technical Committee
ISO/TC 69, Applications of statistical methods.

ISO 16269 consists of the following parts, under the general title Statistical interpretation of data:
Part 4: Detection and treatment of outliers
Part 6: Determination of statistical tolerance intervals
Part 7: Median: Estimation and confidence intervals
Part 8: Determination of prediction intervals

Introduction

Identification of outliers is one of the oldest problems in interpreting data. Causes of outliers include measurement error, sampling error, intentional under- or over-reporting of sampling results, incorrect recording, incorrect distributional or model assumptions of the data set, and rare observations. Outliers can distort and reduce the information contained in the data source or generating mechanism. In the manufacturing industry, the existence of outliers
will undermine the effectiveness of any process/product design and quality control procedures. Possible outliers are not necessarily bad or erroneous. In some situations, an outlier may carry essential information and thus it should be identified for further study. The study and detection of outliers from measurement processes lead to better understanding of the processes and to proper data analysis that subsequently results in improved inferences.

In view of the enormous volume of literature on the topic of outliers, it is of great importance for the international community to identify and standardize a sound subset of methods used in the identification and treatment of outliers. The implementation of this part of ISO 16269 enables business and industry to recognize the data analyses conducted across member countries or organizations.

Six annexes are provided. Annex A provides an algorithm for computing the test statistic and critical values of a procedure in detecting outliers in a data set taken from a normal distribution. Annexes B, D and E provide the tables needed to implement the recommended procedures. Annex C provides the tables and statistical theory that underlie the construction of modified box plots in outlier detection. Annex F provides a structured guide and flow chart to the procedures recommended in this part of ISO 16269.

Statistical interpretation of data
Part 4: Detection and treatment of outliers

1 Scope

This part of ISO 16269 provides detailed descriptions of sound statistical testing procedures and graphical data analysis methods for detecting outliers in data obtained from measurement processes. It recommends sound robust estimation and testing procedures to accommodate the presence of outliers.

This part of ISO 16269 is primarily designed for the detection and accommodation of outlier(s) from univariate data. Some guidance is provided for multivariate and regression data.

2 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

2.1
sample
data set
subset of a population made up of one or more sampling units
NOTE 1 The sampling units could be items, numerical values or even abstract entities depending on the population of interest.
NOTE 2 A sample from a normal (2.22), a gamma (2.23), an exponential (2.24), a Weibull (2.25), a lognormal (2.26) or a type I extreme value (2.27) population will often be referred to as a normal, a gamma, an exponential, a Weibull, a lognormal or a type I extreme value sample, respectively.

2.2
outlier
member of a small subset of observations that
appears to be inconsistent with the remainder of a given sample (2.1)
NOTE 1 The classification of an observation or a subset of observations as outlier(s) is relative to the chosen model for the population from which the data set originates. Such observations are not to be considered as genuine members of the main population.
NOTE 2 An outlier may originate from a different underlying population, or be the result of incorrect recording or gross measurement error.
NOTE 3 The subset may contain one or more observations.

2.3
masking
presence of more than one outlier (2.2), making each outlier difficult to detect

2.4
some-outside rate
probability that one or more observations in an uncontaminated sample will be wrongly classified as outliers (2.2)

2.5
outlier accommodation method
method that is insensitive to the
presence of outliers (2.2) when providing inferences about the population

2.6
resistant estimation
estimation method that provides results that change only slightly when a small portion of the data values in a data set (2.1) is replaced, possibly with very different data values from the original ones

2.7
robust estimation
estimation method that is insensitive to small departures from assumptions about the underlying probability model of the data
NOTE An example is an estimation method that works well for, say, a normal distribution (2.22), and remains reasonably good if the actual distribution
is skew or heavy-tailed. Classes of such methods include L-estimation (weighted averages of order statistics (2.10)) and M-estimation methods (see Reference 9).

2.8
rank
position of an observed value in an ordered set of observed values
NOTE 1 The observed values are arranged in ascending order (counting from below) or descending order (counting from above).
NOTE 2 For the purposes of this part of ISO 16269, identical observed values are ranked as if they were slightly different from one another.

2.9
depth <box plot>
smaller of the two ranks (2.8) determined by counting up from the smallest value of the sample (2.1), or counting down from the largest value
NOTE 1 The depth may not be an integer value (see Annex C).
NOTE 2 For all summary values other than the median (2.11), a given depth identifies two (data) values, one below the median and the other above the median. For example, the two data values with depth 1 are the smallest value (minimum) and largest value (maximum) in the given sample (2.1).

2.10
order statistic
statistic determined by its ranking in a non-decreasing arrangement of random variables
[ISO 3534-1:2006, definition 1.9]
NOTE 1 Let the observed values of a random
sample be $x_1, x_2, \ldots, x_n$. Reorder the observed values in non-decreasing order, designated as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(k)} \le \cdots \le x_{(n)}$; then $x_{(k)}$ is the observed value of the kth order statistic in a sample of size n.
NOTE 2 In practical terms, obtaining the order statistics for a sample (2.1) amounts to sorting the data as formally described in Note 1.

2.11
median
sample median
median of a set of numbers
$Q_2$
((n + 1)/2)th order statistic (2.10), if the sample size n is odd; sum of the (n/2)th and the ((n/2) + 1)th order statistics divided by 2, if the
sample size n is even
[ISO 3534-1:2006, definition 1.13]
NOTE The sample median is the second quartile ($Q_2$).

2.12
first quartile
sample lower quartile
$Q_1$
for an odd number of observations, median (2.11) of the smallest (n - 1)/2 observed values; for an even number of observations, median of the smallest
n/2 observed values
NOTE 1 There are many definitions in the literature of a sample quartile, which produce slightly different results. This definition has been chosen both for its ease of application and because it is widely used.
NOTE 2 Concepts such as hinges or fourths (2.19 and 2.20) are popular variants of quartiles. In some cases (see Note 3 to 2.19), the first quartile and the lower fourth (2.19) are identical.

2.13
third quartile
sample upper quartile
$Q_3$
for an odd number of observations, median of the largest (n - 1)/2 observed values; for an even number of observations, median of the
largest n/2 observed values
NOTE 1 There are many definitions in the literature of a sample quartile, which produce slightly different results. This definition has been chosen both for its ease of application and because it is widely used.
NOTE 2 Concepts such as hinges or fourths (2.19 and 2.20) are popular variants of quartiles. In some cases (see Note 3 to 2.20), the third quartile and the upper fourth (2.20) are identical.

2.14
interquartile range
IQR
difference between the third quartile (2.13) and the first quartile (2.12)
NOTE 1 This is a widely used statistic for describing the
spread of a data set.
NOTE 2 The difference between the upper fourth (2.20) and the lower fourth (2.19) is called the fourth-spread and is sometimes used instead of the interquartile range.

2.15
five-number summary
the minimum, first quartile (2.12), median (2.11), third quartile (2.13) and maximum
NOTE The five-number summary provides numerical information about the location, spread and range.

2.16
box plot
horizontal or vertical grap…
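Because NOTE 1 to 2.12 warns that the literature contains many quartile definitions giving slightly different results, it may help to see the rule of 2.12 and 2.13 written out. The following is a minimal Python sketch, assuming the data are a plain list of numbers; the function names are hypothetical (not part of this standard) and the toy data are made up for illustration. It covers order statistics (2.10), the median (2.11), the quartiles (2.12, 2.13), the interquartile range (2.14) and the five-number summary (2.15). Its last lines illustrate resistant estimation (2.6): replacing one value by a wildly different one moves the mean substantially but leaves the quartiles unchanged.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def order_statistics(sample: Sequence[float]) -> List[float]:
    """Order statistics x_(1) <= ... <= x_(n): in practice, the sorted sample (2.10)."""
    return sorted(sample)


def median_sorted(xs: Sequence[float]) -> float:
    """Sample median (2.11) of an already sorted, non-empty sequence."""
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th order statistic (1-based), i.e. 0-based index n // 2.
        return float(xs[mid])
    # Even n: mean of the (n/2)th and ((n/2) + 1)th order statistics.
    return (xs[mid - 1] + xs[mid]) / 2


def five_number_summary(sample: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Minimum, Q1, median, Q3 and maximum, with quartiles as in 2.12 and 2.13 (hypothetical helper)."""
    xs = order_statistics(sample)
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed for quartiles")
    # Odd n: (n - 1)/2 values on each side (the median is excluded); even n: n/2 values.
    half = n // 2
    q1 = median_sorted(xs[:half])       # median of the smallest `half` values
    q3 = median_sorted(xs[n - half:])   # median of the largest `half` values
    return float(xs[0]), q1, median_sorted(xs), q3, float(xs[-1])


def interquartile_range(sample: Sequence[float]) -> float:
    """IQR = Q3 - Q1 (2.14)."""
    _, q1, _, q3, _ = five_number_summary(sample)
    return q3 - q1


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    sample = [7, 1, 5, 3, 9, 11, 2]
    print(five_number_summary(sample))   # (1.0, 2.0, 5.0, 9.0, 11.0)
    print(interquartile_range(sample))   # 7.0

    # Resistance (2.6): replace the value 11 with a wildly different value.
    contaminated = [7, 1, 5, 3, 9, 1100, 2]
    print(fmean(sample), fmean(contaminated))        # mean moves from about 5.43 to 161.0
    print(five_number_summary(contaminated)[1:4])   # (2.0, 5.0, 9.0): quartiles unchanged
```

General-purpose routines such as numpy.percentile interpolate between order statistics by default, so they can return quartiles that differ from those defined in 2.12 and 2.13; for the toy sample above, that default gives a first quartile of 2.5 rather than 2.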
