Format: PDF, pages: 66, size: 1.86 MB

BS ISO 16269-4-2010 Statistical interpretation of data Detection and treatment of outliers.pdf

raising standards worldwide

NO COPYING WITHOUT BSI PERMISSION EXCEPT AS PERMITTED BY COPYRIGHT LAW

BSI Standards Publication

BS ISO 16269-4:2010

Statistical interpretation of data
Part 4: Detection and treatment of outliers

BRITISH STANDARD

National foreword

This British Standard is the UK implementation of ISO 16269-4:2010.

The UK participation in its preparation was entrusted to Technical Committee SS/2, Statistical Interpretation of Data.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© BSI 2010
ISBN 978 0 580 65939 3
ICS 03.120.30

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 October 2010.

Amendments issued since publication
Date    Text affected

INTERNATIONAL STANDARD ISO 16269-4
First edition 2010-10-15
Reference number ISO 16269-4:2010(E)
© ISO 2010

Statistical interpretation of data
Part 4: Detection and treatment of outliers

Interprétation statistique des données
Partie 4: Détection et traitement des valeurs aberrantes

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

COPYRIGHT PROTECTED DOCUMENT

© ISO 2010

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Terms and definitions
3 Symbols
4 Outliers in univariate data
4.1 General
4.1.1 What is an outlier?
4.1.2 What are the causes of outliers?
4.1.3 Why should outliers be detected?
4.2 Data screening
4.3 Tests for outliers
4.3.1 General
4.3.2 Sample from a normal distribution
4.3.3 Sample from an exponential distribution
4.3.4 Samples taken from some known non-normal distributions
4.3.5 Sample taken from unknown distributions
4.3.6 Cochran's test for outlying variance
4.4 Graphical test of outliers
5 Accommodating outliers in univariate data
5.1 Robust data analysis
5.2 Robust estimation of location
5.2.1 General
5.2.2 Trimmed mean
5.2.3 Biweight location estimate
5.3 Robust estimation of dispersion
5.3.1 General
5.3.2 Median-median absolute pair-wise deviation
5.3.3 Biweight scale estimate
6 Outliers in multivariate and regression data
6.1 General
6.2 Outliers in multivariate data
6.3 Outliers in linear regression
6.3.1 General
6.3.2 Linear regression models
6.3.3 Detecting outlying Y observations
6.3.4 Identifying outlying X observations
6.3.5 Detecting influential observations
6.3.6 A robust regression procedure
Annex A (informative) Algorithm for the GESD outliers detection procedure
Annex B (normative) Critical values of outliers test statistics for exponential samples
Annex C (normative) Factor values of the modified box plot
Annex D (normative) Values of the correction factors for the robust estimators of the scale parameter
Annex E (normative) Critical values of Cochran's test statistic
Annex F (informative) A structured guide to detection of outliers in univariate data
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 16269-4 was prepared by Technical Committee ISO/TC 69, Applications of statistical methods.

ISO 16269 consists of the following parts, under the general title Statistical interpretation of data:
Part 4: Detection and treatment of outliers
Part 6: Determination of statistical tolerance intervals
Part 7: Median - Estimation and confidence intervals
Part 8: Determination of prediction intervals

Introduction

Identification of outliers is one of the oldest problems in interpreting data. Causes of outliers include measurement error, sampling error, intentional under- or over-reporting of sampling results, incorrect recording, incorrect distributional or model assumptions of the data set, and rare observations, etc. Outliers can distort and reduce the information contained in the data source or generating mechanism. In the manufacturing industry, the existence of outliers
will undermine the effectiveness of any process/product design and quality control procedures. Possible outliers are not necessarily bad or erroneous. In some situations, an outlier may carry essential information and thus it should be identified for further study.

The study and detection of outliers from measurement processes leads to better understanding of the processes and proper data analysis that subsequently results in improved inferences.

In view of the enormous volume of literature on the topic of outliers, it is of great importance for the international community to identify and standardize a sound subset of methods used in the identification and treatment of outliers. The implementation of this part of ISO 16269 enables business and industry to recognize the data analyses conducted across member countries or organizations.

Six annexes are provided. Annex A provides an algorithm for computing the test statistic and critical values of a procedure in detecting outliers in a data set taken from a normal distribution. Annexes B, D and E provide the tables needed to implement the recommended procedures. Annex C provides the tables and statistical theory that underlie the construction of modified box plots in outlier detection. Annex F provides a structured guide and flow chart to the procedures recommended in this part of ISO 16269.

1 Scope

This part of ISO 16269 provides detailed descriptions of sound statistical testing procedures and graphical data analysis methods for detecting outliers in data obtained from measurement processes. It recommends sound robust estimation and testing procedures to accommodate the presence of outliers.

This part of ISO 16269 is primarily designed for the detection and accommodation of outlier(s) from univariate data. Some guidance is provided for multivariate and regression data.

2 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

2.1
sample
data set
subset of a population made up of one or more sampling units
NOTE 1 The sampling units could be items, numerical values or even abstract entities depending on the population of interest.
NOTE 2 A sample from a normal (2.22), a gamma (2.23), an exponential (2.24), a Weibull (2.25), a lognormal (2.26) or a type I extreme value (2.27) population will often be referred to as a normal, a gamma, an exponential, a Weibull, a lognormal or a type I extreme value sample, respectively.

2.2
outlier
member of a small subset of observations that
appears to be inconsistent with the remainder of a given sample (2.1)
NOTE 1 The classification of an observation or a subset of observations as outlier(s) is relative to the chosen model for the population from which the data set originates. This or these observations are not to be considered as genuine members of the main population.
NOTE 2 An outlier may originate from a different underlying population, or be the result of incorrect recording or gross measurement error.
NOTE 3 The subset may contain one or more observations.

2.3
masking
presence of more than one outlier (2.2), making each outlier difficult to detect

2.4
some-outside rate
probability that one or more observations in an uncontaminated sample will be wrongly classified as outliers (2.2)

2.5
outlier accommodation method
method that is insensitive to the presence of outliers (2.2) when providing inferences about the population

2.6
resistant estimation
estimation method that provides results that change only slightly when a small portion of the data values in a data set (2.1) is replaced, possibly with very different data values from the original ones

2.7
robust estimation
estimation method that is insensitive to small departures from assumptions about the underlying probability model of the data
NOTE An example is an estimation method that works well for, say, a normal distribution (2.22), and remains reasonably good if the actual distribution is skew or heavy-tailed. Classes of such methods include the L-estimation [weighted average of order statistics (2.10)] and M-estimation methods (see Reference [9]).

2.8
rank
position of an observed value in an ordered set of observed values
NOTE 1 The observed values are arranged in ascending order (counting from below) or descending order (counting from above).
NOTE 2 For the purposes of this part of ISO 16269, identical observed values are ranked as if they were slightly different from one another.

2.9
depth
<box plot> smaller of the two ranks (2.8) determined by counting up from the smallest value of the sample (2.1), or counting down from the largest value
NOTE 1 The depth may not be an integer value (see Annex C).
NOTE 2 For all summary values other than the median (2.11), a given depth identifies two (data) values, one below the median and the other above the median. For example, the two data values with depth 1 are the smallest value (minimum) and largest value (maximum) in the given sample (2.1).

2.10
order statistic
statistic determined by its ranking in a non-decreasing arrangement of random variables
[ISO 3534-1:2006, definition 1.9]
NOTE 1 Let the observed values of a random sample be $x_1, x_2, \dots, x_n$. Reorder the observed values in non-decreasing order designated as $x_{(1)} \le x_{(2)} \le \dots \le x_{(k)} \le \dots \le x_{(n)}$; then $x_{(k)}$ is the observed value of the kth order statistic in a sample of size n.
NOTE 2 In practical terms, obtaining the order statistics for a sample (2.1) amounts to sorting the data as formally described in Note 1.

2.11
median
sample median
median of a set of numbers
Q2
[(n + 1)/2]th order statistic (2.10), if the sample size n is odd; sum of the [n/2]th and the [(n/2) + 1]th order statistics divided by 2, if the sample size n is even
[ISO 3534-1:2006, definition 1.13]
NOTE The sample median is the second quartile (Q2).

2.12
first quartile
sample lower quartile
Q1
for an odd number of observations, median (2.11) of the smallest (n - 1)/2 observed values; for an even number of observations, median of the smallest n/2 observed values
NOTE 1 There are many definitions in the literature of a sample quartile, which produce slightly different results. This definition has been chosen both for its ease of application and because it is widely used.
NOTE 2 Concepts such as hinges or fourths (2.19 and 2.20) are popular variants of quartiles. In some cases (see Note 3 to 2.19), the first quartile and the lower fourth (2.19) are identical.

2.13
third quartile
sample upper quartile
Q3
for an odd number of observations, median of the largest (n - 1)/2 observed values; for an even number of observations, median of the largest n/2 observed values
NOTE 1 There are many definitions in the literature of a sample quartile, which produce slightly different results. This definition has been chosen both for its ease of application and because it is widely used.
NOTE 2 Concepts such as hinges or fourths (2.19 and 2.20) are popular variants of quartiles. In some cases (see Note 3 to 2.20), the third quartile and the upper fourth (2.20) are identical.

2.14
interquartile range
IQR
difference between the third quartile (2.13) and the first quartile (2.12)
NOTE 1 This is one of the widely used statistics to describe the spread of a data set.
NOTE 2 The difference between the upper fourth (2.20) and the lower fourth (2.19) is called the fourth-spread and is sometimes used instead of the interquartile range.

2.15
five-number summary
the minimum, first quartile (2.12), median (2.11), third quartile (2.13), and maximum
NOTE The five-number summary provides numerical information about the location, spread and range.

2.16
box plot
horizontal or vertical grap
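
As an illustration of definitions 2.10 to 2.15, the following Python sketch computes the five-number summary and the interquartile range using the quartile rule stated in 2.12 and 2.13 (median of the smallest and of the largest (n - 1)/2 values for odd n, n/2 values for even n). It is not part of the standard: the function names and the sample values are hypothetical, and statistical packages often use other quartile definitions (see NOTE 1 to 2.12), so their results can differ slightly.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sample.

    Quartiles follow the rule quoted in 2.12 and 2.13: the median of the
    smallest (resp. largest) half of the ordered sample, leaving out the
    middle value when the sample size is odd.
    """
    x = sorted(data)              # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2                 # (n - 1)/2 for odd n, n/2 for even n
    q1 = median(x[:half])         # median of the smallest `half` values
    q3 = median(x[n - half:])     # median of the largest `half` values
    return x[0], q1, median(x), q3, x[-1]


def interquartile_range(data):
    """Return IQR = Q3 - Q1 as defined in 2.14."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1
```

For the hypothetical sample 7, 1, 3, 9, 5, 11, 13 (n = 7), the ordered values are 1, 3, 5, 7, 9, 11, 13, so the sketch returns the five-number summary (1, 3, 7, 11, 13) and an interquartile range of 8.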
