Abdel H. El-Shaarawi
National Water Research Institute and Department of Mathematics and Statistics, McMaster University
Abdel.el-shaarawi@ec.gc.ca

Data-driven and Physically-based Models for Characterization of Processes in Hydrology, Hydraulics, Oceanography and Climate Change
January 6-28, 2008, IMS, Singapore

Modeling Extreme Events Data

Outline

- Some references
- Examples of extreme events data
- Types of extreme events data
- Commonly used models for extremes:
  - Distributions of order statistics
  - Generalized extreme value distributions
  - Generalized Pareto distributions
- Parameter and quantile estimation of extremes
- Summary and concluding remarks

References

- Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004), Statistics of Extremes: Theory and Applications, New York: John Wiley & Sons.
- Castillo, E. and Hadi, A. S. (1994), Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution, Environmetrics, 5, 417-432.
- Castillo, E. and Hadi, A. S. (1995), A Method for Estimating Parameters and Quantiles of Continuous Distributions of Random Variables, Computational Statistics and Data Analysis, 20, 421-439.
- Castillo, E., Hadi, A. S., Balakrishnan, N., and Sarabia, J. M. (2006), Extreme Value and Related Models in Engineering and Science Applications, New York: John Wiley & Sons.
- Coles, S. (2001), An Introduction to Statistical Modeling of Extreme Values, London: Springer-Verlag.
- El-Shaarawi, A. H. and Hadi, A. S., Modified Likelihood Function for Parameter and Quantile Estimation, work in progress.
- Nadarajah, S. and El-Shaarawi, A. H. (2006), On the Ratios for Extreme Value Distributions with Applications to Rainfall Modeling, Environmetrics.
- Kotz, S. and Nadarajah, S. (2000), Extreme Value Distributions: Theory and Applications, London: Imperial College Press.

Software: S-plus & R

- Stuart Coles' S-plus package, available at URL: http:/www.math.lancs.ac.uk./coless
- extRemes R package, available at http:/www.isse.ucar.edu/extremevalues

Examples of Extreme Events Data

In many statistical applications, the interest is centered on estimating some population characteristics based on random samples taken from a population under study. For example, we wish to estimate the average rainfall, the average temperature, the median income, etc.

In other areas of application, we are not interested in estimating the average but
rather in estimating the maximum or the minimum.

Some Examples:

1. Ocean Engineering: In the design of offshore platforms, breakwaters, dikes, and other harbor works, engineers rely upon knowledge of the probability distribution of the maximum, not the average, wave height.
2. Structural Engineering: Modern building codes and standards require:
   - Estimation of extreme wind speeds and their recurrence intervals during the lifetime of the building.
   - Knowledge of the largest loads acting on the structure during its lifetime.
   - Seismic incidence: the maximum earthquake intensity during the lifetime of the building.
3. Designing Dams: Engineers are not interested in the probability distribution of the average flood, but in that of the maximum floods.
4. Agriculture: Farmers are interested in both the minimum and the maximum rainfall (drought versus flooding).
5. Insurance: Insurance companies are interested in the maximum insurance claims.
6. Pollution Control: The pollution of air and water has become a common problem in many countries due to large concentrations of people, traffic, and industries (producing smoke and human, chemical, and nuclear wastes, etc.). Government regulations require pollution indices to remain below a given critical level. Thus, the regulations are satisfied if, and only if, the largest pollution concentration during the period of interest is less than the critical level.

Figures:

- Nilometer
- U.S. Bureau of the Census; Watson and Pauly (2002)
- Living resources: food security
- Niagara River, Fraser River
- Upstream-Downstream Water Quality Monitoring. Human and Ecosystem Health: Regulations and Control
- Time Plots: Fraser River, Hope
- Evolution of the Flow along the Fraser River
- Hansard/Red Pass
- Max of log(Flow) at Hope
- Some Results for Max (Hope)

Yearly maximum significant wave-height data, 1949-1976:

5.60 6.55 6.65 7.35 7.80 7.90 8.00 8.50 9.05 9.15 9.40 9.60 9.80 9.90 10.85 10.90 11.10 11.30 11.30 11.55 11.75 12.85 12.90 13.40

Figures:

- Two More Examples: Wave-Height & Temperature (Basel)
- Two Stations: Ratio of GEV Distributions, W = X/(X+Y)
- Seoul Rainfall Data
- Microbiological Regulations (Human health)
- Approximate expression for probability of compliance with the regulations
- Sample size n = 5 and 10; number of simulations = 10000
- Ratio of single sample rejection probability to that of the mean rule (n = 5, 10, and 20)
- The Temperature Data: Change-Point
- Relative Likelihood Function for the Change Point (Temp. Data)
- Q-Q plots for the two segments
- Return Levels
Types of Extreme Events Data

The choice of model and estimation method depends on the type of available data.

Data, $x_1, x_2, \ldots, x_n$, drawn from a possibly unknown population, are available. We wish to:

- Find an appropriate parametric model, $F(x; \theta)$, that fits the data reasonably well
- Estimate the parameters, $\theta$, and quantiles, $X(p)$, of such a model

1. Complete Data: All n observations are available. Examples:
   - Daily/monthly energy consumption
   - Daily/monthly rainfall, stream discharge, or flood flow
2. Maxima/Minima: Only maxima or minima are available. Examples:
   - Maximum/minimum daily/monthly temperatures
   - Maximum daily/monthly wave heights
   - Maximum daily/monthly wind speeds, pollution concentrations, etc.
3. Exceedances over/under a threshold: When yearly maxima (minima) are used, an important part of the information, namely the large (small) values other than the two extremes occurring in the same year, is lost.
The alternative is to use the exceedances over (under) a given threshold.

Exceedances Over/Under a Threshold

We are interested in events that cause failure, such as exceedances of a random variable over a threshold value. For example, waves can destroy a breakwater when their heights exceed a given value, say 9 meters. It then does not matter whether the height of a wave is 9.5, 10, or 12 meters, because the consequences of these events are similar.

So, only failure-causing observations exceeding a given threshold are available.

Definition: Let X be a random variable and u be a given threshold value. The event X = x is said to be an exceedance at the level u if x > u.

Summary: Types of Data

Extreme events data come in one of three types:

1. Complete observations
2. Maxima/Minima
3. Exceedances over/under a threshold value

Commonly Used Models for Extremes
The choice of model depends on the type of available data:

- Distributions of Order Statistics (DOS): used when we have complete data
- Generalized Extreme Value (GEV) Distribution (AKA: Von Mises family): used for maxima/minima data
- Generalized Pareto Distribution (GPD): used for exceedances over/under a threshold

Distributions of Order Statistics

Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from a possibly unknown cdf $F(x; \theta)$, depending on an unknown vector-valued parameter $\theta$. Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the corresponding order statistics; $X_{i:n}$ is called the $i$th order statistic. Of particular interest are the minimum, $X_{1:n}$, and the maximum, $X_{n:n}$, order statistics.

The distributions of the order statistics are well known. For example:

- The cdf of the maximum order statistic is:
  $$F_{n:n}(x) = [F(x; \theta)]^n$$
- The cdf of the minimum order statistic is:
  $$F_{1:n}(x) = 1 - [1 - F(x; \theta)]^n$$

Problems with Distributions of OS

The distributions of the order statistics have the following practical problems:

- The cdf of the parent population, $F(x; \theta)$, is usually unknown
- When the data consist only of maxima or minima, the sample sizes are usually unknown

Non-Degenerate Limiting Distributions

The answer to the above problem is to look for normalizing constants $a_n$, $b_n > 0$, $c_n$, and $d_n > 0$ for which the limits

$$\lim_{n \to \infty} [F(a_n + b_n x)]^n = H(x) \qquad (1)$$

$$\lim_{n \to \infty} \left\{ 1 - [1 - F(c_n + d_n x)]^n \right\} = L(x) \qquad (2)$$

are non-degenerate.

Theorem: The only non-degenerate cdf family satisfying (1) is the Maximal Generalized Extreme Value Distribution (GEVM). The only non-degenerate cdf family satisfying (2) is the Minimal Generalized Extreme Value Distribution (GEVm).

Generalized Extreme Value Distributions

Thus, there are two GEV distributions, one maximal, GEVM, and one minimal, GEVm.

- The GEV (AKA Von Mises) distributions were introduced by Jenkinson (1955).
- They are used when we have a large sample or when the observations themselves are either minima or maxima.
- Their cdfs are given later.

The GEV distributions are now widely used to model extremes of natural and environmental data. Examples are found in:

- Flood Studies Report of the UK's Natural Environment Research Council (1975)
- Several articles in Tiago de Oliveira (1984)
- Hosking, Wallis, and Wood (1985)
- Castillo et al. (2006)

Maximal Generalized Extreme Value

The cumulative distribution function (cdf) of the maximal GEV distribution, GEVM, is:

$$H(x; \lambda, \delta, \kappa) = \begin{cases} \exp\left\{ -\left[ 1 - \kappa (x - \lambda)/\delta \right]^{1/\kappa} \right\}, & 1 - \kappa (x - \lambda)/\delta \ge 0,\ \kappa \ne 0, \\ \exp\left\{ -\exp\left[ -(x - \lambda)/\delta \right] \right\}, & -\infty < x < \infty,\ \kappa = 0. \end{cases}$$

Minimal Generalized Extreme Value

The cumulative distribution function (cdf) of the minimal GEV distribution, GEVm, is:

$$L(x; \lambda, \delta, \kappa) = \begin{cases} 1 - \exp\left\{ -\left[ 1 + \kappa (x - \lambda)/\delta \right]^{1/\kappa} \right\}, & 1 + \kappa (x - \lambda)/\delta \ge 0,\ \kappa \ne 0, \\ 1 - \exp\left\{ -\exp\left[ (x - \lambda)/\delta \right] \right\}, & -\infty < x < \infty,\ \kappa = 0. \end{cases}$$

Relationship Between GEVM and GEVm

Theorem: If the cdf of X is $L(\lambda, \delta, \kappa)$, then the cdf of Y = -X is $H(-\lambda, \delta, \kappa)$.

Implication: One form of the cdf can be obtained from the other.

Maximal Generalized Extreme Value

The GEVM family has three parameters:

- $\lambda$ is a location parameter
- $\delta$ is a scale parameter ($\delta > 0$)
- $\kappa$ is a shape parameter

The parameter $\kappa$ is the most important of the three. The $p$th quantile ($0 < p < 1$) is:

$$x_p = \begin{cases} \lambda + \delta \left[ 1 - (-\log p)^{\kappa} \right] / \kappa, & \kappa \ne 0, \\ \lambda - \delta \log(-\log p), & \kappa = 0. \end{cases}$$

Special Cases of the Maximal GEV

The GEVM family has three special cases:

1. The Maximal Weibull distribution is obtained when $\kappa > 0$. Its cdf is $H(x; \lambda, \delta, \kappa)$ above, with range $x \le \lambda + \delta/\kappa$.
2. The Maximal Gumbel distribution is obtained when $\kappa = 0$. Its cdf is:
   $$H(x; \lambda, \delta) = \exp\left\{ -\exp\left[ -(x - \lambda)/\delta \right] \right\}, \quad -\infty < x < \infty.$$
3. The Maximal Frechet distribution is obtained when $\kappa < 0$. Its cdf is $H(x; \lambda, \delta, \kappa)$ above, with range $x \ge \lambda + \delta/\kappa$.

Figure: Weibull, Gumbel, and Frechet (Weibull and Frechet converge to Gumbel)

Summary

The GEV family can be used when:

- The cdf of the parent population, $F(x; \theta)$, is unknown
- The sample size is very large (no degeneracy problems)
- The data consist only of maxima or minima (we do not need to know the sample sizes)
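As a rough illustration of the maximal GEV family summarized above, the following Python sketch evaluates its cdf and quantile function. It assumes the parameterization $H(x) = \exp\{-[1 - \kappa(x - \lambda)/\delta]^{1/\kappa}\}$, which is consistent with the upper bound $x \le \lambda + \delta/\kappa$ for $\kappa > 0$ quoted in these slides. The function names `gevm_cdf` and `gevm_quantile` and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def gevm_cdf(x, lam, delta, kappa):
    """Maximal GEV cdf under the assumed parameterization (delta > 0)."""
    z = (x - lam) / delta
    if abs(kappa) < 1e-12:
        # Gumbel case (kappa = 0)
        return math.exp(-math.exp(-z))
    t = 1.0 - kappa * z
    if t <= 0.0:
        # Outside the support: above the upper endpoint when kappa > 0
        # (Weibull), below the lower endpoint when kappa < 0 (Frechet).
        return 1.0 if kappa > 0 else 0.0
    try:
        return math.exp(-(t ** (1.0 / kappa)))
    except OverflowError:
        # t ** (1 / kappa) is astronomically large, so the cdf is ~0.
        return 0.0


def gevm_quantile(p, lam, delta, kappa):
    """p-th quantile of the maximal GEV, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if abs(kappa) < 1e-12:
        return lam - delta * math.log(-math.log(p))
    return lam + delta * (1.0 - (-math.log(p)) ** kappa) / kappa


# Illustrative (made-up) parameter values, not estimates from any data set.
lam, delta, kappa = 10.0, 2.0, 0.2
median = gevm_quantile(0.5, lam, delta, kappa)
# A T-year return level is the quantile at p = 1 - 1/T.
return_level_100 = gevm_quantile(1.0 - 1.0 / 100.0, lam, delta, kappa)
print(median, return_level_100, gevm_cdf(median, lam, delta, kappa))
```

Treating $|\kappa| < 10^{-12}$ as the Gumbel case is a numerical convenience in this sketch, not part of the model definition.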
Types of Extreme Events Data

Recall the three types of extreme events data and the model to use for each:

- Complete Data: All n observations are available. Use distributions of order statistics if we know F(x) and n is not too large; else, use GEV.
- Maxima/Minima: Only maxima or minima are available. Use GEV.
- Exceedances over/under a threshold: Only observations exceeding a given threshold are available. Use GPD.

Exceedances Over/Under a Threshold

As mentioned earlier, we are interested in events that cause failure, such as exceedances of a random variable over a threshold value. The differences between the actual values and the threshold value are called exceedances over/under the threshold.

Generalized Maximal Pareto Distributions

Pickands (1975) demonstrates that when the threshold tends to the upper end of the random variable, the exceedances follow a generalized Pareto distribution, GPDM($\alpha$, $\kappa$), with cdf

$$F(x; \alpha, \kappa) = \begin{cases} 1 - (1 - \kappa x/\alpha)^{1/\kappa}, & \kappa \ne 0, \\ 1 - \exp(-x/\alpha), & \kappa = 0. \end{cases}$$

The GPDM family has two parameters:

- $\alpha$ is a scale parameter ($\alpha > 0$)
- $\kappa$ is a shape parameter

The $p$th quantile ($0 < p < 1$) is:

$$x_p = \begin{cases} \alpha \left[ 1 - (1 - p)^{\kappa} \right] / \kappa, & \kappa \ne 0, \\ -\alpha \log(1 - p), & \kappa = 0. \end{cases}$$

Note that when $\kappa \le 0$ the range is $x \ge 0$, and when $\kappa > 0$ it is $0 \le x \le \alpha/\kappa$.

Special Cases of the Maximal GPD

The GPDM has three special cases:

- When $\kappa = 0$, the GPDM reduces to the Exponential distribution with mean $\alpha$.
- When $\kappa = 1$, the GPDM reduces to the Uniform U(0, $\alpha$).
- When $\kappa < 0$, the GPDM becomes the Pareto distribution.

Generalized Minimal Pareto Distribution

A similar family exists for the case of exceedances under a threshold. These are called the Generalized Minimal Pareto distributions or the Reversed Generalized Pareto distributions.

Parameter and Quantile Estimation

Available estimation methods include:

1. Maximum likelihood (MLE): Jenkinson (1969); Prescott and Walden (1980, 1983); Smith (1984, 1985)
2. The method of moments (MOM)
3. The probability weighted moments (PWM): Greenwood et al. (1979), Hosking et al. (1985)
4. The Elemental Percentile method (EPM): Castillo and Hadi (1995)
5. Order Statistics (Least Squares): El-Shaarawi
6. Modified Likelihood Function (MLF): El-Shaarawi and Hadi (work in progress)

Problems With Traditional Estimators

Traditional methods of estimation (MLE and the moments-based methods) have problems because:

- The range of the distribution depends on the parameters: $x \ge \lambda + \delta/\kappa$ for $\kappa < 0$, and $x \le \lambda + \delta/\kappa$ for $\kappa > 0$. So, the MLEs do not have the usual asymptotic properties.
- The MLE requires numerical solutions.
- For some samples, the likelihood may not have a local maximum.
- For $\kappa > 1$, the MLEs do not exist (the likelihood can be made infinite).
- When $\kappa \le -1$, the mean and higher moments do not exist. So, the MOM and PWM estimates do not exist when $\kappa \le -1$.
- The PWM estimators are good for cases where $-0.5 < \kappa < 0.5$. Outside this range of $\kappa$, the PWM estimates may not exist, and if they do exist, their performance worsens as $\kappa$ increases.

Recently Proposed Estimation Methods

This leaves us with two recently proposed methods for estimating the parameters and quantiles of the extreme models:

- The Elemental Percentile method (EPM): Castillo and Hadi (1995)
- Modified Likelihood Function (MLF): El-Shaarawi and Hadi (work in progress)

Elemental Percentile Method (EPM)

1. Initial estimates are obtained by equating three distinct order statistics, $x_{i:n} < x_{j:n} < x_{r:n}$, to their corresponding percentiles:

$$F(x_{i:n}; \theta) = p_{i:n}, \quad F(x_{j:n}; \theta) = p_{j:n}, \quad F(x_{r:n}; \theta) = p_{r:n}.$$

2. Substituting the cdf of the GEVM, we obtain:

$$x_{s:n} = \lambda + \delta \left[ 1 - (-\log p_{s:n})^{\kappa} \right] / \kappa, \quad s = i, j, r.$$

These are three equations in three unknowns: $\lambda$, $\delta$, and $\kappa$.
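The three-equation system above can be solved numerically, for example as in the following minimal Python sketch. It assumes the GEVM quantile form $x_p = \lambda + \delta[1 - (-\log p)^{\kappa}]/\kappa$; subtracting pairs of equations eliminates $\lambda$ and $\delta$, which leaves a single monotone equation in $\kappa$ that is solved here by bisection. The function names, the search bracket [-10, 10], and the test parameter values are assumptions made for illustration, and the percentiles are passed in by the caller because the slides do not specify a plotting-position formula.

```python
import math


def gevm_quantile(p, lam, delta, kappa):
    """GEVM quantile under the assumed parameterization."""
    if abs(kappa) < 1e-12:
        return lam - delta * math.log(-math.log(p))
    return lam + delta * (1.0 - (-math.log(p)) ** kappa) / kappa


def epm_elemental(x, p, bracket=(-10.0, 10.0)):
    """Solve the three percentile equations for (lam, delta, kappa).

    x: three increasing order statistics (x_i, x_j, x_r)
    p: their percentiles (p_i, p_j, p_r), increasing, inside (0, 1)
    """
    x_i, x_j, x_r = x
    p_i, p_j, p_r = p
    if not (x_i < x_j < x_r and 0.0 < p_i < p_j < p_r < 1.0):
        raise ValueError("need increasing x and increasing p in (0, 1)")
    c_i, c_j, c_r = (-math.log(q) for q in (p_i, p_j, p_r))
    big_a = math.log(c_i / c_r)  # > 0
    big_b = math.log(c_j / c_r)  # > 0 and smaller than big_a
    # Ratio of differences; it is free of lam and delta.
    target = (x_i - x_r) / (x_j - x_r)

    def h(k):
        # (c_r^k - c_i^k) / (c_r^k - c_j^k), written with expm1;
        # increasing in k, with limit big_a / big_b as k -> 0.
        if abs(k) < 1e-10:
            return big_a / big_b
        return math.expm1(big_a * k) / math.expm1(big_b * k)

    lo, hi = bracket
    if not h(lo) < target < h(hi):
        raise ValueError("kappa is not inside the search bracket")
    for _ in range(200):  # plain bisection on the monotone function h
        mid = 0.5 * (lo + hi)
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    kappa = 0.5 * (lo + hi)

    if abs(kappa) < 1e-8:
        # Gumbel limit: x = lam - delta * log(c)
        delta = (x_r - x_i) / big_a
        lam = x_i + delta * math.log(c_i)
        return lam, delta, 0.0
    delta = kappa * (x_i - x_r) / (c_r ** kappa - c_i ** kappa)
    lam = x_i - delta * (1.0 - c_i ** kappa) / kappa
    return lam, delta, kappa


# Self-check on exact quantiles of a GEVM with made-up parameters.
true_params = (10.0, 2.0, 0.2)
probs = (0.1, 0.5, 0.9)
points = tuple(gevm_quantile(q, *true_params) for q in probs)
print(epm_elemental(points, probs))
```

This sketch covers only a single triplet of order statistics. How estimates from several triplets are combined into final estimates is not shown here.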