Basic Statistical Concepts
Donald E. Mercante, Ph.D.
Biostatistics, School of Public Health, LSU-HSC

Population, Sample, Statistics, Parameters

Two Broad Areas of Statistics
Descriptive Statistics
- Numerical descriptors
- Graphical devices
- Tabular displays
Inferential Statistics
- Hypothesis testing
- Confidence intervals
- Model building/selection

Descriptive Statistics
- When computed for a population of values, numerical descriptors are called parameters.
- When computed for a sample of values, numerical descriptors are called statistics.

Two important aspects of any population:
- Magnitude of the responses
- Spread among population members

Measures of Central Tendency (magnitude)
Mean
- most widely used
- uses all the data
- best statistical properties
- susceptible to outliers
Median
- does not use all the data
- resistant to outliers
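The contrast between the mean (susceptible to outliers) and the median (resistant to outliers) can be seen in a small sketch. The readings below are made-up values chosen for illustration, not data from the slides.

```python
from statistics import mean, median

# Hypothetical readings (invented for illustration only).
readings = [118, 122, 125, 130, 135]

# The same readings with one extreme value added.
with_outlier = readings + [240]

# The mean uses every value, so the single outlier pulls it upward.
mean_shift = mean(with_outlier) - mean(readings)

# The median depends only on the middle of the ordered data, so it barely moves.
median_shift = median(with_outlier) - median(readings)

print("mean:  ", mean(readings), "->", mean(with_outlier))
print("median:", median(readings), "->", median(with_outlier))
print("shift in mean:", mean_shift, " shift in median:", median_shift)
```

In this sketch one extreme value moves the mean by 19 units but the median by only 2.5, which is the sense in which the median is called resistant.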
Measures of Spread (variability)
Range
- simple to compute
- does not use all the data
Variance
- uses all the data
- best statistical properties
- measures average squared distance of values from a reference point (the mean)

Properties of Statistics
- Unbiasedness: on target
- Minimum variance: most reliable
If an estimator possesses both properties, it is a MINVUE (MINimum Variance Unbiased Estimator).
The sample mean and variance are UMVUE (Uniformly Minimum Variance Unbiased Estimators).

Inferential Statistics
- Hypothesis testing
- Interval estimation

Hypothesis Testing
Specifying hypotheses:
- H0: "null" or no-effect hypothesis
- H1: research or alternative hypothesis
Note: Only H0 (null) is tested.

Errors in Hypothesis Testing
[slide content not preserved]

Hypothesis Testing
In parametric tests, actual parameter values are specified for H0 and H1.
H0: [symbols not preserved] 120

Another example of explicitly specifying H0 and H1:
H0: [parameter] = 0
H1: [parameter] [relation not preserved] 0

General framework:
- Specify null and alternative hypotheses
- Specify test statistic
- State rejection rule (RR)
- Compute test statistic and compare to RR
- State conclusion

Common Statistical Tests
[slide content not preserved]

Common Statistical Tests (cont.)
[slide content not preserved]

Advanced Topics
[slide content not preserved]

P-Values
p = probability of obtaining a result at least this extreme, given that the null is true.
- P-values are probabilities: $0 \le p \le 1$
- Computed from the distribution of the test statistic

Epidemiological Concepts
Rate: a proportion, specifically a fraction in which the numerator, c, is included in the denominator.
[formula not preserved]
- Useful for comparing groups of unequal size
Example: [not preserved]

Measures of morbidity:
- Incidence rate: number of new cases occurring during a given time interval divided by the population at risk at the beginning of that period.
- Prevalence rate: total number of cases at a given time divided by the population at risk at that time.

Most people think in terms of the probability (p) of an event as a natural way to quantify the chance that an event will occur:
- $0 \le p \le 1$
- 0 = event will certainly not occur
- 1 = event is certain to occur
But there are other ways of quantifying the chances that an event will occur.

Odds and Odds Ratio
For example, O = 4 means we expect 4 times as many occurrences as non-occurrences of an event.
In gambling, we say the odds are 5 to 2. This corresponds to the single number 5/2 = odds.

The relationship between probability and odds:
$O = \frac{p}{1 - p}$, $p = \frac{O}{1 + O}$
- Odds < 1 correspond to probabilities < 0.5 (and odds > 1 to probabilities > 0.5)
- $0 \le \text{odds} < \infty$

Example 1: Odds Ratio
Death sentence by race of defendant in 147 trials (counts recovered from the odds quoted below)

              Death sentence   No death sentence   Total
Black               28                 45             73
Nonblack            22                 52             74
Total               50                 97            147

- Odds of death sentence = 50/97 = 0.52
- For Blacks: O = 28/45 = 0.62
- For Nonblacks: O = 22/52 = 0.42
- Ratio of Black odds to Nonblack odds = 1.47. This is called the odds ratio.

Example 2: Odds Ratio
[slide content not preserved]

Logistic Regression
Odds ratios are directly related to the parameters of the logit (logistic regression) model.
Logistic regression is a statistical method that models binary (e.g., Yes/No; T/F; Success/Failure) data as a function of one or more explanatory variables.
We would like a model that predicts the probability of a success, i.e., P(Y=1), using a linear function.

Problem: Probabilities are bounded by 0 and 1, but linear functions are inherently unbounded.
Solution: Transform P(Y=1) = p to an odds, which removes the upper bound. If we then take the log of the odds, the lower bound is also removed.
Setting this result equal to a linear function of the explanatory variables gives us the logit model.

Logit or Logistic Regression Model (written here in standard notation)
$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$
where $p_i$ is the probability that $y_i = 1$. The expression on the left is called the logit or log odds.

Probability of success:
$p_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})}$

Odds ratio for each explanatory variable:
$OR_j = e^{\beta_j}$

Screening Tests
How do we evaluate the usefulness of such a test?
Diagnostics:
- sensitivity
- specificity
- false positive rate
- false negative rate
- predictive value positive
- predictive value negative

Interval Estimation
Statistics such as the sample mean, median, variance, etc., are called point estimates. They
- vary from sample to sample
- do not incorporate precision

Take as an example the sample mean:
$\bar{X}$ estimates $\mu$ (population mean)
Or the sample variance:
$S^2$ estimates $\sigma^2$ (population variance)

Recall Example 1, a one-sample t-test on the population mean. The test statistic was
$t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$
This can be rewritten to yield, with probability $1 - \alpha$:
$-t_{\alpha/2,\,n-1} \le \frac{\bar{X} - \mu}{S / \sqrt{n}} \le t_{\alpha/2,\,n-1}$
which can be rearranged to give a $(1-\alpha)100\%$ confidence interval for $\mu$:
$\bar{X} \pm t_{\alpha/2,\,n-1} \, \frac{S}{\sqrt{n}}$
Form: Estimate ± multiple of the standard error of the estimate.

Example 1: Standing SBP
Mean = 140.8, s.d. = 9.5, N = 12
95% CI for $\mu$:
$140.8 \pm 2.201 \, (9.5 / \sqrt{12}) = 140.8 \pm 6.036$, i.e., (134.8, 146.8)
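The arithmetic in the Standing SBP example can be checked with a short sketch. The function name `t_confidence_interval` is a hypothetical helper introduced here, and the critical value 2.201 is taken directly from the example rather than computed from the t distribution.

```python
import math


def t_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper, margin) for: estimate +/- t_crit * standard error."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin, margin


# Values from the Standing SBP example; 2.201 is the t multiplier quoted there
# for a 95% interval with N = 12 (11 degrees of freedom).
lower, upper, margin = t_confidence_interval(140.8, 9.5, 12, 2.201)

print(f"margin = {margin:.3f}")            # 6.036
print(f"95% CI = ({lower:.1f}, {upper:.1f})")  # (134.8, 146.8)
```

Passing the multiplier in as an argument keeps the sketch free of third-party dependencies; a different confidence level or sample size would need its own critical value from a t table.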