1-20-05

Back to basics: Probability, Conditional Probability and Independence

The probability of an outcome in an experiment is the proportion of times that this particular outcome would occur in a very large ("infinite") number of replicated experiments.
A probability distribution describes the probability of any outcome in an experiment.
If we have two different experiments, the probability of any combination of outcomes is the joint probability, and the joint probability distribution describes the probabilities of observing any combination of outcomes.
If the outcome of one experiment does not affect the probability distribution of the other, we say that the outcomes are independent.
An event is a set of one or more possible outcomes.

Let $N$ be the very large number of trials of an experiment, and $n_i$ be the number of times that the $i$th outcome ($o_i$), out of possibly infinitely many possible outcomes, has been observed.
$p_i = n_i/N$ is the probability of the $i$th outcome.
Properties of probabilities following from this definition:
1) $p_i \ge 0$
2) $p_i \le 1$
3) $\sum_i p_i = 1$
4) For any set of mutually exclusive events (events that don't have any outcomes in common), $p(e_1 \text{ OR } e_2 \text{ OR } \dots) = p(e_1) + p(e_2) + \dots$
5) $p(\text{NOT } e) = 1 - p(e)$ for any event $e$

Conditional Probabilities and Independence

Suppose you have a set of $N$ DNA sequences. Let the random variable $X$ denote the identity of the first nucleotide and the random variable $Y$ the identity of the second nucleotide.
The probability that a randomly selected DNA sequence from this set has the $xy$ dinucleotide at the beginning is equal to $P(X=x, Y=y)$.
Suppose now that you have randomly selected a DNA sequence from this set and looked at the first nucleotide but not the second. Question: what is the probability of a particular second nucleotide $y$ given that you know that the first nucleotide is $x^*$?
$P(Y=y \mid X=x^*) = P(X=x^*, Y=y) / P(X=x^*)$ is the conditional probability of $Y=y$ given that $X=x^*$.
$X$ and $Y$ are independent if $P(Y=y \mid X=x) = P(Y=y)$.
If $X$ and $Y$ are independent, then from the definition of the conditional probability, $P(X=x, Y=y) = P(X=x)\,P(Y=y)$.
The probability of two independent events is equal to the product of their probabilities.

Identifying Differentially Expressed Genes

Suppose we have $T$ genes which we measured under two experimental conditions ($W$ and $C$) in $n$ replicated experiments.
$t_i^*$ and $p_i$ are the t-statistic and the corresponding p-value for the $i$th gene, $i=1,\dots,T$.
The p-value is the probability of observing a value of the t-statistic as extreme as, or more extreme than, the one calculated from the data ($t^*$) under the "null distribution" (i.e. the distribution assuming that $\mu_{iW} = \mu_{iC}$).
The $i$th gene is "differentially expressed" if we can reject the $i$th null hypothesis $\mu_{iW} = \mu_{iC}$ and conclude that $\mu_{iW} \ne \mu_{iC}$ at a significance level $\alpha$ (i.e. if $p_i < \alpha$).
A Type I error is committed when a null hypothesis is falsely rejected.
A Type II error is committed when a null hypothesis is not rejected but it is false.
An experiment-wise Type I error is committed if any of a set of $T$ null hypotheses is falsely rejected.
If the significance level $\alpha$ is chosen prior to conducting the experiment, we know that by following the hypothesis testing procedure, the probability of falsely concluding that any one gene is differentially expressed (i.e. falsely rejecting the null hypothesis) is equal to $\alpha$.
What is the probability of committing a family-wise Type I error? Assuming that all null hypotheses are true, what is the probability that we would reject at least one of them?

Experiment-wise error rate

Assuming that the individual tests of hypothesis are independent and all null hypotheses are true:
$p(\text{Not committing the experiment-wise error}) = p(\text{Not rejecting } H_{01} \text{ AND Not rejecting } H_{02} \text{ AND } \dots \text{ AND Not rejecting } H_{0T}) = (1-\alpha)(1-\alpha)\cdots(1-\alpha) = (1-\alpha)^T$
$p(\text{Committing the experiment-wise error}) = 1 - (1-\alpha)^T$
Sidak's adjustment: $\alpha_a = 1 - (1-\alpha)^{1/T}$
$1 - (1-\alpha_a)^T = 1 - \left(1 - \left[1 - (1-\alpha)^{1/T}\right]\right)^T = 1 - \left((1-\alpha)^{1/T}\right)^T = 1 - (1-\alpha) = \alpha$

Another adjustment:
$p(\text{Committing the experiment-wise error}) = p(\text{Rejecting } H_{01} \text{ OR Rejecting } H_{02} \text{ OR } \dots \text{ OR Rejecting } H_{0T}) \le T\alpha$
(Homework: How does that follow from the probability properties?)
Bonferroni adjustment: $\alpha_b = \alpha/T$
Generally $\alpha_b < \alpha_a$, so the Bonferroni adjustment is more conservative.
Sidak's adjustment assumes independence, which is likely not to be satisfied. If tests are not independent, Sidak's adjustment is most likely conservative.

Adjusting p-value

Individual hypotheses: $H_{0i}: \mu_{iW} = \mu_{iC}$, with $p_i = p(t_{n-1} > t_i^*)$, $i=1,\dots,T$
"Composite" hypothesis: $H_0: \mu_{iW} = \mu_{iC}$ for all $i=1,\dots,T$, with $p = \min\{p_i,\ i=1,\dots,T\}$
The composite null hypothesis is rejected if even a single individual hypothesis is rejected.
Consequently, the p-value for the composite hypothesis is equal to the minimum of the individual p-values.
If all tests have the same reference distribution, this is equivalent to $p = p(t_{n-1} > t^*_{\max})$.
We can consider a p-value to be itself the outcome of the experiment.
What is the "null" probability distribution of the p-value for individual tests of hypothesis?
What is the "null" probability distribution for the composite p-value?

Null distribution of the p-value

Given that the null hypothesis is true, the probability of observing a p-value smaller than a fixed number $a$ between 0 and 1 is:
$p(p_i < a) = a$
The general result in this respect is that $F(X) \sim \mathrm{UNIF}(0,1)$ whenever $F$ is the (continuous) cumulative distribution function of $X$.

Null distribution of the composite p-value

$p(p < a) = 1 - p(p_1 \ge a \text{ AND } p_2 \ge a \text{ AND } \dots \text{ AND } p_T \ge a)$
Assuming independence between different tests:
$= 1 - p(p_1 \ge a)\,p(p_2 \ge a)\cdots p(p_T \ge a)$
$= 1 - [1 - p(p_1 < a)]\,[1 - p(p_2 < a)]\cdots[1 - p(p_T < a)]$
$= 1 - [1-a]^T$
Instead of adjusting the significance level, we can adjust all p-values: $p_i^a = 1 - [1-p_i]^T$

[Figure: the null distribution of the composite p-value for 1, 10 and 30000 tests]

Seems simple

Applying a conservative p-value adjustment will take care of false positives.
How about false negatives?
A Type II error arises when we fail to reject $H_0$ although it is false.
$\text{Power} = p(\text{Rejecting } H_0 \text{ when } \mu_W - \mu_C \ne 0) = p(t^* > t_\alpha \mid \mu_W - \mu_C \ne 0) = p(p < \alpha \mid \mu_W - \mu_C \ne 0)$
Power depends on various things ($\alpha$, df, $\sigma$, $\mu_W - \mu_C$).
The probability distribution of $t^*$ when $\mu_W - \mu_C \ne 0$ is non-central t.

Effects of multiple comparison adjustments on power

http://homepages.uc.edu/%7Emedvedm/documents/Sample%20Size%20for%20arrays%20experiments.pdf

[Figure legend: $t_4$: green dashed line; $t_9$: red dashed line; $t_{4,\,nc=6.1}$: green solid line; $t_{9,\,nc=8.6}$: red solid line. Parameters: $T=5000$, $\alpha=0.05$, $\alpha_a=0.0001$, $\mu_W - \mu_C = 10$, $\sigma = 1.5$. Values marked on the plot: 27.6, 8.8]

This is not good enough

Traditional statistical approaches to multiple comparison adjustments, which strictly control the experiment-wise error rate, are not optimal.
We need a balance between the false positive and false negative rates.
Benjamini Y and Hochberg Y (1995) Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society B 57:289-300.
Instead of controlling the probability of generating a single false positive, we control the proportion of false positives.
The consequence is that some of the implicated genes are likely to be false positives.

False Discovery Rate

$\mathrm{FDR} = E(V/R)$, where $V$ is the number of falsely rejected null hypotheses and $R$ is the total number of rejected hypotheses.
If all null hypotheses are true (composite null), this is equivalent to the family-wise error rate.

Effects
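
The experiment-wise error rate and the Sidak and Bonferroni adjustments discussed earlier can be checked numerically. The following is a minimal Python sketch, not taken from the slides: the function names are illustrative, and it assumes that all null hypotheses are true and that the tests are independent, so that each p-value is uniform on (0, 1).

```python
import math
import random


def familywise_error_rate(alpha, n_tests):
    """1 - (1 - alpha)^T: chance of at least one false rejection
    among T independent tests when every null hypothesis is true."""
    return -math.expm1(n_tests * math.log1p(-alpha))


def sidak_level(alpha, n_tests):
    """Sidak per-test level alpha_a = 1 - (1 - alpha)^(1/T)."""
    return -math.expm1(math.log1p(-alpha) / n_tests)


def bonferroni_level(alpha, n_tests):
    """Bonferroni per-test level alpha_b = alpha / T."""
    return alpha / n_tests


def sidak_adjusted_p(p_values):
    """Adjusted p-values p_i^a = 1 - (1 - p_i)^T, with T = len(p_values)."""
    n_tests = len(p_values)
    return [1.0 - (1.0 - p) ** n_tests for p in p_values]


def simulate_min_p(a, n_tests, n_rep=20000, seed=0):
    """Monte Carlo estimate of p(min_i p_i < a) under the composite null,
    drawing each p-value independently from UNIF(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        if min(rng.random() for _ in range(n_tests)) < a:
            hits += 1
    return hits / n_rep


if __name__ == "__main__":
    alpha, n_tests = 0.05, 10
    print("unadjusted experiment-wise error:", familywise_error_rate(alpha, n_tests))
    print("simulated experiment-wise error: ", simulate_min_p(alpha, n_tests))
    a_sidak = sidak_level(alpha, n_tests)
    print("Sidak level:", a_sidak, "-> error", familywise_error_rate(a_sidak, n_tests))
    print("Bonferroni level:", bonferroni_level(alpha, n_tests))
    print("adjusted p-values:", sidak_adjusted_p([0.001, 0.02, 0.3]))
```

Using `math.log1p` and `math.expm1` rather than direct powers is a design choice to limit rounding error when $T$ is in the thousands and the per-test level is very small.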