Analysis of Exon Arrays
Slides provided by Dr. Yi Xing

Outline
Design of exon arrays
Background correction
Probe selection, expression index computation
Evaluation of gene-level index
Exon-level analysis
Conclusion

1. Basic design of Exon Array

Exon Array Probesets Classified by Annotational Confidence
Core probesets target exons supported by RefSeq mRNAs.
Extended probesets target exons supported by ESTs or partial mRNAs.
Full probesets target exons supported purely by computational predictions.

2. Background modeling: predict non-specific hybridization from probe sequence
Wu and Irizarry (2005) use probe effect modeling to obtain a more accurate expression index on 3' arrays.
Johnson et al. (2006) use probe effect modeling to detect ChIP peaks for tiling arrays.
Kapur et al. (2007) use probe effect modeling to correct background for exon arrays.

Background modeling in Exon Arrays
$$\log B_i = \alpha\, n_{iT} + \sum_{j}\sum_{k} \beta_{jk} I_{ijk} + \sum_{k} \gamma_k n_{ik}^2 + \varepsilon_i$$
where $B_i$ is the background intensity of probe $i$, $n_{ik}$ is the number of occurrences of nucleotide $k$ in the probe, $I_{ijk}$ indicates that position $j$ of the probe holds nucleotide $k$, and $\varepsilon_i$ is the error term.
Estimate parameters from either:
  Background probes (n = 37,687)
  Full probes (n = 400,000)
Test on a different array (with a single scaling constant).

Full probes useful for modeling background

Promoter array may be used to train exon array background

Preliminary conclusions
Background
correction based on background probe effect modeling can greatly reduce background noise.
Model parameters are similar for different ChIP-DNA samples, or for different RNA samples, but not across DNA and RNA.
The data may be rich enough to support learning of more complex models with even better predictive power.

3. Probe selection and expression index computation

Gene-level visualization: Heatmap of Intensities
[Figure: heatmap of probe intensities (probes by samples, core probes marked) for major histocompatibility complex, class II, DM beta]

Heatmap of Pairwise Correlations
[Figure: probe-by-probe correlation heatmap for HLA_DMB]

First observations
Heatmap of correlations is a useful complement to heatmap of intensities.
Core probes have higher intensity than extended and full probes.

Probe selection for gene-level expression
Most full and extended probes are not suitable for estimating gene-level expression.
  Probes may target false exon predictions.
Even some core probes may not be suitable.
  Bad probes: low affinity or cross-hybridizing
  Probes targeting differentially spliced exons
Probe selection
  Select a suitably large subset of good probes targeting constitutively spliced regions of the gene.
  Use only the selected probes to estimate gene expression.
[Figure: gene diagram with constitutive, alternatively spliced, and constitutive regions]

Heatmap of CD44 core probes (ordered by genomic location)

ataxin 2-binding protein 1

These examples motivated our Probe Selection Strategy

Probe selection procedure (on core probes)
Perform hierarchical clustering of the probe intensities across 11 tissues (33 samples), and cut the tree at various heights (0.1, 0.2, ..., 1.0).
Choose a height cutoff to strike a balance between the size of the largest sub-group and the correlation within the sub-group.
Iteratively remove probes if they do not correlate well with the current expression index.
At least 11 core probes need to be
chosen. If the total number of core probes is less than 11 for the entire transcript cluster, we skip probe selection.
(Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006)

Hierarchical Clustering of CD44 Core Probes (distance = 1 - corr, average linkage)
h = 0.1: 44 (42%) probes

Computation of gene level expression index
Background correction
Normalization (linear scaling or none)
Probe selection
Computation of Overall Gene Expression Indexes (dChip type model)
Gene level quantile normalization (optional)
GeneBASE: Gene-level Background Adjusted Selected probe Expression
Download: http://biogibbs.stanford.edu/kkapur/GeneBASE/
Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
Kapur, Xing, Wong, Genome Biology, 8:R82, 2007

In most cases selection does not affect fold changes

spectrin, beta, non-erythrocytic 4 (SPTBN4)
Sometimes, selection changes the fold-change significantly.
BetaIV spectrins are essential for membrane stability and the molecular organization of nodes of Ranvier along neuronal axons.

4. Evaluations of gene level index

1st evaluation: tissue fold change
[Figure: before selection vs. after selection. Fold-change of liver over muscle, in 438 genes with high fold-change in 3' expression array data]

Probe selection allows more sensitive detection of fold-changes
[Figure: zoom-in, before selection vs. after selection]
[Figure: before selection vs. after selection, with zoom-in. FC of muscle over liver, in 500 genes detected to be overexpressed in muscle over liver by 3' array]

2nd evaluation: Presence/Absence calls
Use SAGE data to construct a gold standard:
  Presence in a tissue if 100 tags per million
  Absence if no tags in the given tissue but 100 tpm in at least one other tissue
Exon array A/P calls: use the sum of z-scores for core probes (the z-score is computed based on the background model).
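A minimal sketch of the z-score based A/P call described above, assuming the background model supplies a predicted log background for each probe and a residual standard deviation for the array. The function names, the exact z-score definition, and the threshold are illustrative assumptions; they are not taken from the slides or from GeneBASE.

```python
def probe_z_scores(log_intensities, predicted_log_background, residual_sd):
    """Standardize each probe's log intensity against its predicted background.

    Assumption: z = (observed log intensity - predicted log background) / sd,
    where sd is the residual standard deviation of the background model.
    """
    if residual_sd <= 0:
        raise ValueError("residual_sd must be positive")
    if len(log_intensities) != len(predicted_log_background):
        raise ValueError("inputs must have the same length")
    return [
        (observed - background) / residual_sd
        for observed, background in zip(log_intensities, predicted_log_background)
    ]


def presence_score(log_intensities, predicted_log_background, residual_sd):
    """Gene-level score: the sum of z-scores over the gene's core probes."""
    return sum(probe_z_scores(log_intensities, predicted_log_background, residual_sd))


def call_present(score, threshold):
    """Call the gene present when its score exceeds a (hypothetical) threshold."""
    return score > threshold
```

Sweeping `threshold` over a range of values and comparing the resulting calls with the SAGE-based gold standard is one way to trace out an ROC curve.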
[Figure: ROC curves, panels (a)-(c): Cerebellum, Heart, Kidney]
ROC curves show that background correction improves A/P calls.
Red: Exon, z-score call
Blue: Exon, Affy call
Brown: 3' Affy call, max probeset
Purple: 3' Affy call, min probeset

3rd evaluation: Cross-species conservation
3' and Exon array data for six adult tissues in both human and mouse
Expression computed for about 10,000 human-mouse ortholog pairs

[Figure: panels for 3' arrays and Exon arrays]
Similarity of gene expression profiles in six human tissues and six corresponding mouse tissues. For each ortholog pair we calculated the Pearson correlation coefficient (PCC) of expression
indexes across six tissues (solid line). We also permuted ortholog relationships and calculated the PCC for random human-mouse gene pairs (dashed line).
(Xing Y, Ouyang Z, Kapur K, Scott MP, Wong WH. Mol Biol Evol. April 2007)

[Figure: 3' arrays correlations, Exon arrays correlations, 3' arrays scatter plot, Exon arrays scatter plot]
Exon arrays also reveal conservation of absolute abundance of transcripts in individual tissues!

4th evaluation: q-PCR
On log scale, the exon array fold-change estimate is correlated with the qPCR fold change (corr = 0.9).

5. Issues in exon level analysis

Challenges
The experimental validation rates in several published exon array studies are highly variable.
  Gardina et al. BMC Genomics 7:325, 21%
  Kwan et al. Genome Res 17:1210, 45%
  Hung et al. RNA 14:284, 22%-56%
  Clark et al. Genome Biol 8:R64, 84%
Most exons are targeted by no more than four probes.
No probes for splice junctions.
Noise in observed probe intensities (due to background, cross-hybridization) can make the inferred splicing pattern unreliable.

MADS: Microarray Analysis of Differential Splicing
1. Correction for background (non-specific hybridization)
2. Probe selection and expression index calculation
3. Correction for cross-hybridization
4. Detection of differential splicing
References:
1. Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
2. Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
3. Xing et al., RNA, 14(8):1470-1479, 2008

$$\text{Splicing Index} = \frac{\text{Corrected Probe Intensity}}{\text{Estimated Gene Expression Level}}$$

Analysis
of "gold-standard" alternative splicing data via PTB knockdown experiments
Our "gold standard": a list of exons with pre-determined inclusion/exclusion profiles in response to PTB depletion (Boutz P, et al. Genes Dev. 2007, 21(13):1636-52).
We used shRNA to knock down PTB, generated Exon array data, and analyzed data on "gold-standard" exons.
MADS detected all exons with large changes (25%) in transcript inclusion levels, and offered improvement over Affymetrix's analysis procedure.
Collaboration with Douglas Black (UCLA)

MADS sensitivity correlates with the magnitude of change in exon inclusion levels of "gold-standard" exons
(Xing et al., RNA, 14(8):1470-1479, 2008)

Exon array detection of novel PTB-dependent splicing events
[Figure: control vs. shRNA knockdown of splicing repressor PTB]
Detection of alternative 3'-UTR and poly-A sites of Ncam1
30 differentially spliced exons were tested; 27 were validated. Validation rate: 27/30 = 90%

Cross-Hybridization
Probes are designed to hybridize to their target transcripts.
Often probes have 0, 1, 2, or 3 base-pair mismatches to non-target transcripts.
Cross-hyb seriously complicates exon-level analysis.

Mapping
mismatches to probes
6,000,000 probes, each 25 bp long
3,000,000,000 bp genome sequence
For a 1-bp mismatch, a naive search needs O(6M x 3G x 25) comparisons, which amounts to years of CPU time.
Fast matching algorithm (by Hui Jiang) makes this feasible in hours.

Distribution of Number of Cross-hyb Transcripts
[Figure: panels for Full Probes and Core Probes]

Correction of sequence-specific cross-hybridization to off-target transcripts
[Figure: PAN3]

Conclusion
Gene level index is accurate and reflects absolute abundance.
We show that sequence-specific modeling of microarray noise (background and cross-hybridization) improves the precision of exon-level analysis of exon array data.
Overall, our data demonstrate that exon array design is an effective approach to study gene expression and differential splicing.
Development of future "probe rich" exon arrays, with increased probe density on exons and inclusion of splice junction probes, will offer more powerful tools for global or targeted analysis of alternative splicing.
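
As a small illustration of the mismatch-mapping step ("Mapping mismatches to probes"), the sketch below shows one standard way to avoid a full scan: the pigeonhole seed-and-verify idea. If a probe matches a location with at most m mismatches, then at least one of its m + 1 non-overlapping pieces matches exactly, so only locations found through an exact seed lookup need to be checked. This is not the algorithm by Hui Jiang mentioned in the slides, whose details are not given here; all function names are hypothetical, and the index is rebuilt per call only to keep the example short.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(probe, reference, max_mismatches=1):
    """Return sorted (position, mismatches) pairs with mismatches <= max_mismatches.

    Assumes len(probe) > max_mismatches so that every seed is non-empty.
    """
    n = len(probe)
    pieces = max_mismatches + 1
    # Split the probe into `pieces` consecutive, non-overlapping seeds.
    bounds = [i * n // pieces for i in range(pieces + 1)]
    indexes = {}  # one k-mer index per distinct seed length
    hits = {}
    for i in range(pieces):
        start, end = bounds[i], bounds[i + 1]
        k = end - start
        if k not in indexes:
            indexes[k] = build_kmer_index(reference, k)
        for seed_pos in indexes[k].get(probe[start:end], ()):
            candidate = seed_pos - start  # implied start of the whole probe
            if candidate < 0 or candidate + n > len(reference):
                continue
            if candidate in hits:
                continue
            # Verify the full probe at the candidate location.
            distance = hamming(probe, reference[candidate:candidate + n])
            if distance <= max_mismatches:
                hits[candidate] = distance
    return sorted(hits.items())
```

In practice the reference index would be built once and reused across all probes; a real genome-scale search would also need a more compact index than a Python dictionary.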