1. Analysis of DNA Damage and Repair in Colonic Crypts
Raymond J. Carroll, Texas A&M University
http://stat.tamu.edu/carroll
carrollstat.tamu.edu
Postdoctoral Training Program: http://stat.tamu.edu/B3NC

2. Acknowledgments
- Jeffrey Morris, M.D. Anderson (lead author)
- Naisyin Wang (adducts and structure)
- Marina Vannucci, Texas A&M (wavelets)
- Phil Brown, University of Canterbury (wavelets)
- Joanne Lupton, Biology of Nutrition at Texas A&M (problems and data!)

3. Outline
- Introduction
- Colon Carcinogenesis Studies
- Hierarchical Functional Model
- DNA Damage: regional correlations
- Crypt Cell Architecture: modeling where the cells are located
- DNA Repair: Wavelet-based Estimation of Hierarchical Functions
- Conclusions

4. Some Background
- General goal: study how diet affects colon carcinogenesis.
- Model: carcinogen-induced colon cancer in rats.
- Early carcinogenesis: DNA damage to cells, and associated repair and cell death (apoptosis)
- If not repaired or removed → mutation → colon cancer

5. Some Background
- We are especially interested in anatomical effects
- Regions of the colon, e.g., proximal (front) and distal (back)
- There are some major differences in early carcinogenesis between these two regions
- Localized phenomena: cell locations
- Apoptosis and DNA adducts differ by location in colonic crypts

6. Colon Sliced and Laid Out
[Figure panels: Aberrant Colon Crypts; Normal Colon Crypts]

7. Architecture of Colon Crypts: Cross-sectional View
- Stem cells: mother cells near the bottom
- Depth in crypt ~ age of cells
- Suggests importance of depth
[Figure labels: Relative Cell Position (0 = bottom, 1 = top); crypts; lumen]

8. Architecture of Colon Crypt: Expanded View
- The cells are more easily visible here
- Note that the cells seem smaller at the crypt bottom

9. Architecture of Colon Crypt
- The general idea is to slice the colon crypt
- The cells along the left wall are assayed

10. Colon Carcinogenesis Studies
- Rats are fed different diets, exposed to carcinogen (and/or radiation), then euthanized.
- DNA adducts, DNA repair, and apoptosis are measured through imaging experiments
- Hierarchical structure of data: diet groups - rats - crypts - cells/pixels
- Hierarchical longitudinal (in cell depth) data

11. Coordinated Response
- Rats were exposed to a potent carcinogen (AOM)
- At both the proximal and distal regions of the colon, 20 crypts were assayed
- The rat-level function is $g_{dr}(t)$
- For each cell within each crypt, the level of DNA damage was assessed by measuring the DNA adduct levels
- Question: how is DNA damage related in the proximal and distal regions, across rats?
- We call this coordinated response

12. Coordinated Response as Correlation
- We are interested in the “correlation” of the DNA damage in the proximal region with that of the distal region
- Are different regions of the colon responding (effectively) independently to carcinogen exposure?
- This sort of interrelationship of response is what is being studied in our group.
- It is not cell signaling in the classic sense
- We will have data on this in the near future

13. Coordinated Response
- Correlation in the usual sense is
not possible
- Let $Y(t)$ = DNA adduct in a proximal cell, measured by immunohistochemical staining intensity at cell depth $t$
- Let $Z(t)$ = DNA adduct in a distal cell at cell depth $t$
- We cannot calculate correlation$(Y, Z)(t)$ in the usual way: the same cell cannot be in both locations
- Coordinated response then has to be measured at a higher level

14. Coordinated Response: Hierarchical Functional Model
- Let $d$ = diet group
- Let $r$ = rat
- Let $c$ = crypt
- Let $t = t_{drc}$ = cell position
- Let $Y_{drc}(t)$ = adduct level in the proximal region
- The diet-level function is $g_d(t)$
- Our aim: estimate the correlation between proximal and distal regions, as a function of cell depth, at the rat level

15. Coordinated Response: Average then Smooth
- If cell depths were identical for each crypt, we could solve this by “average then smooth”
- That is, average over all crypts at any given depth, then estimate the correlation as a function of depth
- The estimated correlation would of course account for the averaging over a finite number of crypts
- Problem: data are not of this structure
  - Cell locations vary from crypt to crypt
  - Number of cells varies from crypt to crypt

16. Coordinated Response: Smooth then Average
- Instead, we smoothed crypts via nonparametric regression
- Then average the smooth fits over the crypts (on a grid of depths)
- Then compute the correlation as before
- We actually fit REML to the fitted functions at the crypt level
- Problem: is there any effect due to the initial smooth?

17. Coordinated Response: Asymptotics
- General
theory available: kernel regression
- Allows explicit calculations
- Can we estimate the correlation function just as well as if the crypt-level functions were known?
- Complex higher-order expansions necessary
- The asymptotic theory is for large numbers of rats, crypts, and cells

18. Coordinated Response: Asymptotics
- Possibility #1: use standard methods at the crypt level
- Optimal at the crypt level
- Double-smoothing phenomenon (at crypt, then across crypts)
- Effect of smoothing does not disappear

19. Coordinated Response: Asymptotics
- Possibility #2: under-smoothing at crypt level
- Known to work for other double-smoothing problems
- Is optimal for this problem
- Explicit simple adjustments for under-smoothing derived
- Divide optimal bandwidth by the 1/5th power of the number of crypts
- Result: no asymptotic effect due to the initial smoothing

20. Coordinated Response: Results
- Simulations: we found that this simple bit of under-smoothing works well.
- Data: extraordinary lack of sensitivity to the smoothing parameter; other smoothers give the same basic answers
- In principle:
  - Regular smooth then average: sub-optimal
  - Undersmooth then average: better

21. Coordinated Response: Asymptotics
- Alternatives:
  - Random coefficient polynomial models: REML/Bayes
  - Hierarchical regression splines
- Major point: the method should not matter too much
- Estimation of crypt-level functions has no asymptotic effect

22. Results: Correlation Functions for Proximal and Distal Regions
- The negative correlation in the corn oil diet is unexpected
- May suggest localization of damage: consistent with damage in the proximal or distal regions, but not both

23. Results: Correlation Functions for Proximal and Distal Regions
- For basic reasons, as well as robustness reasons, we were led to study whether this was an artifact of the use of relative as opposed to actual cell depth

24. Modeling Cell Crypt Architecture
- Most analyses of cell depth measure cells on a relative basis
- Thus, if there are 11 cells, the depths are listed as 0/10, 1/10, ..., 10/10
- This is not the same as actual depth
- Indeed, it effectively suggests that cells are uniformly
spaced along the crypt wall

25. Cell Crypt Architecture: Two Questions
- First, we are interested in the architecture itself: are the cells uniformly distributed within a crypt?
- It is also extremely tedious to measure actual cell depth
- Almost every existing statistical analysis uses nominal cell depth, i.e., cell $i$ of $n$ has nominal depth $(i-1)/(n-1)$
- Are downstream analyses affected by the use of nominal instead of actual cell depth?

26. Cell Crypt Architecture: Two Questions
- Downstream analyses: affected by the use of nominal instead of actual cell depth?
- Let $X$ = true cell depth $\sim$ Beta(0.5, 1.0), with $n = 30$
- Let $W$ = nominal cell depth
- Let $E(Y \mid X) = X$
- What is $E(Y \mid W)$?
- Plot order statistics of $X$ versus $W$

27. Cell Crypt Architecture
- We have data on 30 rats, 20 colonic crypts per rat, 45 cells per crypt
- For each rat, 3 crypts were analyzed to measure their actual cell positions
- Thus, we have incomplete data: true cell positions are missing on 17 crypts per rat
- Question: is the negative proximal-distal correlation in the corn-oil group a consequence of measuring only nominal cell position?

28. Cell Crypt Architecture: Order Statistics
- The actual cell positions are on [0, 1]
- We model the true cell positions for each crypt as the order statistics from Beta(a, b)
- We fit the crypt-level functions via parametric cubic random effects models
- General problem: data missing as a group but subject to ordering constraints
- The order statistic model greatly speeds up computation

29. Cell Crypt Architecture
- MCMC approach: various tricks to speed up, especially, the generation of the missing cell positions (600 per animal)
- Missing cell positions can be generated simultaneously at the crypt level
  - Simpler than cell-by-cell generation
  - Faster than cell-by-cell generation
- If generation were cell-by-cell, the order constraints would have to be accounted for

30. Cell Crypt Architecture: Results
- Proximal architecture is almost exactly U[0, 1]
- Distal architecture is clearly not uniform: Beta(a = 0.8, b = 1.0)
- Here is the posterior mean density
- The correlation analysis was virtually unchanged
- It appears that measuring exact cell positions is not necessary

31. Cell DNA Damage and Repair
- The same data structure occurs for DNA repair enzyme data as for DNA damage (adduct) data
- It is clearly of great interest to understand the relationship between the two, also as a function of cell depth
- Repair is measured on a
pixel-by-pixel basis, averaging across the crypt
- A problem arises: the DNA repair data are not nearly so smooth as the adduct data

32. DNA Adduct (Damage) Data: 4 crypts with Regression Spline Fits
[Figure]

33. DNA Repair Data Plots
[Figure: DNA Repair Enzyme for Selected Crypts]

34. Cell DNA Repair
- The irregularity of the DNA repair data suggests that new techniques are necessary
- We are going to use wavelet methods around an MCMC calculator
- The multi-level hierarchical data structure makes this a new problem
- The images are pixel-by-pixel:
  - We “connected the dots”
  - Split into 256 ($2^8$) “observations”
  - Forces regularly
spaced data

35. Hierarchical Functional Model
- 2-level HF model: [model equations not recovered]

36. Wavelets & Wavelet Regression
- Data space model: $y = f(t) + e$
- $t$ = equally spaced grid, length $n = 2^J$, on (0, 1)
- Here $e \sim \mathrm{MVN}(0, \sigma^2 I)$
- In wavelet space: $d = Wy = \theta + e^*$
  - $d$ = empirical wavelet coefficients
  - $\theta$ = true wavelet coefficients
- By orthogonality, $e^* \sim \mathrm{MVN}(0, \sigma^2 I)$

37. Overview of Wavelet Method
- Convert data $Y_{abc}$ to wavelet space $d_{abc}$
  - Involves 1 DWT for each crypt
- Fit hierarchical model in wavelet space to obtain
  - Posterior distribution of true wavelet coefficients $\theta_d$ corresponding to $g_d(t)$
  - Variance component estimates to assess relative variability
- Use IDWT to obtain posterior distribution of $g_d(t)$ for estimation and inference

38. Wavelet Space Model
- Wavelets: families of orthonormal basis functions
- $d_{drc} = W y_{drc}$
[Figure labels: Discrete Wavelet Transform; Daubechies Basis Function]

39. Shrinkage Prior
- Prior on $\theta$ is a 0-normal mixture
- Nonlinear shrinkage: denoises data
- Regularization parameters
- Hierarchical model fit using MCMC

40. Some General Comments
- We focused on marginal (diet-level) analyses
- The marginalization allowed for efficient MCMC
  - Some fairly difficult calculations are required
  - Much more efficient than brute force
- Enables analysis of subsampling units, e.g., individual rats
  - This we have not yet done in our data
- Enables assessment of variance components

41. Summary
- Method to fit hierarchical longitudinal data
- Nonparametrically estimate mean profiles for:
  - Treatments
  - Individuals
  - Subsampling units
- Estimates of relative variability at hierarchical levels
- We find that 90% of the variability is from crypt to crypt
- Do lots of crypts!

42. Results: DNA Repair Estimates & 90% posterior bounds by diet/time
[Figure panels: Fish Oil, Corn Oil; times 0 h, 3 h, 6 h, 9 h, 12 h]

43. Conclusion
- Cell-based colon carcinogenesis studies
  - Hierarchical longitudinal/functional data
  - Rich in information - challenging to extract
- Methods developed
  - Kernel methods for longitudinal correlations
  - Method for missing data with order constraints
  - Wavelet regression methods for longitudinal hierarchical data
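
Sketch: nominal versus actual cell depth (slide 26)

The question posed on slide 26 can be explored with a small simulation. This is a minimal sketch, assuming Python with NumPy; it is not the analysis used in the talk. It draws crypts of n = 30 cells whose true depths X are the order statistics of Beta(0.5, 1.0), assigns cell i the nominal depth W = (i-1)/(n-1), and averages the true depth of cell i over crypts. Because E(Y|X) = X, that average estimates E(Y|W). The function name, the seed, and the number of simulated crypts are illustrative assumptions.

```python
import numpy as np


def nominal_vs_actual(n_cells=30, a=0.5, b=1.0, n_crypts=20000, seed=0):
    """Compare nominal depth W with the mean true depth E(X | W).

    True depths in each simulated crypt are the sorted draws (order
    statistics) from Beta(a, b); cell i of n gets nominal depth
    (i - 1) / (n - 1). All defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # One row per simulated crypt; sorting makes column i the i-th order statistic.
    x = np.sort(rng.beta(a, b, size=(n_crypts, n_cells)), axis=1)
    # Nominal depths 0, 1/(n-1), ..., 1.
    w = np.arange(n_cells) / (n_cells - 1)
    # Monte Carlo estimate of E(X | W), which equals E(Y | W) when E(Y | X) = X.
    e_x_given_w = x.mean(axis=0)
    return w, e_x_given_w


if __name__ == "__main__":
    w, ey = nominal_vs_actual()
    for i in (0, 7, 14, 21, 29):
        print(f"cell {i + 1:2d}: nominal W = {w[i]:.3f}, E(Y|W) ~ {ey[i]:.3f}")
```

Under Beta(0.5, 1.0) the true depths concentrate near the crypt bottom, so for every cell above the first the estimated E(Y|W) falls below W. Plotting the returned arrays against each other gives the "order statistics of X versus W" plot that the slide asks for.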