Bioconductor

Steffen Durinck, Robert Gentleman, Sandrine Dudoit
November 28, 2003, NETTAB, Bologna

Outline

  what is R
  what is Bioconductor
  packages
  getting and using Bioconductor

R

  R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S.

  What sorts of things is R good at?
    there are very many statistical algorithms
    there are very many machine learning algorithms
    visualization
    it is possible to write scripts that can be reused
    R is a real computer language

  R supports many data technologies: XML, database integration, SOAP
  R interacts with other languages: C, FORTRAN, Perl, Python, Java
  R has good visualization capabilities
  R has a very active
    development environment
  R is largely platform independent: Unix, Windows, OS X

Overview of the Bioconductor Project

Bioconductor

  Bioconductor is an open source and open development software project for the analysis of biomedical and genomic data.
  The project was started in the fall of 2001 and includes 23 core developers in the US, Europe, and Australia.
  R and the R package system are used to design and distribute software.
  Releases
    v 1.0: May 2nd, 2002, 15 packages.
    v 1.1: November 18th, 2002, 20 packages.
    v 1.2: May 28th, 2003, 30 packages.
    v 1.3: October 28th, 2003, 54 packages.
  ArrayAnalyzer
    Commercial port of Bioconductor packages in S-Plus.

Goals

  Provide access to powerful statistical and graphical methods for the analysis of genomic data.
  Facilitate the integration of biological metadata (GenBank, GO, LocusLink, PubMed) in the analysis of experimental data.
  Allow the rapid development of extensible, interoperable, and scalable software.
  Promote high-quality documentation and reproducible research.
  Provide training in computational and statistical methods.

Bioconductor packages

  Bioconductor software consists of R add-on packages.
  An R package is a structured collection of code (R, C, or other), documentation, and/or data for performing specific types of analyses.
  E.g. the affy, cluster, graph, and hexbin packages provide implementations of specialized statistical and graphical methods.

Bioconductor packages: Release 1.3, October 28th, 2003

  AnnBuilder: Bioconductor annotation data package builder
  Biobase: Base functions for Bioconductor
  DynDoc: Dynamic document tools
  MAGEML: handling MAGEML documents
  MeasurementError.cor: Measurement Error model estimate for correlation coefficient
  RBGL: Test interface to boost C++ graph lib
  ROC: utilities for ROC, with microarray focus
  RdbiPgSQL: PostgreSQL access
  Rdbi: Generic database methods
  Rgraphviz: Provides plotting capabilities for R graph objects
  Ruuid: Provides Universally Unique ID values
  SAGElyzer: A package that deals with SAGE libraries
  SNPtools: Rudimentary structures for SNP data
  affyPLM: Probe Level Models
  affy: Methods for Affymetrix Oligonucleotide Arrays
  affycomp: Graphics Toolbox for Assessment of Affymetrix Expression Measures
  affydata: Affymetrix Data for Demonstration Purpose
  annaffy: Annotation tools for Affymetrix biological metadata
  annotate: Annotation for microarrays
  ctc: Cluster and Tree Conversion
  daMA: Efficient design and analysis of factorial two-colour microarray data
  edd: expression density diagnostics
  externalVector: Vector objects for R with external storage
  factDesign: Factorial designed microarray experiment analysis
  gcrma: Background Adjustment Using Sequence Information
  genefilter: filter genes
  geneplotter: plot microarray data
  globaltest: Global Test
  gpls: Classification using generalized partial least squares
  graph: A package to handle graph data structures
  hexbin: Hexagonal Binning Routines
  limma: Linear Models for Microarray Data
  makecdfenv: CDF Environment Maker
  marrayClasses: Classes and methods for cDNA microarray data
  marrayInput: Data input for cDNA microarrays
  marrayNorm: Location and scale normalization for cDNA microarray data
  marrayPlots: Diagnostic plots for cDNA microarray data
  marrayTools: Miscellaneous functions for
    cDNA microarrays
  matchprobes: Tools for sequence matching of probes on arrays
  multtest: Multiple Testing Procedures
  ontoTools: graphs and sparse matrices for working with ontologies
  pamr: Pam: prediction analysis for microarrays
  reposTools: Repository tools for R
  rhdf5: An HDF5 interface for R
  siggenes: Significance and Empirical Bayes Analyses of Microarrays
  splicegear: splicegear
  tkWidgets: R based tk widgets
  vsn: Variance stabilization and calibration for microarray data
  widgetTools: Creates an
    interactive tcltk widgets

Microarray data analysis

  Pre-processing
    CEL, CDF: affy, vsn
    .gpr, .Spot, MAGEML: marray, limma, vsn
    output: exprSet
  Differential expression: edd, genefilter, limma, multtest, ROC + CRAN
  Graphs & networks: graph, RBGL, Rgraphviz
  Cluster analysis: CRAN class, cluster, MASS, mva
  Annotation: annotate, annaffy + metadata packages
  Prediction: CRAN class, e1071, ipred, LogitBoost, MASS, nnet, randomForest, rpart
  Graphics: geneplotter, hexbin + CRAN

marray packages

  Pre-processing two-color spotted array data: diagnostic plots, robust adaptive normalization (lowess, loess).
  maPlot + hexbin
  maBoxplot
  maImage

affy package

  Pre-processing oligonucleotide chip data: diagnostic plots, background correction, probe-level normalization, computation of expression measures.
  image
  plotDensity
  plotAffyRNADeg
  barplot.ProbeSet

annotate, annaffy, and AnnBuilder

  Assemble and process genomic annotation data from public repositories.
  Build annotation data packages or XML data documents.
  Associate experimental data in real time to biological metadata from web databases such as GenBank, GO, KEGG, LocusLink, and PubMed.
  Process and store query results: e.g., search PubMed abstracts.
  Generate HTML reports of analyses.

  Metadata package hgu95av2: mappings between different gene identifiers for the hgu95av2 chip.
    AffyID    41046_s_at
    ACCNUM    X95808
    LOCUSID   9203
    SYMBOL    ZNF261
    GENENAME  zinc finger protein 261
    MAP       Xq13.1
    PMID      10486218 9205841 8817323
    GO        GO:0003677 GO:0007275 GO:0016021
    + many other mappings

MAGEML package

marray packages (cDNA arrays)

siggenes package: SAM

multtest package

  Multiple hypothesis testing
  Control type I error rate by using e.g. Bonferroni method

heatmap

mva package: clustering

mva package: principal component analysis

Getting started

Installation

  Main R software: download from CRAN,
    use latest release, now 1.8.0.
  Bioconductor packages: download from Bioconductor, use latest release, now 1.3.
  Available for Linux/Unix, Windows, and Mac OS.

  After installing R, install Bioconductor packages using the getBioC install script. From R:
    source("http:/www…")
    getBioC()
  In general, R packages can be installed using the function install.packages.
  In Windows, can also use the "Packages" pull-down menus.

User interaction

  R command line
  Widgets: small-scale graphical user interfaces (GUI), providing point & click access for specific tasks. E.g. file browsing and selection for data input, basic analyses.

Widgets

  tkMIAME
  tkphenoData
  tkSampleNames
  Reading in phenoData

Documentation and help

  R manuals and tutorials: available from the R website or on-line in an R session.
  R on-line help system: detailed on-line documentation, available in text, HTML, PDF, and LaTeX formats.
    help.start()
    help(lm)
    ?hclust
    apropos("mean")
    example(hclust)
    demo()
    demo(image)

Short courses

  Bioconductor short courses: modular training segments on software and statistical methodology; lecture notes, computer labs, and course packages available on WWW for
    self-instruction.

Vignettes

  Bioconductor has adopted a new documentation paradigm, the vignette.
  A vignette is an executable document consisting of a collection of code chunks and documentation text chunks.
  Vignettes provide dynamic, integrated, and reproducible statistical documents that can be automatically updated if either data or analyses are changed.
  Each Bioconductor package contains at least one vignette, providing task-oriented descriptions of the package's functionality.

  vExplorer
  HowTos: task-oriented descriptions of package functionality.
  Executable documents consisting of
    documentation text and code chunks.
  Dynamic, integrated, and reproducible statistical documents.
  Can be used interactively: vExplorer.
  Generated using Sweave (tools package).

References

  R
    software (CRAN); documentation; newsletter: R News; mailing list.
  Bioconductor
    software, data, and documentation (vignettes); training materials from short courses; mailing list.
  Personal

Acknowledgements

  Robert Gentleman, Department of Biostatistical Science, Dana-Farber Cancer Institute, Boston
  Sandrine Dudoit, Division of Biostatistics, University of California, Berkeley
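
The "robust adaptive normalization (lowess, loess)" named on the marray slide can be illustrated with base R alone. This is a minimal sketch, not the marrayNorm API: the simulated data, the shape of the dye bias, and all variable names are assumptions made for illustration. It fits a robust loess curve of the log-ratio M against the average log-intensity A and subtracts it.

```r
# Simulated two-color data (assumed for illustration, not from the slides):
# A = average log-intensity, M = log-ratio with an intensity-dependent bias.
set.seed(1)
n_spots <- 2000
A <- runif(n_spots, min = 6, max = 14)
dye_bias <- 1.5 * exp(-0.3 * (A - 6))
M <- dye_bias + rnorm(n_spots, sd = 0.3)

# Robust local regression of M on A; family = "symmetric" down-weights
# outlying spots so they do not pull the fitted curve.
fit <- loess(M ~ A, span = 0.4, degree = 1, family = "symmetric")

# Normalized log-ratios: remove the fitted intensity-dependent trend.
M_norm <- M - fitted(fit)

summary(M)
summary(M_norm)
```

After normalization the log-ratios should be centered near zero across the whole intensity range. The span and degree used here are arbitrary choices for the sketch.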

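
The identifier mappings shown for the hgu95av2 metadata package can be pictured as lookup tables keyed by probe ID. The sketch below is a toy stand-in built with base R environments, assuming each mapping is held in one environment; the names `symbol_map` and `go_map` are hypothetical and are not objects from the real package. The values are the ones listed on the slide.

```r
# Toy stand-ins for two of the mappings on the slide (hypothetical names).
symbol_map <- new.env(hash = TRUE)
go_map     <- new.env(hash = TRUE)

# Populate with the single example probe from the slide.
assign("41046_s_at", "ZNF261", envir = symbol_map)
assign("41046_s_at",
       c("GO:0003677", "GO:0007275", "GO:0016021"),
       envir = go_map)

# Look up one probe ID in one mapping.
probe_symbol <- get("41046_s_at", envir = symbol_map)

# Look up several probe IDs at once; mget() returns a named list.
probe_go <- mget(c("41046_s_at"), envir = go_map)

# Check whether a probe ID is present before looking it up.
has_probe <- exists("no_such_probe", envir = symbol_map, inherits = FALSE)
```

Hashed environments give constant-time lookup by key, which suits tables with one entry per probe on a chip.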

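
The Bonferroni control of the type I error rate mentioned on the multtest slide can be sketched with base R. This example uses `t.test` and `p.adjust` from the stats package rather than the multtest API, and the expression matrix, group labels, and effect size are simulated assumptions.

```r
# Simulated expression matrix (assumed): 1000 genes, 5 arrays per group.
set.seed(42)
n_genes <- 1000
n_per_group <- 5
expr <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes)
group <- rep(c("A", "B"), each = n_per_group)

# Spike a shift into the first 20 genes so some are truly different.
expr[1:20, group == "B"] <- expr[1:20, group == "B"] + 4

# One two-sample t-test per gene (row).
p_raw <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"])$p.value
})

# Bonferroni adjustment: multiply each p-value by the number of tests,
# capped at 1. This controls the family-wise error rate.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Genes declared significant at a family-wise level of 0.05.
selected <- which(p_bonf < 0.05)
```

Bonferroni is conservative when many genes are tested; `p.adjust` also offers less strict methods such as "holm" and "BH" through the same `method` argument.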

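
The vignette idea, an executable document made of documentation text and code chunks, can be tried with Sweave on a tiny source file. This is a minimal sketch: the file name `demo.Rnw` and its contents are made up for illustration, and it assumes a current R installation, where `Sweave` and `Stangle` are provided by the utils package (the slides place Sweave in the tools package).

```r
# A minimal vignette-style source: documentation text plus one code chunk.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the simulated values is computed below.",
  "<<simulate>>=",
  "set.seed(1)",
  "x <- rnorm(10)",
  "mean(x)",
  "@",
  "\\end{document}"
)

# Work in a scratch directory so no files are left in the session directory.
workdir <- tempfile("vignette-demo")
dir.create(workdir)
oldwd <- setwd(workdir)
writeLines(rnw, "demo.Rnw")

# Weave: run the chunk and write demo.tex containing text, code, and output.
utils::Sweave("demo.Rnw", quiet = TRUE)

# Tangle: extract only the R code into demo.R.
utils::Stangle("demo.Rnw", quiet = TRUE)

tex_lines  <- readLines("demo.tex")
code_lines <- readLines("demo.R")
setwd(oldwd)
```

Re-running the weave step after changing the data or the code regenerates the output, which is what makes such documents reproducible.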