Canonical Correlation Analysis: An overview with application to learning methods
By David R. Hardoon, Sandor Szedmak, John Shawe-Taylor
School of Electronics and Computer Science, University of Southampton
Published in Neural Computation, 2004
Presented by: Shankar Bhargav

Canonical Correlation Analysis
- Measures the linear relationship between two multidimensional variables
- Finds two sets of basis vectors such that the correlation between the projections of the variables onto these basis vectors is maximized
- Determines the correlation coefficients

Canonical Correlation Analysis
- More than one canonical correlation will be found, each corresponding to a different set of basis vectors (canonical variates)
- Correlations between successively extracted canonical variates become smaller and smaller
- Correlation coefficients: the proportion of correlation between the canonical variates accounted for by a particular variable

Differences with Correlation
- Not dependent on the coordinate system of the variables
- Finds the directions that yield maximum correlation

Find basis vectors w_x, w_y for two sets of variables x, y such that the correlation between the projections of the variables onto these basis vectors, S_x = x^T w_x and S_y = y^T w_y, is maximized:

\[ \rho = \frac{E[S_x S_y]}{\sqrt{E[S_x^2]\,E[S_y^2]}} = \frac{E[(x^T w_x)(y^T w_y)]}{\sqrt{E[(x^T w_x)(x^T w_x)]\,E[(y^T w_y)(y^T w_y)]}} \]

\[ \rho = \max_{w_x, w_y} \frac{E[w_x^T x\, y^T w_y]}{\sqrt{E[w_x^T x\, x^T w_x]\,E[w_y^T y\, y^T w_y]}} = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \; w_y^T C_{yy} w_y}} \]

Solving this with the constraints w_x^T C_{xx} w_x = 1 and w_y^T C_{yy} w_y = 1 gives:

\[ C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}\, w_x = \rho^2 w_x \]
\[ C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy}\, w_y = \rho^2 w_y \]
\[ C_{xy} w_y = \rho \lambda_x C_{xx} w_x \]
\[ C_{yx} w_x = \rho \lambda_y C_{yy} w_y \]
\[ \lambda_x = \lambda_y^{-1} = \sqrt{\frac{w_y^T C_{yy} w_y}{w_x^T C_{xx} w_x}} \]

CCA in Matlab
[A, B, r, U, V] = canoncorr(x, y)
- x, y: sets of variables in the form of matrices
  - Each row is an observation
  - Each column is an attribute/feature
- A, B: matrices containing the canonical coefficients (the basis vectors for x and y)
- r: vector containing the canonical correlations (successively decreasing)
- U, V: canonical variates, i.e. the projections of x and y onto A and B respectively

Interpretation of CCA
- A correlation coefficient represents the unique contribution of each variable to the relation
- Multicollinearity may obscure relationships
- Factor loadings: correlations between the canonical variates and the variables in each set
- The proportion of variance explained by the canonical variates can be inferred from the factor loadings

Redundancy Calculation

\[ \text{Redundancy}_{left} = \frac{\sum \text{loadings}_{left}^2}{p} \cdot R_c^2 \]
\[ \text{Redundancy}_{right} = \frac{\sum \text{loadings}_{right}^2}{q} \cdot R_c^2 \]

- p: number of variables in the first (left) set of variables
- q: number of variables in the second (right) set of variables
- R_c^2: respective squared canonical correlation
- Since successively extracted roots are uncorrelated, we can sum the redundancies across all correlations to get a single index of redundancy.

Application
- Kernel CCA (KCCA) can be used to find nonlinear relationships between multivariate data sets
- Two views of the same semantic object are used to extract a representation of the semantics
  - Speaker recognition: audio and lip movement
  - Image retrieval: image features (HSV, texture) and associated text

Use of KCCA in cross-modal retrieval
- 400 records of JPEG images with associated text for each class, and a total of 3 classes
- Data was split randomly into 2 parts for training and test
- Features
  - Image: HSV color, Gabor texture
  - Text: term frequencies
- Results were averaged over 10 runs

Cross-modal retrieval
Content-based retrieval
- Retrieve images in the same class
- Tested with image sets of size 10 and 30; in the success measure, count_jk = 1 if image k in the set has the same label as the text query present in the set, else count_jk = 0

Comparison of KCCA (with 5 and 30 eigenvectors) with GVSM: content-based retrieval

Mate-based retrieval
- Match the exact image among the selected retrieved images
- Tested with image sets of size 10 and 30; in the success measure, count_j = 1 if the exact matching image is present in the set, else count_j = 0

Comparison of KCCA (with 30 and 150 eigenvectors) with GVSM: mate-based retrieval

Comments
The good
- Good explanation of CCA and KCCA
- Innovative use of KCCA in an image retrieval application
The bad
- The data set and the number of classes used were small
- The image set size is not taken into account when calculating accuracy in mate-based retrieval
- Cross-validation tests could have been done

Limitations and Assumptions of CCA
- At least 40 to 60 times as many cases as variables are recommended to get reliable estimates for two roots (Barcikowski & Stevens, 1986)
- Outliers can greatly affect the canonical correlation
- Variables in the two sets should not be completely redundant

Thank you
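
Supplementary sketch: CCA as an eigenproblem
The eigenproblem from the formulation slide (C_xx^-1 C_xy C_yy^-1 C_yx w_x = rho^2 w_x, with unit-variance constraints on the projections) can be tried out directly in NumPy. This is only a minimal sketch under stated assumptions: the function name cca_eig, the small ridge term reg, and the synthetic data are hypothetical and are not taken from the paper or from Matlab's canoncorr. The returned wx, wy, rho play roughly the roles of A, B, r in the Matlab slide.

```python
import numpy as np


def cca_eig(x, y, reg=1e-8):
    """Illustrative linear CCA via Cxx^-1 Cxy Cyy^-1 Cyx wx = rho^2 wx.

    x, y: arrays of shape (n, p) and (n, q); each row is an observation.
    reg:  hypothetical ridge term added to Cxx and Cyy for numerical stability.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)

    # The eigenvalues of M = Cxx^-1 Cxy Cyy^-1 Cyx are the squared correlations.
    m = np.linalg.solve(cxx, cxy) @ np.linalg.solve(cyy, cxy.T)
    eigvals, eigvecs = np.linalg.eig(m)
    order = np.argsort(-eigvals.real)[: min(x.shape[1], y.shape[1])]
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    wx = eigvecs.real[:, order]

    # From Cyx wx = rho * Cyy wy: wy is proportional to Cyy^-1 Cyx wx.
    wy = np.linalg.solve(cyy, cxy.T @ wx)

    # Rescale so that wx^T Cxx wx = 1 and wy^T Cyy wy = 1 for every pair.
    wx = wx / np.sqrt(np.sum(wx * (cxx @ wx), axis=0))
    wy = wy / np.sqrt(np.sum(wy * (cyy @ wy), axis=0))
    return wx, wy, rho


# Synthetic example (assumed data): one shared latent signal z hidden in x and y.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
x = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
y = np.hstack([rng.normal(size=(500, 1)), z + 0.1 * rng.normal(size=(500, 1))])

wx, wy, rho = cca_eig(x, y)
u = (x - x.mean(axis=0)) @ wx  # canonical variates for x
v = (y - y.mean(axis=0)) @ wy  # canonical variates for y
print("canonical correlations:", rho)
```

In this sketch the first canonical correlation should be close to 1, because both views contain the same latent signal, and the empirical correlation between the first columns of u and v should match rho[0]. Solving the non-symmetric eigenproblem directly mirrors the slide; a production implementation would more likely use a whitening plus SVD formulation, which is numerically better behaved.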