Analyzing Attribute Dependencies
Aleks Jakulin & Ivan Bratko
Faculty of Computer and Information Science, University of Ljubljana, Slovenia

Overview
Problem: Generalize the notion of "correlation" from two variables to three or more variables.
Approach: Use Shannon's entropy as the foundation for quantifying interaction.
Application: Visualization, with a focus on supervised learning domains.
Result: We can explain several "mysteries" of machine learning through higher-order dependencies.

Problem: Attribute Dependencies

Approach: Shannon's Entropy
[Figure: entropy diagram over the attributes and the label C.]

Interaction Information
I(A;B;C) := I(AB;C) - I(B;C) - I(A;C)
          = I(A;B|C) - I(A;B)

(Partial) history of independent reinventions:
- McGill '54 (Psychometrika) - interaction information
- Han '80 (Information & Control) - multiple mutual information
- Yeung '91 (IEEE Trans. on Inf. Theory) - mutual information
- Grabisch & Roubens '99 (Int. J. of Game Theory) - Banzhaf interaction index
- Matsuda '00 (Physical Review E) - higher-order mutual information
- Brenner et al. '00 (Neural Computation) - average synergy
- Demšar '02 (a thesis in machine learning) - relative information gain
- Bell '03 (NIPS'02, ICA 2003) - co-information
- Jakulin '03 - interaction gain

Properties
- Invariance with respect to the attribute/label division: I(A;B;C) = I(A;C;B) = I(C;A;B) = I(B;A;C) = I(C;B;A) = I(B;C;A).
- Decomposition of mutual information: I(AB;C) = I(A;C) + I(B;C) + I(A;B;C); the term I(A;B;C) is the "synergistic information."
- If A, B, C are independent, then I(A;B;C) = 0.
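The definition and decomposition above translate directly into code. The following is a minimal sketch (our illustration, not the authors' implementation) that estimates I(A;B;C) for discrete data columns from empirical joint entropies; the helper names entropy and interaction_information are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(*columns):
    """Empirical joint entropy H(X1, ..., Xk) of discrete columns, in bits."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def interaction_information(a, b, c):
    """I(A;B;C) = I(A;B|C) - I(A;B), expanded into joint entropies:
    I(A;B;C) = H(AB) + H(AC) + H(BC) - H(A) - H(B) - H(C) - H(ABC)."""
    return (entropy(a, b) + entropy(a, c) + entropy(b, c)
            - entropy(a) - entropy(b) - entropy(c)
            - entropy(a, b, c))
```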
Positive and Negative Interactions
If any pair of the attributes is conditionally independent with respect to a third attribute, the 3-way information "neutralizes" the 2-way information: I(A;B|C) = 0 implies I(A;B;C) = -I(A;B).
Interaction information may be positive or negative, as the sketch after the next slide demonstrates numerically:
- Positive (synergy): the XOR problem, A = B XOR C.
- Negative (redundancy): conditional independence, redundant attributes.
- Zero: independence of one of the attributes, or a mix of synergy and redundancy.

Applications
- Visualization: interaction graphs, interaction dendrograms.
- Model construction: feature construction, feature selection, ensemble construction.
- Evaluation on the CMC domain: predicting the contraception method from demographics.
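To make the sign conventions concrete, here is a small numeric demo; it reuses the hypothetical interaction_information helper from the earlier sketch. Two attributes that duplicate the label produce an interaction near -1 bit (redundancy), while XOR produces one near +1 bit (synergy).

```python
import random

# Reuses interaction_information() from the sketch above.
random.seed(0)
n = 20000
b = [random.randint(0, 1) for _ in range(n)]
c = [random.randint(0, 1) for _ in range(n)]

# Synergy: A = B XOR C has no pairwise dependence on B or C alone,
# so all of the information is three-way.
a = [x ^ y for x, y in zip(b, c)]
print(interaction_information(a, b, c))    # close to +1.0 bit

# Redundancy: two attributes that each duplicate the label C.
a1, a2 = list(c), list(c)
print(interaction_information(a1, a2, c))  # close to -1.0 bit
```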
Interaction Graphs
[Figure: interaction graph for the CMC domain.]

Application: Feature Construction

NBC model     Predictive performance (Brier score)
(baseline)    0.2157 ± 0.0013
Wedu, Hedu    0.2087 ± 0.0024
Wedu          0.2068 ± 0.0019
WeduHedu      0.2067 ± 0.0019
Age, Child    0.1951 ± 0.0023
AgeChild      0.1918 ± 0.0026
ACWH          0.1873 ± 0.0027
A, C, W, H    0.1870 ± 0.0030
A, C, W       0.1850 ± 0.0027
AC, WH        0.1831 ± 0.0032
AC, W         0.1814 ± 0.0033

(A = Age, C = Child, W = Wedu, H = Hedu; a juxtaposed name such as AC denotes a single joined attribute. A sketch of this joining operation follows the Alternatives table.)

Alternatives

Model                                       Brier score
GBN                                         0.1874 ± 0.0032
NBC                                         0.1815 ± 0.0029
TAN                                         0.1849 ± 0.0028
BEST of 100,000 models (AC, WH, MediaExp)   0.1811 ± 0.0032
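The joined attributes in the feature-construction table above (AgeChild, AC, WH, ...) are Cartesian products of the base attributes, treated by the naive Bayes classifier (NBC) as single attributes. The sketch below is our own hedged illustration of that idea, not the authors' experimental code; join, nbc_brier, and the exact Brier normalization are assumptions.

```python
from collections import Counter, defaultdict

def join(*cols):
    """Cartesian-product attribute: one symbol per combination of values."""
    return list(zip(*cols))

def nbc_brier(train_X, train_y, test_X, test_y, alpha=1.0):
    """Naive Bayes over discrete attribute columns (each possibly joined),
    with Laplace smoothing; returns a multiclass Brier score."""
    classes = sorted(set(train_y))
    n = len(train_y)
    class_count = Counter(train_y)
    n_values = [len(set(col)) for col in train_X]
    # cond[i][cl][v]: count of value v of attribute i among class-cl examples
    cond = [defaultdict(Counter) for _ in train_X]
    for i, col in enumerate(train_X):
        for v, cl in zip(col, train_y):
            cond[i][cl][v] += 1
    total = 0.0
    for row, y in zip(zip(*test_X), test_y):
        post = {}
        for cl in classes:
            p = class_count[cl] / n
            for i, v in enumerate(row):
                p *= (cond[i][cl][v] + alpha) / (class_count[cl] + alpha * n_values[i])
            post[cl] = p
        z = sum(post.values())
        total += sum((post[cl] / z - (cl == y)) ** 2 for cl in classes)
    return total / len(test_y)

# The "AC, W" row of the table would correspond to something like:
# nbc_brier([join(age_tr, child_tr), wedu_tr], y_tr,
#           [join(age_te, child_te), wedu_te], y_te)
```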
Dissimilarity Measures
The relationships between attributes are to some extent transitive. Algorithm:
1. Define a dissimilarity measure between two attributes in the context of the label C.
2. Apply hierarchical clustering to summarize the dissimilarity matrix.

Interaction Dendrogram
[Figure: interaction dendrogram; cluster "tightness" ranges from loose (weakly interacting attributes) to tight (strongly interacting attributes).]
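A hedged sketch of the dendrogram construction: the slide does not reproduce the exact dissimilarity formula, so the transform below is an assumption (any transform that shrinks as interaction strength grows would do); the clustering and plotting use SciPy's linkage and dendrogram, and interaction_information comes from the earlier sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

def attribute_dissimilarity(cols, label):
    """Pairwise dissimilarity between attribute columns in the context of
    the label. Hypothetical transform: small when the pair interacts
    strongly (in either direction), large when it barely interacts."""
    k = len(cols)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            strength = abs(interaction_information(cols[i], cols[j], label))
            D[i, j] = D[j, i] = 1.0 / (1e-6 + strength)
    return D

# D = attribute_dissimilarity(columns, y)
# Z = linkage(squareform(D), method='average')  # hierarchical clustering
# dendrogram(Z, labels=attribute_names)  # tight clusters = strong interactions
```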
Application: Feature Selection
Soybean domain: predict the disease from symptoms; predominantly negative interactions. A global optimization procedure for feature selection tested 5000 NBC models (B-Course). The selected features balance dissimilarity and importance, and the dendrogram lets us understand what the global optimization did.

Application: Ensembles

Implication: Assumptions in Machine Learning

Work in Progress
- Overfitting: the interaction information computations do not account for the increase in model complexity.
- Support for numerical and ordered attributes.
- Inductive learning algorithms which use these heuristics automatically.
- Models that are based on the real relationships in the data, not on our assumptions about them.

Summary
- There are relationships exclusive to groups of n attributes.
- Interaction information is an entropy-based heuristic for quantifying such relationships.
- Two visualization methods: interaction graphs and interaction dendrograms.