A Survey on Transfer Learning
Sinno Jialin Pan
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
Joint work with Prof. Qiang Yang

Transfer Learning? (DARPA 05)
Transfer Learning (TL): The ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (in new domains).
It is motivated by human learning. People can often transfer knowledge learnt previously to novel situations:
Chess -> Checkers
Mathematics -> Computer Science
Table Tennis -> Tennis

Outline
Traditional Machine Learning vs. Transfer Learning
Why Transfer Learning?
Settings of Transfer Learning
Approaches to Transfer Learning
Negative Transfer
Conclusion

Traditional ML vs. TL (P. Langley 06)

Traditional ML vs. TL
Figure panels: Learning Process of Traditional ML; Learning Process of Transfer Learning

Notation
Domain: A domain consists of two components: a feature space X and a marginal distribution P(X). In general, if two domains are different, then they may have different feature spaces or different marginal distributions.
Task: Given a specific domain and a label space Y, the task is to predict, for each x in the domain, its corresponding label y. In general, if two tasks are different, then they may have different label spaces or different conditional distributions P(Y|X).

Notation
For simplicity, we only consider at most two domains and two tasks:
Source domain
Task in the source domain
Target domain
Task in the target domain

Why Transfer Learning?
In some domains, labeled data are in short supply.
In some domains, the calibration effort is very expensive.
In some domains, the learning process is time consuming.
How can knowledge learnt from related domains be extracted to help learning in a target domain with only a few labeled data?
How can knowledge learnt from related domains be extracted to speed up learning in a target domain?
Transfer learning techniques may help!

Settings of Transfer Learning
Transfer Learning
  Inductive Transfer Learning (labeled data are available in a target domain)
    Case 1: No labeled data in a source domain -> Self-taught Learning
    Case 2: Labeled data are available in a source domain; source and target tasks are learnt simultaneously -> Multi-task Learning
  Transductive Transfer Learning (labeled data are available only in a source domain)
    Assumption: different domains but single task -> Domain Adaptation
    Assumption: single domain and single task -> Sample Selection Bias / Covariate Shift
  Unsupervised Transfer Learning (no labeled data in both source and target domain)
Figure: An overview of various settings of transfer learning

Approaches to Transfer Learning
Inductive Transfer Learning
Transductive Transfer Learning
Unsupervised Transfer Learning

Inductive Transfer Learning - Instance-transfer Approaches
Assumption: The source domain and target domain data use exactly the same features and labels.
Motivation: Although the source domain data cannot be reused directly, some parts of the data can still be reused by re-weighting.
Main Idea: Discriminatively adjust the weights of data in the source domain for use in the target domain.

Inductive Transfer Learning - Instance-transfer Approaches
Non-standard SVMs (Wu and Dietterich, ICML-04)
Differentiate the cost for misclassification of the target and source data.
Figure panels: Uniform weights; Correct the decision boundary by re-weighting
Terms of the objective: Loss function on the target domain data; Loss function on the source domain data; Regularization term

Inductive Transfer Learning - Instance-transfer Approaches
TrAdaBoost (Dai et al., ICML-07)

Inductive Transfer Learning - Feature-representation-transfer Approaches
Supervised Feature Construction (Argyriou et al., NIPS-06, NIPS-07)
Assumption: If t tasks are related to each other, then they may share some common features that benefit all tasks.
Input: t tasks, each with its own training data.
Output: Common features learnt across the t tasks, and t models, one for each task.

Supervised Feature Construction (Argyriou et al., NIPS-06, NIPS-07)
Terms of the objective: Average of the empirical error across t tasks; Regularization to make the representation sparse; Orthogonal constraints

Inductive Transfer Learning - Feature-representation-transfer Approaches
Unsupervised Feature Construction (Raina et al., ICML-07)
Three steps:
a) Apply the sparse coding algorithm (Lee et al., NIPS-07) to learn a higher-level representation from unlabeled data in the source domain.
b) Transform the target data to new representations using the new bases learnt in the first step.
c) Apply traditional discriminative models to the new representations of the target data with the corresponding labels.

Step 1:
Input: Source domain data and coefficient
Output: New representations of the source domain data and new bases
Step 2:
Input: Target domain data, coefficient and bases
Output: New representations of the target domain data

Inductive Transfer Learning - Model-transfer Approaches
Regularization-based Method (Evgeniou and Pontil, KDD-04)
Assumption: If t tasks are related to each other, then they may share some parameters among individual models.
Assume f_t = w_t · x is a hyperplane for task t, where t ∈ {T, S} and w_t = w_0 + v_t:
w_0: Common part
v_t: Specific part for individual task
Encode them into SVMs, with regularization terms for multiple tasks.

Inductive Transfer Learning - Relational-knowledge-transfer Approaches
TAMAR (Mihalkova et al., AAAI-07)
Assumption: If the target domain and source domain are related, then some relationships in the two domains may be similar, which can be used for transfer learning.
Input: Relational data in the source domain and a statistical relational model, a Markov Logic Network (MLN), which has been learnt in the source domain; relational data in the target domain.
Output: A new statistical relational model, an MLN, in the target domain.
Goal: To learn an MLN in the target domain more efficiently and effectively.

TAMAR (Mihalkova et al., AAAI-07)
Two Stages:
Predicate Mapping: Establish the mapping between predicates in the source and target domains. Once a mapping is established, clauses from the source domain can be translated into the target domain.
Revising the Mapped Structure: The clauses mapped directly from the source domain may not be completely accurate and may need to be revised, augmented, and re-weighted in order to properly model the target data.

TAMAR (Mihalkova et al., AAAI-07)
Figure: Source domain (academic domain) and target domain (movie domain), connected by the Mapping and Revising stages

Transductive Transfer Learning - Instance-transfer Approaches
Sample Selection Bias / Covariate Shift (Zadrozny, ICML-04; Schwaighofer, JSPI-00)
Input: A lot of labeled data in the source domain and no labeled data in the target domain.
Output: Models for use on the target domain data.
Assumption: The source domain and target domain are the same. In addition, the conditional distributions P(Y|X_S) and P(Y|X_T) are the same, while the marginal distributions P(X_S) and P(X_T) may be different because of different sampling processes (training data and test data).
Main Idea: Re-weight (importance sampling) the source domain data.

Sample Selection Bias / Covariate Shift
To correct sample selection bias, the source domain data are given the weights P(x_T)/P(x_S).
How to estimate P(x_T)/P(x_S)? One straightforward solution is to estimate P(X_S) and P(X_T) separately. However, estimating a density function is a hard problem.

Sample Selection Bias / Covariate Shift
Kernel Mean Matching (KMM) (Huang et al., NIPS 2006)
Main Idea: KMM tries to estimate the weights directly instead of estimating the density functions. It can be proved that the weights can be estimated by solving a quadratic programming (QP) optimization problem that matches the means of the training and test data in a RKHS.
Theoretical Support: Maximum Mean Discrepancy (MMD) (Borgwardt et al., BIOINFORMATICS-06). The distance between distributions can be measured by the Euclidean distance between their mean vectors in a RKHS.

Transductive Transfer Learning - Feature-representation-transfer Approaches
Domain Adaptation (Blitzer et al., EMNLP-06; Ben-David et al., NIPS-07; Daume III, ACL-07)
Assumption: Single task across domains, which means P(Y|X_S) and P(Y|X_T) are the same, while P(X_S) and P(X_T) may be different because of different feature representations across domains.
Main Idea: Find a “good” feature representation that reduces the “distance” between domains.
Input: A lot of labeled data in the source domain and only unlabeled data in the target domain.
Output: A common representation between source domain data and target domain data, and a model on the new representation for use in the target domain.

Domain Adaptation
Structural Correspondence Learning (SCL) (Blitzer et al., EMNLP-06; Blitzer et al., ACL-07; Ando and Zhang, JMLR-05)
Motivation: If two domains are related to each other, then there may exist some “pivot” features across both domains. Pivot features are features that behave in the same way for discriminative learning in both domains.
Main Idea: Identify correspondences among features from different domains by modeling their correlations with pivot features. Non-pivot features from different domains that are correlated with many of the same pivot features are assumed to correspond, and they are treated similarly in a discriminative learner.

SCL (Blitzer et al., EMNLP-06; Blitzer et al., ACL-07; Ando and Zhang, JMLR-05)
a) Heuristically choose m pivot features; the choice is task specific.
b) Transform each vector of pivot features to a vector of binary values and then create the corresponding prediction problems.
c) Learn the parameters of each prediction problem.
d) Do eigen decomposition on the matrix of parameters and learn the linear mapping function.
e) Use the learnt mapping function to construct new features and train classifiers on the new representations.

Unsupervised Transfer Learning - Feature-representation-transfer Approaches
Self-taught Clustering (STC) (Dai et al., ICML-08)
Input: A lot of unlabeled data in a source domain and a few unlabeled data in a target domain.
Goal: Cluster the target domain data.
Assumption: The source domain and target domain data share some common features, which can help clustering in the target domain.
Main Idea: Extend the information-theoretic co-clustering algorithm (Dhillon et al., KDD-03) for transfer learning.

Self-taught Clustering (STC) (Dai et al., ICML-08)
Objective function that needs to be minimized, with the following annotated parts:
Source domain data
Target domain data
Common features
Cluster functions
Co-clustering in the target domain
Co-clustering in the source domain
Output

Negative Transfer
Most approaches to transfer learning assume that transferring knowledge across domains is always positive. However, in some cases, when two tasks are too dissimilar, brute-force transfer may even hurt the performance of the target task, which is called negative transfer (Rosenstein et al., NIPS-05 Workshop).
Some researchers have studied how to measure relatedness among tasks (Ben-David and Schuller, NIPS-03; Bakker and Heskes, JMLR-03).
How to design a mechanism to avoid negative transfer needs to be studied theoretically.

Conclusion
How to avoid negative transfer needs more attention!
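
Supplementary sketch for the sample selection bias slides: re-weighting source samples by the density ratio P(x_T)/P(x_S) lets averages over source data approximate averages under the target distribution. The example below assumes, purely for illustration, that both densities are known unit-variance Gaussians. In practice they are unknown, which is what motivates estimating the weights directly. The function names and the means 0 and 1 are hypothetical choices, not taken from the slides.

```python
import math
import random


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source_mean, target_mean):
    """Weight each source sample by p_target(x) / p_source(x).

    Assumption: both densities are known Gaussians; in a real problem
    the ratio has to be estimated from data.
    """
    return [gaussian_pdf(x, target_mean) / gaussian_pdf(x, source_mean) for x in xs]


def weighted_mean(values, weights):
    """Self-normalized weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(0)
SOURCE_MEAN, TARGET_MEAN = 0.0, 1.0  # hypothetical shift between domains

# Samples are drawn from the source domain only.
source_x = [rng.gauss(SOURCE_MEAN, 1.0) for _ in range(20000)]
weights = importance_weights(source_x, SOURCE_MEAN, TARGET_MEAN)

unweighted = sum(source_x) / len(source_x)     # estimates the source mean
reweighted = weighted_mean(source_x, weights)  # estimates the target mean

print(f"unweighted mean:  {unweighted:.3f}")
print(f"re-weighted mean: {reweighted:.3f}")
```

The unweighted average stays near the source mean, while the re-weighted average moves toward the target mean even though no target sample was drawn.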
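
Supplementary sketch for the Maximum Mean Discrepancy (MMD) remark on the KMM slide: the squared distance between the mean vectors of two samples in a RKHS can be computed from kernel evaluations alone. The example below uses a Gaussian (RBF) kernel on one-dimensional data and the biased estimator. The kernel, the bandwidth, the sample sizes, and all names are assumptions made for illustration. It computes only the distance between the two means, not the KMM weights themselves.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples.

    Equals the squared Euclidean distance between the empirical mean
    embeddings of xs and ys in the RKHS induced by the kernel.
    """
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]  # hypothetical target domain

mmd_same = mmd_squared(source, same_dist)
mmd_shifted = mmd_squared(source, shifted)
print(f"MMD^2, same distribution:    {mmd_same:.4f}")
print(f"MMD^2, shifted distribution: {mmd_shifted:.4f}")
```

A larger value for the shifted sample than for the sample from the same distribution is the behavior one would expect from a distance between distributions.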