A Review of Information FilteringPart II- Collaborative Filtering.ppt

资源描述

1、A Review of Information Filtering Part II: Collaborative Filtering,Chengxiang ZhaiLanguage Technologies Institiute School of Computer Science Carnegie Mellon University,Outline,A Conceptual Framework for Collaborative Filtering (CF) Rating-based Methods (Breese et al. 98) Memory-based methods Model-

2、based methods Preference-based Methods (Cohen et al. 99 & Freund et al. 98) Summary & Research Directions,What is Collaborative Filtering (CF)?,Making filtering decisions for an individual user based on the judgments of other users Inferring individuals interest/preferences from that of other simila

3、r users General idea Given a user u, find similar users u1, , um Predict us preferences based on the preferences of u1, , um,CF: Applications,Recommender Systems: books, CDs, Videos, Movies, potentially anything! Can be combined with content-based filtering Example (commercial) systems GroupLens (Re

4、snick et al. 94): usenet news rating Amazon: book recommendation Firefly (purchased by Microsoft?): music recommendation Alexa: web page recommendation,CF: Assumptions,Users with a common interest will have similar preferences Users with similar preferences probably share the same interest Examples

5、“interest is IR” = “read SIGIR papers” “read SIGIR papers” = “interest is IR” Sufficiently large number of user preferences are available,CF: Intuitions,User similarity If Jamie liked the paper, Ill like the paper ? If Jamie liked the movie, Ill like the movie Suppose Jamie and I viewed similar movi

6、es in the past six months Item similarity Since 90% of those who liked Star Wars also liked Independence Day, and, you liked Star Wars You may also like Independence Day,Collaborative Filtering vs. Content-based Filtering,Basic filtering question: Will user U like item X? Two different ways of answe

7、ring it Look at what U likes Look at who likes X Can be combined,= characterize X = content-based filtering,= characterize U = collaborative filtering,Rating-based vs. Preference-based,Rating-based: Users preferences are encoded using numerical ratings on items Complete ordering Absolute values can

8、be meaningful But, values must be normalized to combine Preferences: Users preferences are represented by partial ordering of items Partial ordering Easier to exploit implicit preferences,A Formal Framework for Rating,u1 u2 ui .um,Users: U,Objects: O,o1 o2 oj on 3 1.5 . 2 213,Xij=f(ui,oj)=?,?,The ta

9、sk,Unknown function f: U x O R,Assume known f values for some (u,o)s Predict f values for other (u,o)s Essentially function approximation, like other learning problems,Where are the intuitions?,Similar users have similar preferences If u u, then for all os, f(u,o) f(u,o) Similar objects have similar

10、 user preferences If o o, then for all us, f(u,o) f(u,o) In general, f is “locally constant” If u u and o o, then f(u,o) f(u,o) “Local smoothness” makes it possible to predict unknown values by interpolation or extrapolation What does “local” mean?,Two Groups of Approaches,Memory-based approaches f(

11、u,o) = g(u)(o) g(u)(o) if u u Find “neighbors” of u and combine g(u)(o)s Model-based approaches Assume structures/model: object cluster, user cluster, f defined on clusters f(u,o) = f(cu, co) Estimation & Probabilistic inference,Memory-based Approaches (Breese et al. 98),General ideas: Xij: rating o

12、f object j by user i ni: average rating of all objects by user i Normalized ratings: Vij = Xij - ni Memory-based predictionSpecific approaches differ in w(a,i) - the distance/similarity between user a and i,User Similarity Measures,Pearson correlation coefficient (sum over commonly rated items)Cosin

13、e measureMany other possibilities!,Improving User Similarity Measures (Breese et al. 98),Dealing with missing values: default ratings Inverse User Frequency (IUF): similar to IDF Case Amplification: use w(a,I)p, e.g., p=2.5,Model-based Approaches (Breese et al. 98),General ideas Assume that data/rat

14、ings are explained by a probabilistic model with parameter Estimate/learn model parameter based on data Predict unknown rating using E xk+1 | x1, , xk, which is computed using the estimated modelSpecific methods differ in the model used and how the model is estimated,Probabilistic Clustering,Cluster

15、ing users based on their ratings Assume ratings are observations of a multinomial mixture model with parameters p(C), p(xi|C) Model estimated using standard EM Predict ratings using Exk+1 | x1, , xk,Bayesian Network,Use BN to capture object/item dependency Each item/object is a node (Dependency) str

16、ucture is learned from all data Model parameters: p(xk+1 |pa(xk+1) where pa(xk+1) is the parents/predictors of xk+1 (represented as a decision tree) Predict ratings using Exk+1 | x1, , xk,Three-way Aspect Model (Popescul et al. 2001),CF + content-based Generative model (u,d,w) as observations z as h

17、idden variable Standard EM Essentially clustering the joint data Evaluation on ResearchIndex data Found its better to treat (u,w) as observations,Evaluation Criteria (Breese et al. 98),Rating accuracy Average absolute deviation Pa = set of items predicted Ranking accuracy Expected utility Exponentia

18、lly decaying viewing probabillity ( halflife )= the rank where the viewing probability =0.5 d = neutral rating,Datasets,Results,- BN & CR+ are generally betterthan VSIM & BC - BN is best with more training data - VSIM is better with little training data - Inverse User Freq. Is effective - Case ampli

19、fication is mostly effective,Summary of Rating-based Methods,Effectiveness Both memory-based and model-based methods can be effective The correlation method appears to be robust Bayesian network works well with plenty of training data, but not very well with little training data The cosine similarit

20、y method works well with little training data,Summary of Rating-based Methods (cont.),Efficiency Memory based methods are slower than model-based methods in predicting Learning can be extremely slow for model-based methods,Preference-based Methods (Cohen et al. 99, Freund et al. 98),Motivation Expli

21、cit ratings are not always available, but implicit orderings/preferences might be available Only relative ratings are meaningful, even if when ratings are available Combining preferences has other applications, e.g.,Merging results from different search engines,A Formal Model of Preferences,Instance

22、s: O=o1, on Ranking function: R: (U x) O x O 0,1 R(u,v)=1 means u is strongly preferred to v R(u,v)=0 means v is strongly preferred to u R(u,v)=0.5 means no preference Feedback: F = (u,v), u is preferred to v Minimize Loss:,Hypothesis space,The Hypothesis Space H,Without constraints on H, the loss i

23、s minimized by any R that agrees with F Appropriate constraints for collaborative filtering Compare this with,The Hedge Algorithm for Combining Preferences,Iterative updating of w1, w2, , wn Initialization: wi is uniform Updating: 0,1L=0 = weight stays L is large = weight is decreased,Some Theoretic

24、al Results,The cumulative loss of Ra will not be much worse than that of the best ranking expert/feature Preferences Ra = ordering = R L(R,F) = DISAGREE(,Ra)/|F| + L(Ra,F) Need to find that minimizes disagreement General case: NP-complete,A Greedy Ordering Algorithm,Use weighted graph to represent p

25、references R For each node, compute the potential value, I.e., outgoing_weights - ingoing_weightsRank the node with the highest potential value above all others Remove this node and its edges, repeat At least half of the optimal agreement is guaranteed,Improvement,Identify all the strongly connected

26、 components Rank the components consistently with the edges between them Rank the nodes within a component using the basic greedy algorithm,Evaluation of Ordering Algorithms,Measure: “weight coverage” Datasets = randomly generated small graphs Observations The basic greedy algorithm works better tha

27、n a random permutation baseline Improved version is generally better, but the improvement is insignificant for large graphs,Metasearch Experiments,Task: Known item search Search for a ML researchers homepage Search for a university homepage Search expert = variant of query Learn to merge results of

28、all search experts Feedback Complete : known item preferred to all others Click data : known item preferred to all above it Leave-one-out testing,Metasearch Results,Measures: compare combined preferences with individual ranking function sign test: to see which system tends to rank the known relevant

29、 article higher. #queries with the known relevant item ranked above k. average rank of the known relevant item Learned system better than individual expert by all measure (not surprising, why?),Metasearch Results (cont.),Direct Learning of an Ordering Function,Each expert is treated as a ranking fea

30、ture fi: O R U 0 (allow partial ranking) Given preference feedback : X x X R Goal: to learn H that minimizes the lossD (x0,x1): a distribution over X x X (actually a uniform dist. over pairs with feedback order) D (x0,x1) = c max0, (x0,x1) ,The RankBoost Algorithm,Iterative updating of D(x0,x1) Init

31、ialization: D1= D For t=1,T: Train weak learner using Dt Get weak hypothesis ht: X R Choose t 0 Update Final hypothesis:,How to Choose t and Design ht ?,Bound on the ranking lossThus, we should choose t that minimizes the bound Three approaches: Numerical search Special case: h is either 0 or 1 Appr

32、oximation of Z, then find analytic solution,Efficient RankBoost for Bipartite Feedback,Complexity at each round: O(|X0|X1|) O(|X0|+|X1|),Bipartite feedback: Essentially binary classification,X0,X1,Evaluation of RankBoost,Meta-search: Same as in (Cohen et al 99) Perfect feedback 4-fold cross validati

33、on,EachMovie Evaluation,# users,#movies/user,#feedback movies,Performance Comparison Cohen et al. 99 vs. Freund et al. 99,Summary,CF is “easy” The users expectation is low Any recommendation is better than none Making it practically useful CF is “hard” Data sparseness Scalability Domain-dependent,Su

34、mmary (cont.),CF as a Learning Task Rating-based formulation Learn f: U x O - R Algorithms Instance-based/memory-based (k-nearest neighbors) Model-based (probabilistic clustering) Preference-based formulation Learn PREF: U x O x O - R Algorithms General preference combination (Hedge), greedy orderin

35、g Efficient restricted preference combination (RankBoost),Summary (cont.),Evaluation Rating-based methods Simple methods seem to be reasonably effective Advantage of sophisticated methods seems to be limited Preference-based methods More effective than rating-based methods according to one evaluatio

36、n Evaluation on meta-search is weak,Research Directions,Exploiting complete information CF + content-based filtering + domain knowledge + user model More “localized” kernels for instance-based methods Predicting movies need different “neighbor users” than predicting books Suggesting using items simi

37、lar to the target item as features to find neighbors,Research Directions (cont.),Modeling time There might be sequential patterns on the items a user purchased (e.g., bread machine - bread machine mix) Probabilistic model of preferences Making preference function a probability function, e.g, P(AB|U)

38、 Clustering items and users Minimizing preference disagreements,References,Cohen, W.W., Schapire, R.E., and Singer, Y. (1999) “Learning to Order Things“, Journal of AI Research, Volume 10, pages 243-270. Freund, Y., Iyer, R.,Schapire, R.E., & Singer, Y. (1999). An efficient boosting algorithm for co

39、mbining preferences. Machine Learning Journal. 1999. Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Articial Intelligence, pp. 43-52. Alexandrin Popescul and Lyle H.

40、Ungar, Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments, UAI 2001. N. Good, J.B. Schafer, J. Konstan, A. Borchers, B. Sarwar, J. Herlocker, and J. Riedl. “Combining Collaborative Filtering with Personal Agents for Better Recommendations.“ Proceedings AAAI-99. pp 439-446. 1999.,The EndThank you!,

展开阅读全文