Slide 1: Association Rule Mining (II)
Instructor: Qiang Yang
Thanks: J. Han and J. Pei

Slide 2: Bottleneck of Frequent-Pattern Mining
- Multiple database scans are costly.
- Mining long patterns needs many passes of scanning and generates lots of candidates.
- To find the frequent itemset i1 i2 ... i100:
  - number of scans: 100
  - number of candidates: (100 choose 1) + (100 choose 2) + ... + (100 choose 100) = 2^100 - 1 ≈ 1.27 × 10^30!
- Bottleneck: candidate generation and test.
- Can we avoid candidate generation?
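
The candidate count on this slide is simply the number of non-empty subsets of a 100-item set. A quick check in Python (illustrative, not part of the lecture):

```python
from math import comb

# Candidates Apriori would generate and test on the way to the single frequent
# 100-itemset {i1, ..., i100}: every non-empty subset of the 100 items.
n_candidates = sum(comb(100, k) for k in range(1, 101))
assert n_candidates == 2**100 - 1
print(f"{n_candidates:.3g}")   # ~1.27e+30, matching the slide
```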

Slide 3: FP-growth: Frequent-Pattern Mining Without Candidate Generation
- Heuristic: let P be a frequent itemset, S be the set of transactions containing P, and x be an item. If x is a frequent item in S, then {x} ∪ P must be a frequent itemset.
- No candidate generation!
- A compact data structure, the FP-tree, stores the information needed for frequent-pattern mining.
- A recursive mining algorithm mines the complete set of frequent patterns.

Slide 4: Example
- Min support = 3.
- [Figure: the example transaction database, listed in full on the "final FP-tree" slide below.]

Slide 5: Scan the Database
- List the frequent items, sorted as (item : support).
- The root of the tree is created and labeled with "null".
- Scan the database: scanning the first transaction leads to the first branch of the tree.
- Order the items in each transaction according to descending frequency.

Slide 6: Scanning TID = 100
- Transaction database: TID 100 contains {f, a, c, d, g, i, m, p}.
- First branch of the tree (frequent items only, in frequency order): f:1 → c:1 → a:1 → m:1 → p:1, hanging off the root.
- Header table (item, count, head pointer; count is the number of tree nodes carrying the item): f 1, c 1, a 1, m 1, p 1.

Slide 7: Scanning TID = 200
- Frequent single items: F1 = {f, c, a, b, m, p}.
- TID = 200: the possibly frequent items are obtained by intersecting the transaction with F1, giving {f, c, a, b, m}.
- Along the first branch f → c → a → m → p, the shared prefix is f, c, a.
- Two new nodes are generated: b as a child of a, and m as a child of b.

Slide 8: Scanning TID = 200 (continued)
- Transaction database: TID 200 contributes the frequent items {f, c, a, b, m}.
- Tree after insertion: f:2 → c:2 → a:2, with children m:1 → p:1 and b:1 → m:1, hanging off the root.
- Header table (item, count, head): f 1, c 1, a 1, b 1, m 2, p 1.

Slide 9: The Final FP-tree
- Transaction database (min support = 3):
  TID 100: f, a, c, d, g, i, m, p
  TID 200: a, b, c, f, l, m, o
  TID 300: b, f, h, j, o
  TID 400: b, c, k, s, p
  TID 500: a, f, c, e, l, p, m, n
- Frequent 1-items in frequency descending order: f, c, a, b, m, p.
- Tree branches: f:4 → c:3 → a:3 with children (m:2 → p:2) and (b:1 → m:1); f:4 → b:1; and c:1 → b:1 → p:1.
- Header table (item, node count, head): f 1, c 2, a 1, b 3, m 2, p 2.

Slide 10: FP-Tree Construction
- Scans the database only twice.
- Subsequent mining is based on the FP-tree.
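
The two-scan construction described on these slides can be sketched compactly. This is an illustrative Python sketch, not the lecture's code; ties between equally frequent items are broken alphabetically here, so c precedes f (both have support 4), whereas the slides use the order f, c, a, b, m, p.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}              # item -> FPNode

def build_fp_tree(transactions, min_sup):
    # Scan 1: count single items and keep the frequent ones.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    freq = {i: c for i, c in counts.items() if c >= min_sup}
    # Descending frequency; ties broken alphabetically.
    rank = {i: r for r, i in enumerate(sorted(freq, key=lambda i: (-freq[i], i)))}

    # Scan 2: insert each transaction with its frequent items in rank order.
    root = FPNode(None, None)
    header = defaultdict(list)          # item -> list of nodes carrying that item
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, rank

db = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
root, header, rank = build_fp_tree(db, min_sup=3)
print(sorted(rank, key=rank.get))       # ['c', 'f', 'a', 'b', 'm', 'p']
print(root.children["c"].count)         # 4: the most frequent item heads the main branch
```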

Slide 11: How to Mine an FP-tree?
- Step 1: form the conditional pattern base.
- Step 2: construct the conditional FP-tree.
- Step 3: recursively mine the conditional FP-trees.

Slide 12: Conditional Pattern Base
- Let I be a frequent item.
- The conditional pattern base of I is a sub-database consisting of the set of prefix paths in the FP-tree that have item I as a co-occurring suffix pattern.
- Example: m is a frequent item. m's conditional pattern base: <f, c, a> : support = 2; <f, c, a, b> : support = 1.
- Mine recursively on such databases.
- [Figure: the final FP-tree from slide 9.]
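
Continuing the sketch above (still illustrative, not the lecture's code), a conditional pattern base is obtained by walking from every node of the suffix item back to the root and recording the prefix path together with that node's count:

```python
def conditional_pattern_base(item, header):
    """Prefix path of every node carrying `item`, paired with that node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
    return base

print(conditional_pattern_base("m", header))
# [(('c', 'f', 'a'), 2), (('c', 'f', 'a', 'b'), 1)]
# i.e. the slide's <f,c,a>:2 and <f,c,a,b>:1, with c and f swapped by the tie-break above.
```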

Slide 13: Conditional Pattern Tree
- Let I be a suffix item and DB|I be its conditional pattern base.
- The frequent-pattern tree built on DB|I, Tree_I, is known as the conditional pattern tree (conditional FP-tree) of I.
- Example: m is a frequent item, and m's conditional pattern base is <f, c, a> : support = 2; <f, c, a, b> : support = 1. m's conditional pattern tree is then the single path f:3 → c:3 → a:3 (b is dropped because its support of 1 is below min support).

Slide 14: Composition of Patterns α and β
- Let α be a frequent item in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is frequent in DB if and only if β is frequent in B.
- Example: starting with α = p, p's conditional pattern base (from the tree) is B = { (f, c, a, m) : 2, (c, b) : 1 }.
- Let β be {c}. Then α ∪ β = {p, c}, with support = 3.
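
The composition property can be checked directly on the example by reusing the sketch above (illustrative only): counting how often c occurs in p's conditional pattern base gives the support of {p, c} in the whole database.

```python
p_base = conditional_pattern_base("p", header)
print(p_base)                 # [(('c', 'f', 'a', 'm'), 2), (('c', 'b'), 1)]
sup_c = sum(cnt for path, cnt in p_base if "c" in path)
print(sup_c)                  # 3 >= min_sup, so {p, c} is frequent in DB with support 3
```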

Slide 15: Single-Path Tree
- Let P be a single-path FP-tree.
- Let {I1, I2, ..., Ik} be an itemset in the tree, and let Ij have the lowest support.
- Then support({I1, I2, ..., Ik}) = support(Ij).
- Example: [Figure: the FP-tree from slide 9.]

Slide 16: FP-growth Algorithm (Fig. 6.10)
- Recursive algorithm.
- Input: a transaction database and min_sup. Output: the complete set of frequent patterns.
- 1. FP-tree construction.
- 2. Mine the FP-tree by calling FP_growth(FP_tree, null).
- Key idea: handle single-path and multi-path FP-trees separately; keep splitting until a single-path FP-tree is obtained.

Slide 17: FP_growth(Tree, α)
- If Tree contains a single path P, then for each combination β of the nodes in path P, generate pattern β ∪ α with support = the minimum support of the nodes in β.
- Else, for each item a_i in the header of Tree, do:
  - generate pattern β = a_i ∪ α with support = a_i.support;
  - construct (1) β's conditional pattern base and (2) β's conditional FP-tree Tree_β;
  - if Tree_β is not empty, call FP_growth(Tree_β, β).
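
To make the recursion concrete, here is a compact pattern-growth sketch in Python. It mines directly from lists of prefix paths (conditional pattern bases) instead of building pointer-based conditional FP-trees, so it is a simplification of FP_growth above rather than a faithful reimplementation; all function names are illustrative.

```python
from collections import defaultdict

def pattern_growth(base, suffix, min_sup, results):
    """Mine all frequent itemsets that extend `suffix`.

    base: list of (prefix_path, count) pairs, e.g. [(('c', 'f', 'a'), 2), ...]
    """
    counts = defaultdict(int)
    for path, cnt in base:
        for item in path:
            counts[item] += cnt
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        pattern = suffix | {item}
        results[frozenset(pattern)] = sup
        # Conditional pattern base of the grown pattern: the part of each path
        # that precedes `item`, i.e. the more frequent items above it.
        new_base = [(path[:path.index(item)], cnt)
                    for path, cnt in base if item in path and path.index(item) > 0]
        if new_base:
            pattern_growth(new_base, pattern, min_sup, results)

def mine_frequent(transactions, min_sup):
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    rank = {i: r for r, i in enumerate(
        sorted((i for i, c in counts.items() if c >= min_sup),
               key=lambda i: (-counts[i], i)))}
    # Each transaction, restricted to its frequent items in rank order, acts as
    # a prefix path with count 1: the conditional pattern base of the empty suffix.
    base = [(tuple(sorted((i for i in set(t) if i in rank), key=rank.get)), 1)
            for t in transactions]
    results = {}
    pattern_growth(base, frozenset(), min_sup, results)
    return results

db = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
patterns = mine_frequent(db, min_sup=3)
print(patterns[frozenset("cp")])       # 3, the {p, c} pattern from the previous slide
print(patterns[frozenset("fcam")])     # 3, the longest frequent itemset in this example
```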

Slide 18: FP-Growth vs. Apriori: Scalability with the Support Threshold
- [Figure: comparison on data set T25I20D10K; details not in the extracted text.]

Slide 19: FP-Growth vs. Tree-Projection: Scalability with the Support Threshold
- [Figure: comparison on data set T25I20D100K; details not in the extracted text.]

Slide 20: Why Is FP-Growth the Winner?
- Divide and conquer: it decomposes both the mining task and the database according to the frequent patterns obtained so far, which leads to focused searches of smaller databases.
- Other factors:
  - no candidate generation, no candidate test;
  - compressed database: the FP-tree structure;
  - no repeated scan of the entire database;
  - basic operations are counting and FP-tree building, not pattern search and matching.

Slide 21: Implications of the Methodology: Papers by Han et al.
- Mining closed frequent itemsets and max-patterns: CLOSET (DMKD'00).
- Mining sequential patterns: FreeSpan (KDD'00), PrefixSpan (ICDE'01).
- Constraint-based mining of frequent patterns: convertible constraints (KDD'00, ICDE'01).
- Computing iceberg data cubes with complex measures: H-tree and H-cubing algorithm (SIGMOD'01).

Slide 22: Visualization of Association Rules: Plane Graph
- [Figure only; not in the extracted text.]

Slide 23: Visualization of Association Rules: Rule Graph
- [Figure only; not in the extracted text.]

Slide 24: Mining Various Kinds of Rules or Regularities
- Multi-level and quantitative association rules, correlation and causality, ratio rules, sequential patterns, emerging patterns, temporal associations, partial periodicity.
- Classification, clustering, iceberg cubes, etc.

Slide 25: Multiple-Level Association Rules
- Items often form a hierarchy.
- Flexible support settings: items at lower levels are expected to have lower support.
- The transaction database can be encoded based on dimensions and levels.
- Explore shared multi-level mining.

Slide 26: Quantitative Association Rules
- Example rule: age(X, "34-35") ∧ income(X, "30K-50K") ⇒ buys(X, "high resolution TV").
- Numeric attributes are dynamically discretized such that the confidence or compactness of the rules mined is maximized.
- 2-D quantitative association rules: A_quan1 ∧ A_quan2 ⇒ A_cat.
- Cluster "adjacent" association rules to form general rules using a 2-D grid.
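
As a toy illustration of the discretization step (not from the lecture; the attribute names, bin widths, and records are made up), numeric attributes can be mapped onto a 2-D grid of intervals and each cell examined as a candidate rule antecedent:

```python
from collections import Counter

# Toy records: (age, income, buys high-resolution TV?) -- illustrative data only.
people = [(34, 42_000, True), (35, 48_000, True), (34, 31_000, True), (52, 75_000, False)]

def interval(value, width):
    lo = (value // width) * width
    return (lo, lo + width)

support, hits = Counter(), Counter()
for age, income, buys in people:
    cell = (interval(age, 2), interval(income, 10_000))   # one cell of the 2-D grid
    support[cell] += 1
    hits[cell] += buys

for cell, n in support.items():
    # Candidate rule: age(X, cell[0]) AND income(X, cell[1]) => buys(X, "HDTV")
    print(cell, "support:", n, "confidence:", hits[cell] / n)
```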

Slide 27: Redundant Rules [SA95]
- Which rule is redundant?
  - milk ⇒ wheat bread, support = 8%, confidence = 70%
  - "skim milk" ⇒ wheat bread, support = 2%, confidence = 72%
- The first rule is more general than the second rule.
- A rule is redundant if its support is close to the "expected" value based on a more general rule, and its confidence is close to that of the general rule.

Slide 28: Incremental Mining [CHNW96]
- The rules in DB have been found, and a set of new tuples db is added to DB. Task: find the new rules in DB + db. Usually, DB is much larger than db.
- Properties of itemsets:
  - frequent in DB + db if frequent in both DB and db;
  - infrequent in DB + db if infrequent in both DB and db;
  - frequent only in DB: merge its DB count with its count in db; no DB scan is needed!
  - frequent only in db: scan DB once to update its itemset count.
- The same principle is applicable to distributed/parallel mining.
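
A small sketch of the count-merging case (itemsets already frequent in DB), assuming their DB counts were retained; the helper names are illustrative, and the "frequent only in db" case, which needs one extra scan of DB, is only noted in a comment.

```python
def count_in(itemsets, transactions):
    """Count each candidate itemset in a (small) list of transactions."""
    return {s: sum(1 for t in transactions if s <= set(t)) for s in itemsets}

def update_frequent(freq_in_DB, size_DB, db, min_sup_ratio):
    """freq_in_DB: {frozenset: count in DB} for itemsets already frequent in DB.

    Their new counts only require scanning the small increment db. Itemsets
    frequent only in db would additionally need one scan of DB (not shown).
    """
    inc = count_in(freq_in_DB, db)
    total = size_DB + len(db)
    return {s: c + inc[s] for s, c in freq_in_DB.items()
            if c + inc[s] >= min_sup_ratio * total}

# Illustrative use: {milk, bread} had count 40 in a 100-transaction DB.
old = {frozenset({"milk", "bread"}): 40}
db_increment = [{"milk", "bread", "eggs"}, {"beer"}, {"milk", "bread"}]
print(update_frequent(old, 100, db_increment, min_sup_ratio=0.3))   # count becomes 42
```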

Slide 29: Correlation Rules
- Association does not measure correlation [BMS97, AY98].
- Among 5000 students: 3000 play basketball, 3750 eat cereal, 2000 do both.
- play basketball ⇒ eat cereal [support = 40%, confidence = 66.7%]
- The conclusion "basketball and cereal are correlated" is misleading, because the overall percentage of students eating cereal is 75%, higher than 66.7%.
- Confidence does not always give the correct picture!

Slide 30: Correlation Rules
- P(AB) = P(A) × P(B) if A and B are independent events.
- If the ratio P(AB) / (P(A) × P(B)) is less than 1, A and B are negatively correlated; otherwise, A and B are positively correlated.
- P(B|A) / P(B) is known as the lift of the rule A ⇒ B. If it is less than one, then A and B are negatively correlated.
- Basketball ⇒ Cereal: 2000 / (3000 × 3750 / 5000) = 2000 × 5000 / (3000 × 3750) ≈ 0.89 < 1.

Slide 31: Chi-Square Correlation [BMS97]
- The cutoff value at the 95% significance level is 3.84, which is larger than the computed chi-square statistic of 0.9 (from the contingency table shown on the slide, not reproduced in the extracted text).
- Thus, we do not reject the independence assumption.
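
The lift computation for the basketball/cereal example can be verified directly; the chi-square helper below is a generic 2×2 calculation (the 0.9 statistic quoted on the chi-square slide comes from a contingency table shown only as a figure, so its specific counts are not reproduced here).

```python
# Lift for the basketball/cereal example on the slides.
n, basketball, cereal, both = 5000, 3000, 3750, 2000
lift = (both / n) / ((basketball / n) * (cereal / n))
print(lift)   # ~0.89 < 1: playing basketball and eating cereal are negatively correlated

def chi_square_2x2(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 contingency table (observed vs. expected
    under independence); compare against 3.84, the 95% cutoff for 1 degree of freedom."""
    n = n11 + n10 + n01 + n00
    row1, col1 = n11 + n10, n11 + n01
    chi2 = 0.0
    for obs, r, c in [(n11, row1, col1), (n10, row1, n - col1),
                      (n01, n - row1, col1), (n00, n - row1, n - col1)]:
        exp = r * c / n
        chi2 += (obs - exp) ** 2 / exp
    return chi2
```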

Slide 32: Constraint-Based Data Mining
- Finding all the patterns in a database autonomously? Unrealistic! The patterns could be too many and not focused.
- Data mining should be an interactive process: the user directs what is to be mined using a data mining query language (or a graphical user interface).
- Constraint-based mining:
  - user flexibility: the user provides constraints on what is to be mined;
  - system optimization: the system exploits such constraints for efficient mining (constraint-based mining).

Slide 33: Constraints in Data Mining
- Knowledge type constraint: classification, association, etc.
- Data constraint, using SQL-like queries: e.g., find product pairs sold together in stores in Vancouver in Dec. '00.
- Dimension/level constraint: in relevance to region, price, brand, customer category.
- Rule (or pattern) constraint: e.g., small sales (price < $10) triggers big sales (sum > $200).
- Interestingness constraint: strong rules, e.g., min_support ≥ 3%, min_confidence ≥ 60%.

Slide 34: Constrained Mining vs. Constraint-Based Search
- Constrained mining vs. constraint-based search/reasoning:
  - both aim at reducing the search space;
  - finding all patterns satisfying the constraints vs. finding some (or one) answer in constraint-based search in AI;
  - constraint pushing vs. heuristic search;
  - how to integrate them is an interesting research problem.
- Constrained mining vs. query processing in DBMS:
  - database query processing requires finding all answers;
  - constrained pattern mining shares a similar philosophy with pushing selections deeply into query processing.

Slide 35: Constrained Frequent Pattern Mining: A Mining Query Optimization Problem
- Given a frequent-pattern mining query with a set of constraints C, the algorithm should be:
  - sound: it only finds frequent sets that satisfy the given constraints C;
  - complete: all frequent sets satisfying the given constraints C are found.
- A naive solution: first find all frequent sets, and then test them for constraint satisfaction.
- More efficient approaches: analyze the properties of the constraints comprehensively and push them as deeply as possible inside the frequent-pattern computation.

Slide 36: Anti-Monotonicity in Constraint-Based Mining
- Anti-monotonicity: if an itemset S satisfies the constraint, so does any of its subsets (equivalently, if S violates the constraint, so does every superset of S).
- sum(S.Price) ≤ v is anti-monotone; sum(S.Price) ≥ v is not anti-monotone.
- Example: C: range(S.profit) ≤ 15 is anti-monotone. Itemset ab violates C, and so does every superset of ab.
- [Table: the transaction database TDB with item profits (min_sup = 2); not in the extracted text.]

Slide 37: Which Constraints Are Anti-Monotone?
- [Table of constraints and whether each is anti-monotone; not in the extracted text.]

Slide 38: Monotonicity in Constraint-Based Mining
- Monotonicity: when an itemset S satisfies the constraint, so does any of its supersets.
- sum(S.Price) ≥ v is monotone; min(S.Price) ≤ v is monotone.
- Example: C: range(S.profit) ≥ 15. Itemset ab satisfies C, and so does every superset of ab.
- [Table: TDB (min_sup = 2); not in the extracted text.]

Slide 39: Which Constraints Are Monotone?
- [Table of constraints and whether each is monotone; not in the extracted text.]

Slide 40: Succinct, Convertible, and Inconvertible Constraints (in the Book)
- We will not consider these in this course.
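
To see how an anti-monotone constraint can be pushed into the search, here is a small illustrative sketch (the prices, the budget, and the function names are made up, and support counting is omitted, so only the constraint-pruning side is shown): once an itemset violates sum(S.price) ≤ budget, no superset of it is ever generated.

```python
from itertools import combinations

price = {"a": 4, "b": 7, "c": 3, "d": 9}

def within_budget(itemset, budget):
    # The anti-monotone constraint sum(S.price) <= budget.
    return sum(price[i] for i in itemset) <= budget

def enumerate_with_pruning(items, budget=10):
    level = [frozenset([i]) for i in items if within_budget([i], budget)]
    survivors = list(level)
    while level:
        next_level = set()
        for a, b in combinations(level, 2):
            cand = a | b
            # Join two k-itemsets into a (k+1)-candidate; prune it immediately
            # if it violates the constraint, so none of its supersets are built.
            if len(cand) == len(a) + 1 and within_budget(cand, budget):
                next_level.add(cand)
        level = list(next_level)
        survivors += level
    return survivors

print(enumerate_with_pruning(list(price)))   # a, b, c, d, {a,c}, {b,c}; {a,b} etc. pruned
```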

Slide 41: Associative Classification
- Mine possible association rules of the form itemset ⇒ class.
  - Itemset: a set of attribute-value pairs.
  - Class: a class label.
- Build a classifier: organize the rules in decreasing precedence based on confidence and then support.
- B. Liu, W. Hsu & Y. Ma. Integrating classification and association rule mining. In KDD'98.
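
A minimal sketch of the classifier-building step, in the spirit of Liu, Hsu & Ma's approach but with made-up rules and values: class-association rules are ordered by decreasing confidence, then support, and the first matching rule labels a new example.

```python
# Each rule: (itemset of attribute-value pairs, class label, confidence, support).
# The rules and numbers below are illustrative, not mined from real data.
rules = [
    ({"income=high", "student=no"}, "buys=no",  0.80, 0.10),
    ({"age<=30", "student=yes"},    "buys=yes", 0.75, 0.15),
    ({"age<=30"},                   "buys=no",  0.55, 0.30),
]
rules.sort(key=lambda r: (r[2], r[3]), reverse=True)   # decreasing precedence

def classify(example, default="buys=yes"):
    for itemset, label, conf, sup in rules:
        if itemset <= example:        # first rule whose itemset the example contains
            return label
    return default

print(classify({"age<=30", "student=yes", "income=low"}))   # buys=yes
```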

Slide 42: Classification by Aggregating Emerging Patterns
- Emerging pattern (EP): a pattern frequent in one class of data but infrequent in the others.
- Example: age ≤ 30 is frequent in the class "buys_computer = yes" and infrequent in the class "buys_computer = no".
- Rule: age ≤ 30 ⇒ buys_computer.
- G. Dong & J. Li. Efficient mining of emerging patterns: discovering trends and differences. In KDD'99.
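
A small illustration of the emerging-pattern notion (toy data, not from Dong & Li's paper): the growth rate of a pattern is its support in one class divided by its support in the other, and patterns with a large growth rate are the emerging patterns used for classification.

```python
def support(pattern, transactions):
    """Fraction of transactions in one class that contain the pattern."""
    return sum(1 for t in transactions if pattern <= t) / max(len(transactions), 1)

# Illustrative class-partitioned data.
buys_yes = [{"age<=30", "student=yes"}, {"age<=30", "income=medium"}, {"age<=30"}]
buys_no  = [{"age>40"}, {"age>40", "student=no"}, {"age<=30", "credit=fair"}]

p = {"age<=30"}
growth_rate = support(p, buys_yes) / max(support(p, buys_no), 1e-9)
print(growth_rate)   # 3.0: frequent in "buys_computer=yes", infrequent in "buys_computer=no"
```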
