

Berendt: Advanced databases, winter term 2007/08, http://www.cs.kuleuven.be/berendt/teaching/2007w/adb/

Advanced databases
Inferring new knowledge from data(bases): Knowledge Discovery in Databases
Bettina Berendt
Katholieke Universiteit Leuven, Department of Computer Science
http://www.cs.kuleuven.be/berendt/teaching/2007w/adb/
Last update: 15 November 2007

Agenda
- Motivation: Application examples
- The process of knowledge discovery
- Origins and context
- Major issues in knowledge discovery
- A short overview of key techniques

What is the impact of genetically modified organisms?

Is our school system good for immigrants and/or children from poor backgrounds?

What are the effects of teaching in English at universities?

What makes people happy?

What do men and women like?

Is this a man or a woman?
[Figure not recoverable; surviving label: "clicked on"]

Primary Tasks of Data Mining
- Classification: finding the description of several predefined classes and classifying a data item into one of them
- Regression: mapping a data item to a real-valued prediction variable
- Clustering: identifying a finite set of categories or clusters to describe the data
- Summarization: finding a compact description for a subset of data
- Dependency modeling: finding a model that describes significant dependencies between variables
- Deviation and change detection: discovering the most significant changes in the data

"Data mining" and "knowledge discovery"
- Informal definition: data mining is about discovering knowledge in (huge amounts of) data
- Therefore, it is clearer to speak about "knowledge discovery in data(bases)"

Recall: Data, information, and knowledge
- Data represents a fact or statement of event without relation to other things.
  - Ex: It is raining.
- Information embodies the understanding of a relationship of some sort, possibly cause and effect.
  - Ex: The temperature dropped 15 degrees and then it started raining.
- Knowledge represents a pattern that connects and generally provides a high level of predictability as to what is described or what will happen next.
  - Ex: If the humidity is very high and the temperature drops substantially, the atmosphere is often unlikely to be able to hold the moisture, so it rains.
(This is from knowledge-management theory. If you want to know about wisdom, check the Web page: G. Bellinger, D. Castro, & A. Mills: Data, Information, Knowledge, and Wisdom. http://www.systems-thinking.org/dikw/dikw.htm)

Why Data Mining?
- The explosive growth of data: from terabytes to petabytes
  - Data collection and data availability
    - Automated data collection tools, database systems, Web, computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks, …
    - Science: remote sensing, bioinformatics, scientific simulation, …
    - Society and everyone: news, digital cameras, …
- We are drowning in data, but starving for knowledge!
- "Necessity is the mother of invention": data mining, the automated analysis of massive data sets

Background: Evolution of Database Technology
- 1960s:
  - Data collection, database creation, IMS and network DBMS
- 1970s:
  - Relational data model, relational DBMS implementation
- 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  - Application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
- 2000s:
  - Stream data management and mining
  - Data mining and its applications
  - Web technology (XML, data integration) and global information systems

The KDD process
- "The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." (Fayyad, Piatetsky-Shapiro, Smyth, 1996)
  - non-trivial process: a multi-step process
  - valid: justified patterns/models
  - novel: previously unknown
  - useful: can be used
  - understandable: by humans and machines
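Two of the primary tasks of data mining listed above, classification and regression, differ mainly in what is predicted: one of several predefined classes versus a real value. The following is a minimal sketch assuming toy numeric data. The function names `classify_1nn`, `fit_line`, and `predict` are hypothetical, and the two techniques used here (1-nearest-neighbour classification and least-squares line fitting) are only two of many possible choices; the slides do not name them.

```python
from math import dist


def classify_1nn(train, x):
    """Classification: return the predefined class of the training example nearest to x."""
    # Each training example is a (feature_vector, class_label) pair.
    _, label = min(train, key=lambda example: dist(example[0], x))
    return label


def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx  # slope; requires at least two distinct x values
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, x):
    """Map a data item x to a real-valued prediction using the fitted line."""
    return a + b * x
```

For example, with `train = [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]` the point `(1.5, 0.5)` is assigned class `"A"`, and `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns intercept 1.0 and slope 2.0.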

The process part of knowledge discovery
- CRISP-DM (CRoss Industry Standard Process for Data Mining): a data mining process model that describes commonly used approaches that expert data miners use to tackle problems

Knowledge discovery, machine learning, data mining
- Knowledge discovery = the whole process
- Machine learning = "the application of induction algorithms and other algorithms that can be said to 'learn'" = the "modeling" phase
- Data mining: sometimes = KD, sometimes = ML

The KDD Process
[Figure: KDD process diagram with stages numbered 1 to 5. Recoverable step labels: Data organized by function; Create/select target database; Select sampling technique and sample data; Supply missing values; Normalize values; Select DM task(s); Transform to different representation; Eliminate noisy data; Transform values; Select DM method(s); Create derived attributes; Extract knowledge; Find important attributes & value ranges; Test knowledge; Refine knowledge; Query & report generation; Aggregation & sequences; Advanced methods; Data warehousing]
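Three of the preprocessing steps named in the KDD process figure (eliminate noisy data, supply missing values, normalize values) can be sketched for a single numeric attribute. This is a minimal sketch under assumptions the slides do not state: "noise" is taken to mean values outside a hypothetical plausible range, missing values are filled with the attribute mean, and normalization is min-max scaling to [0, 1]. The function names are hypothetical.

```python
from statistics import mean


def mark_noise(values, lower, upper):
    """Eliminate noisy data: treat readings outside [lower, upper] as missing (None)."""
    return [v if v is not None and lower <= v <= upper else None for v in values]


def supply_missing(values):
    """Supply missing values: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def normalize(values):
    """Normalize values: min-max scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]
```

With hypothetical temperature readings `[12.0, None, 15.0, 999.0, 18.0]` and a plausible range of -50 to 60, the reading 999.0 is marked missing, both gaps are filled with the mean 15.0, and normalization yields `[0.0, 0.5, 0.5, 0.5, 1.0]`. Marking noise as missing, rather than deleting it, keeps the attribute aligned with the other attributes of the same records.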

Main Contributing Areas of KDD
- Databases: store, access, search, update data (deduction)
- Statistics: infer info from data (deduction & induction, mainly numeric data)
- Machine learning: computer algorithms that improve automatically through experience (mainly induction, symbolic data)
- Data warehouses: integrated data
- OLAP: On-Line Analytical Processing

Data Mining: Classification Schemes
- General functionality
  - Descriptive data mining
  - Predictive data mining
- Different views lead to different classifications
  - Data view: kinds of data to be mined
  - Knowledge view: kinds of knowledge to be discovered
  - Method view: kinds of techniques utilized
  - Application view: kinds of applications adapted
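OLAP, listed among the main contributing areas of KDD, summarizes integrated warehouse data along dimensions. Its basic roll-up operation (aggregating a measure over the dimensions you no longer want to see) can be sketched in plain Python. This is a minimal illustration, not an OLAP engine: the fact table, its dimensions, and the function `roll_up` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table from a data warehouse: (region, year, product, sales).
DIMENSIONS = ("region", "year", "product")
FACTS = [
    ("North", 2006, "A", 10),
    ("North", 2007, "A", 12),
    ("North", 2007, "B", 5),
    ("South", 2006, "A", 7),
    ("South", 2007, "B", 9),
]


def roll_up(facts, keep):
    """Sum the sales measure, grouped by the dimensions named in `keep`.

    Every dimension not listed in `keep` is aggregated away, which is
    what a roll-up to a coarser view of the data cube does.
    """
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the measure is the last column
    return dict(totals)
```

For example, `roll_up(FACTS, ["region"])` gives `{("North",): 27, ("South",): 16}`, and `roll_up(FACTS, [])` collapses everything to the grand total `{(): 43}`.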

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the centre, linked to Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithm, Visualization, and Other Disciplines]

Why Not Traditional Data Analysis?
- Tremendous amount of data
  - Algorithms must be highly scalable to handle data volumes such as terabytes
- High dimensionality of data
  - Micro-array data may have tens of thousands of dimensions
- High complexity of data
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data
  - Structured data, graphs, social networks and multi-linked data
  - Heterogeneous databases and legacy databases
  - Spatial, spatiotemporal, multimedia, text and Web data
  - Software programs, scientific simulations
- New and sophisticated applications
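The scalability requirement above, together with data streams and sensor data, often means an algorithm may read each record only once and cannot hold the whole data set in memory. One classic illustration of this, not taken from the slides, is Welford's one-pass algorithm for the running mean and variance. The class name `RunningStats` is hypothetical; this is a minimal sketch, not a general streaming framework.

```python
import math


class RunningStats:
    """One-pass mean and sample variance (Welford's algorithm).

    Each value is seen exactly once and only three numbers are stored,
    so memory use does not grow with the length of the stream.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Incorporate one new value from the stream."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    def variance(self):
        """Sample variance (n - 1 denominator); NaN for fewer than two values."""
        if self.n < 2:
            return math.nan
        return self._m2 / (self.n - 1)
```

Feeding in the values 2, 4, 4, 4, 5, 5, 7, 9 one at a time gives a mean of 5 and a sample variance of 32/7 (about 4.57), matching a two-pass computation up to floating-point rounding.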

Data Mining: On What Kinds of Data?
- Database-oriented data sets and applications
  - Relational database, data warehouse, transactional database
- Advanced data sets and advanced applications
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data (incl. bio-sequences)
  - Structured data, graphs, social networks and multi-linked data
  - Object-relational databases
  - Heterogeneous databases and legacy databases
  - Spatial data and spatiotemporal data
  - Multimedia databases
  - Text databases
  - The World-Wide Web

Data Mining Functionalities
- Multidimensional concept description: characterization and discrimination
  - Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
- Frequent patterns, association, correlation vs. causality
  - Diaper → Beer [support 0.5%, confidence 75%] (Correlation or causality?)
- Classification and prediction
  - Construct models (functions) that describe and distinguish classes or concepts for future prediction
    - E.g., classify countries based on (climate), or classify cars based on (gas mileage)
  - Predict some unknown or missing numerical values

Data Mining Functionalities (2)
- Cluster analysis
  - Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
  - Maximizing intra-class similarity & minimizing inter-class similarity
- Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - Noise or exception? Useful in fraud detection, rare events analysis
- Trend and evolution analysis
  - Trend and deviation: e.g., regression analysis
  - Sequential pattern mining: e.g., digital camera → large SD memory
  - Periodicity analysis
  - Similarity-based analysis
- Other pattern-directed or statistical analyses
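The association rule on the functionalities slide, Diaper → Beer [0.5%, 75%], is conventionally read as support 0.5% (the share of all transactions that contain both items) and confidence 75% (the share of diaper transactions that also contain beer). Both measures can be sketched as below, assuming transactions are given as sets of item names. The basket data is hypothetical and does not reproduce the slide's figures; the function names are likewise hypothetical.

```python
# Hypothetical transactional database: one set of purchased items per basket.
BASKETS = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"diaper", "beer", "bread"},
    {"milk", "bread"},
]


def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, itemset):
    """Fraction of all transactions that contain the whole itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)
```

On this toy data, Diaper → Beer has support 3/5 = 0.6 and confidence 3/4 = 0.75. A high confidence only describes co-occurrence; it says nothing about causality, which is exactly the question the slide raises.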

Are All the "Discovered" Patterns Interesting?
- Data mining may generate thousands of patterns: not all of them are interesting
  - Suggested approach: human-centered, query-based, focused mining
- Interestingness measures
  - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on users' beliefs in the data, e.g., unexpectedness, novelty, actionability, etc.

Find All and Only Interesting Patterns?
- Find all the interesting patterns: completeness
  - Can a data mining system find all the interesting patterns? Do we need to find all of the interesting patterns?
  - Heuristic vs. exhaustive search
  - Association vs. classification vs. clustering
- Search for only interesting patterns: an optimization problem
  - Can a data mining system find only the interesting patterns?
  - Approaches
    - First generate all the patterns and then filter out the uninteresting ones
    - Generate only the interesting patterns: mining query optimization

Other Pattern Mining Issues
- Precise patterns vs. approximate patterns
  - Association and correlation mining: it is possible to find sets of precise patterns
    - But approximate patterns can be more compact and sufficient
    - How to find high-quality approximate patterns?
  - Gene sequence mining: approximate patterns are inherent
    - How to derive efficient approximate pattern mining algorithms?
- Constrained vs. non-constrained patterns
  - Why constraint-based mining?
  - What are the possible kinds of constraints? How to push constraints into the mining process?
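The two approaches on the "Find All and Only Interesting Patterns?" slide can be contrasted for one objective interestingness measure, a minimum support count. The brute-force function first generates every itemset and then filters. The level-wise function follows the well-known Apriori idea (not named on the slides): because every superset of an infrequent itemset is also infrequent, the support constraint can be pushed into the search, so only candidates built from frequent itemsets are ever counted. This is a minimal, unoptimized sketch; the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets_bruteforce(transactions, min_count):
    """Generate every itemset, then filter: correct but exponential in the number of items."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted({i for t in baskets for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            n = sum(1 for t in baskets if candidate <= t)
            if n >= min_count:
                result[candidate] = n
    return result


def frequent_itemsets_levelwise(transactions, min_count):
    """Apriori-style search: only extend itemsets that already meet min_count."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted({i for t in baskets for i in t})
    current = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while current:
        # Count the support of each size-k candidate in one pass over the data.
        counts = {c: sum(1 for t in baskets if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (it cannot be frequent).
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result
```

Both functions return the same dictionary from itemsets to support counts; the level-wise version simply never counts candidates that the constraint has already ruled out, which is one concrete answer to the slide's question of how to push constraints into the mining process.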
