Introduction to Stanford DB Group Research.ppt

Slide 1: Introduction to Stanford DB Group Research
Li Ruixuan
http:/

Slide 2: Contents
- Introduction
- Past projects
- Current projects
- Events
- References
- Links

Slide 3: The Stanford Database Group
- "Mainstream" faculty
  - Hector Garcia-Molina
  - Jennifer Widom
  - Jeff Ullman
  - Gio Wiederhold
- "Adjunct" faculty
  - Chris Manning (natural language processing)
  - Rajeev Motwani (theory)
  - Terry Winograd (human-computer interaction)
- A.k.a. Stanford InfoLab

Slide 4: Database Group (contd)
- Approximately 25 Ph.D. students
- Varying numbers of M.S. and undergraduate students
- Handful of visitors
- One senior research associate
- One systems administrator, one programmer
- Excellent administrative staff
- Resident photographer

Slide 5: Research Areas (very coarse)
- Digital libraries
- Peer-to-peer systems
- Data streams
- Replication, caching, archiving, broadcast
- The Web
- Ontologies, semantic Web
- Data mining
- Miscellaneous

Slide 6: Past Projects
- LIC: Large-Scale Interoperation and Composition (1999) - mediator (SKC, OntoWeb, CHAIMS, SimQL, image DB)
- SKC: Scalable Knowledge Composition (2000) - semantic heterogeneity
- TID: Trusted Image Distribution (2001) - Image Filtering for Secure Distribution of Medical Information
- Image Database: Content-based Image Retrieval (2003)
- SimQL: Simulation Access Language (2001) - Software modules in manufacturing, acquisition, and planning systems

Slide 7: Past Projects (contd)
- TSIMMIS: Wrapping and mediation for heterogeneous information sources (1998)
- Lore: A Database Management System for XML (2000)
- WHIPS: WareHouse Information Prototype at Stanford (1998) - Data warehouse creation and maintenance
- MIDAS: Mining Data at Stanford (1999)
- WSQ: Web-Supported Queries (2000) - Integrating database queries and Web searches

Slide 8: Current Projects
- WebBase: Crawling, storage, indexing, and querying of large collections of Web pages (Garcia-Molina)
- STREAM: A Database Management System for Data Streams (Widom)
- Peers: Building primitives for peer-to-peer systems (Garcia-Molina)
- Digital Libraries: Interoperating on-line services for end-user support (TID, WebBase, OntoAgents) (Garcia-Molina)
- TRAPP: Approximate data caching: trading precision for performance (Widom)
- CHAIMS: Compiling High-level Access Interfaces for Multi-site Software (1999) (Wiederhold)
- OntoAgents: Ontology based Infrastructure for Agents (2002) (Wiederhold)

Slide 9: WebBase: Objectives
- Provide a storage infrastructure for Web-like content
- Store a sizeable portion of the Web
- Enable researchers to easily build indexes of page features across large sets of pages
- Distribute WebBase content via multicast channels
- Support structure and content-based querying over the stored collection

Slide 10: WebBase: Architecture
(figure)

Slide 11: WebBase: Current Status
- Efficient "smart" crawler
  - Parallelism
  - Freshness & Relevance
- Efficient and scalable indexing
  - Distributed Web-scale content indexes
  - Indexes over graph structure
- Unicast dissemination
  - Within Stanford
  - External clients: Columbia, U.Wash, U.C.Berkeley

Slide 12: WebBase: In Progress
- WebBase Infrastructure
  - Multicast dissemination
  - Complex queries
- Other work
  - PageRank extensions
  - Clustering and similarity search
  - Structured data extraction
  - Hidden Web crawling

Slide 13: Data Streams: Motivation
- Traditional DBMS - data stored in finite, persistent data sets
- New applications - data as multiple, continuous, rapid, time-varying data streams
  - Network monitoring and traffic engineering
  - Security applications
  - Telecom call records
  - Financial applications
  - Web logs and click-streams
  - Sensor networks
  - Manufacturing processes

Slide 14: STREAM: Architecture
(figure)

Slide 15: STREAM: Challenges
- Multiple, continuous, rapid, time-varying streams of data
- Queries may be continuous (not just one-time)
  - Evaluated continuously as stream data arrives
  - Answer updated over time
- Queries may be complex
  - Beyond element-at-a-time processing
  - Beyond stream-at-a-time processing

Slide 16: DBMS versus DSMS
- DBMS
  - Persistent relations
  - One-time queries
  - Random access
  - Access plan determined by query processor and physical DB design
  - "Unbounded" disk store
- DSMS
  - Transient streams (and persistent relations)
  - Continuous queries
  - Sequential access
  - Unpredictable data arrival and characteristics
  - Bounded main memory

Slide 17: STREAM: Current Status
- Data streams and stored relations
- Declarative language for registering continuous queries
- Flexible query plans
- Designed to cope with high data rates and query workloads
  - Graceful approximation when needed
  - Careful resource allocation and usage
- Relational, centralized (for now)

Slide 18: STREAM: Ongoing Work
- Algebra for streams
- Semantics for continuous queries
- Synopses and algorithmic issues
- Memory management issues
- Exploiting constraints on streams
- Approximation in query processing
- Distributed stream processing
- System development

Slide 19: STREAM: Related Work
- Amazon/Cougar (Cornell) - sensors
- Aurora (Brown/MIT) - sensor monitoring, dataflow
- Hancock (AT&T) - telecom streams
- Niagara (OGI/Wisconsin) - Internet XML databases
- OpenCQ (Georgia) - triggers, incremental view maintenance
- Stream (Stanford) - general-purpose DSMS
- Tapestry (Xerox) - pub/sub content-based filtering
- Telegraph (Berkeley) - adaptive engine for sensors
- Tribeca (Bellcore) - network monitoring

Slide 20: Peer-To-Peer Systems
- Multiple sites (at edge)
- Distributed resources
- Sites are autonomous (different owners)
- Sites are both clients and servers
- Sites have equal functionality

Slide 21: P2P Benefits
- Pooling available (inexpensive) resources
- High availability and fault-tolerance
- Self-organization

Slide 22: P2P Challenges
- Search
  - Query Expressiveness
  - Comprehensiveness
  - Topology
  - Data Placement
  - Message Routing
- Resource Management
  - Fairness
  - Load balancing
- Security & Privacy
  - Anonymity
  - Reputation
  - Accountability
  - Information Preservation
  - Information Quality
  - Trust
  - Denial of service attacks

Slide 23: Peers: Stanford Research
- New Architectures
- Performance Modeling and Optimization
- Security and Trust
- Distributed Resource Management
- Applications

Slide 24: Digital Library Project: Overview
(figure)

Slide 25: DigLib Projects: DLI1, DLI2
- Resource Discovery
- Retrieving Information
- Interpreting Information
- Managing Information
- Sharing Information

Slide 26: DigLib: Resource Discovery
- Geographic Views (Tools to assist you in more systematically locating different types of information from a large and diverse number of information sources)

Slide 27: DigLib: Retrieving Information
- Information Tiling
- PalmPilot Infrastructure (PDA)
- Power Browsing (PDA)
- Query Translator
- SDLIP (Simple Digital Library Interoperability Protocol)
- Value Filtering
- WebBase

Slide 28: DigLib: Interpreting Information
- Murals (Tools to help a user interpret and organize search results)
- Web Clustering

Slide 29: DigLib: Managing Information
- Archival Repositories
- Archiving Movie
- InterBib (a tool for maintaining bibliographic information)
- Medical Transport Info
- PhotoBrowser

Slide 30: DigLib: Sharing Information
- Diet ORB (PDA, based on MICO)
- Digital Wallets
- Mobile Info Delivery
- Mobile Security
- Multicasting

Slide 31: DLI1 Projects (95-99)
- AHA
- ComMentor
- DLITE
- Google
- GLOSS
- FAB
- Grassroots
- Metadata Architecture
- RManage/FIRM
- SenseMaker
- SCAM
- Shopping Models, U-PAI
- SONIA
- STARTS
- WebWriter

Slide 32: TRAPP: Overview
- TRAPP: Tradeoff in Replication Precision and Performance
- A.k.a. Approximate Data Caching
- Project goal: investigating techniques to permit controlled and explicit relaxation of data precision in exchange for improved performance

Slide 33: TRAPP: Motivation
- Transactional consistency too expensive
- Even nontransactional propagation of every update still too expensive in many cases
- Solution: Approximate Caching
  - Exploit the fact that many applications do not require exact consistency
  - Avoid propagating insignificant updates
  - Trade cache precision for network load

Slide 34: Example: TRAPP Over Numeric Data
- Caches store intervals that bound the exact source values
- Sources refresh when value leaves interval
- Query answers are intervals
- Precision constraints specify maximum width

Slide 35: Example (contd): Querying in TRAPP
- For one-time aggregation queries:
  - Answers computed by combining approximate cached data and exact source data
- At query time:
  - Find low-cost subset of sources to probe so final answer will have adequate precision
  - Algorithm determined by aggregation function
  - Some easy, some hard

Slide 36: TRAPP: Approximate Caching
Two common scenarios:
- Minimize bandwidth usage, precision fixed
  - TRAPP: caches store bounds as approximations
  - Queries select combination of cached & source data
  - Adaptive bound adjustment for good precision level
- Bandwidth fixed, maximize precision
  - Best-Effort Synchronization: caches store stale copies
  - Refreshing based on priority scheduling
  - Global priority order via threshold
  - Adaptive threshold setting for flow control

Slide 37: TRAPP: Status
- Past work: focused on an approximate data caching architecture that permits fine-grained control of the precision-performance tradeoff for numerical data in data caching environments.
- Current work: applying the above techniques and others to more complex data such as Web pages.

Slide 38: CHAIMS: Overview
- CHAIMS: Compiling High-level Access Interfaces for Multi-site Software
- Objective: Investigate revolutionary approaches to large-scale software composition.
- Approach: Develop and validate a composition-only language, a protocol for large, distributed, heterogeneous and autonomous megamodules, and a supporting system.
- Planned contributions:
  - Asynchrony by splitting up CALL-statement.
  - Hardware and software platform independence.
  - Potential for multi-site dataflow optimization.
  - Performance optimization by invocation scheduling.

Slide 39: CHAIMS: Overview
(figure labels)
- Megaprogram for composition, written by domain programmer
- CHAIMS system automates generation of client for distributed system
- Megamodules, provided by various megamodule providers

Slide 40: CHAIMS: Architecture
(figure)

Slide 41: OntoAgents: Objective
- OntoAgents goal: establish an agent infrastructure on the WWW or WWW-like networks
- Such an agent infrastructure requires an information food chain: every part of the food chain provides information, which enables the existence of the next part.

Slide 42: OntoAgents: Architecture
(figure labels)
- Ontology Construction Tool
- Ontology Articulation Toolkit
- Annotated Webpages
- Webpage Annotation Tool
- Ontologies
- Agents
- Metadata Repository
- Inference Engine
- Community Portal
- End User

Slide 43: Events: DB Seminars

Slide 44: Events: Meetings
- Stanford Computer Science Forum - Annual Affiliates Meeting, Stanford, May 2003.
- SWiM (the Stream Winter Meeting): About 35 researchers in the data streams area came together at Stanford for SWiM, Jan. 2003.
- Stream Team: A few data streams research groups held some informal get-togethers, 2002.
- Conference talks: ACM SIGMOD/PODS, VLDB, ICDT, ICDE, ICDCS, CIDR

Slide 45: References: WebBase
- Junghoo Cho, Hector Garcia-Molina. "Parallel Crawlers." In Proceedings of the Eleventh World Wide Web Conference, May 2002.
- Taher Haveliwala, Aristides Gionis, et al. "Evaluating Strategies for Similarity Search on the Web." Proceedings of the Eleventh International World Wide Web Conference, May 2002.
- Taher Haveliwala. "Topic-Sensitive PageRank." Proceedings of the Eleventh International World Wide Web Conference, May 2002.

Slide 46: References: STREAM
- R. Motwani, J. Widom, et al. "Query Processing, Resource Management, and Approximation in a Data Stream Management System." In Proc. of the 2003 Conference on Innovative Data Systems Research (CIDR), January 2003.
- A. Arasu, B. Babcock, et al. "STREAM: The Stanford Stream Data Manager." In Proc. of the ACM Intl Conf. on Management of Data (SIGMOD 2003), June 2003.
- B. Babcock, S. Babu, et al. "Models and Issues in Data Stream Systems." Invited paper in Proc. of the 2002 ACM Symp. on Principles of Database Systems (PODS 2002), June 2002.

Slide 47: References: Peers
- Neil Daswani, Hector Garcia-Molina and Beverly Yang. "Open Problems in Data-Sharing Peer-to-Peer Systems." In ICDT, 2003.
- Hector Garcia-Molina. "Peer-To-Peer Data Management." Keynote, ICDE, 2002.
- Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina. "Streaming Live Media over a Peer-to-Peer Network."

Slide 48: References: TRAPP
- C. Olston and J. Widom. "Best-Effort Cache Synchronization with Source Cooperation." ACM SIGMOD 2002 International Conference on Management of Data, Madison, Wisconsin, June 2002, pp. 73-84.
- C. Olston, B. T. Loo and J. Widom. "Adaptive Precision Setting for Cached Approximate Values." ACM SIGMOD 2001 International Conference on Management of Data, Santa Barbara, California, May 2001, pp. 355-366.

Slide 49: Useful Links
- Database Group: http://www-db.stanford.edu/
- STREAM: http://www-db.stanford.edu/stream/
- Peers: http://www-db.stanford.edu/peers/
- DigLib: http://www-diglib.stanford.edu/
- TRAPP: http://www-db.stanford.edu/trapp/
- WebBase: http://www-diglib.stanford.edu/testbed/doc2/WebBase/
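
Illustrative note on the TRAPP numeric example (slides 34 and 35): the idea of caching intervals, refreshing only when a source value leaves its interval, and probing sources at query time to meet a precision constraint can be sketched roughly as below. This is a minimal toy model, not the TRAPP implementation. The names `Source`, `IntervalCache`, `half_width`, and `sum_query` are hypothetical, the fixed interval width is an assumption (the slides mention adaptive bound adjustment, which is not modeled), and the widest-interval-first probing order is a simple choice made here for a SUM query with uniform probe cost.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for a remote data source holding the exact value."""
    value: float


class IntervalCache:
    """Toy cache that stores an interval [low, high] per source, not the exact value."""

    def __init__(self, sources, half_width):
        self.sources = sources          # name -> Source
        self.half_width = half_width    # assumed fixed; real systems may adapt this
        self.bounds = {
            name: (s.value - half_width, s.value + half_width)
            for name, s in sources.items()
        }
        self.refreshes = 0  # source-initiated refresh messages
        self.probes = 0     # query-initiated exact reads

    def update(self, name, new_value):
        """Apply an update at the source; refresh the cache only if the value escapes its interval."""
        self.sources[name].value = new_value
        low, high = self.bounds[name]
        if not (low <= new_value <= high):
            self.bounds[name] = (new_value - self.half_width,
                                 new_value + self.half_width)
            self.refreshes += 1

    def sum_query(self, max_width):
        """Return an interval bounding SUM over all sources, at most max_width wide."""
        bounds = dict(self.bounds)  # one-time query: work on a copy of the cached bounds
        widest_first = sorted(bounds,
                              key=lambda n: bounds[n][1] - bounds[n][0],
                              reverse=True)
        for name in widest_first:
            width = sum(high - low for low, high in bounds.values())
            if width <= max_width:
                break
            exact = self.sources[name].value  # probe: read the exact source value
            bounds[name] = (exact, exact)
            self.probes += 1
        return (sum(low for low, _ in bounds.values()),
                sum(high for _, high in bounds.values()))


if __name__ == "__main__":
    cache = IntervalCache({"a": Source(10.0), "b": Source(20.0), "c": Source(30.0)},
                          half_width=2.0)
    cache.update("a", 11.0)        # stays inside [8, 12]: no refresh
    cache.update("b", 25.0)        # leaves [18, 22]: one refresh
    print(cache.sum_query(12.0))   # cached bounds suffice: (59.0, 71.0)
    print(cache.sum_query(5.0))    # two probes needed: (64.0, 68.0)
    print(cache.refreshes, cache.probes)
```

In this sketch a loose precision constraint is answered from the cache alone, while a tight one forces probes, which mirrors the precision-for-network-load trade described on the TRAPP slides.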
