An Overview of Databases for the Big Data Ecosystem.ppt

Uploaded by: feelhesitate105  Document ID: 378327  Upload date: 2018-10-09  Format: PPT  Pages: 35  Size: 188.58KB

An Overview of Databases for the Big Data Ecosystem
Keith W. Hare
JCC Consulting, Inc.
September 20, 2016
Copyright 2016, JCC Consulting, Inc.

Abstract
- The ultimate goal of big data techniques is to be able to identify useful, usable information in a timely fashion: actionable analytics
- Prerequisites to producing actionable analytics are:
  - Ability to analyze lots of disparate data
  - Ability to discover, access, store, and retrieve lots of data
- This presentation provides an overview of data storage and retrieval in a big data ecosystem
  - Focus on the characteristics, not the implementations
  - Useful for understanding how the pieces should fit together
  - Addresses the prerequisites, not the end goal

Who am I?
- Senior Consultant with JCC Consulting, Inc. since 1985
  - High performance database systems
  - Replicating data between database systems
- SQL Standards committees since 1988
  - Convenor, ISO/IEC JTC1 SC32 WG3, since 2005
  - Vice Chair, ANSI INCITS DM32.2, since 2003
  - Vice Chair, INCITS Big Data Technical Committee, since 2015
- Education
  - Muskingum College, 1980, BS in Biology and Computer Science
  - Ohio State, 1985, Masters in Computer & Information Science

Topics
- Why is “Big Data” Different?
- Big Data Buzzwords
- High Level View
- Data Distribution
- Integrating Data from Multiple Sources
- Data Query Languages
- Big Data Eco-system Products
- Summary
“Let's do a deep dive in the Big Data and drill down until we hyperlocalize some disruptive technologies.” (See http:/ …)

Why is “Big Data” Different?
- Often defined in terms of 3, 4, 5, 6, or 7 Vs:
  - Volume: exceeds capacity of a single “computer”
  - Velocity: speed at which data is generated
  - Variety: new types of data
  - Variability: speed at which data changes
  - Veracity: quality & provenance
  - Visualization: meaningful presentation
  - Value: actionable analytics
- Focus on primary data rather than extract, transform, and load (ETL)
- In many ways, “Big Data” is what we have always been doing, only bigger and more complex.

Big Data: Driving Forces
- Inexpensive storage of large volumes of data
- Inexpensive compute power
- Next Generation Analytics
  - Moving from off-line to in-line embedded analytics
  - Explaining what happened
  - Predicting what will happen
- Operating on
  - Data at rest: stored someplace
  - Data in motion: streaming
  - Multiple disparate data sources
- Look at available data and wonder what answers are hidden there

Big Data: Working Definition
- Requirements cannot be met on a single computer
  - Variety, Volume, Velocity, Variability, Availability
  - Imprecise terms, but useful for understanding problem space
  - All relative: what was impossible yesterday is Big Data today and will be trivial tomorrow
- Distribute data storage to support volume & velocity
- Replicate data storage to provide availability
- Distribute processing
  - Apply compute power in parallel
  - Avoid moving data across the network: move the answers

Data Volume: How Big is Big?
- Gigabyte: 1000^3
- Terabyte: 1000^4
- Petabyte: 1000^5
- Exabyte: 1000^6
- Zettabyte: 1000^7
- Yottabyte: 1000^8
- Brontobyte*: 1000^9
- Gegobyte*: 1000^10
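
The powers of 1000 on the Data Volume slide can be checked with a few lines of Python. This is only an illustrative sketch: the function names `bytes_in` and `largest_unit` are made up for this example, and the two largest unit names are the slide's provisional ones.

```python
# Decimal (powers of 1000) byte units, in the order listed on the slide.
# "Brontobyte" and "Gegobyte" are the slide's provisional names.
UNITS = [
    ("Gigabyte", 3), ("Terabyte", 4), ("Petabyte", 5), ("Exabyte", 6),
    ("Zettabyte", 7), ("Yottabyte", 8), ("Brontobyte", 9), ("Gegobyte", 10),
]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, computed as 1000^n."""
    exponents = dict(UNITS)
    return 1000 ** exponents[unit]


def largest_unit(n_bytes: int) -> str:
    """Name the largest listed unit that fits into n_bytes at least once."""
    best = None
    for name, exponent in UNITS:
        if n_bytes >= 1000 ** exponent:
            best = name
    if best is None:
        raise ValueError("smaller than one gigabyte")
    return best
```

For instance, `largest_unit(15_300_000_000_000)` classifies a 15.3 x 10^12 byte device as terabyte scale.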

*This terminology is still subject to change.

Big Data Buzzwords
- NoSQL Databases
- Sharding
- Map-Reduce
- Schema-less
- New SQL

Big Data Buzzwords: NoSQL
- Originally did not include SQL
  - Rejected complexity of SQL language
  - Rejected overhead and limitations of SQL Databases
- Now: Not Only SQL
  - Turns out that SQL is a powerful language for specifying queries
  - Potentially useful data storage and retrieval techniques

Sharding
- Partitioning data across multiple servers
- Scaling out
- Once the data is sharded, send queries to data with Map Reduce

Big Data Buzzwords: Map Reduce
- Patented algorithm for:
  - Partitioning queries to run on multiple nodes in parallel
  - Integrating the results
- Map Reduce details originally created by developer
- Operations can (and should) be generated by database software

Big Data Buzzwords: Schema-less
- Reduce development time by eliminating up-front schema design
- Schema information still exists
  - Embedded in the data
  - Embedded in the code to support an API
  - Pinned to a developer's wall
- Reinventing databases from the 1960s

Big Data Buzzwords: New SQL
- Combine powerful SQL query language with performance benefits of NoSQL databases
- Support ACID transactions

High level view
- “Big Data” Data Types
- Data Storage Models
- When is data accessed?
- Data Distribution
- Integrating Data From Multiple Sources
  - Variety of Data Sets/Sources
  - Variety of Data Source Ownership
- Data query languages

“Big Data” Data Types
- Traditional Data Types
  - Character
  - Numerical
  - Date/Time/Timestamp
  - Large Objects: LOB/BLOB/CLOB
- “Big Data” Data Types
  - Multi-dimensional arrays
  - Images/video
  - Documents
  - Loosely formatted data
  - Objects
  - Spatial

Data Storage Models
- Row Store: Tabular
- Column Store
- Key Value
- Document
  - XML
  - JSON: JavaScript Object Notation
  - BSON: Binary JSON
- Graph
- Multi-dimensional array
- Object

When is data accessed?
- After being stored
- Before (or instead of) being stored
  - Streaming data

Data Distribution
- Single node: vertical scaling
- Clustered
- Replicated
- Horizontally distributed & replicated: horizontal scaling

Vertical Scaling
- Buy a bigger server
  - More CPUs
  - Faster CPUs
  - More Memory
  - More storage
- Argument for vertical scaling
  - Cores per CPU chip are increasing: 22 cores/CPU
  - Configurable memory is increasing: 2 terabytes/server
  - Storage capacity is increasing: 15.3 Terabyte SSDs
  - Faster networks: 20 Gigabit network adapters
- Not all problems can be solved with vertical scaling

Potential Problems with Vertical Scaling
- What if data storage breaks?
- What if server breaks?
- What if data center breaks?
- What if a single server cannot handle CPU load?
- What if a network cannot handle the traffic?
- What if data doesn't fit?
- Horizontal scaling and replication solve these issues but introduce additional complexities.
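
To make the Sharding and Map Reduce slides concrete, here is a minimal single-process sketch of scaling out. It rests on several assumptions: plain Python lists stand in for servers, the helper names (`shard_for`, `shard_records`, `map_shard`, `reduce_totals`) are invented for illustration, and a CRC32 hash is just one possible partitioning rule. Each shard computes a small summary locally, and only those summaries are combined, which mirrors "move the answers" rather than the data.

```python
from collections import Counter
from functools import reduce
import zlib


def shard_for(key: str, n_shards: int) -> int:
    """Pick a shard with a stable hash so a key always lands on the same shard."""
    return zlib.crc32(key.encode("utf-8")) % n_shards


def shard_records(records, n_shards):
    """Partition (key, value) records across n_shards in-memory lists."""
    shards = [[] for _ in range(n_shards)]
    for key, value in records:
        shards[shard_for(key, n_shards)].append((key, value))
    return shards


def map_shard(shard):
    """Map step: runs where the data lives and returns a small per-shard summary."""
    totals = Counter()
    for key, value in shard:
        totals[key] += value
    return totals


def reduce_totals(partials):
    """Reduce step: integrate the per-shard summaries into one result."""
    return reduce(lambda left, right: left + right, partials, Counter())


# Hypothetical sample data: (key, amount) pairs.
records = [("ohio", 3), ("utah", 1), ("ohio", 2), ("iowa", 5), ("utah", 4)]
shards = shard_records(records, n_shards=3)
result = reduce_totals(map_shard(shard) for shard in shards)
```

In a real deployment the map step would run on separate nodes and replication would add its own failure handling; none of that complexity is modeled here.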

Horizontally Scaled Data Source
- Horizontal Scaling is one solution to the data volume challenge.

Horizontal Distribution Levels
- Single Server
- Cluster of servers
- Multiple servers/clusters in a Datacenter
- Multiple datacenters on a Continent
- Multiple continents on a Planet
- Let's not think small
  - Multiple planets in a Solar System
  - Multiple solar systems in a Galaxy
  - Still some challenges around network latency

Horizontal Distribution and Replication
- Distribute processing
  - Distribute query and analysis
  - Map-Reduce algorithms
  - Transmit results, not the entire data set
- Replicate data for fault tolerance and performance
- Lots of complexities that do not fit in timeframe for this talk

Integrating Data From Multiple Sources
- Discovering that data exists
  - Data location
  - Access method(s)
- Understanding what data is available and what it means
  - Schema can be programmatically queried
  - Ontologies to identify comparable data
  - Security requirements
  - Privacy requirements
  - Business details
- Identifying possible operations/analysis
- Integrating the resulting analysis
- Challenging problems in this area

Integrating Multiple Data Sources
[Diagram labels: Analytics Engine; Data Source 1, Data Source 2, Data Source N; Data Source Registry 1, Data Source Registry 2, Data Source Registry N]
Disclaimer: This diagram assists in identifying requirements. It is not intended to be a full processing model.

Variety of Data Representations
- Tabular data: relations
  - Designed, cleansed, curated
- Spatial data
- Images & Video
  - Well defined structures
  - Need additional domain information: aerial photos, faces, stars, etc.
- XML: may have well defined DTD
- Store everything now, figure it out later
  - JSON/BSON
  - E.g. network packet logs
- Multiple data models to handle data diversity

Variety of Data Source Ownership
- Self Owned
- Publicly Available
- Data for hire
- Derived Data

Data Source Registry Requirements
- Language/Interface for registering data source
- Support for discovering and identifying available data sources
  - Content of the data source
  - Semantics and Syntax of data
  - Available analytic routines
  - Security/Privacy restrictions
  - Provenance of the data
  - Information about connecting to data source
- Business agreement information
  - Costs
  - Use Restrictions
  - Service Level Agreements
  - Potentially use block chaining (distributed ledger) for agreement
- Standards support integration of multiple data sources

Data Query Languages
- JDBC: SQL queries from Java
- SPARQL: Graph query language
- XQuery: XML
- Product and application specific APIs
  - Specify how to access the data
- SQL: specify what data is needed, not how to access it
  - Traditional: Tables with rows & columns
  - Expanded to support:
    - XML
    - JSON
    - Polymorphic Table Functions
    - Multi-dimensional Arrays

Data Analysis and Visualization
- R statistics package
- Others?

Big Data Eco-system Products
- Open Source Products
  - Minimal upfront license costs
  - Minimal documentation
  - Minimal support
  - Multiple products in the ecosystem
  - Lots of time and effort to implement and deploy
- Commercial off the shelf (COTS) Products
  - Potentially expensive license costs
  - Documentation
  - Support
  - Lots of time and effort to implement and deploy
- Commercial products integrating Open Source products

Summary
- In many ways, “Big Data” is the same as we've always been doing.
- Focus on analysis rather than transaction processing
- New software and techniques for horizontal distribution
- New buzzwords
- New datatypes
- Distribute processing and integrate results
- One challenge is integrating data from multiple sources
  - Locating the data
  - Understanding what the data contains
  - Requesting and integrating analysis
- Big Data is a tool: ultimate goal is actionable analytics
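
The integration challenge named in the summary (locating data and understanding what it contains) is what the data source registry slides address. The sketch below shows one way such a registry could look in miniature. Everything in it is an assumption made for illustration: the class names, the field names, and the example URIs are hypothetical and do not come from the presentation or from any standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceEntry:
    """One registry record; the field names are illustrative only."""
    name: str
    location: str       # how to connect (hypothetical URI)
    data_model: str     # e.g. "tabular", "document", "graph"
    schema: dict        # field name -> type description
    provenance: str = "unknown"
    restrictions: list = field(default_factory=list)  # security/privacy/use limits


class DataSourceRegistry:
    """In-memory registry supporting registration and simple discovery."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: DataSourceEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"duplicate data source: {entry.name}")
        self._entries[entry.name] = entry

    def find_by_field(self, field_name: str) -> list:
        """Discover sources whose schema exposes the given field."""
        return sorted(
            e.name for e in self._entries.values() if field_name in e.schema
        )

    def describe(self, name: str) -> dict:
        """Return a copy of the schema so callers can inspect it programmatically."""
        return dict(self._entries[name].schema)


registry = DataSourceRegistry()
registry.register(DataSourceEntry(
    "orders", "jdbc:example://orders-db", "tabular",
    {"customer_id": "integer", "total": "decimal"}, provenance="self owned"))
registry.register(DataSourceEntry(
    "customers", "jdbc:example://crm-db", "tabular",
    {"customer_id": "integer", "name": "string"}))
registry.register(DataSourceEntry(
    "packet_logs", "file:///example/logs", "document",
    {"timestamp": "string", "src_ip": "string"},
    restrictions=["internal use only"]))
```

Calling `registry.find_by_field("customer_id")` lists the sources that could be joined on that field. Business agreement details such as costs and service levels would need further fields that this sketch omits.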

Questions?
Keith W. Hare
JCC Consulting, Inc.
600 Newark Granville Road
P.O. Box 381
Granville, OH 43023 USA
K

References
- May 2014, “Understanding Big Data: The Seven Vs”, Eileen McNulty. http:/ …
- National Research Council. 2013. “Frontiers in Massive Data Analysis”, Washington, D.C., The National Academies Press. http://www.nap.edu/catalog.php?record_id=18374
- May 2014, “Big Data: Seizing Opportunities, Preserving Values”, Executive Office of the President. http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf
- 2015, “ISO/IEC JTC1 Big Data Preliminary Report 2014”. http://www.iso.org/iso/big_data_report-jtc1.pdf
- September 2015, NIST Big Data Public Working Group Reports (NIST.SP.1500-1, 2, 3, 4, 5, 6, 7). https://www.nist.gov/el/cyber-physical-systems/big-data-pwg
