1、ObjectGlobe Open, Secure, and QoS-enhanced Distributed Query Processing,Donald Kossmann Technical University of Munich http:/www3.in.tum.deJoint work with Alfons Kemper (Passau) and others,Outline,Background The ObjectGlobe Lookup Service (Security Aspects) QoS Management Summary,Query Processing on
2、 the Internet,Web servers, relational databases on the Web: centralized or limited query capabilities Middleware Systems: a great deal of data shipping Goals of ObjectGlobe: integrate any kind of data integrate any kind of query processing capabilities bring query processing capabilities to the data
3、,Middleware for Query Processing,S,Data-Provider A,Data-Provider B,wrap_S,thumbnail,thumbnail,wrap_S,User-defined operators,Heavy data shipping,wrap_S,thumbnail,wrap_S,thumbnail,Fct-Provider,S,Data-Provider A,Data-Provider B,Open Query Processing (Step 1),Load functions,wrap_S,thumbnail,wrap_S,thumb
4、nail,Fct-Provider,S,Data-Provider A,Data-Provider B,Open Query Processing (Step 2),Load functions,Traveling from M to UCB,flights,rental cars,Selection,Selection,Routenplaner,Route,Top N,Function Provider,Data Provider,Data Provider,Cycle Provider,Open QP with ObjectGlobe,Create an open marketplace
5、for data providers cycle providers function providers Requirements wrappers exist for all data of data providers JVM runs on all cycle providers fixed interface for operators of function providers,Scenarios,Free Internet: everything is free and available for everybody Restricted Internet: charge acc
6、ording to usage, quality, and timeliness; restrictions (e.g., age) Intranet: everything is free and available for insiders“ Outsourcing: charge for certain services (e.g., backup, business analyses),Challenges,Lookup Service Find the relevant services Security Protect data and cycle providers from b
7、ad code Quality of Service What you pay is what you get,ObjectGlobe Lookup-Service,Lookup-Service,Parser,Optimizer,Execution Engine,Application /User,Browse, Search,Authorisation, .,Statistics, Cost Information, .,Provider,Register,Description of Services,Providers register RDF or XML documents Ther
8、e is a pre-defined schema to describe services Data Providers: Theme (e.g., Hotel) Attributes (e.g., rate, location, category) Access paths and wrappers Characteristics of the server (e.g., availability) Information for authorization Statistics .,Function Provider: Signature (e.g., foo(int, int) - i
9、nt) Information for authorization Hardware requirements (e.g., 30 MB main memory) Size of Java byte code . Cycle Provider: Hardware (e.g., 1 GB main memory) Location and network connections / bandwidth Information for authorization .,XML Description of a Data Provider,4711 Hotel All hotels you ever
10、want city string .,Lookup Query,Data Providers for Hotels that return the City and Rate of each hotelsearch DataProvider d select d.uniqueId, d.attr.* where d.theme.name = hotel“ and d.attr.?.topic = city“ and d.attr.?.topic = rate“,Three-tier Architecture,Local Lookup-Servers Keep copies of meta-da
11、ta of services that are relevant for a particular organization or subsidary Evaluate Lookup requests for that organization Relevance is determined by subscription rules (queries) Public Lookup-Servers (Backbone) Store all (public) meta-data Store subscription rules of local Lookup-Servers Notify loc
12、al Lookup-Servers of changes Users can browse in the public info of the backbone,Three-tier Architecture,Public Lookup-Server,Public Lookup-Server,Local LS,Local LS,Local LS,New Rules Answers,Client,Client,Client,Client,Client,Queries Answers,New Rules,Updates, Inserts,Processing Lookup Requests Loc
13、al Lookup-Servers store meta-data in RDBMS Translate Lookup request into SQL Registering new services Public Lookup-Servers store meta-data in RDBMS Public Lookup-Servers store rules in RDBMS Apply filter algorithm using RDBMS in order to find relevant local Lookup-Servers Deletes and updates of ser
14、vices Apply filter algorithm to find affected local Lookup-Servers (more complicated, however) Principle: Map everything to RDBMS, Lilly Potter Harry Potter James Potter 314 ,person,person,Harry Potter,name,name,name,person,Lilly Potter,James Potter,child,314,0,4711,666,i314,Storing XML Data in an R
15、DBMS,Edge Approach,Edge Table,Value Table (String),Value Table (Integer),XML Queries,Find the name of all persons that like to play Quidditch and are younger than 18 years select $n where $n $a Quidditch , $a 18Carry out pattern matching with document graph“,Translation to SQL,SELECT nv.value FROM E
16、dge p, Edge n, Edge h, Value nv, Value hv WHERE p.label = person“ ANDp.target = n.source ANDn.label = name“ ANDn.target = nv.id ANDp.target = h.source ANDh.label = hobby“ ANDh.target = hv.id ANDhv.value = Quidditch“;,Works essentially in the same way for the query language of our Lookup service.,Pub
17、lish & Subscribe Algorithm,Decompose subscription rules and store them in RDMBS of Public Lookup-Servers SQL Join-Queries in order to match sub-rules with meta-data objects (Recall: meta-data is decomposed, too) SQL Join-Queries in order to re-construct matching subscription rules from sub-rules,Dec
18、omposition of Subscription Rules,Data Providers for Stock Market Information that cost less than 500 Dollars: search DataProvider d where d.theme.name = Stock Market“ and d.cost 500 Decomposition into three atomic rules: R1: search Theme t where t.name = Brse“ R2: search DataProvider d where d.cost
19、500 R3: search R1 a, R2 b where b.theme = a Store these rules in RDBMS,Matching,Result of Join: (R1, O1); (R2, O2),Re-constructing Subscription Rules from matching atomic sub-rules,Store decomposition graph in RDMBS higher-level and atomic rules are vertices Top-level rules are so-called triggering
20、rules; if they are affected, notify LLS Walk bottom up“ through decomposition graph SQL-Join Query: for each pair of matching rules, find out whether they have a common parent N.B. the decomposition graph is a binary directed, acyclic graph,Preliminary Experiments,Synthetic benchmark database with 1
21、00.000 (different) subscription rules Oracle 8i used in the Public Lookup Server,Batch updates are crucial,Summary,Basic Principle: decompose rules and data Advantages: Generic, independent of schema Very easy to implement, no administration needed Exploit query capabilities of RDBMS Need not worry
22、about document boundaries Finding common sub-rules is trivial Disadvantage: Sub-optimal query performance (many Joins) but probably sufficient, if updates are batched,Related Work,Lookup Services: Jini, UDDI, Plug & Play Publish & Subscribe: IR world SIFT (Stanford) XFilter (Berkeley) LeSelect (INRI
23、A) Continuous Queries (Niagra, .) Storing and Indexing XML Data: .,Outline,Background The ObjectGlobe Lookup Service (Security Aspects) QoS Management Summary,Security Requirements in ObjectGlobe,Protection of Data and Cycle Providers Secure Communication use SSL connections (authenticated and encry
24、pted) Authentication of Clients passwords / certificates digitally signed requests (query subplans) Authorization control data/cycle providers are autonomous but register user privileges in lookup service,Security of Data/Cycle Providers,ObjectGlobe runtime system,Class loader,Class loader,Class loa
25、der,Internal class loader,Secure sandbox,Internet,Query 1,Query 2,Query 3,Privileged Built-inOperators for Disk or Network Access,sandbox,external operator,Internal operator,tmpfile,Other Potential Risks,Malicious cycle providers security classification of cycle providers authorization for data- and
26、 function code-flow providers analyze the safety of the entire QEP Denial of Service Attack monitor resource consumption of external operators quality assertion of external operators authenticate external operators,QoS Management,State of the Art: best-effort Goal: users should be able to constrain
27、Cost of execution Running time Quality of the results Initial approach (to get a feeling)extended query optimization Admission control Monitoring and plan adaptions at execution time Real solution: ?,Quality Parameters,Cost of execution $ Running time First tuple, last tuple, Nth tuple Quality of th
28、e results Number of results Coverage: Number (or %) of data sources queried Staleness of data Cost as a function of coverage (- Mariposa) Cost as a function of #wheels (Mercedes),Quality of Service-Parameters,Completeness,Cost (),min,max,max,Response time,Desired space for query plans,Extended Query
29、 Optimization,Bottom-up dynamic programming query optimizer, standard costing etc., and the following extensions Generate alternatives for each operator Consider classes of equivalent providers Extended Pruning, Heuristics for choosing a Winner Enumerate incomplete“ UNIONs Initialize QoS-Accounts,Qu
30、ery Optimization: Quality of Service-Considerations,Completeness,Cost,40%,illegal QEP,P,Q,R,QoS-Annotated Query Plan,scan,scan,thumbnail,display,host=A.com,host=client,host=client,host=A.com,host=B.com,host=B.com,host=B.com,QoS Accounts,Optimization: Open Questions,Revisit heuristics to choose winni
31、ng plan Dynamic heuristics depending on workload and/or feedback Reverse engineering a plan How much data should a plan read if the cost should be $5.00? Does query optimization matter?,Admission Control & Monitoring,Admission Control: Check assumptions of optimizer Carried out at plan instantiation
32、 time for each plan fragment (set of operators at one site) Monitoring: Predict quality of results at the end of execution Carried out by special Monitoring operators Take actions if violations are detected ECA rules specify actions,Monitoring Operators,at the end of pipelines are non-blocking / low
33、 cost above receive“ ops keep statistics for predictions differentiate between open“ and next“ phase Communicate with each other for liveliness,monitor,A,send,monitor,B,send,monitor,receive,Join,Plan Adaptions,General: Abort, Restart / Reoptimize Response Time Violation: compressConnection movePlan
34、(w/wo state) increasePriority removeTempResults, . Coverage / Result Quality Violation: addSubPlan Cost Violation: movePlan, decreasePriority, .,ECA Rules for Adaptions,if cost is high and coverage is low then abortif cost is high and coverage is high then delResultsif rt is high and cost is low and
35、 network is critical then compress,Plan Adaptions: Open Questions,What is the right mix of actions? What are the right thresholds for the rules? How to avoid the Schweinezyklus“? How to draw the right conclusions from the statistics produced by Monitoring? What is the right granularity of actions? P
36、lan vs. Operator vs. Tuple,Project Status,First demo presented at SIGMOD 99 Travel information Four Web data sources (hotels, sights, train conns) One function provider (travel routes, top N) Three cycle providers (two in Europe, one in US) Online-Demo: http:/db.fmi.uni-passau.de/projects/OG Current work: more experiments Problem: getting data from Web sources is sloooow,