Dynamic XML documents with Distribution and Replication

Authors: Serge Abiteboul, Angela Bonifati, Grégory Cobéna, Ioana Manolescu, Tova Milo
As summarized by: Preethi Vishwanath, San Jose State University, Computer Science

Dynamic XML documents

XML documents where some data is given explicitly while other parts are given only intensionally, by means of embedded calls to Web services that can be called to generate the required information.
SOAP and WSDL normalize the way programs can be invoked over the Web, and have become the standard means of publishing and accessing dynamic, up-to-date sources of information.
Such documents may be distributed and/or partially replicated.

Whether dynamic or static, an XML document may be:
Distributed in several parts located at different peers, while maintaining the general unity of the separated pieces.
Partially or entirely replicated on different peers.

Aspects of distribution due to embedding calls to a Web Service

(1) Accessing remote services: Such a document provides the means to access remote services. This feature is already provided by platforms supporting embedded scripts in HTML/XML documents, e.g., JSP, ASP.NET.
(2) Replicating data fragments with embedded service calls: A call included in a replicated fragment may be activated from the replica's site, following a rather different communication path.
(3) Replicating service definitions: A special form of replication may be achieved by replicating not only data, but also service definitions. This is in the spirit of “code-shipping”.

Context of paper and Contributions

Context: Dynamic XML documents (XML documents including calls to Web services) that are possibly distributed over several sites, with portions of them possibly replicated.

Contributions

(1) Model. Introduce a simple model for replicating and distributing XML documents over several sites. The model may be used for standard or dynamic documents. In general, users querying distributed/replicated data prefer to ignore data location and expect the system to locate data for them. But it is sometimes desirable to specify which replica of a given fragment to use (e.g., the one in the local cache, or the most recent one).
(2) Query evaluation and optimization. In the presence of replicas and distribution, many evaluation strategies are possible for a given query, depending on the choice of the replica to use, and of the sites performing each elementary
computation. Typically, several peers will collaborate to evaluate a query; each involved peer will have to make choices in order to improve its observable performance, based on a cost metric specific to this peer.
(3) Tailored replication. To improve its observable performance, a peer may be willing to replicate some data, possibly including service calls, and even service definitions, as explained above. Such replication is subject to natural constraints (e.g., storage space).

Data Model & Query Language

Dynamic XML documents: May be viewed as labeled trees. Tree nodes represent the XML elements/attributes; edges represent the relationships between them. Function elements represent calls to Web services.
Web services: Either opaque (SOAP-based Web services, treated as black boxes) or declarative (the implementation is known and described in terms of XQuery).
Peers: Each peer offers some Web services and contains some dynamic XML documents, which may include calls to services provided by the same or other peers.
Distribution: Documents may include calls to services provided by the same or other peers. A higher level of data distribution can be achieved by allowing a document to be distributed over several peers. In the tree data model, this means that
document nodes may now have external children edges pointing to children nodes on other peers and, analogously, an external parent edge if the parent of the node is on another peer.
Replication of data and services: The same document fragment may exist on several peers. All children of the same node with the same ID are considered replicas of a single node.

A dynamic XML document of the Ski Portal

[Example document: the XML markup was lost in this copy. Surviving text values: Colorado, Aspen, good, Aspen.]

Web services of the Ski Portal

function OperativeSkiResorts($state)
implementation: XQuery
    for $x in document("SkiPortal")/state[state_name=$state]/resorts/resort[snow_cond/value()="good"]
    return $x

function HotelsInfo($state, $resort)
implementation: XQuery
    for $x in document("SkiPortal")/state[state_name=$state]/resorts/resort[name=$resort]/hotels/hotel
    return $x

If the two functions were opaque and the resort knew nothing about their internal implementation, there would be essentially two possibilities:
(1) Call the ski portal each time a service is needed and have the portal compute the answer and return it.
(2) Cache the returned result and use it for some time, giving up some data accuracy in exchange for lower communication cost.

Query frequency: By analyzing the OperativeSkiResorts query, we can see that its
answer may change only every hour, when the SnowConditions function is invoked. Hence, to give fully accurate answers to its visitors, the ski center needs to invoke the function every hour, and cache the data in between.

Replicating relevant data and services: Assume that the Colorado ski center computer is capable of (1) storing dynamic XML documents, (2) invoking the Web service calls embedded in them, and (3) processing XQuery queries. Rather than just caching the current query result, one could then decide to replicate (and maintain) on the ski center computer all the relevant data, and provide a local version of the service queries.

The Colorado dynamic document and services

[Example document: the XML markup was lost in this copy. Surviving text values: Aspen, good, Aspen.]

function OperativeSkiResorts("Colorado")
implementation: XQuery
    for $x in document("ColoradoSkiCenter")/resort[snow_cond/value()="good"]
    return $x

function HotelsInfo("Colorado", $resort)
implementation: XQuery
    for $x in document("ColoradoSkiCenter")/resort[name=$resort]/hotels/hotel
    return $x

Partial Replication

Replicate just the resort names and their ski conditions, without the hotels data, and provide access to the hotels data through the ski portal when needed. The externalURL sub-element of the hotels element, together with the ID, indicates where the data of this element may be found. The external edge is simply viewed as an intensional description of this missing data and gives the means to obtain it if needed.

The Colorado document with external edges

[Example document: the XML markup was lost in this copy. Surviving text values: Aspen, good, Aspen, and a truncated URL "http:/".]

Inverse External Edges

[Example document: the XML markup was lost in this copy. Surviving text value, appearing twice: http://www.HS.com/ColoradoSkiCenter]

Master-Slave Policy

Maintaining consistency over replicated objects is difficult.
Typical solution: Have each object owned by a single master who is in charge of keeping the various copies in sync. If the various copies are the children of a single element, then this element is the candidate for being in charge of synchronization.
Example: [XML markup lost in this copy. Surviving text values: Colorado, http://www.HS.com/ColoradoSkiCenter]

Queries

Each element encountered in the evaluation of a path expression, on a given peer p, may contain some data (residing on that peer), and may also point (via external edges) to some replicas (on different peers). Which of the element versions should be used?
Ignore all the external edges and consider only the data residing within the given peer p.
Use the element's local data and also follow all the given external edges to its replicas, in order to get the maximal available information.
Intermediate choices: choose some arbitrary copy; consider the element's local data when available, and follow an external edge; follow a particular edge; give a preference list.

Example: A replicated query (the delimiters inside the replicate clause were lost in this copy)

    for $x in document("SkiPortal")/state[state_name="Colorado"]/resorts/resort
    replicate $x with resort name/* snow_cond/* hotels as external link at peer "http://www.HS.com/ColoradoSkiCenter"

COST MODEL

Configuration: A set of peers, each containing some data and providing some Web services (opaque or XQuery-based ones).
Workload (for a configuration): The system workload consists of the service calls invoked by the dynamic documents in the configuration, as well as of queries/Web service requests posed by users at the various peers.
Unifying user queries and services: The workload consists of the invocation of Web services entailed by the dynamic
documents, and queries and Web services requested by the user.

Decomposing Queries on Peers

The processing of a query Q can thus be viewed as decomposed into several “intra-peer” sub-queries: each such sub-query is evaluated on a particular peer, consulting only the peer's local data, and communicating with other peers in order to forward some finer sub-queries or send/receive data or computation results.

[Figure; surviving labels: P1, Q, P1]

Cost Formulas

Formulas for calculating the data used by a given workload on a set of peers (the symbol of the first factor of $M_{i,j}$ is missing in this copy):
    $M_{i,j} = [\,\cdot\,]_{i,j} \cdot O_j \cdot \min(F_i, F_j)$
    $D = {}^{T}L \cdot M \cdot L$

Computation, communication and storage costs incurred by the workload:
    $C^{j}_{GlobComp} = Comp \cdot L_j \cdot cp_j$
    $C_{GlobReceivs} = D \cdot BW_{IN}$
    $C_{GlobSends} = {}^{T}BW_{OUT} \cdot D$
    $C^{j}_{GlobSpace} = Space \cdot L \cdot sp_j$

where
    $M_{i,j}$ is the volume of data transferred from one query $W_i$ to another query $W_j$;
    $D$ represents the volume of data transferred from peer $P_i$ to peer $P_j$ due to all queries in $W$;
    $C^{j}_{GlobComp}$, $C_{GlobReceivs}$, $C_{GlobSends}$ and $C^{j}_{GlobSpace}$ are the observable costs of computation, of received data, of sent data and of space, respectively, of peer j.

Outline of Query Evaluation

Data Shipping vs Query Shipping: Wrappers decide how much of the query
sent by the mediator they solve. The mediator has global information about data location, and all wrappers report directly to it. Control over execution is distributed.

Communication Pattern

At each step, the sub-query Qnext includes the address of the peer P on which Q was originally asked, so that
the result is returned directly to P, since this requires less communication.
Drawback: All peers get to know who initiated the query.

Peer P has to execute a simple path expression Q, where Q involves some data in P1 and some in P2. P adopts the heuristic of executing as much of Q as possible, say Qlocal, obtaining an intermediate result, and delegates one or several further sub-queries Qnext to one or several other peers Pnext. Each Pnext will receive the intermediate results and continue processing by applying the same method: attempt to evaluate all of Qnext and, if not all the data is available, delegate further.
Replicating data and services

For a given configuration and workload, every peer measures its observable performance. In order to improve its observable performance, the peer may want to change the configuration; due to peer autonomy, the peer can only modify its own set of data and services.

Possible replication scenarios that peer P may consider:

Accessing remote information (do not replicate): When not all the data needed for the query evaluation resides on P, it may need to consult remote data, for instance via external links. If the query frequency is high and the storage cost at the given peer
is low, P may prefer to replicate the relevant data and use a local version rather than the remote one.

Replicating data fragments with or without service calls:
Scenario 1: P may take the replicated fragment including the service calls embedded in it; thus P will call the service itself.
Scenario 2: Alternatively, P may leave (some of) the calls to be executed at the remote peer, and just refer to the data they return via external links.
Cost-effectiveness example: If the service provider charges the caller a fee, leaving the call on the remote peer spares P this fee; or, if the call is
invoked more frequently than the query that uses its data, its output is transmitted to P at the frequency of the query rather than that of the call invocation, thus entailing less communication.

Replicating service definitions: When the data is replicated together with its embedded calls, we may also want to replicate, for declarative services, the code of the called services as well as the data that they use. Things become more complex when service definitions are replicated. One has to decide:
(1) if and how to modify the service code to best fit the needs of P;
(2) which data the code uses, and how much of it to replicate;
(3) recursively, for which service calls appearing in this replicated data the code (and the data that it uses) should also be replicated.

Replication Algorithm

Algorithm repDecision
Input: configuration conf, service implementation Q
Output: configuration conf1

conf1 ← conf, repData ← ∅
foreach path expression pe over doc in Q    (pe is of the form l1[c1]/l2[c2]/.../lk)
    // evaluate pe by top-down navigation in doc
    foreach step j in the evaluation of pe, j = 1, 2, ..., k
        Q1 ← /lj+1/lj+2/.../lk
        if exists sc | sc child of a node in the current node list,
                       sc is a call to a service sv,
                       whose output type may contain a path lj+1/.../lk
        then
            repData ← the set of subtrees rooted at the current node list
            conf1 ← conf ∪ repData ∪ Q1
            if cost(conf1) < cost(conf) then
                foreach sv1 call of service in repData
                    conf1 ← repDecision(conf1, def(sv1))
                endfor
                break    // stop here for evaluation of pe
            else nop;
        else nop;
    endfor    // the evaluation of pe is over
    if empty(repData)    // repData has not yet been assigned
        repData ← the result of pe on doc
        conf1 ← conf ∪ repData
        foreach sv1 call of service in repData
            conf1 ← repDecision(conf1, def(sv1))
        endfor
endfor
return conf1
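
To make the control flow of repDecision easier to follow, here is a simplified executable sketch. It is an illustration under several assumptions, not the authors' code: a configuration is modeled as a set of tagged items, "output type may contain a path" is approximated by checking that the remaining labels all occur in a declared label set, chosen items are accumulated into `conf1` rather than recomputed from `conf`, a `seen` set is added to guard against recursive service definitions, and the cost function is a made-up placeholder. The names `Node`, `Service`, `calls_in` and `rep_decision` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()
    call: str = ""          # non-empty: this node is a call to the named service

@dataclass
class Service:
    paths: list             # path expressions (label tuples) used by the implementation
    doc: Node               # document the implementation queries
    output_labels: set      # labels that may occur in the service's output (assumption)

def calls_in(node):
    """Yield the names of all services called inside the subtree rooted at node."""
    if node.call:
        yield node.call
    for child in node.children:
        yield from calls_in(child)

def rep_decision(conf, paths, doc, services, cost, seen=frozenset()):
    conf1 = set(conf)
    for pe in paths:
        rep_data = set()
        current = [doc] if doc.label == pe[0] else []   # nodes matching the first step
        for j in range(1, len(pe)):
            rest = tuple(pe[j:])                        # Q1: the steps still to evaluate
            relevant = any(
                c.call in services and set(rest) <= services[c.call].output_labels
                for n in current for c in n.children
            )
            if relevant:
                candidate = conf1 | {("data", n) for n in current} | {("query", rest)}
                if cost(candidate) < cost(conf1):
                    rep_data, conf1 = set(current), candidate
                    break                               # stop the evaluation of pe here
            current = [c for n in current for c in n.children if c.label == pe[j]]
        if not rep_data:                                # nothing chosen: replicate the result of pe
            rep_data = set(current)
            conf1 |= {("data", n) for n in rep_data}
        for n in rep_data:                              # recurse into replicated service calls
            for sv in calls_in(n):
                if sv in services and sv not in seen:
                    s = services[sv]
                    conf1 = rep_decision(conf1, s.paths, s.doc, services, cost, seen | {sv})
    return conf1

# Hypothetical example loosely based on the ski portal.
hotel = Node("hotel")
services = {"HotelsInfo": Service([("hotels", "hotel")], Node("hotels", (hotel,)), {"hotels", "hotel"})}
resort = Node("resort", (Node("name"), Node("call", call="HotelsInfo")))
doc = Node("state", (Node("resorts", (resort,)),))
cost = lambda conf: 10 if not conf else len(conf)       # placeholder cost metric

result = rep_decision(set(), [("state", "resorts", "resort", "hotels", "hotel")], doc, services, cost)
```

In this run the resort subtree is replicated together with the residual query hotels/hotel, and the recursion into the HotelsInfo definition then adds the hotel data that the service uses.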