Monitoring Streams - A New Class of Data Management Applications

Don Carney, Brown University
Uğur Çetintemel, Brown University
Mitch Cherniack, Brandeis University
Christian Convey, Brown University
Sangdon Lee, Brown University
Greg Seidman, Brown University
Michael Stonebraker, MIT
Nesime Tatbul, Brown University
Stan Zdonik, Brown University

Background
- MIT/Brown/Brandeis team
- First Aurora, then Borealis
  - Practical system
  - Designed for scalability: 10^6 stream inputs, queries
  - QoS-driven resource management
  - Stream storage management
  - Reliability / fault tolerance
  - Distribution and adaptivity
- First stream startup: StreamBase
  - Financial applications

Example Stream Applications
- Market analysis: streams of stock exchange data
- Critical care: streams of vital sign measurements
- Physical plant monitoring: streams of environmental readings
- Biological population tracking: streams of positions from individuals of a species

Not Your Average DBMS
- External, autonomous data sources
- Querying time-series
- Triggers-in-the-large
- Real-time response requirements
- Noisy data, approximate query results

Outline
2. Aurora Overview / Query Model
3. Runtime Operation
4. Adaptivity

Aurora from 100,000 Feet
[Figure: input streams flowing through Aurora to several queries]

Aurora from 100 Feet
- Queries = workflow (boxes and arcs)
- Workflow diagram = "Aurora network"
- Boxes = query operators
- Arcs = streams
[Figure: network of boxes labeled s, m, Slide, and Tumble]

Streams (Arcs)
- Stream: tuple sequence from a common source (e.g., a sensor)
- Tuples are timestamped on arrival (internal use: QoS)

Query Operators (Boxes)
- Simple: FILTER, MAP, RESTREAM
- Binary: UNION, JOIN, RESAMPLE
- Windowed: TUMBLE, SLIDE, XSECTION, WSORT

Aurora in Action
- "Box-at-a-time" scheduling
- Arcs: tuple queues
- Outputs monitored for QoS
[Figure: network of boxes labeled s, m, Slide, and Tumble feeding applications (App)]

Continuous and Historical Queries
[Figure: network with a connection point holding 1 hour of history]

Quality-of-Service (QoS)
- Specifies the "utility" of imperfect query results
  - Delay-based (specify utility of late results)
  - Delivery-based, value-based (specify utility of partial results)
- QoS influences scheduling, storage management, and load shedding
[Figure: three QoS graphs labeled A, B, C; axis labels: Delay, % Tuples Delivered, Output Value]

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

Runtime Operation - Basic Architecture
[Figure: Scheduler, QoS Monitor, Box Processors, Router]

Runtime Operation - Scheduling: Maximize Overall QoS
[Figure: Choice 1 and Choice 2, with these labels]
- Box A: cost 1 sec (age: 1 sec)
- Box B: cost 2 sec (age: 3 sec)
- Delay = 2 sec, utility = 0.5
- Delay = 5 sec, utility = 0.8
- Schedule Box A now rather than later
- Ideal: maximize overall utility
- Presently exploring scalable heuristics (e.g., feedback-based)

Runtime Operation - Scheduling: Minimizing Per-Tuple Processing Overhead
- Train scheduling
[Figure: boxes A and B; execution sequence A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z)); comparison with default operation; legend marks each context switch]

Runtime Operation - Storage Management
1. Run-time queue management
   - Prefetch queues prior to being scheduled
   - Drop tuples from queues to improve QoS
2. Connection point management
   - Support efficient (pull-based) access to historical data
   - E.g., indexing, sorting, clustering, ...

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

Stream Query Optimization
- Differences with traditional query optimization?

Stream Query Optimization
- New classes of operators (windows) may mean new rewrites
- New execution modes (continuous/pipelining)
- More dynamic fluctuations in statistics: compile-time optimization is not possible
- Global optimization is not practical because query networks are huge
- Adaptive optimization
- Other cost models: taking memory into account, output rate rather than throughput, etc.
- Query optimization and load shedding

Query Optimization
- Compile-time, global optimization is infeasible
  - Too many boxes
  - Too much volatility in network, data
- Dynamic, local optimization
  - Threshold for when to optimize

Motivation of Query Migration
- Continuous query over streams
  - Statistics unknown before start
  - Statistics changing during execution: stream rates, arrival pattern, distribution, etc.
- Need for dynamic adaptation
  - Plan re-optimization: change the shape of the query plan tree
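The "threshold for when to optimize" idea can be pictured with a small statistics monitor. The slides do not say which statistic or threshold Aurora uses, so the following is only a sketch under assumptions: the class name `SelectivityMonitor`, the choice of selectivity as the monitored statistic, the sliding-window size, and the absolute-drift threshold are all hypothetical.

```python
from collections import deque


class SelectivityMonitor:
    """Tracks a box's observed selectivity over its most recent input tuples.

    Hypothetical helper for illustration only; not Aurora's actual mechanism.
    """

    def __init__(self, planned_selectivity, threshold=0.2, window=1000):
        self.planned = planned_selectivity    # estimate the current plan was built on
        self.threshold = threshold            # assumed tolerance for absolute drift
        self.outcomes = deque(maxlen=window)  # 1 if the tuple passed the box, else 0

    def record(self, passed):
        """Record whether one input tuple passed the box."""
        self.outcomes.append(1 if passed else 0)

    def observed(self):
        """Observed selectivity over the window; falls back to the plan estimate."""
        if not self.outcomes:
            return self.planned
        return sum(self.outcomes) / len(self.outcomes)

    def should_reoptimize(self):
        """True once observed selectivity drifts past the threshold."""
        return abs(self.observed() - self.planned) > self.threshold


monitor = SelectivityMonitor(planned_selectivity=0.5, threshold=0.2, window=4)
for passed in [True, False, True, False]:
    monitor.record(passed)
stable = monitor.should_reoptimize()   # observed 0.5, no drift

for passed in [True, True, True, True]:
    monitor.record(passed)
drifted = monitor.should_reoptimize()  # observed 1.0, drift 0.5
```

A bounded window keeps the estimate responsive to recent stream behavior, which matters when statistics change during execution; a real system would likely monitor rates and costs as well.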
Run-time Plan Re-Optimization
- Step 1: Decide when to optimize (statistics monitoring)
- Step 2: Generate new query plan (query optimization)
- Step 3: Replace current plan by new plan (plan migration)

Adaptivity in Query Optimization
Dynamic Optimization: Migration
1. Identify subnetwork
2. Buffer inputs
3. Drain subnetwork
4. Optimize subnetwork
5. Turn on taps

Naive Plan Migration Strategy
- Migration steps
  - Pause execution of old plan
  - Drain out all tuples inside old plan
  - Replace old plan by new plan
  - Resume execution of new plan
- Problem: works for stateless operators only
[Figure: old and new plans over inputs A, B, C, with boxes labeled AB and BC]

Stateful Operator in CQ
- Why stateful?
  - Need non-blocking operators in CQ
  - Operator needs to output partial results
  - State data structure keeps received tuples
- Key observation: the purge of tuples in states relies on the processing of new tuples
- Example: symmetric NL join with window constraints
[Figure: box AB over inputs A and B; State A holds ax, State B holds b1 to b5; labels ax, b2, b3 on the output side]

Naive Migration Strategy Revisited
- Steps
  (1) Pause execution of old plan
  (2) Drain out all tuples inside old plan
  (3) Replace old plan by new plan
  (4) Resume execution of new plan
- Deadlock waiting problem: step (2) waits for states to be purged, but purging relies on new tuples, which are not processed while execution is paused in step (1)
[Figure: plan over inputs A, B, C with boxes AB and BC; annotations "(2) All tuples drained", "(3) Old replaced by new", "(4) Processing resumed"]

Adaptivity - Query Optimization
- State Movement Protocol
- Parallel Track Protocol

Moving State Strategy
- Basic idea
  - Share common states between the two migration boxes
- Key steps
  - State matching: match states based on IDs
  - State moving: create new pointers for matched states in new box
- What's left?
  - Unmatched states in new box
[Figure: old box and new box, each a plan over inputs QA, QB, QC, QD producing QABCD; boxes labeled AB, BC, CD; states SA, SB, SC, SD, SAB, SABC in one plan and SA, SB, SC, SD, SBC, SBCD in the other]

Parallel Track Strategy
- Basic idea
  - Execute both plans in parallel and gradually "push" old tuples out of the old box by purging
- Key steps
  - Connect boxes
  - Execute in parallel until the old box has "expired" (no old tuple or sub-tuple)
  - Disconnect old box
  - Start executing new box only
[Figure: old box and new box, each a plan over inputs QA, QB, QC, QD producing QABCD; boxes labeled AB, BC, CD; states SA, SB, SC, SD, SAB, SABC in one plan and SA, SB, SC, SD, SBC, SBCD in the other]

Adaptivity - Load Shedding
1. Two load shedding techniques
   - Random tuple drops
     - Add a DROP box to the network (DROP is a special case of FILTER)
     - Position it to affect queries with tolerant delivery-based QoS requirements
   - Semantic load shedding
     - FILTER values with low utility (according to value-based QoS)
2. Triggered by the QoS Monitor
   - E.g., after latency analysis reveals that certain applications are continuously receiving poor QoS
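The two shedding techniques above can be sketched as ordinary stream filters. This is a minimal illustration under assumptions, not Aurora's implementation: the function names `drop_box` and `semantic_shed`, the utility function `heart_rate_utility`, and every threshold are hypothetical, and the vital-sign data is made up for the example.

```python
import random


def drop_box(tuples, drop_rate, rng=None):
    """Random load shedding: a FILTER whose predicate ignores tuple content.

    Each tuple is dropped independently with probability drop_rate.
    """
    if rng is None:
        rng = random.Random()
    for t in tuples:
        if rng.random() >= drop_rate:
            yield t


def semantic_shed(tuples, utility, min_utility):
    """Semantic load shedding: keep only values whose utility is high enough."""
    for t in tuples:
        if utility(t) >= min_utility:
            yield t


def heart_rate_utility(bpm):
    # Assumed value-based QoS: readings outside 60-100 bpm matter most.
    return 1.0 if bpm < 60 or bpm > 100 else 0.2


readings = [72, 45, 88, 130, 95, 101]

# Semantic shedding keeps the abnormal readings: [45, 130, 101]
kept = list(semantic_shed(readings, heart_rate_utility, min_utility=0.5))

# Random shedding keeps roughly half of the tuples, whatever their values.
sampled = list(drop_box(readings, drop_rate=0.5, rng=random.Random(0)))
```

Random drops are cheap and suit outputs whose delivery-based QoS tolerates missing tuples; semantic shedding pays for a utility lookup per tuple but preserves the values the application cares about most.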
Adaptivity - Detecting Overload
- Throughput analysis
  - Box with cost = c, selectivity = s, input rate = r
  - 1/c < r ⇒ problem: the box can process at most 1/c tuples per unit time, so a higher input rate overloads it
- Latency analysis

Implementation - GUI

Implementation - Runtime

Conclusions
- Aurora stream query processing system
  - Designed for scalability
  - QoS-driven resource management
  - Continuous and historical queries
  - Stream storage management
  - Implemented prototype
- Web site: www.cs.brown.edu/research/aurora/