Systems Support for End-to-End Performance Management
Sandip Agarwala; PhD Advisor: Karsten Schwan; College of Computing, Georgia Tech
Source: Gartner (December 2005)
Complexity, complexity, complexity
Reasons for Complexity: Application diversity; Interdependencies; Heterogeneous components; Too many different technologies and platforms; Too few "hints" from the system to the administrators; Legacy issues; Application-specific solutions; Insufficient information about the system to drive self-management; Lack of automation
Online System Management (diagram labels: Control, Execute, Monitor, Analyze, Workload): Scheduling; Capacity and SLA management; Design evaluation and tuning; Bottleneck detection; Resource provisioning, accounting, etc.
Proposed Approach: Service Path
Service Path: System abstractions that describe the dynamic dependencies between the different distributed application components
Service Class: Application-level request class, e.g. SLA class
Service Path Characteristics: End-to-end analysis; Online; Non-intrusive; Application-generic
Outline: Background; Motivation; Service path; Discovery with E2EProf; Refinement with SysProf; Automated SLA Enforcement; Related Work; Future Plans
E2EProf: Black-box approach; Correlate per-edge time series signals; Monitor network packet traces (source, destination, timestamps); Model traces as per-edge time series signals or density functions
Basic Approach (diagram labels: A, X, B, C, D): Spike causality; Spike position; Delay; No spike
Evaluation with 4-tier RUBiS (1): Tomcat Server 1, Tomcat Server 2,
MySQL Server, Apache Web Server, Clients, comment, bidding, CPU bound, I/O bound, EJB Server 2, EJB Server 1
(1) http://rubis.objectweb.org/
Service Path Detection in RUBiS: Highest delay node; Static server assignment; Round-robin load balancer
Change detection in RUBiS: Injected delay
Delta Air Lines Application: Revenue Pipeline; Total traffic: 1.34 million / day (56k / hour); TACSIN & TACSOUT; XIN & XOUT; APEXIN & APEXOUT; Error/Warning (Tivoli) logs
Delta Air Lines Application: Huge request burst
Outline: Background; Motivation; Service path; Discovery with E2EProf; Refinement with SysProf; Automated SLA Enforcement;
Related Work; Future Plans
Beyond dependency and latency (diagram labels: C1, C2, S1, S3, S2, S5, S6, S4): Solution: Zoom into the service path with SysProf; No application hints or instrumentation; Monitor resource usage on a per-class basis
SysProf Methodology (diagram labels: eth driver, BDD, Network Stack, System Call, FS/VM/etc., A1, A2, AN, Scheduler, User, Kernel, Instrumentation points, From client, To client, Init CID, Context Switches, Net softirq, system call parameters, PID, App functions, Disk I/O)
Track request context: Work done for processing a request class; May span user level and kernel level; Executes in more than one context (e.g. processes, threads, softirqs); Happens in a system-visible event (e.g. system calls)
Class ID Propagation (diagram labels: Init CID, Process CID, From client, To client, Msg CID, Packet CID, Inherits CID, Front-Tier, Middle-Tier, End-Tier, User, Kernel)
Application of SysProf: Resource Accounting; Utility Billing;
Bottleneck detection; Capacity Estimation; Root-Cause Analysis; Black-Box SLA management
Resource-Aware Adaptive Control (diagram labels: Tomcat Server 1, Tomcat Server 2, MySQL Server, EJB Server 2, EJB Server 1, Class 1, Class 2, Class 3, Front-end, Controller + Scheduler): Cluster workloads contending for same resources; Separate Queue/Controller for each cluster
Resource-Aware Adaptive Control (plot labels: With SysProf; No SysProf): Capacity = 80 req/s per server
Summary: Service Path: System abstractions to represent dependencies and request path; E2EProf and Pathmap: Dependency and latency analysis; SysProf: Service-based resource analysis;
Aid human operator and automate end-to-end performance management
Thank You! Questions? Email: sandip@cc.gatech.edu
Extra Slides
Pathmap Optimizations (diagram labels: time, W): Packet timestamp trace; Time-series signal or density function; Cross-correlation series; Bursty traffic; Sliding window (W); Run-length compression; Upper bound on latency
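
The items above (packet timestamp trace, time-series signal, cross-correlation series, upper bound on latency) suggest a simple pipeline, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pathmap or E2EProf implementation: the function names, the 10 ms bin width, the integer-millisecond timestamps, and the `max_lag` parameter are all hypothetical, and the sliding-window and run-length-compression optimizations are not modeled.

```python
from typing import List, Sequence


def to_time_series(timestamps_ms: Sequence[int], bin_ms: int, n_bins: int) -> List[int]:
    """Count packets per fixed-width bin, turning a timestamp trace into a signal."""
    series = [0] * n_bins
    for t in timestamps_ms:
        idx = t // bin_ms
        if 0 <= idx < n_bins:
            series[idx] += 1
    return series


def cross_correlation(x: Sequence[int], y: Sequence[int], max_lag: int) -> List[int]:
    """Unnormalized cross-correlation of x with y for lags 0..max_lag (in bins)."""
    n = min(len(x), len(y))
    return [
        sum(x[i] * y[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_delay_ms(x: Sequence[int], y: Sequence[int], bin_ms: int, max_lag: int) -> int:
    """Return the lag with the highest correlation spike, converted to milliseconds."""
    corr = cross_correlation(x, y, max_lag)
    best_lag = max(range(len(corr)), key=corr.__getitem__)
    return best_lag * bin_ms


# Hypothetical traces: packets seen on an inbound edge, and the same
# packets seen 30 ms later on an outbound edge of the same node.
inbound_ms = [10, 50, 55, 120, 200, 205, 210]
outbound_ms = [t + 30 for t in inbound_ms]

x = to_time_series(inbound_ms, bin_ms=10, n_bins=30)
y = to_time_series(outbound_ms, bin_ms=10, n_bins=30)

# max_lag bounds the search, in the spirit of an upper bound on latency.
delay_ms = estimate_delay_ms(x, y, bin_ms=10, max_lag=10)
print(delay_ms)  # 30
```

Bounding the lags searched keeps the cost proportional to the trace length times `max_lag`; a larger bin width trades delay resolution for a shorter series.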