1. An Information-theoretic Approach to Network Measurement and Monitoring
Yong Liu, Don Towsley, Tao Ye, Jean Bolot

2. Outline
- motivation
- background
- flow-based network model
- full packet trace compression: marginal/joint
- coarser granularity: NetFlow and SNMP
- future work

3. Motivation
- network monitoring: sensing a network
  - traffic engineering, anomaly detection, ...
  - single point vs. distributed
- different granularities
  - full traffic trace: packet headers
  - flow level record: timing, volume
  - summary statistics: byte/packet counts
- challenges
  - growing scales: high speed link, large topology
  - constrained resources: processing, storage, transmission
  - 30G headers/hour at UMass gateway
- solutions
  - sampling: temporal/spatial
  - compression: marginal/distributed

4. Questions
- how much can we compress monitoring traces?
- how much information is captured by different monitoring granularities?
  - packet trace/NetFlow/SNMP
- how much joint information is there in multiple monitors?
  - joint compression
  - trace aggregation
  - monitor placement

5. Our Contribution
- flow-based network models
  - explore temporal/spatial correlation in network traces
  - projection to different granularities
- information theoretic framework
  - entropy: bound/guideline on trace compression
  - quantitative approach for more general problems
- validation against measurements from an operational network

6. Entropy & Compression
- Shannon entropy H(X) of a discrete random variable X
- compression of i.i.d. symbols (length M) by coding
  - coding
  - expected code length
  - info. theoretic bound on compression ratio
- Shannon/Huffman coding
  - assign short codewords to frequent outcomes
  - achieve the H(X) bound

7. Entropy & Correlation
- joint entropy
- entropy rate of a stochastic process
  - exploits temporal correlation
  - Lempel-Ziv coding (LZ77, gzip, winzip): asymptotically achieves the bound for a stationary process
- joint entropy rate of correlated processes
  - exploits spatial correlation
  - Slepian-Wolf coding (distributed compression): encode each process individually, achieve the joint entropy rate in the limit

8. Network Trace Compression
- naive way: treat the trace as a byte stream, compress with generic tools
  - gzip compresses UMass traces by a factor of 2
- network traces are highly structured data
  - multiple fields per packet
    - diversity in information richness
    - correlation among fields
  - multiple packets per flow
    - packets within a flow share information
    - temporal correlation
  - multiple monitors traversed by a flow
    - most fields unchanged within the network
    - spatial correlation
- network models
  - explore correlation structure
  - quantify information content of network traces
  - serve as lower bounds/guidelines for compression algorithms

9. Packet Header Trace
[Figure: packet header layout, bit positions 0, 16, 31.
 Timing: time stamp (sec.), time stamp (sub-sec.).
 IP header: vers., HLen, ToS, total length, IPID, flags, fragment offset, TTL, protocol, header checksum, source IP address, destination IP address.
 TCP header: source port, destination port, data sequence number, acknowledgment number, Hlen, TCP flags, window size, checksum, urgent pointer.]

10. Header Field Entropy
[Figure: the header layout of slide 9, with fields marked "flow id" and "time".]

11. Single Point Packet Trace
[Figure: trace as a sequence of (time, flow id) pairs: (T0, F0), (T1, F1), (T3, F0), ..., (Tn, Fn), ..., (Tm, F0).]
- temporal correlation introduced by flows
  - packets from the same flow are closely spaced in time
  - they share header information
- packet inter-arrival
- # bits per packet

12. Network Models
- flow-based model
  - flow arrivals follow a Poisson process with a given rate
  - flows are classified into independent flow classes according to routing (the set of routers traversed)
  - flow i is described by:
    - flow inter-arrival time
    - flow ID
    - flow length
    - packet inter-arrival time within the flow
  - packet arrival stochastic process

13. Entropy in Flow Record
- # bits per flow
- # bits per second
- marginal compression ratio
  - determined by flow length (pkts.) and variability in pkt. inter-arrival

14. Single Point Compression: Results
- the compression ratio lower bound calculated from entropy is much lower than the ratio of the real compression algorithm
- how the real compression algorithm differs
  - records IPID, packet size, TCP/UDP fields
  - fixed packet buffer for each flow, hence many flow records for long flows

15. Distributed Network Monitoring
- single flow recorded by multiple monitors
- spatial correlation: traces collected at distributed monitors are correlated
- marginal node view: #bits/sec to represent flows seen by one node, bound on single point compression
- network system view: #bits/sec to represent flows across the network, bound on joint compression
- joint compression ratio: quantifies the gain of joint compression

16. Baseline Joint Entropy Model
- "perfect" network
  - fixed routes/constant link delay/no packet loss
- flow classes based on routes
  - rate at which flows arrive
  - # of monitors traversed
  - #bits per flow record
- info. rate at node v
- network view info. rate
- joint compression ratio
  - dependence on # of monitors traversed

17. Joint Compression: Results
[Figure: joint compression results.]

18. Coarser Granularity Models
- NetFlow model
  - similar to flow model
  - joint compression result similar to full trace
- SNMP model
  - the SNMP rate process of any link is the sum of the rate processes of all flow classes passing through that link
  - traffic rates of flow classes are independent Gaussian
  - entropy can be calculated from the covariance of these processes
  - information loss due to summation
  - small joint information between monitors
  - difficult to recover rates of flow classes from SNMP data

19. Joint Compression Ratio of Different Granularity
[Figure: joint compression ratio at different granularities.]

20. Conclusion
- information theoretic bound on marginal compression ratio: 20% (time + flow id, even lower if other low entropy fields are included)
- marginal compression ratio is high (not very compressible) in SNMP, lower in NetFlow, and lowest in full trace
- joint coding is much more useful/necessary in the full trace case than in SNMP
- "More entropy for your buck"

21. Future Work
- network impairments
  - how many more bits for delay/loss/route change
- model NetFlow with sampling
- distributed compression algorithms
  - lossless vs. lossy compression
- entropy based monitor placement
  - maximize information under constraints

22. Thanks!
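
Note on the "Entropy & Compression" slide: the bound it refers to can be illustrated with the standard definition H(X) = -sum over x of p(x) log2 p(x). The sketch below estimates H(X) empirically for a single fixed-width header field and compares the resulting ratio bound with what zlib (the DEFLATE algorithm used by gzip) achieves. The field values, their weights, and the helper names `entropy_bits` and `entropy_ratio_bound` are assumptions made for illustration; they are not taken from the UMass traces or from the authors' tooling.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy H(X), in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio_bound(symbols, bits_per_symbol=8):
    """Entropy bound on compressed size / original size for i.i.d. symbols
    that are stored in fixed-width fields of bits_per_symbol bits."""
    return entropy_bits(symbols) / bits_per_symbol


# Hypothetical skewed one-byte field (a TTL-like value); synthetic, not trace data.
random.seed(0)
field = random.choices([64, 128, 255, 32], weights=[70, 20, 8, 2], k=20000)

bound = entropy_ratio_bound(field)
achieved = len(zlib.compress(bytes(field), 9)) / len(field)
print(f"entropy bound: {bound:.3f}  zlib ratio: {achieved:.3f}")
```

Because the synthetic symbols are drawn independently, a generic compressor has no temporal correlation to exploit here, so the gap between the two numbers reflects coding overhead only. On real traces the picture changes, since packets of the same flow share header information.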
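
Note on the "Baseline Joint Entropy Model" slide: the dependence of the joint compression ratio on the number of monitors traversed can be sketched under a deliberately simple reading of the "perfect" network. The assumption here is that, with fixed routes, constant link delay and no packet loss, every monitor on a flow's route records the same information, so the network view needs each flow record only once while the marginal node views together hold one copy per monitor. The class `FlowClass`, its field names, and the example numbers are hypothetical; the slide's own expressions were not recoverable, so this is not the authors' exact formula.

```python
from dataclasses import dataclass


@dataclass
class FlowClass:
    arrival_rate: float       # flows per second (assumed value)
    monitors_traversed: int   # number of monitors on the route
    bits_per_record: float    # entropy of one flow record, in bits (assumed value)


def joint_compression_ratio(classes):
    """Network-view information rate divided by the sum of node-view rates."""
    # Network view: each flow is described once.
    joint_rate = sum(c.arrival_rate * c.bits_per_record for c in classes)
    # Node views: each flow is described at every monitor it crosses.
    marginal_sum = sum(
        c.arrival_rate * c.monitors_traversed * c.bits_per_record for c in classes
    )
    return joint_rate / marginal_sum


classes = [
    FlowClass(arrival_rate=100.0, monitors_traversed=1, bits_per_record=200.0),
    FlowClass(arrival_rate=50.0, monitors_traversed=2, bits_per_record=200.0),
    FlowClass(arrival_rate=25.0, monitors_traversed=4, bits_per_record=200.0),
]
print(f"joint compression ratio: {joint_compression_ratio(classes):.3f}")
```

Under these assumptions a flow class that crosses k monitors contributes a ratio of 1/k on its own, which is why longer routes make joint compression more worthwhile.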
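
Note on the SNMP model in "Coarser Granularity Models": the statement that entropy can be calculated from covariance can be made concrete with a two-link example. Assume three independent Gaussian flow classes: one crossing only link 1, one crossing only link 2, and one crossing both. Each link's SNMP rate is the sum of the classes on it, so the covariance between the two links equals the variance of the shared class. The sketch uses the standard differential entropy of a Gaussian and reports the mutual information between the two link rates per sample. The two-link topology, the variances, and the function names are assumptions for illustration only.

```python
import math


def gaussian_entropy_bits(variance):
    """Differential entropy of a one-dimensional Gaussian, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)


def joint_gaussian_entropy_bits(var1, var2, cov12):
    """Differential entropy of a two-dimensional Gaussian, in bits,
    computed from the determinant of its 2x2 covariance matrix."""
    det = var1 * var2 - cov12 ** 2
    return 0.5 * math.log2((2 * math.pi * math.e) ** 2 * det)


def link_mutual_information_bits(var_only1, var_shared, var_only2):
    """Information shared by two link rate processes, in bits per sample."""
    var1 = var_only1 + var_shared   # link 1 = class on link 1 only + shared class
    var2 = var_only2 + var_shared   # link 2 = class on link 2 only + shared class
    cov12 = var_shared              # only the shared class correlates the links
    marginal_sum = gaussian_entropy_bits(var1) + gaussian_entropy_bits(var2)
    return marginal_sum - joint_gaussian_entropy_bits(var1, var2, cov12)


# Equal-variance classes (assumed): a third of the traffic variance is shared.
print(f"{link_mutual_information_bits(1.0, 1.0, 1.0):.3f} bits per sample")
```

Even when the shared class carries as much variance as each unshared class, the two link rates share only a fraction of a bit per sample, which is consistent with the slide's point that summation leaves little joint information between monitors.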