Traffic Measurement for Network Operations
Jennifer Rexford
IP Network Management and Performance
AT&T, Florham Park, NJ

Outline of Tutorial
- Introduction (1.5 hours)
- Measurement techniques (3 hours)
  - General terminology
  - SNMP and RMON
  - Packet monitoring
  - Flow measurement
  - Data interpretation
- Network-wide models (1.5 hours)
  - Path matrix (trajectory sampling, IP traceback)
  - Traffic matrix (network tomography, MPLS MIBs)
  - Demand matrix (joining flow and routing data)

Introduction: Outline
- Example challenges for network operators
  - Detect, diagnose, and fix
- Internet Protocol (IP) background
  - Protocols, addressing, and design goals
- Internet Service Provider networks
  - ISP architecture and routing protocols
- Responsibilities of network operators
  - Challenges, timescales, and key tasks
- Network state
  - Topology, configuration, and routing

Network Operations: Detecting the Problem
[Figure: a backbone link marked "overload!"]
Detecting the problem:
- High utilization or loss statistics for the link?
- High delay or low throughput for probe traffic?
- Complaint from an angry customer (via the phone network)?

"Don't IP networks manage themselves?"
- Doesn't TCP adapt automatically to network congestion?
- Don't the routing protocols automatically reroute after a failure?

Network Operations: Excess Traffic

Network Operations: DoS Attack
[Figure: denial-of-service attack]

Network Operations: Link Failure

Summary of the Examples
- How to detect that a link is congested?
  - Periodic polling of link statistics
  - Active probes measuring performance
  - Customer complaints
- How to diagnose the reason for the congestion?
  - Change in user behavior
  - Denial-of-service attack
  - Router/link failure or policy change
- How to fix the problem?
  - Interdomain routing change
  - Installation of packet filters
  - Intradomain routing change
- Network measurement plays a key role in each step!

IP Protocol Background

Characteristics of the Internet
- The Internet is
  - Decentralized (loose confederation of peers)
  - Self-configuring (no global registry of topology)
  - Stateless (limited information in the routers)
  - Connectionless (no fixed connection between hosts)
- These attributes contribute
  - To the success of the Internet
  - To the rapid growth of the Internet
  - ... and to the difficulty of controlling the Internet!

IP Connectionless Paradigm
[Figure: sender, ISP, receiver]
- No error detection or correction for packet data
  - Higher-level protocol can provide error checking
- Successive packets may not follow the same path
  - Not a problem as long as packets reach the destination
- Packets can be delivered out of order
  - Receiver can put packets back in order (if necessary)
- Packets may be lost or arbitrarily delayed
  - Sender can send the packets again (if desired)
- No network congestion control (beyond "drop")
  - Sender can slow down in response to loss or delay
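Putting out-of-order packets back in order is left to the receiver, which a small sketch can illustrate. This is a minimal illustration, assuming a higher-level protocol numbers packets 0, 1, 2, ... (IP itself carries no such sequence numbers); the `Resequencer` class and its methods are hypothetical, not part of any real protocol stack.

```python
from typing import Dict, List


class Resequencer:
    """Hypothetical receiver-side buffer that restores the sender's order.

    Assumes a higher-level protocol numbers packets 0, 1, 2, ...;
    IP itself provides no such sequence numbers.
    """

    def __init__(self) -> None:
        self.next_seq = 0                     # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}   # out-of-order packets held back

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one packet; return any payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                         # duplicate (e.g., a retransmission): drop it
        self.pending[seq] = payload
        delivered: List[bytes] = []
        # Release the longest run of consecutive packets starting at next_seq.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self) -> List[int]:
        """Sequence numbers still awaited below the highest one buffered."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]


if __name__ == "__main__":
    r = Resequencer()
    # Packets 1 and 2 overtake packet 0 on a different path.
    print(r.receive(1, b"B"))   # []
    print(r.receive(2, b"C"))   # []
    print(r.missing())          # [0]
    print(r.receive(0, b"A"))   # [b'A', b'B', b'C']
```

Keying the buffer by sequence number keeps the sketch short; the gaps reported by `missing()` are the kind of information a sender-side retransmission scheme ("send the packets again") would need.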

Layering in the IP Protocols
[Figure: applications (Telnet, HTTP, FTP, RTP, DNS) over Transmission Control Protocol (TCP) and User Datagram Protocol (UDP), over the Internet Protocol, over link technologies (SONET, ATM, Ethernet)]

IP Suite: End Hosts vs. Routers
[Figure: HTTP and TCP run only on the two end hosts; the routers between them forward IP packets across Ethernet and SONET interfaces; an HTTP message travels inside a TCP segment, carried hop by hop in IP packets]

Example: HTTP Delay
[Figure: timeline of a Web transfer: browser cache, DNS resolution, TCP open, 1st byte response, last byte response]
- Sources of variability of delay
  - Browser cache hit/miss, need for cache revalidation
  - DNS cache hit/miss, multiple DNS servers, errors
  - Packet loss, high RTT, server accept queue
  - RTT, busy server, CPU overhead (e.g., CGI script)
  - Response size, receive buffer size, congestion
  - ... downloading embedded image(s) on the page

IP Addressing
- 32-bit number in dotted-quad notation (12.34.158.5)
- Divided into network & host portions (left and right)
- 12.34.158.0/23 is a 23-bit prefix with 2^9 = 512 addresses
[Figure: 12.34.158.5 split into Network (23 bits) and Host (9 bits)]

Classless InterDomain Routing (CIDR)
- Prefixes are key to Internet scalability
  - Address allocation by ARIN/RIPE/APNIC and by ISPs
  - Routing protocols and packet forwarding based on prefixes
  - Today, routing tables contain 150,000 prefixes
- Forwarding based on the longest prefix match
  - Destination-based forwarding of IP packets
  - Forwarding table maps prefix to next-hop link(s)
  - Router identifies the longest matching prefix
Example forwarding table: 4.0.0.0/8, 4.83.128.0/17, 12.0.0.0/8, 12.34.158.0/23, 126.255.103.0/24
Destination 12.34.158.5 matches both 12.0.0.0/8 and 12.34.158.0/23; the longest match is 12.34.158.0/23.

IP Design Philosophy: Main Goals [Clark88]
- Effective multiplexed utilization of existing networks
  - Packet switching, not circuit switching
- Continued communication despite network failures
  - Routers don't store state about ongoing transfers
  - End hosts provide key communication services
- Support for multiple types of communication service
  - Multiple transport protocols (e.g., TCP and UDP)
- Accommodation of a variety of different networks
  - Simple, best-effort packet delivery service
  - Packets may be lost, corrupted, or delivered out of order
- Distributed management of network resources
  - Multiple institutions managing the network
  - Intradomain and interdomain routing protocols

Operator Philosophy: Tension With IP
- Accountability of network resources
  - But routers don't maintain state about transfers
  - But measurement isn't part of the infrastructure
- Reliability/predictability of services
  - But IP doesn't provide performance guarantees
  - But equipment is not very reliable (no "five-9s")
- Fine-grain control over the network
  - But routers don't do fine-grain resource allocation
  - But the network self-configures after failures
- End-to-end control over communication
  - But end hosts adapt to congestion
  - But traffic may traverse multiple domains

The Role of Traffic Measurement
- Operations (control)
  - Generating reports for customers and internal groups
  - Diagnosing performance and reliability problems
  - Tuning the configuration of the network to the traffic
  - Planning outlay of new equipment (routers, proxies, links)
- Science (discovery)
  - End-to-end characteristics of delay, throughput, and loss
  - Verification of models of TCP congestion control
  - Workload models capturing the behavior of Web users
  - Understanding self-similar/multifractal traffic
- We focus on helping operators run the network, and assume we have access to the network infrastructure

Measurement Challenges for Operators
- Network-wide view
  - Crucial for evaluating control actions
  - Multiple kinds of data from multiple locations
- Large scale
  - Large number of high-speed links and routers
  - Large volume of measurement data
- Poor state of the art
  - Working within existing protocols and products
  - Technology not designed with measurement in mind
- The "do no harm" principle
  - Don't degrade router performance
  - Don't require disabling key router features
  - Don't overload the network with measurement data

ISP Background and Network Operations

ISP Background: Outline
- Autonomous Systems (ASes)
  - Definition of an Autonomous System
  - Peer, provider, and customer relationships
- Internet Service Provider architecture
  - Example backbone network
  - Logical view of a backbone
  - Architecture of a high-end router
- Routing protocols
  - Border Gateway Protocol (BGP)
  - Interior Gateway Protocols (IGPs)

Internet Architecture
- Divided into Autonomous Systems
  - Distinct regions of administrative control (15,000)
  - Set of routers and links managed by a single "institution"
  - Service provider, company, university, etc.
- Hierarchy of Autonomous Systems
  - Large, tier-1 provider with a nationwide backbone
  - Medium-sized regional provider with smaller backbone
  - Small network run by a single company or university
- Interaction between Autonomous Systems
  - Internal topology is not shared between ASes
  - ... but neighboring ASes interact to coordinate routing

What is an "Institution"?
- Not equivalent to an AS
  - Many institutions span multiple autonomous systems
  - Some institutions do not have their own AS number
  - Ownership of an AS may be hard to pinpoint (whois)
- Not equivalent to a block of IP addresses (prefix)
  - Many institutions have multiple (non-contiguous) prefixes
  - Some institutions are a small part of a larger address block
  - Ownership of a prefix may be hard to pinpoint (whois)
- Not equivalent to a domain name
  - Some sites may be hosted by other institutions
  - Some institutions have multiple domain names

Connections Between ASes

Internet Service Provider Backbone
[Figure: access routers connect modem banks, business customers, and Web/e-mail servers; backbone routers interconnect them; gateway routers connect to neighboring providers]

Inside a High-End Router
[Figure: a processor and multiple line cards attached to a switching fabric]

Components of a High-End Router
- Route processor
  - Implementation of the various routing protocols
  - Creation of forwarding table for the line cards
  - Command-line interface for network operators
  - Handling of packets directed to the loopback address
  - Handling of "special packets" (IP options, expired TTL)
- Switching fabric
  - Forwarding of packets from input to output interface
- Line cards
  - Link-layer protocol to convert to/from IP packets
  - Packet handling (filtering, route look-up, buffering, rate limiting, ToS marking, link scheduling)
  - Transfer of packets to/from the switching fabric

Interdomain Routing (Between ASes)
[Figure: seven ASes between a client and a Web server; AS path 6, 5, 4, 3, 2, 1]

Border Gateway Protocol (BGP)
- ASes exchange info about who they can reach
  - IP prefix: block of destination IP addresses
  - AS path: sequence of ASes along the path
- Policies configured by the network operator
  - Path selection: which of the paths to use?
  - Path export: which neighbors to tell?
[Figure: ASes 1, 2, and 3; destination 12.34.158.5; AS 1 announces "I can reach 12.34.158.0/23", and AS 2 passes on "I can reach 12.34.158.0/23 via AS 1"]

Intradomain Routing: OSPF or IS-IS
- Shortest-path routing based on link weights
  - Routers flood link-state information to each other
  - Routers compute the "next hop" to reach other routers
- Weights configured by the network operator
  - Simple heuristics: link capacity or physical distance
  - Traffic engineering: tuning link weights to the traffic

Asymmetric Routes: Hot-Potato Routing
[Figure: the Web request and TCP ACKs from client to server follow a different path than the Web response from server to client]

Network Operations: Outline
- Operating a network
  - Control loop, timescales, and practical challenges
- Operator tasks
  - Reporting, troubleshooting, traffic engineering, provisioning, capacity planning, architecture
- Network model
  - Network state and data sources
- Conclusions

Responsibilities of Network Operators

Operating a Network
- Control loop
  - Detect: note the symptoms
  - Diagnose: identify the illness
  - Fix: select and dispense the medicine
- Key ingredients
  - Measurement of the traffic and the network status
  - Analysis and modeling of the measurement data
  - Modeling of the network control mechanism ("what if")
- Time scales
  - Minutes to hours
  - Days to weeks
  - Months to years

Practical Challenges
- Increase in the scale of the network
  - Link speeds, # of routers/links, # of peering points
  - Large network has 100s of routers and 1000s of links
- Significant traffic fluctuations
  - Time-of-day changes and addition of new customers/peers
  - Special events (Olympics) and new applications (Napster)
  - Difficult to forecast traffic load before designing topology
- Market demand for stringent network performance
  - Service level agreements (SLAs), high-quality voice-over-IP
- Increase in network capability & feature complexity
  - New services (Quality of Service, Virtual Private Networks)
  - New routing protocols (MPLS, multicast)

Network Operations Tasks
- Reporting of network-wide statistics
  - Generating basic information about usage and reliability
- Performance/reliability troubleshooting
  - Detecting and diagnosing anomalous events
- Traffic engineering
  - Adjusting network configuration to the prevailing traffic
- Capacity planning
  - Deciding where and when to install new equipment
- Provisioning of existing network
  - Process of adding new customers/peers, routers/links, etc.
- Selecting and testing new network architectures
  - MPLS routing, multicast, monitoring, quality-of-service, etc.

Basic Reporting
- Producing basic statistics about the network
  - For business purposes, network planning, ad hoc studies
- Examples
  - Proportion of transit vs. customer-customer traffic
  - Total volume of traffic sent to/from each private peer
  - Mixture of traffic by application (Web, Napster, etc.)
  - Mixture of traffic to/from individual customers
  - Usage, loss, and reliability trends for each link
- Requirements
  - Network-wide view of basic traffic and reliability statistics
  - Ability to "slice and dice" measurements in different ways (e.g., by application, by customer, by peer, by link type)

Topology and Link Utilization
[Figure: backbone topology; link color indicates utilization (high to low)]

Troubleshooting
- Detecting and diagnosing problems
  - Recognizing and explaining anomalous events
- Examples
  - Why a backbone link is suddenly overloaded
  - Why the route to a destination prefix is flapping
  - Why DNS queries are failing with high probability
  - Why a route processor has high CPU utilization
  - Why a customer cannot reach certain Web sites
- Requirements
  - Network-wide view of many protocols and systems
  - Diverse measurements at different protocol levels
  - Thresholds for isolating significant phenomena

Traffic Flow Through Backbone
[Figure: node color/size proportional to the traffic to that router; link color/size proportional to the traffic carried (high to low); peering points marked]

Traffic Engineering
- Adjusting resource allocation policies
  - Path selection, buffer management, and link scheduling
- Examples
  - Changing IGP weights to divert traffic from congested links
  - Changing BGP policies to balance load on peering links
  - Changing RED parameters to improve TCP throughput
  - Changing WFQ weights to reduce delay for "gold" traffic
- Requirements
  - Network-wide view of the traffic carried in the backbone
  - Timely view of the network topology and configuration
  - Accurate models to predict the impact of control operations (e.g., the impact of RED parameters on TCP throughput)

BGP Policy Change
[Figure: two large flows of traffic; after the policy change, one flow takes a new egress point]

Capacity Planning
- Deciding whether to buy/install new equipment
  - What? Where? When?
- Examples
  - Where to put the next backbone router
  - When to upgrade a peering link to higher capacity
  - Whether to add/remove a particular private peer
  - Whether the network can accommodate a new customer
  - Whether to install a caching proxy for cable modems
- Requirements
  - Projections of future traffic patterns from measurements
  - Cost estimates for buying/deploying the new equipment
  - Model of the potential impact of the change (e.g., latency reduction and bandwidth savings from a caching proxy)

Network State

Network State: Not Just Traffic Measurement
- Topology
  - Routers and links, and their connectivity and capacity
  - BGP sessions with neighbors and within the backbone
- Configuration
  - Path selection (e.g., OSPF weights, BGP policies)
  - Link scheduling (e.g., FIFO or WFQ weights)
  - Buffer management (e.g., drop-tail or RED parameters)
  - Packet filters (e.g., ingress filters to prevent DoS)
- Interdomain routing
  - Reachability to neighboring domains (e.g., BGP updates)
- Necessary for a network-wide view for the operator

Network State: Data Sources
- Router configuration files
  - Router name, OS version, IP address, running processes
  - Individual interfaces and their location in the router
  - Set of commands applied to the router
- Polling/trapping of SNMP data
  - Up/down status of individual links, sessions, etc.
- Router forwarding tables
  - Next-hop link(s) for each destination prefix
- BGP routing tables or BGP monitors
  - Routing choices advertised by other domains
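A forwarding table maps each destination prefix to next-hop link(s), and the router picks the longest matching prefix. The lookup can be sketched with Python's standard `ipaddress` module. The prefixes below are the example forwarding table from the CIDR slide; the next-hop names (`link-A` and so on) are hypothetical placeholders, and the linear scan is only for clarity, standing in for the specialized data structures (such as tries) that real routers use.

```python
import ipaddress
from typing import Dict, Optional

# Example forwarding table from the CIDR slide; next-hop names are hypothetical.
FORWARDING_TABLE: Dict[str, str] = {
    "4.0.0.0/8": "link-A",
    "4.83.128.0/17": "link-B",
    "12.0.0.0/8": "link-C",
    "12.34.158.0/23": "link-D",
    "126.255.103.0/24": "link-E",
}


def longest_prefix_match(dest: str, table: Dict[str, str]) -> Optional[str]:
    """Return the most specific prefix in `table` containing `dest`, or None."""
    addr = ipaddress.ip_address(dest)
    best_net = None
    best_prefix: Optional[str] = None
    for prefix in table:
        net = ipaddress.ip_network(prefix)
        # Among matching prefixes, keep the one with the most network bits.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_prefix = net, prefix
    return best_prefix


def next_hop(dest: str, table: Dict[str, str]) -> Optional[str]:
    """Look up the next-hop link for a destination address (None if no route)."""
    prefix = longest_prefix_match(dest, table)
    return table[prefix] if prefix is not None else None


if __name__ == "__main__":
    print(longest_prefix_match("12.34.158.5", FORWARDING_TABLE))  # 12.34.158.0/23
    print(next_hop("12.34.158.5", FORWARDING_TABLE))              # link-D
    print(next_hop("4.83.200.1", FORWARDING_TABLE))               # link-B
    print(next_hop("8.8.8.8", FORWARDING_TABLE))                  # None
    # A /23 leaves 32 - 23 = 9 host bits, i.e. 2**9 addresses.
    print(ipaddress.ip_network("12.34.158.0/23").num_addresses)   # 512
```

Note that 12.34.158.5 also falls inside 12.0.0.0/8, and 4.83.200.1 inside 4.0.0.0/8; the more specific /23 and /17 entries win, which is exactly the longest-match rule.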