Beyond Bloom Filters- Approximate Concurrent State Machines.ppt

上传人:wealthynice100 文档编号:378926 上传时间:2018-10-09 格式:PPT 页数:43 大小:234KB
下载 相关 举报
Beyond Bloom Filters- Approximate Concurrent State Machines.ppt_第1页
第1页 / 共43页
Beyond Bloom Filters- Approximate Concurrent State Machines.ppt_第2页
第2页 / 共43页
Beyond Bloom Filters- Approximate Concurrent State Machines.ppt_第3页
第3页 / 共43页
Beyond Bloom Filters- Approximate Concurrent State Machines.ppt_第4页
第4页 / 共43页
Beyond Bloom Filters- Approximate Concurrent State Machines.ppt_第5页
第5页 / 共43页
亲,该文档总共43页,到这儿已超出免费预览范围,如果喜欢就下载吧!
资源描述

1、Beyond Bloom Filters: Approximate Concurrent State Machines,Michael Mitzenmacher Joint work with Flavio Bonomi, Rina Panigrahy, Sushil Singh, George Varghese,Goals of the Talk,Survey some of my recent work on Bloom filters and related hashing-based data structures. But lots of other people currently

2、 working in this area an area of research in full bloom. Highlight: new results from SIGCOMM, ESA, Allerton 2006. For more technical details and experimental results, see papers at my home page.,Motivation: Router State Problem,Suppose each flow has a state to be tracked. Applications: Intrusion det

3、ection Quality of service Distinguishing P2P traffic Video congestion control Potentially, lots of others! Want to track state for each flow. But compactly; routers have small space. Flow IDs can be 100 bits. Cant keep a big lookup table for hundreds of thousands or millions of flows!,Approximate Co

4、ncurrent State Machines,Model for ACSMs We have underlying state machine, states 1X. Lots of concurrent flows. Want to track state per flow. Dynamic: Need to insert new flows and delete terminating flows. Can allow some errors. Space, hardware-level simplicity are key.,Problems to Be Dealt With,Keep

5、ing state values with small space, small probability of errors. Handling deletions. Graceful reaction to adverarial/erroneous behavior. Invalid transitions. Non-terminating flows. Could fill structure if not eventually removed. Useful to consider data structures in well-behaved systems and ill-behav

6、ed systems.,Results,Comparison of multiple ACSM proposals. Based on Bloom filters, d-left hashing, fingerprints. Surprisingly, d-left hashing much better! Experimental evaluation. Validates theoretical evaluation. Demonstrates viability for real systems. New construction for Bloom filters. New d-lef

7、t counting Bloom filter structure. Factor of 2 or better in terms of space.,Review: Bloom Filters,Given a set S = x1,x2,x3,xn on a universe U, want to answer queries of the form:Bloom filter provides an answer in “Constant” time (time to hash). Small amount of space. But with some probability of bei

8、ng wrong. Alternative to hashing with interesting tradeoffs.,Bloom Filters,Start with an m bit array, filled with 0s.,Hash each item xj in S k times. If Hi(xj) = a, set Ba = 1.,To check if y is in S, check B at Hi(y). All k values must be 1.,Possible to have a false positive; all k values are 1, but

9、 y is not in S.,n items m = cn bits k hash functions,False Positive Probability,Pr(specific bit of filter is 0) isIf r is fraction of 0 bits in the filter then false positive probability is Approximations valid as r is concentrated around Er. Martingale argument suffices. Find optimal at k = (ln 2)m

10、/n by calculus. So optimal fpp is about (0.6185)m/n,n items m = cn bits k hash functions,Example,m/n = 8,Opt k = 8 ln 2 = 5.45.,n items m = cn bits k hash functions,Handling Deletions,Bloom filters can handle insertions, but not deletions.If deleting xi means resetting 1s to 0s, then deleting xi wil

11、l “delete” xj.,0,1,0,0,1,0,1,0,0,1,1,1,0,1,1,0,B,xi xj,Counting Bloom Filters,Start with an m bit array, filled with 0s.,Hash each item xj in S k times. If Hi(xj) = a, add 1 to Ba.,To delete xj decrement the corresponding counters.,Can obtain a corresponding Bloom filter by reducing to 0/1.,Counting

12、 Bloom Filters: Overflow,Must choose counters large enough to avoid overflow. Poisson approximation suggests 4 bits/counter. Average load using k = (ln 2)m/n counters is ln 2. Probability a counter has load at least 16:Failsafes possible. We assume 4 bits/counter for comparisons.,ACSM Basics,Operati

13、ons Insert new flow, state Modify flow state Delete a flow Lookup flow state Errors False positive: return state for non-extant flow False negative: no state for an extant flow False return: return wrong state for an extant flow Dont know: return dont know Dont know may be better than other types of

14、 errors for many applications, e.g., slow path vs. fast path.,ACSM via Counting Bloom Filters,Dynamically track a set of current (FlowID,FlowState) pairs using a CBF. Consider first when system is well-behaved. Insertion easy. Lookups, deletions, modifications are easy when current state is given. I

15、f not, have to search over all possible states. Slow, and can lead to dont knows for lookups, other errors for deletions.,Direct Bloom Filter (DBF) Example,0,0,1,0,2,3,0,0,2,1,0,1,1,2,0,0,0,0,0,0,1,3,0,0,3,1,1,1,1,2,0,0,Timing-Based Deletion,Motivation: Try to turn non-terminating flow problem into

16、an advantage. Add a 1-bit flag to each cell, and a timer. If a cell is not “touched” in a phase, 0 it out. Non-terminating flows eventually zeroed. Counters can be smaller or non-existent; since deletions occur via timing. Timing-based deletion required for all of our schemes.,Timer Example,3,0,0,2,

17、1,0,1,1,1,0,0,0,1,0,1,0,3,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,RESET,Timer bits,Stateful Bloom Filters,Each flow hashed to k cells, like a Bloom filter. Each cell stores a state. If two flows collide at a cell, cell takes on dont know value. On lookup, as long as one cell has a state value, and there are n

18、ot contradicting state values, return state. Deletions handled by timing mechanism (or counters in well-behaved systems). Similar in spirit to KM, Bloom filter summaries for multiple choice hash tables.,Stateful Bloom Filter (SBF) Example,1,4,3,4,3,3,0,0,2,1,0,1,4,?,0,2,1,4,5,4,5,3,0,0,2,1,0,1,4,?,0

19、,2,What We Need : A New Design,These Bloom filter generalizations were not doing the job. Poor performance experimentally. Maybe we need a new design for Bloom filters! In real life, things went the other way; we designed a new ACSM structure, and found that it led to a new Bloom filter design.,Inte

20、rlude : The Main Point,There are alternative ways to design Bloom filter style data structures that are more effective for some variations, applications. The goal is to accomplish this while maintaining the simplicity of the Bloom filter design. For ease of programming. For ease of design in hardwar

21、e. For ease of user understanding!,A New Approach to Bloom Filters,Folklore Bloom filter construction. Recall: Given a set S = x1,x2,x3,xn on a universe U, want to answer membership queries. Method: Find an n-cell perfect hash function for S. Maps set of n elements to n cells in a 1-1 manner. Then k

22、eep bit fingerprint of item in each cell. Lookups have false positive e. Advantage: each bit/item reduces false positives by a factor of 1/2, vs ln 2 for a standard Bloom filter. Negatives: Perfect hash functions non-trivial to find. Cannot handle on-line insertions.,Perfect Hashing Approach,Element

23、 1,Element 2,Element 3,Element 4,Element 5,Fingerprint(4),Fingerprint(5),Fingerprint(2),Fingerprint(1),Fingerprint(3),Near-Perfect Hash Functions,In BM96, we note that d-left hashing can give near-perfect hash functions.,Review: d-left Hashing,Split hash table into d equal subtables. To insert, choo

24、se a bucket uniformly for each subtable. Place item in a cell in the least loaded bucket, breaking ties to the left.,Properties of d-left Hashing,Analyzable using both combinatorial methods and differential equations. Maximum load very small: O(log log n). Differential equations give very, very accu

25、rate performance estimates. Maximum load is extremely close to average load for small values of d.,Example of d-left hashing,Consider 3-left performance.,Average load 4,Average load 6.4,Near-Perfect Hash Functions,In BM96, we note that d-left hashing can give near-perfect hash functions. Useful even

26、 with deletions. Main differences Multiple buckets must be checked, and multiple cells in a bucket must be checked. Not perfect in space usage. In practice, 75% space usage is very easy. In theory, can do even better.,First Design : Just d-left Hashing,For a Bloom filter with n elements, use a 3-lef

27、t hash table with average load 4, 60 bits per bucket divided into 6 fixed-size fingerprints of 10 bits. Overflow rare, can be ignored. False positive rate of Vs. 0.000744 for a standard Bloom filter. Problem: Too much empty, wasted space. Other parametrizations similarly impractical. Need to avoid w

28、asting space.,Just Hashing : Picture,Bucket,1011011100,0001110101,1010101000,0000111111,Empty,Empty,Key: Dynamic Bit Reassignment,Use 64-bit buckets: 4 bit counter, 60 bits divided equally among actual fingerprints. Fingerprint size depends on bucket load. False positive rate of 0.0008937 Vs. 0.0004

29、587 for a standard Bloom filter. DBR: Within a factor of 2. And would be better for larger buckets. But 64 bits is a nice bucket size for hardware. Can we remove the cost of the counter?,DBR : Picture,Bucket,Count : 4,000110110101 111010100001 101010101000 101010110101 010101101011,Semi-Sorting,Fing

30、erprints in bucket can be in any order. Semi-sorting: keep sorted by first bit. Use counter to track #fingerprints and #fingerprints starting with 0. First bit can then be erased, implicitly given by counter info. Can extend to first two bits (or more) but added complexity.,DBR + Semi-sorting : Pict

31、ure,Bucket,Count : 4,2,000110110101 111010100001 101010101000 101010110101 010101101011,DBR + Semi-Sorting Results,Using 64-bit buckets, 4 bit counter. Semi-sorting on loads 4 and 5. Counter only handles up to load 6. False positive rate of 0.0004477 Vs. 0.0004587 for a standard Bloom filter. This i

32、s the tradeoff point. Using 128-bit buckets, 8 bit counter, 3-left hash table with average load 6.4. Semi-sorting all loads: fpr of 0.00004529 2 bit semi-sorting for loads 6/7: fpr of 0.00002425 Vs. 0.00006713 for a standard Bloom filter.,Additional Issues,Futher possible improvements Group buckets

33、to form super-buckets that share bits. Conjecture: Most further improvements are not worth it in terms of implementation cost. Moving items for better balance? Underloaded case. New structure maintains good performance.,Improvements to Counting Bloom Filter,Similar ideas can be used to develop an im

34、proved Counting Bloom Filter structure. Same idea: use fingerprints and a d-left hash table. Counting Bloom Filters waste lots of space. Lots of bits to record counts of 0. Our structure beats standard CBFs easily, by factors of 2 or more in space. Even without dynamic bit reassignment.,End of Inter

35、lude : Back to ACSMs,How do we use this new design for ACSMs?,Fingerprint Compressed Filter,Each flow hashed to d choices in the table, placed at the least loaded. Fingerprint and state stored. Deletions handled by timing mechanism or explicitly. False positives/negatives can still occur (especially

36、 in ill-behaved systems). Lots of parameters: number of hash functions, cells per bucket, fingerprint size, etc. Useful for flexible design.,Fingerprint Compressed Filter (FCF) Example,Experiment Summary,FCF-based ACSM is the clear winner. Better performance than less space for the others in test si

37、tuations. ACSM performance seems reasonable: Sub 1% error rates with reasonable size.,Conclusions and Open Questions,Approximate concurrent state machines are very practical, potentially very useful. Natural progression from set membership to functions (Bloomier filter) to state machines. What is ne

38、xt? Surprisingly, d-left hashing variants appear much stronger that standard Bloom filter constructions. Leads to new Bloom filter/counting Bloom filter constructions, well suited to hardware implementation. Lots more to understand. Tradeoffs of different errors at the data structure level. Impact of different errors at the application level. On the fly dynamic optimization of data structure. Reduce fingerprint bits as load increases?,

展开阅读全文
相关资源
猜你喜欢
相关搜索

当前位置:首页 > 教学课件 > 大学教育

copyright@ 2008-2019 麦多课文库(www.mydoc123.com)网站版权所有
备案/许可证编号:苏ICP备17064731号-1