ImageVerifierCode 换一换
格式:PPT , 页数:42 ,大小:330.41KB ,
资源ID:379242      下载积分:2000 积分
快捷下载
登录下载
邮箱/手机:
温馨提示:
如需开发票,请勿充值!快捷下载时,用户名和密码都是您填写的邮箱或者手机号,方便查询和重复下载(系统自动生成)。
如填写123,账号就是123,密码也是123。
特别说明:
请自助下载,系统不会自动发送文件的哦; 如果您已付费,想二次下载,请登录后访问:我的下载记录
支付方式: 支付宝扫码支付 微信扫码支付   
注意:如需开发票,请勿充值!
验证码:   换一换

加入VIP,免费下载
 

温馨提示:由于个人手机设置不同,如果发现不能下载,请复制以下地址【http://www.mydoc123.com/d-379242.html】到电脑端继续下载(重复下载不扣费)。

已注册用户请登录:
账号:
密码:
验证码:   换一换
  忘记密码?
三方登录: 微信登录  

下载须知

1: 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。
2: 试题试卷类文档,如果标题没有明确说明有答案则都视为没有答案,请知晓。
3: 文件的所有权益归上传用户所有。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 本站仅提供交流平台,并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

版权提示 | 免责声明

本文(Cache-Aware Lock-Free Queues for Multiple Producers-.ppt)为本站会员(吴艺期)主动上传,麦多课文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。 若此文所含内容侵犯了您的版权或隐私,请立即通知麦多课文库(发送邮件至master@mydoc123.com或直接QQ联系客服),我们立即给予删除!

Cache-Aware Lock-Free Queues for Multiple Producers-.ppt

1、Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency,Anders Gidenstam Hkan Sundell Philippas Tsigas,Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers University of Technology,School of business and informatics Unive

2、rsity of Bors,10/10/2018,Anders Gidenstam, University of Bors,2,Outline,Introduction Lock-free synchronization The Problem & Related work The new lock-free queue algorithm Experiments Conclusions,10/10/2018,Anders Gidenstam, University of Bors,3,Synchronization on a shared object,Lock-free synchroni

3、zation Concurrent operations without enforcing mutual exclusion Avoids: Blocking (or busy waiting), convoy effects, priority inversion and risk of deadlock Progress Guarantee At least one operation always makes progress,10/10/2018,Anders Gidenstam, University of Bors,4,Correctness of a concurrent ob

4、ject,Desired semantics of a shared data object Linearizability Herlihy & Wing, 1990 For each operation invocation there must be one single time instant during its duration where the operation appears to take effect. The observed effects should be consistent with a sequential execution of the operati

5、ons in that order.,10/10/2018,Anders Gidenstam, University of Bors,5,Correctness of a concurrent object,Desired semantics of a shared data object Linearizability Herlihy & Wing, 1990 For each operation invocation there must be one single time instant during its duration where the operation appears t

6、o take effect. The observed effects should be consistent with a sequential execution of the operations in that order.,Processes can read/write single memory words Synchronization primitives Built into CPU and memory system Atomic read-modify-write (i.e. a critical section of one instruction) Example

7、s: Compare-and-Swap, Load-Linked / Store-Conditional,System Model,10/10/2018,Anders Gidenstam, University of Bors,6,CPU,CPU,Shared Memory,A process Reads/writes may reach memory out of order Reads of own writes appear in program order Atomic synchronization primitive/instruction Single word Compare-

8、and-Swap Atomic Acts as memory barrier for the process own reads and writes All own reads/writes before are done before All own reads/writes after are done after The affected cache block is held exclusively,System Model: Memory Consistency,10/10/2018,Anders Gidenstam, University of Bors,7,10/10/2018

9、,Anders Gidenstam, University of Bors,8,Outline,Introduction Lock-free synchronization The Problem & Related work The new lock-free queue algorithm Experiments Conclusions,The Problem,Concurrent FIFO queue shared data object Basic operations: enqueue and dequeueDesired Properties Linearizable and Lo

10、ck-free Dynamic size (maximum only limited by available memory) Bounded memory usage (in terms of live contents) Fast on real systems,10/10/2018,Anders Gidenstam, University of Bors,9,B,C,E,D,A,F,G,Tail,Head,Related Work: Lock-free Multi-P/C Queues,Michael & Scott, 1996 Linked-list, one element/node

11、 Global shared head and tail pointers Tsigas & Zhang, 2001 Static circular array of elements Two different NULL values for distinguishing initially empty from dequeued elements Global shared head and tail indices, lazily updated Michael & Scott, 1996 + Elimination Moir, Nussbaum, Shalev & Shavit, 20

12、05 Same as the above + elimination of concurrent pairs of enqueue and dequeue when the queue is near empty Hoffman, Shalev & Shavit, 2007 Baskets queue Linked-list, one element/node Reduces contention between concurrent enqueues after conflict Needs stronger memory management than M&S (SLFRC or Bewa

13、re&Cleanup),10/10/2018,Anders Gidenstam, University of Bors,10,10/10/2018,Anders Gidenstam, University of Bors,11,Outline,Introduction Lock-free synchronization The Problem & Related work The new lock-free queue algorithm Experiments Conclusions,The Algorithm,Basic idea: Cut and unroll the circular

14、array queue Primary synchronization on the elements Compare-And-Swap (NULL1 - Value - NULL2 avoids the ABA problem) Head and tail both move to the right Need an “infinite” array of elementsu,10/10/2018,Anders Gidenstam, University of Bors,12,The Algorithm,Basic idea: Creating an “infinite” array of

15、elements. Divide into blocks of elements, and link them together New empty blocks added as needed Emptied blocks are marked deleted and eventually reclaimed Block fields: Elements, next, (filled, emptied flags), deleted flag. Linked chain of dynamically allocated blocks Lock-free memory management n

16、eeded for safe reclamation! Beware&Cleanup Gidenstam, Papatriantafilou, Sundell & Tsigas, 2009,10/10/2018,Anders Gidenstam, University of Bors,13,globalTailBlock,globalHeadBlock,*,The Algorithm,Thread local storage Last used Head block/index for Enqueue Tail block/index for Dequeue Reduces need to r

17、ead/update global shared variables,10/10/2018,Anders Gidenstam, University of Bors,14,globalTailBlock,globalHeadBlock,*,The Algorithm,Enqueue Find the right block (first via TLS, then TLS-next or globalHeadBlock) Search the block for the first empty element,10/10/2018,Anders Gidenstam, University of

18、 Bors,15,globalTailBlock,globalHeadBlock,scan,*,The Algorithm,10/10/2018,Anders Gidenstam, University of Bors,16,globalTailBlock,globalHeadBlock,Add with CAS,Enqueue Find the right block (first via TLS, then TLS-next or globalHeadBlock) Search the block for the first empty element Update element wit

19、h CAS (Also, the linearization point),*,The Algorithm,Dequeue Find the right block (first via TLS, then TLS-next or globalTailBlock) Search the block for the first valid element,10/10/2018,Anders Gidenstam, University of Bors,17,globalTailBlock,globalHeadBlock,scan,*,The Algorithm,Dequeue Find the r

20、ight block (first via TLS, then TLS-next or globalTailBlock) Search the block for the first valid element Remove with CAS, replace with NULL2 (linearization point),10/10/2018,Anders Gidenstam, University of Bors,18,globalTailBlock,globalHeadBlock,*,Remove with CAS,The Algorithm,Maintaining the chain

21、 of blocks Helping scheme when moving between blocks Invariants to be maintained globalHeadBlock points to The newest block or the block before it globalTailBlock points to The oldest active block (not deleted) or the block before it,10/10/2018,Anders Gidenstam, University of Bors,19,globalTailBlock

22、,globalHeadBlock,*,Maintaining the chain of blocks,Updating globalTailBlock Case 1 “Leader” Finds the block empty If needed help to ensure globalTailBlock points to tailBlock (or a newer block),10/10/2018,Anders Gidenstam, University of Bors,20,globalTailBlock,globalHeadBlock,*,Maintaining the chain

23、 of blocks,Updating globalTailBlock Case 1 “Leader” Finds the block empty Helping done Set delete mark,10/10/2018,Anders Gidenstam, University of Bors,21,globalTailBlock,globalHeadBlock,*,*,Maintaining the chain of blocks,Updating globalTailBlock Case 1 “Leader” Finds the block empty Helping done Se

24、t delete mark Update globalTailBlock pointer Move own tailBlock pointer,10/10/2018,Anders Gidenstam, University of Bors,22,globalTailBlock,globalHeadBlock,*,*,Maintaining the chain of blocks,Updating globalTailBlock Case 2: “Way out of date” tailBlock-next marked deleted Restart with globalTailBlock

25、,10/10/2018,Anders Gidenstam, University of Bors,23,globalTailBlock,globalHeadBlock,*,*,Maintaining the chain of blocks,Updating globalTailBlock Case 2: “Way out of date” tailBlock-next marked deleted Restart with globalTailBlock,10/10/2018,Anders Gidenstam, University of Bors,24,globalTailBlock,glo

26、balHeadBlock,*,*,Minding the Cache,Blocks occupy one cache-line Cache-lines for enqueue v.s. dequeue are disjoint (except when near empty) Enqueue/dequeue will cause coherence traffic for the affected block Scanning for the head/tail involves one cache-line,10/10/2018,Anders Gidenstam, University of

27、 Bors,25,globalTailBlock,globalHeadBlock,Minding the Cache,Blocks occupy one cache-line Cache-lines for enqueue v.s. dequeue are disjoint (except when near empty) Enqueue/dequeue will cause coherence traffic for the affected block Scanning for the head/tail involves one cache-line,10/10/2018,Anders

28、Gidenstam, University of Bors,26,globalTailBlock,globalHeadBlock,Cache-lines,Scanning in a weak memory model?,Key observations Life cycle for element values (NULL1 - value - NULL2) Elements are updated with CAS thus requiring the old value to be the expected one. Scanning only skips values later in

29、the life cycle Reading an old value is safe (will try CAS and fail),10/10/2018,Anders Gidenstam, University of Bors,27,globalTailBlock,globalHeadBlock,Scanning for empty,Scanning for first item to dequeue,10/10/2018,Anders Gidenstam, University of Bors,28,Outline,Introduction Lock-free synchronizati

30、on The Problem & Related work The lock-free queue algorithm Experiments Conclusions,10/10/2018,Anders Gidenstam, University of Bors,29,Experimental evaluation,Micro benchmark Threads execute enqueue and dequeue operations on a shared queue High contention. Test Configurations Random 50% / 50%, initi

31、al size 0 Random 50% / 50%, initial size 1000 1 Producer / N-1 Consumers N-1 Producers / 1 Consumer Measured throughput in items/sec #dequeues not returning EMPTY,10/10/2018,Anders Gidenstam, University of Bors,30,Experimental evaluation,Micro benchmark Algorithms Michael & Scott, 1996 Michael & Sco

32、tt, 1996 + Elimination Moir, Nussbaum, Shalev & Shavit, 2005 Hoffman, Shalev & Shavit, 2007 Tsigas & Zhang, 2001 The new Cache-Aware Queue Gidenstam, Sundell & Tsigas, 2010 PC Platform CPU: Intel Core i7 920 2.67 GHz 4 cores with 2 hardware threads each RAM: 6 GB DDR3 1333 MHz Windows 7 64-bit,10/10

33、/2018,Anders Gidenstam, University of Bors,31,Experimental evaluation (i),10/10/2018,Anders Gidenstam, University of Bors,32,Experimental evaluation (ii),10/10/2018,Anders Gidenstam, University of Bors,33,Experimental evaluation (iii),10/10/2018,Anders Gidenstam, University of Bors,34,Experimental e

34、valuation (iv),Conclusions,The Cache-Aware Lock-free Queue The first lock-free queue algorithm for multiple producers/consumers with all of the properties below Designed to be cache-friendly Designed for the weak memory consistency provided by contemporary hardware Is disjoint-access parallel (excep

35、t when near empty) Use thread-local storage for reduced communication Use a linked-list of array blocks for efficient dynamic size support,10/10/2018,Anders Gidenstam, University of Bors,35,10/10/2018,Anders Gidenstam, University of Bors,36,Thank you for listening!Questions?,10/10/2018,Anders Gidens

36、tam, University of Bors,37,Experimental evaluation,Scalable? No. FIFO order limits the possibilities for concurrency,10/10/2018,Anders Gidenstam, University of Bors,38,The Algorithm,Basic idea: “infinite” array of elements. Primary synchronization on the elements Compare-And-Swap (NULL1 - Value - NU

37、LL2 avoids ABA) Divided into blocks of elements New empty blocks added as needed Emptied blocks are removed and eventually reclaimed Block fields: Elements, next, (filled, emptied), deleted. Linked chain of dynamically allocated blocks Lock-free memory management needed for safe reclamation! Beware&

38、Cleanup Gidenstam, Papatriantafilou, Sundell & Tsigas, 2009,10/10/2018,Anders Gidenstam, University of Bors,39,globalTailBlock,globalHeadBlock,*,The Algorithm,10/10/2018,Anders Gidenstam, University of Bors,40,globalTailBlock,globalHeadBlock,System Model: Memory Consistency,A process Reads/writes ma

39、y reach memory out of order Reads of own writes appear in program order Atomic synchronization primitive/instruction Single word Compare-and-Swap Acts as memory barrier for own reads and writes All own reads/writes before are done before All own reads/writes after are done after The affected cache b

40、lock is held exclusively,10/10/2018,Anders Gidenstam, University of Bors,41,Cache-coherency,Maintaining the chain of blocks,Updating globalTailBlock Case 1 Leader Finds the block empty If no next = queue empty (done),10/10/2018,Anders Gidenstam, University of Bors,42,globalTailBlock,globalHeadBlock,*,

copyright@ 2008-2019 麦多课文库(www.mydoc123.com)网站版权所有
备案/许可证编号:苏ICP备17064731号-1