ImageVerifierCode 换一换
格式:PPT , 页数:50 ,大小:1.85MB ,
资源ID:373173      下载积分:2000 积分
快捷下载
登录下载
邮箱/手机:
温馨提示:
如需开发票,请勿充值!快捷下载时,用户名和密码都是您填写的邮箱或者手机号,方便查询和重复下载(系统自动生成)。
如填写123,账号就是123,密码也是123。
特别说明:
请自助下载,系统不会自动发送文件的哦; 如果您已付费,想二次下载,请登录后访问:我的下载记录
支付方式: 支付宝扫码支付 微信扫码支付   
注意:如需开发票,请勿充值!
验证码:   换一换

加入VIP,免费下载
 

温馨提示:由于个人手机设置不同,如果发现不能下载,请复制以下地址【http://www.mydoc123.com/d-373173.html】到电脑端继续下载(重复下载不扣费)。

已注册用户请登录:
账号:
密码:
验证码:   换一换
  忘记密码?
三方登录: 微信登录  

下载须知

1: 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。
2: 试题试卷类文档,如果标题没有明确说明有答案则都视为没有答案,请知晓。
3: 文件的所有权益归上传用户所有。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 本站仅提供交流平台,并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

版权提示 | 免责声明

本文(A Practical Approach toExploiting Coarse-GrainedPipeline .ppt)为本站会员(花仙子)主动上传,麦多课文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。 若此文所含内容侵犯了您的版权或隐私,请立即通知麦多课文库(发送邮件至master@mydoc123.com或直接QQ联系客服),我们立即给予删除!

A Practical Approach toExploiting Coarse-GrainedPipeline .ppt

1、A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs William Thies, Vikram Chandrasekhar, Saman Amarasinghe,Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of TechnologyMICRO 40 December 4, 2007,Legacy Code,310 billion lines of legacy c

2、ode in industry today 60-80% of typical IT budget spent re-engineering legacy code (Source: Gartner Group) Now code must be migrated to multicore machines Current best practice: manual translation,Parallelization: Man vs. Compiler,Parallelization: Man vs. Compiler,Parallelization: Man vs. Compiler,P

3、arallelization: Man vs. Compiler,Parallelization: Man vs. Compiler,Parallelization: Man vs. Compiler,Parallelization: Man vs. Compiler,Parallelization: Man vs. Compiler,Can we improve compilers by making them more human?,Humanizing Compilers,Current: An Omnipotent Being,New: An Expert Programmer,Ric

4、hard Stallman,First step: change our expectations of correctness,Zeus,Humanizing Compilers,First step: change our expectations of correctness Second step: use compilers differently Option A: Treat them like a programmer Transformations distrusted, subject to test Compiler must examine failures and f

5、ix themOption B: Treat them like a tool Make suggestions to programmer Assist programmers in understanding high-level structure How does this change the problem? Can utilize unsound but useful information In this talk: utilize dynamic analysis,Dynamic Analysis for Extracting Coarse-Grained Paralleli

6、sm from C,Dynamic Analysis for Extracting Coarse-Grained Parallelism from C,Focus on stream programs Audio, video, DSP, networking, and cryptographic processing kernels Regular communication patterns Static analysis complex or intractable Potential aliasing (pointer arithmetic, function pointers, et

7、c.) Heap manipulation (e.g., Huffman tree) Circular buffers (modulo ops) Correlated input parameters Dynamic analysis promising Observe flow of data Very few variations at runtime,Adder,LPF1,LPF2,LPF3,HPF1,HPF2,HPF3,Speaker,AtoD,FMDemod,Scatter,Gather,Dynamic Analysis for Extracting Coarse-Grained P

8、arallelism from C,Focus on stream programs Audio, video, DSP, networking, and cryptographic processing kernels Regular communication patterns Static analysis complex or intractable Potential aliasing (pointer arithmetic, function pointers, etc.) Heap manipulation (e.g., Huffman tree) Circular buffer

9、s (modulo ops) Correlated input parameters Opportunity for dynamic analysis If flow of data is very stable, can infer it with a small sample,Adder,LPF1,LPF2,LPF3,HPF1,HPF2,HPF3,Speaker,AtoD,FMDemod,Scatter,Gather,Overview of Our Approach,Original Program,Annotated Program,Mark Potential Actor Bounda

10、ries,Run Dynamic Analysis,No,Hand Parallelized Program,Auto Parallelized Program,Satisfied with Parallelism?,Yes,Communicate data by hand,Communicate based on trace,test and refine using multiple inputs,MPEG-2 Decoder,Stability of MPEG-2,MPEG-2 Decoder,Stability of MPEG-2,Top 10 YouTube Videos,Stabi

11、lity of MPEG-2 (Within an Execution),Frame,Stability of MPEG-2 (Across Executions),Minimum number of training iterations (frames) needed on each video in order to correctly decode the other videos.,Stability of MPEG-2 (Across Executions),Minimum number of training iterations (frames) needed on each

12、video in order to correctly decode the other videos.,5 frames of training on one video is sufficient to correctly parallelize any other video,Stability of MP3 (Across Executions),Minimum number of training iterations (frames) needed on each track in order to correctly decode the other tracks.,Stabil

13、ity of MP3 (Across Executions),Minimum number of training iterations (frames) needed on each track in order to correctly decode the other tracks.,Stability of MP3 (Across Executions),Layer 1 frames,Minimum number of training iterations (frames) needed on each track in order to correctly decode the o

14、ther tracks.,Stability of MP3 (Across Executions),CRC Error,Minimum number of training iterations (frames) needed on each track in order to correctly decode the other tracks.,Stability of MP3 (Across Executions),Minimum number of training iterations (frames) needed on each track in order to correctl

15、y decode the other tracks.,Outline,Analysis Tool Case Studies,Outline,Analysis Tool Case Studies,Annotating Pipeline Parallelism,Programmer indicates potential actor boundaries in a long-running loop,Annotating Pipeline Parallelism,Programmer indicates potential actor boundaries in a long-running lo

16、op,Annotating Pipeline Parallelism,Programmer indicates potential actor boundaries in a long-running loop,Annotating Pipeline Parallelism,Programmer indicates potential actor boundaries in a long-running loopServes as a fundamental API for pipeline parallelism Comparable to OpenMP for data paralleli

17、sm Comparable to Threads for task parallelism,Legacy C Code,MP3 Decoding,while (!end_bs(. ,Dynamic Analysis,Legacy C Code,Record Who Produces / Consumes each Location,MP3 Decoding,Huffman () ,Dequantize() ,Antialias() ,Hybrid() ,Polyphase() ,out_fifo() ,while (!end_bs(. ,Build Block Diagram,Mem,Huff

18、man(),Antialias(),Polyphase(),out_fifo(),Dynamic Analysis,Implemented Using Valgrind,Dequantize(),Hybrid(),Dequantize(),Dequantize(),Hybrid(),Huffman(),Antialias(),Polyphase(),out_fifo(),Exploiting the Parallelism,Stateless stage (data parallel),Antialias(),Polyphase(),Stateful stage (sequential),Hy

19、brid(),Huffman(),Antialias(),Polyphase(),out_fifo(),Exploiting the Parallelism,Reorder(),Dequantize(),Antialias(),for (i=0; iN; i+) PIPELINE(); Dequantize(); PIPELINE(); . ,Polyphase(),Stateless stage (data parallel),Stateful stage (sequential),Hybrid(),Huffman(),Antialias(),Polyphase(),out_fifo(),E

20、xploiting the Parallelism,Reorder(),Dequantize(),Antialias(),for (i=0; iN; i+) PIPELINE(N); Dequantize(); PIPELINE(); . ,Polyphase(),Stateless stage (data parallel),Stateful stage (sequential),DequantizeN(),Dequantize1(),Dequantize(),Hybrid(),Huffman(),Antialias(),Polyphase(),out_fifo(),Exploiting t

21、he Parallelism,Antialias(),for (i=0; iN; i+) PIPELINE(N); Dequantize(); PIPELINE(); . ,Polyphase(),Stateless stage (data parallel),Stateful stage (sequential),Parallel Runtime Environment,Pipeline parallelism requires buffering between stages Two ways to implement buffering: 1. Modify original progr

22、am to add buffers 2. Wrap original code in virtual execution environment We fork each actor into an independent process, and communicate the recorded variables via pipesRobust in the presence of aliasing Suitable to shared or distributed memory Efficient (7% communication overhead on MP3),Mem,Dequan

23、tize(),Programmer assistance needed for: - mallocd data- nested loops- reduction vars,pipe,Outline,Analysis Tool Case Studies,Extracted Stream Graphs,Ground Moving Target Indicator (GMTI),Extracted with tool:,From GMTI specification:,Audio and Video Codecs,MP3 Decoder,MPEG-2 Decoder,SPEC Benchmarks,

24、197.parser,256.bzip2 (compression),256.bzip2 (decompression),456.hmmer,Interactive Parallelization Process,Analysis tool exposed serializing dependences As annotated back-edges in stream graph (main.c:9 fft.c:5) How to deal with serializing dependences? 1. Rewrite code to eliminate dependence, or 2.

25、 Instruct the tool to ignore the dependence Lesson learned: Many memory dependences can be safely ignored! Allow malloc (or free) to be called in any order (GMTI, hmmer) Allow rand() to be called in any order (hmmer) Ignore dependences on uninitialized memory (parser) Ignore ordering of demand-drive

26、n buffer expansion (hmmer),Results,On two AMD 270 dual-core processors,Results,Profiled for 10 iterations of training data Ran for complete length of testing data Only observed unsoundness: MP3,How to Improve Soundness?,Revert to sequential version upon seeing new code (fixes MP3) Hardware support M

27、ondriaan memory protection (Witchel et. al) Versioned memory (used by Bridges et al.) Would provide safe communication, but unsafe parallelism Rigorous testing with maximal code coverage Programmer review,Related Work,Revisiting the Sequential Programming Model for Multi-Core (Bridges et al., yester

28、day) Same pipeline-parallel decompositions of parser, bzip2 Like commutative annotation, we tell tool to ignore dependences But since we target distributed memory, annotation represents privatization rather than reordering Dynamic analysis for understanding, parallelization Rul et. al (2006) program

29、mer manages communication Redux (2003) fine-grained dependence visualization Karkowski and Corporaal (1997) focus on data parallelism Inspector/executor for DOACROSS parallelism Rauchwerger (1998) survey,Conclusions,Dynamic analysis can be useful for parallelization Our tool is simple, transparent, and one of the first to extract coarse-grained pipeline parallelism from C programs Primary application: program understanding Secondary application: automatic parallelization Future work in improving soundness, automation,

copyright@ 2008-2019 麦多课文库(www.mydoc123.com)网站版权所有
备案/许可证编号:苏ICP备17064731号-1