End-to-end Data-flow Parallelism for Throughput Optimization in High-speed Networks

Esma Yildirim
Data Intensive Distributed Computing Laboratory
University at Buffalo (SUNY)
Condor Week 2011

Motivation

- Data grows larger, hence the need for speed to transfer it.
- Technology develops with the introduction of high-speed networks and complex computer architectures, which are not yet fully utilized.
- Many questions remain open:
  - "I cannot get the speed I am supposed to get from the network."
  - "I have a 10G high-speed network and supercomputers connecting. Why do I still get under 1G throughput?"
  - "I can't wait for a new protocol to replace the current ones. Why can't I get high throughput with what I have at hand?"
  - "OK, maybe I am asking too much, but I want to get optimal settings to achieve maximal throughput."
  - "I want to get high throughput without congesting the traffic too much. How can I do it at the application level?"

Introduction

- Users of data-intensive applications need intelligent services and schedulers that provide models and strategies to optimize their data transfer jobs.
- Goals:
  - Maximize throughput
  - Minimize model overhead
  - Do not cause contention among users
  - Use the minimum number of end-system resources

Introduction (cont.)

- Current optical technology supports 100G transport, so utilizing the network challenges the middleware to provide faster data transfer speeds.
- Achieving multiple-Gbps throughput has become a burden over TCP-based networks.
  - Parallel streams can solve the problem of TCP's inefficient network utilization.
  - Finding the optimal number of streams is a challenging task.
- With faster networks, end systems have become the major source of bottlenecks.
  - CPU, NIC and disk bottlenecks
- We provide models to decide on the optimal number of parallel streams and CPU/disk stripes.

Outline

- Stork Overview
- End-system Bottlenecks
- End-to-end Data-flow Parallelism
- Optimization Algorithm
- Conclusions and Future Work

Stork Data Scheduler

- Implements state-of-the-art models and algorithms for data scheduling and optimization
- Started as part of the Condor
  project as the PhD thesis of Dr. Tevfik Kosar
- Currently developed at University at Buffalo and funded by NSF
- Heavily uses some Condor libraries, such as ClassAds and DaemonCore

Stork Data Scheduler (cont.)

- Stork v.2.0 is available with enhanced features: http://www.storkproject.org
- Supports more than 20 platforms (mostly Linux flavors)
- Windows and Azure Cloud support planned soon
- The most recent enhancement: Throughput Estimation and Optimization Service

End-to-end Data Transfer

- Method to improve end-to-end data transfer throughput: application-level data-flow parallelism
  - Network-level parallelism (parallel streams)
  - Disk/CPU-level parallelism (stripes)

Network Bottleneck

Step 1: Effect of Parallel Streams on Disk-to-disk Transfers
- Parallel streams can improve data throughput, but only to a certain extent.
- Disk speed presents a major limitation.
- Parallel streams may have an adverse effect if the disk speed upper limit has already been reached.

Disk Bottleneck

Step 2: Effect of Parallel Streams on Memory-to-memory Transfers and CPU Utilization
- Once the disk bottleneck is eliminated, parallel streams improve throughput dramatically.
- Throughput either stabilizes or falls after reaching its peak, due to network or end-system limitations.
- Example: the network interface card limit (10G) could not be reached (e.g. 7.5 Gbps inter-node).

CPU Bottleneck

Step 3: Effect of Striping and Removal of the CPU Bottleneck
- Striped transfers improve throughput dramatically.
- The network card limit is reached for inter-node transfers (9 Gbps).

Prediction of Optimal Parallel Stream Number

- Throughput formulation: Newton's Iteration Model (the formula itself is not preserved in this text)
  - a, b and c are three unknowns to be solved, hence 3 throughput measurements at different parallelism levels (n) are needed.
- Sampling strategy: exponentially increasing parallelism levels
  - Choose points that are not close to each other.
  - Select points that are powers of 2: 1, 2, 4, 8, ..., 2^k
  - Stop when the throughput starts to decrease, or increases very slowly compared to the previous level.
- Selection of 3 data points from the available sampling points
  - For every 3-point combination, calculate the predicted throughput curve.
  - Find the distance between the actual and predicted throughput curves.
  - Choose the combination with the minimum distance.

Flow Model of End-to-end Throughput

- CPU nodes are considered as nodes of a maximum flow problem.
- Memory-to-memory transfers are simulated with dummy source and sink nodes.
- The capacities of the disk and network are found by applying the parallel stream model, taking the resource capacities (NIC & CPU) into consideration.

Flow Model of End-to-end Throughput (cont.)

- Convert the end-system and network capacities into a flow problem.
- Goal: provide the maximal possible data transfer throughput given real-time traffic (maximize Th) by choosing:
  - Number of streams per stripe (Nsi)
  - Number of stripes per node (Sx)
  - Number of nodes (Nn)

Flow Model of End-to-end Throughput (cont.)

Assumptions
- Parameters not given and found by the model:
  - Available network capacity (Unetwork)
  - Available disk system capacity (Udisk)
- Parameters given:
  - CPU capacity (UCPU): 100%, assuming the CPUs are idle at the beginning of the transfer
  - NIC capacity (UNIC)
  - Number of available nodes (Navail)

Variables
- Uij = total capacity of each arc from node i to node j
- Uf = maximal (optimal) capacity of each flow (stripe)
- Nopt = number of streams for Uf
- Xij = total amount of flow passing from i to j
- Xfk = amount of each flow (stripe)
- NSi = number of streams to be used for Xfkij
- Sxij = number of stripes passing from i to j
- Nn = number of nodes

Inequalities (the inequalities themselves are not preserved in this text)
- There is a high positive correlation between the throughput of parallel streams and CPU utilization.
- The relation between CPU utilization and throughput is modeled as linear. Its coefficients a and b are solved by least-squares regression on the sampled throughput and CPU utilization measurements.

OPTB Algorithm for Homogeneous Resources

- This algorithm finds the best parallelism values for maximal throughput on homogeneous resources.
- Input parameters:
  - A set of sampling values from the sampling algorithm (ThN)
  - Destination CPU and NIC capacities (UCPU, UNIC)
  - Available number of nodes (Navail)
- Output:
  - Number of streams per stripe (Nsi)
  - Number of stripes per node (Sx)
  - Number of nodes (Nn)
- Assumes both source and destination nodes are idle.

OPTB-Application Case Study

- Systems: Oliver, Eric
- Network: LONI (Local Area)
- Processor: 4 cores
- Network interface: 10GigE Ethernet
- Transfer: disk-to-disk (Lustre)
- Available number of nodes: 2
- Figure label: 9 Gbps

OPTB-Application Case Study (cont.)

- ThNsi = 903.41 Mbps at p = 1
- ThNsi = 954.84 Mbps at p = 2
- ThNsi = 990.91 Mbps at p = 4
- ThNsi = 953.43 Mbps at p = 8
- Result: Nopt = 3, Nsi = 2
- [Figure: flow graph with per-arc Nsi and Sxij labels; values not recoverable]

OPTB-Application Case Study (cont.)

- Sx = 2: ThSx1,2,2 = 1638.48
- Sx = 4: ThSx1,4,2 = 3527.23
- Sx = 8: ThSx2,4,2 = 4229.33
- [Figure: flow graph with per-arc Nsi and Sxij labels; values not recoverable]

OPTB-LONI-memory-to-memory-10G

[Figure not preserved]

OPTB-LONI-memory-to-memory-1G-Algorithm Overhead

[Figure not preserved]

Conclusions

- We have achieved end-to-end data transfer throughput optimization with data-flow parallelism:
  - Network-level parallelism: parallel streams
  - End-system parallelism: CPU/disk striping
- At both levels we have developed models that predict the best combination of stream and stripe numbers.

Future work

- We have focused on the TCP and GridFTP protocols, and we would like to adjust our models for other protocols.
- We have tested these models on a 10G network, and we plan to test them on a faster network.
- We would like to increase the heterogeneity among the nodes at the source or destination.

Acknowledgements

- This project is in part sponsored by the National Science Foundation under award numbers:
  - CNS-1131889 (CAREER): Research & Theory
  - OCI-0926701 (Stork): SW Design & Implementation
  - CCF-1115805 (CiC): Stork for Windows Azure
- We would also like to thank Dr. Miron Livny and the Condor Team for their continuous support of the Stork project.

http://www.storkproject.org
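The sampling strategy from the "Prediction of Optimal Parallel Stream Number" slide (probe powers of 2, stop when throughput drops or barely improves) can be sketched roughly as follows. This is only an illustration, not Stork's implementation: `measure` is a hypothetical callback standing in for a real sample transfer, and `max_level` and `min_gain` are assumed tuning knobs that the slides do not specify.

```python
from typing import Callable, List, Tuple


def sample_throughput(
    measure: Callable[[int], float],  # hypothetical probe: stream count -> throughput (Mbps)
    max_level: int = 64,              # assumed safety cap, not from the slides
    min_gain: float = 0.01,           # assumed meaning of "increases very slowly" (1%)
) -> List[Tuple[int, float]]:
    """Probe parallelism levels 1, 2, 4, ... until throughput stops improving."""
    samples: List[Tuple[int, float]] = []
    n = 1
    while n <= max_level:
        th = measure(n)
        samples.append((n, th))
        if len(samples) > 1:
            prev = samples[-2][1]
            # Stop on a decrease, or on a relative gain below min_gain.
            if th <= prev * (1.0 + min_gain):
                break
        n *= 2  # exponentially increasing parallelism level
    return samples


# Replay the four measurements listed on the case-study slide instead of
# running live transfers.
case_study = {1: 903.41, 2: 954.84, 4: 990.91, 8: 953.43}
samples = sample_throughput(lambda n: case_study[n])
print(samples)
```

The last, lower sample is kept in the result on purpose, so that a later curve fit can see where throughput starts to fall.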

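The 3-point selection step (fit a curve through every 3-point combination, keep the one closest to all samples) might look like the sketch below. Several things here are assumptions, because the slide's formula did not survive extraction: the model form Th(n) = n / sqrt(a·n^c + b) is assumed, not taken from this text. The exponent c is found by bisection rather than Newton's iteration, the distance is taken to be Euclidean, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Sample = Tuple[int, float]            # (parallel streams n, measured throughput)
Params = Tuple[float, float, float]   # (a, b, c)


def fit_three(p1: Sample, p2: Sample, p3: Sample) -> Optional[Params]:
    """Fit the ASSUMED model Th(n) = n / sqrt(a*n**c + b) through three samples, n1 < n2 < n3."""
    # Substituting y = (n / Th)**2 turns the model into y = a*n**c + b.
    (n1, y1), (n2, y2), (n3, y3) = [(n, (n / th) ** 2) for n, th in (p1, p2, p3)]
    if not y1 < y2 < y3:
        return None  # this sketch only handles a > 0
    target = (y3 - y1) / (y2 - y1)

    def ratio(c: float) -> float:
        # a and b cancel out of (y3 - y1) / (y2 - y1), leaving a function of c only.
        return (n3 ** c - n1 ** c) / (n2 ** c - n1 ** c)

    lo, hi = 0.01, 10.0  # assumed search bracket for c
    if not ratio(lo) <= target <= ratio(hi):
        return None
    for _ in range(100):  # bisection works because ratio(c) increases with c
        mid = (lo + hi) / 2.0
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2.0
    a = (y2 - y1) / (n2 ** c - n1 ** c)
    b = y1 - a * n1 ** c
    return a, b, c


def distance(samples: List[Sample], params: Params) -> float:
    """Euclidean distance between measured and predicted throughput (assumed metric)."""
    a, b, c = params
    total = 0.0
    for n, th in samples:
        inside = a * n ** c + b
        if inside <= 0:
            return math.inf  # model undefined at this sample
        total += (n / math.sqrt(inside) - th) ** 2
    return math.sqrt(total)


def best_fit(samples: List[Sample]) -> Optional[Params]:
    """Try every 3-point combination and keep the fit closest to all samples."""
    best: Optional[Tuple[float, Params]] = None
    for trio in combinations(sorted(samples), 3):
        params = fit_three(*trio)
        if params is None:
            continue
        d = distance(samples, params)
        if best is None or d < best[0]:
            best = (d, params)
    return None if best is None else best[1]


def optimal_streams(a: float, b: float, c: float) -> Optional[float]:
    """Peak of the assumed model: d/dn of n**2/(a*n**c + b) = 0 gives n**c = 2b/(a*(c-2))."""
    if a <= 0 or b <= 0 or c <= 2:
        return None  # the fitted curve has no interior peak
    return (2.0 * b / (a * (c - 2.0))) ** (1.0 / c)


case_study = [(1, 903.41), (2, 954.84), (4, 990.91), (8, 953.43)]
params = best_fit(case_study)
if params is not None:
    print("a, b, c =", params)
    print("estimated optimal streams:", optimal_streams(*params))
```

A real deployment would round the estimate to an integer stream count and would likely cap it by the NIC and CPU limits described in the flow model.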
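The flow-model slide says that the coefficients a and b of the linear relation between CPU utilization and throughput come from least-squares regression. A minimal sketch of that fit is shown below, assuming the form U_cpu = a + b·Th (the slide's own equation is not preserved, so the direction of the relation is a guess). The sample numbers are made up for illustration and are not measurements from the talk.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Illustrative (made-up) samples: throughput in Mbps, CPU utilization in percent.
throughput = [1000.0, 2000.0, 3000.0, 4000.0]
cpu_util = [15.0, 25.0, 35.0, 45.0]

a, b = fit_line(throughput, cpu_util)

# Assumed use of the fit: the throughput at which one CPU would saturate (100%).
cpu_bound_throughput = (100.0 - a) / b
print(a, b, cpu_bound_throughput)
```

Under this reading, the saturation throughput acts as a per-stripe CPU capacity, which is one way the model could decide when an extra stripe on another core or node is needed.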