Format: PPT; 35 pages; 850.25 KB
Toward Loosely Coupled Programming on Petascale Systems

Presenter: Sora Choe

Index
- Introduction
- Requirements
- Implementation
- Microbenchmark Performance
- Loosely Coupled Applications: DOCK and MARS
- Conclusions and Future Work

Introduction
- Emerging petascale computing systems incorporate high-speed, low-latency interconnects.
- They are designed to support tightly coupled parallel computations.
- Most applications running on them have an SPMD structure, implemented using MPI for interprocess communication.
- Goal: enable the use of petascale computing systems for task-parallel applications.

Problem Space: Many-Task Computing (MTC)
- Many tasks that can be individually scheduled, on many different computing resources across multiple administrative boundaries, to achieve some larger application goal.
- Emphasis on using much larger numbers of computing resources over short periods of time to accomplish many computational tasks.
- Primary metrics are per-second rates, e.g. FLOPS, tasks/sec, MB/sec I/O.

Hypothesis
- MTC applications can be executed efficiently on today's supercomputers.
- A set of problems must be overcome to make loosely coupled programming practical on emerging petascale architectures:
  - local resource manager scalability and granularity
  - efficient utilization of the raw hardware
  - shared file system contention
  - application scalability
- Platform: the IBM Blue Gene/P supercomputer (also known as Intrepid). Here, processors = cores = CPUs.

Why Petascale Systems for MTC Apps? (4 motivating factors)
- The I/O subsystems of petascale systems offer unique capabilities needed by MTC applications.
- The cost to manage and run on petascale systems like the BG/P is less than that of conventional clusters or Grids.
- Large-scale systems inevitably have utilization issues.
- Some applications are so demanding that only petascale systems have enough compute power to get results in a reasonable timeframe, or to leverage new opportunities.

Requirements
- For large-scale, loosely coupled applications to execute efficiently on petascale systems (traditionally HPC systems), three mechanisms are required:
  - multi-level scheduling
  - efficient task dispatch
  - extensive use of caching, to minimize load on shared infrastructure such as file systems and interconnects

Multi-Level Scheduling
- Essential because the local resource manager (Cobalt) on the BG/P works at the granularity of a pset: a group of 64 quad-core compute nodes and one I/O node.
- Compute resources are allocated from Cobalt at pset granularity, then made available to applications at single-processor-core granularity.
- This is made possible through Falkon and its resource provisioning mechanism.

Multi-Level Scheduling (cont.)
Scheduling and starting resources carries real overhead: compute nodes are powered off when not in use and must be booted when allocated to a job.

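Multi-level scheduling pays this boot cost once per allocation rather than once per task. A toy model (my own illustration; the boot time, task length, and task count are assumed numbers, not measurements from the talk) shows the effect:

```python
# Toy model of boot-cost amortization under multi-level scheduling.
# All numbers are illustrative assumptions, not figures from the slides.

def utilization(boot_sec: float, tasks: int, task_sec: float) -> float:
    """Fraction of allocated time spent on useful work when one boot
    is amortized over `tasks` tasks."""
    useful = tasks * task_sec
    return useful / (boot_sec + useful)

# One boot per task (single-level scheduling) vs. one boot per allocation
# (multi-level scheduling), for 1000 tasks of 60 s each and a 120 s boot:
per_task = utilization(120, 1, 60)      # every task pays the boot cost
per_alloc = utilization(120, 1000, 60)  # boot paid once for the whole batch

print(f"per-task boot: {per_task:.2%}")   # ~33% useful work
print(f"amortized:     {per_alloc:.2%}")  # ~99.8% useful work
```

Booting once per allocation turns a cost that dominates each short task into a fraction of a percent of the whole run, which is the sense in which the overhead becomes insignificant over many jobs.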
Since compute nodes have no local disks, the boot process reads the lightweight IBM compute node kernel (specifically, a Linux-based ZeptoOS kernel image) from a shared file system. Multi-level scheduling amortizes this cost over many jobs, reducing it to an insignificant overhead.

Efficient Task Dispatch
- A streamlined task submission framework; Falkon's specialization leads to higher performance.
- LRMs handle reservation, policy-based scheduling, accounting, etc.; client frameworks (workflow systems or distributed scripting systems) handle recovery, data staging, job dependency management, etc.
- Measured dispatch rates: 2534 tasks/sec on a Linux cluster, 3186 tasks/sec on the SiCortex, and 3071 tasks/sec on the BG/P, versus 0.5-22 jobs/sec on traditional LRMs such as Condor or PBS.

Extensive Use of Caching
- Compute nodes on the BG/P have a shared file system (GPFS) and a local file system implemented in RAM (ramdisk).
- For better application scalability, application data is cached extensively in the ramdisk local file system, minimizing use of the shared file system.
- A simple caching scheme is employed:
  - static data (application binaries, libraries, common input) is cached on all compute nodes;
  - dynamic data (input specific to a single task) is cached on one compute node.

Implementation
- Swift and Falkon: Swift enables scientific workflows through a data-flow-based functional parallel programming model; Falkon is a lightweight task execution dispatcher optimized for task throughput and efficiency.
- Extensions to get Falkon to work on the BG/P:
  - static resource provisioning
  - alternative implementations
  - distributed Falkon architecture
  - reliability issues at large scale

Static Resource Provisioning
- An application requests a number of processors for a fixed duration directly from the Cobalt LRM.
- Once the job enters the running state and the Falkon framework is bootstrapped, the application interacts directly with Falkon to submit single-processor tasks for the duration of the allocation.

Alternative Implementations
- Performance depends on the behavior of the task dispatch mechanisms.
- The initial Falkon implementation was 100% Java, using GT4 Java WS-Core to handle Web Services communication.
- The alternative reimplementation moves some functionality to C, due to the lack of Java on the BG/P:
  - the WS-based protocol is replaced with a simple TCP-based protocol, handled by TCPCore;
  - TCP sockets are kept persistent.

Distributed Falkon Architecture
(figure slide)

Reliability Issues at Large Scale
- Failure of a single compute node affects only the task executing on that node; I/O node failures affect only their respective psets.
- Most errors are reported to the client (Swift); Swift maintains persistent state that allows it to restart a parallel application script from the point of failure.
- Other errors are handled directly by Falkon by rescheduling the tasks.

Startup Cost
(results slide)

Falkon Task Dispatch Performance
(results slide)

Efficiency and Speedup (small scale)
(results slide)

Efficiency and Speedup (large scale)
(results slide)

Shared File System Performance (read and/or write via the "dd" utility)
(results slide)

Shared File System Performance (operation costs)
(results slide)

Molecular Dynamics: DOCK
- Screens KEGG compounds and drugs against important metabolic protein targets.
- A compound that interacts strongly with a receptor associated with a disease may inhibit the receptor's function and act as a beneficial drug.
- Simulates the "docking" of small molecules, or ligands, to the "active sites" of large macromolecules of known structure, called "receptors".
- Speeds drug development by rapidly screening for promising compounds and eliminating costly dead ends.

DOCK6 Performance Evaluation
(results slide)

DOCK5 Performance Evaluation
(results slide)

Economic Modeling: MARS
- MARS (Micro Analysis of Refinery System) is an economic modeling application for petroleum refining, developed by D. Hanson and J. Laitner at Argonne.
- It consists of about 16K lines of C code and can process many internal model execution iterations (0.5 sec to hours of BG/P CPU time).
- The goal of running MARS on the BG/P is to perform detailed multi-variable parameter studies of the behavior of all aspects of petroleum refining.

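A multi-variable parameter study of this kind is naturally expressed as the cross-product of parameter values, with one independent single-core task per combination. A minimal sketch of generating such a task list (the parameter names and the `mars` command line are hypothetical; the slides do not describe MARS's actual inputs):

```python
from itertools import product

# Hypothetical parameter grid for a MARS-style study; the actual
# MARS parameters are not given in the slides.
grid = {
    "crude_price": [40, 60, 80],
    "demand_growth": [0.01, 0.02],
    "refinery_capacity": [0.9, 1.0, 1.1],
}

def make_tasks(grid):
    """Expand a parameter grid into one command line per combination,
    suitable for submission to a task dispatcher such as Falkon."""
    names = sorted(grid)
    tasks = []
    for values in product(*(grid[n] for n in names)):
        args = " ".join(f"--{n}={v}" for n, v in zip(names, values))
        tasks.append(f"mars {args}")  # 'mars' binary name is assumed
    return tasks

tasks = make_tasks(grid)
print(len(tasks))  # 3 * 2 * 3 = 18 independent single-core tasks
```

Because every combination is an independent task with its own inputs and outputs, the whole sweep maps directly onto the MTC model: the dispatcher can scatter the 18 (or 18 million) command lines across however many cores the allocation provides.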
1M MARS Tasks on BG/P
(results slide)

Running Applications Through Swift
- Swift can be used to make workloads more dynamic and reliable, and to provide a natural flow from the results of one application to the input of the following stage in a workflow, making complex loosely coupled programming a reality.

Swift vs. Falkon on the MARS Application
(results slide)

Swift
- Overheads: managing the data; creating per-task working directories from the compute nodes; creating and tracking several status and log files for each task.
- Optimizations: place temporary directories in local ramdisk rather than on the shared file system; copy input data to the compute node's local ramdisk for each job execution; create per-job logs on local ramdisk and copy them to persistent shared storage only at the completion of each job.

Conclusions
- Characteristics of MTC applications suited to petascale systems:
  - the number of tasks far exceeds the number of CPUs;
  - average task execution time above O(60 sec), with minimal I/O, to achieve 90%+ efficiency;
  - roughly 1 second of compute per processor core per 5-50 KB of I/O to achieve 90%+ efficiency.

Conclusions (cont.)
- The main bottleneck is the shared file system, which is accessed throughout the system. Solutions:
  - startup cost is insignificant for large applications;
  - offload to in-memory operations, so repeated use can be handled entirely from memory;
  - read dynamic input data from, and write dynamic output data to, the shared file system in bulk.

Future Work
- Make better use of the specialized networks on some petascale systems, such as the BG/P's torus network.
- Exploit unique I/O subsystem capabilities, e.g. collective I/O operations using the specialized high-bandwidth, low-latency interconnects.
- Provide transparent data management solutions (data caching, proactive data replication, data-aware scheduling) to offload shared file system use when local file systems can handle the scale of the data.
- Add support for MPI-based applications in Falkon: the ability to run MPI applications on an arbitrary number of processors.

THE END. QUESTIONS?
