CA 714CA Midterm Review

C5 Cache Optimization
Reduce miss penalty: hardware and software
Reduce miss rate: hardware and software
Reduce hit time: hardware
Complete list of techniques in Figure 5.26 on page 499

C5 AMAT
Average memory access time = hit time + miss rate * miss penalty
+ Useful for focusing on memory performance
Does not measure overall system performance
Leaves CPI_base out of the calculation

C5 CPI
The CPI calculation is used for overall system performance comparisons, such as the speedup of computer A over computer B
CPI = CPI_base + Penalty
CPI_base is the CPI without the special-case penalty
Penalty is the penalty cycles per instruction

C5 CPI Example
CPI calculation example: CPI for a two-level cache. Assume a unified L2 and separate L1 instruction and data caches.
CPI = CPI_base + Penalty
CPI_base depends on the program and processor
Penalty = L1 miss penalty + L2 miss penalty
L1 miss penalty = data L1 miss penalty + instruction L1 miss penalty
Data L1 miss penalty = data L1 accesses per instr * data L1 miss rate * data L1 miss penalty
Instruction L1 miss penalty = instruction L1 accesses per instr * instruction L1 miss rate * instruction L1 miss penalty
L2 miss penalty = L2 accesses per instr * L2 miss rate * L2 miss penalty
L2 accesses per instr = instruction L1 accesses per instr * instruction L1 miss rate + data L1 accesses per instr * data L1 miss rate
On the left-hand side each penalty is in cycles per instruction; on the right-hand side each miss penalty is in cycles per miss.

C5 Virtual Memory
Easier to program in virtual memory
Additional hardware is needed to translate between virtual and physical addresses
The OS is usually used to translate between the two
The TLB is used to cache the translation result for faster translation
[Figure: CPU (virtual address) -> $ (virtual/physical address) -> Mem (physical address); the TLB maps a virtual address to a physical address]

C5 VM
CPI calculation for memory with VM
CPI = CPI_base + Penalty
Penalty = TLB miss penalty cycles per instruction
= TLB misses per instruction * penalty cycles per TLB miss
= TLB accesses per instruction * TLB miss rate * penalty cycles per TLB miss
TLB accesses per instruction covers both data and instruction accesses.

C7 Disk
Average disk access time = average seek time + average rotational delay + transfer time + controller overhead
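As a rough sketch of how the disk access time sum is evaluated, the Python below takes the rotational delay as half a revolution and the transfer speed as one track per revolution. The function names, parameter names, and drive figures are all hypothetical and chosen only for illustration; they do not come from the course material.

```python
def transfer_rate_bytes_per_s(rpm, track_bytes):
    """Transfer speed = rotational speed * size of a track (bits read continuously)."""
    revolutions_per_s = rpm / 60.0
    return revolutions_per_s * track_bytes


def avg_disk_access_ms(seek_ms, rpm, request_bytes, track_bytes, controller_ms):
    """Average access time = seek + rotational delay + transfer + controller overhead."""
    # Average rotational delay: time for half a revolution, in milliseconds.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Transfer time: size of access / transfer speed, converted to milliseconds.
    transfer_ms = request_bytes / transfer_rate_bytes_per_s(rpm, track_bytes) * 1000.0
    return seek_ms + rotational_ms + transfer_ms + controller_ms


if __name__ == "__main__":
    # Hypothetical drive: 5 ms seek, 10,000 RPM, 240,000-byte tracks,
    # 40,000-byte request, 0.1 ms controller overhead.
    total = avg_disk_access_ms(
        seek_ms=5.0,
        rpm=10_000,
        request_bytes=40_000,
        track_bytes=240_000,
        controller_ms=0.1,
    )
    print(f"average access time: {total:.2f} ms")
```

With these made-up numbers the seek and rotational terms (5 ms and 3 ms) dominate the 1 ms transfer term, which is the usual pattern for small requests.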
Average rotational delay = time for the platter to rotate half a revolution
Transfer time = size of access / transfer speed
Transfer speed = rotational speed * size of a track
This assumes the bits are read off continuously as the disk head passes over them.

C7 RAID
Use small disks to build a large storage system
Smaller disks are mass-produced and so cheaper
A large number of disks results in a high failure rate
Use redundancy to lower the failure rate
RAID 1: mirroring
RAID 3: bit-interleaved parity
RAID 4/5: block-interleaved parity (distributed across the disks in RAID 5)

C7 RAID
Mean time to data loss
MTTDL = MTTF_disk^2 / (N * (G - 1) * MTTR_disk)
N = total number of disks in the system
G = number of disks in the bit-protected group
MTTR = mean time to repair = mean time to detection + mean time to replacement
1/MTTF_system = sum over all components of 1/MTTF_component

C7 RAID
MTTDL example
RAID 1 system with 10 GB total capacity. Individual disks are 1 GB each with MTTF_disk = 1,000,000 hr. Assume an MTTR of 1 hr.
Solution
G = 2, since each disk is mirrored
N = 20: 10 disks for 10 GB of capacity plus 10 for mirroring
MTTDL = MTTF_disk^2 / (N * (G - 1) * MTTR_disk)
= 1,000,000^2 / (20 * (2 - 1) * 1)
= 5 * 10^10 hr

C7 Queuing Theory
Used to analyze system resource requirements and performance
+ More accurate than a simple extreme-case study
+ Less complicated than a full simulation study
Results are based on modeling the arrival rate with an exponential distribution

C8
Classes of connections, ordered by decreasing distance, bandwidth, and latency
WAN/Internet
LAN
SAN
Bus

C8
Total transfer time = sender overhead + time of flight + message size / bandwidth + receiver overhead
Latency = sender overhead + time of flight + receiver overhead
Hard to improve latency, easy to improve bandwidth
Effective bandwidth = message size / total transfer time
Total transfer time is dominated by latency as bandwidth increases
A bigger message size is needed to ameliorate the latency overhead

C6 Multiprocessor Limit
Amdahl's Law
Speedup = 1 / (fraction enhanced / speedup enhanced + fraction not enhanced)
Fraction enhanced is limited by the sequential portion of the program
Communication overhead may limit the speedup of the parallel portion

C6 Scaling calculation for app
Matrix multiplication. Take the special case of squaring, A * A.
A is a square matrix with n elements. p is the number of processors.
Computation scaling:
A is an n^0.5 * n^0.5 matrix, so the calculation complexity is n^1.5
Dividing between p processors gives n^1.5 / p
Communication scaling:
Assume matrix A is tiled into squares with side dimension n^0.5 / p^0.5
The elements needed are a row and a column; this gives 2 * (n^0.5 / p^0.5)(n^0.5) - (n^0.5 / p^0.5), which scales as n / p^0.5
Computation-to-communication scaling:
(n^1.5 / p) / (n / p^0.5) = n^0.5 / p^0.5

C6 Type of multiprocessor
SISD: single instruction, single data
Example: 5-stage pipelined RISC processor
SIMD: single instruction, multiple data
Example: vector processor
MISD: multiple instruction, single data
MIMD: multiple instruction, multiple data
Examples: superscalar, clustered machines, VLIW

C6 Dealing with parallelism
Shared memory: communication between processors is implicit
Message passing: explicit communication
They are equivalent in terms of functionality; one can be built from the other
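To make the Amdahl's Law limit from the C6 Multiprocessor Limit slide concrete, here is a minimal Python sketch. It reads the formula as 1 / (fraction enhanced / speedup enhanced + fraction not enhanced) and assumes the parallel portion speeds up by exactly the processor count, ignoring the communication overhead that slide mentions. The function name, parameter names, and sample numbers are hypothetical.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the run time is enhanced.

    fraction_enhanced: share of the original run time that benefits (0..1).
    speedup_enhanced:  speedup of that share, e.g. the processor count
                       under the idealized no-overhead assumption.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    fraction_not_enhanced = 1.0 - fraction_enhanced
    return 1.0 / (fraction_enhanced / speedup_enhanced + fraction_not_enhanced)


if __name__ == "__main__":
    # Hypothetical program: 90% parallelizable, so 10% stays sequential.
    for processors in (2, 10, 100, 1000):
        print(processors, round(amdahl_speedup(0.9, processors), 2))
```

However many processors are added, the speedup in this example stays below 1 / 0.1 = 10, because the sequential 10% of the run time is never enhanced.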