A Vector API for Java
Ian Graves

Legal Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or
go to: http:/

Intel, the Intel logo, Intel Xeon, and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: Learn About Intel Processor Numbers http:/

*Other names and brands may be claimed as the property of others. Copyright 2015 Intel Corporation. All rights reserved.

Legal Disclaimers Continued

Some results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.

SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjbb, SPECompG, SPEC MPI, and SPECjEnterprise* are trademarks of
the Standard Performance Evaluation Corporation. See http://www.spec.org for more information. TPC Benchmark, TPC-C, TPC-H, and TPC-E are trademarks of the Transaction Processing Council. See http://www.tpc.org for more information.

Intel Advanced Vector Extensions (Intel AVX)* are designed to achieve higher throughput to certain integer and floating point operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you should consult your system manufacturer for more information. Intel Advanced Vector Extensions refers to Intel AVX, Intel AVX2 or Intel AVX-512. For more information on Intel Turbo Boost Technology 2.0, visit http:/

In this Presentation
- Code Snippets
- Vector API Design
- Wrap Up

Notes:
- This is still a rough prototype! Subject to change!
- Part of the OpenJDK Project Panama
- Licensed under GPLv2 with Classpath Exception
- Get the code here! http:/

Introduction: Vector API Project Team
- Oracle: Vladimir Ivanov, John Rose, Paul Sandoz
- Intel: Michael Berg, Steve Dohrmann, Ian Graves, Shravya Rukmannagari, Sandhya Viswanathan

Terminology
- Code Snippets: encoding instructions as data in Java, and binding them to a MethodHandle.
- Vector API: an API encompassing operations with vector instruction support, implemented on top of Code Snippets.
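
The terminology above describes a Code Snippet as an instruction encoded in Java and bound to a MethodHandle, and the later Checked Invocation slide pairs each snippet with a pure-Java reference. The prototype's snippet machinery (MachineCodeSnippet.make, Long4, the x86 encoding helpers) is not reproduced here. As a minimal sketch under that assumption, an ordinary static Java method stands in for the emitted instruction and float[] stands in for Long4. Only the java.lang.invoke pieces (MethodType, MethodHandles.lookup().findStatic, invokeExact) are real API; CheckedInvocationSketch, fastAdd8, naiveAdd8 and add8 are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

// Sketch only: a plain Java method stands in for a machine-code snippet.
class CheckedInvocationSketch {
    // Hypothetical "fast path". In the prototype this would be emitted AVX
    // code bound via MachineCodeSnippet.make; here it is ordinary Java.
    static float[] fastAdd8(float[] a, float[] b) {
        float[] r = new float[8];
        for (int i = 0; i < 8; i += 2) {   // manually unrolled by two
            r[i] = a[i] + b[i];
            r[i + 1] = a[i + 1] + b[i + 1];
        }
        return r;
    }

    // Pure Java reference used to check the fast path (compare vaddps_naive).
    static float[] naiveAdd8(float[] a, float[] b) {
        float[] r = new float[8];
        for (int i = 0; i < 8; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    // The MethodType describing the primitive: (float[], float[]) -> float[]
    static final MethodType MT_BINARY =
            MethodType.methodType(float[].class, float[].class, float[].class);

    // Bind the primitive to a MethodHandle once, at class initialization.
    static final MethodHandle MH_ADD8;
    static {
        try {
            MH_ADD8 = MethodHandles.lookup()
                    .findStatic(CheckedInvocationSketch.class, "fastAdd8", MT_BINARY);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Type-safe invocation point: invokeExact requires the call-site types to
    // match MT_BINARY exactly, hence the (float[]) cast on the result.
    static float[] add8(float[] a, float[] b) {
        float[] res;
        try {
            res = (float[]) MH_ADD8.invokeExact(a, b);
        } catch (Throwable t) {
            throw new Error(t);
        }
        // Checked against the reference only when assertions are enabled (-ea).
        assert Arrays.equals(res, naiveAdd8(a, b)) : "fast path disagrees with reference";
        return res;
    }

    public static void main(String[] args) {
        float[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        float[] y = {8, 7, 6, 5, 4, 3, 2, 1};
        // prints [9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]
        System.out.println(Arrays.toString(add8(x, y)));
    }
}
```

Because invokeExact is signature-polymorphic, the (float[]) cast and the float[] argument types define the call-site type; if they did not match MT_BINARY exactly, the call would fail with WrongMethodTypeException. Keeping the reference comparison behind assert means normal runs pay nothing for it.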

Motivation
- Many popular applications benefit from data-parallel computations.
- Architectural support remains opaque to the JVM developer.
- We are looking to expose "pure Java" performant solutions that map well to the architecture:
  - No JNI interfacing; single-language solutions
  - Minimized boilerplate
  - Generated code is of good quality

Project Goals
- Expose data-parallel vector operations for developer use in Java
- Portability and performance
- Scalability
- Idiomatic

Code Snippets

CodeSnippets as a Substrate
- A portable API for expressing primitives
- More flexible than HotSpot intrinsics
- Less technical debt, with Graal on the horizon
- ISAs can use the same API
- In prototype phase, but good performance observed:
  - Value objects map to registers
  - MethodHandle invocation achieves good code quality

Implementing a Primitive
- Primitives bind to a MethodHandle:
  - Invoked via MethodHandle methods
  - The MethodHandles library has additional combinators
- Types of CodeSnippets are represented as MethodType objects
- Vectors are represented by Long2/4/8 objects:
  - Wrappers for 128-, 256-, and 512-bit values
  - Wrappers are elided in the best case; values are registerized
  - Escape analysis is a work in progress

Binding to Machine Instruction

    static final MethodType MT_L4_BINARY
            = MethodType.methodType(Long4.class, Long4.class, Long4.class);

    private static final MethodHandle MHm256_vaddps = MachineCodeSnippet.make(
            "mm256_vaddps", MT_L4_BINARY, requires(AVX),
            new Register[][]{xmmRegistersSSE, xmmRegistersSSE, xmmRegistersSSE},
            (Register[] regs) -> {
                Register out = regs[0];
                Register in1 = regs[1];
                Register in2 = regs[2];
                int vex = vex_prefix(rBit(out), X_LOW, bBit(in2), M_0F, W_LOW, in1, L_256, PP_NONE);
                return vex_emit(vex, 0x58, modRM(out, in2));
            });

Slide callouts: registers via JVMCI; desired register masks (xmmRegistersSSE); MethodHandle type (MT_L4_BINARY); feature-checking predicate (requires(AVX)); macro-ized x86 encoding (the lambda, where 0x58 is the ADDPS opcode).

Checked Invocation

    private static Long4
    vaddps_naive(Long4 a, Long4 b) {
        float[] res = new float[8];
        for (int i = 0; i < 8; i++) {
            res[i] = getFloat(a, i) + getFloat(b, i);
        }
        return long4FromFloatArray(res, 0);
    }

    public static Long4 vaddps(Long4 a, Long4 b) {
        try {
            Long4 res = (Long4) MHm256_vaddps.invokeExact(a, b);
            assert assertEquals(res, vaddps_naive(a, b));
            return res;
        } catch (Throwable e) {
            throw new Error(e);
        }
    }

Slide callouts: vaddps_naive is the pure Java equivalent function; the invokeExact call is the type-safe invocation point.

A Small Example

    public static float[] proc(float[] left, float[] right, float[] res) {
        if (left.length != right.length) {
            throw new UnsupportedOperationException("Arrays unequal.");
        } else if (left.length % 8
                != 0) {
            throw new UnsupportedOperationException("Length must be n*8");
        }
        for (int i = 0; i < left.length; i += 8) {
            addArrays(left, right, res, i);  // loop kernel
        }
        return res;
    }

proc is a convenience wrapper; addArrays is the loop kernel.

Small Example (contd)

    // Isolated for code quality purposes in prototype
    public static void addArrays(float[] left, float[] right, float[] res, int i) {
        // Scaled load: VMOVDQU ymmX, YMMWORD PTR [...]
        Long4 l = PatchableVecUtils.long4FromFloatArray(left, i);
        // vaddps reg, YMMWORD PTR [...]
        Long4 rr = PatchableVecUtils.vaddps(l, right, i);
        // Scaled store: VMOVDQU YMMWORD PTR [...], ymmX
        PatchableVecUtils.long4ToFloatArray(res, i, rr);
    }

Generating C2 Code

    java -XaddExports:java.base/jdk.internal.misc=ALL-UNNAMED \
         -XaddExports:java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
         -XX:+UnlockDiagnosticVMOptions -XX:-UseSuperWord -XX:LoopMaxUnroll=1 \
         -XX:PrintAssemblyOptions=intel \
         -XX:CompileCommand=option,*AddArraysLong4PS:addArrays,PrintAssembly \
         -cp build AddArraysLong4PS

Snippets! (Generated Code)

Performance of This Example
- Compared to a scalar implementation, with SuperWord and loop unrolling disabled.
- We see a 40% reduction in clock cycles spent in the loop kernel with the vectorized version.
- This workload is a prototype proof of concept; we need more advanced workloads that better leverage
  vectorization. Bigger, more intensive workloads are to come.
- Wall-clock time indicates overhead coming from outside the loop kernel compared with the scalar version: more work to do!

The Vector API

Java Needs an Abstraction for Vectors
- Vector ISA extensions are powerful, expressive, and deep:
  - Most instructions have many different forms and support differing operand sizes.
  - NxM problems abound for API writers.
- The abstraction needs to capture the essence of vectorization in the spirit of Java:
  - Platform independence (snippets are too low level)
  - Meaningful static checking
  - Familiar patterns to abstract operational complexity

Vector API
- Intended API to encompass the CodeSnippets implementation
- Proposed by John Rose*; work continues within Project Panama
- interface Vector<E, S>
  - S: the shape type, which describes the size of the vector
  - E: the element type of the vector
- Broadest support for Float, Integer, and Double
- Draft implementations checked into Project Panama

* http:/

... of the API
- Vector, FloatVector, FloatVector128, FloatVector256, FloatVectorXYZ
- Factory-constructed classes; factory methods here.

Basic Vector-Vector Functionality (Immutability!)

    interface Vector<E, S> {
        Vector<E, S> add(Vector<E, S> v2);
        Vector<E, S> mul(Vector<E, S> v2);
        Vector<E, S> and(Vector<E, S> v2);
    }

More Advanced

    interface Vector<E, S> {
        E getElement(int i);                      // scalar/vector interfacing
        Vector<E, S> putElement(int i, E elem);
        E sumAll();                               // horizontal reduction: multiple snippets
        E[] toArray();                            // loading and storing to arrays
        fromArray(E[] ary, int offset);
    }

Fully Realized Expressiveness

    interface Vector<E, S> {
        Vector<E, S> map(UnaryOperator<E> op);
        Vector<E, S> mapWhere(Mask mask, UnaryOperator<E> op);
        Vector<E, S> map(BinaryOperator<E> op, Vector<E, S> v2);
        Vector<E, S> mapWhere(Mask mask, BinaryOperator<E> op, Vector<E, S> this2);
    }

Kernel with Vector API

    public static void addArrays(float[] left, float[] right, float[] res, int i) {
        FloatVector l = float256FromArray(left, i),
                    r = float256FromArray(right, i),
                    lr = l.add(r);
        lr.intoArray(res, i);
    }

Higher Order Components
- A highly desirable, modern part of this API:
  - The programmer specifies a loop body, with minimal thought given to vectorization,
  - using regular arithmetic and logical syntactic operators.
  - This requires a way to "crack" or inspect lambdas at runtime.
- Ways forward:
  - We need better control of our higher order components.
  - Factories for constructing primitive arithmetic operations
  - These need to be composable.

Kernel Construction
- We can construct our "higher order" operations from existing parts.
- We can constrain our support to operations that are vectorizable:
  - Arity-one or arity-two (maybe three) operations
  - Restricted to arithmetic and logical operations that are broadly supported
- Our existing work on CodeSnippets can form the base!
- MethodHandles are highly composable, even with snippets:

    f = (x, y) -> (x + y) * y;

    MethodType mt = MethodType.methodType(Long4.class, Long4.class, Long4.class);
    MethodHandle MHm256_vaddps = CodeSnippet.make(..., mt, ...),
                 MHm256_vmulps = CodeSnippet.make(..., mt, ...);
    // f_pre(a, b, c) = vmulps(vaddps(a, b), c)
    MethodHandle f_pre = MethodHandles.collectArguments(MHm256_vmulps, 0, MHm256_vaddps);
    // f(x, y) = f_pre(x, y, y) = (x + y) * y
    MethodHandle f = MethodHandles.permuteArguments(f_pre, mt, 0, 1, 1);

Statically Typed Wrappers
- A layer over MethodHandles that encapsulates the lower-level details and makes them type safe will coincide with the existing API spec.
- One proposed method is VectorOp, proposed on Project Panama*: vector operations are explicit and exposed to the user to compose and use as kernels.
- Another approach is to use a lightweight syntax tree:
  - Hand it off to a Vector object for interpretation/conversion to an equivalent MethodHandle structure for execution.
  - Vector objects visit the tree to compose the corresponding MethodHandles.
  - The same syntax trees could be handed off to different Vector types.
- Still very much in the works!

* http:/

Thoughts
- Most Vector operations are simple expressions.
- Expressions are (basically) trees.
- MethodHandles can be combined together in a tree-like fashion: permuteArguments(), collectArguments(), filterArguments(), filterReturn().
- Method Handles have added benefits (high level
  models matter!).
- We've already observed good code with Method Handles, so let's try it!
- Coding this way can elide the need to box Long2/4/8.

Expressions Bind to Method Handles
Figure: the lambda (x, y) -> (x + y) * y as an expression tree (* at the root, with children + and y; + has children x and y), walked by an AST visitor.

There's more!
Figure: the same tree handed to different visitors: 256_visitor, 128_visitor, XYZ_visitor.

Baby's First EDSL

    interface Expression {
        default Expression add(Expression right) { return new AddExpression(this, right); }
        default Expression mul(Expression right) { return new MulExpression(this, right); }
        default Expression not() { return new NotExpression(this); }
        default Expression trace(Consumer f) { return new TraceExpression(this, f); }
        default Expression fromFloat(Float f) { return new ConstExpression(f); }
        R evaluate(ExpressionEvaluator e);
    }

Careful!

    BinaryOperator<Expression> expr = (l, r) -> {
        Expression e1 = l.add(r);
        return e1.mul(r);
    };
    // To populate leaf nodes. Symbol is non-public.
    expr.apply(Symbol.LEFT, Symbol.RIGHT);

    MethodHandle binaryReduction(float[] left, float[] right, float[] dst, BinaryOperator<Expression> op);

    MethodHandle br = binaryReduction(left, right, dst, (l, r) -> {
        Expression e1 = l.add(r);
        return e1.mul(r);
    });
    // Execute the entire computation
    br.invokeExact();
    // Making it hot for inspection
    for (int i = 0; i < BIGNUMBER; i++) {
        br.invokeExact();
    }
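
The last slides describe applying a lambda to non-public leaf symbols to build an expression tree, then letting a visitor turn that tree into composed MethodHandles that binaryReduction runs over whole arrays. The sketch below is one possible reading of that pipeline, not the prototype's code: it works on scalar float instead of Long4 snippets, and Expr, Sym, Const, Add, Mul, combine, compile, loop and this binaryReduction are hypothetical. It assumes Java 17 or later (records and instanceof patterns); the MethodHandles calls it uses (identity, constant, dropArguments, collectArguments, permuteArguments, insertArguments) are standard java.lang.invoke API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.function.BinaryOperator;

// Sketch (Java 17+): a tiny expression tree compiled to one MethodHandle.
class ExprToMethodHandle {
    interface Expr {
        default Expr add(Expr right) { return new Add(this, right); }
        default Expr mul(Expr right) { return new Mul(this, right); }
    }
    enum Sym implements Expr { LEFT, RIGHT }       // leaf symbols
    record Const(float value) implements Expr {}
    record Add(Expr l, Expr r) implements Expr {}
    record Mul(Expr l, Expr r) implements Expr {}

    static float add(float a, float b) { return a + b; }
    static float mul(float a, float b) { return a * b; }

    // Every compiled node has this type: (float left, float right) -> float
    static final MethodType NODE = MethodType.methodType(float.class, float.class, float.class);
    static final MethodHandle ADD, MUL, LOOP;
    static {
        try {
            MethodHandles.Lookup lk = MethodHandles.lookup();
            ADD = lk.findStatic(ExprToMethodHandle.class, "add", NODE);
            MUL = lk.findStatic(ExprToMethodHandle.class, "mul", NODE);
            LOOP = lk.findStatic(ExprToMethodHandle.class, "loop", MethodType.methodType(
                    void.class, MethodHandle.class, float[].class, float[].class, float[].class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Build (x, y) -> op(h1(x, y), h2(x, y)) from three (float, float) -> float handles.
    static MethodHandle combine(MethodHandle op, MethodHandle h1, MethodHandle h2) {
        MethodHandle t = MethodHandles.collectArguments(op, 0, h1); // (a, b, c) -> op(h1(a, b), c)
        t = MethodHandles.collectArguments(t, 2, h2);               // (a, b, c, d) -> op(h1(a, b), h2(c, d))
        return MethodHandles.permuteArguments(t, NODE, 0, 1, 0, 1); // (x, y) -> op(h1(x, y), h2(x, y))
    }

    // Post-order visit: children are compiled first, then combined.
    static MethodHandle compile(Expr e) {
        MethodHandle id = MethodHandles.identity(float.class);
        if (e == Sym.LEFT) return MethodHandles.dropArguments(id, 1, float.class);
        if (e == Sym.RIGHT) return MethodHandles.dropArguments(id, 0, float.class);
        if (e instanceof Const c) {
            return MethodHandles.dropArguments(
                    MethodHandles.constant(float.class, c.value()), 0, float.class, float.class);
        }
        if (e instanceof Add a) return combine(ADD, compile(a.l()), compile(a.r()));
        if (e instanceof Mul m) return combine(MUL, compile(m.l()), compile(m.r()));
        throw new IllegalArgumentException("unsupported node: " + e);
    }

    // Element-wise driver: dst[i] = f(left[i], right[i]).
    static void loop(MethodHandle f, float[] left, float[] right, float[] dst) throws Throwable {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = (float) f.invokeExact(left[i], right[i]);
        }
    }

    // Hypothetical analogue of binaryReduction: apply the lambda to the leaf
    // symbols to get a tree, compile it, then bind the arrays: () -> void.
    static MethodHandle binaryReduction(float[] left, float[] right, float[] dst,
                                        BinaryOperator<Expr> op) {
        MethodHandle f = compile(op.apply(Sym.LEFT, Sym.RIGHT));
        return MethodHandles.insertArguments(LOOP, 0, f, left, right, dst);
    }

    public static void main(String[] args) throws Throwable {
        float[] l = {1f, 2f, 3f}, r = {4f, 5f, 6f}, d = new float[3];
        MethodHandle br = binaryReduction(l, r, d, (x, y) -> x.add(y).mul(y));
        br.invokeExact();                         // execute the entire computation
        System.out.println(Arrays.toString(d));   // prints [20.0, 35.0, 54.0]
    }
}
```

The combine step is the same trick as the Kernel Construction slide's (x + y) * y example: collectArguments nests one handle inside another's argument list, and permuteArguments duplicates the outer (x, y) arguments so that every subtree sees both inputs.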

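The Vector interface slides list immutable vector-vector operations (add, mul), scalar/vector interfacing (getElement, putElement), a horizontal reduction (sumAll), array loads and stores, and masked map/mapWhere. To make those semantics concrete, here is a scalar model that backs each vector with a float[]. It performs no vectorization and is not the Panama implementation; FloatVec, the explicit length argument to fromArray, and the boolean[] used in place of the slides' Mask type are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// Scalar model (not SIMD) of an immutable, fixed-length float vector.
// Every operation returns a new instance; the receiver is never modified.
final class FloatVec {
    private final float[] lanes;

    private FloatVec(float[] lanes) { this.lanes = lanes; }

    // Hypothetical factory: load `length` lanes starting at `offset`.
    static FloatVec fromArray(float[] a, int offset, int length) {
        return new FloatVec(Arrays.copyOfRange(a, offset, offset + length));
    }

    // Store all lanes into `a` starting at `offset`.
    void intoArray(float[] a, int offset) {
        System.arraycopy(lanes, 0, a, offset, lanes.length);
    }

    float getElement(int i) { return lanes[i]; }

    FloatVec putElement(int i, float v) {
        float[] copy = lanes.clone();
        copy[i] = v;
        return new FloatVec(copy);
    }

    // Lane-wise binary map (boxing is acceptable in a model like this).
    FloatVec map(BinaryOperator<Float> op, FloatVec v2) {
        float[] out = new float[lanes.length];
        for (int i = 0; i < out.length; i++) out[i] = op.apply(lanes[i], v2.lanes[i]);
        return new FloatVec(out);
    }

    // Apply `op` only where mask[i] is true; other lanes are copied unchanged.
    FloatVec mapWhere(boolean[] mask, UnaryOperator<Float> op) {
        float[] out = lanes.clone();
        for (int i = 0; i < out.length; i++) if (mask[i]) out[i] = op.apply(lanes[i]);
        return new FloatVec(out);
    }

    FloatVec add(FloatVec v2) { return map(Float::sum, v2); }
    FloatVec mul(FloatVec v2) { return map((a, b) -> a * b, v2); }

    // Horizontal reduction: sum across all lanes.
    float sumAll() {
        float s = 0f;
        for (float x : lanes) s += x;
        return s;
    }

    public static void main(String[] args) {
        float[] left = {1, 2, 3, 4, 5, 6, 7, 8};
        float[] right = {10, 10, 10, 10, 10, 10, 10, 10};
        float[] res = new float[8];
        FloatVec l = fromArray(left, 0, 8), r = fromArray(right, 0, 8);
        FloatVec lr = l.add(r);                 // new vector; l and r are unchanged
        lr.intoArray(res, 0);
        System.out.println(Arrays.toString(res)); // [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
        boolean[] even = {true, false, true, false, true, false, true, false};
        System.out.println(lr.mapWhere(even, x -> -x).sumAll()); // 4.0
        System.out.println(l.getElement(0));    // 1.0
    }
}
```

The main method follows the Kernel with Vector API slide (load, add, store), then shows that mapWhere leaves unmasked lanes untouched and that l still holds its original value after add, which is the point of the Immutability! callout.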