ImageVerifierCode 换一换
格式:PPT , 页数:130 ,大小:790KB ,
资源ID:373399      下载积分:2000 积分
快捷下载
登录下载
邮箱/手机:
温馨提示:
如需开发票,请勿充值!快捷下载时,用户名和密码都是您填写的邮箱或者手机号,方便查询和重复下载(系统自动生成)。
如填写123,账号就是123,密码也是123。
特别说明:
请自助下载,系统不会自动发送文件的哦; 如果您已付费,想二次下载,请登录后访问:我的下载记录
支付方式: 支付宝扫码支付 微信扫码支付   
注意:如需开发票,请勿充值!
验证码:   换一换

加入VIP,免费下载
 

温馨提示:由于个人手机设置不同,如果发现不能下载,请复制以下地址【http://www.mydoc123.com/d-373399.html】到电脑端继续下载(重复下载不扣费)。

已注册用户请登录:
账号:
密码:
验证码:   换一换
  忘记密码?
三方登录: 微信登录  

下载须知

1: 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。
2: 试题试卷类文档,如果标题没有明确说明有答案则都视为没有答案,请知晓。
3: 文件的所有权益归上传用户所有。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 本站仅提供交流平台,并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

版权提示 | 免责声明

本文(Threading Full Auto Physics.ppt)为本站会员(visitstep340)主动上传,麦多课文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。 若此文所含内容侵犯了您的版权或隐私,请立即通知麦多课文库(发送邮件至master@mydoc123.com或直接QQ联系客服),我们立即给予删除!

Threading Full Auto Physics.ppt

1、Threading Full Auto Physics,David Wu Director of Technology Pseudo Interactive,Agenda,About Full Auto About Pseudo Review of Threading Techniques Full Auto Threading Strategy Full Auto Threading Reality What went right, what went wrong Random Notes,Full Auto,Released on Feb 14 2006 for the Xbox 360,

2、Full Auto,Design Vision: “The Most destructive racing game ever” My Vision: “Physics running on multiple cores” Which I later updated to: “Physics running on multiple cores that has few race conditions and doesnt crash too often” Marketing preferred the Design Vision.,About Pseudo Interactive,Pseudo

3、 Interactive is a team of great game developers.,Types of Parallelism,Pipelined Stages Discrete stages that feed into one another I.e. Physics - Render,Pipeline,Similar to CPU/GPU parallelism,T=4,T=3,CPU Game Frame 3,GPU Render Frame 2,CPU Game Frame 4,GPU Render Frame 3,T=5,CPU Game Frame 5,GPU Ren

4、der Frame 4,T=4,T=3,Pipeline,Thread 0 collision detection Frame 3,Thread 1 Logic/AI Frame 2,Thread 2 Integration Frame 1,Thread 0 collision detection Frame 4,Thread 1 Logic/AI Frame 3,Thread 2 Integration Frame 2,T=5,Thread 0 collision detection Frame 5,Thread 1 Logic/AI Frame 4,Thread 2 Integration

5、 Frame 3,Types of Parallelism,Pipelined Stages Discrete stages that feed into one another I.e. Physics - Render Forking Divide data and solve on multiple threads at once I.e. Collision detection on independent objects,Forking Data Parallelism,Perform the same task on multiple independent objects in

6、parallel. Thread “forks” into multiple threads across multiple processors All threads repeatedly grab pending objects indiscriminately and execute the task on them When finished, threads combine back into the original thread.,Forking,Object A Thread 2,Object B Thread 0,Object C Thread 1,Fork,combine

7、,5-Mar-05,Types of Parallelism,Pipelined Stages Discrete stages that feed into one another I.e. Physics - Render Forking Divide data and solve on multiple threads at once I.e. Collision detection on independent objects Jobs Spin off tasks to be complete asynchronously Lower priority tasks not in the

8、 critical path I.e. Effects, procedural mesh and texture updates,Jobs,Free Threads,High Priority Thread allocation,Main Game Loop Physics, Ai, Rendering Etc.,Job Task,Job QueueJob Job Job,Synchronous Deterministic Critical Path Latency critical,Asynchronous Non-deterministic Low Priority Latency tol

9、erant,Full Auto X360 Thread Allocation,Rendering,Main Thread Physics Game Logic Animation AI Network etc,Audio,Worker threads,Thread 0 | Thread 1 _Thread 2 | Thread 3 _Thread 4 | Thread 5,Voice Encode/decode,Full Auto Pipeline,Collision Detection,Collect Net data and player input,Game Logic, Animati

10、on, AI,Integration and Constraint Solving,Kick render, audio, Network data,30% CPU time,40% CPU time,30% CPU time,Full Auto Pipeline,Collision Detection,Collect Net data and player input,Game Logic, Animation, AI,Integration and Constraint Solving,Kick render, audio, Network data,30% CPU time,40% CP

11、U time,30% CPU time,Collision Detection,All objects stored in a global kdop tree. Moving objects can collide with moving objects and idle objects idle objects cannot collide with idle objects. Main thread forks into 5 threads (soliciting the worker threads),Collision Detection,Broad Phase,Moving Obj

12、ects,Constraint Manager,Narrow Phase,Callbacks,Register Contacts,KDOP Tree All Objects Read Only,Logic Code,Thread,Thread,Thread,Thread,Pipeline?,Broad Phase (BVH),Narrow Phase (GJK),Callbacks,Register Contacts,This would not improve latency and would require double buffering of data,Forking,Collisi

13、on Detection Tasks,Each moving object is a task Threads pop objects off of the todo list one by one using interlocked access until they are all processed. To avoid duplicate tests, moving objects are only tested vs other moving objects whose memory address is lower,Collision Detection,Object N Broad

14、 Phase,Moving Objects,Constraint Manager,Object N Narrow Phase,Object N Callbacks,Object N Register Contacts,KDOP Tree All Objects Read Only,Logic Code,Race Conditions,There are multiple threads registering contacts simultaneously. To avoid race conditions we need atomic access There are many terms

15、and concepts that will suit our needs: Mutex Critical Section Semaphore Spin Lock,Mutex Rant,Most researchers assert and often prove that locks/mutexes suck I am convinced that this is true But everything else sucks more,Mutex 101,We use: atomic primitives whenever possible. (InterlockedIncrement(),

16、cellAtomicIncrement(), etc) a Mutex variant for everything else. I.e. constraintManagerMutex.Acquire() Register contacts constraintManagerMutex.Release();,Spin lock Mutex.,Acquire() looks like:while(_InterlockedCompareExchange(With slight variations you can have a recursive mutex or a read/write mut

17、ex. System mutexs will set up events and sleep. Useful if you have more software threads than hardware threads,Collision Detection,Object N Broad Phase,Moving Objects,Constraint Manager,Object N Narrow Phase,Object N Callbacks,Object N Register Contacts,KDOP Tree All Objects Read Only,Logic Code,Mut

18、ex,Mutex,Constraint Manager,The constraint manager keeps track of all interacting objects. The connected sub graphs form simulation batches of coupled objects Objects that must be solved together. More on this later,Collision Detection Results,There were enough objects to distribute tasks evenly wit

19、h good load balancing. In the median case with 5 threads across 3 cores we had a 3-4x performance increase as compared to 1 thread on 1 core. The worst case was about 2x, this was due to rare situations in which there were a lot of call-backs that held locks for too long. We wanted to improve this,C

20、ollision Detection Callbacks,We have many logic call-backs that trigger in response to contacts. Due to the quantity of logic code that was not thread safe, we added a lock around all call-backs that were suspicious. This way they could think that that are single threaded. They know that no other th

21、read agnostic logic is running while they are running. Call-backs are not allowed to access global collision data, they use information that is passed to them regarding the specific contact.,Collision Detection,Object N Broad Phase,Moving Objects,Constraint Manager,Object N Narrow Phase,Object N Cal

22、lbacks,Object N Register Contacts,KDOP Tree All Objects Read Only,Less thread friendly logic code,Mutex,Mutex,Thread friendly logic code,Collision Detection Callbacks,I considered adding extra software threads to help reduce the impact of waiting on locks. I.e.: Each thread runs as usual, but when i

23、t needs a lock that is taken, it pushes its current context and starts from the beginning with the next available object. When the lock is available, it goes back to its first object and finishes that before continuing with the second object.,Collision Detection Callbacks,It is generally important t

24、hat you test all hypothesis. Bonus points if you talk about the results at GDC to save other game programmers time. Even if you dont want points you might get a free pass to GDC if your lecture is accepted. That being said, it felt messy and I didnt want a lot of software threads hanging around, so

25、I didnt try it.,Collision Detection Callbacks,In the future we plan to make all call-backs deferred, with no assumptions as to ordering. This will be needed for PS3, where collision detection will not occur on a general purpose CPU. I anticipate a lot of debugging and pain.,PPU,PPU,SPU,Collision Det

26、ection PS3,SPU,Object N Broad Phase,Moving Objects,Object N Narrow Phase,Callback Queue,Contact Queue,KDOP Tree All Objects Read Only,Logic Code,Constraint Manager,Collision Detection Implementation,Surprisingly the implementation was not too difficult and there were few tortuous moments. For brief

27、period, I thought that threading was easy money. I was wrong.,Full Auto Main Thread,Collision Detection,Collect Net data and player input,Game Logic, Animation, AI,Integration and Constraint Solving,Kick render, audio, Network data,30% CPU time,40% CPU time,30% CPU time,Other Threads,Other Threads,O

28、ther Threads,Other Threads,Thread 0,Game Logic First Plan,Game Logic Animation AI,Geo Queries Hull Tracing Line of sight Sensors,Kinematics,Game Logic is single threaded. Engine operations fork across cpus to speed up expensive operations,Game Logic First Plan,Rationale: Threading is hidden in engin

29、e code Game logic code does not need to know about it. Game logic code is more volatile (more changes) and there is more of it, so thread safety is more costly to implement. We were hoping that most of the time was spent in Engine functions that could be paralyzed well.,Game Logic First Plan Results

30、,Not much of a change. In Full Auto, the majority of time was spent outside of engine code. This is likely genre specific, i.e. in some games a lot more time is spent in line of sight checks .,All Threads,All Threads,All Threads,All Threads,Thread 0,Game Logic Second Plan,Less Thread Friendly Game L

31、ogic,Geo Queries Hull Tracing Line of sight Sensors,Kinematics,Thread Friendly Game Logic 1 functional per thread,Main thread forks Each thread friendly game logic “functional” executes in any thread. Once these are done, the threads merge and thread 0 executes the less thread friendly game logic in

32、 a single threaded manner.,Game Logic Second Plan,Initially all logic was “less thread friendly” We made functionals thread friendly one by one based on: How much CPU time they used How much programmer time it would take to make them thread friendly,Game Logic Second Plan,The functionals that we con

33、verted include: Car control system Animation All actuators, constraints, potentials (ropes, springs, etc) Particle Emitters Sound Emitters Sensors Some AI,Game Logic Second Plan,Some that we wanted to convert, but that didnt make it: Some AI Race Logic Weapons As expected, these became performance h

34、otspots Amdahls Law Sucks,Full Auto Main Thread,Collision Detection,Collect Net data and player input,Game Logic, Animation, AI,Integration and Constraint Solving,Kick render, audio, Network data,30% CPU time,40% CPU time,30% CPU time,Integration,Integration Batches,Each batch contains one or more o

35、bjects who are physically interacting,Integrate and solve constraints,Update BVH, dependant kinematics, VSD,Write state to record buffer,Select batch with no outstanding dependencies,All Threads,All Threads,All Threads,Integration Forked,Integration Batches,Integrate and solve constraints,Update BVH

36、, dependant kinematics, VSD,Write state to record buffer,Select batch with no outstanding dependencies,Record Buffer,atomicInc,Mutex,BVH,VSD,Mutex,Mutex,Integration Results,3-4x speedup in all tests using 5 threads across 3 cores.,Damage,Damage Processing took a lot of time. We spun off all non crit

37、ical-path processing into jobs. I.e. Modifying meshes to reflect FEM damage Adding decals Building meshes for chunks from their source geo.,Damage,Jobs are executed only when threads are not busy doing other things In high stress situations, it takes several frames for damage effects to display.,Fre

38、e Threads,High Priority Thread allocation,Main Game Loop Physics, Ai, Rendering Etc.,Job Task,Job QueueJob Job Job,Synchronous Deterministic Critical Path Latency critical,Asynchronous Non-deterministic Low Priority Latency tolerant,Rendering,Rendering executed in its own thread which ran in paralle

39、l with the main game loop It only used one thread. We were unable to find a way to fork anything significant given the nature of the API and our timeline. The API is better now.,What Went Right?,Threaded Physics Early Used all processors and threads Performance actually improved We debugged just abo

40、ut all of it before we shipped.,What Went Wrong,Not enough polish time There were some easy money perf gains that we missed I am a fan of consistant performance. Lots of the time we run at 60, but some times things get slow, and I have no idea why. Not enough time on CPU rendering perf. We spend mor

41、e time in DX code than in physics Too much debugging. Many of the bugs were rogue interactions between the renderer and the game code.,Lessons Learned,Threading is challenging but surprisingly fun. Given enough time, there is a great deal of optimization that can be done. All consoles are like PS2 i

42、n that compared with PC and Xbox1, optimization actually makes a difference.,Future,How will processors evolve going forward? Symmetric Processors (I.e. X360, current PC) Or Asymmetric Processors: I.e. PC with Physics chip or GPU CELL Big Vectors or small vectors? (SIMD),Future,On X360 vs CELL we ar

43、e able to thread a lot more stuff, more easily. CELL has a lot of potential, but more time is required to implement systems and update. Fortunately Physics can adapt to most of these paradigms,Physics level of detail,This is the new holy grail. Due to the nature of the problems we often have times o

44、f very high stress followed by periods of low stress. Some tasks can be deferred and updated by threads asynchronously with greater latency. Some tasks can be dropped, with detail increased only when possible.,Amdahls Law,Amdahls Law sucks more on multi core systems than it does on single core syste

45、ms. Single threaded slow code is many times slower than than fully threaded fast code. Due to the nature of our problems and schedules, we cannot thread everything. As processor counts increase, single threaded parts become more and more significant. Collision Detection and Integration are fast in F

46、ull Auto. However other parts in between and single threaded logic have become the bottlenecks.,Amdahls Law,Solutions (Wishful) Faster single threaded processors New Algorithms More time for programmers Help from the Compiler Solution (Practical) Change the design such that we only do things that ru

47、n fast.,Note on Low level Parallelism,Instruction Parallelism SIMD, Multiple elements in a vector Pipeline parallelism Keep all execution units active Memory Parallelism Keep memory bus active through prefetching and pipelined writes. Ideally, Compilers would handle these But they dont. Often these

48、can be high bang/buck items on in order cores with curiously high latency,Pipeline: Notes,I am not a fan of pipelined stages Whenever possible we used forking and jobs. Presently, Full Auto only uses pipelined stages for rendering vs the rest of the game. We spent too much time debugging race condit

49、ions between game logic and rendering data. In the future, when the graphics API is more thread friendly we will fork our rendering.,Add Hoc Threading 101,Some of our threading was heavily planned, The rest of it was less so We used a data centric approach with the following ad hoc heuristics,Add Ho

50、c Threading 101,Data centric approach If there is only one writer, and data is always safe to read (I.e. a resizing array isnt) then you can let it go Data with multiple writers but no ordering dependencies have to use atomic access. We generally use a Mutex here, or when possible, interlocked primitives Sometimes there are too many readers for a mutex to work well in which case we queue up the operation to be executed with single threaded logic. Sometimes a block of code is too big, or we are too lazy to analyze, so we queue that up too.,

copyright@ 2008-2019 麦多课文库(www.mydoc123.com)网站版权所有
备案/许可证编号:苏ICP备17064731号-1