1. An Oxygenated Presentation Manager
Larry Rudolph & Shalini Agarwal
Larry Rudolph, Oxygen Workshop, January 2002

2. Goals & Overview
- Integrate many Oxygen technologies
- Application driven
  - Use an application that we understand
  - Personally use often
  - Would help if it were more human-centric
  - Portable (as opposed to E-21)
- Develop architectural infrastructure
  - Exposes new requirements
- Critique of Presentation Manager
  - What is wrong with it
  - What needs improvement

3. Application Scenario

4. An Oxygen Application
- Components
  - Input: vision, speech, touch
  - Processing: changing configuration
  - Output: projector, handheld, archive
- Equipment
  - Today, it is too hard: Linux laptop; Windows laptop; camera; microphone; network; projector; power blocks
  - Tomorrow, much easier: a couple of H21s

5. Camera watching laser point on screen
- Camera challenges
  - Inexpensive ones have the wrong focal length
  - Alignment issues
    - Use edge of screen, display pattern, figure out from what is known to be visible
    - We ended up displaying a pattern of concentric circles
  - Relative size of laser point depends on distance
    - Beyond ten feet, had to use only certain types of lasers
    - Could slow down camera and let pixels saturate (too complicated)

6. Camera watching laser point on screen (cont)
- Camera interface
  - Click at point (x, y)
    - Hold laser at same location for 5 seconds
  - Select horizontal line ( (x1, y1), (x2, y1) )
    - Sweep laser back and forth; line is diameter of ellipse
  - Select object centered at point (x, y)
    - Sweep laser in circle; point is center of circle
  - Previous or Next
    - Click in left (right) 1/8 of screen

7. Microphone listening to speaker
- Microphone
  - Many technologies: lapel mic; mic array; room microphone
  - Current approach: iPAQ
    - Continuous recognition
    - Push to speak
- Audio server on iPAQ
  - Detects start and stop
  - Best results when human pushes to start and releases to stop
- Audio wave file sent to Galaxy speech system
- Galaxy outputs actions via CGI script
  - A nice unifying mechanism
  - One more complicated component

8. Speaker controlling presentation via iPAQ
- iPAQ output to CGI-script server
  - Same actions as from speech server
- Actions are
  - Next slide, Previous slide, Goto slide #n, Goto slide named
  - Next item, Previous item, Goto item #n, Goto item named
  - Next animation, Previous animation, Goto animation #n
  - Start presentation, End presentation, Pause presentation
  - Initialize camera, Test microphone
- Handheld (iPAQ) display
  - GUI generated from SpeechBuilder grammar
  - List of slides, items per slide
  - Currently use an ad hoc solution where PowerPoint sends lists to the iPAQ; need a more automatic solution

9. Output to projector, handheld, archive
- Unlimited number of video / audio output producers
  - E.g. PowerPoint is just one producer of output
- At any time, each output device has an associated producer
  - This producer can receive input from several producers
- Handheld has a proxy
  - To reduce bandwidth to the iPAQ
  - Current slide, list of slides, list of commands
- Archive
  - Each slide shown, plus audio (from a different microphone), sent to archive
  - Currently just a GIF of the current slide

10. Processing controlling session
- Do not let PowerPoint control the world
  - Slide viewer; movie player; program execution; browser; etc.
  - Want to mix all types of applications
- Presenter has control of the output
  - E.g. switch output producer from PowerPoint to media player
- Remove interrupting technologies
  - Dynamically disconnect any input / output source
- All done via the CORE language
  - Or some other glue language, e.g. meta-glue
  - Which handles all the other infrastructure issues

11. Multi-Modal Input
Shalini Agarwal, Oxygen Conference, January 8th, 2002

12. Initial Experience With Presentation Manager
- One single monolithic context
  - Commands within slide, between slides, between applications
- Problem
  - Too many false positives
- Preliminary solution
  - Slide tracking
    - e.g. recognize “Next Slide” command only after at least 60% of words on slide have been said
    - e.g. recognize “Show Demo” only after slide 17
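The slide-tracking gate described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the word-splitting rule, and the treatment of each distinct word as one unit are all assumptions; only the 60% threshold comes from the slide.

```python
import re

WORD = re.compile(r"[a-z0-9']+")

def spoken_fraction(slide_text: str, transcript: str) -> float:
    """Fraction of the slide's distinct words that occur in the transcript."""
    slide_words = set(WORD.findall(slide_text.lower()))
    if not slide_words:
        # A slide with no words cannot be tracked this way.
        return 0.0
    heard = set(WORD.findall(transcript.lower()))
    return len(slide_words & heard) / len(slide_words)

def accept_next_slide(slide_text: str, transcript: str, threshold: float = 0.6) -> bool:
    """Accept a recognized "Next Slide" only once enough of the slide was spoken."""
    return spoken_fraction(slide_text, transcript) >= threshold
```

For example, with the slide text "Camera challenges alignment focal length", a transcript that mentions only "camera" and "alignment" covers 2 of 5 words, so a recognized “Next Slide” would be ignored until more of the slide has been said.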
- Still lots of problems
  - Many slide styles are hard to track (e.g. figures, not words, on slide)
  - Tracking within a slide is different from tracking between slides

13. A Better Solution: Multiple Contexts
- Very active research area
  - Intelligent-room project; Galaxy; others
- Three layers, each having its own context
  - Slide (Next Item, Next Animation)
  - Presentation (Next Slide, Goto Conclusion, Goto Example)
  - Session (Start Presentation, Switch to Browser, Show Questions)
- Challenges
  - Each context requires its own speech recognition system
  - Multicasting sound wave to each system
  - Selecting the best result

14. Extending the Galaxy System
- Start with context for speech and then extend
  - Note, our goals are similar but not identical to those of the Spoken Language Group
  - We are not dialog-based
  - Exploit their work
- Follow Galaxy
  - Recognizer scores different guesses at words
  - Language Processing Unit uses input grammar to select best input sentence
  - Scott Cyphers gave us the nbest interface

15.
- Recognizer chooses 10 best guesses at word matches (for this context)
- Language Processor picks best sentence from recognizer based on input grammar

16. System Structure

17. System Structure
[Figure: Slide Layer and Session Layer, each with a Recognizer and a Language Processor, and a Selector. Command labels: next item, next movie, previous item, end presentation, start presentation, start explorer.]

18. System Structure

19. Add Recognizer for T9
[Figure: Slide Layer, Presentation Layer, and Session Layer, each with a Language Processor, and a Selector. Recognizers for Sound Input and T9 Input. Command labels: next item, go to slide nine, start presentation.]

20. Add Recognizer for Graffiti
[Figure: as for T9, with a further Recognizer for Graffiti Input.]

21. Other Input Modes
- T9 (telephone keypad)
  - To input a, b, or c press “2”
  - Current cell phones have a dictionary to select the correct word
  - Lots of false positives (very annoying)
    - Remember my introduction?
  - Using an application-dependent grammar would reduce errors
- Pen-based character input
  - Use strokes to input characters
  - Current Palm Pilot only recognizes the “Graffiti” alphabet
  - Lots of false positives (very annoying)
  - Using an application-dependent grammar would reduce errors
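To make the point about application-dependent grammars concrete, here is a hedged sketch of T9 decoding restricted to per-layer command grammars. The layer grammars reuse the example commands from the Multiple Contexts slide; the function names, the exact-match scoring, and the "return nothing on a tie" policy are assumptions for illustration and are not how Galaxy scores hypotheses.

```python
# Standard telephone keypad letter groups.
T9_KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
           "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in T9_KEYS.items() for ch in letters}

# Hypothetical per-layer grammars, here just lists of allowed commands.
LAYERS = {
    "slide": ["next item", "next animation"],
    "presentation": ["next slide", "goto conclusion", "goto example"],
    "session": ["start presentation", "switch to browser", "show questions"],
}

def to_digits(word: str) -> str:
    """Key sequence a user would press to enter the word."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())

def decode(digit_words, grammar):
    """Commands in the grammar whose words match the keyed digit groups."""
    matches = []
    for command in grammar:
        words = command.split()
        if len(words) == len(digit_words) and all(
            to_digits(w) == d for w, d in zip(words, digit_words)
        ):
            matches.append(command)
    return matches

def select(digit_words, layers):
    """Return (layer, command) if exactly one layer accepts the input, else None."""
    candidates = [(layer, command)
                  for layer, grammar in layers.items()
                  for command in decode(digit_words, grammar)]
    return candidates[0] if len(candidates) == 1 else None
```

Because only a handful of commands are legal in each context, a key sequence such as 6398 4836 resolves to "next item" directly, without consulting a general dictionary of every word that shares those keys.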
22. Replacing the Recognizers
- Build recognizers for T9 and Graffiti
- Use Galaxy system to process results from new recognizers
[Figure: Galaxy Hub connected to Speech Synthesis, Language Generation, Dialogue Management, Discourse Resolution, Language Processing, Speech Recog., Database Server, Audio, T9 Recog., Graffiti Recog.]

23. Conclusion
- Each application defines an input grammar
- This grammar can be used to
  - Ensure that each application gets valid input
    - It might not be what the user wanted, but the application will understand it
  - Reduce false positives
  - Identify the input suitable for the associated application
    - Choose the application with the highest score
    - If there is a tie, must do something else (future research)
  - Enable T9, Graffiti, speech, and other input modes

24. Critique of Presentation Manager

25. Vision / Gesture Recognition
- Laser pointer
  - Great for drawing attention to content
  - Audience is primary consumer
  - Secondary use to control presentation
- But it is not a mouse
  - Semantics are tied to slide context
  - Differs from Intelligent-room use
- Small number of identified gestures
  - Gestures easily punctuated
- Low computational overhead
  - Soon will be handled with an H21

26. Critique of Vision / Gesture Recognition
- Laser pointer
  - Great for drawing attention to content
  - Cheap technology but mostly distracting
  - Too shaky, imprecise
- But it is not a mouse
  - More awkward to use than a mouse
  - Another gadget to hold in the hand, button to identify, batteries to maintain
- Small number of identified gestures
  - There are better ways of drawing attention to slide content
  - I rarely use it and don't like it when others do
- Low computational overhead
  - Dumb vs intelligent device discussion

27. Speech Recognition
- Initially seems like a great idea
  - Speaker is already speaking, so can use it to control presentation
- Want passive, intelligent listener
  - Not a dialog
  - No “prompt”: alienating distraction
- Want no mistakes
  - For dialog, better to guess than ignore
  - For us, high cost for incorrect guess
  - Most words are not relevant to speech system
- More trouble than it is worth
  - But may be good for real-time search of content

28. More useful aspect: Output modalities
- Presenter has put the time and effort into the production
  - Simpler is better
- Audience has harder task
  - Understand material being presented
  - Record thoughts, impressions, connections
  - Filter for later review
  - Process in real time
  - Keep up with presentation
  - Do all this with minimal distractions
- Output modalities
  - Content for live audience
  - Content for speaker (superset of audience)
  - Content for retrieval
  - Correlate notes with content

29. Record and correlate notes with presentation

30. CORE: Communication Oriented Routing Environment
(Oxygen Research Group)

31. Assumptions
- Actuators / sensors (I/O) in the environment
  - Many are shared by apps & users
  - Many are flaky / faulty
  - “User” does not know much about them
- Environment, application, and users' desires change over time

32. An Oxygen Application
- Interconnected collection of stuff
  - Who specifies the stuff?
  - I don't know, but it's mostly virtual stuff
- Many layers of abstraction
  - “Don't ask, it's turtles all the way down”
- Two main layers of programming
  - Professionals
  - Users, e.g. grandmother

33. Communications-Oriented Programs
- Connecting the (virtual) stuff is done by the user
  - Home stereo / theater analogy
  - Plug stuff together; unplug it if it doesn't work
  - Don't like it, unplug it
- Device drivers, services, and clients don't know to whom or to what they connect
  - In the client/server model, the server knows a lot about the client, and the client knows even more about the server
- Extend Unix pipes

34. CORE
[Figure labels: Larry Bear's CORE; Physical Devices; Larry Bear; CORE; Other COREs]
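The home-stereo analogy from the Communications-Oriented Programs slide can be illustrated with a small patch panel in which producers emit messages without knowing who, if anyone, receives them. This is only a sketch of the idea under assumed names (`Patchbay`, `plug`, `unplug`, `emit`); it is not the CORE interface.

```python
from collections import defaultdict

class Patchbay:
    """User-level wiring between components that do not know about each other."""

    def __init__(self):
        # source name -> {sink name: callable that accepts one message}
        self._wires = defaultdict(dict)

    def plug(self, source, sink_name, handler):
        """Connect a source to a sink, like plugging in a cable."""
        self._wires[source][sink_name] = handler

    def unplug(self, source, sink_name):
        """Disconnect a sink; unplugging something not plugged in is harmless."""
        self._wires[source].pop(sink_name, None)

    def emit(self, source, message):
        """Deliver a message from a source to whatever is plugged in right now."""
        handlers = list(self._wires[source].values())
        for handler in handlers:
            handler(message)
        return len(handlers)
```

A producer such as a slide viewer only calls `emit`; whether the output reaches a projector, an archive, both, or neither is decided by whoever does the plugging, and can change while the session is running.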
35. Message Flow
- Messages flow between nodes & core
- Core is both language and router
- Within the Core router,
  - some messages are interpreted and may trigger actions
  - other messages get routed to other nodes
- Request-reply message strategy
  - Even number of messages
  - No reply within time period means error

36. CORE Language Elements
- Four elements
  - Nodes, Links, Messages, Rules
- Features
  - Interpreted language
  - Statement is a message & reply
  - Each element has an inverse

37. Nodes: Specify via INS
[Figure: CORE with a Laser Vision node]
- Cam = device=web-cam; location=518;
- PTRvision = device=process; OS=Linux; File=Laser Vision
- Node handler = (nickname, specifier)

38. Node Statement Handler
- When a node message arrives
  - Verified for correctness (statements allowed)
  - Routed to Node Manager (just another node)
- Node Manager
  - INS lookup, verifies if allowed, creates if needed
  - Creates core thread to manage communication with node
  - Bookkeeping & reply message with handle/error

39. Links
[Figure: CORE with nodes Slide Speech, Presentation Speech, Command Speech, Laser Vision]
- Lcamera,vision = (Cam, PTRvision)
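The node and link statements above follow a request-reply pattern: each statement produces a reply carrying either a handle or an error. The sketch below illustrates that pattern with the two example nodes from the slides. The in-memory `directory` is a hypothetical stand-in for an INS lookup, and the class, function, and error names are assumptions; thread creation and bookkeeping are omitted.

```python
def parse_specifier(text):
    """Turn 'device=web-cam; location=518;' into a dict of attribute pairs."""
    pairs = (part.split("=", 1) for part in text.split(";") if "=" in part)
    return {key.strip(): value.strip() for key, value in pairs}

class NodeManager:
    def __init__(self, directory, allowed_devices):
        self.directory = directory              # stand-in for an INS lookup
        self.allowed_devices = allowed_devices  # device types this user may use
        self.handles = {}                       # nickname -> handle

    def node_statement(self, nickname, specifier_text):
        """Handle a node statement (nickname, specifier); reply handle or error."""
        spec = parse_specifier(specifier_text)
        found = any(all(entry.get(k) == v for k, v in spec.items())
                    for entry in self.directory)
        if not found:
            return ("error", "no matching resource")
        if spec.get("device") not in self.allowed_devices:
            return ("error", "not allowed")
        handle = self.handles.setdefault(nickname, len(self.handles) + 1)
        return ("ok", handle)

def make_link(manager, source, dest):
    """Handle a link statement: both endpoints must already be known nodes."""
    missing = [n for n in (source, dest) if n not in manager.handles]
    if missing:
        return ("error", "unknown node: " + ", ".join(missing))
    return ("ok", (manager.handles[source], manager.handles[dest]))
```

With both `Cam` and `PTRvision` registered, `make_link(manager, "Cam", "PTRvision")` plays the role of the link Lcamera,vision = (Cam, PTRvision); linking to a nickname that was never registered yields an error reply instead of a handle.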
40. Link Statement Handler
- Message routed to link manager
- Two queries to node manager for thread controllers
- Message to thread controller of source node
  - Specifying destination thread controller
- Message to thread controller of destination node
  - Specifying source thread controller
- Bookkeeping & reply message with handle/error

41. Messages
[Figure: CORE with nodes Slide Speech, Presentation Speech, Command Speech, Laser Vision; the message “Next Slide!”]
- Messages flow over the links

42. Message Handling
- Messages can be encrypted
- Core statement messages have a fixed format
  - Everything else is a data message
- Each node thread has two unbounded buffers
  - Core to node & node to core
- Logging, rollback, fault tolerance

43. Rules
[Figure: CORE with nodes Slide Speech, Presentation Speech, Command Speech, Laser Vision, Questions]
- RULES: (trigger, action)
- ( MESSQuestion , Lslide,lcd -& Lslide,qlcd )

44. Rule Statement Handler
- ( trigger , consequence )
  - Both are “event sets”
- Eight basic events:
  - +Node, -Node, +Link, -Link
  - +Message, -Message, +Rule, -Rule
- Event set is a set of events
- Trigger is true when events are true
- Consequence makes events true

45. Rules: A link is a rule
- A message event is of the form (node, message specifier) or (message specifier, node)
  - Message came from or is going to the node
- A link (x, y) is just shorthand for the rule with trigger +(x, m) and consequence ( -(x, m), +(m, y) )
  - If a message m arrives at node x, then make that event false (remove the message) and make the event of m arriving at y from core true.

46. Rules: Access Control Lists
- An access control list is just a rule
  - When messages arrive at a node, if they arrive from a valid node, then they are allowed to continue to flow.
  - Modifying access control lists is just adding or removing rules.

47. Rules
- Rule statement gets sent to rule manager
- Event set is just another shorthand for rules
- Rule manager sends a command to the trigger node thread that tells it about the consequence
- Rules are reversible

48. Reversibility
- Each statement is invertible (reversible)
- If there is an error in the application specification, then can undo it all.
- General debugging is possible with reversible rules and message flow
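A minimal sketch can tie together two ideas from the Rules slides: a link (x, y) treated as a rule that fires when a message arrives at x, and statements that record their own inverse so they can be undone. The `Core` class and its method names are assumptions for illustration; general event sets, the rule manager, and node threads are not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    rules: list = field(default_factory=list)       # link rules as (x, y) pairs
    delivered: dict = field(default_factory=dict)   # node -> messages routed to it
    undo_log: list = field(default_factory=list)    # inverse statements, newest last

    def add_rule(self, x, y, _log=True):
        """+Rule for link (x, y): trigger +(x, m), consequence -(x, m), +(m, y)."""
        self.rules.append((x, y))
        if _log:
            self.undo_log.append(("remove", x, y))

    def remove_rule(self, x, y, _log=True):
        """-Rule: the inverse of add_rule."""
        self.rules.remove((x, y))
        if _log:
            self.undo_log.append(("add", x, y))

    def undo(self):
        """Apply the inverse of the most recent rule statement."""
        op, x, y = self.undo_log.pop()
        if op == "add":
            self.add_rule(x, y, _log=False)
        else:
            self.remove_rule(x, y, _log=False)

    def message_arrives(self, x, m):
        """Event +(x, m): the message leaves x and is made to arrive at each y."""
        for src, dst in list(self.rules):
            if src == x:
                self.delivered.setdefault(dst, []).append(m)
```

In this model a message from x reaches y only while the rule (x, y) exists, so granting or revoking access amounts to adding or removing a rule, and a mistaken statement is rolled back with `undo`.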