The Past, Present, and Future of Video Telephony: A Systems Perspective
Dave Lindbergh
Stanford Networking Seminar, 27 January 2005

Thanks
  Thanks to Athina for inviting me
  I'm going to take advantage of the opportunity to present some opinions about video telephony
  I hope at least they're educated opinions

Contents
  A little bit about my perspective
  Where we've come from
  Where we are now
    What is succeeding
    What is not succeeding (yet)
    Current problems & challenges
  The mass-market barrier
    Expectations vs. reality
    What it will take to succeed
  Where we go from here

A little bit about my perspective
  Engineering background
    Modems & data communications
    Protocols, real-time systems, image processing
  1993: PictureTel, largest vendor of video conferencing gear
    ISDN, H.320, H.261
    128+ kbit/s minimum
  Soon got sucked into standardization work
    Mid-90s: Chaired H.324 Systems Experts Group
    Edited ITU-T Rec. H.324
      Basis of today's 3G-324M system
      Precursor to H.323 (yes, I take some of the blame)

What I've been doing lately
  H.264 video compression standardization
    Profiles/Levels
    Applications
  Editor, ITU-T Rec. H.239
    Role management
    Live = People
    Presentation = Content
  Editor, ITU-T Rec. H.241
    Video signaling
  Editor, H.324 (again)
  Rapporteur, ITU-T Q.23/16 (“Media Coding”)

Video telephony system
  18 frames/second
    Progressive scan
  Plasma display
  Pixel aspect ratio 3:2
  Image quality described as “excellent”
  End-to-end latency 1 millisecond (great!)

April 7, 1927: Bell Labs
  New York - Washington DC
  Walter Gifford, President, AT&T (New York)
  Herbert Hoover, US Sec'y of Commerce (Washington DC)

“Television” = Telephone + Vision
  50x50 pixel display, neon bulbs
  Camera: Scanning arc lamp beam
  Optional projection to 2x3 feet
    But “results were not so good”

Edna Mae Horner, Operator
  Chesapeake and Potomac Telephone Company

AT&T Picturephone
  1957 “Experimental Model”
  Early 1960s
  Mirror
  AT&T was very serious
    Plenty of smart business people!
  1964
  Framing
  Did it “cost too much”?
    AT&T finally gave up in the early 1970s

1980s: Still-image picture phones
  Mid-1980s: Japanese consumer electronics firms introduced still-image picture phones
    Used existing regular analog phone line
    POTS modem
    5 seconds to send 1 black & white frame
    No audio during picture transmission
    $200 each
  Very few takers

1992: AT&T Videophone 2500
  “Predicting that 10 years from now video phones will be as popular as cordless phones and fax machines, last week AT&T introduced the first full-color motion video phone that operates over regular phone lines. AT&T officials say the picture quality was acceptable to test-market consumers”
    Newsweek, January 20, 1992
  10 frames/second, $1500
    Marconi, others, had similar products

Many more videophones since then
  Mostly based on ITU-T standards
  H.320 (ISDN), H.323 (IP), H.324 (POTS)
    and SIP
  They all worked

Siemens T-View H.320 (ISDN) Phone, 1997

More videophones
  And more
  And more

FOMA experience in Japan
  FOMA = H.324/Mobile, 64 kbit/s channels
    Video calls cost 2x price of voice calls
  3 million phones deployed (as of Sept. 2004)
    Average monthly video usage = 2 minutes
    Top 20% of users do 20 minutes/month
  Most users young
    Show where they are, who they're with
    Don't point camera at themselves
  DoCoMo is hopeful that usage will increase when penetration reaches 1 phone/family

Still-image camera phones
  2nd generation
    Camera is on back of phone

Did they all cost too much?
  Many had good video quality
  Most were reliable & easy to use
  Many $50 PC cameras with videophone apps
  MS NetMeeting & Messenger are free
  Clearly, people do want video phones
    Witness all the attempts, user excitement
  But they don't buy or use them when offered
    For some reason people are disappointed
    We need to understand why before we can fix this

What is succeeding?
  The real killer app: TELEVISION
  But TV is doing fine without help from me

What else is succeeding?
  Video conferencing
    $2B/year industry, profitable
    Top vendors: Polycom, Tandberg, Sony

Video conferencing today
  Most use is in large organizations
    Industry
    Government
    Education
  Most use is internal
    Between sites of the same organization
  Most use is scheduled
    Planned meetings, not spontaneous
  Only a few meeting rooms have VC equipment
    Much talk about ubiquitous access, but not real yet

Situations where VC works well
  With people you already know
    Already introduced, not strangers
  Not too many people on screen at once
    Need to see facial expressions clearly
  Good lighting
  Good room & furniture layout
  People & Content at same time
  How I use it every week
    Offices in Boston, California, Texas, Atlanta, Israel

Why is VC successful?
  Relatively big picture size, high resolution
    Less restriction on where people are in the frame
  Good lighting
  High-value application
  Work environment, pre-scheduled meetings
    People come dressed & prepared to meet others
    Reduces discomfort with “being on camera”
  Yet, VC is still in 2% of conference rooms
    Lots of room for growth
    Similar problems to those that stopped video telephony
    It works, but not nearly as well as we want it to!

How are we doing?
  We're doing an excellent job on the classical technical challenges
    Video and audio coding
    Cost: $250K (1989) to $2000 (2004), less for PCs
    Bandwidth is getting cheaper all the time
    Simplicity, reliability have improved greatly
  Some immediate challenges
    Standards and network issues being worked
  Longer-term challenges for video telephony
    Expectations vs. reality
    Human factors

Standards
  Wonderful thing: So many to choose from!
  Religion: H.323, SIP, MGCP, proprietary
    No real differences from user perspective
    Some want to start over, again
  Every standard is unnecessarily complex
    Over-reaction to past mistakes, too little experience
    The POTS network was also incredibly complex
    Limits of human complexity management ability
  Directory services
    ENUM/DNS, H.350/LDAP, UMMAP
    This will settle out with time

Latency
  Lots of denial - this is not helpful
  ITU-T G.114 gives 150 ms as an upper limit
    For total end-to-end latency
    Including propagation over distance
    This is about right, but difficult to achieve
  IP networks inherently have latency issues
    Usually make ARQ, backchannel schemes impractical
  Low frame rates make things worse

Quality of Service on IP
  Lots of solutions in theory
    DiffServ, MPLS, IP Precedence, etc.
  Zero penetration on public Internet
    There is no pricing model
  Most private networks provide QoS with massive over-provisioning
    This is often cheaper than “clever” schemes
  QoS will remain a problem on the public Internet until there is a way to charge for it

More network issues
  NATs and Firewalls
    IP is effectively unusable between organizations
    Virtually all inter-organization calls are still ISDN
  Network fragmentation
    IP, ISDN, POTS, 3G, 4G
  Lack of public/automatic gateways
  These are all being worked; will get solved
    Some things will take time to shake out
  A “killer app” could force more rapid change
    But this hasn't happened yet

Video compression coding
  Ideal lossy video compression system
    Every possible bit sequence decodes as something meaningful to human perception
    On various time scales
    Might this be the way the brain works?
    Markov chain text generators sound a lot like dreams
  Lots of room for improved coding
  Past: Biggest challenge was reducing bitrate
    Bandwidth and storage were expensive
  Today: Computational efficiency a challenge
    Bits are getting cheaper faster than computes are

Beyond video coding
  Most video research focused on coding
    Without compression video was unmanageable
  Between improved compression and cheaper bandwidth and storage, things can now change
    Computation has gotten much cheaper
  Fast, cheap video DSP means we can do more
    Stitching, warping, perspective correction
    Searching, indexing, processing, recognizing content
  Analogous to audio & still-image DSP
    There will be new apps unique to video

The mass-market barrier
  Video conferencing is a successful niche
  Video telephony hasn't succeeded yet
    Yet, clearly there is a market desire!
  Current issues don't explain past failures
    Standards, directories were solved for videophones
    Latency was not a problem in the analog world
    QoS, NAT/FW issues didn't exist for switched circuits
  Then what will it take for success?
    Why have users not yet embraced video telephony?
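
The latency points in these slides (the 150 ms G.114 upper limit for total end-to-end delay, including propagation, and the remark that low frame rates make things worse) can be illustrated with a rough one-way delay budget. This is only a sketch under stated assumptions: the four-component model, the function names, the 4000 km path, and the 30 ms codec and 40 ms jitter-buffer figures are hypothetical values chosen for illustration, not numbers from the talk. The only inputs taken from the slides are the 150 ms limit and the 10 frames/second rate of the 1992 videophone.

```python
# Rough one-way latency budget for a video call (illustrative sketch only).

G114_LIMIT_MS = 150.0    # upper limit cited on the Latency slide (ITU-T G.114)
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers roughly 200 km per ms


def one_way_latency_ms(fps, distance_km, codec_ms, jitter_buffer_ms):
    """Sum the (hypothetical) delay components of a one-way video path."""
    frame_interval_ms = 1000.0 / fps  # time to capture one complete frame
    propagation_ms = distance_km / FIBER_KM_PER_MS
    return frame_interval_ms + propagation_ms + codec_ms + jitter_buffer_ms


def within_budget(total_ms, limit_ms=G114_LIMIT_MS):
    """True if the total stays at or under the end-to-end limit."""
    return total_ms <= limit_ms


if __name__ == "__main__":
    # Assumed path: 4000 km, 30 ms codec delay, 40 ms jitter buffer.
    for fps in (10, 30):
        total = one_way_latency_ms(fps, distance_km=4000,
                                   codec_ms=30, jitter_buffer_ms=40)
        print(f"{fps} fps: {total:.1f} ms, within limit: {within_budget(total)}")
```

With these assumed figures, the frame interval alone is 100 ms at 10 frames/second, so the total (190 ms) exceeds the limit, while at 30 frames/second the same path comes to about 123 ms. Real systems have further delay sources (packetization, display refresh, audio/video synchronization) that this sketch leaves out.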

What will it take to succeed?
  Cost?
    PC apps are nearly free, yet very little used
  Reliability?
    Current systems work well on private networks
    Video telephones were quite reliable
  Complexity?
    Video phones are easy to use, as are modern VCS
  Video quality?
    VC systems provide TV-like quality, yet haven't broken into the mass-market

Expectations: Fiction
  Metropolis (Fritz Lang, 1926)

From Jetsons to Star Trek
  Countless science-fiction films & TV shows
    Perfect framing, perfect lighting
    People look straight into the camera
    Nobody is nervous “being on camera”
  2001: A Space Odyssey (Stanley Kubrick, 1968)

Expectations: Film & TV experience
  We've all grown up with film & TV
    Professional cinematography / videography
    Studios with proper lighting, layout
    Multiple camera angles
    Directors to choose the best shots
  With video telephony we get
    Single camera viewpoint
    Bad lighting
    Bad or no framing
    Often, poor resolution and video quality
    Sometimes, tiny pictures

What it will take: Human factors
  Framing
    Keeping people in the picture
  Camera viewpoint & perspective
    Psychological factors, geometrical distortion
  Eye contact & gaze direction
    And “camera shyness”
  Peripheral vision
    Sense of space - close-up vs. wide views
  Attention requirements
    Tradeoffs of different media

Framing
  Keeping people in the picture
    Close enough to see faces clearly
    Far enough for freedom of movement
    Consciousness of framing, control can be distracting
  Scenes in movie Mother (Albert Brooks, 1996)
    Rob Morrow and Debbie Reynolds on videophone
  Automatic speaker-following not ideal
    Often used when multiple people are in the room
    Want to see listener reactions (not just talker)
    Want to see VIPs (even if they're listening)
    Close-ups can lose sense of relative position
    Still, often better than doing nothing

Camera viewpoint & perspective
  If the camera is too close
    Geometric distortion - big noses, etc.
  If your camera is above eye-level
    They're looking down on you - you look submissive
  If your camera is below eye-level
    They're looking up at you - you look dominant
  This is why royal thrones are tall
  There is no single “right” position
    People can either stand or sit

Eye contact
  What happens when you stare at someone?
  Does it happen if you stare at video?
  What happens when someone stares at you?
    Do you feel comfortable?
  Eye contact is a form of innate, highly evolved non-verbal communication
    A deep part of human nature
    Lots of emotional charge
    Not present in video telephony - unnatural

Eye contact & gaze direction
  People can detect eye contact at great distances
    They can tell when they're being observed
    They may respond with a glance, or return contact
  Cooperation or liking = more direct gaze
    Disagreement or dislike = less direct gaze
  Gaze and emotional signals
    Unwavering gaze - dominance or threat
    Gaze avoidance - submission or fear
    Gaze can signal sincerity, discomfort, challenge

Eye contact and video
  We need to solve eye contact on video
    I think this will reduce “camera shyness”
  Need to know who is looking at you
  Need to know if/when you're stared at
  Need to allow natural feedback response

Other “naturalness” issues
  Peripheral vision
    Noticing what other people are doing
  Many people prefer to see but not be seen
    At least with current video systems
  Attention demand & media
    Text: Least - can carry on several IMs at once
    Audio: More - one at a time, can do other things
    Video: Most - can't do other things
    Not a flaw, just something to take into account

Why is this all so complicated?
  Voice telephony doesn't have these problems
  Because people are evolved to talk in the dark
    This is why telephones “work”
  Because video is not “just another channel”
    But that's how engineers usually think about it
    It's something very different from audio
  The video telephony experience needs to feel more natural and intuitive
    I think this is the real reason it hasn't succeeded yet
    This is where research needs to focus

We are still in the “mainframe” era
  Video telephony is not unique in facing this challenge
    Automobiles: Benz Motorwagen to Ford Model T
      Mass production, simplicity
    Aviation: Wright Flyer to Douglas DC-3
      Efficiency, safety
    Computing: Mainframes to PCs
      VLSI microprocessors
  From possibilities in theory to useful practice
    High-value niche applications come first
    These teach us about what is missing
    When technology matures, the mass-market arrives

Where we go from here
  “Show-me” 3G video phones can succeed now
    See where I am
    See who I'm with
    Human factors issues not a problem in this model
    Limited usage compared to voice minutes
  Video conferencing, other high-value apps will continue to mature & expand
  Human factors improvements needed
    For “talking heads” video telephony to succeed
  A fertile field for research - please work on it!

It may take a visionary individual
  ?

Thank you!
  Polycom has an opening for a video DSP researcher to work on these topics
  Send CVs to lindbergh92F