INTERNATIONAL TELECOMMUNICATION UNION

ITU-T
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

Series H
Supplement 1
(05/99)

SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS

Application profile - Sign language and lip-reading real-time conversation using low bit-rate video communication

ITU-T H-series Recommendations - Supplement 1
(Previously CCITT Recommendations)

ITU-T H-SERIES RECOMMENDATIONS
AUDIOVISUAL AND MULTIMEDIA SYSTEMS

H.10-H.19     Characteristics of transmission channels used for other than telephone purposes
H.20-H.29     Use of telephone-type circuits for voice-frequency telegraphy
H.30-H.39     Telephone circuits or cables used for various types of telegraph transmission or simultaneous transmission
H.40-H.49     Telephone-type circuits used for facsimile telegraphy
H.50-H.99     Characteristics of data signals
H.100-H.199   CHARACTERISTICS OF VISUAL TELEPHONE SYSTEMS
              INFRASTRUCTURE OF AUDIOVISUAL SERVICES
H.200-H.219     General
H.220-H.229     Transmission multiplexing and synchronization
H.230-H.239     Systems aspects
H.240-H.259     Communication procedures
H.260-H.279     Coding of moving video
H.280-H.299     Related systems aspects
H.300-H.399     Systems and terminal equipment for audiovisual services
H.450-H.499     Supplementary services for multimedia

For further details, please refer to ITU-T List of Recommendations.

SUPPLEMENT 1 TO ITU-T H-SERIES RECOMMENDATIONS

APPLICATION PROFILE - SIGN LANGUAGE AND LIP-READING REAL-TIME CONVERSATION USING LOW BIT-RATE VIDEO COMMUNICATION

Summary

Sign language and lip-reading are two important application areas of video communication. For the successful transmission of the components of visual language, certain quality requirements must be met. This supplement is an application profile document that gives the background to the requirements and offers as well guidance on how these requirements can be met. The purpose of this supplement is not to propose new video coding schemes, but rather to indicate how current and future video coding schemes can be applied to these two areas of application, with good results.

Source

Supplement 1 to ITU-T H-series Recommendations was prepared by ITU-T Study Group 16 (1997-2000) and was approved under the WTSC Resolution No. 5 procedure on 27 May 1999.

H series - Supplement 1 (05/99)

FOREWORD

ITU (International Telecommunication Union) is the United Nations Specialized Agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of the ITU. The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Conference (WTSC), which meets every four years, establishes the topics for study by the ITU-T Study Groups which, in their turn, produce Recommendations on these topics.

The approval of Recommendations by the Members of the ITU-T is covered by the procedure laid down in WTSC Resolution No. 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE - In this publication the term recognized operating agency (ROA) includes any individual, company, corporation or governmental organization that operates a public correspondence service. The terms Administration, ROA and public correspondence are defined in the Constitution of the ITU (Geneva, 1992).

INTELLECTUAL PROPERTY RIGHTS

The ITU draws attention to the possibility that the practice or implementation of this publication may involve the use of a claimed Intellectual Property Right. The ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the publication development process.

As of the date of approval of this publication, the ITU had received notice of intellectual property, protected by patents, which may be required to implement this publication. However, implementors are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database.

© ITU 1999

All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the ITU.

CONTENTS

1       Scope
2       Abbreviations
3       Definitions
4       References
5       Basic needs for reproduction of sign language and lip-reading
5.1     Basic characteristics
5.2     Temporal resolution requirements
5.2.1   Finger-spelling
5.2.2   General signing
5.2.3   Lip-reading
5.2.4   Adaptation
5.2.5   Analysis of the frame rate requirement
5.2.6   Granularity of temporal resolution
5.3     Spatial resolution requirements
5.4     Fidelity
5.5     Delay
5.6     Synchronism
5.7     Conclusion on performance requirements
6       Performance verification
6.1     Reference material
6.2     Performance evaluations
7       Advice to the terminal implementers
8       Advice to the user
9       Broadening the scope

[Figure 2 not reproduced. Surviving labels: "1.2 s"; "Degrade to SQCIF".]

NOTE - The values must be observed with sign language or lip-reading movements present.

Figure 2 - Resolution requirements for sign language and lip-reading in person-to-person conversation

Table 2 - Summary of usability degradation caused by delay and blur

Usability    Delay    Blur
Good         0.4 s    No

6 Performance verification

6.1 Reference material

This supplement includes a CD-ROM which contains a sign language video clip that can be used for performance evaluation. The video clip "Irene" is taken from a Swedish TV programme. It contains a suitable amount of motion in the sign language. It also shows the normal rapidity of motion. This clip is under copyright from the Swedish Educational Broadcasting Company. Appendix I reproduces the CD-ROM Readme file, which contains the copyright statement and the technical description of the electronic files.

6.2 Performance evaluations

A codec, or a terminal setup, is tested by transmitting the evaluation scene through a codec, or through a set of video phones, with a network connection. The result is recorded and evaluated. Recommendation P.931 [4] specifies an evaluation method.

- The frame rate during signing is evaluated.
- The selected static resolution is noted.
- Any extra blur introduced during medium motion is measured by comparing the recorded frames with pictures from the same scene with resolution reduced to QCIF and SQCIF. Blur is only evaluated on hands and face.
- The delay is measured.
- The synchronism of audio (voice) and video (lip movements) is measured.

From these recordings the performance can be evaluated and compared to the goals described above. For approximate evaluation of these values, when laboratory equipment is not available, a simple evaluation tool from the National Swedish Association of the Deaf can be used.

7 Advice to the terminal implementers

In order to satisfy user requirements, certain features should be implemented in the terminal.

o It should provide an interface to activate external alerting systems, e.g. flashing lights, pocket vibrator, watch-size vibrator or strong sound generators.
o Users may need to revert to text conversation sometimes. It is therefore advisable to implement the text conversation protocol T.140 in the terminal.
o The preference for over 20 fps and delay below 0.4 s calls for an algorithm with no frame skips to be used. A high frame rate automatically gives an opportunity to achieve a reasonable delay.
o Deviation from all quality requirements can be accepted up to 2 s after a scene shift.

8 Advice to the user

The user should arrange to use an environment with good lighting conditions and a plain background.

9 Broadening the scope

If the equipment is to be used for sign language or lip-reading application in videoconferencing, multicasting, broadcasting or information retrieval, the following facts change the requirements.

o The view is often broader, including both signing people and other objects. This indicates that usability should start at CIF spatial resolution.
o There are fewer possibilities for the user to give feedback in order to control perception by influencing the speaker or signer. Therefore, the higher frame rate from 20 fps is required.
o The delay requirements are less stringent. For broadcasting or information retrieval several seconds delay is acceptable. For conferencing, the delay requirements are similar to those for conversational use.
o The exact requirements for each application are outside the scope of this application profile.

APPENDIX I

Copyright and technical description of H-series Supplement 1 test material

This appendix reproduces the content of the Readme.txt file of the CD-ROM.

Supplement 1 to ITU-T H-series Recommendations (06/1999)
"Irene" video clip
Version 1.0, June 1999.

1.1 Copyright

All rights are reserved. The material may only be used for the research and development of products to be used by deaf people. The material may not be included in commercial products without the permission of the Swedish Educational Broadcasting Company. All other use of the material is prohibited.

1.2 Support

For distribution of update software, please contact:

Sales Department
ITU
Place des Nations
CH-1211 Geneve 20
SWITZERLAND
email: sales@itu.int

For reporting problems, please contact TSB helpdesk service at:

TSB Helpdesk service
ITU
Place des Nations
CH-1211 Geneve 20
SWITZERLAND
fax: +41 22 730 5853
email: tsbedh@itu.int

1.3 Details on the video sequence

This video sequence presents sign language intended to be used as test material for video coding. It contains sign language performed at natural speed. The sequence is named "Irene" after the signing person. It is in Swedish sign language and originally produced by the Swedish Educational Broadcasting Company. It shows the same head-to-stomach view that is usually used in personal use of videophones for signing. It is recorded in PAL with 25 fps. It is provided in three formats:

1) Sign-Irene.mpeg (3261 Kbytes): MPEG-1 coded in CIF resolution at 25 fps;
2) Sign-Irene.cif (80 190 Kbytes): YCbCr 4:2:0 format in CIF at 25 fps;
3) Sign-Irene.qcif (20 048 Kbytes): YCbCr 4:2:0 format in QCIF at 25 fps.

Fingerspelling content

This is an approximate representation of two fingerspelling sequences in "Irene". The numbers are frame numbers from the beginning of the MPEG version. The letters indicate when the letters are quite clearly formed by the hand. A dash indicates that no clear letter is formed in the transition between letters. The first is "Pia Wickman", with the last "a" only visible on the mouth.

"Pia Wickman"

Fr.  Ltr. | Fr.  Ltr. | Fr.  Ltr. | Fr.  Ltr. | Fr.  Ltr.
29   p    | 39   -    | 49   -    | 59   -    | 69   -
30   p    | 40   a    | 50   -    | 60   -    | 70   n
31   p    | 41   a    | 51   w    | 61   k    | 71   n
32   p    | 42   a    | 52   w    | 62   k    | 72   n
33   -    | 43   a    | 53   -    | 63   k    | 73   n
34   -    | 44   a    | 54   i    | 64   -    | 74   n
35   i    | 45   a    | 55   -    | 65   -    | 75   n
36   i    | 46   a    | 56   c    | 66   -    | 76   n
37   i    | 47   -    | 57   c    | 67   -    | 77   n
38   i    | 48   -    | 58   c    | 68   m    |

"Edsviken"

Fr.  Ltr. | Fr.  Ltr. | Fr.  Ltr. | Fr.  Ltr. | Fr.  Ltr.
308  e    | 315  s    | 322  i    | 329  e    | 336  n
309  e    | 316  s    | 323  -    | 330  n    |
310  -    | 317  -    | 324  -    | 331  n    |
311  -    | 318  v    | 325  k    | 332  n    |
312  d    | 319  v    | 326  k    | 333  n    |
313  s    | 320  v    | 327  k    | 334  n    |
314  s    | 321  -    | 328  -    | 335  n    |

General signing content

The last phrase in the clip is signed with signs (without finger-spelling), comparable to words. The phrase is presented here, transcribed sign by sign, with the number of frames each sign occupies in parentheses: "SHE(7) TELLS(7) HERSELF(11) HOW(4) SHE(2) FELT(11) EXPERIENCED(13) ADOLESCENCE(16)". The sequence is found between frames 406 and 529 in the MPEG version.

Grammatical components

The clip shows a number of eye blinks, which are typical grammatical components of sign language and are used as sentence delimiters. They are short and, in many cases, occur only on one or two frames. There is a grammatical blink in frames 394 and 395.
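
The frame counts and file sizes quoted in this appendix can be cross-checked with simple arithmetic. The following Python sketch is illustrative only and is not part of the CD-ROM material. It assumes the standard CIF and QCIF luminance sizes (352 x 288 and 176 x 144), 8 bits per sample, headerless raw files and 1 Kbyte = 1024 bytes; none of these is stated in the Readme file, and the function names are hypothetical.

```python
"""Cross-check of the clip figures quoted in the Readme (illustrative sketch)."""

FPS = 25  # PAL frame rate stated in the Readme

# Assumption: standard luminance sizes; the Readme names the formats only.
FRAME_SIZES = {"CIF": (352, 288), "QCIF": (176, 144)}

# Raw file sizes in Kbytes as listed in the Readme.
RAW_FILES = {"CIF": 80190, "QCIF": 20048}

# Frames occupied by each sign in the last phrase, as transcribed above.
SIGNS = [("SHE", 7), ("TELLS", 7), ("HERSELF", 11), ("HOW", 4),
         ("SHE", 2), ("FELT", 11), ("EXPERIENCED", 13), ("ADOLESCENCE", 16)]


def bytes_per_frame(width, height):
    """Size of one YCbCr 4:2:0 frame, assuming 8 bits per sample."""
    # Full-size Y plane plus two chroma planes at a quarter of its size each.
    return width * height * 3 // 2


def frames_in_raw_file(kbytes, fmt):
    """Frame count, assuming no file header and 1 Kbyte = 1024 bytes."""
    width, height = FRAME_SIZES[fmt]
    return kbytes * 1024 / bytes_per_frame(width, height)


def duration_ms(frames, fps=FPS):
    """Time covered by a number of frames, in milliseconds."""
    return frames * 1000 / fps


if __name__ == "__main__":
    for fmt, kbytes in RAW_FILES.items():
        n = frames_in_raw_file(kbytes, fmt)
        print(f"{fmt}: {n:.2f} frames, {duration_ms(n) / 1000:.1f} s")
    for gloss, frames in SIGNS:
        print(f"{gloss}: {frames} frames = {duration_ms(frames):.0f} ms")
    print(f"blink (frames 394-395): {duration_ms(2):.0f} ms")
```

Under these assumptions both raw files come out at about 540 frames, i.e. 21.6 s at 25 fps. The shortest transcribed sign, SHE(2), and the blink in frames 394 and 395 each last 80 ms, which indicates how little margin a reduced frame rate leaves for such events.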

ITU-T RECOMMENDATIONS SERIES

Series A   Organization of the work of the ITU-T
Series B   Means of expression: definitions, symbols, classification
Series C   General telecommunication statistics
Series D   General tariff principles
Series E   Overall network operation, telephone service, service operation and human factors
Series F   Non-telephone telecommunication services
Series G   Transmission systems and media, digital systems and networks
Series H   Audiovisual and multimedia systems
Series I   Integrated services digital network
Series J   Transmission of television, sound programme and other multimedia signals
Series K   Protection against interference
Series L   Construction, installation and protection of cables and other elements of outside plant
Series M   TMN and network maintenance: international transmission systems, telephone circuits, telegraphy, facsimile and leased circuits
Series N   Maintenance: international sound programme and television transmission circuits
Series O   Specifications of measuring equipment
Series P   Telephone transmission quality, telephone installations, local line networks
Series Q   Switching and signalling
Series R   Telegraph transmission
Series S   Telegraph services terminal equipment
Series T   Terminals for telematic services
Series U   Telegraph switching
Series V   Data communication over the telephone network
Series X   Data networks and open system communications
Series Y   Global information infrastructure and Internet protocol aspects
Series Z   Languages and general software aspects for telecommunication systems

Printed in Switzerland
Geneva, 1999