International Telecommunication Union

ITU-T J.248
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(06/2008)

SERIES J: CABLE NETWORKS AND TRANSMISSION OF TELEVISION, SOUND PROGRAMME AND OTHER MULTIMEDIA SIGNALS
Measurement of the quality of service

Requirements for operational monitoring of video-to-audio delay in the distribution of television programs

Recommendation ITU-T J.248

Rec. ITU-T J.248 (06/2008)

Summary
Since the advent of digital television networks
for program transmission, and the introduction of high-efficiency bit-rate reduction (BRR) devices and of other types of digital image processing devices, audiences sometimes complain that the television programs they receive are out of “lip-sync”. Lip-sync errors are generally due to the fact that
audio and video are separately processed in the television chain, and processing delays are typically different for video than for the accompanying audio signal. Recommendation ITU-T J.248 analyses the problem and provides guidance on means to measure lip-sync errors in the context of operational monitoring in television programme transmission chains.

Source
Recommendation ITU-T J.248 was approved on 13 June 2008 by ITU-T Study Group 9 (2005-2008) under Recommendation ITU-T A.8 procedure.

FOREWORD
The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications, information and communication technologies (ICTs). The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations
on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics. The approval of
ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE
In this Recommendation, the expression “Administration” is
used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with
the Recommendation is achieved when all of these mandatory provisions are met. The words “shall” or some other obligatory language such as “must” and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required
of any party.

INTELLECTUAL PROPERTY RIGHTS
ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual
Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had not received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementers are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database at http://www.itu.int/ITU-T/ipr/.

© ITU 2009
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior
written permission of ITU.

CONTENTS
1 Scope
2 References
3 Definitions
  3.1 Terms defined elsewhere
  3.2 Terms defined in this Recommendation
4 Abbreviations and acronyms
5 Conventions
6 The reference television chain
  6.1 Television program production
  6.2 Primary distribution
  6.3 Secondary distribution
  6.4 Presentation
7 Main causes for loss of lip-sync
8 User requirements for operational monitoring
  8.1 Operational aspect
  8.2 Measurement aspect
Appendix I - Perceptual limits for lip-sync errors
Appendix II - Detection and adjustment of lip-sync errors
  II.1 Perceptual verification of lip-sync along television chains
  II.2 Maintaining lip-sync along the television chain - The ideal solution
  II.3 Maintaining lip-sync along the television chain - A practical approximate solution

Recommendation ITU-T J.248
Requirements for operational monitoring of video-to-audio delay in the distribution of television programs

1 Scope
This Recommendation specifies requirements for operational monitoring aimed at helping to minimize video-to-audio delay, thus minimizing lip-sync errors in the transmission of television programs.

NOTE - The structure and content of this Recommendation have been organized for ease of use by those familiar with the original source material specifications; as such, the usual style of ITU-T Recommendations has not been applied.

2 References
The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published. The reference to a document within this Recommendation does not give it, as a stand-alone document, the status of a Recommendation.

ITU-T J.243    Recommendation ITU-T J.243 (2006), Requirements for operational monitoring in television programme transmission chains.
ITU-R BT.1359  Recommendation ITU-R BT.1359-1 (1998), Relative timing of sound and vision for broadcasting.
ITU-R BT.1729  Recommendation ITU-R BT.1729 (2005), Common 16 x 9/4 x 3 aspect ratio digital television reference test pattern.

3 Definitions

3.1 Terms defined elsewhere
None.

3.2 Terms defined in this Recommendation
This Recommendation defines the following terms:
3.2.1 final edited master: The final edited master is the final instance of a television program as it is provided at the end of the program production chain, ready to be dispatched to the distributors and the end users.
3.2.2 frame synchronizer: A device that receives a video signal from a remote source, and synchronizes it to the local video
synchronization pulses, in order that it may be seamlessly mixed with locally generated video signals.
3.2.3 interframe coding: Bit-rate reduction video signal encoding that exploits the video signal redundancy over several pictures.
3.2.4 lip synchronization (lip-sync): Operation to provide the feeling that the speaking motion of the displayed person is synchronized with that person's voice, or that other sounds are synchronized to their visually displayed source. Alternatively, the minimization of the relative delay between the visual display of a person speaking and the audio of the voice of the
person speaking. The objective is to achieve a natural relationship between the visual image and the aural message for the viewer/listener.
3.2.5 primary distribution: Use of a transmission channel for transferring audio and/or video information to one or several destination points without a view to further post-processing on reception (e.g., from a continuity studio to a transmitter network).
3.2.6 secondary distribution: Use of a transmission channel for distribution of programs to viewers at large.
3.2.7 source coding (bit-rate reduction): The encoding of the original digital signal (video, audio or data) in bit-rate reduction (BRR) representation before protection is applied against bit errors in the channel.

4 Abbreviations and acronyms
None.

5 Conventions
None.

6 The reference television chain
For the purpose of this Recommendation, the reference television chain, from acquisition to presentation, can be described as consisting of the four sections listed below.

6.1 Television program production
This is the section of the television chain in which program material is captured locally or acquired from remote sources. The production section starts
from the camera and microphone, and it ends where the complete program material is presented on the finished edited master, ready to be dispatched to distributors and end users. Except for the simplest programs, the audio and video components of all television programs are acquired and processed in
separate production chains, and they are generally only brought together on the finished edited master of the program, under the supervision of the program director and producer.

6.2 Primary distribution
This is the section of the television chain in which programs are sent from the program provider to the program distributor (e.g., the input of the cable head-end). The program audio and video may need to be separately processed in this section, e.g., to be mixed with local program material or commentary.

6.3 Secondary distribution
This is the section of the television chain in which programs are sent from the primary distribution point (e.g., the output of the cable head-end) to the end user of the programs (e.g., the cable television subscriber).

6.4 Presentation
This is the section of the television chain in which the audio and video of the program are presented on the audiovisual display of the end user.

7 Main causes for loss of lip-sync
As long as audio and video signals are multiplexed in a single bit stream, they preserve the lip-sync they had at the input of the multiplexer. However, lip-sync may be lost whenever the audio and video signals are
separately processed. In this event, the amount of lip-sync error depends on the type of processing and on the number of cascaded processes.

The reference television chain contains a large number of devices that the video or, respectively, the audio signals must go through. Every device along the chain introduces some delay in the audio or the video signal that goes through it. Audio delays introduced along the chain are generally small enough not to appreciably affect lip-sync. Video delays are also generally small, except for some specific devices. The devices that introduce the most significant video delays, often large enough to visibly affect lip-sync, are frame synchronizers, bit-rate-reduction video encoders and decoders, and complex image processing devices found in production and in presentation, such as image correctors, interlacers/de-interlacers, etc., which may be built into consumer displays (see footnote 1).

a) Frame synchronizers are devices that receive a video signal from a remote source and synchronize it to the local video synchronization pulses, in order that it may be seamlessly mixed with locally generated video signals. They are found in program production, but also in primary
and secondary distribution. The video delay that they introduce is inherently variable, since it depends on the phase of the remote sync with respect to the local sync, and it may be of the order of one video picture.
b) Source bit-rate-reduction encoders and decoders can introduce very large but essentially fixed video delays, whose amount depends on the GoP (temporal interpolation) mode in which they are set to operate, and on the computation delay in the encoder, which can be quite high in advanced encoders such as MPEG-4 ones. The delay introduced in the encoder then adds to the one introduced in the decoder. The total delay can be quite large, of the order of many television images, and in extreme cases of the order of some seconds.
c) Complex image processing devices are present both in production and in presentation. They can introduce appreciable video delays depending on the operating mode to which they are set.
d) In some cases, the video signal travels on a path different from the audio signal. In these cases, a delay can be introduced, due to the different transit times of the video and the audio signals over their respective transmission paths. This is the case, for instance, of a sports program in which the video is sent via satellite and the commentary, mixed with the international sound, is sent via a land line.

For the purpose of this Recommendation, which addresses the primary and secondary distribution of programs, it is assumed that lip-sync is perfect on the
final edited master at the end of television program production, and any lip-sync error is introduced downstream from it, namely in the contribution section of the chain, in the distribution section of the chain and in program presentation, i.e., in the consumer display. This is a credible assumption for most programs, notably recorded ones, since the creative staff can be expected to certify that the final edited master is correct before the program is released.

Footnote 1: Lip-sync errors can also be introduced by some program production tools. As an example, a video-wall behind the anchor-man of a news bulletin will generally introduce some delay in the displayed video signal, due to the image processing circuits it contains. If the video-wall displays the image of an interviewee who is present in the news studio, the delay between the interviewee's direct image and the one on the video-wall may be objectionable. As another example, radiocameras are often used in live broadcasts. The radiocamera signal generally has to go through a frame synchronizer and perhaps a color corrector, thus suffering some delay. When the radiocamera signal is mixed with a signal coming directly from other cameras that shoot the same event, the lip-sync error may be noticeable.

8 User requirements for operational monitoring
The following requirements are derived from ITU-T J.243 in the light of specific issues in this Recommendation.

8.1 Operational aspect
1) Capability of in-service monitoring;
2) Applicability to the video formats in use such as SDTV and HDTV;
3) Applicability to the numbers of audio channels in use;
4) Applicability to the coding bit rates in use, irrespective of variable bit rate (VBR) or constant bit rate (CBR);
5) Applicability to the transmission bit rates in use;
6) Applicability to the coding parameters and tools (e.g., profile/level, picture structure, range of motion vectors) in use;
7) Applicability to different signal processing such as compression coding, standards conversion, aspect ratio c