

Swaminathan Sridhar
Multimedia Processing Lab
University of Texas at Arlington

MULTIPLEXING OF AVS CHINA PART 2 VIDEO WITH AAC BIT STREAMS AND DE-MULTIPLEXING WITH LIP SYNC WHILE PLAYBACK

Thesis outline
What is multiplexing?
Applications of multiplexing.
The need for choosing AVS video and AAC audio codecs.
Video and audio elementary stream formats.
Multiplexing process.
De-multiplexing process.
Lip synchronization during playback.
Results and conclusions.
Future work.
References.

What is multiplexing?
A multimedia program is a combination of multiple elementary streams such as video and audio.

Multiplexing is the process of converting multiple elementary streams, such as video and audio streams, into a single transport stream for transmission.
It conserves transmission channels.

Applications of multiplexing
Multiplexing is used in application areas such as:
ATSC
DVB-T
DVB-S
DVB-H
IPTV

The digital transmission/reception process adopted in the ATSC standard [22]

The need for video and audio compression
With the advent of high-definition television transmission schemes, high-quality video and audio data are transmitted, which occupy a lot of bandwidth on a transmission channel.
To address this issue, the video and audio data are compressed using efficient compression schemes such as the AVS China video codec and the AAC audio codec.

Why AVS China video?
AVS (audio video coding standard) China is the latest digital video coding standard developed by the AVS work group of China.

The AVS video codec employs the latest video coding tools and primarily targets standard definition (SD) and high definition (HD) video compression.
Compared to previous video coding standards such as MPEG-2 and MPEG-4, AVS achieves the same video quality at significantly lower bit rates, or higher quality at the same bit rate.

Overview of AVS China standard [5]

AVS-video profiles & their applications [4]

Coding tools of AVS part 2 video codec
Intra prediction: 8x8 block-based intra prediction. Five modes are specified for the luminance component (DC, horizontal, vertical, down left and down right) and four modes for the chrominance component (DC, horizontal, vertical and plane).
Motion compensation: 16x16, 16x8, 8x16 and 8x8 block sizes.
Motion vector resolution: quarter-pixel accuracy with a 4-tap interpolation filter.
Transform: 8x8 integer cosine transform.
Quantization and scaling: scaling is performed only in the encoder.
Entropy coding: context-based 2D-VLC.
De-blocking filter: applied around the 8x8 block boundaries.

AVS China part 2 video encoder [2]

AVS video decoder [12]

AVS video encoded bit stream format
Start code: consists of a start code prefix followed by a start code value.
Start code prefix: a byte-aligned string of 23 zero bits followed by a single bit with a value of 1, i.e. 0x000001. It is followed by the start code value.
Start code value: an 8-bit integer that identifies the start code type.

Start code types & start code values used in the AVS-video bit stream [8]

Picture coding type used in AVS-video bit stream
Pb_picture_start_code: the bit string 0x000001B6, which indicates the start code of a P or B picture.
Picture_coding_type: a 2-bit unsigned integer that specifies the coding type of a picture, as shown in Table 1.
Table 1: Coding type of a picture [8]

NAL unit
NAL unit stands for network abstraction layer unit, a type of packetization that prefixes certain headers to the encoded video bit stream.
It was designed to provide a network-friendly environment for the transmission of video data.
It mainly addresses video-related applications such as video telephony, video storage, broadcast and streaming applications, IPTV, etc.
The syntax of the NAL unit is defined in the H.264 standard; the AVS part 2 standard does not define any syntax for the NAL unit.

NAL unit mapping with the encoded AVS video stream
The basic syntax of the NAL unit is shown in Figure 1.
Figure 1: NAL unit syntax [13]
A NAL unit consists of an 8-bit header followed by the payload.
To map the AVS video stream to NAL units, the data between every two start code prefixes (0x000001) in the AVS video stream is mapped into one NAL unit, which includes the start code value but not the start code prefix. A 1-byte header is then added before the start code value.

NAL unit header description
The 8-bit header consists of the following parameters.
Forbidden_zero_bit: a 1-bit value that is always 0.
Nal_ref_idc: a 2-bit unsigned integer. It indicates the priority of the type of data carried in the NAL unit, based on the start code type. This value should not be zero for I frames.
Nal_unit_type: a 5-bit unsigned integer, so 32 types of NAL units are allowed. This value indicates the type of data carried in the NAL payload.

NAL unit type according to the start code values [14]

Why AAC audio?
The AAC codec showed superior performance at both low and high bit rates compared to MP3 and AC3.
It supports up to 48 audio channels with a wide variety of sampling frequencies from 8 kHz to 96 kHz.
It was the first codec to achieve ITU-R broadcast quality at a bit rate of 128 kb/s for stereo.
Its encoding efficiency is nearly 30% higher than that of MP3 (MPEG-1/2 audio layer 3).

AAC audio
Advanced audio coding (AAC) is a standardized lossy compression scheme for coding digital audio.
It has been standardized by ISO/IEC as part 7 of the MPEG-2 standard and part 3 of the MPEG-4 standard.
AAC profiles:
Main profile: provides the highest audio quality and is the most complex.
Low-complexity profile: achieves nearly the same audio quality as the main profile but with significant savings in memory and processing requirements.
Scalable sampling rate profile: provides flexibility for scalable and low-complexity applications. It is more appropriate in applications where bandwidth is a constraint.

AAC audio stream format
ADIF (Audio Data Interchange Format): this format uses only one header at the beginning of the file, followed by the raw audio data blocks. It is generally used for storage applications.
ADTS (Audio Data Transport Stream): this format uses a separate header for each frame, which enables decoding from any frame. It is mainly used for transport applications.

ADTS header format [18]

Factors to be considered for multiplexing and transmission
The coded audio and video bit streams are split into smaller data packets.
The frame-wise arrangement of the coded video and audio streams helps in forming small data packets.
While multiplexing, equal priority is given to all the elementary streams.
Additional information is included to help synchronize the audio and video at the de-multiplexer.

Packetization
Two layers of packetization that conform to the MPEG-2 systems standard are adopted for multiplexing:
PES: Packetized Elementary Stream layer
TS: Transport Stream layer

Packetized elementary streams (PES)
Elementary streams (ES) are composed of:
Encoded video (AVS) stream
Encoded audio (AAC) stream
Optional data stream
A PES contains the access units (frames), which are sequentially separated and packetized.
PES headers differentiate the various ES and contain time stamp information used to synchronize the video and audio streams at the de-multiplexer.
PES packet sizes vary with the size of each access unit.
Each PES can have data from only one ES.

Packetized elementary streams (PES) [22]

PES header description
3 bytes of start code (0x000001).
1 byte of stream ID (unique for each ES).
2 bytes of packet length.
2 bytes of time stamp (frame number).

Frame number as time stamp
For video PES: since the video frame rate is constant (either 25 or 30 frames per second), the playback time of a particular frame can be calculated from the frame number as
Playback time = frame number / fps
For audio PES: since the input sampling frequency is constant (between 8 kHz and 96 kHz) and the number of samples per AAC frame is 1024, the playback time of a particular audio frame can be calculated from the frame number as
Playback time = 1024 * frame number * (1 / sampling frequency)

Method adopted in MPEG-2 systems standard for time stamps
Audio-video synchronization is achieved using the presentation time stamp (PTS).
The encoder attaches a PTS to each video and audio frame. The PTS is a 33-bit value in cycles of a 90 kHz system time clock (STC).
Additional information known as the program clock reference (PCR), which is the value of the STC at the encoder, is periodically transmitted to achieve exact synchronization.

Advantages of using frame number as time stamp over the existing method that uses clock samples as time stamp
It is less complex and is suitable for software implementation.
There is no synchronization problem due to clock jitter.
There is no propagation of delay between the audio and video streams.
It saves the extra overhead of the PES header bytes used for sending the PCR.

Transport stream packetization
PES packets formed from the various elementary streams are broken into smaller packets known as transport stream (TS) packets.
Transport stream packets have a fixed length of 188 bytes.
One of the reasons for choosing this TS packet size is interoperability with ATM: each MPEG-2 TS packet is broken down into 4 ATM packets.
Constraints:
Each TS packet can have data from only one PES.
The PES header should be the first byte of the TS payload.
Stuffing bytes are added where the PES data alone cannot satisfy these constraints.

Transport stream [22]

TS packet header [22], [25]

TS packet header description
Sync byte: a TS packet always starts with a sync byte of 0x47.
Payload unit start indicator: this bit is set to indicate that the first byte of the PES packet is present in the payload of the current TS packet.
Adaptation field control (AFC): this bit is set if the TS packet payload carries data other than PES data. This can be a stretch of stuffing bytes when the length of the PES data is less than 185 bytes.
Packet identifier (PID): a 10-bit packet identifier value used to uniquely identify the video and audio ES. Some PID values are pre-defined; for example, a PID value of 0x1FFF indicates a null TS packet, which is sent at regular intervals to maintain an overall constant bit rate.
Continuity counter: a 4-bit counter that is incremented by one every time data from the same PES is encapsulated into a TS packet.
Payload byte offset: if AFC is set to 1, the byte offset of the start of the payload is given here.

Adopted multiplexing method
The multiplexing method plays an important role in avoiding buffer overflow or underflow at the de-multiplexing end.
Video and audio timing counters are used to ensure effective multiplexing of the TS packets.
The timing counters are incremented according to the playback time of each TS packet.
The packet with the least timing counter value is always given preference during packet allocation.

Calculating playback time of each TS packet

Multiplexed transport stream [22]

De-multiplexing process

Buffer fullness at the de-multiplexer using the adopted method

Synchronization and playback
The data is loaded from the buffer during playback.
An IDR frame is searched for from the start of the video buffer.
The frame number of the IDR frame is extracted.
The playback time of the current IDR frame is calculated as
Video playback time = IDR frame number / fps
The corresponding audio frame number is calculated as
Audio frame number = (Video playback time * sampling frequency) / 1024
If the result is not an integer, the audio frame number is rounded off and the corresponding audio frame is searched for in the audio buffer.
The audio and video contents from the corresponding frame numbers are decoded and played back.
The audio and video buffers are then refreshed, a new set of data is loaded into the buffers, and the process continues.
If the corresponding audio frame is not found in the buffer, the next IDR frame is searched for and the same process is repeated.

Results

Synchronization results

Conclusions
Audio-video synchronization is achieved when the de-multiplexer is started from any TS packet.
Visually, there is no lag between the video and the audio.
The buffer fullness at the de-multiplexer end is continuously monitored, and buffer overflow or underflow is prevented by the adopted multiplexing method.

Test conditions
Input raw video: YUV format.
Input raw audio: WAVE format.
Profiles used:
AVS: Jizhun (baseline) profile.
AAC: low-complexity profile with ADTS format.
GOP: IBBPBB (IDR forced).
Video frame rate: 25 frames per second.
Audio sampling frequency: 44.1 kHz.
A single-program TS is generated.

Future work
The algorithm can be extended to support additional elementary streams, for example to include subtitles during playback.
The proposed algorithm can also be modified to support elementary streams from different video and audio codecs, depending on their NAL and ADTS formats respectively.
The adopted method can also be extended with error-resilient codes for transmission of a multimedia program over error-prone networks.

References
[1] L. Yu et al., “Overview of AVS-Video: tools, performance and complexity”, SPIE VCIP, vol. 5960, pp. 596021-1 to 596021-12, Beijing, China, July 2005.
[2] W. Gao et al., “AVS- The Chinese next-generation video coding standard”, National Association of Broadcasters, Las Vegas, 2004.
[3] X. Wang et al., “Performance comparison of AVS and H.264/AVC video coding standards”, J. Computer Science & Technology, vol. 21, no. 3, pp. 310-314, May 2006.
[4] L. Yu et al., “Overview of AVS-video coding standards”, Special issue on AVS standards, Signal Processing: Image Communication, vol. 24, pp. 247-262, April 2009.
[5] AVS Work Group website http:/
[6] R. A. Burger et al., “A survey of digital TV standards China”, IEEE Second International Conference on Communications and Networking in China, pp. 687-696, Aug. 2007.
[7] L. Fan et al., “Overview of AVS video standard”, in Proceedings of IEEE Intl. Conf. on Multimedia and Expo, ICME 04, vol. 1, pp. 423-426, Taipei, Taiwan, June 2004.
[8] Information Technology, Advanced coding of audio and video, Part 2: Video, The standards of People's Republic of China, GB/T 20090.2-2006.
[9] I. E. G. Richardson, “The H.264 advanced video compression standard”, 2nd ed., Wiley, 2010.
[10] C. X. Zhang et al., “The technique of pre-scaled transform”, IEEE Intl. Symposium on Circuits and Systems, vol. 1, pp. 316-319, May 2005.
[11] Q. Wang et al., “Context-based 2D-VLC entropy coder in AVS video coding standard”, J. Computer Science & Technology, vol. 21, no. 3, pp. 315-322, May 2006.
[12] H. Jia et al., “An AVS HDTV video decoder architecture employing efficient HW/SW partitioning”, IEEE Trans. on Consumer Electronics, vol. 52, pp. 1447-1453, Nov. 2006.
[13] T. Wiegand et al., “Overview of the H.264/AVC video coding standard”, IEEE Trans. on CSVT, vol. 13, pp. 560-576, July 2003.
[14] GB 20090.2 RTP Payload Format, FG IPTV-0512, International Telecommunications Union, May 2007.
[15] Information Technology, Generic coding of moving pictures and associated audio: Systems, International Standard 13818-1, ISO/IEC JTC1/SC29/WG11 N0801, 1994.
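
As an illustration of the slides on NAL unit mapping and the NAL unit header, the following is a minimal Python sketch of splitting an AVS video stream at its start code prefixes and prepending the 1-byte header. The function names are hypothetical, the bit order (forbidden_zero_bit in the most significant bit, then nal_ref_idc, then nal_unit_type) is assumed from the order in which the slide lists the fields, and the nal_ref_idc and nal_unit_type values must be supplied by the caller because the start-code-to-type table is not reproduced in this text.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def split_on_start_codes(stream: bytes) -> list[bytes]:
    """Return the data after each start code prefix (start code value first).

    Simplification: extra zero bytes before a prefix are left at the end of
    the preceding unit rather than being stripped.
    """
    units = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        begin = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, begin)
        end = nxt if nxt != -1 else len(stream)
        units.append(stream[begin:end])
        pos = nxt
    return units


def nal_header(nal_ref_idc: int, nal_unit_type: int) -> int:
    """Pack the 8-bit header: 1 bit (always 0) + 2 bits + 5 bits (assumed order)."""
    if not 0 <= nal_ref_idc < 4:
        raise ValueError("nal_ref_idc is a 2-bit value")
    if not 0 <= nal_unit_type < 32:
        raise ValueError("nal_unit_type is a 5-bit value")
    return (0 << 7) | (nal_ref_idc << 5) | nal_unit_type


def to_nal_unit(unit: bytes, nal_ref_idc: int, nal_unit_type: int) -> bytes:
    """Prepend the 1-byte header to one unit (start code value + data)."""
    return bytes([nal_header(nal_ref_idc, nal_unit_type)]) + unit
```

Each element returned by split_on_start_codes begins with the start code value, so a caller can inspect unit[0] to choose the header fields before calling to_nal_unit.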
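
The 8-byte layout listed on the "PES header description" slide can be sketched with Python's struct module. This is only an illustration under stated assumptions: the function names and the example stream ID are hypothetical, the fields are assumed to be big-endian, the packet length is assumed to count payload bytes only, and the frame number is wrapped to 16 bits because the slide allots it 2 bytes.

```python
import struct

PES_START_CODE = b"\x00\x00\x01"
# 3-byte start code, 1-byte stream ID, 2-byte length, 2-byte frame number.
PES_HEADER = struct.Struct(">3sBHH")


def build_pes_packet(stream_id: int, frame_number: int, access_unit: bytes) -> bytes:
    """Wrap one access unit (frame) in a PES packet with the 8-byte header."""
    if len(access_unit) > 0xFFFF:
        raise ValueError("access unit does not fit a 2-byte length field")
    header = PES_HEADER.pack(
        PES_START_CODE,
        stream_id,
        len(access_unit),        # assumption: length excludes the header
        frame_number & 0xFFFF,   # assumption: frame number wraps at 65536
    )
    return header + access_unit


def parse_pes_packet(packet: bytes) -> tuple[int, int, bytes]:
    """Return (stream_id, frame_number, payload) from a PES packet."""
    start, stream_id, length, frame_number = PES_HEADER.unpack_from(packet)
    if start != PES_START_CODE:
        raise ValueError("missing PES start code")
    payload = packet[PES_HEADER.size:PES_HEADER.size + length]
    return stream_id, frame_number, payload
```

For example, build_pes_packet(0xE0, 300, data) would tag a video frame with frame number 300, where 0xE0 is merely a placeholder stream ID.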
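
The playback time formulas from the "Frame number as time stamp" slide and the audio frame lookup from the "Synchronization and playback" slide can be checked numerically. The sketch below uses hypothetical function names and assumes that "rounded off" means rounding to the nearest integer.

```python
import math

SAMPLES_PER_AAC_FRAME = 1024


def video_playback_time(frame_number: int, fps: float) -> float:
    """Playback time = frame number / fps (seconds)."""
    return frame_number / fps


def audio_playback_time(frame_number: int, sampling_frequency: float) -> float:
    """Playback time = 1024 * frame number / sampling frequency (seconds)."""
    return SAMPLES_PER_AAC_FRAME * frame_number / sampling_frequency


def matching_audio_frame(idr_frame_number: int, fps: float,
                         sampling_frequency: float) -> int:
    """Audio frame number closest to the playback time of an IDR frame."""
    t = video_playback_time(idr_frame_number, fps)
    exact = t * sampling_frequency / SAMPLES_PER_AAC_FRAME
    return math.floor(exact + 0.5)  # assumption: round to nearest integer
```

With the test conditions given in the slides (25 frames per second, 44.1 kHz), an IDR frame numbered 50 plays at 2.0 s, and 2.0 * 44100 / 1024 is about 86.13, so audio frame 86 is selected. That audio frame starts at about 1.997 s, roughly 3 ms before the video frame.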
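
The TS packet fields described on the "TS packet header description" slide add up to a 3-byte header (8 + 1 + 1 + 10 + 4 bits), which matches the 185-byte payload limit the slide mentions. The following sketch splits one PES packet into 188-byte TS packets on that basis. The bit order within the header, the placement of stuffing bytes before the payload, the 0xFF stuffing value, and the convention that the payload byte offset is counted from the start of the packet are all assumptions made for illustration, as are the function names.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 3
MAX_PAYLOAD = TS_PACKET_SIZE - TS_HEADER_SIZE  # 185 bytes


def ts_header(pusi: bool, afc: bool, pid: int, counter: int) -> bytes:
    """Sync byte, then PUSI (1 bit), AFC (1 bit), PID (10 bits), counter (4 bits)."""
    if not 0 <= pid < (1 << 10):
        raise ValueError("PID is a 10-bit value in this scheme")
    flags = (int(pusi) << 15) | (int(afc) << 14) | (pid << 4) | (counter & 0xF)
    return bytes([0x47]) + flags.to_bytes(2, "big")


def packetize(pes_packet: bytes, pid: int, counter: int = 0) -> list[bytes]:
    """Split one PES packet into fixed-length TS packets."""
    packets = []
    for i in range(0, len(pes_packet), MAX_PAYLOAD):
        chunk = pes_packet[i:i + MAX_PAYLOAD]
        pusi = i == 0  # the PES header is at the start of the first payload
        if len(chunk) == MAX_PAYLOAD:
            packet = ts_header(pusi, False, pid, counter) + chunk
        else:
            # Short chunk: set AFC, write the payload byte offset, then stuff.
            offset = TS_PACKET_SIZE - len(chunk)
            stuffing = b"\xff" * (offset - TS_HEADER_SIZE - 1)
            packet = (ts_header(pusi, True, pid, counter)
                      + bytes([offset]) + stuffing + chunk)
        packets.append(packet)
        counter = (counter + 1) & 0xF  # 4-bit continuity counter wraps
    return packets
```

Because every TS packet produced here carries data from a single PES packet, the constraint that a TS packet holds data from only one PES is satisfied by construction.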
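
The timing-counter rule from the "Adopted multiplexing method" slide (the stream with the least timing counter value is served first) can be sketched as a simple merge. The formula for the playback time of each TS packet is not reproduced in this text, so the sketch assumes that a frame's duration is shared equally among the TS packets carrying that frame; this assumption, the tie-break in favour of video, and the function names are illustrative only.

```python
from collections import deque

SAMPLES_PER_AAC_FRAME = 1024


def schedule(frames, frame_duration):
    """Pair each TS packet with its assumed share of the frame's playback time."""
    return deque(
        (packet, frame_duration / len(packets))
        for packets in frames
        for packet in packets
    )


def multiplex(video_frames, audio_frames, fps, sampling_frequency):
    """Interleave TS packets; each argument is a list of per-frame packet lists."""
    queues = [
        schedule(video_frames, 1 / fps),
        schedule(audio_frames, SAMPLES_PER_AAC_FRAME / sampling_frequency),
    ]
    counters = [0.0, 0.0]  # video and audio timing counters, in seconds
    output = []
    while any(queues):
        # Among streams with packets left, serve the least timing counter.
        index = min(
            (i for i, queue in enumerate(queues) if queue),
            key=lambda i: counters[i],
        )
        packet, playback_time = queues[index].popleft()
        counters[index] += playback_time
        output.append(packet)
    return output
```

With two video frames of four TS packets each at 25 frames per second and three single-packet audio frames at 44.1 kHz, the audio packets are spread between the video packets instead of being grouped at either end, which is the behaviour the slide relies on to avoid buffer overflow or underflow.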
