INTERNATIONAL STANDARD ISO/IEC 11172-1
First edition 1993-08-01
Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 1: Systems
Technologies de l'information - Codage de l'image animée et du son associé pour les supports de stockage numérique jusqu'à environ 1,5 Mbit/s - Partie 1: Systèmes

By contrast, the semantic rules apply to the combined stream in its entirety. The systems specification does not specify the architecture or implementation of encoders or decoders. However, bitstream properties do impose functional and
performance requirements on encoders and decoders. For instance, encoders must meet minimum clock tolerance requirements. Notwithstanding this and other requirements, a considerable degree of freedom exists in the design and implementation of encoders and decoders. A prototypical audio/video decoder
system is depicted in figure 1 to illustrate the function of an ISO/IEC 11172 decoder. The architecture is not unique (System Decoder functions, including decoder timing control, might equally well be distributed among elementary stream decoders and the Medium Specific Decoder), but this figure is useful for discussion. The prototypical decoder design does not imply

|x| = x when x > 0; |x| = 0 when x == 0; |x| = -x when x < 0.
Sign(x) = 1 when x > 0; 0 when x == 0; -1 when x < 0.
>    Greater than.
>=   Greater than or equal to.
>>   Shift right with sign extension.
<<   Shift left with zero fill.

2.2.5 Assignment
=    Assignment operator.

2.2.6 Mnemonics
The following mnemonics are defined
to describe the different data types used in the coded bitstream.

bslbf

equal to 1 for single-channel mode, 2 in other modes. (Audio)
Granule of 3 * 32 subband samples in audio Layer II, 18 * 32 subband samples in audio Layer III. (Audio)
The main data portion of the bitstream contains the scalefactors, Huffman encoded data, and ancillary information. (Audio)
The location in the bitstream of the beginning of the main data for the frame. The location is equal to the ending location of the previous frame's main data plus one bit. It is calculated from the main

for ( expr1; expr2; expr3 ) {
    data_element
    . . .
}

expr1 is an expression specifying the initialization of the loop. Normally it specifies the initial state of the counter. expr2 is a condition specifying a test made before each iteration of the loop. The loop terminates when the condition is not true. expr3 is an expression that is performed
at the end of each iteration of the loop; normally it increments a counter.

Note that the most common usage of this construct is as follows:

for ( i = 0; i < n; i++ ) {
    data_element
    . . .
}

The group of data elements occurs n times. Conditional constructs within the group of data elements may depend on the value of the loop control variable i, which is set to zero for the first occurrence, incremented to one for the second occurrence, and so forth.

As noted, the group of data elements may contain nested conditional constructs. For compactness, the { } may be omitted when only one data element follows.

data_element[]          data_element[] is an array of data. The number of data elements is indicated by the context.
data_element[n]         data_element[n] is the n+1th element of an array of data.
data_element[m][n]      data_element[m][n] is the m+1,n+1 th element of a two-dimensional array of data.
data_element[l][m][n]   data_element[l][m][n] is the l+1,m+1,n+1 th element of a three-dimensional array of data.
data_element[m..n]      data_element[m..n] is the inclusive range of bits between bit m and bit n in the data_element.

While the syntax is expressed in procedural terms, it should not be assumed that 2.4.3 implements a satisfactory decoding procedure. In
particular, it defines a correct and error-free input bitstream. Actual decoders must include a means to look for start codes in order to begin decoding correctly, and to identify errors, erasures or insertions while decoding. The methods to identify these situations, and the actions to be taken, are not standardized.

Definition of bytealigned function
The function bytealigned() returns 1 if the current position is on a byte boundary, that is, the next bit in the bitstream is the first bit in a byte. Otherwise it returns 0.

Definition of nextbits function
The function nextbits() permits comparison of a bit string with the next bits to be decoded in the bitstream.

Definition of next_start_code function
The next_start_code function removes any zero bit and zero byte stuffing and locates the next start code.

Syntax                                                        No. of bits   Mnemonic
next_start_code() {
    while ( !bytealigned() )
        zero_bit                                                   1        “0”
    while ( nextbits() != '0000 0000 0000 0000 0000 0001' )
        zero_byte                                                  8        “00000000”
}

This function checks whether the current position is byte aligned. If it is not, zero stuffing bits are present. After that, any number of zero bytes may be present before the start code. Therefore start codes are always byte aligned and may be preceded by any number of zero stuffing bits.

2.4 Requirements

2.4.1 Coding structure and parameters

The system coding layer allows one or more elementary streams to be combined into a single stream. Data from each elementary stream are
multiplexed and encoded together with information that allows elementary streams to be replayed in synchronism.

ISO/IEC 11172 multiplexed stream
An ISO/IEC 11172 stream consists of one or more elementary streams multiplexed together. Each elementary stream consists of access units, which are the coded representation of presentation units.

The presentation unit for a video elementary stream is a picture. The corresponding access unit includes all the coded data for the picture. The access unit containing the first coded picture of a group of pictures also includes any preceding data from that group of pictures, as defined in 2.4.2.4 in ISO/IEC 11172-2, starting with the group_start_code. The access unit containing the first coded picture after a sequence header, as defined in 2.4.2.3 in part 2, also includes that sequence header. The sequence_end_code is included in the access unit containing the last coded picture of a sequence. (See 2.4.2.2 in ISO/IEC 11172-2 for the definition of the sequence_end_code.)

The presentation unit for an audio elementary stream is the set of samples that corresponds to samples from an audio frame (see 2.4.3.1, 2.4.2.1, and 2.4.2.2 in ISO/IEC 11172-3 for the definition of an audio frame).

Data from elementary streams is stored in packets. A packet consists of a packet header followed by packet data. The packet header begins with a 32-bit start code that also identifies the stream to which the packet data belongs. The packet header may contain decoding and/or presentation time stamps (DTS and PTS) that refer to the first access unit that commences in the packet. The packet data contains a variable number of contiguous bytes from one elementary stream.

Packets are organised in packs. A pack commences with a pack header and is followed by zero or
more packets. The pack header begins with a 32-bit start code. The pack header is used to store timing and bitrate information.

The stream begins with a system header that optionally may be repeated. The system header carries a summary of the system parameters defined in the stream.

2.4.2 System target decoder

The semantics of the multiplexed stream specified in 2.4.4 and the constraints on these semantics specified in 2.4.5 require exact definitions of decoding events and the times at which these events occur. The definitions needed are set out in this International Standard using a hypothetical decoder known as the system target decoder (STD).

The STD is a conceptual model used to define these terms precisely and to model the decoding process during the construction of ISO/IEC 11172 streams. The STD is defined only for this purpose. Neither the architecture of the STD nor the timing described precludes uninterrupted, synchronized playback of ISO/IEC 11172 multiplexed streams from a variety of decoders with different architectures or timing schedules.

Figure 2 - Diagram of system target decoder

The following notation is used to describe the system target decoder and is partially illustrated in figure 2.

i, i' are indices to bytes in the ISO/IEC 11172 multiplexed stream. The first byte has index 0.
j is an index to access units in the elementary streams.
k, k', k'' are indices to presentation units in the
elementary streams.
n is an index to the elementary streams.
M(i) is the i-th byte in the ISO/IEC 11172 multiplexed stream.
tm(i) indicates the time in seconds at which the i-th byte of the ISO/IEC 11172 multiplexed stream enters the system target decoder. The value tm(0) is an arbitrary constant.
SCR(i) is the time encoded in the SCR field, measured in units of the 90 kHz system clock.
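As a minimal sketch (Python, not part of this standard), the following shows how a multiplexer might turn an arrival time tm(i) in seconds into the 33-bit SCR count of 90 kHz ticks, using the relation SCR(i) = NINT(system_clock_frequency * tm(i)) % 2^33 given for the system_clock_reference field in 2.4.4.2, and how that count could be split into the three separate fields in which it is coded. The function names, the 3 + 15 + 15 bit split (most significant bits first) and the assumption that NINT rounds half-integers away from zero are choices of this sketch, not quotations from the text above.

```python
import math

SYSTEM_CLOCK_FREQUENCY = 90_000  # Hz: the 90 kHz system clock
SCR_MODULUS = 1 << 33            # SCR values wrap modulo 2^33


def nint(x: float) -> int:
    """Nearest integer; half-integers rounded away from zero (assumed NINT behaviour)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def scr_from_arrival_time(tm_seconds: float) -> int:
    """SCR(i) = NINT(system_clock_frequency * tm(i)) % 2^33."""
    return nint(SYSTEM_CLOCK_FREQUENCY * tm_seconds) % SCR_MODULUS


def split_scr(scr: int) -> tuple[int, int, int]:
    """Split a 33-bit SCR into bits [32..30], [29..15] and [14..0] (assumed layout)."""
    if not 0 <= scr < SCR_MODULUS:
        raise ValueError("SCR must fit in 33 bits")
    return (scr >> 30) & 0x7, (scr >> 15) & 0x7FFF, scr & 0x7FFF


def join_scr(high: int, mid: int, low: int) -> int:
    """Reassemble the three fields into one 33-bit SCR value."""
    return (high << 30) | (mid << 15) | low


if __name__ == "__main__":
    scr = scr_from_arrival_time(1.5)  # a byte arriving 1.5 s after the time origin
    print(scr, split_scr(scr))        # prints: 135000 (0, 4, 3928)
```

Keeping the SCR as a plain integer and splitting it only when a pack header is written keeps the modulo-2^33 arithmetic in one place; a decoder reading the three fields would apply the inverse, join_scr, before comparing the value with its own clock.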
2.4.4 Semantic definition of fields in syntax

2.4.4.1 ISO/IEC 11172 Layer

iso_11172_end_code - The iso_11172_end_code is the bit string “0000 0000 0000 0000 0000 0001 1011 1001” (000001B9 in hexadecimal). It terminates the ISO/IEC 11172 multiplexed stream.

2.4.4.2 Pack Layer

Pack

pack_start_code - The pack_start_code is the bit string “0000 0000 0000 0000 0000 0001 1011 1010” (000001BA in hexadecimal). It identifies the beginning of a pack.

system_clock_reference - The system_clock_reference (SCR) is a 33-bit number coded in three separate fields. It indicates the intended time of arrival of the last byte of the system_clock_reference field at the input of the system target decoder. The value of the SCR is measured in the number of periods of a 90 kHz system clock with a tolerance specified in 2.4.2. Using the notation of 2.4.2, the value encoded in the system_clock_reference is:

SCR(i) = NINT(system_clock_frequency * tm(i)) % 2^33

for i such that M(i) is the last byte of the coded system_clock_reference field.

marker_bit - A marker_bit is a one-bit field that has the
value “1”.

mux_rate - This is a positive integer specifying the rate at which the system target decoder receives the ISO/IEC 11172 multiplexed stream during the pack in which it is included. The value of mux_rate is measured in units of 50 bytes/s, rounded upwards. The value zero is forbidden. The value represented in mux_rate is used to define the time of arrival of bytes at the input to the system target decoder in 2.4.2. The value encoded in the mux_rate field may vary from pack to pack in an ISO/IEC 11172 multiplexed stream.

System Header

system_header_start_code - The system_header_start_code is the bit string “0000 0000 0000 0000 0000 0001 1011 1011” (000001BB in hexadecimal). It identifies the beginning of a system header.

header_length - The header_length shall be equal to the number of bytes in the system header following the header_length field. Note that future extensions of
this part of ISO/IEC 11172 may extend the system header.

rate_bound - The rate_bound is an integer value greater than or equal to the maximum value of the mux_rate field coded in any pack of the ISO/IEC 11172 multiplexed stream. It may be used by a decoder to assess whether it is capable of decoding
the entire stream.

audio_bound - The audio_bound is an integer, in the inclusive range from 0 to 32, greater than or equal to the maximum number of ISO/IEC 11172 audio streams in the ISO/IEC 11172 multiplexed stream for which the decoding processes are simultaneously active. For the purpose of this
clause, the decoding process of an MPEG audio stream is active if the STD buffer is not empty, or if the decoded access unit is being presented in the STD model.

fixed_flag - The fixed_flag is a one-bit flag. If its value is set to “1”, fixed bitrate operation is indicated. If its value is set to “0”,
variable bitrate operation is indicated. During fixed bitrate operation, the value encoded in all system_clock_reference fields in the multiplexed ISO/IEC 11172 stream shall adhere to the following linear equation:

SCR(i) = NINT(c1 * i + c2) % 2^33

where
c1 is a real-valued constant valid for all i;
c2 is a real-valued constant valid for all i;
i is the index in the ISO/IEC 11172 multiplexed stream of the final byte of any system_clock_reference field in the stream.

CSPS_flag - The CSPS_flag is a one-bit flag. If its value is set to “1”, the ISO/IEC 11172 multiplexed stream meets the constraints defined in 2.4.6.

system_audio_lock_flag - The system_audio_lock_flag is a one-bit flag indicating that there is a specified, constant rational relationship between the audio sampling rate and the system clock frequency in the system target decoder. Subclause 2.4.2 defines system_clock_frequency, and the audio sampling rate is specified in ISO/IEC 11172-3. The system_audio_lock_flag may only be set to “1” if, for all presentation units in all audio elementary streams in the ISO/IEC 11172 multiplexed stream, the ratio of system_clock_frequency to the actual audio sampling rate, SCASR, is constant and equal to the value indicated in the following table at the nominal sampling rate indicated in the audio stream.

SCASR = system_clock_frequency / actual audio sampling rate

else BSn = STD_buffer_size_bound * 1024;

2.4.4.3 Packet Layer

packet_start_code_prefix - The packet_start_code_prefix is a 24-bit code. Together with the stream_id that follows, it constitutes a packet start code that identifies the beginning of a packet. The packet_start_code_prefix is the bit string “0000 0000 0000 0000 0000 0001” (000001 in hexadecimal).

stream_id - The stream_id specifies the type and number of the elementary stream as defined by the stream_id table, table 1 in 2.4.4.2. Each elementary stream in an ISO/IEC 11172 multiplexed stream shall have a unique stream_id.

packet_length - The packet_length specifies the number of bytes remaining in the packet after the packet_length field.

stuffing_byte - This is a fixed 8-bit value equal to “1111 1111” that can be inserted by the encoder, for example to meet the requirements of the digital storage medium. It is discarded by the decoder. No more than sixteen stuffing bytes shall be present in one packet header.

STD_buffer_scale - The STD_buffer_scale is a one-bit field that indicates the scaling factor used to interpret the subsequent STD_buffer_size field. If the preceding stream_id indicates an audio stream,