SMPTE RP 2058-4:2011

SMPTE RECOMMENDED PRACTICE

VC-4 Bitstream Storage in the ISO Base Media File Format

Page 1 of 14 pages

Copyright 2011 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
3 Barker Avenue, White Plains, NY 10601 (914) 761-1100

Approved October 3, 2011

Table of Contents

Foreword
Intellectual Property
Introduction
1 Scope
2 Conformance Notation
3 Normative References
4 Notation
4.1 Abbreviations
4.2 Definition of Terminology
5 File Identification
6 Track Definition
7 Layer Definition
7.1 Layer Table Box
7.2 Layer Information Box
8 Video Stream Definition
8.1 Base Track
8.2 VC-4 Track
Annex A Bibliography (Informative)

Table of Tables

Table 1 Scalability field

Foreword

SMPTE (the Society of Motion Picture and Television Engineers) is an internationally-recognized standards developing organization. Headquartered and incorporated in the United States of America, SMPTE has members in over 80 countries on six continents. SMPTE's Engineering Documents, including Standards, Recommended Practices, and Engineering Guidelines, are prepared by SMPTE's Technology Committees. Participation in these Committees is open to all with a bona fide interest in their work. SMPTE cooperates closely with other standards-developing organizations, including ISO, IEC and ITU.

SMPTE Engineering Documents are drafted in accordance with the rules given in Part XIII of its Administrative Practices.

SMPTE RP 2058-4 was prepared by Technology Committee 10E on Essence.

Intellectual Property

SMPTE draws attention to the fact that it is claimed that compliance with this Recommended Practice may involve the use of one or more patents or other intellectual property rights (collectively, “IPR”). The Society takes no position concerning the evidence, validity, or scope of this IPR.

Each holder of claimed IPR has assured the Society that it is willing to License all IPR it owns, and any third party IPR it has the right to sublicense, that is essential to the implementation of this Recommended Practice to those (Members and non-Members alike) desiring to implement this Recommended Practice under reasonable terms and conditions, demonstrably free of discrimination. Each holder of claimed IPR has filed a statement to such effect with SMPTE. Information may be obtained from the Director, Standards …

… or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain possibility or course of action is deprecated but not prohibited. The keywords “may” and “need not” indicate courses of action permissible within the limits of the
document. The keyword “reserved” indicates a provision that is not defined at this time, shall not be used, and may be defined in the future. The keyword “forbidden” indicates “reserved” and in addition indicates that the provision will never be defined in the future.

Unless otherwise specified, the order of precedence of the types of normative information in this document shall be as follows: Normative prose shall be the authoritative definition; Tables shall be next; followed by formal languages; then figures; and then any other language forms.

3 Normative References

The following standards contain provisions which, through reference in this text, constitute provisions of this recommended practice. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this recommended practice are encouraged to investigate the possibility of applying the most recent edition of the standards indicated below.

SMPTE ST 2058-1:2011, VC-4 Layered Video Extension Bitstream Format and Decoding Process

ISO/IEC 14496-12:2008, Information Technology – Coding of Audio-Visual Objects – Part 12: ISO Base Media File Format

4 Notation

4.1 Abbreviations

This section provides a list of acronyms used in this document.

EBS   Encapsulated Bitstream Segment
HRD   Hypothetical Reference Decoder

4.2 Definition of Terminology

4.2.1 Base Layer
A picture decoded from the base layer stream, which is specified by other video coding standards such as ITU-T H.262 (MPEG-2) or ITU-T H.264 (MPEG-4 AVC), is used to reconstruct a new picture when it is integrated with the residual picture decoded from the enhancement layer stream. The synchronization of the base layer picture and the enhancement layer streams is expected to be signaled in the system layer.

4.2.2 Base Layer Stream
A sequence of bits that forms the representation of coded pictures at the base layer. It consists of one or more video sequences specified by other video coding standards, such as ITU-T H.262 (MPEG-2) or ITU-T H.264 (MPEG-4 AVC).

4.2.3 Enhancement Layer
A residual layer that has at least one lower layer. The lowest layer is the base layer. The number of enhancement layers should be greater than or equal to one, and each enhancement layer is a residual layer.

4.2.4 Enhancement Layer Stream
A VC-4 residual enhancement layer stream consists of one or more VC-4 residual sequences. A VC-4 residual enhancement layer stream may also result from reassembling enhancement residual sub-bitstreams.

4.2.5 Entry Point Sample
A sample that includes a sequence header and is considered the new starting point of the sequence. Coded samples that follow an entry point sample do not use samples coded before it as reference samples, and the entry point sample itself does not use any reference samples for motion compensation.

5 File Identification

A File Type Box, as defined in the ISO base media file format (ISO/IEC 14496-12), shall be present in conforming files.

major_brand: This specification does not define the use of 'vc-4' as a major brand.

minor_version: Not defined, since this specification does not define a major brand.

compatible_brands: Files that conform to this specification shall have 'vc-4' in the compatible brands list. The brand used for the base layer shall also be in the compatible brands list, so that at least the base layer bitstream can be decoded by a player that supports the base layer brand but does not conform to this specification.

6 Track Definition
A base layer bitstream and an enhancement layer bitstream shall be represented separately, by a base track and by one or more VC-4 tracks, respectively. A VC-4 file shall contain exactly one base track and at least one VC-4 track.

In the terminology of the ISO base media file format (ISO/IEC 14496-12), both the base track and the VC-4 tracks are video tracks. Therefore they use a handler_type of 'vide' in the Handler Reference Box and a Video Media Header Box in the Media Information Box. Additionally, a VC-4 track shall use a box called VC4SampleEntry in the Sample Description Box; this box is derived from the VisualSampleEntry defined in ISO/IEC 14496-12.

A VC-4 track shall contain either a subset of the enhancement layers or all of them. When a VC-4 file has multiple enhancement layers, each enhancement layer may be contained in a separate VC-4 track, or an enhancement layer may be contained in multiple VC-4 tracks.

Each track shall be identified by the track_ID in its Track Header Box. The base track and the VC-4 tracks shall be linked to each other by means of Track Reference Boxes, as defined in Section 8.3.3 of ISO/IEC 14496-12. Each VC-4 track shall have one Track Reference Box.
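The track-level rules stated above lend themselves to a mechanical check. The following is a minimal, non-normative Python sketch over a hypothetical in-memory track model: the Track class, its role labels ("base", "vc4") and check_track_layout are illustrative names, not part of this Recommended Practice or of any file-format library. Parsing the boxes, and deciding which track carries a VC4SampleEntry, is assumed to have been done elsewhere. The track_ID uniqueness check reflects ISO/IEC 14496-12, which requires track_ID to identify a track uniquely within the presentation.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified view of one track after box parsing.
@dataclass
class Track:
    track_id: int            # track_ID from the Track Header Box
    role: str                # illustrative label: "base", "vc4", or anything else
    handler_type: str        # handler_type from the Handler Reference Box
    tref_box_count: int = 0  # number of Track Reference Boxes in the track


def check_track_layout(tracks: List[Track]) -> List[str]:
    """Return descriptions of Section 6 violations; an empty list means none found."""
    errors: List[str] = []
    base = [t for t in tracks if t.role == "base"]
    vc4 = [t for t in tracks if t.role == "vc4"]

    # Exactly one base track and at least one VC-4 track.
    if len(base) != 1:
        errors.append(f"expected exactly one base track, found {len(base)}")
    if not vc4:
        errors.append("expected at least one VC-4 track")

    # Every track is identified by its own track_ID.
    ids = [t.track_id for t in tracks]
    if len(ids) != len(set(ids)):
        errors.append("track_ID values are not unique")

    # Base and VC-4 tracks are video tracks.
    for t in base + vc4:
        if t.handler_type != "vide":
            errors.append(f"track {t.track_id}: handler_type {t.handler_type!r} is not 'vide'")

    # Each VC-4 track has one Track Reference Box.
    for t in vc4:
        if t.tref_box_count != 1:
            errors.append(
                f"VC-4 track {t.track_id}: {t.tref_box_count} Track Reference Boxes, expected 1"
            )
    return errors


if __name__ == "__main__":
    tracks = [
        Track(track_id=1, role="base", handler_type="vide"),
        Track(track_id=2, role="vc4", handler_type="vide", tref_box_count=1),
    ]
    print(check_track_layout(tracks))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a conformance tool report every violation in a file at once; tracks with other roles (for example audio) are ignored by the base and VC-4 checks.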
The following reference types are defined for the VC-4 file format:

'vbas' shall be used by VC-4 tracks to reference the base track. Every VC-4 track shall have this reference type in its Track Reference Box.

'vext' shall be used by VC-4 extractors to indicate the track from which the media data shall be copied (see Section 8.2.3).

'vdep' shall be used by VC-4 tracks to reference the VC-4 tracks that are required to decode samples in the current track. These referenced VC-4 tracks shall contain the enhancement layers referenced by the layers in the current track. The track IDs referenced by 'vdep' shall be arranged in ascending order.

The visual samples in the base track and the VC-4 tracks shall be temporally aligned in the decoding timeline and mapped to the same decoding time. For every VC-4 track, the Track_in_preview flag in the flags of the Track Header Box shall be set to 0.

7 Layer Definition

7.1 Layer Table Box

7.1.1 Definition

Box Type: 'ltbl'
Container: Movie Box ('moov')
Mandatory: Yes
Quantity: Exactly one

This box provides information about all the layers in the file, from the base layer to the highest enhancement layer. This box shall contain layer_count Layer Information Boxes, each of which provides information about one layer. When the layers, including the base layer, or a subset of the enhancement layers are stored separately in multiple tracks, the Layer Table Box shall specify the information of the layers that exist in the different tracks.

7.1.2 Syntax

   class LayerTableBox extends Box('ltbl') {
      unsigned int(8) layer_count;
      for (i = 1; i <= layer_count; i++)
         LayerInfoBox();
   }

7.1.3 Semantics

layer_count shall be an integer that gives the number of layers. This field shall indicate the number of all the layers in the file, including the base layer and all enhancement layers.

LayerInfoBox is defined in Section 7.2. This box shall provide the information of each layer.

7.2 Layer Information Box

7.2.1 Definition

Box Type: 'lyri'
Container: Layer Table Box ('ltbl')
Mandatory: Yes
Quantity: Two or more (one for the base layer and one or more for at least one enhancement layer)

This box shall provide information about a single layer in the VC-4 file. It shall consist of the layer ID, information on the tracks that contain the layer identified by that layer ID, the number of quality layers, the scalability method, the frame size, the frame rate, and the bitrate.

7.2.2 Syntax

   class LayerInfoBox extends FullBox('lyri', version = 0, 0) {
      unsigned int(8)    layer_ID;
      signed int(8)      ref_layer_ID;
      unsigned int(8)    track_count;
      unsigned int(32)   track_ID[track_count];
      unsigned int(3)    reserved = 0;
      unsigned bit(1)    quality_refinement_flag;
      if (quality_refinement_flag == 1)
         unsigned int(4) max_quality_layer_ID;
      else
         unsigned int(4) reserved = 0;
      unsigned int(8)[4] scalability;
      unsigned int(16)   width;
      unsigned int(16)   height;
      unsigned int(32)   framerate;
      unsigned int(32)   maxBitrate;
      unsigned int(32)   avgBitrate;
   }

7.2.3 Semantics

layer_ID specifies the identifier of the layer. For enhancement layers, this field shall provide the layer ID of the current enhancement layer. The value 0 is reserved for the base layer. A higher layer ID value indicates a higher layer. One enhancement layer may be present in multiple tracks; in this case, its layer ID must be identical in all of those tracks.

ref_layer_ID shall specify the layer ID of the lower layer referenced by the current layer. The value of this field shall be equal to the value of REF_LAYER_ID in the sequence header of the current layer (see Section 7.1.2 of SMPTE ST 2058-1).
The value -1 (0xFF) is reserved for the base layer.

track_count shall provide the number of tracks that contain the current layer. For the base layer, this field shall have the value 1, because only one base track exists in a VC-4 file, as specified in Section 6. For enhancement layers, the value of this field shall be greater than or equal to 1, because an enhancement layer may be contained in multiple VC-4 tracks.

track_ID is an array of track IDs that shall specify the tracks containing the current layer.

quality_refinement_flag shall be equal to the value of QUALITY_REFINEMENT_FLAG (see Section 7.1.46 of SMPTE ST 2058-1). The value 1 shall indicate that quality refinement is used for the corresponding layer. The value 0 shall indicate that quality refinement is not used for the corresponding layer.

max_quality_layer_ID contains the value of MAX_QUALITY_LAYER_ID in the picture header of the VC-4 elementary stream (see Section 8.1.5 of SMPTE ST 2058-1). The number of quality layers in the layer is max_quality_layer_ID + 1.

scalability shall provide information about the scalability method between the reference layer identified by ref_layer_ID and the current layer. It is described by one of the four-character strings listed in Table 1.

Table 1 Scalability field

   Name                  String    Details
   Base layer            'base'    Only used for the base layer
   SNR scalability       'snrs'    The layer is SNR scaled
   Spatial scalability   'spls'    The layer is spatially scaled

width and height shall specify the horizontal and vertical sizes of the layer, respectively.

framerate shall specify the frame rate (fps) of the layer. This field shall be set to 0xFFFFFFFF if the frame rate is unknown, unspecified, or non-constant.

maxBitrate shall indicate the maximum rate, in bits/second, over a one-second window.

avgBitrate shall indicate the average rate, in bits/second, over the entire length of the current layer.

8 Video Stream Definition

8.1 Base Track

Samples of the base track are defined in the ISO base media file format specification for the codec with which they are encoded. Examples of available codecs for the base layer bitstream include:

   SMPTE VC-1 (SMPTE RP 2025)
   ISO/IEC MPEG-4 (ISO/IEC 14496-14)
   ITU-T H.264 (ISO/IEC 14496-15)

Note: As ISO base media file formats become available for other codecs, they may also be used for the base track.

8.2 VC-4 Track

This section defines the structure of a sample in the video elementary bitstream of a VC-4 track.

8.2.1 VC-4 Sample Definition

A VC-4 sample consists of encapsulated bitstream segments (EBSs). Each EBS consists of a start code and a bitstream segment; the start code identifies the type of the EBS (see Annex B of SMPTE ST 2058-1). The EBS types and the ordering constraints on EBSs within a VC-4 sample are specified in Annex F of SMPTE ST 2058-1.

In a VC-4 track, a VC-4 sample shall contain only data belonging to the enhancement layers contained in that track. When the track contains n enhancement layers, an entry point sample shall start with n sequence header EBSs. The sequence header EBSs shall be located consecutively in lower-to-higher layer order, and the frame or field picture data EBSs shall follow them. A non-entry point sample starts with frame or field picture data EBSs, without sequence header EBSs. Frame or field picture data EBSs are ordered by increasing layer ID.

A frame picture data EBS consists of a start code, a frame header, and frame data. A field picture data EBS consists of a start code, a field header, and field data. Frame data and field data are further divided into one or more slice data EBSs.

When a VC-4 track has one enhancement layer, entry point samples in this track shall have one sequence header EBS, and then one frame or field picture data EBS shall f