BRITISH STANDARD BS ISO/IEC 14496-14:2003

Information technology - Coding of audio-visual objects - Part 14: MP4 file format

ICS 35.040

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 10 February 2004.

BSI 10 February 2004
ISBN 0 580 43390 0

National foreword

This British Standard reproduces verbatim ISO/IEC 14496-14:2003 and implements it as the UK national standard.

The UK participation in its preparation was entrusted to Technical Committee IST/37, Coding of audio, picture, multimedia and hypermedia information, which has the responsibility to:

- aid enquirers to understand the text;
- present to the responsible international/European committee any enquiries on the interpretation, or proposals for change, and keep the UK interests informed;
- monitor related international and European developments and promulgate them in the UK.

A list of organizations represented on this committee can be obtained on request to its secretary.

Cross-references

The British Standards which implement international publications referred to in this document may be found in the BSI Catalogue under the section entitled "International Standards Correspondence Index", or by using the "Search" facility of the BSI Electronic Catalogue or of British Standards Online.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

Compliance with a British Standard does not of itself confer immunity from legal obligations.

Summary of pages

This document comprises a front cover, an inside front cover, the ISO/IEC page, pages ii to vi, pages 1 to 11 and a back cover.

The BSI copyright notice displayed in this document indicates when the document was last
issued.

Amendments issued since publication

Amd. No.    Date    Comments

Reference number ISO/IEC 14496-14:2003(E)

INTERNATIONAL STANDARD ISO/IEC 14496-14
First edition 2003-11-15

Information technology - Coding of audio-visual objects - Part 14: MP4 file format

Technologies de l'information - Codage des objets audiovisuels - Partie 14: Format de fichier MP4

Contents

Foreword
Introduction
0.1 Derivation
0.2 Interchange
0.3 Content Creation
0.4 Streamed presentation
1 Scope
2 Normative references
3 Storage of MPEG-4
3.1 Elementary Stream Tracks
3.2 Track Identifiers
3.3 Synchronization of streams
3.4 Composition
3.5 Handling of FlexMux
4 File Identification
5 Additions to the Base Media Format
5.1 Object Descriptor Box
5.2 Track Reference Types
5.3 Track Header Box
5.4 Handler Reference Types
5.5 MPEG-4 Media Header Boxes
5.6 Sample Description Boxes
5.7 Degradation Priority Values
6 Template fields used
Annex A (informative) Patent statements

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

ISO/IEC 14496-14 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

ISO/IEC 14496 consists of the following parts, under the general title Information technology - Coding of audio-visual objects:

- Part 1: Systems
- Part 2: Visual
- Part 3: Audio
- Part 4: Conformance testing
- Part 5: Reference software
- Part 6: Delivery Multimedia Integration Framework (DMIF)
- Part 7: Optimized reference software for coding of audio-visual objects
- Part 8: Carriage of ISO/IEC 14496 contents over IP networks
- Part 9: Reference hardware description
- Part 10: Advanced Video Coding (AVC)
- Part 11: Scene description and application engine
- Part 12: ISO base media file format
- Part 13: Intellectual Property Management and Protection (IPMP) extensions
- Part 14: MP4 file format
- Part 15: Advanced Video Coding (AVC) file format
- Part 16: Animation Framework eXtension (AFX)

Introduction

0.1 Derivation

This specification defines MP4 as an instance of the ISO Media File format [ISO/IEC 14496-12 and ISO/IEC 15444-12].

The general nature of the ISO Media File format is fully exercised by MP4. MPEG-4 presentations can be highly dynamic, and there is an infrastructure, the Object Descriptor Framework, which serves to manage the objects and streams in a presentation. An Initial Object Descriptor serves as the starting point for this framework. In the usage modes documented in the ISO Media File, an Initial Object Descriptor would normally be present, as shown in the following diagrams.

0.2 Interchange

The following diagram gives an example of a simple interchange file, containing two streams.

[Figure 1 - Simple interchange file. Diagram labels: mp4 file; moov containing IOD, trak (BIFS), trak (OD), trak (video), trak (audio) and other boxes; mdat containing interleaved, time-ordered BIFS, OD, video, and audio access units.]

0.3 Content Creation

In the following diagram, a set of files
being used in the process of content creation is shown.

[Figure 2 - Content Creation File. Diagram labels: mp4 file; moov containing IOD, trak (BIFS), trak (OD), trak (video), trak (audio) and other boxes; media file containing video and audio access units, possibly unordered, with other unused data; mp4 file with mdat containing BIFS access units, possibly unordered, with other unused data, and other boxes (inc. moov).]

0.4 Streamed presentation

The following diagram shows a presentation prepared for streaming over a multiplexing protocol; only one hint track is required.

[Figure 3 - Hinted Presentation for Streaming. Diagram labels: mp4 file; moov containing IOD, trak (BIFS), trak (OD), trak (video), trak (audio), hint and other boxes; mdat containing interleaved, time-ordered BIFS, OD, video, and audio access units, and hint instructions.]

Information technology - Coding of audio-visual objects - Part 14: MP4 file format

1 Scope

This International Standard defines the MP4 file format, as derived from the ISO Base Media File format.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 14496-1:2001, Information technology - Coding of audio-visual objects - Part 1: Systems

ISO/IEC 14496-12, Information technology - Coding of audio-visual objects - Part 12: ISO base media file format (technically identical to ISO/IEC 15444-12)

3 Storage of MPEG-4

3.1 Elementary Stream Tracks

3.1.1 Elementary Stream Data

To maintain the goal of streaming protocol independence, the media data is stored in its most natural format, and not fragmented. This enables easy local manipulation of the media data. Therefore media data is stored as access units, a range of contiguous bytes for each access unit (a single access unit is the definition of a sample for an MPEG-4 media stream). This greatly facilitates the fragmentation process used in hint tracks. The file format can describe and use media data stored in other files; however, this restriction still applies. Therefore if a file is to be used which contains pre-fragmented media data (e.g. a FlexMux stream on disc), the media data will need to be copied to re-form the access units, in order to import the data into this file format.

This is true for all stream types in this specification, including such meta-information streams as Object Descriptor and the Clock Reference. The consequences of this are, on the positive side, that the file format treats all streams equally; on the negative side, this means that there are internal cross-links between the streams. This means that adding and removing
streams from a presentation will involve more than adding or deleting the track and its associated media data. Not only must the stream be placed in, or removed from, the scene, but also the object descriptor stream may need updating.

For each track, the entire ES-descriptor is stored as the sample description or descriptions. The SLConfigDescriptor for the media track shall be stored in the file using a default value (predefined = 2), except when the Elementary Stream Descriptor refers to a stream through a URL, i.e. the referred stream is outside the scope of the MP4 file. In that case the SLConfigDescriptor is not constrained to this predefined value.

In a transmitted bit-stream, the access units in the SL Packets are transmitted on byte boundaries. This means that hint tracks will construct SL Packet headers using the information in the media tracks, and the hint tracks will reference the access units from the media track. The placement of the header during hinting is possible without bit shifting, as each SL Packet and corresponding contained access unit will both start on byte boundaries.

3.1.2 Elementary Stream Descriptors

The ESDescriptor for a stream within the scope of the MP4 file as described in this document is stored in the sample description and the fields and included structures are restricted as follows.

- ES_ID: set to 0 as stored; when built into a stream, the lower 16 bits of the TrackID are used.
- streamDependenceFlag: set to 0 as stored; if a dependency exists, it is indicated using a track reference of type dpnd.
- URLflag: kept untouched, i.e. set to false, as the stream is in the file, not remote.
- SLConfigDescriptor: is predefined type 2.
- OCRStreamFlag: set to false in the file.

The ESDescriptor for a stream referenced through an ES
URL is stored in the sample description and the fields and included structures are restricted as follows.

- ES_ID: set to 0 as stored; when built into a stream, the lower 16 bits of the TrackID are used.
- streamDependenceFlag: set to 0 as stored; if a dependency exists, it is indicated using a track reference of type dpnd.
- URLflag: kept untouched, i.e. set to true, as the stream is not in the file.
- SLConfigDescriptor: kept untouched.
- OCRStreamFlag: set to false in the file.

Note that the QoSDescriptor also may need re-writing for transmission as it contains information about PDU sizes etc.

3.1.3 Object Descriptors

The initial object descriptor and object descriptor streams are handled specially within the file format. Object descriptors contain ES descriptors, which in turn contain stream specific information. In addition, to facilitate editing, the information about a track is stored as an ESDescriptor in the sample description within that track. It must be taken from there, re-written as appropriate, and transmitted as part of the OD stream when the presentation is streamed.

As a consequence, ES descriptors are not stored within the OD track or initial object descriptor. Instead, the initial object descriptor has a descriptor used only in the file, containing solely the track ID of the elementary stream. When used, an appropriately re-written ESDescriptor from the referenced track replaces this descriptor. Likewise, OD tracks are linked to ES tracks by track references. Where an ES descriptor would be used within the OD track, another descriptor is used, which again occurs only in the file. It contains the index into the set of mpod track references that this OD track owns. The hinting of this track replaces it with a suitably re-written ESDescriptor.

The ES_ID_Inc is used in the Object Descriptor Box:

    class ES_ID_Inc extends BaseDescriptor : bit(8) tag=ES_ID_IncTag {
        unsigned int(32) Track_ID;  // ID of the track to use
    }

ES_ID_IncTag = 0x0E is reserved for file format usage.

The ES_ID_Ref is used in the OD stream:

    class ES_ID_Ref extends BaseDescriptor : bit(8) tag=ES_ID_RefTag {
        bit(16) ref_index;  // track ref. index of the track to use
    }

ES_ID_RefTag = 0x0F is reserved for file format usage.

MP4_IOD_Tag = 0x10 is reserved for file format usage.

MP4_OD_Tag = 0x11 is reserved for file format usage.

IPI_DescrPointerRefTag = 0x12 is reserved for file format usage.

ES_DescrRemoveRefTag = 0x07 is reserved for file format usage (command tag).

NOTE The above tag values are defined in 8.2.2.2 Table 1 and 8.2.3.2 Table 2 of the MPEG-4 Systems Specification, and the actual values should be referenced from those tables.

A hinter may need to send more OD events than actually occur in the OD track: for example, if the ES_description changes at a time when there is no event in the OD track. In general, any OD events explicitly authored into the OD track should be sent along with those necessary to indicate other changes. The ES descriptor sent in the OD track is taken from the description of the temporally next sample in the ES track (in decoding time).

3.2 Track Identifiers

The track identifiers used in an MP4 file are unique within that file; no two tracks may use the same identifier. Each elementary stream in the file is stored as a media track.
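
EXAMPLE The following Python sketch is illustrative only and is not part of this specification. It shows how the file-only descriptors ES_ID_Inc and ES_ID_Ref of 3.1.3 might be serialized, and how an ES_ID might be derived from the lower 16 bits of a track identifier as described in 3.1.2. The function names are hypothetical. The tag values are those quoted in 3.1.3. The size coding (7-bit groups with a continuation bit, at most four bytes) and the big-endian field order are assumptions taken from the general descriptor syntax of ISO/IEC 14496-1, which remains the authoritative reference.

```python
import struct

# Tag values as quoted in 3.1.3; the tables in ISO/IEC 14496-1 are authoritative.
ES_ID_INC_TAG = 0x0E
ES_ID_REF_TAG = 0x0F


def encode_descriptor_size(size: int) -> bytes:
    """Encode a payload size as 7-bit groups, most significant group first.

    Assumption: every byte except the last carries a set top bit, and at
    most four bytes (28 bits) are used.
    """
    if size < 0 or size >= 1 << 28:
        raise ValueError("descriptor size out of range")
    groups = [size & 0x7F]          # last byte: continuation bit clear
    size >>= 7
    while size:
        groups.append((size & 0x7F) | 0x80)  # earlier bytes: bit set
        size >>= 7
    return bytes(reversed(groups))


def encode_es_id_inc(track_id: int) -> bytes:
    """Serialize ES_ID_Inc: tag, size, then the 32-bit Track_ID."""
    payload = struct.pack(">I", track_id)
    return bytes([ES_ID_INC_TAG]) + encode_descriptor_size(len(payload)) + payload


def encode_es_id_ref(ref_index: int) -> bytes:
    """Serialize ES_ID_Ref: tag, size, then the 16-bit ref_index."""
    payload = struct.pack(">H", ref_index)
    return bytes([ES_ID_REF_TAG]) + encode_descriptor_size(len(payload)) + payload


def es_id_for_track(track_id: int) -> int:
    """Return the ES_ID used when the stream is built: the lower 16 bits."""
    return track_id & 0xFFFF
```

For instance, encode_es_id_inc(1) yields the six bytes 0E 04 00 00 00 01 under these assumptions. A hinter would replace such a descriptor with a re-written ESDescriptor before transmission; that step is not shown here.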
In the case of an elementary stream, the lower two bytes of the four-byte track_ID shall be set to the elementary stream identifier (ES_ID); the upper two bytes of the track_ID are zero in this case. Hint tracks may use track identifier values in the same range, if this number space is adequate (which it generally is). However, hint track identifiers may also use larger values of track identifier, as their identifiers are not mapped to elementary stream identifiers. Thus very large presentations may use the entire 16-bit number space for elementary stream identifiers.

The next track identifier value, found in next_track_ID in the MovieHeaderBox, as defined in the ISO Base Media Format, generally contains a value one greater than the largest track identifier value found in the file. This enables easy generation of a track identifier under most circumstances. However, if this value is equal to or larger than 65535, and a new media track is to be added, then a search must be