Adopted by INCITS (InterNational Committee for Information Technology Standards) as an American National Standard.
Date of ANSI Approval: 5/10/2004
Published by American National Standards Institute, 25 West 43rd Street, New York, New York 10036
Copyright 2004 by Information Technology Industry Council (ITI). All rights reserved.

These materials are subject to copyright claims of International Standardization Organization (ISO), International Electrotechnical Commission (IEC), American National Standards Institute (ANSI), and Information Technology Industry Council (ITI). Not for resale. No part of this publication may be reproduced in any form, including an electronic retrieval system, without the prior written permission of ITI. All requests pertaining to this standard should be submitted to ITI, 1250 Eye Street NW, Washington, DC 20005.

Printed in the United States of America

Reference number ISO/IEC 14496-14:2003(E)
© ISO/IEC 2003

INTERNATIONAL STANDARD ISO/IEC 14496-14
First edition 2003-11-15

Information technology - Coding of audio-visual objects - Part 14: MP4 file format

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO/IEC 2003
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
  0.1 Derivation
  0.2 Interchange
  0.3 Content Creation
  0.4 Streamed presentation
1 Scope
2 Normative references
3 Storage of MPEG-4
  3.1 Elementary Stream Tracks
  3.2 Track Identifiers
  3.3 Synchronization of streams
  3.4 Composition
  3.5 Handling of FlexMux
4 File Identification
5 Additions to the Base Media Format
  5.1 Object Descriptor Box
  5.2 Track Reference Types
  5.3 Track Header Box
  5.4 Handler Reference Types
  5.5 MPEG-4 Media Header Boxes
  5.6 Sample Description Boxes
  5.7 Degradation Priority Values
6 Template fields used
Annex A (informative) Patent statements

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules
given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

ISO/IEC 14496-14 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

ISO/IEC 14496 consists of the following parts, under the
general title Information technology - Coding of audio-visual objects:

- Part 1: Systems
- Part 2: Visual
- Part 3: Audio
- Part 4: Conformance testing
- Part 5: Reference software
- Part 6: Delivery Multimedia Integration Framework (DMIF)
- Part 7: Optimized reference software for coding of audio-visual objects
- Part 8: Carriage of ISO/IEC 14496 contents over IP networks
- Part 9: Reference hardware description
- Part 10: Advanced Video Coding (AVC)
- Part 11: Scene description and application engine
- Part 12: ISO base media file format
- Part 13: Intellectual Property Management and Protection (IPMP) extensions
- Part 14: MP4 file format
- Part 15: Advanced Video Coding (AVC) file format
- Part 16: Animation Framework eXtension (AFX)

Introduction

0.1 Derivation

This specification defines MP4 as an instance of the ISO Media File format (ISO/IEC 14496-12 and ISO/IEC 15444-12). The general nature of the ISO Media File format is fully exercised by MP4.

MPEG-4 presentations can be highly dynamic, and there is an infrastructure, the Object Descriptor Framework, which serves to manage the objects and streams in a presentation. An Initial Object Descriptor serves
as the starting point for this framework. In the usage modes documented in the ISO Media File, an Initial Object Descriptor would normally be present, as shown in the following diagrams.

0.2 Interchange

The following diagram gives an example of a simple interchange file, containing two streams.

[Figure 1: Simple interchange file. Labels recovered from the diagram: mp4 file; moov with IOD, trak (BIFS), trak (OD), trak (video), trak (audio), other boxes; mdat with interleaved, time-ordered BIFS, OD, video, and audio access units.]

0.3 Content Creation

In the following diagram, a set of files being used in the process of content creation is shown.

[Figure 2: Content Creation File. Labels recovered from the diagram: mp4 file; moov with IOD, trak (BIFS), trak (OD), trak (video), trak (audio), other boxes; mdat with BIFS access units, possibly unordered, with other unused data; media file with other boxes (inc. moov) and video and audio access units, possibly unordered, with other unused data.]

0.4 Streamed presentation

The following diagram shows a presentation prepared for streaming over a multiplexing protocol; only one hint track is required.

[Figure 3: Hinted Presentation for Streaming. Labels recovered from the diagram: mp4 file; moov with IOD, trak (BIFS), trak (OD), trak (video), trak (audio), hint, other boxes; mdat with interleaved, time-ordered BIFS, OD, video, and audio access units, and hint instructions.]

1 Scope

This International
Standard defines the MP4 file format, as derived from the ISO Base Media File format.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 14496-1:2001, Information technology - Coding of audio-visual objects - Part 1: Systems

ISO/IEC 14496-12, Information technology - Coding of audio-visual objects - Part 12: ISO base media file format (technically identical to ISO/IEC 15444-12)

3 Storage of MPEG-4

3.1 Elementary Stream Tracks

3.1.1 Elementary Stream Data

To maintain the goals of streaming protocol independence, the media data is stored in its most natural format, and not fragmented. This enables easy local manipulation of the media data. Therefore media-data is stored
as access units, a range of contiguous bytes for each access unit (a single access unit is the definition of a sample for an MPEG-4 media stream). This greatly facilitates the fragmentation process used in hint tracks. The file format can describe and use media data stored in other files; however, this restriction still applies. Therefore, if a file is to be used which contains pre-fragmented media data (e.g. a FlexMux stream on disc), the media data will need to be copied to re-form the access units, in order to import the data into this file format.

This is true for all stream types in this specification, including such meta-information streams as Object Descriptor and the Clock Reference. The consequences of this are, on the positive side, that the file format treats all streams equally; on the negative side, this means that there are internal cross-links between the streams. This means that adding and removing streams from a presentation will involve more than adding or deleting the track and its associated media-data. Not only must the stream be placed in, or removed from, the scene, but also the object descriptor stream may need updating.

For each track, the entire ES-descriptor is stored as the sample description or descriptions. The SLConfigDescriptor for the media track shall be stored in the file using a default value (predefined = 2), except when the Elementary Stream Descriptor refers to a stream through a URL, i.e. the referred stream is outside the scope of the MP4 file. In that case the SLConfigDescriptor is not constrained to this predefined value.

In a transmitted bit-stream, the access units in the SL Packets are transmitted on byte boundaries. This means that hint tracks will construct SL Packet headers using the information in the media tracks, and the hint tracks will reference the access units from the media track. The placement of the header during hinting is possible without bit shifting, as each SL Packet and corresponding contained access unit will both start on byte boundaries.
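As an illustration of the byte-alignment point at the end of 3.1.1, the following is a minimal Python sketch and not part of this specification. It assumes a hypothetical sample table given as (offset, size) pairs and treats the SL Packet header as an opaque, already byte-aligned byte string; the names AccessUnit, read_access_unit and build_sl_packet are invented for this example. Because each access unit is one contiguous byte range, forming a packet reduces to slicing and concatenation, with no bit shifting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessUnit:
    """Location of one sample (access unit) in the media data (hypothetical helper)."""
    offset: int  # byte offset of the first byte of the access unit
    size: int    # length of the access unit in bytes


def read_access_unit(media: bytes, au: AccessUnit) -> bytes:
    """Return the contiguous byte range that holds one access unit."""
    if au.offset < 0 or au.size < 0 or au.offset + au.size > len(media):
        raise ValueError("access unit lies outside the media data")
    return media[au.offset:au.offset + au.size]


def build_sl_packet(header: bytes, media: bytes, au: AccessUnit) -> bytes:
    """Prefix a byte-aligned header to an access unit.

    The header content is an assumption of this sketch; a real hinter
    would derive it from the SLConfigDescriptor and the sample timing.
    """
    return header + read_access_unit(media, au)


# Two access units stored back to back, surrounded by unrelated bytes.
media_data = b"\x00\x01AAAABBBBBB\xff"
samples = [AccessUnit(offset=2, size=4), AccessUnit(offset=6, size=6)]

# A one-byte placeholder header stands in for a real SL Packet header.
packets = [build_sl_packet(b"\x80", media_data, au) for au in samples]
```

The one-byte header is only a placeholder; the sketch is meant to show why contiguous, byte-aligned access units let a hinter reference media data by offset and length rather than copying and re-aligning bits.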
3.1.2 Elementary Stream Descriptors

The ESDescriptor for a stream within the scope of the MP4 file as described in this document is stored in the sample description and the fields and included structures are restricted as follows.

- ES_ID: set to 0 as stored; when built into a stream, the lower 16 bits of the TrackID are used.
- streamDependenceFlag: set to 0 as stored; if a dependency exists, it is indicated using a track reference of type 'dpnd'.
- URLflag: kept untouched, i.e. set to false, as the stream is in the file, not remote.
- SLConfigDescriptor: is predefined type 2.
- OCRStreamFlag: set to false in the file.

The ESDescriptor for a stream referenced through an ES URL is stored in the sample description and the fields and included structures are restricted as follows.

- ES_ID: set to 0 as stored; when built into a stream, the lower 16 bits of the TrackID are used.
- streamDependenceFlag: set to 0 as stored; if a dependency exists, it is indicated using a track reference of type 'dpnd'.
- URLflag: kept untouched, i.e. set to true, as the stream is not in the file.
- SLConfigDescriptor: kept untouched.
- OCRStreamFlag: set to false in the file.

Note that the QoSDescriptor also may need re-writing for transmission as it contains information about PDU sizes etc.

3.1.3 Object Descriptors

The initial object descriptor and object descriptor streams are handled specially within the file format. Object descriptors contain ES descriptors, which in turn contain stream specific information. In addition, to facilitate editing, the information about a track is stored as an ESDescriptor in the sample description within that track. It must be taken from there, re-written as appropriate, and transmitted as part of the OD stream when the presentation is streamed.

As a consequence, ES descriptors are not stored within the OD track or initial object descriptor. Instead, the initial object descriptor has a descriptor used only in the file, containing solely the track ID of the elementary stream. When used, an appropriately re-written ESDescriptor from the referenced track replaces this descriptor. Likewise, OD
tracks are linked to ES tracks by track references. Where an ES descriptor would be used within the OD track, another descriptor is used, which again occurs only in the file. It contains the index into the set of 'mpod' track references that this OD track owns. When this track is hinted, a suitably re-written ESDescriptor replaces it.

The ES_ID_Inc is used in the Object Descriptor Box:

    class ES_ID_Inc extends BaseDescriptor : bit(8) tag=ES_IDIncTag {
        unsigned int(32) Track_ID; // ID of the track to use
    }

ES_ID_IncTag = 0x0E is reserved for file format usage.

The ES_ID_Ref is used in the OD stream:

    class ES_ID_Ref extends BaseDescriptor : bit(8) tag=ES_IDRefTag {
        bit(16) ref_index; // track ref. index of the track to use
    }

ES_ID_RefTag = 0x0F is reserved for file format usage.

MP4_IOD_Tag = 0x10 is reserved for file format usage.

MP4_OD_Tag = 0x11 is reserved for file format usage.

IPI_DescrPointerRefTag = 0x12 is reserved for file format usage.

ES_DescrRemoveRefTag = 0x07 is reserved for file format usage (command tag).

NOTE The above tag values are defined in 8.2.2.2 Table 1 and 8.2.3.2 Table 2 of the MPEG-4 Systems Specification,
and the actual values should be referenced from those tables.

A hinter may need to send more OD events than actually occur in the OD track: for example, if the ES_description changes at a time when there is no event in the OD track. In general, any OD events explicitly authored into the OD track should be sent along with those necessary to indicate other changes. The ES descriptor sent in the OD track is taken from the description of the temporally next sample in the ES track (in decoding time).

3.2 Track Identifiers

The track identifiers used in an MP4 file are unique within that file; no two
tracks may use the same identifier.

Each elementary stream in the file is stored as a media track. In the case of an elementary stream, the lower two bytes of the four-byte track_ID shall be set to the elementary stream identifier (ES_ID); the upper two bytes of the track_ID are zero in this case. Hint tracks may use track identifier values in the same range, if this number space is adequate (which it generally is). However, hint track identifiers