Reference number ISO/IEC 14496-3:2001(E)
© ISO/IEC 2001

INTERNATIONAL STANDARD
ISO/IEC 14496-3
Second edition
2001-12-15

Information technology - Coding of audio-visual objects - Part 3: Audio

Technologies de l'information - Codage des objets audiovisuels - Partie 3: Codage audio

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO/IEC 2001

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of
the requester.

ISO copyright office
Case postale 56
CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of
information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

ISO/IEC 14496-3 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

This second edition cancels and replaces the first edition (ISO/IEC 14496-3:1999), which has been technically revised. It incorporates Amd.1:2000 and Cor.1:2001.

ISO/IEC 14496 consists of the following parts, under the general title Information technology - Coding of audio-visual objects:

Part 1: Systems
Part 2: Visual
Part 3: Audio
Part 4: Conformance testing
Part 5: Reference software
Part 6: Delivery Multimedia Integration Framework (DMIF)
Part 7: Optimized software for MPEG-4 visual tools
Part 8: Carriage of MPEG-4 contents over IP networks

Annexes 2.E, 3.C, 4.A and 5.A form a normative part of this part of ISO/IEC 14496. Annexes 1.A to 1.C, 2.A to 2.D, 3.A, 3.B, 3.D to 3.F, 4.B, 5.B to 5.F, 6.A and 7.A are for information only.

Due to its technical nature, this part of ISO/IEC 14496
requires a special format as several standalone electronic files and, consequently, does not conform to some of the requirements of the ISO/IEC Directives, Part 2.

Introduction

Overview

ISO/IEC 14496-3 (MPEG-4 Audio) is a new kind of audio standard that integrates many different types of audio coding: natural sound with synthetic sound, low bitrate delivery with high-quality delivery, speech with music, complex soundtracks with simple ones, and traditional content with interactive and virtual-reality content. By standardizing individually sophisticated coding tools as well as a novel, flexible framework for audio synchronization, mixing, and downloaded post-production, the developers of the MPEG-4 Audio standard have created new technology for a new, interactive world of digital audio.

MPEG-4, unlike previous audio standards created by ISO/IEC and other groups, does not target a single application such as real-time telephony or high-quality audio compression. Rather, MPEG-4 Audio is a standard that applies to every application requiring the use of advanced sound compression, synthesis, manipulation, or playback. The subparts that follow specify the state-of-the-art coding tools in several domains; however, MPEG-4 Audio is more than just the sum of its parts. As the tools described here are integrated with the rest of the MPEG-4 standard, exciting new possibilities for object-based audio coding, interactive presentation, dynamic soundtracks, and other sorts of new media are enabled.

Since a single set of tools is used to cover the needs of a broad range of applications, interoperability is a natural feature of systems that depend on the MPEG-4 Audio standard. A system that uses a particular coder (for example, a real-time voice communication system making use of the MPEG-4 speech coding toolset) can easily share data and development tools with other systems, even in different domains, that use the same tool (for example, a voicemail indexing and retrieval system making use of MPEG-4 speech coding).

The remainder of this clause gives a
more detailed overview of the capabilities and functioning of MPEG-4 Audio. First, the concepts that have changed since the MPEG-2 audio standards are discussed. Then the MPEG-4 Audio toolset is outlined.

New concepts in MPEG-4 Audio

Many concepts in MPEG-4 Audio are different from those in previous MPEG Audio standards. For the benefit of readers who are familiar with MPEG-1 and MPEG-2, we provide a brief overview here.

MPEG-4 has no standard for transport. In all of the MPEG-4 tools for audio and visual coding, the coding standard ends at the point of constructing a sequence of access units that contain the compressed data. The MPEG-4 Systems (ISO/IEC 14496-1:2001) specification describes how to convert the individually coded objects into a bitstream that contains a number of multiplexed sub-streams.

There is no standard mechanism for transport of this stream over a channel; this is because the broad range of applications that can make use of MPEG-4 technology have delivery requirements that are too wide to easily characterize with a single solution. Rather, what is standardized is an interface (the Delivery Multimedia Integration Framework, or DMIF, specified in ISO/IEC 14496-6:1999)
that describes the capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions in encoders and decoders. The use of DMIF and the MPEG-4 Systems bitstream specification allows transmission functions that are much more sophisticated than are possible with previous MPEG standards.

However, LATM and LOAS were defined to provide a low overhead audio multiplex and transport mechanism for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.

The following table gives an overview of the multiplex, storage and transmission formats for MPEG-4 Audio currently available within the MPEG-4 framework:

Category | Format | Functionality defined in | Functionality redefined in | Description
Multiplex | FlexMux | ISO/IEC 14496-1:2001 (MPEG-4 Systems) (Normative) | - | Flexible multiplex scheme
Multiplex | LATM | ISO/IEC 14496-3:2001 (MPEG-4 Audio) (Normative) | - | Low Overhead Audio Transport Multiplex
Storage | ADIF | ISO/IEC 13818-7:1997 (MPEG-2 Audio) (Normative) | ISO/IEC 14496-3:2001 (MPEG-4 Audio) (Informative) | (MPEG-2 AAC) Audio Data Interchange Format, AAC only
Storage | MP4FF | ISO/IEC 14496-1:2001 (MPEG-4 Systems) (Normative) | - | MPEG-4 File format
Transmission | ADTS | ISO/IEC 13818-7:1997 (MPEG-2 Audio) (Normative, Exemplarily) | ISO/IEC 14496-3:2001 (MPEG-4 Audio) (Informative) | Audio Data Transport Stream, AAC only
Transmission | LOAS | ISO/IEC 14496-3:2001 (MPEG-4 Audio) (Normative, Exemplarily) | - | Low Overhead Audio Stream, based on LATM; three versions are available: AudioSyncStream(), EPAudioSyncStream(), AudioPointerStream()

To allow a user on the remote side of a channel to dynamically control a server streaming MPEG-4 content, MPEG-4 defines backchannel streams that can carry user interaction information.

MPEG-4 Audio supports low-bitrate coding. Previous MPEG Audio standards have focused primarily on transparent (undetectable) or nearly transparent coding of high-quality audio at whatever bitrate was required to provide it. MPEG-4 provides new and improved tools for this purpose, but also standardizes (and has tested) tools that can be used for transmitting audio at the low bitrates suitable for Internet, digital radio, or other bandwidth-limited delivery. The new tools specified in MPEG-4 are the state-of-the-art tools that support low-bitrate coding of speech and other audio.

MPEG-4 is an object-based coding standard with multiple tools. Previous MPEG Audio standards provided a single toolset, with different configurations of that toolset specified for use in various applications. MPEG-4 provides several toolsets that have no particular relationship to each other, each with a different target function.
The Profiles of MPEG-4 Audio (subclause 1.5.2) specify which of these tools are used together for various applications.

Further, in previous MPEG standards, a single (perhaps multi-channel or multi-language) piece of content was transmitted. In contrast, MPEG-4 supports a much more flexible concept of a soundtrack. Multiple tools may be used to transmit several audio objects, and when multiple tools are used together, an audio composition system creates a single soundtrack from the several audio substreams. User interaction, terminal capability, and speaker configuration may be used when determining how to produce a single soundtrack from the component objects. This capability gives MPEG-4 significant advantages in quality and flexibility when compared to previous audio standards.

MPEG-4 provides capabilities for synthetic sound. In natural sound coding, an existing sound is compressed by a
server, transmitted and decompressed at the receiver. This type of coding is the subject of many existing standards for sound compression. In contrast, MPEG-4 standardizes a novel paradigm in which synthetic sound descriptions, including synthetic speech and synthetic music, are transmitted and then synthesized into sound at the receiver. Such capabilities open up new areas of very-low-bitrate but still very-high-quality coding.

MPEG-4 provides capabilities for Error Robustness. Improved error robustness for AAC is provided by a set of error resilience tools. These tools reduce the perceived degradation of the decoded audio signal that is caused by corrupted bits in the bitstream. Improved error robustness capabilities for all coding tools are provided through the error resilient bitstream payload syntax. This tool supports advanced channel coding techniques, which can be adapted to the special needs of given coding tools and a given communications channel. This error resilient bitstream payload syntax is mandatory for all error resilient object types.

The error protection tool (EP tool) provides unequal error protection (UEP) for
MPEG-4 Audio in conjunction with the error resilient bitstream payload. UEP is an efficient method to improve the error robustness of source coding schemes. It is used by various speech and audio coding systems operating over error-prone channels such as mobile telephone networks or Digital Audio Broadcasting (DAB). The bits of the coded signal representation are first grouped into different classes according to their error sensitivity. Then error protection is individually applied to the different classes, giving better protection to more sensitive bits.

MPEG-4 provides capabilities for Scalability. Previous MPEG Audio standards provided a single bitrate, single bandwidth toolset, with different configurations of that toolset specified for use in various applications. MPEG-4 provides several bitrate and bandwidth options within a single bitstream, providing a scalability functionality that permits a given bitstream to scale to the requirements of different channels and applications or to be responsive to a given channel that has dynamic throughput characteristics. The tools specified in MPEG-4 are the state-of-the-art tools providing scalable compression of speech and audio signals.

As with
previous MPEG standards, MPEG-4 does not standardize methods for encoding sound. Thus, content authors are left to their own decisions as to the best method of creating bitstreams. At the present time, methods to automatically convert natural sound into synthetic or multi-object descriptions are not mature; therefore, most immediate solutions will involve interactively authoring the content stream in some way. This process is similar to current schemes for MIDI-based and multi-channel mixdown authoring of soundtracks.

Capabilities

Overview of capabilities

The MPEG-4 Audio tools can be broadly organized into several categories:

Speech tools for the transmission and decoding of synthetic and natural speech.
Audio tools for the transmission and decoding of recorded music and other audio soundtracks.
Synthesis tools for very low bitrate description and transmission, and terminal-side synthesis, of synthetic music and other sounds.
Composition tools for object-based coding, interactive functionality, and audiovisual synchronization.
Scalability tools for the creation of bitstreams that can be transmitted, without recoding, at several different bitrates.
Upstream tools for dynamic control of the server's streaming, for bitrate control and quality feedback control.
Error robustness (including error resilience as well as error protection).

Each of these types of tools will be described in more detail in the following subclauses.

MPEG-4 speech coding tools

Two types of speech coding tools are provided in MPEG-4. The natural speech tools allow the compression, transmission, and decoding of human speech, for use in telephony, personal communication, and surveillance applications. The synthetic speech tool provides an interface to text