Copyright 2014 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
3 Barker Avenue, White Plains, NY 10601 (914) 761-1100
Approved June 17, 2014

The attached document is a Registered Disclosure Document prepared by the proponent identified below. It has been examined by the appropriate SMPTE Technology Committee and is believed to contain adequate information to satisfy the objectives defined in the Scope, and to be technically consistent.

This document is NOT a Standard, Recommended Practice or Engineering Guideline, and does NOT imply a finding or representation of the Society.

Errors in this document should be reported to the proponent identified below, with a copy to eng@smpte.org.

This document is intended to allow the interpretation of Dolby Atmos Bitstream files. It is not intended to support the development of hardware or software applications that create or process these files. Creation and processing of such files is reserved to individuals and organizations that have entered into agreements with the proponent identified below for this purpose. Use of this document to produce or process Dolby Atmos files using non-Dolby tools would potentially cause user confusion, diminished sound quality as experienced by content consumers, and damage to the reputation of the Dolby Atmos brand and to Dolby Laboratories itself.

All other inquiries in respect of this document, including inquiries as to intellectual property requirements that may be attached to use of the disclosed technology, should be addressed to the proponent identified below.

Proponent contact information:
Dean Bullock
Dolby Laboratories Inc.
100 Potrero Ave.
San Francisco, CA 94103
Email:

Page 1 of 18 pages

SMPTE REGISTERED DISCLOSURE DOCUMENT
Dolby Atmos Bitstream Specification
SMPTE RDD 29:2014

Table of Contents
1 Scope
2 Bitstream Organization
2.1 ATMOSFrame Element
2.2 BedDefinition Element
2.3 ObjectDefinition Element
2.4 AudioDataDLC Element
3 Bitstream Conventions
3.1 Position
3.2 Relative distance coding
3.3 Amplitude Gain
3.4 Plex Coding
4 Bit Stream Syntax
4.1 Syntax of ReadElement()
4.2 Syntax of ATMOSFrame()
4.3 Syntax of BedDefinition1()
4.4 Syntax of ObjectDefinition1()
4.5 Syntax of AudioDataDLC()
5 Bit Stream Field Description
5.1 ReadElement() Data Fields
5.2 ATMOSFrame() data fields
5.3 BedDefinition1() Data Fields
5.4 ObjectDefinition1() Data Fields
5.5 AudioDataDLC() Data Fields

Introduction
Dolby Atmos is an advanced cinema sound format comprising an audio essence and metadata stream played through specialized renderers in the cinema.
1 Scope
This document defines the syntax of a frame-based Dolby Atmos bit stream. The bit stream carries the audio essence and metadata necessary to reproduce a complete audio program.

2 Bitstream Organization
The audio program is segmented into Frames, which are transmitted 24, 25, 30, 48, 50, 60, 96, 100, or 120 times per second. The audio Frames are aligned with the program edit units. In most cases the edit units and the picture and audio frames are of the same duration and are time-aligned. Support for frame rates above 120 Hz is not defined.

All audio data is encapsulated into “elements”, similar in concept to “chunks” in the RIFF format (Resource Interchange File Format). Each element begins with a unique identifier called ElementID. The second field, ElementSize, indicates the size in bytes of the entire element, not including the ElementID and ElementSize fields themselves. Elements can contain sub-elements, and the ElementSize includes the size of all sub-elements. Sub-elements contain additional description data related to the parent element.

At the top level, the entire audio frame is contained in a single ATMOSFrame element. All audio essence and metadata elements for a given frame are contained as sub-elements of the ATMOSFrame element, as shown below.

[Figure: Frame Element containing Bed Definition Element, Object Definition Element, and Audio Element]

Currently four element types are specified in the Dolby Atmos bit stream: ATMOSFrame, BedDefinition, ObjectDefinition, and AudioDataDLC. The purpose of each element is described in the following sections.

2.1 ATMOSFrame Element
The ATMOSFrame element contains all the information that is common to the entire Dolby Atmos frame. Specifically, the ATMOSFrame contains the Dolby Atmos version,
the audio sample rate, the audio bit depth, the audio frame rate, and the maximum number of rendered audio assets. All raw audio assets and metadata must be sub-elements of the ATMOSFrame element.

2.2 BedDefinition Element
A Dolby Atmos bed is a collection of audio channels. An audio channel is an audio stream that is intended to be played back with a nominal location (e.g., the “Left” channel) or function (e.g., LFE). The BedDefinition element contains a list of the audio assets and the associated channel names.

2.3 ObjectDefinition Element
The Dolby Atmos system allows audio assets to be panned to any
location independent of the physical or nominal loudspeaker configuration; these panned audio assets are called objects. The ObjectDefinition element provides all the information needed to pan an audio object. Each ObjectDefinition element updates the position of a single audio object at intervals of approximately 20 ms. A Dolby Atmos presentation can have a large number of audio objects that are independently rendered to the appropriate locations. To achieve this, the Dolby Atmos bit stream contains multiple ObjectDefinition elements, which must be direct sub-elements of the ATMOSFrame element and must each have a unique MetaID.

2.4 AudioDataDLC Element
Each AudioDataDLC element contains the audio asset for one audio track (a channel or an object). Every audio track is losslessly compressed and exists for the duration of the program. The AudioDataDLC element supports sample rates of 48 kHz
or 96 kHz with 24-bit resolution. All AudioDataDLC elements must be direct sub-elements of the ATMOSFrame element and must each have a unique AudioDataID.

Note: audio object tracks are typically sparse; most audio events conveyed by objects have limited time extent, with digital zero signal between events. The AudioDataDLC element can efficiently indicate periods of silence, which dramatically decreases the audio payload.

3 Bitstream Conventions

3.1 Position

Axes and Origin
Many of the metadata elements contained in the bitstream specify a relative position or size. In most cases position is described
relative to the playback environment, using a unit cube to describe the room boundaries. The origin is taken to be the front left corner of the room. Position is then described using Euclidean (x, y, z) coordinates, assigned as follows:

x: lateral, or left/right position; x = 0 corresponds to the left wall, x = 1 corresponds to the right wall.
y: longitude, or front/back position; y = 0 corresponds to the front wall, y = 1 corresponds to the back wall.
z: elevation, or up/down position; z = 0 corresponds to a plane aligned with the screen, side and rear loudspeakers, z = 1 corresponds to the ceiling.
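The axis convention above can be captured in a small value type. The Python sketch below is illustrative only and is not part of this specification: RoomPosition, from_d12 and in_room are hypothetical names. It checks that a point lies on or within the unit cube, decodes three 12-bit mantissas with the linear mapping Distance = D12/(2^12 - 1) given in section 3.2, and, as an assumption made purely for illustration, scales a position linearly to a rectangular room of given dimensions. This document does not say how a renderer maps the unit cube onto a physical room.

```python
from dataclasses import dataclass

D12_MAX = (1 << 12) - 1  # 4095, the largest unsigned 12-bit value


@dataclass(frozen=True)
class RoomPosition:
    """Position in the unit-cube room model (section 3.1).

    x: 0 = left wall,  1 = right wall
    y: 0 = front wall, 1 = back wall
    z: 0 = plane of the screen, side and rear loudspeakers; 1 = ceiling
    """
    x: float
    y: float
    z: float

    def __post_init__(self) -> None:
        # Each coordinate must lie on or within the unit cube.
        for name in ("x", "y", "z"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} lies outside [0, 1]")

    @classmethod
    def from_d12(cls, dx: int, dy: int, dz: int) -> "RoomPosition":
        """Decode three 12-bit distance mantissas: Distance = D12 / (2**12 - 1)."""
        for d in (dx, dy, dz):
            if not 0 <= d <= D12_MAX:
                raise ValueError(f"{d:#x} is not a 12-bit value")
        return cls(dx / D12_MAX, dy / D12_MAX, dz / D12_MAX)

    def in_room(self, width: float, depth: float, height: float) -> tuple[float, float, float]:
        """Hypothetical helper: scale linearly to a rectangular room whose origin
        is the front left corner on the loudspeaker plane. Illustration only;
        the specification does not define how renderers use room dimensions."""
        return (self.x * width, self.y * depth, self.z * height)
```

For example, RoomPosition.from_d12(0x000, 0x000, 0xFFF) yields RoomPosition(x=0.0, y=0.0, z=1.0), the front left corner at ceiling height, and RoomPosition(0.5, 0.5, 1.0).in_room(20.0, 30.0, 10.0) returns (10.0, 15.0, 10.0).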
For example, (0, 0, 0) is the front left corner at zero elevation (left screen speaker), (1, 0, 0) is the front right corner at zero elevation (right screen speaker), and (0.5, 0.5, 1) is the middle of the ceiling.

Metadata that describes position relative to the room uses the unit axes and origin described above; the location along each axis is coded using the distance coding method described below.

3.2 Relative distance coding

12 Bit Distance
Throughout the bitstream, distance metadata on or within the unit cube is coded as a 12-bit distance mantissa (D12) that maps linearly into the range [0, 1]; for example, 0x000 maps to 0.0 and 0xfff maps to 1.0. If D12 is interpreted as an unsigned 12-bit integer, it is mapped to a distance value as follows:

$$\mathrm{Distance} = \frac{D_{12}}{2^{12} - 1}, \qquad 0 \le \mathrm{Distance} \le 1$$

[...]

ObjectDefinition1 Syntax                                Word Size
    [...] 1)
            ObjectDecorCoef[sb]                         8
    /* end if(PanInfoExists) */
    /* reads extra bits
       to get to byte alignment relative to the start of the frame */
    AlignBits                                           VARIABLE
    AudioDescription                                    8
    if(AudioDescription [...]

[...]

AudioDataDLC Syntax                                     Word Size
    [...]
        for(n = 0; n < NumPredRegions; n++)
            RegionLength[n]                             4
            FIROrder[n]                                 5
            IIROrder[n]                                 5
            for(m = 1; m <= FIROrder; m++)
                FIRPredictor[n][m]                      10
            for(m = 1; m <= IIROrder; m++)
                IIRPredictor[n][m]                      10
        /* Coded residual */
        for(n = 0; n < NumSubBlocks; n++)
            CodeType                                    1
            if(CodeType == 0)    /* PCM Residual */
                BitDepth                                5
                for(l = 0; l < SubBlockSize; l++)
                    Residual[n * SubBlockSize + l]      BitDepth
            else                 /* Rice/Golomb Residual */
                RiceCode                                5
                for(l = 0; l < SubBlockSize; l++)
                    Residual[n * SubBlockSize + l]      VARIABLE
    /* 96kHz Residual Data */
    if(SampleRate == 0x1)
        /* Predictor information */
        NumPredRegions                                  2
        for(n = 0; n < NumPredRegions; n++)
            RegionLength[n]                             4
            FIROrder[n]                                 5
            IIROrder[n]                                 5
            for(m = 1; m <= FIROrder; m++)
                FIRPredictor[n][m]                      10
            for(m = 1; m <= IIROrder; m++)
                IIRPredictor[n][m]                      10
        /* Coded residual */
        for(n = 0; n < NumSubBlocks; n++)
            CodeType                                    1
            if(CodeType == 0)    /* PCM Residual */
                BitDepth                                5
                for(l = 0; l < SubBlockSize; l++)
                    Residual[n * SubBlockSize + l]      BitDepth
            else                 /* Rice/Golomb Residual */
                RiceCode                                5
                for(l = 0; l < SubBlockSize; l++)
                    Residual[n * SubBlockSize + l]      VARIABLE
    /* Each Element must keep track of the number of bits read */
    AlignBits                                           VARIABLE
/* end of AudioDataDLC */

5 Bit Stream Field Description

5.1 ReadElement() Data Fields

5.1.1 ElementID Plex(8)
Each Element block starts with an ElementID. The ElementID defines the type of element and the contents of the element. Depending on the ElementID, the decoder performs different tasks. Table 1 provides a list of ElementIDs. If the ElementID is not defined in the system, the decoder shall skip the element.

Table 1 Dolby Atmos Element IDs
ElementID Name        Value    Meaning
ATMOS_FRAME           0x08     Frame Header
BED_DEFINITION1       0x10     Bed Definition Type 1
RESERVED              0x20     Reserved
OBJECT_DEFINITION1    0x40     Object Definition Type 1
RESERVED              0x80     Reserved
RESERVED              0x100    Reserved
AUDIO_DATA_DLC        0x200    Audio Data (DLC encoded)

5.1.2 ElementSize Plex(8)
ElementSize indicates the size in bytes of the entire element, not including the ElementID and ElementSize. For the Frame element, the ElementSize covers the entire audio frame (not including the ATMOSFrame ElementID and ElementSize), because all other elements are contained in it as sub-elements.

5.2 ATMOSFrame() data fields

5.2.1 ATMOSVersion 8 bits
The ATMOSVersion specifies the version of the Dolby Atmos bit stream. This field currently has the value 0x1; this document describes the protocol with ATMOSVersion = 1.

5.2.2 SampleRate 2 bits
The SampleRate code specifies the sampling rate of the audio data. All
audio tracks, channels and objects, must have the same sampling rate. The SampleRate code has the definitions shown in Table 2.

Table 2 Sample Rate code
SampleRate Code    Meaning
0x0                48000 samples per second
0x1                96000 samples per second
0x2                RESERVED
0x3                RESERVED

5.2.3 BitDepth 2 bits
The BitDepth code specifies the bit depth of the object audio data. All audio tracks must have the same bit depth. The BitDepth code has the meanings specified in Table 3. Only 24 bits per audio sample are currently supported.

Table 3 Bit Depth Code
BitDepth Code      Meaning
0x0                RESERVED
0x1                24 bits per audio sample
0x2                RESERVED
0x3                RESERVED

5.2.4 FrameRate 4 bits
The FrameRate code specifies the audio frame rate. The FrameRate code has the meanings specified in Table 4.

Table 4 Frame Rate Code
FrameRate Code     Meaning
0x0                24 frames per second
0x1                25 frames per second
0x2                30 frames per second
0x3                48 frames per second
0x4                50 frames per second
0x5                60 frames per second
0x6                96 frames per second
0x7                100 frames per second
0x8                120 frames per second
0x9-0xF            RESERVED

The FrameRate code also determines the sample count (SampleCount) contained in each audio asset, as specified in Table 5.

Table 5 Sample Count versus Frame Rate Code and Sample Rate
FrameRate Code     Sample Count (48 kHz)    Sample Count (96 kHz)
0x0                2000                     4000
0x1                1920                     3840
0x2                1600                     3200
0x3                1000                     2000
0x4                960                      1920
0x5                800                      1600
0x6                500                      1000
0x7                480                      960
0x8                400                      800
0x9-0xF            RESERVED                 RESERVED

5.2.5 MaxRendered Plex(8)
The MaxRendered code specifies the maximum number of audio assets that will be rendered during playback of the Dolby Atmos frame in theaters with the optimal target playback configuration. For example, for a stream with a 9.1-channel bed and 118 objects, MaxRendered would be set to 128 (10 bed channels plus 118 objects).

5.2.6 SubElementCount Plex(8)
The SubElementCount code is the number of elements contained in the current element.

5.3 BedDefinition1() Data Fields

5.3.1 MetaID Plex(8)
MetaID is the unique ID that helps the system track metadata information between audio frames.

5.3.2 ChannelCount Plex(4)
The ChannelCount is the number of channels that make up the bed.

5.3.3 ChannelID Plex(4)
The ChannelID code specifies the known channel locations. Table 6 provides a list of ChannelIDs and the associated loudspeaker names.

Table 6 Channel IDs
ChannelID Code     Meaning
0x0                Left Screen Speaker
0x1                Right Screen Speaker
0x2                Center Screen Speaker
0x3                LFE
0x4                Reserved
0x5                Reserved
0x6                Left Side Surround (7.1)
0x7                Right Side Surround (7.1)
0x8                Left Rear Surround (7.1)
0x9                Right Rear Surround (7.1)
0xA                Left Top Surround (9.1)
0xB                Right Top Surround (9.1)
otherwise          Reserved

5.3.4 AudioDataID Plex(8)
The AudioDataID code is a unique identifier for each of the raw mono audio assets carried in the bit stream. An AudioDataID of NULL (0) indicates no audio asset.

5.4 ObjectDefinition1() Data Fields

5.4.1 NumPanSubBlocks Informative
The NumPanSubBlocks specifies the division of the
frame into sub-frames of approximately 5 ms, as specified by Table 7.

Table 7 Number of Pan Sub Blocks and Sub Block Size versus Sample Rate and Frame Rate
Sample Rate   Frame Rate (s^-1)   NumPanSubBlocks   PanSubBlockSize (samples)   Duration (ms)
48 kHz        24                  8                 250                         5.2
48 kHz        25                  8                 240                         5.0
48 kHz        30                  8                 200                         4.2
48 kHz        48                  4                 250                         5.2
48 kHz        50                  4                 240                         5.0
48 kHz        60                  4                 200                         4.2
48 kHz        96                  2                 250                         5.2
48 kHz        100                 2                 240                         5.0
48 kHz        120                 2                 200                         4.2
96 kHz        24                  8                 500                         5.2
96 kHz        25                  8                 480                         5.0
96 kHz        30                  8                 400                         4.2
96 kHz        48                  4                 500                         5.2
96 kHz        50                  4                 480                         5.0
96 kHz        60                  4                 400                         4.2
96 kHz        96                  2                 500                         5.2
96 kHz        100                 2                 480                         5.0
96 kHz        120                 2                 400                         4.2

5.4.2 PanInfoExists 1 bit
The PanInfoExists bit specifies whether the panning information is updated at each sub-block boundary. The panning information always exists for the first sub-block of a frame. If the PanInfoExists bit is set to zero, the decoder should assume that the panning information is repeated from the previous sub-block.

5.4.3 ObjectPosX[sb], ObjectPosY[sb], ObjectPosZ[sb] 1