Copyright 2016 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
3 Barker Avenue, White Plains, NY 10601
(914) 761-1100

Approved August 23, 2016

The attached document is a Registered Disclosure Document prepared by the proponent identified below. It has been examined by the appropriate SMPTE Technology Committee and is believed to contain adequate information to satisfy the objectives defined in the Scope, and to be technically consistent. This document is NOT a Standard, Recommended Practice or Engineering Guideline, and does NOT imply a finding or representation of the Society.

Every attempt has been made to ensure that the information contained in this document is accurate. Errors in this document should be reported to the proponent identified below, with a copy to eng@smpte.org.

All other inquiries in respect of this document, including inquiries as to intellectual property requirements that may be attached to use of the disclosed technology, should be addressed to the proponent identified below.

Proponent contact information:
Scott Smyers
DTS, Inc.
130 Knowles Drive, Suite B
Los Gatos, CA 95032
Email:

Page 1 of 31 pages

SMPTE REGISTERED DISCLOSURE DOCUMENT
MDA Program Specification
SMPTE RDD 42:2016

Table of Contents

Conventions
Introduction (Informative)
1 Scope
2 Normative References
3 Timeline
4 Audio Object
5 Coordinate System
6 Object Model
7 URI Constants
8 Basic Data Types
9 Reference Renderer
Bibliography (Informative)

Conventions

All sections are normative, unless otherwise indicated.

Pseudo-code and property names use font style courier new.

The expressions MAY, NEED NOT, SHALL, SHALL NOT, SHOULD, and SHOULD NOT indicate normative behavior as specified in ISO/IEC Directives, Part 2.

VERBAL FORM    SEMANTICS
MAY            It is allowed
NEED NOT       It is not required that
SHALL          Is required that
SHALL NOT      Is required to be not
SHOULD         It is recommended that
SHOULD NOT     It is not recommended that

Introduction (Informative)

The MDA Program, or simply Program hereafter, is a self-contained object-based audio program. As such it consists of a collection of audio objects, each combining an audio waveform with metadata. The metadata indicates, for instance, when the object occurs on the Program timeline or where it is positioned within the soundfield. It is used to control the mapping of the audio object waveform to output loudspeakers at playback.

Figure 1 Sample Program. Only a subset of all Fragment properties is shown.
[Diagram: the Dialog, FX1, FX2 and LFE objects laid out on the Program timeline with marks at t = 0, T1, T2 and T3. ObjectFragments with ID = 1, 2 and 3 and LFEFragments with ID = 4 reference the assets with URI AAA, BBB, CCC and DDD, respectively. The FX1 and FX2 fragments are combined in a Group with Id = 1.]

Figure 1 depicts a Program that consists of 4 audio objects: Dialog, FX1, FX2 and LFE. While the LFE
object exists from t = 0 to t = T3, the Dialog object exists only from t = 0 to t = T2, and the FX1 and FX2 objects from t = T1 to t = T3. The Program object model allows for any number of audio objects to overlap at any point in time, and an audio object can be as short as a sample or as long as the program. Each audio object is assigned an identifier that is unique within the scope of the Program.

The metadata associated with each object is divided into Fragments, each corresponding to a period of time during which the metadata is static. To simplify the Program structure, Fragment boundaries are aligned. Two kinds of Fragments, and hence audio objects, are defined:

An ObjectFragment corresponds to an object associated with
a spatial locus. For instance, the position of the FX1 and FX2 objects in Figure 1 changes from t = T1 to t = T3. This spatial locus is used to determine the loudspeakers that will output the waveform associated with the object. It is also possible to instruct that an object waveform be routed through a specific loudspeaker, if present.

An LFEFragment corresponds to an object whose waveform is intended for routing to a Low Frequency Effect (LFE) channel, and is therefore not associated with a spatial location.

Each Fragment references a sequence of audio samples, i.e. the object waveform, within an underlying asset identified using a Uniform Resource Identifier (URI). Depending on the application, the asset can be carried alongside the Fragment metadata or be remote. Multiple Fragments can reference the same audio samples within a single asset.

As illustrated by the FX1 and FX2 objects, Fragments can be combined into a Group, which logically groups the two ObjectFragments and contains metadata common to them. The object model also allows Fragments to be combined into a Switch, which indicates that only one of the Fragments is rendered at any given time. Groups and Switches are recursive entities that can themselves contain Groups and Switches. From the perspective of the object model, Fragments, Groups and Switches are all subclasses of the Entity class, which represents arbitrary entities of the Program timeline.

In order to specify unambiguously how object metadata is used to
map object waveforms to loudspeaker outputs, i.e. how objects are rendered, a Reference Renderer is fully specified in Section 9. The Reference Renderer uses the Vector Base Amplitude Panning (VBAP) formalism, which was introduced by Pulkki et al. and has since been extensively studied. VBAP is an extension of the familiar tangent law for pair-wise panning to three-dimensional speaker configurations. Specifically, given a loudspeaker triplet on the unit sphere and a point source object located within the spherical triangle defined by the loudspeakers, the contribution of the object waveform to each of the loudspeakers is determined by the coordinates of the object within the linear basis formed by the three loudspeakers (see Figure 2). Objects with a finite extent can be rendered as a collection of point sources. More complex loudspeaker configurations can be decomposed into multiple speaker triplets.
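Note: The following Python sketch illustrates the triplet computation described above. It is not the Reference Renderer of Section 9. The function names, the tolerance and the normalization of the gains to unit total power are assumptions made for this illustration only.

```python
import math


def _det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def vbap_gains(l1, l2, l3, p, tol=1e-9):
    """Gains for a point source p rendered on the loudspeaker triplet (l1, l2, l3).

    All arguments are 3-element unit vectors. The gains are the coordinates of
    p in the basis formed by the loudspeaker vectors, p = g1*l1 + g2*l2 + g3*l3,
    obtained here with Cramer's rule.
    """
    d = _det3(l1, l2, l3)
    if abs(d) < tol:
        raise ValueError("loudspeaker vectors do not form a basis")
    g = [_det3(p, l2, l3) / d, _det3(l1, p, l3) / d, _det3(l1, l2, p) / d]
    # A negative coordinate means p lies outside the spherical triangle.
    if min(g) < -tol:
        raise ValueError("source lies outside the loudspeaker triplet")
    g = [max(x, 0.0) for x in g]
    # Assumption of this sketch: scale the gains so that their squares sum to 1.
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]
```

For example, with hypothetical loudspeakers at (1, 0, 0), (0, 1, 0) and (0, 0, 1), a source halfway between the first two receives equal gains on those two loudspeakers and zero gain on the third.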
Figure 2 Rendering audio objects using VBAP. The shaded areas show the relative output power at each of the speakers and are determined by expressing the object vector in the basis formed by the three loudspeakers.
[Diagram labels: S1, S2, S3, O.]

To support a range of applications within its stated scope, the Program object model is designed to be flexible, e.g. the number of simultaneous Fragments is not limited, and offers multiple extension points. Applications are therefore expected to constrain or extend the object model to suit their specific requirements. Similarly, this specification does not define a concrete representation of the Program, and mappings to bitstream structures and transmission mechanisms are left to other documents.

1 Scope

This document specifies the object model and reference renderer for the MDA Program. The MDA
Program is a self-contained representation of an object-based soundfield designed for linear content. It is specified independently of transport mechanisms.

2 Normative References

Internet Engineering Task Force (IETF) (January 2005). RFC 3986 Uniform Resource Identifier (URI): Generic Syntax

ISO/IEC Directives, Part 2. Edition 6.0, 2011-04

OMG, Object Constraint Language (OCL), Version 2.3.1, http://www.omg.org/spec/OCL/2.3.1/PDF

OMG, Unified Modeling Language (OMG UML), Superstructure, Version 2.4.1, http://www.omg.org/spec/UML/2.4.1/Superstructure/PDF

3 Timeline

A Program defines a sample-accurate timeline onto which Entity instances (see Section 6.6) are placed. Positions on the timeline SHALL be expressed as integer multiples of the inverse of the Program audio sample rate (see Section 6.5.2), i.e. as an integer number of audio samples. The origin of the timeline (t=0) is arbitrary.

4 Audio Object

An Audio Object is the sequence of all Fragment instances (see Section 6.6) with the same id, ordered as they appear on the timeline. Two Fragment instances belong to the same Audio Object if and only if they have the same id value.

5 Coordinate System

This specification uses the Cartesian coordinate system illustrated in Figure 3 and specified as follows:

- The listener is located at the origin O=(0,0,0), facing the front of the room;
- The positive z-axis is perpendicular to the floor of the room, and directed to the ceiling;
- The positive y-axis is directed towards the front of the room;
- The positive x-axis is directed to the right of the listener;
- Loudspeakers lie on the unit sphere S; and
- The unit circle in the x-y plane is the locus of traditional horizontal two-dimensional loudspeaker configurations.

Figure 3 Program Coordinate System
[Diagram labels: x, y and z axes; Visual presentation; Listener; Locus of two-dimensional speaker configurations.]
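Note: The axis conventions of this section can be exercised with the short Python sketch below. It converts a radius, azimuth and elevation to Cartesian coordinates. It assumes that azimuth is measured from the positive y-axis (front) toward the positive x-axis (right) and that elevation is measured from the x-y plane toward the positive z-axis. The function names and the use of degrees are choices made for this illustration and are not defined by this specification.

```python
import math


def spherical_to_cartesian(rho, azimuth_deg, elevation_deg):
    """Convert (radius, azimuth, elevation) to (x, y, z).

    Assumed conventions: azimuth 0 points to the front (+y), positive azimuth
    turns toward the right (+x), positive elevation points to the ceiling (+z).
    """
    theta = math.radians(azimuth_deg)
    phi = math.radians(elevation_deg)
    x = rho * math.sin(theta) * math.cos(phi)
    y = rho * math.cos(theta) * math.cos(phi)
    z = rho * math.sin(phi)
    return (x, y, z)


def on_unit_sphere(position, tol=1e-9):
    """True if a position, e.g. a loudspeaker, lies on the unit sphere S."""
    return math.isclose(math.sqrt(sum(c * c for c in position)), 1.0, abs_tol=tol)
```

Under these assumptions a radius of 1 with zero azimuth and zero elevation maps to the point directly in front of the listener, (0, 1, 0), and any position with radius 1 lies on the unit sphere.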
For convenience, the following modified spherical coordinate system is also defined.

$x = \rho \sin\theta \cos\phi$
$y = \rho \cos\theta \cos\phi$
$z = \rho \sin\phi$

The symbols $\rho$ (rho), $\theta$ (theta) and $\phi$ (phi) denote the radius, azimuth and elevation of the object, respectively.

6 Object Model

6.1 General

The Program object model is specified using a combination
of prose, UML as specified in OMG Unified Modeling Language (UML), and OCL as specified in OMG Object Constraint Language (OCL). The prose shall take precedence over the UML and OCL notations in case of conflict.

If an optional property is absent, its value shall be unspecified unless a default value is provided, in which case its value shall be the default value.

Values that are identified as reserved SHALL NOT be used in this version of the specification and, if present in a Program, SHALL be ignored by implementations conforming to this version of the specification.

The notation #SymbolName refers to the URI constant with symbol SymbolName.

6.2 Namespace

UML elements defined herein SHALL be members of the MDA Package with the namespace specified in Table 1.
Table 1 MDA Object Model Namespace

Symbol     URI
mdaroot    http://mdaif.org
mdacore    /core/1.0/

6.3 Versioning

The namespace specified in Section 6.2 SHALL only be associated with Program instances that conform to this specification. Program instances using specifications that modify the latter, including future versions of this specification, SHALL use a different namespace.

6.4 Program

6.4.1 General

Figure 4 Program Model
[Diagram: a Program has exactly one header of type Header and any number of entities of type Entity.]

A Program instance is a single complete Program, which contains all information necessary for reproduction.

6.4.2 header

The header property SHALL contain information applicable to the Program as a whole. Two Program instances SHALL NOT have identical header.programURI values unless the two instances are identical.

6.4.3 entities

The entities property contains all entities associated with the Program.

No two Entity instances with the same id property value SHALL overlap on the timeline.

All Entity instances with identical id property values SHALL be of the same concrete subclass.

The start or end offset of an Entity instance SHALL NOT belong to the open interval bounded by the start and end offsets of another Entity.
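Note: The three constraints of Section 6.4.3 can be checked mechanically. The Python sketch below is illustrative only. The Entity record, its field names and the check_entities function are hypothetical, and the sketch assumes a flat list of entities whose offsets are integer sample counts, with an interval treated as not overlapping another that starts exactly where it ends.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    kind: str   # name of the concrete subclass, e.g. "ObjectFragment"
    id: int
    start: int  # start offset on the timeline, in samples
    end: int    # end offset on the timeline, in samples


def check_entities(entities):
    """Return a list of violations of the Section 6.4.3 constraints."""
    problems = []
    for a, b in combinations(entities, 2):
        if a.id == b.id:
            # Same id: same concrete subclass, and no overlap on the timeline.
            if a.kind != b.kind:
                problems.append(f"id {a.id}: mixed subclasses {a.kind}/{b.kind}")
            if a.start < b.end and b.start < a.end:
                problems.append(f"id {a.id}: instances overlap")
        # Boundary alignment: no offset strictly inside another Entity.
        for x, y in ((a, b), (b, a)):
            if any(y.start < t < y.end for t in (x.start, x.end)):
                problems.append(f"an offset of id {x.id} falls inside id {y.id}")
    return problems
```

A layout in which every Entity starts and ends on a shared set of boundaries, such as t = 0, T1, T2 and T3, yields an empty list, while an Entity that starts halfway through another is reported.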
6.5 Header

Figure 5 Header Model
[Diagram: Header has properties programURI : URI [1], sampleRate : URI [1] and constraintSets : URI [0..*], and any number of extensions of type Extension.]

6.5.1 programURI

The programURI property uniquely identifies the Program instance. The programURI property shall consist of no more than 64 characters, with the meaning of character specified in IETF RFC 3986.

6.5.2 sampleRate

The sampleRate property indicates the audio sampling rate of the Program.

Note: Section 6 defines common values for audio sampling rates.

6.5.3 constraintSets

The Program object model MAY be constrained and extended by multiple applications, each potentially defining additional metadata properties and applying a set of constraints beyond those specified herein. Implementations can use the constraintSets property to rapidly determine whether they are capable of processing a Program. Each item of the constraintSets property SHALL be unambiguously associated with a collection of normative provisions (beyond those specified herein) to which the Program conforms. No two items of the constraintSets property SHALL be equal.

6.5.4 extensions

The extensions property allows application-specific metadata (contained in a concrete subclass of the Extension class) to be associated with the Program.
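Note: The Header constraints above can be illustrated with the Python sketch below. It is not a normative representation of the Header. The class layout, the field names and the example URIs are hypothetical, and the length check counts Python string characters as an approximation of the character definition in IETF RFC 3986.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    program_uri: str                 # programURI
    sample_rate: str                 # sampleRate, carried as a URI
    constraint_sets: list = field(default_factory=list)  # constraintSets URIs

    def problems(self):
        """Return a list of violations of Sections 6.5.1 and 6.5.3."""
        found = []
        if len(self.program_uri) > 64:
            found.append("programURI exceeds 64 characters")
        if len(set(self.constraint_sets)) != len(self.constraint_sets):
            found.append("constraintSets contains equal items")
        return found
```

For example, a Header built with the hypothetical values "urn:example:program:1" and "urn:example:rate:48000" and a single constraint set reports no problems, while repeating the same constraint set URI twice is reported.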
6.6 Entity

6.6.1 General

Figure 6 Positioning an Entity instance on the Program Timeline
[Diagram labels: Entity, e.g. ObjectFragment, Group, etc.; Entity duration; Program timeline; Entity start offset; Entity end offset; t = 0.]

As illustrated in Figure 6, each Entity instance SHALL be associated with a start and end offset on the timeline, relative to the origin of the timeline. The end offset SHALL be larger than the start offset. The duration of the Entity instance is the difference between the end and start offsets.

Figure 7 Entity Model
[Diagram: Entity has property id : integer [1] and any number of extensions of type Extension; Group, Switch and Fragment are subclasses of Entity.]

This specification defines a number of concrete subclasses of the Entity class, and future revisions MAY define additional ones.

6.6.2 id

The id property allows multiple related Entity instances to be uniquely linked within the scope of the Program. The value of the id property SHALL belong to the range [0, 2^32-1].

6.6.3 extensions

The extensions property allows application-specific metadata (contained in concrete subclasses of the Extension class defined by the application) to be associated with an Entity.
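Note: The Python sketch below illustrates the offset and id rules of Sections 6.6.1 and 6.6.2, and the grouping of Fragment instances into Audio Objects described in Section 4. It reads the id range as 0 to 2^32 - 1 inclusive. The function names and the (id, start, end) tuple layout are hypothetical and serve this illustration only.

```python
from collections import defaultdict

MAX_ID = 2**32 - 1  # upper bound of the id range, as read from Section 6.6.2


def entity_duration(entity_id, start, end):
    """Validate an Entity and return its duration in samples."""
    if not 0 <= entity_id <= MAX_ID:
        raise ValueError("id is outside the range 0 to 2^32 - 1")
    if end <= start:
        raise ValueError("end offset must be larger than start offset")
    return end - start


def audio_objects(fragments):
    """Group (id, start, end) Fragment tuples into Audio Objects.

    Each Audio Object is the list of Fragments sharing one id, ordered by
    start offset, i.e. as they appear on the timeline.
    """
    objects = defaultdict(list)
    for frag in sorted(fragments, key=lambda f: f[1]):
        objects[frag[0]].append(frag)
    return dict(objects)
```

With integer sample offsets, a Fragment that starts at sample 0 and ends at sample 100 has a duration of 100 samples, and two Fragments with the same id are returned as a single Audio Object.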
6.7 Group

6.7.1 General

A Group instance is a logical group of Entity instances, all of which are intended to be rendered.

Figure 8 Group Model

The start offset of a Group instance SHALL be the smallest start offset of all Entity instances it contains. The end offset of a Group instance SHALL be the largest end offset of all Entity instances it contains.

Note: Section 9.3.1 specifies that all Entity instances within a Group instance are rendered.

6.8 Switch

6.8.1 General

Figure 9 Switch Model

A Switch instance is a logical group of Entity instances, only one of which is intended to be rendered. The start offset of a S