Recommendation ITU-R BS.2076-0
(06/2015)

Audio Definition Model

BS Series
Broadcasting service (sound)

Foreword

The role of the Radiocommunication Sector is to ensure the rational, equitable, efficient and economical use of the radio-frequency spectrum by all radiocommunication services, including satellite services, and carry out studies without limit of frequency range on the basis of which Recommendations are adopted.

The regulatory and policy functions of the Radiocommunication Sector are performed by World and Regional Radiocommunication Conferences and Radiocommunication Assemblies supported by Study Groups.

Policy on Intellectual Property Right (IPR)

ITU-R policy on IPR is described in the Common Patent Policy for ITU-T/ITU-R/ISO/IEC referenced in Annex 1 of Resolution ITU-R 1. Forms to be used for the submission of patent statements and licensing declarations by patent holders are available from http://www.itu.int/ITU-R/go/patents/en, where the Guidelines for Implementation of the Common Patent Policy for ITU-T/ITU-R/ISO/IEC and the ITU-R patent information database can also be found.

Series of ITU-R Recommendations
(Also available online at http://www.itu.int/publ/R-REC/en)

Series  Title
BO      Satellite delivery
BR      Recording for production, archival and play-out; film for television
BS      Broadcasting service (sound)
BT      Broadcasting service (television)
F       Fixed service
M       Mobile, radiodetermination, amateur and related satellite services
P       Radiowave propagation
RA      Radio astronomy
RS      Remote sensing systems
S       Fixed-satellite service
SA      Space applications and meteorology
SF      Frequency sharing and coordination between fixed-satellite and fixed service systems
SM      Spectrum management
SNG     Satellite news gathering
TF      Time signals and frequency standards emissions
V       Vocabulary and related subjects

Note: This ITU-R Recommendation was approved in English under the procedure detailed in Resolution ITU-R 1.

Electronic Publication, Geneva, 2015

© ITU 2015
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without written
permission of ITU.

RECOMMENDATION ITU-R BS.2076-0

Audio Definition Model

(2015)

Scope

This Recommendation describes the structure of a metadata model that allows the format and content of audio files to be reliably described. This model, called the Audio Definition Model (ADM), specifies how XML metadata can be generated to provide the definitions of tracks in an audio file.

Keywords

ADM, Audio Definition Model, BWF, Metadata, Wave-file, WAVE, object-based, channel-based, scene-based, renderer, XML, XSD, format, immersive

The ITU Radiocommunication Assembly,

considering
a) that Recommendation ITU-R BS.2051, "Advanced sound system for programme production", highlights the need for a file format that is capable of dealing with the requirements for future audio systems;

b) that Recommendation ITU-R BS.1909, "Performance requirements for an advanced multichannel stereophonic sound system for use with or without accompanying picture", outlines the requirements for an advanced multichannel stereophonic sound system;

c) that it is desirable that there is a single open standard for a metadata model for defining audio content, which file and streaming formats could either adopt or become compatible with by means of suitable interfacing,

recommends

for the following use cases:
– applications requiring a generic metadata model for, and a formalized description of, custom/proprietary audio formats and content (including codecs);
– generating and parsing audio metadata with general-purpose tools, such as text editors;
– an organization's internal production developments, where multi-purpose metadata needs to be added;
– where a human-readable and hand-editable file for describing audio configurations (such as describing a mixing studio channel configuration) in a consistent and translatable format is needed,

to use the Audio Definition Model (ADM) described in Annex 1 for metadata to describe audio formats used in programme production and international exchange.

Annex 1

Audio Definition Model

1 Introduction

Audio for broadcasting and cinema is evolving
towards an immersive and interactive experience, which requires the use of more flexible audio formats. A fixed channel-based approach is not sufficient to encompass these developments, and so combinations of channel-, object- and scene-based formats are being developed. Report ITU-R BS.2266 [1] and Recommendations ITU-R BS.1909 [2] and ITU-R BS.2051 [3] highlight these developments and the need for the production chain to accommodate them.

The central requirement for allowing all the different types of audio to be distributed, whether by file or by streaming, is that, whatever file/stream format is used, metadata should co-exist to fully describe the audio. Each individual track within a file or stream should be able to be correctly rendered, processed or distributed according to the accompanying metadata. To ensure compatibility across all systems, the Audio Definition Model is an open standard that will make this possible.

2 Background

The purpose of this model is to formalise the description of audio; it is not a format for carrying audio. This distinction will help in the understanding of the model.

2.1 Cooking Analogy

To help explain what the ADM actually does, it may be useful to
consider a cooking analogy. The recipe for a cake will contain a list of ingredients, instructions on how to combine those ingredients, and how to bake the cake. The ADM is like a set of rules for writing the list of ingredients; it gives a clear description of each item, for example: 2 eggs, 400g flour, 200g butter, 200g sugar. The ADM does not provide the instructions for combining the ingredients; it does not tell you how to do the mixing or baking, as in the audio world that is what the renderer does.

The ADM is in general compatible with wave-file-based formats such as ITU-R BS.1352, the BWF as defined by the EBU in [4], and other wave-based formats that support the use of the needed additional chunks. When used in the context of a BWF file according to [4], the <chna> chunk of the BWF file is like the bar code on the packet of each of the ingredients; this code allows us to look up the model's description of each item. The bag containing the actual ingredients is like the <data> chunk of the BWF file that contains the audio samples. From a BWF-file point of view, we would look at the bar codes on each ingredient in our bag, and use them to look up the description of each item in the bag. Each description follows the structure of the model. There might be an ingredient such as breadcrumbs that could be divided into its own components (flour, yeast, etc.), which is like having an audio object containing multiple channels (e.g. stereo containing left and right).

2.2 Brief overview
This model will initially use XML as its specification language, though it could be mapped to other languages, such as JSON (JavaScript Object Notation), if required. When it is used with BWF files according to [4], the XML can be embedded in the <axml> chunk of the file. The model is divided into two sections: the content part and the format part. The content part describes what is contained in the audio, so it will describe things like the language of any dialogue, the loudness, and so on. The format part describes the technical nature of the audio so that it can be decoded or rendered correctly. Some of the format elements may be defined before we have any audio signals, whereas the content parts can usually only be completed after the signals have been generated.

While this model is based around a wave-file-based format, it is a more general model. However, examples are given using BWF according to the definition in [4], as this explains more clearly how the model works. It is also expected that the model's parameters will be added to in subsequent versions of this specification to reflect the progress in audio technology.

3 Description of the model

The overall diagram of the model is given in Fig. 1.
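The split between the content part and the format part can be sketched as a minimal XML fragment. The <adm> root element, the ID values and the names below are illustrative placeholders, not normative ADM syntax; the element and attribute names themselves follow the model elements introduced in this Recommendation.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical ADM-style fragment illustrating the split between
# the content part (what the audio is) and the format part (how to decode
# or render it). Root element and IDs are placeholders for illustration.
ADM_XML = """
<adm>
  <!-- Content part: describes what is contained in the audio -->
  <audioContent audioContentID="ACO_1001" audioContentName="Dialogue"
                audioContentLanguage="en"/>
  <!-- Format part: describes the technical nature of the audio -->
  <audioChannelFormat audioChannelFormatID="AC_00010001"
                      audioChannelFormatName="FrontLeft"
                      typeLabel="DirectSpeakers"/>
</adm>
"""

root = ET.fromstring(ADM_XML)
content = root.find("audioContent")
fmt = root.find("audioChannelFormat")
print(content.get("audioContentLanguage"))  # language lives in the content part
print(fmt.get("typeLabel"))                 # channel type lives in the format part
```

Note how the content element could only be filled in once the programme exists, whereas the format element could be written before any audio is generated.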
The figure shows how the elements relate to each other and illustrates the split between the content and format parts. It also shows the <chna> chunk of a BWF file according to [4] and how it connects the tracks in the file to the model.

Where a BWF file according to [4] contains a number of audio tracks, it is necessary to know what each track is. The <chna> chunk contains a list of entries corresponding to each track in the file; hence, for a 6-track file, the list is at least 6 entries long. For each track there is an audioTrackFormatID and an audioTrackUID (notice the additional 'U', which stands for "unique"). The reason the list could be longer than the number of tracks is that a single track may have different definitions at different times, so it will require multiple audioTrackUIDs and references. The audioTrackFormatID is used to look up the definition of the format of that particular track. audioTrackFormatIDs are not unique; for example, if a file contains 5 stereo pairs, there will be 5 identical audioTrackFormatIDs describing the left channels and 5 describing the right channels, so only two different audioTrackFormatIDs need to be defined. audioTrackUIDs, however, are unique (hence the 'U'), and they are there to uniquely identify the track.
This use of IDs means that the tracks can be ordered in any way in the file; their IDs reveal what those tracks are.

FIGURE 1
Overall UML model

(Figure: a UML diagram of the model. The content part comprises audioProgramme, audioContent and audioObject; the format part comprises audioPackFormat, audioChannelFormat with its audioBlockFormats, audioStreamFormat and audioTrackFormat. audioTrackUID elements, carrying sampleRate, bitDepth, audioTrackFormatIDRef and audioPackFormatIDRef, link the model to the <chna> chunk of the BWF file, which lists, for each track number, the audioTrackUID, audioTrackFormatID and audioPackFormatID. The example in the figure is:

TrackNum   audioTrackUID   audioTrackFormatID   audioPackFormatID
1          ATU_00000001    AT_00010001_01       AP_00010001
2          ATU_00000002    AT_00031001_01       AP_00031001
2          ATU_00000003    AT_00031002_01       AP_00031002)
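The <chna> track-to-ID mapping can be sketched in code. This is an illustration of the ID relationships only, not a parser for the binary chunk; the entries are the example values from Fig. 1.

```python
# Illustrative sketch of the <chna> track-to-ID mapping, using the example
# entries from Fig. 1. A real BWF reader would parse the binary chunk; here
# the entries are written out directly.
chna_entries = [
    # (track number, audioTrackUID, audioTrackFormatID, audioPackFormatID)
    (1, "ATU_00000001", "AT_00010001_01", "AP_00010001"),
    (2, "ATU_00000002", "AT_00031001_01", "AP_00031001"),
    (2, "ATU_00000003", "AT_00031002_01", "AP_00031002"),
]

def definitions_for_track(track_num):
    """Return every (audioTrackUID, audioTrackFormatID) pair for one track.

    A track may appear more than once when it has different definitions at
    different times, which is why the list can be longer than the number of
    tracks in the file.
    """
    return [(uid, fmt) for num, uid, fmt, _pack in chna_entries if num == track_num]

# audioTrackUIDs are unique across the whole list ...
uids = [uid for _num, uid, _fmt, _pack in chna_entries]
assert len(uids) == len(set(uids))

# ... but track 2 carries two definitions over time.
print(definitions_for_track(2))
```

The lookup by track number returning more than one pair for track 2 is exactly the "single track, multiple definitions over time" case described above.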
3.1 Format

The audioTrackFormatID answers the question "What is the format of this track?" The audioTrackFormat will also contain an audioStreamFormatID, which allows identification of the combination of the audioTrackFormat and audioStreamFormat. An audioStreamFormat describes a decodable signal and is made up of one or more audioTrackFormats. Hence, the combination of audioStreamFormat and audioTrackFormat reveals whether the signal has to be decoded or not.

The next stage is to find out what type of audio the stream is; for example, it may be a conventional channel (e.g. front left), an audio object (e.g. something named "guitar" positioned at the front), an HOA (Higher Order Ambisonics) component (e.g. X) or a group of channels. Inside audioStreamFormat there will be a reference to either an audioChannelFormat or an audioPackFormat that will describe the audio stream; there will only be one of these references. If audioStreamFormat contains an audioChannelFormat reference (i.e. an audioChannelFormatIDRef), then the stream carries a single channel, which may be one of several different types of audioChannelFormat. An audioChannelFormat is a description of a single waveform of audio.
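The either/or reference rule just described can be sketched as a small data structure. The field names mirror the element names in the text; the class, the validation and the ID values are illustrative only, not part of the model's specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AudioStreamFormat:
    """Sketch of audioStreamFormat's either/or reference.

    A stream describes a decodable signal and refers to EITHER a single
    audioChannelFormat OR an audioPackFormat (a group of channels),
    never both and never neither.
    """
    audioStreamFormatID: str
    audioChannelFormatIDRef: Optional[str] = None
    audioPackFormatIDRef: Optional[str] = None

    def __post_init__(self):
        refs = [self.audioChannelFormatIDRef, self.audioPackFormatIDRef]
        if sum(r is not None for r in refs) != 1:
            raise ValueError("audioStreamFormat needs exactly one of "
                             "audioChannelFormatIDRef or audioPackFormatIDRef")

# A stream carrying a single front-left channel (IDs are illustrative).
mono = AudioStreamFormat("AS_00010001", audioChannelFormatIDRef="AC_00010001")
print(mono.audioChannelFormatIDRef)
```

Attempting to construct a stream with both references, or with neither, raises an error, which is the "only one of these references" constraint in executable form.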
In audioChannelFormat there is a typeDefinition attribute, which is used to define what the type of channel is. The typeDefinition attribute can be set to DirectSpeakers, HOA, Matrix, Objects or Binaural. For each of those types there is a different set of sub-elements to specify the static parameters associated with that type of audioChannelFormat. For example, the DirectSpeakers type of channel has the sub-element speakerLabel for allocating a loudspeaker to the channel.

To allow audioChannelFormat to describe dynamic channels (i.e. channels that change in some way over time), it uses audioBlockFormat to divide the channel along the time axis. The audioBlockFormat element will contain a start time (relative to the start time of the parent audioObject) and a duration. Within audioBlockFormat there are time-dependent parameters that describe the channel, which depend upon the audioChannelFormat type. For example, the Objects type of channel has the sub-elements azimuth, elevation and distance to describe the location of the sound. The number and duration of audioBlockFormats is not limited; there could be an audioBlockFormat for every sample if something moves rapidly, though that might be a bit excessive!
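A dynamic Objects-type channel of this kind might look like the following sketch. The element and attribute names follow the model as described above, but the IDs, names, times and position values are made up for illustration, and a real ADM document wraps such elements in additional structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of an Objects-type audioChannelFormat whose position
# changes over time: two audioBlockFormats divide the channel along the
# time axis, each carrying azimuth/elevation/distance for its interval.
CHANNEL_XML = """
<audioChannelFormat audioChannelFormatID="AC_00031001"
                    audioChannelFormatName="Guitar" typeDefinition="Objects">
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000001"
                    rtime="00:00:00.00000" duration="00:00:05.00000">
    <azimuth>-30.0</azimuth><elevation>0.0</elevation><distance>1.0</distance>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000002"
                    rtime="00:00:05.00000" duration="00:00:05.00000">
    <azimuth>30.0</azimuth><elevation>0.0</elevation><distance>1.0</distance>
  </audioBlockFormat>
</audioChannelFormat>
"""

channel = ET.fromstring(CHANNEL_XML)
blocks = channel.findall("audioBlockFormat")
print(channel.get("typeDefinition"), len(blocks))
for b in blocks:
    print(b.get("rtime"), b.find("azimuth").text)
```

A static channel would be the degenerate case of this sketch: a single audioBlockFormat holding the channel's unchanging parameters.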
At least one audioBlockFormat is required, so a static channel will have one audioBlockFormat containing the channel's parameters.

If audioStreamFormat refers to an audioPackFormat, it describes a group of channels. An audioPackFormat element groups together one or more audioChannelFormats that belong together (e.g. a stereo pair). This is important when rendering the audio, as channels within the group may need to interact with each other. The reference to an audioPackFormat containing multiple audioCh
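An audioPackFormat grouping of the kind described above, e.g. a stereo pair, can be sketched as follows; the IDs, the name and the use of <audioChannelFormatIDRef> child elements are illustrative placeholders rather than normative syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical audioPackFormat grouping a stereo pair: the pack references
# the two audioChannelFormats that belong together and may need to be
# rendered with knowledge of each other.
PACK_XML = """
<audioPackFormat audioPackFormatID="AP_00010002"
                 audioPackFormatName="Stereo" typeDefinition="DirectSpeakers">
  <audioChannelFormatIDRef>AC_00010001</audioChannelFormatIDRef>
  <audioChannelFormatIDRef>AC_00010002</audioChannelFormatIDRef>
</audioPackFormat>
"""

pack = ET.fromstring(PACK_XML)
refs = [r.text for r in pack.findall("audioChannelFormatIDRef")]
print(pack.get("audioPackFormatName"), refs)
```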