Copyright 2010 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
3 Barker Avenue, White Plains, NY 10601
(914) 761-1100

Approved March 31, 2010

Table of Contents

Foreword
Intellectual Property
Introduction
1 Scope
2 Conformance Notation
3 Normative References
4 Acronyms and Terminology
    4.1 Acronyms
    4.2 Terminology
5 DCDM Audio Essence Constraints
    5.1 Audio Characteristics Constraints
    5.2 Terminology
6 DCDM Audio Information Required
    6.1 DCDM Composition Audio Information
    6.2 DCDM Post-Production Reel Information
    6.3 DCDM Audio Track Information
7 DCDM Audio Insert Information Optional
Annex A Audio File Naming Examples (Informative)

Page 1 of 14 pages

SMPTE RP 428-4:2010
SMPTE RECOMMENDED PRACTICE
D-Cinema Distribution Master Audio File Format and Delivery Constraints

Foreword

SMPTE (the Society of Motion Picture and Television Engineers) is an internationally-recognized standards developing organization. Headquartered and incorporated in the United States of America, SMPTE has members in over 80 countries on six continents. SMPTE's Engineering Documents, including Standards, Recommended Practices, and Engineering Guidelines, are prepared by SMPTE's Technology Committees. Participation in these Committees is open to all with a bona fide interest in their work. SMPTE cooperates closely with other standards-developing organizations, including ISO, IEC and ITU. SMPTE Engineering
Documents are drafted in accordance with the rules given in Part XIII of its Administrative Practices. SMPTE RP 428-4 was prepared by Technology Committee 21DC.

Intellectual Property

At the time of publication no notice had been received by SMPTE claiming patent rights essential to the implementation of this Recommended Practice. However, attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. SMPTE shall not be held responsible for identifying any or all such patent rights.

Introduction

This recommended practice is intended to constrain the audio that is delivered as a Digital Cinema Distribution Master (DCDM) in order to facilitate the process of creating a D-Cinema Package (DCP). By following the guidelines in this document, the audio DCDM will be interchangeable between facilities and will reliably sync with the intended picture essence. This document is intended to work in conjunction with the existing SMPTE 428-2 DCDM Audio Characteristics and SMPTE 428-3 DCDM Audio Channel Mapping and Channel Labeling documents, utilizing the specifications already established therein and embellishing upon them with further detail
and constraints. It brings into alignment traditional film terminology and techniques with today's digital technology and processes.

1 Scope

This document specifies constraints on the technical parameters, file structure, file naming and other relevant information pertaining to audio essence delivered from final post-production as a Digital Cinema Distribution Master.

2 Conformance Notation

Normative text is text that describes elements of the design that are indispensable or contains the conformance language keywords: “shall”, “should”, or “may”. Informative text is text that is potentially helpful to the user, but not indispensable, and can be removed, changed, or added editorially without affecting interoperability. Informative text does not contain any conformance keywords.

All text in this document is, by default, normative, except: the Introduction, any section explicitly labeled as “Informative” or individual paragraphs that start with “Note:”.

The keywords “shall” and “shall not” indicate requirements strictly to be followed in order to conform to the document and from which no deviation is permitted.

The keywords “should” and “should not” indicate that, among several possibilities, one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain possibility or course of action is deprecated but not prohibited.

The keywords “may” and “need not” indicate courses of action permissible within the limits of the document.

The keyword “reserved” indicates a provision that is not defined at this time, shall not be used, and may be defined in the future. The keyword “forbidden” indicates “reserved” and in addition indicates that the provision will never be defined in the future.

A conformant implementation according to this document is one that includes all mandatory provisions (“shall”) and, if implemented, all recommended provisions (“should”) as described. A conformant implementation need not
implement optional provisions (“may”) and need not implement them as described.

Unless otherwise specified, the order of precedence of the types of normative information in this document shall be as follows: Normative prose shall be the authoritative definition; Tables shall be next; followed by formal languages; then figures; and then any other language forms.

3 Normative References

The following standards contain provisions which, through reference in this text, constitute provisions of this recommended practice. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this recommended practice are encouraged to investigate the possibility of applying the most recent edition of the standards indicated below.

SMPTE 428-2-2006, D-Cinema Distribution Master Audio Characteristics
SMPTE 428-3-2006, D-Cinema Distribution Master Audio Channel Mapping and Channel Labeling
SMPTE 429-2-2009, D-Cinema Packaging DCP Operational Constraints
SMPTE 429-7-2006, D-Cinema Packaging Composition Playlist
ITU-R BR.1352-1: 2002, Broadcast Wave Format (BWF), Annex 1
ISO/IEC 646:1991, Information Technology ISO 7-Bit Coded Character Set for Information Interchange
ISO 639-2/T:1998, Codes for the Representation of Names of Languages Part 2: Alpha-3 Code, First Edition

4 Acronyms and Terminology

4.1 Acronyms

DCDM: Digital Cinema Distribution Master
DCP: Digital Cinema Package
FFOA:
First Frame Of Action (See definition in Section 4.2)
LFOA: Last Frame Of Action (See definition in Section 4.2)

4.2 Terminology

Audio Content and Type: A description of the audio that is contained in the essence. This is used to associate a given file with other files in the Soundfield Configuration group, or to differentiate a file containing other content that may be delivered at the same time to packaging. This may be comprised of an Audio Type descriptor and/or an Audio Content descriptor, which are defined below. In typical use, the two are combined to describe a unique audio element. Examples include Dialog Stem, Visually Impaired Narration, Hearing Impaired Time Code, Director's Commentary Narration.

Audio Content: This is an optional modifier to Audio Type that can be included to further describe the audio if needed. Examples are Dialog, Music, Effects, Narration, Director's Commentary, and Time Code.

Audio Type: This is the type of audio contained in the audio essence. Examples include Printmaster, Stem, Music and Effects, Score, Composite Mix, Visually Impaired, Hearing Impaired.

Bit Depth: The number of bits in a sample word.

Channel: The loudspeaker location intended for
an audio essence. Loudspeaker locations are defined in SMPTE 428-3, Section 3.

Channel Label: A label to indicate the intended channel for audio reproduction. Channel labels may exist in multiple areas within the audio creation and delivery process including, but not limited to, audio essence, the audio reproduction chain, and individual speakers. This document specifically refers to channel labels applied to the audio essence in an Audio DCDM for delivery to D-Cinema packaging.

Composition: (From SMPTE 429-7, DCP Composition Playlist, Section 3.) A composition is a self-contained representation of a single complete D-Cinema work such as a motion picture, or a trailer, or an advertisement, etc. It tangibly consists of a Composition Playlist file and one or more track files, which contain the actual essence.

Content Kind: A designation of what the content is (e.g., feature, trailer, advertisement, etc.) per Table 2 of SMPTE 429-7.

Content Title: The title of the composition. For example, the title of a feature film.

Content Version: The version of the composition, such as Domestic, International, or Director's Cut.

Duration: This is the duration of the actual program essence for a given post-production reel, expressed as an integer number of frame periods. This is counted from the left edge of the FFOA to the right edge (end) of the LFOA, and thus includes both the FFOA and LFOA in their entirety. In practice, this may be calculated from
an editor's LFOA List by taking the stated LFOA start, adding 1 frame and subtracting 192 frames (the leader). If the LFOA list is expressed in feet/frames, this must first be converted to frames and then the above formula is applied. Example: A post-production reel has a stated LFOA of 1500+4. Since one foot in film is 16 frames, 1500+4 is 24,000 + 4 = 24,004 frames. Adding one and subtracting 192 gives a duration of 23,813 frames.

Editable Unit (Edit Unit): (From SMPTE 429-7, Section 4.) The smallest temporal increment of access to Essence, e.g., a frame or a sample. In practice, the edit unit
in D-Cinema is the duration of one picture frame for a monoscopic picture composition or the duration of the left eye/right eye pair of frames in a stereoscopic picture. Although audio itself is capable of being edited to the sample, in practice it is edited on the corresponding picture frame boundaries. In D-Cinema packaging, audio is frame wrapped such that a packet of audio essence is the duration of an edit unit and contains the number of audio samples corresponding to the audio sample rate and the composition edit rate.

Edit Rate: (From SMPTE 429-7, Section 4.) The number of Editable Units to be reproduced during a temporal interval having duration of exactly one (1.0) second. Because Edit Rate values are not always integer values and sometimes require many digits of precision, Edit Rate values are expressed as a rational number (the ratio of two integers). For a composition containing monoscopic picture, the frame rate and edit rate are identical. For stereoscopic picture, the edit unit represents two picture frames (left eye and right eye), which would always be edited as a pair, and thus the edit rate is half of the frame rate.

Essence: (From SMPTE 429-7, Section 4.) The
sound, picture, and data resources that make up a Composition.

File Date: The date that an audio file was created or the most recent date it was modified. This is assigned by the creator of the file and is independent of the date that a file directory may indicate.

First Frame of Action (FFOA): The first frame of a post-production reel that contains image action pertinent to the program. For audio elements, it is the location of the sound that is intended to sync with the first frame of image action. The location of this frame is depicted by the location of the left edge of the frame. For example, the FFOA is 8 seconds (192 frames at 24 fps) from the left edge of the picture start frame on the leader on a post-production reel of 35 mm film. The same concept applies to D-Cinema.

Frame Rate: (From SMPTE 429-7, Section 4.) The number of frames per second.

Language: The main spoken language
of the audio essence. Language representation is depicted in Section 5.2.6.2.

Last Frame of Action (LFOA): The last frame of a post-production reel that contains image action pertinent to the program. For audio elements, it is the location of the sound that is intended to sync with the last frame of image action. It is common for this value to be expressed in feet and frames, but it may be expressed in frames only. The location of this frame is depicted by the location of the left edge of the frame, and is traditionally referenced as being the number of frames (or feet and frames) counting from the left edge of the picture start frame (a.k.a. 0 feet in film terminology) to the left edge of the LFOA in a post-production reel. The picture start frame is therefore included in the count. Note that the location of the LFOA is the start of the frame; the action actually finishes at the end of this frame. Therefore, the image action and sound actually end at LFOA + 1 frame.

Post-Production Reel: A partition of the essence, the duration of which is defined by the content provider's post-production process. For example, a theatrical presentation may be divided into 6 post-production reels, each of which is typically 22 minutes or less in length and corresponds to a 2100 foot or less reel of physical film. A post-production reel may typically be made into a D-Cinema track file.

Post-Production Reel Number: An identifier associated with a post-production reel to describe its place in the sequence of reels delivered by post-production, which are intended for a composition. This is generally a combination of letters and numbers. For example, R5 A/B would indicate that this is the 5th reel in the set of post-production reels delivered, and that it is a double length reel (a combination of an A reel and a B reel).

Prelap: An industry term that refers to audio that may be placed between the head 2 pop and the FFOA, which is a copy of the outgoing audio of the previous post-production reel, in order to better facilitate the changeover between post-production reels.

Pullup: An industry term that refers to audio that may be placed after the LFOA and before the tail pop, which is a copy of the incoming audio of the subsequent post-production reel, in order to better facilitate the changeover between post-production reels.

Note that pullups have been in use for many years in the film industry to facilitate film projection changeovers. Prelaps are more recent, and are often used when there is continuous music across the post-production reels. Prelaps and Pullups are generally used in film projection and are not used in D-Cinema. If they are supplied as a
DCDM, they shall be sample accurate with the previous and subsequent post-production reels.

Sample Rate: (From SMPTE 429-7, Section 4.) The number of essence samples per second.

Soundfield: The acoustical space within which the intended audio image is created.

Soundfield Configuration: A Soundfield Configuration is a defined arrangement or configuration of loudspeakers. A group of audio channels that are intended to be reproduced simultaneously through a defined Soundfie