SMPTE RDD 25:2014
SMPTE REGISTERED DISCLOSURE DOCUMENT
AVC MXF Proxies
Page 1 of 15 pages

Copyright 2014 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
3 Barker Avenue, White Plains, NY 10601
(914) 761-1100

Approved April 1, 2014

The attached document is a Registered Disclosure Document (RDD) prepared by the proponent identified below. It has been examined by the appropriate SMPTE Technology Committee and is believed to contain adequate information to satisfy the objectives defined in the Scope, and to be technically consistent.

This document is NOT a Standard, Recommended Practice or Engineering Guideline, and does NOT imply a finding or representation of the Society.

Errors in this document should be reported to the proponent identified below, with a copy to eng@smpte.org. All other inquiries in respect of this document, including inquiries as to intellectual property requirements that may be attached to use of the disclosed technology, should be addressed to the proponent identified below.

In addition to Harmonic, the proponents include AmberFin of Basingstoke, United Kingdom; Avid Technology of Burlington, MA, USA; EVS of Seraing, Belgium; and IPV of Cambridge, United Kingdom. Harmonic will assume the role of proponent contact.

Proponent contact information:
Harmonic Inc.
15220 NW Greenbrier Parkway #290
Beaverton, OR 97006
Attention: Video Server Engineering
email: SMPTE-RDD

Table of Contents

Introduction
1 Scope
2 Reference Documents
3 MXF Structure
  3.1 MXF Physical Structure
  3.2 MXF Header Metadata
Annex A Bibliography (Informative)
Annex B Coding Constraints
  B.1 Video Coding
  B.2 Video Compression Constraints
  B.3 SEI Metadata Insertion for Video
  B.4 Audio Coding
  B.5 Ingest Considerations (Informative)
  B.6 Playout Considerations (Informative)
Annex C Proxy Sizes and Aspect Ratios (Informative)
  C.1 Common Source Sizes and Aspect Ratios
  C.2 Proxy Encodings
  C.3 Unrounded Scaling
  C.4 Round Down Scaling
  C.5 Round Up Scaling
Annex D Initial User Requirements (Informative)
Annex E Growing Proxies
Annex F Unresolved Comments

Introduction

This RDD documents an MXF application specification for AVC “Long GOP” proxies with AAC audio. Certain restrictions have been applied to the creation of the MXF files because they are used in a Proxy application.

Note that AVC no longer uses the construct or terminology of a “Group of Pictures” (GOP). This document continues to use the term as it is generally understood, namely a Group of Pictures starting with an I or IDR picture in coded order.

1 Scope
This RDD defines an MXF Application Profile for AVC proxies with MPEG-2 AAC audio per Operational Pattern 1a (OP1a). The AVC and AAC coding constraints for each application are specified in separate annexes.

2 Reference Documents

Note: All references in this document to other SMPTE documents use the current numbering style (e.g. SMPTE ST 378:2004) although, during a transitional phase, the document as published (printed or PDF) may bear an older designation (such as SMPTE 378M-2004). Documents with the same root number (e.g. 378) and publication year (e.g. 2004) are functionally identical.

The following standards contain provisions which, through reference in this text, constitute provisions of this registered disclosure document. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this registered disclosure document are encouraged to investigate the possibility of applying the most recent edition of the standards indicated below.

ISO/IEC 13818-7:2006, Information Technology - Generic Coding of Moving Pictures and Associated Audio Information - Part 7: Advanced Audio Coding (AAC)
ISO/IEC 14496-10:2012 | ITU-T Recommendation H.264, Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding
ITU-R BT.601-7 (01/07), Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide Screen 16:9 Aspect Ratios
SMPTE ST 12-1:2014, Time and Control Code
SMPTE ST 378:2004, Television - Material Exchange Format (MXF) - Operational Pattern 1a (Single Item, Single Package)
SMPTE ST 379-2:2010, Television - Material Exchange Format (MXF) - MXF Constrained Generic Container
SMPTE ST 381-2:2011, Material Exchange Format (MXF) - Mapping MPEG Streams into the MXF Constrained Generic Container
SMPTE ST 381-3:2013, Material Exchange Format - Mapping AVC Streams into the MXF Generic Container
SMPTE ST 394:2006, Television - Material Exchange Format (MXF) - System Scheme 1 for the MXF Generic Container
SMPTE ST 405:2006, Television - Material Exchange Format (MXF) - Elements and Individual Data Items for the MXF Generic Container System Scheme 1
SMPTE ST 436-1:2013, MXF Mappings for VI Lines and Ancillary Data Packets
SMPTE RP 224, SMPTE Labels Register
ITU-T Rec. T.35, Procedure for the Allocation of ITU-T Defined Codes for Non-Standard Facilities
IPV: IPV Embedded Metadata Format, Part No. P06-000122/A, 2011, http:/
IPV: IPV SEI Metadata Insertion, Part No. P06-000121/1.0, 2013, http:/

3 MXF Structure

3.1 MXF Physical Structure

The MXF file shall be written as an Operational Pattern 1a (OP1a) structured file, compliant with SMPTE ST 378.
It should be an interleaved single file, written in a strictly left-to-right fashion (no post-write random-access fix-up), with a KLV Alignment Grid (KAG) value of 1. Further specific constraints are as follows.

Frame wrapped: Each Content Package (CP) shall contain one frame's worth of video and audio and shall be interleaved according to the SMPTE ST 379-2 rules:
- When per-frame SMPTE ST 12-1 time code is included in the essence, it shall be encoded in a GC system element, as defined in SMPTE ST 394 and SMPTE ST 405.
- When there is VI or VANC data in the video, each CP shall contain one SMPTE ST 436-1 KLV item.
- Each CP shall contain one KLV holding the video frame as a byte stream NAL Unit structure, starting with an access unit delimiter NAL Unit. Wrapping shall comply with the requirements of SMPTE ST 381-3.
- Each CP shall contain one KLV of AAC audio, which may contain zero, one or two AAC frames. (The durations of the video and audio frames differ, so a content package may carry zero, one or two AAC frames.) Wrapping shall comply with the requirements of SMPTE ST 381-2.

Index Tables: Every Content Package should be indexed. The zero Position of the Index Table corresponds to the first stored frame in the file. The Indexed Position equal to SP:Track(picture):SourceClip:Origin corresponds to the first displayed frame of video in the file. Both video and audio are VBE (variable bytes per element), so the index tables shall contain delta entries for each element and shall contain slices. Index tables appear in partitions as detailed below. A PosTable for AAC tracks is optional.

Header Partition: Shall contain only metadata; no index tables and no essence.

Body Partitions: Essence shall be collected into pairs of body partitions, each pair covering 10 seconds of essence and the indexing for those 10 seconds. Body partitions shall contain either only essence or only index table segments, and shall not contain header metadata. For non-integral frame rates, a duration equivalent to the nearest integral frame count shall be used. Each pair of partitions shall be recorded as follows:
- A Body partition containing the interleaved essence.
- A Body partition containing only one Index Table segment for the same essence.

End of file: At end of file the following sequence shall occur:
- An optional Body partition containing any remaining essence, of a duration less than 10 seconds.
- A Footer Partition with a closed and complete repetition of the header metadata and the index table segment for the last Body partition.
- A Random Index Pack (RIP).

Correct construction of a RIP clearly marks which partitions have essence (the RIP records a non-zero BodySID for such a partition) and which partitions have indexes (RIP BodySID = 0).
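The partition and content-package rules in 3.1 come down to exact rational arithmetic. The sketch below uses hypothetical names. It rounds a 10-second partition to the nearest integral frame count, shows why a content package can carry zero, one or two AAC frames, and lists the resulting partition sequence with the BodySID a RIP would record for each partition. It makes four assumptions that this RDD does not specify: 48 kHz AAC audio with 1024-sample frames, audio and video starting together, each AAC frame carried in the content package in which it starts, and a BodySID of 1 for the essence. The byte offsets that a real RIP also stores are omitted.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import ceil
from typing import List

AAC_SAMPLES_PER_FRAME = 1024   # AAC frame length in samples
SAMPLE_RATE = 48_000           # assumption: 48 kHz audio
ESSENCE_BODY_SID = 1           # hypothetical BodySID of the single essence container


def frames_per_partition(frame_rate: Fraction, seconds: int = 10) -> int:
    """Nearest integral frame count for a 10-second body partition."""
    return round(frame_rate * seconds)


def aac_frames_per_cp(frame_rate: Fraction, n_video_frames: int) -> List[int]:
    """Number of AAC frames starting inside each video frame period.

    Assumes audio and video both start at t = 0 and that an AAC frame is
    carried in the content package during which it starts.
    """
    ratio = Fraction(SAMPLE_RATE, AAC_SAMPLES_PER_FRAME) / frame_rate  # AAC frames per video frame
    # Integers j with k*ratio <= j < (k+1)*ratio number ceil((k+1)*ratio) - ceil(k*ratio).
    return [ceil((k + 1) * ratio) - ceil(k * ratio) for k in range(n_video_frames)]


@dataclass
class Partition:
    kind: str            # "header", "essence", "index" or "footer"
    body_sid: int        # BodySID recorded for this partition in the RIP
    first_frame: int = 0
    frame_count: int = 0


def plan_partitions(total_frames: int, frame_rate: Fraction) -> List[Partition]:
    """Partition sequence for a file of total_frames frames (byte offsets omitted)."""
    size = frames_per_partition(frame_rate)
    parts = [Partition("header", 0)]               # header metadata only
    full, remainder = divmod(total_frames, size)
    for i in range(full):                           # essence/index partition pairs
        start = i * size
        parts.append(Partition("essence", ESSENCE_BODY_SID, start, size))
        parts.append(Partition("index", 0, start, size))
    if remainder:                                   # optional short essence partition;
        start = full * size                         # its index segment goes in the footer
        parts.append(Partition("essence", ESSENCE_BODY_SID, start, remainder))
    parts.append(Partition("footer", 0))
    return parts


def essence_partitions(parts: List[Partition]) -> List[Partition]:
    """As when reading a RIP: a non-zero BodySID marks a partition holding essence."""
    return [p for p in parts if p.body_sid != 0]
```

Under these assumptions, 25 fps video carries two AAC frames in seven packages out of eight and one in the eighth. At 50 fps, one package in sixteen carries no AAC frame. At 60/1.001 fps a 10-second partition holds 599 frames (599.4 rounded to the nearest integer).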
3.2 MXF Header Metadata

Time Code: SMPTE ST 12-1 time code shall be continuous. Decoders shall use the MXF Header Material Package timecode as the lead time code.

Partition: The Picture Essence Container Label shall comply with the requirements of SMPTE ST 381-3. The Sound Element Container Label shall comply with the requirements of SMPTE ST 381-2 (see Section 9.1).

The File Descriptor shall be an MPEG Video descriptor (per SMPTE ST 381-3, Tables 3 and 4) with:
Picture Essence Coding Label 06.0E.2B.34.04.01.01.0A.04.01.02.02.01.31.11.01, where
  31 = profile category (predictive profiles)
  11 = profile (AVC Constrained Baseline Profile)
  01 = coding variant (AVC unconstrained coding)

The Audio Descriptor shall be an MPEG Audio descriptor (per SMPTE ST 381-2, Section 10.4) with:
Essence Coding 06.0E.2B.34.04.01.01.03.04.02.02.02.03.03.01.00 for MPEG-2 AAC audio in ADTS

Note: While this UL is currently in use, a different UL may be assigned in the future; if this happens, a backwards compatibility strategy will be created.
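The labels above are 16-byte SMPTE Universal Labels written in dotted hex. The following minimal Python sketch, with hypothetical function names, parses them and extracts the three profile bytes described above. Comparing labels while ignoring byte 8 (the registry version byte) is common MXF practice. It is treated here as an assumption, not as a requirement of this RDD.

```python
from typing import Dict

# Label values quoted in 3.2 of this RDD.
AVC_PICTURE_CODING_UL = "06.0E.2B.34.04.01.01.0A.04.01.02.02.01.31.11.01"
AAC_ADTS_SOUND_CODING_UL = "06.0E.2B.34.04.01.01.03.04.02.02.02.03.03.01.00"


def parse_ul(text: str) -> bytes:
    """Convert a dotted-hex SMPTE UL into 16 raw bytes."""
    parts = text.split(".")
    if len(parts) != 16:
        raise ValueError(f"a UL has 16 bytes, got {len(parts)}")
    ul = bytes(int(p, 16) for p in parts)
    if ul[:4] != b"\x06\x0e\x2b\x34":
        raise ValueError("not a SMPTE UL (missing 06.0E.2B.34 prefix)")
    return ul


def same_label(a: bytes, b: bytes) -> bool:
    """Compare two ULs while ignoring byte 8 (index 7), the registry version byte."""
    return a[:7] == b[:7] and a[8:] == b[8:]


def describe_avc_label(ul: bytes) -> Dict[str, int]:
    """Return bytes 14, 15 and 16 (1-based) of the picture coding label."""
    return {"profile_category": ul[13], "profile": ul[14], "coding_variant": ul[15]}
```

For example, a reader might check a file's PictureEssenceCoding value against `AVC_PICTURE_CODING_UL` with `same_label` before accepting the file as a proxy.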
The Material Package and the Source Package shall contain tracks for video, audio, time code, and optional SMPTE ST 436 data, as per SMPTE ST 377-1.

Annex A Bibliography (Informative)

Note: All references in this document to other SMPTE documents use the current numbering style (e.g. SMPTE ST 382:2007) although, during a transitional phase, the document as published (printed or PDF) may bear an older designation (such as SMPTE 382M-2007). Documents with the same root number (e.g. 382) and publication year (e.g. 2007) are functionally identical.

ISO/IEC 13818-1:2007 | ITU-T Rec. H.222, Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Systems
ISO/IEC 14496-3:2009, Information Technology - Coding of Audio-Visual Objects - Part 3: Audio
ISO/IEC 14496-15:2010, Information Technology - Coding of Audio-Visual Objects - Part 15: Advanced Video Coding (AVC) File Format
SMPTE ST 382:2007, Material Exchange Format - Mapping AES3 and Broadcast Wave Audio into the MXF Generic Container
Amendment 1:2012 to SMPTE ST 382:2007
Amendment 2:2013 to SMPTE ST 382:2007
SMPTE ST 2041-2:2010, Format for Non-PCM Audio in AES3 - MPEG-2 AAC and HE AAC Audio in ADTS
SMPTE ST 2041-3:2010, Format for Non-PCM Audio and Data in AES3 - MPEG-4 AAC and HE AAC Compressed Digital Audio in ADTS and LATM/LOAS Wrappers

Annex B Coding Constraints

B.1 Video Coding

B.1.1 Source Formats

All proxies shall be encoded using progressive video. See Annex C for additional guidance on proxy dimensions and aspect ratios.

Note: AVC essence is to be encoded per ISO/IEC 14496-10 and not per ISO/IEC 14496-15.

B.1.1.1 Frame Rates

- 24 fps, variable GOP lengths of up to 12 frames
- 24/1.001 fps, variable GOP lengths of up to 12 frames
- 25 fps, variable GOP lengths of up to 12 frames
- 30 fps, variable GOP lengths of up to 15 frames
- 30/1.001 fps, variable GOP lengths of up to 15 frames
- 50 fps, variable GOP lengths of up to 12 frames
- 60 fps, variable GOP lengths of up to 15 frames
- 60/1.001 fps, variable GOP lengths of up to 15 frames

B.1.1.2 Frame Sizes

B.1.1.2.1 Frame Dimensions

The sample width of the encoded proxy video shall be no greater than 960 and no less than 176. The sample height of the encoded video shall be no greater than 544 and no less than 112. Both the width and the height shall be multiples of 16.
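The frame-rate, GOP-length and dimension limits in B.1.1.1 and B.1.1.2.1 can be checked mechanically. The following is a minimal sketch that assumes the encoder reports the length of each GOP in frames. The function name and the messages are hypothetical.

```python
from fractions import Fraction
from typing import List

# Maximum GOP length (frames) for each frame rate listed in B.1.1.1.
MAX_GOP = {
    Fraction(24): 12,
    Fraction(24000, 1001): 12,
    Fraction(25): 12,
    Fraction(30): 15,
    Fraction(30000, 1001): 15,
    Fraction(50): 12,
    Fraction(60): 15,
    Fraction(60000, 1001): 15,
}


def check_proxy_format(frame_rate: Fraction, gop_lengths: List[int],
                       width: int, height: int) -> List[str]:
    """Return the violations of B.1.1.1 and B.1.1.2.1 (an empty list if none)."""
    problems = []
    max_gop = MAX_GOP.get(frame_rate)
    if max_gop is None:
        problems.append(f"frame rate {frame_rate} fps is not listed")
    elif any(not 1 <= n <= max_gop for n in gop_lengths):
        problems.append(f"GOP lengths must be between 1 and {max_gop} frames")
    if not 176 <= width <= 960:
        problems.append(f"width {width} is outside 176..960")
    if not 112 <= height <= 544:
        problems.append(f"height {height} is outside 112..544")
    if width % 16 or height % 16:
        problems.append("width and height must be multiples of 16")
    return problems
```

Rates such as 30/1.001 are passed as exact fractions (`Fraction(30000, 1001)`), so the table lookup does not depend on floating-point rounding.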
B.1.1.2.2 Aspect Ratio

The aspect ratio of the encoded video shall be included both in the AVC video (as SAR values in the VUI parameters of the Sequence Parameter Set) and in the MXF wrapper (as the optional Picture Descriptor properties DisplayWidth and DisplayHeight). The MXF wrapper also holds “Stored” and “Sampled” width and height values, along with “Sampled” and “Display” offsets, which allow a crop to be applied to the decoded video before it is rendered. See Annex C for a detailed discussion of this topic.

B.2 Video Compression Constraints

B.2.1 Coding

Video compression shall be compliant with the AVC Constrained Baseline profile, using variable bitrate encoding selectable between 0.6 and 6 Mbps. No B frames or forward temporal references are permitted. The following additional provisions shall apply:
1. There shall be one SPS and one PPS for the duration of the stream.
2. The SPS and PPS shall be placed at the beginning of the Content Package containing the first IDR or I-frame.
3. The SPS and PPS should be repeated with every IDR and I-frame.

B.2.2 Pre-charge and Rollout

The first displayed image shall occur within the first 2 GOPs. The Origin value of the Source Package shall be the number of frames of pre-charge. The Duration value of the Material Package shall be the total number of displayed frames. The Duration value of the Source Package shall be the sum of the pre-charge and displayed frames.
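The pre-charge rules above reduce to simple bookkeeping over frame counts. Because B frames are not permitted, stored order and display order coincide, so pre-charge can be counted directly in stored frames. The following is a minimal sketch with hypothetical names. It reads "within the first 2 GOPs" as "fewer pre-charge frames than the combined length of the first two stored GOPs", which is an interpretation rather than wording from this RDD.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PackageTiming:
    source_origin: int       # Source Package Origin = pre-charge frames
    material_duration: int   # Material Package Duration = displayed frames
    source_duration: int     # Source Package Duration = pre-charge + displayed frames


def package_timing(precharge: int, displayed: int, gop_lengths: List[int]) -> PackageTiming:
    """Derive the Origin and Duration values described in B.2.2.

    gop_lengths lists the stored GOP lengths in coded order; the first
    displayed frame must lie within the first two of them.
    """
    if precharge < 0 or displayed < 0:
        raise ValueError("frame counts must be non-negative")
    if precharge >= sum(gop_lengths[:2]):
        raise ValueError("first displayed frame is not within the first 2 GOPs")
    return PackageTiming(precharge, displayed, precharge + displayed)
```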
Rollout of one or more frames is allowed. Rollout frames shall not be included in either the Material Package or the Source Package duration. The existence of rollout frames can be determined by comparing the length of the Index with the duration of the Source Package. See the SMPTE ST 377-1 MXF Timing Model for more details.

B.2.3 Encapsulation

Byte stream NAL Units are carried inside each MXF frame with stream ID = 0, starting with an access unit delimiter NAL Unit.

B.3 SEI Metadata Insertion for Video

Additional video metadata can be inserted in the AVC stream using Supplemental Enhancement Information (SEI) elements as per ISO/IEC 14496-10. The Metadata SEI should be positioned within the video stream before the first IDR (0x05) or non-IDR (0x01) slice NAL Unit of each video frame.

B.3.1 NAL Unit Type

Per Table 7-1 in ISO/IEC 14496-10, SEIs use NAL Unit type 0x06. There shall not be any other messages in the NAL unit.

B.3.2 SEI Type

The SEI Type is 0x04 (User Data registered by ITU-T Rec. T.35). The value for itu_t_t35_country_code shall be 0xB5 (United States). The value for itu_t_t35_provider_code shall be 0x002B (Harmonic Inc.).

B.3.3 Payload Length

The Payload Length is encoded as a series of 0xFF bytes, their number being the payload length divided by 255 (integer division), followed by one byte holding the payload length modulo 255. For example, a length of 300 is coded as FF 2D. The length covers the data that follows the Payload Length field, namely the SEI UUID and the Payload (see below).

B.3.4 UUID

The UUID for IPV Metadata SEI blocks is (all numbers in hex) 56.77.C2.77.F3.58.47.48.89.97.9F.39.EC.FD.F1.B9.

B.3.5 Payload

The Payload for IPV Metadata SEI blocks can be generated either by the Metadata Library