CAN CSA-ISO IEC TR 15938-8D-2010 Information technology - Multimedia content description interface - Part 8 Extraction and use of MPEG-7 descriptions - AMENDMENT 4 Extraction of au.pdf

Uploaded by: figureissue185 Document ID: 591121 Upload date: 2018-12-15 Format: PDF Pages: 34 Size: 829.29 KB

Information technology - Multimedia content description interface - Part 8: Extraction and use of MPEG-7 descriptions - AMENDMENT 4: Extraction of audio features from compressed formats

Amendment 4:2010 (IDT) to National Standard of Canada CAN/CSA-ISO/IEC TR 15938-8-04 (ISO/IEC TR 15938-8:2002, IDT)

NOT FOR RESALE. PUBLICATION NON DESTINÉE À LA REVENTE.

CSA Standards Update Service

Amendment 4:2010 to CAN/CSA-ISO/IEC TR 15938-8-04
December 2010

Title: Information technology - Multimedia content description interface - Part 8: Extraction and use of MPEG-7 descriptions - AMENDMENT 4: Extraction of audio features from compressed formats
Pagination: 28 pages (iii preliminary and 25 text)

To register for e-mail notification about any updates to this publication, go to www.shopcsa.ca, click on E-mail Services under MY ACCOUNT, then click on CSA Standards Update Service. The List ID that you will need to register for updates to this publication is 2416193.

If you require assistance, please e-mail or call 416-747-2233. Visit CSA's policy on privacy to find out how we protect your personal information.

Reference number ISO/IEC TR 15938-8:2002/Amd.4:2009(E)
© ISO/IEC 2009

TECHNICAL REPORT ISO/IEC TR 15938-8
First edition 2002-12-15
AMENDMENT 4 2009-11-15

Information technology - Multimedia content description interface - Part 8: Extraction and use of MPEG-7 descriptions - AMENDMENT 4: Extraction of audio features from compressed formats

Technologies de l'information - Interface de description du contenu multimédia - Partie 8: Extraction et utilisation des descriptions MPEG-7 - AMENDEMENT 4: Extraction de caractéristiques audio à partir de formats compressés

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2009

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56
CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47

Foreword

ISO (the International Organization for Standardization)
and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

In exceptional circumstances, the joint technical committee may propose the publication of a Technical Report of one of the following types:

- type 1, when the required support cannot be obtained for the publication of an International Standard, despite repeated efforts;
- type 2, when the subject is still under technical development or where for any other reason there is the future but not immediate possibility of an agreement on an International Standard;
- type 3, when the joint technical committee has collected data of a different kind from that which is normally published as an International Standard ("state of the art", for example).

Technical Reports of types 1 and 2 are subject to review within three years of publication, to decide whether they can be transformed into International Standards. Technical Reports of type 3 do not necessarily have to be reviewed until the data they provide are considered to be no longer valid or useful.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

Amendment 4 to ISO/IEC TR 15938-8:2002 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

Information technology - Multimedia content description interface - Part 8: Extraction and use of MPEG-7 descriptions - AMENDMENT 4: Extraction of audio features from compressed formats

After, add Clause 5:

5 Direct audio feature extraction from the compressed domain

5.1 Introduction

Due to efficient MPEG audio compression technologies, such as MPEG-1 Layer III (MP3) [AMD4-1] or MPEG-2/-4 AAC (AAC) [AMD4-2], [AMD4-3], the number
of personal and institutional music recordings stored in archives has grown significantly in recent years. At the same time, the need for automatic search and retrieval capabilities for music has increased in order to manage these databases. These search and retrieval applications are based on low-level features (e.g. those described in the MPEG-7 standard [AMD4-4]) which are extracted from the digital audio content. In order to search large archives efficiently, faster low-level feature extraction is needed. This technical report describes a method which allows MPEG-7 low-level features [AMD4-4] to be extracted directly from the compressed domain, by transforming the frequency representation of MPEG compressed audio files into the DFT domain for feature extraction.

5.2 Conventional feature extraction

The conventional approach to obtaining MPEG-7 features from compressed audio data is to decode it first and then to generate the MPEG-7 features based on the decoded time signal. Especially when searching large libraries of compressed audio files, however, this approach can become computationally very expensive.

Several works deal with the conversion between subband domain representations, especially in the field of image and video coding. In [AMD4-5], [AMD4-6] the conversion between different sizes of DCT transforms is given, with the drawback that it is restricted to non-lapped transforms. The patent in [AMD4-7] proposes a conversion method between the MDCT and the DFT domain. It is restricted to MDCT and DFT and is therefore not suitable for our purposes, since we also want to include hybrid filter banks, an integral part of MP3. The architecture presented in [AMD4-8] is not restricted to the type of filter banks used. Unfortunately, the numbers of subbands of the different filter banks have to be multiples of each other, and this is again unsuitable for our needs. However, this paper serves as the basis for a general conversion method proposed in [AMD4-9], which can be applied to any maximally decimated filter bank without conditions on their sizes. Here, a conversion matrix is generated by multiplying the analysis with a synthesis filter bank. In principle, the same is done in this technical report, although a universal mathematical description is used: the polyphase description introduced in [AMD4-10]. Additionally, the described method is extended by applying it to arbitrary resolution translations between synthesis and analysis filter banks in a practical way. Furthermore, it is adjusted to MP3 and AAC, and exploits some special properties of the so-called conversion matrix, which is explained in the next section.

In [AMD4-11] the problem of generating a complex from a real-valued spectral representation is approached from the reverse side. Therein it is said that a desired frequency response can be approximated by means of a linear combination with constant
weighting factors. This approach only allows a coarse approximation, but it has a very small computational complexity. It gave the inspiration for the issue termed spectral approximation.

A completely different approach worth mentioning here works directly on the compressed domain. It uses the MDCT coefficients as the basis for the low-level feature extraction [AMD4-12]. Since no conversion into the DFT domain is applied, this approach is restricted to the time/frequency resolution provided by the codec used. It is hence not compatible with existing MPEG-7 feature databases.

5.3 Direct feature extraction

5.3.1 System overview

In order to extract audio features from the compressed domain, we designed a conversion system which directly converts the given time-frequency representations of MPEG-1 Layer III and MPEG-2/-4 AAC into the time-frequency representation needed for calculating MPEG-7 compliant features. After applying the conversion method, the resulting complex-valued spectral coefficients are fed to the feature extraction algorithm.

Before we elaborate on the direct feature extraction system, it is important to know some details about how the conventional approach works and how it deals with compressed audio input material. Figure AMD4.1 shows the basic building blocks of the conventional feature extraction process.

Figure AMD4.1 - Basic building blocks of the conventional feature extraction process

First, the compressed input audio material needs to be decoded to PCM audio data. Then, the feature extraction process, which consists of an analysis and a feature calculation stage, applies a window function to the PCM input samples followed by an FFT prior to the feature calculation. Our goal is to replace the bulk of the computation needed for decoding and analyzing with one direct conversion process. In this context, the bulk of the computation of the decoding process is essentially the synthesis filter bank of the particular decoder. For MP3, reordering and anti-aliasing operations additionally take
place.

We now take a look at Figure AMD4.2. The synthesis filter bank of the decoder, having a transfer function G(z), and the analysis filter bank of the feature extraction process, having another transfer function H(z), exhibit different numbers of subbands, K and L respectively.

Figure AMD4.2 - Synthesis filter bank with K subbands followed by an analysis filter bank with L subbands. Both filter banks are maximally decimated and linear time-invariant

Y_k(m) denotes the subband coefficient of the compressed bitstream of subband k at block m, x(n) is the decoded time audio signal at time n, and y_i(m) is the subband signal of the desired domain of subband i at block m.

However, a more efficient and useful representation of maximally decimated filter banks is the so-called polyphase description introduced by Vaidyanathan [AMD4-10]. The main advantage of the polyphase description is its mathematical compactness, so that a filter bank can be fully described by a polyphase filter matrix. The filtering process then reduces to a multiplication of a z-transformed signal vector with a polyphase filter matrix. Furthermore, a concatenation of different filter banks can be achieved by using only one polyphase matrix, which can be obtained by multiplying the individual polyphase matrices of these filter banks. This property enables the construction of a conversion matrix T(z) of size M × M as shown in Figure AMD4.3.

[Figure AMD4.3: the polyphase synthesis matrix G(z) followed by the polyphase analysis matrix H(z), and the direct conversion system with the single polyphase conversion matrix T(z) = G(z) · H(z)]

Figure AMD4.3 - Block diagram of the conventional transcoding and of the direct conversion method

It is evident that M² multiplications are necessary to calculate the desired spectral values when using an M × M conversion matrix. That is equivalent to a complexity of O(N²) and, unfortunately, much more complex than the conventional method, since the latter uses efficient implementations of the MDCT and FFT with an overall complexity of O(N log(N)).

We found that only a fraction of the values inside a conversion matrix is necessary for the calculation of audio features which still guarantee a successful identification of the underlying audio material. This is possible since the most significant values of a conversion matrix are evenly spread along the main diagonal, and they decrease quickly the further we move away from it. The most important characteristic of a conversion matrix T(z) is that it exhibits a strong similarity to diagonal and therefore sparse matrices. For instance, Figure AMD4.4 shows an example of such a polyphase conversion matrix, where the white areas correspond to zeros in the matrix. Observe that three images of matrices are used, because each corresponds to the coefficients of a different power of z of the polyphase matrix. The analysis time window is set to 30 ms because it is suitable for many tasks of music information retrieval. The sampling frequency is chosen to be 44.1 kHz (generally it is arbitrary), hence the matrix generates 1024 complex Fourier coefficients as output, whereas it takes 576 (the content of one MP3 granule) real-valued input samples.

Figure AMD4.4 Exemplary complex polyphase conversion ma
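
The idea of replacing a synthesis stage followed by an analysis stage with one conversion matrix, and then discarding entries far from the diagonal, can be illustrated with a minimal sketch. It is not the method of this report: it assumes plain block transforms of equal size (an orthonormal DCT-II as a stand-in for the decoder synthesis bank, a Hann-windowed DFT as a stand-in for the feature analysis), so the matrices hold constants rather than polynomials in z, and a row-vector convention is assumed. All function names are hypothetical.

```python
import numpy as np


def synthesis_matrix(n):
    """Rows are orthonormal DCT-II basis functions (assumed stand-in for a decoder synthesis bank)."""
    k = np.arange(n)[:, None]  # subband index
    t = np.arange(n)[None, :]  # time index within the block
    g = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    g[0, :] /= np.sqrt(2.0)    # scale the DC row so that g @ g.T is the identity
    return g


def analysis_matrix(n):
    """Columns are Hann-windowed DFT basis functions (assumed stand-in for window + FFT)."""
    window = np.hanning(n)
    dft = np.fft.fft(np.eye(n))       # dft[t, l] = exp(-2j*pi*t*l/n)
    return window[:, None] * dft      # apply the window along the time axis


def conversion_matrix(n):
    """One matrix that maps subband coefficients straight to windowed DFT bins."""
    return synthesis_matrix(n) @ analysis_matrix(n)


def band_limit(t_matrix, half_width):
    """Zero every entry farther than half_width from the frequency-matched diagonal."""
    n = t_matrix.shape[0]
    k = np.arange(n)[:, None]         # input subband, centre frequency k / (2n)
    l = np.arange(n)[None, :]         # output DFT bin, frequency l / n
    l_folded = np.minimum(l, n - l)   # bins above n/2 mirror negative frequencies
    mask = np.abs(k - 2 * l_folded) <= half_width
    return np.where(mask, t_matrix, 0.0)


def discarded_energy(t_matrix, half_width):
    """Frobenius norm of the part of the matrix that band_limit removes."""
    return float(np.linalg.norm(t_matrix - band_limit(t_matrix, half_width)))


if __name__ == "__main__":
    n = 16
    rng = np.random.default_rng(0)
    subbands = rng.standard_normal(n)  # one block of subband coefficients
    two_step = (subbands @ synthesis_matrix(n)) @ analysis_matrix(n)
    direct = subbands @ conversion_matrix(n)
    print("max deviation between two-step and direct:", np.max(np.abs(two_step - direct)))
    for width in (1, 4, n):
        print("half width", width, "discarded energy", discarded_energy(conversion_matrix(n), width))
```

The full matrix costs n² multiplications per block, which is the quadratic cost noted in the text; a band-limited matrix with a fixed half width needs a number of multiplications proportional to n only. For lapped filter banks such as the MDCT or the MP3 hybrid filter bank, one such constant matrix per power of z would be needed, which this sketch does not cover.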

