ITU-R BS.1657 (2003): Procedure for the performance testing of automated audio identification systems (Question ITU-R 8/6)

Rec. ITU-R BS.1657

RECOMMENDATION ITU-R BS.1657

Procedure for the performance testing of automated audio identification systems

(Question ITU-R 8/6)

(2003)

The ITU Radiocommunication Assembly,

considering

a) that metadata will accompany most audio broadcast transmissions in the future;

b) that the automatic generation of metadata will be necessary to offer a complete, cost-efficient service in the future;

c) that automatic identification of audio items enables tracking of transmitted programme material;

d) that different schemes for the extraction of audio metadata are being developed today;

e) that ISO/IEC JTC 1/SC 29/WG 11 is currently finalizing schemes for the coding of metadata for multimedia data;

f) that no quality assessment procedures for audio metadata extraction schemes have been standardized until now,

recommends

1 that the procedure described in Annex 1 should be used to evaluate the performance of automated audio identification systems.

Annex 1

Procedure for the performance testing of automated audio identification systems

1 Introduction

In a time of ever-increasing databases filled with musical content, be it genuine audio material or associated metadata (“data about data”), the demand for tools to maintain this mass of data grows more urgent day by day. This need is voiced not only by professionals but also by ordinary Internet users and music lovers, who search the web on numerous errands for their preferred musical styles. To facilitate the retrieval of the desired material, two different levels of abstraction are distinguished here:

- The search for metadata that can more or less be extracted automatically from the audio content, such as instrumentation, melodic theme, rhythm, etc. Example applications are a query-by-humming system or classification into genres, which is commonly used in recommendation engines.
- Automatic identification of titles where only insufficient, unreliable or no metadata at all is available. An “essence” of the audio data is distilled and compared to a database of known material, thus creating a link to relevant metadata such as artist, song name, etc.

While the first class contributes mainly to the human interaction interface, the second also finds application in the protection of rights, by tracking radio programmes and Internet transactions. It is foremost in this latter context that algorithms fitting this profile are referred to as “fingerprinting” techniques.

2 Motivation

To meet the demands of the music business, the recognition rate of the applied fingerprinting technology must be high and must withstand common alterations and modifications of the original audio content. The music business has acknowledged the need for quality assurance of audio identification systems by recently issuing a request for information on audio fingerprinting technologies. The severity and urgency of this problem are also underlined by the fact that a number of different, often proprietary, solutions have appeared recently. All methods, however, face the same problems regarding their robustness to modification and deterioration of the original material. Although the original material may have been changed by a number of processing steps or degradations, it shall nevertheless be recognized as the intellectual property of the artist and composer. Ideally, therefore, automated music identification should be as precise and as tolerant of signal modification as human perception and recognition.

Beyond robustness to signal alterations, a good fingerprinting system should exhibit a small fingerprint size (certain applications may require the storage of millions of fingerprints), fast fingerprint extraction and recognition, and further desirable properties. Note that robustness to signal alterations and compactness of the fingerprint representation are conflicting requirements that every system has to reconcile. Consequently, to assess the quality of an automated audio identification system, a test environment has to be defined that covers different types of signal degradation at multiple degrees of severity and describes how to determine the other essential system parameters. A unified test procedure is needed to allow the objective evaluation of identification systems.

3 Quality parameters

For audio identification systems the following quality parameters have to be considered:

- Segment size of the audio material to be identified. What portion of an item is necessary for identification?
- Size of the fingerprint. How much data (in bytes) per item has to be stored in the database? Is the size of the fingerprint constant or variable (with respect to the length of the item)?
- Size of the database. How many items can be handled simultaneously by the system?
- Mode of identification. Does the system allow identification of randomly chosen subsets of audio material (continuous fingerprinting), or is identification tied to short
  fingerprinted segments? If the latter, what is the segment size?
- Identification speed. How long does it take to identify an item? How does this scale with the number of items in the database?
- Identification performance with original and altered material. How much distortion can be introduced without significantly affecting the recognition rate? How does this scale with the number of items in the database and the amount of distortion?
- Fingerprint generation speed. How fast can a fingerprint be generated on a given platform? What resources are necessary to generate a fingerprint (e.g. central processing unit (CPU) speed, amount of random access memory (RAM), whether a floating-point unit (FPU) is required)?
- Training speed. How long does it take to add items to the database? How does this scale with the number of items already in the database?

To assess these properties sensibly, and thereby show the suitability of a system for real-world application, a test environment must maintain constant boundary conditions for the characteristics under test. Relevant test conditions are the size and content of the reference database, the size (playing duration) and number of the test items, exact modification rules for the test items, and the computing platform, including specification of the CPU, memory and operating system. A number of control titles that are not contained in the reference database should also be included in the set of test items, in order to properly test the rejection behaviour of the system under test.

4 Selection of test material and size of database

All musical styles and genres should be represented in the reference database, with the most frequently heard genres predominating in number. A database size of 10 000 to 100 000 pieces is suggested for a realistic evaluation.

Definition of terms:

- An item is called a duplicate item with respect to another audio item if it consists of the same recording as the original, except that a certain number of zero-valued leading or trailing samples may have been added. This can sometimes be observed when the “same” song appears on different compilations or albums.
- A similar item is a different (re)mix, cover version or (live) recording of another database item.

Requirements for the selection of test material:

- Special care should be taken to avoid duplicate items within the database.
- The database shall contain a certain number of similar items (minimum 20 pairs). Example: ten live recordings of the same song by one artist at different concerts; ten original/remix pairs, each of one song, by different artists; ten original/cover version pairs, each of one song, by different artists.
- The database shall be defined before the first experiment. It is not permitted to modify the database according to the test results.

5 Test method

As the speed of the calculation may depend on the amount of distortion in the test item, the speed of the extraction process and of the search (classification) process shall be measured separately for each experiment (1, 2 and 3a) to 3i)).

5.1 Experiment 1

In a first test run, all titles from the reference database remain unmodified and have to be identified. The system under test should therefore identify 100% of these items correctly. The average fingerprint size is calculated over the full set of reference items. Depending on the type of fingerprint used by the system under test, this yields either an average size per item or a size per unit of playing time. Data from systems which do not
perform continuous fingerprinting shall be considered separately from the data of continuous fingerprinting systems.

5.2 Experiment 2

Thereafter, excerpts of 5 s and 30 s length, taken from 1 000 items that are not contained in the reference database and are thus unknown to the system, shall be added to the test set. These 2 000 excerpts are presented to the system to determine its rejection behaviour and to test for potential false positives. This set of 2 000 items should contain at least ten items of the type “similar item” (relative to a corresponding item in the reference database).

5.3 Experiment 3

To test recognition robustness with modified musical pieces, a set of 1 000 items is chosen from the reference set. The first test shall be conducted as described in 3a). All other tests, 3b) to 3i), are then based on the excerpts created in 3a); that is, each combines the specific distortion with the “cropping” effect described in 3a). Combining all other distortions with cropping is reasonable because it eliminates the unrealistic assumption of perfectly aligned fingerprints.

The following modification procedures are recommended to
be used:

3a) Cropping/offset: Only small subsegments of the test item are taken. The start sample of the excerpt shall be varied (randomly chosen, but fixed for all test systems). The length of the excerpt should be 5, 10 and 20 s, respectively.

3b) Dynamic compression and expansion: Parameters shall be chosen according to customary settings used for broadcasting.

3c) Level adjustment: The input signal is scaled by a certain factor, e.g. 6 dB and 10 dB. Clipping shall be avoided.

3d) Equalization: Octave-band equalization is used, with adjacent band attenuations set to −6 dB and +6 dB.

3e) Addition of noise: White or pink noise is added with an overall S/N of 10 and 20 dB, respectively.

3f) Sampling rate conversion and pitch shifting: Deviations of +5% and −5% in sampling rate shall be used.

3g) Audio coding and watermarking: The effects of audio coding shall be evaluated using an MPEG-1/2 Layer-3 encoded signal with the following bit-rate/channel combinations: 24 kbit/s (mono), 64 kbit/s (stereo), 96 kbit/s (stereo) and 128 kbit/s (stereo).

3h) Band limiting: The input signal shall be band-limited to an upper frequency limit of 4 kHz.

3i) Acoustic transmission: The imperfections caused by acoustic playback under moderate acoustic conditions shall be tested. The signal is played through a loudspeaker and recorded again with a microphone; the recommended distance between the two is about 50 cm. A high-quality loudspeaker and/or microphone is not necessary. The test should be done in a regular (not acoustically treated or isolated) room.

The parameters of the individual modification tests have been adjusted so that equivalent human listening perception would rate them from “slight alteration” up to “strong alienation” of the original piece. For audio coding this
would correspond to encoding in the MP3 format at 128 kbit/s (stereo) for slight alteration of the original material, and at 24 kbit/s (mono) for strong alienation. Encoding at 96 kbit/s (stereo) and 64 kbit/s (stereo) as intermediate steps is recommended, since these bit rates are the most commonly used in Internet transactions. Therefore no more than five levels of degradation should be chosen¹.

¹ The inclusion of MPEG-1/2 Layer-2, MPEG-2/4 AAC, Dolby-E and other schemes, which are frequently used in the broadcast environment, is regarded as unnecessary, because these schemes are usually not misused in a study environment, as happens frequently with MPEG-1/2 Layer-3 (MP3).

6 Test platform

The recommended computational platform should consist of devices and an operating system that comply with the state-of-the-art equipment available to the regular user. In 2002, an example of an adequate platform was a Pentium-class machine running at 1 GHz with 512 MB of RAM, using Windows 2000™ or Linux.

7 System parameter variation

Fingerprinting systems that can achieve varying degrees of robustness/compactness depending on their extraction parameter settings may have these settings adapted during the different tests to achieve optimum performance for each individual task/test. In this case, however, each system/setting combination shall be considered a separate system, with a limited scope of application and its own fingerprint format and extraction process. This does not apply to systems in which a more compact/less robust fingerprint database can be derived from a less compact/more robust representation by means of a self-contained transcoding process, i.e. when a single fingerprint extraction process on the reference audio material is sufficient to enable all functions shown in the tests.

8 Test report

Test reports should convey, as clearly as possible, the rationale for the study, the methods used and the conclusions drawn. Sufficient detail should be presented so that a knowledgeable person could, in principle, replicate the study in order to check the outcome empirically. An informed reader ought to be able to understand and critique the major details of the test, such as the underlying reasons for the study, the experimental design methods and execution, and the analyses and conclusions. Special attention should be given to the following aspects:

- the specification and selection of the reference and test items;
- the selection of the similar items and the corresponding test results for these special items;
- the detailed description of the parameters of the different distortions;
- the detailed description of the set-up parameters used for the systems under test;
- the detailed basis of all the conclusions that are drawn.
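The duplicate-item definition in Section 4 (the same recording, differing only by added zero-valued leading or trailing samples) can be checked mechanically before the reference database is frozen. The following Python sketch assumes that every item is available as decoded integer PCM samples and that duplicates are bit-identical apart from that padding; the function names are hypothetical and not part of this Recommendation. Copies of the same recording decoded from different lossy encodings would not be caught by this exact-match test.

```python
import hashlib
from array import array
from collections import defaultdict
from typing import Dict, List, Sequence


def strip_zero_padding(samples: Sequence[int]) -> List[int]:
    """Drop zero-valued leading and trailing samples, the only difference
    allowed between duplicate items."""
    start, end = 0, len(samples)
    while start < end and samples[start] == 0:
        start += 1
    while end > start and samples[end - 1] == 0:
        end -= 1
    return list(samples[start:end])


def padding_invariant_digest(samples: Sequence[int]) -> str:
    """SHA-256 of the trimmed PCM data (assumes integer samples that fit a C int)."""
    trimmed = array("i", strip_zero_padding(samples))
    return hashlib.sha256(trimmed.tobytes()).hexdigest()


def find_duplicate_groups(database: Dict[str, Sequence[int]]) -> List[List[str]]:
    """Return groups of item IDs whose trimmed samples are bit-identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item_id, samples in database.items():
        groups[padding_invariant_digest(samples)].append(item_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical toy database: "b" and "d" are "a" with different zero padding.
db = {"a": [0, 0, 1, 2, 3, 0], "b": [1, 2, 3], "c": [1, 2, 4], "d": [0, 1, 2, 3]}
print(find_duplicate_groups(db))  # [['a', 'b', 'd']]
```

Hashing the trimmed samples keeps the check linear in database size, which matters at the suggested 10 000 to 100 000 items.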
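Experiments 1 and 2 largely reduce to counting outcomes: reference items should be identified correctly, while the 2 000 unknown excerpts should be rejected. A minimal Python sketch of that bookkeeping follows, assuming each query is logged as a hypothetical `Trial` record holding the expected reference ID (`None` for an item not in the database) and the system's answer (`None` for a rejection). The metric names are illustrative; the Recommendation does not define them.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Trial:
    """One identification query (hypothetical log format)."""
    expected: Optional[str]  # reference ID, or None for an item not in the database
    returned: Optional[str]  # ID reported by the system, or None if it rejected the query


def score(trials: Sequence[Trial]) -> Dict[str, float]:
    """Recognition, misidentification and false-positive rates."""
    known = [t for t in trials if t.expected is not None]
    unknown = [t for t in trials if t.expected is None]
    correct = sum(t.returned == t.expected for t in known)
    wrong_id = sum(t.returned is not None and t.returned != t.expected for t in known)
    false_pos = sum(t.returned is not None for t in unknown)
    return {
        "recognition_rate": correct / len(known) if known else math.nan,
        "misidentification_rate": wrong_id / len(known) if known else math.nan,
        "false_positive_rate": false_pos / len(unknown) if unknown else math.nan,
    }


def average_fingerprint_size(
    sizes_bytes: Dict[str, int], durations_s: Optional[Dict[str, float]] = None
) -> float:
    """Bytes per item, or bytes per second of audio when durations are given."""
    total = sum(sizes_bytes.values())
    if durations_s is None:
        return total / len(sizes_bytes)
    return total / sum(durations_s[item] for item in sizes_bytes)


trials = [Trial("a", "a"), Trial("b", "c"), Trial("d", None), Trial(None, None), Trial(None, "a")]
print(score(trials))
```

The per-length variant divides the total fingerprint size by the total playing time, which weights long items more heavily than averaging per-item ratios would; whichever convention is used is worth stating in the test report.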
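Test 3a) requires the excerpt start to be random but identical for every system under test. One way to satisfy this, sketched below in Python, is to draw all start positions once from a seeded pseudo-random generator and reuse the resulting crop plan for every system. The names (`CropSpec`, `plan_crops`, `crop`), the seed value, and the choice to skip items shorter than the requested excerpt are assumptions for illustration only.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence

EXCERPT_LENGTHS_S = (5, 10, 20)  # excerpt lengths listed under 3a)


@dataclass(frozen=True)
class CropSpec:
    """Where to cut one excerpt (hypothetical record type)."""
    item_id: str
    length_s: int
    start_sample: int


def plan_crops(durations_s: Dict[str, float], sample_rate: int, seed: int) -> List[CropSpec]:
    """Draw every start position once from a seeded generator, so the same
    plan can be reused unchanged for all systems under test."""
    rng = random.Random(seed)
    plan = []
    for item_id in sorted(durations_s):  # sorted: plan must not depend on dict order
        total = int(durations_s[item_id] * sample_rate)
        for length_s in EXCERPT_LENGTHS_S:
            n = length_s * sample_rate
            if n > total:
                continue  # assumption: items shorter than the excerpt are skipped
            plan.append(CropSpec(item_id, length_s, rng.randint(0, total - n)))
    return plan


def crop(samples: Sequence[float], spec: CropSpec, sample_rate: int) -> Sequence[float]:
    """Cut the excerpt described by spec out of a decoded item."""
    n = spec.length_s * sample_rate
    return samples[spec.start_sample : spec.start_sample + n]


plan = plan_crops({"x": 30.0, "y": 12.0}, sample_rate=1000, seed=42)
excerpt = crop(list(range(30_000)), plan[0], sample_rate=1000)
```

Because tests 3b) to 3i) are applied to these excerpts, storing the plan (item, length, start sample) alongside the seed also documents the cropping for the test report.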
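For the level adjustment of test 3c), a level change in decibels corresponds to a linear amplitude factor of 10^(dB/20): a 6 dB increase scales the amplitude by about 2 and a 10 dB increase by about 3.16, with attenuation using the reciprocal. The Python sketch below assumes floating-point samples with full scale at ±1.0 and refuses any gain that would clip, since clipping shall be avoided; the function names are hypothetical.

```python
from typing import List, Sequence


def db_to_gain(db: float) -> float:
    """Amplitude factor for a level change in dB: 10 ** (dB / 20)."""
    return 10.0 ** (db / 20.0)


def adjust_level(samples: Sequence[float], gain_db: float, full_scale: float = 1.0) -> List[float]:
    """Scale the signal by gain_db; raise instead of clipping (see 3c)."""
    gain = db_to_gain(gain_db)
    peak = max((abs(x) for x in samples), default=0.0)
    if peak * gain > full_scale:
        raise ValueError(f"{gain_db:+.1f} dB would clip (peak {peak * gain:.3f})")
    return [x * gain for x in samples]


print(round(db_to_gain(6.0), 3), round(db_to_gain(10.0), 3))  # 1.995 3.162
```

Raising an error rather than silently limiting the peak keeps a clipped item from entering the test set unnoticed; a test operator could instead pre-attenuate such items and record that in the report.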
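Test 3e) specifies the added noise by its overall S/N. Interpreting S/N as the ratio of mean signal power to mean noise power over the whole excerpt (an assumption), the noise has to be scaled so that 10·log10(P_signal / P_noise) equals the target of 10 or 20 dB. The Python sketch below does this for white Gaussian noise only; pink noise would need an additional spectral-shaping filter, which is not shown. The function names and the `seed` parameter are hypothetical, and possible clipping after the noise is added is not handled.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    return sum(v * v for v in x) / len(x)


def add_white_noise(signal: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add Gaussian white noise at an overall S/N of snr_db (pink noise not covered)."""
    rng = random.Random(seed)  # fixed seed: identical noise for every system under test
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Pick the scale so that 10*log10(P_signal / (scale**2 * P_noise)) == snr_db.
    scale = math.sqrt(mean_power(signal) / (mean_power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """S/N actually achieved, computed from the clean and noisy versions."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(residual))


tone = [0.5 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
for target in (10.0, 20.0):
    print(target, round(measured_snr_db(tone, add_white_noise(tone, target, seed=1)), 6))
```

Scaling against the power of the noise actually drawn, rather than the nominal unit variance, makes the achieved S/N match the target exactly even for short excerpts.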
