CAN CSA-ISO IEC 15938-4A-2005 Information technology - Multimedia content description interface - Part 4 Audio AMENDMENT 1 Audio extensions.pdf

Reference number ISO/IEC 15938-4:2002/Amd.1:2004(E)
© ISO/IEC 2004

Information technology - Multimedia content description interface - Part 4: Audio
AMENDMENT 1: Audio extensions

Technologies de l'information - Interface de description du contenu multimédia - Partie 4: Audio
AMENDEMENT 1: Extensions audio

Amendment 1:2005 to National Standard of Canada CAN/CSA-ISO/IEC 15938-4:04

Amendment 1:2004 to International Standard ISO/IEC 15938-4:2002 has been adopted without modification (IDT) as Amendment 1:2005 to CAN/CSA-ISO/IEC 15938-4:04. This Amendment was reviewed by the CSA Technical Committee on Information Technology (TCIT) under the jurisdiction of the Strategic Steering Committee on Information Technology and deemed acceptable for use in Canada.

October 2005

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO/IEC 2004. All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent
rights.

Amendment 1 to ISO/IEC 15938-4:2002 was prepared by Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

Information technology - Multimedia content description interface - Part 4: Audio
AMENDMENT 1: Audio extensions

Add at the end of subclause 4.2:

4.3 Handling of multi-channel signals

Introduction: The framework to handle multi-channel signals is given by the AudioD and AudioDS Types defined in ISO/IEC 15938-5/Amd.1 (MDS). The new additional attribute channels gives the channel numbers that are described by the assigned Descriptor or Description Scheme. However, to prevent misunderstanding, a more detailed description and handling policy is given in this part of ISO/IEC 15938. In particular, a recommendation is given for handling typical surround formats when only tag names like L, C, R, LS, RS, LFE are known.

By using the channels attribute, defined in ISO/IEC 15938-5/Amd.1 (MDS), it is possible to specify which channels should be used, e.g. for computing the mean with the extraction method. Therefore, the Descriptor and Description Schemes contain information about these channels only. This is useful in order to separate a multi-channel input signal into subgroups that are closely related, e.g. the Left (L), Center (C) and Right (R) signal of a typical surround format.

The highest possible channel number is given in the file-format of the audio media file itself. All numbers given in the channels attribute higher than the number of channels given by the media file-format should be ignored.

In the case where the numbering of the audio channels is not explicitly given in the file-format (like 5.1 surround signals), the following convention to number the channels is recommended. When mapping typical surround file-formats, consisting of tags like (L, R, C, LS, RS, LFE), the scheme shown in Figure AMD1-1 should be followed, in order to reduce the ambiguity between scheme and channel number. To define the channel number, the counting should start at an optional center channel and go from left to right, top to bottom and then from front to back (see for example Figure AMD1-2). An optional rear center will get the last channel number for the standard audio channels. The assigned number can be higher if specialised channels are present, like an LFE channel for low frequency effect signals. Two examples are given in Tables AMD1-1 and AMD1-2.

Furthermore, it is recommended that a textual description of the scheme used inside the AudioSegmentD-Framework (defined in ISO/IEC 15938-5) is included. An
instantiation example is given in ISO/IEC 15938-5/Amd.1 (MDS), subclause 4.2.4.

Figure AMD1-1 - Scheme and channel number for typical surround file-formats

Figure AMD1-2 - Scheme and channel number for a 3D speaker arrangement (example)

Examples for mapping:

Table AMD1-1 - Simple Stereo

Tag name               Channel number
Left                   1
Right                  2

Table AMD1-2 - Surround 5.1

Tag name               Channel number
Center                 1
Left                   2
Right                  3
Left Surround (LS)     4
Right Surround (RS)    5
LFE                    6

Replace subclause 6.5.8 by:

6.5.8 WordLexiconType

6.5.8.1 Syntax

6.5.8.2 Semantics

Name
    Definition
WordLexiconType
    A list of words (a lexicon). Each entry represents one orthographic representation (spelling) or one non-orthographic representation of a word or linguistic unit. The lexicon is not a phonetic (pronunciation) dictionary.
phoneticAlphabet
    The name of the encoding scheme of the phone lexicon. Only needed if phonetic representation is used. See 6.5.9 phoneticAlphabetType.
Token
    An entry in the lexicon.
linguisticUnit
    Indicates the type of the linguistic unit that is put into the entry of the word lexicon. The linguistic units are defined as follows.
    word - a unit delimited by whitespace. This is the default value. (example: psychoacoustics)
    syllable - minimal pronounceable unit (example: psy)
    morpheme - minimal meaning-bearing unit (example: psycho)
    stem - the uninflected base of a word-form, can be polymorphemic. (example: psychoacoustic)
    affix - needs to be added to a stem to get a word
    component - a constituent part of a compound word. Important for compounding languages. (example from German: Forschungs, which in English corresponds to "research-")
    nonspeech - noises, both human-produced and background, that are non-linguistic in nature. (example: throat clearing, coughing)
    phrase - a sequence of words (e.g. "God bless America")
    other - a linguistic unit that does not map onto any of the above
    Other values that are datatype-valid with respect to mpeg7:termReferenceType are reserved.
representation
    Form of representation for a lexicon entry. The kinds of representation are defined as follows.
    orthographic - representation of an entry by spelling
    nonorthographic - representation of an entry by an identifier that is not synonymous with the spelling of a word. A non-orthographic representation may, for example, encode the phoneme string corresponding to the pronunciation of the entry.

6.5.8.3 Usage, Extraction and Examples (informative)

6.5.8.3.1 Purpose

The word lexicon makes it possible to store the words contained in the lattice. It is common in both speech recognition and spoken document retrieval to include entries in the word lexicon that are "words" only in a wider sense of the term (e.g. acronyms or abbreviations) or not really words at all (e.g. phrases, syllables, morphemes or the individual components of compound words). The attribute linguisticUnit makes it possible to distinguish between these different types of units. Differentiating these units is useful, for instance, when the retrieval algorithm of an application needs to treat different units in different ways. For example, stemming, a pre-processing step applied to words, should not be applied to syllables or morphemes. Similarly, different types of units might receive different weightings in the calculation of the retrieval metric.

In some applications, it is also necessary to know whether the entry is given in its human-readable form. This matters, for example, when the entry is human-interpretable and can potentially be displayed to the user, or when algorithms are applied which are intended for the orthographic form only (e.g. stemming).

6.5.8.3.2 Extraction

The generation of syllable, morpheme, compound and phrase transcriptions of spoken input is performed in the following ways:

a) The output of the word or phoneme recognizer is mapped to other linguistic units. For example, the recognized word can be transformed into syllables using a syllable generation tool.

b) The ASR system produces the desired linguistic unit directly during the recognition process. In this case, the linguistic units are parts of the recognition vocabulary of the speech recognition engine. For example, the dictionary used for the speech recognition system could be composed exclusively of syllables.

6.5.8.3.3 Example

The following example shows a lexicon containing six entries. The first two entries represent syllables and the next two entries represent words. The fifth entry also represents a word, but not in its written form. The last entry represents a phrase.

6: n 6:_s_ water draw Q e: l e: f a n t as a rule

Replace subclause 6.5.12 by:

6.5.12 SpokenContentLinkType

6.5.12.1 Syntax

6.5.12.2 Semantics

Name
    Definition
SpokenContentLinkType
    The structure of a word or phone link in the lattice.
probability
    The probability of this link. In a crude sense, this is to indicate which links are more likely than others, with larger numbers indicating higher likelihood.
nodeOffset
    The node to which this link leads, specified as a relative offset and defaulting to 1. A node offset leading out of the current block implicitly refers to the next block. A node offset cannot span a whole block, i.e., a link from a node in block 3 must lead to a node in block 3 or block 4.
acousticScore
    The score assigned by the acoustic models of the speech recognition engine only. It is given in logarithmic scale (base e) and indicates the quality of the match between the acoustic models and the corresponding signal segment. A higher value indicates a better match.

Add a new subclause 6.7:

6.7 Audio Signal Quality

6.7.1 Introduction

If an AudioSegment DS contains a piece of music, several features describing the signal's quality can be computed to describe the quality attributes. The AudioSignalQualityType contains these quality attributes and uses the ErrorEventType to handle typical errors that occur in audio data and in the transfer process from analog audio to the digital domain. However, note that this DS is not applicable to describe the subjective sound quality of audio signals resulting from sophisticated digital signal processing, including the use of noise shaping or other techniques based on perceptual/psychoacoustic considerations.

For example, in the case of searching an audio file on the Internet, quality information could be used to determine which one should be downloaded among several search results. Another application area would be an archiving system. There, it would be possible to browse through the archive using quality information, and also the information could be used to decide if a file is of sufficient quality to be used e.g. for broadcasting.

6.7.2 Conventions

The description of the Descriptors refers to the input signal x. If x is a multi-channel signal, then the signal for a certain channel is designated as x_n for the n-th channel. The functions max(), min() and mean() are used as defined in ISO/IEC CD 15938-4 (Audio Part). The function abs() calculates the absolute value.

6.7.3 Audio Signal Quality Description Scheme

The AudioSignalQualityType is a set of AudioQuality Descriptors and some additional tools for handling and describing audio signal quality information. In particular the handling of single error events in audio streams is considered.

6.7.3.1 Syntax

6.7.3.2 Semantics

Name
    Definition
AudioSignalQualityType
    The AudioSignalQualityType describes the quality of an AudioSegment. It consists of several quality elements.
Operator
    The Operator is the person who is responsible for the audio quality information. Operator is of type PersonType.
UsedTool
    The UsedTool is the system that was used by the Operator to create the quality information. UsedTool is of type CreationToolType.
BackgroundNoiseLevel (BNL)
    The BackgroundNoiseLevel describes the noise level in an AudioSegment. BackgroundNoiseLevelType is defined in 6.7.4.
RelativeDelay
    The RelativeDelay describes the relative delay between two or more channels of an AudioSegment. RelativeDelayType is defined in 6.7.6.
Balance
    The Balance describes the relative level between two or more channels of an AudioSegment. BalanceType is defined in 6.7.7.
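
As a rough illustration of how the elements named in 6.7.3.2 fit together, the following Python sketch models an audio signal quality description as a plain record. It is not the normative syntax, which is not reproduced here. The class name, field names and method name are hypothetical, and the value types are placeholders, because BackgroundNoiseLevelType, RelativeDelayType and BalanceType are defined in subclauses outside this excerpt. Keying the per-channel values by channel number, and applying to them the 4.3 rule that channel numbers above the file's channel count should be ignored, are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AudioSignalQuality:
    """Hypothetical record mirroring the elements named in 6.7.3.2."""

    # Person responsible for the quality information (PersonType in the source).
    operator: Optional[str] = None
    # System the operator used to create the information (CreationToolType).
    used_tool: Optional[str] = None
    # Placeholder value types (assumption): one float per channel number.
    background_noise_level: Dict[int, float] = field(default_factory=dict)
    relative_delay: Dict[int, float] = field(default_factory=dict)
    balance: Dict[int, float] = field(default_factory=dict)

    def restrict_to(self, channel_count: int) -> "AudioSignalQuality":
        """Return a copy without entries above the file's channel count.

        Channel numbers start at 1, as in Tables AMD1-1 and AMD1-2.
        """

        def keep(values: Dict[int, float]) -> Dict[int, float]:
            return {ch: v for ch, v in values.items() if 1 <= ch <= channel_count}

        return AudioSignalQuality(
            operator=self.operator,
            used_tool=self.used_tool,
            background_noise_level=keep(self.background_noise_level),
            relative_delay=keep(self.relative_delay),
            balance=keep(self.balance),
        )
```

For a stereo file, calling `restrict_to(2)` on a description that also carries a value for channel 7 would keep only the entries for channels 1 and 2 and leave the original record untouched. A real implementation would replace the float placeholders with the structures defined in 6.7.4, 6.7.6 and 6.7.7.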
