ISO/TR 19358:2002 Ergonomics - Construction and application of tests for speech technology

Reference number: ISO/TR 19358:2002(E)
© ISO 2002

TECHNICAL REPORT ISO/TR 19358
First edition 2002-10-01

Ergonomics - Construction and application of tests for speech technology

Ergonomie - Élaboration et mise en œuvre des tests des systèmes de technologie de la parole

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO 2002

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Terms and definitions
3 Description of speech technologies
3.1 Introduction
3.2 Available technologies
4 Description of relevant variables related to speech technology
4.1 Introduction
4.2 Speech type
4.3 Speaker (specification of speaker-dependent aspects)
4.4 Task (application-specific description of relevant recognition parameters)
4.5 Training (task-related training aspects)
4.6 Environment (specification of the speech quality in a specific environment, for both input and output)
4.7 Input (specification of the transmission of the speech signal from the microphone to a recognizer input)
4.8 Specification of speech technology modules
5 Assessment methods
5.1 General
5.2 Field vs. laboratory evaluation
5.3 System transparency
5.4 Subjective vs. objective methods
5.5 Speech recognition systems
5.6 Speech synthesis systems
5.7 Speaker identification and verification
5.8 Corpora
5.9 Related sources of information
Annex A (informative) Example of assessment
Annex B (informative) Performance measures
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

In exceptional circumstances, when a technical committee has collected data of a different kind from that which is normally published as an International Standard (“state of the art”, for example), it may decide by a simple majority vote of its participating members to publish a Technical Report. A Technical Report is entirely informative in nature and does not have to be reviewed until the data it provides are considered to be no longer valid or useful.

Attention is drawn to the possibility that some of the elements of this Technical Report may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO/TR 19358 was prepared by Technical Committee ISO/TC 159, Ergonomics, Subcommittee SC 5, Ergonomics of the physical environment.

Introduction

This Technical Report advises on methods for determining the performance of speech-technology systems (automatic speech recognizers, text-to-speech systems and other devices that make use of the speech signal) and on selecting appropriate test procedures. Human-to-human speech communication is not included in this Technical Report but is covered by ISO 9921.

1 Scope

This Technical Report deals with the testing and assessment of speech-related products and services, and is intended for use by specialists active in the field of speech technology, as well as purchasers and users of such systems. Advanced users are referred to the detailed evaluation chapters of the EAGLES Handbook of Standards and Resources for Spoken Language Systems (Gibbon et al. 1997) and the EAGLES Handbook of Multimodal and Spoken Dialogue Systems. EAGLES was a research project partly sponsored by the European Community.

2 Terms and definitions

For the purposes of this Technical Report, the following terms and definitions apply.

2.1 Automatic Speech Recognition (ASR)
ability of a system to accept human speech as a means of input

2.2 dialogue
interactive exchange of information between the speech system and the human speaker

2.3 dialogue management
control of the dialogue between the speech system and the human

2.4 Natural Language Processing (NLP)
automatic processing of text originating from humans

2.5 objective assessment
assessment without direct involvement of human subjects during measurement, typically using prerecorded speech

2.6 performance measures
means used to assess the system performance, typically by diagnostic or relative performance methods

2.7 speaker-dependent system
need of a speech-recognition system to be trained with the speech of the specific user

2.8 speaker identification
identification of a particular speaker from a closed set of possible speakers

2.9 speaker-independent system
system not trained for a specific user but applicable for any user of a selected group (native speakers, adults, etc.)

2.10 speaker recognition
general term for technology which identifies or verifies the identity of a speaker

2.11 speaker verification
verification of the identity of a person by assessment of specific aspects of his/her speech

2.12 speaking style
speech may be isolated or continuous, read or spontaneous, or dictated

2.13 speech communication
conveying or exchanging information using speech, speaking, and hearing modalities
NOTE Speech communication may involve brief texts, sentences, groups of words, isolated words, hums and parts of words.

2.14 speech recognizer
process in a machine capable of converting spoken language to recognized words
NOTE This is the process by which a computer transforms an acoustic speech signal into text.

2.15 speech synthesis
generation of speech from data

2.16 speech understanding
technology that extracts the semantic contents of speech

2.17 subjective assessment
assessment with the direct involvement of human subjects during measurement

2.18 text-to-speech synthesis
generation of audible speech from a text

2.19 vocabulary
set of words used in a particular context

2.20 vocabulary size
number of words in a vocabulary of the speech recognizer

3 Description of speech technologies

3.1 Introduction

Speech technology includes the automatic recognition of speech and of the speaker, speech synthesis, etc. Natural Language Processing (NLP) includes the understanding of text items and the management of a dialogue between a human speaker and a machine.

Modern technologies are mostly based on algorithms that make use of digital-signal processing embedded in a digital-signal processor or a (personal) computer system. The algorithms produce near real-time responses. The performance depends on the application. For example, a speech-recognition system designed for use with a small vocabulary and trained with speech from a single user (e.g., control of a personal hand-held telephone) will generally perform (for this particular user) much better than a system designed for a domain with a large vocabulary and generally for a large group of unknown users (e.g., information services through a public telephone network).

For speech products and services, we can identify four main categories:

a) Command and Control. The interface between a user and a system is accomplished by automatic speech recognition (ASR). ASR is normally used in a multimodal design, in which the control of a system by speech is one of the possible modalities (i.e., a keyboard, mouse, touch screen, etc. may be an alternative modality). Control by an ASR system may be essential in “hands busy” situations.

b) Services and Telephone Applications. Services such as an information kiosk normally require a combination of speech recognition, understanding, speech synthesis and dialogue management in order to control the unsupervised dialogue between user and system. Present state-of-the-art systems cover relatively simple dialogue structures such as travel-information systems (day, time and “from-to”), and call centres (selection of the required information).

c) Document Generation. Dictation systems trained for many languages are presently on the market. These systems can be linked to standard word-processing systems. Simple applications include data entry for a specific user domain (e.g. medical reports); more complex systems allow dictation of full documents and the control of the text-processing system. These more complex systems are often trained for a large vocabulary and speaker-dependent use. However, for acceptable performance, the system has to be familiarized with the user and the domain of use. This is often accomplished in two steps: by an (adaptive) acoustical training session in which the user has to read a predefined text, and by presentation of a number of documents written for the user, which are used to extend the vocabulary and to modify the language model.

d) Document Retrieval. Retrieval of complete documents (from a spoken-document archive), and information retrieval of specific passages from a document or of utterances from a specific speaker, are of interest for archive documentation and management and for the compilation of overviews. Various technologies, such as ASR, word spotting and speaker recognition, are used for labelling the speech utterances. Specific search algorithms are used to retrieve the required information.

3.2 Available technologies

3.2.1 Speech recognition

Automatic speech-recognition systems are capable of producing a transcription (text string) from a speech signal. For this purpose, trained systems are used. Modern systems, for use with a large vocabulary, extract specific spectral parameters that identify sub-units (phonemes) from the speech signal. Words are described in terms of strings of these phonemes. The recognition architecture may require various levels related to models of the phonemes (phone models), words (vocabulary) and the statistical description of word combinations (language model). Phone models are normally trained for a large number of speakers, resulting in a statistically based representation. The statistical approach is normally based on a Hidden Markov Model (HMM) or a Neural Network (NN). The vocabulary and the language model are obtained from digitally available texts that are representative of the application domain.

3.2.2 Speaker identification and verification

Automatic speaker identification is the capability to identify a speaker from a group of known speakers. It answers the question “To whom does this speech sample belong?” This technology involves two steps: modelling the speech of the speaker population (training) and comparing the unknown speech to all of the speaker models (testing).

Speaker verification is a method of confirming that a speaker is the person that he or she claims to be. The heart of the speaker-verification system is an algorithm that compares an utterance from the speaker with a model built from training utterances gathered from the authorized user during an enrolment phase. If the speech matches the model within some required tolerance threshold, the speaker is accepted as having the claimed identity. In order to protect against an intruder attempting to fool the system by making a recording of the voice of the authorized user, the verification system will usually prompt the speaker to say particular phrases, such as sequences of numbers, which are selected to be different each time the user tries to gain entry. The speaker-verification system is combined with a recognition system to assure that the proper phrase was spoken.

3.2.3 Speech synthesis

For speech synthesis two methods are used. The first, generally known as “canned speech”, is generated on the basis of prestored messages. Coding techniques that compress the messages are normally used in order to save storage space. With this type of synthesis, high-quality speech can be obtained, especially for quick-response applications that make use of a number of standard responses.

The second method, “text-to-speech synthesis”, allows the generation of any message from a written text. This generally involves a first stage of linguistic processing, in which the text input is converted into an internal representation of phonemes and prosodic markers, and a second stage of sound generation on the basis of this internal representation. The sound generation can be made either entirely by rule, typically using complex models of the speech production mechanism (formant synthesis, intonation), or by concatenating short prestored units (concatenative synthesis). The speech quality obtained with concatenative synthesis is generally considered higher.

3.2.4 Speech understanding

Speech-understanding systems can be
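As an illustration of the two decisions described in 3.2.2, the following Python sketch contrasts closed-set identification (compare the unknown speech with all speaker models and pick the best match) with verification (accept a claimed identity only if the prompted phrase was recognized and the match score reaches a threshold). It is not part of ISO/TR 19358. The one-dimensional Gaussian speaker model, the function names, the threshold value and the sample data are all assumptions made purely for illustration; real systems use far richer acoustic models.

```python
import math
import random

def train_model(samples):
    """Enrolment (hypothetical): estimate (mean, std) of one scalar feature."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, max(math.sqrt(var), 1e-3)  # floor avoids a zero spread

def log_likelihood(model, features):
    """Average Gaussian log-likelihood of the test features under a model."""
    mean, std = model
    total = 0.0
    for x in features:
        total += -0.5 * math.log(2 * math.pi * std ** 2)
        total -= (x - mean) ** 2 / (2 * std ** 2)
    return total / len(features)

def identify(models, features):
    """Closed-set identification: score against all models, return the best."""
    return max(models, key=lambda name: log_likelihood(models[name], features))

def make_prompt(n_digits=4):
    """Random digit sequence, so a replayed recording is unlikely to match."""
    return " ".join(random.choice("0123456789") for _ in range(n_digits))

def verify(models, claimed, features, prompted, recognized, threshold=-2.0):
    """Verification: the phrase must match and the score must reach threshold.

    The threshold of -2.0 is an arbitrary illustrative value.
    """
    if recognized != prompted:
        return False
    return log_likelihood(models[claimed], features) >= threshold

# Illustrative enrolment data (assumed values, not measurements).
models = {
    "alice": train_model([1.0, 1.2, 0.9, 1.1]),
    "bob": train_model([3.0, 3.2, 2.9, 3.1]),
}
test_features = [1.05, 0.95, 1.1]
```

With these assumed values, `identify(models, test_features)` selects `"alice"`, and `verify` accepts the claim `"alice"` only when the recognized phrase equals the prompt. Raising the threshold makes the system stricter (fewer false acceptances, more false rejections); choosing it is an application-specific trade-off.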
