Format: PDF, pages: 46, size: 2 MB
BS ISO 24624-2016 Language resource management Transcription of spoken language.pdf

BS ISO 24624:2016
Language resource management
Transcription of spoken language
BSI Standards Publication

BS ISO 24624:2016 BRITISH STANDARD

National foreword

This British Standard is the UK implementation of ISO 24624:2016.

The UK participation in its preparation was entrusted to Technical Committee TS/1, Terminology.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© The British Standards Institution 2016. Published by BSI Standards Limited 2016

ISBN 978 0 580 84640 3
ICS 01.140.10

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 August 2016.

Amendments/corrigenda issued since publication
Date    Text affected

INTERNATIONAL STANDARD ISO 24624
First edition 2016-08-15

Language resource management
Transcription of spoken language

Gestion des ressources linguistiques
Transcription du langage parlé

Reference number ISO 24624:2016(E)

COPYRIGHT PROTECTED DOCUMENT

© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Ch. de Blandonnet 8, CP 401
CH-1214
Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Metadata
4.1 Description of the electronic file ()
4.1.1 Distribution information ()
4.1.2 Recording information ()
4.2 Description of circumstances ()
4.2.1 Participant information ()
4.2.2 Setting information ()
4.3 Description of source ()
5 Macrostructure
5.1 Timeline ()
5.2 Utterances ()
5.3 Free dependent annotations (, )
5.4 Grouping of utterances and dependent annotations ()
5.5 Independent elements outside utterances ( and )
5.6 Inline paralinguistic annotation ()
5.7 Global divisions of a transcription ()
6 Microstructure
6.1 Tokens ()
6.1.1 Characterization
6.1.2 Representation as
6.1.3 Further constraints
6.1.4 Examples
6.2 Pauses ()
6.2.1 Characterization
6.2.2 Representation as
6.2.3 Further constraints
6.2.4 Examples
6.3 Audible and visible non-speech events (, and )
6.3.1 Characterization
6.3.2 Representation as , or
6.3.3 Examples
6.4 Punctuation ()
6.4.1 Characterization
6.4.2 Representation as
6.4.3 Further constraints
6.4.4 Examples
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (, , )
6.5.1 Characterization
6.5.2 Representation as or
6.5.3 Further constraints
6.5.4 Examples
6.6 Units above the token and below the level ()
6.6.1 Characterization
6.6.2 Representation as
6.6.3 Further constraints
6.6.4 Examples
Annex A (informative) Fully encoded example
Annex B (informative) Element and attribute index
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

Introduction

This document sets out to facilitate the interchange of transcriptions of spoken language between different computational tools and environments for creating, editing, publishing and exploiting such data. Transcription of spoken language in this context means an orthography-based transcription of verbal activity as recorded in an audio or video recording of a natural interaction. The description of activity in other modalities (e.g. body language, gestures and facial expression) may be part of a spoken language transcription, but this document starts from the assumption that the verbal dimension is the primary focus of a spoken language transcription. Likewise, although this document may also be relevant for transcription based on phonetic alphabets like the IPA, the assumption for this document is that orthography-based transcription is the default case.

This document is developed in the context of the joint agreement between ISO and the Text Encoding Initiative (TEI) consortium, and accordingly, its content is also distributed as part of the TEI guidelines [23].

This document takes into account data models and encoding practices supported by widely used transcription software. More specifically, it builds on several interoperability studies [12], [16], [17], [19] involving the following tools:

- ANVIL [10]
- CLAN [11]
- ELAN [22]
- EXMARaLDA [20]
- FOLKER [18]
- Transcriber [1]

This document was developed to be compatible with the formats produced by these tools. The compatibility may extend to the formats of further labelling tools (e.g. Praat [4] or Wavesurfer, http://www.speech.kth.se/wavesurfer/index2.html), but possibly on a lower level and/or with a requirement to convert these formats to one of the above-mentioned before adding mandatory information (e.g. speaker assignment) using the respective tools.

This document also aims to be usable with widely used transcription systems ("conventions"). However, in a technical sense, compatibility is not easily definable in this area since, unlike the tool formats, most of these systems lack an explicit formalization. The following selection of transcription systems was considered for this document:

- Codes for the Human Analysis of Transcripts (CHAT) [11]
- Discourse Transcription (DT) [7]
- Gesprächsanalytisches Transkriptionssystem (GAT) [21]
- Halbinterpretative Arbeitstranskriptionen (HIAT) [13]

Since TEI is the reference framework for this document and metadata is not its main concern, no attempt is made here to address metadata compatibility issues beyond the TEI header. However, it should be noted that there are several TEI profiles for the CMDI framework which are related both to each other and to CMDI profiles of other metadata formats (e.g. IMDI) via the ISOCAT registry (see also References [5], [6] and [9]).

This document aims to define both a target format for legacy data conversion and a format suitable for future data processing requirements. The pros and cons of these two demands were carefully weighed up before decisions were taken. At some points, certain techniques are therefore marked as preferred from a data processing point of view while an alternative technique is still allowed if the structure of legacy data makes its use unavoidable.

With regard to the other standards developed within ISO committee TC 37/SC 4, this document is intended to provide the primary layer on top of which further annotation layers may be implemented. In particular, the use of the element for tokenizing a transcription is conformable to the TEI-based representation of tokens in ISO 24611 (MAF).

This document also aligns with the mechanism proposed in the TEI guidelines to embed stand-off annotations within a TEI document. In particular, this mechanism contains a generic element () that groups together annotations related to the same linguistic segment; this grouping meets the needs of this document in the case of annotations of elements or its children.

Finally, this document is complementary to, and does not overlap with, the speech and multimodal interaction-related standards developed within the W3C. In particular, it does not deal with speech synthesis as is the case for SSML [24],
nor does it deal with the representation of the semantic interpretation of multimodal utterances as does EMMA [25].

1 Scope

This document specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts.

Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.

2 Normative references

There are no normative references in this document.

3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

- IEC Electropedia: available at http://www.electropedia.org/
- ISO Online browsing platform: available at http://www.iso.org/obp

3.1
dependent annotation
annotation which does not refer directly to an audio or video recording, but to another annotation, typically an orthographic or phonetic transcription

3.2
milestone element
empty XML element used to indicate a boundary point

3.3
orthographic transcription
representation or modelling of spoken language based on the orthography of the respective language

3.4
paralinguistic feature
feature of spoken language beyond the individual sound(s), such as voice quality, pitch, volume, intonation

3.5
phonetic transcription
representation or modelling of spoken language based on the sound system of the respective language

3.6
spoken language
oral language produced by a person's vocal system

3.7
transcriber
person who carries out the transcription

3.8
transcription
representation or modelling of spoken language by means of written symbols

3.9
transcription system
theoretically founded set of principles and rules detailing what spoken language phenomena are to be transcribed, and how they are to be transcribed

4 Metadata

The TEI guidelines formulate extensive suggestions for encoding metadata inside different subsections of the element. The following section addresses only those pieces of metadata which are either (i) crucial for ensuring the interpretability and exchangeability of spoken language transcriptions in general or (ii) likely to be relevant in a large majority of cases. This does not preclude the possibility of, or necessity for, encoding further metadata inside the element.

4.1 Description of the electronic file ()

4.1.1 Distribution information ()

The element inside the section of the should be used to record information about access rights and contact information for the transcription in question.

EXAMPLE 1 Use of

Hamburger Zentrum für Sprachkorpora
Available free for research and teaching purposes.
No redistributing allowed.
Hamburger Zentrum für Sprachkorpora
Max Brauer-Allee 60
22765
Hamburg
Germany

4.1.2 Recording information ()

The element inside the section of the should be used to record information about the transcribed recording(s). Only the actual recording(s), usually digital audio and/or video files, should be described here. General information about the respective interaction which is independent of the recording(s) should be described in the element (see 4.2.2).

A element inside a element should be used to refer to the corresponding digital file via a url attribute (see Reference [2]). A type attribute on should be used to indicate the media type of the recording; audio and video are the permissible values for that attribute. The actual digital file type should be encoded as a mimeType attribute (see Reference [8]) on the element. Where two or more files are derived from the same master recording (e.g. a video file or an extracted audio track), these should be represented as different elements inside the same element, rather than as different elements. TEI linking mechanisms, such as or corresp, can be used to describe relationships between different recordings or between recordings and other elements, such as speakers.

EXAMPLE 2 Use of

Parkinson Talkshow on BBC, broadcast on 02 November 2007
Video excerpt downloaded from YouTube with aTube-Catcher, converted into MPG format with Adobe Premiere
Audio extracted from video with Audacity 1.3 beta
Recorded with a ZOOM H4NSP, external lapel microphone clipped to Victoria Beckham's dress
Synchronized with David Beckham's recording
Reco
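
The attribute rules stated in 4.1.2 can be checked mechanically. The following is a minimal sketch in Python, not part of the standard. The element names in this preview were lost in extraction, so the sketch assumes the TEI Guidelines names recordingStmt, recording and media. The file names and MIME types are hypothetical, and the fragment omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Assumed element names (recordingStmt, recording, media) follow the TEI
# Guidelines; the file names and MIME types below are hypothetical.
SAMPLE = """
<recordingStmt>
  <recording type="video">
    <media mimeType="video/mpeg" url="parkinson.mpg"/>
    <media mimeType="audio/wav" url="parkinson.wav"/>
  </recording>
</recordingStmt>
"""

# Permissible values for the type attribute, as stated in 4.1.2.
ALLOWED_TYPES = {"audio", "video"}


def check_recording_stmt(xml_text):
    """Return a list of problems found in a recording statement fragment."""
    problems = []
    root = ET.fromstring(xml_text)
    for index, recording in enumerate(root.iter("recording"), start=1):
        rec_type = recording.get("type")
        if rec_type not in ALLOWED_TYPES:
            problems.append(
                f"recording {index}: type {rec_type!r} is not audio or video"
            )
        media_files = recording.findall("media")
        if not media_files:
            problems.append(f"recording {index}: no media element")
        for media in media_files:
            # Each file reference needs both a location and a file type.
            for attribute in ("url", "mimeType"):
                if not media.get(attribute):
                    problems.append(
                        f"recording {index}: media lacks {attribute}"
                    )
    return problems


if __name__ == "__main__":
    print(check_recording_stmt(SAMPLE))
```

In the sample, the video file and the audio track extracted from it sit side by side inside one recording, which mirrors the rule that files derived from the same master recording are grouped rather than described separately. A production check would also need to handle the TEI namespace and any linking attributes.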
