BS ISO 24619:2011 Language resource management - Persistent identification and sustainable access (PISA)


raising standards worldwide
NO COPYING WITHOUT BSI PERMISSION EXCEPT AS PERMITTED BY COPYRIGHT LAW

BSI Standards Publication
BS ISO 24619:2011
Language resource management - Persistent identification and sustainable access (PISA)

BRITISH STANDARD BS ISO 24619:2011

National foreword

This British Standard is the UK implementation of ISO 24619:2011.

The UK participation in its preparation was entrusted to Technical Committee TS/1, Terminology.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© BSI 2011
ISBN 978 0 580 67346 7
ICS 01.140.20

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 May 2011.

Amendments issued since publication
Date    Text affected

INTERNATIONAL STANDARD ISO 24619
First edition 2011-05-15
Reference number ISO 24619:2011(E)

Language resource management - Persistent identification and sustainable access (PISA)
Gestion des ressources langagières - Identification et accès pérennes

COPYRIGHT PROTECTED DOCUMENT
© ISO 2011
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Resources
3.2 Identifiers
3.3 Roles, institutions and services
3.4 Actions
4 Background
5 Requirements for PID frameworks and PID use
5.1 General
5.2 PID framework requirements
5.3 PID usage
5.4 Citation information and persistent identifiers
5.5 Referencing resource parts
5.6 Collections
6 Complementary requirements
6.1 Granularity of identifiers
6.2 Recommendations
Annex A (informative) Independent resources, aggregated resources, and parts of resources
Annex B (informative) Persistent identifier system implementations
Annex C (informative) Abbreviated terms
Bibliography
Alphabetical Index

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 24619 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

Introduction

References and
citations are an important part of documents and papers. Traditionally authors use them to provide proper acknowledgment to the author(s) of other papers as a source for their work or use them to support their argumentation. Citations usually contain information that enables a reader to establish the possible relevance of the cited paper and to identify it unambiguously. Any librarian or knowledgeable person is able to retrieve the document using well-established procedures based on the information in the citation.

The availability of directly accessible documents on the web has inspired the practice of adding a web location (URI [4]) to the citation information. This practice has made it possible to access referenced documents directly in web browsers as well as in other document viewers. This practice is already recommended in standards like ISO 690, although the emphasis there is more on identifying published resources and parts than on providing sustainable access to them. Increasingly often, such references need to be exploited by machines and software applications as well as by people, requiring reliable availability of the referenced resources.

Problems with access that occur when resources are relocated have led to the use of persistent identifier (PID) frameworks [23], [24]. Current approaches [18], [19], [24] address the resource relocation problem by introducing resolver services that translate a resource identifier to its actual current location. These resolver services have an added advantage of permitting the association of additional metadata with the identifier. Elaborate frameworks such as the Digital Object Identifier (DOI) [14] use this feature to manage extra services, for instance copyright information.

The practice of using persistent identifiers to cite and reference scientific data, along with individual resources as well as data sets, is less well developed. It is no less powerful, however, in that it allows readers of a paper, or users of a knowledge resource, direct access to the primary scientific data to which the resource refers.

When using references to access scientific data, including language resources, it becomes important to be able also to refer to and access parts of resources. This is especially true in the domain of language resources, where several layers of granularity are usually superimposed on the same data set or resource collection. Therefore, discussions in this International Standard concerning the use and requirements for PID frameworks extensively explore how these frameworks can deal efficiently with identifying and accessing parts of resources. Special recommendations indicate how to approach the granularity issue when issuing PIDs for resources and resource collections.

The need to apply PID frameworks for identifying resources contained in scientific data sets has also increased since modern archives and repositories have begun to weave a network of related complex resources that may be distributed over several locations. In these cases, permanent linkage is a prerequisite. In a multimedia lexicon for instance, a lexical item can refer to images not necessarily physically in the lexicon, or that are even referenced at a different site under control of a different organization. However, the link between the lexicon item and the image must remain valid, even if some servers or files are subject to relocation over time.

Emerging e-Science scenarios, which make use of distributed services processing distributed resources, are also completely dependent on having transparent access from any processing service, irrespective of where it is located or what organization may operate it. This implies that resolving resource references should not be hampered in any way by unnecessary dependencies involving reliance on unsustainable or unpredictable services, whether they are technical or organizational.

The requirement that services like PID frameworks be accessible to the whole community of language resource and technology providers is further complicated by the need to provide resolvable PIDs without imposing commercial dependencies on resource providers other than the fundamental and well-established requirements for maintaining resources on the Internet.

Language resource management - Persistent identification and sustainable access (PISA)

1 Scope

This International Standard specifies
requirements for the persistent identifier (PID) framework and for using PIDs as references and citations of language resources in documents as well as in language resources themselves. In this context, examples of language resources include such works as digital dictionaries, language-purposed terminological resources, machine-translation lexica, annotated multimedia/multimodal corpora, text corpora that have been annotated with, for example, morpho-syntactic information, and the like. Computational and applied linguists and information specialists create such resources.

This International Standard also addresses issues of persistence and granularity of references to resources, first by requiring that persistent references be implemented by using a PID framework and further by imposing requirements on any PID frameworks used for this purpose. PID frameworks also allow the association of general metadata with the identifier, which can also contain citation information.

This International Standard specifies minimum requirements for effective use of PIDs in language resources and cites the use of several possible existing standards and de-facto standards, such as: ISO 690 [16], APA [3], MLA [9] for citation information; ISO/IEC 21000-17, IETF RFC 5147, Annotea [2], temporal-fragment [22], XPointer for part identifier syntax; and PURL [23], ARK [18], Handle System [24] and DOI [14].

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 12620:2009, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

ISO/IEC 21000-17:2006, Information technology - Multimedia framework (MPEG-21) - Part 17: Fragment Identification of MPEG Resources

W3C 2003, XPointer Framework: [online] W3C Recommendation 25 March 2003 [viewed 2010-08-04]. Available from: http://www.w3.org/TR/xptr-framework/

WILDE, E. and DUERST, M. URI Fragment Identifiers for the text/plain Media Type, IETF RFC 5147, April 2008 [viewed 2010-12-22]. Available from: http://www.rfc-editor.org/rfc/rfc5147.txt

3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

3.1 Resources

3.1.1
resource
digital object on the web with a specific identity that can be addressed with a URI (3.2.2)
NOTE 1 Adapted from IETF RFC 3986.
NOTE 2 In the context of this International Standard, a resource can also be a language resource that has an online representation.
NOTE 3 A resource can have several representations. Depending on the PID framework (3.2.5), identification of a specific representation can be encoded in the identifier (ARK, see B.3) or be left to the content negotiating process [8] between the web client (3.3.8) that uses the resolved PID to fetch the resource (3.1.1) and the resource server (3.3.6).

3.1.2
language resource
digital resource that provides information about one or more languages
NOTE Language resources cover lexicographical, terminological, morpho-syntactical, corpus-related, or semantic resources or digital resources used to study linguistic phenomena like texts and multimedia/multimodal recordings. They are created and used by linguists, information specialists, lexicographers and terminologists, among others. They frequently comprise many small records compiled within a larger work, and are often authoritative in nature, such as standardized terminologies and glossaries issued by standards bodies such as ISO, IETF, W3C, etc.

3.1.3
complex resource
resource (3.1.1) consisting of multiple constituent parts, each of which can be accessed individually
NOTE A complex resource can be a federated resource if its constituent parts are distributed over different repositories (3.1.6).

3.1.4
collection
grouping of any number of resources (3.1.1) that need to be referenced as a whole

3.1.5
published collection
purposefully built collection of resources that is maintained as an independent entity by an archive (3.1.7) or repository (3.1.6) and for which adequate citation (3.1.16) information is available

3.1.6
digital repository
repository
facility that provides reliable access to managed digital resources (3.1.1)

3.1.7
archive
digital archive
repository (3.1.6) dedicated to the long-term preservation of its associated data
NOTE Often the data in digital archives are also available online, which highlights the need for reliable persistent identifiers (3.2.4).

3.1.8
resource collection incarnation
incarnation
virtual embodiment of a disparate, otherwise non-aggregated collection (3.1.4) assembled for a specific purpose that is referenced by a single PID (3.2.4) concatenated with a part identifier (3.2.7) in order to access the components of the collection
NOTE A bibliography or index can use a single PID together with extensions to provide access to components in a set of resources (3.1.1) used in the production of a monograph or project without actually collecting the physical files in one location, which is to say that the individual items remain in their original locations, but are referenced as parts of a virtual whole.

3.1.9
version
particular form or variation of a resource (3.1.1) that differs from other instantiations of the resource in at least one aspect or item of information
NOTE Versions are often identified in sequential order (e.g. Version 1, 2, etc.), but version identification of dynamic resources subject to frequent change is often achieved by assigning a date-time stamp.

3.1.10
snapshot
instantaneous copy of a resource (3.1.1) representing the status of the resource or collection at a single point in time

3.1.11
abstract resource
non-network-retrievable resource identified by a URI (3.2.2), usually a concept such as a class or property
NOTE It is practice, for example in RDFS (RDF Schema) or OWL (web ontology language) ontologies, to identify abstract resources using URIs. Web arch
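
EXAMPLE The Introduction describes resolver services that translate a resource identifier to its actual current location, and 3.1.8 describes a PID concatenated with a part identifier. The following Python sketch illustrates that idea only; it is not taken from ISO 24619 or from any real PID framework. The class and method names, the "hdl:0000/..." identifier, the example.org URLs and the choice of "#" as the separator between PID and part identifier are all assumptions made for illustration. The part identifier "line=10,20" follows the text/plain fragment syntax of IETF RFC 5147.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resolution:
    location: str               # current URL of the resource
    part: Optional[str] = None  # part identifier, passed through unchanged


class PidResolver:
    """Toy in-memory resolver (hypothetical): maps a PID to a current location."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, pid: str, location: str) -> None:
        # Record where the resource identified by `pid` currently lives.
        self._table[pid] = location

    def relocate(self, pid: str, new_location: str) -> None:
        # The PID stays the same; only the stored location changes.
        if pid not in self._table:
            raise KeyError(pid)
        self._table[pid] = new_location

    def resolve(self, reference: str) -> Resolution:
        # Assumption: "#" separates the PID from an optional part identifier.
        pid, sep, part = reference.partition("#")
        return Resolution(self._table[pid], part if sep else None)

    def url(self, reference: str) -> str:
        # Re-attach the part identifier to the resolved location.
        res = self.resolve(reference)
        return res.location if res.part is None else f"{res.location}#{res.part}"


resolver = PidResolver()
resolver.register("hdl:0000/example-corpus", "http://old.example.org/corpus.txt")
# The file moves; citations that use the PID do not need to change.
resolver.relocate("hdl:0000/example-corpus", "http://new.example.org/data/corpus.txt")
print(resolver.url("hdl:0000/example-corpus#line=10,20"))
# prints "http://new.example.org/data/corpus.txt#line=10,20"
```

In this sketch the reference stored in a citing document never changes when the file moves; only the resolver table is updated. A real framework would add persistence, access control and metadata handling, none of which is modelled here.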
