Reference number ISO 24610-1:2006(E)
© ISO 2006

INTERNATIONAL STANDARD ISO 24610-1
First edition 2006-04-15

Language resource management – Feature structures – Part 1: Feature structure representation

Gestion des ressources linguistiques – Structures de traits – Partie 1: Représentation de structures de traits
© ISO 2006. All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56
CH-1211 Geneva 20
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org

Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 General characteristics of feature structure
4.1 Overview
4.2 Use of feature structures
4.3 Basic concepts
4.4 Notations
4.4.1 Overview
4.4.2 Graph notation
4.4.3 Matrix notation
4.4.4 XML-based notation
4.5 Structure sharing
4.6 Collections as complex feature values
4.6.1 Overview
4.6.2 Lists as feature values
4.6.3 Sets as feature values
4.6.4 Multisets as feature values
4.7 Typed feature structure
4.7.1 Overview
4.7.2 Types
4.7.3 Notations
4.8 Subsumption: relation on feature structures
4.8.1 Overview
4.8.2 Definition
4.8.3 Condition A on path values
4.8.4 Condition B on structure sharing
4.8.5 Condition C on type ordering
4.9 Operations on feature structures and feature values
4.9.1 Overview
4.9.2 Compatibility
4.9.3 Unification
4.9.4 Unification of shared structures
4.10 Operations on feature values and types
4.10.1 Concatenation and union operations
4.10.2 Alternation
4.10.3 Negation
4.11 Informal semantics of feature structures
5 XML Representation of feature structures
5.1 Overview
5.2 Organization
5.3 Elementary feature structures and the binary feature value
5.4 Other atomic feature values
5.5 Feature and feature-value libraries
5.6 Feature structures as complex feature values
5.7 Re-entrant feature structures
5.8 Collections as complex feature values
5.9 Feature value expressions
5.9.1 Overview
5.9.2 Alternation
5.9.3 Negation
5.9.4 Collection of values
5.10 Default values
5.11 Linking text and analysis
Annex A (informative) Formal definitions and implementation of the XML representation of feature structures
A.1 Overview
A.2 RELAX NG specification for the module
Annex B (informative) Examples for illustration
Annex C (informative) Type inheritance hierarchies
C.1 Overview
C.2 Definition
C.3 Multiple inheritance
C.4 Type constraints
Annex D (informative) Denotational semantics of feature structure
D.1 Feature structure signatures
D.2 Feature structure algebra
D.3 FS domains
D.4 Feature structure interpretations
D.5 Satisfiability
D.6 Subsumption
D.7 Unification
Annex E (informative) Use of feature structures in applications
E.1 Overview
E.2 Phonological representation
E.3 Grammar formalisms or theories
E.4 Computational implementations
Bibliography

Foreword

ISO (the International Organization for
Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the
right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 24610-1 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

ISO 24610 consists of the following parts, under the general title Language resource management – Feature structures:

Part 1: Feature structure representation

The following part is under preparation:

Part 2: Feature system declaration

Introduction

This part of ISO 24610 results from an agreement between the Text Encoding Initiative Consortium (TEI) and ISO/TC 37/SC 4 that a joint activity should take place to revise the two existing chapters on feature structures and feature system declaration in the TEI Guidelines, version P4. ISO 24610 is planned to have the following two parts.

Part 1, Feature structure representation, describes feature structures and their representation. It provides an informal but explicit overview of their basic characteristics and formal semantics. In addition, Part 1 defines a standard XML (Extensible Markup Language) vocabulary for the representation of untyped feature structures, feature values, and feature libraries. It thus provides a
reference format for the exchange of feature structure representations between different application systems.

Part 2, Feature system declaration, discusses ways of validating typed feature structures that conform to Part 1 and of enforcing application-specific constraints. It proposes an XML vocabulary for representing such constraints with reference to a set of features and the range of values appropriate for each, and thus facilitates the representation and validation of a type hierarchy as well as other well-formedness conditions for particular applications, in particular those related to the goal of language resource management.

1 Scope

Feature structures are an essential part of many linguistic formalisms as
well as an underlying mechanism for representing the information consumed or produced by and for language engineering applications. This part of ISO 24610 provides a format for the representation, storage and exchange of feature structures in natural language applications concerned with the annotation, production or analysis of linguistic data. It also defines a computer format for the description of constraints that bear on a set of features, feature values, feature specifications and operations on feature structures, thus offering a means of checking the conformance of each feature structure with regard to a reference specification.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8879, Information processing – Text and office systems – Standard Generalized Markup Language (SGML), as extended by TC 2 (ISO/IEC JTC 1/SC 34 N029: 1998-12-06)

ISO 19757-2, Information technology – Document Schema Definition Language (DSDL) – Part 2: Regular-grammar-based validation – RELAX NG

NOTE The first reference permits the use of XML, and the second, RELAX NG, provides a specification for XML modules. RELAX NG (REgular LAnguage for XML Next Generation) is a schema language for XML that simplifies and extends the features of DTDs (Document Type Definitions).

3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 8879 and ISO 19757-2 and the following apply. This list is provided to clarify the terminology relating to feature structures used throughout this part of ISO 24610. Terminology derived from XML and other formal languages is not defined here.

3.1 alternation
operation on feature values (3.23) that returns one and only one of the values supplied as its arguments
NOTE Given a feature specification F : a | b, where a | b denotes the alternation of a and b, F has either the value a or the value b, but not both.

3.2 atomic value
value (3.23) without internal structure, i.e. a value other than a feature structure (3.10) or a collection (3.4)

3.3 boxed label
label in a box, used in matrix notation to denote a value shared by several features (3.8)
NOTE The label may be any alphanumeric symbol.

3.4 collection
list, set, or multiset of values (3.23)
NOTE A list is an ordered collection of entities, some of which may be identical. A set is an unordered collection of unique entities. A multiset is an unordered collection of entities that may or may not be unique; it
is sometimes referred to as a bag.

3.5 complex value
value (3.23) represented either as a feature structure (3.10) or as a collection (3.4)

3.6 concatenation
operation of combining two lists of values (3.23) into a single list

3.7 empty feature structure
feature structure (3.10) containing no feature
specifications (3.9)

3.8 feature
property of an entity
NOTE The combination of a feature and its value constitutes a feature specification (3.9). For example, number is a feature, singular is a value, and the pair (number, singular) is a feature specification.

3.9 feature specification
assignment of a value (3.23) to a feature (3.8)
NOTE Formally, it is treated as a pair consisting of a feature and its value.

3.10 feature structure
set of feature specifications (3.9)
NOTE The minimal feature structure is the empty feature structure (3.7).

3.11 graph notation
notation of a feature structure (3.10) as a single rooted graph

3.12 incompatibility
relation between two feature structures (3.10) that have conflicting types (3.19) or at least one common feature (3.8) with incompatible values (3.23)
NOTE Two incompatible feature structures cannot be unified. The empty feature structure (3.7) is compatible with every other feature structure.

3.13 matrix notation, attribute-value matrix, AVM
notation that uses square brackets to represent feature structures (3.10)
NOTE In matrix notation, each row represents a feature specification (3.9), with the feature name and the feature value separated by a colon (:), a space, or an equals sign (=).

3.14 merge
generic operation that includes union (3.22) of sets or multisets and concatenation (3.6) of lists

3.15 negation
(unary) operation on a value (3.23) denoting any other value incompatible with it
NOTE In this part of ISO 24610,
negation applies to values only and is not understood as a truth function as in ordinary bivalent logic.

3.16 path
sequence of labelled arcs connecting nodes in a graph

3.17 structure sharing, re-entrancy
relation between two or more features (3.8) within a feature structure (3.10) that share a value
(3.23)

3.18 subsumption
relationship between two feature structures (3.10) in which one is more specific than the other
NOTE A feature structure A is said to subsume a feature structure B if B is at least as informative as A, i.e. A is at least as general as B. Subsumption is a reflexive, antisymmetric, and transitive relation between feature structures.

3.19 type
name of a class of entities
NOTE Feature structures (3.10) may be characterized by grouping them into classes. Types are used to name such classes.

3.20 typed feature structure
feature structure (3.10) labelled by a type (3.19)
NOTE In graph notation (3.11), each node is labelled with a type. In matrix notation (3.13), the type is ordinarily placed in the upper left corner, just inside the square brackets that enclose a typed feature structure. In XML notation, the type is supplied as the value (3.23) of a type attribute on the element that represents the feature structure.

3.21 unification
operation that combines two compatible feature structures (3.10) into the least informative feature structure that contains all of the information in both

3.22 union
operation that combines two sets, or two multisets, into one
NOTE The corresponding operation for lists is concatenation
(3.6).

3.23 value
information about an entity
NOTE There are two kinds of feature values: atomic values (3.2) and complex values (3.5).

4 General characteristics of feature structure

4.1 Overview

A feature structure is a general-purpose data structure
that identifies and groups together individual features by assigning a particular value to each. Because of their generality, feature structures can be used to represent many different kinds of information. The interrelations among various pieces of information, and their instantiation in markup, provide a metalanguage for representing the analysis and interpretation of linguistic content. Moreover, this instantiation allows a set of features, with values of specific types and restrictions, to be specified by means of feature system declarations or other XML mechanisms discussed in ISO 24610-2 1).

4.2 Use of feature structures

Feature structures provide partial information about an object by specifying values for some or all of its features. For example, if a female employee named Sandy Jon