Reference number ISO/IEC 13249-2:2000(E)
© ISO/IEC 2000

INTERNATIONAL STANDARD ISO/IEC 13249-2
First edition 2000-09-01

Information technology - Database languages - SQL multimedia and application packages - Part 2: Full-Text

Technologies de l'information - Langages de bases de données - Multimédia SQL et paquetages d'application - Partie 2: Texte complet

Adopted by INCITS (InterNational Committee for Information Technology Standards) as an American National Standard.
Date of ANSI Approval: 4/4/01
Published by American National Standards Institute, 25 West 43rd Street, New York, New York 10036

Copyright 2002 by Information Technology Industry Council (ITI). All rights reserved.

These materials are subject to copyright claims of International Standardization Organization (ISO), International Electrotechnical Commission (IEC), American National Standards Institute (ANSI), and Information Technology Industry Council (ITI). Not for resale. No part of this publication may be reproduced in any form, including an electronic retrieval system, without the prior written permission of ITI. All requests pertaining to this standard should be submitted to ITI, 1250 Eye Street NW, Washington, DC 20005.

Printed in the United States of America

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO/IEC 2000
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland

Contents

FOREWORD
INTRODUCTION
1 SCOPE
2 NORMATIVE REFERENCES
  2.1 International standards
  2.2 Publicly available standards
3 DEFINITIONS, NOTATIONS, AND CONVENTIONS
  3.1 Definitions
    3.1.1 Definitions provided in Part 1
    3.1.2 Definitions provided in Part 2
    3.1.3 Definitions taken from ISO/IEC 9075
    3.1.4 Definitions taken from ANSI/NISO Z39.19
  3.2 Notations
  3.3 Conventions
4 CONCEPTS
  4.1 Text model
  4.2 Text identification facilities
    4.2.1 Single word patterns (patterns of the form )
    4.2.2 Single phrase patterns (patterns of the form )
    4.2.3 Patterns representing sets of single words
    4.2.4 Patterns formed by sets of single phrases
    4.2.5 Patterns specifying context conditions
    4.2.6 Patterns involving Boolean operators
    4.2.7 Identification of FullText values which are pertinent to a given text
  4.3 Text ranking facilities
  4.4 Language aspects
    4.4.1 Multilingual texts and patterns
    4.4.2 Treatment of stop words
  4.5 Word normalization
  4.6 Types and routines provided by this part of ISO/IEC 13249
    4.6.1 Types and routines intended for public use
    4.6.2 Types and routines for definition
    4.6.3 Technique for defining the semantics of Category 1 Contains methods
5 FULL-TEXT TYPES
  5.1 FullText Type and Routines
    5.1.1 FullText Type
    5.1.2 Contains Methods
    5.1.3 Rank Methods
    5.1.4 Tokenize Method
    5.1.5 TokenizePosition Method
    5.1.6 Segmentize Method
    5.1.7 TokenizeAndStem Method
    5.1.8 TokenizePositionAndStem Method
    5.1.9 FullText Methods
    5.1.10 FullText_to_Character Function
    5.1.11 StrctPattern_to_FT_Pattern Function
  5.2 FT_TokenPosition Type and Routines
    5.2.1 FT_TokenPosition Type
  5.3 FT_Pattern Type and Routines
    5.3.1 FT_Pattern Type
    5.3.2 FT_Pattern Key Words
6 STRUCTURED SEARCH PATTERN TYPES
  6.1 FT_Any Type and Routines
    6.1.1 FT_Any Type
    6.1.2 Contains Method
    6.1.3 FT_Any Method
  6.2 FT_Primary Type and Routines
    6.2.1 FT_Primary Type
    6.2.2 Contains Method
    6.2.3 StrctPattern_to_FT_Pattern Method
  6.3 FT_WordOrPhrase Type and Routines
    6.3.1 FT_WordOrPhrase Type
    6.3.2 Contains Method
    6.3.3 StrctPattern_to_FT_Pattern Method
    6.3.4 getWordArray Method
  6.4 FT_TextLiteral Type and Routines
    6.4.1 FT_TextLiteral Type
    6.4.2 Contains Method
    6.4.3 StrctPattern_to_FT_Pattern Method
    6.4.4 matches Method
    6.4.5 Tokenize Method
    6.4.6 getWordArray Method
    6.4.7 FT_TextLiteral Methods
    6.4.8 EliminateDQS Function
    6.4.9 InsertDQS Function
  6.5 FT_StemmedWord Type and Routines
    6.5.1 FT_StemmedWord Type
    6.5.2 Contains Method
    6.5.3 StrctPattern_to_FT_Pattern Method
    6.5.4 TokenizeAndStem Method
    6.5.5 FT_StemmedWord Methods
  6.6 FT_Phrase Type and Routines
    6.6.1 FT_Phrase Type
    6.6.2 Contains Method
    6.6.3 StrctPattern_to_FT_Pattern Method
    6.6.4 getWordArray Method
    6.6.5 TokenizePosition Method
    6.6.6 FT_Phrase Methods
    6.6.7 matches Function
    6.6.8 prune Function
  6.7 FT_StemmedPhrase Type and Routines
    6.7.1 FT_StemmedPhrase Type
    6.7.2 Contains Method
    6.7.3 StrctPattern_to_FT_Pattern Method
    6.7.4 TokenizePositionAndStem Method
    6.7.5 FT_StemmedPhrase Methods
  6.8 FT_Proxi Type and Routines
    6.8.1 FT_Proxi Type
    6.8.2 Contains Method
    6.8.3 StrctPattern_to_FT_Pattern Method
    6.8.4 FT_Proxi Method
  6.9 FT_Soundex Type and Routines
    6.9.1 FT_Soundex Type
    6.9.2 Contains Method
    6.9.3 StrctPattern_to_FT_Pattern Method
    6.9.4 FT_Soundex Method
    6.9.5 GetSoundsSimilar Function
  6.10 FT_BroaderTerm Type and Routines
    6.10.1 FT_BroaderTerm Type
    6.10.2 Contains Method
    6.10.3 StrctPattern_to_FT_Pattern Method
    6.10.4 FT_BroaderTerm Method
    6.10.5 GetBroaderTerms Function
  6.11 FT_NarrowerTerm Type and Routines
    6.11.1 FT_NarrowerTerm Type
    6.11.2 Contains Method
    6.11.3 StrctPattern_to_FT_Pattern Method
    6.11.4 FT_NarrowerTerm Method
    6.11.5 GetNarrowerTerms Function
  6.12 FT_Synonym Type and Routines
    6.12.1 FT_Synonym Type
    6.12.2 Contains Method
    6.12.3 StrctPattern_to_FT_Pattern Method
    6.12.4 FT_Synonym Method
    6.12.5 GetSynonymTerms Function
  6.13 FT_PreferredTerm Type and Routines
    6.13.1 FT_PreferredTerm Type
    6.13.2 Contains Method
    6.13.3 StrctPattern_to_FT_Pattern Method
    6.13.4 FT_PreferredTerm Method
    6.13.5 GetPreferredTerms Function
  6.14 FT_RelatedTerm Type and Routines
    6.14.1 FT_RelatedTerm Type
    6.14.2 Contains Method
    6.14.3 StrctPattern_to_FT_Pattern Method
    6.14.4 FT_RelatedTerm Method
    6.14.5 GetRelatedTerms Function
  6.15 FT_TopTerm Type and Routines
    6.15.1 FT_TopTerm Type
    6.15.2 Contains Method
    6.15.3 StrctPattern_to_FT_Pattern Method
    6.15.4 FT_TopTerm Method
    6.15.5 GetTopTerms Function
  6.16 FT_IsAbout Type and Routines
    6.16.1 FT_IsAbout Type
    6.16.2 Contains Method
    6.16.3 StrctPattern_to_FT_Pattern Method
    6.16.4 FT_IsAbout Method
  6.17 FT_Context Type and Routines
    6.17.1 FT_Context Type
    6.17.2 Contains Method
    6.17.3 StrctPattern_to_FT_Pattern Method
    6.17.4 FT_Context Method
  6.18 FT_ParExpr Type and Routines
    6.18.1 FT_ParExpr Type
    6.18.2 Contains Method
    6.18.3 StrctPattern_to_FT_Pattern Method
    6.18.4 FT_ParExpr Method
  6.19 FT_Term Type and Routines
    6.19.1 FT_Term Type
    6.19.2 Contains Method
    6.19.3 StrctPattern_to_FT_Pattern Method
    6.19.4 FT_Term Method
  6.20 FT_Expr Type and Routines
    6.20.1 FT_Expr Type
    6.20.2 Contains Method
    6.20.3 StrctPattern_to_FT_Pattern Method
    6.20.4 FT_Expr Method
  6.21 FT_PhraseList Type and Routines
    6.21.1 FT_PhraseList Type
    6.21.2 Contains Method
    6.21.3 StrctPattern_to_FT_Pattern Method
    6.21.4 FT_PhraseList Method
7 FULLTEXT_TOKEN TYPE
  7.1 FullText_Token Type
8 SQL/MM FULL-TEXT THESAURUS SCHEMA
  8.1 Introduction
  8.2 FT_THESAURUS Schema
  8.3 TERM_DICTIONARY base table
  8.4 TERM_HIERARCHY base table
  8.5 TERM_SYNONYM base table
  8.6 TERM_RELATED base table
9 SQL/MM FULL-TEXT INFORMATION SCHEMA
  9.1 Introduction
  9.2 FT_FEATURES view
  9.3 FT_Schemata view
10 SQL/MM FULL-TEXT DEFINITION SCHEMA
  10.1 Introduction
  10.2 FT_FEATURES base table
  10.3 FT_SCHEMATA base table
11 STATUS CODES
12 CONFORMANCE
  12.1 Requirements for conformance
  12.2 Claims of conformance
ANNEX A
ANNEX B
INDEX

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC
participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.

In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

Attention is drawn to the possibility that some elements of this part of ISO/IEC 13249 may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

International Standard ISO/IEC 13249-2 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 32, Data management and interchange.

ISO/IEC 13249 consists of the following parts, under the general title Information technology - Database languages - SQL multimedia and application packages:
- Part 1: Framework
- Part 2: Full-Text
- Part 3: Spatial
- Part 5: Still Image

Annexes A and B of this part of ISO/IEC 13249 are for information only.

Introduction

The purpose of this International Standard is to define multimedia and application specific types and their associated routines using the user-defined features in ISO/IEC 9075.

This document is based on the content of ISO/IEC International Standard Database Language (SQL).

The organization of this part of ISO/IEC 13249 is as follows:
1) Clause 1, “Scope”, specifies the scope of this part of ISO/IEC 13249.
2) Clause 2, “Normative references”, identifies additional standards that, through reference in this part of ISO/IEC 13249, constitute provisions of this part of ISO/IEC 13249.
3) Clause 3, “Definitions, notations, and conventions”, defines the notations and conventions used in this part of ISO/IEC 13249.
4) Clause 4, “Concepts”, presents concepts used in the definition of this part of ISO/IEC 13249.
5) Clause 5, “Full-Text Types”, defines the full-text user-defined types and associated routines.
6) Clause 6, “Structured Search Pattern Types”, defines a family of user-defined types to provide for the construction of structured search patterns.
7) Clause 7, “FullText_Token Type and Routines”, defines the user-defined FullText_Token type.
8) Clause 8, “SQL/MM Full-Text Thesaurus Schema”, defines the SQL/MM Full-Text thesaurus schema used to define the thesaurus related routines.
9) Clause 9, “SQL/MM Full-Text Information Schema”, defines the SQL/MM Full-Text Information Schema.
10) Clause 10, “SQL/MM Full-Text Definition Schema”, defines the SQL/MM Full-Text Definition Schema.
11) Clause 11, “Status Codes”, defines the SQLSTATE codes used in this part of ISO/IEC 13249.
12) Clause 12, “Conformance”, defines the criteria for conformance to this part of ISO/IEC 13249.
13) Annex A, “Implementation-defined elements”, is an informative Annex. It lists those features for which the body of this part of ISO/IEC 13249 states that the syntax or meaning or effect on the database is partly or wholly implementation-defined, and describes the defining information that an implementor shall provide in each case.
14) Annex B, “Implementation-dependent elements”, is an informative Annex. It lists those features for which the body of this part of ISO/IEC 13249 states explicitly that the syntax or meaning or effect on the database is implementation-dependent.

In the text of this part of ISO/IEC 13249, Clauses begin a new odd-numbered page, and in Clause 5, “Full-Text Types”, through Clause 12, “Conformance”, Subclauses begin a new page. Any resulting blank space is not significant.

1 Scope

This part of ISO SQL/MM:
a) introduces the Full-Text part of ISO/IEC 13249,
b) gives the references necessary for this part of ISO/IEC 13249,
c) defines notations and conventions specific to this part of ISO/IEC 13249,
d) defines concepts specific to this part of ISO/IEC 13249,
e) defines the full-text user-defined types and their associated routines.

2 Normative references

The following normative documents contain provisions which, through reference in this text, constitute provisions of this part of ISO/IEC 13249. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this part of ISO/IEC 13249 are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. For undated references, the latest edition of the normative document referred to applies. Members of ISO and IEC maintain registers of currently valid International Standards.

2.1 International standards

ISO/IEC 9075-1:1999