BRITISH STANDARD
BS ISO 12199:2000

Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet

ICS 01.020

This British Standard, having been prepared under the direction of the Management Systems Sector Policy and Strategy Committee, was published under the authority of the Standards Policy and Strategy Committee on 4 October 2002.
© BSI 4 October 2002
ISBN 0 580 40500 1

National foreword
This British Standard reproduces verbatim ISO 12199:2000 and implements it as the UK national standard.
The UK participation in its preparation was entrusted to Technical Committee TS/1, Terminology, which has the responsibility to:
- aid enquirers to understand the text;
- present to the responsible international/European committee any enquiries on the interpretation, or proposals for change, and keep the UK interests informed;
- monitor related international and European developments and promulgate them in the UK.
A list of organizations represented on this committee can be obtained on request to its secretary.

Cross-references
The British Standards which implement international publications referred to in this document may be found in the BSI Catalogue under the section entitled "International Standards Correspondence Index", or by using the "Search" facility of the BSI Electronic Catalogue or of British Standards Online.
This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.
Compliance with a British Standard does not of itself confer immunity from legal obligations.

Summary of pages
This document comprises a front cover, an inside front cover, the ISO title page, pages ii to v, a blank page, pages 1 to 40, an inside back
cover and a back cover.
The BSI copyright date displayed in this document indicates when the document was last issued.

Amendments issued since publication
Amd. No.    Date    Comments

Reference number ISO 12199:2000(E)

INTERNATIONAL STANDARD ISO 12199
First edition 2000-08-01

Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet

Mise en ordre alphabétique des données lexicographiques et terminologiques multilingues représentées dans l'alphabet latin

PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO 2000
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office, Case postale 56, CH-1211
Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Preparatory procedures
5 First ordering level
5.1 First-ordering-level values
5.2 First-ordering-level sequence
5.3 Equivalence between special Latin letters and basic letters
6 Second ordering level
6.1 Second-ordering-level values
6.2 Special Latin letters and letters with diacritical marks
7 Third ordering level
7.1 Third-ordering-level values
7.2 Ordering according to capitalization
8 Fourth ordering level
8.1 Fourth-ordering-level values
8.2 Ordering according to special characters
Annex A (normative) Word-by-word ordering
Annex B (informative) Special rules for lexicographical and terminological ordering
Annex C (informative) Ordering rules for chemical names
Annex D (informative) Character repertoire of the Latin alphabet
Annex E (informative) Languages using the Latin alphabet
Annex F (informative) Alphabetical sequences and character repertoires
Annex G (normative) Formal description of the rules of the main body of this International Standard
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in
accordance with the rules given in the ISO/IEC Directives, Part 3.
Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this International Standard may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.
International Standard ISO 12199 was prepared by Technical Committee ISO/TC 37, Terminology (
principles and coordination), Subcommittee SC 2, Layout of vocabularies.
It complements other International Standards prepared by ISO/TC 37, such as ISO 10241:1992, International terminology standards - Preparation and layout and ISO 12200:1999, Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange.
Annexes A and G form a normative part of this International Standard. Annexes B to F are for information only.

Introduction
In the development of international terminologies, both in printed form and in databases, it is essential to have uniform and internationally recognized rules for the alphabetical ordering of terminological and lexicographical data, to make these terminologies more easily accessible for the users. In addition, it will facilitate the interchange of terminological and lexicographical data.

1 Scope
This International Standard specifies the sequence of characters to be used in the alphabetical ordering of multilingual terminological and lexicographical data (terms, term elements, or words) represented in the Latin alphabet.
Character sets of languages represented in the Latin alphabet are taken into account insofar as terminological or lexicographical data have been recorded. Character sets used in internationally standardized transliteration into Latin script are also taken into account.
The sequence of alphabetical characters given is intended for multilingual purposes only and is not intended to affect the alphabetical order of any specific language.
The main part of this International Standard specifies letter-by-letter ordering of character strings. Normative annex A treats word-by-word ordering, which is a widely used alternative to this system.
Informative annex B gives two additional rules that may be useful for lexicographical and terminological ordering.
Informative annex C gives ordering rules for chemical names.
Informative annex D lists the character repertoire of the Latin alphabet.
Informative annex E lists languages using the Latin alphabet.
Informative annex F gives alphabetical sequences derived from the sequence specified in this International Standard for a number of languages that use the Latin alphabet.
Normative annex G gives a formal description of the rules laid down in the main part of this International Standard conforming with ISO/IEC 14651.

2 Normative references
The following normative documents contain provisions which, through reference in this text, constitute provisions of this International Standard. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. For undated references, the latest edition of the normative document referred to applies. Members of ISO and IEC maintain registers of currently valid International Standards.
ISO 1087:1990, Terminology - Vocabulary.
ISO 1087-1: 1), Terminology work - Vocabulary - Part 1: Theory and application.
ISO 1087-2:2000, Terminology work - Vocabulary - Part 2: Computer applications.
ISO/IEC 10646-1:1993, Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane.
ISO/IEC 14651: 1), Information technology - International string ordering - Method for comparing character strings and description of a default tailorable ordering.
1) To be published.

3 Terms and definitions
For definitions of terminological concepts, see ISO 1087, ISO 1087-1 and ISO 1087-2.
For the purpose of this International Standard, the following terms and definitions apply.

3.1 character
member of a set of elements used for the organization, control or representation of data

3.2 letter
character used for writing natural language, often representing a sound in the language

3.3 digit
character used to represent the numeric value, or part thereof, of a number

3.4 special character
character that is not a letter nor a digit
EXAMPLE The space character is a special character.

3.5 ligature
character resulting from the joining of two or more letters
NOTE The resulting character is, in some cases, considered a separate letter.

3.6 polygraph
two or more consecutive letters that are regarded as one letter for some purpose
NOTE A polygraph consisting of two or three letters may be referred to as a digraph or a trigraph respectively.

3.7 diacritical mark
character that is not a letter and is placed over, under, or through a letter or a combination of letters

3.8 ordering
act of bringing strings of characters into a well-defined sequence according to a string comparison specification
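
The definitions in 3.2 to 3.4 and 3.7 can be approximated in software with Unicode general categories. The sketch below is illustrative only: the function name `classify` and the mapping from general categories to the defined terms are assumptions and are not part of this International Standard. Under 3.4 a diacritical mark, being neither a letter nor a digit, is also a special character; the sketch reports it separately for convenience, and it recognizes combining marks only, not spacing accents.

```python
import unicodedata


def classify(ch: str) -> str:
    """Classify a single character along the lines of 3.2 to 3.4 and 3.7.

    Assumption: Unicode general categories stand in for the definitions.
    'L*' is taken as letter, 'Nd' as digit, 'M*' (combining marks) as
    diacritical mark, and everything else as special character.
    """
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "letter"
    if category == "Nd":
        return "digit"
    if category.startswith("M"):
        return "diacritical mark"
    return "special character"


if __name__ == "__main__":
    # U+0301 is COMBINING ACUTE ACCENT; the space is the example given in 3.4.
    for sample in ["a", "7", " ", "\u0301", "\u00e6"]:
        print(repr(sample), classify(sample))
```

With this mapping, `classify("æ")` reports a letter, which matches the note to 3.5 that a ligature is in some cases considered a separate letter.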

4 Preparatory procedures
In the process of alphabetical ordering, character strings are compared according to a set of rules. This International Standard specifies the set of rules to be used for the ordering, but does not address the means of selection of relevant character strings, nor any modification of the strings that may be needed for a given purpose. Consequently, certain preparatory procedures may be needed before applying the ordering rules.
Depending on the needs in each individual case:
- the relevant character strings may have to be selected, e.g. relevant terms may have to be extracted from a corpus;
- the character strings may have to be modified, e.g. sentence-initial uppercase letters may have to be changed to lowercase letters, plural form of words may have to be changed to singular form, or leading zeroes or spaces may be added, e.g. in lists containing numerals.
Polygraphs
are treated as sequences of separate letters.
An application may arrange information into several ordering fields, and determine ranking order with several separate and independent comparisons. This International Standard only defines a single comparison for one such field, where the field is a character-string field. Only the characters that appear in the string and their arrangement are taken into account. Apart from the ordering rules and passes, no other knowledge about the words in the character string is used. For example,
dictionary information or rules about language syntax, phonetics and semantics are not used.

5 First ordering level

5.1 First-ordering-level values
When comparing strings to be ordered, the first-ordering-level values of the strings shall be considered first. The subsequent ordering-level values need to be considered only if two or more strings have identical first-ordering-level values.
For multilingual ordering, the following rules shall be applied (see annex A for word-by-word ordering):

5.2 First-ordering-level sequence
Digits and letters have the following ordering values:
a) Digits: 0123456789
NOTE 1 Sequences of digits will be ordered from left to right as written, thus generating the following order, e.g.: 1 10 100 11 110 111 12 19 190 2 21 3.
NOTE 2 Leading zeroes may be inserted as a preparatory procedure, e.g. to generate the following order: 0001 0002 0003 0010 0011 0012
0019 0021 0100 0110 0111 0190.
b) Basic letters of the Latin alphabet: aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV wW xX yY zZ
NOTE 1 This order has been established for use in multilingual environments so as to conflict with as few individual languages as possible. See informative annex F for examples of deviations from this sequence in some languages.
Uppercase and lowercase letters shall be treated as equivalent (see clause 7).
Letters of the Latin alphabet with diacritical marks shall be treated as equivalent to the corresponding basic Latin letters (see clause 6).
Special letters of the Latin alphabet shall be treated as equivalent to basic Latin letters according to Table 1 in 5.3 (see clause 6).
The Turkish language distinguishes ı/I from i/İ, while other languages have the pair i/I only. To order multilingual data including Turkish text, the i/I pair shall be expanded as follows:
1: ı/I U0131/U0049 LATIN LETTER DOTLESS I (Turkish)
2: i/I U0069/U0049 LATIN L