BRITISH STANDARD BS ISO 10754:1996

Information and documentation - Extension of the Cyrillic alphabet coded character set for non-Slavic languages for bibliographic information interchange

ICS 35.040

This British Standard, having been prepared under the direction of the DISC Board, was published under the authority of the Standards Board and comes into effect on 15 October 1997.

BSI 04-2000
ISBN 0 580 28720 3

National foreword

This British Standard reproduces verbatim ISO 10754:1996 and implements it as the UK national standard.
The UK participation in its preparation was entrusted by Technical Committee IDT/2, Information and Documentation, to Subcommittee IDT/2/7, Mechanized Information, which has the responsibility to:
- aid enquirers to understand the text;
- present to the responsible international/European committee any enquiries on the interpretation, or proposals for change, and keep the UK interests informed;
- monitor related international and European developments and promulgate them in the UK.
A list of organizations represented on this subcommittee can be obtained on request to its secretary.

Cross-references
The British Standards which implement international or European publications referred to in this document may be found in the BSI Standards Catalogue under the section entitled “International Standards Correspondence Index”, or by using the “Find” facility of the BSI Standards Electronic Catalogue.
A British Standard does not purport to include all the necessary provisions of a contract. Users of British Standards are responsible for their correct application.
Compliance with a British Standard does not of itself confer immunity from legal obligations.

Summary of pages
This document comprises a front cover, an inside front cover, pages i and ii, the ISO title page, pages ii to iv, pages 1 to 10 and a back cover.
This standard has been updated (see copyright date) and may have had amendments incorporated. This will be indicated in the amendment table on the inside front cover.

Amendments issued since publication
Amd. No.    Date    Comments

Contents
National foreword
Foreword
Text of ISO 10754

Contents
Foreword
1 Scope
2 Normative references
3 Implementation
4 Code table for extended Cyrillic characters of non-Slavic languages
5 Legend
6 Explanatory notes
Annex A (informative) Non-Slavic languages using Cyrillic script characters from this International Standard
Annex B (informative) The use of the Cyrillic script for non-Slavic languages
Annex C (informative) Bibliography
Table 1
Table 2
Table A.1 Listing by non-Slavic language
Table A.2 Listing by non-spacing character

Descriptors: Documentation, bibliographies, data processing, information interchange, graphic characters, Cyrillic characters, character sets, coded character sets, coded representation, extensions.

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.
International Standard ISO 10754 was prepared by Technical Committee ISO/TC 46, Information and documentation, Subcommittee SC 4, Computer applications in information and documentation.
Annex A, Annex B and Annex C of this International Standard are for information only.

1 Scope
1.1 This International Standard specifies a set of 93 graphic characters with their coded representations. It consists of a code table and a legend showing each graphic, its use and its name. Explanatory notes are also included. The character set is primarily intended for the interchange of information among data processing systems and within message transmission systems.
1.2 These characters, together with characters in the basic Cyrillic set, registered as number 37 in the ISO international register, constitute a character set for the international interchange of bibliographic citations, including their annotations, in
the non-Slavic Cyrillic alphabets for the languages specified in 1.3.
1.3 This character set is intended to handle information in the following language groups:

Abazian      Kabardian    Mordvin
Abkhasian    Kalmyk       Nenets
Adyghe       Karachay     Nivkh
Aisor        Kara-Kalpak  Nogai
Altaic       Karelian     Ossetic
Avar         Kazakh       Romany
Azerbaijani  Khakass      Sami
Balkar       Khanty       Selkup
Bashkir      Kirghiz      Shor
Buryat       Komi         Tabasaran
Chechen      Koryak       Tajik
Chukchi      Kumyk        Tat
Chuvash      Kurdish      Tatar
Dargwa       Lak          Turkmen
Dungan       Lezghian     Tuvinian
Eskimo       Lithuanian   Udekhe
Even         Mansi        Udmurt
Evenki       Mari         Uighur
Gagauzi      Moldavian    Uzbek
Ingush       Mongolian    Yakut

1.4 This coded character set contains characters used since the Russian Revolution (1917). Some letters which appear to be unrepresented in the character table are actually graphic variants. Obsolete letters, those used for only a brief period in the late 19th century, have been excluded from this International Standard. This applies chiefly to early letters used in Chechen, Chuvash, Dargwa, Lak and Lezghian. Letters from their 20th century alphabets are included.

2 Normative references
The following standards contain provisions which, through reference in this text, constitute provisions of this International Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below. Members of IEC and ISO maintain registers of currently valid International Standards.
ISO/IEC 646:1991, Information technology - ISO 7-bit coded character set for information interchange.
ISO/IEC 2022:1994, Information technology - Character code structure and extension techniques.
International register of character sets to be identified by means of escape sequences 1).

1) Available on application to the Secretariat of the Registration Authority: ECMA, 114 rue du Rhône, CH-1204 Genève, Switzerland.

3 Implementation
3.1 The implementation of this coded character set in physical media and for transmission, taking into account the need for error checking, is the subject of other International Standards (see Annex C).
3.2 The implementation of this International Standard is in accordance with the provisions of ISO/IEC 2022 2) and is identified by an escape sequence. (To be assigned.)
3.3 The unassigned positions in the code tables shall not be utilized in the international interchange of bibliographic information.

2) G0: ESC 2/8 F; G1: ESC 2/9 F; G2: ESC 2/10 F; G3: ESC 2/11 F (“F” represents the final character of the escape sequence).

4 Code table for extended Cyrillic characters of non-Slavic languages
Table 1 is the code table for extended Cyrillic characters of non-Slavic languages.

Table 1

5 Legend
Table 2 gives the code, graphic and name of each character and comments on usage.

Table 2
Code  Name
21    COMBINING ACUTE
22    COMBINING DIAERESIS (Dialytika)
23    COMBINING OGONEK
24    COMBINING RIGHT DESCENDER
25    COMBINING BREVE
26    COMBINING CEDILLA
27    COMBINING GRAVE
28    CYRILLIC SMALL LETTER A IE
29    CYRILLIC SMALL LETTER GHE WITH STROKE
2A    CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
2B    CYRILLIC SMALL LETTER KOMI DE
2C    CYRILLIC SMALL LETTER KOMI DJE
2D    CYRILLIC SMALL LETTER ABKHASIAN DZE
2E    CYRILLIC SMALL LETTER KOMI DZE
2F    CYRILLIC SMALL LETTER KOMI ZJE
30    (This position shall not be used)
31    COMBINING DOUBLE ACUTE
32    COMBINING MACRON
33    COMBINING LEFT OGONEK
34    COMBINING LEFT DESCENDER
35    COMBINING CARON
36    COMBINING RING ABOVE
37    COMBINING HIGH COMMA
38    CYRILLIC CAPITAL LETTER A IE
39    CYRILLIC CAPITAL LETTER GHE WITH STROKE
3A    CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
3B    CYRILLIC CAPITAL LETTER KOMI DE
3C    CYRILLIC CAPITAL LETTER KOMI DJE
3D    CYRILLIC CAPITAL LETTER ABKHASIAN DZE
3E    CYRILLIC CAPITAL LETTER KOMI DZE
3F    CYRILLIC CAPITAL LETTER KOMI ZJE
40    CYRILLIC SMALL LETTER YAKUT I WITH STROKE
41    CYRILLIC SMALL LETTER JE WITH STROKE
42    CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE
43    CYRILLIC SMALL LETTER BASHKIR KA
44    CYRILLIC SMALL LETTER KA WITH STROKE
45    CYRILLIC SMALL LETTER CHECHEN KA
46    CYRILLIC SMALL LETTER KURDISH QA
47    CYRILLIC SMALL LETTER AISOR EL
48    CYRILLIC SMALL LETTER KOMI ELJ
49    CYRILLIC SMALL LETTER EL WITH MIDDLE HOOK
4A    CYRILLIC SMALL LETTER MORDVIN EL KA
4B    CYRILLIC SMALL LETTER ALTAIC NG
4C    CYRILLIC SMALL LETTER CHUVASH NG
4D    CYRILLIC SMALL LETTER KOMI NG
4E    CYRILLIC SMALL LETTER EN WITH MIDDLE HOOK
4F    CYRILLIC SMALL LETTER O WITH STROKE
50    CYRILLIC CAPITAL LETTER YAKUT I WITH STROKE
51    CYRILLIC CAPITAL LETTER JE WITH STROKE
52    CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE
53    CYRILLIC CAPITAL LETTER BASHKIR KA
54    CYRILLIC CAPITAL LETTER KA WITH STROKE
55    CYRILLIC CAPITAL LETTER CHECHEN KA
56    CYRILLIC CAPITAL LETTER KURDISH QA
57    CYRILLIC CAPITAL LETTER AISOR EL
58    CYRILLIC CAPITAL LETTER KOMI ELJ
59    CYRILLIC CAPITAL LETTER EL WITH MIDDLE HOOK
5A    CYRILLIC CAPITAL LETTER MORDVIN EL KA
5B    CYRILLIC CAPITAL LETTER ALTAIC NG
5C    CYRILLIC CAPITAL LETTER CHUVASH NG
5D    CYRILLIC CAPITAL LETTER KOMI NG
5E    CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK
5F    CYRILLIC CAPITAL LETTER O WITH STROKE
Comments for codes 40 to 5F (in source order; row alignment not preserved): Also in Dargwa, Lak, Lezghian; Also in Yakut; Also Mordvin and Yakut; Also in Dargwa, Lak, Lezghian; “Q” is alternate rendering; Also used in Yakut; Also used in Mordvin and Yakut
60    CYRILLIC SMALL LETTER ABKHASIAN HA
61    CYRILLIC SMALL LETTER SELKUP O IE
62    CYRILLIC SMALL LETTER ABKHASIAN PHE
63    CYRILLIC SMALL LETTER ER KA
64    CYRILLIC SMALL LETTER KOMI ESJ
65    CYRILLIC SMALL LETTER KOMI TJE
66    CYRILLIC SMALL LETTER STRAIGHT U
67    CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE
68    CYRILLIC SMALL LETTER KURDISH WE
69    CYRILLIC SMALL LETTER ABKHASIAN THE
6A    CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE
6B    CYRILLIC SMALL LETTER HE
6C    CYRILLIC SMALL LETTER ABKHASIAN CHE
6D    CYRILLIC SMALL LETTER SHWA
6E    CYRILLIC SMALL LETTER YA IE
6F    CYRILLIC ASPIRATION OR GUTTURAL SIGN
70    CYRILLIC CAPITAL LETTER ABKHASIAN HA
71    CYRILLIC CAPITAL LETTER SELKUP O IE
72    CYRILLIC CAPITAL LETTER ABKHASIAN PHE
73    CYRILLIC CAPITAL LETTER ER KA
74    CYRILLIC CAPITAL LETTER KOMI ESJ
75    CYRILLIC CAPITAL LETTER KOMI TJE
76    CYRILLIC CAPITAL LETTER STRAIGHT U
77    CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE
78    CYRILLIC CAPITAL LETTER KURDISH WE
79    CYRILLIC CAPITAL LETTER ABKHASIAN THE
7A    CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE
7B    CYRILLIC CAPITAL LETTER HE
7C    CYRILLIC CAPITAL LETTER ABKHASIAN CHE
7D    CYRILLIC CAPITAL LETTER SHWA
7E    CYRILLIC CAPITAL LETTER YA IE
Comments for codes 60 to 7E (in source order; row alignment not preserved): Also used in Dargwa, Lezghian; Used in Caucasian languages; Also used in Dargwa, Lezghian

6 Explanatory notes
6.1 Punctuation marks and numerals in European style used in the non-Slavic languages covered by this International Standard are available in the basic Cyrillic set (Registration No. 37 in the international register), with which this set is designed for use.
6.2 In many of the non-Slavic languages, diacritical marks are combined with Cyrillic script
letters to create distinctive modified letters. These marks are usually placed above or below a letter. The most common marks include the diaeresis, the acute, right descender, left descender, ogonek and left ogonek. The use of such marks is widespread and thus several non-spacing combining marks are defined in this International Standard.
In some texts, a high comma is used above letters instead of an acute mark. Most modern Cyrillic script reference sources tend to use the acute, however. Both combining marks are defined in this International Standard.
In older texts, an apostrophe is occasionally used to represent modified letters. When this character is needed, the apostrophe provided in the basic Cyrillic set (Registration No. 37 in the international register) should be used.
The non-Slavic languages make liberal and sometimes inconsistent use of the right descender, left descender, ogonek and left ogonek, in combination with many letters, especially consonants, to show palatalization, aspiration, etc. The cedilla is also combined with some consonants to represent certain sounds. These combining marks have been defined in this International Standard to permit the encoding of the large number of combinations that have been identified. Characters with large middle hooks or tails are defined as separate characters. Sources identify these marks as either: “hvostik” (tail), “sedil” (cedilla), or “krjuk” (hook).
6.3 The guttural or aspiration sign (pridykhatelnyj znak; I) must not
be confused with the Latin script capital “I”. This sign is used in many Caucasian languages. It always follows a consonant and has the same form regardless of the case of the other letters in a word (e.g. ). Although technically a sign (like a percent “%” sign), this character is given as the last
letter of most Cyrillic-based alphabets. The notion of capitalization is not applied to this sign; thus it is assigned only one code in this International Standard.
6.4 The 14 characters coded in columns 2 and 3 of Table 2 (positions 21–27 and 31–37) represent combining marks which are non-spacing characters, that is, characters whose use is not followed by the forward movement of an output device. In a character string, these non-spacing characters are input before the characters they modify. Multiple combining marks associated with one letter are to be encoded in the order in which they appear, reading left to right or top to bottom. They are intended to be combined with other spacing characters in this International Standard or characters from the basic Cyrillic set. These combining marks (e.g. diaeresis) are used liberally in the non-Slavic languages that have Cyrillic-based alphabets. The BACKSPACE character (hexadecimal code 08 in ISO/IEC 646) should not be used when encoding non-spacing characters.
6.5 The rendering of graphic characters is intended solely to identify uniquely the additional Cyrillic script letters used by non-Slavic l
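The ordering rule in 6.4 (non-spacing marks are input before the character they modify, and BACKSPACE is not used) can be illustrated with a minimal sketch. It is not part of this International Standard. The function name `group_clusters` and the constant names are hypothetical, every input code is assumed to come from this set's 7-bit table (bases taken from the basic Cyrillic set would need an additional set tag, omitted here), and rejecting BACKSPACE outright is a stricter choice than the "should not" wording in 6.4.

```python
# Illustrative sketch only; names are hypothetical, code positions are from Table 2.
COMBINING = set(range(0x21, 0x28)) | set(range(0x31, 0x38))  # positions 21-27 and 31-37
BACKSPACE = 0x08  # hexadecimal code 08 in ISO/IEC 646
UNUSED = 0x30     # "This position shall not be used"


def group_clusters(codes):
    """Group 7-bit codes into (marks, base) pairs.

    Non-spacing marks precede the spacing character they modify, so marks
    are collected until a spacing character closes the cluster. The marks
    keep the order in which they were encoded.
    """
    clusters = []
    pending = []  # combining marks waiting for their base character
    for code in codes:
        if code == BACKSPACE:
            raise ValueError("BACKSPACE is not used with non-spacing characters")
        if code == UNUSED:
            raise ValueError("position 30 shall not be used")
        if code in COMBINING:
            pending.append(code)
        else:
            clusters.append((tuple(pending), code))
            pending = []
    if pending:
        raise ValueError("combining mark(s) without a following base character")
    return clusters
```

For example, `group_clusters([0x22, 0x4F])` pairs COMBINING DIAERESIS (22) with CYRILLIC SMALL LETTER O WITH STROKE (4F), because the mark is encoded first and attaches to the spacing character that follows it.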