Reference number: ISO 15919:2001(E)
© ISO 2001

INTERNATIONAL STANDARD ISO 15919
First edition 2001-10-01

Information and documentation - Transliteration of Devanagari and related Indic scripts into Latin characters

Information et documentation - Translittération du Devanagari et des écritures indiennes liées en caractères latins

© ISO 2001
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address
below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland

Contents

1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 Abbreviated terms
6 Characteristics of Indic scripts
7 Transliteration tables
8 Special requirements and recommendations
8.1 Special requirements
8.2 Recommendations
9 Options
10 Tables for uniform transliteration of Indic scripts
11 Transliteration scheme for limited character set
12 Recommended transliteration of Indic schemes for Perso-Arabic characters
13 Additional Indic scripts
14 Reverse transliteration
Annex A (normative) Tables for uniform transliteration
Annex B (normative) Transliteration table for limited (7-bit) character set
Annex C (normative) Recommended transliteration of Indic schemes for Perso-Arabic characters
Annex D (informative) Examples of Indic characters used for Perso-Arabic
Annex E (informative) Additional Indic scripts
Annex F (informative) Reverse transliteration of Indic scripts
F.1 Overview
F.2 Examples of reverse transliteration in modern Indic languages
F.3 Reverse transliteration in Vedic texts
Bibliography

Foreword

ISO (the International
Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.

Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this International Standard may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

International Standard ISO 15919 was prepared by Technical Committee ISO/TC 46, Information and documentation, Subcommittee SC 2, Conversion of written languages.

Annexes A, B and C form a normative part of this International Standard. Annexes D, E and F are for information only.

Introduction

Script
conversion is often required for documents such as historical and literary texts, geographical texts (including maps and atlases), bibliographies, catalogues, lists and passports (and other identification documents). Text in Devanagari script or other Indic scripts sometimes needs to be shown in Latin script, where users, or equipment that they are using, cannot read or write the text.

1 Scope

This International Standard provides tables which enable the transliteration into Latin characters from text in Indic scripts which are largely specified in rows 09 to 0D of UCS (ISO/IEC 10646-1 and Unicode). The tables provide for the Devanagari, Bengali (including the characters used for writing Assamese), Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Sinhala, Tamil, and Telugu scripts which are used in India, Nepal, Bangladesh and Sri Lanka. The Devanagari, Bengali, Gujarati, Gurmukhi, and Oriya scripts are North Indian scripts, and the Kannada, Malayalam, Tamil, and Telugu scripts are South Indian scripts. The
Burmese, Khmer, Thai, Lao and Tibetan scripts which also share a common origin with the Indic scripts, and which are used predominantly in Myanmar, Cambodia, Thailand, Laos, Bhutan and the Tibetan Autonomous Region within China, are not covered by this International Standard.

This International Standard applies to transliteration of Devanagari, and to Indic scripts related to Devanagari, independent of the period in which it is or was used (i.e. for Devanagari script it can be used for transliterating text in classical Sanskrit, Hindi, Marathi, and the Vedic language, for instance). Other Indic scripts whose character repertoires are covered by the tables may also be transliterated using this International Standard.

Options in this International Standard are defined in clause 9.

2 Conformance

Text originally in non-Latin script which is converted to a Latin-script representation conforms
to this International Standard with or without any of the specific recommendations, if it follows the rules defined in 8.1 and the conversion tables given in clause 7 and normative annexes A and B, with or without following any of the three recommendations given in 8.2 and clause 12, all in accordance with the options defined in clause 9. A claim of conformance shall specify which options have been chosen, and which recommendations have been followed.

3 Normative references

The following normative documents contain provisions which, through reference in this text, constitute provisions of this International Standard. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. For undated references, the latest edition of the normative document referred to applies. Members of ISO and IEC maintain registers of currently valid International Standards.

ISO/IEC 10646-1, Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1:
Architecture and Basic Multilingual Plane

ISO/IEC 646:1991, Information technology - ISO 7-bit coded character set for information interchange

4 Terms and definitions

For the purposes of this International Standard, the following terms and definitions apply.

4.1 conversion
representing graphic characters from a source script by the graphic characters of a target script, most commonly by romanization
NOTE The two basic methods of conversion of a system of writing are transliteration and transcription. The use of the terms source script and target script in transliteration is analogous to the terms source language and target language in translation.

4.2 script
set of graphic characters used for the written form of one or more languages

4.3 graphic character
character (other than a control character) that has a visual representation, normally handwritten, printed or displayed
NOTE A graphic character is a single element of a script. Examples are letters, conjunct characters, numerical digits, punctuation marks or diacritical marks.

4.4 reverse transliteration
process whereby the characters of a target script are transliterated into those
of the source script
NOTE This International Standard aims to enable reverse-transliterated text to be identical to the original source text up to equivalent orthography. However, non-reversible transcription-like transliterations are often found to be useful when quoting recent material.

4.5 romanization
conversion of non-Latin graphic characters into Latin graphic characters, using either transliteration or transcription

4.6 transcription
representation of the sounds of a source language by graphic characters associated with a target language

4.7 transliteration
representation of the graphic
characters of a source script by the graphic characters of a target script
NOTE In transcription, pronunciation conventions are of primary importance, while in transliteration, writing conventions are of primary importance.

4.8 UCS
Universal Multiple-Octet Coded Character Set (UCS) as defined in ISO/IEC 10646-1
NOTE 1 The Indic scripts listed in ISO/IEC 10646-1:1993 form a subset (with identical codes) of the Indic scripts listed in ISO/IEC 10646-1:2000. Similarly, the Indic scripts listed in the Unicode standard (version 1.0 onwards) form a subset (with identical codes) to the Indic scripts listed in ISO/IEC 10646-1:2000 and the Unicode standard, version 3.0. Any of these standards provide valid character codes for the specific characters concerned.
NOTE 2 ISO/IEC 10646-1 is increasingly used for providing character identifiers in a wide range of International Standards, including some in this International Standard. Use of these identifiers does not impose any requirements to use ISO/IEC 10646-1 or any other character coding standard to represent either the source characters or the target characters in any computer system or in information interchange.
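
The following Python sketch is an informal illustration of the terms defined in 4.4, 4.7 and 4.8; it is not part of this International Standard. It assumes a hypothetical mini-table, VOWELS, containing only six Devanagari independent vowels, and the function names are likewise invented for the example. Consonants, combining vowel forms and all other signs are left out, so the sketch shows only how a one-to-one table makes transliteration reversible and how a UCS identifier can name a character without dictating how text is stored.

```python
import unicodedata

# Hypothetical mini-table for illustration only: six Devanagari independent
# vowels and their Latin transliterations. Consonants, combining vowel forms
# and all other signs are deliberately left out.
VOWELS = {
    "\u0905": "a",
    "\u0906": "\u0101",  # a with macron
    "\u0907": "i",
    "\u0908": "\u012b",  # i with macron
    "\u0909": "u",
    "\u090a": "\u016b",  # u with macron
}
REVERSE = {latin: indic for indic, latin in VOWELS.items()}


def transliterate(text):
    """Represent each source character by its target character (see 4.7)."""
    return "".join(VOWELS.get(ch, ch) for ch in text)


def reverse_transliterate(text):
    """Map target characters back to source characters (see 4.4)."""
    # Normalize first so that "a" + combining macron is treated as one letter.
    composed = unicodedata.normalize("NFC", text)
    return "".join(REVERSE.get(ch, ch) for ch in composed)


def ucs_identifier(ch):
    """Return an identifier such as 'U+0906 DEVANAGARI LETTER AA' (see 4.8)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


source = "\u0906 \u0908 \u0909"
latin = transliterate(source)
assert reverse_transliterate(latin) == source  # round trip is lossless here
print(latin)
print([ucs_identifier(ch) for ch in source if ch != " "])
```

Because every entry in the table is one-to-one, the round trip restores the source text exactly. A realistic table must also handle the inherent vowel and combining forms, which this sketch does not attempt.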
5 Abbreviated terms

Ben.  Bengali script
Dev.  Devanagari script
Guj.  Gujarati script
Gur.  Gurmukhi script
Kan.  Kannada script
Mal.  Malayalam script
Ori.  Oriya script
Tam.  Tamil script
Tel.  Telugu script
Sin.  Sinhala script
P-A.  Perso-Arabic script

6 Characteristics of Indic scripts

Characters in Indic scripts represent vowels, consonants and their combinations; nasalization, breathings, numerals and punctuation.

Each vowel has a full form (occupying a full character space in text, and required when beginning a word or in vowel hiatus) and a combining form (mātrā) used when the vowel follows a consonant, except that the short a standing at the beginning of Indic alphabets has only a full form, because no mātrā is required (see below).

Consonants include stops, semivowels, spirants, and other speech sounds. Stop consonants are arranged in classes, or vargas, according to the point of articulation, and within each class are subdivided into unvoiced or voiced, unaspirated or aspirated consonants, and a nasal consonant.

Characters for consonants are most simply quoted in a form which includes the inherent vowel a, as in the first consonant ka in Table 1. The inherent vowel is removed by the virāma sign of the relevant script (Dev., Ben., Guj., Gur., Ori., Tam., Tel., Kan., Mal., Sin.). The relevant mātrā is used when any other vowel follows a consonant. Consonant clusters frequently form conjunct characters. Use of virāma to form consonant clusters
is unusual, except in Tamil where it is the normal method.

When a mātrā is associated with a consonant, it replaces the inherent vowel. Mātrās have various forms, even in a single script, and details may be found in dictionaries and grammars.

It is important to note that many Indic characters have variant forms. Such differences of orthography are not distinguished in this International Standard.

Devanagari is used for writing various modern languages, such as Hindi, Marathi, Rajasthani and other languages in India, and Nepali in Nepal. Devanagari and most of the other Indic scripts are used for
writing classical languages often used in religious texts, such as the Sanskrit and Vedic languages, and Pali.

In some cases, text in Indic scripts uses additional characters for writing words in languages which do not normally use these scripts. Thus some Urdu consonants are typically represented by adding a dot (nuqta) below certain letters (see Table 1, normative annex C and informative annex D). Two English vowels may also be represented. Devanagari has also been extended to write South Indian languages.

Sinhala script (used in Sri Lanka) has
additional letters, in comparison with the scripts which are used in India, Nepal and Bangladesh.

Tamil script (used in South India and also in Sri Lanka) uses fewer characters, in comparison with other scripts which are used in India, Nepal, Bangladesh and Sri Lanka.

When the Bengali script is used to write the Assamese language (in parts of North India), two characters not used in writing Bengali are required. Hence the Assamese script is sometimes regarded as separate from the Bengali script.

7 Transliteration tables

7.1 The transliteration from each Indic script to the Latin script shall
be as specified in the Tables 1 to 10 and A.3, subject to the rules specified in 8.1 and the options specified in clause 9.

7.2 The structure of the transliteration tables is explained in the following paragraphs.

The target characters (Latin script) fall within the ranges 0020-01FF and 0300-0332 of
ISO/IEC 10646-1:2000. The repertoires for many of the source characters fall within the following ranges of ISO/IEC 10646-1:2000, for the script concerned:

0900-097F  Devanagari
0980-09FF  Bengali
0A00-0A7F  Gurmukhi
0A80-0AFF  Gujarati
0B00-0B7F  Oriya
0B80-0BFF  Tamil
0C00-0C7F  Telugu
0C80-0CFF  Kannada
0D00-0D7F  Malayalam
0D80-0DFF  Sinhala

Some additional Indic scripts whose character repertoires are included in the character repertoires of these scripts are listed in