ECMA 114-2000 8-Bit Single-Byte Coded Graphic Character Sets Latin Arabic Alphabet (2nd Edition).pdf

Uploaded by: dealItalian200 | Document ID: 704605 | Uploaded: 2019-01-03 | Format: PDF | Pages: 28 | Size: 139.83 KB

Standard ECMA-114
2nd Edition - December 2000

Standardizing Information and Communication Systems

Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http://www.ecma.ch - Internet: helpdesk@ecma.ch

8-Bit Single-Byte Coded Graphic Character Sets
Latin/Arabic Alphabet

Brief History

The adoption of Standard ECMA-6 (ISO 646) in 1965 as the agreed international 7-bit code for information interchange has led to the development of many national, international and application-oriented versions of this code which have been in wide use for quite some time.

These versions had a number of limitations generally inherent to the size of the code:

- they did not provide all graphic characters which may be needed,
- for some characters, specially for accented letters, it was necessary to resort to BACKSPACE sequences, which created problems when processing data containing such composite characters,
- interchange among different versions was practically limited to the 82 common graphic characters.

With the advent of 8-bit coding it was possible to increase the number of graphic characters. ISO 6937/2, for example, provided a character set covering the requirements of most languages based on the Latin alphabet. This character set, although well suited for text communication, was difficult to use for processing as some graphic characters were represented by one and others by two bit combinations. Thus, the need was recognized for coded graphic character sets, each of which:

- is the same for all users of a given area,
- provides single-byte coding of all graphic characters thus permitting easy processing,
- takes into account character sets used in the industry.

Since 1982 the urgency of the need for an 8-bit single-byte coded character set was recognized in ECMA as well as in ANSI/X3L2 and numerous working papers were exchanged between the two groups. In February 1984 ECMA TC1 submitted to ISO/TC97/SC2 (which has become ISO/IEC JTC 1/SC2 in 1987) a proposal for such a coded character set. At its meeting of April 1984 SC2 decided to propose a new item of work for this topic. Technical discussions during and after this meeting led TC1 to adopt the coding scheme proposed by X3L2. International Standard ISO/IEC 8859-1 is based on this joint ANSI/ECMA proposal. ECMA published its corresponding Standard ECMA-94 in March 1985.

After this first publication, the work of ECMA TC1 on further coded graphic character sets has led to
the following results:

i. The present Standard ECMA-114 for a Latin/Arabic coded graphic character set. In developing this ECMA Standard TC1 closely co-operated with the relevant groups and committees of ASMO, the Arab Organization for Standardization and Metrology, of ATU, the Arab Telecommunication Union, and of different Arabic countries. This 2nd Edition has been developed to keep it fully aligned with the new edition of ISO/IEC 8859-6.

ii. The second edition of Standard ECMA-94 comprising four coded graphic character sets for the Latin script, identified as Latin Alphabets No. 1 to No. 4. These alphabets have a number of characters in common, in particular those allocated to columns 02 to 07. These four Latin Alphabets have been submitted to ISO/IEC and JTC 1 and have become Parts 1 to 4 of ISO/IEC 8859.

iii. A series of ECMA Standards for coded graphic character sets comprising those characters of the Latin Alphabets allocated to columns 02 to 07 and characters of another script for multiple-language applications. These ECMA Standards cover the Cyrillic, Greek and Hebrew scripts. These ECMA Standards ECMA-113, ECMA-118 and ECMA-121, resp., have become Parts 5, 7 and 8, resp., of ISO/IEC 8859.

iv. Latin Alphabets No. 5 and No. 6 have been published as ECMA-128 and ECMA-144, resp. They have become Parts 9 and 10, resp., of ISO/IEC 8859.

This ECMA Standard has been adopted as 2nd Edition of Standard ECMA-114 by the ECMA General Assembly of December 2000.

Table of contents

1 Scope
2 Conformance
2.1 Conformance of information interchange
2.2 Conformance of devices
2.2.1 Device description
2.2.2 Originating devices
2.2.3 Receiving devices
3 References
4 Definitions
4.1 bit combination
4.2 byte
4.3 character
4.4 code table
4.5 coded character set; code
4.6 coded-character-data-element (CC-data-element)
4.7 graphic character
4.8 graphic symbol
4.9 position
5 Notation, code table and names
5.1 Notation
5.2 Layout of the code table
5.3 Names and meanings
5.3.1 SPACE (SP)
5.3.2 NO-BREAK SPACE (NBSP)
5.3.3 SOFT HYPHEN (SHY)
6 Specification of the coded character set
6.1 Characters of the set and their coded representation
6.2 Code table
7 Identification of the character set
7.1 Identification according to ECMA-35 and ECMA-43
7.2 Identification using the ISO International register of coded character sets to be used with escape sequences
Annex A - Coverage of languages
Annex B - Main differences between the first edition and this second edition of ECMA-114
Annex C - Bibliography
Annex D - Identification according to ISO/IEC 8824-1 (ASN.1)

1 Scope

This ECMA Standard specifies a set of 146 coded graphic characters identified as the Latin/Arabic alphabet.

This set of coded graphic characters is intended for use in data and text processing applications and also for information interchange. The set contains graphic characters used for general purpose applications in typical office environments in at least the following languages: Arabic, English and Latin.

Some of the characters in this set are combining characters (see clause 6).

This set of coded graphic characters may be regarded as a version of an 8-bit code according to Standard ECMA-35 or Standard ECMA-43 at level 1.

This ECMA Standard may not be used with any other ECMA Standards for 8-bit single-byte coded graphic character sets. If coded characters from more than one ECMA Standard are to be used together, by means of code extension techniques, the equivalent coded character sets from ISO/IEC 10367 should be used instead within a version of Standard ECMA-43 at level 2 or level 3.

The coded characters in this set may be used in conjunction with coded control functions selected from ECMA-48. However, control functions are not used to create composite graphic symbols from two or more graphic characters (see clause 6).

NOTE
This ECMA Standard is not intended for use with Telematic services defined by ITU-T. If information coded according to this ECMA Standard is to be transferred to such services, it will have to conform to the requirements of those services at the access-point.

2 Conformance

2.1 Conformance of information interchange

A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with this ECMA Standard if all the coded representations of graphic characters within that CC-data-element conform to the requirements of clause 6.

2.2 Conformance of devices

A device is in conformance with this ECMA Standard if it conforms to the requirements of 2.2.1, and either or both of 2.2.2 and 2.2.3. A claim of conformance shall identify the document which contains the description specified in 2.2.1.

2.2.1 Device description

A device that conforms to this ECMA Standard shall be subject of a description that identifies the means by which the user may supply characters to the device, or may recognize them when they are made available to him, as specified respectively in 2.2.2 and 2.2.3.

2.2.2 Originating devices

An originating device shall allow its user to supply any sequence of characters from those specified in clause 6, and
shall be capable of transmitting their coded representations within a CC-data-element.

2.2.3 Receiving devices

A receiving device shall be capable of receiving and interpreting any coded representations of characters that are within a CC-data-element, and that conform to clause 6, and shall make the corresponding characters available to its user in such a way that the user can identify them from among those specified there, and can distinguish them from each other.

3 References

ECMA-6    7-Bit Input/Output Coded Character Set
ECMA-35   Code Extension Techniques
ECMA-43   8-Bit Coded Character Set Structure and Rules
ECMA-48   Control Functions for Coded Character Sets
ECMA-94   8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4
ECMA-113  8-Bit Single-Byte Coded Graphic Character Sets - Latin/Cyrillic Alphabet
ECMA-118  8-Bit Single-Byte Coded Graphic Character Sets - Latin/Greek Alphabet
ECMA-121  8-Bit Single-Byte Coded Graphic Character Sets - Latin/Hebrew Alphabet
ECMA-128  8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabet No. 5
ECMA-144  8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabet No. 6
ASMO 449  7-Bit Coded Arabic Character Set for Information Interchange

4 Definitions

For the purpose of this Standard the following definitions apply.

4.1 bit combination

An ordered set of bits used for the representation of characters.

4.2 byte

A bit string that is operated upon as a unit.

4.3 character

A member of a set of elements used for the organization, control, or representation of data.

4.4 code table

A table showing the characters allocated to each bit combination in a code.

4.5 coded character set; code

A set of unambiguous rules that establishes a character set and the one-to-one relationship between the characters of the set and their bit combinations.

4.6 coded-character-data-element (CC-data-element)

An element of interchanged information that is specified to consist of a sequence of coded representations of characters, in accordance with one or more identified standards for coded character sets.

4.7 graphic character

A character, other than a control function, that has a visual representation normally hand-written, printed or displayed, and that has a coded representation consisting of one or more bit combinations.

4.8 graphic symbol

A visual representation of a graphic character or of a control function.

4.9 position

That part of a code table
identified by its column and row co-ordinates.

5 Notation, code table and names

5.1 Notation

The bits of the bit combinations of the 8-bit code are identified by b8, b7, b6, b5, b4, b3, b2 and b1, where b8 is the highest-order, or most-significant bit and b1 is the lowest-order, or least-significant bit.

The bit combinations may be interpreted to represent numbers in binary notation by attributing the following weights to the individual bits:

Bit     b8   b7   b6   b5   b4   b3   b2   b1
Weight  128  64   32   16   8    4    2    1

Using these weights, the bit combinations are identified by notations of the form xx/yy, where xx and yy are numbers in the range 00 to 15. The correspondence between the notations of the form xx/yy and the bit combinations consisting of the bits b8 to b1 is as follows:

- xx is the number represented by b8, b7, b6 and b5 where these bits are given the weights 8, 4, 2, and 1, respectively.
- yy is the number represented by b4, b3, b2 and b1 where these bits are given the weights 8, 4, 2, and 1, respectively.

The bit combinations are also identified by notations of the form hk, where h and k are numbers in the range 0 to F in hexadecimal notation. The number h is the same as the number xx described above, and the number k the same as the number yy described above.

5.2 Layout of the code table

An 8-bit code table consists of 256 positions arranged in 16 columns and 16 rows. The columns and the rows are numbered 00 to 15. In hexadecimal notation the columns and the rows are numbered 0 to F.

The code table positions are identified by notations of the form xx/yy, where xx is the column number and yy is the row number. The column and row numbers are shown at the top and left edges of the table, respectively. The code table positions are also identified by notations of the form hk, where h is the column number and k is the row number in hexadecimal notation. The column and row numbers are shown at the bottom and right edges of the table, respectively.

The positions of the code table are in one-to-one correspondence with the bit combinations of the code. The notation of a code table position, of the form xx/yy, or of the form hk, is the same as that of the corresponding bit combination.

5.3 Names and meanings

This ECMA Standard assigns a unique name and a unique identifier to each graphic character. These names and identifiers have been taken from ISO/IEC 10646-1. This ECMA Standard also specifies an acronym for each
of the characters SPACE, NO-BREAK SPACE and SOFT HYPHEN. For acronyms only Latin capital letters A to Z are used. It is intended that the acronyms be retained in all translations of the text.

Except for SPACE (SP), NO-BREAK SPACE (NBSP) and SOFT HYPHEN (SHY), this ECMA Standard does not define and does not restrict the meanings of graphic characters.

This ECMA Standard specifies a graphic symbol for each graphic character. This symbol is shown in the corresponding position of the code table. However, this Standard does not specify a particular style or font design for imaging graphic characters.

5.3.1 SPACE (SP)

A graphic character the visual representation of which consists of the absence of a graphic symbol.

5.3.2 NO-BREAK SPACE (NBSP)

A graphic character the visual representation of which consists of the absence of a graphic symbol, for use when a line break is to be prevented in the text as presented.

5.3.3 SOFT HYPHEN (SHY)

A graphic character that is imaged by a graphic symbol identical with, or similar to, that representing HYPHEN, for use when a line break has been established within a word.

6 Specification of the coded character set

This ECMA Standard specifies 146 characters allocated to the bit combinations of the code table (table 2).

Some of these characters are combining characters. They are identified in table as such.

NOTE
Combining characters are described in ECMA-35, subclause 6.3.3.

The coded representation of a combining character shall follow that of the base character with which it is associated. Any combining character may be associated with any non-combining character in the ranges 12/01 to 13/10 and 14/01 to 14/10 (hexadecimal C1 to DA and E1 to EA).

Control functions, such as BACKSPACE or CARRIAGE RETURN, shall not be used to create composite graphic symbols, which are made up from the graphic representations of two or more characters.

NOTE
There is only one set of DIGITS in this ECMA Standard. How these will be imaged is a matter of local conventions. In the code table, graphic symbols for the most common styles of writing digits are given next to each other. In this way data communication between various Arabic writing countries remains possible without code conversion.

6.1 Characters of the set and their coded representation

See table 1.

Table 1 - Character set, coded representation

Bit combination  Hex  Identifier  Name
02/00            20   U+0020      SPACE
02/0
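
The xx/yy and hk notations of clause 5.1, and the base-character ranges quoted in clause 6, can be illustrated with a short Python sketch. This is not part of the Standard: the function names (`to_position`, `to_hex`, `from_position`, `in_base_range`) are hypothetical and chosen only for this example. The range check tests only whether a bit combination lies in 12/01 to 13/10 or 14/01 to 14/10; whether a given character is itself non-combining would have to be read from the code table, which is not reproduced here.

```python
# Illustrative sketch only; all names below are hypothetical, not from ECMA-114.

def _check_byte(byte: int) -> None:
    """Reject values that are not 8-bit bit combinations."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a bit combination must be in the range 0 to 255")


def to_position(byte: int) -> str:
    """Return the xx/yy notation (column/row, decimal 00 to 15)."""
    _check_byte(byte)
    column = byte >> 4    # bits b8..b5 with weights 8, 4, 2, 1
    row = byte & 0x0F     # bits b4..b1 with weights 8, 4, 2, 1
    return f"{column:02d}/{row:02d}"


def to_hex(byte: int) -> str:
    """Return the hk notation (two hexadecimal digits, 0 to F each)."""
    _check_byte(byte)
    return f"{byte:02X}"


def from_position(position: str) -> int:
    """Convert an xx/yy notation back to the numeric bit combination."""
    column_text, row_text = position.split("/")
    column, row = int(column_text), int(row_text)
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must be in the range 00 to 15")
    return column * 16 + row


# Ranges quoted in clause 6 for characters that a combining character
# may be associated with (hexadecimal C1 to DA and E1 to EA).
BASE_RANGES = (("12/01", "13/10"), ("14/01", "14/10"))


def in_base_range(byte: int) -> bool:
    """True if the bit combination falls inside one of the clause 6 ranges."""
    _check_byte(byte)
    return any(
        from_position(low) <= byte <= from_position(high)
        for low, high in BASE_RANGES
    )
```

For instance, `to_position(0x20)` gives `02/00` and `to_hex(0x20)` gives `20`, matching the SPACE row of table 1, while `from_position("12/01")` gives the value written C1 in hexadecimal.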
