BRITISH STANDARD BS EN 14603:2004

Information technology - Alphanumeric glyph image set for optical character recognition - OCR-B - Shapes and dimensions of the printed image

The European Standard EN 14603:2004 has the status of a British Standard

ICS 35.040; 37.080

This British Standard
was published under the authority of the Standards Policy and Strategy Committee on 17 January 2005

© BSI 17 January 2005
ISBN 0 580 45292 1

National foreword

This British Standard is the official English language version of EN 14603:2004. It supersedes BS 5464-2:1977 which is withdrawn.

The UK participation in its preparation was entrusted to Technical Committee ISTIECCT/-, Information systems technology international and European coordination committee, which has the responsibility to:
- aid enquirers to understand the text;
- present to the responsible international/European committee any enquiries on the interpretation, or proposals for change, and keep the UK interests informed;
- monitor related international and European developments and promulgate them in the UK.

A list of organizations represented on this committee can be obtained on request to its secretary.

Cross-references

The British Standards which implement international or European publications referred to in this document may be found in the BSI Catalogue under the section entitled “International Standards Correspondence Index”, or by using the “Search” facility of the BSI Electronic Catalogue or of British Standards Online.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

Compliance with a British Standard does not of itself confer immunity from legal obligations.

Summary of pages

This document comprises a front cover, an inside front cover, the EN title page, pages 2 to 33 and a back cover.

The BSI copyright notice displayed in this document indicates when the document was last issued.

Amendments issued since publication
Amd. No.    Date    Comments

EUROPEAN STANDARD
NORME EUROPÉENNE
EUROPÄISCHE NORM

EN 14603
December 2004

ICS

English version

Information technology - Alphanumeric glyph image set for optical character recognition - OCR-B - Shapes and dimensions of the printed image

Technologies de l'information - Jeu d'images de glyphe alphanumérique pour la reconnaissance optique de caractères - OCR-B - Formes et dimensions de l'image imprimée

This European Standard was approved by CEN
on 17 June 2004.

CEN members are bound to comply with the CEN/CENELEC Internal Regulations which stipulate the conditions for giving this European Standard the status of a national standard without any alteration. Up-to-date lists and bibliographical references concerning such national standards may be obtained on application to the Central Secretariat or to any CEN member.

This European Standard exists in three official versions (English, French, German). A version in any other language made by translation under the responsibility of a CEN member into its own language and notified to the Central Secretariat has the same status as the official versions.

CEN members are the national standards bodies of Austria, Belgium, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden, Switzerland and United Kingdom.

EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG

Management Centre: rue de Stassart, 36 B-1050 Brussels

© 2004 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.

Ref. No. EN 14603:2004: E

EN 14603:2004 (E)

Foreword

This document (EN 14603:2004) has been prepared by Technical
Committee CEN/TC 304, “Information and communication technologies - European localization requirements”, the secretariat of which is held by SIS.

This European Standard shall be given the status of a national standard, either by publication of an identical text or by endorsement, at the latest by June
2005, and conflicting national standards shall be withdrawn at the latest by June 2005.

The document is based on the International Standard ISO 1073/II, “Alphanumeric character set for optical recognition - Part II: Character set OCR-B - Shapes and dimensions of the printed image”.

According to the CEN/CENELEC Internal Regulations, the national standards organizations of the following countries are bound to implement this European Standard: Austria, Belgium, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg,
Malta, Netherlands, Norway, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden, Switzerland and United Kingdom.

Contents

1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 Coding in OCR applications
6 OCR-B styles
7 OCR-B sizes
8 Typical dimensions of the nominal printed image
9 OCR-B glyph image set
9.1 Subset 1: Minimal alphanumeric subset
9.2 Subset 2: Basic alphanumeric subset
9.3 Subset 3: Extended alphanumeric subset
9.4 Subset 4: Options subset
10 Index table
10.1 Availability of glyph images
10.2 Identification of drawings
10.3 Application considerations
11 Use of diacritical marks
11.1 Diacritical mark repertoire
11.2 Composite glyph images
11.3 Rules for glyph image combinations
12 Use of the LOW LINE glyph
13 SPACE
14 Glyph image shape definitions
14.1 Reference drawings
14.2 Availability of duplicates
14.3 Type dimensions
14.4 Constant-strokewidth font, size I
14.5 Constant-strokewidth font, size III
14.6 Constant-strokewidth font, size IV
14.7 Letterpress font, size I
15 Printing the letterpress and constant-strokewidth fonts
16 Illustration of OCR-B
Annex A (normative) Definition of Euro sign glyph image (ISO/IEC 9541-3 syntax)
Annex B (informative) Main differences between ISO 1073/II-1976 and this European Standard
Annex C (informative) Notes on the implementation of OCR-B
Annex D (informative) Glyph-repertoire extension needs identified in JTC 1/SC 2 revision process
Annex E (informative) Illustrations of reference drawings
Annex F (informative) Availability of reference drawings
Bibliography

Introduction

Optical Character Recognition technology, OCR, came into use in the 1960s, and some
specialized OCR fonts were designed at the time. In 1976 two such fonts were formally standardized by ISO, designated OCR-A and OCR-B, in the standard ISO 1073 parts I and II, respectively.

ISO 1073 was developed by the ISO Technical Committee ISO/TC 97, Computers and information processing. At the
creation of ISO/IEC JTC 1, responsibility for ISO 1073 was transferred to JTC 1/SC 2, Coded character sets.

In order to enlarge the set of characters covered by the standard, especially with special letters used in European-origin languages, a revision of the standard was initiated in 1994 by JTC 1/SC 2, and progressed through three consecutive Committee Drafts. Since, however, testing of the proposed character set extensions could not be accomplished, the JTC 1/SC 2 revision was discontinued in 1999.

With the introduction of the Euro sign, a need (primarily European) to add that character to the OCR-B set was recognized. CEN/TC 304 therefore decided to develop an OCR-B glyph image shape for the character, verify its recognition properties, and include it in a European version of the OCR-B standard; see the CEN/TC 304 reports referenced in the Bibliography. The decided-on glyph image shape is
specified in Annex A.

For reasons of continuity, and also to facilitate possible future CEN-ISO/IEC cooperation on OCR-B, it was decided to use the current ISO text with only the necessary minimum of changes as a basis for the CEN standard, even though the ISO text was developed in an OCR-technology situation rather different from the one existing when this CEN standard is published. In particular, the ISO standard text's division into clauses was kept as far as possible, although some restructuring might have been desirable.

A description of the main differences between this European Standard and ISO 1073/II is given in Annex B. General information on the implementation of the OCR-B shapes, taken from ISO 1073/II, has been included in Annex C.

In connection with the verification of the recognition properties of the Euro sign, some limited verification was also done on special letters identified during the JTC 1/SC 2 revision work as needed in OCR-B. The extent of this verification is not sufficient for the inclusion of the letters in the OCR-B repertoire at present, but the issue is described in Annex D, as a basis for possible future inclusion work.

1 Scope

This European Standard defines a set of glyph images designated OCR-B, intended primarily for use in Optical Character Recognition (OCR) applications, but suitable also for visual, i.e. human, reading. It does not relate any coding scheme with these images (see clause 5).

This European Standard is based on the ISO standard 1073 part II. It differs from that standard in extending normatively the set of glyph images with the Euro currency sign, and also in deleting some glyphs not relevant in present-day OCR processing. It further adds information on a number of glyph images corresponding to characters specific to some European-origin languages.

NOTE In ISO 1073 Part II the term “character” is used not only in its strict sense, but also to mean the printed images used for their visual, i.e. printed, representations. In this European Standard the term “glyph image” is used in
the latter sense.

This European Standard contains information on nominal dimensions for the glyph images. Tolerances, printing quality and other characteristics of the formats needed to satisfy interchange requirements are covered in other standards (see clause 3).

The glyph image set contains
117 glyph images comprising digits, capital and small letters, diacritical marks, and symbols. It also contains a definition for SPACE. The diacritical marks are designed for combination with small letters to produce composite glyph images complementing the basic image repertoire.

2 Conformance

A
printing or OCR reading device is in conformance with this standard if it can generate/recognize, for either or both of the defined styles (see clause 6) and in one or more of the specified sizes (see clause 7), all or part of the specified glyph image subsets (see clause 9).

A claim of conformance
shall specify all the images in (each of) the style(s) and size(s) generated/recognized. Such a specification shall take the form of a reference to one of the subsets, a list of the images generated/recognized, or a combination of those.

Additionally, a printing or OCR reading device must claim conformance to International Standard ISO 1831 (see clause 3).

Printed images produced by an OCR-B printing device are in conformance with this standard if their nominal shapes and dimensions are in accordance with their respective reference drawing(s) and, in the case of the Euro sign glyph image, with Annex A (see clause 14), with the claimed conformance to tolerances and printing quality factors specified in standard ISO 1831 considered.

3 Normative references

This European Standard incorporates, by dated or undated reference, provisions from other publications. These normative references
are cited at the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent amendments to or revisions of any of these publications apply to this European Standard only when incorporated in it by amendment or revision. For undated references the latest edition of the publication referred to applies.

ISO 1831-1980, Printing specifications for optical character recognition.
ISO/IEC 9541-3:1994, Information technology - Font information interchange - Part 3: Glyph shape representation.
OCR-B character reference drawings and glyph definition (see clause 14).

4 Terms and definitions

For the purposes of this European Standard, the following terms and definitions apply:

4.1 character
A member of a set of elements used for the organisation, control or representation of data.

4.2 coded character set
A set of characters, defined by unambiguous rules that establish the character set and the relationship between the characters of the set and their coded representations.

4.3 composite glyph image
An image printed on paper or any other medium intended for OCR applications, obtained by superimposing two or more glyph images on the same area.

4.4 glyph
A recognizable abstract graphic symbol which is independent of any specific design.

4.5 glyph image
An image of a glyph, as obtained from a glyph representation printed on paper or any other medium intended for OCR applications.

NOTE The definition above of
“coded character set” differs slightly from definitions in ISO/IEC standards, and the definition of “glyph image” is more limited. The definition of “composite glyph image” is specific to this standard (at the time of its publication).

5 Coding in OCR applications

This standard defines a set of glyph images, but does not specify corresponding characters, and relates no coding with the images.

The images have been named as far as possible in the same way as the characters with corresponding glyphs in the ISO/IEC standard 10646-1 (see Bibliography), but this does not imply any normative association between the OCR-B glyph images according to this European Standard and the characters of either ISO/IEC 10646-1 or any other standard for coded character sets.

Printing and/or OCR applications based on this European Standard must therefore define, through reference to other standards or otherwise, the set of glyph images which is available for printing and/or shall be recognized, and for each image the corresponding character and its coding.

6 OCR-B styles

The OCR-B glyph images are defined by this standard in two different styles.

The “constant-strokewidth” style is intended primarily for printer equipment in which the width of the strokes of the images is less controllable. This is for instance the case for some types of mechanical printers.

The “letterpress” style is intended for printing equipment which can reproduce fine details with high accuracy. For aesthetic reasons, the strokewidths of the letterpress images are varied deliberately, and the stroke endings are specially designed.

The shapes of the glyph images for the two styles are specified (with the exception of the Euro sign glyph) by reference drawings. The cons