DD CEN/TS 15873:2009
ICS 03.240; 35.240.60
NO COPYING WITHOUT BSI PERMISSION EXCEPT AS PERMITTED BY COPYRIGHT LAW

DRAFT FOR DEVELOPMENT

Postal Services - Open Standard Interface - Address Data File Format for OCR/VCS Dictionary Generation

This Draft for Development was published under the authority of the Standards Policy and Strategy Committee on 31 January 2010.

© BSI 2010
ISBN 978 0 580 64241 8

Amendments/corrigenda issued since publication
Date    Comments

National foreword

This Draft for Development is the UK implementation of CEN/TS 15873:2009.

This publication is not to be regarded as a British Standard.

It is being issued in the Draft for Development series of publications and is of a provisional nature. It should be applied on this provisional basis, so that information and experience of its practical application can be obtained.

Comments arising from the use of this Draft for Development are requested so that UK experience can be reported to the international organization responsible for its conversion to an international standard. A review of this publication will be initiated not later than 3 years after its publication by the international organization so that a decision can be taken on its status. Notification of the start of the review period will be made in an announcement in the appropriate issue of Update Standards.

According to the replies received by the end of the review period, the responsible BSI Committee will decide whether to support the conversion into an international Standard, to extend the life of the Technical Specification or to withdraw it. Comments should be sent to the Secretary of the responsible BSI Technical Committee at British Standards House, 389 Chiswick High Road, London W4 4AL.

The UK participation in its preparation was entrusted to Technical Committee SVS/4, Postal services.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

Compliance with a British Standard cannot confer immunity from legal obligations.

TECHNICAL SPECIFICATION
SPÉCIFICATION TECHNIQUE
TECHNISCHE SPEZIFIKATION

CEN/TS 15873
March 2009

ICS 03.240; 35.240.60

English Version

Postal Services - Open Standard Interface - Address Data File Format for OCR/VCS Dictionary Generation

Services postaux
- Interface de standard ouvert - Format de fichiers de données d'adresses pour la génération du dictionnaire OCR/VCS

Postalische Dienstleistungen - Offene Normschnittstelle - Adressdateiformat für die Generierung von Wörterbüchern in OCR/Videocodier-Systemen

This Technical Specification (CEN/TS) was approved by
CEN on 1 March 2009 for provisional application.

The period of validity of this CEN/TS is limited initially to three years. After two years the members of CEN will be requested to submit their comments, particularly on the question whether the CEN/TS can be converted into a European Standard.

CEN members are required to announce the existence of this CEN/TS in the same way as for an EN and to make the CEN/TS available promptly at national level in an appropriate form. It is permissible to keep conflicting national standards in force (in parallel to the CEN/TS) until the final decision about the possible conversion of the CEN/TS into an EN is reached.

CEN members are the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and United Kingdom.

EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG

Management Centre: Avenue Marnix 17, B-1000 Brussels

© 2009 CEN All rights of exploitation in any form and by any
means reserved worldwide for CEN national Members.
Ref. No. CEN/TS 15873:2009: E

Contents

Foreword
1 Introduction
2 Scope and purpose
  2.1 Scope
  2.2 Purpose
3 Related Standards
  3.1 UPU S42
4 Symbols and Abbreviations
5 XML Schema adressTree
  5.1 , and
  5.2 Address Tree in , and
  5.3 Attributes for , and
  5.4 String parts in , and
  5.5 Ranges in , and
  5.6 Aliases in , and
  5.7 other XML files
  5.8 Linking addresses via
  5.9 Project specific part of the XML schema
6 XML Schema addressDeltaTree
  6.1 Joining deltas via and file names
  6.2 Update actions , and
7 Miscellaneous
Annex A
  A.1 General XML Schema part
  A.2 Example for a project specific XML Schema part
  A.3 Initial addressTree Example
  A.4 Update addressDeltaTree Example
  A.5 Updated addressTree Example

Foreword

This document (CEN/TS 15873:2009) has been prepared by Technical Committee CEN/TC 331 “Postal Services”, the secretariat of which is held by NEN.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. CEN and/or CENELEC shall not be held responsible for identifying any or all such patent rights.

According to the CEN/CENELEC Internal Regulations, the national standards organizations of the following countries are bound to announce this Technical Specification: Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and the United Kingdom.

NOTE This document has been prepared by experts from CEN/TC
331 and UPU, in the framework of the Memorandum of Understanding between UPU and CEN.

1 Introduction

In initial meetings of CEN/TC 331/WG 3, interfaces that would benefit from standardization were identified and agreed on. Candidates for Open Interface
standardization are:
- interface between the image handler and automatic address readers or video coding places;
- interface from machine control to Barcode Printers;
- interface from machine control to Barcode Reader / Verifier;
- interface between scanner, image handler and machine control;
- file format of Sort Plan;
- MIS Interface (Statistics);
- file format of Address data files.

The new intended standard deals with the file format of Address Data Files.

OCR results and video coder inputs have to be verified against the “real” existing addresses in order to reach high recognition rates combined with low error rates. For that purpose postal operators provide postal address directories to the OCR/VCS suppliers. Usually different postal operators use different file formats for these (source) directories.

In typical postal automation systems these files are processed by directory generation software, which creates application-specific loadable data. This data, usually referred to as the “operational directory”, is heavily compressed and contains access tables tailored for the specific reading software. Usually different OCR/VCS suppliers use different operational directory formats.

This standard
shall define a common Address Data File format for postal address directories to be provided by the postal operators to the OCR/VCS suppliers. This Address Data File format shall be designed to hold all information necessary to support address reading and video coding software, including data required for special recognition tasks, e.g. forwarding applications.

2 Scope and purpose

2.1 Scope

This document defines a file format for the generation of postal address directories. It is designed to hold all information necessary to support address reading
software, including data required for forwarding applications.

In typical postal automation systems these files are processed by directory generation software, which creates application-specific loadable data. This data, usually referred to as the operational directory, is heavily compressed and contains access tables tailored for the specific reading software.

Topics external to the file, such as compression, checksums, the interface for transmission to the supplier, modification permissions, error handling on inconsistent data and undo in updates, are not in the scope of this document.

2.2 Purpose

The format has been designed with the following requirements in mind:
- must be able to hold the following data:
  - addresses composed of address components (including aliases and range-data);
  - person and organization names;
  - address codes typically used as sort codes;
  - links between addresses, e.g. for use in forwarding;
- should not restrict character encoding;
- easily customizable for specific applications;
- should allow complete as well as incremental updates, i.e. change-only data;
- it must be possible to split data into multiple files for better handling.

The ideas behind this format are as follows:
- The format is based on XML.
- The basic XML structure is general. Project specifics are coded as attributes. (The term "project" is used throughout this document to describe a specific application such as address data for a specific country or postal organization.) This should make it easier to build project-independent parsers and tools.
- Address data can be structured hierarchically. An address component that appears in many addresses shall be written once in the XML address tree, as the parent node of all addresses in which it is used.
- Beyond the pure address data, there are general as well as optional project-specific attributes on the level of address components and string parts.
- To favour faster parser execution and smaller file sizes, the names of XML elements that appear very often are short strings.
- Semantics are defined only in a basic manner and have to be completed in the project-specific tailoring process. For example, a street without numbers in the data may be interpreted either as a street that has no numbers or as a street where all numbers are valid.

Users must therefore be aware that the interoperability of this Technical Specification may be limited to the
specific project.

3 Related Standards

3.1 UPU S42

1) Beginning with version -5, UPU S42 is a two-part standard. Part a contains concepts and the theoretical language description. Part b contains practical examples from different countries and may be supplemented with new examples in the future.
2) UPU S42a defines the components an address is composed of, as well as the postal entities that can be “described” using these address components. The standard goes into great detail in defining a globally usable set of specific address components such as “postcode”, “door”, ...
3) UPU S42b describes how to write an address given its constituent address components. It uses templates to describe the order, line breaks, etc. The templates are country-specific (US, Brazil, England, ...) and also use a country-specific subset of the globally defined types.
4) UPU S42 address components are assumed to have a type and a string. They do not have additional attributes and do not have aliases.
5) UPU S42 does not define a format for an individual address (a collection of address components) and does not define a format for an address directory (a set of addresses).
6) UPU S42 has no concept of sort codes or forwarding information.

UPU S42 will not conflict with the format defined in this document, as it targets a completely different application and type of information. The only thing it has in common with address data is the address-component definitions themselves. These could be used in customizing the ADF for a specific project. UPU S42's excellent glossary should be reused where applicable.

4 Symbols and Abbreviations

XML  Extensible Markup Language
ADF  Address Data File

5 XML Schema adressTree

The syntax is
described as an XML schema, divided into a general part and a project-specific part. The general part of the XML Schema defines the basic structures. It uses some types and attribute groups that are defined in the project-specific part. Basically, the structure spans a tree of address components represented
by XML elements . The general part of the schema is listed in section A.1. The project-specific part is explained in section 5.9. This document also contains another XML Schema, addressDeltaTree, explained in chapter 6.

The following Figure 1 shows the general structure of the XML schema. Since the
project-specific part does not change the general structure, the diagram is independent of any project specifics.

Figure 1 General data structure of the XML schema

5.1 , and

Below the XML root element is one and one section. The stores a string and optionally a list of global aliases. Aliases are described in section 5.6. A version string of the data contained in the file is stored in the element.

5.2 Address Tree in , and

Addresses are stored in an address tree corresponding to the XML elements , and . is the root node of this tree, address components stored in are the nodes and may be additional leaves. An explanation of the element follows in section 5.8.

One complete address corresponds to one leaf's root path. Each node's root path identifies a partial address. Other XML elements and attributes carry additional information for the address tree
node. allows to split data into multiple files.

One address component in XML element holds a type mentioned in attribute tp and a string in child elements or and . Attribute tp and other optional attributes for are described in section 5.3. Other optional child elements are described in the following sections.

In this context one address component holds just a name with a type and does not necessarily describe a real thing or place. Abstract data such as delivery point codes, sort codes or the like may also be stored in address components represented as XML elements .

Example: Some addresses in various formats:

In a table:

Country  City      Street      HNr
GERMANY  BERLIN
GERMANY  KONSTANZ  BUECKLESTR  1
GERMANY  KONSTANZ  BUECKLESTR  2
GERMANY  KONSTANZ  BUECKLESTR  3
GERMANY  KONSTANZ  BUECKLESTR  4

As address tree:

Country GERMANY
  City BERLIN
  City KONSTANZ
    Street BUECKLESTR
      HNr 1
      HNr 2
      HNr 3
      HNr 4

Figure 2 Addresses formatted as an address tree

Address tree representation in XML:

0 GERMANY BERLIN KONSTANZ BUECKLESTR 1 2 3 4

The following verbose XML representation of this example is more like the address table. Each address is written as a child path of the element. Both XML representations are equivalent. Usage of the short form as shown above is strongly recommended, due to less file size overhead and less risk of dupl