ANSI/AIIM MS55-1994

Standard for Information and Image Management - Standard Recommended Practice for the Identification and Indexing of Page Components (Zones) for Automated Processing in an Electronic Image Management (EIM) Environment

Approved January 4, 1994

Association for Information and Image Management
1100 Wayne Avenue, Suite 1100
Silver Spring, Maryland 20910-5603
Telephone 301/587-8202

Abstract

This document recommends the identification and indexing of page components (zones) within documents for use in automated processing within the context of electronic image management (EIM) systems.

Contents

Foreword
1 Audience, scope and purpose
2 Normative references
3 Definitions
4 Information kept to describe a document
5 Information kept to describe a page
6 Information kept to describe a zone
7 Syntax of zone structure record
8 Example of a zone structure record

Figures
1 x- and y-length of a page
2 Page orientation
3 Zone location and size
4 Zones and zone structure records

Table
1 Data and interpretation for Figure 4

Foreword

(This foreword is not a part of the American National Standard for Information and Image Management - Standard Recommended Practice for the Identification and Indexing of Page Components (Zones) for Automated Processing in an Electronic Image Management (EIM) Environment, ANSI/AIIM MS55-1994.)

This Standard Recommended Practice specifies the identification and indexing of page components (zones) within documents for use in automated processing within the context of electronic image management systems. This indexing information is relevant to three structural levels within a document: Document, Page and Zone (an area within a single page).

An AIIM Standards Committee, C15.8 Electronic Imaging Software and Systems/Document Indexing, is developing an ANSI/AIIM technical report (TR) on indexing issues in electronic image management (EIM) systems. Upon completion, this TR will be related to ANSI/AIIM MS55. The same AIIM subcommittee, C15.8, is also drafting another Standard Recommended Practice covering Electronic Folder Interchange, which will also be a related American National Standard to MS55.

An ANSI/AIIM Technical Report that is in development will relate to this Standard Recommended Practice. It is ANSI/AIIM TR34, Technical Report for Information and Image Management - Methods for Analyzing User Requirements for Image Compression and for Selecting an Appropriate Image Compression Method to Match User Requirements.

Suggestions for improvement of this Standard Recommended Practice are welcome. They should be sent to the Chairman of the AIIM Standards Board, Association for Information and Image Management, 1100 Wayne Avenue, Suite 1100, Silver Spring, MD 20910.

The AIIM Standards Board had the following members at the time it approved this Standard Recommended Practice:

Marilyn E. Courtot, Chair
Thomas C. Bagg
Avi Bender
Jewel M. Drass
John C. Gale
Bruce A. Holroyd
Charles A. Plesums
George Thoma
Stephen Urban
Eileen Usovicz
Herbert J. White

Association for Information and Image Management
National Institute of Standards and Technology
CONTEL Federal Systems
Bell

the more heterogeneous a collection of documents, the less likely it is that this Standard Recommended Practice will be useful.

1.4 Exclusions

This Standard Recommended Practice does not address issues of image preprocessing, such as page deskewing and forms removal. It is assumed that deskewing, if required, is applied prior to the location of zones and that forms removal, if required, is applied prior to data interpretation.

The record described in this Standard Recommended Practice is independent of other file management and file descriptor records associated
with electronic image management. Examples of other record types excluded from, and independent of, this Standard Recommended Practice are:

- Image header records or image file directory records describing how an image file is internally structured (resolution, size, tiling, compression type, photometric interpretation, etc.)
- Image header records or image file directory records providing background, business description and audit trail information used for search, retrieval and management of image files as business records.

2 Normative references

All standards and publications are subject to revision. When the following documents are superseded by an approved revision, that revision may apply.

2.1 Referenced international standards

CCITT Recommendation T.411 (1992): ISO 8613-1:1992. Information Technology - Open Document Architecture (ODA)
and interchange format - Introduction and general principles.

2.2 Related international standards

CCITT Recommendation T.412 (1992): ISO 8613-2:1992. Information Technology - Open Document Architecture (ODA) and interchange format - Document structures.

CCITT Recommendation T.414 (1992): ISO 8613-4:1992. Information Technology - Open Document Architecture (ODA) and interchange format - Document profile.

CCITT Recommendation T.415 (1992): ISO 8613-5:1992. Information Technology - Open Document Interchange Format (ODIF).

CCITT Recommendation T.416 (1992): ISO 8613-6:1992. Information Technology - Open Document Architecture (ODA) and interchange format - Character content architectures.

CCITT Recommendation T.417 (1992): ISO 8613-7:1992. Information Technology - Open Document Architecture (ODA) and interchange format - Raster graphics content architectures.

CCITT Recommendation
T.418 (1992): ISO 8613-8:1992. Information Technology - Open Document Architecture (ODA) and interchange format - Geometric graphics content architectures.

2.3 Related American national standards

ANSI/AIIM MS44-1988 (R1993), Standard Recommended Practice for quality control of image scanners.

ANSI/AIIM MS53-1993, Standard Recommended Practice - file format for storage and exchange of images - bi-level image file format: part 1.

2.4 Referenced publications

ANSI/AIIM TR2-1992, Technical Report for Information and Image Management - Glossary of imaging technology.

2.5 Related publications

ANSI/AIIM TR21-1991, Technical Report for Information and Image Management - Recommendation for identifying information to be placed on write-once-read-many (WORM) and rewritable optical disk cartridge label(s) and OD cartridge packaging (shipping containers).

ANSI/AIIM TR25-1990, Technical Report for Information and Image Management - The use of optical disks for public records.

3 Definitions

The following definitions apply to terms that appear in this Standard Recommended Practice. Other terms are defined in AIIM TR2, Technical Report for Information and Image Management - Glossary of Imaging Technology.

3.1 document: A collection of zero or more pages that are related (linked/bound) to each other in some way appropriate to the application. In an electronic image management system, the provision of a zero-page document allows the creation of a document entity prior to capturing and linking its page(s).

3.2 origin of page: The origin of a page is its upper left corner after the page is rotated so that its relative presentation orientation is 0 degrees, i.e., page origin is the upper left corner when a page is positioned for normal human reading.

3.3 page: A page
is equivalent to one side of a 2-dimensional sheet (e.g., paper, microfilm, transparencies, etc.). In the case of input media other than paper, a page will be the data in a single image frame. (See ODA-8613, Part 1.) Note: A single sheet of paper printed on both sides and folded to form a multi-page document should be treated as either two, four or six pages, as appropriate. No minimum or maximum size is implied.

3.4 zone: A rectangular sub-area on a page, within the bounds of which all data is to be treated (interpreted, compressed, OCRed, stored, discarded, etc.) in the same way. A zone cannot be larger than a page or smaller than a pixel (a single scanning resolution element).

4 Information kept to describe a document

The following information regarding document structure is kept once for each document type.

4.1 Page count

Length: 4 ASCII bytes, 0-padding to the left

The cardinal number of pages in a document after scanning. Page count starts at 1 (e.g., 16 pages is encoded as 0016).

4.2 Measurement units

Length: 4 ASCII bytes, 0-padding to the left

Page and zone sizes may be measured in inches or centimeters. Only one measurement type can be specified throughout
a document. Two values are defined:

0002: inches
0003: centimeters

5 Information kept to describe a page

The following information regarding page structure is kept once for each page within each document type.

5.1 Page size (x,y)

Length: 8 ASCII bytes, three implied decimal places, with 0-padding to the left and right, as necessary

The x-length and y-length (in measurement units), where x is relative to the horizontal and y is relative to the vertical movement of the scan head. As an example of page size, an 8.5" x 11" page would be (x,y) = (00008500, 00011000), or a 21.6cm x 27.9cm page would be (x,y) = (00021600, 00027900). Page size is relative to page orientation as scanned and is independent of intended display orientation (e.g., portrait or landscape).

Figure 1 - x- and y-length of a page

5.2 Page orientation

Length: 4 ASCII bytes, 0-padding to the left

The orientation of the scanned page relative to the ideal orientation for human examination.

0000  page was scanned at the intended display orientation; the origin is at top left.
0090  page was scanned 90 degrees clockwise from the intended display orientation; origin is at top right.
0180  page was scanned 180 degrees from the intended display orientation; origin is at bottom right.
0270  page was scanned 270 degrees clockwise from the intended display orientation; origin is at bottom left.

Figure 2 - Page orientation (panels: orientation = 0, 90, 180, 270)

5.3 Zone count

Length: 4 ASCII bytes, 0-padding to the left

The number of zones defined for the page. Zones do not have to cover (tile) the page completely. Zones can overlap. It is possible, for example, to define a single zone covering the entire page (for purposes of compression) and to define one or more smaller zones within the single large zone (for purposes of performing specialized processing).

6 Information kept to describe a zone

The following information regarding zone structure is kept once for each zone within each page.

6.1 Zone location (x,y)

Length: 8 ASCII bytes, with three implied decimal places, 0-padding to the left and right as necessary

The (x, y) distance (in measurement units) of the upper left corner of the zone relative to the page origin. Orient the page at 0 degrees before determining the x- and y-axes. See section 5.1 for a coding example.

Figure 3 - Zone location and size

6.2 Zone size (x,y)

Length: 8 ASCII bytes, with three implied decimal places, 0-padding to the left and right as necessary

The (x, y) size of the zone (in measurement units). The x and y directions are always positive, relative to the page origin. See section 5.1 for a coding example.

6.3 Zone orientation

Length: 4 ASCII bytes, 0-padding to the left

The orientation of the zone relative to the page at its intended display orientation.
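As an illustration of the fixed-width coding used by the fields defined so far (4-byte counts and codes, 8-byte sizes and locations with three implied decimal places), the following Python sketch encodes and decodes such fields. It is a minimal sketch, not part of this Standard Recommended Practice: the function names are hypothetical, and rounding values finer than one thousandth of a measurement unit is an assumption, since the text does not say how such values are handled.

```python
from decimal import Decimal, ROUND_HALF_UP

# Orientation values listed in section 5.2 (degrees clockwise).
PAGE_ORIENTATIONS = (0, 90, 180, 270)


def encode_code(value: int) -> str:
    """Encode a count or code as 4 ASCII digits, zero-padded on the left."""
    if not 0 <= value <= 9999:
        raise ValueError(f"value does not fit in 4 digits: {value}")
    return f"{value:04d}"


def encode_measure(value) -> str:
    """Encode a size or location as 8 ASCII digits with three implied decimals.

    Assumption: values finer than 0.001 units are rounded half up.
    """
    thousandths = (Decimal(str(value)) * 1000).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    if not 0 <= thousandths <= 99_999_999:
        raise ValueError(f"value does not fit in 8 digits: {value}")
    return f"{int(thousandths):08d}"


def decode_measure(field: str) -> Decimal:
    """Decode an 8-digit field back into measurement units."""
    if len(field) != 8 or not (field.isascii() and field.isdigit()):
        raise ValueError(f"not an 8-digit ASCII field: {field!r}")
    return Decimal(int(field)) / 1000


def encode_orientation(degrees: int) -> str:
    """Encode a page or zone orientation using the values of section 5.2."""
    if degrees not in PAGE_ORIENTATIONS:
        raise ValueError(f"unsupported orientation: {degrees}")
    return encode_code(degrees)


if __name__ == "__main__":
    # The 8.5" x 11" page from section 5.1.
    print(encode_measure(8.5), encode_measure(11))  # 00008500 00011000
    print(encode_code(16))                          # 0016 (page count of 16)
    print(encode_orientation(90))                   # 0090
```

Decimal arithmetic is used here so that values such as 21.6 cm map to exactly 00021600 without binary floating-point error.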
To identify the zone orientation, the page must first be logically reoriented so that its origin is at the upper left.

Zone orientation handles page layouts that include pictures and captions that are rotated (e.g., for purposes of document folding). Zone orientation is represented in the same manner as page orientation (see section 5.2).

6.4 Spatial resolution (x,y)

Length: 8 ASCII bytes, with three implied decimal places, 0-padding to the left and right as necessary

The (x,y) spatial scanning resolution for the zone in dots-per-measurement
unit. The resolution is independent from zone to zone. Using the conversion factor of 1 inch equals 2.54 cm, 100 dpi equals 39.37 dpcm.

6.5 Gray-scale resolution

Length: 4 ASCII bytes, 0-padding to the left

The number of bits required to store distinct gray-values at each pixel in the scanned zone. This is also referred to as "depth". (Assume bit-packed, i.e., no bit-padding.) The depth is independent from zone to zone. Some typical values are:

0001  bitonal (bi-level) scan
0006  64-gray-level image
0008  256-gray-level image

6.6 Zone content

Length: 4 ASCII bytes, 0-padding to the left

The zone content code distinguishes among types of raster images based on the automated interpretation that may be applicable. It ranges from uninterpreted raster data to several other forms of information that are inherently more interpretable. Some examples of interpretation include OCR, ICR, HCR and conversion to vector formats. Even within a character-recognition application, input in certain zones may be constrained to numeric only. As well, some CAD or GIS applications may be able to (re)construct topology, while others may not be able to do so.

0001  raster image, no interpretation
0002  machine print, alphanumeric
0003  machine print, numeric only
0004  machine print, alpha only
0005  handprint, alphanumeric
0006  handprint, numeric only
0007  handprint, alpha only
0008  handwriting, alphanumeric
0009  handwriting, numeric only
0010  handwriting, alpha only
0011  signature
0012  line art, without preservation of topology
0013  line art, with topology (e.g., "intersection", "inside of", etc.)

7 Syntax of zone structure record

Each document type is described by an ASCII string of the following structure:

[Syntax definition not recoverable from the source. The surviving fragments indicate that the fields are ASCII coded, that (x,y) pairs are written as real,real, that a zone count is included, and that the zone content code is one of 0001 | 0002 | 0003 | 0004 | 0005 | 0006 | 0007 | 0008 | 0009 | 0010 | 0011 | 0012 | 0013.]

The stacking order of zones, for purposes of reconstructing a page, is "on-top-of", i.e., if Zone J and Zone J+1 overlap spatially, then Zone J+1 is on top of Zone J for display and printing purposes. On-top-of zones are opaque, i.e., the zone under does not show through.

8 Example of a zone structure record

The following example sh