CAN/CSA-ISO/IEC 15444-6A-2008: Information technology - JPEG 2000 image coding system - Part 6: Compound image file format - AMENDMENT 1: Hidden text metadata


Reference number: ISO/IEC 15444-6:2003/Amd.1:2007(E)
© ISO/IEC 2007

Information technology - JPEG 2000 image coding system - Part 6: Compound image file format
AMENDMENT 1: Hidden text metadata

Technologies de l'information - Système de codage d'image JPEG 2000 - Partie 6: Format de fichier d'image de composant
AMENDEMENT 1: Métadonnées de texte caché

Amendment 1:2008 to National Standard of Canada CAN/CSA-ISO/IEC 15444-6:04

Amendment 1:2007 to International Standard ISO/IEC 15444-6:2003 has been adopted without modification (IDT) as Amendment 1:2008 to CSA Standard CAN/CSA-ISO/IEC 15444-6:04. This Amendment was reviewed by the CSA Technical Committee on Information Technology (TCIT) under the jurisdiction of the Strategic Steering Committee on Information Technology and deemed acceptable for use in Canada.

September 2008

© International Organization for Standardization (ISO), 2007. All rights reserved.
© International Electrotechnical Commission (IEC), 2007. All rights reserved.
NOT FOR RESALE.

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2007

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

Amendment 1 to ISO/IEC 15444-6:2003 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

Add the following normative references to 2.2:

- IETF RFC 1950, ZLIB Compressed Data Format Specification version 3.3, May 1996
- IETF RFC 1951, DEFLATE Compressed Data Format Specification version 1.3, May 1996
- IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
- IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, August 1998
- W3C, Cascading Style Sheets, level 1 (CSS1) Specification, http://www.w3.org/pub/WWW/TR/REC-CSS1
- W3C, Cascading Style Sheets, level 2 (CSS2) Specification, http://www.w3.org/TR/REC-CSS2
- W3C, HTML 4.01 Specification, http://www.w3.org/TR/html401
- W3C, XHTML 1.0 Extensible HyperText Markup Language, Second Edition, http://www.w3.org/TR/xhtml1
- W3C, XML Schema Part 0: Primer, Second Edition, http://www.w3.org/TR/xmlschema-0
- W3C, XML Schema Part 1: Structures, Second Edition, http://www.w3.org/TR/xmlschema-1
- W3C, XML Schema Part 2: Datatypes, Second Edition, http://www.w3.org/TR/xmlschema-2

Add the following terms and definitions to Clause 3:

3.23 hidden text
symbolic representation for the characters and words found in an image

3.24 annotation
particular region of a page in a JPM document that has associated a URL reference, a note or a highlight

3.25 hidden text XML
XML data which describe hidden text and annotations for a single page in a JPM file and which conform to the schema in Annex H

3.26 compressed hidden text XML
hidden text XML data compressed using the mechanisms defined in F.2

3.27 hidden text UUID box
UUID box containing compressed hidden text XML

3.28 hidden text XML Schema
XML Schema for hidden text XML, as defined in H.1

Add the following abbreviations to Clause 4:

HTX Hidden Text XML

Add the following subclause after 5.2.8:

5.3 Hidden Text Metadata

Hidden text metadata is data representing the text, text elements and text flow associated with an image. In the context of this standard, hidden text is associated with a particular region of a page in a JPM document. Common uses for hidden text include text searching and highlighting, cut-and-paste, and text-to-speech processing. Hidden text describes the flow of the text on a page as well as the text elements.

JPM allows a rich, multiple content-type representation of a document. Each region of a page may be encoded with a compression technique best suited to its characteristics. In regions containing text, high fidelity reproduction of the source image is retained by not replacing the text regions with a character-based rendition through OCR, but rather by using advanced coding methods such as JBIG2. Even OCR results with a 99 percent accuracy contain substantial numbers of errors per page which require expensive human labour to correct. The searchable nature of a character-based rendition can be obtained instead by associating hidden "dirty OCR" results with the corresponding text image.
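To put the 99 percent figure in perspective, the short calculation below estimates how many character errors such accuracy leaves on a page and what share of words is affected. The page size (2,500 characters), the average word length (5 characters) and the assumption that character errors are independent are illustrative choices, not values taken from this standard.

```python
def expected_char_errors(chars_per_page: int, char_accuracy: float) -> float:
    """Expected number of misrecognized characters on one page."""
    return chars_per_page * (1.0 - char_accuracy)


def word_error_rate(char_accuracy: float, word_length: int) -> float:
    """Probability that a word contains at least one wrong character.

    Assumes character errors are independent, which is a simplification.
    """
    return 1.0 - char_accuracy ** word_length


# Hypothetical page: 2,500 characters, average word length of 5 characters.
errors = expected_char_errors(2500, 0.99)
faulty_words = word_error_rate(0.99, 5)
print(f"{errors:.0f} character errors per page")
print(f"{faulty_words:.1%} of words contain at least one error")
```

Under these assumptions a page carries about 25 wrong characters and roughly one word in twenty is affected, which is why exact-match searching over uncorrected OCR output misses real occurrences.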

This standard defines a format for hidden text metadata.

A key issue with hidden text is capturing the ambiguities seen by the OCR engine in a way that allows properly-constructed search engines to find whether and where a given word might be present in a text image. Properly captured, this information provides nearly as much searching precision as an approach using human-corrected "clean OCR" data, but at much lower cost. Search results are most useful where there are fewer false positives to weed through. Intelligent search engines can take account of such data as confidence and alternate characters or alternate words to appropriately alter the ranking of search hits on less certain characters.

In many cases, true ambiguity exists in the image and it would confuse a human observer as well. In these cases, saving confidence values for characters and their alternatives or describing several alternative parsings of a string of characters into words can amount to saving the state of the OCR process to allow the problem to be revisited in a later stage, perhaps by a different engine or by access to first a general dictionary and then a set of more specialized dictionaries.

As a last step, when a person is presented with the search results, they can dismiss a given search hit by comparison to the actual image data for a character or word. For this purpose (and to allow later-stage OCR processes to resume analysis on the image), bounding box rectangles can be defined for all the elements of the hidden text such as characters, words, lines, paragraphs and regions. By indicating a container relationship among these items, intelligent navigation and text selection can occur at character, word, line, paragraph boundaries. A reading order through these rectangles can be defined for what was in the image just a random placement of unrelated glyphs.

While it is primarily designed for use by machines such as search engines, the hidden text can also serve as a crude (if "dirty") or adequate (if "clean") alternate representation for an image region to allow it to display on character-based devices (such as mobile phones) or small-area graphics devices (such as PDAs).

Annotations are added to the document typically with a WYSIWYG editor to indicate URL references, notes, and to highlight key sections of the document text. Each annotation is associated with a particular region of a page in a JPM document.

XML is used for hidden text and annotations because it is a format widely used to store structured information, and can be machine processed.

Renumber the original 5.3 as 5.4.

Add the following rows at the correct alphabetical location in Table A.1 of A.4:

Table A.1 - Boxes defined or referenced within this International Standard

Box name             | Type              | Superbox | Comments (Informative)
Hidden Text Metadata | htxb (0x68747862) | Yes      | This optional box contains hidden text and annotations.
HTX Reference Box    | phtx (0x70687478) | No       | This optional box can be used to point to Hidden Text Metadata box contents at top file level.

Add the following subclauses after B.6.4:

B.6.5 Hidden Text Metadata box (superbox)

Box type: htxb (0x68747862)
Container: Page box or File
Mandatory: No
Quantity: At most one if the container is the Page box, any number if the container is the file
Location: Anywhere in the Page box after the Page Header box if the container is the Page box, or anywhere after the File Type box if the container is the file

The Hidden Text Metadata box (htxb) serves as a container for hidden text data.
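The hexadecimal values quoted beside the box types htxb and phtx are consistent with reading the four ASCII characters as one big-endian 32-bit integer. The sketch below checks that reading; the function names are hypothetical helpers, not part of any JPM library.

```python
import struct


def box_type_to_int(box_type: str) -> int:
    """Pack a four-character box type into its big-endian 32-bit value."""
    raw = box_type.encode("ascii")
    if len(raw) != 4:
        raise ValueError("box type must be exactly four ASCII characters")
    return struct.unpack(">I", raw)[0]


def int_to_box_type(value: int) -> str:
    """Recover the four-character box type from its 32-bit value."""
    return struct.pack(">I", value).decode("ascii")


print(hex(box_type_to_int("htxb")))
print(int_to_box_type(0x70687478))
```

A reader scanning a file for these boxes could compare the type field of each box header against these two constants.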

The Hidden Text Metadata box is a superbox that may contain an optional Label box and must contain one of two box types. It may either contain one XML box containing hidden text metadata, or it may contain one UUID box containing hidden text metadata as specified in F.2.

The type of a Hidden Text Metadata box shall be htxb (0x68747862). The contents of a Hidden Text Metadata box shall be as in Figure B.25:

[Figure B.25 - Organization of the contents of a Hidden Text Metadata box (two alternative layouts joined by "or"; figure not reproduced)]

B.6.6 HTX Reference box

Box type: phtx (0x70687478)
Container: Page box
Mandatory: No
Quantity: At most one
Location: Anywhere in the Page box after the Page Header box

If the hidden text for a page is contained in a Hidden Text Metadata box within the corresponding Page box, this box must not appear. If the hidden text for a page is contained in a series of one or more Hidden Text Metadata boxes at the file level, one HTX Reference box has to be included in the corresponding Page box.

The type of a HTX Reference box shall be phtx (0x70687478). The contents of a HTX Reference box shall be as in Figure B.26:

[Figure B.26 - Organization of the contents of a HTX Reference box (figure not reproduced)]

Rtyp: Referenced box type. This field specifies the actual type (as would be found in the TBox field in an actual box header) of the box referenced by this HTX Reference box. However, a reader shall not attempt to locate a physically stored box header for the box represented by this HTX Reference box, as it is legal to use a HTX Reference box to create a new box that is not contiguously contained in other locations within this or other files, and thus the box header will not exist.

flst: Fragment List box. This box specifies the actual locations of the fragments of the referenced HTX element. When those fragments are concatenated, in order, as specified by the Fragment List box definition, the resulting byte-stream shall be the contents of the referenced HTX element, which contains hidden text data, and shall not include the box header fields. The format of the Fragment List box is specified in B.5.1.1. If Rtyp is uuid and the UUID signals deflate compression as defined in F.2, the number of fragments of the Fragment List box must be one.

label: Label box. This optional box may contain a Label box which specifies a label or name for the hidden text of the corresponding page.
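The concatenation rule described for flst can be sketched as follows. This is a minimal illustration under stated assumptions: the fragment list is taken as already parsed into hypothetical (offset, length) pairs, all fragments are assumed to live in a single byte source (the standard also allows fragments in other files), and the binary layout of the Fragment List box from B.5.1.1 is not modelled.

```python
import io
from typing import BinaryIO, List, Tuple

# Hypothetical, already-parsed fragment entry: (byte offset, byte length).
Fragment = Tuple[int, int]


def reassemble_htx(source: BinaryIO, fragments: List[Fragment],
                   rtyp: str, deflate: bool = False) -> bytes:
    """Concatenate fragments in list order to rebuild the HTX element contents.

    `deflate` stands for "the UUID signals deflate compression"; how that is
    signalled is defined in F.2 and is not reproduced here.
    """
    if rtyp == "uuid" and deflate and len(fragments) != 1:
        raise ValueError("deflate-compressed HTX data must use exactly one fragment")
    parts = []
    for offset, length in fragments:
        source.seek(offset)
        chunk = source.read(length)
        if len(chunk) != length:
            raise ValueError("fragment extends past the end of the source")
        parts.append(chunk)
    # The result holds box contents only; no box header is included.
    return b"".join(parts)


# Two fragments separated by unrelated bytes in the same source.
buffer = io.BytesIO(b"xxxxhidden yyyytext")
print(reassemble_htx(buffer, [(4, 7), (15, 4)], rtyp="xml "))
```

Keeping the single-fragment check next to the concatenation loop mirrors the text: a deflate stream split across several fragments is not permitted, so the reader can reject it before reading any bytes.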

The structure of a Label box is specified in B.6.3.

Table B.31 - HTX Reference box contents data structure values

Parameter | Size (bits) | Value
Rtyp      | 32          | See Table B.32
flst      | Variable    | Variable
label     | Variable    | Variable

Table B.32 - Legal Rtyp values

Value            | Meaning
xml\040          | The referenced HTX data shall be contained in an XML box as described in Annex F. The XML box is defined in I.7.1 of ITU-T Rec T.800 (2002) | ISO/IEC 15444-1:2004.
uuid             | The referenced HTX data shall be contained in a UUID box as described in Annex F. The UUID box is defined in I.7.2 of ITU-T Rec T.800 (2002) | ISO/IEC 15444-1:2004.
All other values | Reserved

Renumber the original B.6.5 as B.6.7.

Add the following annexes after Annex E:

Annex F
(normative)
Hidden Text and Annotations Storage

F.1 Storage of HTX in JPM

A hidden text XML element is restricted to represent text for a single page. It is stored in a Hidden Text Metadata box as defined in B.6.5. The Hidden Text Metadata box either appears within the corresponding Page box or is placed at the top level of the file. If placed at the top level, an HTX Reference box as defined in B.6.6 must be placed in the corresponding Page box to point to the Hidden Text Metadata boxes that compose the hidden text of the page.

When a Hidden Text Metadata box is small in size, it is reasonable to place it directly in the Page box. In keeping with the usual JPM approach, large objects are generally placed at the top file level. In this case, the much smaller HTX Reference box is placed in the Page box and points to the actual data. Also in this case a single HTX Reference box can point to multiple file level Hidden Text Metad
