Format: PDF, Pages: 52, Size: 1.14 MB

CAN CSA-ISO IEC 15444-6A-2008 Information technology JPEG 2000 image coding system Part 6 Compound image file format AMENDMENT 1 Hidden text metadata.pdf

Reference number: ISO/IEC 15444-6:2003/Amd.1:2007(E)
© ISO/IEC 2007

Information technology - JPEG 2000 image coding system - Part 6: Compound image file format
AMENDMENT 1: Hidden text metadata

Technologies de l'information - Système de codage d'image JPEG 2000 - Partie 6: Format de fichier d'image de composant
AMENDEMENT 1: Métadonnées de texte caché

Amendment 1:2008 to National Standard of Canada CAN/CSA-ISO/IEC 15444-6:04

Amendment 1:2007 to International Standard ISO/IEC 15444-6:2003 has been adopted without modification (IDT) as Amendment 1:2008 to CSA Standard CAN/CSA-ISO/IEC 15444-6:04. This Amendment was reviewed by the CSA Technical Committee on Information Technology (TCIT) under the jurisdiction of the Strategic Steering Committee on Information Technology and deemed acceptable for use in Canada.

September 2008

© International Organization for Standardization (ISO), 2007. All rights reserved.
© International Electrotechnical Commission (IEC), 2007. All rights reserved.
NOT FOR RESALE.

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2007

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or
mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56
CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

Amendment 1 to ISO/IEC 15444-6:2003 was
prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

Information technology - JPEG 2000 image coding system - Part 6: Compound image file format
AMENDMENT 1: Hidden text metadata

Add the following normative references to 2.2:

- IETF RFC 1950, ZLIB Compressed Data Format Specification version 3.3, May 1996
- IETF RFC 1951, DEFLATE Compressed Data Format Specification version 1.3, May 1996
- IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
- IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, August 1998
- W3C, Cascading Style Sheets, level 1 (CSS1) Specification, http://www.w3.org/pub/WWW/TR/REC-CSS1
- W3C, Cascading Style Sheets, level 2 (CSS2) Specification,
  http://www.w3.org/TR/REC-CSS2
- W3C, HTML 4.01 Specification, http://www.w3.org/TR/html401
- W3C, XHTML 1.0 Extensible HyperText Markup Language, Second Edition, http://www.w3.org/TR/xhtml1
- W3C, XML Schema Part 0: Primer, Second Edition, http://www.w3.org/TR/xmlschema-0
- W3C, XML Schema Part 1: Structures, Second Edition, http://www.w3.org/TR/xmlschema-1
- W3C, XML Schema Part 2: Datatypes, Second Edition, http://www.w3.org/TR/xmlschema-2

Add the following terms and definitions to Clause 3:

3.23 hidden text
symbolic representation for the characters and words found in an image

3.24 annotation
particular region of a page in a JPM document that has associated a URL reference, a note or a highlight

3.25 hidden text XML
XML data which describe hidden text and annotations for a single page in a JPM file and which conform to the schema in Annex H

3.26 compressed hidden text XML
hidden text XML data compressed using the mechanisms defined in F.2

3.27 hidden text UUID box
UUID box containing compressed hidden text XML

3.28 hidden text XML Schema
XML Schema for hidden text XML, as defined in H.1

Add the following abbreviations to Clause 4:

HTX    Hidden Text XML

Add the following subclause after 5.2.8:

5.3 Hidden Text Metadata

Hidden text metadata is data representing the text, text elements and text flow associated with an image. In the context of this standard, hidden text is associated with a particular region of a page in a JPM
document. Common uses for hidden text include text searching and highlighting, cut-and-paste, and text-to-speech processing. Hidden text describes the flow of the text on a page as well as the text elements.

JPM allows a rich, multiple content-type representation of a document. Each region of a page may be encoded with a compression technique best suited to its characteristics. In regions containing text, high fidelity reproduction of the source image is retained by not replacing the text regions with a character-based rendition through OCR, but rather by using advanced coding methods such as JBIG2. Even OCR results with a 99 percent accuracy contain substantial numbers of errors per page which require expensive human labour to correct. The searchable nature of a character-based rendition can be obtained instead by associating hidden "dirty OCR" results with the corresponding text image.
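
The remark about 99 percent accuracy can be made concrete with a little arithmetic. The sketch below is illustrative only: the page size of 2,000 characters and the assumption that character errors are independent are not taken from this standard.

```python
def expected_ocr_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of misrecognized characters on one page."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return chars_per_page * (1.0 - accuracy)


def prob_word_correct(word_length: int, accuracy: float) -> float:
    """Probability that every character of a word is recognized correctly.

    Assumes independent per-character errors (a simplifying assumption).
    """
    if word_length < 0:
        raise ValueError("word_length must be non-negative")
    return accuracy ** word_length


# Hypothetical page of 2,000 characters recognized at 99 percent accuracy.
errors = expected_ocr_errors(2000, 0.99)
eight_letter_word = prob_word_correct(8, 0.99)
```

Under these assumptions a page carries about 20 wrong characters, and an eight-letter word is fully correct only about 92 percent of the time, which is why a search over uncorrected OCR output benefits from stored confidences and alternates.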

This standard defines a format for hidden text metadata. A key issue with hidden text is capturing the ambiguities seen by the OCR engine in a way that allows properly-constructed search engines to find whether and where a given word might be present in a text image. Properly captured, this information provides nearly as much searching precision as an approach using human-corrected "clean OCR" data, but at much lower cost. Search results are most useful where there are fewer false positives to weed through. Intelligent search engines can take account of such data as confidence and alternate characters or alternate words to appropriately alter the ranking of search hits on less certain characters.

In many cases, true ambiguity exists in the image and it would confuse a human observer as well. In these cases, saving confidence values for characters and their alternatives or describing several alternative parsings of a string of characters into words can amount to saving the state of the OCR process to allow the problem to be revisited in a later stage, perhaps by a different engine or by access to first a general dictionary and then a set of more specialized dictionaries.

As a last step, when a person is presented with the search results, they can dismiss a given search hit by comparison to the actual image data for a character or word. For this purpose (and to allow later-stage OCR processes to resume analysis on the image), bounding box rectangles can be defined for all the elements of the hidden text such as characters, words, lines, paragraphs and regions. By indicating a container relationship among these items, intelligent navigation and text selection can occur at character, word, line, paragraph boundaries. A reading order through these rectangles can be defined
for what was in the image just a random placement of unrelated glyphs.

While it is primarily designed for use by machines such as search engines, the hidden text can also serve as a crude (if "dirty") or adequate (if "clean") alternate representation for an image region to allow it to display on character-based devices (such as mobile phones) or small-area graphics devices (such as PDAs).

Annotations are added to the document typically with a WYSIWYG editor to indicate URL references, notes, and to highlight key sections of the document text. Each annotation is associated with a particular region of a page in a JPM document.

XML is used for hidden text and annotations because it is a format widely used to store structured information, and can be machine processed.

Renumber the original 5.3 as 5.4.

Add the following rows
at the correct alphabetical location in Table A.1 of A.4:

Table A.1 Boxes defined or referenced within this International Standard

| Box name | Type | Superbox | Comments (Informative) |
|---|---|---|---|
| Hidden Text Metadata | htxb (0x68747862) | Yes | This optional box contains hidden text and annotations. |
| HTX Reference Box | phtx (0x70687478) | No | This optional box can be used to point to Hidden Text Metadata box contents at top file level. |

Add the following subclauses after B.6.4:

B.6.5 Hidden Text Metadata box (superbox)

Box type: htxb (0x68747862)
Container: Page box or File
Mandatory: No
Quantity: At most one if the container is the Page box, any number if the container is the file
Location: Anywhere in the Page box after the Page Header box if the container is the Page box, or anywhere after the File Type box if the container is the file

The Hidden Text Metadata box (htxb) serves as a container for hidden text data.
It is a superbox that may contain an optional Label box and must contain one of two box types. It may either contain one XML box containing hidden text metadata, or it may contain one UUID box containing hidden text metadata as specified in F.2.

The type of a Hidden Text Metadata box shall be htxb (0x68747862). The contents of a Hidden Text Metadata box shall be as in Figure B.25:

[Figure B.25: Organization of the contents of a Hidden Text Metadata box. The figure is not reproduced here; it shows two alternative layouts joined by "or".]

B.6.6 HTX Reference box

Box type: phtx (0x70687478)
Container: Page box
Mandatory: No
Quantity: At most one
Location: Anywhere in the Page box after the Page Header box

If the hidden text for a page is contained in a Hidden Text Metadata box within the corresponding Page box, this box must not appear. If the hidden text for a page is contained in a series of one or more
Hidden Text Metadata boxes at the file level, one HTX Reference box has to be included in the corresponding Page box.

The type of an HTX Reference box shall be phtx (0x70687478). The contents of an HTX Reference box shall be as in Figure B.26:

[Figure B.26: Organization of the contents of an HTX Reference box. The figure is not reproduced here.]

Rtyp: Referenced box type. This field specifies the actual type (as would be found in the TBox field in an actual box header) of the box referenced by this HTX Reference box. However, a reader shall not attempt to locate a physically stored box header for the box represented by this HTX Reference box, as it is legal to use an HTX Reference box to create a new box that is not contiguously contained in other locations within this or other files, and thus the box header will not exist.

flst: Fragment List box. This box specifies the actual locations of the fragments of the referenced HTX element. When those fragments are concatenated, in order, as specified by the Fragment List box definition, the resulting byte-stream shall be the contents of the referenced HTX element, which contains hidden text data, and shall not include the box header fields. The format of the Fragment List box is specified in B.5.1.1. If Rtyp is uuid and the UUID signals deflate compression as defined in F.2, the number of fragments of the Fragment List box must be one.

label: Label box. This optional box may contain a Label box which specifies a label or name for the hidden text of the corresponding page.
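
The fragment rule described for flst can be sketched as follows. This is a minimal illustration under stated assumptions, not the standard's own procedure: the fragments are taken as already-parsed (offset, length) pairs that all lie in the same file, the XML is assumed to be UTF-8, and every uuid payload is assumed to be a 16-byte UUID followed by a zlib (RFC 1950) stream. The function names are hypothetical, and the actual UUID value and compression signalling are defined in F.2, which is not reproduced here.

```python
import zlib
from typing import List, Tuple

RTYP_XML = b"xml "   # "xml" followed by a space (octal 040)
RTYP_UUID = b"uuid"


def assemble_fragments(file_bytes: bytes, fragments: List[Tuple[int, int]]) -> bytes:
    """Concatenate (offset, length) fragments in the order they are listed."""
    parts = []
    for offset, length in fragments:
        if offset < 0 or length < 0 or offset + length > len(file_bytes):
            raise ValueError("fragment lies outside the file")
        parts.append(file_bytes[offset:offset + length])
    return b"".join(parts)


def decode_htx(rtyp: bytes, fragments: List[Tuple[int, int]], file_bytes: bytes) -> str:
    """Return the hidden text XML referenced by an HTX Reference box (sketch)."""
    if rtyp == RTYP_XML:
        # Contents of an XML box: the XML itself, without any box header.
        return assemble_fragments(file_bytes, fragments).decode("utf-8")
    if rtyp == RTYP_UUID:
        # Assumption: the payload is always deflate-compressed, so the
        # single-fragment rule applies to every uuid reference here.
        if len(fragments) != 1:
            raise ValueError("compressed hidden text must use exactly one fragment")
        contents = assemble_fragments(file_bytes, fragments)
        payload = contents[16:]  # skip the 16-byte UUID
        return zlib.decompress(payload).decode("utf-8")
    raise ValueError("reserved Rtyp value")
```

A reader built this way never looks for a stored box header at the fragment offsets, which matches the warning given for Rtyp.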

The structure of a Label box is specified in B.6.3.

Table B.31 HTX Reference box contents data structure values

| Parameter | Size (bits) | Value |
|---|---|---|
| Rtyp | 32 | See Table B.32 |
| flst | Variable | Variable |
| label | Variable | Variable |

Table B.32 Legal Rtyp values

| Value | Meaning |
|---|---|
| xml\040 | The referenced HTX data shall be contained in an XML box as described in Annex F. The XML box is defined in I.7.1 of ITU-T Rec T.800 (2002) \| ISO/IEC 15444-1:2004. |
| uuid | The referenced HTX data shall be contained in a UUID box as described in Annex F. The UUID box is defined in I.7.2 of ITU-T Rec T.800 (2002) \| ISO/IEC 15444-1:2004. |
| All other values | Reserved |

Renumber the original B.6.5 as B.6.7.

Add the following annexes after Annex E:

Annex F (normative)
Hidden Text and Annotations Storage

F.1 Storage of HTX in JPM

A hidden text XML element is restricted to represent text for a single page. It is stored in a Hidden Text Metadata box as defined in B.6.5. The Hidden Text Metadata box either appears within the corresponding Page box or is placed at the top level of the file. If placed at the top level, an HTX Reference box as defined in B.6.6 must be placed in the corresponding Page box to point to the Hidden Text Metadata boxes that compose the hidden text of the page.

When a Hidden Text Metadata box is small in size, it is reasonable to place it directly in the Page box. In keeping with the usual JPM approach, large objects are generally placed at the top file level. In this case, the much smaller HTX Reference box is placed in the Page box and points to the actual data. Also in this case a single HTX Reference box can point to multiple file level Hidden Text Metad
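
As a rough illustration of file-level storage, the sketch below walks the top-level boxes of a byte buffer and collects the XML or UUID children of each Hidden Text Metadata box. It assumes the general box header layout of ISO/IEC 15444-1 (a 32-bit LBox, a 4-byte TBox, and a 64-bit XLBox when LBox is 1). The helper names and the demo bytes are hypothetical, and the demo buffer is not a valid JPM file.

```python
import struct
from typing import Iterator, List, Optional, Tuple

HTXB = b"htxb"  # Hidden Text Metadata box type, 0x68747862
PHTX = b"phtx"  # HTX Reference box type, 0x70687478


def iter_boxes(buf: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (box_type, payload_start, payload_end) for boxes in buf[start:end]."""
    if end is None:
        end = len(buf)
    pos = start
    while pos + 8 <= end:
        lbox, tbox = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if lbox == 1:
            # Extended length: an 8-byte XLBox follows the type field.
            if pos + 16 > end:
                raise ValueError("truncated XLBox")
            (length,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif lbox == 0:
            # Box extends to the end of the range being scanned.
            length = end - pos
        else:
            length = lbox
        if length < header or pos + length > end:
            raise ValueError("invalid box length at offset %d" % pos)
        yield tbox, pos + header, pos + length
        pos += length


def top_level_hidden_text(buf: bytes) -> List[Tuple[bytes, bytes]]:
    """Collect (child_type, payload) for XML/UUID boxes inside top-level htxb boxes."""
    found = []
    for box_type, start, end in iter_boxes(buf):
        if box_type == HTXB:
            for child_type, c_start, c_end in iter_boxes(buf, start, end):
                if child_type in (b"xml ", b"uuid"):
                    found.append((child_type, buf[c_start:c_end]))
    return found


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit length field (demo helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic demo buffer: a placeholder box followed by one htxb superbox.
demo = make_box(b"ftyp", b"demo") + make_box(HTXB, make_box(b"xml ", b"<page/>"))
hidden = top_level_hidden_text(demo)
```

The same iterator can be pointed at the payload range of a Page box to look for a page-level htxb or phtx box; parsing the Page box itself is outside this sketch.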
