1、BRITISH STANDARD BS ISO/IEC 11558:1992 Implementation of ISO/IEC11558:1992 Information technology Data compression for information interchange Adaptivecoding with embedded dictionary DCLZ Algorithm UDC 681.3:621.39.519.688BSISO/IEC11558:1992 This British Standard, having been prepared under the dire
2、ctionof the Information Systems Technology StandardsPolicy Committee, waspublished under the authorityof the Standards Boardand comes into effecton 15December1992 BSI 04-2000 The following BSI references relate to the work on this standard: Committee referenceIST/4 Draft for comment91/65414DC ISBN 0
3、 580 21434 6 Committees responsible for this BritishStandard The preparation of this British Standard was entrusted by the Information Systems Technology Standards Policy Committee (IST/-) to Technical CommitteeIST/4, upon which the following bodies were represented: British Computer Society ICI Ima
4、gedata Institution of Electrical Engineers International Computers Ltd. Kodak Ltd. Coopted member Amendments issued since publication Amd. No. Date CommentsBSISO/IEC11558:1992 BSI 04-2000 i Contents Page Committees responsible Inside front cover National foreword ii Foreword iii Text of ISO/IEC 1155
5、8 1BSISO/IEC11558:1992 ii BSI 04-2000 National foreword This British Standard reproduces verbatimISO/IEC11558:1992 and implements it as the UK national standard. This British Standard is published under the direction of the Information Systems Technology Standards Policy Committee whose Technical Co
6、mmitteeIST/4 has the responsibility to: aid enquirers to understand the text; present to the responsible international committee any enquiries on interpretation, or proposals for change, and keep UK interests informed; monitor related international and European developments and promulgate them in th
7、e UK. NOTEInternational and European Standards, as well as overseas standards, are available from BSI Sales Department, BSI, Linford Wood, Milton Keynes, MK146LE. A British Standard does not purport to include all the necessary provisions of a contract. Users of British Standards are responsible for
8、 their correct application. Compliance with a British Standard does not of itself confer immunity from legal obligations. Summary of pages This document comprises a front cover, an inside front cover, pagesi andii, theISO/IEC title page, pagesii toiv, pages1to7 and a back cover. This standard has be
9、en updated (see copyright date) and may have had amendments incorporated. This will be indicated in the amendment table on the inside front cover.ISO/IEC11558:1992 (E) ii BSI 04-2000 Contents Page Foreword iii Introduction 1 1 Scope 1 2 Conformance 1 3 Normative references 1 4 Definitions 1 4.1 Code
10、 Value 1 4.2 Codeword 1 4.3 compression ratio 1 4.4 dictionary 1 4.5 empty state 1 4.6 frozen state 1 5 Conventions and notations 1 6 Algorithm identifier 1 7 DCLZ compression algorithm 1 7.1 Overview 1 7.2 Principle of operation 2 7.2.1 Compilation of the dictionary 2 7.2.2 Frozen dictionary 2 7.2.
11、3 Resetting the dictionary to the empty state 2 7.2.4 Boundaries 2 7.2.5 Re-creation of the dictionary 3 7.3 Code Values 3 7.3.1 Control Codes 3 7.3.2 Encoded Bytes 3 7.3.3 Dictionary Codes 3 7.4 Codewords 3 Annex A (informative) Example of a generic DCLZ algorithm 4 Annex B (informative) Example of
12、 Code Values output for a given input stream 7 Annex C (informative) Bibliography 7 Table A.1 Code Value Generator 5 Table A.2 Codeword Generator 6 Descriptors: Data processing, information interchange, data compression, data representation, coded representation, algorithms.ISO/IEC11558:1992 (E) BSI
13、 04-2000 iii Foreword ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of InternationalStandards th
14、rough technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, al
15、so take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IECJTC1. Draft InternationalStandards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an InternationalStandard require
16、s approval by at least75% of the national bodies casting a vote. International Standard ISO/IEC11558 was prepared by the European Computer Manufacturers Association (as StandardECMA-151) and was adopted, under a special “fast-track procedure”, by Joint Technical CommitteeISO/IECJTC1, Information tec
17、hnology, in parallel with its approval by national bodies ofISO andIEC. Annex A toAnnex C of this InternationalStandard are for information only. Patents During the preparation of the ECMA standard, information was gathered on patents upon which application of the standard might depend. Relevant pat
18、ents were identified as belonging to Hewlett Packard Limited. However, neither ECMA norISO/IEC can give authoritative or comprehensive information about evidence, validity or scope of patent and like rights. The patent holders have stated that licences will be granted under reasonable and non-discri
19、minatory terms. Communications on this subject should be addressed to Hewlett Packard Limited Computer Peripherals Bristol Filton road Stoke Gifford Bristol BS12 6QZ UnitedKingdomiv blankISO/IEC11558:1992 (E) BSI 04-2000 1 Introduction In the past decades ISO/IEC have published numerous Internationa
20、lStandards for magnetic tapes, magnetic tape cassettes and cartridges, as well as for optical disk cartridges. Those media developed recently have a very high physical recording density. In order to make an optimal use of the resulting data capacity, compression algorithms have been designed which a
21、llow a reduction of the number of bits required for the representation of user data in coded form. In future, these compression algorithms will be registered by an International Registration Authority to be established byISO/IEC. The registration will consist in allocating to each registered algorit
22、hm a numerical identifier. For a recorded medium this should be included in the recorded format to indicate which compression algorithm(s) has been used. This InternationalStandard is the first of a series of InternationalStandards for compression algorithms. 1 Scope This International Standard spec
23、ifies a lossless compression algorithm to reduce the number of bits required to represent information coded by means of8-bit bytes. This algorithm is known asDCLZ (Data Compression according to Lempel andZiv). This International Standard specifies neither the strategy for resetting the dictionary no
24、r that for freezing it, as these are implementation-dependent. This algorithm is particularly useful when information has to be recorded on an inter-changeable medium. Its use is not limited tothis application. 2 Conformance A compression algorithm shall be in conformance with this InternationalStan
25、dard if its output data stream satisfies the requirements of clause7. 3 Normative reference The following standard contains provisions which, through reference in this text, constitute provisions of this InternationalStandard. At the time of publication, the edition indicated was valid. All standard
26、s are subject to revision, and parties to agreements based on this InternationalStandard are encouraged to investigate the possibility of applying the most recent edition of the standard listed below. Members ofIEC andISO maintain registers of currently valid International Standards. ISO/IEC 11576,
27、Information technology Procedure for the registration of algorithms for the lossless compression of data. 4 Definitions 4.1 code value an integer in the range0 to4095 that is generated by the compression algorithm 4.2 codeword a set of9,10,11 or12 consecutive bits in the output stream to express a C
28、ode Value in binary form 4.3 compression ratio the number of bits in the input stream of the compression algorithm divided by the number of bits in the output stream of the compression algorithm 4.4 dictionary a table, comprising3832 entries, that is used to retain strings of bytes selected from the
29、 input stream. Each entry is identified by a unique Code Value that is greater than263 4.5 empty state the state in which no data is in the dictionary 4.6 frozen state the state in which no further data shall be added to the dictionary 5 Notations and acronyms Numbers in this InternationalStandard a
30、re expressed in decimal notation. EOR: end of record. 6 Algorithm identifier The numeric identifier of this algorithm in the International Register is32. 7 DCLZ compression algorithm 7.1 Overview The DCLZ compression algorithm shall accept information input, in the form of a stream of8-bit data byte
31、s, and shall output Codewords, in the formof a stream of bits which are organised into8-bitbytes. The algorithm shall identify repetition of byte strings in the input stream and shall exclude such redundancy from the output stream.ISO/IEC11558:1992 (E) 2 BSI 04-2000 With many types of information ge
32、nerated, transmitted or recorded by electronic information processing systems and equipment, the degree of repetition in data is sufficiently high to permit the output stream to contain significantly fewer bits than the input stream. Under degenerate circumstances, however, the output stream may con
33、tain more bits than the input stream. The compression ratio achieved in practice is dependent on the characteristics of the actual input data stream. Compression by this algorithm is lossless,i.e. it is possible to restore exactly the original representation of data by means of a complementary decom
34、pression algorithm. The algorithm contains features which aid its implementation in data storage and retrieval equipment which handles, in a sequential manner, data records of varying length. 7.2 Principle of operation The fundamental principle of operation is the compilation of a dictionary of stri
35、ngs of bytes which occur in the input stream, the use of that dictionary to detect repetition, and the generation of a Codeword for each repeated string. The Codeword expresses a Code Value which is the reference to the dictionary entry for the repeated string. 7.2.1 Compilation of the dictionary Pr
36、ior to the commencement of operation of the algorithm, the dictionary shall be reset to the empty state (seealso7.3.1.2). The algorithm shall examine the input stream and shall search for the first occurrence of a unique pair or a unique string. A unique pair is a2-byte string which has not yet been
37、 allocated a dictionary entry. A unique string ofn bytes (n2) is one which has not yet been allocated a dictionary entry; however, the firstn-1bytes shall have been already allocated a dictionary entry. The maximum length of a string for which a dictionary entry can be allocated shall be128bytes. Up
38、on encountering a unique pair, the algorithm shall output a Codeword which expresses the Code Value for the first byte of the pair. Upon encountering a unique string ofn bytes, the algorithm shall output a Codeword which expresses the Code Value for the firstn1 bytes of the string. It shall then ent
39、er the unique pair or unique string into the dictionary and assign the next unused Code Value to the entry, provided that the dictionary is not frozen(see7.2.2) and thatn does not exceed128. Starting with the2nd byte of the current unique pair or the last byte of the current unique string, the algor
40、ithm shall then continue to examine the input stream and search for the next unique pair or unique string. 7.2.2 Frozen dictionary The dictionary shall be considered to be in the frozen state in the following cases: all available Code Values have been assigned; the implementation of the algorithm ha
41、s decided not to enter a unique pair or a unique string into the dictionary, for example because the search for free space in the dictionary takes too much time. The only means by which the dictionary may be removed from the frozen state is by being reset to the empty state (seealso7.3.1.1). 7.2.3 R
42、esetting the dictionary to the empty state The algorithm is permitted to reset the dictionary to the empty state at any time, provided that all bytes which have been input to the algorithm have been expressed by Codewords. The algorithm may, for example, choose to reset the dictionary if the current
43、 degree of compression is not adequate because the current dictionary entries do not reflect the current repetition characteristics of the input stream to a sufficient extent. 7.2.4 Boundaries Within the input stream, natural boundaries may exist between collections of bytes. For example, the stream
44、 may consist of a sequence of records, each comprising one or more bytes; in such a case, a natural boundary exists between records. The algorithm shall provide a means for identifying such boundaries in the output stream, so that they are recognized and re-constituted by a decompression algorithm.
45、Such identification shall be achieved by the output of the EOR Codeword, (see7.3.1.4), followed by a Codeword which expresses the Code Value for the single byte, pair of bytes or string of bytes which is being held temporarily for the purpose of examining the input stream for a unique pair or a uniq
46、ue string. Examination of the input stream shall then continue from the first byte of the next record. The result is that the data between boundaries in the input stream is wholly represented by Codewords between corresponding boundaries in the output stream. Such a boundary in the output stream is
47、considered to be located at the end of the pad bits that follow the Codeword that follows the EOR Codeword.ISO/IEC11558:1992 (E) BSI 04-2000 3 7.2.5 Re-creation of the dictionary The dictionary is not itself included in the output stream as a distinct item. Any appropriate decompression algorithm wi
48、ll re-create the dictionary and restore the original representation of the data from the output stream of the compression algorithm. 7.3 Code Values Code Values in the range0 to7 shall designate Control Codes. See7.3.1. Code Values in the range8 to263 shall designate Encoded Bytes. See7.3.2. Code Va
49、lues in the range264 to4095 shall designate Dictionary Codes. See7.3.3 7.3.1 Control Codes Four Control Codes are defined, with Code Values0,1,2 and3 as described below. Values in the range4 to7 shall not be used. 7.3.1.1 Dictionary Frozen This Control Code shall have the Code Value0. It shall indicate that the dictionary has been frozen. It is not mandatory for the algorithm to output this Code Value. It may be output at any time after the algorithm has decided to freeze the dictionary, provided that all bytes which have been input to the algorithm hav