1、Standard ECMA-151June 1991Published in electronic form in September 1999Standardizing Information and Communication SystemsPhone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http:/www.ecma.ch - Internet: helpdeskecma.chData Compression forInformation Interchange -Adaptive Coding with EmbeddedDic
2、tionary - DCLZ Algorithm.Standard ECMA-151June 1991Standardizing Information and Communication SystemsPhone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http:/www.ecma.ch - Internet: helpdeskecma.chMB ECMA-151.DOC 06-09-99 14,04Data Compression forInformation Interchange -Adaptive Coding with Em
3、beddedDictionary - DCLZ Algorithm.Brief HistoryIn the past decades ECMA have published numerous ECMA Standards for magnetic tapes, magnetic tape cassettes andcartridges, as well as for optical disk cartridges. Those media developed recently have a very high physical recording density.In order to mak
4、e an optimal use of the resulting data capacity, compression algorithms have been designed which allow areduction of the number of bits required for the representation of user data in coded form.In future, these compression algorithms will be registered by an International Registration Authority to
5、be set up by ISO/IEC.The registration will consist in allocating to each registered algorithm a numerical identifier which will be recorded on themedium and, thus, indicate which compression algorithm(s) has been used.The present ECMA Standard is the first of a forthcoming series of ECMA Standards f
6、or compression algorithms. It has beencontributed to ISO/IEC for adoption as an International Standard under the fast-track procedure.This ECMA Standard has been adopted by the ECMA General Assembly of June 1991.- i -Table of contents1Scope 12 Conformance 13 Reference 14 Conventions 15 Algorithm Ide
7、ntifier 16 DCLZ Compression Algorithm 16.1 Overview 16.2 Principle of operation 16.2.1 Compilation of the dictionary 16.2.2 Frozen dictionary 26.2.3 Resetting the dictionary to the empty state 26.2.4 Boundaries 26.2.5 Re-creation of the dictionary 26.3 Code Values 26.3.1 Control Codes 36.3.2 Encoded
8、 Bytes 36.3.3 Dictionary Codes 36.4 Codewords 37 Bibliography 4Appendix A - Example of a Generic DCLZ Algorithm 5Appendix B - Example of Code Values output for a given Input Stream 9- ii -.1ScopeThis ECMA Standard specifies a lossless compression algorithm to reduce the number of bits required torep
9、resent information coded by means of 8-bit bytes. This algorithm is known as DCLZ (Data Compressionaccording to Lempel and Ziv).This ECMA Standard specifies neither the strategy for resetting the dictionary nor that for freezing it, asthese are implementation-dependent.This algorithm is particularly
10、 useful when information has to be recorded on an interchangeable medium. Itsuse is not limited to this application.2 ConformanceA compression algorithm shall be in conformance with this Standard if its output data stream satisfies therequirements of clause 6.3 ReferenceInternational Register of Pro
11、cessing Algorithms (to be established).4 ConventionsNumbers in this Standard are expressed in decimal notation.5 Algorithm IdentifierThe numeric identifier of this algorithm in the International Register is 32.6 DCLZ Compression Algorithm6.1 OverviewThe DCLZ compression algorithm shall accept inform
12、ation input, in the form of a stream of 8-bit databytes, and shall output Codewords, in the form of a stream of bits which are organised into 8-bit bytes.The algorithm shall identify repetition of byte strings in the input stream and shall exclude suchredundancy from the output stream.With many type
13、s of information generated, transmitted or recorded by electronic information processingsystems and equipment, the degree of repetition in data is sufficiently high to permit the output stream tocontain significantly fewer bits than the input stream. Under degenerate circumstances, however, theoutpu
14、t stream may contain more bits than the input stream. The actual ratio of the numbers of bits isdependent on the characteristics of the actual input data stream.Compression by this algorithm is lossless, i.e. it is possible to restore exactly the original representation ofdata by means of a compleme
15、ntary decompression algorithm.The algorithm contains features which aid its implementation in data storage and retrieval equipmentwhich handles, in a sequential manner, data records of varying length.6.2 Principle of operationThe fundamental principle of operation is the compilation of a dictionary
16、of strings of bytes which occurin the input stream, the use of that dictionary to detect repetition, and the generation of a Codeword foreach repeated string. The Codeword expresses a Code Value which is the reference to the dictionary entryfor the repeated string.6.2.1 Compilation of the dictionary
17、The algorithm shall examine the input stream and shall search for the first occurrence of a unique pairor a unique string. A unique pair is a 2-byte string which has not yet been allocated a dictionary entry.A unique string of n bytes (n 2) is one which has not yet been allocated a dictionary entry;
18、 however,- 2 -the first n-1 bytes shall have been already allocated a dictionary entry. The maximum length of a stringfor which a dictionary entry can be allocated shall be 128 bytes.Upon encountering a unique pair, the algorithm shall output a Codeword which expresses the CodeValue for the first by
19、te of the pair. Upon encountering a unique string of n bytes, the algorithm shalloutput a Codeword which expresses the Code Value for the first n-1 bytes of the string.It shall then enter the unique pair or unique string into the dictionary and assign the next unused CodeValue to the entry, provided
20、 that the dictionary is not frozen (see 6.2.2) and that n does not exceed 128.Starting with the 2nd byte of the current unique pair or the last byte of the current unique string, thealgorithm shall then continue to examine the input stream and search for the next unique pair or uniquestring.6.2.2 Fr
21、ozen dictionaryThe dictionary shall be considered frozen in the following cases:- all available Code Values have been assigned,- the implementation of the algorithm has decided not to enter a unique pair or a unique string into thedictionary, for example because the search for free space in the dict
22、ionary takes too much time.In the frozen state no further dictionary entries shall be made. The only means by which the dictionarymay be removed from the frozen state is by being reset to the empty state (see also 6.3.1.1).6.2.3 Resetting the dictionary to the empty statePrior to the commencement of
23、 operation of the algorithm, the dictionary shall be reset to the empty state(see also 6.3.1.2).The algorithm is also permitted to reset the dictionary to this empty state at any time, provided that allbytes which have been input to the algorithm have been expressed by Codewords.The algorithm may, f
24、or example, choose to reset the dictionary if the current degree of compression isnot adequate because the current dictionary entries do not reflect the current repetition characteristics ofthe input stream to a sufficient extent.6.2.4 BoundariesWithin the input stream, natural boundaries may exist
25、between collections of bytes. For example, thestream may consist of a sequence of records, each comprising one or more bytes; in such a case, anatural boundary exists between records. The algorithm shall provide a means for identifying suchboundaries in the output stream, so that they are recognized
26、 and re-constituted by a decompressionalgorithm.Such identification shall be achieved by the output of the EOR Codeword, (see 6.3.1.4), followed by aCodeword which expresses the Code Value for the single byte, pair of bytes or string of bytes which isbeing held temporarily for the purpose of examini
27、ng the input stream for a unique pair or a uniquestring. Examination of the input stream shall then continue from the first byte of the next record. Theresult is that the data between boundaries in the input stream is wholly represented by Codewordsbetween corresponding boundaries in the output stre
28、am. Such boundaries occur after the last of any padbits which may follow the Codeword which follows the EOR Codeword.6.2.5 Re-creation of the dictionaryThe dictionary is not itself included in the output stream as a distinct item. Any appropriatedecompression algorithm will re-create the dictionary
29、and restore the original representation of the datafrom the output stream of the compression algorithm.6.3 Code ValuesCode Values shall be integers in the range 0 to 4095.Code Values in the range 0 to 7 shall designate Control Codes. See 6.3.1.Code Values in the range 8 to 263 shall designate Encode
30、d Bytes. See 6.3.2.Code Values in the range 264 to 4095 shall designate Dictionary Codes. See 6.3.3- 3 -6.3.1 Control CodesFour Control Codes are defined, with Code Values 0, 1, 2 and 3 as described below. Values in the range4 to 7 shall not be used.6.3.1.1 Dictionary FrozenThis Control Code shall h
31、ave the Code Value 0. It shall indicate that the dictionary has been frozen.It is not mandatory for the algorithm to output this Code Value.It may be output at any time after the algorithm has decided to freeze the dictionary, provided that allbytes which have been input to the algorithm have been e
32、xpressed by Codewords.6.3.1.2 Dictionary ResetThis Control Code shall have the Code Value 1. It shall be the first Code Value which is output bythe algorithm after the dictionary is reset to its empty state. It shall not be output at any other time.The Codeword which contains this Code Value shall b
33、e followed in the output stream, if necessary,by a sufficient number of bits set to ZERO to pad to the next 8-bit byte.6.3.1.3 Increment Codeword SizeThis Control Code shall have the Code Value 2. It shall indicate that the number of bits in allsubsequent Codewords (until after the next Increment Co
34、deword Size Control Code, if any) is greaterby 1 than the number of bits in the Codeword which contains this Code Value.6.3.1.4 End of Record (EOR)This Control Code shall have the Code Value 3. It shall indicate that, in the input stream, a recordboundary exists after the byte, pair or string which
35、is represented by the Code Value expressed by thenext Codeword.The Codeword which contains this EOR Code Value shall be followed in the output stream, ifnecessary, by a sufficient number of bits set to ZERO to pad to the next 8-bit byte. The nextCodeword in the output stream shall be followed by a s
36、ufficient number of bits set to ZERO to pad tothe next 8-bit byte. This latter requirement ensures that the set of Codewords which represents arecord in the input stream begins with an 8-bit byte and ends with a subsequent 8-bit byte.6.3.2 Encoded BytesAn Encoded Byte represents a single byte in the
37、 input stream. The complete set of Encoded Bytesrepresents the complete set of possible values of a single byte, i.e. 0 to 255. An Encoded Byte shall becomputed by adding 8 to the value of the byte to be encoded.6.3.3 Dictionary CodesA Dictionary Code identifies a dictionary entry for a pair or a st
38、ring of bytes.6.4 CodewordsA Codeword is the binary expression, in the output stream, of a Code Value.The size of a Codeword shall be 9, 10, 11 or 12 bits. When the dictionary is empty, the Codeword sizeshall be 9 bits. The Codeword size shall be increased, as necessary, to be capable of expressing
39、CodeValues which are too large to be expressed by the current Codeword size. See 6.3.1.3.The Codeword size may also be increased by the algorithm at any other time. If the current Codeword sizeis greater than that which is necessary to express the desired Code Value, the redundant bits shall be in t
40、hemore significant positions than the required bits, and shall be set to ZERO.The only procedure for decreasing the Codeword size shall be the algorithm shall reset the dictionary to the empty state, it shall output a Codeword expressing the Code Value 1, this Codeword shall have the size required b
41、y the last Increment Codeword Size Control Code, andpad bits shall follow to the next byte, if required, the following Codeword shall be a 9-bit Codeword.- 4 -Upon being output, Codewords shall be organised into 8-bit bytes by entering Codeword bits in sequence,starting with the least significant bi
42、t, into successive bits of an 8-bit byte, starting with the rightmost bitand proceeding from right to left. When all positions in a byte have been used, the byte shall be output,and subsequent Codeword bits shall be entered into the next byte of the output stream.7 BibliographyMark J. Bianchi et al
43、“Data Compression in a Half-Inch Reel-to-Reel Tape Drive“, Hewlett-PackardJournal, Volume 40, No. 3, June 1989, pp 26-31.- 5 -Appendix A(informative)Example of a Generic DCLZ AlgorithmThe following is a description of a generic DCLZ compression algorithm. The description is in structured English,als
44、o known as pseudo-code. The language utilises normal English vocabulary, syntax and semantics, plus a specialword (ENDIF) and a style convention. It expresses instructions which are either to be executed unconditionally, or tobe executed if a particular condition (or combination of conditions) exist
45、s. It also expresses such conditions.Additionally, it provides for comments to be annotated; these are intended to aid the reader in understanding thealgorithm.The grouping of instructions into sets which are subject to conditional or repeated execution is denoted by ahierarchy of text indentation.
46、A set comprises all instructions which have the same or a greater degree of indentation.Except where repetition or conditional execution is expressed, instructions are executed in sequence from the top ofthe page. Comments are enclosed in curly brackets and are not instructions.Specific implementati
47、ons of the algorithm may differ from each other, for example in their strategies for deciding tofreeze the dictionary or reset it to an empty state.The algorithm is in two parts. The first part processes the input stream and generates Code Values. If, in the inputstream, a record boundary follows th
48、e last byte of the string represented by a particular Code Value, that Code Valueshall be flagged to enable the second part to generate a Codeword which contains the EOR Code Value.The second part processes the Code Value and generates Codewords.- 6 -A.1 Code Value GeneratorThe operation of the algo
49、rithm shall be as shown on the next pages. The term “Pop“ is used to mean “outputa Code Value“. The term “Pop if non-null, it contains less than 130 bytes. Dictionaryentries are strings of between 2 and 128 bytes in length. Therefore, a search for a 129-byte string in thedictionary will fail. The final byte in the input stream is to be treated as if it is followed by a recordboundary.The following is a general description of the essential features of the algorithm shown on the opposite page.A.1.1 Outer levelInitialize the dictionary to the empty state and output a Dictionary Reset C
copyright@ 2008-2019 麦多课文库(www.mydoc123.com)网站版权所有
备案/许可证编号:苏ICP备17064731号-1