1、ANSI INCITS 223-1995 (R2001)(formerly ANSI X3.223-1995 (R2001)for Information Technology Data Compression Algorithm Adaptive Coding withEmbedded Dictionary(DCLZ Algorithm) forInformation InterchangeAmericanNationalStandardApproval of an American National Standard requires review by ANSI that therequ
2、irements for due process, consensus, and other criteria for approval havebeen met by the standards developer.Consensus is established when, in the judgment of the ANSI Board of StandardsReview, substantial agreement has been reached by directly and materiallyaffected interests. Substantial agreement
3、 means much more than a simplemajority, but not necessarily unanimity. Consensus requires that all views andobjections be considered, and that a concerted effort be made toward theirresolution.The use of American National Standards is completely voluntary; their existencedoes not in any respect prec
4、lude anyone, whether he has approved the standardsor not, from manufacturing, marketing, purchasing, or using products, processes,or procedures not conforming to the standards.The American National Standards Institute does not develop standards and will inno circumstances give an interpretation of a
5、ny American National Standard.Moreover, no person shall have the right or authority to issue an interpretation ofan American National Standard in the name of the American National StandardsInstitute. Requests for interpretations should be addressed to the secretariat orsponsor whose name appears on
6、the title page of this standard.CAUTION NOTICE: This American National Standard may be revised orwithdrawn at any time. The procedures of the American National StandardsInstitute require that action be taken periodically to reaffirm, revise, or withdrawthis standard. Purchasers of American National
7、Standards may receive currentinformation on all standards by calling or writing the American National StandardsInstitute.CAUTION: The developers of this standard have requested that holders of patents that may be required for theimplementation of the standard disclose such patents to the publisher.
8、However, neither the developers nor the publisherhave undertaken a patent search in order to identify which, if any, patents may apply to this standard. As of the date ofpublication of this standard and following calls for the identification of patents that may be required for the implementationof t
9、he standard, no such claims have been made. No further patent search is conducted by the developer or publisher inrespect to any standard it processes. No representation is made or implied that licenses are not required to avoidinfringement in the use of this standard.Published byAmerican National S
10、tandards Institute11 West 42nd Street, New York, New York 10036Copyright 1995 by Information Technology Industry Council (ITI)All rights reserved.No part of this publication may be reproduced in anyform, in an electronic retrieval system or otherwise,without prior written permission of ITI, 1250 Eye
11、 Street NW,Washington, DC 20005.Printed in the United States of AmericaANSIX3.223-1995American National Standardfor Information Technology Data Compression Algorithm Adaptive Coding with Embedded Dictionary(DCLZ Algorithm) for Information InterchangeSecretariatInformation Technology Industry Council
12、Approved May 10, 1995American National Standards Institute, Inc.iiContentsPageForeword .iii1 Scope12 Normative reference.13 Definitions .14 Conventions 25 Algorithm identifier .26 DCLZ compression algorithm2AnnexesA Example of a generic DCLZ algorithm .5B Example of Code Values output for a given in
13、put stream9C Bibliography.10iiiForeword (This foreword is not part of American National Standard X3.223-1995.)This standard specifies a lossless data compression algorithm that isintended for general purposes. It contains features that make it particularlyapplicable to systems for recording informat
14、ion on interchangeable media.This standard was developed by Technical Committee X3B5, by X3 Project820. The first draft was produced in November 1990. The second draft wasproduced in April 1991 and was subjected to X3B5 letter ballot. The thirddraft incorporated changes arising from ballot comments.
15、 This standard contains three informative annexes, which are not consid-ered part of the standard.Requests for interpretation, suggestions for improvement or addenda, ordefect reports are welcome. They should be sent to the X3 Secretariat,Information Technology Industry Council, 1250 Eye Street, NW,
16、 Suite 200,Washington, DC 20005.This standard was processed and approved for submittal to ANSI by theAccredited Standards Committee on Information Technology, X3.Committee approval of this standard does not necessarily imply that allcommittee members voted for its approval. At the time it approved t
17、hisstandard, the X3 Committee had the following members:James D. Converse, ChairDonald C. Loughry, Vice-ChairJoanne Flanagan, SecretaryOrganization Represented Name of RepresentativeAmerican Nuclear SocietyGeraldine C. MainSally Hartzell (Alt.)AMP, Inc. Edward KellyCharles Brill (Alt.)Apple Computer
18、, IncDavid K. MichaelAT however, the first n 1bytes shall have been already allocated a dic-tionary entry. The maximum length of a stringfor which a dictionary entry can be allocatedshall be 128 bytes.Upon encountering a unique pair, the algo-rithm shall output a Codeword that expressesthe Code Valu
19、e for the first byte of the pair.Upon encountering a unique string of n bytes,the algorithm shall output a Codeword that3.5 control code: A Code Value in therange 0 to 7 that does not represent inputstream data.3.6 dictionary: A data store, comprising3832 entries, that is used to retain selectedbyte
20、 strings from the input stream. Each entryis identified by a unique Code Value that isgreater than 263.3.7 empty state: The state in which no datais in the dictionary.3.8 EOR: end of record.3.9 frozen state: The state in which it is notpermitted to add data to the dictionary.3.10 pad bit: A bit in t
21、he output stream ofthe compression algorithm that is set to ZEROand does not express data.3.11 record: Related data that is sometimestreated as a unit of information. 4 ConventionsNumbers in this standard are expressed in dec-imal notation.5 Algorithm identifierThe numeric identifier of this algorit
22、hm in theInternational Register is 32 (see ISO/IEC11576).6 DCLZ compression algorithm6.1 OverviewThe DCLZ compression algorithm shall acceptinformation input in the form of a stream of 8-bit data bytes, and shall output Codewords3)in the form of a stream of bits that are orga-nized into 8-bit bytes.
23、 The algorithm shallidentify repetition of byte strings in the inputstream and shall exclude such redundancyfrom the output stream.ANSI X3.223-199523)See clause 3 for a definition of this term.ANSI X3.223-1995expresses the Code Value for the first n 1bytes of the string.It shall then enter the uniqu
24、e pair or uniquestring into the dictionary and assign the nextunused Code Value to the entry, provided thatthe dictionary is not frozen (see 6.2.2) andthat n does not exceed 128.Starting with the second byte of the currentunique pair or the last byte of the currentunique string, the algorithm shall
25、then contin-ue to examine the input stream and search forthe next unique pair or unique string.6.2.2 Frozen dictionaryThe dictionary shall be considered frozen inthe following cases: all available Code Values have beenassigned; the implementation of the algorithm hasdecided not to enter a unique pai
26、r or aunique string into the dictionary, because,for example, the search for free space inthe dictionary takes too much time.In the frozen state, no further dictionaryentries shall be made. The only means bywhich the dictionary may be removed from thefrozen state is by being reset to the emptystate
27、(see also 6.3.1.1).6.2.3 Resetting the dictionary to theempty stateThe algorithm is permitted to reset the dictio-nary to the empty state at any time, providedthat all bytes that have been input to the algo-rithm have been expressed by Codewords.The algorithm may, for example, choose toreset the dic
28、tionary if the current degree ofcompression is not adequate because the cur-rent dictionary entries do not reflect the cur-rent repetition characteristics of the inputstream to a sufficient extent.6.2.4 BoundariesWithin the input stream, natural boundariesmay exist between collections of bytes. Fore
29、xample, the stream may consist of asequence of records, each comprising one ormore bytes; in such a case, a natural bound-ary exists between records. The algorithmshall provide a means for identifying suchboundaries in the output stream, so that theyare recognized and reconstituted by a decom-pressi
30、on algorithm.Such identification shall be achieved by theoutput of the EOR Codeword (see 6.3.1.4),followed by a Codeword that expresses theCode Value for the single byte, pair of bytes,or string of bytes that is being held temporari-ly for the purpose of examining the inputstream for a unique pair o
31、r a unique string.Examination of the input stream shall thencontinue from the first byte of the next record.The result is that the data between bound-aries in the input stream is wholly representedby Codewords between corresponding bound-aries in the output stream. Such a boundary inthe output strea
32、m is considered to be locatedat the end of the pad bits that follow theCodeword that follows the EOR Codeword.6.2.5 Re-creation of the dictionaryThe dictionary is not itself included in the out-put stream as a distinct item. Any appropriatedecompression algorithm will re-create thedictionary and res
33、tore the original representa-tion of the data from the output stream of thecompression algorithm.6.3 Code ValuesCode Values shall be integers in the range 0to 4095.Code Values in the range 0 to 7 shall desig-nate Control Codes.3)(See 6.3.1.)Code Values in the range 8 to 263 shall des-ignate Encoded
34、Bytes. (See 6.3.2.)Code Values in the range 264 to 4095 shalldesignate Dictionary Codes. (See 6.3.3.)6.3.1 Control codesFour Control Codes are defined, with CodeValues 0, 1, 2, and 3 as described in 6.3.1.1 6.3.1.4. Values in the range 4 to 7 shall not beused.6.3.1.1 Dictionary frozenThis Control Co
35、de shall have the Code Value0. It shall indicate that the dictionary has beenfrozen. It is not mandatory for the algorithm tooutput this Code Value.It may be output at any time after the algo-rithm has decided to freeze the dictionary,36.3.3 Dictionary CodesA Dictionary Code identifies a dictionary
36、entryfor a pair or a string of bytes.6.4 CodewordsA Codeword is the binary expression, in theoutput stream, of a Code Value.The size of a Codeword shall be 9, 10, 11, or12 bits. When the dictionary is in the emptystate, the Codeword size shall be 9 bits. TheCodeword size shall be increased, as neces
37、-sary, to be capable of expressing CodeValues that are too large to be expressed bythe current Codeword size. (See 6.3.1.3.)The Codeword size may also be increased bythe algorithm at any other time. If the currentCodeword size is greater than that which isnecessary to express the desired Code Value,
38、the redundant bits shall be in the more signifi-cant positions than the required bits, and shallbe set to ZERO.The procedure for decreasing the Codewordsize shall be as follows:a) The algorithm shall reset the dictionaryto the empty state;b) It shall output a Codeword expressingthe Dictionary Reset
39、Control Code; thisCodeword shall have the size required by thelast Increment Codeword Size Control Code,if any;c) Pad bits, if any are required, shall followto the next byte;d) The next Codeword shall be a 9-bitCodeword.When output, Codewords shall be organizedinto 8-bit bytes by entering Codeword b
40、its insequence, starting with the least significantbit, into successive bits of an 8-bit byte, start-ing with the rightmost bit and proceeding fromright to left. When all positions in a byte havebeen used, the byte shall be output, and sub-sequent Codeword bits shall be entered intothe next byte of
41、the output stream.provided that all bytes that have been input tothe algorithm have been expressed byCodewords.6.3.1.2 Dictionary resetThis Control Code shall have the Code Value 1.It shall be the first Code Value that is output bythe algorithm after the dictionary is reset to itsempty state. It sha
42、ll not be output at any othertime.The Codeword that contains this Code Valueshall be followed in the output stream, if nec-essary, by a sufficient number of bits set toZERO to pad to the next 8-bit byte.6.3.1.3 Increment Codeword sizeThis Control Code shall have the Code Value2. It shall indicate th
43、at the number of bits in allsubsequent Codewords (until after the nextIncrement Codeword Size Control Code, ifany) is greater by 1 than the number of bits inthe Codeword that contains this Code Value.6.3.1.4 End of record (EOR)This Control Code shall have the Code Value3. It shall indicate that, in
44、the input stream, arecord boundary exists after the byte, pair, orstring that is represented by the Code Valueexpressed by the next Codeword.The Codeword that contains this EOR CodeValue shall be followed in the output stream, ifnecessary, by a sufficient number of bits setto ZERO to pad to the next
45、 8-bit byte. Thenext Codeword in the output stream shall befollowed by a sufficient number of bits set toZERO to pad to the next 8-bit byte. This latterrequirement ensures that the set of Code-words that represents a record in the inputstream begins with an 8-bit byte and endswith a subsequent 8-bit
46、 byte.6.3.2 Encoded BytesAn Encoded Byte represents a single byte inthe input stream. The complete set of EncodedBytes represents the complete set of possiblevalues of a single byte, i.e., 0 to 255. AnEncoded Byte shall be computed by adding 8to the value of the byte to be encoded.ANSI X3.223-19954A
47、NSI X3.223-1995This annex provides a description of a genericDCLZ compression algorithm. The descriptionis in structured English, also known as pseu-do-code. The language utilizes normal Englishvocabulary, syntax, and semantics, plus aspecial word (endif) and a style convention. Itexpresses instruct
48、ions that are either to beexecuted unconditionally, or to be executed ifa particular condition (or combination of con-ditions) exists. It also expresses such condi-tions. Additionally, it provides for annotations;these are intended to aid the reader in under-standing the algorithm.The grouping of in
49、structions into sets that aresubject to conditional or repeated execution isdenoted by a hierarchy of text indentation. Aset comprises all instructions that have thesame or a greater degree of indentation.Except where repetition or conditional execu-tion is expressed, instructions are executed insequence from the top of the page.Comments are enclosed in curly brackets andare not instructions. Specific implementationsof the algorithm may differ from each other,for example, in their strategies for deciding tofreeze the dictionary or to reset it to the emptystate.The