1、ANSI INCITS 241-1994 (R1999)(formerly ANSI X3.241-1994 (R1999)for Information Systems Data Compression Method Adaptive Coding with Sliding Windowfor Information InterchangeAmericanNationalStandardApproval of an American National Standard requires review by ANSI that therequirements for due process,
2、consensus, and other criteria for approval havebeen met by the standards developer.Consensus is established when, in the judgment of the ANSI Board of StandardsReview, substantial agreement has been reached by directly and materiallyaffected interests. Substantial agreement means much more than a si
3、mplemajority, but not necessarily unanimity. Consensus requires that all views andobjections be considered, and that a concerted effort be made toward theirresolution.The use of American National Standards is completely voluntary; their existencedoes not in any respect preclude anyone, whether he ha
4、s approved the standardsor not, from manufacturing, marketing, purchasing, or using products, processes,or procedures not conforming to the standards.The American National Standards Institute does not develop standards and will inno circumstances give an interpretation of any American National Stand
5、ard.Moreover, no person shall have the right or authority to issue an interpretation ofan American National Standard in the name of the American National StandardsInstitute. Requests for interpretations should be addressed to the secretariat orsponsor whose name appears on the title page of this sta
6、ndard.CAUTION NOTICE: This American National Standard may be revised orwithdrawn at any time. The procedures of the American National StandardsInstitute require that action be taken periodically to reaffirm, revise, or withdrawthis standard. Purchasers of American National Standards may receive curr
7、entinformation on all standards by calling or writing the American National StandardsInstitute.CAUTION: The developers of this standard have requested that holders of patents that may be required for theimplementation of the standard disclose such patents to the publisher. However, neither the devel
8、opers nor the publisherhave undertaken a patent search in order to identify which, if any, patents may apply to this standard. As of the date ofpublication of this standard, following calls for the identification of patents that may be required for the implementation ofthe standard, notice of one or
9、 more such claims has been received. By publication of this standard, no position is takenwith respect to the validity of this claim or of any rights in connection therewith. The known patent holder(s) has (have),however, filed a statement of willingness to grant a license under these rights on reas
10、onable and nondiscriminatory termsand conditions to applicants desiring to obtain such a license. Details may be obtained from the publisher. No furtherpatent search is conducted by the developer or publisher in respect to any standard it processes. No representation ismade or implied that this is t
11、he only license that may be required to avoid infringement in the use of this standard.Published byAmerican National Standards Institute11 West 42nd Street, New York, New York 10036Copyright 1994 by Information Technology Industry Council (ITI)All rights reserved.No part of this publication may be r
12、eproduced in anyform, in an electronic retrieval system or otherwise,without prior written permission of ITI, 1250 Eye Street NW,Washington, DC 20005.Printed in the United States of AmericaANSIX3.241-1994American National Standardfor Information Systems Data Compression Method Adaptive Coding with S
13、liding Windowfor Information InterchangeSecretariatComputer and Business Equipment Manufacturers AssociationApproved August 30, 1994American National Standards Institute, Inc.iiContentsPageForewordii1 Scope and conformance 12 Normative references.13 Definitions.14 Algorithm identifier.15 Data format
14、 .2AnnexesA Encoding overview.5B Example compression encoding method.8iiiForeword (This foreword is not part of American National Standard X3.241-1994.)This standard specifies a lossless data compression method that is intendedfor general purposes. It contains features that make it particularly appl
15、icableto systems for recording information on interchangeable media.This standard was developed by Technical Committee X3B5, by X3 project882. The first draft was produced in November 1991. The second draft wasproduced in March 1992.There are two annexes in this standard. Both are informative and ar
16、e notconsidered part of this standard.Requests for interpretation, suggestions for improvement or addenda, ordefect reports are welcome. They should be sent to the X3 Secretariat,Computer and Business Equipment Manufacturers Association, 1250 EyeStreet NW, Suite 200, Washington, DC 20005.This standa
17、rd was processed and approved for submittal to ANSI by theAccredited Standards Committee on Information Processing Systems, X3.Committee approval of this standard does not necessarily imply that allcommittee members voted for its approval. At the time it approved thisstandard, the X3 Committee had t
18、he following members:James D. Converse, ChairDonald C. Loughry, Vice-ChairJoanne Flanagan, SecretaryOrganization Represented Name of RepresentativeAmerican Nuclear Society Geraldine C. MainSally Hartzell (Alt.)AMP, IncEdward KellyCharles Brill (Alt.)Apple Computer, Inc. .Karen HigginbottomDavid K. M
19、ichael (Alt.)AT an offset field (described in 5.5); a length field (described in 5.6).5.5 Offset fieldThe offset field of a string token is a variable-length bit pattern that represents the distance5 Data format5.1 OverviewThe data compression encoding method isdesigned to support an adaptive string
20、 com-pression algorithm that can find redundantmultiple byte patterns in the input data streamand replace them with shorter tokens in thecompressed output data stream. The outputdata stream alone may be used to reconstructthe original input stream completely andexactly.5.2 Principle of operationThe
21、input data stream and the output datastream shall consist of a stream of bytes.Within a byte, the bits shall be arranged withbit b8 as the most significant, and bit b1 asthe least significant.The output data bytes are composed of astream of fields with a variable number of bitsin each field. The mos
22、t significant bit of a fieldshall be placed into the most significantunused bit location of an output byte. Theother bits of a field shall be placed in bit loca-tions in an output byte by proceeding insequence towards the least significant end ofthe output byte. The end marker (see 5.7)may be used t
23、o force the output data streamto a byte-boundary.The output data stream consists of the follow-ing three field types: a raw byte token; a string token; an end marker.The structure of these fields is describedlater. Each field may be present multiple timesand may appear in any order. Each field shall
24、be completed before a new field may begin.The encoding algorithm shall maintain a 2048-byte history buffer. Each byte that enters theencoding algorithm from the input data streamshall be placed into the history buffer. Whenthe history buffer becomes full, the newestbytes shall replace the oldest byt
25、es. The histo-ry buffer shall always contain the last 2048bytes that have entered the encoding algorithmsince the last clearing of the history buffer.ANSI X3.241-19942ANSI X3.241-1994in bytes within the history buffer from the firstbyte of the matching pattern to the first byte ofthe source pattern.
26、 The number of bytes isreferred to as offset. The minimum value ofoffset is 1, which represents the byte thatentered the history buffer just before thesource pattern. The maximum value of offsetis 2047.If the value of offset is less than or equal to127, an 8-bit pattern shall be generated. Bitb8 sha
27、ll be a 1. Bits b1 through b7 shall bethe binary value of offset.If the value of offset is greater than 127, a 12-bit pattern shall be generated. Bit b12 shallalways be a 0. Bits b11 through b1 shall bethe binary value of offset.5.6 Length fieldThe length field is a variable-length bit patternthat r
28、epresents the length in bytes of thesource pattern. The number of bytes isreferred to as length. The minimum value oflength is 2. The maximum value of length isunbounded.If the value of length is less than or equal to4, a 2-bit pattern shall be generated. These 2bits shall be the binary value of:(le
29、ngth 2).If the value of length is greater than 4, andless than or equal to 7, a 4-bit pattern shall begenerated. Bits b3 and b4 shall always be avalue of 1. Bits b1 and b2 shall be the binaryvalue of:(length 5).If the value of length is greater than 7, a vari-able-bit pattern shall be used. Multiple
30、 4-bitpatterns shall be generated with all bits set to1. The number of 4-bit patterns shall be:(N + 1), where N is the integer result of(length 8) 15).Then a 4-bit pattern shall be generated with abinary value of the remainder from the divi-sion operation.5.7 End markerThe end marker is a unique 9-b
31、it pattern thatshall be generated at the end of a block ofdata. The history buffer may be optionallycleared at the end of a block of data. The min-imum number of bytes that may be included ina block of data is 0. The maximum number ofbytes is unbounded.Bits b8 and b9 shall be 1s. Bits b1 through b7s
32、hall be 0s. Additional bits with a value of 0shall be written to the output bit stream untilan output byte-boundary is reached.3ANSI X3.241-19944A.1 Encoding overviewThis annex contains an example algorithm that may be used to encode an input stream of data. Thisalgorithm consists of several process
33、es. The process Encode is used to encode a complete block ofdata. This process will invoke other processes throughout its execution.This algorithm assumes that two buffers are available that are infinite in size. Every new byte thatenters the algorithm is inserted at the beginning of the history_buf
34、fer (while the entire contents ofthe history_buffer is moved by 1 byte to make space available). Every new byte is also inserted atthe beginning of the holding_buffer, but the holding_buffer is emptied whenever a token is gener-ated that represents the data stored in the holding_buffer. The most rec
35、ent 2048 bytes in the his-tory_buffer represent the history buffer as defined in 3.3. The holding_buffer represents thesource pattern as defined in 3.7.Other algorithms may be used to encode an input steam of data. Other algorithms need notrequire buffers of infinite size.A.1.1 EncodeThis process is
36、 used to compress an entire block of data. When this process is finished, the inputbyte stream will have been fully encoded and sent to the output byte stream.Figure A.1 Encode processA.1.2 Read_byteThis process is invoked by the Encode process. It accepts a single byte from the input bytestream and
37、 pushes that byte onto both the history_buffer and the holding_buffer.Figure A.2 Read_byte processProcess = Read_byte.Get 8-bit byte from input stream.Insert byte into history_buffer.Insert byte into holding_buffer.Endprocess.Process = Encode.While (input data exists).Read_byte.If (there is no match
38、ing pattern in the history_buffer that exactly matches the source pat- tern in the holding_buffer, that also satisfies the condition that offset is less than 2047).Output_token.Endif.Endwhile.Flush.Endprocess.Annex A(informative)Example encoding algorithmANSI X3.241-19945A.1.3 Output_tokenThis proce
39、ss is invoked by the Encode process. If a single byte is being processed, it will be out-put as a raw byte token. If multiple bytes are being processed, they will be output as a stringtoken. A string token consists of an offset and a length.Figure A.3 Output_token processProcess = Output_token.If (n
40、umber of bytes in holding_buffer 2).Put single 0 bit to output bit stream.Put oldest byte in holding_buffer to output bit stream.Clear the oldest byte from the holding_buffer.Elseif.Put single 1 bit to output bit stream.If (Offset 127).Put single 1 bit to output bit stream.Put 7-bit binary value of
41、offset to output stream.Elseif.Put single 0 bit to output bit stream.Put 11-bit binary value of offset to output stream.Endif.Output_length.Clear all bytes from the holding_buffer except the newest byte.Endif.Endprocess.ANSI X3.241-19946A.1.4 Output_lengthThis process is invoked by the Output_token
42、process. It will output the length portion of the stringtoken being processedFigure A.4 Output_length processA.1.5 OffsetThis process is invoked by the Output_token process. This process will calculate and return theoffset portion of the string token to the Output_token process.Figure A.5 Offset pro
43、cessProcess = Offset.Return the value of (the distance in bytes within the history_buffer from the first byte of the source pattern to the first byte of the matching pattern).Endprocess.Process = Output_length.Set X to (number of bytes in holding_buffer 1).If (X 4).Put 2-bit binary value of (X 2) to
44、 output stream.Elseif.If (X 7).Put 2-bit pattern with all bits set to a 1 bit to output stream.Put (2-bit binary value of (X 5) to output stream.Elseif.Put 4-bit pattern with all bits set to a 1 bit to output stream.Set X to (X 8).While (X 15).Put 4-bit pattern with all bits set to a 1 bit to output
45、 stream.Set X to (X 15).Endwhile.Put 4-bit binary value of X to output stream.Endif.Endif.Endprocess.ANSI X3.241-19947A.1.6 FlushThis process is invoked by the Encode process. This process will force any pending token to beoutput, followed by the end marker.Figure A.6 Flush processProcess = Flush.Wh
46、ile (number of bytes in holding_buffer 0).Output_token.Endwhile.Put 9-bit pattern with b8 and b9 set to 1s and bits b1 through b7 set to 0s to output stream.If (desired to clear the history).Clear all bytes from the history_buffer.Endif.Endprocess.ANSI X3.241-19948Annex B(informative)Example compres
47、sion encodingTable B.1 demonstrates an example encoding output based on a given input byte stream. Within tableB.1, time runs down the page.Input Output bitbyte stream stream CommentsA Source pattern requires at least 2 bytes to matchB 0010000012No matching pattern for AB, output A as raw byte token
48、A 0010000102No matching pattern for BA, output B as raw byte tokenA 0010000012No matching pattern for AA, output A as raw byte tokenA Matching pattern found for AA, wait for possible longer patternA Matching pattern found for AAA, wait for possible longer patternA Matching pattern found for AAAA, wa
49、it for possible longer patternA Matching pattern found for AAAAA, wait for possible longer patternC 11000000111002No matching pattern for AAAAAC, output AAAAA as string token with an offset of 1 and length of 5A 0010000112No matching pattern for CA, output C as raw byte tokenB Matching pattern found for AB, wait for possible longer patternA Matching pattern found for ABA, wait for possible longer patternB 110001001012No matching pattern for ABAB, output ABA as string token with an offset of 9 and length of 3A