1、Standard ECMA-222June 1995Standardizing Information and Communication SystemsPhone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http:/www.ecma.ch - Internet: helpdeskecma.chAdaptive Lossless DataCompression Algorithm.Standard ECMA-222June 1995Standardizing Information and Communication SystemsPh
2、one: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http:/www.ecma.ch - Internet: helpdeskecma.chMB- Ecma-222.doc - 01.09.98 16:05Adaptive Lossless DataCompression Algorithm(ALDC).Brief HistoryIn the past decades ECMA have published numerous ECMA Standards for magnetic tapes, magnetic tape cassette
3、s andcartridges, as well as for optical disk cartridges. Those media developed recently have a very high physical recording density.In order to make optimal use of the resulting data capacity, lossless compression algorithms have been designed which allow areduction of the number of bits required fo
4、r the representation of user data.These compression algorithms are registered by ECMA, the International Registration Authority established by ISO/IEC. Theregistration consists in allocating to each registered algorithm a numerical identifier which will be recorded on the medium and,thus, indicate w
5、hich compression algorithm(s) has been used.This ECMA Standard is the third ECMA Standard for compression algorithms. The two previous standards are:ECMA-151 Data Compression for Information Interchange - Adaptive Coding with Embedded Dictionary - DCLZAlgorithm (June 1991)ECMA-159 Data Compression f
6、or Information Interchange - Binary Arithmetic Coding Algorithm - (December 1991)These ECMA Standards have been adopted under the fast-track procedure by ISO/IEC JTC 1 as International Standards 11558and ISO/IEC 12042, respectively.The present ECMA Standard has also been contributed to ISO/IEC for a
7、doption under the fast-track procedure as anInternational Standard.This ECMA Standard has been adopted by the ECMA General Assembly of June 1995.- i -Table of contents1 Scope 12 Conformance 13 Reference 14 Definitions 14.1 Compressed Data Stream 14.2 Copy Pointer 14.3 Current Address 14.4 Data Byte
8、14.5 Displacement Field 14.6 End Marker 14.7 History Buffer 14.8 Literal 14.9 Matching String 14.10 Match Count 14.11 Match Count Field 24.12 Pad Bits 25 Conventions and Notations 25.1 Representation of numbers 25.2 Names 26 ALDC compression algorithm 26.1 Encoding description for a 512-byte History
9、 Buffer 26.2 Description of the Compressed Data Stream 3Annex A - ALDC encoding format 5Annex B - ALDC Overview 7Annex C - ALDC Encoding Flow Chart 9Annex D - Bibliography 13- ii -.1 ScopeThis ECMA standard specifies a lossless compression algorithm to reduce the number of bytes required to represen
10、tdata. The algorithm is known as Adaptive Lossless Data Compression algorithm (ALDC). The numerical identifiersaccording to ISO/IEC 11576 allocated to this algorithm are:ALDC 512-Byte History Buffer: 3ALDC 1024-Byte History Buffer: 4ALDC 2048-Byte History Buffer: 52 ConformanceA compression algorith
11、m shall be in conformance with this ECMA Standard if its output data stream satisfies therequirements of this ECMA Standard.3 ReferenceISO/IEC 11576:1993, Information Technology - Procedure for the Registration of Algorithms for the LosslessCompression of Data4 DefinitionsFor the purposes of this EC
12、MA Standard, the following definitions apply.4.1 Compressed Data StreamThe output stream after encoding.4.2 Copy PointerA part of the Compressed Data Stream which represents a group of two or more consecutive bytes for which therealready exists an identical group in the History Buffer. It comprises
13、a Length Code Field and a Displacement Field.4.3 Current AddressThe location within the History Buffer where the Data Byte is written.4.4 Data ByteThe current byte of incoming data which is written into the History Buffer and is compared to all data bytespreviously written into the History Buffer.4.
14、5 Displacement FieldThat part of the Copy Pointer which specifies the location within the History Buffer of the first byte of a MatchingString.4.6 End MarkerA string of 12 ONEs indicating the end of the Compressed Data Stream.4.7 History BufferA data structure where incoming data bytes are stored fo
15、r use in the compression and decompression process.4.8 LiteralA Data Byte for which no match was found in the History Buffer.4.9 Matching StringA sequence of bytes in the incoming data which is identical with a sequence of bytes in the History Buffer.4.10 Match CountThe number of bytes in a Matching
16、 String.- 2 -4.11 Match Count FieldThat part of the Copy Pointer which specifies the number of consecutive bytes for which a match was found in theHistory Buffer.4.12 Pad BitsBits set to ZERO and included in the Compressed Data Stream, as required, to maintain an 8-bit byte boundary.5 Conventions an
17、d Notations5.1 Representation of numbersThe following conventions and notations apply in this Standard, unless otherwise stated.G01 The setting of binary bits is denoted by ZERO and ONE.G01 Numbers in binary notation and bit combinations are represented by ZEROs and ONEs with the most significantbit
18、 to the left.G01 All other numbers shall be in decimal form.5.2 NamesThe names of entities are given with a capital initial letter.6 ALDC compression algorithmAn overview of the ALDC encoding process is given in annex B, and a flow chart is contained in annex C.6.1 Encoding description for a 512-byt
19、e History BufferAt the start of encoding, all bytes of the History Buffer shall be reset to all ZEROs. Data bytes shall be stored insequence in the History Buffer, starting with a Current Address of 0.The encoder processes the incoming data stream one byte at a time. The current byte being processed
20、 is referred toas the Data Byte. When a Data Byte is received from the input data stream, it shall be written into the History Bufferat the Current Address. Then the Current Address shall be incremented by 1. If it exceeds the maximum address,which is 511 for a History Buffer size of 512 bytes, it s
21、hall be reset to 0.Step 1The Data Byte shall be compared with each byte previously written into the History Buffer to identify any identicalbytes.Step 2If the Data Byte does not match any byte in the History Buffer, the process shall continue at step 6.G01 If the Data Byte matches one or more bytes
22、in the History Buffer, for every matching byte it shall be notedwhether this matching byte is a continuation of a previous sequence of matching bytes or not.G01 If it is not a continuation, the Displacement Field of the matching byte shall be noted and recorded as having aMatch Count of one byte.G01
23、 If the matching byte is a continuation of a previous string, the Match Count for that string shall be incrementedby 1.Step 3If a Match Count equals 271, the corresponding bytes shall be identified by a Copy Pointer, which shall be added tothe Compressed Data Stream. Its Match Count Field and Displa
24、cement Field shall be specified as defined in 6.2.The next Data Byte shall then be read and the process shall continue at Step 1.If there is no more data to be read, the process shall continue at Step 7.Note: The value of 271 was chosen for implementation reasons.- 3 -Step 4If the Match Count has no
25、t reached 271, any pending Matching Strings shall be checked to see if any are continuedby the Data Byte.G01 If none of the previous Matching Strings is continued and if any of the previous Matching Strings consists oftwo or more bytes, that Matching String having the lowest Displacement Field shall
26、 be identified by a CopyPointer, which shall be added to the Compressed Data Stream. The next Data Byte shall then be read. If thereis no more data to be read, the process shall continue at Step 7.G01 If no previous Matching Strings are continued and the previous matches were only 1-byte matches, th
27、eprevious 1-byte match shall be identified as a Literal and shall be added to the Compressed Data Stream asdefined in 6.2. The next Data Byte shall then be read. If there is no more data to be read, the process shallcontinue at Step 7.Step 5If there are no Matching Strings with a Match Count of 271
28、and there is the continuation of at least 1 previousMatching String, the next Data Byte shall be read. The process shall continue at Step 1.If there is no more data to be read, the pending Matching String shall be identified as a Copy Pointer, which shall beadded to the Compressed Data Stream. The p
29、rocess shall then continue at Step 7.Step 6If the Data Byte does not match any bytes of the History Buffer, a check shall be made for any previous MatchingStrings that may be pending.If there are any previous Matching Strings of two or more bytes, they shall be identified by a Copy Pointer, whichsha
30、ll be added to the Compressed Data Stream. Then the Data Byte that did not match shall be added to theCompressed Data Stream as a Literal. The next Data Byte shall be read and the process shall continue at Step 1. Ifthere is no more data to be read, the process shall continue at Step 7.If there are
31、no pending Matching Strings of two or more bytes and there are any pending 1-byte matches, the bytepreceding the Data Byte shall be identified as a Literal, which shall be added to the Compressed Data Stream. Thenthe Data Byte that did not find a match shall be identified as a Literal, which shall b
32、e added to the Compressed DataStream. The next Data Byte shall be read and the process shall continue at Step 1. If there is no more data to beread, the process shall continue at Step 7.Step 7An End Marker shall be added to the Compressed Data Stream and Pad Bits shall be included as required. This
33、endsthe encoding process.6.2 Description of the Compressed Data StreamAs described above, the processing of the input data generates as its output the Compressed Data Stream. Thecompleted Compressed Data Stream shall consist of:G01 Literals, each preceded by a bit set to ZERO.G01 Copy Pointers, each
34、 preceded by a bit set to ONE.G01 An End Marker preceded by a bit set to ONE.G01 Pad Bits.Once all data has been read, the Compressed Data Stream shall be terminated by a bit set to ONE, followed by anEnd Marker, followed by Pad Bits.During the encoding, if more than one Matching String of the same
35、Match Count is found, the Copy Pointer with thelowest Displacement Field shall be used. The Match Count Field shall consist of 2, 4, 6, 8, or 12 bits, identifyingMatch Counts as specified in table 1. The length of the Displacement Field shall be 9 bits, 10 bits, or 11 bits forHistory Buffer sizes of
36、 512 bytes, 1024 bytes, or 2048 bytes, respectively.- 4 -Table 1-Match Count FieldSetting of Match Count Field Number of bytes000110 00:10 11110 000:110 1111110 0000:1110 11111111 0000 0000:1111 1110 11101111 1110 1111234:78:1516:3132:270271- 5 -Annex A(informative)ALDC encoding formatA.1 Nomenclatu
37、reThe Compressed Data Stream encoding format is described using BNF-like language, the symbols being defined asfollows :Symbol Definition:= the non-terminal on the left side of the “:=“ can be replaced bythe expression on the right sidenon-terminal expression G03 the expression inside the square bra
38、ckets may occur 0 or more timesG04 the logical disjunction OR( ) the text enclosed within the parentheses is a comment only, added for clarity0,1 the terminal binary digits ZERO or ONE- 6 -A.2 ALDC_1 Algorithm - encoding format (512-byte History Buffer):= 0 0 G04G04 := (8-bit byte data):= 0 G04 1:=
39、:= (the length can be 2, 4, 6, 8 or 12 bits)Setting of Match CountMatch Count Field in bytes00 201 310 00 4: : :10 11 7110 000 8: : :110 111 151110 0000 16: : :1110 1111 311111 0000 0000 32: : :1111 1110 1110 2701111 1110 1111 271:= (9-bit):= number of bits sets to ZERO required to maintain an 8-bit
40、 boundary:= 1111 1111 1111A.3 ALDC_2 Algorithm - encoding format (1024-byte History Buffer)(identical to ALDC_1 definition except for a 10-bit Displacement Field):= (10-bit)A.4 ALDC_4 Algorithm - encoding format (2048-byte History Buffer)(identical to ALDC_1 definition except for an 11-bit Displacem
41、ent Field):= (11-bit)- 7 -Annex B(informative)ALDC OverviewThe ALDC algorithm accepts input in 8-bit data bytes and outputs a bit stream representing data in compressed form. TheALDC algorithm is one implementation of the Lempel-Ziv 1 (LZ1) class of data compression algorithms.LZ1 algorithms achieve
42、 compression using a data structure called a History Buffer where incoming data is stored and comparedto previous data in the same History Buffer. An LZ1 encoding process and an LZ1 decoding process both initialize thisstructure to the same known state and update it in an identical fashion. Conseque
43、ntly, these two histories remain identical, so itis not necessary to include history content information with in the compressed data stream.Incoming data is entered into the History Buffer. Each incoming byte is compared with all other bytes previously stored in theHistory Buffer. Compression result
44、s from finding sequential matches. At the beginning of the encoding process the HistoryBuffer is empty. The result of not finding any matching bytes in the History Buffer is that the Compressed Data Stream containsonly Literals. Bytes are encoded as Literals until a matching sequence of two or more
45、bytes occurs. As the History Buffer fills,it becomes increasingly possible for the encoder to represent incoming data by encoding it as a Copy Pointer for a stringalready present in this History Buffer. This is the principal mechanism by which LZ1 algorithms are able to achievecompression.- 8 - 9 -A
46、nnex C(informative)ALDC Encoding Flow ChartStart compressionReset all bytes of theHistory Buffer to all ZEROsSet Current Address to 0Read Data ByteWrite Data Byte to Current Address and increment Current Address by 1Is Current Address equal to History Buffer size?Reset Current Address to 0Compare Da
47、ta Byte with all bytes previously written into History BufferDoesData Byte match any other byte in History BufferBYESNOYESNOA- 10 -Is matchingbyte a continuationof a previousstring ?Increment Match Count of the Matching String by 1Have all bytes that matched in History Buffer been processed?Is there
48、 a Matching String with a Match Count equal to 271 ?Process the Matching String as CopyPointer and output this Copy PointerIs there at least one Matching Stringthat is a continuationof a previousstring ?Does a Matching String have a MatchCount 2?Is there moredata to be read ?Is there moredata to be
49、read ?Process the matching byte preceding the Data Byte as Literal and output itProcess the Matching String as Copy Pointer and output it ACYESYESNOYESNOYESNOYESYESNOYESNONONODGo to next byte in History Buffer that matched Data ByteRecord the Displacement Field value and a Match Count of 1 for the matching byte- 11 -Is there a Matching Stringwith a Match Count 2 that is not processed ? Process this Matching String as Copy Pointer and output itProcess this byte as a Literal and output itHas the previous Matching String a Match Coun