1、ECMA EUROPEAN COMPUTER MANUFACTURERS ASSOCIATION STANDARD ECMA-159 DATA COMPRESSION FOR INFORMATION INTERCHANGE - BINARY ARITHMETIC CODING ALGORITHM - December 1991 BRIEF HISTORY In the past decades ECMA have published numerous ECMA Standards for magnetic tapes, magnetic tape cassettes and cartridge
2、s, as well as for optical disk cartridges. Those media developed recently have a very high physical recording density. In order to make an optimal use of the resulting data capacity, compression algorithms have been designed which allow a reduction of the number of bits required for the representati
3、on of user data in coded form. In future, these compression algorithms will be registered by an International Registration Authority to be set up by ISO/IEC. The registration will consist in allocating to each registered algorithm a numerical identifier which will be recorded on the medium and, thus
4、, indicate which compression algorithm(s) has been used. ECMA has undertaken work on a series of ECMA Standards for compression algorithms. The first of these ECMA Standards was published in June 1991: ECMA-151: Data Compression for Information Interchange - Adaptive Coding with Embedded Dictionary
5、- DCLZ Algorithm The present ECMA Standard is the next one of this series. Both Standard ECMA-151 and the present Standard ECMA-159 have been contributed to ISOAEC for adoption as International Standards under the fast-track procedure. Adopted by the General Assembly of ECMA in December 1991. Table
6、of Contents Page 1 Scope 2 Conformance 3 References 4 Conventions and Notations 5 Algorithm identifier 6 Definitions 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 Block Code Block Code String Encoding input Event Logical Data Record Trailer Unique Table Pair 7 List of Acronyms 8 Compression Algorithm 8.1 General
7、8.2 Encoders 8.3 8.4 Code String 8.5 Table Pairs 8.6 Encoding Formation of a Code Block 8.6.1 Normal Mode 8.6.2 Run Mode 8.7 Completion of the Encoding of a Block Annex A - Example of a binary arithmetic coding algorithm 1 1 1 1 1 1 2 2 11 1 2 3 4 5 6 6.1 6.2 6.3 6.4 ECMA ECMA*A(159 91 m 3404593 001
8、0545 5 m -1- This ECMA Standard specifies an algorithm for the reduction of the number of bits required to represent information. This process is known as Data Compression. The algorithm uses binary arithmetic coding. The algorithm provides lossless compression and is intended for use in information
9、 interchange. Conformance A compression algorithm shall be in conformance with this Standard if it satisfies all mandatory requirements of this Standard. References International Register of Algorithms for Lossless Compression of Data (to be established). Conventions and Notations The following conv
10、entions and notations apply in this Standard unless otherwise stated: - In each field the bytes shall be arranged with Byte 1, the most significant, first. Within each byte the bits shall be arranged with Bit 1, the most significant bit, first and Bit 8, the least significant bit, last. - Letters an
11、d digits in parentheses represent numbers in hexadecimal notation. - The setting of bits is denoted by ZERO or ONE. - Numbers in binary notation and bit combinations are represented by strings of ZEROS and ONES. - Numbers in binary notation and bit combinations are shown with the most significant bi
12、t to the left. Algorithm Identifier The numeric identifier of this algorithm in the International Register shall be 16. Definitions For the purposes of this Standard, the following definitions apply. Block A portion of the Logical Data Record, usually having a length of 512 bytes (see 8.2). Code Blo
13、ck A Block after compression with a Trailer appended. Code String The encoded Logical Data Record. Encoding The process of generating Code Blocks from Blocks. 6.5 6.6 6.7 6.8 7 8 8.1 8.2 8.3 ECMA ECMAaL5 93 W 340Y593 0030546 . 7 -2- Input Event The sample of the input to an encoder currently being e
14、xamined; in Run Mode it is a byte; in Normal Mode it is a bit. Logical Data Record The data entity that is the input to the data compressor. Trailer Data appended to a Block after compression and addition of pad bits. Unique Table Pair The last of the 256 Table Pairs, used only in Run Mode. List of
15、Acronyms CV Current Value EV Estimated Value LDR Logical Data Record TP Table Pair Compression Algorithm General The LDR is transformed to a Code String by a one-pass, adaptive encoding technique designed to provide lossless data compression. By the use of a suitable decoding technique the exact ori
16、ginal LDR can be recovered from the Code String. Encoders The LDR shall be divided into 512-byte Blocks, except for the last Block, which may be of any length less than, or equal to, 512 bytes. The Blocks shall be routed sequentially to eight encoders, numbered from O to 7, commencing with encoder O
17、. If the LDR contains more than 4 0% bytes the compressor shall return to encoder O and repeat the process (see figure 2). Formafion of a Code Block The output of each encoder is a Code Block (see figure i). I I I I Pad bits I for COMPRESSED BLOCK I last I dat it shall be 1 or O. The second number (
18、K) shall be a measure of the probability of the Input Event being equal to the EV. K shall have the value 1, 2, 3 or 4, with the probability shown in table 1. Table 1- Probability values of K 4-8 8 - 16 The probabilities shall be a measure of how much more likely it is that the value of the Input Ev
19、ent is equal to the EV rather than being unequal (e.g. for K=2 the probability that the Input Event is equal to the EV shall be 2 to 4 times as great as the probability that it would not be equal), Before commencing the encoding of the LDR all EVs shall be set to ZERO and all values of K shall be se
20、t to ONE. 8.6 Encoding The data shall first be examined on a byte basis. Bytes shall be fetched sequentially from the Block, starting with the first byte, and compared with the previous byte. The first byte in a Block shall be compared with (40). Run Mode (see 8.6.2) shall be disabled when the first
21、 byte is fetched. ECMA ECNA*359 93 W 3404593 0030548 O W -4- If the current byte differs from the previous byte and Run Mode is not enabled, then the byte shall be encoded, bit by bit, in Normal Mode (see 8.6.1). If the current byte differs from the previous byte and Run Mode is enabled, then encodi
22、ng shall proceed as defined in 8.6.2.2. If the two bytes are identical and Run Mode is not enabled, then Run Mode shall be enabled and the byte shall be encoded, bit by bit, in Normal Mode, If the two bytes are identical and Run Mode is enabled, then encoding shall proceed as defined in 8.6.2. I. 8.
23、6.1 Normal Mode The first (most significant) bit of the byte shall be compared with the EV in the first Table Pair. Depending upon the result of this comparison, one of two actions, which are described in 8.6.1.1, shall result. The choice of which Table Pair to select for the remaining bits of the b
24、yte shall be determined by the bits previously encoded in the byte. The first bit of each byte shall always use the first Table Pair. The second bit shall use the second or third Table Pair, depending upon whether the first bit was ZERO or ONE, respectively. The third bit shall use one of the next f
25、our Table Pairs depending upon the first two bits. The fourth Table Pair shall be used if the first two bits were 00, and so on (see figure 3). , This procedure requires the first 255 Table Pairs. The remaining Table Pair is the Unique Table Pair; it shall be used in the Run Mode only (see 8.6.2). T
26、he process of encoding is then performed in two stages: - The bit is encoded as in 8hIa1, - The values of K and EV are revised as described in 8.6,1.2. 8.6.1.1 Bit Encoding Two values shall be developed during the bit comparison portion of the compression process. Both are fractional binary numbers
27、to four binary places. One value, termed the Current Value (CV), shall be used to generate the output (Code Block). The second value, termed the Width, shall have values in the range 0.0000 to 1.1 11 1. For each Block the CV shall be initialized to 0.0000 and the Width value shall be initialized to
28、1.0000. Each Input Event shall cause the two values to be modified according to the following rules: i) The Input Event is equal to the EV The CV shall be increased by 2-K The Width shall be decreased by 2-K where K is the measure of the probability of the Input Event (see 8.5). If the Code Block is
29、 not null the bit to the left of the point in the CV shall be logically added to the last bit appended to the Code Block and replaced in the CV by a ZERO. if this addition causes the most recently generated complete byte in the Code Block to become (FF), then (O) shall be appended to the Code Block
30、at the end of that byte to prevent a carry from propagating beyond the last complete byte. ECMA ECMA*159 91 3404593 OOLO549 2 -5- If the Code Block is null the bit to the left of the point in the CV will already be a ZERO. The Width shall then be compared with 1.0000. If it is equal to, or greater t
31、han, 1.0000, then the Table Pair shall be revised (see 8.6.1.2) and the next bit fetched for encoding. If it is less than 1.0000, then all the bits of the Width shall be shifted left one position; the leftmost bit, set to ZERO, shall be dropped, and the rightmost position shall be filled with a ZERO
32、. The bit in the first position to the right of the point in the CV shall be appended to the encoder output as part of the Code Block. The remaining bits to the right of the point in the CV shall be shifted left one position and the rightmost position shall be filled with a ZERO. To prevent a carry
33、from propagating beyond the last complete byte, each byte in the Code Block shall be examined as it is completed. If it equals (FF) then (O) shall be appended to the Code Block. The Input Event is not equal to the EV. The Width shall be reset to 1.0000. _- ii) The K bits to the right of the point in
34、 the CV shall be shifted out and appended to the Code Block, the leftmost bit first. As each bit is appended to the Code Block a check shall be made to determine whether a Byfe boundary has been reached. If it has, then the byte just completed shall be checked; if it equals (FF) then (O) shall be ap
35、pended to the Code Block. The rightmost K bit positions of the CV shall be set to ZEROS. 8.6.1.2 Revision of K and EV (see figure 4) As each Input Event is compared with the EV the values of K and the EV for that Table Pair shall be amended. A four-stage binary counter (Mc counter) shall be set to 0
36、000 at the beginning of each Block and incremented as described below. i) The Input Event is equal to the EV K shall be incremented according to table 2, where bits marked X shall be ignored. Table 2 - Rules for incrementing K Current value State of 1 ofK 1 Counter xx11 x111 1111 xxxx New value of K
37、 For all other states of the counter K shall be unchanged. The counter shall then be incremented by 1. If the counter value was 1111 incrementing by 1 shall result in a counter value of 0000. The EV shall be unchanged. I ECMA ECflA*l159 91 W 340Y593 0010550 9 m -6- i) The Input Event is not equal to
38、 the EV For K greater than 1 : The value of K shall be decremented by 1 the counter shall not be incremented the EV shall be unchanged the value of K shall be unchanged For K equal to 1 : the counter shall not be incremented the EV shall be inverted 8.6.2 Run Mode Run Mode shall have been enabled if
39、 the last two bytes to be encoded were identical. 8.6.2.1 If the current byte is identical with these two bytes, then the Unique Table Pair is selected and the EV is compared with ONE; the values of the Width, CV, K and EV are amended, as defined in 8.6.1.1 and 8.6.1.2. This may result in the Code B
40、lock being extended. If the current byte differs from these two bytes then the Unique Table Pair is selected and the EV is compared with ZERO; the values of the Width, CV, K and EV are amended, as defined in 8.6.1.1 and 8.6.1.2. This may result in the Code Block being extended, The byte is then enco
41、ded in Normal Mode and the values of the Width, CV, K and EV are amended as defined in 8.6.1.1 and 8.6.1.2. This may result in the Code Block being extended. Run Mode shall then be disabled. 8.6.2.2 8.7 Completion of the Encoding of a Block if, when the final byte of the Block has been encoded and R
42、un Mode is enabled, action shall be taken as defined in 8.6.2.2, except that no byte remains to be encoded in Normal Mode. When this action has been taken, or if Run Mode was not enabled, the encoder shall be cleared as follows: The four bits to the right of the point in the CV shall be appended to
43、the Code Block, starting with the leftmost bit. As each bit is appended to the Code Block a check shall be made to determine whether a byte boundary has been reached. If it has, then the byte just completed shall be checked; if it equals (FF) then (O) shall be appended to the Code Block. Any remaini
44、ng bits from the CV shall be appended to the Code Block. Pad bits and the Trailer shall then be appended to the Code Block as described in 8.3. -7- (n + 1) DATA BLOCKS NO, No.2 N0.3 N0.4 N0.5 No.6 No.7 No.8 No.9 No.10 - I / Encoder Encoder Encoder Encoder Encoder Encoder Encoder Encoder No. O No. 1
45、No. 2 No, 3 No. 4 No, 5 No, 6 No. 7 , - - - Code Code Code Code Code Code Code Code Code Code Code No. 1 No. 2 No. 3 No. 4 No. 5 No. 6 No. 7 No. 8 No. 9 No.10 No.11 Block Block Block Block Block Block Block Block Block Block Block ,. No. n+l - .I 5 12 bytes ROUTE first 512 byte Block from the Logica
46、l-Data-Record to Encoder O and BLOCK-ENCODE (see A.3.2) using Table O. The Code-Block (output of Encoder) is the first portion of the Code-String (compressor output). remainder of Logical-Data-Record is 512 bytes Encoder-Number = Encoder-Number + 1 DO WHILE SET IF Encoder-Number is 17 THEN ROUTE nex
47、t 5 12-byte Block to Encoder (Encoder-Number) and ELSE SET Encoder-Number = O BLOCK-ENCODE using Table (Encoder-Number). ROUTE next 512-byte Block to Encoder O and BLOCK-ENCODE using Table O. APPEND Code-Block to Code-String. END DO SET IF Encoder-Number is S 7 THEN ROUTE remainder of Logical-Data-R
48、ecord to Encoder (Encoder-Number) and ELSE SET EncoderNumber = O Encoder-Number = Encoder-Number + 1 BLOCK-ENCODE using Table (Encoder-Number). ROUTE remainder of Logical-Data-Record to Encoder O and BLOCK-ENCODE using Table O. APPEND Code-Block to Code-String. ROUTE Logical-Data-Record to Encoder O
49、 and BLOCK-ENCODE using Table O. Code-Block (output of Encoder) is the Code-String (compressor output). END PROCESS = COMPACT I iECMA ECHA8159 91 M 3404593 0010557 1 W - 14- A.3.2 BLOCK-ENCODE - Encoding Blocks is the process of converting Blocks of 512 bytes or less into Code-Blocks. Code-Blocks are terminated with Trailers. PROCESS = BLOCK-ENCODE SET Width = 1.0000 SET Current-Value (CV) = 0.0000 SET Previous-Byte = (40) SET Run-Mode = O SET Mc-Count = O000 SET SET Byte-Number = 1 DO WHILE Block-Bytes-Remaining O Block-Bytes-Remaining = number of bytes in Bl