ECMA321-2001 Streaming Lossless Data Compression Algorithm - (SLDC).pdf

上传人:rimleave225 文档编号:704925 上传时间:2019-01-03 格式:PDF 页数:20 大小:59.70KB
下载 相关 举报
ECMA321-2001 Streaming Lossless Data Compression Algorithm - (SLDC).pdf_第1页
第1页 / 共20页
ECMA321-2001 Streaming Lossless Data Compression Algorithm - (SLDC).pdf_第2页
第2页 / 共20页
ECMA321-2001 Streaming Lossless Data Compression Algorithm - (SLDC).pdf_第3页
第3页 / 共20页
ECMA321-2001 Streaming Lossless Data Compression Algorithm - (SLDC).pdf_第4页
第4页 / 共20页
ECMA321-2001 Streaming Lossless Data Compression Algorithm - (SLDC).pdf_第5页
第5页 / 共20页
点击查看更多>>
资源描述

1、 Standard ECMA-321 June 2001 Standardizing Information and Communication Systems Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http:/www.ecma.ch - Internet: helpdeskecma.ch Streaming Lossless Data Compression Algorithm (SLDC) . Standard ECMA-321 June 2001 Standardizing Information and Commu

2、nication Systems Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http:/www.ecma.ch - Internet: helpdeskecma.ch P ECMA-321.doc 09-07-01 Streaming Lossless Data Compression Algorithm (SLDC) . Brief History In the past decades, ECMA has published numerous ECMA Standards for magnetic tapes, magne

3、tic tape cassettes and cartridges, as well as for optical disk cartridges. Those media developed recently have a very high physical recording density. In order to make optimal use of the resulting data capacity, lossless compression algorithms have been designed that allow a reduction of the number

4、of bits required for the representation of user data. These compression algorithms are registered by ECMA, the International Registration Authority established by ISO/IEC. The registration consists in allocating to each registered algorithm a numerical identifier that will be recorded on the medium

5、and, thus, indicate which compression algorithm(s) has been used. This ECMA Standard is the fourth ECMA Standard for compression algorithms. The three previous standards are: ECMA-151 ISO/IEC 11558 Data Compression for Information Interchange Adaptive Coding with Embedded Dictionary DCLZ Algorithm (

6、June 1991) ECMA-159 ISO/IEC 12042 Data Compression for Information Interchange - Binary Arithmetic Coding Algorithm - (December 1991) ECMA-222 ISO/IEC 15220 Adaptive Lossless Data Compression Algorithm (June 1995) This ECMA Standard ECMA-SLDC is based on ECMA-222. It has been extended by defining a

7、set of control symbols, that identify: record boundaries in user data locations in the Encoded Data Stream at which the history buffer is reset locations in the Encoded Data Stream at which pad bits are inserted up to the next 32-bit boundary sections in the Encoded Data Stream that contain compress

8、ed or uncompressed data This compression algorithm allows for records of different size and compressibility, along with File Marks, to be efficiently encoded into an output stream in which little or no additional control information is needed to later decode user data. All ECMA Standards listed abov

9、e have been adopted by ISO/IEC as International Standards. This ECMA Standard will also be contributed to ISO/IEC for adoption as an International Standard under the fast-track procedure. This ECMA Standard has been adopted by the ECMA General Assembly of June 2001. - i - Table of contents 1 Scope 1

10、 2 Conformance 1 3 Reference 1 4 Definitions 1 4.1 Control Symbol 1 4.2 Copy Pointer 1 4.3 data byte 1 4.4 Data Symbol 1 4.5 Displacement Field 1 4.6 Encoded Data Stream 1 4.7 Encoded Record 1 4.8 End Marker 1 4.9 End Of Record Symbol (EOR Symbol) 1 4.10 File Mark 1 4.11 File Mark Symbol 2 4.12 Flus

11、h Symbol 2 4.13 History Buffer 2 4.14 Literal 1 2 4.15 Literal 2 2 4.16 Matching String 2 4.17 Match Count 2 4.18 Match Count Field 2 4.19 Pad 2 4.20 Record 2 4.21 Record Segment 2 4.22 Re-entry point 1 4.23 Reset X Symbol 2 4.24 Reset 1 Symbol 2 4.25 Reset 2 Symbol 2 4.26 scheme 1 2 4.27 Scheme 1 S

12、ymbol 2 4.28 scheme 2 2 4.29 Scheme 2 Symbol 3 4.30 user data 3 5 Conventions and Notations 3 5.1 Representation of numbers 3 5.2 Names 3 6 Acronyms 3 7 Algorithm Overview 3 7.1 Scheme 1 Encoding 3 - ii - 7.2 Scheme 2 Encoding 3 7.3 History Buffer 4 8 Encoding Specification 4 8.1 User Data 4 8.2 His

13、tory Buffer 4 8.3 Encoded Data Stream 4 8.3.1 Re-entry Point 5 8.4 Data Symbols 5 8.4.1 Literal 1 Data Symbols 5 8.4.2 Copy Pointer Data Symbols 5 8.4.3 Literal 2 Data Symbols 7 8.5 Control Symbols 7 8.6 Pad 8 1 Scope This ECMA Standard specifies a lossless compression algorithm to reduce the number

14、 of 8-bit bytes required to represent data records and File Marks. The algorithm is known as Streaming Lossless Data Compression algorithm (SLDC). One buffer size (1 024 bytes) is specified. The numerical identifier according to ISO/IEC 11576 allocated to this algorithm is 6. 2 Conformance A compres

15、sion algorithm shall be in conformance with this ECMA Standard if its Encoded Data Stream satisfies the requirements of this ECMA Standard. 3 Reference ISO/IEC 11576:1993, Information Technology - Procedure for the Registration of Algorithms for the Lossless Compression of Data. 4 Definitions 4.1 Ac

16、cess Point A location in the Encoded Data Stream at which data may be decoded. 4.2 Control Symbol A Control Symbol may change the compression scheme, reset the History Buffer, mark the end of a Record, indicate a File Mark, or indicate the termination of an Encoded Data Stream. 4.3 Copy Pointer A pa

17、rt of the Encoded Data Stream output in scheme 1 that replaces a string of data bytes with a specification of a Matching String. 4.4 data byte An element of user data that is to be encoded. 4.5 Data Symbol An element of an Encoded Record that represents one or more data bytes. 4.6 Displacement Field

18、 A field in the Copy Pointer that specifies the location within the History Buffer of the first byte of a Matching String. 4.7 Encoded Data Stream The output stream after encoding User Data. 4.8 Encoded Record The output stream after encoding one Record of user data. 4.9 End Marker A Control Symbol

19、that denotes termination of an Encoded Data Stream. 4.10 End Of Record Symbol (EOR Symbol) A Control Symbol that denotes the end of a Record in the Encoded Data Stream. 4.11 File Mark A recorded element used to mark organisational boundaries (e.g. directory boundaries) in user data. - 2 - 4.12 File

20、Mark Symbol A Control Symbol in Encoded Data Stream that denotes a File Mark in user data. 4.13 Flush Symbol A Control Symbol that, if required, is followed by Pad to make the size of the Encoded Data Stream an integer multiple of 32 bits. 4.14 History Buffer A data structure where incoming data byt

21、es are stored for use by scheme 1 compression and decompression. 4.15 Literal 1 A part of the Encoded Data Stream, output in scheme 1, that represents a single data byte not encoded into any Copy Pointer. 4.16 Literal 2 A part of the Encoded Data Stream, output in scheme 2, that represents a single

22、data byte. 4.17 Matching String A sequence of two or more bytes in the History Buffer that is identical with a sequence of bytes in the user data. 4.18 Match Count The length, in bytes, of a Matching String. 4.19 Match Count Field That part of a Copy Pointer that specifies the Match Count. 4.20 Pad

23、A number of bits inserted into the Encoded Data Stream so that the size of Encoded Data Stream is an integer multiple of 32 bits. 4.21 Record An element of user data that contains at least one data byte. 4.22 Record Segment A section of a Record encoded in a given scheme. 4.23 Reset X Symbol A gener

24、ic reference to either the Reset 1 Symbol or the Reset 2 Symbol. 4.24 Reset 1 Symbol A Control Symbol that indicates History Buffer reset, and that subsequent symbols are encoded in scheme 1. 4.25 Reset 2 Symbol A Control Symbol that indicates History Buffer reset, and that subsequent symbols are en

25、coded in scheme 2. 4.26 scheme 1 A compression scheme that uses a History Buffer to achieve data compression. 4.27 Scheme 1 Symbol A Control Symbol that indicates subsequent Data Symbols are either Copy Pointers or Literal 1s. 4.28 scheme 2 A packing scheme designed to encode uncompressible data wit

26、h minimal expansion. - 3 - 4.29 Scheme 2 Symbol A Control Symbol that indicates subsequent Data Symbols are encoded in scheme 2. 4.30 user data Information that is to be encoded, according to this compression al gorithm. 5 Conventions and Notation 5.1 Representation of numbers The following conventi

27、ons and notations apply in this document unless otherwise stated. The setting of bits is denoted by ZERO or ONE. Numbers in binary notation and bit combinations are strings of digits represented by ZEROs and ONEs with the most significant bit to the left. Letters and digits in parentheses represent

28、numbers in hexadecimal notation. All other numbers are in decimal form 5.2 Names The names of basic elements, e.g. specific fields, are written with a capital initial letter. 6 Acronyms EOR End Of Record lsb least significant bit msb most significant bit 7 Algorithm Overview User data that is to be

29、compressed according to this ECMA Standard consists of Records and File Marks. Records consist of 8-bit data bytes, and may be of any non-zero length. Data bytes may be encoded in either scheme 1 or scheme 2. 7.1 Scheme 1 Encoding There may exist within Records repeating strings of two or more data

30、bytes such that information about the length and position of one string may be substituted in place of a subsequent copy or copies of that same string. This information is known as a Copy Pointer. This ECMA Standard allows Copy Pointer substitution when corresponding bytes of the two strings are off

31、set by 1 to 1 023 data bytes within user data. Where string matches occur, data compression is possible, and the number of bits of encoded data can be less than the number of bits of user data, and data compression is possible. Any data bytes that are part of a repeated string may be encoded as a Co

32、py Pointer. Any data byte that is not encoded as a Copy Pointer is encoded as a Literal 1, in which a leading bit set to ZERO is added to the data byte, thereby indicating that this is a Literal 1. Regions over which Copy Pointers and literal values are encoded are defined as being encoded according

33、 to scheme 1. Scheme 1 encoding is identical with that of ECMA-222, except for the addition of Control Symbols. These are both implementations of the Lempel-Ziv 1 (LZ1) class of data compression algorithms. Following a Reset 1 Symbol or a Scheme 1 Symbol, all bytes of user data shall be encoded acco

34、rding to scheme 1. 7.2 Scheme 2 Encoding There may also exist within user data, regions in which few such repeating strings exist. Where there are no repeating strings, scheme1 encoding requires a 9-bit Literal 1value in the Encoded Data Stream for every data byte. This results in an Encoded Data St

35、ream that has 12,5 % more bits than the user data. In order to avoid this data expansion, scheme 2 encoding may be used. In scheme 2 encoding, data bytes are copied to the output bit stream. In order for a decoder to distinguish a data byte set to (FF) from a Control Symbol, a trailing bit set to ZE

36、RO is encoded following every data byte of (FF). For random data, this tends to - 4 - produce an Encoded Data Stream that has about 0,05 % more bits than the user data. Following a Reset 2 Symbol or a Scheme 2 Symbol, all bytes of user data shall be encoded according to scheme 2. 7.3 History Buffer

37、Matching strings are found within a 1 024-byte History Buffer. Prior to a Reset X Symbol in the Encoded Data Stream, the History Buffer is undefined. Immediately following a Reset X Symbol, the History Buffer is defined as containing no data. As the first 1 024 data bytes following a Reset X Symbol

38、are recorded, each byte is stored in a subsequent location in the History Buffer, from 0 to 1 023. For each data byte N, comparisons may be made with each of the data bytes at locations 0 to N-1 to test for Matching Strings. Once the History Buffer is filled, new bytes replace previously stored byte

39、s in locations 0 to 1 023. The storage location wraps from 1 023 to 0. For a data byte stored at location N, comparisons may be made with each of the data bytes at locations other than N, to test for Matching Strings. Matching Strings may wrap around the end of the History Buffer (e.g. Offset 1 022,

40、 Length 10). By updating the History Buffer identically during decoding, the decoder History Buffer shall be identical, after outputting any specific data byte, with the encoder History Buffer after encoding that same data byte. It is, therefore, not necessary to separately include history content i

41、nformation within the Encoded Data Stream. This ECMA Standard does not specify the conditions under which to reset the History Buffer, switch between scheme 1 and scheme 2, or flush to a 32-bit boundary. 8 Encoding Specification 8.1 User Data User data shall consist of Records and File Marks. Record

42、s may contain any number of 8-bit data bytes. File Marks may be used to mark organisational boundaries (e.g. directory boundaries) in user data. 8.2 History Buffer A History Buffer shall be filled with 1 024 byte locations, numbered 0 to 1 023. For any data byte being recorded, the History Buffer ma

43、y contain the 1 023 preceding data bytes. The content of the History Buffer is initially undefined. Immediately following a Reset X Symbol, the History Buffer is said to contain no data. As each byte N of the first 1 024 bytes are recorded, that byte is stored to location N, from 0 to 1 023, and His

44、tory Buffer locations 0 to N-1 may be examined to find matching data bytes. After a data byte is recorded in location 1 023, the next data byte is recorded in location 0, replacing the first data byte that was stored there. As each subsequent data byte is stored to location N, all History Buffer loc

45、ations other than location N may be examined to find matching data bytes. Data bytes shall be recorded sequentially into the History Buffer regardless of Record boundaries, File Marks, or encoding scheme. A Reset X Symbol causes the next data byte to be stored to location 0, regardless of the locati

46、on to which any previous byte was stored. 8.3 Encoded Data Stream An Encoded Data Stream is a stream of Data Symbols, Control Symbols, and Pads. These shall be packed into consecutive bytes, starting with the most significant bit of byte 0. Figure 1 shows a Control Symbol followed by Data Symbol pac

47、ked into bytes. - 5 - Control Symbol Data Symbol . Encoded Data Symbols msb lsb msb lsb Byte 0 Byte 1 Byte 2 Encoded Data Bytes msb lsb msb lsb msb lsb Figure 1 Encoded Data Stream, packing into bytes NOTE Although an Encoded Data Stream is filled with Pad to 32-bit boundaries, packing is expressed

48、by 8-bit bytes. This allows byte ordering within larger words to be specified by any application that uses this compression algorithm. 8.3.1 Access Point An Access Point is a location in the Encoded Data Stream at which data may be decoded, starting with either a File Mark or the first data byte of

49、a Record. An Access Point must meet the following three requirements. It shall occur at a 32-bit boundary. (Previous data shall be padded). A Reset X Symbol shall precede the first Data Symbol following the Access Point. The first Data Symbol following the Access Point shall represent the first data byte in a Record. 8.4 Data Symbols Data Symbols shall be used to represent bytes of user data. The Data Symbols are Literal 1, Copy Pointer, or Literal 2. Following either a Scheme 1 Symbol or a Reset 1 Symbol, only Literal 1 and Copy Pointer Symbols shall be

展开阅读全文
相关资源
猜你喜欢
相关搜索

当前位置:首页 > 标准规范 > 国际标准 > 其他

copyright@ 2008-2019 麦多课文库(www.mydoc123.com)网站版权所有
备案/许可证编号:苏ICP备17064731号-1