ETSI TS 123 042 V13.0.0 (2016-03)

TECHNICAL SPECIFICATION

Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
Compression algorithm for text messaging services
(3GPP TS 23.042 version 13.0.0 Release 13)

Reference
RTS/TSGC-0123042vd00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from: http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp

If you find errors in the
present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2016. All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI
members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as
being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Abbreviations
4 Algorithms
4.1 Huffman Coding
4.2 Character Groups
4.3 UCS2
4.4 Keywords
4.5 Punctuation
4.6 Character Sets
5 Compressed Data Streams
5.1 Structure
5.2 Compression Header
5.2.1 Compression Header - Octet 1
5.2.2 Compression Header - Octets 2 to n
5.2.2.1 Compression Header reserved extension types and values
5.2.3 Identifying unique parameter sets
5.3 Compressed Data
5.4 Compression Footer
6 Compression processes
6.1 Overview
6.1.1 Compression
6.1.2 Decompression
6.2 Character sets
6.2.1 Initialization
6.2.2 Character set conversion
6.2.3 Character case conversion
6.3 Punctuation processing
6.3.1 Initialization
6.3.2 Compression
6.3.3 Decompression
6.4 Keywords
6.4.1 Dictionaries
6.4.2 Groups
6.4.3 Matches
6.4.4 Initialization
6.4.5 Compression
6.4.6 Decompression
6.5 UCS2
6.5.1 Initialization
6.5.2 Compression
6.5.3 Decompression
6.6 Character group processing
6.6.1 Character Groups
6.6.2 Initialization
6.6.3 Compression
6.6.4 Decompression
6.7 Huffman coding
6.7.1 Initialization Overview
6.7.2 Initialization
6.7.3 Build Tree
6.7.4 Update Tree
6.7.5 Add New Node
6.7.6 Compression
6.7.7 Decompression
7 Test Vectors
Annex A (normative): German Language parameters
A.1 Compression Language Context
A.2 Punctuators
A.3 Keyword Dictionaries
A.4 Character Groups
A.5 Huffman Initializations
Annex B (normative): English language parameters
B.1 Compression Language Context
B.2 Punctuators
B.3 Keyword Dictionaries
B.4 Character Groups
B.5 Huffman Initializations
Annex C (normative): Italian Language parameters
Annex D (normative): French Language parameters
Annex E (normative): Spanish Language parameters
Annex F (normative): Dutch Language parameters
Annex G (normative): Swedish Language parameters
Annex H (normative): Danish Language parameters
Annex J (normative): Portuguese Language parameters
Annex K (normative): Finnish Language parameters
Annex L (normative): Norwegian Language parameters
Annex M (normative): Greek Language parameters
Annex N (normative): Turkish Language parameters
Annex P (normative): Reserved
Annex Q (normative): Reserved
Annex R (normative): Default Parameters for Unspecified Language
R.1 Compression Language Context
R.2 Punctuators
R.3 Keyword Dictionaries
R.4 Character Groups
R.5 Huffman Initializations
Annex S (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3GPP.

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
   1 presented to TSG for information;
   2 presented to TSG for approval;
   3 Indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z the third digit
is incremented when editorial only changes have been incorporated in the specification;

Introduction

This clause introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.

Overview

Central to the compression of a stream of data and the subsequent recovery of the original data is the fact that both sender and receiver have information that describes not only the content of the data stream, but also how the stream is encoded. For example, a simple rule such as “it's 8 bit data” is enough to transport any character value in the range 0 to 255 with 8 bits being
required for each and every character. In contrast, if both sender and receiver know that some characters are more frequent than others, then the more frequent might be encoded in fewer bits while the less frequent in more - resulting in a net reduction of the total number of bits used to express the data stream.

This knowledge of the nature of the data stream can be established in two ways. Either both sender and receiver can agree some key aspects of the data stream prior to it being processed, or key aspects of the data can be garnered dynamically during its processing.

The disadvantage of an approach based on “prior information” is that it must be known. It can either be carried as a header to the data stream, in which case it adds to the net size of the compressed stream, or it can be fixed and known to the (de)compression algorithm itself, in which case compression performance degrades as a given stream diverges in nature from these fixed and known states.

In contrast, the disadvantage of “dynamic information” is that it must be discovered; typically this means a greater processing requirement for the (de)compressor. It also implies that compression performance is initially poor as the algorithm has to “learn” about the data stream before it can apply this knowledge. It will also require greater working memory to store its knowledge about the data stream.

The choice of compression algorithms is always a balancing of compression rate (in terms of fewer output bits), working memory requirements of the (de)compressor and CPU bandwidth. For the compression of SMS messages, there is the additional requirement that it should work well (in terms of compression rate) even on short data streams.

Compression / Decompression is an optional feature but when implemented, the only
mandatory requirement is “Raw Untrained Dynamic Huffman”. The default initialisations for the Huffman Encoder / Decoder operating in the Raw Untrained Dynamic Huffman mode are defined in annex R (see also subclause 4.1), i.e. there is no need for any pre-defined attributes such as language dependency to be included. This is of particular significance for entities such as an MS which may have memory storage constraints.

1 Scope

The present document introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

2.1 Normative references

[1] 3GPP TS 23.038: “Alphabets and language-specific information”.

2.2 Informative references

[2] “The Data Compression Handbook 2nd Edition” by Mark Nelson and Jean-Loup Gailly, published by M

[...]

or b) the (de)coder can adapt the frequency distribution
it uses to (de)code characters based on the incidence of previous characters within the input stream.

In both cases, the character frequency distribution is represented in a “tree” structure, an example of which is shown in figure 1.

[Figure 1: tree with character nodes “Z” (f=1), “W” (f=1), “T” (f=4), “R” (f=6), “A” (f=10), “O” (f=10) and “E” (f=40), intermediate nodes with f=2, f=6, f=12, f=20 and f=32, and a root node with f=72]

Figure 1: Character frequency distribution

The tree represents the characters Z, W, T, R, A, O and E which have frequencies of 1, 1, 4, 6, 10, 10 and 40 respectively. The characters may be coded as variable length bit streams by starting at the “character node” and ascending to the “root node”. At each stage, if a left hand path is traversed, a 0 bit is emitted and if a right hand path is traversed a 1 bit is emitted. Thus the infrequent Z and W would require 5 bits, whereas the most frequent character E requires just 1 bit.

The resulting bit stream is decoded by
starting at the “root node” and descending the tree, to the left or right depending on the value of the current bit, until a “character node” is reached.

It is a requirement that at any time the trees expressing the character frequencies shall be identical for both coder and decoder. This can be achieved in a number of ways.

Firstly, both coder and decoder could use a fixed and pre-agreed frequency distribution that includes all possible characters but, as noted above, this use of “prior information” suffers when a given input stream has a significantly different character frequency distribution.

Secondly, the coder may calculate the character frequency distribution for the entire input stream and prepend this information to the encoded bit stream. The decoder would then generate the appropriate tree prior to processing the bit stream. This approach offers good compression, especially if the character frequency information may itself be compressed in some manner. Approaches of this type are common but the cost of the prepended information for a potentially small data stream makes it less attractive.

Thirdly, the algorithm can be extended such that both coder and decoder start with known frequency distributions and subsequently adapt these distributions to reflect the addition of each character in the input stream. One possibility is to have initial distributions that encompass all possible characters so that all that is required, as each input character is processed, is to increment the appropriate frequency and update the tree. However, the inclusion of all possible characters in the initial distribution means that the tree is relatively slow to adapt, making this approach less appropriate for short messages. An alternative is to have an initial distribution that does not include all possible characters and to add new characters to the distribution if, and when, they occur in the input stream.

To achieve the latter approach, the concept of a “special” character is
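
As a rough illustration of the fixed-distribution case discussed around figure 1, the following Python sketch builds a Huffman tree from the figure 1 frequencies and reports how many bits each character would need. It is a sketch under stated assumptions, not the adaptive procedure of the present document: the name huffman_code_lengths and the heap-based construction are chosen here for illustration only, the tree is static (no update as characters arrive), and only code lengths are computed because the left/right placement of nodes in the figure is not relied upon.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a static Huffman tree.

    Hypothetical helper for illustration; not defined by the specification.
    """
    tiebreak = count()  # unique counter so tied frequencies never compare dicts
    # Each heap entry is (frequency, tiebreak, {symbol: depth below this node}).
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Joining two subtrees under a new node moves every symbol one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Frequencies taken from figure 1.
FIGURE_1_FREQS = {"Z": 1, "W": 1, "T": 4, "R": 6, "A": 10, "O": 10, "E": 40}

lengths = huffman_code_lengths(FIGURE_1_FREQS)
huffman_bits = sum(FIGURE_1_FREQS[s] * lengths[s] for s in FIGURE_1_FREQS)
fixed_bits = 8 * sum(FIGURE_1_FREQS.values())

print(lengths)
print(huffman_bits, fixed_bits)
```

With these frequencies the sketch assigns 5 bits to Z and W and 1 bit to E, matching the text, and the 72 characters cost 144 bits in total against 576 bits at a fixed 8 bits per character.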