ETSI TS 123 042 V14.0.0 (2017-04)

Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
Compression algorithm for text messaging services
(3GPP TS 23.042 version 14.0.0 Release 14)

TECHNICAL SPECIFICATION

Reference
RTS/TSGC-0123042ve00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée
à la Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by
written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2017.
All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by the ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being
references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Abbreviations
4 Algorithms
4.1 Huffman Coding
4.2 Character Groups
4.3 UCS2
4.4 Keywords
4.5 Punctuation
4.6 Character Sets
5 Compressed Data Streams
5.1 Structure
5.2 Compression Header
5.2.1 Compression Header - Octet 1
5.2.2 Compression Header - Octets 2 to n
5.2.2.1 Compression Header reserved extension types and values
5.2.3 Identifying unique parameter sets
5.3 Compressed Data
5.4 Compression Footer
6 Compression processes
6.1 Overview
6.1.1 Compression
6.1.2 Decompression
6.2 Character sets
6.2.1 Initialization
6.2.2 Character set conversion
6.2.3 Character case conversion
6.3 Punctuation processing
6.3.1 Initialization
6.3.2 Compression
6.3.3 Decompression
6.4 Keywords
6.4.1 Dictionaries
6.4.2 Groups
6.4.3 Matches
6.4.4 Initialization
6.4.5 Compression
6.4.6 Decompression
6.5 UCS2
6.5.1 Initialization
6.5.2 Compression
6.5.3 Decompression
6.6 Character group processing
6.6.1 Character Groups
6.6.2 Initialization
6.6.3 Compression
6.6.4 Decompression
6.7 Huffman coding
6.7.1 Initialization Overview
6.7.2 Initialization
6.7.3 Build Tree
6.7.4 Update Tree
6.7.5 Add New Node
6.7.6 Compression
6.7.7 Decompression
7 Test Vectors
Annex A (normative): German Language parameters
A.1 Compression Language Context
A.2 Punctuators
A.3 Keyword Dictionaries
A.4 Character Groups
A.5 Huffman Initializations
Annex B (normative): English language parameters
B.1 Compression Language Context
B.2 Punctuators
B.3 Keyword Dictionaries
B.4 Character Groups
B.5 Huffman Initializations
Annex C (normative): Italian Language parameters
Annex D (normative): French Language parameters
Annex E (normative): Spanish Language parameters
Annex F (normative): Dutch Language parameters
Annex G (normative): Swedish Language parameters
Annex H (normative): Danish Language parameters
Annex J (normative): Portuguese Language parameters
Annex K (normative): Finnish Language parameters
Annex L (normative): Norwegian Language parameters
Annex M (normative): Greek Language parameters
Annex N (normative): Turkish Language parameters
Annex P (normative): Reserved
Annex Q (normative): Reserved
Annex R (normative): Default Parameters for Unspecified Language
R.1 Compression Language Context
R.2 Punctuators
R.3 Keyword Dictionaries
R.4 Character Groups
R.5 Huffman Initializations
Annex S (informative): Change history
History

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
1 presented to TSG for information;
2 presented to TSG for approval;
3 indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

z the third digit is incremented when editorial only changes have been incorporated in the specification.

Introduction

This clause introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.

Overview

Central to the compression of a stream of data and the subsequent recovery of the original data is that both sender and receiver have information that describes not only the content of the data stream but also how the stream is encoded. For example, a simple rule such as “it's 8-bit data” is enough to transport any character value in
the range 0 to 255, with 8 bits required for every character. In contrast, if both sender and receiver know that some characters are more frequent than others, then the more frequent characters can be encoded in fewer bits and the less frequent ones in more, resulting in a net reduction in the total number of bits used to express the data stream.

This knowledge of the nature of the data stream can be established in two ways. Either sender and receiver agree on key aspects of the data stream before it is processed, or key aspects of the data are gathered dynamically during processing.

The disadvantage of an approach based on “prior information” is that it must be made known to both sender and receiver. It can either be carried as a header to the data stream, in which case it adds to the net size of the compressed stream, or it can be fixed and known to the (de)compression algorithm itself, in which case compression performance degrades as a given stream diverges in nature from these fixed and known states.

In contrast, the disadvantage of “dynamic information” is that it must be discovered; typically this means a greater processing requirement for the (de)compressor. It also implies that compression performance is initially poor, as the algorithm has to “learn” about the data stream before it can apply this knowledge. It also requires more working memory to store its knowledge about the data stream.

The choice of compression algorithm is always a balance between compression rate (in terms of fewer output bits), the working memory requirements of the (de)compressor and CPU bandwidth. For the compression of SMS messages, there is the additional requirement that the algorithm should achieve a good compression rate even on short data streams.

Compression/decompression is an optional feature, but when it is implemented, the only mandatory requirement is the Raw Untrained Dynamic Huffman mode. The default initialization for the Huffman encoder/decoder operating in the Raw Untrained Dynamic Huffman mode is defined in annex R (see also subclause 4.1). That is, there is no need to include any pre-defined attributes such as language dependency. This is of particular significance for entities such as an MS, which may have memory storage constraints.

1 Scope

The present document introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version
of that document in the same Release as the present document.

2.1 Normative references

[1] 3GPP TS 23.038: “Alphabets and language-specific information”.

2.2 Informative references

[2] “The Data Compression Handbook 2nd Edition” by Mark Nelson and Jean-Loup Gailly, published by M

[...]

or b) the (de)coder can adapt the frequency distribution it uses to (de)code characters based on the incidence of previous characters within the input stream.

In both cases, the character frequency distribution is represented in a “tree” structure, an example of which is shown in figure 1.

[Figure 1: Character frequency distribution]

The tree represents the characters Z, W, T, R, A, O and E, which have frequencies of 1, 1, 4, 6, 10, 10 and 40 respectively. The characters may be coded as variable-length bit streams by starting at the “character node” and ascending to the “root node”. At each stage, if a left-hand path is traversed, a 0 bit is emitted, and if a right-hand path is traversed, a 1 bit is emitted. Thus the infrequent characters Z and W each require 5 bits, whereas the most frequent character, E, requires just 1 bit.

The resulting bit stream is decoded by starting at the “root node” and descending the tree, to the left or right depending on the value of the current bit, until a “character node” is reached.

At any time, the trees expressing the character frequencies shall be identical for both coder and decoder. This can be achieved in a number of ways.

Firstly, both coder and decoder could use a fixed, pre-agreed frequency distribution that includes all possible characters. However, as noted above, this use of “prior information” suffers when a given input stream has a significantly different character frequency distribution.

Secondly, the coder may calculate the character frequency distribution for the entire input stream and prepend this information to the encoded bit stream. The decoder then generates the appropriate tree before processing the bit stream. This approach offers good compression, especially if the character frequency information can itself be compressed in some manner. Approaches of this type are common, but the cost of the prepended information relative to a potentially small data stream makes them less attractive.

Thirdly, the algorithm can be extended so that coder and decoder both start with known frequency distributions and subsequently adapt these distributions to reflect the addition of each character in the input stream. One possibility is to have initial distributions that encompass all possible characters, so that all that is required, as each input character is processed, is to increment the appropriate frequency and update the tree. However, the inclusion of all possible characters in the initial distribution means that the tree is relatively slow to adapt, making this approach less appropriate for short messages. An alternative is to have an initial distribution that does not include all possible characters and to add new characters to the distribution if, and when, they occur in the input stream. To achieve the latter approach, th[...]
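The tree-based coding described above can be illustrated with a short, non-normative Python sketch. It is an illustration under stated assumptions, not the procedure defined in clause 6.7: the names `Node`, `build_tree`, `encode_symbol`, `decode`, `adaptive_encode` and `adaptive_decode` are hypothetical; ties between equal frequencies are broken by symbol order, so the exact 0/1 assignments may differ from figure 1 while the code lengths match it; and the adaptive part covers only the first variant of the third approach (all characters present initially, each with an assumed starting frequency of 1), rebuilding the whole tree after every character instead of using the Update Tree and Add New Node steps. The decoder is also assumed to know the message length, since no end-of-stream mechanism is modelled.

```python
import heapq
from itertools import count


class Node:
    """Huffman tree node: leaves carry a symbol, internal nodes carry two children."""

    def __init__(self, freq, symbol=None, left=None, right=None):
        self.freq, self.symbol = freq, symbol
        self.left, self.right = left, right
        self.parent = None


def build_tree(freqs):
    """Build a Huffman tree from {symbol: frequency}; returns (root, {symbol: leaf}).

    Ties are broken by sorted symbol order, then by creation order, so a coder
    and a decoder holding the same frequencies always build the same tree.
    """
    order = count()
    heap = [(f, next(order), Node(f, s)) for s, f in sorted(freqs.items())]
    leaves = {node.symbol: node for _, _, node in heap}
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # lowest frequency -> left (0) branch
        f2, _, b = heapq.heappop(heap)  # next lowest -> right (1) branch
        parent = Node(f1 + f2, left=a, right=b)
        a.parent = b.parent = parent
        heapq.heappush(heap, (parent.freq, next(order), parent))
    return heap[0][2], leaves


def encode_symbol(symbol, leaves):
    """Ascend from the character node to the root, collecting 0 (left) or 1 (right)."""
    node, bits = leaves[symbol], []
    while node.parent is not None:
        bits.append("0" if node.parent.left is node else "1")
        node = node.parent
    # Bits were gathered leaf-to-root; emit them root-to-leaf so the decoder can descend.
    return "".join(reversed(bits))


def decode(bits, root):
    """Descend from the root on each bit until a character node is reached."""
    out, node = [], root
    for bit in bits:
        node = node.left if bit == "0" else node.right
        if node.symbol is not None:
            out.append(node.symbol)
            node = root
    return "".join(out)


def adaptive_encode(text, alphabet):
    """All characters start with frequency 1 (assumed); frequencies grow as text is coded."""
    freqs = {s: 1 for s in alphabet}
    out = []
    for ch in text:
        _, leaves = build_tree(freqs)  # naive full rebuild, not an in-place tree update
        out.append(encode_symbol(ch, leaves))
        freqs[ch] += 1  # the decoder makes exactly the same update
    return "".join(out)


def adaptive_decode(bits, alphabet, length):
    """Mirror of adaptive_encode; `length` is assumed to be known to the decoder."""
    freqs = {s: 1 for s in alphabet}
    out, pos = [], 0
    for _ in range(length):
        node, _ = build_tree(freqs)
        while node.symbol is None:
            node = node.left if bits[pos] == "0" else node.right
            pos += 1
        out.append(node.symbol)
        freqs[node.symbol] += 1
    return "".join(out)


if __name__ == "__main__":
    figure1 = {"Z": 1, "W": 1, "T": 4, "R": 6, "A": 10, "O": 10, "E": 40}
    root, leaves = build_tree(figure1)
    lengths = {s: len(encode_symbol(s, leaves)) for s in figure1}
    print(lengths)  # {'Z': 5, 'W': 5, 'T': 4, 'R': 3, 'A': 3, 'O': 3, 'E': 1}
    print(sum(figure1[s] * lengths[s] for s in figure1), "bits vs", 8 * sum(figure1.values()))
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
    packed = adaptive_encode("THE TREE", alphabet)
    print(adaptive_decode(packed, alphabet, len("THE TREE")))  # THE TREE
```

For the figure 1 frequencies the sketch reproduces the code lengths given in the text (5 bits for Z and W, 1 bit for E), so the 72 characters take 144 bits instead of 576 bits at a fixed 8 bits per character. The adaptive functions show why coder and decoder stay in step: both rebuild the tree from identical frequency tables before each character and apply the identical increment afterwards.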