ETSI TS 123 042-2016 Digital cellular telecommunications system (Phase 2+) (GSM) Universal Mobile Telecommunications System (UMTS) LTE Compression algorithm for text messaging serv.pdf

Uploaded by: eveningprove235 | Document ID: 740684 | Uploaded: 2019-01-11 | Format: PDF | Pages: 77 | Size: 350.28 KB
Resource description

ETSI TS 123 042 V13.0.0 (2016-03)

Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
Compression algorithm for text messaging services
(3GPP TS 23.042 version 13.0.0 Release 13)

TECHNICAL SPECIFICATION

Reference
RTS/TSGC-0123042vd00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non
lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at
http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2016.
All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as
being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Abbreviations
4 Algorithms
4.1 Huffman Coding
4.2 Character Groups
4.3 UCS2
4.4 Keywords
4.5 Punctuation
4.6 Character Sets
5 Compressed Data Streams
5.1 Structure
5.2 Compression Header
5.2.1 Compression Header - Octet 1
5.2.2 Compression Header - Octets 2 to n
5.2.2.1 Compression Header reserved extension types and values
5.2.3 Identifying unique parameter sets
5.3 Compressed Data
5.4 Compression Footer
6 Compression processes
6.1 Overview
6.1.1 Compression
6.1.2 Decompression
6.2 Character sets
6.2.1 Initialization
6.2.2 Character set conversion
6.2.3 Character case conversion
6.3 Punctuation processing
6.3.1 Initialization
6.3.2 Compression
6.3.3 Decompression
6.4 Keywords
6.4.1 Dictionaries
6.4.2 Groups
6.4.3 Matches
6.4.4 Initialization
6.4.5 Compression
6.4.6 Decompression
6.5 UCS2
6.5.1 Initialization
6.5.2 Compression
6.5.3 Decompression
6.6 Character group processing
6.6.1 Character Groups
6.6.2 Initialization
6.6.3 Compression
6.6.4 Decompression
6.7 Huffman coding
6.7.1 Initialization Overview
6.7.2 Initialization
6.7.3 Build Tree
6.7.4 Update Tree
6.7.5 Add New Node
6.7.6 Compression
6.7.7 Decompression
7 Test Vectors
Annex A (normative): German Language parameters
A.1 Compression Language Context
A.2 Punctuators
A.3 Keyword Dictionaries
A.4 Character Groups
A.5 Huffman Initializations
Annex B (normative): English language parameters
B.1 Compression Language Context
B.2 Punctuators
B.3 Keyword Dictionaries
B.4 Character Groups
B.5 Huffman Initializations
Annex C (normative): Italian Language parameters
Annex D (normative): French Language parameters
Annex E (normative): Spanish Language parameters
Annex F (normative): Dutch Language parameters
Annex G (normative): Swedish Language parameters
Annex H (normative): Danish Language parameters
Annex J (normative): Portuguese Language parameters
Annex K (normative): Finnish Language parameters
Annex L (normative): Norwegian Language parameters
Annex M (normative): Greek Language parameters
Annex N (normative): Turkish Language parameters
Annex P (normative): Reserved
Annex Q (normative): Reserved
Annex R (normative): Default Parameters for Unspecified Language
R.1 Compression Language Context
R.2 Punctuators
R.3 Keyword Dictionaries
R.4 Character Groups
R.5 Huffman Initializations
Annex S (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3GPP.

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
  1 presented to TSG for information;
  2 presented to TSG for approval;
  3 Indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z the third digit
is incremented when editorial only changes have been incorporated in the specification.

Introduction

This clause introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.

Overview

Central to the compression of a stream of data and the subsequent recovery of the original data is that both sender and receiver have information that describes not only the content of the data stream but also how the stream is encoded. For example, a simple rule such as “it's 8 bit data” is enough to transport any character value in the range 0 to 255, with 8 bits being required for each and every character. In contrast, if both sender and receiver know that some characters are more frequent than others, then the more frequent might be encoded in fewer bits and the less frequent in more, resulting in a net reduction of the total number of bits used to express the data stream.

This knowledge of the nature of the data stream can be established in two ways. Either both sender and receiver agree some key aspects of the data stream prior to it being processed, or key aspects of the data are garnered dynamically during its processing.

The disadvantage of an approach based on “prior information” is that it must be known. It can either be carried as a header to the data stream, in which case it adds to the net size of the compressed stream, or it can be fixed and known to the (de)compression algorithm itself, in which case compression performance degrades as a given stream diverges in nature from these fixed and known states.

In contrast, the disadvantage of “dynamic information” is that it must be discovered; typically this means a greater processing requirement for the (de)compressor. It also implies that compression performance is initially poor, as the algorithm has to “learn” about the data stream before it can apply this knowledge. It will also require greater working memory to store its knowledge about the data stream.

The choice of compression algorithm is always a balance between compression rate (in terms of fewer output bits), working memory requirements of the (de)compressor and CPU bandwidth. For the compression of SMS messages, there is the additional requirement that it should work well (in terms of compression rate) even on short data streams.

Compression / Decompression is an optional feature but when implemented, the only
mandatory requirement is “Raw Untrained Dynamic Huffman”. The default initialisation for the Huffman Encoder / Decoder operating in the Raw Untrained Dynamic Huffman mode is defined in annex R (see also subclause 4.1), i.e. there is no need for any pre-defined attributes such as language dependency to be included. This is of particular significance for entities such as an MS which may have memory storage constraints.

1 Scope

The present document introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

2.1 Normative references

[1] 3GPP TS 23.038: “Alphabets and language-specific information”.

2.2 Informative references

[2] “The Data Compression Handbook 2nd Edition” by Mark Nelson and Jean-Loup Gailly, published by M

[...]

or b) the (de)coder can adapt the frequency distribution
it uses to (de)code characters based on the incidence of previous characters within the input stream.

In both cases, the character frequency distribution is represented in a “tree” structure, an example of which is shown in figure 1.

Figure 1: Character frequency distribution
(tree nodes recovered from the figure labels:
  “Z” f=1 and “W” f=1 join in Node f=2;
  Node f=2 and “T” f=4 join in Node f=6;
  Node f=6 and “R” f=6 join in Node f=12;
  “A” f=10 and “O” f=10 join in Node f=20;
  Node f=12 and Node f=20 join in Node f=32;
  Node f=32 and “E” f=40 join in Root Node f=72)

The tree represents the characters Z, W, T, R, A, O and E which have frequencies of 1, 1, 4, 6, 10, 10 and 40 respectively.

The characters may be coded as variable length bit streams by starting at the “character node” and ascending to the “root node”. At each stage, if a left hand path is traversed, a 0 bit is emitted and if a right hand path is traversed a 1 bit is emitted. Thus the infrequent Z and W would require 5 bits, whereas the most frequent character E requires just 1 bit.

The resulting bit stream is decoded by
starting at the “root node” and descending the tree, to the left or right depending on the value of the current bit, until a “character node” is reached.

It is a requirement that at any time the trees expressing the character frequencies shall be identical for both coder and decoder. This can be achieved in a number of ways.

Firstly, both coder and decoder could use a fixed and pre-agreed frequency distribution that includes all possible characters but, as noted above, this use of “prior information” suffers when a given input stream has a significantly different character frequency distribution.

Secondly, the coder may calculate the character frequency distribution for the entire input stream and prepend this information to the encoded bit stream. The decoder would then generate the appropriate tree prior to processing the bit stream. This approach offers good compression, especially if the character frequency information may itself be compressed in some manner. Approaches of this type are common but the cost of the prepended information for a potentially small data stream makes it less attractive.

Thirdly, the algorithm may be extended such that both coder and decoder start with known frequency distributions and subsequently adapt these distributions to reflect the addition of each character in the input stream. One possibility is to have initial distributions that encompass all possible characters so that all that is required, as each input character is processed, is to increment the appropriate frequency and update the tree. However, the inclusion of all possible characters in the initial distribution means that the tree is relatively slow to adapt, making this approach less appropriate for short messages. An alternative is to have an initial distribution that does not include all possible characters and to add new characters to the distribution if, and when, they occur in the input stream. To achieve the latter approach, the concept of a “special” character is
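
The code lengths quoted for figure 1 can be checked with a short sketch. The following Python is illustrative only and is not taken from the specification: the function name `huffman_code_lengths` and the use of a binary heap are assumptions made for this example. It builds a static Huffman tree from the seven frequencies of figure 1 and reports how many bits each character needs.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a static Huffman tree.

    Hypothetical helper for illustration; not a procedure defined by the
    specification.
    """
    tiebreak = count()  # unique sequence number, so tied entries never compare lists
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        # Join the two least frequent nodes under a new parent node.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # every symbol below the new node gains one bit
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


# Frequencies of the characters shown in figure 1.
FIGURE_1 = {"Z": 1, "W": 1, "T": 4, "R": 6, "A": 10, "O": 10, "E": 40}

lengths = huffman_code_lengths(FIGURE_1)
total_bits = sum(FIGURE_1[sym] * lengths[sym] for sym in FIGURE_1)
fixed_bits = 8 * sum(FIGURE_1.values())

print(lengths)
print(total_bits, "bits with the tree,", fixed_bits, "bits at 8 bits per character")
```

With these frequencies the sketch gives 5 bits for Z and W, 4 for T, 3 for R, A and O, and 1 for E, in line with the text above. The 72 characters then cost 144 bits in total, against 576 bits at a fixed 8 bits per character.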

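The adaptive approach, in which coder and decoder start from the same known distribution and update it after every character, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the specification: the code table is rebuilt from scratch after each character instead of being updated in place, the initial distribution gives every character of a small assumed alphabet a count of 1, the “special” character is not modelled, and the names `build_codes`, `encode` and `decode` are hypothetical.

```python
import heapq


def build_codes(counts):
    """Build a prefix code table {symbol: bit string} from symbol counts.

    Ties are broken by a deterministic sequence number, so coder and decoder
    always derive the same table from the same counts.
    """
    heap = [(n, i, sym) for i, (sym, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    seq = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, seq, (left, right)))
        seq += 1
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # character node
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text, alphabet):
    counts = {sym: 1 for sym in alphabet}  # assumed initial distribution
    bits = []
    for ch in text:
        bits.append(build_codes(counts)[ch])
        counts[ch] += 1  # adapt only after the character has been coded
    return "".join(bits)


def decode(bits, alphabet):
    counts = {sym: 1 for sym in alphabet}  # same starting point as the coder
    out = []
    pos = 0
    while pos < len(bits):
        inverse = {code: sym for sym, code in build_codes(counts).items()}
        end = pos + 1
        while bits[pos:end] not in inverse:
            end += 1
            if end > len(bits):
                raise ValueError("truncated or corrupt bit stream")
        sym = inverse[bits[pos:end]]
        out.append(sym)
        counts[sym] += 1  # mirror the coder's update
        pos = end
    return "".join(out)


message = "SEE THE TREE"
alphabet = " EHRST"  # assumed alphabet; needs at least two symbols
bits = encode(message, alphabet)
assert decode(bits, alphabet) == message
print(len(bits), "bits, versus", 8 * len(message), "bits at 8 bits per character")
```

Because both sides apply the same update after every character, their trees stay identical without any frequency information being sent in the stream, which is the requirement stated above. Rebuilding the whole table for each character keeps the sketch short but is far more work than an incremental tree update.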