TIA/EIA INTERIM STANDARD

Enhanced Digital Access Communications System IMBE Implementation

TIA/EIA/IS-69.5

APRIL 2000 (r 7/2009)

TELECOMMUNICATIONS INDUSTRY ASSOCIATION
Representing the telecommunications industry in association with the Electronic Industries Alliance

NOTICE

TIA/EIA Engineering Standards and Publications are designed to serve the public interest through eliminating misunderstandings between manufacturers and purchasers, facilitating interchangeability and improvement of products, and assisting the purchaser in selecting and obtaining with minimum delay the proper product for his particular need. Existence of such Standards and Publications shall not in any respect preclude any member or nonmember of TIA/EIA from manufacturing or selling products not conforming to such Standards and Publications, nor shall the existence of such Standards and Publications preclude their voluntary use by those other than TIA/EIA members, whether the standard is to be used either domestically or internationally.

Standards and Publications are adopted by TIA/EIA in accordance with the American National Standards Institute (ANSI) patent policy. By such action, TIA/EIA does not assume any liability to any patent owner, nor does it assume any obligation whatever to parties adopting the Standard or Publication.

TIA/EIA INTERIM STANDARDS

TIA/EIA Interim Standards contain information deemed to be of technical value to the industry, and are published at the request of the originating Committee without necessarily following the rigorous public review and resolution of comments which is a procedural part of the development of a TIA/EIA Standard.

TIA/EIA Interim Standards should be reviewed on an annual basis by the formulating Committee and a decision made on whether to proceed to develop a TIA/EIA Standard on this subject. TIA/EIA Interim Standards must be cancelled by the Committee and removed from the TIA/EIA Standards Catalog before the end of their third year of existence.

Publication of this TIA/EIA Interim Standard for trial use and comment has been approved by the Telecommunications Industry Association. Distribution of this TIA/EIA Interim Standard for comment shall not continue beyond 36 months from the date of publication. It is expected that following this 36 month period, this TIA/EIA Interim Standard, revised as necessary, will be submitted to the American National Standards Institute for approval as an American National Standard. Suggestions for revision should be directed to: Standards

An EDACS Overview, a Glossary, and a Statement of Requirements.

The reader's attention is called to the possibility that compliance with this Standard may require the use of one or more inventions covered by patent rights.

By publication of this Interim Standard, no position is taken with respect to the validity of those claims or any patent rights in connection therewith. The patent holders so far identified have, however, filed statements of willingness to grant licenses under those rights on reasonable and nondiscriminatory terms and conditions to applicants desiring to obtain such licenses. Details may be obtained from the publisher.

Jim Holthaus
Chairman TR-8.4

Enhanced Digital Access Communications System IMBE
Implementation

Digital Voice Systems Inc. (DVSI) claims certain rights, including patent rights, in the Improved Multi-Band Excitation (IMBE) voice coding algorithm described in this document and elsewhere. Any use of this technology requires written license from DVSI. DVSI is willing to grant a royalty-bearing license to use the IMBE voice coding algorithm for Enhanced Digital Access Communications Systems under a set of standard terms and conditions. Details may be obtained by contacting DVSI as indicated below.

Digital Voice Systems, Inc.
One Van deGraff Drive
Burlington, MA 01803
Phone: 781-270-1030
Fax: 781-270-0166

DVSI acknowledges the Massachusetts Institute of Technology where the Multi-Band Excitation speech model was developed. In addition DVSI acknowledges the Rome Air Development Center of the United States Air Force which supported the early development of the Multi-Band Excitation speech
model.

September 8, 1998

Copyright, Digital Voice Systems Inc., 1998

DVSI grants a free irrevocable license to the Telecommunications Industry Association (TIA) to incorporate text contained in this contribution and any modifications thereof in the creation of a TIA standards publication; to copyright in TIA's name any TIA standards publication even though it may include portions of this contribution; and at TIA's sole discretion to permit others to reproduce in whole or in part the resulting TIA standards publication.

IMBE is a trademark of Digital Voice Systems, Inc.

CONTENTS

1 Scope
2 Introduction
3 Multi-Band Excitation Speech Model
4 Speech Input/Output Requirements
5 Speech Analysis
5.1 Pitch Estimation
5.1.1 Determination of E(P)
5.1.2 Pitch Tracking
5.1.3 Look-Back Pitch Tracking
5.1.4 Look-Ahead Pitch Tracking
5.1.5 Pitch Refinement
5.2 Voiced/Unvoiced Determination
5.3 Estimation of the Spectral Amplitudes
6 Parameter Encoding and Decoding
6.1 Fundamental Frequency Encoding and Decoding
6.2 Voiced/Unvoiced Decision Encoding and Decoding
6.3 Spectral Amplitudes Encoding
6.3.1 Encoding the Gain Vector
6.3.2 Encoding the Higher Order DCT Coefficients
6.4 Spectral Amplitudes Decoding
6.4.1 Decoding the Gain Vector
6.4.2 Decoding the Higher Order DCT Coefficients
7 Bit Manipulations
7.1 Bit Prioritization
7.2 Encryption
7.3 Error Control Coding
7.4 Bit Modulation
7.5 Bit Interleaving
7.6 Error Estimation
7.7 Frame Repeats
7.8 Frame Muting
8 Spectral Amplitude Enhancement
9 Adaptive Smoothing
10 Parameter Encoding Example
11 Speech Synthesis
11.1 Speech Synthesis Notation
11.2 Unvoiced Speech Synthesis
11.3 Voiced Speech Synthesis
12 Additional Notes
Annex A: Variable Initialization
Annex B: Initial Pitch Estimation Window
Annex C: Pitch Refinement Window
Annex D: FIR Low Pass Filter
Annex E: Gain Quantizer Levels
Annex F: Bit Allocation and Step Size for Transformed Gain Vector
Annex G: Bit Allocation for Higher Order DCT Coefficients
Annex H: Bit Frame Format
Annex I: Speech Synthesis Window
Annex J: Log Magnitude Prediction Residual Block Lengths
Annex K: Flow Charts
References

Figures

1 Improved Multi-Band Excitation Speech Coder
2 Comparison of Traditional and MBE Speech Models
3 Analog Front End
4 Analog Input/Output Filter Mask
5 IMBE Speech Analysis Algorithm
6 High Pass Filter Frequency Response at 8 kHz Sampling Rate
7 Relationship between Speech Frames
8 Window Alignment
9 Initial Pitch Estimation
10 Pitch Refinement
11 IMBE Voiced/Unvoiced Determination
12 IMBE Frequency Band Structure
13 IMBE Spectral Amplitude Estimation
14 Fundamental Frequency Encoding and Decoding
15 V/UV Decision Encoding and Decoding
16 Encoding of the Spectral Amplitudes
17 Prediction Residual Blocks for L = 34
18 Formation of Gain Vector
19 Decoding of the Spectral Amplitudes
20 Encoder Bit Manipulations
21 Decoder Bit Manipulations
22 Priority Scanning of $b_3$ through $b_{L+1}$
23 Formation of Code Vectors 0 through 3
24 Formation of Code Vectors 4 through 6
25 Parameter Enhancement and Smoothing
26 IMBE Speech Synthesis

Tables

1 Bit Allocation Among Model Parameters
2 Eight Bit Binary Representation
3 Uniform Quantizer Step Size for Higher Order DCT Coefficients
4 Standard Deviation of Higher Order DCT Coefficients
5 Division of Prediction Residuals into Blocks in Encoding Example
6 Example Bit Allocation and Step Size for the Transformed Gain Vector
7 Example Bit Allocation and Step Size for Higher Order DCT Coefficients
8 Construction of $u_i$ in Encoding Example (1 of 3)
9 Construction of $u_i$ in Encoding Example (2 of 3)
10 Construction of $u_i$ in Encoding Example (3 of 3)
11 Breakdown of Algorithmic Delay

1 Scope

This document specifies a voice coding method for the Enhanced Digital Access Communication System. It describes the functional requirements for the transmission and reception of voice information using digital communication media described in the standard. This document is specifically intended to define the conversion of voice from an analog representation to a digital representation that consists of a net bit rate of 4.4 kbps for voice information, and a gross bit rate of 7.1 kbps after error control coding.

The voice coder (or vocoder) presented in this document is intended to be used throughout a system in any equipment that requires an analog-to-digital or digital-to-analog voice interface. Specifically, mobile and portable radios as well as console equipment and gateways to
voice networks may contain the vocoder described in this document.

2 Introduction

This document provides a functional description of the Improved Multi-Band Excitation (IMBE) voice coding algorithm adopted for Enhanced Digital Access Communications Systems. This document describes the essential operations that are necessary and sufficient to implement this voice coding algorithm. However, it is highly recommended that the references be studied prior to the implementation of this algorithm. It is also recommended that implementations begin with a high-level language simulation of the algorithm, and then proceed to a real-time implementation using a digital signal processor. High performance real-time implementations have been demonstrated using both floating-point and fixed-point processors. The reader is cautioned that this document does not attempt to describe the most efficient means of implementing the IMBE vocoder. The reader should consult one or more references on efficient real-time programming for more information on this subject. Additionally this document does not address vocoder testing and verification. These subjects will be addressed in separate documents that may be released at a
later time.

Figure 1: Improved Multi-Band Excitation Speech Coder
[Block diagram. Labels on the IMBE Encoder side: Digital Speech, Digital Input Gain, Speech Analysis, MBE Model Parameters, Quantization, Prioritized Bit Vectors, Encryption, FEC Encoding, Bit Stream at 7.1 kbps. Labels on the IMBE Decoder side: Bit Stream at 7.1 kbps, FEC Decoding, Decryption, Reconstruction, Speech Synthesis, Digital Output Gain, Digital Speech.]

The IMBE speech coder is based on a robust speech model which is referred to as the Multi-Band Excitation (MBE) speech model [3]. The basic methodology of the coder is to divide a digital speech input signal into overlapping speech segments (or frames) using a window such as a Kaiser window. Each speech frame is then compared with the underlying speech model, and a set of model parameters is estimated for that particular frame. The encoder quantizes these model parameters and transmits a bit stream at 7.1 kbps. The decoder receives this bit stream, reconstructs the model parameters, and uses these model parameters to generate a synthetic speech signal. This synthesized speech signal is the output of the IMBE speech coder as shown in Figure 1. One should note that the IMBE speech coder shown in this figure and defined by this document is a digital-to-digital function.

The IMBE speech coder is a model-based speech coder, or vocoder, which does not try to reproduce the input speech signal on a sample by sample basis. Instead the IMBE speech coder constructs a synthetic speech signal which contains the same perceptual information as the original speech signal. Many previous vocoders (such as LPC vocoders, homomorphic vocoders, and channel vocoders) have not been successful in producing high quality synthetic speech. The IMBE speech coder has two primary advantages over these vocoders. First, the IMBE speech coder is based on the MBE speech model which is a more robust model than the traditional speech models used in previous vocoders. Second, the IMBE speech coder uses more sophisticated algorithms to estimate the speech model parameters, and to synthesize the speech signal from these model parameters.

This document is organized as follows. In Section 3 the MBE speech model is briefly reviewed. This section presents background material which is useful in understanding operation of the IMBE speech coder.
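The frame-based operation of the coder can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not part of the standard. It combines the 4.4 kbps net and 7.1 kbps gross bit rates given in Section 1 with the 20 ms frame shift given in Section 3 to obtain the number of bits carried per frame, and it lists the start indices of overlapping frames. The 8 kHz sampling rate is taken from the caption of Figure 6 and is assumed here to be the coder input rate. The names per_frame and frame_starts, and the 240-sample frame length used in the last line, are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Figures quoted in this document.
GROSS_BIT_RATE = 7100             # bits/s after error control coding (Section 1)
NET_BIT_RATE = 4400               # bits/s of voice information (Section 1)
FRAME_SHIFT = Fraction(20, 1000)  # seconds between successive frames (Section 3)
SAMPLE_RATE = 8000                # Hz; assumption taken from the Figure 6 caption


def per_frame(rate_per_second):
    """Return how many units at the given rate fall within one frame shift."""
    value = rate_per_second * FRAME_SHIFT
    if value.denominator != 1:
        raise ValueError("rate does not give a whole number per frame")
    return int(value)


def frame_starts(num_samples, frame_length, shift):
    """Start indices of every full frame of frame_length taken each shift samples."""
    return list(range(0, num_samples - frame_length + 1, shift))


bits_per_frame = per_frame(GROSS_BIT_RATE)
voice_bits_per_frame = per_frame(NET_BIT_RATE)
added_bits_per_frame = bits_per_frame - voice_bits_per_frame
samples_per_shift = per_frame(SAMPLE_RATE)

# Hypothetical 240-sample frames: consecutive frames overlap because the
# frame length exceeds the shift between frames.
starts = frame_starts(480, 240, samples_per_shift)
```

Under these assumptions each 20 ms frame carries 142 bits, of which 88 are voice information; the remaining 54 bits are what the gross rate adds after error control coding. With a 160-sample shift, the two hypothetical 240-sample frames start at samples 0 and 160 and share 80 samples.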
Section 4 describes the basic speech input/output requirements. Section 5 examines the methods used to estimate the speech model parameters, and Section 6 examines the quantization and reconstruction of the MBE model parameters. The error correction and the format of the 7.1 kbps bit stream are discussed in Section 7. This is followed by Section 8 which describes the enhancement of the spectral amplitudes, and Section 9 which describes the adaptive smoothing method used to reduce the effect of uncorrectable bit errors. Section 10 then demonstrates the encoding of a typical set of model parameters. Section 11 discusses the synthesis of speech from the MBE model parameters. A few additional comments on the algorithm and this document are provided in Section 12. Other information such as bit allocation tables, quantization levels and initialization vectors is contained in the attached annexes. In addition, Annex K contains a set of flow charts describing certain elements of this vocoder. Note that these flow charts
have been designed to help clarify the various algorithmic steps and do not necessarily describe the best or most efficient method of implementing the vocoder.

3 Multi-Band Excitation Speech Model

Let $s(n)$ denote a discrete speech signal obtained by sampling an analog speech signal. In order to focus attention on a short segment of speech over which the model parameters are assumed to be constant, a window $w(n)$ is applied to the speech signal $s(n)$. The windowed speech signal $s_w(n)$ is defined by

$$s_w(n) = s(n)\,w(n) \tag{1}$$

The sequence $s_w(n)$ is referred to as a speech segment or a speech frame. The IMBE analysis algorithm actually uses two different windows, $w_R(n)$ and $w_I(n)$, each of which is applied separately to the speech signal via Equation (1). This will be explained in more detail in Section 5 of this document. The speech signal $s(n)$ is shifted in time to select any desired segment. For notational convenience $s_w(n)$ refers to the current speech frame. The next speech frame is obtained by shifting $s(n)$ by 20 ms.

A speech segment $s_w(n)$ is modelled as the response of a linear filter $h_w(n)$ to some excitation signal $e_w(n)$. Therefore, $S_w(\omega)$, the Fourier Transform of $s_w(n)$, can be expressed as

$$S_w(\omega) = H_w(\omega)\,E_w(\omega)$$

where $H_w(\omega)$ and $E_w(\omega)$ are the Fourier Transforms of $h_w(n)$ and $e_w(n)$, respectively.

In traditional speech models, speech is divided into two classes depending upon the nature of excitation signal. For voiced speech the excitation signal is a periodic impuls