AMERICAN NATIONAL STANDARD FOR TELECOMMUNICATIONS

ATIS-0100521.2005 (R2015)

Packet Loss Concealment for Use with ITU-T Recommendation G.711

As a leading technology and solutions development organization, ATIS brings together the top global ICT companies to advance the industry's most pressing business priorities. Through ATIS committees and forums, nearly 200 companies address cloud services, device solutions, emergency services, M2M communications, cyber security, ehealth, network evolution, quality of service, billing support, operations, and more. These priorities follow a fast-track development lifecycle from design and innovation through solutions that include standards, specifications, requirements, business use cases, software toolkits, and interoperability testing.

ATIS is accredited by the American National Standards Institute (ANSI). ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a founding Partner of oneM2M, a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications sectors, and a member of the Inter-American Telecommunication Commission (CITEL). For more information, visit .

AMERICAN NATIONAL STANDARD

Approval of an American National Standard requires review by ANSI that the requirements for due process, consensus, and other criteria for approval have been met by the standards developer.

Consensus is established when, in the judgment of the ANSI Board of Standards Review, substantial agreement has been reached by directly and materially affected interests. Substantial agreement means much more than a simple majority, but not necessarily unanimity. Consensus requires that all views and objections be considered, and that a concerted effort be made towards their resolution.

The use of American National Standards is completely voluntary; their existence does not in any respect preclude anyone, whether he has approved the standards or not, from manufacturing, marketing, purchasing, or using products, processes, or procedures not conforming to the standards.

The American National Standards Institute does not develop standards and will in no circumstances give an interpretation of any American National Standard. Moreover, no person shall have the right or authority to issue an interpretation of an American National Standard in the name of the American National Standards Institute. Requests for interpretations should be addressed to the secretariat or sponsor whose name appears on the title page of this standard.

CAUTION NOTICE: This American National Standard may be revised or withdrawn at any time. The procedures of the American National
Standards Institute require that action be taken periodically to reaffirm, revise, or withdraw this standard. Purchasers of American National Standards may receive current information on all standards by calling or writing the American National Standards Institute.

Notice of Disclaimer

… the implementation involves only the receiver. No modifications are required at the transmitting end. The Annexes provide example code; however, other implementations of the individual algorithms are possible. The performance of the methods defined in this standard has been demonstrated in subjective listening tests; the Annex A and Annex B methods are subjectively equivalent for the range of conditions tested.

Annex A describes a low-complexity, high-quality Packet Loss Concealment algorithm. In informal listening tests it performs well under a variety of input signal conditions (clean speech, noisy speech, music, and background noise) and compares favorably with the Packet Loss Concealment algorithms in several of the CELP-based speech coders standardized by the ITU-T.

Annex B describes a technique that employs linear prediction to estimate missing speech and uses the resulting vocal tract model output and excitation information to reconstruct the signal contained in missing packets. Formal subjective tests have been performed to compare this technique with unprotected G.711, using packet loss rates of up to 10%, packet sizes of up to 40 ms, and both clean and noisy speech. The results show a significant benefit from this technique compared to unprotected G.711.

The Alliance for Telecommunications Industry Solutions (ATIS) serves the public through improved understanding between carriers, customers, and manufacturers. The Network Performance, Reliability, and Quality of Service Committee (PRQC), formerly T1A1, develops and recommends standards, requirements, and technical reports related to the performance, reliability, and associated security aspects of communications networks, as well as the processing of voice, audio, data, image, and video signals, and their multimedia integration. PRQC also develops and recommends positions on, and fosters consistency with, standards and related subjects under consideration
in other North American and international standards bodies.

ANSI guidelines specify two categories of requirements: mandatory and recommendation. Mandatory requirements are designated by the word "shall" and recommendations by the word "should". Where both a mandatory requirement and a recommendation are specified for the same criterion, the recommendation represents a goal currently identifiable as having distinct compatibility or performance advantages.

Suggestions for improvement of this document are welcome. They should be sent to the Alliance for Telecommunications Industry Solutions, PRQC Secretariat, 1200 G Street NW, Suite 500, Washington, DC 20005.

At the time it approved this document, PRQC, which is responsible for the development of this Standard, had the following members:

R. Wohlert, PRQC Chair
N. Seitz, PRQC Vice-Chair
S. Carioti, ATIS Disciplines
S. Barclay, ATIS Secretariat
C. Underkoffler, ATIS Chief Editor
M. Perkins and L. Thorpe, PRQC Technical Editors

Organization Represented    Name of Representative
Alcatel USA Inc.            Ken Biholar
ASTRI                       Jacky Chow
AT …

… all users of this American National Standard are therefore encouraged to investigate the possibility of applying the most recent edition of the standards listed below.

ITU-T Recommendation G.711 (11/88), Pulse code modulation (PCM) of voice frequencies.¹

3 ABBREVIATIONS

CELP       Code-Excited Linear Prediction
OLA        Overlap-Add
PLC        Packet Loss Concealment
TD-PSOLA   Time Domain Pitch Synchronous Overlap Add
WSOLA      Waveform Similarity Overlap Add

4 DEFINITIONS

4.1 Packet Loss: Loss or corruption of speech segments (packets) during transmission.

4.2 Packet Loss Concealment: A method for reconstructing speech segments that have been lost or corrupted during transmission.

5 CONVENTIONS

In this Standard, we use "G.711" as shorthand for "ITU-T Recommendation G.711" to improve readability. For the same reason, "erasure" is used to refer to a sequence of one or more lost packets.

6 PACKET LOSS CONCEALMENT FOR G.711

This standard presumes an audio system design where the input signal is encoded and packetized at the transmitter and sent over a network to a receiver that decodes the packets and plays out the decoded signal. Packet Loss Concealment (PLC) algorithms are intended to reduce the impairment caused when speech packets are lost or damaged during transmission. Many of the
standard CELP-based speech coders, such as those defined in ITU-T Recommendations G.723.1, G.728, and G.729, have frame erasure concealment algorithms that are either built in or defined as added features in their standards. The objective of PLC is similar to that of frame erasure concealment, namely to generate a synthetic speech signal to replace the missing speech signal that was contained in the lost packets or frames. Ideally, the synthesized signal has the same timbre and spectral characteristics as the missing signal and does not create unnatural artifacts. Since speech signals are often locally stationary, the past history of the signal can be used to generate a reasonable approximation to the missing segment. If an erasure is short and does not occur in a region where the signal is changing rapidly, it may be inaudible after concealment.

The PLC technique described in Annex A to this standard recognizes and goes beyond earlier work on pitch waveform replication techniques designed to conceal lost packets [3, 4].

¹ This document is available from the International Telecommunication Union.

Generating synthesized speech to replace lost packets has similarities to time-scale expansion of speech. In both cases, the goal is to generate synthetic signals near a region of original speech. Over the last decade, several elegant, high-quality, and low-complexity techniques, such as WSOLA
[1] and TD-PSOLA [2], have been devised for time-scaling. The PLC method described in Annex A benefits from these well-known time-scaling techniques.

The PLC algorithm described in Annex B to this standard uses the well-known linear predictive model of speech production that is widely used in low bit-rate speech coding. The algorithm estimates the spectral characteristics of a missing segment and then synthesizes a high-quality approximation to the missing segment using this production model.

Unlike CELP-based coders, G.711 has no model of speech production; i.e., there is nothing in the encoder to help the decoder with the concealment. Hence, the concealment algorithm for G.711 is independent of the encoder. This implies that the PLC adds computational complexity, memory requirements, and delay to the G.711 decoding process. This standard recognizes that G.711 is likely to operate
in computationally sparse environments. Thus, the computational and memory requirements of PLC are minimized. On the other hand, G.711 has the advantage that the output returns to the original signal at the first sample of the first good packet after an erasure. With CELP-based coders, the decoder's state variables take time to recover after an erasure, especially if the coder is backward adaptive. Thus, PLC in G.711 can recover rapidly once an erasure is over.

Annex A
(Normative)

A REVERSE ORDER REPLICATED PITCH PERIODS (RORPP) ALGORITHM

The PLC technique defined in this Annex, called Reverse Order Replicated Pitch Periods (RORPP), is scalable with respect to the size of the packets. It is capable of concealing lost packets that contain up to 30 ms of speech, and is effective at packet loss rates as high as 20%. Annoying impairments such as clicks, pops, and harshness are minimized, though intelligibility may suffer when packet loss rates are high.

A.1 Algorithm Description

To add PLC to a G.711 system that currently does not conceal losses, changes are required only in the receiver. In the following discussion, it is assumed that the coder is G.711, the audio input signal is sampled at 8 kHz, and each packet contains 10 ms (80 samples) of audio. Any packet size or sampling rate can be accommodated by adjusting a few parameters.

A.1.1 Good Packets

During normal operation (good packets), the receiver decodes the received packet and sends its output to the audio port. To support PLC, two operational changes are required at the receiver:

1. A copy of the decoded output is saved in a circular history buffer that is 48.75 ms (390 samples) long. The history buffer is used to calculate the current pitch period and to extract waveforms during a sequence of lost packets (i.e., an erasure).

2. The output is delayed by 3.75 ms (30 samples) before it is sent to the audio port. This algorithm delay is used for an overlap-add (OLA) at the start of an erasure and allows the PLC code to make a smooth transition between the real and synthesized signals.

A.1.2 First Lost Packet

At the start of an erasure (when a packet loss is first detected), the circular history buffer is copied to a non-circular buffer, called the pitch buffer, that is easier to work with. The contents of the pitch buffer are used for the duration
of the erasure. An additional copy of the most recent 1/4 pitch period, called the last_quarter buffer, is made to allow for the case where the erasure lasts longer than 10 ms.

A.1.3 Pitch Detection

The pitch is estimated by finding the peak of the normalized cross-correlation of the most recent 20 ms of speech in the history buffer with the previous speech at taps from 5 ms (40 samples) to 15 ms (120 samples), corresponding to frequencies of 200 Hz down to 66 Hz. The pitch range was chosen based on the range used in the post-filter of ITU-T Rec. G.728. While G.728 uses a lower bound of 2.5 ms (20 samples), here it is increased to 40 samples so that the same pitch period is not repeated more than twice in a single 10 ms erasure.

To lower complexity, the pitch estimation is calculated in two phases. First, a coarse search is performed on a 2:1 decimated signal, and then a finer search is performed in the vicinity of the peak found by the coarse search. The complexity can be lowered further, with a slight degradation in quality, by skipping the fine search. From WSOLA [1], it is known that the normalized cross-correlation function can be replaced with either a non-normalized cross-correlation or a cross Average Magnitude Difference Function (AMDF) with similar results.

A.1.4 Synthetic Signal Generation for First 10 ms

For the first 10 ms of lost speech, the best results are obtained by generating the synthesized signal from the last pitch period, with no attenuation. Only the most recent 1.25 pitch periods of the pitch buffer are used during the first 10 ms. To ensure a smooth transition between the real and synthetic signals, and a smooth transition when the pitch period is repeated multiple times, an OLA is performed with a triangular window over 1/4 of the pitch period between the last and next-to-last pitch periods. Over this 1/4 pitch period, the signal starting 1.25 pitch periods from the end of the pitch buffer is multiplied by an up-sloping ramp and added to the last 0.25 pitch period, taken from the last_quarter buffer and multiplied by a down-sloping ramp. If complexity is not an issue, the triangular windows may be replaced with Hanning windows in all OLA operations.

The result of the OLA replaces both the tail of the pitch buffer and the tail of the history buffer. It is also output by the receiver during the tail of the last good packet, replacing the original signal. This introduces the algorithm delay: the tail of the last packet cannot be output until it is known whether the next packet is lost. If a packet loss occurs, the signal in the tail of the last good packet is modified by the OLA to ensure a smooth transition