ITU-T P.330 (2003) Speech processing devices for acoustic enhancement. Series P: Telephone transmission quality, telephone installations, local line networks. Subscribers' lines and sets (.pdf)

Resource description

COVERING NOTE

GENERAL SECRETARIAT
INTERNATIONAL TELECOMMUNICATION UNION
Geneva, 30 September 2003

ITU - TELECOMMUNICATION STANDARDIZATION SECTOR

Subject: Amendment 1 (09/2003) to ITU-T Recommendation P.330 (03/2003), Speech processing devices for acoustic enhancement

In clause 4.5.1, Acoustic echo path

All the text at the end of the paragraph belongs to the NOTE. It should be indented and amended as follows:

NOTE - It is recommended to avoid extremely long rooms (Length >> Width, Height) and rooms with extremely low ceilings (Height << Length, Width), and preferably also rooms with all the side dimensions nearly identical. Large, flat, parallel room-limiting surfaces, and surface areas that provide broadband sound reflection, particularly wall surfaces at an average room height (roughly 0.8 m to 1.8 m above the floor) should be avoided, since they can cause flutter echoes and flutter-echo-like disturbances (echoing, roughness), if the test setup is in an unfavourable position. Measuring the local frequency-dependent distribution of sound pressure levels within a selected room in the steady state can help to determine the optimum
position of the test setup. As a general suggestion, the minimum distance between the test setup and room limiting surfaces should be 1 m, regardless of the acoustic properties of these surfaces. This can prevent disturbances due to initial reflections and a rise in sound pressure level that can occur locally at low frequencies. The same recommendation applies to geometrically large furniture surfaces that reflect sound.

4.5.2 Parameters and recommended limits

4.5.2.1 Weighted terminal coupling loss - single-talk (TCLwst)

The weighted loss between the Rin and Sout network interfaces when the AEC is in normal operation, and when there is no signal coming from the local user¹. Before each test the terminal is switched on.

The weighting is made according to the rule specified in ITU-T Rec. G.122 (computation of talker echo loudness rating). Care must be taken to avoid possible masking
of singing effects by the weighting (under study).

ITU-T Rec. P.330 (03/2003)

The recommended values for each type of hands-free terminal can be found in the relevant ITU-T Recommendations (e.g., ITU-T Rec. P.341 for wideband hands-free terminals and ITU-T Rec. P.342 for digital hands-free terminals).

4.5.2.2 Weighted terminal coupling loss - double-talk (TCLwdt)

The weighted loss between the Rin and Sout network interfaces when the AEC is in normal operation, and where the local user and the far-end user are active simultaneously¹.

The recommended values for each type of hands-free terminal can be found in clause 8/P.340.

4.5.2.3 Received speech attenuation during double-talk (Ardt)

The received signal attenuation (at the Rout point) which is inserted by the AEC during double-talk events. The frequency response on the receive side during double-talk should ideally be the same as during single-talk conditions. In practice, however, it may not be possible to implement echo cancellation which provides sufficient echo loss during double-talk, without modifying the frequency response.

4.5.2.4 Sent speech attenuation during double-talk (Asdt)

The sent signal attenuation (at the Sout point) which is inserted by the AEC during double-talk events. The frequency response on the send side during double-talk should ideally be the same as during single-talk conditions. In practice, however, it may not be possible to implement echo cancellation which provides sufficient echo loss during double-talk, without modifying the frequency response.

4.5.2.5 Received speech distortion during double-talk (Drdt)

The total non-linear signal distortion at the Rout point which can be produced by the AEC during double-talk events. For all the applications, the supplementary distortion at Rout in comparison with single-talk conditions should be low.

4.5.2.6 Sent speech distortion during double-talk (Dsdt)

The total non-linear signal distortion at the Sout point which can be produced by the AEC during double-talk events. For all the applications, the supplementary distortion at Sout in comparison with single-talk conditions should be low.

4.5.2.7 Build-up time - single-talk (TRst)

The time interval between the onset of the received signal (similarly the transmitted signal) and the instant when the attenuation on the receive path (similarly on the send path) reaches 3 dB. For this purpose, the other side is quiet.

4.5.2.7.1 Receive side (TRst-r)

For all the applications, TRst-r shall be no more than 20 ms.

4.5.2.7.2 Send side (TRst-s)

For all the applications, TRst-s shall be no more than 20 ms.

4.5.2.8 Build-up time - double-talk (TRdt)

The time interval between the onset of the received signal (similarly the sent signal) and the instant when the attenuation on the receive path (similarly on the send path) reaches the value Ardt (similarly Asdt). For this purpose, the signal in the opposite direction of transmission is held at a specified level.

4.5.2.8.1 Receive side (TRdt-r)

TRdt-r should be less than 20 ms, if the attenuation is more than 6 dB.

4.5.2.8.2 Send side (TRdt-s)

TRdt-s should be less than 20 ms, if the attenuation is more than 6 dB.

4.5.2.9 Convergence time (Tc)

Convergence time is the time interval between the instant when a specified test signal is applied to the Rin port of the terminal (after all the functions of the AEC have been reset and then enabled), and the instant when the returned echo signal at the Sout port is attenuated by at least a predefined amount. The local user is not active.

4.5.2.10 Hang-over time after double-talk (THdt)

The time elapsed between the end of a double-talk event and the instant when the attenuation of the echo recovers a specified value (a signal is received continuously from the distant user). For all the applications, the attenuation of the signal at Sout should be at least 20 dB after THdt = 11 second.

4.5.2.11 Terminal coupling loss temporally weighted - single-talk (TCLtst)

The echo return loss from Rin to Sout is measured according to the procedure defined for ERLtst in ITU-T Rec. P.502.

4.5.2.12 Terminal coupling loss temporally weighted echo return loss - double-talk (ERLtdt)

The echo return loss from Rin to Sout is measured according to the procedure described for ERLtdt in ITU-T Rec. P.502.

5 Noise reduction

The main purpose of a noise reduction (NR) system in a device is to reduce the annoying and fatiguing effects of the transmitted background noise. The techniques used to reduce background noise may be classified as analogue only, digital only, and combined analogue and digital techniques.

5.1 Analogue components

The analogue components of an NR system include the microphone and any analogue circuitry connecting the microphone to the CODEC (analogue-to-digital converter). There are several techniques used in reducing background noise that only rely on analogue components:

a) The proximity of the microphone relative to the talker's mouth is a major factor in determining the SNR. Moving the microphone close to the talker's mouth produces an obvious but significant SNR enhancement (SNRE). Additional microphones may be used to improve the SNR.

b) The analogue signal path is typically designed to have a high-pass filter response. When noise has strong low-frequency components (example: automobile noise), this filter technique will enhance the SNR (as measured over the full band). The side effect is a noticeable loss of timbre in speech quality (especially in male voices).

c) Microphones may be designed to provide passive directional gain. The most common type used in automobiles is a first-order differential microphone. This microphone can be designed with a single transducer using two ports. For a diffuse noise field and the correct microphone orientation, a hypercardioid first-order differential microphone array will enhance the SNR by 6 dB compared to an omni-directional microphone. Higher order differential microphones are possible. In addition, microphone arrays using passive only techniques are possible but unlikely to be widely used because these arrays need to be very large to have an impact on the low-frequency components of speech. They may contain as many as 16 elements that can provide a directional gain of approximately 20 dB at some frequencies.

5.2 Functional units of a noise reduction system

The functional units of a noise reduction system are devices or parts of devices implemented in the processing unit, which contribute to the general function of noise reduction. There is no restriction on how to implement them. A functional block-diagram of a typical processing unit is shown in Figure 2.

Two common types of digital noise reduction techniques may be implemented in a digital signal processor (DSP) or other type of microprocessor within a terminal using a single microphone. The following techniques are commonly used:

a) Full-band noise suppression: During the pauses in speech, the noise is reduced significantly as long as its energy is below a threshold level. During active speech, the attenuation is removed allowing both speech and noise to pass. This produces an undesirable noise pumping effect if the attenuation is set too high.

b) Sub-band noise suppression: The transmitted signal is broken into sub-bands using a Fast Fourier Transform (FFT) algorithm. Only the frequency bands with stationary noise are attenuated while the bands with speech signal are unaltered. The well-known method of spectral subtraction is one such technique.

In practice, noise suppression levels range from 6 to 15 dB. The drawback of these techniques is the existence of a compromise between the level of noise reduction and the distortion of the original speech signal. Hence, it is difficult to find a tuning that works in all conditions of noise (SNR and type of noise). Under low SNR conditions, however, the speech signal is degraded somewhat if high levels of noise suppression are used.

[Figure 2/P.330 - Functional block-diagram of a typical processing unit (NR part) (bp denotes bypass signal paths for testing purposes). Block labels recoverable from the figure: vocal activity detector, noise estimator, network interface.]

5.3 Interaction between terminal noise reduction and signal processing network equipment

Network equipment may also include noise reduction processing. This leads to tandeming issues of noise reduction functionalities. The added network component shall prevent any degradation of the overall perceived quality.

5.4 Noise reduction processing delay

The noise reduction processing delay is highly dependent on the technique used by the
noise filtering. In any case, compliance with transmission planning objectives must be achieved. General information about transmission delays can be found in ITU-T Rec. G.114.

5.5 Noise reduction system specifications

5.5.1 Noise environment

Acoustic characteristics of the test environment are described in 4.5.1. The impact of the environment should be taken into account in the case of distant sound pick-up: in this case, the reverberation of background noise has to be considered as an additional degradation. Broadcasting of background noise test signals is described in 7.10/P.340. Background
noise test signals should include real signals such as babble noise, office room noise, street noise, car noise (engine, driving conditions at different speeds) and other simulated background noise signals (depending on the uses of the equipment). Levels of background noise test signals should be varied so as to obtain SNRs in the range [-3 dB, 30 dB].

NOTE - Some of the corresponding test signals are in ITU-T Rec. P.501 and the test methods are described in ITU-T Rec. P.502. Additional test signals are under study.

5.5.2 Parameters and recommended limits

All specified parameters should be measured
for different SNR values in the range [-3 dB, 30 dB]. For parameters which consist in measuring a signal level attenuation or a delay, measurement procedures (methods and stimulus) can be found in ITU-T Recs P.501, P.502 and P.340, which apply with the following restrictions: the speech signal level must be at least 10 dB higher than the noise level, and the noise must be stationary. For all other cases (non-stationary noise, low SNR values, distortion measurement), test methods are under study.

All parameters defined below correspond to single-talk conditions. Due to possible interactions between the AEC and the NR processing integrated in the terminal, parameters under double-talk conditions must be considered also (under study).

5.5.2.1 Sent speech attenuation in quiet conditions (Asqc)

The sent signal attenuation (at the Sout point) which is inserted by the NR in quiet conditions.

5.5.2.2 Sent speech distortion in quiet conditions (Dsqc)

The total non-linear signal distortion at the Sout point which can be produced by the NR in quiet conditions. For all the applications, the supplementary distortion at Sout in comparison with Sin should be as low as possible. Ideally, no additional distortion should be introduced by the NR.

5.5.2.3 Sent speech attenuation during noisy conditions (Asnc)

The sent signal attenuation (at the Sout point) which is inserted by the NR during noisy conditions. Ideally, the frequency response on the send side should not change when the NR is activated.

5.5.2.4 Sent speech distortion during noisy conditions (Dsnc)

The total non-linear signal distortion at the Sout point which can be produced by the NR during noisy events. For all the applications, the supplementary distortion at Sout in comparison with Sin should be as low as possible. Ideally, no additional distortion should be introduced by the NR.

5.5.2.5 Adaptation time (TA)

Adaptation time is the time interval between the instant when a specified noise test signal is applied to the Sin port of the terminal (after all the functions of the NR have been reset and then enabled), and the instant when the returned noise test signal at the Sout port is stable within ±1 dB compared with the long term reduced noise level (see Figure 3). The local and distant user are not active.

[Figure 3/P.330 - Definition of adaptation time (TA). The noise test signal is applied at the Sin port at t = t0; the adaptation time ends at t = t0 + TA.]

5.5.2.6 The time elapsed between the end of a speech event and the instant when the attenuation of the noise recovers a specified value. For all the applications, with high levels of background noise (SNR between -3 dB and 15 dB), the attenuation of the noise signal at Sout should be at least 6 dB after TAse = 1001 millisecond.

5.5.2.7 The terminal noise attenuation from Sin to Sout which is inserted by the NR on the background noise signal when no speech signal is present.

5.5.2.8 Noise distortion - no speech (Dnns)

The total non-linear signal disto
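
The adaptation-time definition in 5.5.2.5 (noise level at Sout stable within ±1 dB of the long-term reduced level) can be illustrated with a minimal sketch. It assumes the noise level at Sout is already available as a series of short-term levels in dB taken at a fixed frame interval. The function name, the frame interval, and the choice to estimate the long-term level as the mean of the last few frames are assumptions made for illustration only; they are not taken from the Recommendation.

```python
from typing import Sequence


def adaptation_time(levels_db: Sequence[float], frame_s: float,
                    tail_frames: int = 10, tol_db: float = 1.0) -> float:
    """Time (s) after which the level stays within tol_db of the long-term level.

    levels_db[0] is assumed to be the first frame after the noise test
    signal is applied (t = t0 in Figure 3).
    """
    if len(levels_db) < tail_frames:
        raise ValueError("need at least tail_frames measurements")
    # Assumption: long-term reduced noise level = mean of the last tail_frames values.
    tail = levels_db[-tail_frames:]
    long_term = sum(tail) / len(tail)
    # Walk backwards to find the last frame lying outside the tolerance band.
    last_outside = -1
    for i in range(len(levels_db) - 1, -1, -1):
        if abs(levels_db[i] - long_term) > tol_db:
            last_outside = i
            break
    # The level is considered stable from the frame after the last excursion.
    return (last_outside + 1) * frame_s
```

For example, with 20 ms frames and levels of -40, -42, -45, -48, -50.5 and -51.5 dB followed by ten frames at -52 dB, the last frame outside the ±1 dB band is the fifth one, so the sketch reports an adaptation time of about 0.1 s.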

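
The 6 dB SNR enhancement quoted in 5.1 c) for a hypercardioid microphone in a diffuse noise field can be checked numerically. The sketch below assumes the usual textbook first-order pattern a + (1 - a)·cos θ, with a = 0.25 for the hypercardioid; this parameterization and the function name are assumptions for illustration and do not come from the Recommendation. It integrates the power response over the sphere to obtain the directivity index relative to an omni-directional microphone.

```python
import math


def directivity_index_db(a: float, steps: int = 20000) -> float:
    """Directivity index (dB) of the pattern a + (1 - a) * cos(theta) in a diffuse field."""
    # Mean power response over the sphere:
    # (1/2) * integral from 0 to pi of p(theta)^2 * sin(theta) dtheta.
    d_theta = math.pi / steps
    mean_power = 0.0
    for k in range(steps):
        theta = (k + 0.5) * d_theta  # midpoint rule
        p = a + (1.0 - a) * math.cos(theta)
        mean_power += p * p * math.sin(theta) * d_theta
    mean_power *= 0.5
    # The on-axis response is 1, so the directivity factor is 1 / mean_power.
    return 10.0 * math.log10(1.0 / mean_power)
```

Under these assumptions the mean power response has the closed form a² + (1 - a)²/3, which equals 0.25 for a = 0.25, that is a directivity index of about 6.02 dB, while a = 1 (omni-directional) gives 0 dB.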