INTERNATIONAL TELECOMMUNICATION UNION
GENERAL SECRETARIAT
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

Geneva, 25 May 2000

Subject: Erratum, Recommendation ITU-T P.50 (09/99)

ITU-T Recommendation P.50 (09/99)
Artificial voices

Replace NOTE on page (ii) by the following:

NOTE - In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Union internationale des télécommunications, Place des Nations, 1211 GENÈVE 20, Suisse - Switzerland - Suiza

COPYRIGHT International Telecommunications Union/ITU Telecommunications. Licensed by Information Handling Services.

INTERNATIONAL TELECOMMUNICATION UNION

ITU-T P.50 (09/99)
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

SERIES P: TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS
Objective measuring apparatus

Artificial voices

ITU-T Recommendation P.50
(Previously CCITT Recommendation)

ITU-T P-SERIES RECOMMENDATIONS
TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS

Series P.10           Vocabulary and effects of transmission parameters on customer opinion of transmission quality
Series P.30, P.300    Subscribers' lines and sets
Series P.40           Transmission standards
Series P.50, P.500    Objective measuring apparatus
Series P.60           Objective electro-acoustical measurements
Series P.70           Measurements related to speech loudness
Series P.80, P.800    Methods for objective and subjective assessment of quality
Series P.900          Audiovisual quality in multimedia services

For further details, please refer to ITU-T List of Recommendations.

ITU-T RECOMMENDATION P.50

ARTIFICIAL VOICES

Summary

The "artificial voice" described in this Recommendation reproduces the characteristics of human speech for the purpose of characterizing linear and non-linear telecommunication systems and devices, which are intended for the transduction or transmission of speech.

The artificial voice is a signal that is mathematically defined and that reproduces the time and spectral characteristics of speech which significantly affect the performance of telecommunication systems. Two kinds of artificial voice are defined, reproducing respectively the characteristics of female and male speech.

Source

ITU-T Recommendation P.50 was revised by ITU-T Study Group 12 (1997-2000) and was approved under the WTSC Resolution No. 1 procedure on 30 September 1999.

FOREWORD

ITU (International
Telecommunication Union) is the United Nations Specialized Agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of the ITU. The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Conference (WTSC), which meets every four years, establishes the topics for study by the ITU-T Study Groups which, in their turn, produce Recommendations on these topics.

The approval of Recommendations by the Members of the ITU-T is covered by the procedure laid down in WTSC Resolution No. 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE

In this Recommendation the term recognized operating agency (ROA) includes any individual, company, corporation or governmental organization that operates a public correspondence service. The terms Administration, ROA and public correspondence are defined in the Constitution of the ITU (Geneva, 1992).

INTELLECTUAL PROPERTY
RIGHTS

The ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. The ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, the ITU had not received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementors are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database.

© ITU 2000

All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the ITU.

CONTENTS

1 Introduction
2 Scope, purpose and definition
2.1 Scope and purpose
2.2 Definition
3 Terminology
3.1 Electrical artificial voice
3.2 Artificial mouth excitation signal
3.3 Acoustic artificial voice
4 Characteristics
4.1 Long-term average spectrum
4.2 Short-term spectrum
4.3 Instantaneous amplitude distribution
4.4 Segmental power level distribution
4.5 Spectrum of the modulation envelope
4.6 Time convergence
5 Generation method
5.1 Excitation source signal
5.2 Glottal excitation
5.3 Unvoiced sounds
5.4 Power envelope
5.5 Spectrum shaping filter
6 Bibliography
Annex A - Short-term spectrum characteristics of the artificial voice

Recommendation P.50

ARTIFICIAL VOICES

(Melbourne, 1988; amended at Helsinki, 1993, Geneva, 1999)

1 Introduction

The signal here described reproduces the characteristics of human speech for the purposes of characterizing linear and non-linear telecommunication systems and devices, which are intended for the transduction or transmission of speech. It is known that for some purposes, such as objective loudness rating measurements, simpler signals can be used as well. Examples of such signals are pink noise or spectrum-shaped Gaussian noise, which nevertheless cannot be referred to as "artificial voice" for the purpose of this Recommendation.

The artificial voice is a signal that is mathematically defined and that reproduces the time and spectral characteristics of speech which significantly affect the performance of telecommunication systems. Two kinds of artificial voice are defined, reproducing respectively the spectral characteristics of female and male speech.

The following time and spectral characteristics of real speech are reproduced by the artificial voice:
a) long-term average spectrum;
b) short-term spectrum;
c) instantaneous amplitude distribution;
d) voiced and unvoiced structure of speech waveform;
e) syllabic envelope.

Appendix I/P.50 includes a CD-ROM containing useful test signals. The signals on this CD-ROM include the signal described in Recommendation P.50 as well as other signals that have been found useful by some Administrations. Additionally, the full speech database that was used to develop Recommendation P.50 is also on this CD-ROM. Appendix I/P.50 is published separately.

2 Scope, purpose and definition

2.1 Scope and purpose

The artificial voice is aimed at reproducing the characteristics of real speech over the bandwidth 100 Hz to 8 kHz. It can be utilized for characterizing many devices, e.g. carbon microphones, loudspeaking telephone sets, nonlinear coders, echo controlling devices, syllabic compandors, and nonlinear systems in general.

The artificial voice described in this Recommendation is mainly used for objective evaluation of speech processing systems and devices, in which a single-channel signal with continuous activity (i.e. without pauses) is sufficient for measuring characteristics.
An example is the evaluation of speech codecs. For objective evaluation that needs two signals with pauses (e.g. evaluation of devices with speech detectors), the artificial conversational speech signal described in Recommendation P.59 should be used.

The use of the artificial voice instead of real speech has the advantages of being more easily generated and of having a smaller variability than samples of real voice.

Of course, when a particular system is tested, the characteristics of the transmission path preceding it are to be considered. The actual test signal then has to be produced as the convolution of the artificial voice with the path response.

2.2 Definition

The artificial voice is a signal, mathematically defined, which reproduces all human speech characteristics relevant to the characterization of linear and nonlinear telecommunication systems. It is intended to give a satisfactory correlation between objective measurements and real speech tests.

3 Terminology

The artificial voice can be produced either as an electrical or as an acoustic signal, according to the system or device under test (e.g. communication channels, coders, microphones). The following definitions apply with reference to Figure 1.

[Figure 1/P.50. Block labels: Equalizer, Artificial mouth. Key: 1 Electrical artificial voice; 2 Artificial mouth excitation signal; 3 Acoustic artificial voice; MRP Mouth Reference Point]

3.1 Electrical artificial voice

The artificial voice produced as an electrical signal for testing transmission channels or other electric devices.

3.2 Artificial mouth excitation signal

A signal applied to the artificial mouth in order to produce the acoustic artificial voice. It is obtained by equalizing the electrical artificial voice to compensate for the sensitivity/frequency characteristic of the mouth.

NOTE - The equalization depends on the particular artificial mouth employed and can be accomplished electrically or mathematically within the signal generation process.

3.3 Acoustic artificial voice

Acoustic signal at the MRP (Mouth Reference Point) of the artificial mouth. It complies with the same time and spectral specifications as the electrical artificial voice.

4 Characteristics

4.1 Long-term average spectrum

The third-octave filtered long-term average spectrum of the artificial voice is given in Figure 2 and Table 1, normalized for a wideband sound pressure level of -4.7 dBPa. The values of the long-term spectrum of the artificial voice at the MRP can be derived from the equation:

$$S(f) = -376.44 + 465.439\,(\log_{10} f) - 157.745\,(\log_{10} f)^{2} + 16.7124\,(\log_{10} f)^{3} \qquad \text{(4-1)}$$

where S(f) is the spectrum density in dB relative to 1 pW/m² sound intensity per Hertz at the frequency f. The definition frequency range is from 100 Hz to 8 kHz.

The curve of the spectrum is shown in Figure 2. The values of S(f) at 1/3 octave ISO frequencies are given in the fourth column of Table 1. The tolerances are given in the fifth column of Table 1. The tolerances below 200 Hz apply only to the male artificial voice.

The total sound pressure level of the spectrum defined in Equation (4-1) is -4.7 dBPa. However, this spectrum is also applicable for the levels from -19.7 to +10.3 dBPa. In other words, the first term of Equation (4-1) may range from -391.44 to -361.44.

[...] thirteen groups shall be
used for generating the voiced part, while three groups shall be used for generating the unvoiced part. These coefficients are listed in Table 2 both for male and female artificial voices.

The twelve filter coefficients shall be updated every 60 ms while generating the signal. More precisely, during each 60 ms period the actual filtering coefficients must be updated every 2 ms, by linearly interpolating between the two sets of values adopted for successive 60 ms intervals.

In the voiced sound part, each of the 13 groups of coefficients shall be chosen at random once every 780 ms (= 60 ms x 13),
and in the unvoiced sound part each of the 3 groups of coefficients shall be chosen at random once every 180 ms (= 60 ms x 3).

NOTE - The described implementation of the shaping filter should be considered as an example and is not an integral part of this Recommendation. Any other implementation providing the same transfer function can be used instead.

The coefficients in Table 2a and Table 2b apply to a sampling frequency of 16 000 Hz.
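The coefficient update rule described for the spectrum shaping filter (a new group every 60 ms, linear interpolation every 2 ms, each group drawn once per cycle) can be sketched as follows. This is a minimal illustration and not part of the Recommendation. The function names are hypothetical, and the coefficient values are placeholders rather than the Table 2 entries. The sketch assumes that "chosen at random once" per cycle means a fresh random permutation of the groups, and that interpolation runs from the group of each 60 ms frame toward the group of the next frame.

```python
import random

FRAME_MS = 60                           # one coefficient group per 60 ms (from the text)
STEP_MS = 2                             # coefficients refreshed every 2 ms (from the text)
STEPS_PER_FRAME = FRAME_MS // STEP_MS   # 30 interpolation steps per frame


def group_schedule(num_groups, num_frames, rng):
    """Return one group index per 60 ms frame.

    Assumption: each cycle of `num_groups` frames is a new random
    permutation, so every group is used exactly once per cycle
    (13 x 60 ms = 780 ms voiced, 3 x 60 ms = 180 ms unvoiced).
    """
    order = []
    while len(order) < num_frames:
        cycle = list(range(num_groups))
        rng.shuffle(cycle)
        order.extend(cycle)
    return order[:num_frames]


def interpolated_coefficients(groups, schedule):
    """Yield one coefficient list per 2 ms step.

    Within each frame the coefficients move linearly from that frame's
    group to the next frame's group. The last scheduled frame has no
    successor, so it only serves as an interpolation target.
    """
    for cur, nxt in zip(schedule, schedule[1:]):
        start, end = groups[cur], groups[nxt]
        for step in range(STEPS_PER_FRAME):
            w = step / STEPS_PER_FRAME
            yield [(1.0 - w) * a + w * b for a, b in zip(start, end)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Placeholder values only: three groups of twelve coefficients each.
    placeholder_groups = [[0.1 * g] * 12 for g in range(3)]
    schedule = group_schedule(3, 6, rng)
    sets = list(interpolated_coefficients(placeholder_groups, schedule))
    print(len(sets), "coefficient sets, one per", STEP_MS, "ms")
```

A permutation per cycle is only one reading of the text; drawing groups independently at random would be a different scheme and would not guarantee that each group appears once every 780 ms or 180 ms.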
Figure 9/P.50 - Short-term power variation patterns of the four unit elements used to generate the artificial voice output

Figure 10/P.50 - Spectrum shaping filter (z^-1: unit delay)

NOTE - The output port of the filter at the lower side in Figure 10 is not used but shown to highlight the symmetry of the filter structure.

Table 2a/P.50 - Coefficients ki for male artificial voice

Rows: Unvoiced 1 to 3, then Voiced 1 to 13. Columns: k1 to k12.

Group          k1      k2      k3      k4      k5      k6      k7      k8      k9      k10     k11     k12
Unvoiced 1     0.471  -0.108   0.024  -0.048   0.140   0.036   0.054   0.004   0.123   0.044   0.099  -0.003
Unvoiced 2     0.284  -0.468   0.030   0.090   0.124  -0.020   0.087   0.067   0.131   0.011   0.076  -0.024
Unvoiced 3     0.025  -0.496  -0.176   0.1