ETSI TS 126 243 V13.0.0 (2016-01)

TECHNICAL SPECIFICATION

Digital cellular telecommunications system (Phase 2+);
Universal Mobile Telecommunications System (UMTS);
LTE;
ANSI-C code for the fixed-point distributed speech recognition extended advanced front-end
(3GPP TS 26.243 version 13.0.0 Release 13)

Reference
RTS/TSGS-0426243vd00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from: http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format
(PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

European Telecommunications Standards Institute 2016. All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if
any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.
Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 References
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 C code structure
4.1 Contents of the C source code
4.2 Program execution
4.3 Code hierarchy
4.5 Variables, constants and tables
4.5.1 Description of constants used in the C-code
4.5.2 Description of fixed tables used in the C-code
4.5.3 Static variables used in the C-code
5 File formats
5.1 Speech file
Annex A (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document
are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:
x the first digit:
  1 presented to TSG for information;
  2 presented to TSG for approval;
  3 or greater indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the document.

1 Scope

The present document contains an electronic copy of the ANSI-C code for DSR Extended Advanced Front-end. The ANSI-C code is necessary for a
bit exact implementation of DSR Extended Advanced Front-end.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

[1] ETSI ES 202 050 (2007-01) V1.1.5: “Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithm”.
[2] ETSI ES 202 212 (2005-11) V1.1.2: “Distributed Speech Recognition; Extended Advanced Front-end Feature Extraction Algorithm; Compression Algorithm, Back-end Speech Reconstruction Algorithm”.
[3] 3GPP TS 26.177: “Speech Enabled Services (SES); Distributed
Speech Recognition (DSR) extended advanced front-end test sequences”.

3 Definitions and abbreviations

3.1 Definitions

Definitions of terms used in the present document can be found in [1] and [2].

3.2 Abbreviations

For the purpose of the present document, the following abbreviations apply:

ANSI   American National Standards Institute
I/O    Input/Output
RAM    Random Access Memory
ROM    Read Only Memory
AFE    Advanced Front-end
X-AFE  eXtended Advanced Front-end
DSR    Distributed Speech Recognition

4 C code structure

This clause gives an overview of the structure, contents and organization of the bit-exact C code attached to this document.

The C code has been verified on the following systems:
- Sun Microsystems workstations and GNU gcc compiler
- IBM PC compatible computers with Linux operating system and GNU gcc compiler.

ANSI-C was selected as the programming language because portability was desirable.

4.1 Contents of the C source code

The distributed files with suffix “c” contain the source code and the files with suffix “h” are the header files.

Makefiles are provided for the platforms on which the C code has been verified (listed above).

4.2 Program execution

There are separate executables for the FrontEnd and Vector Quantization, with and without Extensions. The command line options are described below.

- indicates parameters for the given option for running the executable; () indicates the default parameter.

FrontEnd w/ Extension:

USAGE: bin/ExtAdvFrontEnd infile HTK_outfile pitch_outfile class_outfile options

OPTIONS:
  -q                    Quiet Mode (FALSE)
  -F format             Input file format (NIST)
  -fs freq              Sampling frequency in kHz (8)
  -swap                 Change input byte ordering (Native)
  -noh                  No HTK header to output file (FALSE)
  -noc0                 No c0 coefficient to output feature vector (FALSE)
  -nologE               No logE component to output feature vector (FALSE)
  -skip_header_bytes n  Skip header, first n bytes (only for -F RAW)

-noh, -noc0, -nologE and -skip_header_bytes are not used and should not be changed.

FrontEnd w/o Extension:

USAGE: bin/AdvFrontEnd infile HTK_outfile options

OPTIONS: Same as FrontEnd w/ Extension

Vector Quantization w/ Extension:

Usage: extcoder htk_file_in pitch_file_in class_file_in bitstream_file_out pitch_file_out txt_file_out -freq x -VAD/No_VAD

  htk_file_in         Input mel-frequency
                      cepstral coefficient file in HTK MFCC format.
  pitch_file_in       Input pitch period file.
  class_file_in       Input classification file.
  bitstream_file_out  Output binary bitstream.
  pitch_file_out      Output quantised pitch period file.
  txt_file_out        Vector quantiser output in text format.
  -freq x             Sampling frequency in kHz (8 or 16).
  -VAD                Use voice activity detector data. Voice activity input file must have same name as htk_file_in, but extension .vad
  -No_VAD             Do not incorporate voice activity detector information in output bitstream.

Vector Quantization w/o Extension:

Usage: coder htk_file_in bitstream_file_out txt_file_out -freq x -VAD/No_VAD

  htk_file_in         Input mel-frequency cepstral coefficient file in HTK MFCC format.
  bitstream_file_out  Binary output bitstream.
  txt_file_out        Vector quantiser output in text format.
  -freq x             Sampling frequency in kHz (8 or 16).
  -VAD                Use voice activity detector data. Voice activity input
                      file must have same name as htk_file_in, but extension .vad
  -No_VAD             Do not incorporate voice activity detector information in output bitstream.

File extension descriptions as generated by the sample script:

  .cep    Binary file containing cepstral features in HTK format. Output from the FrontEnd, input to the vector quantizer.
  .pitch  Binary file containing pitch information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
  .class  ASCII file containing class information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
  .bs     Binary file containing the bitstream. Output from the vector quantizer.
  .log    Log files from the different executables.

4.3 Code hierarchy

Tables 1 to 3 are call graphs that show the functions used for AFE (table 1), VQ (table 2), and
Extension (table 3). Each column represents a call level and each cell a function. Each function calls the functions in the cells to its right. The time order in the call graphs runs from the top downwards as the processing of a frame advances. All standard C functions (printf(), fwrite(), etc.) have been omitted. Also, no basic operations (add(), L_add(), mac(), etc.) or double precision extended operations (e.g. L_Extract()) appear in the graphs. The basic operations are not counted as extending the depth; therefore the deepest level in this software is level 7.
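The basic operations left out of the call graphs are small saturating fixed-point primitives. The following is a minimal sketch of what such operations typically look like, assuming the usual ETSI/ITU-T style semantics (16-bit and 32-bit saturating addition, and a fractional multiply-accumulate). The names sat_add16, sat_add32, sat_mult32 and sat_mac32 are hypothetical stand-ins and are not taken from the attached code. The sketch also relies on the C99 header stdint.h, which is an assumption beyond strict ANSI-C.

```c
#include <stdint.h>

/* Hypothetical stand-ins for add(), L_add() and a mac()-style operation.
   They illustrate saturating arithmetic only; they are not the
   specification's own routines. */

/* 16-bit addition that clamps to [-32768, 32767] instead of wrapping. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* 32-bit addition that clamps to the int32_t range. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* exact in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Fractional multiply: 2 * a * b, clamped. The only overflowing input
   pair is (-32768, -32768), whose doubled product is 2^31. */
int32_t sat_mult32(int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t)a * (int64_t)b;
    if (prod > INT32_MAX) return INT32_MAX;
    return (int32_t)prod;
}

/* Multiply-accumulate: acc + 2 * a * b, saturating at each step. */
int32_t sat_mac32(int32_t acc, int16_t a, int16_t b)
{
    return sat_add32(acc, sat_mult32(a, b));
}
```

Because every step clamps rather than wraps, the result is the same on any platform, which is the property a bit-exact reference implementation depends on.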
Table 1: AFE call structure

main() AdvProcessInit_B() DoNoiseSupInit_B() DoWaveProcInit_B() DoCompCepsInit_B() DoPostProcInit_B() DoVADInit_F() Do16kProcInit_B() QMF_FIR_Init_B() fir_initialization_B() DP_HP_filters_B() BufIn32Alloc() AdvProcessAlloc_B() DoNoiseSupAlloc_B() DoWaveProcAlloc_B() DoCompCepsAlloc_B() DoPostProcAlloc_B() DoVADAlloc_F() Do16kProcAlloc_B() FlushAdvProcess_B() DoVADFlush_F() CvFeatInt2Float() AdvProcessDelete_B() DoNoiseSupDelete_B() DoWaveProcDelete_B() DoCompCepsDelete_B() DoPostProcDelete_B() DoVADDelete_B()
BufIn32Free() DoAdvProcess_B() Do16kProcessing_B() DoNoiseSup_B() Get16k_p_bufferData16k_B() Get16k_bufData16kSize_B() Get16k_p_BandsForCoding16k_B() Get16k_p_CodeForBands16k_B() Get16k_dataHP_B() VAD_F() Log_2() DoSigWindowing16_F1() DoSigWindowing16_F2() ff4NRFix32_B() GetL15() GetH15() Mult16x32() Add_Mult16x16_16() Sub_Mult16x16_16() Permut() FFTtoPSD_F() Square24d2_B() Square24_B() Get16k_BFC_dec_B() GetBandsForCoding16k_B() PSDMean_F() NoiseEstimation_F1() Sqrt_2() Sqrt16_2() NoiseEstimation_F2() Sqrt_2() Sqrt16_2() FilterCalc_F() SpeechQVar() FilterBank16() SpeechQSpec() SpeechQMel() DoGainFact_F1()
Log_2() DoGainFact_F2() Log_2() DoMelIDCT_F16() ApplyWF() Get16k_dec1() Get16k_dec2() Get16k_dec3() DoSigWindowing16_F3() ff4NRFix32_B() GetL15() GetH15() Mult16x32() Add_Mult16x16_16() Sub_Mult16x16_16() Permut() FFTtoPSD_F() Square24d2_B() Square24_B() DoMelFB_B() CodeBands16k_B() DoSpecSub16k_B() Log_2()
UpDateDecal() ApplyDecal() DCOffsetFil_F() Get16k_hpBandsSize_B() Get16k_p_hpBands_B() Get16k_p_bufferCodeForBands16k_B() Get16k_p_CodeForBands16k_B() Get16k_p_bufferCodeWeights_B() Get16k_p_codeWeights_B() Set16k_hpBands_dec_B() DoWaveProc_B() TeagerEng() GetTeagerFilter() GetMaximaPositions() DoCompCeps_B() CepsCompute() Get16k_p_bufferCodeWeights_B() Get16k_p_bufferCodeForBands16k_B() PreEmphHamm() ff4NB16_B() GetBandsForDecoding16k_B() DecodeBands16k_B() FilterBank() Get16k_hpBands_dec_B() Get16k_p_hpBands_B() MergeSSandCoded_B() CorrectEnergy_B() CosInv16Khz() cosInv() (only for 8kHz) DoPostProc_B() DoVADProc_F() focalpoint()

Table 2: VQ call structure

main() quantize_and_print() get_best_dataframe() best_centroid() quant_pitch_abs() get_class_bit() quant_pitch_diff() get_class_bit() mfcc_crc_encode() pc_crc_encode()
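As a rough illustration of the kind of codebook search that a vector quantizer performs, the sketch below returns the index of the codebook entry closest to an input vector. It is a generic nearest-neighbour search under stated assumptions, not the routine from the attached code: the function name nearest_centroid, the row-major codebook layout and the unweighted squared Euclidean distance are all hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the codebook row nearest to vec.
   codebook holds n_entries rows of dim values each, stored row-major.
   Distances are accumulated in 64 bits so 16-bit inputs cannot overflow.
   If n_entries is 0 the function returns 0. */
size_t nearest_centroid(const int16_t *vec, const int16_t *codebook,
                        size_t n_entries, size_t dim)
{
    size_t best = 0;
    int64_t best_dist = INT64_MAX;
    size_t i, j;

    for (i = 0; i < n_entries; i++) {
        int64_t dist = 0;
        for (j = 0; j < dim; j++) {
            int64_t d = (int64_t)vec[j] - (int64_t)codebook[i * dim + j];
            dist += d * d;              /* squared Euclidean distance */
        }
        if (dist < best_dist) {         /* strict '<' keeps the first of equal minima */
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

For a two-dimensional codebook with rows (0, 0), (10, 10) and (-5, 5), the vector (9, 8) maps to index 1 and the vector (-4, 4) maps to index 2.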
Table 3: Extension call structure

main() RVC_ConstructPitchRom_be() RVC_ConstructPitchMeter_be() Allocate_InterpolatedDft_be() RVC_ResetPitchMeter_be() RVC_DestructPitchRom_be() RVC_DestructPitchMeter_be() Deallocate_InterpolatedDft_be() DoAdvProcess_B() DoPitchExtract() FilterBank() dsr_afe_vad() get_vm() fnLog2() IsLowBandNoise() get_zcm() pre_process() iir_d() iir_s() RVC_MeasurePitch_be() ClearPitch_be() DirichletInterpolation_be() IsLowLevelInput_be() Finalize_be() IsContinuousPitch_be() Mpy_lw_sw() Mpy_lw_sw() PrepareSpectralPeaks_be() CalcSpectrum_be() Mpy_lw_sw() Mpy_lw_sw_Add() FindPeaks_be() Prelim_ScaleDownAmpsOfHighFreqPeaks_be() qsort_be()* swap() CompareIpointAmp_be() RefineSpectralPeaks_be() sqrt_l_fix() Final_ScaleDownAmpsOfHighFreqPeaks_be() Mpy_lw_sw() FindPitchCandidates_be() NormalizeAmplitudes_be() CalcUtilityFunction_be() CreatePieceWiseConstantFunction_be() L_Extract() Mpy_32_16() qsort_be()* swap() Compare_ARRAY_OF_XPOINTS_be() LinkArrayOfPoints_be() AddSortedArrayOfPoints_be() LinkArray