KS X ISO/IEC 15938-4:2005
2005-12-26

2003-11-29 : 2005-12-26, 2005-0947
(02 509 7272 4)

KS X ISO/IEC 15938-4:2005 [...] Annexes A, B and C [...]

KS X ISO/IEC 15938, "Information technology - Multimedia content description interface (MPEG-7)", consists of the following parts:
Part 1: Systems
Part 2: Description definition language
Part 3: Visual
Part 4: Audio
Part 5: Multimedia description schemes
Part 6: Reference software
Part 7: Conformance testing
Part 8: Extraction and use of MPEG-7 descriptions

Contents
1.
2.
3.
4. (4.1, 4.2, 4.3)
5. (5.1, 5.2, 5.3, 5.4)
6. (6.1 to 6.8)
Annex A
Annex B
Annex C

ICS 35.040
KS X ISO/IEC 15938-4:2005
Information technology - Multimedia content description interface - Part 4: Audio

This standard is based on ISO/IEC 15938-4 (first edition, 2002), Information technology - Multimedia content description interface - Part 4: Audio.

1.
1.1
[...] MPEG-7 [...] MPEG-7 [...] 8 [...] KS X ISO/IEC 15938-2 [...] KS X ISO/IEC 15938-2 [...] KS X ISO/IEC 15938-5 [...]
KS X ISO/IEC 15938 [...] KS X ISO/IEC 15938-2 [...] KS X ISO/IEC 15938-2 [...]

1.2
[...] (tool box) [...] (framework) [...] (scalable series) [...] (silence) [...] KS X ISO/IEC 15938-4 [...] (melody) [...]

2.
2.1 (frame)
(period) [...] f [...]

$x(f, t) = s(t)\,h(t - fS)$

s(t): signal at time t
h(t): Hamming analysis window
S: hop size

2.2 (hop size)
[...]

2.3 (running window analysis)
[...] h(t) [...] S [...] f [...] h(t - fS) [...]

2.4 (instantaneous values)
[...] (global) [...]

3.
ASR (Automatic Speech Recognition)
CPU (Central Processing Unit)
D (Descriptor)
DC (Direct Current) (0 Hz)
DDL (Description Definition Language)
DFT (Discrete Fourier Transform)
DS (Description Scheme)
FFT (Fast Fourier Transform)
HMM (Hidden Markov Model)
Hz (Hertz, frequency in cycles per second)
LLD (Low Level Descriptor)
LOG (Logarithm)
LPC (Linear Predictive Coding)
MSD (Maximum Squared Distance from the mean)
OOV (Out of Vocabulary)
RMS (Root Mean Square)
SR (Sample Rate)
STFT (Short Time Fourier Transform)
XML (Extensible Markup Language)

4.
4.1
[...] XML [...] xsd: [...] KS X ISO/IEC 15938 [...] mpeg7: [...]

4.2 (audio representation)
[...] (sample) [...] 2 [...] (float) [...] (1 1 1 1) [...] (MSB: Most Significant Bit) [...] 1 [...]

4.3
[...]: KS X ISO/IEC
15938-5/Amd.1 (MDS) [...] AudioD [...] AudioDS [...]

[...] KS X ISO/IEC 15938 [...] L, C, R, LS, RS, LFE [...] KS X ISO/IEC 15938-5/Amd.1 (MDS) [...] (L), (C), (R) [...] (subgroup) [...] (5.1) [...] (L, R, C, LS, RS, LFE) [...] Figure AMD1-1 [...] (Figure AMD1-2) [...] LFE [...] 1 [...] 2 [...] (KS X ISO/IEC 15938-5) [...] AudioSegmentD [...] Framework [...] KS X ISO/IEC 15938-5/Amd.1 (MDS) 4.2.4 [...]

[Figure AMD1-1: channel ordering. If Center is present: Center = ch_1, Left = ch_2, Right = ch_3, further channels ch_4 ... ch_N-1. If Center is not present: Left = ch_1, Right = ch_2, further channels ch_3 ... ch_N-1. In both cases: Surround Center = ch_N (optional).]

[Figure AMD1-2: 3D [...]]

5.
5.1
[...] AudioSegment [...] AudioLLDScalarType [...] AudioLLDVectorType [...] ScalableSeries [...] AudioSegment [...] AudioSegment [...] MPEG-7 [...] KS X ISO/IEC 15938-5 [...] AudioSegment [...] AudioSegment [...] MediaTime [...] TemporalMask [...] AudioSegment [...] AudioSegment [...] (segment tree) [...] AudioDType [...] AudioDSType [...] KS X ISO/IEC 15938-5 [...] 1 [...]

5.2 (ScalableSeries)
5.2.1
[...] SeriesOfScalarType [...] SeriesOfVectorType [...] 2 [...]

5.2.2 ScalableSeriesType
[...] SeriesOfScalarType [...] SeriesOfVectorType [...] (series) [...]

5.2.2.1

5.2.2.2
ScalableSeriesType [...]
Scaling [...]
ratio [...] Scaling [...] 1 [...]
numOfElements [...] Scaling [...]
totalNumOfSamples [...] totalNumOfSamples [...] 2 [...] totalNumOfSamples [...] numOfElements [...] ratio [...]
[...] 2 [...] k [...] 31 [...] 13 [...] 3 [...] 2 [...] 2 [...] 6 [...] 2 [...] 2 [...] ratio [...] numOfElements [...] totalNumOfSamples [...]

5.2.3 SeriesOfScalarType
[...]

5.2.3.1

5.2.3.2
[...] P [...] Q [...] N = PQ [...]
Raw [...] (ratio 1) [...] 0 [...]
Mean [...] Raw [...]
SeriesOfScalarType [...] Raw [...]
Min [...] NumOfElements [...] Raw [...]
Max [...] NumOfElements [...] Raw
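
The framing relation of 2.1 and the scaled series of 5.2 can be illustrated with a minimal sketch. It assumes that ratio is the number of original (Raw) samples summarized by each scaled element, that Mean, Min and Max are taken over each such group, and that a final group may be shorter than ratio. None of these readings is confirmed by the surviving text. The function names and the 0.54/0.46 Hamming coefficients are illustrative choices and are not taken from this standard.

```python
import math


def hamming_window(length):
    """Hamming window h(t), t = 0 .. length-1 (common 0.54/0.46 form, assumed)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * t / (length - 1))
            for t in range(length)]


def windowed_frame(signal, f, window, hop):
    """Frame f of the signal: x(f, t) = s(t) * h(t - f*S), with S = hop.

    Only the samples t covered by the shifted window are returned; the
    frame is cut short if it runs past the end of the signal.
    """
    start = f * hop
    stop = min(start + len(window), len(signal))
    return [signal[t] * window[t - start] for t in range(start, stop)]


def scale_series(raw, ratio):
    """Summarize each group of `ratio` consecutive raw samples.

    Returns three lists (mean, minimum, maximum), one value per group.
    Assumption of this sketch: the last group may hold fewer samples.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    groups = [raw[i:i + ratio] for i in range(0, len(raw), ratio)]
    mean = [sum(g) / len(g) for g in groups]
    minimum = [min(g) for g in groups]
    maximum = [max(g) for g in groups]
    return mean, minimum, maximum


# Hypothetical data: six raw samples scaled with ratio 2.
raw = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0]
mean, minimum, maximum = scale_series(raw, ratio=2)
num_of_elements = len(mean)        # scaled elements sharing this ratio
total_num_of_samples = len(raw)    # samples in the original series
```

With ratio 1 the scaled series equals the raw series. In this example numOfElements times ratio equals totalNumOfSamples because the series length is a multiple of the ratio.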