TY - JOUR
T1 - Low-dimensional representation of spectral envelope without deterioration for full-band speech analysis/synthesis system
AU - Morise, Masanori
AU - Miyashita, Genta
AU - Ozawa, Kenji
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers JP16H01734, JP15H02726, JP16H05899, JP16K12511.
Publisher Copyright:
Copyright © 2017 ISCA.
PY - 2017
Y1 - 2017
N2 - A speech coding for a full-band speech analysis/synthesis system is described. In this work, full-band speech is defined as speech with a sampling frequency above 40 kHz, whose Nyquist frequency covers the audible frequency range. In prior works, speech coding has generally focused on the narrowband speech with a sampling frequency below 16 kHz. On the other hand, statistical parametric speech synthesis currently uses the full-band speech, and low-dimensional representation of speech parameters is being used. The purpose of this study is to achieve speech coding without deterioration for full-band speech. We focus on a high-quality speech analysis/synthesis system and mel-cepstral analysis using frequency warping. In the frequency warping function, we directly use three auditory scales. We carried out a subjective evaluation using the WORLD vocoder and found that the optimum number of dimensions was around 50. The kind of frequency warping did not significantly affect the sound quality in the dimensions.
AB - A speech coding for a full-band speech analysis/synthesis system is described. In this work, full-band speech is defined as speech with a sampling frequency above 40 kHz, whose Nyquist frequency covers the audible frequency range. In prior works, speech coding has generally focused on the narrowband speech with a sampling frequency below 16 kHz. On the other hand, statistical parametric speech synthesis currently uses the full-band speech, and low-dimensional representation of speech parameters is being used. The purpose of this study is to achieve speech coding without deterioration for full-band speech. We focus on a high-quality speech analysis/synthesis system and mel-cepstral analysis using frequency warping. In the frequency warping function, we directly use three auditory scales. We carried out a subjective evaluation using the WORLD vocoder and found that the optimum number of dimensions was around 50. The kind of frequency warping did not significantly affect the sound quality in the dimensions.
KW - Frequency warping
KW - Spectral envelope
KW - Speech analysis/synthesis
KW - Speech coding
UR - http://www.scopus.com/inward/record.url?scp=85039150871&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-67
DO - 10.21437/Interspeech.2017-67
M3 - Conference article
AN - SCOPUS:85039150871
VL - 2017-August
SP - 409
EP - 413
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Y2 - 20 August 2017 through 24 August 2017
ER -