Low-dimensional representation of spectral envelope without deterioration for full-band speech analysis/synthesis system

Masanori Morise, Genta Miyashita, Kenji Ozawa

Research output: Contribution to journal › Conference article › peer-review

4 Citations (Scopus)

Abstract

A speech coding method for a full-band speech analysis/synthesis system is described. In this work, full-band speech is defined as speech with a sampling frequency above 40 kHz, whose Nyquist frequency covers the audible frequency range. Prior work on speech coding has generally focused on narrowband speech with a sampling frequency below 16 kHz. Statistical parametric speech synthesis, on the other hand, currently uses full-band speech together with low-dimensional representations of the speech parameters. The purpose of this study is to achieve speech coding without deterioration for full-band speech. We focus on a high-quality speech analysis/synthesis system and on mel-cepstral analysis using frequency warping. For the frequency warping function, we directly use three auditory scales. We carried out a subjective evaluation using the WORLD vocoder and found that the optimum number of dimensions was around 50. The type of frequency warping did not significantly affect the sound quality at these dimensions.
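As a rough illustration of the idea in the abstract, the sketch below compresses a spectral envelope by resampling its log spectrum onto a warped frequency axis and keeping only the first few cepstral coefficients. It is not the authors' method and not part of the WORLD vocoder: the function names (`encode_envelope`, `decode_envelope`), the choice of the mel scale as the warping function, and the use of plain cepstral truncation instead of mel-cepstral analysis proper are all assumptions made for this example. The abstract does not name the three auditory scales that were compared.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel formula; assumed here, not taken from the paper.
    return 1127.0 * np.log1p(np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * np.expm1(np.asarray(m, dtype=float) / 1127.0)

def encode_envelope(envelope, fs, n_dims=50):
    """Map a power spectral envelope (bins from 0 to fs/2) to n_dims coefficients."""
    envelope = np.asarray(envelope, dtype=float)
    n_bins = len(envelope)
    hz = np.linspace(0.0, fs / 2.0, n_bins)
    # Resample the log envelope on a grid that is uniform on the warped axis.
    mel_grid = np.linspace(0.0, hz_to_mel(fs / 2.0), n_bins)
    warped_log = np.interp(mel_to_hz(mel_grid), hz, np.log(envelope))
    # Real cepstrum of the warped log spectrum, truncated to n_dims values.
    cepstrum = np.fft.irfft(warped_log)
    return cepstrum[:n_dims]

def decode_envelope(coeffs, fs, n_bins):
    """Rebuild an approximate envelope with n_bins bins from the coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    n_fft = 2 * (n_bins - 1)
    n = len(coeffs)
    cepstrum = np.zeros(n_fft)
    cepstrum[:n] = coeffs
    # Mirror the coefficients so the cepstrum is even and its spectrum is real.
    cepstrum[n_fft - np.arange(1, n)] = coeffs[1:]
    warped_log = np.fft.rfft(cepstrum).real
    # Undo the warping by interpolating back onto the linear frequency grid.
    hz = np.linspace(0.0, fs / 2.0, n_bins)
    mel_grid = np.linspace(0.0, hz_to_mel(fs / 2.0), n_bins)
    return np.exp(np.interp(hz_to_mel(hz), mel_grid, warped_log))
```

With a 48 kHz signal and a 1025-bin envelope, `encode_envelope(envelope, 48000, 50)` would reduce each frame from 1025 values to 50, and `decode_envelope(coeffs, 48000, 1025)` would return a smoothed approximation of the original envelope. Whether 50 coefficients are perceptually sufficient is the question the paper's subjective evaluation addresses; this sketch makes no claim about that.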

Original language: English
Pages (from-to): 409-413
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
Publication status: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 → 24 Aug 2017

Keywords

  • Frequency warping
  • Spectral envelope
  • Speech analysis/synthesis
  • Speech coding
