Frequency domain variants of velvet noise and their application to speech processing and synthesis

Hideki Kawahara, Ken Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino

Research output: Contribution to journal > Conference article > peer-review

7 Citations (Scopus)

Abstract

We propose a new excitation source signal for VOCODERs and an all-pass impulse response for post-processing of synthetic sounds and pre-processing of natural sounds for data augmentation. The proposed signals are variants of velvet noise, which is a sparse discrete signal consisting of a few non-zero (1 or -1) elements and sounds smoother than Gaussian white noise. One of the proposed variants, FVN (Frequency domain Velvet Noise), applies the velvet noise generation procedure on the cyclic frequency domain of the DFT (Discrete Fourier Transform). Smoothing the generated signal to design the phase of an all-pass filter, followed by an inverse Fourier transform, yields the proposed FVN. Temporally variable, frequency-weighted mixing of FVN signals generated with frozen and shuffled random numbers provides a unified excitation signal that can span from random noise to a repetitive pulse train. The other variant, which is an all-pass impulse response, significantly reduces the “buzzy” impression of VOCODER output when applied as a filter. Finally, we discuss applications of the proposed signals to watermarking and psychoacoustic research.
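
The generation procedure outlined in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. The abstract does not specify the smoothing function, the pulse density, or the phase scaling, so the circular moving average and the parameters `avg_interval`, `half_width`, and `max_phase` below are assumptions chosen only for illustration, and all function names are hypothetical. The sketch places velvet-noise pulses along the DFT frequency bins with odd symmetry, smooths them into a phase curve, builds a unit-magnitude spectrum, and inverts it with a naive inverse DFT.

```python
import cmath
import math
import random


def velvet_noise(length, avg_interval, rng):
    """Sparse sequence: one +1 or -1 pulse at a random position per segment."""
    signal = [0.0] * length
    for start in range(0, length, avg_interval):
        stop = min(start + avg_interval, length)
        signal[rng.randrange(start, stop)] = rng.choice((1.0, -1.0))
    return signal


def smooth_circular(values, half_width):
    """Circular moving average (assumed stand-in for the smoothing step)."""
    n = len(values)
    width = 2 * half_width + 1
    return [
        sum(values[(k + d) % n] for d in range(-half_width, half_width + 1)) / width
        for k in range(n)
    ]


def fvn_allpass_response(n_fft, avg_interval, half_width, max_phase, rng):
    """Sketch of an FVN-style all-pass impulse response (n_fft must be even)."""
    half = n_fft // 2
    # Velvet noise on the positive-frequency bins 1 .. half-1.
    positive = velvet_noise(half - 1, avg_interval, rng)
    # Mirror with opposite sign so the phase is odd and the response is real.
    pulses = [0.0] * n_fft
    for k, value in enumerate(positive, start=1):
        pulses[k] = value
        pulses[n_fft - k] = -value
    # Smooth the pulses into a phase curve; max_phase is an assumed scale.
    phase = [max_phase * p for p in smooth_circular(pulses, half_width)]
    # Unit magnitude at every bin makes the filter all-pass.
    spectrum = [cmath.exp(1j * p) for p in phase]
    # Naive inverse DFT, adequate for a small illustrative n_fft.
    response = []
    for t in range(n_fft):
        acc = sum(
            spectrum[k] * cmath.exp(2j * math.pi * k * t / n_fft)
            for k in range(n_fft)
        )
        response.append(acc.real / n_fft)
    return response


if __name__ == "__main__":
    h = fvn_allpass_response(64, 4, 2, 1.0, random.Random(1))
    print(len(h), sum(x * x for x in h))
```

Because every DFT bin has unit magnitude, the resulting response changes only the phase of a signal it is convolved with. A fixed seed plays the role of a "frozen" random sequence, while a fresh seed per call gives a different response each time.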

Original language: English
Pages (from-to): 2027-2031
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOIs
Publication status: Published - 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sep 2018 to 6 Sep 2018

Keywords

  • All-pass filter
  • Speech processing
  • Speech synthesis
  • Voice excitation source
  • Voice quality
