Abstract
A framework for speaker identification based on human auditory models was proposed. A preliminary evaluation on a very small database was carried out to determine suitable deep neural network (DNN) parameters and to assess the effectiveness of the proposed framework. The database, which covers four speakers, consists of isolated vowels recorded with a microphone in a recording studio. Because combining several frames would improve performance, frame-by-frame evaluation on isolated vowels is one of the strictest test conditions. The number of hidden layers mattered less than the number of units in each hidden layer: two hidden layers were enough, and more than three did not improve performance. The results suggest that the type of domain is not important, provided that DNNs are used as the classifier.
Original language | English |
---|---|
Pages (from-to) | 340-343 |
Number of pages | 4 |
Journal | Acoustical Science and Technology |
Volume | 36 |
Issue | 4 |
DOI | |
Publication status | Published - 2015 |