Music chord recognition from audio data using bidirectional encoder-decoder LSTMs

Takeshi Hori, Kazuyuki Nakamura, Shigeki Sagayama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we discuss methods for chord recognition based on long short-term memory recurrent neural networks (LSTM, LSTM-RNN). Chord progressions play an important role in the generation process of music. Indeed, music processing systems that contain a model for chord progressions achieve high accuracy in tasks such as music structure analysis, multi-pitch analysis, and automatic composition or accompaniment. In previous research, chord progressions were obtained with rule-based methods or modeled with stochastic methods such as hidden Markov models or probabilistic context-free grammars. Pitch patterns were then regarded as the observations resulting from the hidden states of the chord progression model. Recently, convolutional neural networks have been used for chord recognition with considerable success. LSTM networks, on the other hand, have been shown to be suitable for generating chord progressions, since these neural networks can process time series data very well. The purpose of this study is to evaluate and compare three types of LSTM networks, based on the bidirectional and encoder-decoder structures, with regard to their chord recognition performance. To extract more effective data for chord recognition, we use a constant-Q transform and specmurt analysis to suppress overtone components, and chroma vectorization to reduce the feature dimensionality. The evaluation results show that the encoder-decoder-based LSTM can learn the relationship between the observed chroma vectors and the associated chord progression more effectively than simpler LSTM networks.
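
The chroma vectorization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fold_to_chroma`, the `bins_per_octave` parameter, the assumption that the lowest constant-Q bin corresponds to pitch class 0, and the per-frame sum normalization are all hypothetical choices made for illustration. The sketch only shows how a constant-Q magnitude spectrogram might be folded into 12-dimensional chroma vectors; the specmurt overtone suppression is not included.

```python
import numpy as np


def fold_to_chroma(cqt_mag, bins_per_octave=12):
    """Fold a (n_bins, n_frames) constant-Q magnitude array into (12, n_frames) chroma.

    Assumption (illustrative only): bin 0 is aligned with pitch class 0 and
    bins_per_octave is a multiple of 12.
    """
    cqt_mag = np.asarray(cqt_mag, dtype=float)
    if cqt_mag.ndim != 2:
        raise ValueError("cqt_mag must have shape (n_bins, n_frames)")
    if bins_per_octave % 12 != 0:
        raise ValueError("bins_per_octave must be a multiple of 12")

    n_bins, n_frames = cqt_mag.shape
    bins_per_semitone = bins_per_octave // 12
    chroma = np.zeros((12, n_frames))

    # Sum the energy of every bin into the pitch class it belongs to,
    # which collapses all octaves onto a single 12-semitone axis.
    for b in range(n_bins):
        pitch_class = (b // bins_per_semitone) % 12
        chroma[pitch_class] += cqt_mag[b]

    # Normalize each frame to sum to 1; silent frames stay all-zero.
    norm = chroma.sum(axis=0, keepdims=True)
    return np.divide(chroma, norm, out=np.zeros_like(chroma), where=norm > 0)
```

Under these assumptions, a two-octave input with 24 bins is reduced to 12 values per frame, which is the kind of dimensionality reduction the abstract refers to before the features are passed to the LSTM networks.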

Original language: English
Title of host publication: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1312-1315
Number of pages: 4
ISBN (Electronic): 9781538615423
DOI: https://doi.org/10.1109/APSIPA.2017.8282235
Publication status: Published - 5 Feb 2018
Event: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur, Malaysia
Duration: 12 Dec 2017 to 15 Dec 2017

Publication series

Name: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Volume: 2018-February

Conference

Conference: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Country: Malaysia
City: Kuala Lumpur
Period: 12/12/17 to 15/12/17

Cite this

Hori, T., Nakamura, K., & Sagayama, S. (2018). Music chord recognition from audio data using bidirectional encoder-decoder LSTMs. In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 (pp. 1312-1315). (Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017; Vol. 2018-February). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2017.8282235