Expanded estimation model for instantaneous presence in audio-visual content incorporating binaural information

Masaaki Ito, Kenji Ozawa, Masanori Morise, Yuichiro Kinoshita

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

The sense of presence is a key component of the performance of multimedia content and systems. Our previous studies have shown that the sense of presence in audio-visual (AV) content has two elements: content presence and system presence. We previously constructed a model that estimates content presence as a time series. However, the accuracy of that model is limited because it does not consider auditory system presence, so a model that takes auditory system presence into account is needed. To construct such a model, we first conducted an experimental evaluation of instantaneous presence for 40 AV content items under two audio reproduction methods: binaural and diotic. Based on the experimental results, we constructed a neural network-based model that uses 19 AV features, extracted from the content items at 500-ms intervals and incorporating binaural information. The 19 features consist of 7 audio features and 12 visual features. The audio features include two measures related to interaural information, which are introduced to represent auditory system presence, i.e., the spatial impression of a sound. The visual features are essentially the same as those in our previous model. A generalization test of the expanded model confirms that it is sufficiently accurate for estimating presence as a time series.
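
To make the idea of per-interval interaural measures concrete, the following is a minimal sketch of how two such measures could be computed for each 500-ms interval of a two-channel signal. The abstract does not name the two interaural measures used in the model, so the interaural level difference (ILD) and the interaural cross-correlation coefficient (IACC) are assumed here purely for illustration, and the function name `interaural_features` and the parameter `max_lag_ms` are hypothetical.

```python
import numpy as np

FRAME_SEC = 0.5  # 500-ms analysis interval, as stated in the abstract


def interaural_features(left, right, sample_rate, max_lag_ms=1.0):
    """Return per-frame (ILD in dB, IACC) arrays for a two-channel signal.

    ILD and IACC are illustrative choices (an assumption); the abstract
    does not say which interaural measures the authors actually used.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    frame_len = int(FRAME_SEC * sample_rate)
    max_lag = int(max_lag_ms * 1e-3 * sample_rate)
    n_frames = min(len(left), len(right)) // frame_len
    eps = 1e-12  # guards against division by zero on silent frames

    ild, iacc = [], []
    for i in range(n_frames):
        l = left[i * frame_len:(i + 1) * frame_len]
        r = right[i * frame_len:(i + 1) * frame_len]
        e_l, e_r = np.sum(l ** 2), np.sum(r ** 2)

        # Interaural level difference: energy ratio of the channels in dB.
        ild.append(10.0 * np.log10((e_l + eps) / (e_r + eps)))

        # IACC: largest normalized cross-correlation within +/- max_lag.
        norm = np.sqrt(e_l * e_r) + eps
        best = 0.0
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                c = np.dot(l[lag:], r[:frame_len - lag])
            else:
                c = np.dot(l[:frame_len + lag], r[-lag:])
            best = max(best, abs(c) / norm)
        iacc.append(best)

    return np.array(ild), np.array(iacc)
```

Under diotic reproduction both ears receive the same signal, so these two illustrative measures reduce to an ILD of 0 dB and an IACC of 1 in every interval. Binaural signals generally deviate from those values, which is why measures of this kind can carry information about spatial impression.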

Original language: English
Pages (from-to): 1092-1102
Number of pages: 11
Journal: Journal of Information Hiding and Multimedia Signal Processing
Volume: 8
Issue number: 5
Publication status: Published - 1 Jan 2017

Keywords

  • Audio reproduction method
  • Audio-visual content
  • Content and system presence
  • Neural network
  • Sense of presence
