Human Emotion Recognition Based on Voice Analysis
Abstract
This paper presents an experiment in human emotion recognition based on voice analysis. Audio signals captured from the human voice with a microphone are fed into an audio processing system that produces image files containing the spectrograms of the corresponding signals. Assuming that certain emotions produce recognizable alterations in the spectral composition of the voice signal, the resulting spectrogram images were classified with a convolutional neural network. The training data set consisted of 2407 recordings created by 24 actors displaying a palette of emotional states: neutral, calm, happy, angry, scared, disgusted, and surprised. The overall accuracy of the recognition system was quite modest (around 20%), but the implementation, based on the open-source TensorFlow machine learning library, merits attention.
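The spectrogram stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sampling rate, window length, and the synthetic 440 Hz tone standing in for a voice recording are all assumptions made for the example. The resulting magnitude matrix is what would be saved as an image and passed to the CNN classifier.

```python
import numpy as np
from scipy import signal

# Assumed parameters: a 16 kHz sampling rate and a one-second synthetic
# 440 Hz tone stand in for a microphone recording of a voice sample.
fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
voice = np.sin(2 * np.pi * 440 * t)

# Short-time Fourier transform of the signal; Sxx is the spectrogram
# magnitude matrix (frequency bins x time frames).
f, times, Sxx = signal.spectrogram(voice, fs=fs, nperseg=512, noverlap=256)

# The matrix can then be written out as an image file (e.g. with
# matplotlib.pyplot.imsave) and fed to a convolutional network.
print(Sxx.shape)
```

With these window settings the dominant frequency bin of the spectrogram falls on the bin nearest 440 Hz, which is the kind of spectral signature the paper assumes emotional states alter.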
"Dunarea de Jos" University of Galati