Automatic vocal quality classification
DOI: https://doi.org/10.33414/ajea.4.408.2019

Keywords: Deep Learning, Artificial Neural Network, Vocal Quality

Abstract
An end-to-end neural network approach is presented for classifying vocal quality on the GRBAS scale. Based on this approach, three neural networks are shown, which compute the short-term Fourier transform (STFT), the cepstrum, and the shimmer of an audio signal. The networks that compute the STFT and the shimmer were trained successfully. The network that computes the cepstrum could not be trained, but an alternative model that computes the autocovariance could. It is concluded that the developed neural networks are compatible with the proposed approach, since they allow backpropagation of the error gradient, a necessary condition for training the complete model.
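The key property claimed above is that a transform such as the STFT can be embedded in a network as a differentiable layer. As an illustrative sketch (not the authors' implementation), the DFT of a windowed frame is just a matrix product with a fixed Fourier basis, so it can be realized as a linear (or convolutional) layer whose operation admits gradient backpropagation:

```python
import numpy as np

def dft_basis(n_fft):
    # Real and imaginary parts of the DFT basis. A network can hold
    # these as fixed (or even trainable) weight matrices / kernels.
    k = np.arange(n_fft)
    angles = -2j * np.pi * np.outer(k, k) / n_fft
    basis = np.exp(angles)
    return basis.real, basis.imag

def stft_frame(frame, window):
    # Magnitude spectrum of one windowed frame computed with plain
    # matrix products -- a linear operation, hence backpropagatable.
    re, im = dft_basis(len(frame))
    x = frame * window
    return np.sqrt((re @ x) ** 2 + (im @ x) ** 2)

frame = np.random.default_rng(0).standard_normal(256)
window = np.hanning(256)
mag = stft_frame(frame, window)

# Agrees with the FFT-based reference computation:
ref = np.abs(np.fft.fft(frame * window))
assert np.allclose(mag, ref)
```

A full STFT layer would apply this per frame with a hop size; the point here is only that the transform is expressible with differentiable operations, which is the condition the abstract names for training the complete model end to end.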