Speaker Recognition Algorithm Based on ASP-SERes2Net
Abstract
To strengthen feature extraction for speaker recognition and to raise the low recognition rate in noisy environments, a speaker recognition algorithm based on a residual network, ASP-SERes2Net, is proposed. First, the Mel spectrum is used as the input of the neural network. Second, the residual block of Res2Net is improved and a squeeze-and-excitation (SE) attention module is introduced. Then, average pooling is replaced by attentive statistics pooling (ASP). Finally, the additive angular margin Softmax (AAM-Softmax) function is used to classify the identity of the speaker. In experiments, the ASP-SERes2Net algorithm is compared with time delay neural networks (TDNN), ResNet34, and Res2Net. ASP-SERes2Net achieves a minimum detection cost function (MinDCF) of 0.0401 and an equal error rate (EER) of 0.52%, both significantly better than those of the other three models. The results show that ASP-SERes2Net performs better and is suitable for speaker recognition in noisy environments.
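For readers unfamiliar with the last two stages of the pipeline, the following is a minimal sketch of attentive statistics pooling and the AAM-Softmax loss in plain Python. It is an illustration under stated assumptions, not the implementation used in this work. The function names, the margin of 0.2, and the scale of 30 are hypothetical choices. The per-frame attention scores are taken as given inputs, whereas in a trained network they would be produced by a small learned layer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pooling(frames, scores):
    """Pool T frame-level feature vectors into one utterance-level vector.

    frames: list of T lists, each with D floats (frame-level features).
    scores: list of T unnormalized attention scores (assumed given here).
    Returns the attention-weighted mean followed by the attention-weighted
    standard deviation, a list of length 2 * D.
    """
    weights = softmax(scores)
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [
        sum(w * f[d] ** 2 for w, f in zip(weights, frames)) - mean[d] ** 2
        for d in range(dim)
    ]
    # Clamp to avoid taking the square root of a tiny negative rounding error.
    std = [math.sqrt(max(v, 1e-12)) for v in var]
    return mean + std


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Cross-entropy loss with an additive angular margin on the target class.

    margin and scale are hypothetical defaults, not values from this work.
    Simplification: no special handling when angle + margin exceeds pi.
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = unit(embedding)
    logits = []
    for k, w in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(e, unit(w)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding
        if k == target:
            # Penalize the target class by widening its angle by the margin.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    return -math.log(softmax(logits)[target])
```

With equal attention scores, the pooling reduces to the ordinary mean and standard deviation over frames, which makes the contrast with average pooling easy to see: average pooling keeps only the mean and weights every frame equally. Setting `margin=0.0` reduces the loss to a scaled cosine Softmax, so any positive margin makes the target class harder to satisfy during training.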