Speaker Recognition Algorithm Based on ASP-SERes2Net

LING Xiaoming; CHEN Hongyan; ZHANG Xiaoyu; ZHANG Zhen

doi:10.11936/bjutxb2023060027

LING Xiaoming, CHEN Hongyan, ZHANG Xiaoyu, ZHANG Zhen. Speaker Recognition Algorithm Based on ASP-SERes2Net[J]. Journal of Beijing University of Technology, 2025, 51(1): 42-50. DOI: 10.11936/bjutxb2023060027

Abstract

To improve feature extraction in speaker recognition and raise the low recognition rate in noisy environments, a speaker recognition algorithm based on the residual network, ASP-SERes2Net, is proposed. First, the Mel spectrum was used as the input of the neural network. Second, the residual block of Res2Net was improved and a squeeze-and-excitation (SE) attention module was introduced. Then, average pooling was replaced by attentive statistics pooling (ASP). Finally, the additive angular margin Softmax (AAM-Softmax) function was used to classify the identity of the speaker. In experiments, the performance of the ASP-SERes2Net algorithm was compared with that of the time delay neural network (TDNN), ResNet34, and Res2Net. The ASP-SERes2Net algorithm achieved a minimum detection cost function (MinDCF) value of 0.0401 and an equal error rate (EER) of 0.52%, both significantly better than those of the other three models. The results show that the ASP-SERes2Net algorithm performs better and is suitable for speaker recognition in noisy environments.
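The pooling and classification steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `attentive_stats_pooling` and `aam_softmax_logits` are hypothetical, the attention scores are assumed to come from a small scoring network that is not shown, and the `margin` and `scale` defaults are illustrative values rather than settings taken from the paper.

```python
import math


def attentive_stats_pooling(frames, scores):
    """Pool T frame-level vectors into one utterance-level vector.

    frames: list of T feature vectors (each a list of D floats).
    scores: list of T unnormalised attention scores (assumed to be
            produced by a separate scoring network, not shown here).
    Returns the attention-weighted mean and standard deviation,
    concatenated into a single list of length 2 * D.
    """
    # Softmax over time turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Weighted variance E[x^2] - (E[x])^2, clamped at 0 for numerical safety.
    var = [
        max(sum(w * f[d] ** 2 for w, f in zip(weights, frames)) - mean[d] ** 2, 0.0)
        for d in range(dim)
    ]
    return mean + [math.sqrt(v) for v in var]


def aam_softmax_logits(cosines, target, margin=0.2, scale=30.0):
    """Add an angular margin to the target-class angle, then rescale.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker class.
    margin, scale: illustrative hyperparameters (assumptions).
    """
    logits = []
    for idx, cos_theta in enumerate(cosines):
        if idx == target:
            # Clamp before acos to guard against rounding outside [-1, 1].
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            cos_theta = math.cos(theta + margin)
        logits.append(scale * cos_theta)
    return logits


pooled = attentive_stats_pooling([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
logits = aam_softmax_logits([1.0, 0.0], target=0)
```

With equal attention scores the pooling reduces to the plain mean and standard deviation over frames; unequal scores let informative frames dominate both statistics. The margin lowers the target-class logit during training, which pushes embeddings of the same speaker closer together in angle. The sketch omits the handling that practical implementations add for the case where the angle plus the margin exceeds π.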
