LING Xiaoming, CHEN Hongyan, ZHANG Xiaoyu, ZHANG Zhen. Speaker Recognition Algorithm Based on ASP-SERes2Net[J]. Journal of Beijing University of Technology, 2025, 51(1): 42-50. DOI: 10.11936/bjutxb2023060027

    Speaker Recognition Algorithm Based on ASP-SERes2Net

    To improve the feature extraction capability of speaker recognition systems and raise their low recognition rate in noisy environments, a speaker recognition algorithm based on the residual network, ASP-SERes2Net, is proposed. First, the Mel spectrogram is used as the input to the neural network. Second, the residual block of Res2Net is improved and a squeeze-and-excitation (SE) attention module is introduced. Then, average pooling is replaced by attentive statistics pooling (ASP). Finally, the additive angular margin Softmax (AAM-Softmax) function is used to classify speaker identity. In experiments, the performance of ASP-SERes2Net was compared with that of a time delay neural network (TDNN), ResNet34, and Res2Net. ASP-SERes2Net achieved a minimum detection cost function (MinDCF) of 0.0401 and an equal error rate (EER) of 0.52%, both significantly better than those of the other three models. The results show that ASP-SERes2Net performs better and is suitable for speaker recognition in noisy environments.
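The SE attention module named in the abstract reweights feature channels using a global summary of each channel. Below is a minimal NumPy sketch of the standard squeeze-and-excitation operation. It assumes a channel × frequency × time feature map, which is the kind of map a 2D Res2Net running on a Mel spectrogram would produce. That layout, the reduction ratio `r = 8`, the random weights, and the helper name `se_block` are illustrative assumptions. The abstract does not say where the module sits inside the improved residual block.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map x of shape (C, ...).

    w1: (C // r, C), b1: (C // r,)  bottleneck (squeeze) layer
    w2: (C, C // r), b2: (C,)       expansion (excitation) layer
    """
    c = x.shape[0]
    # Squeeze: average over every non-channel axis, one descriptor per channel.
    z = x.reshape(c, -1).mean(axis=1)             # (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel gates in (0, 1).
    h = np.maximum(0.0, w1 @ z + b1)              # (C // r,)
    s = sigmoid(w2 @ h + b2)                      # (C,)
    # Scale: multiply each channel of the input by its gate.
    return x * s.reshape((c,) + (1,) * (x.ndim - 1)), s


rng = np.random.default_rng(0)
C, F, T, r = 32, 40, 100, 8      # assumed sizes: channels, Mel bins, frames, reduction
x = rng.standard_normal((C, F, T))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, gates = se_block(x, w1, b1, w2, b2)
print(y.shape, gates.round(3)[:5])
```

The output has the same shape as the input. Only the relative emphasis of the channels changes, so an SE module can be dropped into a residual block without altering the surrounding tensor shapes.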
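Attentive statistics pooling, which replaces average pooling in ASP-SERes2Net, turns a variable-length sequence of frame-level features into a fixed-length utterance vector. It does this by learning a weight for each frame and then taking a weighted mean and a weighted standard deviation. The NumPy sketch below follows the commonly used formulation e_t = vᵀ tanh(W h_t + b) + k. The tanh nonlinearity, the attention width `A`, the sizes, and the helper name `attentive_stats_pool` are assumptions, because the abstract does not describe the paper's attention network.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def attentive_stats_pool(h, W, b, v, k=0.0):
    """Pool frame-level features h of shape (T, D) into a (2 * D,) vector.

    W: (A, D), b: (A,), v: (A,), k: scalar; all are attention parameters.
    """
    # Scalar relevance score per frame: e_t = v^T tanh(W h_t + b) + k.
    e = np.tanh(h @ W.T + b) @ v + k              # (T,)
    alpha = softmax(e)                            # frame weights, sum to 1
    # Attention-weighted mean and standard deviation over time.
    mu = alpha @ h                                # (D,)
    var = alpha @ (h * h) - mu * mu
    sigma = np.sqrt(np.clip(var, 1e-12, None))    # clip guards tiny negative rounding
    return np.concatenate([mu, sigma]), alpha


rng = np.random.default_rng(1)
T, D, A = 200, 64, 16            # assumed: frames, feature dim, attention width
h = rng.standard_normal((T, D))
W = rng.standard_normal((A, D)) * 0.1
b = np.zeros(A)
v = rng.standard_normal(A)

pooled, alpha = attentive_stats_pool(h, W, b, v)
# Zero attention parameters give uniform weights, i.e. plain mean/std pooling.
uniform, _ = attentive_stats_pool(h, np.zeros((A, D)), np.zeros(A), np.zeros(A))
```

With all attention parameters set to zero, every frame receives weight 1/T and the output reduces to the ordinary mean and standard deviation. The learned scores generalize that baseline by letting informative frames count more, which is the motivation for using ASP when some frames are dominated by noise.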

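AAM-Softmax, the classification objective named in the abstract, L2-normalizes both the speaker embeddings and the class weights so that each logit depends only on the angle θ between them. It then adds a margin m to the angle of the correct speaker before scaling by s. The NumPy sketch below follows the common ArcFace-style formulation. The scale `s = 30`, the margin `m = 0.2`, the array sizes, and the helper name `aam_softmax_loss` are assumptions, since the abstract does not report the values used.

```python
import numpy as np


def aam_softmax_loss(emb, weights, labels, s=30.0, m=0.2):
    """Additive angular margin softmax loss (ArcFace-style).

    emb: (N, D) embeddings, weights: (K, D) class weights,
    labels: (N,) integer speaker IDs; s and m are assumed hyperparameters.
    """
    # Cosine similarity between normalized embeddings and normalized class weights.
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)             # (N, K)
    # Add the angular margin only to the target-class angle: cos(theta_y + m).
    idx = np.arange(len(labels))
    theta_y = np.arccos(cos[idx, labels])
    logits = cos.copy()
    logits[idx, labels] = np.cos(theta_y + m)
    logits *= s
    # Numerically stable softmax cross-entropy on the adjusted logits.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, labels].mean()


rng = np.random.default_rng(2)
N, D, K = 8, 128, 10             # assumed: utterances, embedding dim, speakers
emb = rng.standard_normal((N, D))
weights = rng.standard_normal((K, D))
labels = rng.integers(0, K, size=N)

loss_margin = aam_softmax_loss(emb, weights, labels)
loss_plain = aam_softmax_loss(emb, weights, labels, m=0.0)
```

The margin lowers the target logit, so for the same inputs `loss_margin` exceeds `loss_plain`. During training this forces same-speaker embeddings into tighter angular clusters. When θ_y + m exceeds π, cos(θ_y + m) stops decreasing, and common implementations add a fallback for that case. That fallback is omitted here for brevity.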