LING Xiaoming, CHEN Hongyan, ZHANG Xiaoyu, ZHANG Zhen. Speaker Recognition Algorithm Based on ASP-SERes2Net[J]. Journal of Beijing University of Technology. DOI: 10.11936/bjutxb2023060027

    Speaker Recognition Algorithm Based on ASP-SERes2Net

    • To improve feature extraction for speaker recognition and raise the low recognition rate in noisy environments, a speaker recognition algorithm based on a residual network, ASP-SERes2Net, is proposed. First, the Mel spectrum is used as the input to the neural network. Second, the residual block of Res2Net is improved by introducing a squeeze-and-excitation (SE) attention module. Third, average pooling is replaced by attention statistics pooling (ASP). Finally, the additive angular margin Softmax (AAM-Softmax) function is used to classify speaker identity. In experiments, ASP-SERes2Net was compared with time delay neural networks (TDNN), ResNet34, and Res2Net. ASP-SERes2Net achieved a minimum detection cost function (MinDCF) of 0.0401 and an equal error rate (EER) of 0.52%, significantly better than the other three models. The results show that ASP-SERes2Net performs better and is suitable for speaker recognition in noisy environments.
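
As an illustration of the final classification step, the sketch below computes an AAM-Softmax loss for one utterance in plain Python. It is a minimal sketch and not the authors' implementation: the function name `aam_softmax_loss`, the margin of 0.2, and the scale of 30.0 are assumptions chosen for illustration, since the abstract gives no hyperparameters. The input is assumed to be the cosine similarity between a normalized speaker embedding and each normalized class weight.

```python
import math


def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy loss with an additive angular margin on the target class.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker.
    margin, scale: illustrative values (assumed, not taken from the paper).
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos in its valid domain
        if i == target:
            # Add the margin to the angle of the true class only, which
            # lowers its logit and forces a tighter fit during training.
            # Simplification: the case theta + margin > pi is not handled.
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)

    # Numerically stable log-sum-exp, then the negative log-likelihood.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

With `margin=0.0` the function reduces to ordinary scaled-cosine softmax cross-entropy, so comparing the two losses on the same cosines shows the extra penalty the margin places on the true class.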