Citation: SUN Yan-feng, LIN Xian-ping, YIN Bao-cai, JIA Xi-bin. Visual Speech Synthesis Based on Learning Model[J]. Journal of Beijing University of Technology, 2009, 35(5): 702-707.

    Visual Speech Synthesis Based on Learning Model

    • Abstract: To generate more realistic mouth animation in visual speech synthesis, this paper proposes a method based on a two-level learning model. By combining hidden Markov models (HMMs) with genetic algorithms (GAs), the model learns the mapping between acoustic and visual features more effectively. It removes the redundant information that traditional speech-recognition front ends introduce when extracting acoustic features from a large speech sample space, yielding better visual speech prediction. In addition, a geometric mouth-shape representation based on FAP (facial animation parameter) feature points is proposed. It is robust to training samples captured under inconsistent illumination, characterizes mouth-shape variation well, and has a lower dimensionality than traditional principal component analysis (PCA) features, which speeds up both training and synthesis.
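    The acoustic-to-visual mapping described above can be illustrated with a minimal sketch (not the authors' implementation). Assuming the hmmlearn library, it fits an HMM on acoustic features, attaches to each hidden state the mean visual (FAP) vector from the Viterbi alignment of the training data, and then decodes new audio into a state sequence that indexes those visual means. The genetic-algorithm stage, which the paper uses to remove redundant acoustic features, is omitted here.

```python
# Minimal sketch of an HMM-based acoustic-to-visual mapping, assuming the
# hmmlearn library. An illustration of the general technique, not the
# paper's two-level HMM + GA model.
import numpy as np
from hmmlearn import hmm

def train_av_hmm(audio, visual, n_states=8):
    """audio: (T, Da) acoustic features; visual: (T, Dv) aligned FAP features."""
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(audio)                # learn state dynamics from the acoustic stream
    states = model.predict(audio)   # Viterbi alignment of the training frames
    # Mean visual vector per hidden state (fall back to the global mean if a
    # state was never visited in training).
    visual_means = np.stack([
        visual[states == s].mean(axis=0) if np.any(states == s) else visual.mean(axis=0)
        for s in range(n_states)
    ])
    return model, visual_means

def synthesize(model, visual_means, audio):
    """Map new audio frames to a visual-feature trajectory via decoded states."""
    return visual_means[model.predict(audio)]
```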

       

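    The FAP-point-based geometric representation can likewise be sketched: because the features are distances between lip landmarks rather than pixel intensities, they are insensitive to inconsistent illumination and need only a few dimensions, unlike PCA coefficients of raw mouth images. The landmark indices below are hypothetical, chosen only for this illustration.

```python
# Minimal sketch of a geometric mouth-feature vector built from FAP-style
# lip landmarks. Assumed (hypothetical) indices: 0 = left mouth corner,
# 1 = right corner, 2 = upper-lip midpoint, 3 = lower-lip midpoint.
import numpy as np

def mouth_features(landmarks):
    """landmarks: (N, 2) array of 2-D lip feature points."""
    left, right, top, bottom = landmarks[0], landmarks[1], landmarks[2], landmarks[3]
    width = np.linalg.norm(right - left)   # mouth width
    height = np.linalg.norm(bottom - top)  # lip opening
    # Normalizing by width makes the vector invariant to camera scale.
    return np.array([width, height / max(width, 1e-8)])

pts = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, -0.8], [2.0, 1.2]])
print(mouth_features(pts))  # -> [4.0, 0.5]: width and normalized lip opening
```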