
A Survey on Speech-synch Visual Speech Synthesizing Techniques

JIA Xi-bin, YIN Bao-cai, LI Jing-hua

Citation: JIA Xi-bin, YIN Bao-cai, LI Jing-hua. A Survey on Speech-synch Visual Speech Synthesizing Techniques[J]. Journal of Beijing University of Technology, 2005, 31(6): 656-661.


Funding: Supported by the National Natural Science Foundation of China (60375007).

Author biography: JIA Xi-bin (born 1969), female, from Taiyuan, Shanxi, lecturer.

  • CLC number: TP391

A Survey on Speech-synch Visual Speech Synthesizing Techniques

  • Abstract: To arrive at a visual speech synthesis scheme with a high degree of realism, the mainstream research methods in China and abroad are discussed. Based on an analysis of the visual speech synthesis problem, two problems are identified that a visual speech synthesis system must solve first: constructing the visual speech feature (representation) model, and constructing the audio/visual mapping model. After analyzing the main solutions adopted by current research, the paper proposes the system framework and the key research topics to be pursued in future work.

Publication history
  • Received: 2004-05-11
  • Published online: 2022-11-21
