A Survey on Speech-synch Visual Speech Synthesizing Techniques
Abstract: To arrive at a visual speech synthesis scheme with strong realism, this paper surveys the mainstream research methods used in China and abroad. Based on an analysis of the visual speech synthesis problem, it identifies two problems that any such system must solve first: constructing the visual speech feature model and constructing the audio/visual mapping model. It then reviews the main existing solutions to both problems and proposes the system framework and key research topics to be adopted in future work.
Keywords:
- visual speech animation
- audio/visual mapping
- feature location
- face modeling