Convert face images into speech waveforms

morrislee
Jul 27, 2021

Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

arXiv paper abstract https://arxiv.org/abs/2107.12003

arXiv PDF paper https://arxiv.org/pdf/2107.12003.pdf

Project Web page https://realanonymousiccv.github.io/

... synthesize speaker-specific speech waveforms by conditioning on videos of an individual's face.

... method directly converts face images into speech waveforms under an end-to-end training framework.

The linguistic features are extracted from lip movements using a lip-reading model, and the speaker characteristic features are predicted from face images using cross-modal learning with a pre-trained acoustic model.
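To make the two-branch data flow described above concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the lip-reading model, the face-based speaker encoder, and the waveform decoder are all replaced by random linear maps, and every name and dimension (`FRAME_DIM`, `LING_DIM`, `SPK_DIM`, `HOP`, `synthesize`, and so on) is a hypothetical stand-in chosen for illustration. It only shows how per-frame linguistic features and a single clip-level speaker vector might be combined before decoding.

```python
import random

# All sizes below are hypothetical, chosen only to keep the example small.
FRAME_DIM = 4   # length of one flattened video frame
LING_DIM = 3    # size of a per-frame linguistic feature vector
SPK_DIM = 2     # size of the clip-level speaker embedding
HOP = 5         # audio samples produced per video frame


def make_matrix(rows, cols, rng):
    """Random weight matrix standing in for a trained network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def linguistic_features(frames, weights):
    """Stand-in for the lip-reading branch: one feature vector per frame."""
    return [project(weights, frame) for frame in frames]


def speaker_embedding(frames, weights):
    """Stand-in for the face-based speaker branch: one vector per clip."""
    n = len(frames)
    mean_frame = [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]
    return project(weights, mean_frame)


def synthesize(frames, ling_w, spk_w, dec_w):
    """Join each frame's linguistic features with the clip-level speaker
    embedding, then decode HOP waveform samples per frame."""
    speaker = speaker_embedding(frames, spk_w)
    waveform = []
    for ling in linguistic_features(frames, ling_w):
        waveform.extend(project(dec_w, ling + speaker))
    return waveform


rng = random.Random(0)
ling_w = make_matrix(LING_DIM, FRAME_DIM, rng)
spk_w = make_matrix(SPK_DIM, FRAME_DIM, rng)
dec_w = make_matrix(HOP, LING_DIM + SPK_DIM, rng)

# Six fake video frames; a real system would use cropped face/lip images.
frames = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(6)]
waveform = synthesize(frames, ling_w, spk_w, dec_w)
print(len(waveform))  # 6 frames * 5 samples per frame = 30
```

In this toy version the speaker vector is computed once per clip and appended to every frame's linguistic features, so the linguistic content and the speaker characteristics enter the decoder as separate inputs.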

... can flexibly synthesize speech waveforms whose speaker characteristics vary depending on the input face images.

Therefore, our method can be regarded as a multi-speaker face-to-speech waveform model.

We show the superiority of our proposed model over conventional methods in terms of both objective and subjective evaluation results. ...




