Convert face images into speech waveforms
Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
arXiv abstract: https://arxiv.org/abs/2107.12003
arXiv PDF: https://arxiv.org/pdf/2107.12003.pdf
Project web page: https://realanonymousiccv.github.io/
... synthesize speaker-specific speech waveforms by conditioning on videos of an individual's face.
... method directly converts face images into speech waveforms under an end-to-end training framework.
The linguistic features are extracted from lip movements using a lip-reading model, and the speaker characteristic features are predicted from face images using cross-modal learning with a pre-trained acoustic model.
... can flexibly synthesize speech waveforms whose speaker characteristics vary depending on the input face images.
Therefore, our method can be regarded as a multi-speaker face-to-speech waveform model.
We show the superiority of our proposed model over conventional methods in terms of both objective and subjective evaluation results. ...
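To make the two-branch idea in the abstract concrete, here is a toy sketch in plain Python. It is not the authors' code: the function names, the frame-difference "lip-reading" stand-in, the time-average "face encoder" stand-in, and the mean-squared-error cross-modal loss are all assumptions chosen only to illustrate how linguistic content and speaker identity can be separated and then combined as conditioning for a waveform generator.

```python
from typing import List

Vector = List[float]


def linguistic_features(frames: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a lip-reading encoder.

    Returns one feature vector per frame. Here it is simply the
    frame-to-frame difference (motion), with zeros for the first frame.
    """
    feats = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        feats.append([c - p for p, c in zip(prev, cur)])
    return feats


def speaker_embedding(frames: List[Vector]) -> Vector:
    """Hypothetical stand-in for a face encoder.

    Returns a single vector per clip. Here it is the time average of the
    frames, i.e. the static appearance rather than the motion.
    """
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cross_modal_loss(face_emb: Vector, speech_emb: Vector) -> float:
    """Assumed training signal: mean squared error between the face-derived
    embedding and an embedding from a pre-trained acoustic model."""
    return sum((f - s) ** 2 for f, s in zip(face_emb, speech_emb)) / len(face_emb)


def conditioning(frames: List[Vector]) -> List[Vector]:
    """Per-frame conditioning: linguistic features concatenated with the
    clip-level speaker embedding, as input to a (not shown) waveform decoder."""
    spk = speaker_embedding(frames)
    return [ling + spk for ling in linguistic_features(frames)]


# Two toy "faces" making the same movement but with different appearance.
frames_a = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
frames_b = [[x + 10.0 for x in frame] for frame in frames_a]

same_content = linguistic_features(frames_a) == linguistic_features(frames_b)
same_speaker = speaker_embedding(frames_a) == speaker_embedding(frames_b)
print("same linguistic features:", same_content)
print("same speaker embedding:", same_speaker)
```

In this toy setup the two clips share their linguistic features but differ in their speaker embedding, which mirrors the abstract's claim that the speaker characteristics of the output vary with the input face.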