News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware

As an Amazon Associate I earn from qualifying purchases

morrislee

Convert face images into speech waveforms


Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

arXiv paper abstract https://arxiv.org/abs/2107.12003





... synthesize speaker-specific speech waveforms by conditioning on videos of an individual's face.


... method directly converts face images into speech waveforms under an end-to-end training framework.


The linguistic features are extracted from lip movements using a lip-reading model, and the speaker characteristic features are predicted from face images using cross-modal learning with a pre-trained acoustic model.
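To make the cross-modal idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the linear encoder, the squared-distance loss, and every name and number below are assumptions chosen for illustration. The abstract only says that speaker features are predicted from face images with the help of a pre-trained acoustic model, so the sketch treats that model's output as a fixed target and fits a toy face encoder to it.

```python
import random


def linear_encoder(weights, features):
    """Project a feature vector with a weight matrix (one row per output dim)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_face_encoder(face_features, target_embedding, steps=200, lr=0.05, seed=0):
    """Fit a toy linear face encoder so its output approaches a fixed target.

    target_embedding stands in for the speaker embedding produced by a
    frozen, pre-trained acoustic model (an assumption of this sketch).
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in face_features]
               for _ in target_embedding]
    for _ in range(steps):
        predicted = linear_encoder(weights, face_features)
        for i, row in enumerate(weights):
            error = predicted[i] - target_embedding[i]
            for j in range(len(row)):
                # Gradient of (predicted_i - target_i)^2 w.r.t. weights[i][j].
                row[j] -= lr * 2 * error * face_features[j]
    return weights


# Hypothetical inputs: features from one face image and the matching
# speaker embedding from the acoustic model.
face_features = [0.5, -0.2, 0.8]
speech_embedding = [1.0, -1.0]

weights = train_face_encoder(face_features, speech_embedding)
final_loss = squared_distance(linear_encoder(weights, face_features), speech_embedding)
```

After training, the face encoder's output for this face sits very close to the speech-derived embedding, which is the basic effect cross-modal learning aims for. A real system would use deep networks and many face and voice pairs rather than a single example.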


... can flexibly synthesize speech waveforms whose speaker characteristics vary depending on the input face images.
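The sketch below illustrates how one set of lip-derived features could lead to different voices depending on the face. The abstract does not say how the conditioning is done, so appending the speaker embedding to every frame is an assumption here (a common choice in multi-speaker synthesis), and all names and values are hypothetical.

```python
def condition_frames(linguistic_frames, speaker_embedding):
    """Append the same speaker embedding to every linguistic feature frame."""
    return [frame + speaker_embedding for frame in linguistic_frames]


# Hypothetical lip-reading output: three frames of 2-dimensional features.
linguistic_frames = [[0.1, 0.9], [0.4, 0.6], [0.7, 0.3]]

# Hypothetical speaker embeddings predicted from two different faces.
speaker_a = [1.0, 0.0]
speaker_b = [0.0, 1.0]

# The same linguistic content paired with each speaker identity.
inputs_a = condition_frames(linguistic_frames, speaker_a)
inputs_b = condition_frames(linguistic_frames, speaker_b)
```

Both inputs carry identical linguistic content in their first two dimensions and differ only in the speaker part, so a waveform decoder fed with them would be expected to say the same words in different voices.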


Therefore, our method can be regarded as a multi-speaker face-to-speech waveform model.


We show the superiority of our proposed model over conventional methods in terms of both objective and subjective evaluation results. ...



If you enjoyed this post, please like and share it using the buttons at the bottom!


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

