Presentation at the international conference IEEE MLSP 2024 (Abe Lab, co-authored)
Professor Abe gave a research presentation at The 34th IEEE International Workshop on Machine Learning for Signal Processing (IEEE MLSP 2024), held in London, UK, from September 22 to 25, 2024.
Takumi Wada, Sunao Hara, and Masanobu Abe,
``Explicit prosody control to realize discourse focus in end-to-end text-to-speech,''
The 34th IEEE International Workshop on Machine Learning for Signal Processing (IEEE MLSP 2024), Sept. 2024.
— Poster presentation [2024.9.25], IEEE MLSP 2024, 2024.9.22--25
Abstract
In this paper, we propose an end-to-end text-to-speech (E2E-TTS) system that enables emphasis prosody using explicit symbols. The proposal hinges on the novel idea that emphasis tags can be generated for TTS training by comparing speech readings of novels by a professional voice actor with synthesized speech generated from the same texts. While human speech naturally contains appropriate emphasis for novels, synthesized speech tends to be more monotonous because E2E-TTS systems are typically trained using news article speech. Using this comparative method, we can determine the appropriate emphasis prosody. An explicit prosody-controllable TTS (EPC-TTS) is trained with texts and attached emphasis tags. As prosodic features, F0, power, and a combination of the two are examined. Judging from objective and subjective evaluations, we confirmed that the EPC-TTS can successfully realize a phrase focus using explicit symbols, and the combination of F0 and power yielded the best performance.
Reference URL