Presented at the international conference IEEE MLSP 2024 (Abe Lab, co-authored)

September 25, 2024

Professor Abe presented research at The 34th IEEE International Workshop on Machine Learning for Signal Processing (IEEE MLSP 2024), held in London, UK, from September 22 to 25, 2024.

Takumi Wada, Sunao Hara, and Masanobu Abe, "Explicit prosody control to realize discourse focus in end-to-end text-to-speech," The 34th IEEE International Workshop on Machine Learning for Signal Processing (IEEE MLSP 2024), Sept. 2024.
— Poster presentation [2024.9.25], IEEE MLSP 2024, 2024.9.22--25

Abstract

In this paper, we propose an end-to-end text-to-speech (E2E-TTS) system that enables emphasis prosody using explicit symbols. The proposal hinges on the novel idea that emphasis tags can be generated for TTS training by comparing speech readings of novels by a professional voice actor with synthesized speech generated from the same texts. While human speech naturally contains appropriate emphasis for novels, synthesized speech tends to be more monotonous because E2E-TTS systems are typically trained on news article speech. Using this comparative method, we can determine the appropriate emphasis prosody. An explicit prosody-controllable TTS (EPC-TTS) is trained with texts and attached emphasis tags. As prosodic features, F0, power, and a combination of the two are examined. Judging from objective and subjective evaluations, we confirmed that the EPC-TTS can successfully realize a phrase focus using explicit symbols, and that the combination of F0 and power yielded the best performance.
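
The comparison idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the actual tagging criterion, so the function name `emphasis_tags`, the tag labels (`F0`, `POW`, `F0+POW`), and both thresholds are assumptions made here for illustration. The sketch assumes per-phrase mean F0 (Hz) and mean power (dB) have already been extracted for the human reading and for the synthesized speech of the same text, and it tags a phrase when the human value exceeds the synthesized one by a margin.

```python
def emphasis_tags(human, synth, f0_ratio_threshold=1.2, power_diff_db=3.0):
    """Assign a hypothetical emphasis tag to each phrase.

    human, synth: aligned lists of (mean_f0_hz, mean_power_db), one per phrase.
    Returns a list of tags: "", "F0", "POW", or "F0+POW".
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if len(human) != len(synth):
        raise ValueError("human and synth must contain the same number of phrases")

    tags = []
    for (h_f0, h_pow), (s_f0, s_pow) in zip(human, synth):
        # F0 is compared as a ratio; a non-positive synthesized F0 is skipped.
        f0_emphasized = s_f0 > 0 and (h_f0 / s_f0) >= f0_ratio_threshold
        # Power is compared as a difference in dB.
        power_emphasized = (h_pow - s_pow) >= power_diff_db

        if f0_emphasized and power_emphasized:
            tags.append("F0+POW")
        elif f0_emphasized:
            tags.append("F0")
        elif power_emphasized:
            tags.append("POW")
        else:
            tags.append("")
    return tags


# Made-up feature values for four phrases, for illustration only.
human_features = [(220.0, 65.0), (150.0, 60.0), (140.0, 66.0), (230.0, 70.0)]
synth_features = [(150.0, 64.0), (148.0, 60.5), (142.0, 61.0), (160.0, 62.0)]
print(emphasis_tags(human_features, synth_features))
```

Tags produced this way could then be attached to the corresponding phrases of the training text, which is the role the abstract assigns to the emphasis tags when training the EPC-TTS.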

Reference URL

https://2024.ieeemlsp.org/