ECE of NUS, IEEE SMC Chapter (Singapore), COLIPS and TDSS hosted an online seminar by Dr. Yuki Saito and Mr. Detai Xin.
Speakers: Dr. Yuki Saito and Mr. Detai Xin, Graduate School of Information Science and Technology, The University of Tokyo, Japan
Jointly organized by:
Department of Electrical and Computer Engineering, College of Design & Engineering, National University of Singapore
IEEE Systems, Man and Cybernetics Singapore Chapter
Chinese and Oriental Languages Information Processing Society
Teochew Doctorate Society, Singapore
Date and Time: 18 Aug 2022, Thursday, 2-4pm
[Part 1] Towards human-in-the-loop speech synthesis technology (Y. Saito): Recent developments in speech synthesis based on deep neural networks have made it possible to synthesize high-quality speech that is as natural as human speech. In this talk, I discuss how humans can intervene in state-of-the-art speech synthesis technologies and introduce two of our related works: 1) speaker representation learning that considers perceptual similarity among speakers and 2) speaker adaptation of multi-speaker speech synthesis based on human perception.
[Part 2] Maintaining data consistency in speech quality assessment and speech emotion recognition (D. Xin): In many speech processing tasks, features or labels are related to one another, either explicitly or implicitly. In this talk, I will introduce two of our recent works that explore relations within speech data: (1) automatic speech quality assessment and (2) speech emotion recognition of nonverbal vocalizations. Our methods use contrastive learning and classifier chains to maintain data consistency, and they obtained substantial improvements over baseline methods.
Yuki Saito is a project research associate at The University of Tokyo, Japan. He received his Ph.D. degree in 2021 from the Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan. His research interests include speech synthesis and machine learning. He is the recipient of more than ten paper awards, including the 2020 IEEE SPS Young Author Best Paper Award. He is a member of the IEEE SPS and of ISCA.
Detai Xin is a Ph.D. student at The University of Tokyo, Japan. He received his M.S. degree in 2020 from the Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan. His research interests include speech synthesis, speech processing, and machine learning. His work on speech emotion recognition, speech quality assessment, and text-to-speech synthesis has been published at leading conferences, including INTERSPEECH and ICASSP. His current research focuses on emotional nonverbal vocalizations, with the aim of improving the expressiveness of speech synthesis.