Publications

*: First Author(s), †: Corresponding Author(s)

2026

Conference

From Awareness to Adherence: Bridging the Context Gap in Spoken Dialogue Systems via Context-Aware Decoding

Che Hyun Lee*, Heeseung Kim†, and Sungroh Yoon†, in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), regular paper, Sydney, Australia, September 2026.

arXiv code

NaturalFlow: Reducing Disruptive Pauses for Natural Speech Flow in Simultaneous Speech-to-Speech Translation

Dongwook Lee*, Youngho Cho*, Sangkwon Park, Heeseung Kim†, and Sungroh Yoon†, in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), long paper, Sydney, Australia, September 2026. (Oral)

arXiv

Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions

Dongwook Lee*, Eunwoo Song, Che Hyun Lee, Heeseung Kim†, and Sungroh Yoon†, in Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), San Diego, United States, July 2026.

project arXiv dataset

Style-Friendly SNR Sampler for Style-Driven Generation

Jooyoung Choi*, Chaehun Shin*, Yeongtak Oh, Heeseung Kim, Jungbeom Lee†, and Sungroh Yoon†, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, USA, March 2026.

project arXiv

Journal

Beyond Language-Specific Neurons: The Challenge of Identifying Speech-Specific Neurons in Multimodal LLMs

Nohil Park*, Che Hyun Lee, Jiheum Yeom, Heeseung Kim, and Sungroh Yoon†, in IEEE Journal of Selected Topics in Signal Processing, in press.

Preprint & Workshop

TLDR: Compressing Audio Tokens for Efficient Autoregressive Text-to-Speech

Yejin Lee*, Junwon Moon, Hyoeun Kim, Hyunjin Choi, Heeseung Kim, and Kyuhong Shim†, arXiv preprint, 2026.

arXiv

DELTA-TTS: Adapting Autoregressive Model into a Diffusion Language Model for Text-to-Speech

Junwon Moon*, Seungbeom Kim, Yejin Lee, Hoseong Ahn, Sewoong Park, Heeseung Kim, and Kyuhong Shim†, in ICML Workshop on SPIGM, Seoul, South Korea, 2026.

Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization

Yeongtak Oh*, Dongwook Lee, Sangkwon Park, Heeseung Kim, and Sungroh Yoon†, arXiv preprint, 2026.

project arXiv

Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching

Junwon Moon*, Hyunjin Choi, Hansol Park, Heeseung Kim, and Kyuhong Shim†, arXiv preprint, 2026.

arXiv

2025

Conference

Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models

Heeseung Kim*, Che Hyun Lee*, Sangkwon Park, Jiheum Yeom, Nohil Park, Sangwon Yu, and Sungroh Yoon†, in Findings of the Association for Computational Linguistics (ACL Findings), Vienna, Austria, July 2025.

project arXiv dataset

EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models

Che Hyun Lee*, Heeseung Kim, Jiheum Yeom, and Sungroh Yoon†, in Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Vienna, Austria, July 2025.

arXiv

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Chaehun Shin*, Jooyoung Choi, Heeseung Kim, and Sungroh Yoon†, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, June 2025.

project arXiv

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers

Nohil Park*, Heeseung Kim, Che Hyun Lee, Jooyoung Choi, Jiheum Yeom, and Sungroh Yoon†, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, April 2025. (Oral)

project arXiv

VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance

Jiheum Yeom*, Heeseung Kim, Jooyoung Choi, Che Hyun Lee, Nohil Park, and Sungroh Yoon†, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, April 2025.

project arXiv

2024

Conference

Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation

Heeseung Kim*, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon†, and Kang Min Yoo†, in Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, December 2024.

project arXiv code blog article

VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech

Heeseung Kim*, Sang-gil Lee, Jiheum Yeom, Che Hyun Lee, Sungwon Kim, and Sungroh Yoon†, in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Kos, Greece, September 2024.

project arXiv

Preprint & Workshop

HyperCLOVA X Technical Report

HyperCLOVA X Team, NAVER Cloud, arXiv preprint, 2024.

arXiv

2023

Conference

UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data

Heeseung Kim*, Sungwon Kim, Jiheum Yeom, and Sungroh Yoon†, in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Dublin, Ireland, August 2023. (Oral)

project arXiv code

Edit-A-Video: Single Video Editing with Object-Aware Consistency

Chaehun Shin*, Heeseung Kim*, Che Hyun Lee, Sang-gil Lee, and Sungroh Yoon†, in Proceedings of the Asian Conference on Machine Learning (ACML), Istanbul, Turkey, November 2023. (Oral, Best Paper Award)

project arXiv

2022

Conference

Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

Heeseung Kim*, Sungwon Kim*, and Sungroh Yoon†, in Proceedings of the International Conference on Machine Learning (ICML), Baltimore, USA, July 2022.

project arXiv

PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior

Sang-gil Lee*, Heeseung Kim, Chaehun Shin, Xu Tan†, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon†, and Tie-Yan Liu, in Proceedings of the International Conference on Learning Representations (ICLR), Virtual, April 2022.

project arXiv code

Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings

Sangwon Yu*, Jongyoon Song, Heeseung Kim, Seong-min Lee, Woo-Jong Ryu, and Sungroh Yoon†, in Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Dublin, Ireland, May 2022.

arXiv code

Stein Latent Optimization for Generative Adversarial Networks

Uiwon Hwang*, Heeseung Kim, Dahuin Jung, Hyemi Jang, Hyungyu Lee, and Sungroh Yoon†, in Proceedings of the International Conference on Learning Representations (ICLR), Virtual, April 2022.

arXiv code

Journal

Silent Speech Recognition with Strain Sensors and Deep Learning Analysis of Directional Facial Muscle Movement

Hyunjun Yoo*, Eunji Kim*, Jong Won Chung*, Hyeon Cho, Sujin Jeong, Heeseung Kim, Dongju Jang, Hayun Kim, Jinsu Yoon, Gae Hwang Lee, Hyunbum Kang, Joo-Young Kim, Youngjun Yun†, Sungroh Yoon†, and Yongtaek Hong†, in ACS Applied Materials & Interfaces, 2022.

Preprint & Workshop

Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data

Sungwon Kim*, Heeseung Kim*, and Sungroh Yoon†, arXiv preprint, 2022.

project arXiv
