I am a second-year M.S. student in Computer Science at UMass Amherst, advised by Prof. Chuang Gan. I also work closely with Frank Dou from MIT CSAIL, and I collaborate with Dr. Julian Tanke and Dr. Takashi Shibuya at Sony AI on ongoing research projects. I received my B.Eng. degree from Fudan University, where I was fortunate to be advised by Prof. Yaqian Zhou. Toward the end of my undergraduate studies, I was a research intern at SRIBD, supervised by Prof. Haizhou Li.

My research interests lie in human-centered animation. More specifically, I work on character animation, computer graphics, physics-based simulation, and motion estimation. My ultimate goal is to build a neural physics engine, a system I call Clotho, for simulating human motion as well as human–object–scene interactions.

🔥 News

2025.10: 🎉🎉 Our paper TalkCuts (co-first author) has been accepted to NeurIPS 2025.
2025.07: 🎉🎉 Our paper RapVerse has been accepted to ICCV 2025.

📝 Publications

* indicates equal contribution.

NeurIPS 2025

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation

Jiaben Chen*, Zixin Wang*, Ailing Zeng, Yang Fu, Xueyang Yu, Siyuan Cen, Julian Tanke, Yihang Chen, Koichi Saito, Yuki Mitsufuji, Chuang Gan

Neural Information Processing Systems (NeurIPS), 2025

project page / paper / code

In this work, we present TalkCuts, a large-scale benchmark dataset designed to facilitate the study of multi-shot human speech video generation.

ICCV 2025

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Zixin Wang, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

International Conference on Computer Vision (ICCV), 2025

project page / paper / code

In this paper, we introduce a challenging new task: simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic body meshes.

🛠️ Project

SpeechGPT2: End-to-End Speech-Centric Large Language Model for Unified Listening, Speaking, and Understanding

project page / code

SpeechGPT2 is an end-to-end speech-centric large language model designed to unify listening, speech understanding, and spoken response generation. The model supports natural multi-turn audio-based interaction **without** requiring intermediate text conversion.

📖 Education

University of Massachusetts Amherst
MSCS Student, 2024.09 - Present
Advisor: Prof. Chuang Gan

Fudan University
Undergraduate, 2020.09 - 2024.06
Advisor: Prof. Yaqian Zhou

💻 Experience

SRIBD (CUHKSZ)
Research Intern, 2023.06 - 2023.09
Advisor: Prof. Haizhou Li

Sony AI
Collaboration, 2025.01 - Present
Host: Dr. Julian Tanke

MIT-IBM Watson AI Lab
Collaboration, 2025.01 - 2025.05
Host: Dr. Yang Zhang

🎬 Life

I love anime and I’m a pretty hardcore ACG fan; I still follow new series every month. My favorite anime is Evangelion. I also watch a lot of movies in my spare time.
My favorite game is Baldur’s Gate 3. My dream is to use AI to rebuild Baldur’s Gate as a truly open-world experience, with far more freedom and player agency.
The two courses I enjoyed the most in college were A Reading of Descartes’ Meditations on First Philosophy and Film Appreciation.