Xiaoke Zhang, Mi Zhou, and Gene Moo Lee
2023
Preparing for 3rd Round Review at MIS Quarterly
Best Student Paper Award Finalist, CIST 2023
Best Paper Award Runner-Up, KrAIS Summer Workshop 2023
DS 2022 · WITS 2022 · UBC 2022 · KrAIS 2023 · CSWIM 2023 · KrAIS 2023 · CIST 2023 · Santa Clara University 2024 · University of Delaware 2024 · Temple University 2024 · NUS 2025 · UBCO 2025
Video-sharing platforms like TikTok have introduced AI voice to assist creators in video creation. This creates a new form of multimodal human-AI collaboration, in which AI automates audio narration while creators control the textual and visual modalities. Although AI voice may reduce creators’ cognitive load in video production by offloading the effort of speech recording, it may also diminish their social benefits by substituting for their unique vocal identities. Drawing on Social Exchange Theory, we theorize and empirically investigate the impact of AI voice adoption on the production and consumption of creators’ videos, leveraging a unique dataset of 554,252 TikTok videos. Using a stacked difference-in-differences model with propensity score matching, we find that AI voice adoption increases creators’ video production by 27%. Moreover, while AI voice reduces audio novelty, it enhances textual and visual novelty by freeing creators’ cognitive resources. However, the use of AI voice in videos lowers viewer engagement. Mechanism analyses reveal that less-experienced creators and those with higher vocal identity congruence with the AI voice benefit more from AI voice adoption: they produce more videos and experience smaller declines in video engagement. This study contributes to the literature on user-generated content (UGC) and human-AI collaboration and provides practical insights for video creators and multimodal UGC platforms.