Rhythm Dancer: A Study of 3D Dance Generation Based on a Keymotion Transition Graph and a Conditional Pose-Interpolation Network

Yayun He; Junqing Peng; Jianzong Wang; Jing Xiao

doi:10.11959/j.issn.2096-0271.2023004


Special topic: Metaverse and Big Data | Updated: 2024-06-05

- Rhythm dancer: 3D dance generation by keymotion transition graph and pose-interpolation network
- Big Data Research (《大数据》), 2023, Vol. 9, No. 1, pp. 23-37
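- Author biographies:
  Yayun He (1990- ), male, is a senior algorithm engineer at Ping An Technology (Shenzhen) Co., Ltd. His main research interests include artificial intelligence, voiceprint recognition, and virtual humans for the metaverse.
  Junqing Peng (1973- ), male, is a nationally certified computer system architect, a senior manager at Ping An Technology (Shenzhen) Co., Ltd., and a senior AI algorithm researcher. He has worked in the IT industry for many years, specializes in architecture design, cloud platforms, and AI system construction, has published a number of papers, and holds several granted patents.
  Jianzong Wang (1983- ), male, Ph.D., is deputy chief engineer of Ping An Technology (Shenzhen) Co., Ltd. He was a postdoctoral researcher in artificial intelligence at the University of Florida (USA), and is a distinguished member of the China Computer Federation (CCF), a council member of the Shenzhen Computer Society, a Shenzhen local-level leading talent, and an editorial board member of Big Data Research (《大数据》). He was formerly a researcher in the Department of Electrical and Computer Engineering at Rice University (USA). His main research interests are privacy computing, the metaverse, edge computing, and quantum computing. His honors include the Excellence Award of the China Patent Award, the Shenzhen Science and Technology Progress Award, the CCF Science and Technology Award, and the title of 2022 Privacy Computing Innovator from MIT Technology Review China.
  Jing Xiao (1972- ), male, Ph.D., is chief scientist of Ping An Group, a member of the Shenzhen Municipal Committee of the Chinese People's Political Consultative Conference (CPPCC), a member of the Shenzhen Decision-Making Consultation Committee, vice chair of CCF Shenzhen, vice president of the Guangdong Society of Artificial Intelligence and Robotics, a member of the Artificial Intelligence Committee of the Shanghai Association for Science and Technology, and president of the Shenzhen Artificial Intelligence Industry Association. He previously held senior R&D management positions at Epson's research institute in the USA and at Microsoft (USA). He has published 249 academic papers, holds 101 granted US patents and 155 granted Chinese patents, and has participated in or led 11 national-level projects. His awards include the first prize of the Wu Wenjun AI Science and Technology Progress Award, the first prize of the Shanghai Science and Technology Progress Award, the China Patent Award of Excellence, the Guangdong Patent Award of Excellence, and the Wu Wenjun AI Outstanding Contribution Award.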
- Funding:
  Key Research and Development Program of Guangdong Province, “New Generation Artificial Intelligence” Major Project (2021B0101400003)
- CLC number: TP399
- Published online: 2023-01
  Print publication: 2023-01-15
Yayun HE, Junqing PENG, Jianzong WANG, et al. Rhythm dancer: 3D dance generation by keymotion transition graph and pose-interpolation network[J]. Big Data Research, 2023, 9(1): 23-37. DOI: 10.11959/j.issn.2096-0271.2023004.

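Abstract (Chinese version)

3D dance is an important form of expression for virtual humans in the metaverse. It organically combines music and dance and greatly increases the appeal of related metaverse applications. Previous work usually treats 3D dance generation simply as a sequence generation task, but the generated dance movements are of poor quality and fit the music poorly. Inspired by the way humans learn to dance, we propose a novel 3D dance framework, “Rhythm Dancer”, to address these problems. The framework first uses VQ-VAE-2 to encode and quantize dances hierarchically, which effectively improves the quality of the generated dance. It then builds a keymotion transition graph from the codes of the key motions located at rhythm points, which both keeps the generated movements aligned with the musical beat and increases the diversity of the movements. To connect key motions smoothly and naturally, a pose-interpolation network is proposed to learn the transition movements between them. Extensive experiments show that the framework avoids the instability and uncontrollability of long-sequence generation, closely matches dance movements to the music rhythm, and achieves state-of-the-art results.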
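The first step of the framework, hierarchical encoding and quantization with VQ-VAE-2, rests on a nearest-codebook lookup. The sketch below illustrates only that lookup, in pure Python, under loudly stated assumptions: the codebooks are fixed toy values rather than learned, the "frames" are hypothetical 2-D feature vectors rather than encoder outputs, and the two levels are approximated by average-pooling frames onto a coarser time grid. All function names (`quantize`, `average_pool`, `hierarchical_quantize`) are hypothetical; this is not the paper's encoder.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> Tuple[List[int], List[Vector]]:
    """Replace each vector by its nearest codebook entry; return indices and vectors."""
    indices: List[int] = []
    quantized: List[Vector] = []
    for v in vectors:
        # Ties go to the lowest index, because min() keeps the first minimum.
        best = min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def average_pool(frames: Sequence[Sequence[float]], window: int) -> List[Vector]:
    """Average each run of `window` frames to get a coarser time grid."""
    pooled: List[Vector] = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def hierarchical_quantize(frames: Sequence[Sequence[float]],
                          top_codebook: Sequence[Sequence[float]],
                          bottom_codebook: Sequence[Sequence[float]],
                          window: int = 2) -> Tuple[List[int], List[int]]:
    """Hypothetical two-level scheme: one coarse code per window, one fine code per frame."""
    top_indices, _ = quantize(average_pool(frames, window), top_codebook)
    bottom_indices, _ = quantize(frames, bottom_codebook)
    return top_indices, bottom_indices


if __name__ == "__main__":
    # Toy 2-D feature vectors for four frames and fixed toy codebooks (hypothetical data).
    frames = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
    top_codebook = [[0.0, 0.0], [1.0, 1.0]]
    bottom_codebook = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    print(hierarchical_quantize(frames, top_codebook, bottom_codebook))  # ([0, 1], [0, 0, 2, 2])
```

In an actual VQ-VAE the codebook is learned jointly with the encoder (gradients pass through the lookup with a straight-through estimator, plus a commitment loss), and in VQ-VAE-2 the bottom-level encoding also depends on the top-level codes; none of that training machinery is shown here.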

Abstract (English version)

3D dance is an indispensable form of expression for virtual humans in the metaverse. It organically combines music and dance, which greatly increases the appeal of the metaverse. Previous work usually treats dance generation as a simple sequence generation task, but such approaches struggle to align dance movements precisely with the musical beat, and the quality of long generated dance sequences is hard to guarantee. Inspired by the way humans learn to dance, we propose a novel 3D dance framework, “Rhythm Dancer”, to solve these problems. The framework first uses VQ-VAE-2 to encode and quantize dances hierarchically, which effectively improves the quality of dance generation. It then builds a keymotion transition graph from the key dance movements at rhythm points, which both keeps the generated movements in time with the musical beat and increases their diversity. To connect key movements smoothly and naturally, we propose a pose-interpolation network that learns the transition movements between them. Extensive experiments demonstrate that the framework avoids the instability and uncontrollability of long-sequence generation, achieves a closer match between dance movements and music rhythm, and reaches state-of-the-art results.
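The keymotion transition graph and the interpolation step can be illustrated with a small pure-Python sketch. It rests on assumptions that are not taken from the paper: each key motion is already represented by an integer code sampled at a beat; an edge a -> b is added whenever code b follows code a on consecutive beats in the training dances; a seeded random walk stands in for however the paper actually traverses the graph; beat timestamps in seconds are assumed to be available already (for example from a beat tracker); and plain linear interpolation stands in for the learned conditional pose-interpolation network, which it does not reproduce. All names (`build_transition_graph`, `random_walk`, `beat_to_frame`, `lerp_poses`, `assemble`) and data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Set


def build_transition_graph(sequences: Sequence[Sequence[int]]) -> Dict[int, Set[int]]:
    """Add a directed edge a -> b whenever code b follows code a on consecutive beats."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            graph[a].add(b)
    return dict(graph)


def random_walk(graph: Mapping[int, Set[int]], start: int, num_beats: int,
                seed: int = 0) -> List[int]:
    """Choose one key-motion code per beat by a seeded random walk; stop at a dead end."""
    rng = random.Random(seed)
    path = [start]
    while len(path) < num_beats:
        successors = sorted(graph.get(path[-1], set()))
        if not successors:
            break
        path.append(rng.choice(successors))
    return path


def beat_to_frame(beat_times: Sequence[float], fps: float) -> List[int]:
    """Map beat timestamps in seconds to the nearest frame indices."""
    return [round(t * fps) for t in beat_times]


def lerp_poses(p0: Sequence[float], p1: Sequence[float], steps: int) -> List[List[float]]:
    """Linearly interpolate `steps` in-between poses, excluding both endpoints."""
    return [[a + (b - a) * t / (steps + 1) for a, b in zip(p0, p1)]
            for t in range(1, steps + 1)]


def assemble(codes: Sequence[int], key_poses: Mapping[int, Sequence[float]],
             frames_between: int) -> List[List[float]]:
    """Key pose at each beat, with interpolated frames filling each gap."""
    frames: List[List[float]] = []
    for a, b in zip(codes, codes[1:]):
        frames.append(list(key_poses[a]))
        frames.extend(lerp_poses(key_poses[a], key_poses[b], frames_between))
    frames.append(list(key_poses[codes[-1]]))
    return frames


if __name__ == "__main__":
    graph = build_transition_graph([[0, 1, 2, 0], [1, 0]])
    beat_frames = beat_to_frame([0.0, 0.5, 1.0, 1.5], fps=8)  # hypothetical beats -> [0, 4, 8, 12]
    frames_between = beat_frames[1] - beat_frames[0] - 1  # assumes evenly spaced beats
    codes = random_walk(graph, start=0, num_beats=len(beat_frames), seed=42)
    key_poses = {0: [0.0, 0.0], 1: [1.0, 0.5], 2: [0.0, 1.0]}  # hypothetical 2-D poses
    frames = assemble(codes, key_poses, frames_between)
    # No node is a dead end here, so 4 codes -> 3 gaps x (1 key + 3 in-between) + 1 = 13 frames.
    print(codes, len(frames))
```

Placing key poses exactly on beat frames is what ties the movement to the rhythm in this toy version, while the random choice among graph successors supplies variety; the quality of the in-between frames is where a learned interpolation network would replace the naive linear blend.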


