Wuhan University Kehua--Home

Su Kehua (苏科华)

Professor
Supervisor of Doctorate Candidates
Supervisor of Master's Candidates

E-Mail:

Date of Employment:2009-12-01

School/Department:School of Computer Science

Education Level:Postgraduate

Business Address:D203

Gender:Male

Contact Information:13517299596

Status:Employed

Discipline:Computer Applications Technology
Communications and Information Systems
Other specialties in Software Engineering
Cyberspace Security

Paper Publications

Fine-Grained Position Helps Memorizing More, a Novel Music Compound Transformer Model with Feature Interaction Fusion

Affiliation of Author(s):AAAI Press

Journal:Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023

Abstract:Due to the particularity of the simultaneous occurrence of multiple events in music sequences, the compound Transformer was proposed to deal with the challenge of long sequences. However, there are two deficiencies in the compound Transformer. First, since the order of events is more important for music than for natural language, the information provided by the original absolute position embedding is not precise enough. Second, there is an important correlation between the tokens in a compound word, which is ignored by the current compound Transformer. Therefore, in this work, we propose an improved compound Transformer model for music understanding. Specifically, we propose an attribute embedding fusion module and a novel position encoding scheme with absolute-relative consideration. In the attribute embedding fusion module, different attributes are fused through feature permutation by using a multi-head self-attention mechanism in order to capture rich interactions between attributes. In the novel position encoding scheme, we propose RoAR position encoding, which realizes rotational absolute position encoding, relative position encoding, and absolute-relative position interactive encoding, providing clear and rich orders for musical events. An empirical study on four typical music understanding tasks shows that our attribute fusion approach and RoAR position encoding bring large performance gains. In addition, we further investigate the impact of masked language modeling and causal language modeling pre-training on music understanding.

Co-author:Gong Ruhan,Chen Yineng

First Author:Li Zuchao

Indexed by:Journal paper

Correspondence Author:Su Kehua

Document Type:J

Volume:37

Page Number:5203-5212

Translation or Not:no

Date of Publication:2023-06-27

Included Journals:EI

Profile

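Su Kehua (苏科华), male, is a professor and doctoral supervisor at the School of Computer Science, Wuhan University. His research centers on optimal transport, a class of optimization problems concerned with the optimal transformation between probability measures, with broad applications in computer graphics, machine vision, artificial intelligence, and medical image processing. He mainly studies the geometric computation theory and efficient algorithms of optimal transport and applies them to measure-preserving mesh parameterization, 3D scene optimization, intelligent burn assessment, and satellite-internet task optimization. He has led more than 20 projects supported by funders including the National Natural Science Foundation of China, the Science and Technology Commission of the Central Military Commission, the Fifth Academy of China Aerospace (航天5院), and Huawei. He has published more than 50 papers and holds more than 10 granted invention patents. He is an executive committee member of the CCF technical committees on Computer-Aided Design and Computer Graphics (CAD/CG) and on Virtual Reality and Visualization (TCVRV).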