Fine-Grained Position Helps Memorizing More, a Novel Music Compound Transformer Model with Feature Interaction Fusion
Posted: 2023-09-27
- Publisher:
- AAAI Press
- Published in:
- Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
- Abstract:
- Because multiple events can occur simultaneously in music sequences, the compound Transformer was proposed to deal with the resulting challenge of long sequences. However, the compound Transformer has two deficiencies. First, since the order of events is more important for music than for natural language, the information provided by the original absolute position embedding is not precise enough. Second, the tokens within a compound word are strongly correlated, which the current compound Transformer ignores. Therefore, in this work, we propose an improved compound Transformer model for music understanding. Specifically, we propose an attribute embedding fusion module and a novel position encoding scheme with absolute-relative consideration. In the attribute embedding fusion module, different attributes are fused through feature permutation using a multi-head self-attention mechanism to capture rich interactions between attributes. In the novel position encoding scheme, we propose RoAR position encoding, which realizes rotational absolute position encoding, relative position encoding, and absolute-relative position interactive encoding, providing clear and rich ordering information for musical events. Empirical studies on four typical music understanding tasks show that our attribute fusion approach and RoAR position encoding bring large performance gains. In addition, we further investigate the impact of masked language modeling and causal language modeling pre-training on music understanding.
- Co-authors:
- Gong Ruhan,Chen Yineng
- First author:
- Li Zuchao
- Paper type:
- Journal paper
- Corresponding author:
- Su Kehua
- Document type:
- J
- Volume:
- 37
- Pages:
- 5203-5212
- Translated:
- No
- Publication date:
- 2023-06-27
- Indexed in:
- EI
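The rotational absolute component of the RoAR position encoding described in the abstract builds on the rotary-embedding idea: each pair of feature channels is rotated by an angle proportional to the absolute position, so that dot products between encoded vectors depend only on relative distance. A minimal sketch of that idea follows; the function name and all details are illustrative assumptions, not the authors' implementation, which additionally covers relative and absolute-relative interactive encoding.

```python
import numpy as np

def rotary_position_encoding(x, base=10000.0):
    """Apply a rotary (rotational absolute) position encoding to a
    sequence of vectors x with shape (seq_len, dim), dim even.

    Illustrative sketch only, not the paper's RoAR encoding. Each pair
    of feature channels is rotated by an angle that grows with the
    position index, so the dot product of two encoded vectors depends
    only on their relative distance.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # Per-channel-pair rotation frequencies, geometrically spaced.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Rotation angle for every (position, channel-pair) combination.
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    # Apply a 2-D rotation to each channel pair.
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

With identical underlying vectors, the inner product of the encodings at positions 2 and 5 equals that at positions 0 and 3, since both pairs share the same relative offset; this relative-distance property is what makes rotation-based encodings attractive for order-sensitive sequences such as music.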