Textual Enhanced Adaptive Meta-Fusion for Few-shot Visual Recognition
Posted: 2023-09-27
- DOI: 10.1109/TMM.2023.3295731
- Publisher: Institute of Electrical and Electronics Engineers Inc.
- Journal: IEEE Transactions on Multimedia
- Abstract:
- Few-shot learning (FSL) is a challenging task that aims to train a classifier to recognize novel categories, where only a few annotated examples are available in each category. Recently, many FSL approaches have been proposed based on the meta-learning paradigm, which attempts to learn transferable knowledge from similar tasks by designing a meta-learner. However, most of these approaches exploit only information from the visual modality and do not utilize information from additional modalities (e.g., textual descriptions). Since the labeled examples in FSL are limited, enriching the information available for each example is a promising way to improve classification performance. This motivates us to propose a novel meta-learning method, termed textual enhanced adaptive meta-fusion FSL (TAMF-FSL), which leverages both the visual information from images and the semantic information from language supervision. Specifically, TAMF-FSL exploits the semantic information of textual descriptions to improve visual-based models. We first employ a text encoder to learn the semantic features of each visual category, and then design a modality alignment module and a meta-fusion module to align and fuse the visual and semantic features for the final prediction. Extensive experiments show that the proposed method outperforms many recent competitive FSL counterparts on two popular datasets.
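The align-then-fuse idea in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' TAMF-FSL implementation: the dimensions, the linear alignment projection, and the sigmoid-gated fusion below are all hypothetical stand-ins for the paper's modality alignment and meta-fusion modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes (not from the paper).
d_vis, d_txt = 64, 32

def align(text_feat, W):
    """Project text features into the visual feature space (stand-in for the
    modality alignment module)."""
    return text_feat @ W

def adaptive_fuse(vis_feat, txt_feat, gate):
    """Blend the two modalities with a sigmoid-squashed scalar gate
    (stand-in for the meta-fusion module)."""
    g = 1.0 / (1.0 + np.exp(-gate))  # sigmoid maps gate into (0, 1)
    return g * vis_feat + (1.0 - g) * txt_feat

# Toy per-category features: a visual prototype and a text embedding.
vis = rng.standard_normal(d_vis)
txt = rng.standard_normal(d_txt)
W = rng.standard_normal((d_txt, d_vis)) / np.sqrt(d_txt)

fused = adaptive_fuse(vis, align(txt, W), gate=0.0)
print(fused.shape)  # fused feature lives in the visual space: (64,)
```

In the paper both the alignment and the gate would be learned end-to-end under the meta-learning objective; here they are fixed random values purely to show the data flow.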
- Co-authors: Zhan Yibing, Luo Yong, Hu Han, Du Bo
- First author: Han Mengya
- Paper type: Journal article
- Corresponding author: Su Kehua
- Pages: 1-11
- ISSN: 1520-9210
- Translation: No
- Publication date: 2023-03-01
- Indexed by: EI