Cross-Modality Pyramid Alignment for Visual Intention Understanding
Release date: 2023-09-27
- Impact factor:
- 10.6
- DOI:
- 10.1109/TIP.2023.3261743
- Affiliation:
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- Journal:
- IEEE TRANSACTIONS ON IMAGE PROCESSING
- Journal location:
- 445 HOES LANE, PISCATAWAY, NJ 08855-4141
- Keywords:
- Visualization; Task analysis; Semantics; Feature extraction; Training; Image segmentation; Image color analysis; Visual intention understanding; cross modality; hierarchical relation
- Abstract:
- Visual intention understanding is the task of exploring the potential and underlying meaning expressed in images. Simply modeling the objects or backgrounds within the image content leads to unavoidable comprehension bias. To alleviate this problem, this paper proposes a Cross-modality Pyramid Alignment with Dynamic optimization (CPAD) to enhance the global understanding of visual intention with hierarchical modeling. The core idea is to exploit the hierarchical relationship between visual content and textual intention labels. For visual hierarchy, we formulate the visual intention understanding task as a hierarchical classification problem, capturing multiple granular features in different layers, which corresponds to hierarchical intention labels. For textual hierarchy, we directly extract the semantic representation from intention labels at different levels, which supplements the visual content modeling without extra manual annotations. Moreover, to further narrow the domain gap between different modalities, a cross-modality pyramid alignment module is designed to dynamically optimize the performance of visual intention understanding in a joint learning manner. Comprehensive experiments intuitively demonstrate the superiority of our proposed method, outperforming existing visual intention understanding methods.
- Co-authors:
- Shi Qinghongya, Du Bo
- First author:
- Ye Mang
- Paper type:
- Article
- Corresponding author:
- Su Kehua
- Document type:
- J
- Volume:
- 32
- Page range:
- 2190-2201
- ISSN:
- 1057-7149
- Translation:
- No
- Publication date:
- 2023-05-05
- Indexed in:
- SCI, EI