Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer
Posted: 2023-09-27
- Impact factor:
- 10.6
- DOI:
- 10.1109/TIP.2023.3299791
- Affiliated organization:
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- Journal:
- IEEE TRANSACTIONS ON IMAGE PROCESSING
- Journal location:
- 445 HOES LANE, PISCATAWAY, NJ 08855-4141
- Keywords:
- Cross Relation;Image Retrieval;Transformer
- Abstract:
- Composing Text and Image to Image Retrieval (CTI-IR) aims at finding the target image, which matches the query image visually along with the query text semantically. However, existing works ignore the fact that the reference text usually serves multiple functions, e.g., modification and auxiliary. To address this issue, we put forth a unified solution, namely Hierarchical Aggregation Transformer incorporated with Cross Relation Network (CRN). CRN unifies modification and relevance manner in a single framework. This configuration shows broader applicability, enabling us to model both modification and auxiliary text or their combination in triplet relationships simultaneously. Specifically, CRN includes: 1) Cross Relation Network comprehensively captures the relationships of various composed retrieval scenarios caused by two different query text types, allowing a unified retrieval model to designate adaptive combination strategies for flexible applicability; 2) Hierarchical Aggregation Transformer aggregates top-down features with Multi-layer Perceptron (MLP) to overcome the limitations of edge information loss in a window-based multi-stage Transformer. Extensive experiments demonstrate the superiority of the proposed CRN over all three fashion-domain datasets. Code is available at github.com/yan9qu/crn.
- Co-authors:
- Ye Mang, Cai Zhaohui, Du Bo
- First author:
- Yang Qu
- Paper type:
- Article
- Corresponding author:
- Su Kehua
- Document type:
- J
- Volume:
- 32
- Page range:
- 4543-4554
- ISSN:
- 1057-7149
- Translated work:
- No
- Publication date:
- 2023-08-24
- Indexed in:
- SCI, EI