Wuhan University Chen Guanzhou--Home

Chen Guanzhou

Main positions:Associate Researcher
Gender:Male
Status:Employed
School/Department:State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing

Discipline: Photogrammetry and Remote Sensing

MSSDF: Modality-shared self-supervised distillation for high-resolution multi-modal remote sensing image learning

Impact Factor:15.5

DOI number:10.1016/j.inffus.2025.104006

Journal:Information Fusion

Abstract:High-resolution multi-modal remote sensing (RS) images provide rich complementary information for Earth observation, yet the scarcity of high-quality annotated data remains a major obstacle to effective model training. To address this challenge, we propose a Modality-Shared Self-supervised Distillation Framework (MSSDF) that learns discriminative multi-modal representations with minimal reliance on labeled data. Specifically, MSSDF integrates information-aware and cross-modal masking strategies with multi-objective self-supervised learning, enabling the model to capture modality-shared semantics and compensate for missing or weakly labeled modalities. This design substantially reduces the dependence on large-scale annotations and enhances robustness under limited-label regimes. Extensive experiments on scene classification, semantic segmentation, and change detection tasks demonstrate that MSSDF consistently outperforms state-of-the-art methods, particularly when labeled data are scarce. In particular, on the Potsdam and Vaihingen semantic segmentation tasks, our method achieves mIoU scores of 78.30% and 76.50%, respectively, using only 50% of the training set. For the US3D depth estimation task, the RMSE is reduced to 0.182, and for the binary change detection task on the SECOND dataset, our method achieves an mIoU of 47.51%, surpassing the second-best method by 3 percentage points. In addition, we construct a high-resolution multi-modal remote sensing image dataset named HR-Pairs, which contains 640,000 DOM (Digital Orthophoto Map)-DSM (Digital Surface Model) pairs with a spatial resolution of 0.05 m, providing a new high-quality dataset for multi-modal remote sensing research. Our pretraining code, checkpoints, and the HR-Pairs dataset are available at https://github.com/CVEO/MSSDF.

Co-author:Chenxi Liu, Jiaqi Wang, Xiaoliang Tan, Wenchao Guo, Qingyuan Yang, Kaiqi Zhang

Indexed by:Journal paper

Corresponding Author:Guanzhou Chen, Xiaodong Zhang

Document Type:J

Volume:129

Page Number:104006

ISSN No.:1566-2535

Translation or Not:no

Date of Publication:2026-05-01