Kunyu Wang

Hello! I am a fifth-year Ph.D. student at the School of Information Science and Technology, University of Science and Technology of China (USTC), supervised by Prof. Zheng-Jun Zha. Previously, I received my B.E. in Automation from the Special Class for the Gifted Young, Chien-Shiung Wu College, Southeast University (SEU). I also work as a research intern at the Beijing Academy of Artificial Intelligence (BAAI) & GALBOT, collaborating closely with Prof. He Wang.

My Ph.D. research mainly focuses on Machine Learning, with specific directions including Domain Generalization, Test-time Adaptation, and Continual Learning. During my internship, my research focuses on Embodied AI, with specific directions including Embodied Navigation, Vision-Language-Action Models, and Multimodal Large Models.

My research interests include enhancing the generalization and robustness of models' perceptual capabilities in open-world environments, and enabling robots to perceive, understand, and act in the real world. Going forward, I hope to pursue work that either drives influential advances in technology or uncovers the fundamental principles underlying challenging scientific problems.

I am actively looking for postdoctoral opportunities. If you have a suitable position or collaboration in mind, please feel free to contact me by email.


Education
  • University of Science and Technology of China
    School of Information Science and Technology
    Ph.D. Student
    Sep. 2021 - Jun. 2026
  • Southeast University
    Chien-Shiung Wu College
    Special Class for the Gifted Young
    B.E. in Automation
    Sep. 2017 - Jun. 2021
Experience
  • Beijing Academy of Artificial Intelligence & GALBOT
    Research Intern Jul. 2023 - Aug. 2024
Selected Publications
Machine Learning
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning

Kunyu Wang, Xueyang Fu, Xin Lu, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha

CVPR 2025 Oral (3.3% of all accepted papers)

During source-to-target domain transfer, not all source-learned features are beneficial; some can even degrade target performance.

PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation

Kunyu Wang, Xueyang Fu, Yunfei Bao, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha

NeurIPS 2025

Pairwise angular structure in pre-trained weights encodes a domain-invariant semantic prior that should be preserved during transfer.

Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay

Kunyu Wang, Xueyang Fu, Chengzhi Cao, Chengjie Ge, Wei Zhai, Zheng-Jun Zha

Manuscript under minor revision at TNNLS

We propose a brain-inspired continual learning framework by imitating the complementary learning mechanism of the human brain.

Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement

Kunyu Wang, Xueyang Fu, Chengjie Ge, Chengzhi Cao, Zheng-Jun Zha

IJCV 2024

We show that, beyond the spatial domain, the frequency domain offers a discriminative axis for decoupling invariant features.

Generalized UAV Object Detection via Frequency Domain Disentanglement

Kunyu Wang, Xueyang Fu, Yukun Huang, Chengzhi Cao, Gege Shi, Zheng-Jun Zha

CVPR 2023

We propose a new invariance learning paradigm which improves generalization by extracting domain-invariant spectral components.

Embodied AI
Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang, He Wang

RSS 2025

We propose Uni-NaVid, a navigation generalist unifying multiple skills in one model, including VLN, ObjNav, EQA, and Human-Following.

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Jiazhao Zhang*, Kunyu Wang*, Rongtao Xu*, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang

RSS 2024

We propose NaVid, the first generalized embodied navigation large model. Video and language in, actions out!

SkyFind: A Large-Scale Benchmark Unveiling Referring Expression Comprehension for UAV

Kunyu Wang, Guanbo Wu, Xingbo Wang, Kean Liu, Xin Lu, Chengjie Ge, Wei Zhai, Xueyang Fu, Zheng-Jun Zha

Manuscript under major revision at TPAMI

We build a million-scale UAV-based referring expression dataset to support precise retrieval of user-specified targets in UAV imagery.
