Kunyu Wang

Hello! I am a fifth-year Ph.D. student at the School of Information Science and Technology, University of Science and Technology of China (USTC), supervised by Prof. Zheng-Jun Zha. Previously, I received my B.E. in Automation from the Special Class for the Gifted Young, Chien-Shiung Wu College, Southeast University (SEU). I also work as a research intern at the Beijing Academy of Artificial Intelligence (BAAI) & GALBOT, collaborating closely with Prof. He Wang.

My Ph.D. research mainly focuses on Machine Learning, with specific directions including Domain Generalization, Test-time Adaptation, and Continual Learning. During my internship, my research focuses on Embodied AI, with specific directions including Embodied Navigation, Vision-Language-Action Models, and Multimodal Large Models.

My research interests include enhancing the generalization and robustness of models' perceptual capabilities in open-world environments, and enabling robots to perceive, understand, and act in the real world. Going forward, I hope to pursue work that either drives influential advances in technology or uncovers the fundamental principles underlying challenging scientific problems.

I am actively looking for postdoctoral opportunities. If you have a suitable position or collaboration in mind, please feel free to contact me by email.


Education
  • University of Science and Technology of China
    School of Information Science and Technology
    Ph.D. Student
    Sep. 2021 - Jun. 2026
  • Southeast University
    Chien-Shiung Wu College
    Special Class for the Gifted Young
    B.E. in Automation
    Sep. 2017 - Jun. 2021
Experience
  • Beijing Academy of Artificial Intelligence & GALBOT
    Research Intern Jul. 2023 - Aug. 2024
Selected Publications
Machine Learning
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning

Kunyu Wang, Xueyang Fu, Xin Lu, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha

CVPR 2025 Oral (3.3% of all accepted papers)

During source-to-target domain transfer, not all source-learned features are beneficial; some can even degrade target performance.

PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation

Kunyu Wang, Xueyang Fu, Yunfei Bao, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha

NeurIPS 2025

Pairwise angular structure in pre-trained weights encodes a domain-invariant semantic prior that should be preserved during transfer.

Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay

Kunyu Wang, Xueyang Fu, Chengzhi Cao, Chengjie Ge, Wei Zhai, Zheng-Jun Zha

Manuscript under minor revision at TNNLS

We propose a brain-inspired continual learning framework by imitating the complementary learning mechanism of the human brain.

Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement

Kunyu Wang, Xueyang Fu, Chengjie Ge, Chengzhi Cao, Zheng-Jun Zha

IJCV 2024

We show that, beyond the spatial domain, the frequency domain offers a discriminative axis for decoupling invariant features.

Generalized UAV Object Detection via Frequency Domain Disentanglement

Kunyu Wang, Xueyang Fu, Yukun Huang, Chengzhi Cao, Gege Shi, Zheng-Jun Zha

CVPR 2023

We propose a new invariance learning paradigm which improves generalization by extracting domain-invariant spectral components.

Embodied AI
Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang, He Wang

RSS 2025

We propose Uni-NaVid, a navigation generalist unifying multiple skills in one model, including VLN, ObjNav, EQA, and Human-Following.

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Jiazhao Zhang*, Kunyu Wang*, Rongtao Xu*, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang

RSS 2024

We propose NaVid, the first generalized embodied navigation large model. Video and language in, actions out!

SkyFind: A Large-Scale Benchmark Unveiling Referring Expression Comprehension for UAV

Kunyu Wang, Guanbo Wu, Xingbo Wang, Kean Liu, Xin Lu, Chengjie Ge, Wei Zhai, Xueyang Fu, Zheng-Jun Zha

Manuscript under major revision at TPAMI

We build a million-scale UAV-based referring expression dataset to support precise retrieval of user-specified targets in UAV imagery.
