Xinyu Zhang

I am a Research Fellow at the Australian Institute for Machine Learning (AIML), the University of Adelaide, funded by the Centre for Augmented Reasoning (CAR). I work closely with A/Prof. Lingqiao Liu and Prof. Anton van den Hengel.

Previously, I was a Senior Research Scientist at Baidu Inc., working closely with Chief Scientist Dr. Jingdong Wang. I earned my Ph.D. from Tongji University and was a joint Ph.D. student at the University of Adelaide, under the supervision of Prof. Chunhua Shen, Prof. Javen Qinfeng Shi, Prof. Anton van den Hengel, and Prof. Mingyu You.

Research Topics

My research focuses on designing machine learning algorithms that understand and depict large-scale, unstructured real-world data, and that generate synthetic data to simulate the real world.

Specifically, my research topics center on Machine Learning and Computer Vision, especially in:

Generative AI models: Image/Video generation/editing
Foundation model pre-training: Foundation and human-centric pre-training
Self-supervised / unsupervised / semi-supervised learning
Object/Attribute detection/recognition; Image/Text-to-image retrieval

News

Jun, 2025	I will serve as an Area Chair for WACV 2026
May, 2025	I will serve as a Guest Editor for the Entropy Special Issue [Rethinking Representation Learning in the Age of Large Models]. Call for papers! The submission deadline is 31 October 2025.
Mar, 2025	I am a winner of the Women Leading Tech Awards 2025 in the Education/Research category.
Feb, 2025	One paper [Is Generated Image Really Realistic?] was accepted to CVPR 2025
Jan, 2025	Released one paper [SimulateMotion] on Training-free Video Generation for Motion Simulation
Jan, 2025	One paper [InCPL] on Test-time Prompt Tuning was accepted by Pattern Recognition
Jan, 2025	Serving as a Senior Program Committee Member (SPC) for IJCAI 2025
Dec, 2024	I will serve as an Area Chair for ICCV 2025
Sep, 2024	One paper [DEVIL] on Text-to-Video Generation Evaluation from a Dynamics Perspective was accepted to NeurIPS 2024
Feb, 2024	One paper [VRP-SAM] on Efficient Reference-Based Object Segmentation was accepted to CVPR 2024

Papers (📖 Full list)

arXiv

Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss

Xinyu Zhang, Zicheng Duan, Dong Gong, and Lingqiao Liu

arXiv preprint arXiv:2501.07563, 2025

@article{zhang2025training,
  title = {Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss},
  author = {Zhang, Xinyu and Duan, Zicheng and Gong, Dong and Liu, Lingqiao},
  journal = {arXiv preprint arXiv:2501.07563},
  year = {2025},
}

CVPR

Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers?

Zebin You, Xinyu Zhang, Hanzhong Guo, Jingdong Wang, and 1 more author

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

@inproceedings{you2024image,
  title = {Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers?},
  author = {You, Zebin and Zhang, Xinyu and Guo, Hanzhong and Wang, Jingdong and Li, Chongxuan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2025},
}

NeurIPS

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

Mingxiang Liao^*, Hannan Lu^*, Xinyu Zhang^*, Fang Wan, and 5 more authors

In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

@inproceedings{liaoevaluation,
  title = {Evaluation of Text-to-Video Generation Models: A Dynamics Perspective},
  author = {Liao, Mingxiang and Lu, Hannan and Zhang, Xinyu and Wan, Fang and Wang, Tianyu and Zhao, Yuzhong and Zuo, Wangmeng and Ye, Qixiang and Wang, Jingdong},
  booktitle = {The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year = {2024},
}

TMLR

CAE v2: Context autoencoder with CLIP latent alignment

Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, and 7 more authors

Transactions on Machine Learning Research, 2023

@article{zhang2023cae,
  title = {CAE v2: Context autoencoder with CLIP latent alignment},
  author = {Zhang, Xinyu and Chen, Jiahui and Yuan, Junkun and Chen, Qiang and Wang, Jian and Wang, Xiaodi and Han, Shumin and Chen, Xiaokang and Pi, Jimin and Yao, Kun and others},
  journal = {Transactions on Machine Learning Research},
  year = {2023},
}

CVPR

Implicit sample extension for unsupervised person re-identification

Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, and 4 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

@inproceedings{zhang2022implicit,
  title = {Implicit sample extension for unsupervised person re-identification},
  author = {Zhang, Xinyu and Li, Dongdong and Wang, Zhigang and Wang, Jian and Ding, Errui and Shi, Javen Qinfeng and Zhang, Zhaoxiang and Wang, Jingdong},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages = {7369--7378},
  year = {2022},
}

NeurIPS

Hap: Structure-aware masked image modeling for human-centric perception

Junkun Yuan^*, Xinyu Zhang^*†, Hao Zhou, Jian Wang, and 7 more authors

Advances in Neural Information Processing Systems, 2023

@article{yuan2024hap,
  title = {Hap: Structure-aware masked image modeling for human-centric perception},
  author = {Yuan, Junkun and Zhang, Xinyu and Zhou, Hao and Wang, Jian and Qiu, Zhongwei and Shao, Zhiyin and Zhang, Shaofeng and Long, Sifan and Kuang, Kun and Yao, Kun and others},
  journal = {Advances in Neural Information Processing Systems},
  volume = {36},
  year = {2023},
}

AAAI

Diverse knowledge distillation for end-to-end person search

Xinyu Zhang, Xinlong Wang, Jia-Wang Bian, Chunhua Shen, and 1 more author

In Proceedings of the AAAI Conference on Artificial Intelligence, 2021

@inproceedings{zhang2021diverse,
  title = {Diverse knowledge distillation for end-to-end person search},
  author = {Zhang, Xinyu and Wang, Xinlong and Bian, Jia-Wang and Shen, Chunhua and You, Mingyu},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {35},
  number = {4},
  pages = {3412--3420},
  year = {2021},
}

ICCV

Self-training with progressive augmentation for unsupervised cross-domain person re-identification

Xinyu Zhang, Jiewei Cao, Chunhua Shen, and Mingyu You

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019

@inproceedings{zhang2019self,
  title = {Self-training with progressive augmentation for unsupervised cross-domain person re-identification},
  author = {Zhang, Xinyu and Cao, Jiewei and Shen, Chunhua and You, Mingyu},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages = {8222--8231},
  year = {2019},
}

PR

Context-aware prompt learning for test-time vision recognition with frozen vision-language model

Junhui Yin, Xinyu Zhang, Lin Wu, and Xiaojie Wang

Pattern Recognition, 2025

@article{yin2025context,
  title = {Context-aware prompt learning for test-time vision recognition with frozen vision-language model},
  author = {Yin, Junhui and Zhang, Xinyu and Wu, Lin and Wang, Xiaojie},
  journal = {Pattern Recognition},
  pages = {111359},
  year = {2025},
  publisher = {Elsevier},
}

CVPR

VRP-SAM: SAM with visual reference prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, and 5 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

@inproceedings{sun2024vrp,
  title = {VRP-SAM: SAM with visual reference prompt},
  author = {Sun, Yanpeng and Chen, Jiahui and Zhang, Shan and Zhang, Xinyu and Chen, Qiang and Zhang, Gang and Ding, Errui and Wang, Jingdong and Li, Zechao},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages = {23565--23574},
  year = {2024},
}

ICCV

Unified pre-training with pseudo texts for text-to-image person re-identification

Zhiyin Shao^*, Xinyu Zhang^*, Changxing Ding, Jian Wang, and 1 more author

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

@inproceedings{shao2023unified,
  title = {Unified pre-training with pseudo texts for text-to-image person re-identification},
  author = {Shao, Zhiyin and Zhang, Xinyu and Ding, Changxing and Wang, Jian and Wang, Jingdong},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages = {11174--11184},
  year = {2023},
}

TIP

A real-time memory updating strategy for unsupervised person re-identification

Junhui Yin, Xinyu Zhang, Zhanyu Ma, Jun Guo, and 1 more author

IEEE Transactions on Image Processing, 2023

@article{yin2023real,
  title = {A real-time memory updating strategy for unsupervised person re-identification},
  author = {Yin, Junhui and Zhang, Xinyu and Ma, Zhanyu and Guo, Jun and Liu, Yifan},
  journal = {IEEE Transactions on Image Processing},
  volume = {32},
  pages = {2309--2321},
  year = {2023},
  publisher = {IEEE},
}

TMM

STAT: Multi-object tracking based on spatio-temporal topological constraints

Junjie Zhang, Mingyan Wang, Haoran Jiang, Xinyu Zhang, and 2 more authors

IEEE Transactions on Multimedia, 2023

@article{zhang2023stat,
  title = {STAT: Multi-object tracking based on spatio-temporal topological constraints},
  author = {Zhang, Junjie and Wang, Mingyan and Jiang, Haoran and Zhang, Xinyu and Yan, Chenggang and Zeng, Dan},
  journal = {IEEE Transactions on Multimedia},
  year = {2023},
  publisher = {IEEE},
}

ACMMM

Learning granularity-unified representations for text-to-image person re-identification

Zhiyin Shao, Xinyu Zhang, Meng Fang, Zhifeng Lin, and 2 more authors

In Proceedings of the 30th ACM International Conference on Multimedia, 2022

@inproceedings{shao2022learning,
  title = {Learning granularity-unified representations for text-to-image person re-identification},
  author = {Shao, Zhiyin and Zhang, Xinyu and Fang, Meng and Lin, Zhifeng and Wang, Jian and Ding, Changxing},
  booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
  pages = {5566--5574},
  year = {2022},
}

ECCV

UFO: unified feature optimization

Teng Xi, Yifan Sun, Deli Yu, Bi Li, and 7 more authors

In European Conference on Computer Vision, 2022

@inproceedings{xi2022ufo,
  title = {UFO: unified feature optimization},
  author = {Xi, Teng and Sun, Yifan and Yu, Deli and Li, Bi and Peng, Nan and Zhang, Gang and Zhang, Xinyu and Wang, Zhigang and Chen, Jinwen and Wang, Jian and others},
  booktitle = {European Conference on Computer Vision},
  pages = {472--488},
  year = {2022},
  organization = {Springer},
}

IJCAI

Self-Guided Hard Negative Generation for Unsupervised Person Re-Identification

Dongdong Li, Zhigang Wang, Jian Wang, Xinyu Zhang, and 3 more authors

In IJCAI, 2022

@inproceedings{li2022self,
  title = {Self-Guided Hard Negative Generation for Unsupervised Person Re-Identification},
  author = {Li, Dongdong and Wang, Zhigang and Wang, Jian and Zhang, Xinyu and Ding, Errui and Wang, Jingdong and Zhang, Zhaoxiang},
  booktitle = {IJCAI},
  pages = {1067--1073},
  year = {2022},
}

TITS

Part-guided attention learning for vehicle instance retrieval

Xinyu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, and 2 more authors

IEEE Transactions on Intelligent Transportation Systems, 2020

@article{zhang2020part,
  title = {Part-guided attention learning for vehicle instance retrieval},
  author = {Zhang, Xinyu and Zhang, Rufeng and Cao, Jiewei and Gong, Dong and You, Mingyu and Shen, Chunhua},
  journal = {IEEE Transactions on Intelligent Transportation Systems},
  volume = {23},
  number = {4},
  pages = {3048--3060},
  year = {2020},
  publisher = {IEEE},
}

TITS

An extended filtered channel framework for pedestrian detection

Mingyu You, Yubin Zhang, Chunhua Shen, and Xinyu Zhang

IEEE Transactions on Intelligent Transportation Systems, 2018

@article{you2018extended,
  title = {An extended filtered channel framework for pedestrian detection},
  author = {You, Mingyu and Zhang, Yubin and Shen, Chunhua and Zhang, Xinyu},
  journal = {IEEE Transactions on Intelligent Transportation Systems},
  volume = {19},
  number = {5},
  pages = {1640--1651},
  year = {2018},
  publisher = {IEEE},
}

arXiv

Add-SD: Rational Generation without Manual Reference

Lingfeng Yang^*, Xinyu Zhang^*, Xiang Li, Jinwen Chen, and 6 more authors

arXiv preprint arXiv:2407.21016, 2024

@article{yang2024add,
  title = {Add-SD: Rational Generation without Manual Reference},
  author = {Yang, Lingfeng and Zhang, Xinyu and Li, Xiang and Chen, Jinwen and Yao, Kun and Zhang, Gang and Ding, Errui and Liu, Lingqiao and Wang, Jingdong and Yang, Jian},
  journal = {arXiv preprint arXiv:2407.21016},
  year = {2024},
}

arXiv

LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

Qiang Chen^*, Xiangbo Su^*, Xinyu Zhang^*, Jian Wang, and 7 more authors

arXiv preprint arXiv:2406.03459, 2024

@article{chen2024lw,
  title = {LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection},
  author = {Chen, Qiang and Su, Xiangbo and Zhang, Xinyu and Wang, Jian and Chen, Jiahui and Shen, Yunpeng and Han, Chuchu and Chen, Ziliang and Xu, Weixiang and Li, Fanrong and others},
  journal = {arXiv preprint arXiv:2406.03459},
  year = {2024},
}

arXiv

Improving multi-modal large language model through boosting vision capabilities

Yanpeng Sun, Huaxin Zhang, Qiang Chen, Xinyu Zhang, and 4 more authors

arXiv preprint arXiv:2410.13733, 2024

@article{sun2024improving,
  title = {Improving multi-modal large language model through boosting vision capabilities},
  author = {Sun, Yanpeng and Zhang, Huaxin and Chen, Qiang and Zhang, Xinyu and Sang, Nong and Zhang, Gang and Wang, Jingdong and Li, Zechao},
  journal = {arXiv preprint arXiv:2410.13733},
  year = {2024},
}

arXiv

Memorizing comprehensively to learn adaptively: Unsupervised cross-domain person re-id with multi-level memory

Xinyu Zhang, Dong Gong, Jiewei Cao, and Chunhua Shen

arXiv preprint arXiv:2001.04123, 2020

@article{zhang2020memorizing,
  title = {Memorizing comprehensively to learn adaptively: Unsupervised cross-domain person re-id with multi-level memory},
  author = {Zhang, Xinyu and Gong, Dong and Cao, Jiewei and Shen, Chunhua},
  journal = {arXiv preprint arXiv:2001.04123},
  year = {2020},
}

Services

Area Chair, ICCV 2025, WACV 2026

Outstanding Reviewer, CVPR 2025

Senior Program Committee Member, IJCAI 2025

Registration Chair, DICTA 2025

Guest Editor, Entropy 2025 (Topic: Rethinking Representation Learning in the Age of Large Models [submission deadline: 31 October 2025])

Program Committee Member, ICLR, ICML, NeurIPS, CVPR, ICCV, AAAI, IJCAI, ACMMM, ECCV, BMVC

Journal Reviewer, IEEE TPAMI, IJCV, IEEE TIP, TOMM, IEEE TNNLS, TMM, PR, Neurocomputing

Session Member and Award Panel Member, Sydney AI Meetup 2024

Teaching

2020, S2 - Guest Lecturer, COMP SCI 3314: Introduction to Statistical Machine Learning, The University of Adelaide

2019, S2 - Guest Lecturer, COMP SCI 3314: Introduction to Statistical Machine Learning, The University of Adelaide

2016, S1 - Teaching Assistant, 2080387: Pattern Recognition, Tongji University

2015, S2 - Teaching Assistant, 2080214: Machine Vision, Tongji University