Bo Zhang
Short Bio
I am currently a ZJU 100 Young Professor (Ph.D. supervisor) at Zhejiang University. Previously, I served as a senior researcher in the Visual Computing Group of Microsoft Research Asia (MSRA) and as an AI research scientist at DeepSeek.
I received my Ph.D. degree from the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology (HKUST) in 2019. Prior to that, I received my Bachelor of Engineering degree from Zhejiang University in 2013. I joined Microsoft Research Asia in May 2019.
My research interests include 2D/3D content creation, virtual human modeling, multimodal models, and embodied intelligence. My work has contributed to the field of content generation, including the high-quality image translation CoCosNet series (with CoCosNet v2 being a CVPR 2021 Best Paper nominee), the industry’s first text-to-image generation diffusion model VQ-Diffusion, the first high-quality 3D diffusion generation model Rodin, the 3D generation technology DreamCraft3D, and the well-known open-source multimodal large model DeepSeek-VL. Additionally, our work “Bringing Old Photos Back to Life” was listed as one of the top 30 AI advancements in 2020 by the renowned AI media outlet louisbouchard.ai.
We always welcome self-motivated PhD candidates, Master’s and Bachelor’s students, as well as postdocs and research assistants. We are also looking for research collaborations with industry and research labs. Feel free to reach out at [email protected].
For prospective students, I am looking for:
- Positive Attitude: I hope you are a nice person; ambitious, caring, energetic, and responsible. I look forward to us resonating well together.
- Strong Mathematical Skills: Ability to abstract problems and a solid foundation in mathematics.
- Excellent Programming Skills: Ability to implement ideas quickly and to do the necessary, sometimes “dirty”, but important work.
- Quality over Quantity: Willingness to invest time in producing high-quality work that you can be proud of, aiming for one piece of work that counts as three.
- Ownership of Your PhD: This is your PhD. Take full responsibility right from the start and avoid a dependency mindset to empower yourself.
- Commitment to Creation: Willingness to genuinely create (like composing music à la Mozart), rather than picking the low-hanging fruit or chasing after metrics.
Publications
(†Intern student, *Equal Contribution)
DeepSeek AI. “DeepSeek-VL: Towards Real-World Vision-Language Understanding”. (Project lead of DeepSeek-VL) [project][paper] 🔥 (prestigious multimodal language model)
DeepSeek AI. “DeepSeek LLM: Scaling Open-source Language Models with Longtermism”. [project][paper] (prestigious open-source large language model)
Jingxiang Sun†, Bo Zhang✉, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, Yebin Liu✉. “DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior”, International Conference on Learning Representations (ICLR 2024). [project][paper][bibtex]
@article{sun2023dreamcraft3d,
title={Dreamcraft3d: Hierarchical 3d generation with bootstrapped diffusion prior},
author={Sun, Jingxiang and Zhang, Bo and Shao, Ruizhi and Wang, Lizhen and Liu, Wen and Xie, Zhenda and Liu, Yebin},
journal={arXiv preprint arXiv:2310.16818},
year={2023}
}
Junshu Tang†, Tengfei Wang†, Bo Zhang✉, Ting Zhang, Ran Yi, Lizhuang Ma, Dong Chen. “Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior”, arXiv preprint. [project][paper][bibtex]
@article{tang2023make,
title={Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior},
author={Tang, Junshu and Wang, Tengfei and Zhang, Bo and Zhang, Ting and Yi, Ran and Ma, Lizhuang and Chen, Dong},
journal={arXiv preprint arXiv:2303.14184},
year={2023}
}
Tengfei Wang†*, Bo Zhang*✉, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo. “Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion”, 2023 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2023 Highlight). [project][paper][bibtex]
@article{wang2022rodin,
title={Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion},
author={Tengfei Wang and Bo Zhang and Ting Zhang and Shuyang Gu and Jianmin Bao and Tadas Baltrusaitis and Jingjing Shen and Dong Chen and Fang Wen and Qifeng Chen and Baining Guo},
journal={arXiv preprint arXiv:2212.06135},
year={2022}
}
Binxin Yang†, Shuyang Gu, Bo Zhang✉, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen. “Paint by Example: Exemplar-based Image Editing with Diffusion Models”, 2023 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2023). [paper][code][demo][bibtex]
@article{yang2022paint,
title={Paint by Example: Exemplar-based Image Editing with Diffusion Models},
author={Yang, Binxin and Gu, Shuyang and Zhang, Bo and Zhang, Ting and Chen, Xuejin and Sun, Xiaoyan and Chen, Dong and Wen, Fang},
journal={arXiv preprint arXiv:2211.13227},
year={2022}
}
Bowen Zhang†*, Chenyang Qi, Pan Zhang, Bo Zhang✉, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen. “MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation”, 2023 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2023). [project][paper][bibtex]
@article{zhang2022metaportrait,
title={MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation},
author={Zhang, Bowen and Qi, Chenyang and Zhang, Pan and Zhang, Bo and Wu, HsiangTao and Chen, Dong and Chen, Qifeng and Wang, Yong and Wen, Fang},
journal={arXiv preprint arXiv:2212.08062},
year={2022}
}
Junshu Tang†, Bo Zhang✉, Binxin Yang†, Ting Zhang, Dong Chen, Lizhuang Ma✉, Fang Wen. “3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation”, arXiv preprint. [project][paper][code][bibtex]
@article{tang2022explicitly,
title={Explicitly Controllable 3D-Aware Portrait Generation},
author={Tang, Junshu and Zhang, Bo and Yang, Binxin and Zhang, Ting and Chen, Dong and Ma, Lizhuang and Wen, Fang},
journal={arXiv preprint arXiv:2209.05434},
year={2022}
}
Tengfei Wang†, Ting Zhang, Bo Zhang✉, Hao Ouyang†, Dong Chen, Qifeng Chen, Fang Wen. “Pretraining is All You Need for Image-to-Image Translation”, arXiv preprint. [project][paper][code][demo][bibtex]
@article{wang2022pretraining,
title={Pretraining is All You Need for Image-to-Image Translation},
author={Wang, Tengfei and Zhang, Ting and Zhang, Bo and Ouyang, Hao and Chen, Dong and Chen, Qifeng and Wen, Fang},
journal={arXiv preprint arXiv:2205.12952},
year={2022}
}
Hao Ouyang†, Bo Zhang✉, Pan Zhang†, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen✉, Fang Wen. “Real-Time Neural Character Rendering with Pose-Guided Multiplane Images”, European Conference on Computer Vision (ECCV 2022). [project][paper][code][youtube video][bibtex]
@article{ouyang2022real,
title={Real-Time Neural Character Rendering with Pose-Guided Multiplane Images},
author={Ouyang, Hao and Zhang, Bo and Zhang, Pan and Yang, Hao and Yang, Jiaolong and Chen, Dong and Chen, Qifeng and Wen, Fang},
journal={arXiv preprint arXiv:2204.11820},
year={2022}
}
Bowen Zhang†, Shuyang Gu†, Bo Zhang✉, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo. “StyleSwin: Transformer-based GAN for High-resolution Image Generation”, 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022). [paper][code][demo][bibtex]
@article{zhang2021styleswin,
title={StyleSwin: Transformer-based GAN for High-resolution Image Generation},
author={Bowen Zhang and Shuyang Gu and Bo Zhang and Jianmin Bao and Dong Chen and Fang Wen and Yong Wang and Baining Guo},
journal={arXiv preprint arXiv:2112.10762},
year={2021}
}
Shuyang Gu†, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo. “Vector Quantized Diffusion Model for Text-to-Image Synthesis”, 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022 Oral). [paper][code][Huggingface API][huggingface blog][bibtex]
@article{gu2021vector,
title={Vector Quantized Diffusion Model for Text-to-Image Synthesis},
author={Gu, Shuyang and Chen, Dong and Bao, Jianmin and Wen, Fang and Zhang, Bo and Chen, Dongdong and Yuan, Lu and Guo, Baining},
journal={arXiv preprint arXiv:2111.14822},
year={2021}
}
Ziyu Wan, Bo Zhang, Dongdong Chen, Jing Liao. “Bringing Old Films Back to Life”, 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022). [project][paper][code][bibtex]
@inproceedings{wan2022bringing,
title={Bringing Old Films Back to Life},
author={Wan, Ziyu and Zhang, Bo and Chen, Dongdong and Liao, Jing},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={17694--17703},
year={2022}
}
Pan Zhang†, Bo Zhang✉, Ting Zhang, Dong Chen, Fang Wen. “Robust Mutual Learning for Semi-supervised Semantic Segmentation”, arXiv preprint, May 2021. [paper][bibtex]
@article{zhang2021robust,
title={Robust Mutual Learning for Semi-supervised Semantic Segmentation},
author={Zhang, Pan and Zhang, Bo and Zhang, Ting and Chen, Dong and Wen, Fang},
journal={arXiv preprint arXiv:2106.00609},
year={2021}
}
Xiaoyu Li, Bo Zhang✉, Jing Liao, Pedro V. Sander. “Let’s See Clearly: Contaminant Artifact Removal for Moving Cameras”, 2021 International Conference on Computer Vision (ICCV 2021). [paper][bibtex]
@article{li2021let,
title={Let's See Clearly: Contaminant Artifact Removal for Moving Cameras},
author={Li, Xiaoyu and Zhang, Bo and Liao, Jing and Sander, Pedro V},
journal={arXiv preprint arXiv:2104.08852},
year={2021}
}
Chulin Xie†*, Chuxin Wang†*, Bo Zhang✉, Hao Yang, Dong Chen, Fang Wen. “Style-based Point Generator with Adversarial Rendering for Point Cloud Completion”, 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021). [project][paper][code][bibtex]
@article{xie2021style,
title={Style-based Point Generator with Adversarial Rendering for Point Cloud Completion},
author={Xie, Chulin and Wang, Chuxin and Zhang, Bo and Yang, Hao and Chen, Dong and Wen, Fang},
journal={arXiv preprint arXiv:2103.02535},
year={2021}
}
Xingran Zhou†, Bo Zhang✉, Ting Zhang, Pan Zhang, Jianmin Bao, Dong Chen, Zhongfei Zhang, Fang Wen. “Full-Resolution Correspondence Learning for Image Translation”, 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021 Oral, best paper candidate). [paper][code][bibtex]
@article{zhou2020full,
title={Full-Resolution Correspondence Learning for Image Translation},
author={Zhou, Xingran and Zhang, Bo and Zhang, Ting and Zhang, Pan and Bao, Jianmin and Chen, Dong and Zhang, Zhongfei and Wen, Fang},
journal={arXiv preprint arXiv:2012.02047},
year={2020}
}
Pan Zhang†, Bo Zhang✉, Ting Zhang, Dong Chen, Yong Wang, Fang Wen. “Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation”, 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021). [paper][code][bibtex]
@article{zhang2021prototypical,
title={Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation},
author={Zhang, Pan and Zhang, Bo and Zhang, Ting and Chen, Dong and Wang, Yong and Wen, Fang},
journal={arXiv preprint arXiv:2101.10979},
year={2021}
}
Xiaoyu Li, Bo Zhang, Jing Liao, Pedro V. Sander. “Deep Sketch-guided Cartoon Video Inbetweening”, IEEE Transactions on Visualization and Computer Graphics (TVCG 2021). [paper][code][Youtube][bibtex]
@article{li2021deep,
title={Deep Sketch-guided Cartoon Video Inbetweening},
author={Li, Xiaoyu and Zhang, Bo and Liao, Jing and Sander, Pedro},
journal={IEEE Transactions on Visualization and Computer Graphics},
year={2021},
publisher={IEEE}
}
Ziyu Wan†, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen. “Old Photo Restoration via Deep Latent Space Translation”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. [paper][code] (the algorithm now supports high-resolution restoration) [bibtex]
@article{wan2020old,
title={Old Photo Restoration via Deep Latent Space Translation},
author={Wan, Ziyu and Zhang, Bo and Chen, Dongdong and Zhang, Pan and Chen, Dong and Liao, Jing and Wen, Fang},
journal={arXiv preprint arXiv:2009.07047},
year={2020}
}
Pan Zhang†, Bo Zhang✉, Dong Chen, Lu Yuan, Fang Wen. “Cross-domain Correspondence Learning for Exemplar-based Image Translation”, 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020 Oral). [project][paper][code][slides][bibtex]
@inproceedings{zhang2020cross,
title={Cross-domain correspondence learning for exemplar-based image translation},
author={Zhang, Pan and Zhang, Bo and Chen, Dong and Yuan, Lu and Wen, Fang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5143--5153},
year={2020}
}
Ziyu Wan†, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen. “Bringing Old Photos Back to Life”, 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020 Oral). [project][paper][code][supplementary] (you are welcome to try our Colab demo) [bibtex]
@inproceedings{wan2020bringing,
title={Bringing old photos back to life},
author={Wan, Ziyu and Zhang, Bo and Chen, Dongdong and Zhang, Pan and Chen, Dong and Liao, Jing and Wen, Fang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2747--2757},
year={2020}
}
Xiaoyu Li, Bo Zhang, Jing Liao, Pedro V. Sander. “Document Rectification and Illumination Correction using a Patch-based CNN”, ACM Transactions on Graphics 38(6), 168:1-168:11 (SIGGRAPH Asia 2019). [project][paper][code][bibtex]
@article{li2019document,
title={Document rectification and illumination correction using a patch-based CNN},
author={Li, Xiaoyu and Zhang, Bo and Liao, Jing and Sander, Pedro V},
journal={ACM Transactions on Graphics (TOG)},
volume={38},
number={6},
pages={1--11},
year={2019},
publisher={ACM New York, NY, USA}
}
Bo Zhang, Mingming He, Jing Liao, Pedro V. Sander, Lu Yuan, Amine Bermak, Dong Chen. “Deep Exemplar-based Video Colorization”, 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019). [paper][slides][poster][code][youtube video demo][bibtex]
@inproceedings{zhang2019deep,
title={Deep exemplar-based video colorization},
author={Zhang, Bo and He, Mingming and Liao, Jing and Sander, Pedro V and Yuan, Lu and Bermak, Amine and Chen, Dong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={8052--8061},
year={2019}
}
Xiaoyu Li, Bo Zhang, Jing Liao, Pedro V. Sander. “Blind Geometric Distortion Correction on Images Through Deep Learning”, 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019). [project][paper][code][bibtex]
@inproceedings{li2019blind,
title={Blind geometric distortion correction on images through deep learning},
author={Li, Xiaoyu and Zhang, Bo and Sander, Pedro V and Liao, Jing},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={4855--4864},
year={2019}
}
Bo Zhang, Pedro V. Sander, Chi-Ying Tsui, and Amine Bermak. “Microshift: An Efficient Image Compression Algorithm for Hardware”, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2018. [paper][arxiv][code][bibtex]
@article{zhang2018microshift,
title={Microshift: An efficient image compression algorithm for hardware},
author={Zhang, Bo and Sander, Pedro V and Tsui, Chi-Ying and Bermak, Amine},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3430--3443},
year={2018},
publisher={IEEE}
}
Xiaopeng Zhong, Bo Zhang, Amine Bermak, Chi-Ying Tsui, Man-Kay Law. “A Low-Power Compression-Based CMOS Image Sensor With Microshift-Guided SAR ADC”, IEEE Transactions on Circuits and Systems II (TCAS-II), 2018. [paper][bibtex]
@article{zhong2018low,
title={A low-power compression-based CMOS image sensor with microshift-guided SAR ADC},
author={Zhong, Xiaopeng and Zhang, Bo and Bermak, Amine and Tsui, Chi-Ying and Law, Man-Kay},
journal={IEEE Transactions on Circuits and Systems II: Express Briefs},
volume={65},
number={10},
pages={1350--1354},
year={2018},
publisher={IEEE}
}
Bo Zhang, Pedro V. Sander, Amine Bermak. “Registration Based Retargeted Image Quality Assessment”, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017). [paper][code][bibtex]
@inproceedings{zhang2017registration,
title={Registration based retargeted image quality assessment},
author={Zhang, Bo and Sander, Pedro V and Bermak, Amine},
booktitle={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1258--1262},
year={2017},
organization={IEEE}
}
Bo Zhang, Pedro V. Sander, Amine Bermak. “Gradient Magnitude Similarity Deviation On Multiple Scales For Color Image Quality Assessment”, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017). [paper][bibtex]
@inproceedings{zhang2017gradient,
title={Gradient magnitude similarity deviation on multiple scales for color image quality assessment},
author={Zhang, Bo and Sander, Pedro V and Bermak, Amine},
booktitle={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1253--1257},
year={2017},
organization={IEEE}
}
Bo Zhang, Xiaopeng Zhong, Bo Wang, Pedro V. Sander, and Amine Bermak. “Wide Dynamic Range PSD Algorithms and Their Implementation for Compressive Imaging”, in 2016 IEEE International Symposium on Circuits and Systems (ISCAS 2016). [paper][bibtex]
@inproceedings{zhang2016wide,
title={Wide dynamic range PSD algorithms and their implementation for compressive imaging},
author={Zhang, Bo and Zhong, Xiaopeng and Wang, Bo and Sander, Pedro V and Bermak, Amine},
booktitle={2016 IEEE International Symposium on Circuits and Systems (ISCAS)},
pages={2727--2730},
year={2016},
organization={IEEE}
}
Xiaopeng Zhong, Bo Zhang, and Amine Bermak. “A Background Subtraction based Column-parallel Analog-to-information Converter for Motion-triggered Vision Sensor”, in 2016 IEEE International Symposium on Circuits and Systems (ISCAS 2016). [paper][bibtex]
@inproceedings{zhong2016background,
title={A background subtraction based column-parallel analog-to-information converter for motion-triggered vision sensor},
author={Zhong, Xiaopeng and Zhang, Bo and Bermak, Amine},
booktitle={2016 IEEE International Symposium on Circuits and Systems (ISCAS)},
pages={1426--1429},
year={2016},
organization={IEEE}
}
Dong Liu, Yongying Yang, Zhongtao Cheng, Hanlu Huang, Bo Zhang, Tong Ling, and Yibing Shen. “Retrieval and analysis of a polarized high-spectral-resolution lidar for profiling aerosol optical properties”, Optics Express, 2013. [paper][bibtex]
@article{liu2013retrieval,
title={Retrieval and analysis of a polarized high-spectral-resolution lidar for profiling aerosol optical properties},
author={Liu, Dong and Yang, Yongying and Cheng, Zhongtao and Huang, Hanlu and Zhang, Bo and Ling, Tong and Shen, Yibing},
journal={Optics express},
volume={21},
number={11},
pages={13084--13093},
year={2013},
publisher={Optical Society of America}
}
Dong Liu, Yongying Yang, Zhongtao Cheng, Hanlu Huang, Bo Zhang, and Yibing Shen. “Development of the ZJU polarized near-infrared high spectral resolution lidar”, in International Symposium on Photoelectronic Detection and Imaging 2013. [paper][bibtex]
@inproceedings{liu2013development,
title={Development of the ZJU polarized near-infrared high spectral resolution lidar},
author={Liu, Dong and Yang, Yongying and Cheng, Zhongtao and Huang, Hanlu and Zhang, Bo and Shen, Yibing},
booktitle={International Symposium on Photoelectronic Detection and Imaging 2013: Laser Sensing and Imaging and Applications},
volume={8905},
pages={89052W},
year={2013},
organization={International Society for Optics and Photonics}
}
Academic Services
Conference Reviewer
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
International Conference on Computer Vision (ICCV)
European Conference on Computer Vision (ECCV)
ACM SIGGRAPH
ACM SIGGRAPH Asia
IEEE Winter Conference on Applications of Computer Vision (WACV)
Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
International Journal of Computer Vision (IJCV)
ACM Transactions on Graphics (TOG)
IEEE Transactions on Image Processing (TIP)
IEEE Transactions on Visualization and Computer Graphics (TVCG)
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
IEEE Transactions on Multimedia (TMM)
Computer Vision and Image Understanding (CVIU)
The Visual Computer (TVCJ)
Neural Computing
Visual Informatics
Honors and Awards
Tencent Rhino-bird Research Program (腾讯犀牛鸟计划)
Award of Excellence, Stars of Tomorrow Internship Program, Microsoft Research Asia (MSRA)
Graduate Research Scholarship of HKUST
Outstanding Undergraduate of Zhejiang University
Outstanding Undergraduate of Zhejiang Province, China
Bronze medal, International RoboCup 2013 competition
First prize, Optical Science Technology Competition of Zhejiang Province
Meritorious Winner (First prize), International Mathematical Contest in Modeling (MCM)
First prize, China Undergraduate Mathematical Contest in Modeling (CUMCM)
First prize, Undergraduate Physics Innovation Competition of Zhejiang Province
First prize, Undergraduate Calculus Competition of Zhejiang Province, China