Multi-Codebook Quantization (MCQ) is a generalization of existing codebook-based quantization methods for Approximate Nearest Neighbor (ANN) search. Specifically, MCQ picks one codeword from each sub-codebook independently and takes the sum of the picked codewords to approximate the original vector. Since its objective function involves no constraints, MCQ theoretically has the potential to achieve the best performance, because the solutions of other codebook-based quantization methods are all covered by MCQ's solution space under the same codebook size. However, finding the optimal solution to MCQ is proven to be NP-hard due to its encoding process, i.e., converting an input vector to a binary code. To tackle this, researchers either impose constraints to obtain near-optimal solutions or employ heuristic algorithms whose encoding remains time-consuming. Different from previous approaches, this paper makes the first attempt to find a deep solution to MCQ. The encoding network is designed to be as simple as possible, so the highly complex encoding problem reduces to a simple feed-forward pass. Compared with other methods on three datasets, ours achieves state-of-the-art performance. Notably, our method is 11×-38× faster than heuristic algorithms for encoding, which makes it more practical for real-world large-scale retrieval.
@article{DeepQ,author={Zhu, Xiaosu and Song, Jingkuan and Gao, Lianli and Gu, Xiaoyan and Shen, Heng Tao},title={Revisiting Multi-Codebook Quantization},journal={IEEE Trans. Image Process.},year={2023},}
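The MCQ formulation above (one codeword per sub-codebook, summed to approximate the input) can be illustrated with a minimal NumPy sketch. The codebook shapes and the greedy baseline encoder are assumptions for illustration only; they are not the paper's deep encoding network.

```python
# Minimal MCQ sketch: decode = sum of one codeword per sub-codebook;
# greedy_encode is an illustrative heuristic baseline (exact encoding is NP-hard).
import numpy as np

M, K, D = 4, 256, 128                     # sub-codebooks, codewords per book, vector dim (assumed)
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(M, K, D))    # C_1 ... C_M

def decode(codes, codebooks):
    """Reconstruct a vector as the sum of one selected codeword per sub-codebook."""
    return sum(codebooks[m, codes[m]] for m in range(len(codes)))

def greedy_encode(x, codebooks):
    """Greedy baseline: pick, codebook by codebook, the codeword closest to the residual."""
    residual, codes = x.copy(), []
    for m in range(codebooks.shape[0]):
        k = int(np.linalg.norm(codebooks[m] - residual, axis=1).argmin())
        codes.append(k)
        residual = residual - codebooks[m, k]
    return codes

x = rng.normal(size=D)
codes = greedy_encode(x, codebooks)
print(codes, np.linalg.norm(x - decode(codes, codebooks)))
```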
NeurIPS
A Lower Bound of Hash Codes’ Performance
Xiaosu Zhu, Jingkuan Song, Yu Lei, and 2 more authors
As a crucial approach for compact representation learning, hashing has achieved great success in effectiveness and efficiency. Numerous heuristic Hamming-space metric learning objectives have been designed to obtain high-quality hash codes. Nevertheless, a theoretical analysis of the criteria for learning good hash codes remains largely unexplored. In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance. Promoting these two characteristics can lift the bound and improve hash learning. We propose a surrogate model to fully exploit this objective by estimating the posterior of hash codes and further controlling it, which results in low-bias optimization. Extensive experiments reveal the effectiveness of the proposed method. By testing on a series of hashing methods, we obtain performance improvements on all of them, with an up to 26.5% increase in mean Average Precision and an up to 20.5% increase in accuracy.
@inproceedings{LowerBound,author={Zhu, Xiaosu and Song, Jingkuan and Lei, Yu and Gao, Lianli and Shen, Heng Tao},title={A Lower Bound of Hash Codes' Performance},booktitle={NeurIPS},year={2022},}
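The two characteristics named in the abstract can be measured directly on a set of binary codes. The sketch below computes intra-class compactness and inter-class distinctiveness with Hamming distance; it is purely illustrative and is not the paper's surrogate model or training objective.

```python
# Illustrative metrics only: lower intra-class distance (compactness) and
# higher inter-class distance (distinctiveness) indicate better hash codes.
import numpy as np

def hamming(a, b):
    """Pairwise Hamming distances between binary codes a (n, L) and b (m, L)."""
    return (a[:, None, :] != b[None, :, :]).sum(-1)

def compactness_and_distinctiveness(codes, labels):
    classes = np.unique(labels)
    intra, inter = [], []
    for c in classes:
        in_c, out_c = codes[labels == c], codes[labels != c]
        d_in = hamming(in_c, in_c)
        intra.append(d_in[np.triu_indices(len(in_c), k=1)].mean())
        inter.append(hamming(in_c, out_c).mean())
    return np.mean(intra), np.mean(inter)

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(100, 32))   # 100 random 32-bit codes (assumed)
labels = rng.integers(0, 5, size=100)
print(compactness_and_distinctiveness(codes, labels))
```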
CVPR
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
Xiaosu Zhu, Jingkuan Song, Lianli Gao, and 2 more authors
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression. Formally, the trade-off between rate and distortion is handled well if the priors and hyperpriors precisely describe the latent variables. Current practice only adopts univariate priors and processes each variable individually. However, we find that inter-correlations and intra-correlations exist when observing latent variables from a vectorized perspective. These findings reveal visual redundancies that can improve rate-distortion performance, as well as parallel processing ability that can speed up compression. This encourages us to propose a novel vectorized prior. Specifically, a multivariate Gaussian mixture is proposed with means and covariances to be estimated. Then, a novel probabilistic vector quantization is utilized to effectively approximate the means, and the remaining covariances are further reduced to a unified mixture and solved by cascaded estimation without context models. Furthermore, the codebooks involved in quantization are extended to multi-codebooks for complexity reduction, forming an efficient compression procedure. Extensive experiments on benchmark datasets against state-of-the-art methods indicate that our model achieves better rate-distortion performance and an impressive 3.18× compression speedup, enabling real-time, high-quality variational image compression in practice. Our source code is publicly available at https://github.com/xiaosu-zhu/McQuic.
@inproceedings{McQuic,author={Zhu, Xiaosu and Song, Jingkuan and Gao, Lianli and Zheng, Feng and Shen, Heng Tao},title={Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression},booktitle={CVPR},pages={17612--17621},year={2022},}
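The core idea in the abstract, a multivariate Gaussian mixture prior whose means come from a codebook selected by vector quantization, can be sketched as follows. The shapes, the diagonal covariance, and the hard nearest-mean assignment are illustrative assumptions, not the paper's actual entropy model.

```python
# Illustrative sketch: evaluate latents under a multivariate Gaussian mixture
# whose component means are codebook entries, and quantize by nearest mean.
import torch

K, D = 8, 16                       # mixture components / codebook size, latent dim (assumed)
codebook = torch.randn(K, D)       # component means come from the codebook
log_scales = torch.zeros(K, D)     # per-component diagonal log std devs
logits = torch.zeros(K)            # mixture weights

def mixture_log_likelihood(y):
    """log p(y) for latents y (N, D) under the diagonal Gaussian mixture."""
    dist = torch.distributions.Normal(codebook, log_scales.exp())   # (K, D)
    comp = dist.log_prob(y[:, None, :]).sum(-1)                     # (N, K)
    weights = torch.log_softmax(logits, dim=0)
    return torch.logsumexp(comp + weights, dim=-1)

def quantize(y):
    """Hard vector quantization: snap each latent vector to its nearest mean."""
    idx = torch.cdist(y, codebook).argmin(dim=-1)
    return codebook[idx], idx

y = torch.randn(4, D)
y_hat, idx = quantize(y)
print(idx, mixture_log_likelihood(y).mean())
```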