publications
publications by categories in reverse chronological order, generated by jekyll-scholar.
2023
- [IEEE TIP] Revisiting Multi-Codebook Quantization. Xiaosu Zhu, Jingkuan Song, Lianli Gao, and 2 more authors. IEEE Trans. Image Process., 2023.
Multi-Codebook Quantization (MCQ) is a generalized version of existing codebook-based quantizations for Approximate Nearest Neighbor (ANN) search. Specifically, MCQ picks one codeword from each sub-codebook independently and takes the sum of the picked codewords to approximate the original vector. The objective function involves no constraints; therefore, MCQ theoretically has the potential to achieve the best performance, because the solutions of other codebook-based quantization methods are all covered by MCQ’s solution space under the same codebook size setting. However, finding the optimal solution to MCQ has been proven to be NP-hard due to its encoding process, i.e., converting an input vector to a binary code. To tackle this, researchers either apply constraints to find near-optimal solutions or employ heuristic algorithms that are still time-consuming for encoding. Different from previous approaches, this paper makes the first attempt to find a deep solution to MCQ. The encoding network is designed to be as simple as possible, so the very complex encoding problem reduces to a simple feed-forward pass. Compared with other methods on three datasets, our method shows state-of-the-art performance. Notably, our method is 11×-38× faster than heuristic algorithms for encoding, which makes it more practical for real large-scale retrieval scenarios.
@article{DeepQ, author = {Zhu, Xiaosu and Song, Jingkuan and Gao, Lianli and Gu, Xiaoyan and Shen, Heng Tao}, title = {Revisiting Multi-Codebook Quantization}, journal = {IEEE Trans. Image Process.}, year = {2023}, }
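For readers unfamiliar with the MCQ formulation in the abstract above, here is a minimal NumPy sketch of the decoding rule (the sum of one codeword per sub-codebook) together with a brute-force encoder. The exhaustive search over all K**M codeword combinations is only feasible at toy sizes and is meant to illustrate why exact encoding does not scale; the paper's learned feed-forward encoder is not reproduced here, and all names and sizes below are illustrative assumptions.

```python
# Toy illustration of Multi-Codebook Quantization (MCQ): a vector is approximated
# by the sum of one codeword chosen from each of M sub-codebooks. Not the paper's
# method; the exhaustive encoder exists only to show the combinatorial cost.
import itertools
import numpy as np

M, K, D = 3, 8, 16                             # deliberately tiny: 8**3 = 512 combinations
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(M, K, D)) * 0.3   # codebooks[m, k] is codeword k of sub-codebook m

def decode(codes):
    """Reconstruction = sum of the selected codeword from each sub-codebook."""
    return sum(codebooks[m, k] for m, k in enumerate(codes))

def exhaustive_encode(x):
    """Optimal codes by enumerating all K**M combinations (intractable at real sizes)."""
    return min(itertools.product(range(K), repeat=M),
               key=lambda codes: np.linalg.norm(x - decode(codes)))

x = rng.normal(size=D)
codes = exhaustive_encode(x)
print(codes, round(float(np.linalg.norm(x - decode(codes))), 3))
```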
- [IEEE TIP] Spherical Centralized Quantization for Fast Image Retrieval. Jingkuan Song, Zhibin Zhang, Xiaosu Zhu, and 4 more authors. IEEE Trans. Image Process., 2023.
@article{SCQ, author = {Song, Jingkuan and Zhang, Zhibin and Zhu, Xiaosu and Zhao, Qike and Gao, Lianli and Wang, Meng and Shen, Heng Tao}, title = {Spherical Centralized Quantization for Fast Image Retrieval}, journal = {IEEE Trans. Image Process.}, year = {2023}, }
2022
- [NeurIPS] A Lower Bound of Hash Codes’ Performance. Xiaosu Zhu, Jingkuan Song, Yu Lei, and 2 more authors. In NeurIPS, 2022.
As a crucial approach for compact representation learning, hashing has achieved great success in effectiveness and efficiency. Numerous heuristic Hamming-space metric learning objectives have been designed to obtain high-quality hash codes. Nevertheless, a theoretical analysis of the criteria for learning good hash codes remains largely unexplored. In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes’ performance. Promoting these two characteristics could lift the bound and improve hash learning. We propose a surrogate model that fully exploits this objective by estimating the posterior of hash codes and further controlling it, which results in low-bias optimization. Extensive experiments reveal the effectiveness of the proposed method. By testing on a series of hashing methods, we obtain performance improvements across all of them, with up to a 26.5% increase in mean Average Precision and up to a 20.5% increase in accuracy.
@inproceedings{LowerBound, author = {Zhu, Xiaosu and Song, Jingkuan and Lei, Yu and Gao, Lianli and Shen, Heng Tao}, title = {A Lower Bound of Hash Codes' Performance}, booktitle = {NeurIPS}, year = {2022}, }
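As a rough companion to the abstract above, the following NumPy sketch computes the two quantities the bound is built on, intra-class compactness and inter-class distinctiveness, as average Hamming distances over a labeled set of binary codes. This is my own illustration of the two criteria, not the paper's surrogate model or its bound; all sizes are assumptions.

```python
# Measure intra-class compactness (small is good) and inter-class distinctiveness
# (large is good) of {0,1} hash codes via average Hamming distances.
import numpy as np

def hamming(a, b):
    """Pairwise Hamming distances between two sets of binary codes."""
    return (a[:, None, :] != b[None, :, :]).sum(-1)

def compactness_and_distinctiveness(codes, labels):
    intra, inter = [], []
    for c in np.unique(labels):
        same, other = codes[labels == c], codes[labels != c]
        pairs = np.triu_indices(len(same), k=1)          # distinct same-class pairs
        intra.append(hamming(same, same)[pairs].mean())
        inter.append(hamming(same, other).mean())
    return float(np.mean(intra)), float(np.mean(inter))

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(100, 64))               # 100 samples, 64-bit codes
labels = rng.integers(0, 5, size=100)                     # 5 classes
print(compactness_and_distinctiveness(codes, labels))
```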
- [CVPR] Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression. Xiaosu Zhu, Jingkuan Song, Lianli Gao, and 2 more authors. In CVPR, 2022.
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression. Formally, the trade-off between rate and distortion is handled well if the priors and hyperpriors precisely describe the latent variables. Current practices only adopt univariate priors and process each variable individually. However, we find that inter-correlations and intra-correlations exist when latent variables are observed from a vectorized perspective. These findings reveal visual redundancies that can improve rate-distortion performance and a parallel processing ability that can speed up compression. This encourages us to propose a novel vectorized prior. Specifically, a multivariate Gaussian mixture is proposed, with means and covariances to be estimated. Then, a novel probabilistic vector quantization is utilized to effectively approximate the means, and the remaining covariances are further induced into a unified mixture and solved by cascaded estimation without involving context models. Furthermore, the codebooks involved in quantization are extended to multi-codebooks for complexity reduction, which formulates an efficient compression procedure. Extensive experiments on benchmark datasets against the state of the art indicate that our model has better rate-distortion performance and an impressive 3.18× compression speedup, giving us the ability to perform real-time, high-quality variational image compression in practice. Our source code is publicly available at https://github.com/xiaosu-zhu/McQuic.
@inproceedings{McQuic, author = {Zhu, Xiaosu and Song, Jingkuan and Gao, Lianli and Zheng, Feng and Shen, Heng Tao}, title = {Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression}, booktitle = {CVPR}, pages = {17612--17621}, year = {2022}, }
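The "probabilistic vector quantization" mentioned in the abstract above is, at its core, an assignment of each latent vector to codewords of a codebook. The sketch below is a generic soft/hard vector-quantization toy in NumPy, assuming made-up shapes and a softmax-over-distances relaxation; it is not the released McQuic implementation (see the linked repository for that).

```python
# Generic vector quantization of latent vectors against a codebook: a softmax over
# negative squared distances gives a soft (differentiable) assignment, while the
# argmin gives the hard indices that would actually be stored in a bitstream.
import numpy as np

def vector_quantize(z, codebook, temperature=1.0):
    """z: (N, D) latent vectors, codebook: (K, D). Returns (soft, hard, indices)."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (N, K) squared distances
    logits = -d2 / temperature
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                        # soft assignment weights
    indices = d2.argmin(-1)
    return probs @ codebook, codebook[indices], indices

rng = np.random.default_rng(0)
z, codebook = rng.normal(size=(8, 16)), rng.normal(size=(32, 16))
soft, hard, idx = vector_quantize(z, codebook)
print(idx, round(float(np.linalg.norm(z - hard, axis=1).mean()), 3))
```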
2021
- [ACM MM] Camera-Agnostic Person Re-Identification via Adversarial Disentangling Learning. Hao Ni, Jingkuan Song, Xiaosu Zhu, and 2 more authors. In ACM MM, 2021.
Despite the success of single-domain person re-identification (ReID), current supervised models degrade dramatically when deployed to unseen domains, mainly due to the discrepancy across cameras. To tackle this issue, we propose an Adversarial Disentangling Learning (ADL) framework to decouple camera-related and ID-related features, which can be readily used for camera-agnostic person ReID. ADL adopts a discriminative approach instead of the mainstream generative styles of disentangling methods, e.g., GAN- or VAE-based, because for the person ReID task only the information needed to discriminate IDs is required, and the additional information needed to generate images is redundant and may be noisy. Specifically, our model involves a feature separation module that encodes images into two separate feature spaces and a disentangled feature learning module that performs adversarial training to minimize mutual information. We design an effective solution to approximate and minimize mutual information by transforming it into a discrimination problem. The two modules are co-designed to obtain strong generalization ability using only the source dataset. Extensive experiments on three public benchmarks show that our method outperforms the state-of-the-art generalizable person ReID model by a large margin. Our code is publicly available at https://github.com/luckyaci/ADL_ReID.
@inproceedings{MM21, author = {Ni, Hao and Song, Jingkuan and Zhu, Xiaosu and Zheng, Feng and Gao, Lianli}, title = {Camera-Agnostic Person Re-Identification via Adversarial Disentangling Learning}, booktitle = {ACM MM}, pages = {2002--2010}, year = {2021}, }
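One common way to realize the adversarial disentangling described in the abstract above is a shared backbone with two projection heads plus a camera classifier trained through a gradient-reversal layer, so that ID features become uninformative about cameras. The PyTorch sketch below is my own minimal illustration under that assumption (layer sizes, head names, and the use of gradient reversal are all assumptions), not the released ADL_ReID code.

```python
# Minimal feature-disentangling sketch: ID head and camera head share a backbone;
# the camera classifier is additionally fed ID features through gradient reversal,
# pushing the ID branch to discard camera-specific information.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad                      # reverse gradients flowing into the encoder

class DisentangleNet(nn.Module):
    def __init__(self, in_dim=512, feat_dim=128, n_ids=100, n_cams=6):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.id_head = nn.Linear(256, feat_dim)
        self.cam_head = nn.Linear(256, feat_dim)
        self.id_cls = nn.Linear(feat_dim, n_ids)
        self.cam_cls = nn.Linear(feat_dim, n_cams)

    def forward(self, x):
        h = self.backbone(x)
        f_id, f_cam = self.id_head(h), self.cam_head(h)
        # the last output is the adversarial term: camera prediction from ID features
        return self.id_cls(f_id), self.cam_cls(f_cam), self.cam_cls(GradReverse.apply(f_id))

net = DisentangleNet()
id_logits, cam_logits, adv_cam_logits = net(torch.randn(4, 512))
print(id_logits.shape, cam_logits.shape, adv_cam_logits.shape)
```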
2020
- [SIGIR] 3D Self-Attention for Unsupervised Video Quantization. Jingkuan Song, Ruimin Lang, Xiaosu Zhu, and 3 more authors. In ACM SIGIR, 2020.
Unsupervised video quantization compresses original videos into compact binary codes so that video retrieval can be conducted efficiently. In this paper, we make a first attempt at combining quantization with video retrieval, called 3D-UVQ, which obtains high retrieval accuracy with low storage cost. In the proposed framework, we address two main problems: 1) how to design an effective pipeline that perceives video contextual information for video feature extraction; and 2) how to quantize these features for efficient retrieval. To tackle these problems, we propose a 3D self-attention module to exploit the spatial and temporal contextual information, where each pixel is influenced by its surrounding pixels. By taking a further recurrent operation, each pixel can finally capture the global context from all pixels. Then, we propose a gradient-based residual quantization, which consists of several quantization blocks that approximate the features gradually. Extensive experimental results on three benchmark datasets demonstrate that our method significantly outperforms the state of the art. An ablation study shows that both the 3D self-attention module and the gradient-based residual quantization improve retrieval performance. Our model is publicly available at https://github.com/brownwolf/3D-UVQ.
@inproceedings{SIGIR20, author = {Song, Jingkuan and Lang, Ruimin and Zhu, Xiaosu and Xu, Xing and Gao, Lianli and Shen, Heng Tao}, title = {3D Self-Attention for Unsupervised Video Quantization}, booktitle = {ACM SIGIR}, pages = {1061--1070}, year = {2020}, }
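To make the 3D self-attention idea in the abstract above concrete, here is a plain dot-product attention sketch in NumPy over all T x H x W positions of a video feature map, so that every position aggregates global spatio-temporal context. Shapes are assumptions, and the paper's actual module (including its recurrent operation and the gradient-based residual quantization) is not reproduced.

```python
# Each spatio-temporal position attends to every other position, mixing in global
# context; this is generic dot-product self-attention over flattened video features.
import numpy as np

def self_attention_3d(feat):
    """feat: (T, H, W, C) video features -> same shape with global context mixed in."""
    T, H, W, C = feat.shape
    x = feat.reshape(T * H * W, C)                  # flatten all positions
    scores = x @ x.T / np.sqrt(C)                   # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stabilization
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # row-wise softmax
    return (attn @ x).reshape(T, H, W, C)

rng = np.random.default_rng(0)
out = self_attention_3d(rng.normal(size=(4, 7, 7, 32)))
print(out.shape)                                    # (4, 7, 7, 32)
```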
2019
- [IJCAI] Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval. Lianli Gao, Xiaosu Zhu, Jingkuan Song, and 2 more authors. In IJCAI, 2019.
Product Quantization (PQ) has long been a mainstream approach for generating an exponentially large codebook at very low memory/time cost. Despite its success, PQ still struggles with the decomposition of high-dimensional vector spaces, and retraining the model is usually unavoidable when the code length changes. In this work, we propose a Deep Progressive Quantization (DPQ) model, as an alternative to PQ, for large-scale image retrieval. DPQ learns the quantization codes sequentially and approximates the original feature space progressively. Therefore, we can train quantization codes of different code lengths simultaneously. Specifically, we first utilize label information to guide the learning of visual features, and then apply several quantization blocks to progressively approach the visual features. Each quantization block is designed to be a layer of a convolutional neural network, and the whole framework can be trained in an end-to-end manner. Experimental results on the benchmark datasets show that our model significantly outperforms the state of the art for image retrieval. Our model is trained once for different code lengths and therefore requires less computation time. An additional ablation study demonstrates the effect of each component of our proposed model. Our code is released at this https URL.
@inproceedings{IJCAI191, author = {Gao, Lianli and Zhu, Xiaosu and Song, Jingkuan and Zhao, Zhou and Shen, Heng Tao}, title = {Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval}, booktitle = {IJCAI}, pages = {723--729}, year = {2019}, pdf = {https://www.ijcai.org/proceedings/2019/0102.pdf} }
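The key practical claim in the abstract above is that one progressively learned code serves every code length: truncating it to the first b blocks still yields a valid, coarser reconstruction. The NumPy toy below demonstrates that prefix property with a simple greedy residual encoder; it is an illustration of the idea, not the DPQ network, and all sizes and scales are assumptions.

```python
# Greedily pick one codeword per block against the running residual, then show that
# any prefix of the resulting code reconstructs the vector, with error that
# typically shrinks as more blocks are used.
import numpy as np

rng = np.random.default_rng(0)
B, K, D = 4, 64, 32                                # blocks, codewords per block, dim
codebooks = rng.normal(size=(B, K, D)) * 0.3
x = rng.normal(size=D)

codes, approx = [], np.zeros(D)
for book in codebooks:                             # progressive encoding
    k = int(np.argmin(np.linalg.norm(book - (x - approx), axis=1)))
    codes.append(k)
    approx += book[k]

for b in range(1, B + 1):                          # any prefix is a usable shorter code
    prefix = sum(codebooks[i, codes[i]] for i in range(b))
    print(b, "blocks -> error", round(float(np.linalg.norm(x - prefix)), 3))
```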
- [IJCAI] Deep Recurrent Quantization for Generating Sequential Binary Codes. Jingkuan Song, Xiaosu Zhu, Lianli Gao, and 3 more authors. In IJCAI, 2019.
Quantization has been an effective technique for ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirements of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode a dataset into different code lengths, existing methods need to train several models, where each model can only produce a specific code length. This incurs a considerable training time cost and largely reduces the flexibility of quantization methods to be deployed in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture which can generate sequential binary codes. To this end, once the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor are designed to be the learnable weights in the deep recurrent quantization block, and the whole framework can be trained in an end-to-end manner. As far as we know, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on the benchmark datasets show that our model achieves comparable or even better performance compared with the state of the art for image retrieval, while requiring significantly fewer parameters and less training time. Our code is published online: this https URL.
@inproceedings{IJCAI192, author = {Song, Jingkuan and Zhu, Xiaosu and Gao, Lianli and Xu, Xin{-}Shun and Liu, Wu and Shen, Heng Tao}, title = {Deep Recurrent Quantization for Generating Sequential Binary Codes}, booktitle = {IJCAI}, pages = {912--918}, year = {2019}, pdf = {https://www.ijcai.org/proceedings/2019/0128.pdf} }
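The recurrent idea in the abstract above can be illustrated with a single shared codebook reused at every step: each iteration quantizes the current residual with the codebook scaled by a per-step factor, and the code length simply equals the number of iterations. In the sketch below the scalar factors are fixed geometric values (in the paper they are learned), and the greedy nearest-codeword rule, sizes, and names are all assumptions rather than the DRQ model itself.

```python
# Recurrent quantization with one shared codebook: more iterations -> longer code,
# and the same trained components serve every code length.
import numpy as np

def recurrent_quantize(x, codebook, scales):
    """codebook: (K, D) shared across steps; scales: one scalar factor per step."""
    residual, codes, approx = x.copy(), [], np.zeros_like(x)
    for s in scales:                                        # one code index per iteration
        k = int(np.argmin(np.linalg.norm(s * codebook - residual, axis=1)))
        codes.append(k)
        approx += s * codebook[k]
        residual = x - approx
    return codes, approx

rng = np.random.default_rng(0)
codebook, x = rng.normal(size=(256, 64)), rng.normal(size=64)
for steps in (1, 2, 4):                                     # code length = iteration count
    scales = [0.5 * 0.7 ** t for t in range(steps)]         # fixed here, learned in the paper
    codes, approx = recurrent_quantize(x, codebook, scales)
    print(len(codes), "iterations -> error", round(float(np.linalg.norm(x - approx)), 3))
```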