Shared Predictive Cross-Modal Deep Quantization
Abstract
Due to high storage cost and low query efficiency, conventional similarity search methods cannot deal with large-scale multimodal data. To tackle the efficiency and storage challenges, we study compact coding, a promising solution for cross-modal similarity search, focusing especially on a common real-world cross-modal search scenario: image-to-text search. Compact coding methods transform high-dimensional data points into indexable short binary codes, with which similarity search can be executed very efficiently. Many recent studies have shown that quantization-based approaches generally perform better than hashing-based approaches on single-modal similarity search. In this paper, we propose a deep quantization approach, which is among the early attempts to leverage deep neural networks for quantization-based cross-modal similarity search.
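To make the compact-coding idea concrete, the following is a minimal, self-contained sketch of quantization-based encoding and search in the product-quantization style: database vectors are split into subvectors, each subvector is quantized against a small learned codebook, and queries are answered with per-subspace distance lookup tables. All function names and parameter choices are illustrative assumptions; this is not the SPDQ model itself.

```python
"""Minimal sketch of quantization-based compact coding for similarity search.

Database vectors are split into M subvectors; each subvector is replaced by the
index of its nearest codeword, so a vector is stored as M small integer codes.
At query time, distances are approximated with per-subspace lookup tables.
(Illustrative product-quantization-style example; not the paper's SPDQ model.)
"""
import numpy as np


def train_codebooks(X, M=4, K=64, iters=20, seed=0):
    """Learn one K-word codebook per subvector block with plain Lloyd iterations."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ds = d // M                           # dimensionality of each subvector
    codebooks = []
    for m in range(M):
        sub = X[:, m * ds:(m + 1) * ds]
        centers = sub[rng.choice(n, K, replace=False)].copy()
        for _ in range(iters):
            # assign each subvector to its nearest centroid
            d2 = ((sub[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = d2.argmin(1)
            # recompute centroids (keep the old centroid if a cluster is empty)
            for k in range(K):
                pts = sub[assign == k]
                if len(pts):
                    centers[k] = pts.mean(0)
        codebooks.append(centers)
    return codebooks


def encode(X, codebooks):
    """Replace each subvector by the index of its nearest codeword (M bytes per vector)."""
    ds = codebooks[0].shape[1]
    codes = np.empty((X.shape[0], len(codebooks)), dtype=np.uint8)
    for m, centers in enumerate(codebooks):
        sub = X[:, m * ds:(m + 1) * ds]
        d2 = ((sub[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        codes[:, m] = d2.argmin(1)
    return codes


def search(query, codes, codebooks, topk=5):
    """Asymmetric distance computation: table lookups instead of full-vector distances."""
    ds = codebooks[0].shape[1]
    # per-subspace table of squared distances from the query to every codeword
    tables = np.stack([((query[m * ds:(m + 1) * ds] - c) ** 2).sum(1)
                       for m, c in enumerate(codebooks)])
    dists = tables[np.arange(len(codebooks)), codes].sum(1)
    return np.argsort(dists)[:topk]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    database = rng.normal(size=(2000, 64)).astype(np.float32)
    books = train_codebooks(database, M=4, K=64)
    compact = encode(database, books)            # 4 bytes per 64-d vector
    print(search(database[0], compact, books))   # index 0 should rank first
```

The same compact codes can index either modality once both are mapped into a common representation, which is what the cross-modal setting requires.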
Conclusion
In this paper, we proposed a novel quantization approach, shared predictive deep quantization (SPDQ), for efficient cross-modal similarity search.
The superiority of the proposed approach lies in: 1) exploiting a deep neural network to construct a shared subspace across different modalities and a private subspace for each modality, in which the correlations between multiple modalities can be well discovered while the specific characteristics of each modality are maintained; and 2) introducing label alignment into the quantization training procedure, thus preserving the semantic similarities of image–text pairs and greatly improving the search accuracy (both ideas are illustrated in the sketch after this list).
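As a rough illustration of these two points, the PyTorch-style sketch below shows one way a two-branch network could produce shared and modality-private representations, and how a label-alignment term could be added to a quantization objective. The layer sizes, module names, and loss formulation are assumptions made for illustration, not the paper's published architecture or objective.

```python
"""Rough sketch of a shared/private subspace encoder with a label-alignment
term in the quantization training loss (illustrative only)."""
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedPrivateEncoder(nn.Module):
    """Maps image and text features into one shared subspace and two private ones."""

    def __init__(self, img_dim=4096, txt_dim=1024, shared_dim=128, private_dim=64):
        super().__init__()
        self.img_shared = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                        nn.Linear(512, shared_dim))
        self.txt_shared = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                        nn.Linear(512, shared_dim))
        self.img_private = nn.Linear(img_dim, private_dim)   # modality-specific details
        self.txt_private = nn.Linear(txt_dim, private_dim)

    def forward(self, img_feat, txt_feat):
        return (self.img_shared(img_feat), self.txt_shared(txt_feat),
                self.img_private(img_feat), self.txt_private(txt_feat))


def label_alignment_loss(shared_img, shared_txt, labels):
    """Pull semantically similar image/text representations together.

    `labels` is a multi-hot matrix; two samples count as similar if they share
    at least one label. This is one plausible alignment term, not the paper's.
    """
    sim_gt = (labels @ labels.t() > 0).float()        # semantic affinity
    sim_pred = shared_img @ shared_txt.t()            # cross-modal similarity logits
    return F.binary_cross_entropy_with_logits(sim_pred, sim_gt)


def quantization_loss(shared, codebook, codes):
    """Squared reconstruction error between representations and their codewords."""
    return F.mse_loss(shared, codebook[codes])


# Usage: combine alignment and quantization terms into one training objective.
enc = SharedPrivateEncoder()
img = torch.randn(8, 4096)
txt = torch.randn(8, 1024)
labels = (torch.rand(8, 24) > 0.8).float()
codebook = torch.randn(256, 128)                      # 256 codewords in the shared space

s_img, s_txt, p_img, p_txt = enc(img, txt)
codes = torch.cdist(s_img.detach(), codebook).argmin(dim=1)   # nearest-codeword assignment
loss = (label_alignment_loss(s_img, s_txt, labels)
        + quantization_loss(s_img, codebook, codes)
        + quantization_loss(s_txt, codebook, codes))
loss.backward()
```

Sharing one codebook (and one code per image–text pair) in the shared subspace is one way to make codes comparable across modalities; the private branches are kept out of the quantized representation so that modality-specific detail does not disturb the cross-modal codes.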
Experimental results on two benchmark multimodal data sets demonstrate that the proposed approach surpasses existing methods.