Shared Predictive Cross-Modal Deep Quantization


Abstract

This paper presents a deep compact code learning solution for efficient cross-modal similarity search. Conventional cross-modal search methods cannot cope with large-scale multimodal data because of their high storage cost and low query efficiency. To tackle these efficiency and storage challenges, we study compact coding, a promising approach to cross-modal similarity search, focusing on a common real-world scenario: image-to-text search. Compact coding methods transform high-dimensional data points into indexable short binary codes, with which similarity search can be executed very efficiently. Many recent studies have shown that quantization-based approaches generally perform better than hashing-based approaches on single-modal similarity search. Motivated by this, we propose a deep quantization approach, which is among the early attempts to leverage deep neural networks for quantization-based cross-modal similarity search.
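To make the compact coding idea concrete, the following is a minimal sketch of a product-quantization-style encoder with asymmetric-distance search, written in Python/NumPy. The codebook sizes, the plain Lloyd iterations, and the function names (train_codebooks, encode, search) are illustrative assumptions, not the quantization scheme proposed in this paper.

```python
import numpy as np

def train_codebooks(X, n_subspaces=4, n_codewords=256, n_iters=20, seed=0):
    """Learn one k-means codebook per subspace of the feature vector."""
    rng = np.random.default_rng(seed)
    d_sub = X.shape[1] // n_subspaces
    codebooks = []
    for m in range(n_subspaces):
        sub = X[:, m * d_sub:(m + 1) * d_sub]
        # Plain Lloyd iterations; a real system would use a k-means library.
        centers = sub[rng.choice(len(sub), n_codewords, replace=False)]
        for _ in range(n_iters):
            dists = ((sub[:, None, :] - centers[None]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for k in range(n_codewords):
                pts = sub[assign == k]
                if len(pts):
                    centers[k] = pts.mean(0)
        codebooks.append(centers)
    return codebooks

def encode(X, codebooks):
    """Map each vector to a short code: one codeword index per subspace."""
    d_sub = X.shape[1] // len(codebooks)
    codes = np.empty((len(X), len(codebooks)), dtype=np.uint8)
    for m, C in enumerate(codebooks):
        sub = X[:, m * d_sub:(m + 1) * d_sub]
        codes[:, m] = ((sub[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
    return codes

def search(query, codes, codebooks, topk=5):
    """Asymmetric distance: compare the raw query against quantized items
    via per-subspace lookup tables, so no database vector is decompressed."""
    d_sub = query.shape[0] // len(codebooks)
    tables = [((query[m * d_sub:(m + 1) * d_sub] - C) ** 2).sum(-1)
              for m, C in enumerate(codebooks)]
    dists = sum(tables[m][codes[:, m]] for m in range(len(codebooks)))
    return np.argsort(dists)[:topk]
```

The key efficiency property is visible in `search`: each database item costs only a handful of table lookups and additions, rather than a full high-dimensional distance computation.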

Conclusion

In this paper, we proposed a novel quantization approach, named SPDQ (Shared Predictive Cross-Modal Deep Quantization), for efficient cross-modal similarity search.

The superiority of the proposed approach lies in: 1) exploiting a deep neural network to construct the shared subspace across different modalities and the private subspace for each modality, in which the correlations between multiple modalities can be well discovered and the specific characteristics of each modality can also be maintained; and 2) introducing label alignment into the quantization training procedure, thus preserving the semantic similarities of image–text pairs and greatly improving the search accuracy. A minimal sketch of these two components is given below.
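The following is a minimal, illustrative PyTorch sketch of how a shared subspace, per-modality private subspaces, and a label alignment term might be wired together. The layer sizes (img_dim, txt_dim, sub_dim), the orthogonality penalty, and the loss weights alpha and beta are assumptions for illustration, not the paper's exact SPDQ formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPrivateEncoder(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=1024, sub_dim=128, n_classes=10):
        super().__init__()
        # Modality-specific encoders map each input into the shared subspace.
        self.img_shared = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                        nn.Linear(512, sub_dim))
        self.txt_shared = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                        nn.Linear(512, sub_dim))
        # Private branches retain modality-specific characteristics.
        self.img_private = nn.Linear(img_dim, sub_dim)
        self.txt_private = nn.Linear(txt_dim, sub_dim)
        # Label predictor used for alignment in the shared subspace.
        self.classifier = nn.Linear(sub_dim, n_classes)

    def forward(self, img, txt):
        zi, zt = self.img_shared(img), self.txt_shared(txt)
        pi, pt = self.img_private(img), self.txt_private(txt)
        return zi, zt, pi, pt

def loss_fn(model, img, txt, labels, alpha=1.0, beta=0.1):
    zi, zt, pi, pt = model(img, txt)
    # Pull paired image/text representations together in the shared space.
    align = F.mse_loss(zi, zt)
    # Label alignment: both shared representations should predict the label.
    cls = F.cross_entropy(model.classifier(zi), labels) + \
          F.cross_entropy(model.classifier(zt), labels)
    # Encourage private parts to stay orthogonal to the shared parts.
    ortho = (zi * pi).sum(1).pow(2).mean() + (zt * pt).sum(1).pow(2).mean()
    return align + alpha * cls + beta * ortho
```

In this sketch, the alignment and classification terms encourage semantically similar image–text pairs to land close together in the shared subspace, while the orthogonality penalty keeps the private branches from duplicating shared information.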

Experimental results on two benchmark multimodal datasets demonstrate that the proposed approach outperforms existing methods.