
A Mixed Generative-Discriminative Based Hashing Method
Abstract
Hashing methods in data mining have proven to be useful for a variety of tasks and have attracted extensive attention in recent years. Various hashing approaches have been proposed to capture similarities between textual, visual, and cross-media information. However, most of the existing works use a bag-of-words model to represent textual information. Since words with different forms may have similar meanings, semantic-level text similarities cannot be handled well by these methods.
To address these challenges, in this paper we propose a novel method called semantic cross-media hashing (SCMH), which uses continuous word representations to capture textual similarity at the semantic level and a deep belief network (DBN) to construct the correlation between different modalities. To demonstrate its effectiveness, we evaluate the proposed method on three commonly used cross-media data sets. Experimental results show that the proposed method achieves significantly better performance than state-of-the-art approaches. Moreover, its efficiency is comparable to or better than that of other hashing methods.
INTRODUCTION
Along with other growing requirements, social networking has received significant attention in recent years. Nowadays, digital information is very easy to access, modify, and duplicate. As mobile networks and social media sites expand, information can arrive through many channels. Images and videos are annotated with short tags or captions, which gives rise to a large amount of related data with semantic correlations. Hashing-based methods exploit these correlations, making information retrieval and duplicate detection practical. Cross-media retrieval is a type of retrieval in which the user's query and the returned results can be of different modalities.
Therefore, it is desirable to support the retrieval of information across different modalities. For example, images can be used to find semantically relevant textual information; conversely, images without (or with little) textual description need to be retrievable through a textual query. Most of the existing works use a bag-of-words model for textual information, and the semantic-level similarities between words or documents are rarely considered. In short text segments (e.g., microblogs, captions, and tags), the similarities between words are especially important for retrieval, since words with different forms may have similar meanings, for example journey versus travel, or coast versus shore. According to human-assigned similarity judgments, more than 90 percent of subjects thought that these pairs of words had similar meanings. Hence, a method is needed that relates the textual and visual modalities at the semantic level.
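The semantic closeness of such word pairs is what continuous word representations capture. As a minimal illustration (the four-dimensional vectors below are made up for the example, not taken from a trained embedding model), cosine similarity over embeddings ranks journey/travel and coast/shore as close pairs:

```python
import math

# Toy 4-dimensional embeddings; the values are illustrative only.
embeddings = {
    "journey": [0.8, 0.1, 0.3, 0.2],
    "travel":  [0.7, 0.2, 0.4, 0.1],
    "coast":   [0.1, 0.9, 0.2, 0.6],
    "shore":   [0.2, 0.8, 0.1, 0.7],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Words with different surface forms but related meanings end up close
# in the embedding space, while unrelated pairs do not.
print(cosine(embeddings["journey"], embeddings["travel"]))
print(cosine(embeddings["journey"], embeddings["coast"]))
```

A bag-of-words representation would treat journey and travel as entirely unrelated tokens; the embedding view makes their similarity measurable.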
RELATED WORK
Hashing-based retrieval involves three main steps: accepting a query, extracting the corresponding information using hash codes, and returning the results to the user. Various methods have been proposed for cross-media retrieval to date, including cross-view hashing (CVH), semantic correlation maximization (SCM), discriminative coupled dictionary hashing (DCDH), latent semantic sparse hashing (LSSH), and collective matrix factorization hashing (CMFH). S. Kumar and R. Udupa proposed cross-view hashing, which maps similar objects to similar codes across views to enable similarity search. In that work, a hashing-based approach to the cross-view similarity search problem represents each view of a multi-view data object as a compact binary codeword. To support similarity search, the codewords of the different views of one data object should be similar, if not identical.
Likewise, the codewords of similar data objects should also be similar. Assuming that we can somehow map data objects to binary codewords, cross-view similarity search reduces to the much simpler problem of retrieving all data objects whose codewords lie within a small Hamming distance of the query's codeword. Discriminative coupled dictionary hashing (DCDH) generates a coupled dictionary for each modality based on category labels, which enables fast cross-media retrieval. Multi-view discriminative coupled dictionary hashing (MVDCDH) extends DCDH with multi-view representations to enhance the representing capability of the relatively "weak" modalities. Latent semantic sparse hashing, proposed by J. Zhou, G. Ding, and Y. Guo, uses matrix factorization to represent text and sparse coding to capture the salient structures of images.
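The Hamming-distance retrieval step described above can be sketched as follows. The 8-bit codes and item names are hypothetical; in a real system, a learned hashing function would produce the codes for both modalities:

```python
# Minimal sketch of cross-view retrieval with binary codewords:
# each object (image or text) is assumed to already carry a short
# binary code, so retrieval reduces to a Hamming-distance scan.

def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 8-bit codes for a small image database (codes made up).
database = {
    "img_sunset": [1, 0, 1, 1, 0, 0, 1, 0],
    "img_beach":  [1, 0, 1, 0, 0, 0, 1, 0],
    "img_city":   [0, 1, 0, 0, 1, 1, 0, 1],
}

def search(query_code, db, radius=2):
    """Return all items within `radius` bits of the query, nearest first."""
    return sorted(
        (name for name, code in db.items()
         if hamming(query_code, code) <= radius),
        key=lambda name: hamming(query_code, db[name]),
    )

# A textual query whose code lands near the beach/sunset images.
query = [1, 0, 1, 1, 0, 0, 1, 1]
print(search(query, database))
```

Because Hamming distance is a bit-count over short codes, this scan stays cheap even for large databases, which is the efficiency argument behind hashing-based retrieval.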
LSSH requires both visual and textual information to construct the data set [8]. Collective matrix factorization hashing (CMFH) [4] generates unified hash codes for the different modalities of one instance through collective matrix factorization with a latent factor model. Yue-Ting Zhuang also found that semantic correlation maximization (SCM) integrates semantic labels into the hash learning procedure to preserve semantic similarity across modalities. H. Zhang, J. Yuan, X. Gao, and Z. Chen introduced boosting cross-media retrieval via feature analysis and relevance feedback, where visual-auditory feature analysis provides the boosting in retrieval. Harmonizing hierarchical manifolds has also been explored for multimedia document semantics understanding and cross-media retrieval.
A tri-space and ranking based method provides a heterogeneous similarity measure for cross-media retrieval: while other existing methods focus only on the original low-level feature spaces or on a third common space, the tri-space method considers all of these feature spaces. Xiaohua Zhai, Yuxin Peng, and Jianguo Xiao studied learning cross-media joint representations with sparse and semi-supervised regularization, where measuring content similarity among different media is the key challenge. They propose a novel feature learning algorithm for cross-media data, called joint representation learning (JRL), which jointly explores the correlation and semantic information in a unified optimization framework. JRL integrates sparse and semi-supervised regularization for the different media types into one unified optimization problem, while existing feature learning methods generally focus on a single media type. On one hand, JRL learns a sparse projection matrix for each medium simultaneously, so the different media can align with each other; on the other, both the labeled and the unlabeled data of the different media types are exploited. The unlabeled data increase the diversity of the training data and boost the performance of joint representation learning. Furthermore, JRL incorporates the cross-media correlation into the final representation.
System Configuration:
H/W System Configuration:-
Processor : Pentium IV
Speed : 1 GHz
RAM : 512 MB (min)
Hard Disk : 20 GB
Keyboard : Standard Keyboard
Mouse : Two or Three Button Mouse
Monitor : LCD/LED Monitor
S/W System Configuration:-
Operating System : Windows XP/7
Programming Language : Java/J2EE
Software Version : JDK 1.7 or above
Database : MySQL
EXISTING SYSTEM:
- Existing methods use Canonical Correlation Analysis (CCA), manifold learning, dual-wing harmoniums, deep autoencoders, and deep Boltzmann machines to approach the task. Due to the efficiency of hashing-based methods, there is also a rich line of work focusing on the problem of mapping multi-modal high-dimensional data to low-dimensional hash codes, such as latent semantic sparse hashing (LSSH), discriminative coupled dictionary hashing (DCDH), cross-view hashing (CVH), and so on.
- Most of the existing works use a bag-of-words to model textual information.
DISADVANTAGES OF EXISTING SYSTEM:
- Due to the lack of sufficient training samples, user relevance feedback had to be used to accurately refine cross-media similarities.
- Textual and visual information were not used jointly in earlier systems.
THE PROPOSED METHOD
The processing flow of the proposed semantic cross-media hashing (SCMH) method is as follows. Given a collection of text-image bi-modality data, we first represent the images and texts separately. Through table lookup, all the words in a text are transformed into distributed vectors generated by word embedding learning methods. To represent images, we use the SIFT detector to extract image keypoints, and the SIFT descriptor to compute descriptors for the extracted keypoints. After these steps, a variable-size set of points in the embedding space represents each text, and a variable-size set of points in the SIFT descriptor space represents each image. Then, the Fisher kernel framework is used to aggregate these points in the different spaces into fixed-length vectors, which can also be regarded as points in the gradient space of a Riemannian manifold. From this point on, texts and images are represented by fixed-length vectors. Finally, the mapping functions between the textual and visual Fisher vectors (FVs) are learned by a deep neural network. We use the learned mapping function to convert the FVs of one modality into the other, and hash code generation methods to transfer the FVs of the different modalities into short binary vectors. In the following sections, we describe each component of the proposed method in detail.
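The aggregation step can be sketched with a toy Fisher encoding. The sketch below keeps only the gradients with respect to the means of a small diagonal GMM (a full Fisher vector also includes weight and variance terms), and the GMM parameters are random stand-ins for a model that would be fitted on word embeddings or SIFT descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained diagonal GMM with K components in D dimensions.
K, D = 2, 4
weights = np.array([0.5, 0.5])
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))

def fisher_vector(points):
    """Aggregate a variable-size point set into a fixed-length vector.

    Simplified Fisher encoding: gradients of the GMM log-likelihood with
    respect to the component means only, giving a vector of length K * D
    regardless of how many points go in.
    """
    points = np.asarray(points)
    # Soft assignment of each point to each Gaussian (responsibilities).
    diff = (points[:, None, :] - means[None]) / sigmas[None]     # (N, K, D)
    log_p = -0.5 * (diff ** 2).sum(-1) + np.log(weights)         # (N, K)
    gamma = np.exp(log_p - log_p.max(1, keepdims=True))
    gamma /= gamma.sum(1, keepdims=True)
    # Gradient w.r.t. the means, averaged over the points.
    grad = (gamma[:, :, None] * diff).mean(0) / np.sqrt(weights)[:, None]
    return grad.ravel()

# Two "documents" with different numbers of points map to vectors of the
# same fixed length (K * D = 8).
fv_a = fisher_vector(rng.normal(size=(5, D)))
fv_b = fisher_vector(rng.normal(size=(12, D)))
print(fv_a.shape, fv_b.shape)
```

This is exactly the property the pipeline needs: bags of embedded words and bags of SIFT descriptors, whatever their sizes, land in one fixed-length gradient space.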
- Motivated by the success of continuous space word representations (also called word embeddings) in a variety of tasks, in this work we propose to incorporate word embeddings to meet these challenges. Words in a text are embedded in a continuous space, which can be viewed as a Bag-of-Embedded-Words (BoEW).
- Since the number of words in a text is variable, we propose a method based on the Fisher kernel framework to aggregate them into a fixed-length Fisher vector (FV). However, this alone only covers textual information; another challenge in this task is determining the correlation between the multi-modal representations. Since we use the Fisher kernel framework to represent the textual information, we also use it to aggregate the SIFT descriptors of images.
- Through the Fisher kernel framework, both textual and visual information is mapped to points in the gradient space of a Riemannian manifold. However, the relationships that exist between FVs of different modalities are usually highly non-linear. Hence, to construct the correlation between textual and visual modalities, we introduce a DBN based method to model the mapping function, which is used to convert abstract representations of different modalities from one to another.
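The paper's mapping function is a DBN; as a simplified stand-in, the sketch below trains a one-hidden-layer network with plain gradient descent to regress image-side FVs from text-side FVs on synthetic paired data. It illustrates only the idea of a learned non-linear cross-modal mapping, not the actual DBN training procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic paired Fisher vectors: text FVs X and image FVs Y related by
# a hidden non-linear map (dimensions chosen arbitrarily for the demo).
d_text, d_hidden, d_image, n = 16, 8, 16, 200
X = rng.normal(size=(n, d_text))
true_map = rng.normal(size=(d_text, d_image)) * 0.3
Y = np.tanh(X @ true_map)

W1 = rng.normal(size=(d_text, d_hidden)) * 0.1
W2 = rng.normal(size=(d_hidden, d_image)) * 0.1
lr = 0.05

def forward(x):
    h = np.tanh(x @ W1)          # hidden representation
    return h, h @ W2             # predicted image-side FV

def mse():
    _, pred = forward(X)
    return float(((pred - Y) ** 2).mean())

mse_before = mse()
for _ in range(500):
    h, pred = forward(X)
    err = pred - Y               # gradient of the squared error
    W2 -= lr * h.T @ err / n
    W1 -= lr * X.T @ ((err @ W2.T) * (1 - h ** 2)) / n
mse_after = mse()
print(mse_before, mse_after)
```

A DBN would pre-train each layer generatively (stacked RBMs) before fine-tuning, which is what lets it cope with the highly non-linear FV-to-FV relationship; the backprop-only network here is just the smallest runnable analogue.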
ADVANTAGES OF PROPOSED SYSTEM:
- The system incorporates continuous word representations to handle semantic textual similarities and adapts them for cross-media retrieval.
- Inspired by the advantages of DBN in handling highly non-linear relationships and noisy data, the system introduces a novel DBN based method to construct the correlation between different modalities.
- A variety of experiments on three commonly used cross-media benchmarks demonstrate the effectiveness of the proposed method.
- The experimental results show that the proposed method can significantly outperform the state-of-the-art methods.
MODULES
- Word Embeddings Learning
- Fisher Kernel Framework
- Mapping Function Learning
- Hash Code Generation
PROPOSED ALGORITHM
The SURF++ algorithm is implemented here; it is an extension of the original SURF algorithm. Given a collection of text-image bi-modality data, we first represent the images and texts separately. Through table lookup, all the words in a text are transformed into vectors, and image keypoints are extracted to represent the images. After these steps, a variable-size set of points represents each text and another variable-size set represents each image; these sets are then aggregated into fixed-length vectors. Finally, the mapping functions between the textual and visual representations are learned by a deep neural network. We use the learned mapping function to convert one modality into another, and hash code generation methods to transfer the different modalities into binary codes. The sequence of steps is as follows:
- Hash code Generation
- Matching
Step 1: Various hashing methods create compact, similarity-preserving hash codes for cross-media retrieval. In this project, we use semantic hashing to create hash codes for the information, whether visual or textual; the generated hash codes are then used to transfer between the different modalities. Here the SURF++ algorithm is used as the detection and description scheme; it is partly inspired by the SIFT descriptor. To represent images, the SURF++ detector extracts image keypoints, and the SURF++ descriptor computes descriptors for the extracted keypoints. Thus one set of points represents each text and another set of points represents each image.
Step 2: By comparing the descriptors obtained from different images, matching pairs can be found: for each descriptor, a candidate match is located and then verified.
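The matching-and-verification in Step 2 can be sketched with nearest-neighbour search plus Lowe's ratio test, a standard way to verify descriptor matches; the descriptors below are synthetic stand-ins for SURF++ output:

```python
import numpy as np

rng = np.random.default_rng(3)

def match(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs of accepted matches from A to B.

    A descriptor in A matches B only when its nearest neighbour in B is
    clearly closer than the second nearest (Lowe's ratio test).
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:   # unambiguous nearest neighbour
            matches.append((i, int(j)))
    return matches

desc_b = rng.normal(size=(10, 16))
# Image A re-observes three of B's descriptors with small noise.
desc_a = desc_b[[2, 5, 7]] + 0.05 * rng.normal(size=(3, 16))
print(match(desc_a, desc_b))
```

The ratio test discards ambiguous matches, so only descriptors with a distinctly closest counterpart survive verification.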
FUTURE WORK
In this work, we propose a novel hashing method, SCMH, to perform near-duplicate detection and cross-media retrieval. We propose the use of a set of word embeddings to represent textual information. The Fisher kernel framework is incorporated to represent both textual and visual information with fixed-length vectors, and a deep belief network is proposed to map the Fisher vectors of the different modalities to one another. We evaluate SCMH on three commonly used data sets. SCMH achieves better results than state-of-the-art methods across different hash code lengths. On the NUS-WIDE data set, the relative improvements of SCMH over LSSH, which achieves the best results among the baselines on these data sets, are 10.0 and 18.5 percent on the Text-to-Image and Image-to-Text tasks respectively. The experimental results demonstrate the effectiveness of the proposed method on the cross-media retrieval task.
CONCLUSIONS
For cross-media retrieval, we propose a hashing method that will not confuse images and will support fast retrieval without returning many near-identical images. Better efficiency is thus obtained using fewer iterations, which consumes less time compared to other methods.