Relevance Feedback Algorithms Inspired by Quantum Detection
ABSTRACT
Information Retrieval (IR) is concerned with indexing and retrieving documents that contain information relevant to a user's information need. Relevance Feedback (RF) is a class of effective algorithms for improving IR: it gathers further data representing the user's information need and automatically formulates a new query from the relevance judgments that the user provides after evaluating a set of retrieved documents. Finding relevant documents is a hard task. We propose a class of RF algorithms inspired by quantum detection to re-weight the query terms and to re-rank the documents retrieved by an IR system.
Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing. Automated information retrieval systems are used to reduce what has been called “information overload”. Most IR systems compute a numeric score for how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.
INTRODUCTION
Information retrieval (IR) has experienced huge growth in the past decade as increasing numbers and types of information systems are being developed for end users. The incorporation of users into IR system evaluation and the study of users' information search behaviours and interactions have been identified as important concerns for IR researchers. The proposition that IR systems are fundamentally interactive and should be evaluated from the perspective of users is not new. An IR system returns information that meets the user's need: the user first enters a query, and the system returns documents related to that query. If the query is too long, however, the IR system cannot determine which type of document the user needs, so it returns both relevant and irrelevant documents.
If the user wants only relevant documents, an RF (Relevance Feedback) algorithm is used. RF is a class of effective algorithms for improving an IR system; it modifies the query, for example by removing the terms found in irrelevant documents. It consists of gathering further data representing the user's information need and automatically creating a new query, so that the IR system returns the relevant documents the user needs and ranks them at the top. If the user enters a query but no document is related to the query text, a pattern matching algorithm is used: the IR system matches the query text against the content of the documents and returns the relevant documents that the user wants.
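The pattern matching step described above is not specified further in this paper. As a minimal sketch, assuming that "pattern matching" means matching the query words against the words inside each document, it could look as follows. The function name `match_documents` and the sample documents are hypothetical and serve only as illustration.

```python
import re

def match_documents(query, documents):
    """Return indices of documents that contain at least one query term,
    ordered by how many distinct query terms each document contains."""
    terms = set(re.findall(r"\w+", query.lower()))
    hits = []
    for index, text in enumerate(documents):
        words = set(re.findall(r"\w+", text.lower()))
        matched = len(terms & words)
        if matched:
            hits.append((index, matched))
    # list.sort is stable, so documents with equal counts keep their input order.
    hits.sort(key=lambda pair: -pair[1])
    return [index for index, _ in hits]

# Hypothetical three-document collection.
documents = [
    "Quantum detection for relevance feedback.",
    "Cooking pasta at home.",
    "Relevance feedback improves retrieval.",
]
matches = match_documents("relevance feedback quantum", documents)
```

A real system would normally add stemming and stop-word removal before matching; both are omitted here to keep the sketch short.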
An IR system addresses the problems caused by query ambiguity by gathering additional evidence that can be used to automatically modify the query. Usually a query is expanded because queries are short and cannot exhaustively describe every aspect of the user's information need; however, some irrelevant documents may be retrieved, and relevant documents may be missed, even when a query is not short. The automatic procedure that modifies the user's queries is known as Relevance Feedback (RF): some relevance assessments about the retrieved documents are collected, and the query is expanded with the terms found in the relevant documents, reduced by the terms found in the irrelevant documents, or reweighted using relevant or irrelevant documents.
Relevance feedback (RF) is the retrieval task in which the system is given not only a user query but also user feedback on some of the top-ranked results. Feedback gives the retrieval system a chance to improve its results by exploiting the extra information through more elaborate techniques. This can be helpful when users want as many relevant results as possible. RF is one of the most useful query modification techniques in IR. It is put into practice when the user needs to improve the query submitted to the IR system because the documents initially retrieved do not completely fulfill the user's information need. Relevance feedback works in the following way: a user submits a query representing his/her information need to the IR system, which then ranks the documents according to their degrees of relevance to the query (with the documents most closely matching the query ranked first).
The user then inspects this list and determines which documents are relevant and which are not relevant to his/her information need (the relevance judgments). Using this information, the IR system updates the initial query, modifying the importance of the terms it contains (term reweighting) and adding new terms that are considered useful for retrieving more relevant documents (query expansion). This process is repeated until the user is satisfied with the set of retrieved relevant documents. Relevance feedback has been successfully applied in a great variety of IR models. RF can be positive, negative, or both: positive RF brings only relevant documents into play, and negative RF makes use only of irrelevant documents. Any effective RF algorithm includes a “positive” component.
Although positive feedback is by now a well-established technique, negative feedback is still problematic and requires further investigation; some proposals have already been made, such as grouping irrelevant documents before using them to reduce the query. Some of the first types of IR interactions were associated with relevance feedback. Looking closely at this seemingly simple type of interaction, we see the difficulties inherent in Interactive Information Retrieval (IIR) studies. Even assuming that users are provided with information needs, each user is likely to enter a different query, which will lead to different search results and different opportunities for relevance feedback. Each user, in turn, will provide different amounts of feedback, which will create new lists of search results. Furthermore, the causes and consequences of these interactions cannot be observed easily, since much of this exists in the user's head. The actions that are available for observation (querying, saving a document, providing relevance feedback) are surrogates of cognitive activities. From such observable behaviours we must infer cognitive activity.
System Configuration:
H/W System Configuration:-
Processor : Pentium IV
Speed : 1 GHz
RAM : 512 MB (min)
Hard Disk : 20 GB
Keyboard : Standard Keyboard
Mouse : Two or Three Button Mouse
Monitor : LCD/LED Monitor
S/W System Configuration:-
Operating System : Windows XP/7
Programming Language : Java/J2EE
Software Version : JDK 1.7 or above
Database : MySQL
PROPOSED WORK
We propose an IR system with which the user can easily obtain relevant documents. When the user enters a query to search for a document, the system compares the query directly against the content of the document files, so that the relevant documents are found. We are also working on an additional feature: the system will recommend keywords to the user for getting the best results. The basic procedure is:
- The user issues a (short, simple) query.
- The system returns an initial set of retrieval results.
- The user marks some returned documents as relevant or nonrelevant.
- The system computes a better representation of the information need based on the user feedback.
- The system displays a revised set of retrieval results.
- It provides only documents relevant to the user's information need.
- It makes the data easy to retrieve.
- It reduces manual work.
- Explicit relevance feedback, also called term relevance feedback: the system suggests which terms the user should add to the search.
- Implicit relevance feedback: the system easily identifies frequently searched documents.
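The term suggestion mentioned in the list above could be sketched as follows. This is only an illustration under stated assumptions: terms are suggested when they occur in many of the documents the user marked as relevant and are not already in the query. The function name `suggest_terms` and the sample data are hypothetical.

```python
from collections import Counter

def suggest_terms(query, relevant_docs, k=3):
    """Suggest up to k terms found in the relevant documents but not in the query."""
    query_terms = set(query.lower().split())
    doc_freq = Counter()
    for doc in relevant_docs:
        # Count each term once per document so that one long document cannot dominate.
        doc_freq.update(set(doc.lower().split()) - query_terms)
    # Highest document frequency first; ties are broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical documents that the user marked as relevant.
relevant = [
    "quantum detection relevance feedback",
    "quantum probability relevance ranking",
]
suggestions = suggest_terms("relevance feedback", relevant, k=2)
```

Counting document frequency rather than raw term frequency is a design choice of this sketch; a deployed system would likely also filter stop words.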
Relevance Feedback Algorithm
Relevance feedback works in the following way: a user submits a query representing his/her information need to the IR system, which then ranks the documents according to their degrees of relevance to the query (with the documents most closely matching the query ranked first). The user then inspects this list and determines which documents are relevant and which are not relevant to his/her information need (the relevance judgments). Using this information, the IR system updates the initial query, modifying the importance of the terms it contains (term reweighting) and adding new terms that are considered useful for retrieving more relevant documents (query expansion). This process is repeated until the user is satisfied with the set of retrieved relevant documents. Relevance feedback has been successfully applied in a great variety of IR models.
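Term reweighting and query expansion as described above can be illustrated with the classical Rocchio update. Rocchio is not the method proposed in this paper and is not named in it; it is used here only as a familiar baseline. The function name `rocchio`, the parameter values, and the sample documents are assumptions chosen for illustration.

```python
from collections import Counter

def rocchio(query_weights, relevant_docs, irrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """New weights = alpha * query + beta * mean(relevant) - gamma * mean(irrelevant)."""
    def centroid(docs):
        total = Counter()
        for doc in docs:
            total.update(doc.lower().split())
        return {term: count / len(docs) for term, count in total.items()} if docs else {}

    positive, negative = centroid(relevant_docs), centroid(irrelevant_docs)
    new_weights = {}
    for term in set(query_weights) | set(positive) | set(negative):
        weight = (alpha * query_weights.get(term, 0.0)
                  + beta * positive.get(term, 0.0)
                  - gamma * negative.get(term, 0.0))
        if weight > 0:  # terms with non-positive weight are dropped from the query
            new_weights[term] = weight
    return new_weights

# Hypothetical one-term query with one relevant and one irrelevant document.
updated = rocchio(
    {"jaguar": 1.0},
    relevant_docs=["jaguar cat habitat"],
    irrelevant_docs=["jaguar car dealer"],
)
```

In this example the original term is reweighted, "cat" and "habitat" are added to the query (expansion), and the terms seen only in the irrelevant document are left out.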
CONCLUDING REMARKS
In this paper, a class of RF algorithms inspired by quantum detection has been proposed to re-weight query terms by projecting the query vector on the subspace represented by the eigenvector which is the optimal solution to the problem of finding the maximal distance between two quantum probability distributions. RF is then viewed as a signal detection technique – relevance is the document state to be detected and the queries are the detectors. First, the documents retrieved by an IR system to answer the original query are used to extract a feature matrix. Second, some relevance assessments are obtained according to whether RF is explicit or pseudo. The quantum probability distributions can be estimated and the optimal solution of a distance between two quantum probability distributions can be calculated.
The eigenvector that results from this optimisation problem can be used to project the query vector. Third, the retrieved documents can be re-ranked to answer the modified query. The query term re-weighting differs from that performed by classical RF algorithms, since each query term variation depends on the other query term variations, thus capturing a kind of term dependence that is not captured by other RF algorithms. Our approach has low complexity and is practical to deploy. For each query, the running time of the first document retrieval depends on the number of query terms, as usual. The construction of the feature matrix depends on the number of retrieved documents used to estimate the probability distributions; our experiments showed that a few dozen documents can be sufficient.
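The three steps above can be sketched in a few lines. This is a simplified illustration, not the exact estimator used in our experiments. It assumes that each retrieved document is described by its term frequencies over the query terms, that each quantum probability distribution is estimated as a unit-trace matrix of averaged outer products, and that the distance-maximising eigenvector is the one with the largest eigenvalue of the difference between the two matrices. All function names and the sample vectors are hypothetical.

```python
import math

def density(vectors, dim):
    """Sum of outer products x x^T, normalised to unit trace (assumed estimator)."""
    m = [[0.0] * dim for _ in range(dim)]
    for x in vectors:
        for i in range(dim):
            for j in range(dim):
                m[i][j] += x[i] * x[j]
    trace = sum(m[i][i] for i in range(dim))
    return [[value / trace for value in row] for row in m] if trace else m

def top_eigenvector(a, iterations=500):
    """Power iteration on a symmetric matrix. The diagonal shift makes the
    largest algebraic eigenvalue dominant. Assumes the uniform start vector
    is not orthogonal to the sought eigenvector."""
    dim = len(a)
    shift = max(sum(abs(value) for value in row) for row in a) + 1.0
    b = [[a[i][j] + (shift if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project_query(query, relevant, irrelevant):
    """Project the query vector on the eigenvector of (rho_relevant - rho_irrelevant)."""
    dim = len(query)
    rho_rel, rho_irr = density(relevant, dim), density(irrelevant, dim)
    diff = [[rho_rel[i][j] - rho_irr[i][j] for j in range(dim)] for i in range(dim)]
    v = top_eigenvector(diff)
    coefficient = sum(vi * qi for vi, qi in zip(v, query))
    return [coefficient * vi for vi in v]

def rerank(query, docs):
    """Order document indices by length-normalised inner product with the query."""
    def score(d):
        norm = math.sqrt(sum(x * x for x in d))
        return sum(a * b for a, b in zip(d, query)) / norm if norm else 0.0
    return sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True)

# Hypothetical term-frequency vectors over a two-term query.
relevant = [[2, 1], [1, 1]]
irrelevant = [[0, 1], [1, 2]]
new_query = project_query([1.0, 1.0], relevant, irrelevant)
order = rerank(new_query, relevant + irrelevant)
```

Because the projection uses one eigenvector of the whole matrix, the new weight of each term depends on the off-diagonal entries, that is, on how the terms co-occur in the judged documents. The matrix has as many rows as the query has terms, so even a plain power iteration is cheap.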
The data that are necessary to compute this feature matrix can be obtained from the snippets or the term arrays of the retrieved documents; these snippets and arrays are usually available in the main memory of the IR system. The complexity of the calculation of the eigenvectors is limited by the small size of the matrix that represents the distance between two quantum probability distributions: the size of this matrix is the number of terms of the original query, and it cannot increase, since our approach can work effectively for query term re-weighting with no query expansion.
In general, RF, and in particular the methods inspired by quantum detection, can be integrated with the retrieval functionalities of modern IR systems within a single learning-to-rank framework. These systems do not rely on only one retrieval technology; rather, they combine different algorithms and data structures and predict document rankings. The algorithms inspired by quantum detection that are described in this paper can also be integrated. How they perform in a learning-to-rank framework is left to future work.
This paper focuses on explicit RF and on pseudo RF. Implicit RF is based on observations (e.g., click-through data) that are proxies of relevance. The main problem with proxies is that they are not necessarily reliable indicators of relevance and thus should be considered noisy. How quantum detection can help “absorb” noise can also be investigated in future work.
Conclusion and further work
Relevance feedback can go through one or more iterations of this sort. The process exploits the idea that it may be difficult to formulate a good query when you don't know the collection well, but it is easy to judge particular documents, and so it makes sense to engage in iterative query refinement. In such a scenario, relevance feedback can also be effective in tracking a user's evolving information need: seeing some documents may lead users to refine their understanding of the information they are seeking.
The user submits a query to the IR system, which returns both relevant and irrelevant documents. The automatic procedure that modifies the user's queries is known as RF: some relevance assessments about the retrieved documents are collected, and the query is expanded with the terms found in the relevant documents, reduced by the terms found in the irrelevant documents, or reweighted using relevant or irrelevant documents.