Practical Approximate K Nearest Neighbor Queries with Location and Query Privacy

Abstract

In mobile communication, spatial queries pose a serious threat to user location privacy because the location of a query may reveal sensitive information about the mobile user. In this paper, we study approximate k nearest neighbor (kNN) queries, in which the mobile user asks the location-based service (LBS) provider for the approximate k nearest points of interest (POIs) on the basis of his current location. We propose a basic solution and a generic solution that let the mobile user preserve his location and query privacy in approximate kNN queries.

The proposed solutions are mainly built on the Paillier public-key cryptosystem and can provide both location and query privacy. To preserve query privacy, our basic solution allows the mobile user to retrieve one type of POIs, for example, approximate k nearest car parks, without revealing to the LBS provider what type of points is retrieved. Our generic solution can be applied to multiple discrete type attributes of private location-based queries. Compared with existing solutions for kNN queries with location privacy, our solution is more efficient. Experiments have shown that our solution is practical for kNN queries.

INTRODUCTION

The embedding of positioning capabilities (e.g., GPS) in mobile devices has facilitated the emergence of location-based services (LBS), which are considered the next “killer application” in the wireless data market. LBS allow clients to query a service provider (such as Google or Bing Maps) in a ubiquitous manner, in order to retrieve detailed information about points of interest (POIs) in their vicinity (e.g., restaurants, hospitals, etc.). The LBS provider processes spatial queries on the basis of the location of the mobile user. Location information collected from mobile users, knowingly and unknowingly, can reveal far more than just a user’s latitude and longitude. Knowing where a mobile user is can mean knowing what he/she is doing: attending a religious service or a support meeting, visiting a doctor’s office, shopping for an engagement ring, carrying out non-work-related activities in the office, or spending an evening at the corner bar. It might reveal that he is interviewing for a new job or “out” him as a participant at a gun rally or a peace protest. It can mean knowing with whom he/she spends time, and how often.

When location data are aggregated, they can reveal his/her regular habits and routines, and when he deviates from them. A 2010 survey conducted for Microsoft in the United Kingdom, Germany, Japan, the United States, and Canada found that 94 percent of consumers who had used location-based services considered them valuable, but the same survey found that 52 percent were concerned about potential loss of privacy. In this paper, we study approximate k nearest neighbor (kNN) queries, in which the mobile user asks the location-based service (LBS) provider for the approximate k nearest points of interest (POIs) on the basis of his current location. In general, the mobile user needs to submit his location to the LBS provider, which then finds and returns to the user the k nearest POIs by comparing the distances between the mobile user’s location and nearby POIs. This reveals the mobile user’s location to the LBS provider.
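The non-private baseline described above, in which the provider sees the plaintext location and compares distances, can be sketched as follows. This is only an illustration: the function name `k_nearest`, the planar coordinates, and the sample POIs are assumptions made for the example and do not come from the paper.

```python
import heapq
import math


def k_nearest(user_location, pois, k):
    """Return the k POIs closest to user_location by Euclidean distance.

    The provider needs the plaintext location to do this, which is
    exactly the privacy leak discussed in the text.
    """
    ux, uy = user_location
    return heapq.nsmallest(
        k, pois, key=lambda poi: math.hypot(poi[1] - ux, poi[2] - uy)
    )


# Hypothetical POIs as (name, x, y) on an arbitrary planar grid.
POIS = [
    ("car park A", 1.0, 1.0),
    ("hospital B", 4.0, 5.0),
    ("restaurant C", 2.0, 2.5),
    ("car park D", 9.0, 9.0),
]

nearest = k_nearest((2.0, 2.0), POIS, k=2)
nearest_names = [name for name, _, _ in nearest]
```

With the sample data, the two nearest POIs to (2.0, 2.0) are "restaurant C" and "car park A", in that order.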

Numerous techniques can provide a certain degree of location privacy. They mainly include:

  • Information access control;
  • Mix zones;
  • k-anonymity;
  • “Dummy” locations;
  • Geographic data transformation;
  • Private Information Retrieval (PIR);
  • Two LBS servers.

LBS queries based on access control, mix zones and k-anonymity require a service provider or middleware that maintains all user locations. They are therefore vulnerable to misbehavior by that third party and offer little protection when the service provider or middleware is owned by an untrusted party. Private data have been inadvertently disclosed over the Internet in the past. k-anonymity was originally used for identity privacy protection. It is generally inadequate for location privacy protection, where the notion of distance between locations is important (unlike distances between identities). The effectiveness of LBS queries based on k-anonymity depends heavily on the distribution and density of the mobile users, which are beyond the control of the location privacy technique. LBS queries based on dummy locations require the mobile user to choose a set of fake locations at random, send the fake locations to the LBS, and receive the false reports from the LBS over the mobile network.

We have four main contributions as follows:

  • Current PIR-based LBS queries usually require two stages. In the first stage, the mobile user retrieves the index of his location from the LBS provider. In the second stage, the mobile user retrieves the POIs according to the index from the LBS provider. The mobile user and the LBS provider need to run two PIR protocols in succession. To simplify the process, we give a solution for kNN queries which needs one PIR only, i.e., the mobile user sends his location (encrypted) to the LBS provider and receives the k nearest POIs (encrypted) from the LBS provider.
  • Current PIR-based LBS queries only allow the mobile user to find the k nearest POIs regardless of the type of POIs. For the first time, we take into account the type of POIs in kNN queries. We give a solution for the mobile user to preserve query privacy, i.e., finding the k nearest POIs of the same type without revealing to the LBS provider what type of POIs he is interested in. For example, our solution allows the mobile user to find the k nearest car parks from the LBS provider without revealing to the LBS provider that the type of POIs is car park.
  • Current PIR-based LBS queries allow the mobile user to retrieve only one POI after a protocol execution. For the first time, we take into account sequential queries. We give a solution for the mobile user to query a sequence of POIs without the need for multiple executions of the whole protocol. This greatly improves the efficiency of sequential queries.
  • Current PIR-based LBS solutions allow LBS queries according to location and a single POI type attribute only. They do not support LBS queries with multiple POI type attributes, e.g., car park and daily parking fee (which can be categorized into discrete data values, such as “Low” (<$10), “Middle” ($10-$30) and “High” (>$30)). For the first time, we give a generic solution which can be applied to multiple discrete type attributes of private queries.

To analyze the security of our solutions, we define a security model for private kNN queries. The security analysis shows that our solutions ensure both location privacy, in the sense that the user does not reveal any information about his location to the LBS provider, and query privacy, in the sense that the user does not reveal to the LBS provider what type of POIs he is interested in. In addition, our solutions provide data privacy, in the sense that the LBS provider releases to the user only the k nearest POIs per query. We have implemented our solutions on an example location-based database, and experiments have shown that our solutions are practical.
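To make the idea of a single PIR round over encrypted data more concrete, the sketch below shows a generic single-server retrieval built on the additive homomorphism of the Paillier cryptosystem. It is a simplified illustration and not the protocol of the paper: the toy primes, the function names, and the representation of each database cell as one small integer are all assumptions, and the sketch leaves out POI types, the return of k POIs, and the data-privacy mechanism.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key pair with g = n + 1. These primes are far too
    small to be secure; they only keep the example readable."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (1 + n)^m * r^n mod n^2, with (1 + n)^m = 1 + m*n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def make_selector(n, size, wanted):
    """User side: Enc(1) at the wanted cell, Enc(0) everywhere else."""
    return [encrypt(n, 1 if i == wanted else 0) for i in range(size)]


def server_answer(n, selector, cell_values):
    """Server side: prod(c_i ^ d_i) = Enc(sum(b_i * d_i)) = Enc(d_wanted).
    The server never learns which b_i encrypts 1."""
    n2 = n * n
    answer = 1
    for c, value in zip(selector, cell_values):
        answer = answer * pow(c, value, n2) % n2
    return answer


# Hypothetical database: one integer-encoded record per location cell.
n, priv = keygen()
cells = [101, 202, 303, 404]
selector = make_selector(n, len(cells), wanted=2)
reply = server_answer(n, selector, cells)
retrieved = decrypt(n, priv, reply)
```

In this sketch the user sends one encrypted selector and receives one encrypted reply, and `retrieved` equals the record of the selected cell. A real deployment would need large primes and a proper encoding of POI records.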

The main differences between our previous work and our current paper are:

  1. The previous work fixed the number of nearest neighbors k. The current work allows any number of nearest neighbors k up to K, where K is a constant;
  2. The previous work defined location privacy which implied query privacy. The current work defines location and query privacy separately;
  3. The previous work used the Rabin cryptosystem to prevent the mobile user from retrieving more than one data item per query and did not allow sequential queries without multiple executions of the whole protocol. The current work uses RSA to achieve data privacy and to support sequential queries;
  4. The current work adds a generic solution for multiple discrete type attributes of private location-based queries;
  5. The current work adds experiments for variable k.

    System Configuration:

    H/W System Configuration:

    Processor          : Pentium IV
    Speed              : 1 GHz
    RAM                : 512 MB (min)
    Hard Disk          : 20 GB
    Keyboard           : Standard Keyboard
    Mouse              : Two or Three Button Mouse
    Monitor            : LCD/LED Monitor

    S/W System Configuration:

    Operating System       : Windows XP/7
    Programming Language   : Java/J2EE
    Software Version       : JDK 1.7 or above
    Database               : MySQL

EXISTING SYSTEM:

  • The LBS provider processes spatial queries on the basis of the location of the mobile user. Location information collected from mobile users, knowingly and unknowingly, can reveal far more than just a user’s latitude and longitude. Knowing where a mobile user is can mean knowing what he/she is doing: attending a religious service or a support meeting, visiting a doctor’s office, shopping for an engagement ring, carrying out non-work-related activities in the office, or spending an evening at the corner bar.
  • It might reveal that he is interviewing for a new job or “out” him as a participant at a gun rally or a peace protest. It can mean knowing with whom he/she spends time, and how often.
  • When location data are aggregated, they can reveal his/her regular habits and routines, and when he deviates from them.
  • There have been numerous techniques that can provide a certain degree of location privacy. These techniques mainly include: Information access control, Mix zone, k-anonymity, “Dummy” locations, Geographic data transformation, Private Information Retrieval (PIR), Two LBS servers.

DISADVANTAGES OF EXISTING SYSTEM:

  • Techniques that rely on a service provider or middleware are vulnerable to misbehavior by that third party.
  • They offer little protection when the service provider/middleware is owned by an untrusted party.
  • Private data have been inadvertently disclosed over the Internet in the past.
  • Current PIR-based LBS queries only allow the mobile user to find the k nearest POIs regardless of the type of POIs.
  • Current PIR-based LBS queries allow the mobile user to retrieve only one POI after a protocol execution.
  • Current PIR-based LBS solutions allow LBS queries according to location and a single POI type attribute only. They do not support LBS queries with multiple POI type attributes.

PROPOSED SYSTEM:

  • In this paper, we study approximate k nearest neighbor (kNN) queries where the mobile user queries the location based service provider about approximate k nearest points of interest on the basis of his current location.
  • We construct solutions for kNN queries on the basis of PIR with the Paillier public-key cryptosystem. We have four main contributions:
  • We give a solution for kNN queries which needs one PIR only, i.e., the mobile user sends his location (encrypted) to the LBS provider and receives the k nearest POIs (encrypted) from the LBS provider.
  • We give a solution for the mobile user to preserve query privacy, i.e., finding the k nearest POIs of the same type without revealing to the LBS provider what type of POIs he is interested in. For example, our solution allows the mobile user to find the k nearest car parks from the LBS provider without revealing to the LBS provider that the type of POIs is car park.
  • We take into account sequential queries. We give a solution for the mobile user to query a sequence of POIs without the need for multiple executions of the whole protocol.
  • We give a generic solution which can be applied to multiple discrete type attributes of private queries.
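One way to picture a query over multiple discrete type attributes is to map every combination of attribute values to a single index and to build a selection vector over those indices. The sketch below does this with a mixed-radix encoding. It is an assumption made for illustration and not necessarily how the generic solution encodes attributes; the attribute lists, the function names, and the handling of the fee boundaries (the "Middle" range is taken as $10 to $30 inclusive) are hypothetical.

```python
def fee_level(daily_fee):
    """Map a daily parking fee in dollars to a discrete level:
    Low (< $10), Middle ($10 to $30), High (> $30)."""
    if daily_fee < 10:
        return "Low"
    if daily_fee <= 30:
        return "Middle"
    return "High"


# Hypothetical discrete attributes and their allowed values.
ATTRIBUTES = [
    ("poi_type", ["car park", "restaurant", "hospital"]),
    ("fee_level", ["Low", "Middle", "High"]),
]


def flat_index(choice):
    """Mixed-radix encoding of one value per attribute into one index."""
    index = 0
    for name, values in ATTRIBUTES:
        index = index * len(values) + values.index(choice[name])
    return index


def one_hot_selector(choice):
    """Plaintext selection vector with a single 1 at the chosen combination."""
    size = 1
    for _, values in ATTRIBUTES:
        size *= len(values)
    selector = [0] * size
    selector[flat_index(choice)] = 1
    return selector


choice = {"poi_type": "car park", "fee_level": fee_level(12.5)}
index = flat_index(choice)
selector = one_hot_selector(choice)
```

In a private query, the entries of such a selector would be encrypted before being sent, so that the provider could not tell which combination was chosen. Here the selector is kept in plaintext to show only the indexing.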

ADVANTAGES OF PROPOSED SYSTEM:

  • The previous work fixed the number of nearest neighbors k. The current work allows any number of nearest neighbors k up to K, where K is a constant.
  • The previous work defined location privacy which implied query privacy. The current work defines location and query privacy separately.
  • The previous work used the Rabin cryptosystem to prevent the mobile user from retrieving more than one data item per query and did not allow sequential queries without multiple executions of the whole protocol.
  • The current work uses RSA to achieve data privacy and to support sequential queries.
  • The current work adds a generic solution for multiple discrete type attributes of private location-based queries.

Conclusion and further work

In this paper, we have studied existing solutions to perform the kNN operation in the context of MapReduce. We first approached this problem from a workflow point of view. We pointed out that all solutions follow three main steps to compute kNN over MapReduce, namely preprocessing of data, partitioning, and the actual computation. We listed and explained the different algorithms which could be chosen for each step, and discussed their pros and cons in terms of load balancing, accuracy of results, and overall complexity.

In a second part, we performed extensive experiments to compare the performance, disk usage and accuracy of all these algorithms in the same environment. We mainly used two real datasets, one of geographic coordinates (2 dimensions) and one of image descriptors (SURF descriptors, 128 dimensions). For all algorithms, it was the first published experiment on such high dimensions. Moreover, we performed a fine-grained analysis, outlining, for each algorithm, the importance and difficulty of tuning some parameters to obtain the best performance. Overall, this work gives a clear and detailed view of the current algorithms for processing kNN on MapReduce. It also exhibits the limits of each of them in practice and shows the context in which each performs best. Above all, this paper can be seen as a guideline to help select the most appropriate method to perform the kNN join operation on MapReduce for a particular use case.

After this thorough analysis, we have found a number of limitations in existing solutions which could be addressed in future work. First, besides H-BkNNJ, all methods need to replicate the original data to some extent. The number of replications, although necessary to improve precision, has a great impact on disk usage and communication overhead. Finding the optimal parameters to reduce this number is still an open issue. Second, the partitioning methods are all based on properties of R. However, one can expect R to vary as it represents the query set. The cost of repartitioning is currently prohibitive, so, for dynamic queries, better approaches might rely on properties of S. Finally, MapReduce, especially through its Hadoop implementation, is well suited for batch processing of static data. The efficiency of these methods on data streams has yet to be investigated.