Private Over-threshold Aggregation Protocols over Distributed Datasets

Abstract

We propose the first differentially private aggregation algorithm for distributed time-series data that offers good practical utility without any trusted server. This addresses two important challenges in participatory data-mining applications where

  • individual users collect temporally correlated time-series data (such as location traces, web history, personal health data), and
  • an untrusted third-party aggregator wishes to run aggregate queries on the data.

To ensure differential privacy for time-series data despite the presence of temporal correlation, we propose the Fourier Perturbation Algorithm (FPAk). Standard differential privacy techniques perform poorly for time-series data: to answer n queries, such techniques can add noise of Θ(n) to each query answer, making the answers practically useless if n is large. Our FPAk algorithm instead perturbs the Discrete Fourier Transform of the query answers. For answering n queries, FPAk improves the expected error from Θ(n) to roughly Θ(k), where k is the number of Fourier coefficients that can (approximately) reconstruct all n query answers. Our experiments show that k ≪ n for many real-life datasets, resulting in a large error improvement for FPAk.

To deal with the absence of a trusted central server, we propose the Distributed Laplace Perturbation Algorithm (DLPA), which adds noise in a distributed way in order to guarantee differential privacy. To the best of our knowledge, DLPA is the first distributed differentially private algorithm that can scale with a large number of users: DLPA outperforms the only other distributed solution for differential privacy proposed so far by reducing the computational load per user from O(U) to O(1), where U is the number of users.
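The Fourier perturbation idea described above can be illustrated with a minimal sketch. It assumes a unitary DFT, Laplace noise added independently to the real and imaginary parts of the first k coefficients, and zero padding before the inverse transform. The function names and the `noise_scale` parameter are hypothetical, and calibrating the noise to the query sensitivity and the privacy budget is left to the caller because the text above does not specify it.

```python
import cmath
import math
import random


def dft(xs):
    """Unitary discrete Fourier transform of a sequence."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * math.pi * j * t / n) for t, x in enumerate(xs))
        / math.sqrt(n)
        for j in range(n)
    ]


def idft(fs):
    """Inverse of dft()."""
    n = len(fs)
    return [
        sum(f * cmath.exp(2j * math.pi * j * t / n) for j, f in enumerate(fs))
        / math.sqrt(n)
        for t in range(n)
    ]


def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def fpa_k(answers, k, noise_scale, rng=None):
    """Perturb the first k Fourier coefficients of the n query answers.

    noise_scale is an assumed, caller-supplied Laplace scale.
    """
    rng = rng or random.Random()
    n = len(answers)
    kept = dft(answers)[:k]                      # keep only k coefficients
    noisy = [
        c + complex(laplace(noise_scale, rng), laplace(noise_scale, rng))
        for c in kept
    ]
    padded = noisy + [0j] * (n - k)              # pad back to length n
    # Taking the real part is a simplification for real-valued answers.
    return [v.real for v in idft(padded)]
```

Because noise is added to only k values instead of n, the total perturbation grows with k, at the price of a reconstruction error when the discarded coefficients are not negligible.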

Introduction

A Wireless Sensor Network (WSN) is an ad hoc network consisting of a large number of distributed autonomous wireless devices called sensors, which are densely deployed in remote areas to detect environmental conditions such as temperature, pressure, humidity, sound, vibration, motion, and pollutants. The sensor nodes are resource constrained in terms of energy, memory, and computation capabilities. There are three types of nodes: normal sensor nodes (leaf nodes), intermediate nodes (aggregators), and the base station (BS, also called the sink or query server). A sensor node senses, processes, and communicates the data collected from the environment in which it is deployed and reports it to the base station, which is located remotely. An aggregator combines its own sensed data with data received from multiple sensor nodes using an aggregation function (SUM, AVG, COUNT, MAX, MIN, etc.) and forwards the aggregation result to another aggregator or to the BS. Finally, the BS processes the received data and derives the significant information reflecting the events in the target field.

Densely deployed sensor nodes in close proximity sense the same data, which increases the amount of redundant data in the network. Transmitting this redundant data drains the limited energy of the sensor nodes. A data aggregation mechanism applied at the aggregator level avoids redundant transmissions, which reduces the energy consumption of a node and thereby increases the network lifetime. An extension of this approach is in-network data aggregation, which aggregates data progressively as it passes through the network. Data aggregation can be done in a tree, cluster, or hybrid topology. The remote or hostile environment in which wireless sensor nodes operate makes security a challenging problem during data aggregation, and the use of WSNs in security-critical applications increases the need for data privacy during aggregation. Eavesdropping attacks on the wireless link and compromise attacks on the sensor nodes breach data privacy during aggregation and help adversaries learn more about the behavior of individual nodes.
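In-network aggregation over a tree topology can be sketched as follows. This is a minimal illustration with no security, assuming a hypothetical topology given as a parent-to-children dictionary; each node forwards only a partial (sum, count) pair to its parent, so the base station can derive SUM, COUNT, and AVG without receiving every raw reading.

```python
def aggregate(tree, readings, node):
    """Return the partial (sum, count) for the subtree rooted at node.

    tree maps a node to its list of children; readings maps sensor nodes
    to their sensed values. Both structures are illustrative assumptions.
    """
    total = readings.get(node, 0)
    count = 1 if node in readings else 0
    for child in tree.get(node, []):
        child_total, child_count = aggregate(tree, readings, child)
        total += child_total          # merge the child's partial aggregate
        count += child_count
    return total, count


# Hypothetical network: base station, two aggregators, three leaf sensors.
tree = {"BS": ["A1", "A2"], "A1": ["s1", "s2"], "A2": ["s3"]}
readings = {"s1": 20, "s2": 22, "s3": 24}

total, count = aggregate(tree, readings, "BS")
average = total / count
```

Each aggregator sends one small message upward regardless of how many sensors sit below it, which is where the energy saving comes from.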

Privacy-preserving data aggregation (PPDA) protocols achieve data privacy by protecting a node’s transmitted data from its neighbors, because neighbors can learn the aggregated sum and the encryption key by compromising nodes. There are two types of privacy concerns in WSNs: internal privacy and external privacy. Internal privacy means keeping a sensor node’s data private from the other trusted participating sensor nodes in the WSN, whereas external privacy means protecting the data from outsiders (adversaries). Privacy-preserving data aggregation therefore aims to protect the privacy of individual nodes by varying how data is transmitted, in such a way that adversaries cannot obtain the sensitive information of a particular node even if they can overhear and decrypt the data. In addition, verifying data integrity in privacy-preserving protocols maintains the consistency and correctness of messages and guarantees to the BS that the data is original and was not altered during transmission. This increases the accuracy of aggregation, which helps the BS make critical decisions based on the aggregated result it receives. Many data aggregation protocols exist that address the privacy of sensor data during aggregation in WSNs. In this paper, we provide a comprehensive summary of different privacy preservation protocols in WSNs and some of their application areas.
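One generic way to hide an individual reading from neighbors while still obtaining the correct sum is additive slicing. The sketch below is an illustration of that general idea, not a specific protocol named in this text: each node splits its reading into random shares modulo an assumed `MODULUS`, keeps one share, and hands the others to peers. Any single share looks uniformly random, yet the shares of all nodes add up to the true total. In a real deployment each share would travel over an encrypted link.

```python
import random

MODULUS = 2 ** 32  # assumed to be larger than any possible true sum


def slice_reading(value, pieces, rng):
    """Split value into `pieces` random shares that sum to value mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(pieces - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def private_sum(readings, rng=None):
    """Simulate n nodes exchanging shares and aggregating the mixed values."""
    rng = rng or random.Random()
    n = len(readings)
    received = [0] * n
    for i, value in enumerate(readings):
        for j, share in enumerate(slice_reading(value, n, rng)):
            target = (i + j) % n      # j == 0 is the share the node keeps
            received[target] = (received[target] + share) % MODULUS
    # Each node reports only its mixed value; the sink adds them up.
    return sum(received) % MODULUS
```

An eavesdropper who decrypts one link sees a single random-looking share, which reveals nothing about the reading unless all shares of that node are collected.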

Related Work

The related work falls into three directions: general-purpose, special-purpose, and proxy-based. If we assume the existence of a trusted third party (TTP), the cryptographic problem we are considering becomes trivial. Thus all the related work in the literature has attempted to find a way to replace the TTP while providing the same level of security as when such a TTP exists. The general-purpose solutions rely on the fundamental theorem of cryptography addressed by Yao and developed by Goldreich et al. The special-purpose group suggests carefully tuned methods that solve the problem more efficiently than the general-purpose solution. Lastly, the proxy-based schemes introduce some special entities and assign a set of tasks to them, and thus achieve further improved efficiency. However, this line of work makes extra assumptions about the extra entities.

General-purpose approaches. In a real-world scenario, we generally cannot assume the existence of a trusted party whom all participating users trust. The first approach we consider is therefore a general solution based on secure multiparty computation (SMC).
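To see why the problem is trivial with a TTP, the ideal functionality can be written down in the clear. The sketch below assumes that the over-threshold problem means reporting every element whose total count across all users' datasets reaches a given threshold; the function name and the returned dictionary format are hypothetical. The protocols discussed in this section aim to compute the same output without any party seeing the other users' inputs.

```python
from collections import Counter


def over_threshold_with_ttp(datasets, threshold):
    """What a trusted third party would compute after seeing every input.

    datasets is a list of per-user multisets (any iterables of hashable
    items). Returns the elements whose total count is at least threshold,
    together with their counts.
    """
    counts = Counter()
    for dataset in datasets:
        counts.update(dataset)        # the TTP sees all inputs in plaintext
    return {item: c for item, c in counts.items() if c >= threshold}
```

For example, with three users holding `["a", "b"]`, `["b", "c"]`, and `["b", "a"]` and a threshold of 2, the TTP would report `a` and `b` but not `c`.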

The notion of SMC allows n users to create a virtual trusted party. Yao first introduced this notion, and a method for performing SMC was developed by Goldreich, Micali, and Wigderson. Their result is called the fundamental theorem of cryptography, stating that, assuming trapdoor permutations exist, there exists an SMC protocol for every polynomial-size function. Unfortunately, due to the trade-off between generality and efficiency, we cannot achieve an efficient solution for our problem using this tool.

Special-purpose approaches. There have been many approaches to improving the efficiency of SMC-based general solutions. One key direction is to devise a tool tailored to this specific cryptographic problem. A closely related work is a protocol proposed by Burkhart and Dimitropoulos. Their solution is efficient in terms of computational complexity, but has two critical drawbacks: first, because the solution is probabilistic, its accuracy decreases sharply if the input datasets are disjoint; second, its round complexity is linear in the number of bits in the data elements. Another closely related approach is to apply Kissner and Song’s over-threshold set union protocol.

Their protocol allows us to find all elements appearing at least τ times in the union of input multisets. The core idea of their scheme is as follows: each user represents her input X_i as a polynomial f_i whose roots are the elements of X_i. The roots of the τ-th derivative of P = ∏_{i=1}^{n} f_i give a set consisting of all elements that appear more than τ times in the union. The main shortcoming of this scheme is that the result set does not report the multiplicity of each element. This property may make it difficult to apply the protocol to the κ+ problem. Specifically, consider a case where one needs to find all elements with the greatest multiplicity. We would then need to execute the scheme repeatedly until obtaining the final result. Our protocol does not incur this overhead of repeated execution, and it also allows τ to be changed during any step of the protocol execution; cf. §4.3 for a detailed comparison.
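The polynomial representation can be checked on plaintext data. The sketch below works over the integers with no encryption, so it illustrates only the algebra, not the protocol itself; all function names are hypothetical. It uses the standard fact that a value x is a root of P with multiplicity greater than τ exactly when P and its first τ derivatives all vanish at x. Checking P and the lower derivatives as well matters, because a derivative on its own can have roots that are not roots of P.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_from_roots(roots):
    """Build the product of (x - r) over all r in roots."""
    p = [1]
    for r in roots:
        p = poly_mul(p, [-r, 1])
    return p


def derivative(p):
    """Formal derivative of a coefficient list."""
    return [i * c for i, c in enumerate(p)][1:] or [0]


def evaluate(p, x):
    """Evaluate p at x with Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = acc * x + c
    return acc


def multiplicity_exceeds(p, x, tau):
    """True if x is a root of p, p', ..., p^(tau), i.e. multiplicity > tau."""
    for _ in range(tau + 1):
        if evaluate(p, x) != 0:
            return False
        p = derivative(p)
    return True


# Three hypothetical users; P is the product of their input polynomials.
inputs = [[1, 2], [2, 3], [2, 3, 5]]
P = [1]
for multiset in inputs:
    P = poly_mul(P, poly_from_roots(multiset))

more_than_once = [x for x in (1, 2, 3, 5) if multiplicity_exceeds(P, x, 1)]
more_than_twice = [x for x in (1, 2, 3, 5) if multiplicity_exceeds(P, x, 2)]
```

The test returns only a yes or no per element, which mirrors the shortcoming noted above: the exact multiplicity is not revealed, so finding the most frequent elements requires repeating the check for different values of τ.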

Proxy-based approaches. Chow et al. proposed an efficient scheme extending private set operations. Their scheme introduces two special entities, a randomizer and a computing server, both of which must be semi-honest. One issue with their solution is that it cannot support the aggregate operation over multisets. Another related work is Applebaum et al.’s protocol. This solution achieves constant round complexity by adding a proxy and a database (DB). Both entities are also assumed to be semi-honest and not to collude with each other. In particular, their scheme relies extensively on oblivious transfer (OT), a computationally expensive public-key primitive that requires two modular exponentiations per invocation and runs for each bit of the user’s data element.

Furthermore, their protocol extensively uses two semantically secure encryption schemes at the same time: ElGamal encryption together with Goldwasser-Micali encryption. In the comparison below, the computational complexity is expressed as the number of multiplications modulo p, assuming that all elements are less than p.

TABLE 1 Summary and Comparison; models are Non-proxy based and Proxy-based.

Scheme   Proxy   Round Cpx                  Comp. Cpx       Comm. Cpx
Ours     ×       O(n)                       O(n²k log p)    O(n²k log p)
[4]      ×       O(n(n + k log k) log p)    O(n²k)          O(n²k log p)
[2]      ✓       O(1)                       O(nk log² p)    O(nk log p)

Data Aggregation. Data aggregation is an important technique in distributed systems, with many applications that enable efficient utilization of resources. Thus many researchers in wireless and smart grid networks have focused on the problem. However, in this paper we use the term “data aggregation” in a different sense. We consider data aggregation as a part of the data and information mining process, where data is searched, gathered, and presented in a report-based, summarized format to achieve specific business objectives. In particular, we are interested in preserving privacy during the whole process of data aggregation.

System Configuration:

H/W System Configuration:-

Processor          : Pentium IV

Speed               : 1 GHz

RAM                  : 512 MB (min)

Hard Disk          : 20GB

Keyboard           : Standard Keyboard

Mouse               : Two or Three Button Mouse

Monitor             : LCD/LED Monitor

S/W System Configuration:-

Operating System               : Windows XP/7

Programming Language       : Java/J2EE

Software Version                 : JDK 1.7 or above

Database                            : MySQL

Application Areas of PPDA Protocols

Health Monitoring

There are two major health monitoring applications for WSNs. The first is monitoring the performance of an athlete, such as tracking respiration and pulse rate using wearable sensors. The second is monitoring the health of patients, e.g. personal weight, blood sugar level, and blood pressure level. The sensor measurements should be kept secret from other people during transmission to the sink node.

Military Surveillance

The WSN can be used to replace the guards and sentries around defensive perimeters, keeping soldiers out of harm’s way, and to locate and identify troops, vehicles, and targets for potential attack. The privacy of the sensor data is therefore always critical and should be preserved during data aggregation.

Private Households

Wireless sensors could be placed in houses to collect statistics about water, gas, and electricity consumption within a large neighborhood. The aggregated population statistics are helpful for individuals, businesses, and government agencies in planning resources. However, individual readings reveal the daily activities of a household: for example, if no water is consumed, it is likely that no one was in the house.

Conclusion

In this paper, we have looked at the problem of finding the κ+ elements securely, and formally defined what it means for a protocol to be a secure κ+ protocol. We developed two protocols with varying operation overhead, analyzed their security, and demonstrated their practicality by analyzing their precise computational and communication costs. Moreover, we provided a full proof showing that our protocol is secure in the presence of semi-honest adversaries.

Since semi-honest protocols carry a critical security restriction, namely that every adversary must follow the instructions specified in the protocol, we transformed our basic protocol into a stronger κ+ protocol that is also secure in the presence of malicious adversaries. In addition to a full description of our protocol with malicious adversaries, we proved that the protocol is secure within the simulation paradigm. In the future, we will look into converting the Zero-Knowledge Proofs from their present interactive variant into Non-Interactive Zero-Knowledge Proofs through the Fiat-Shamir heuristic, which will improve the communication complexity of our protocols.
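The Fiat-Shamir conversion mentioned above can be illustrated on a Schnorr proof of knowledge of a discrete logarithm. This is a stand-in chosen for brevity, since the specific Zero-Knowledge Proofs used in the protocols are not described in this text. Instead of waiting for the verifier to send a random challenge, the prover derives it by hashing the public values and the commitment, which removes one round of interaction. The group parameters below are toy values assumed for the example and are far too small for real use.

```python
import hashlib
import random

# Toy group (assumption for illustration): G has prime order Q modulo P.
P, Q, G = 23, 11, 2


def challenge(*values):
    """Fiat-Shamir challenge: hash of the transcript, reduced modulo Q."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret, rng=None):
    """Non-interactive proof of knowledge of `secret` with public = G^secret."""
    rng = rng or random.SystemRandom()
    public = pow(G, secret, P)
    nonce = rng.randrange(1, Q)
    commitment = pow(G, nonce, P)
    c = challenge(G, public, commitment)   # replaces the verifier's message
    response = (nonce + c * secret) % Q
    return public, (commitment, response)


def verify(public, proof):
    """Check G^response == commitment * public^c with the recomputed c."""
    commitment, response = proof
    c = challenge(G, public, commitment)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P
```

The proof is a single message of two values, so the verifier no longer has to take part in a challenge round, which is the source of the communication saving.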