Privacy-Preserving Utility Verification of the Data Published by Non-Interactive Differentially Private Mechanisms

Abstract

In the problem of privacy-preserving collaborative data publishing, a central data publisher is responsible for aggregating sensitive data from multiple parties and anonymizing it before publishing it for data mining. In such scenarios, data users may have a strong demand to measure the utility of the published data, since most anonymization techniques have side effects on data utility. This task is non-trivial, however, because measuring utility usually requires the aggregated raw data, which is not revealed to the data users due to privacy concerns. Furthermore, the data publisher may even cheat with the raw data, since no one, including the individual providers, knows the full data set. In this paper, we first propose a privacy-preserving utility verification mechanism based upon cryptographic techniques for DiffPart, a differentially private scheme designed for set-valued data. This proposal measures the data utility based upon the encrypted frequencies of the aggregated raw data instead of the plain values, which prevents privacy breaches. Moreover, it can privately check the correctness of the encrypted frequencies provided by the publisher, which helps detect dishonest publishers. We also extend this mechanism to DiffGen, another differentially private publishing scheme designed for relational data. Our theoretical and experimental evaluations demonstrate the security and efficiency of the proposed mechanism.

Introduction

In today's digital life, nearly every user is connected to the Internet and engaged in web-related activity, and a very large number of users produce and consume online data. Writers, for example, may wish to publish novels, stories, and other works securely on online publishing sites. Many sites provide data publishing features, but unethical activity is also on the rise, so preserving data privacy has become an important issue at every level. Privacy is a particularly serious concern for publishers, because they need to build a trust model between writers and readers; key elements of this trust are data privacy, data integrity, and data security.

Service providers have the ability to collect large amounts of user data, and a set of providers may try to aggregate their data for specific data mining tasks. For example, hospitals nationwide may outsource their medical records to a research group for mining the spreading patterns of influenza epidemics. In this process, protecting users' privacy is extremely critical. This is the so-called privacy-preserving collaborative data publishing problem. Many privacy models and corresponding anonymization mechanisms have been proposed in the literature, such as k-anonymity and differential privacy. k-anonymity and its variants protect privacy by generalizing records so that they cannot be distinguished from some other records. Differential privacy is a much more rigorous privacy model: it requires that the released data be insensitive to the addition or removal of a single record. To implement this model, the corresponding anonymization mechanisms usually have to add noise to the published data or probabilistically generalize the raw data. Obviously, all of these anonymization mechanisms have serious side effects on data utility. As a result, the users of the published data usually have a strong demand to verify the real utility of the anonymized data.
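As a concrete illustration of how such noise-based mechanisms trade privacy for utility, the sketch below applies the standard Laplace mechanism to a single count query. It is a minimal, hypothetical example (not DiffPart or DiffGen themselves), assuming a query with sensitivity 1 and a privacy budget epsilon.

```python
# Minimal sketch of the Laplace mechanism for a count query (sensitivity 1).
# This illustrates noise-based differential privacy in general,
# not the DiffPart/DiffGen schemes discussed in this paper.
import numpy as np

def laplace_count(true_count: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return a noisy, epsilon-differentially private answer to a count query."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: a raw frequency of 250 released under a budget of epsilon = 0.5.
# Smaller epsilon means stronger privacy but noisier (less useful) output.
print(laplace_count(250, epsilon=0.5))
```

This noisy output is exactly the kind of utility loss that the data users would like to verify without ever seeing the raw counts.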

Related Work

For a discussion of the guarantees provided by differential privacy and their limitations, see [Kasiviswanathan and Smith 2008; Kifer and Machanavajjhala 2011]. As the theoretical foundations of differential privacy become better understood, there is momentum to prove privacy guarantees of real systems. Several authors have recently proposed methods for reasoning about differential privacy on the basis of different languages and models of computation, e.g., SQL-like languages [McSherry 2009], higher-order functional languages [Reed and Pierce 2010], imperative languages [Chaudhuri et al. 2011], the MapReduce model [Roy et al. 2010], and I/O automata [Tschantz et al. 2011]. The unifying basis of these approaches is two key results. The first is the observation that one can achieve privacy by perturbing the output of a deterministic program by a suitable amount of symmetrically distributed noise, giving rise to the so-called Laplacian [Dwork et al. 2006b] and Exponential mechanisms [McSherry and Talwar 2007]. The second is a set of theorems that establish privacy bounds for the sequential and parallel composition of differentially private programs, see e.g. [McSherry 2009]. In combination, both results form the basis for creating and analyzing programs by composing differentially private building blocks. While approaches relying on composing building blocks apply to an interesting range of examples, they fall short of covering the expanding frontiers of differentially private mechanisms and algorithms. Examples that cannot be handled by previous approaches include mechanisms that aim for weaker guarantees, such as approximate differential privacy [Dwork et al. 2006a], or randomized algorithms that achieve differential privacy without using any standard mechanism [Gupta et al. 2010]. Dealing with such examples requires fine-grained reasoning about the complex mathematical and probabilistic computations that programs perform on private input data. Such reasoning is particularly intricate and error-prone, and calls for principled approaches and tool support. These challenges have been addressed by frameworks for formal reasoning about a large class of quantitative confidentiality properties, including (approximate) differential privacy and probabilistic non-interference.
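To make the composition results concrete, the following sketch (an illustrative example of my own, not code from any of the cited systems) answers two count queries over the same data with the Laplace mechanism; under sequential composition, the total privacy cost is simply the sum of the per-query budgets.

```python
# Illustrative sketch of sequential composition: each query is answered with
# the Laplace mechanism at its own budget, and the total privacy cost is the
# sum of those budgets. Not taken from the systems cited above.
import numpy as np

def answer_queries(data, queries, epsilons, sensitivity=1.0):
    """Answer each count query with Laplace noise; return answers and total budget."""
    answers = []
    for query, eps in zip(queries, epsilons):
        noise = np.random.laplace(loc=0.0, scale=sensitivity / eps)
        answers.append(query(data) + noise)
    return answers, sum(epsilons)  # sequential composition: budgets add up

ages = [21, 35, 48, 52, 67]
queries = [
    lambda d: len(d),                       # how many records?
    lambda d: sum(1 for x in d if x > 40),  # how many records with value > 40?
]
answers, total_eps = answer_queries(ages, queries, epsilons=[0.5, 0.5])
print(answers, "total epsilon spent:", total_eps)
```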


Conclusion

In this paper, we consider the problem of verifying the utility of data released by non-interactive differentially private methods. Mechanisms are proposed to achieve this goal for set-valued and relational data, respectively. The proposed solutions require the publisher to provide auxiliary datasets in ciphertext along with the published data. The providers then sequentially verify the auxiliary datasets to check whether their data is correctly involved. Finally, any individual can compute a linear transformation of the utility of the released dataset in ciphertext using those verified auxiliary datasets and verify whether the utility is acceptable. Experiments illustrate the efficiency of the solution, which is mainly affected by the number of providers and the size of the data.
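The sketch below is a simplified, hypothetical illustration of that last step: evaluating a linear transformation over encrypted frequency counts with an additively homomorphic scheme, assuming the python-paillier ("phe") package. It omits the verification of the auxiliary datasets and is not the authors' full protocol.

```python
# Simplified sketch of computing a linear transformation of utility over
# encrypted frequencies with Paillier's additively homomorphic encryption.
# Assumes the python-paillier ("phe") package; this is not the full
# verification protocol described in the paper.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Hypothetical encrypted frequency counts published as auxiliary data.
raw_frequencies = [120, 45, 310]
encrypted_freqs = [public_key.encrypt(f) for f in raw_frequencies]

# Public integer weights of the linear utility transformation.
weights = [2, 5, 1]

# Additive homomorphism: sum_i w_i * Enc(f_i) = Enc(sum_i w_i * f_i),
# so the weighted utility is computed without decrypting any single count.
encrypted_utility = weights[0] * encrypted_freqs[0]
for w, c in zip(weights[1:], encrypted_freqs[1:]):
    encrypted_utility += w * c

# Only the key holder can recover the plaintext utility value.
expected = sum(w * f for w, f in zip(weights, raw_frequencies))
assert private_key.decrypt(encrypted_utility) == expected
print("verified utility:", expected)
```

The design intent is that no single encrypted count is ever decrypted on its own; only the aggregated utility value is revealed to the party holding the decryption capability.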