Data privacy refers to the sensitive information that individuals, organizations or other entities do not want to expose to the outside world. For example, medical records are one kind of private data. Private data usually contain sensitive information that is very important to the owner and should be processed carefully. Data privacy is not the same as data security. Data security ensures that data or information systems are protected from invalid operations, including unauthorized access, use, exposure, damage, modification, copying, deletion and so on.
Data security cannot guarantee data privacy, and vice versa. Figure 1 shows the relation between data security and data privacy. A represents the situation where data privacy is violated while data security is not. For example, an authorized user may expose sensitive information stored in the system by mistake. The exposure operation is authorized, i.e. data security is not violated. However, sensitive information stored in the system is exposed to the public, which violates data privacy. Conversely, C represents the situation where data security is violated while data privacy is not. One simple example is an unauthorized user who accesses and modifies the data: there is no information exposure or theft (i.e. no data privacy violation). In case B, neither data security nor data privacy is violated. In this paper, we focus on data privacy instead of data security.

Since data privacy is always associated with sensitive data (e.g. SSNs, bank accounts, medical records and so on), a violation of data privacy can have very serious consequences, such as identity theft.
Therefore, privacy preservation is necessary and important.

Obviously, the purpose of privacy ... home address are included) in Massachusetts based on the values of gender, birthday and zip code, Sweeney successfully identified the medical records of most people.

From the above discussion, traditional data privacy is exposed by direct disclosure, and privacy preservation can be implemented via cryptography. In the big data era, privacy leakage is no longer limited to direct disclosure; it also occurs through data inference and speculation.
It is costly to apply cryptography to all data because of the huge volume of big data. Therefore, due to the special characteristics of big data, traditional privacy preservation methods cannot be applied directly to big data, and new technologies need to be put forward. In addition, when it comes to privacy preservation in big data, both unauthorized exposure and unwanted inference should be considered.
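
The inference risk discussed above, where records are re-identified from gender, birthday and zip code, can be illustrated with a minimal sketch. It is not a reconstruction of the original study. All records, names and the helper functions `quasi_key` and `link_records` are hypothetical and exist only for illustration. The sketch assumes an "anonymized" table that still carries the three attributes and a public list that pairs the same attributes with names.

```python
# Toy linkage (inference) attack. Every record below is fabricated.
QUASI_IDENTIFIERS = ("gender", "birthday", "zip")

# "Anonymized" medical data: names removed, quasi-identifiers kept.
medical_records = [
    {"gender": "F", "birthday": "1970-03-12", "zip": "02138", "diagnosis": "asthma"},
    {"gender": "M", "birthday": "1985-11-02", "zip": "02139", "diagnosis": "diabetes"},
    {"gender": "M", "birthday": "1985-11-02", "zip": "02139", "diagnosis": "flu"},
]

# Hypothetical public list that pairs names with the same attributes.
public_records = [
    {"name": "Alice Example", "gender": "F", "birthday": "1970-03-12", "zip": "02138"},
    {"name": "Bob Example", "gender": "M", "birthday": "1985-11-02", "zip": "02139"},
]


def quasi_key(record):
    """Build the join key from the quasi-identifier values of a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(anonymized, public):
    """Map each name to a diagnosis when its key matches exactly one anonymized row."""
    groups = {}
    for row in anonymized:
        groups.setdefault(quasi_key(row), []).append(row)

    reidentified = {}
    for person in public:
        matches = groups.get(quasi_key(person), [])
        if len(matches) == 1:  # a unique match means the person is re-identified
            reidentified[person["name"]] = matches[0]["diagnosis"]
    return reidentified


reidentified = link_records(medical_records, public_records)
print(reidentified)  # {'Alice Example': 'asthma'}
```

In this toy data, no field is disclosed directly and nothing is decrypted, yet one person's diagnosis is recovered purely by joining two tables. The second person matches two medical rows and therefore stays ambiguous, which suggests why the uniqueness of such attribute combinations matters when unwanted inference is considered.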