This paper proposes a new privacy-preserving scheme for estimating the size of the intersection of two secret subsets. Given the inner product of the Bloom filters (BFs) of the two sets, the proposed scheme applies Bayesian estimation, assuming a beta distribution as the prior for the size to be estimated. The BF keeps the communication complexity low, and the Bayesian estimation improves the estimation accuracy. A possible application of the proposed protocol is an epidemiological dataset with two attributes, Helicobacter pylori infection and stomach cancer. Assuming that information on Helicobacter pylori infection and on stomach cancer is collected separately, the protocol demonstrates that a χ²-test can be performed without disclosing the contents of the two confidential databases.
- Bloom filter
- Privacy-preserving data mining
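
To make the quantity described in the abstract concrete, the following minimal sketch shows how the inner product of two Bloom filters carries information about the intersection size. It is not the Bayesian estimator proposed in the paper: it swaps in a plain method-of-moments inversion of the expected inner product, and it computes the inner product in the clear, with no privacy protection. The function names, the SHA-256-based hashing, and the parameters `m` (filter length) and `k` (number of hash functions) are illustrative assumptions.

```python
import hashlib
import math


def bloom_filter(items, m, k):
    """Build an m-bit Bloom filter with k hash functions (hypothetical helper).

    Hashing scheme is an assumption: SHA-256 of "index:item", reduced mod m.
    """
    bits = [0] * m
    for item in items:
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def inner_product(bf_a, bf_b):
    """Number of positions where both filters have a 1."""
    return sum(a & b for a, b in zip(bf_a, bf_b))


def estimate_intersection(ip, n_a, n_b, m, k):
    """Method-of-moments estimate of |A ∩ B| from the BF inner product.

    With q = 1 - 1/m, the expected inner product is
        m * (1 - q**(k*n_a) - q**(k*n_b) + q**(k*(n_a + n_b - x))),
    where x is the intersection size. Solving for x gives the estimate.
    This is NOT the paper's Bayesian estimator with a beta prior.
    """
    q = 1.0 - 1.0 / m
    arg = ip / m - 1.0 + q ** (k * n_a) + q ** (k * n_b)
    if arg <= 0.0:
        raise ValueError("filters too saturated to invert the expectation")
    union_estimate = math.log(arg) / (k * math.log(q))
    return n_a + n_b - union_estimate


# Illustrative data: two sets of 1000 identifiers sharing 300 elements.
set_a = range(0, 1000)
set_b = range(700, 1700)
m, k = 16384, 3

ip = inner_product(bloom_filter(set_a, m, k), bloom_filter(set_b, m, k))
print(estimate_intersection(ip, len(set_a), len(set_b), m, k))
```

In a privacy-preserving setting the inner product would be obtained through a secure protocol rather than by exchanging the filters, and the point estimate above would be replaced by the posterior obtained under the beta prior.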