TransWikia.com

Laplace mechanism on vector record?

Cross Validated Asked on December 29, 2021

Does the definition of neighboring database in differential privacy capture the multi-dimensional record?

Say we have a database domain $\mathbb{N}^{n\times d}$, where $n$ is the number of records and $d$ is the number of attributes in each record.
Assuming there are no missing values, let $x, y \in \mathbb{N}^{n\times d}$ be two databases that differ in at most one record.

Can we say that $x,y$ are two neighboring databases?
If so, how can we bound the $\ell_1$ sensitivity of a query, say a simple average query?

For convenience, $x$ and $y$ are written out below.
$$
x = \begin{pmatrix}
x_{1,1} & x_{1,2} & \ldots & x_{1,d} \\
\vdots & & & \vdots \\
x_{n,1} & x_{n,2} & \ldots & x_{n,d}
\end{pmatrix}, \quad
y = \begin{pmatrix}
x_{1,1} & x_{1,2} & \ldots & x_{1,d} \\
\vdots & & & \vdots \\
y_{n,1} & y_{n,2} & \ldots & y_{n,d}
\end{pmatrix}
$$

One Answer

The $\ell_1$ sensitivity in DP, which depends on the target query $Q$, is defined as: $$\Delta(Q) = \max_{x,y:\ \|x-y\|_1 \leq 1} \|Q(x)-Q(y)\|_1$$

First, note that the neighboring condition $\|x-y\|_1 \leq 1$ can mean two things, and you have to be careful about which one you mean because they lead to different results: (i) $y$ has one more record than $x$, or (ii) $x$ and $y$ have the same number of records and differ in exactly one record. Usually, case (i) is meant.
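To see that the two readings give different numbers, here is a small brute-force sketch for a single column of integers in a known range `[lo, hi]` and a fixed starting size `n`. The function names and the toy domain are hypothetical, chosen only for illustration; the loops simply enumerate every database and every neighbor.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def brute_force_replace_sensitivity(lo, hi, n):
    """Case (ii): largest change in the mean when one of n records is replaced."""
    domain = range(lo, hi + 1)
    worst = 0.0
    for x in product(domain, repeat=n):
        for i in range(n):
            for v in domain:
                y = x[:i] + (v,) + x[i + 1:]  # same size, record i replaced
                worst = max(worst, abs(mean(x) - mean(y)))
    return worst


def brute_force_add_sensitivity(lo, hi, n):
    """Case (i): largest change in the mean when one record is added to n records."""
    domain = range(lo, hi + 1)
    worst = 0.0
    for x in product(domain, repeat=n):
        for v in domain:
            y = x + (v,)  # one extra record
            worst = max(worst, abs(mean(x) - mean(y)))
    return worst


if __name__ == "__main__":
    print(brute_force_replace_sensitivity(0, 4, 3))
    print(brute_force_add_sensitivity(0, 4, 3))
```

For this toy setting the replace-one value agrees with $(hi - lo)/n$ and the add-one value with $(hi - lo)/(n+1)$. Note that under case (i) the size $n$ is itself private and not fixed, so in practice the average is often handled differently (for example by bounding a sum and a count separately); the fixed-$n$ loop here is only meant to show that the two definitions do not coincide.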

Second, it is important to know which columns of the dataset $Q$ performs its computation (e.g. the average) on. In most datasets, $d-1$ columns are non-sensitive attributes and only $1$ column is sensitive. For example, in a dataset of patients who may or may not have a specific disease, columns like age, gender, etc. are usually only used by $Q$ to select a subset of the dataset, and then the average of the disease column is calculated. Hence, the term $\|Q(x)-Q(y)\|_1$ is calculated only from the values of the sensitive columns.

Third, you need to know the range (min and max) of every sensitive column. For example, if you have two sensitive columns $x_1$ and $x_2$, what are the min and max values of each? Moreover, how does your query operate on them: do you release the average of both together, or the average of each column separately?
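As a worked instance for the setting in the question, assume definition (ii) (same $n$, only record $n$ differs), that every column $j$ is sensitive with a known range $[a_j, b_j]$, and that $Q$ releases the vector of all $d$ column averages. These are assumptions made for illustration, not something the definition itself fixes. Since $y_{i,j} = x_{i,j}$ for all $i < n$:

$$
\|Q(x)-Q(y)\|_1
= \sum_{j=1}^{d} \left| \frac{1}{n}\sum_{i=1}^{n} x_{i,j} - \frac{1}{n}\sum_{i=1}^{n} y_{i,j} \right|
= \frac{1}{n}\sum_{j=1}^{d} \left| x_{n,j} - y_{n,j} \right|
\leq \frac{1}{n}\sum_{j=1}^{d} (b_j - a_j)
$$

The bound is attained when the differing record sits at opposite ends of every range, so under these assumptions $\Delta(Q) = \frac{1}{n}\sum_{j=1}^{d}(b_j - a_j)$. With a single sensitive column it reduces to $(b-a)/n$, and if each column average is released as a separate query, each one has its own sensitivity $(b_j - a_j)/n$.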

So, in the end, you need to carefully understand the exact design of the system at hand; then applying the mathematical definition above is straightforward.
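Once those design choices are fixed, the Laplace mechanism itself is short. The following is a minimal sketch, not a vetted implementation, assuming replace-one neighbors (so $n$ is public), publicly known per-column bounds, and a query that releases all sensitive column averages together, in which case the $\ell_1$ sensitivity is the sum of the column ranges divided by $n$. The function name `private_column_means`, its parameters, and the clipping step are hypothetical additions for illustration.

```python
import random


def private_column_means(rows, bounds, epsilon, rng=None):
    """Release the mean of each sensitive column with Laplace noise.

    rows:    list of records, each a list with one value per sensitive column
    bounds:  list of (lo, hi) per column, assumed to be public knowledge
    epsilon: privacy budget for the whole vector of means
    Assumes replace-one neighbors, so n = len(rows) is fixed and public.
    """
    rng = rng or random.Random()
    n = len(rows)
    # l1 sensitivity of the vector of means: sum of column ranges over n.
    sensitivity = sum(hi - lo for lo, hi in bounds) / n
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon

    noisy_means = []
    for j, (lo, hi) in enumerate(bounds):
        # Clip so the assumed range actually holds (an added safeguard).
        clipped = [min(max(row[j], lo), hi) for row in rows]
        true_mean = sum(clipped) / n
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and scale `scale`.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy_means.append(true_mean + noise)
    return noisy_means, sensitivity


if __name__ == "__main__":
    rows = [[1, 10], [3, 20], [5, 30], [7, 40]]
    bounds = [(0, 8), (0, 40)]
    print(private_column_means(rows, bounds, epsilon=1.0))
```

If the averages are instead released as separate queries, each column would get its own sensitivity $(b_j - a_j)/n$ and its own share of the privacy budget, which is exactly the design question raised above. The standard-library `random` module is used here only to keep the sketch self-contained; it is not a cryptographically secure noise source.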

Answered by moh on December 29, 2021
