Udacity Course

Differential Privacy

Types of DP

Local DP

Coin flip jaywalking example

Each person is now protected with plausible deniability

If we collect many samples and 60% answer yes, the true rate must be 70%: half the respondents answer honestly (70% yes) and half answer by a random coin flip (50% yes), and 70% averaged with 50% gives the 60% we observed.

NB: This privacy technique comes at the cost of accuracy, especially when we only have a few samples. The greater the privacy protection (plausible deniability), the less accurate the results.
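The coin-flip mechanism above can be simulated directly. A minimal sketch — the function names and the 70% ground-truth rate are illustrative, not from the course:

```python
import random

def randomized_response(truth: bool) -> bool:
    """First coin flip: heads -> answer honestly.
    Tails -> answer according to a second coin flip."""
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

def estimate_true_rate(responses) -> float:
    """Invert the mechanism: observed = 0.5 * true + 0.25."""
    observed = sum(responses) / len(responses)
    return 2 * observed - 0.5

random.seed(0)
true_rate = 0.7  # illustrative ground truth for the simulation
people = [random.random() < true_rate for _ in range(100_000)]
responses = [randomized_response(p) for p in people]
print(round(estimate_true_rate(responses), 2))  # should land close to 0.7
```

With few samples the estimate swings wildly, which is exactly the accuracy cost noted above.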


Types of Noise

How much noise to add?

Laplacian Noise
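For a numeric query, the standard answer to "how much noise?" is Laplacian noise with scale sensitivity / epsilon. A sketch assuming a count query (sensitivity 1); the epsilon value is illustrative:

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace(0, scale)
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def laplace_mechanism(query_result: float, sensitivity: float, epsilon: float) -> float:
    """Release query_result with epsilon-DP by adding Laplace(sensitivity / epsilon) noise."""
    return query_result + laplace_noise(sensitivity / epsilon)

db = [1, 0, 1, 1, 0, 1, 1, 0]
true_count = sum(db)  # a count over a binary db has sensitivity 1
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))  # noisy count, scale b = 2
```

Lower epsilon means larger noise and stronger privacy — the same trade-off as in the coin-flip example.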


Perfect Privacy (AI model)

Training a model on a dataset should return the same model even if we remove any person from the training set.

Training a model is kind of like querying a database

Two points of complexity
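The "remove any person" idea can be made concrete with parallel databases: run the query on the full database and on every copy with one person removed, and take the largest change. A minimal sketch — the `sensitivity` helper is mine, not from the course:

```python
def sensitivity(query, db):
    """Max change in the query result when any single person is
    removed (each 'parallel database' vs. the full database)."""
    full = query(db)
    return max(abs(full - query(db[:i] + db[i + 1:])) for i in range(len(db)))

db = [1, 0, 1, 1, 0, 1]
print(sensitivity(sum, db))  # a count over a binary db changes by at most 1
```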


Hospital Scenario

Steps

  1. Ask each hospital to train a model on their own dataset (10 models generated)
  2. Use each model to predict on your own local dataset, generating 10 labels for each datapoint
  3. Perform a DP query to generate the final (DP) label for each datapoint: count how often each label appears across the 10 predictions, add Laplacian noise to each count, then take the label with the highest noisy count (report noisy max)
  4. Retrain a new model on your local dataset which now has DP labels
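Step 3 above (the noisy-max aggregation) can be sketched as follows; the helper names, the example votes, and the epsilon value are illustrative:

```python
import random
from collections import Counter

def laplace_noise(scale: float) -> float:
    # Difference of two i.i.d. exponentials is Laplace(0, scale)
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_max_label(teacher_labels, epsilon: float):
    """Count each hospital model's vote, add Laplace(1/epsilon) noise to
    each count (vote counts have sensitivity 1), and return the argmax."""
    counts = Counter(teacher_labels)
    noisy = {label: c + laplace_noise(1 / epsilon) for label, c in counts.items()}
    return max(noisy, key=noisy.get)

# 10 hospital models voted on one local datapoint
votes = ["flu", "flu", "flu", "cold", "flu", "flu", "cold", "flu", "flu", "flu"]
print(noisy_max_label(votes, epsilon=0.5))
```

Repeating this per datapoint yields the DP labels used to retrain the local model in step 4.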

Misc Notes

https://arxiv.org/abs/1607.00133

Copyright / Privacy can cause implicit bias issues

https://shows.pippa.io/ipse-dixit/episodes/amanda-levendowski-on-copyright-ais-implicit-bias-problem

https://www.wired.com/2014/11/hacker-lexicon-homomorphic-encryption/

Privacy: the ability to control information about your life, and other people's access to that information.

OpenAI interview on privacy: https://www.youtube.com/watch?v=by08lyQ18EA