Federated Learning Notes

Expand the set of products we can build, and expand the set of ML models we can train.

Goals of ML and privacy are well aligned -- reduce overfitting and increase generalization

Differential privacy - provable bound on how much any single example can influence the model, i.e. a way to prove that we're not overfitting to it

FL - focused update from each data source; ephemeral; only global model persistent
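The round structure in the note above can be sketched roughly as follows. This is a minimal single-process simulation assuming a one-parameter linear model and example-count-weighted averaging of client deltas (the federated averaging idea); the names `local_update`, `federated_round`, `train`, and `make_clients` are hypothetical and not from the talk.

```python
import random

def local_update(global_w, data, lr=0.1):
    """One SGD pass over a client's private (x, y) pairs for the model y = w * x.
    Only the weight delta is returned; the raw data stays inside this function."""
    w = global_w
    for x, y in data:
        grad = 2 * (w * x - y) * x  # gradient of the squared error
        w -= lr * grad
    return w - global_w  # the "focused update"

def federated_round(global_w, clients):
    """Average client deltas, weighted by how many examples each client holds.
    The deltas are ephemeral: they are combined and then discarded."""
    total = sum(len(d) for d in clients)
    avg_delta = sum(len(d) * local_update(global_w, d) for d in clients) / total
    return global_w + avg_delta  # only the global model persists

def train(clients, rounds=50):
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w

def make_clients(n_clients=5, n_points=20, true_w=3.0, seed=0):
    """Synthetic, noise-free data split across clients (an assumption for the demo)."""
    rng = random.Random(seed)
    clients = []
    for _ in range(n_clients):
        xs = [rng.uniform(-1, 1) for _ in range(n_points)]
        clients.append([(x, true_w * x) for x in xs])
    return clients
```

Calling `train(make_clients())` should recover a weight close to the `true_w` used to generate the data. A production system would add device sampling, dropout handling, and compression, none of which is modeled here.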

Differences

Adapt to the constantly changing use of language in the real world

Google keyboard next word prediction
Setting search (depending on which app / screen user is in)
Google search recommendations

FL in production

FL trains slower (network connection, availability of devices, etc)
FL-trained models can be as accurate as non-FL models
Add context features that will make model much more useful in production
May be able to surpass what you were doing before (traditional ML not applicable because data can't be aggregated)

Some IP can't be protected

Models used are on device for prediction

FL memorization issues

FL + DP

FL and DP both work with limited access to data
FL gives access to all of a particular user's data in one place (that user's device), which helps with DP (it lets us bound the contribution of any one user's data)
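One common way to bound each user's contribution is to clip the L2 norm of the user's update and add Gaussian noise to the sum before averaging. The sketch below assumes that approach; `clip_norm` and `noise_multiplier` are hypothetical parameter names, and the code does not compute the resulting privacy budget.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale a user's update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def dp_average(user_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each user's update, sum, add Gaussian noise, then average.
    Clipping bounds how much any single user can move the sum;
    the noise standard deviation is noise_multiplier * clip_norm (an assumption)."""
    rng = rng or random.Random()
    n = len(user_updates)
    dim = len(user_updates[0])
    clipped = [clip_update(u, clip_norm) for u in user_updates]
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]
```

Because the whole per-user update is clipped, the bound applies to everything a user contributes in a round rather than to a single example.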

Tests can empirically measure how much data the model is memorizing

Secure aggregation

Limit amount of information that the server can learn while it's aggregating the updates
Cryptographically guarantee that the server can't learn any one user's update
Each user sends an encrypted update that includes a noise mask so by itself each of those updates is gibberish
If enough users contribute updates (set by a parameter), then server can decrypt the sum or average of those updates
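The masking idea can be illustrated with a toy simulation. It assumes pairwise additive masks over integers modulo a fixed value, with a seeded generator standing in for per-pair key agreement; real protocols also handle users who drop out, which this sketch does not. All names (`pairwise_masks`, `mask_update`, `server_sum`, `MODULUS`) are hypothetical.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value

def pairwise_masks(user_ids, dim, seed=0):
    """For each pair of users, draw a shared random vector.
    The lower id adds it and the higher id subtracts it, so masks cancel in the sum."""
    rng = random.Random(seed)  # stand-in for secrets agreed between each pair
    ids = sorted(user_ids)
    masks = {u: [0] * dim for u in ids}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[ids[a]][k] = (masks[ids[a]][k] + shared[k]) % MODULUS
                masks[ids[b]][k] = (masks[ids[b]][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """What a user sends: the (non-negative integer) update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def server_sum(masked_updates, expected_users, threshold):
    """Sum masked updates; only the total is recoverable, not any single update."""
    if len(masked_updates) < threshold:
        raise ValueError("too few users contributed")
    if set(masked_updates) != set(expected_users):
        raise ValueError("masks only cancel if every masked user reports")
    dim = len(next(iter(masked_updates.values())))
    return [sum(v[k] for v in masked_updates.values()) % MODULUS for k in range(dim)]
```

For example, with updates `{1: [1, 2], 2: [3, 4], 3: [5, 6]}`, masking each one and passing the results to `server_sum` with `threshold=3` yields `[9, 12]`, while each individual masked vector is offset by a random mask.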
