  • [MUSIC PLAYING]

  • EMILY GLANZ: Hi, everyone.

  • Thanks for joining us today.

  • I'm Emily, a software engineer on Google's federated learning

  • team.

  • DANIEL RAMAGE: And I'm Dan.

  • I'm a research scientist and the team lead.

  • We'll be talking today about Federated Learning-- machine

  • learning on decentralized data.

  • The goal of federated learning is to enable edge devices to do

  • state-of-the-art machine learning without centralizing

  • data and with privacy by default. And, with privacy,

  • what we mean is that we have an aspiration that app developers,

  • centralized servers, and models themselves learn common

  • patterns only.

  • That's really what we mean by privacy.

  • In today's talk, we'll talk about decentralized data, what

  • it means to work with decentralized data

  • in a centralized fashion.

  • That's what we call federated computation.

  • We'll talk a bit about learning on decentralized data.

  • And then we'll give you an introduction

  • to TensorFlow Federated, which is a way that you

  • can experiment with federated computations in simulation

  • today.

  • Along the way, we'll introduce a few privacy principles,

  • like ephemeral reports, and privacy technologies,

  • like federated model averaging, that embody those principles.

  • All right, let's start with decentralized data.

  • A lot of data is born at the edge,

  • with billions of phones and IoT devices that generate data.

  • That data can enable better products and smarter models.

  • You saw in yesterday's keynote a lot of ways

  • that that data can be used locally

  • at the edge, with on-device inference,

  • such as the automatic captioning and next generation assistant.

  • On-device inference offers improvements to latency,

  • lets things work offline, often has battery life advantages,

  • and can also have some substantial privacy advantages

  • because a server doesn't need to be

  • in the loop for every interaction you

  • have with that locally-generated data.

  • But if you don't have a server in the loop,

  • how do you answer analytics questions?

  • How do you continue to improve models based on the data

  • that those edge devices have?

  • That's really what we'll be looking at in the context

  • of federated learning.

  • And the app we'll be focusing on today

  • is Gboard, which is Google's mobile keyboard.

  • People don't think much about their keyboards,

  • but they spend hours on them each day.

  • And typing on a mobile keyboard is 40% slower

  • than on a physical one.

  • It is easier to share cute stickers, though.

  • Gboard uses machine-learned models

  • for almost every aspect of the typing experience.

  • Tap typing and gesture typing both depend on models

  • because fingers are a little bit wider than the key targets,

  • and you can't just rely on people hitting

  • exactly the right keystrokes.

  • Similarly, auto-corrections and predictions

  • are powered by learned models, as well as voice

  • to text and other aspects of the experience.

  • All these models run on device, of course,

  • because your keyboard needs to be able to work

  • offline and quickly.

  • For the last few years, our team has

  • been working with the Gboard team

  • to experiment with decentralized data.

  • Gboard aims to be the best and most privacy forward keyboard

  • available.

  • And one of the ways that we're aiming to do that

  • is by making use of an on-device cache of local interactions.

  • This would be things like touch points, typed text, context,

  • and more.

  • This data is used exclusively for federated learning

  • and computation.

  • EMILY GLANZ: Cool.

  • Let's jump into federated computation.

  • Federated computation is basically

  • a MapReduce for decentralized data

  • with privacy-preserving aggregation built in.

  • Let's introduce some of the key concepts

  • of federated computations using a simpler example than Gboard.

  • So here we have our clients.

  • This is a set of devices--

  • things like cell phones, sensors, et cetera.

  • Each device has its own data.

  • In this case, let's imagine it's the maximum temperature

  • that that device saw that day, which

  • gets us to our first privacy technology--

  • on-device data sets.

  • Each device keeps the raw data local,

  • and this comes with some obligations.

  • Each device is responsible for data asset management

  • locally, with things like expiring old data

  • and ensuring that the data is encrypted when it's not in use.
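These local obligations can be sketched in a few lines of Python. The `OnDeviceStore` class and its methods are hypothetical, purely for illustration; encryption at rest is assumed to be handled by the platform and is omitted here.

```python
class OnDeviceStore:
    """Illustrative on-device example cache (hypothetical API).

    Raw data never leaves the device; the store's only jobs here are
    appending new examples and expiring old ones.
    """

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self._examples = []  # list of (timestamp, example) pairs

    def append(self, example, now):
        self._examples.append((now, example))

    def examples(self, now):
        # Drop anything older than the retention window before serving.
        self._examples = [(t, ex) for t, ex in self._examples
                          if now - t <= self.max_age_seconds]
        return [ex for _, ex in self._examples]

store = OnDeviceStore(max_age_seconds=7 * 24 * 3600)  # keep one week
store.append({"max_temp_f": 71.0}, now=0)
store.append({"max_temp_f": 68.0}, now=30 * 24 * 3600)  # a month later
print(store.examples(now=30 * 24 * 3600))  # → [{'max_temp_f': 68.0}]
```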

  • So how do we get the average maximum temperature

  • experienced by our devices?

  • Let's imagine we had a way to only

  • communicate the average of all client data items

  • to the server.

  • Conceptually, we'd like to compute an aggregate

  • over the distributed data in a secure and private way, which

  • we'll build up to throughout this talk.

  • So now let's walk through an example

  • where the engineer wants to answer a specific question

  • of the decentralized data, like what fraction of users

  • saw a daily high over 70 degrees Fahrenheit.

  • The first step would be for the engineer

  • to input this threshold to the server.

  • Next, this threshold would then be

  • broadcast to the subset of available devices

  • the server has chosen to participate

  • in this round of federated computation.

  • This threshold is then compared to the local temperature data

  • to compute a value.

  • And this is going to be a 1 or a 0,

  • depending on whether the temperature was greater

  • than that threshold.

  • Cool.

  • So these values would then be aggregated

  • using an aggregation operator.

  • In this case, it's a federated mean,

  • which encodes a protocol for computing the average value

  • over the participating devices.

  • The server is responsible for collating device reports

  • throughout the round and emitting this aggregate, which

  • contains the answer to the engineer's question.
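One round of this temperature computation can be sketched in plain Python. The function names and the random selection strategy are illustrative assumptions, not the production protocol; only the 0/1 report ever leaves a device.

```python
import random

def client_update(local_max_temp, threshold):
    # Runs on each device: compare local data to the broadcast threshold.
    return 1.0 if local_max_temp > threshold else 0.0

def federated_round(device_temps, threshold, num_selected):
    # Server chooses a subset of available devices for this round.
    selected = random.sample(device_temps, num_selected)
    # Each device computes its report locally; the raw temperature
    # is never communicated.
    reports = [client_update(t, threshold) for t in selected]
    # Federated mean: the server persists only this aggregate.
    return sum(reports) / len(reports)

random.seed(0)
temps = [65.0, 72.5, 80.1, 68.9, 74.3, 59.0, 77.7, 70.2]
fraction_over_70 = federated_round(temps, threshold=70.0, num_selected=4)
print(fraction_over_70)
```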

  • So this demonstrates our second privacy technology

  • of federated aggregation.

  • The server is combining reports from multiple devices

  • and only persisting the aggregate, which

  • now leads into our first privacy principle of only-in-aggregate.

  • Performing that federated aggregation only

  • makes the final aggregate data, those sums and averages

  • over the device reports, available to the engineer,

  • without giving them access to an individual report itself.

  • So this now ties into our second privacy

  • principle of ephemeral reports.

  • We don't need to keep those per-device messages

  • after they've been aggregated, so what

  • we collect only stays around for as long as we need it

  • and can be immediately discarded.

  • In practice, what we've just shown

  • is a round of computation.

  • This server will repeat this process multiple times

  • to get a better estimate to the engineer's question.

  • It repeats this multiple times because some devices may not

  • be available at the time of computation

  • or some of the devices may have dropped out during this round.

  • DANIEL RAMAGE: So what's different

  • between federated computation and distributed computation

  • in the data center with things like MapReduce?

  • Federated computation has challenges

  • that go beyond what we usually experience

  • in distributed computation.

  • Edge devices like phones tend to have limited communication

  • bandwidth, even when they're connected

  • to a home Wi-Fi network.

  • They're also intermittently available because the devices

  • will generally participate only if they are idle, charging,

  • and on an unmetered network.

  • And because each compute node keeps

  • the only copy of its data, the data itself

  • has intermittent availability.

  • Finally, devices participate only

  • with the user's permission, depending on an app's policies.

  • Another difference is that in a federated setting,

  • it is much more distributed than a traditional data

  • center distributed computation.

  • So to give you a sense of orders of magnitude, usually

  • in a data center, you might be looking

  • at thousands or maybe tens of thousands

  • of compute nodes, where this federated setting might

  • have something like a billion compute nodes.

  • Maybe something like 10 million are

  • available at any given time.

  • Something like 1,000 are selected

  • for a given round of computation,

  • and maybe 50 drop out.

  • That's just kind of a rough sense of the scales

  • that we're interested in supporting.

  • And, of course, as Emily mentioned,

  • privacy preserving aggregation is kind of

  • fundamental to the way that we think

  • about federated computation.

  • So given this set of differences,

  • what does it actually look like when you

  • run a computation in practice?

  • This is a graph of the round completion

  • rate by hour over the course of three days for a Gboard model

  • that was trained in the United States.

  • You see this periodic structure of peaks and troughs, which

  • represent day versus night.

  • Because devices are only participating when they're

  • otherwise idle and charging, this

  • represents that the peaks of round completion rate

  • are when more devices are plugged in,

  • which is usually when they're charging on someone's

  • nightstand as they sleep.

  • Rounds complete faster when more devices are available.

  • And the device availability can change over

  • the course of the day.

  • That, in turn, implies a dynamic data availability

  • because the data itself might be slightly

  • different from the users who plug in phones at night

  • versus the day, which is something

  • that we'll get back to when we talk about federated learning

  • in particular.

  • Let's take a more in-depth example of what a federated

  • computation looks like--

  • the relative typing frequencies of common words in Gboard.

  • Typing frequencies are actually useful for improving the Gboard

  • experience in a few ways.

  • If someone has typed the letters H-I, "hi"

  • is much, much more likely than "hieroglyphic."

  • And so knowing those relative word frequencies

  • allows the Gboard team to make the product better.

  • How would we compute these relative typing frequencies

  • as a federated computation?

  • Instead of specifying a single threshold,

  • the engineer would now specify

  • something like a snippet of code

  • that's going to run on each edge device.

  • And in practice, that will often be something that's actually

  • in TensorFlow, but for here, I've

  • written it as Python-esque pseudocode.

  • So think of that device data as each device's record

  • of what was typed in recent sessions on the phone.

  • So for each word in that device data,

  • if the word is in one of the common words we're

  • trying to count, we'll increase its count

  • when the local device updates.

  • That little program is what would be shipped to the edge

  • and run locally to compute a little map that

  • says that perhaps this phone typed the word "hello" 18 times

  • and "world" 0 times.

  • That update would then be encoded as a vector.

  • Here, the first element of the vector

  • would represent the count for "hello"

  • and the second one for the count for "world,"

  • which would then be combined and summed

  • using the federated aggregation operators that Emily mentioned

  • before.
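A minimal Python sketch of this word-counting computation, with hypothetical `local_update` and `federated_sum` helpers standing in for the real TensorFlow code:

```python
COMMON_WORDS = ["hello", "world"]

def local_update(device_data, common_words):
    # Runs on each device over its local typing history; the update is
    # a fixed-length vector of counts, one slot per tracked word.
    counts = [0] * len(common_words)
    for word in device_data:
        if word in common_words:
            counts[common_words.index(word)] += 1
    return counts

def federated_sum(updates):
    # Server-side aggregation: element-wise sum over device reports.
    return [sum(col) for col in zip(*updates)]

device_updates = [
    local_update(["hello", "hello", "there"], COMMON_WORDS),
    local_update(["world", "hello"], COMMON_WORDS),
]
print(federated_sum(device_updates))  # → [3, 1]
```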

  • At the server, the engineer would see the counts

  • from all the devices that have participated in that round,

  • not from any single device, which

  • brings up a third privacy principle

  • of focused collection.

  • Devices report only what is needed

  • for this specific computation.

  • There's a lot more richness in the on-device data

  • set that's not being shared.

  • And if the analyst wanted to ask a different question,

  • for example, counting a different set of words,

  • they would run a different computation.

  • This would then repeat over multiple rounds,

  • getting the aggregate counts higher and higher, which

  • in turn would give us better and better estimates

  • of the relative frequencies of the words typed

  • across the population.

  • EMILY GLANZ: Awesome.

  • Let's talk about our third privacy technology

  • of secure aggregation.

  • In the previous example, we saw how this server only

  • needs to emit the sum of vectors reported by the devices.

  • The server could compute this sum from the device reports

  • directly, but we've been researching ways

  • to provide even stronger guarantees.

  • Can we make it so the server itself cannot inspect

  • individual reports?

  • That is, how do we enforce the only-in-aggregate privacy

  • principle we saw before in our technical implementation?

  • Secure aggregation is an optional extension

  • to the client/server protocol that embodies this privacy

  • principle.

  • Here's how it works.

  • So this is a simplified overview that

  • demonstrates the key idea of how a server can compute

  • a sum without being able to decrypt

  • the individual messages.

  • In practice, the protocol must also handle phones

  • that drop out partway through a round.

  • See the paper for details.

  • Awesome.

  • So let's jump into this.

  • Through coordination by the server,

  • two devices are going to agree upon a pair of large masks

  • that when summed add to 0.

  • Each device will add these masks to their vectors

  • before reporting.

  • All devices that are participating

  • in this round of computation will

  • exchange these zero-sum pairs.

  • Reports will be completely masked by these values,

  • so that the added pairs

  • make each individual report look randomized.

  • But when aggregated together, the pairs cancel out,

  • and we're left with only the sum we were looking for.

  • In practice, again, this protocol

  • is more complicated to handle dropout.
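The mask-cancellation idea can be sketched in a few lines of Python. This shows only the key idea; the real protocol additionally handles dropouts and key agreement, and operates over finite fields rather than floats (see the secure aggregation paper).

```python
import random
from itertools import combinations

def mask_reports(reports, rng):
    # Each pair of devices agrees on a large random mask; one adds it,
    # the other subtracts it, so each pair of masks sums to zero.
    masked = list(reports)
    for i, j in combinations(range(len(reports)), 2):
        mask = rng.uniform(-1e6, 1e6)
        masked[i] += mask
        masked[j] -= mask
    return masked

rng = random.Random(42)
reports = [18.0, 0.0, 7.0]           # e.g. per-device word counts
masked = mask_reports(reports, rng)  # individually these look random
total = sum(masked)                  # but the masks cancel in the sum
print(round(total, 6))  # → 25.0
```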

  • So we showed you what you can do with federated computation.

  • But what about the much more complex workflows associated

  • with federated learning?

  • Before we jump into federated learning,

  • let's look at the typical workflow

  • a model engineer who's performing machine learning

  • would go through.

  • Typically, they'll have some data in the cloud

  • where they start training and evaluation jobs, potentially

  • in grids to experiment with different hyperparameters,

  • and they'll monitor how well these different jobs are

  • performing.

  • They'll end up with a model that will

  • be a good fit for the distribution of cloud data

  • that's available.

  • So how does this workflow translate

  • into a federated learning workflow?

  • Well, the model engineer might still

  • have some data in the cloud, but now this

  • is proxy data that's similar to the on-device data.

  • This proxy data might be useful for training and evaluating

  • in advance, but our main training loop

  • is now going to take place on our decentralized data.

  • The model engineer will still do things

  • that are typical of a machine learning workflow,

  • like starting and stopping tasks, trying out

  • different learning rates or different hyperparameters,

  • and monitoring their performance as training is occurring.

  • If the model performs well on that decentralized data set,

  • the model engineer now has a good release candidate.

  • They'll evaluate this release candidate

  • using whatever validation techniques they typically

  • use before deploying to users.

  • These are things you can do with TFX's ModelValidator.

  • They'll distribute this final model for on-device inference

  • with TensorFlow Lite after validation,

  • perhaps with a rollout or A/B testing.

  • This deployment workflow is a step

  • that comes after federated learning once they

  • have a model that works well.

  • Note that the model does not continue

  • to train after it's been deployed for inference

  • on device unless the model engineer is

  • doing something more advanced, like on-device personalization.

  • So how does this federated learning part work itself?

  • If a device is idle and charging,

  • it will check into the server.

  • And most of the time, it's going to be told

  • to go away and come back later.

  • But some of the time, the server will have work to do.

  • The initial model as dictated by the model engineer

  • is going to be sent to the phone.

  • For the initial model, usually 0s or a random initialization

  • is sufficient.

  • Or if they have some of that relevant proxy data

  • in the cloud, they can also use a pre-trained model.

  • The client computes an update to the model using

  • their own local training data.

  • Only this update is then sent to the server

  • to be aggregated, not the raw data.

  • Other devices are participating in this round,

  • as well, performing their own local updates to the model.

  • Some of the clients may drop out before reporting their update,

  • but this is OK.

  • The server will aggregate user updates into a new model

  • by averaging the model updates, optionally

  • using secure aggregation.

  • The updates are ephemeral and will be discarded after use.

  • The engineer will be monitoring the performance

  • of federated training through metrics

  • that are themselves aggregated along with the model.

  • Training rounds will continue until the engineer is

  • happy with model performance.

  • A different subset of devices is chosen by the server

  • and given the new model parameters.

  • This is an iterative process and will continue

  • through many training rounds.

  • So what we've just described is our fourth privacy technology

  • of federated model averaging.

  • Our diagram showed federated averaging

  • as the flavor of aggregation performed

  • by the server for distributed machine learning.

  • Federated averaging works by computing

  • a data-weighted average of the model updates

  • from many steps of gradient descent on the device.

  • Other federated optimization techniques could be used.
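Federated model averaging can be illustrated on a toy one-parameter model, y = w·x with squared loss. This is a hedged, plain-Python sketch with arbitrary learning rates and step counts, not the production algorithm; note the data-weighted average over unbalanced client datasets.

```python
def local_sgd(w, data, lr=0.01, steps=5):
    # Several steps of gradient descent on one device's local data.
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

def federated_averaging_round(w_global, client_datasets):
    # Each client trains locally from the broadcast model; the server
    # combines the resulting models with a data-weighted average.
    client_models = [local_sgd(w_global, data) for data in client_datasets]
    weights = [len(data) for data in client_datasets]
    return sum(m * n for m, n in zip(client_models, weights)) / sum(weights)

# Every device's data follows y = 3x, with unbalanced dataset sizes.
clients = [
    [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)],  # heavy user
    [(1.0, 3.0)],                          # light user
]
w = 0.0
for _ in range(50):  # many rounds, different from steps of local SGD
    w = federated_averaging_round(w, clients)
print(round(w, 3))  # → 3.0
```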

  • DANIEL RAMAGE: So what's different

  • between federated learning and traditional distributed

  • learning inside a data center?

  • Well, it's all the differences that we

  • saw with federated computation plus some additional ones that

  • are learning specific.

  • For example, the data sets in a data center

  • are usually balanced in size.

  • Most compute nodes will have a roughly equal size

  • slice of the data.

  • In the federated setting, each device has one user's data,

  • and some users might use Gboard much more than others,

  • and therefore those data set sizes might be very different.

  • Similarly, the data in federated computation

  • is very self-correlated.

  • It's not a representative sample of all users' typing.

  • Each device has only one user's data in it.

  • And many distributed training algorithms in the data center

  • make an assumption that every compute node

  • gets a representative sample of the full data set.

  • And, third, that variable data availability

  • that I mentioned earlier--

  • because the people whose phones are plugged in at night

  • versus plugged in during the day might actually be different,

  • for example, night shift workers versus day shift workers,

  • we might actually have different kinds

  • of data available at different times of day,

  • which is a potential source of bias when

  • we're training federated models and an active area of research.

  • What's exciting is the fact that federated model averaging

  • actually works well for a variety of state-of-the-art

  • models despite these differences.

  • That's an empirical result. When we started this line

  • of research, we didn't know if that would be true or if it

  • would apply widely to the kinds of state-of-the-art models that

  • teams like Gboard are interested in pursuing.

  • The fact that it does work well in practice is great news.

  • So when does federated learning apply?

  • When is it most applicable?

  • It's when the on-device data is more

  • relevant than the server-side proxy data or its privacy

  • sensitive or large in ways that would make it not make sense

  • to upload.

  • And, importantly, it works best when

  • the labels for your machine-learned algorithm

  • can be inferred naturally from user interaction.

  • So what does that naturally inferred label look like?

  • Let's take a look at some examples from Gboard.

  • Language modeling is one of the most essential models

  • that powers a bunch of Gboard experiences.

  • The key idea in language modeling

  • is to predict the next word based on typed text so far.

  • And this, of course, powers the prediction strip,

  • but it also powers other aspects of the typing experience.

  • Gboard uses the language model also

  • to help understand as you're tap typing or gesture typing which

  • words are more likely.

  • The model input in this case is the typed sequence so far,

  • and the output is whatever word the user had typed next.

  • That's what we mean by self-labeling.

  • If you take a sequence of text, you

  • can use every prefix of that text to predict the next word.

  • And so that gives a series of training examples

  • as a result of people's natural use of the keyboard itself.
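The prefix trick described above can be shown in a couple of lines of Python (a minimal sketch; a real pipeline would tokenize and batch these examples):

```python
def self_labeled_examples(words):
    # Every prefix of the typed sequence yields one training example:
    # input = the words typed so far, label = the word typed next.
    return [(words[:i], words[i]) for i in range(1, len(words))]

typed = ["the", "cat", "sat", "down"]
for prefix, next_word in self_labeled_examples(typed):
    print(prefix, "->", next_word)
```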

  • The Gboard team ran dozens of experiments

  • in order to replace their prediction strip language

  • model with a new one based on a more modern recurrent neural

  • network architecture, described in the paper linked below.

  • On the left, we see a server-trained recurrent neural

  • network compared to the old Gboard model,

  • and on the right, a federated model compared

  • to that same baseline.

  • Now, these two model architectures are identical.

  • The only difference is that one is trained in the data center

  • using the best available server-side proxy data

  • and the other was trained with federated learning.

  • Note that the newer architecture is better in both cases,

  • but the federated model actually does

  • even better than the server model,

  • and that's because the decentralized data better

  • represents what people actually type.

  • On the x-axis here for the federated model,

  • we see the training round, which is

  • how many rounds of computation did it

  • take to hit a given accuracy on the y-axis?

  • And the model tends to converge after about 1,000 rounds, which

  • is something like a week on wall clock time.

  • That's longer than in the data center,

  • where the x-axis measures the step of SGD,

  • where we get to a similar quality in about a day or two.

  • But that week long time frame is still

  • practical for machine learning engineers

  • to do their job because they can start many models in parallel

  • and work productively in this setting,

  • even though it takes a little bit longer.

  • What's the impact of that relatively small difference?

  • It's actually pretty big.

  • The next word prediction accuracy

  • improves by 25% relative.

  • And it actually makes the prediction

  • strip itself more useful.

  • Users click it about 10% more.

  • Another example that the Gboard team has been working with

  • is emoji prediction.

  • Software keyboards have a nice emoji interface

  • that you can find, but many users

  • don't know to look there or find it inconvenient.

  • And so Gboard has introduced the ability

  • to predict emoji right in line on the prediction strip, just

  • like next words.

  • And the federated model was able to learn

  • that the fire emoji is an appropriate completion

  • for "this party is lit."

  • Now, on the bottom, you can see a histogram

  • of just the overall frequency of emojis

  • that people tend to type, which has the laugh/cry emoji much

  • more represented.

  • So this is how you know that the context really

  • matters for emoji.

  • We wouldn't want to make that laugh cry emoji just the one

  • that we suggest all the time.

  • And this model ends up with 7% more accurate emoji

  • predictions.

  • And Gboard users actually click the prediction strip 4% more.

  • And I think, most importantly, there

  • are 11% more users who've discovered the joy of including

  • emoji in their texts, and untold numbers of users

  • who are receiving those wonderfully emojiful texts.

  • So far, we've focused on the text entry aspects,

  • but there are other components to where federated learning can

  • apply, such as action prediction in the UI itself.

  • Gboard isn't really just used for typing.

  • A key feature is enabling communication.

  • So much of what people type is in messaging apps,

  • and those apps can become more lively

  • when you share the perfect GIF.

  • So just helping people discover great GIFs

  • to search for and share from the keyboard at the right times

  • without getting in the way is one

  • of Gboard's differentiating product features.

  • This model was trained to predict

  • from the context so far, a query suggestion for a GIF,

  • sticker, or emoji search, and whether that suggestion is

  • actually worth showing to the user at this time.

  • An earlier iteration of this model

  • is described in the paper linked below.

  • This model actually resulted in a 47% reduction

  • in unhelpful suggestions, while simultaneously increasing

  • the overall rate of emoji, GIF and sticker shares

  • by being able to better indicate when a GIF search would

  • be appropriate, and that's what you can see in that animation.

  • As someone types "good night," that little "g"

  • turns into a little GIF icon, which

  • indicates that a good GIF is ready to share.

  • One final example that I'd like to give from Gboard

  • is the problem of discovering new words.

  • So what words are people typing that Gboard doesn't know?

  • It can be really hard to type a word

  • that the keyboard doesn't know because it will often

  • auto-correct to something that it does know.

  • And Gboard engineers can use the top typed unknown words

  • to improve the typing experience.

  • They might add new common words to the dictionary

  • in the next model release after manual review

  • or they might find out what kinds of typos

  • are common, suggesting possible fixes to other aspects

  • of the typing experience.

  • Here is a sample of words that people tend to type

  • that Gboard doesn't know.

  • How did we get this list of words

  • if we're not sharing the raw data?

  • We actually trained a recurrent network

  • to predict the sequence of characters people

  • type when they're typing words that the keyboard doesn't know.

  • And that model, just like the next word prediction model,

  • can be used to sample out words, letter by letter.

  • We then take that model in the data center, and we ask it.

  • We just generate from it.

  • We generate millions and millions of samples

  • from that model that are representative of words

  • that people are typing out in the wild.
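Letter-by-letter sampling can be illustrated with a toy character model standing in for the trained recurrent network. The bigram table below is invented for illustration; the real model conditions on the whole character sequence, not just the previous letter.

```python
import random

# Toy stand-in for the character model: maps a character to candidate
# next characters. "^" marks start of word, "$" marks end of word.
NEXT_CHARS = {
    "^": ["h", "e"],
    "h": ["a"],
    "a": ["h", "$"],
    "e": ["w"],
    "w": ["w", "$"],
}

def sample_word(model, rng, max_len=10):
    # Generate one word by repeatedly sampling the next character
    # until the end-of-word marker (or a length cap) is reached.
    word, ch = "", "^"
    for _ in range(max_len):
        ch = rng.choice(model[ch])
        if ch == "$":
            break
        word += ch
    return word

rng = random.Random(7)
samples = [sample_word(NEXT_CHARS, rng) for _ in range(5)]
print(samples)  # shapes like "hahah" and "ewww"
```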

  • And if we break these down a little bit,

  • there is a mix of things.

  • There's abbreviations, like "really" and "sorry"

  • missing their vowels.

  • There's extra letters added to "hahah" and "ewwww,"

  • often for emphasis.

  • There are typos that are common enough

  • that they show up even though Gboard likes to auto-correct

  • away from those.

  • There are new names.

  • And we also see examples of non-English words being typed

  • on an English-language keyboard, which is what this was--

  • this model was trained against US English.

  • Those non-English words actually indicate another way

  • that Gboard might improve.

  • Gboard has, of course, an experience

  • for typing in multiple languages.

  • And perhaps there's ways that that multilingual experience

  • or switching language more easily could be improved.

  • This also brings us to our fourth privacy principle,

  • which is don't memorize individuals' data.

  • We're careful in this case to use only models aggregated

  • over lots of users and trained only on out of vocabulary

  • words that have a particular flavor, such as not having

  • a sequence of digits.

  • We definitely don't want the model

  • we've trained in federated learning to be able to memorize

  • someone's credit card number.

  • And we're looking further at techniques

  • that can provide other kinds of even stronger and more provable

  • privacy properties.

  • One of those is differential privacy.

  • This is the statistical science of learning common patterns

  • in the data set without memorizing individual examples.

  • This is a field that's been around for a number of years

  • and it is very complementary to federated learning.

  • The main idea is that when you're

  • training a model with federated learning or in the data center,

  • you're going to use appropriately calibrated noise

  • that can obscure an individual's impact on the model

  • that you've learned.

  • This is something that you can experiment

  • with a little bit today in the TensorFlow privacy project,

  • which I've linked here, for more traditional data center

  • settings, where you might have all the data available

  • and you'd like to be able to use an optimizer that adds

  • the right kind of noise to be able to guarantee

  • this property, that individual examples aren't memorized.

  • The combination of differential privacy and federated learning

  • is still very fresh.

  • Google is working to bring this to production,

  • and so I'm giving you kind of a preview

  • of some of these early results.

  • Let me give you a flavor of how this works with privacy

  • technology number five--

  • differentially private model averaging,

  • which is described in the ICLR paper linked here.

  • The main idea is that in every round of federated learning,

  • just like what Emily described for a normal round,

  • an initial model will be sent to the device,

  • and that model will be trained on that device's data.

  • But here's where the first difference comes in.

  • Rather than sending that model update back

  • to the server for aggregation, the device first clips

  • the update, which is to say it makes sure

  • that the model update is limited to a maximum size.

  • And by maximum size, we actually mean in a technical sense

  • an L2 ball in parameter space.

  • Then the server will add noise when combining the device

  • updates for that round.

  • How much noise?

  • It's noise that's roughly on the same order of magnitude

  • as the maximum size that any one user is going to send.
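The two steps just described, clipping on device and noising at the server, can be sketched in plain Python. The constants here are illustrative; real deployments calibrate the noise to the clip norm and the number of users per round (see the ICLR paper).

```python
import random

def clip_update(update, clip_norm):
    # On-device step: project the model update into an L2 ball of
    # radius clip_norm, limiting any one user's contribution.
    norm = sum(u * u for u in update) ** 0.5
    if norm > clip_norm:
        update = [u * (clip_norm / norm) for u in update]
    return update

def dp_average(updates, clip_norm, noise_stddev, rng):
    # Server step: average the clipped updates, then add Gaussian
    # noise on the order of one user's maximum contribution.
    clipped = [clip_update(u, clip_norm) for u in updates]
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    return [a + rng.gauss(0.0, noise_stddev / n) for a in avg]

rng = random.Random(0)
updates = [[0.3, -0.1], [100.0, 50.0], [0.2, 0.0]]  # one outlier user
noisy_avg = dp_average(updates, clip_norm=1.0, noise_stddev=1.0, rng=rng)
print(noisy_avg)  # the outlier's influence is bounded by the clip
```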

  • With those two properties combined and properly tuned,

  • it means that any particular aspect of the updated

  • model from that round might be because some user's

  • contribution suggested that the model go

  • that direction or it might be because of the random noise.

  • That gives kind of an intuitive notion of plausible

  • deniability about whether or not any change was

  • due to a user versus the noise, but it actually

  • provides an even stronger formal property

  • that the model that you learn with differentially private

  • model averaging will be approximately the same model

  • whether or not any one user was actually

  • participating in training.

  • And a consequence of that is that if there

  • is something only one user has typed,

  • this model can't learn it.

  • We've created a production system

  • for federated computation here at Google,

  • which is what has been used by the Gboard team in the examples

  • that I've talked about today.

  • You can learn more about this in the paper

  • we published at SysML this year, "Towards Federated Learning

  • at Scale--

  • System Design."

  • Now, this system is still being used internally.

  • It's not yet a system that we expect external developers

  • to be able to use, but that's something

  • that we're certainly very interested in supporting.

  • EMILY GLANZ: Awesome.

  • We're excited to share our community project that

  • allows everyone to develop the building blocks

  • of federated computations.

  • And this is TensorFlow Federated.

  • TFF offers two APIs, the Federated Learning or FL API,

  • and the Federated Core, or FC API.

  • The FL API comes with implementations

  • of federated training and evaluation

  • that can be applied to your existing Keras models

  • so you can experiment with federated learning

  • in simulation.

  • The FC API allows you to build your own federated

  • computations.

  • And TFF also comes with a local runtime for simulations.

  • So, earlier, we showed you how federated computation

  • works conceptually.

  • Here's what this looks like in TFF.

  • So we're going to refer to these sensor readings

  • collectively as a federated value.

  • And each federated value has a type, both the placement--

  • so this is at clients--

  • and the actual type of the data items themselves, in this case a float32.

  • The server also has a federated type.

  • And, this time, we've dropped the curly braces

  • to indicate that this is one value and not many,

  • which gets us to our next concept: a distributed

  • aggregation protocol that runs between the clients

  • and the server.

  • So in this case, it's tff.federated_mean.

  • So this is a federated operator that you

  • can think of as a function, even though its inputs

  • and its outputs live in different places.

  • A federated op represents an abstract specification

  • of a distributed communication protocol.

  • So TFF provides a library of these federated operators

  • that represent the common building

  • blocks of federated protocols.

  • So now I'm going to run through a brief code example using TFF.

  • I'm not going to go too in-depth,

  • so it might look a little confusing.

  • But at the end, I'm going to put up

  • a link to a site that provides more tutorials,

  • and more walkthroughs of the code.

  • So this section of code that I have highlighted right now

  • declares our federated type that represents our input.

  • So you can see we're defining both the placement,

  • so this is at the TFF clients, and that each data

  • item is a tf.float32.

  • Next, we're passing this as an argument

  • to this special function decorator that declares

  • this a federated computation.

  • And here we're invoking our federated operator.

  • In this case, it's that tff.federated_mean on those

  • sensor readings.
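As a rough stand-in for what that snippet does, here is a toy, purely local simulation in plain Python. The `federated_mean` helper below only mimics the role of tff.federated_mean (each client holds a float32 reading, `{float32}@CLIENTS`, and the result is a single value placed at the server, `float32@SERVER`); it is not the real TFF API.

```python
# Toy local simulation of the federated computation on the slide.
# Each element of sensor_readings plays the role of one client's
# float32 reading (a federated value of type {float32}@CLIENTS).
def federated_mean(client_readings):
    # Stand-in for tff.federated_mean: a distributed aggregation
    # protocol whose single output value lives at the server.
    return sum(client_readings) / len(client_readings)

def get_average_temperature(sensor_readings):
    # Mirrors the decorated federated computation: one federated
    # operator applied to the client-placed readings.
    return federated_mean(sensor_readings)
```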

  • So now let's jump back to that example

  • where the model engineer had that specific question of what

  • fraction of sensors saw readings that were greater

  • than that certain threshold.

  • So this is what that looks like in TFF.

  • Our first federated operator in this case is

  • the tff.federated_broadcast that's responsible

  • for broadcasting that threshold to the devices.

  • Our next federated operator is the tff.federated_map that you

  • can think of as the map step in MapReduce.

  • That gets those 1s and 0s representing

  • whether their local values are greater than that threshold.

  • And, finally, we perform a federated aggregation so that

  • tff.federated_mean, to get the result back at the server.

  • So let's look at this, again, in code.

  • We're, again, declaring our inputs.

  • Let's pretend we've already declared our readings type

  • and now we're also defining our threshold type.

  • This time, it has a placement at the server,

  • and we're indicating that there is only one value with that

  • all_equal=True, and it's a tf.float32.

  • So we're again passing that into that function decorator

  • to declare this a federated computation.

  • We're invoking all those federated operators

  • in the appropriate order.

  • So we have that tff.federated_broadcast

  • that's working on the threshold.

  • We're performing our mapping step

  • that's taking a computation I'll talk about in a second

  • and applying it to the readings and the threshold

  • that we just broadcast.

  • And this chunk of code represents

  • the local computation each device will be performing,

  • where they're comparing their own data item to the threshold

  • that they received.
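Putting the three operators together, a toy local simulation of this broadcast, map, and aggregate pipeline might look like the following. Again, these are plain-Python stand-ins that mimic the TFF operators' behavior, not the real tff.* API:

```python
def federated_broadcast(server_value, num_clients):
    # Stand-in for tff.federated_broadcast: server -> every client.
    return [server_value] * num_clients

def federated_map(fn, *client_values):
    # Stand-in for tff.federated_map: run fn locally on each client,
    # like the map step in MapReduce.
    return [fn(*args) for args in zip(*client_values)]

def federated_mean(client_values):
    # Stand-in for tff.federated_mean: aggregate back at the server.
    return sum(client_values) / len(client_values)

def fraction_over_threshold(readings, threshold):
    # Broadcast the threshold, compare locally (1.0 or 0.0),
    # then average the indicators to get the fraction at the server.
    thresholds = federated_broadcast(threshold, len(readings))
    over = federated_map(lambda r, t: 1.0 if r > t else 0.0,
                         readings, thresholds)
    return federated_mean(over)
```

For example, with readings [65.0, 72.0, 75.0, 71.0] and threshold 70.0, three of the four clients report 1.0, so the server sees 0.75.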

  • So I know that was a fast, brief introduction

  • to coding with TFF.

  • Please visit this site, tensorflow.org/federated,

  • to get more hands-on with the code.

  • And if you like links, we have one more link

  • to look at all the ideas we've introduced today

  • about federated learning.

  • Please check out our comic book at federated.withgoogle.com.

  • We were fortunate enough to work with two incredibly talented

  • comic book artists to illustrate these comics as graphic art.

  • And it even has corgis.

  • That's pretty cool.

  • DANIEL RAMAGE: All right, so in today's talk,

  • we covered decentralized data, federated computation, how

  • we can use federated computation building blocks to do learning,

  • and gave you a quick introduction to the TensorFlow

  • Federated project, which you can use to experiment with how

  • federated learning might work on data sets you already have

  • on the server, in simulation, today.

  • As you might have seen,

  • the TF Lite team has also announced

  • that training is a big part of their roadmap,

  • and that's something that we are also

  • really excited about for being able to enable

  • external developers to run the kinds of things

  • that we're running internally sometime soon.

  • We also introduced privacy technologies: on-device data

  • sets, federated aggregation, secure aggregation,

  • federated model averaging, and the differentially private

  • version of that, which embody the privacy principles of

  • only-in-aggregate, ephemeral reports, focused collection,

  • and not memorizing individuals' data.

  • So we hope we've given you a flavor of the kinds of things

  • that federated learning and computation can do.

  • To learn more, check out the comic book

  • and play a little bit with TensorFlow Federated

  • for a preview of how you can write your own kinds

  • of federated computations.

  • Thank you very much.

  • [APPLAUSE]

  • [MUSIC PLAYING]
