KRZYSZTOF OSTROWSKI: My name is Chris.
I'm leading the TensorFlow Federated team in Seattle.
And I'm here to tell you about federated learning
and the platform we've built to support it.
There are two parts of this talk.
First I'll talk about federated learning, how it works and why,
and then I'll switch to talk about the platform.
All right, let's do it.
And this is a machine learning story,
so it begins with data, of course.
And today, the most exciting data out there
is born decentralized on billions of personal devices,
like cell phones.
So how can we create intelligence and better
products from data that's decentralized?
Traditionally, there's a server in the cloud
hosting the machine learning model in TensorFlow,
and clients all talk to it
to make predictions on their behalf.
And as they do, the client data accumulates on the server
right next to the model.
So model and data, that's all in one place.
Very easy.
We can use traditional techniques that we all know.
And what's also great about this scenario
is that the same model is exposed to data
from all clients-- potentially millions of clients--
and so it's very efficient.
All right.
If it's so good, why change that, right?
Well actually, in some applications,
it's not so great.
First, it doesn't work offline.
There's high latency, so applications that need
fast turnaround may not work.
All this network communication consumes battery life
and bandwidth.
And some data is too sensitive or too large,
so collecting it is not an option.
OK.
What can we do?
Maybe we go to the complete other extreme.
So ditch the server in the cloud.
Now each client is its own little bubble, right?
It has its own TensorFlow run time, its own model.
And it's training.
It's grinding over its data to train
and doesn't communicate with anything.
So now, of course, nothing leaves the device.
None of the concerns from the preceding slide
apply, but you have other problems.
A single client just doesn't have enough data, very often.
It doesn't have enough data to create a good model on its own.
So this doesn't always work.
What if we bring the server back,
but the clients are actually only receiving data
from the server?
Could that work?
So if you have some proxy data on the server that's
similar to the on-device data, you could use it.
You could pre-train the model on the server,
then deploy it to clients, and then let
it potentially evolve further.
So that could work.
Except, very often, there's no good proxy data or not enough
of it for the kinds of on-device data you're interested in.
A second problem is that this here,
the intelligence we're creating is
kind of frozen in time, in the sense that, as I mentioned,
clients won't be able to do a whole lot on their own.
And why does it matter?
And here's one concrete example from actual production
application.
Consider a smart keyboard that's trying
to learn to autocomplete.
If you train a model in the server
and deploy it, now suddenly millions of people
start using a new word, what happens?
You'd think, hey, it's a strong signal, millions of people.
But if you're not one of those millions,
your phone has no clue, right?
And so it could take a lot of punching
into that phone to make it notice that something new has
happened, right?
So yeah, this is not what we want.
We really need the clients to somehow contribute back
towards the common good so they can all benefit.
Federated learning is one way to do that.
Here we start with an initial model provided by the server.
This one is not pre-trained.
We don't assume we have proxy data.
It doesn't matter.
It can be just 0s.
So we send it to the client.
The client now trains it locally on its own data.
And this is more than just one step of gradient descent,
but it's also not training to convergence.
Typically, you would just make a few passes
over the data on the clients and then produce
a locally trained model and send it to the server.
And now all the clients are training independently,
but they all use the same initial model to start with.
And the server's job is to orchestrate this process,
to make it happen,
and to feed the same initial model to all the clients.
So now once the server collects the locally trained models
from clients, it aggregates them into
a so-called federated model.
And typically what we do is simply average the model
parameters across all clients.
So the server just adds the numbers and that's it.
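As a rough sketch of that aggregation step in plain NumPy (model_a, model_b, and model_c are hypothetical locally trained models, each a list of parameter arrays with matching shapes):

```python
import numpy as np

# Hypothetical locally trained models: each is a list of parameter arrays
# (e.g. one array per layer), all with the same shapes.
client_models = [model_a, model_b, model_c]

# Federated averaging in a nutshell: average each parameter tensor
# across all participating clients.
federated_model = [
    np.mean([client[i] for client in client_models], axis=0)
    for i in range(len(client_models[0]))
]
```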
So this federated model, it has been influenced by data
from our clients, right?
Because it's been influenced by the client models, and those,
in turn, have been influenced by client data.
So we do get those benefits of scale in this scenario,
so that's great.
But there's one question.
What happens to privacy?
So let's look at this closely.
First, client data never left the device.
Only the models trained on this data were shared.
So next, the server does not retain, store,
any of the client models.
It simply adds them up and then throws them away.
It deletes them, right?
So they are ephemeral.
But here you may be asking how you know that this
is what the server is doing.
Maybe the server is secretly, somehow,
logging something on the side.
So there are cryptographic protocols
that we can use to ensure that that's all legit.
So with those protocols, the server
will only see the final result of aggregation
and will not have access to any of the individual client contributions.
And we use those in practice, so hopefully
that puts your mind at rest.
So the server only ever sees the final aggregate.
You can still wonder how we know that that doesn't contain
anything sensitive.
So this is where you would use differential privacy.
In a nutshell, each client clips its updates
and adds a little bit of noise.
So once the final aggregate emerges on the server,
there's enough noise to sort of mask out any
of the individual contributions, but there is still
enough signal to make progress.
So not to get too much into the detail,
but this is also a technique we use in production.
Differential privacy is an established and commonly used
way to provide anonymity.
If you have any more concerns, I'll
be happy to discuss them offline.
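To make the clip-and-add-noise idea a bit more concrete, here is a toy sketch of what a single client might do to its update; the clipping norm and noise scale are made-up values, and a real deployment would use carefully calibrated DP mechanisms rather than this illustration.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_stddev=0.1):
  """Clip a client's flat update vector and add Gaussian noise (toy sketch)."""
  norm = np.linalg.norm(update)
  clipped = update * min(1.0, clip_norm / (norm + 1e-12))
  return clipped + np.random.normal(0.0, noise_stddev, size=update.shape)
```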
So how does it work in practice?
Firstly, it's not enough to just do it once.
So once you produce a federated model,
you'll feed it back on the server as an initial model
for the next round, then execute many thousands
of rounds, potentially.
That's how long it takes to converge.
And so in this scenario, both clients and server have a role.
Clients are doing all the learning.
That's where all the machine learning sits.
And server is just orchestrating the process,
aggregating and also providing continuity to this process
as we move from one round to another,
because the server is what carries
the state between rounds.
And to drill into this a little bit more,
in the practical applications, clients
are not all available at the same time.
You remember those concerns I mentioned about consuming
battery life and bandwidth.
We want to give users a good experience,
so we want it to be non-disruptive.
So we will only perform training when the device is connected
to a power source, on a Wi-Fi network, and idle,
so that the user is not negatively affected.
And so that means, out of the billions of clients out there,
only a small fraction are available at any given
time for training.
And this is illustrated on this diagram here.
This is from an actual production system.
We can kind of see, when you track the number of rounds
per hour the server completes across time,
you can see it kind of maxes out at night,
when everyone's asleep and their phone
is connected, and dips at lunch when everybody's punching
into their phone while eating.
So yeah.
So the clients keep coming and going.
And so that means that, as we move
from one round to another, the set of participating clients
will change for various reasons, including that some of them
lose connectivity.
So clients can, in general, drop out at any time.
So that's why, in an actual production system when
it's deployed, there's always a client selection
phase where the exact set of participants is chosen.
And there are many factors that go into it, including
concerns about bias.
But for this talk, all that's important to remember
is that the set of clients in each round is different.
So in a nutshell, the characteristics
of production scenarios-- at a glance,
there are many of them-- millions, billions.
They don't talk to each other.
These are cell phones, so no peer to peer connectivity.
Communication is the bottleneck in the whole system.
Clients want to be anonymous.
And, for the most part, they're interchangeable in the sense
that, in the grand scheme of things,
whether a particular device contributed data or not,
it doesn't really affect the result in any way.
And the clients are unavailable and they
can drop out at any time.
Therefore, we have to effectively consider them
as stateless.
Even if they have some memory, there's
no guarantee when they'll be back.
So we treat them as stateless low compute nodes.
And finally, the distribution of data across clients
is very non-uniform because people differ.
So is it only for mobile devices?
No, not at all.
You could use federated learning for things
like a group of hospitals wanting
to learn something together or a group
of financial institutions.
So the general approach is the same.
Of course, the details will differ a little bit.
In this case, clients are very reliable, potentially
very capable.
But there are fewer of them.
Some of the cryptographic protocols that we're using
work better when there are more clients.
And so here, you may have to work harder or use some more
specialized protocols.
So how well does it work in practice?
We've deployed it at Google in several applications,
including the smart keyboard that I mentioned.
So it runs in production on millions of devices.
And when you compare the performance
of an autocomplete model that learns on federated data,
it's clearly better-- we get
more user clicks than the prior model trained
on the server.
This is illustrated in some of these diagrams here.
We can see on the right side, the federated model
stabilizes at a better performance.
And the reason for that is that the on-device data
is the better data, higher quality
than the proxy data on the server.
Also, I mentioned before that non-federated models
were limited, and they wouldn't necessarily
be able to adapt to changes in the environment
and pick up changes over time.
And so here, we demonstrate that the federated model can actually
learn new words that were not initially in the vocabulary,
notice that people are using them, and include them.
So it's worth pointing out, this is definitely
one example of an application where we want to use
differential privacy, to make sure
that the only things you're learning are common things
and that nothing sensitive gets through.
So it worked at Google.
Of course, what you really want to know
is if it will work for your application.
So some rough guidelines here.
Mostly common sense stuff-- like if the on-device
data is high quality, or if it's sensitive or large, good reasons
to use federated learning.
Of course, you also need the labels for training.
And we can't pay someone to go and label the data,
because it's on-device.
We can't access it.
So in some cases, the labels are just part of the data.
Like in the smart keyboard, you know, all the characters
you're trying to predict, people will eventually
type those characters.
And so that's what the labels are.
In some cases, you will have to work harder
to wire up additional signal into your application
to have those labels.
Other than that--
FL is a new area of active research.
Many variants, many extensions exist
in lots of publications-- hundreds of publications.
Several workshops just this year,
one of them organized at Google.
So you see a little picture of us in this workshop.
Yeah, so it's not guaranteed that any particular solution
will immediately work for you.
You have to just try things out and see what works.
And what we have to all collectively
do to advance this area, this promising field,
is to explore together.
And so that's why we've built TensorFlow Federated.
And so let's get to it.
All right.
So TensorFlow Federated.
What is it?
It's a development environment that is designed specifically for
federated learning, although it's also
applicable to more general kinds of computations
that I will get to in a minute.
It provides a new programming language--
the interface is embedded in Python,
so you kind of don't notice it.
But there is actually a kind of programming language
underneath that combines TensorFlow and distributed
communication.
In that language, we have implemented a number
of federated algorithms.
And so we provide everything you need for simulations.
So the runtimes, data sets, and everything is there.
It's part of TensorFlow, and it's on GitHub,
so everything is open source and modifiable.
Whom is it for, though?
Two main audiences.
One is the researchers.
Here what we want to enable is for people
to very quickly get started.
And so we provide this pseudocode-like
language with high-level abstractions,
so that it's very easy for you to express your ideas
in a way that's super compact and you
can see what you're doing.
Also, there are a number of things you can copy, paste, fork,
and modify.
That includes the federated learning implementations.
And it's still kind of emerging, but we'll also
have full end-to-end examples of research we've
produced, with scripts you can run and modify and do whatever.
And the data sets and the simulation infrastructure
are designed to be modular, so that whatever kind of resources
you might have, whether it's a cluster in a basement
or something else, you can configure things in such a way
that they work on your hardware.
The second equally important audience is practitioners.
And so we want to be able to take all the latest research
and immediately use it in production.
Assuming this is all implemented in TFF,
hopefully that will happen.
And so we've made a number of decisions to support that.
One is the language that I keep mentioning.
The abstractions are designed in such a way
that we're thinking of production deployment
from day one.
Even though production deployment options were not
something we provided on day one,
they've been on our minds from day one.
Also, we designed the system in such a way
that whatever code you're writing
in TFF to run in a simulation, you
can take the same code, without any changes,
and move it into production.
I'll get to that later.
And also the system is composable,
so that you can pick the things you want, compose them
together, make it work, and modify
whatever you want using the pseudocode-like language,
because the code is in a form that you
should actually be able to read and understand.
And perhaps most importantly, we're
actually eating our own dog food and using it at Google.
So we are investing our resources
to make sure the project evolves
in a way that's relevant for production deployment.
All right.
So I keep mentioning a new language.
Why do we need a new language for federated learning?
The reason for that is that federated programs
are distributed, right?
So they include clients and server and everything
in between.
So communication is an essential part of the program.
It's not just some systems concern that's an afterthought.
And so it is kind of expected that-- just as in TensorFlow,
you are expected to be engineering your model
architectures and tinkering with models
and adding new operators here and there.
Same for federated learning, except now your data
flow diagram kind of spans the entire network, right?
And so obviously, communication is also something
that you should be able to engineer and play with.
And we want to give you programming language
abstractions that make it super easy to do that.
And things like point to point messaging
or taking and restoring checkpoints,
we've tried to use those.
That's what our initial implementations
of federated learning were like.
It was unreadable.
It was very, very difficult to work with.
So we've designed a new system with higher-level
abstractions as the basis.
And hopefully, you'll see how this is done in TFF
and you'll like it.
Why stress portability between research and production?
You know, when we think about it,
in idealized federated learning environments,
if you can't look at the data, a lot of things
that we take for granted become more interesting.
Like, you know, you can't just look at the data,
so it may not be easy to see where the outliers are
or debug problems with your predictions
or try out various models.
There are ways to do some of those things,
but they're not obvious.
And so, for example, you may want
to just go ahead and deploy your model into a live system.
Run it on real devices, maybe in dry mode
so nothing gets affected, but it kind of runs there.
You can see how well it's doing and iterate in this manner.
So the kind of traditional boundary
between production versus research,
all this gets a little bit more fuzzy.
You sometimes may have to experiment in production.
And so because of that, and the general desire
to transfer new research into production ASAP,
it's essential, in our mind, to provide
this kind of portability.
So you write one version of code, and it works.
Whether it's research or simulation or production,
it's the same code.
And a number of decisions in TFF reflect that, like the fact
that everything is kind of language agnostic and platform
agnostic.
And everything is expressed declaratively,
so that you can compile it into different kinds of execution
environments.
OK.
So where do you start?
The basis of building programs in TFF
is the federated computation.
This is a generalization of federated learning:
we have clients that have sensitive data.
There are very many of them.
They do all the training.
The server orchestrates these computations
and provides continuity over time.
The clients want to be anonymous,
so whatever operations we do have to be in aggregate.
That's, in essence, what defines a federated computation.
How do we create those?
So now let's go through the various abstractions
that we have in TFF one by one.
Values.
This is a set of clients.
Let's say each of them has a temperature sensor that
produces some readings, let's say a floating point number.
We're going to refer to the collective of all those numbers
as a single federated value.
So a federated value can be a multi-set
of those individual contributions from clients,
right?
These federated values also have federated types.
In this case, it's going to be a federated float at clients.
The curly braces indicate that it's a multi-set.
In general, the type consists of the type
of the individual constituents and what
we call a placement, which is essentially
the identity of the group of system participants.
There's a whole notion of placements in TFF.
I won't get into it.
But for starters, you would only use clients and server
as the placements.
Now, suppose we have a server.
And there's a number on the server.
Let's say it's also some float.
We can also call it a federated value in this case.
It's not a multi-set because there's just one sample of it.
So it's a float at the server.
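In TFF you can write these federated types down directly; a minimal sketch (the printed forms are how TFF renders such types, assuming a reasonably recent TFF release):

```python
import tensorflow as tf
import tensorflow_federated as tff

# A federated float placed at the clients: a multi-set of per-client values.
readings_type = tff.FederatedType(tf.float32, tff.CLIENTS)
print(readings_type)    # {float32}@CLIENTS

# A federated float placed at the server: a single value.
threshold_type = tff.FederatedType(tf.float32, tff.SERVER)
print(threshold_type)   # float32@SERVER
```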
Now let's get to operators.
Suppose there is a distributed aggregation protocol that
is picking up numbers from the clients and depositing,
let's say, the average or something
like that on the server.
So unlike in a programming language like Python,
here in TFF you can think of it as a function.
In this case, the inputs to the function
are in different places than the output.
But that's OK because TFF is essentially
a programming framework for creating distributed systems.
This is a little distributed system.
And so you can model this as a function, in fact.
And you can even give it a functional type.
And this function takes a float at clients
and produces a float at the server.
In TFF, we also have a little library
of commonly used functions.
Like federated mean will take a federated float at clients
and produce the average of those on the server.
And others are available.
Now, with all that I've introduced,
you can actually start writing programs.
So let's write a very, very simple,
potentially the simplest possible,
federated computation.
It goes like this.
First, TFF is a strongly typed programming language.
And so you always start by defining the types of things.
I mentioned, we have a federated float at clients.
And so there it goes.
Next you're going to actually write a computation.
And so TFF code is not Python code.
But you express it in Python.
It's really the same idea as what you have in TensorFlow.
TensorFlow code is not Python code.
It's TensorFlow.
These are TensorFlow things that are
executed by TensorFlow runtime.
But you can express them in Python.
Python is the language in which you construct it.
It's the same idea here.
So you write a little Python function.
You decorate it as a sort of federated computation.
You specify the federated type of the inputs.
Now, in the body of this Python function,
the sensor readings parameter represents the federated float
that came in as the input.
And now we can use federated operators in TFF to slice and dice
the value.
In this case, we just call federated mean and that's it.
We hit Return.
So now, what happens here is that, just as in TensorFlow,
the Python function gets traced.
And we construct a little TFF computation
representation in a serialized form
and store it underneath that symbol.
It's kind of the same idea as a TF function getting traced
and TensorFlow graph getting stored
in a serialized form behind it.
So that's just what happens.
So when I say that TFF programs are not Python,
that's what it means.
The get average temperature symbol now
represents a serialized representation
of code in TFF.
And the reason that's important is because, again, we want
to run those things on devices.
And so they're not going to be interpreted by regular Python.
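Here's what that looks like in code, a minimal sketch along the lines of the TFF custom-algorithms tutorial (exact decorator and type-constructor names can shift a bit across TFF releases):

```python
import tensorflow as tf
import tensorflow_federated as tff

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
  # sensor_readings is the federated float placed at the clients;
  # federated_mean produces the average of those values on the server.
  return tff.federated_mean(sensor_readings)

# The traced, serialized computation carries a federated type signature.
print(get_average_temperature.type_signature)
# ({float32}@CLIENTS -> float32@SERVER)
```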
So now let's look at something slightly larger.
Let's say you have a set of clients.
Each of them has a temperature sensor.
And an analyst on the server wants
to know what fraction of the clients
have temperature readings over some threshold.
So we have two inputs here, the red and the blue.
Data is sensitive.
We can't collect it.
And so what we do instead, we use a federated broadcast
operator to move the threshold from the server to the clients.
Now every client has both the threshold and its own reading,
and it can compare them, running a little block of TensorFlow
to produce a one if it's over the threshold and a zero otherwise.
And so you can think of this as like a map step in MapReduce.
And we provide a federated map operator for those kinds
of things as well.
Then finally, what emerges is a federated float
composed of ones and zeros, which you can feed as an input
to the federated mean operator and produce
the fraction on the server.
So that's it.
That's the whole program in a diagram form.
And now if you want to write code,
it kind of looks the same, except--
it's code.
So you start by defining your Python function decorated
as a TFF computation.
You specify all the inputs as formal parameters.
And so you see the readings input.
These are the temperatures, the threshold on the server.
The inputs can be anywhere, whether its clients on server.
Just list all of them here.
And now in the body of this function,
you can again use federated operators to slice
and dice those things.
So you see the broadcast here again.
You see the map and mean and so on.
The client side processing, I mentioned it was in TensorFlow.
So in this case, the parameter to the map function
that represents this processing is implemented
in ordinary TensorFlow code.
And that's it.
You just slap the types on top of it
to make sure that everything is strongly typed,
because TFF likes things to be strongly typed.
And that's it.
That's the whole program.
You can go and run it.
And I think we have a version of this in the tutorials as well.
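A sketch of that program, following the same pattern as the tutorials; the exceeds_threshold helper is just illustrative TensorFlow code:

```python
import tensorflow as tf
import tensorflow_federated as tff

@tff.tf_computation(tf.float32, tf.float32)
def exceeds_threshold(reading, threshold):
  # Plain TensorFlow: 1.0 if this client's reading is over the threshold.
  return tf.cast(reading > threshold, tf.float32)

@tff.federated_computation(
    tff.FederatedType(tf.float32, tff.CLIENTS),
    tff.FederatedType(tf.float32, tff.SERVER))
def fraction_over_threshold(readings, threshold):
  # Broadcast the threshold to the clients, map the TF comparison over
  # each client's reading, and average the resulting ones and zeros.
  return tff.federated_mean(
      tff.federated_map(exceeds_threshold,
                        [readings, tff.federated_broadcast(threshold)]))
```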
So now we're on a roll, let's try federated training.
And I'm going to show just a small example of what
we have described in a tutorial on the TensorFlow website.
And I'm going to focus just on the computation that represents
a single round of federated averaging,
just like what we have discussed at the very beginning
of this presentation.
So this computation takes three parameters.
There's a model on the server that the server
wants to feed to the clients.
There's a learning rate.
Let's make it interesting.
And there is a set of on-device data.
So the first thing we'll do is, just as
before, we broadcast the model and the learning rates
from the server to the clients.
Now that the clients have everything-- model,
learning rate, and their own slice of data--
they can perform their client-side training.
And likewise, like in the previous example,
we use the federated map operator for that.
And the local train function would be another computation,
presumably implemented in TensorFlow,
that I won't show.
It would look as it always does.
And finally, so the map function produces
a set of client-side models, locally trained models.
And now we just call the federated mean operator to average them out.
You can apply that operator to any kind of value,
including structured values.
So that's it.
The output is the average of client side models.
And that's the algorithm that we have.
And so that's the whole program.
And in the version of it in the tutorial,
you can see how that actually runs and works.
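A sketch of that round, modeled on the TFF custom-algorithms tutorial; MODEL_TYPE, LOCAL_DATA_TYPE, and local_train are assumed to be defined elsewhere (the model type as a structure of floats, the data type as a sequence of batches, and local_train as a tff.tf_computation that runs a few passes of local SGD):

```python
import tensorflow as tf
import tensorflow_federated as tff

# Federated types for the three inputs (MODEL_TYPE and LOCAL_DATA_TYPE assumed).
SERVER_MODEL_TYPE = tff.FederatedType(MODEL_TYPE, tff.SERVER)
SERVER_FLOAT_TYPE = tff.FederatedType(tf.float32, tff.SERVER)
CLIENT_DATA_TYPE = tff.FederatedType(LOCAL_DATA_TYPE, tff.CLIENTS)

@tff.federated_computation(
    SERVER_MODEL_TYPE, SERVER_FLOAT_TYPE, CLIENT_DATA_TYPE)
def federated_train(model, learning_rate, data):
  # Broadcast the model and learning rate to the clients, run local
  # training on each client's data, and average the resulting models.
  return tff.federated_mean(
      tff.federated_map(local_train, [
          tff.federated_broadcast(model),
          tff.federated_broadcast(learning_rate),
          data]))
```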
So this was, of course, a very simplified example.
How can we start extending it and making it more interesting?
Just two very short examples I'm going to show.
One common thing to do is to inject compression in various places
to address various kinds of systems concerns.
And so, for example, if you want to compress data
during broadcast, apply encoding on the server
before you broadcast, and then use a federated map
function to decode on the clients after broadcast.
And so you can see how basically two lines of code
get you what you want, with the decode and encode presumably
being implemented in TensorFlow.
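A sketch of that change, reusing the types from the round above; encode and decode are hypothetical tff.tf_computations (for example quantizing and dequantizing the weights), and note that older TFF releases used tff.federated_apply rather than tff.federated_map for server-placed values:

```python
@tff.federated_computation(
    SERVER_MODEL_TYPE, SERVER_FLOAT_TYPE, CLIENT_DATA_TYPE)
def federated_train_with_compression(model, learning_rate, data):
  # encode/decode are hypothetical tff.tf_computations, e.g. quantize/dequantize.
  compressed_model = tff.federated_map(encode, model)        # on the server
  client_model = tff.federated_map(                          # on the clients
      decode, tff.federated_broadcast(compressed_model))
  return tff.federated_mean(
      tff.federated_map(local_train, [
          client_model,
          tff.federated_broadcast(learning_rate),
          data]))
```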
Second example, if you want differential privacy,
very easy.
Before you call federated mean to average your values,
you just apply a federated map operator to the arguments
to add some clipping and noise--
I'm representing it here symbolically,
but that's something that you would normally just
write in TensorFlow.
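A sketch of that one-line change, again reusing the types from the round above; clip_and_add_noise is a hypothetical tff.tf_computation that clips each client's model and adds noise (real deployments would use calibrated differential-privacy mechanisms instead):

```python
@tff.federated_computation(
    SERVER_MODEL_TYPE, SERVER_FLOAT_TYPE, CLIENT_DATA_TYPE)
def federated_train_with_dp(model, learning_rate, data):
  client_models = tff.federated_map(local_train, [
      tff.federated_broadcast(model),
      tff.federated_broadcast(learning_rate),
      data])
  # The extra line: clip and noise each contribution before averaging.
  noised_models = tff.federated_map(clip_and_add_noise, client_models)
  return tff.federated_mean(noised_models)
```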
So again, one line of change for a change like this.
And you can sort of imagine other modifications
you can do like this.
So how can you run it?
Even though I mentioned TFF code is not Python,
you can call it in Python like a function.
And it runs in Python.
What happens under the hood is we spawn a little runtime for TFF,
run a simulation there, and return the numbers in Python,
so it works seamlessly as if it were Python.
So in this case, if you, let's say,
want to run five rounds of training,
this is how you'd write it.
It's just kind of what you would expect.
And a full version of it is, again, in the tutorial.
So you just call the computation and get the numbers back.
And the model is represented as a NumPy structure.
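A sketch of that loop, along the lines of the tutorial; federated_train and federated_eval are the computations defined earlier, and federated_train_data is an assumed list of per-client datasets:

```python
model = initial_model      # e.g. all-zero weights, as a NumPy structure
learning_rate = 0.1

for round_num in range(5):
  # Each call runs one simulated round and hands back plain NumPy values.
  model = federated_train(model, learning_rate, federated_train_data)
  learning_rate = learning_rate * 0.9
  loss = federated_eval(model, federated_train_data)
  print('round {:d}, loss={:.3f}'.format(round_num, loss))
```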
Where do you get data for simulations?
You can, of course, make your own.
But we also provide a couple of data sets, with many more
on the way, in the tff.simulation.datasets module.
Each of these has a load data function.
When you call it, you get a pair of Python objects
that represent training and test data.
And these objects allow you to inspect them.
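For example, loading the federated EMNIST dataset that ships with TFF:

```python
import tensorflow_federated as tff

# Each of these is a client-keyed data object for simulations.
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
print(len(emnist_train.client_ids))  # number of simulated clients
```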
Now, I mentioned before, TFF computations
don't let you deal with individual clients
or their IDs.
So things like inspecting what clients are
in my data set, that's something
that you can only do when orchestrating
your simulation in Python.
You cannot do it in TFF, for privacy reasons.
So in this case, you can look at the client IDs,
for example, so that you can simulate
what I discussed previously, the client selection.
So here you take all the clients
and pick a random sample of them.
Those are my clients for this round.
And now I call the train data object
to construct a tf.data dataset-- this is an eager data set
in TensorFlow-- for that particular client
and apply whatever pre-processing you want
using the regular tf.data APIs.
And once you create a list of those-- those are my clients,
those are my data sets--
you can feed it as an argument into the computation
just as I've shown before.
And you continue fleshing out your little Python loop.
So it's very easy, very natural to do.
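A sketch of that per-round client selection and dataset construction; NUM_CLIENTS and the preprocess function (an ordinary tf.data pipeline of batching, shuffling, and so on) are assumptions:

```python
import random

NUM_CLIENTS = 10

# Simulate client selection: pick a random sample of client IDs for this round.
sample_clients = random.sample(emnist_train.client_ids, NUM_CLIENTS)

# For each selected client, build an eager tf.data.Dataset and preprocess it
# with the regular tf.data APIs.
federated_train_data = [
    preprocess(emnist_train.create_tf_dataset_for_client(client_id))
    for client_id in sample_clients]
```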
If you don't want to implement everything from scratch, as we
sort of did in this tutorial, you
might use one of the canned APIs,
like the tff.learning module.
So for example, here's one function
that constructs federated training computations.
It's easiest to use with Keras if you have Keras.
You don't have to use Keras, but it's much easier if you do.
So if you have a Keras model, you just
call a one-liner function to convert it into a form
that TFF can absorb.
And then these one-liner codes shown here take that model
and construct computations that you can use
for training and evaluation.
And you use them in the same way.
You write little Python loops as those that you've seen before.
So the trainer object has a pair of computations,
initialize and next.
Initialize creates state on the server, the initial state
for the first round.
And then the next computation represents a single round
of training.
So it will take the initial state before the round started
and produce new state after the round completed.
And that state includes the model as well as
various kinds of counters and things like that.
In each round, as you saw before,
we can perform client selection and simulate
various kinds of system behavior and things like that.
So it's very easy to use.
And same for evaluation.
It can take that final state after training,
extract the model out of it, and feed it
to the evaluation computation.
So the eval is a computation.
Again, you just call it like a Python function.
And that gets you the metric back, and things like this.
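A sketch of that flow with the tff.learning APIs; create_keras_model and the example datasets are assumptions, and the exact keyword arguments of from_keras_model have shifted across TFF releases:

```python
import tensorflow as tf
import tensorflow_federated as tff

def model_fn():
  # Wrap an ordinary, uncompiled Keras model in a form TFF can absorb.
  return tff.learning.from_keras_model(
      create_keras_model(),
      input_spec=example_dataset.element_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

trainer = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02))

state = trainer.initialize()              # server state for the first round
for round_num in range(10):
  # One round of federated training over the selected clients' datasets.
  state, metrics = trainer.next(state, federated_train_data)
  print(round_num, metrics)

# Federated evaluation: extract the model from the final state and evaluate.
evaluation = tff.learning.build_federated_evaluation(model_fn)
eval_metrics = evaluation(state.model, federated_test_data)
```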
So by default, when you just invoke computations
like functions, as I've shown, it kind of all just
runs on your machine in your process.
There are various ways to speed it up.
We've provided a helpful framework for constructing simulation
runtimes.
Right now there is one ready-to-use solution.
If you want to run multi-threaded simulations,
with the snippet of code that I'm showing here-- one line--
you create a local executor that has multiple threads in it
and then make it the default. And then whatever you type
will run in that.
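For reference, a sketch of roughly what that one-liner looked like around the time of this talk; the executor-construction APIs have been renamed across TFF releases, so treat the exact names as an assumption:

```python
import tensorflow_federated as tff

# Create a multi-threaded local executor and make it the default, so that
# subsequent computation invocations run in it.
tff.framework.set_default_executor(tff.framework.create_local_executor())
```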
If you want something more powerful, not long from now
we'll have a kind of all-inclusive, ready-to-use
solution for running things on Google Cloud and Kubernetes
in a multi-machine setting.
If you don't want to wait for that,
you can actually just go and stitch it up yourself.
Because all the components are basically there
in that tff.framework namespace.
And those include various kinds of little executors
you can stack up together in an executor
stack that you can use to construct
the various multi-machine architectures
with multiple tiers of aggregation, support for GPUs,
and things like that.
And it's designed to be extensible
so that people can plug in various kinds of components
into it.
Now, if you want to go beyond just running simulations,
it is also possible.
For that, the options are still emerging.
But the two that already exist are on the table.
It may involve a bit of effort, but it's possible.
One is you can actually plug in your physical devices
into the simulation framework.
So for example, you can implement a simple GRPC backend
interface that we supply, say, to run on your Arduino device
or something.
And then you can plug that as a little worker node
into a simulation framework.
And now you can run on your physical devices.
That's not something you would use for a large scale
production setting.
But it's certainly doable for smaller scale experiments.
And also, we have an emerging set of compiler tools
that take TFF computations and transform them
into a form that's more amenable for execution
in a particular kind of backend.
So for example, there is a body of code emerging that
supports MapReduce-like systems, that
takes computations and makes them look like MapReduces so
that we can run it on Hadoop or something.
It's usable, not quite finished, but somewhat usable.
If you're interested in pursuing either of those options,
I'd be happy to discuss them.
And more deployment options on the way.
I can't really talk about them.
But stay tuned for updates.
If you need something that we haven't provided,
this is intended to be an open framework and a community
project.
So by all means, please contribute.
Just implement it and send us pull requests,
so that everyone can benefit.
There are many ways you can contribute.
If you're a modeler, you can contribute models and data sets
and things like that.
If you're interested in machine learning-- federated learning
algorithms, you can contribute algorithms to the framework
or help us re-architect it to make it easier to use.
Contribute core abstractions, also new types of backends.
As I mentioned, this backend support
for actually deploying things is emerging.
And if you have ideas, perhaps you can contribute to the TFF.
That's all I have.
Thank you very much.
[APPLAUSE]
AUDIENCE: So this sort of changes the way
that you create a model.
I have two questions about that.
When you start with a model, do you start with some [INAUDIBLE]
data to create an initial value that you will then
start the clients with?
And then secondly, do you ever re-deploy the average model
back to the clients?
Or do clients sort of spin off on their own--
CREW: Sorry to interrupt.
Do you mind starting over?
AUDIENCE: So the two questions are,
when clients start learning on their own data
and then you have an averaged model on the server,
do you ever send the averaged model back
to the clients for performance boost?
Or do clients just spin off on their own afterwards?
And then the second question is, how do you start to model?
Do you use proxy data initially?
And how do you iterate with your model's accuracy and things
like that?
KRZYSZTOF OSTROWSKI: Yeah.
So for the first question, in a system
we have running in production, the way it works--
and that's different from TFF.
That's just a deployed platform.
And so there are many ways you can engineer this.
But just talking about the particular example,
our production system, the clients periodically
come back to the server.
So every time clients get involved
in a new round of training, they automatically
get that new model.
So that's one way you can arrange for this to happen.
That's probably the easiest.
So you're kind of contributing as well as benefiting
by getting the latest.
And the other question was how do you
get started on building models.
And so, if you do have proxy data and you think it's useful,
then it certainly helps to play with it.
At least you can get some idea of what
model architectures are good.
You can never be sure because proxy data is only so good.
And if you never looked at the on-device data,
you'd never really know for sure how good
your proxy data might be.
So you might use proxy data.
But you might also choose not to.
You can simply try different model architectures,
deploy them on devices in, like as I mentioned, dry mode.
So it would be kind of running on devices
and getting evaluated but not affecting
anything other than consuming a bit of resources.
You could deploy hundreds of those at the same time
on different subsets of the clients
and see which are the most promising.
That second route would be more of a pure approach that
applies to any kind of on-device data,
including when you have absolutely no idea
where to get proxy data.
Like some weird sensor data might look like that.
And both are possible.
AUDIENCE: So first question I have is, does the TFF library--
does it integrate with TF Lite?
And the second question I have is,
since it's language platform agnostic,
are you able to use it in non-Python--
can I use it in the language that's not Python?
KRZYSZTOF OSTROWSKI: OK.
Let me start from the second one.
So TFF computations are not Python.
I think I had a link on the slide.
If not, I can follow up later.
There's a protocol definition that describes
what a TFF computation is.
And it's a data structure that has absolutely no relationship
to Python.
So yeah, you could take it and you could execute it
in a completely different environment that
has nothing to do with Python.
And TensorFlow code inside of that computation
is represented as GraphDefs, TensorFlow GraphDefs.
So if you were to run it on a different kind of TensorFlow
run time, to the extent you can take those GraphDefs
and convert them for that other runtime,
maybe converting the ops or whatever,
that's also an option.
So TFF itself doesn't integrate with TF Lite because TFF
itself does not include a platform
for on-device execution.
TFF is more like--
the best way to think of it is more like a compiler framework
in a dev environment.
But yes, you could use it with TF Lite.
So you could define your computations
and maybe apply some conversion tools
to convert all the TensorFlow computations into a form
that TF Lite can absorb and then
arrange for it to be executed.
AUDIENCE: Thank you.
AUDIENCE: Good talk.
Thank you.
I had a couple of questions.
So does the client--
do the models train until convergence?
KRZYSZTOF OSTROWSKI: Say that again.
Clients--
AUDIENCE: The clients, do they train until convergence?
Do they, or--
KRZYSZTOF OSTROWSKI: No.
Typically, you would make a few passes over the client data
sets.
Because you don't have to train for convergence.
You're going to run 10,000 rounds anyway.
So doesn't matter.
AUDIENCE: And when the average model doesn't have access
to the data, how do you measure its performance and how
do you know it's good enough to now deploy--
send it back to all the clients?
KRZYSZTOF OSTROWSKI: Sorry.
If average model is--
AUDIENCE: So the average model is on your local server.
And then you don't have access to the data.
How do you measure the performance
of the average model?
How do you know when to deploy that model back?
KRZYSZTOF OSTROWSKI: Yeah.
So I did not describe federated evaluation.
But basically, it's like the temperature sensor example.
You can take that model, broadcast it to the clients.
Now the clients have the model and the data.
They can evaluate.
Each produces some accuracy metric,
average those out or compute a distribution.
And there you go.
So federated evaluation is kind of the same idea, just simpler.
AUDIENCE: OK.
And another question was, is there a way in federated
learning in TensorFlow where you can share parts of--
for example, the clients--
KRZYSZTOF OSTROWSKI: Sorry.
Share what?
AUDIENCE: So the clients have different labels, assuming,
but they have similar data.
Is there a mechanism where you can say the client shares most
of the model but they have their own couple of layers for them?
Maybe the last layer of the network
is specific to the client but not shared across clients.
Or does the entire model have to be shared across all clients?
KRZYSZTOF OSTROWSKI: Yeah.
It's not a capability that we include at the moment.
But it sounds like conceivably something we could do.
Maybe you can follow up with that.
Maybe you can contribute.
AUDIENCE: Thanks.
AUDIENCE: So one question that I had was,
when you kind of aggregate all of these models
into a central server, it seems like one of the problems
that federated learning solves is, I guess,
distributing computation.
But when you get to like a million people using the Google
keyboard, or a lot more actually,
it seems like either the server is
going to have to reject some gradient computations,
or there is some hierarchical aggregation
system where you aggregate the models upstream or whatever.
So I'm wondering if the second is true.
Are there latency issues with gradients reaching
the central model by the time that the model's changed
so much that it might corrupt it a little bit?
KRZYSZTOF OSTROWSKI: So a couple of things.
First, this is not the same as gradient descent in the sense
that each client does a whole bunch of computation.
It trains for a while.
So what clients send to the server are not gradients.
They're updates, differences between the trained models
and the initial models, that reflect a whole bunch of local
training.
That's just one thing.
The second one, with respect to which
clients have to participate in computation, so not
all clients.
If you, say, have 1 million clients,
you could pick a sample of 1,000 clients.
First make an iteration of the model on the first 1,000
clients.
Then make an iteration on another 1,000 clients.
You don't have to include all the clients at once.
The only thing that matters is that eventually most clients
participate, so that most clients have a chance
to influence the training process at some point.
But they don't have to simultaneously be present.
But with respect to hierarchical aggregations, that's also true.
So both are true.
You do have hierarchical aggregations in our system
because you don't want a single server to be
talking to 10,000 machines.
But you also don't have to include the entire population
in training.
I think I answered all of them.
All right.
Thank you.
[APPLAUSE]