Federated learning with TensorFlow Federated (TF World '19)

KRZYSZTOF OSTROWSKI: My name is Chris. I'm leading the TensorFlow Federated team in Seattle, and I'm here to tell you about federated learning and the platform we've built to support it. There are two parts to this talk. First I'll talk about federated learning, how it works and why, and then I'll switch to talking about the platform. All right, let's do it.

This is a machine learning story, so it begins with data, of course. And today, the most exciting data out there is born decentralized, on billions of personal devices like cell phones. So how can we create intelligence and better products from data that's decentralized?

Traditionally, what we do is that there's a server in the cloud hosting the machine learning model in TensorFlow, and clients all talk to it to make predictions on their behalf. As they do, the client data accumulates on the server, right next to the model. So model and data are all in one place. Very easy. We can use the traditional techniques we all know. And what's also great about this scenario is that the same model is exposed to data from all clients, potentially millions of clients, so it's very efficient.

All right. If it's so good, why change it? Well, actually, in some applications it's not so great. First, it doesn't work offline. There's high latency, so applications that need a fast turnaround may not work. All this network communication consumes battery life and bandwidth. And some data is too sensitive, so collecting it is not an option, or it's too large. Some sensitive data can also be large.

OK, so what can we do? Maybe we go to the complete other extreme and ditch the server in the cloud. Now each client is its own little bubble. It has its own TensorFlow runtime, its own model, and it's training, grinding over its own data, and it doesn't communicate with anything. Now, of course, nothing leaves the device. None of the concerns from the preceding slide apply, but you have other problems. A single client very often just doesn't have enough data to create a good model on its own. So this doesn't always work.

What if we bring the server back, but the clients only receive data from the server? Could that work? If you have some proxy data on the server that's similar to the on-device data, you could use it. You could pre-train the model on the server, then deploy it to clients, and then let it potentially evolve further. So that could work. Except, very often, there's no good proxy data, or not enough of it, for the kinds of on-device data you're interested in. A second problem is that the intelligence we're creating here is kind of frozen in time, in the sense that, as I mentioned, clients won't be able to do a whole lot on their own.

And why does that matter? Here's one concrete example from an actual production application. Consider a smart keyboard that's trying to learn to autocomplete. You train a model on the server and deploy it, and now suddenly millions of people start using a new word. What happens? You'd think, hey, it's a strong signal, millions of people. But if you're not one of those millions, your phone has no clue, right? And so it could take a lot of punching into that phone to make it notice that something new has happened. So this is not what we want. We really need the clients to somehow contribute back towards the common good, so they can all benefit. Federated learning is one way to do that.
Here we start with an initial model provided by the server. This one is not pre-trained; we don't assume we have proxy data. It doesn't matter. It can be just zeros. We send it to the clients. Each client now trains it locally on its own data. And this is more than just one step of gradient descent, but it's also not training to convergence. Typically, you would just make a few passes over the data on the client, produce a locally trained model, and send it to the server. Now, all the clients are training independently, but they all start from the same initial model. The server's job is to orchestrate this process, to make it happen and feed the same initial model to all the clients.

Once the server collects the locally trained models from the clients, it aggregates them into a so-called federated model. Typically, what we do is simply average the model parameters across all clients. The server just adds up the numbers, and that's it. This federated model has been influenced by data from all clients, because it's been influenced by the client models, and those, in turn, have been influenced by the client data. So we do get the benefits of scale in this scenario, which is great.

But there's one question: what happens to privacy? Let's look at this closely. First, client data never left the device. Only the models trained on this data were shared. Next, the server does not retain or store any of the client models. It simply adds them up and then throws them away. It deletes them; they are ephemeral. But you might ask how you know that this is what the server is doing. Maybe the server is secretly logging something on the side. So there are cryptographic protocols we can use to ensure that it's all legit. With those protocols, the server will only see the final result of the aggregation and will not have access to any of the individual client contributions. We use those in practice, so hopefully that puts your mind at rest.

So the server only ever sees the final aggregate. You can still wonder how we know that the aggregate doesn't contain anything sensitive. This is where you would use differential privacy. In a nutshell, each client clips its update and adds a little bit of noise. Once the final aggregate emerges on the server, there's enough noise to mask out any of the individual contributions, but there is still enough signal to make progress. Not to get too much into the details, but this is also a technique we use in production. Differential privacy is an established and commonly used way to provide anonymity. If you have any more questions, I'll be happy to discuss them offline.

So how does it work in practice? First, it's not enough to do this just once. Once you produce a federated model, you feed it back on the server as the initial model for the next round, and you execute many thousands of rounds, potentially. That's how long it takes to converge. In this process, both the clients and the server have a role. Clients do all the learning; that's where all the machine learning sits. The server is just orchestrating the process, aggregating, and also providing continuity as we move from one round to another, because the server is what carries the state between rounds.
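To make the data flow of a round concrete, here is a minimal toy sketch in plain NumPy. It is not the production algorithm or the TFF implementation; the "model" here is just a vector, and local_train is a deliberately trivial stand-in for a few passes of on-device SGD.

    import numpy as np

    def local_train(global_weights, client_examples, lr=0.1, epochs=3):
        # Toy stand-in for on-device training: a few gradient steps that pull
        # the weights toward this client's own examples. Only the resulting
        # weights ever leave the device, never the examples themselves.
        w = global_weights.copy()
        for _ in range(epochs):
            for x in client_examples:
                w -= lr * (w - x)  # gradient of 0.5 * ||w - x||^2
        return w

    def federated_averaging_round(global_weights, clients_data):
        # One round: broadcast the model, train locally on each client, and
        # keep only the example-weighted average of the locally trained models.
        total = sum(len(d) for d in clients_data)
        new_weights = np.zeros_like(global_weights)
        for examples in clients_data:
            update = local_train(global_weights, examples)
            new_weights += (len(examples) / total) * update
        return new_weights

    # Three simulated clients, each with its own small dataset.
    clients = [np.random.randn(5, 3) + i for i in range(3)]
    model = np.zeros(3)
    for _ in range(10):  # in practice, many thousands of rounds
        model = federated_averaging_round(model, clients)

Note that the server in this sketch folds each update into the running average and keeps nothing else, which is the ephemerality property described above.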
And to drill into this a little bit more: in practical applications, clients are not all available at the same time. Remember those concerns I mentioned about consuming battery life and bandwidth? We want to give users a good experience, so we want training to be non-disruptive. We only perform training when the device is plugged into a power source, on a Wi-Fi network, and idle, so that the user is not negatively affected. That means that, out of the billions of clients out there, only a small fraction are available for training at any given time.

This is illustrated on this diagram here, which is from an actual production system. If you track the number of rounds per hour the server completes over time, you can see it maxes out at night, when everyone's asleep and their phone is plugged in, and dips at lunch, when everybody's punching into their phone while eating. So the clients keep coming and going. That means that, as we move from one round to another, the set of participating clients will change for various reasons, including that some of them lose connectivity. Clients can, in general, drop out at any time. That's why, in an actual production deployment, there's always a client selection phase where the exact set of participants is chosen. Many factors go into it, including concerns about bias. But for this talk, all that's important to remember is that the set of clients in each round is different.

So, in a nutshell, the characteristics of production scenarios at a glance: there are many clients, millions or billions. They don't talk to each other; these are cell phones, so there's no peer-to-peer connectivity. Communication is the bottleneck in the whole system. Clients want to be anonymous. And, for the most part, they're interchangeable, in the sense that, in the grand scheme of things, whether a particular device contributed data or not doesn't really affect the result in any way. Clients are often unavailable, and they can drop out at any time, so we effectively have to consider them stateless. Even if they have some memory, there's no guarantee when they'll be back. So we treat them as stateless, low-compute nodes. And finally, the distribution of data across clients is very non-uniform, because people differ.

So is this only for mobile devices? No, not at all. You could use federated learning for things like a group of hospitals wanting to learn something together, or a group of financial institutions. The general approach is the same; of course, the details will differ a little bit. In this case, clients are very reliable and potentially very capable, but there are fewer of them. Some of the cryptographic protocols we use work better when there are more clients, so here you may have to work harder or use more specialized protocols.

So how well does it work in practice? We've deployed it at Google in several applications, including the smart keyboard I mentioned. It runs in production on millions of devices. And when you compare the performance of an autocomplete model that learns on federated data, it's clearly better: higher accuracy, and we get more user clicks than with the former model trained on the server. This is illustrated in some of these diagrams here. You can see on the right side that the federated model stabilizes at better performance. And the reason for that is that the on-device data is the good data, higher quality than the proxy data on the server.
I also mentioned before that non-federated models were limited: they wouldn't necessarily be able to adapt to changes in the environment and pick up changes over time. And so here, we demonstrate that the federated model can actually learn new words that were not initially in the vocabulary, notice that people are using them, and include them. It's worth pointing out that this is definitely one example of an application where you want to use differential privacy, to make sure that the only things you're learning are common things and that nothing sensitive gets through.

So it worked at Google. Of course, what you really want to know is whether it will work for your application. Some rough guidelines here, mostly common sense: if the on-device data is higher quality, or if it's sensitive or large, those are good reasons to use federated learning. Of course, you also need labels for training. And we can't pay someone to go and label the data, because it's on-device; we can't access it. In some cases, the labels are just part of the data. Like in the smart keyboard: all the characters you're trying to predict, people will eventually type those characters, and so that's what the labels are. In other cases, you will have to work harder and wire additional signals into your application to get those labels.

Other than that, FL is a new area of active research. Many variants and extensions exist, in hundreds of publications. There were several workshops just this year, one of them organized at Google; you can see a little picture of us at that workshop. So it's not guaranteed that any particular solution will immediately work for you. You have to try things out and see what works. And what we all collectively have to do to advance this promising field is to explore together. That's why we've built TensorFlow Federated.

So let's get to it. TensorFlow Federated. What is it? It's a development environment designed specifically for federated learning, although it's also applicable to more general kinds of computations that I will get to in a minute. It provides a new programming language. It's not obvious from the interface, which is embedded in Python, so you kind of don't notice it, but there actually is a programming language underneath that combines TensorFlow and distributed communication. In that language, we have implemented a number of federated algorithms. And we provide everything you need for simulations: the runtimes, data sets, everything is there. It's part of TensorFlow and it's on GitHub, so everything is open source and modifiable.

Whom is it for, though? Two main audiences. One is researchers. Here, what we want to enable is for people to get started very quickly. So we provide this pseudocode-like language with high-level abstractions, so that it's very easy for you to express your ideas in a way that's super compact and you can see what you're doing. There are also a number of things you can copy, paste, fork, and modify. That includes the federated learning implementations, but also, and this is still kind of emerging, we'll have full end-to-end examples of research we produced, with scripts you can run, modify, and do whatever you want with.
And the data sets and the simulation infrastructure are designed to be modular, so that whatever kind of resources you might have, whether it's a cluster in a basement or something else, you can configure things in such a way that it works on your hardware.

The second, equally important audience is practitioners. We want them to be able to take all the latest research and immediately use it in production. Assuming the research is implemented in TFF, hopefully that will happen. And so we've made a number of decisions to support that. One is the language that I keep mentioning. The abstractions are designed with production deployment in mind from day one. Even though production deployment options were not something we provided on day one, they've been on our minds from day one. Also, we designed the system in such a way that whatever code you write in TFF to run in a simulation, you can take the same code, without any changes, and move it into production. I'll get to that later. The system is also composable, so that you can pick the pieces you want, compose them together, make it work, and modify whatever you want using the pseudocode-like language, because the code is in a form that you should be able to actually read and understand. And perhaps most importantly, we're eating our own dog food and using it at Google, so we are investing our resources to make sure the project evolves in a way that's relevant for production deployment.

All right. So I keep mentioning a new language. Why do we need a new language for federated learning? The reason is that federated programs are distributed. They include clients, the server, and everything in between. Communication is an essential part of the program; it's not just some systems concern that's an afterthought. Just as in TensorFlow you are expected to engineer your model architectures, tinker with models, and add new operators here and there, the same holds for federated learning, except now your data flow graph spans the entire network. So obviously, communication is also something you should be able to engineer and play with, and we want to give you programming language abstractions that make it super easy to do that. Things like point-to-point messaging or taking and restoring checkpoints: we've tried to use those. That's what our initial implementations of federated learning were like. It was unreadable; it was very, very difficult to work with. So we've designed a new system with higher-level abstractions as the basis. Hopefully, you'll see how this is done in TFF and you'll like it.

Why stress portability between research and production? When you think about it, in idealized federated learning environments, if you can't look at the data, a lot of things we take for granted become more interesting. You can't just look at the data, so it may not be easy to see where the outliers are, or debug problems with your predictions, or try out various models. There are ways to do some of those things, but they're not obvious. And so, for example, you may want to just go ahead and deploy your model into a live system. Run it on real devices, maybe in dry-run mode, so nothing gets affected, but it runs there, and you can see how well it's doing and iterate in this manner.
So the traditional boundary between production and research gets a little more fuzzy. You sometimes may have to experiment in production. Because of that, and the general desire to transfer new research into production ASAP, it's essential, in our minds, to provide this kind of portability. You write one version of the code, and it works. Whether it's research or simulation or production, it's the same code. A number of decisions in TFF reflect that, like the fact that everything is language agnostic and platform agnostic, and everything is expressed declaratively, so that you can compile it into different kinds of execution environments.

OK. So where do you start? The basis of building programs in TFF is the federated computation. This is a generalization of federated learning: we have clients that have sensitive data, and there are very many of them; they do all the training; a server orchestrates these computations and provides continuity over time; the clients want to be anonymous, so whatever operations we do have to be in aggregate. That's, in essence, what defines a federated computation. How do we create those? Let's go through the various abstractions in TFF one by one.

Values. Here's a set of clients. Let's say each of them has a temperature sensor that produces some reading, say a floating point number. We're going to refer to the collective of all those numbers as a single federated value. So a federated value can be a multi-set of those individual contributions from clients. These federated values also have federated types. In this case, it's going to be a federated float at clients. The curly braces indicate that it's a multi-set. In general, the type consists of the type of the individual constituents and what we call a placement, which is essentially the identity of the group of system participants. There's more to placements in TFF that I won't get into; for starters, you would only use clients and server as the placements.

Now, suppose we have a server, and there's a number on the server. Let's say it's also some float. We also call it a federated value in this case. It's not a multi-set, because there's just one instance of it. So it's a float at the server.

Now let's get to operators. Suppose there is a distributed aggregation protocol that picks up numbers from the clients and deposits, let's say, the average of them on the server. Unlike in a programming language like Python, here in TFF you can think of this as a function. In this case, the inputs to the function are in different places than the output. But that's OK, because TFF is essentially a programming framework for creating distributed systems. This is a little distributed system, and so you can model it as a function. You can even give it a functional type: this function takes a float at clients and produces a float at the server. TFF also has a little library of commonly used functions. For example, federated mean will take a federated float at clients and produce the average of those numbers at the server. Others are available.

Now, with all that I've introduced, you can actually start writing programs. Let's write a very, very simple, potentially the simplest possible, federated computation. It goes like this. First, TFF is a strongly typed programming language, so you always start by defining the types of things. I mentioned we have a federated float at clients, and defining that type looks roughly like the snippet below.
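(A small sketch; API names are as of the TFF releases around this talk and may differ in newer versions.)

    import tensorflow as tf
    import tensorflow_federated as tff

    # A federated float placed at the clients: one reading per client.
    SENSOR_READINGS_TYPE = tff.FederatedType(tf.float32, tff.CLIENTS)
    print(SENSOR_READINGS_TYPE)  # prints roughly: {float32}@CLIENTS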
Next, you actually write the computation. TFF code is not Python code, but you express it in Python. It's really the same idea as in TensorFlow: TensorFlow code is not Python code; these are TensorFlow operations executed by the TensorFlow runtime, but Python is the language in which you construct them. Same idea here. You write a little Python function, decorate it as a federated computation, and specify the federated type of the inputs. Now, in the body of this Python function, the sensor readings parameter represents the federated float that came in as the input, and we can use federated operators in TFF to slice and dice the value. In this case, we just call federated mean, and that's it. We hit Return.

What happens here is that, just as in TensorFlow, the Python function gets traced, and we construct a little TFF computation representation in a serialized form and store it underneath that symbol. It's the same idea as a tf.function getting traced and a TensorFlow graph getting stored in serialized form behind it. So when I say that TFF programs are not Python, that's what it means. The get_average_temperature symbol now represents a serialized representation of code in TFF. The reason that's important is that, again, we want to run these things on devices, and so they're not going to be interpreted by regular Python.

Now let's look at something slightly larger. Say you have a set of clients, each with a temperature sensor, and an analyst on the server wants to know what fraction of the clients have temperature readings over some threshold. So we have two inputs here, the red and the blue. The data is sensitive; we can't collect it. So what we do instead is use a federated broadcast operator to move the threshold from the server to the clients. Now that every client has both the threshold and its own reading, it can compare them, running a little block of TensorFlow to produce one if the reading is over the threshold and zero otherwise. You can think of this like a map step in MapReduce, and we provide a federated map operator for those kinds of things. Finally, what emerges is a federated float composed of ones and zeros, which you can feed as an input to the federated mean operator and produce the fraction on the server. So that's it. That's the whole program, in diagram form.

Now, if you want to write the code, it looks much the same, except it's code. You start by defining your Python function, decorated as a TFF computation. You specify all the inputs as formal parameters, so you see the readings input, these are the temperatures, and the threshold on the server. The inputs can live anywhere, whether at the clients or on the server; you just list all of them here. In the body of this function, you can again use federated operators to slice and dice those things. You see the broadcast here again, the map, the mean, and so on. The client-side processing, as I mentioned, is in TensorFlow, so in this case the parameter to the map function that represents this processing is implemented in ordinary TensorFlow code. And that's it. You just put the types on top to make sure everything is strongly typed, because TFF likes things to be strongly typed. That's the whole program; you can go and run it.
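Here is a sketch of both computations, modeled on the TFF custom-algorithms tutorial; the function and variable names are illustrative, and decorator signatures may differ slightly between TFF versions.

    import tensorflow as tf
    import tensorflow_federated as tff

    READINGS_TYPE = tff.FederatedType(tf.float32, tff.CLIENTS)
    THRESHOLD_TYPE = tff.FederatedType(tf.float32, tff.SERVER)

    # The simplest federated computation: average the readings on the server.
    @tff.federated_computation(READINGS_TYPE)
    def get_average_temperature(sensor_readings):
        return tff.federated_mean(sensor_readings)

    # The client-side TensorFlow block: 1.0 if a reading exceeds the threshold.
    @tff.tf_computation(tf.float32, tf.float32)
    def exceeds_threshold(reading, threshold):
        return tf.cast(reading > threshold, tf.float32)

    # Broadcast the threshold, map the comparison over the clients, and
    # average the resulting ones and zeros back on the server.
    @tff.federated_computation(READINGS_TYPE, THRESHOLD_TYPE)
    def get_fraction_over_threshold(readings, threshold):
        client_threshold = tff.federated_broadcast(threshold)
        indicators = tff.federated_map(exceeds_threshold,
                                       [readings, client_threshold])
        return tff.federated_mean(indicators)

    # In a simulation, these can be invoked like ordinary Python functions:
    print(get_average_temperature([68.5, 70.3, 69.8]))            # about 69.5
    print(get_fraction_over_threshold([68.5, 70.3, 69.8], 69.0))  # about 0.67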
And I think we have a version of this in the tutorials as well. So now that we're on a roll, let's try federated training. I'm going to show just a small example of what we describe in a tutorial on the TensorFlow website, and I'm going to focus just on the computation that represents a single round of federated averaging, just like what we discussed at the very beginning of this presentation.

This computation takes three parameters. There's a model on the server that the server wants to feed to the clients. There's a learning rate, to make it interesting. And there is a set of on-device data. The first thing we do is, just as before, broadcast the model and the learning rate from the server to the clients. Now that the clients have everything, the model, the learning rate, and their own slice of data, they can perform their client-side training. And just like in the previous example, we use the federated map operator for that. The local train function would be another computation, presumably implemented in TensorFlow, that I won't show; it would look as it always does. Finally, the map produces a set of locally trained client-side models, and we just call the federated mean operator to average them out. You can apply that operator to any kind of value, including structured values. So that's it. The output is the average of the client-side models, and that's the algorithm. That's the whole program. In the version of it in the tutorial, you can see how it actually runs and works.

This was, of course, a very simplified example. How can we start extending it and making it more interesting? Just two very short examples. One common thing to do is to inject compression in various places to address various kinds of systems concerns. For example, if you want to compress data during broadcast, apply encoding on the server before you broadcast, and then use a federated map to decode on the clients after the broadcast. You can see how basically two lines of code get you what you want, with the decode and encode presumably being implemented in TensorFlow. Second example: if you want differential privacy, it's very easy. Before you call federated mean to average your values, you just apply a federated map operator to the arguments to add some clipping and noise. I'm representing it here symbolically, but that's something you would normally just write in TensorFlow. So again, one line of change for a change like this. And you can imagine other modifications you can make in the same way.

So how can you run it? Even though I mentioned TFF code is not Python, you can call it in Python like a function, and it runs in Python. What happens under the hood is that we spawn a little runtime for TFF, run a simulation there, and return the numbers to Python, so it works seamlessly as if it were Python. In this case, if you want to run, say, five rounds of training, this is how you'd write it. It's just what you would expect. A full version of it is, again, in the tutorial. You just call the computation and get the numbers back, and the model is represented as a NumPy structure.
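A runnable toy version of that round, modeled on the tutorial, might look like the sketch below. To keep it self-contained, the "model" is just a single float and local_train takes one trivial step per example; in the real tutorial the model is a structured value and local_train is actual TensorFlow training code.

    import tensorflow as tf
    import tensorflow_federated as tff

    MODEL_TYPE = tf.float32                          # toy model: a single float
    LOCAL_DATA_TYPE = tff.SequenceType(tf.float32)   # each client's local data

    @tff.tf_computation(MODEL_TYPE, tf.float32, LOCAL_DATA_TYPE)
    def local_train(model, learning_rate, data):
        # Toy stand-in for on-device training: one step per local example.
        def step(current_model, example):
            return current_model - learning_rate * (current_model - example)
        return data.reduce(model, step)

    @tff.federated_computation(
        tff.FederatedType(MODEL_TYPE, tff.SERVER),
        tff.FederatedType(tf.float32, tff.SERVER),
        tff.FederatedType(LOCAL_DATA_TYPE, tff.CLIENTS))
    def run_one_round(model, learning_rate, data):
        # Broadcast the model and learning rate, train locally, average back.
        # Compression, or clipping and noise for differential privacy, would
        # slot in as extra federated_map calls around these lines.
        locally_trained = tff.federated_map(local_train, [
            tff.federated_broadcast(model),
            tff.federated_broadcast(learning_rate),
            data])
        return tff.federated_mean(locally_trained)

    # Five rounds of "training" in a simulation, driven from plain Python.
    model = 0.0
    for _ in range(5):
        model = run_one_round(model, 0.1, [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])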
Where do you get data for simulations? You can, of course, make your own, but we also provide a couple of data sets, with many more on the way, in the simulation datasets module. Each of these has a load_data function. When you call it, you get a pair of Python objects that represent training and test data, and these objects allow you to inspect them.

Now, I mentioned before that TFF computations don't let you deal with individual clients or their IDs. Things like inspecting which clients are in my data set are something you can only do when orchestrating your simulation in Python; you cannot do it in TFF, for privacy reasons. So in this case, you can look at the client IDs, for example, so that you can simulate what I discussed previously, the client selection. Here you take all the clients and pick a random sample of them: those are my clients for this round. Then you call the data object to construct a tf.data dataset, an eager dataset in TensorFlow, for each particular client, and apply whatever pre-processing you want using the regular tf.data APIs. Once you create a list of those, those are my clients, those are my data sets, you can feed it as an argument into the computation just as I've shown before, and it keeps fleshing out your little Python loop. So it's very easy and natural to do.

If you don't want to implement everything from scratch, as we did in this tutorial, you can use one of the canned APIs, like the tff.learning module. For example, here's one function that constructs federated training computations. It's easiest to use with Keras. You don't have to use Keras, but it's much easier if you do. If you have a Keras model, you just call a one-liner function to convert it into a form that TFF can absorb, and then the one-liners shown here take that model and construct computations you can use for training and evaluation. And you use them in the same way: you write little Python loops like the ones you've seen before.

The trainer object has a pair of computations: initialize and next. Initialize creates state on the server, the initial state for the first round. The next computation represents a single round of training: it takes the state from before the round started and produces the new state after the round completed. That state includes the model as well as various kinds of counters and things like that. In each round, as you saw before, you can perform client selection, simulate various kinds of system behavior, and so on. So it's very easy to use. And the same goes for evaluation. You can take the final state after training, extract the model out of it, and feed it to the evaluation computation. The eval is a computation; again, you just call it like a Python function, and that gets you the metrics back.
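Put together, the canned API might be used roughly as sketched below, modeled on the image-classification tutorial with the federated EMNIST simulation dataset. The exact signatures (for example from_keras_model and build_federated_averaging_process) have shifted between TFF releases, so treat the details as approximate.

    import collections
    import random
    import tensorflow as tf
    import tensorflow_federated as tff

    # A simulation dataset: federated EMNIST, keyed by client ID.
    emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()

    def preprocess(dataset):
        # Regular tf.data preprocessing: flatten the images and batch.
        def to_xy(element):
            return collections.OrderedDict(
                x=tf.reshape(element['pixels'], [-1, 784]),
                y=tf.reshape(element['label'], [-1, 1]))
        return dataset.repeat(5).shuffle(100).batch(20).map(to_xy)

    example_dataset = preprocess(
        emnist_train.create_tf_dataset_for_client(emnist_train.client_ids[0]))

    def model_fn():
        # A plain Keras model, wrapped by a one-liner into a form TFF can absorb.
        keras_model = tf.keras.Sequential([
            tf.keras.layers.Dense(10, kernel_initializer='zeros',
                                  input_shape=(784,)),
            tf.keras.layers.Softmax()])
        return tff.learning.from_keras_model(
            keras_model,
            input_spec=example_dataset.element_spec,
            loss=tf.keras.losses.SparseCategoricalCrossentropy(),
            metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

    # The pair of training computations, plus federated evaluation.
    trainer = tff.learning.build_federated_averaging_process(
        model_fn,
        client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02))
    evaluator = tff.learning.build_federated_evaluation(model_fn)

    state = trainer.initialize()
    for round_num in range(5):
        # Client selection, simulated in Python: sample some client IDs and
        # build a preprocessed dataset for each of them.
        sampled_ids = random.sample(emnist_train.client_ids, 10)
        round_data = [preprocess(emnist_train.create_tf_dataset_for_client(c))
                      for c in sampled_ids]
        state, metrics = trainer.next(state, round_data)
        print('round', round_num, metrics)

    # Federated evaluation: extract the model from the final state and
    # evaluate it on the corresponding test clients.
    test_data = [preprocess(emnist_test.create_tf_dataset_for_client(c))
                 for c in sampled_ids]
    print(evaluator(state.model, test_data))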
By default, when you just invoke computations like functions, as I've shown, it all runs on your machine, in your process. There are various ways to speed it up. We provide a helpful framework for constructing simulation runtimes. Right now there is one ready-to-use option: if you want to run multi-threaded simulations, with the snippet of code shown here, in one line you create a local executor that has multiple threads in it and make it the default, and then whatever you run will use it. If you want something more powerful, not long from now we'll have a kind of all-inclusive, ready-to-use solution for running things on Google Cloud and Kubernetes in a multi-machine setting. If you don't want to wait for that, you can actually go and stitch it up yourself, because all the components are basically there in the tff.framework namespace. Those include various kinds of small executors that you can stack together into an executor stack, which you can use to construct various multi-machine architectures, with multiple tiers of aggregation, support for GPUs, and things like that. And it's designed to be extensible, so that people can plug various kinds of components into it.

Now, if you want to go beyond just running simulations, that is also possible. The options there are still emerging, but two already exist. They may involve a bit of effort, but they're possible. One is that you can plug your physical devices into the simulation framework. For example, you can implement a simple gRPC backend interface that we supply, say, to run on your Arduino device or something, and then plug that in as a little worker node in the simulation framework. Now you can run on your physical devices. That's not something you would use for a large-scale production setting, but it's certainly doable for smaller-scale experiments. Also, we have an emerging set of compiler tools that take TFF computations and transform them into a form that's more amenable to execution on a particular kind of backend. For example, there is a body of code emerging that supports MapReduce-like systems: it takes computations and makes them look like MapReduces, so that you can run them on Hadoop or something. It's usable, not quite finished, but somewhat usable. If you're interested in pursuing either of those options, I'd be happy to discuss them. And more deployment options are on the way. I can't really talk about them, but stay tuned for updates.

If you need something that we haven't provided, this is intended to be an open framework and a community project, so by all means, please contribute. Just implement it and send us pull requests, so that everyone can benefit. There are many ways you can contribute. If you're a modeler, you can contribute models and data sets and things like that. If you're interested in federated learning algorithms, you can contribute algorithms to the framework, or help us re-architect it to make it easier to use. You can contribute core abstractions, or new types of backends; as I mentioned, the backend support for actually deploying things is emerging, and if you have ideas, perhaps you can contribute them to TFF. That's all I have. Thank you very much.

[APPLAUSE]

AUDIENCE: So this sort of changes the way that you create a model. I have two questions about that. When you start with a model, do you start with some [INAUDIBLE] data to create an initial value that you will then start the clients with? And then secondly, do you ever re-deploy the averaged model back to the clients? Or do clients sort of spin off on their own--

CREW: Sorry to interrupt. Do you mind starting over?

AUDIENCE: So the two questions are: when clients start learning on their own data and then you have an averaged model on the server, do you ever send the averaged model back to the clients for a performance boost? Or do clients just spin off on their own afterwards? And then the second question is, how do you start the model? Do you use proxy data initially? And how do you iterate on your model's accuracy and things like that?

KRZYSZTOF OSTROWSKI: Yeah. So for the first question, let me talk about how the system we have running in production works. And that's different from TFF; that's a separate, deployed platform.
There are many ways you can engineer this, but just talking about that particular example, our production system: the clients periodically come back to the server, so every time clients get involved in a new round of training, they automatically get the new model. That's one way you can arrange for this to happen, and it's probably the easiest. So you're contributing as well as benefiting by getting the latest model.

The other question was how you get started on building models. If you do have proxy data and you think it's useful, then it certainly helps to play with it. At least you can get some idea of which model architectures are good. You can never be sure, because proxy data is only so good, and if you never look at the on-device data, you never really know for sure how good your proxy data is. So you might use proxy data, but you might also choose not to. You can simply try different model architectures and deploy them on devices in, as I mentioned, dry-run mode, so they'd be running on devices and getting evaluated, but not affecting anything other than consuming a bit of resources. You could deploy hundreds of those at the same time on different subsets of the clients and see which are the most promising. That second route is the purer approach, and it applies to any kind of on-device data, including when you have absolutely no idea where to get proxy data; some weird sensor data might look like that. Both are possible.

AUDIENCE: So the first question I have is, does the TFF library integrate with TF Lite? And the second question is, since it's language and platform agnostic, can I use it in a language that's not Python?

KRZYSZTOF OSTROWSKI: OK. Let me start from the second one. TFF computations are not Python. I think I had a link on the slide; if not, I can follow up later. There's a protocol definition that describes what a TFF computation is, and it's a data structure that has absolutely no relationship to Python. So yes, you could take it and execute it in a completely different environment that has nothing to do with Python. And the TensorFlow code inside of that computation is represented as TensorFlow GraphDefs. So if you were to run it on a different kind of TensorFlow runtime, to the extent you can take those GraphDefs and convert them for that other runtime, maybe converting the ops or whatever, that's also an option.

TFF itself doesn't integrate with TF Lite, because TFF itself does not include a platform for on-device execution. The best way to think of TFF is as a compiler framework and a dev environment. But yes, you could use it with TF Lite. You could define your computations, apply some conversion tools to convert all the TensorFlow computations into a form that TF Lite can absorb, and then arrange for them to be executed.

AUDIENCE: Thank you.

AUDIENCE: Good talk. Thank you. I had a couple of questions. So do the clients' models train until convergence?

KRZYSZTOF OSTROWSKI: Say that again. Clients--

AUDIENCE: The clients, do they train until convergence? Do they, or--

KRZYSZTOF OSTROWSKI: No. Typically, you would make a few passes over the client data sets, because you don't have to train to convergence. You're going to run 10,000 rounds anyway, so it doesn't matter.
AUDIENCE: And when the averaged model doesn't have access to the data, how do you measure its performance, and how do you know it's good enough to deploy and send back to all the clients?

KRZYSZTOF OSTROWSKI: Sorry. If the averaged model is--

AUDIENCE: So the averaged model is on your server, and you don't have access to the data. How do you measure the performance of the averaged model? How do you know when to deploy that model back?

KRZYSZTOF OSTROWSKI: Yeah. So I did not describe federated evaluation, but basically, it's like the temperature sensor example. You can take that model and broadcast it to the clients. Now the clients have the model and the data, so they can evaluate. Each produces some accuracy metric; you average those out or compute a distribution, and there you go. Federated evaluation is kind of the same idea, just simpler.

AUDIENCE: OK. And another question was, is there a way in federated learning in TensorFlow where you can share parts of-- for example, the clients--

KRZYSZTOF OSTROWSKI: Sorry. Share what?

AUDIENCE: So the clients have different labels, presumably, but they have similar data. Is there a mechanism where you can say the clients share most of the model but have their own couple of layers? Maybe the last layer of the network is specific to the client but not shared across clients. Or does the entire model have to be shared across all clients?

KRZYSZTOF OSTROWSKI: Yeah. It's not a capability that we include at the moment, but it sounds like something we could conceivably do. Maybe you can follow up on that. Maybe you can contribute.

AUDIENCE: Thanks.

AUDIENCE: So one question that I had was, when you aggregate all of these models into a central server, it seems like one of the problems that federated learning solves is, I guess, distributing computation. But when you get to, like, a million people using the Google keyboard, or a lot more actually, it seems like either the server is going to have to reject some gradient computations, or there is some hierarchical aggregation system where you aggregate the models upstream or whatever. So I'm wondering if the second is true. Are there latency issues with gradients reaching the central model by the time the model has changed so much that it might corrupt it a little bit?

KRZYSZTOF OSTROWSKI: So, a couple of things. First, this is not the same as gradient descent, in the sense that each client does a whole bunch of computation; it trains for a while. So what clients send to the server are not gradients. They're updates, differences between the trained models and the initial models, that reflect a whole bunch of client-side training. That's one thing. The second one, with respect to which clients have to participate in a computation: not all clients. If you, say, have 1 million clients, you could pick samples of 1,000 clients. First make one iteration of the model on the first 1,000 clients, then make an iteration on another 1,000 clients. You don't have to include all the clients at once. The only thing that matters is that eventually most clients participate, so that most clients have a chance to influence the training process at some point. But they don't have to be present simultaneously. With respect to hierarchical aggregation, that's also true. So both are true. You do have hierarchical aggregation in our system, because you don't want a single server to be talking to 10,000 machines. But you also don't have to include the entire population in training. I think I answered all of them. All right.
Thank you. [APPLAUSE]