[MUSIC PLAYING]

CHRIS LATTNER: Hi, everyone. I'm Chris. And this is Brennan. And we're super excited to tell you about a new approach to machine learning. So here in the TensorFlow team, it is our job to push the state of the art in machine learning forward. And we've learned a lot over the last few years with deep learning. And we've incorporated most of that into TensorFlow 2. And we're really excited about it. But, here, we're looking a little bit further beyond TensorFlow 2. And what do I mean by further? Well, eager mode makes it really easy to train a dynamic model. But deploying it still requires you to take that and then write a bunch of C++ code to help drive it. And that could be better. Similarly, some researchers are interested in taking machine learning models and integrating them into larger applications. That also often requires writing C++ code. We always want more flexible and expressive autodifferentiation mechanisms. And one of the things we're excited about is being able to define reusable types that can then be put into new places and used with automatic differentiation. And we always love improving your developer workflow. We want to make you more productive by taking errors in your code and bringing them to your source and also by just improving your iteration time.

Now, what we're really trying to do here is lift TensorFlow to entirely new heights. And to do that, we need to be able to innovate at all levels of the stack. This includes the compiler and the language. And that's what Swift for TensorFlow is all about. We think that applying new solutions to old problems can help push machine learning even further than before. Well, let's jump into some code. So first, what is Swift? Swift is a modern and cross-platform programming language that's designed to be easy to learn and use. Swift uses types. And types are great, because they can help you catch errors earlier. And they also encourage good API design. Now, Swift uses type inference, so it's really easy to use and very elegant. But it's also open source and has an open language evolution process, which allows us to change the language and make it better for machine learning, which is really great.

Let's jump into a more relevant example. This is how you define a simple model in Swift for TensorFlow. As you can see, we're laying out our layers here. And then we define a forward function, which composes them together in a linear sequence. You've probably noticed that this looks a lot like Keras. That's no accident, of course. We want you to be able to take what you know about Keras and bring it forward into this world as well. Now, once we have a simple model, let's train it. How do we do that? All we have to do is instantiate our model, pick an optimizer and some random input data, and then write a training loop. And, here, we'll write it by hand. One of the reasons we like writing it by hand is that it gives you the maximum flexibility to play with different kinds of constructs. And you can do whatever you want, which is really great. But some of the major advantages of Swift for TensorFlow are in the workflow. And so instead of telling you about it, what do you think, Brennan, should we show them?

BRENNAN SAETA: Let's do it. All right, the team has thought long and hard about what's the easiest way for people to get started using Swift for TensorFlow. And what could be easier than just opening up a browser tab? This is Google Colab, hosted Jupyter notebooks.
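(As context for the demo that follows: a minimal sketch of the kind of model and hand-written training loop Chris just described, in the 0.2-era Layer API with applied(to:in:) used throughout this talk. The layer sizes, loss, optimizer settings, and random data are assumptions, and later releases renamed several of these APIs.)

```swift
import TensorFlow

// A Keras-like model: layers as stored properties, composed in a forward function.
struct Classifier: Layer {
    var flatten = Flatten<Float>()
    var dense1 = Dense<Float>(inputSize: 784, outputSize: 30, activation: relu)
    var dense2 = Dense<Float>(inputSize: 30, outputSize: 10)

    @differentiable
    func applied(to input: Tensor<Float>, in context: Context) -> Tensor<Float> {
        let tmp = dense1.applied(to: flatten.applied(to: input, in: context), in: context)
        return dense2.applied(to: tmp, in: context)
    }
}

// Instantiate the model, pick an optimizer and some random input data...
var model = Classifier()
let optimizer = SGD(for: model, learningRate: 0.02)
let x = Tensor<Float>(randomNormal: [64, 784])   // hypothetical inputs
let y = Tensor<Int32>(zeros: [64])               // placeholder labels

// ...and write the training loop by hand.
let context = Context(learningPhase: .training)
for epoch in 1...10 {
    let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
        softmaxCrossEntropy(logits: model.applied(to: x, in: context), labels: y)
    }
    print("epoch \(epoch): loss \(loss)")
    optimizer.update(&model.allDifferentiableVariables, along: grads)
}
```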
And it comes with Swift for TensorFlow built right in. Let's see it in action. Here is the layer model, the model that Chris just showed you a couple of slides ago. And we're going to run it using some random training data right here in the browser. So we're going to instantiate the model. We're going to use the stochastic gradient descent (SGD) optimizer. And here we go. We have now just trained a model using Swift for TensorFlow in our browser on some training data right here. Now, we can see the training loss is decreasing over time. So that's great. But if you're anything like me, whenever I try to use machine learning in any application, I start with a simple model. And I've got to iterate. I've got to tweak the model to make it fit better to the task at hand. So since we're trying to show you the workflow, let's actually edit this model. Let's make it more accurate.

So here we are. Now, let's think for a moment. What changes do we want to make to our model? Well, this is deep learning after all. So the answer is always to go deeper, right? But you may have been following the recent literature and the state of the art, where not just sequential layers but skip connections, or residual connections, are a really good idea to make sure your model continues to train effectively. So let's go through and actually add an extra layer to our model. Let's add some skip connections. And we're going to do it all right now in under 90 seconds. Are you ready?

All right, here we go. So the first thing that we want to do is we need to define our additional layer. So we're going to fill in this dense layer. Whoops. And one thing you can see is that we're using Tab autocomplete to help fill in code as we're trying to develop and modify our model. Now, we're going to fix up the shapes right here really quick, so that the residual connections will all work. If I could type properly, that would go better. All right, great. We have now defined our model with the additional layers. All we need to do is modify the forward pass, so that we add those skip connections. So here we go. The first thing we need to do is store the output of the flatten layer in a temporary variable. Then we're going to feed the output of the flatten layer to our first dense layer. So dense.applied(to: tmp, in: context). Now, for the coup de grâce, here is our residual connection. So dense2.applied(to: tmp + tmp2, in: context). Run that. And, yes, that works. We have now just defined a new model that has residual connections and is one additional layer deeper.

Let's see how it does. So we're going to reinstantiate our model and rerun the training loop. And if you recall the loss that we saw before, this one is now substantially lower. This is great. This is an example of what it's like to use Swift for TensorFlow to develop and iterate as you apply models to applications and challenges. But Swift for TensorFlow-- thank you. [APPLAUSE] Swift for TensorFlow was designed for researchers. And researchers often need to do more than just change models and change the way the architecture fits together. Researchers often need to define entirely new abstractions or layers. And so let's actually see that live right now. Let's define a new custom layer. So let's say we had the brilliant idea that we wanted to modify the standard dense layer, which takes weights and biases, and we wanted to add an additional set of bias parameters, OK? So we're going to define this double bias dense layer right here.
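(For reference, the edited forward pass with the skip connection looks roughly like this, in the same 0.2-era style; the layer names and sizes are assumptions chosen so that the shapes of tmp and tmp2 line up for the addition.)

```swift
import TensorFlow

struct ResidualClassifier: Layer {
    var flatten = Flatten<Float>()
    var dense = Dense<Float>(inputSize: 784, outputSize: 784, activation: relu)
    var dense2 = Dense<Float>(inputSize: 784, outputSize: 10)

    @differentiable
    func applied(to input: Tensor<Float>, in context: Context) -> Tensor<Float> {
        // Store the output of the flatten layer in a temporary variable...
        let tmp = flatten.applied(to: input, in: context)
        // ...feed it through the new dense layer...
        let tmp2 = dense.applied(to: tmp, in: context)
        // ...and here is the residual connection: tmp and tmp2 must have
        // matching shapes for the addition to work.
        return dense2.applied(to: tmp + tmp2, in: context)
    }
}
```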
So I'm going to type this really quickly. Stand by 15 seconds. Here we go. [LAUGHTER] Woo, all right, that was great. So let's actually walk through the code so that you can see what's going on. So the first thing that we have is we define our parameters. So these are w, our weights for our neurons, and b1, bias one, and b2, our second bias. We define an initializer that takes an input size and an output size, just like Dense does. We use that to initialize our parameters. The forward pass is very simple to write. So here it's just applied(to:). And we just take the matrix multiplication of the input by our weights, and we add in our bias terms. That's it. We've now just defined a custom layer right in Colab in just a few lines of code.

All right, let's see how it goes. Here's model two. And so we're going to use our double bias dense layer. And we're going to instantiate it. We're going to train it using, again, our custom handwritten training loop. Here's an example of another way that we think Swift for TensorFlow makes your life easier. Because Swift for TensorFlow can statically analyze your code, it can be really helpful to you. I don't know about you, but I regularly put typos in my code. I don't know if you saw me typing earlier. And Swift for TensorFlow here is helping you out, right? It's saying, look, you mistyped softmaxCrossEntropy. This should be labels, OK? All right, so we run it. We train it. And our loss isn't as good. This was not the right idea. But this is an example of how easy it is for researchers to experiment with new ideas really easily in Swift for TensorFlow.

But let's go deeper. Swift for TensorFlow is, again, designed for researchers. And researchers need to be able to customize everything, right? That's the whole point of research. And so let's show an example of how to customize something other than just a model or a layer. So you may have heard that large GPU clusters or TPU super pods are delivering massive breakthroughs in research and advancing the state of the art in certain applications and domains. And you may have also heard that, as you scale up to effectively utilize these massive hardware pools, you need to increase your batch size. And so let's say you're a researcher, and you want to figure out the best ways to train deep neural networks at larger batch sizes. Well, if you're a researcher, you probably can't buy a whole GPU cluster or rent a whole TPU super pod all the time for your experiments. But you often have a GPU under your desk. So let's see how we can simulate running on a super large data-parallel GPU or TPU cluster on a single machine. We're going to do it all in a few lines of code right here.

So here's our custom training loop. Well, here's the standard part, right? This is 1 to 10 training epochs. And what we're going to do is, instead of just applying our model forward once, we have an additional inner loop, right? So we're going to run our forward pass. We're going to run our model-- whoops-- four times. And we're going to take the gradients for each step. And we're going to aggregate them in this grads variable. OK? This simulates running on four independent accelerators, four GPUs or four TPUs, in a data-parallel fashion on a batch that's actually four times as large as what we actually run. We're going to then use our optimizer to update our model along these aggregated gradients, again simulating a data-parallel synchronous training process. That's it. That's all there is to it.
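(Two sketches reconstructed from the narration above, in the same 0.2-era API; the initializers, shapes, and names are assumptions. First, the double bias dense layer:)

```swift
import TensorFlow

struct DoubleBiasDense: Layer {
    var w: Tensor<Float>    // weights for our neurons
    var b1: Tensor<Float>   // bias one
    var b2: Tensor<Float>   // our second bias

    init(inputSize: Int, outputSize: Int) {
        w = Tensor(randomNormal: [inputSize, outputSize])
        b1 = Tensor(zeros: [outputSize])
        b2 = Tensor(zeros: [outputSize])
    }

    @differentiable
    func applied(to input: Tensor<Float>, in context: Context) -> Tensor<Float> {
        // Matrix-multiply the input by the weights and add both bias terms.
        return matmul(input, w) + b1 + b2
    }
}
```

(And the gradient-accumulation loop that simulates four data-parallel accelerators; model, optimizer, x, y, and context are as in the earlier sketch, and CotangentVector is the 0.2-era name that later releases changed to TangentVector.)

```swift
for epoch in 1...10 {
    // Accumulate gradients across 4 simulated accelerators before updating.
    var grads = Classifier.CotangentVector.zero
    for _ in 0..<4 {
        grads += model.gradient { model -> Tensor<Float> in
            softmaxCrossEntropy(logits: model.applied(to: x, in: context), labels: y)
        }
    }
    // One synchronous update along the aggregated gradients.
    optimizer.update(&model.allDifferentiableVariables, along: grads)
    print("epoch \(epoch) done")
}
```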
We're really excited by this sort of flexibility and the capabilities that Swift for TensorFlow brings to researchers. Back over to you, Chris.

CHRIS LATTNER: Thanks, Brennan. [APPLAUSE] So I think that the focus on catching errors early, and also productivity enhancements like code completion, can help you in a lot of ways. And it's not just about automating the typing of code. It can also be about discovery of APIs. So another thing that's really cool about Swift as a language is that it has really good interoperability with C code. And so in Swift, you can literally just import a C header file and call symbols directly from C without wrappers, without boilerplate or anything involved. It just works. So in the TensorFlow team, we've taken this approach and brought it to the world of Python. And one of the cool things about this is that it allows you to combine the power of Swift for TensorFlow with all the advantages of the Python ecosystem. How about we take a look?

BRENNAN SAETA: Thanks, Chris. The Python data science ecosystem is incredibly powerful and vibrant. And we wanted to make sure that, as you start using Swift for TensorFlow, you don't miss all your favorite libraries and utilities that you were used to. And so we've built seamless Python interoperability into Swift for TensorFlow. And let's see how it works in the context of my favorite Python data science library, NumPy. So the first thing you need to do is import TensorFlow and import Python. And once you do that, that defines this Python object that allows you to import arbitrary Python libraries. So here we import pyplot from the matplotlib library and NumPy. And we assign it to np, OK? After that, we can just use np just as if we were in Python. So, here, we call linspace. We're going to call sine and cosine. And we're going to pass those values to pyplot. When we run the cell, it just works exactly as you'd expect. [APPLAUSE] Thank you. [APPLAUSE] Now, this sort of looks like the Python code you're used to writing, but this is actually pure Swift. It just works seamlessly.

But this is maybe a bit of a toy example. So let's see this a little bit more in context. OpenAI has done a lot of work in the area of reinforcement learning. And to help that along, they developed a Python library called OpenAI Gym. Gym contains a collection of environments that are very useful when you're trying to train a reinforcement learning agent across a variety of different challenges. Let's use OpenAI Gym to train a reinforcement learning agent in Swift for TensorFlow right now in our browsers. So the first thing we need to do is import Gym. We're going to define a few hyperparameters here. And, now, we define our neural network. In this case, we're going to pick a simple two-layer dense network. And it's just a sequential model, OK? After that, we have some helper code to filter out bad or short episodes and whatnot. But here's the real meat of it. We're going to use Gym to instantiate the CartPole-v0 environment. So that's our env. We're going to then instantiate our network right here and our optimizer. And here's our training loop. There we go. So we're going to get a bunch of episodes. We're going to run our model, get the gradients. And we're going to apply those to our optimizer. And we're going to record the mean rewards as we train, OK? It's all very simple, straightforward Swift.
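(Sketches of the two interop examples just described: NumPy with matplotlib, then OpenAI Gym, both via the Python bridge. This mirrors the narration; the exact variable names are assumptions.)

```swift
import TensorFlow
import Python

// NumPy and matplotlib through the Python bridge: this is pure Swift,
// but it reads just like the Python you're used to.
let np = Python.import("numpy")
let plt = Python.import("matplotlib.pyplot")

let x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.show()

// OpenAI Gym through the same bridge: instantiate the CartPole-v0
// environment and drive it from Swift.
let gym = Python.import("gym")
let env = gym.make("CartPole-v0")
var observation = env.reset()
let stepResult = env.step(env.action_space.sample())
```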
And here you can see us training a Swift for TensorFlow model in an OpenAI Gym environment using the Python bridge, totally seamless. And of course, afterwards, you can keep track of the parameters and the rewards. In this case, we're going to plot the mean rewards as the model trained, using Python and NumPy, totally seamless. You can get started using Swift for TensorFlow using all the libraries you know and love and take advantage of what Swift for TensorFlow brings to the table. Back over to you, Chris.

CHRIS LATTNER: Thanks, Brennan. So one of the things that I love about this is that it's not just about being able to leverage big, important libraries like NumPy. We're working on the ability to integrate Swift for TensorFlow and Python TensorFlow code together, which we think will provide a nice transition path to make you able to incrementally move code from one world to the other. Now, I think it's fair to say that calculus is an integral part of machine learning. [LAUGHTER] And we think that differentiable programming is so important that we've built it right into the language. This has a number of huge advantages, including enabling more flexible and custom work with derivatives. And we think this is really cool. So let's take a look.

BRENNAN SAETA: So we've been using Swift for TensorFlow's differentiable programming capabilities throughout all of our demos so far. But let's really break it down and see what's going on at a fundamental level. So here we define myFunction, which takes two doubles and returns a double based on some products, and sums, and quotients. If we want Swift for TensorFlow to automatically compute the derivative for us, we just annotate it @differentiable. Swift for TensorFlow will then derive the derivative for this function right when we run the cell. To use this autogenerated derivative, use gradient. So gradient takes two things. It takes a closure to evaluate and a point that you want to evaluate your closure at. So here we go. This is what it is to take the derivative of a function at a particular point. So we can change it around. This one's my favorite tasty number. And that works nicely. Now, one thing to note, we've just been taking the partial derivative of myFunction with respect to a. But, of course, you can take the partial derivatives and get a full gradient of myFunction, like so. Often with neural networks, however, you want to get not just the gradients for your network as you're trying to train it and optimize your loss function. You often want what the network predicted, right? This is really useful to compute accuracy or other debugging sorts of information. And for that you can use valueWithGradient. And that returns a tuple containing both the value and the gradient, shockingly enough. Now, one thing to note, in Swift, tuples can actually have named elements. They aren't just ordered. And so you can actually see that it prints out really nicely. And you can access the values by name. We think this is, again, another nice little thing that helps make writing and debugging code and, more importantly, reading it and understanding it later a little bit easier. But the one thing that I want to call out is that throughout this we've been using just normal types. These aren't Tensor of something. It's just plain old Double. This is because automatic differentiation is built right into the language in Swift for TensorFlow. It makes it really easy to express your thoughts very clearly.
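(A minimal sketch of this part of the demo, on plain Doubles; gradient(at:in:) and valueWithGradient(at:in:) are the 0.2-era spellings, and myFunction's body is a made-up mix of products, sums, and quotients.)

```swift
import TensorFlow

@differentiable
func myFunction(_ a: Double, _ b: Double) -> Double {
    return a * b + a / b + a + b
}

// Partial derivative with respect to a, at a particular point.
let da = gradient(at: 2.0) { a in myFunction(a, 3.0) }

// The full gradient with respect to both a and b.
let (ga, gb) = gradient(at: 2.0, 3.0, in: myFunction)

// Value and gradient together, which is handy for logging the loss or
// computing accuracy while you train.
let (value, grad) = valueWithGradient(at: 2.0, 3.0, in: myFunction)
print(value, grad)
```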
But even though it's built into the language, we've actually worked very hard to make sure that automatic differentiation is totally flexible so that you can customize it to whatever needs you have. And instead of telling you about that, let's show you. So let's say you want to define an algebra in 2D space. Well, you're certainly going to need a point data type. So here we define a Point struct with x and y. And we just mark it Differentiable. We can define helper functions on it like dot or other helper functions. And Swift for TensorFlow, when you try to use your code, will often automatically infer when you need gradients to be computed for you by the compiler. But often, it's a good idea to document your intentions. And so you can annotate your helper functions as @differentiable. The other reason why we recommend doing this is because it helps catch errors. So here, Swift for TensorFlow is actually telling you that, hey, you can only differentiate functions that return values that conform to Differentiable. But Int doesn't conform to Differentiable, right? What this is telling you is that my helper function returns an Int. And differentiation is all about taking infinitesimally small steps as you optimize and take gradients, right? And integers are just very discrete. And so Swift for TensorFlow is helping to catch errors right when you write the code, very easily, and tell you what's going on. So the solution, of course, is just to not mark that function as @differentiable. The cell runs just fine.

But let's say we also wanted to go beyond just defining the dot product. Let's say we also wanted to define a magnitude helper function. That is, the magnitude of the vector defined by the origin to the point in question. So to do that, we can use the distance formula, if you're going to do Euclidean distance. And we can define an extension on Point that does this. But we're going to pretend for a moment that Swift doesn't include a square root function, because I want a good excuse for you to see the interoperability with C, OK? So we're actually going to use C's square root function that operates on doubles, OK? So based on the definition of Euclidean distance, we can define the magnitude. And it totally just-- no, it doesn't quite work. OK. Let's see what's going on. So we wanted magnitude to be differentiable. And it's saying that you can't differentiate the square root function, because this is an external function that hasn't been marked as differentiable. OK. What's that saying? Well, the square root, it's a C function. It was compiled by the C compiler. And as of today, the C compiler can't automatically compute derivatives for you. So Swift for TensorFlow is saying, hey, this isn't going to work. This is excellent, because it gives me a great excuse to show you how to write custom gradients.

All right, so here we define a wrapper function, mySqrt, that just calls down in the forward pass to the C square root function. For the backwards pass, we define a function that takes our double and returns a tuple of two values. The first element in the tuple is the normal value from the forward pass. And the second is a pullback closure. And this is where you define the backwards pass, capturing whatever values you need from the forward pass. OK? So we're going to run that. We're going to go back up to our definition of magnitude and change it from the square root to my square root, rerun the cell, and it works. We've now defined Point and two methods on it, dot and magnitude.
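(A sketch of the Point type and the custom gradient as described. It uses the 0.2-era @differentiable(vjp:) spelling, which newer toolchains replace with @derivative(of:); the Foundation import for C's sqrt and the method names are assumptions.)

```swift
import TensorFlow
import Foundation  // brings in C's sqrt for Double

struct Point: Differentiable {
    var x: Double
    var y: Double

    @differentiable
    func dot(_ other: Point) -> Double {
        return x * other.x + y * other.y
    }
}

// Wrap the C square root with a hand-written derivative, since the
// C compiler can't produce one for us.
@differentiable(vjp: vjpMySqrt)
func mySqrt(_ x: Double) -> Double {
    return sqrt(x)
}

// Returns the normal forward-pass value plus a pullback closure for the
// backwards pass, capturing what it needs from the forward pass.
func vjpMySqrt(_ x: Double) -> (Double, (Double) -> Double) {
    let value = sqrt(x)
    return (value, { v in v / (2 * value) })  // d/dx sqrt(x) = 1 / (2 * sqrt(x))
}

extension Point {
    @differentiable
    func magnitude() -> Double {
        return mySqrt(x * x + y * y)  // Euclidean distance from the origin
    }
}
```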
And we can now combine these in arbitrary other silly differentiable functions. So, here, I've defined the silly function. And we've marked it as @differentiable. And we're going to take two points. We're also going to take a double, right? You can mix and match differentiable data types totally fluidly. We're going to return a double, and we're going to take magnitudes and do dot products. It's a silly function after all. And we can then use it and compute the gradient of this function at arbitrary data points. Just like you'd expect, you can get the value of the function, and get full gradients in addition to partial derivatives with respect to individual values. That's been a quick run-through of how to use customization, custom gradients, and custom data types with the language-integrated automatic differentiation built into Swift for TensorFlow.

But let's go one step further. Let's put all this together and show how you can write your own debuggers as an example of how this power is all in your hands. So often when you're debugging models, you want to be able to see the gradients at different points within your model. And so here, we can just define, in regular Swift code, a gradient debugger. Now, it's going to take as input a double. And it's going to return it just like normal for the forward pass, right? It's an identity function. On the backwards pass, we're going to get the gradient. We're going to print the gradient. And then we're going to return it. So we're just passing it through, just printing it out. Now that we've defined this gradient debugger ourselves, we can use it in our silly function to see what's going on as we take derivatives. So gradientDebugger, there we go. We can rerun that. And when we take the gradients, we can now see that, for that point in the silly function of a dot b, the gradient is 3.80. That's been a brief tour through how automatic differentiation works in Swift for TensorFlow and how it's customizable so that you can harness the power in whatever abstractions or systems you need to build. Back over to you, Chris.

CHRIS LATTNER: Thanks, Brennan. [APPLAUSE] So the funny thing about all this is that the algorithms that we're building on were defined back in the 1970s. And it really took language integration to be able to bring these things forward into the world of machine learning. There's a tremendous amount of depth here. And I'm really excited to see what you all can do with it. And we think that this is going to enable new kinds of research, which we're very excited about. If you're interested in learning more, we have a bunch of detailed design documents available online.

Let's talk about performance a little bit. Now, Swift is fast. And this comes from a number of different things, one of which is that the language itself has really good low-level performance. There's also no global interpreter lock (GIL) to get in the way of concurrency. Swift for TensorFlow also has some advanced compiler techniques to automatically identify graphs for you and extract them. So you don't have to think about that. The consequence of all this together is that we think Swift has the world's most advanced eager mode. Now, you may not care that much about performance. You may wonder why we care about this stuff. Well, we're seeing various trends in the industry where people are defining neural nets and then want to integrate them into other, larger applications.
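(A sketch of the gradient debugger and a silly function that uses it, building on the Point and mySqrt sketches above; sillyFunction's body and the usage are assumptions, in the same 0.2-era @differentiable(vjp:) style.)

```swift
// Identity in the forward pass; prints the gradient in the backwards pass.
@differentiable(vjp: vjpGradientDebugger)
func gradientDebugger(_ x: Double) -> Double {
    return x
}

func vjpGradientDebugger(_ x: Double) -> (Double, (Double) -> Double) {
    return (x, { grad in
        print("gradient:", grad)  // peek at the gradient flowing through this point
        return grad               // and pass it along unchanged
    })
}

// Mix and match differentiable types: two Points and a Double in, a Double out.
@differentiable
func sillyFunction(_ a: Point, _ b: Point, _ scale: Double) -> Double {
    return gradientDebugger(a.dot(b)) * scale + a.magnitude()
}

// Taking gradients triggers the debugger's pullback, printing the
// gradient at that point in the function.
let g = gradient(at: Point(x: 1, y: 2), Point(x: 3, y: 4)) { a, b in
    sillyFunction(a, b, 0.5)
}
```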
And typically what this requires is for you to export graphs and then write a bunch of C++ code to load and orchestrate them in various ways. So let's take a look at an example of this. AlphaGo Zero is really impressive work that combines three major classes of techniques. Of course, you have deep learning on the one hand. But it also drives that through Monte Carlo tree search to actually explore and evaluate these search spaces. And then it runs them at scale on industry-leading TPU accelerators. And so it's the combination of all three of these things that makes AlphaGo Zero possible. Now, this is possible today. And if you're an advanced team like DeepMind, you can totally do this. But it's much more difficult than it should be. And we think that breaking down barriers like this can lead to new breakthroughs in science. And we think that this is what can drive progress forward. So instead of talking about it again, let's take a look.

BRENNAN SAETA: MiniGo is an open source Go player inspired by DeepMind's AlphaGo Zero project. It's available on GitHub. And you can certainly check out the code. And I encourage you to. They're also going to be here. And they have some other presentations tomorrow. But the MiniGo project, when they started out, they were writing everything in normal TensorFlow. And it was working great until they started trying to run at scale on large clusters of TPUs. There, they ran into performance problems and had to rewrite things like Monte Carlo tree search in C++ in order to effectively utilize modern accelerators. Here, we've reimplemented Monte Carlo tree search and the rest of the MiniGo self-play in pure Swift. And we're going to let you see it running right here in Colab. So here we define a helper function where we take in a game configuration and a couple of participants. These are our white and black players. And we're going to, basically, play the game until we have a winner or a loser. And so let's actually run this. Here, we define a game configuration. We're going to play a game between a Monte Carlo tree search player powered by neural networks and just a random player, to see how easy it is to flip back and forth or mix and match between deep learning and other arbitrary machine learning algorithms right here in Swift. So here you go. You can see them, white and black, playing different moves back and forth. And it just goes. We think that Swift for TensorFlow is going to unlock whole new classes of algorithms and research, because of how easy it is to do everything in one language with no barriers, no having to rewrite things in C++. Back over to you, Chris.

CHRIS LATTNER: Thanks, Brennan. The cool thing about this, of course, is that you can actually do something like this in a workbook, which is pretty phenomenal. And we've seen many different families of new techniques that can be combined together and fused in different ways. And bringing this to more people, we think, will lead to new kinds of interesting research. Now, our work on usability and design is not just about high-end researchers. We love them, but Swift is also widely used to teach new programmers how to code. And education is very close to our hearts. And so I'm very excited to announce a collaboration that we're embarking on with none other than Jeremy Howard. But instead of talking about this, I'd rather have Jeremy speak about it now.
JEREMY HOWARD: At fast.ai, we're always looking to push the boundaries of what's possible with deep learning, especially pushing to make recent advances more accessible. We've been involved with setting ImageNet speed records at a cost of just $25 and building the world's best document classifier. Hundreds of thousands of people have become deep learning practitioners through our courses and are producing state-of-the-art results with our library. We think that with Swift for TensorFlow, we can go even further. So we're announcing today that our next course will include a big Swift component, co-taught by someone that knows Swift pretty well.

BRENNAN SAETA: Chris, I think he means you.

CHRIS LATTNER: Yeah, we'll see how this goes. So I'm super excited to be able to help teach the next generation of learners. But I'm also really excited that Jeremy will be bringing his expertise in API design and helping us shape the high-level APIs in Swift for TensorFlow. So we've talked about many things. But the most important part is that Swift for TensorFlow is really TensorFlow at its core. And we think this is super important, because we've worked really hard to make sure that it integrates with all the things going on in the big TensorFlow family. And we're very excited about that. Now, you may be wondering where you can get this. So Swift for TensorFlow is open source. You can find it on GitHub now. And you can join our community. It also works great in Colab, as you've seen today. We have tutorials. We have examples. And all the demos you saw today are available now in Colab, which is great. We've also released our 0.2 release, which includes all the basic infrastructure and underlying technology to power these demos and examples. And we're actively working on high-level APIs right now. So this is not ready for production yet, as you could guess. But we're very excited about shaping this future, building this out, and exploring this new programming model. And this is a great opportunity for advanced researchers to get involved and help shape the future of this platform. So we'd love for you to try it out and let us know what you think. Thank you. [APPLAUSE] [MUSIC PLAYING]