Hi everybody, and welcome to this episode of TensorFlow Meets. I'm delighted to be chatting with Sergio Guadarrama. You're from the TensorFlow Agents team, right? Now, you did a talk at the TensorFlow Developer Summit around TF Agents, and could you tell us all about what TF Agents is and what it does?
Oh, definitely. So, TF Agents is a reinforcement learning library for TensorFlow that we have created inside Google to solve many of the problems that we have. We were struggling to get all these RL algorithms that are being published every day right, down to all the little details. So we decided to build this library with a lot of tests to make sure everybody can use it.
Okay, so anybody can download TensorFlow Agents, play around with all that kind of stuff. Now, you mentioned it's about reinforcement learning. Most of us know a little bit about reinforcement learning, but could you tell us what it really is and what it's all about?
So, the main idea behind reinforcement learning is that when you interact with an environment, some game or some task, you're going to take different actions, and you're going to get a reward when you do things correctly and a negative reward when you do things incorrectly. And from that reward, basically, you can learn.
So, almost like the way a real person learns.
It's kind of like how a person learns when they get rewarded, and things like that. It's actually inspired by that.
I see. Cool. So, now, one of the things that you spoke about in your presentation, and we have that on YouTube for people to watch, that I thought was really cool: you showed a Breakout-style game where there's the wall, then there's the bat, then there's the bouncing ball. How does that work from a reinforcement learning perspective?
So, in that case, what happens is the agent will look at the environment, in this case the game, see where the bricks are and where the ball is, and make a decision: where should I move the paddle, to the left or to the right, to make sure the ball doesn't fall? And then, playing multiple times, sometimes the ball will fall. Eventually it will learn that when you let it fall, you don't win, you lose. So you have to keep the ball moving up and breaking all these bricks on top.
Right. Now, how does that work from a TF Agents perspective? Is the environment there, the game board or--
Yeah, it's already predefined, you can load it. We already have a lot of environments defined for you, so you can just load all the Atari games, OpenAI, DeepMind Control; many of those are ready to go. But you can also define your own environment. When you have a specific task, we make it very easy to bring it in and define your own task to solve.
Now, in something like the Breakout game, for example, the reward is the score, right? So as you knock off a brick, your score goes up. So how does it see that, how is that getting labeled? Is it reading the raw pixels on the screen, or how does that actually work?
So, in that case, it's actually given by the game. In other, more complicated cases, like in a recommender system, it would be based on the interaction with the user, for example.
I see. Okay, cool. Wow, interesting stuff. Now, this is all open source, you said, right? So it's on github.com/tensorflow/agents?
That's it, that's correct.
Now I noticed when I was poking around in there that there's a bunch of Colab notebooks. Do you have any that you'd recommend people play with, any favorites?
I think the best one to start with is the DQN Cartpole. It's a full example: you can go through all the steps, you can see the videos, you can play around with it, you can modify it and see what happens, and see how it learns to keep a small cartpole balanced.
Interesting. How long does it take to train that?
That one takes a few minutes.
Oh really, just a few minutes?
Yeah.
Wow, so reinforcement learning in a notebook with TF Agents, and it takes a few minutes to help me predict a cartpole.
Yeah.
Wow. Okay. Any others that people should check out?
Yeah, the other thing is we are looking forward to the community contributing new environments, new tasks, or new algorithms. For people who have new ideas to contribute, we are looking forward to pull requests or GitHub issues.
Okay, nice. Have you seen any scenarios that excite you?
Oh yeah, we applied it to some robotics tasks. It was very interesting to see a robot actually learning how to grasp objects, how to move around. The first time you see it actually doing the task that you're trying is very rewarding.
Cool, I'll look out for that. So if somebody wants to get started with this, where should they go?
So, they can go to github.com/tensorflow/agents, and over there they can find all the Colabs, examples, and everything, and download the code from there.
Okay, so they can just kick the tires and have fun with it, right?
Oh yeah, definitely.
Cool, I'd love to see what kind of things people will produce.
Yeah, me too. Already the agent plays Breakout better than I do. (laughs)
So, thanks so much, Sergio. This has been a lot of fun.
Thank you.
And thanks everybody for watching this episode of TensorFlow Meets. If you have any questions for me or any questions for Sergio, please leave them in the comments below, and be sure to check out his talk at the TF Dev Summit, which you'll also see on this channel. So thank you.
♪ (music) ♪
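Note: the reward-driven loop described in the conversation (the agent observes the game, moves the paddle, and learns from positive and negative rewards) can be illustrated with a small sketch. This is not the TF Agents API and not the method from the talk or the DQN Cartpole Colab. It is plain Python using tabular Q-learning, a simpler technique than DQN, on a made-up "catch the falling ball" game. The names `CatchEnv`, `train`, and `catch_rate`, the grid size, and all hyperparameters are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

WIDTH, HEIGHT = 5, 4      # hypothetical grid: 5 columns, ball falls for 4 steps
ACTIONS = (-1, 0, 1)      # move paddle left, stay, move right


class CatchEnv:
    """Toy stand-in for a Breakout-like game: get the paddle under a falling ball."""

    def __init__(self, rng):
        self.rng = rng

    def reset(self, ball_x=None, paddle_x=None):
        # Random start unless the caller pins the positions (used for evaluation).
        self.ball_x = self.rng.randrange(WIDTH) if ball_x is None else ball_x
        self.paddle_x = self.rng.randrange(WIDTH) if paddle_x is None else paddle_x
        self.ball_y = HEIGHT
        return self._obs()

    def _obs(self):
        return (self.ball_x, self.ball_y, self.paddle_x)

    def step(self, action):
        # Move the paddle (clamped to the grid), then let the ball fall one row.
        self.paddle_x = min(WIDTH - 1, max(0, self.paddle_x + action))
        self.ball_y -= 1
        done = self.ball_y == 0
        reward = 0.0
        if done:
            # The reward is given by the game: +1 for a catch, -1 for a miss.
            reward = 1.0 if self.paddle_x == self.ball_x else -1.0
        return self._obs(), reward, done


def train(episodes=20000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    env = CatchEnv(rng)
    q = defaultdict(float)  # maps (observation, action) to an estimated return
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                       # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(obs, a)])   # exploit
            next_obs, reward, done = env.step(action)
            best_next = 0.0 if done else max(q[(next_obs, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted best future value.
            q[(obs, action)] += alpha * (reward + gamma * best_next - q[(obs, action)])
            obs = next_obs
    return q


def catch_rate(q):
    """Fraction of all start positions where the greedy policy catches the ball."""
    env = CatchEnv(random.Random(1))
    caught = 0
    for ball_x in range(WIDTH):
        for paddle_x in range(WIDTH):
            obs, done = env.reset(ball_x, paddle_x), False
            while not done:
                action = max(ACTIONS, key=lambda a: q[(obs, a)])
                obs, reward, done = env.step(action)
            caught += reward > 0
    return caught / (WIDTH * WIDTH)


if __name__ == "__main__":
    q_table = train()
    print(f"catch rate after training: {catch_rate(q_table):.2f}")
```

The toy game only hands out a reward when the ball reaches the bottom, so the agent has to work backwards from that final signal to learn which earlier paddle moves were useful. A table works here because the game has only a handful of states; an approach like DQN replaces the table with a neural network when the observations are too large to enumerate.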
TF-Agents: Reinforcement Learning (TensorFlow Meets), posted by 林宜悉 on 2020/04/04