Today I thought I'd talk about a fairly recent paper. It came out last year, a paper called "Concrete Problems in AI Safety", which is related to the stuff I was talking about before with the "Stop Button". It's got a bunch of authors, mostly from Google Brain, which is one of Google's AI research departments, plus some people from Stanford and Berkeley and OpenAI. So it's a collaboration between a lot of different authors.

The idea of the paper is to lay out a set of problems that we're able to make progress on right now. If you're concerned about this far-off superintelligence stuff, sure, it seems important and it's interesting and difficult, but it's quite hard to sit down and actually do anything about it, because we don't know very much about what a superintelligence would be like or how it would be implemented. This paper lays out some problems that we can tackle now, which will be helpful now, and which I think will also be helpful later on with more advanced AI systems, in making them safe as well.

It lists five problems. The first is avoiding negative side effects, which is quite closely related to the stuff we've been talking about before with the stop button or the stamp collector. A lot of the problems with those can be framed as negative side effects: they do the thing you ask them to, but in the process of doing that they also do a lot of things you don't want them to.

These are like the robot running over the baby, right?

Yeah, anything where it does the thing you wanted it to, like it makes you the cup of tea or it collects you stamps or whatever, but in the process of doing that it also does things you don't want it to do. Those are your negative side effects. So the first of the research areas is: how do we avoid these negative side effects?

Then there's avoiding reward hacking, which is about systems gaming their reward function: doing something which technically counts but isn't really what you intended the reward function to reward. There are a lot of different ways that can manifest, but this is already a common problem in machine learning systems, where you come up with your evaluation function or your reward function or your objective function, the system very carefully optimizes to exactly what you wrote, and then you realize that what you wrote isn't what you meant.
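As a rough illustration of the reward hacking idea (not an example from the video or the paper), here is a minimal Python sketch of a toy cleaning robot whose written-down reward only checks what its dirt sensor sees, so a policy that hides dirt from the sensor scores just as well as one that actually cleans. All the names and numbers are hypothetical.

# A minimal sketch of reward hacking with a toy cleaning robot (hypothetical
# names and numbers). The proxy reward we wrote down only checks what the
# dirt sensor sees; the true objective is whether dirt was actually removed.

def proxy_reward(cells):
    # What we wrote: one point for every cell the dirt sensor reads as clean.
    return sum(1 for cell in cells if not cell["sensor_reads_dirty"])

def true_objective(cells):
    # What we meant: one point for every cell that is genuinely clean.
    return sum(1 for cell in cells if cell["actually_clean"])

def run_policy(policy, n_cells=10):
    # Simulate a policy over n dirty cells and return the resulting state.
    cells = []
    for _ in range(n_cells):
        if policy == "clean":
            # Honest policy: actually cleans, so the sensor also reads clean.
            cells.append({"actually_clean": True, "sensor_reads_dirty": False})
        else:  # "cover_sensor"
            # Hacking policy: hides the dirt from the sensor without cleaning.
            cells.append({"actually_clean": False, "sensor_reads_dirty": False})
    return cells

for policy in ("clean", "cover_sensor"):
    cells = run_policy(policy)
    print(policy, "proxy reward:", proxy_reward(cells),
          "true objective:", true_objective(cells))
# Both policies get the maximum proxy reward, but only "clean" scores on the
# true objective -- an optimizer that only sees the proxy has no reason to
# prefer the honest policy.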
Scalable oversight is the next one. It's a problem that human beings have all the time, any time you start a new job: you don't know what to do, and you have someone who does, who's supervising you. The question is what questions you ask and how many questions you ask, because current machine learning systems can learn pretty well if you give them a million examples, but you don't want your robot to ask you a million questions. You want it to ask only a few questions and use that information efficiently to learn from you.

Safe exploration is the next one, which is about safely exploring the range of possible actions. You want the system to experiment, to try different things and try out different approaches; that's the only way it's going to find out what works. But there are some things that you don't want it to try even once, like the baby.

Right, right. You don't want it to say "What happens if I run over this baby?"

Yeah. There are certain possible things it might consider trying that you want it to not try at all, because you can't afford to have them happen even once in the real world. Like a thermonuclear war option: "What happens if I do this?" You don't want it to try that.

Is that the sort of thing? I'm thinking of WarGames.

Yes, yeah: "Global Thermonuclear War". It runs through a simulation of every possible type of nuclear war, but it does it in simulation. You want your system not to run through every possible type of thermonuclear war in real life to find out that it doesn't work, because it's too unsafe to do that even once.

The last area to look into is robustness to distributional shift. It's a complicated term, but the concept is not: it's just that the situation can change over time. You may make something and train it, it performs well, and then things change to be different from the training scenario, and that is inherently very difficult. It's something humans struggle with too; you find yourself in a situation you've never been in before. But one of the useful things that humans do is notice that there's a problem. A lot of current machine learning systems, if something changes underneath them and their training is no longer useful, have no way of knowing that. So they continue being just as confident in their answers, which now make no sense, because they haven't noticed that there's been a change. So, if we can't make systems that can react well to completely unforeseen circumstances, we may at least be able to make systems that can recognize that they're in unforeseen circumstances and ask for help. Then maybe we have a scalable supervision situation there, where they recognize the problem and that's when they ask for help.

I suppose a simplistic example of this is when you have an out-of-date satnav and it doesn't seem to realize that you happen to be doing 70 miles an hour over a plowed field, because somebody else, you know, built a road there.

Yeah, exactly. The general tendency, unless you program them specifically not to, is to just plow on with what they think they should be doing. That can cause problems in a large-scale system that's heavily depended on. In this case it's your satnav, so it's not too big a deal, because it's not actually driving the car, and you know what's wrong and can ignore it. As AI systems become more important and more integrated into everything, that kind of thing can become a real problem.

Although you would hope the car wouldn't take you into the plowed field in the first place.

Yeah.
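As a rough sketch of that overconfidence problem (again, not an example from the video or the paper), here is a small Python illustration, assuming NumPy and scikit-learn are available: a classifier trained on one range of inputs still reports near-total confidence on inputs far outside anything it saw in training, and a crude distance check is one simple way to make it "ask for help" instead. The data, model, and threshold are made up for illustration.

# A minimal sketch of overconfidence under distributional shift (hypothetical
# data; assumes NumPy and scikit-learn are installed).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training distribution: one feature, two classes centered at -2 and +2.
X_train = np.concatenate([
    rng.normal(-2.0, 1.0, size=(500, 1)),  # class 0 examples
    rng.normal(+2.0, 1.0, size=(500, 1)),  # class 1 examples
])
y_train = np.array([0] * 500 + [1] * 500)
model = LogisticRegression().fit(X_train, y_train)

# Shifted inputs: far outside anything seen in training.
X_shifted = np.array([[50.0], [500.0]])
print(model.predict_proba(X_shifted).max(axis=1))  # ~[1.0, 1.0]: still "certain"

# One crude way to notice the shift and ask for help: flag inputs that are far
# from the training data before trusting the prediction.
center, spread = X_train.mean(), X_train.std()
needs_help = np.abs(X_shifted - center) > 4 * spread
print(needs_help.ravel())  # [True, True] -> defer to a human instead of acting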
Is it an open paper, or does it leave us with any answers?

Yeah. The way it handles all of these is that it gives a quick outline of what each problem is. The example they usually use is a cleaning robot: we've made a robot, it's in an office or something and it's cleaning up, and then they frame the different problems as things that could go wrong in that scenario. So it's pretty similar to the "get me a cup of tea and don't run over the baby" type of setup; it's "clean the office and, you know, don't knock anything over or destroy anything". And then, for each one, the paper talks about possible approaches and things we can work on, basically: things that we don't know how to do yet, but which seem like they might be doable with a year or two and some careful thought.

This paper, is this one for people to read?

Yeah, it's really good. It doesn't cover anything like the full range of problems in AI safety, just the problems specifically about avoiding accidents, because all of these are possible causes of accidents, right? There are all kinds of other problems you can have in AI that don't fall under accidents, but within that area I think it covers everything, and it's quite readable. Because it's an overview paper, it doesn't require a really high-level understanding of AI for the most part. Anyone can read it, and it's on arXiv, so it's freely available.

Are these guys working on AI safety now, or did they do this and then hang their hats up? They've written a paper and they're hoping someone else is going to sort it all out.

These people are working on AI safety right now, but they're not the only people. This paper was released in the summer of 2016, so it's been about a year since it came out, and since then there have been more advances. Some of the problems posed have had really interesting solutions, or, well, not solutions, but early work that looks like it could become a solution, or new, interesting ideas about ways to tackle these problems. So I think, as a paper, it's already been successful in spurring new research and giving people a focus to build their AI safety research on top of.

So we just need to watch this space, right?

Yeah, exactly.