
  • Katrin Erk>> It's so incredibly hard to process language, which is ridiculous

  • because little children can speak. Little kids learn language

  • by being in the middle of the world. So, they have all these

  • visions, sound and they have language to go along with that.

  • The way that computers learn language nowadays is

  • more like if you sat a baby down with

  • a huge pile of The New York Times and said, "OK, here you go, now learn language."

  • One problem with human language is that it's horribly

  • ambiguous. Take, for example, the word "run." You can

  • run a race, run a company or you

  • run your car into a bog. Run-ing, running.

  • Traditionally what people have done is try to get the computer to

  • pick up on patterns using a dictionary. So,

  • here is this clear list of senses, so, run has, say, 20,

  • and here's the list. And then,

  • if you had one occurrence like "he ran the company" you would have to say,

  • "OK, that is sense #12 and not sense #11 and not sense #13."

  • Trouble is, all of those meanings are somewhat related

  • but, they're still different because you draw different conclusions.

  • Now, what's the poor computer to do? So, I'm thinking

  • maybe we can't distinguish senses so clearly as to say, here's where one

  • sense begins, here's where the next sense stops. What I'm doing

  • is to represent each time you use a word like run

  • with the context in which it appears. So, all of this context

  • you can put into numbers and then present such a context as a point

  • in a high dimensional space. And in order to represent

  • words as these points in high dimensional space I need a whole

  • lot of data. I need to have all that context and I need to have it in a form

  • that I can count it, which means, 100 million words is good,

  • a billion is better, 2 billion or 3 billion words, yeah,

  • then you can actually get decent models. But if you want to

  • compute with that amount of data and you do this on a single desktop machine

  • then you better be prepared to wait for a long time and I did that for a while.

  • I'd start an experiment, wait 3 weeks to see how it got out,

  • that's painful, really painful.

  • With TACC I can do the same things in a couple hours. So, we need supercomputing because

  • all of natural language processing these days, but in particular when you want to do

  • stuff with word meanings you need to use a lot of data and I mean a lot of data.
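
The idea described above, representing each use of a word by the context it appears in and treating that context as a point in a high-dimensional space, can be illustrated with a very small sketch. This is not the speaker's actual model: the three example sentences, the window size, the function names `context_vector` and `cosine`, and the use of raw word counts with cosine similarity are all assumptions made for illustration only.

```python
import math
from collections import Counter


def context_vector(tokens, index, window=3):
    """Count the words within `window` positions of tokens[index].

    Each distinct context word is one dimension of the space; the
    counts are the coordinates of this occurrence in that space.
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return Counter(tokens[lo:index] + tokens[index + 1:hi])


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy sentences; a real model would need far more text.
sentences = [
    "she will run the company next year",
    "he wants to run the company well",
    "they run a race every spring",
]

# One point in context space per occurrence of "run".
vectors = []
for sentence in sentences:
    tokens = sentence.split()
    vectors.append(context_vector(tokens, tokens.index("run")))

# Compare the first occurrence with the other two.
print(cosine(vectors[0], vectors[1]))
print(cosine(vectors[0], vectors[2]))
```

In this toy setup the two "run the company" occurrences share context words and so end up closer to each other than either is to "run a race", without anyone assigning a numbered dictionary sense. With only three sentences the counts are far too sparse to be meaningful, which is the reason the speaker asks for hundreds of millions or billions of words.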


When Will My Computer Understand Me?

Posted by Nana Chen on 2021/01/14