Ten years ago,
10 年前,
computer vision researchers thought that getting a computer
電腦視覺研究人員認為,
to tell the difference between a cat and a dog
要讓電腦辨別貓與狗的差別,
would be almost impossible,
幾乎是比登天還難,
even with the significant advance in the state of artificial intelligence.
即使用了相當先進的 人工智慧都很難辦到。
Now we can do it at a level greater than 99 percent accuracy.
現在我們可以把辨別的準確度 提升到 99% 以上。
This is called image classification --
這技術叫做圖像分類——
give it an image, put a label to that image --
給電腦看圖片, 並給圖片貼上標籤——
and computers know thousands of other categories as well.
電腦還可以識別出 許多其它類別的東西。
I'm a graduate student at the University of Washington,
我目前是華盛頓大學的研究生,
and I work on a project called Darknet,
我正在做一個專題叫做「暗黑網路」,
which is a neural network framework
它是一個用來訓練及測試
for training and testing computer vision models.
電腦視覺模型的神經網路架構。
So let's just see what Darknet thinks
所以,讓我們來瞧瞧暗黑網路
of this image that we have.
對我們照片識別能力的狀況。
When we run our classifier
當我們在這張照片上
on this image,
開啟我們的分類器,
we see we don't just get a prediction of dog or cat,
可以看到電腦現在不只 在預測這是狗或貓,
we actually get specific breed predictions.
它實際上正在擷取特定品種的預測。
That's the level of granularity we have now.
這就是現在我們電腦的粒度等級。
And it's correct.
辨別正確。
My dog is in fact a malamute.
我的狗的確是隻雪橇犬。
So we've made amazing strides in image classification,
所以,我們在圖像識別上 已經有了很大的進步,
but what happens when we run our classifier
但如果我們用識別器
on an image that looks like this?
來辨別這樣的照片呢?
Well ...
嗯……
We see that the classifier comes back with a pretty similar prediction.
可以看到從分類器 得到的預測也相當類似。
And it's correct, there is a malamute in the image,
沒錯,圖片中有一隻雪橇狗,
but just given this label, we don't actually know that much
但它只給出一個標籤,
about what's going on in the image.
我們對這張照片的理解 還不是很完整。
We need something more powerful.
我們需要更強的東西。
I work on a problem called object detection,
我正在研究一個問題, 叫做「物件偵測」,
where we look at an image and try to find all of the objects,
我們把一張照片中的 所有物體都找出來,
put bounding boxes around them
用邊界框把它們框起來,
and say what those objects are.
然後標示它們是那些東西。
So here's what happens when we run a detector on this image.
我們來看一下當我們在這一張圖片上 執行偵測軟體時,會發生甚麼事。
Now, with this kind of result,
現在,有了這類的結果,
we can do a lot more with our computer vision algorithms.
我們就可以利用電腦視覺演算法, 幫我們做更多的事。
We see that it knows that there's a cat and a dog.
我們可以看到, 電腦知道圖片中有一隻貓和狗。
It knows their relative locations,
它知道牠們彼此的相對位置、
their size.
大小。
It may even know some extra information.
電腦甚至可能知道其它的資訊。
There's a book sitting in the background.
它也看到了背景中有一本書。
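The detector output described here can be pictured as a list of labeled boxes. What follows is a minimal sketch, not Darknet's actual data format: the `LabeledBox` class, its field names, and the pixel coordinates are all hypothetical, chosen only to show how relative location and size fall out of the box coordinates.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LabeledBox:
    """One detected object: a class label plus a bounding box in pixels.

    Hypothetical layout: (x, y) is the top-left corner of the box.
    """
    label: str
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        # Size of the object in the image, in square pixels.
        return self.width * self.height

    @property
    def center(self) -> Tuple[float, float]:
        # Midpoint of the box, used to compare positions.
        return (self.x + self.width / 2, self.y + self.height / 2)


def is_left_of(a: LabeledBox, b: LabeledBox) -> bool:
    """True when the center of `a` lies to the left of the center of `b`."""
    return a.center[0] < b.center[0]


# Made-up coordinates for illustration only, not real model output.
dog = LabeledBox("dog", x=100, y=80, width=200, height=300)
cat = LabeledBox("cat", x=350, y=200, width=120, height=150)
book = LabeledBox("book", x=420, y=40, width=60, height=40)

for obj in (dog, cat, book):
    print(obj.label, "center:", obj.center, "area:", obj.area)
print("dog is left of cat:", is_left_of(dog, cat))
```

A single classification label carries none of this geometry, which is why a box per object is the more useful output for a system that has to act in the physical world.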
And if you want to build a system on top of computer vision,
如果你想要建立一個 基於電腦視覺系統的實用系統,
say a self-driving vehicle or a robotic system,
比如說,自動駕駛車或機械人系統,
this is the kind of information that you want.
這類就會是你想要的資訊。
You want something so that you can interact with the physical world.
你會想要一個可以 與實體世界互動的東西。
Now, when I started working on object detection,
當我開始做物件偵測時,
it took 20 seconds to process a single image.
它要花 20 秒才能處理一張圖片。
And to get a feel for why speed is so important in this domain,
為了讓各位體會 為什麼這個領域這麼講究速度,
here's an example of an object detector
我這邊做個執行物件偵測器的示範,
that takes two seconds to process an image.
一張照片只要 2 秒的處理時間。
So this is 10 times faster
所以,比 20 秒一張的偵測器
than the 20-seconds-per-image detector,
快了 10 倍,
and you can see that by the time it makes predictions,
各位可以看到, 在它識別圖像的過程中,
the entire state of the world has changed,
周圍環境已經發生了變化,
and this wouldn't be very useful
但對一個應用軟體而言,
for an application.
這樣的速度是很鷄肋的。
If we speed this up by another factor of 10,
如果我們把速度再提升 10 倍,
this is a detector running at five frames per second.
這個偵測器每秒 就可以識別 5 張圖片。
This is a lot better,
這樣好多了,
but for example,
但,假如,
if there's any significant movement,
移動很快的時候……
I wouldn't want a system like this driving my car.
我可不想在我車上裝這樣慢的系統。
This is our detection system running in real time on my laptop.
這是在我筆電上運行的 即時偵測系統。
So it smoothly tracks me as I move around the frame,
我在畫面中移動的時候, 它可以很順暢地追蹤著我,
and it's robust to a wide variety of changes in size,
而且,它可以根據不同的大小、
pose,
姿勢、
forward, backward.
前、後來做調整。
This is great.
太棒了。
This is what we really need
如果我們要建立一個 基於電腦視覺系統的實用系統,
if we're going to build systems on top of computer vision.
這個才會是我真正想要的。
(Applause)
(掌聲)
So in just a few years,
所以,才幾年的時間,
we've gone from 20 seconds per image
我們從每 20 秒處理一張照片,
to 20 milliseconds per image, a thousand times faster.
進步到每張照片只要 20 毫秒, 快了 1000 倍。
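The speed figures quoted in the talk can be checked with a little arithmetic. This is only a sketch of the conversion between time per image and frames per second. The dictionary keys are labels made up for this example, and the 200 ms entry is derived from the "five frames per second" figure rather than stated directly.

```python
# Per-image processing times mentioned in the talk, in milliseconds.
TIMINGS_MS = {
    "early detector": 20_000,  # 20 seconds per image
    "10x faster": 2_000,       # 2 seconds per image
    "100x faster": 200,        # derived from 5 frames per second
    "real time": 20,           # 20 milliseconds per image
}


def frames_per_second(ms_per_image: float) -> float:
    """Convert a per-image processing time into a frame rate."""
    return 1000.0 / ms_per_image


def speedup(old_ms: float, new_ms: float) -> float:
    """How many times faster the new timing is than the old one."""
    return old_ms / new_ms


for name, ms in TIMINGS_MS.items():
    print(f"{name}: {ms} ms/image, {frames_per_second(ms):g} frames/s")

total_speedup = speedup(TIMINGS_MS["early detector"], TIMINGS_MS["real time"])
print("overall speedup:", total_speedup)
```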
How did we get there?
我們是如何辦到的?
Well, in the past, object detection systems
過去,物件偵測系統,
would take an image like this
會把一張像這樣的照片,
and split it into a bunch of regions
分割成好幾個小區塊,
and then run a classifier on each of these regions,
然後在每一個小區塊 運行分類器軟體,
and high scores for that classifier
相似度得分如果比較高
would be considered detections in the image.
會被識別器認為照片偵測成功。
But this involved running a classifier thousands of times over an image,
但這樣一張圖片要執行 好幾千次的識別指令、
thousands of neural network evaluations to produce detection.
經過好幾千次的神經網路評估 才有辦法偵測出來。
Instead, we trained a single network to do all of detection for us.
但我們不是這樣做,我們訓練了一個 網路模型來幫我們完成所有的偵測。
It produces all of the bounding boxes and class probabilities simultaneously.
它可以同時產出所有的邊界框 以及各類別的機率。
With our system, instead of looking at an image thousands of times
有了我們的系統, 你就不用一張圖片看了好幾千遍
to produce detection,
才能偵測出來。
you only look once,
你只要看一眼 (YOLO),
and that's why we call it the YOLO method of object detection.
所以我們簡稱這個 物件偵測技術為「YOLO」。
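The difference in cost between the two approaches can be sketched by counting network evaluations. This is an illustration under stated assumptions, not the real pipeline: `CountingNetwork` is a hypothetical stand-in that performs no inference, and the sliding-window split (64-pixel regions, 8-pixel stride, 640x480 image) is one assumed way of "splitting into a bunch of regions"; older systems also used other region-selection schemes.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


class CountingNetwork:
    """Stand-in for a neural network that only counts its evaluations."""

    def __init__(self) -> None:
        self.evaluations = 0

    def evaluate(self, region: Region) -> float:
        self.evaluations += 1
        return 0.0  # placeholder score; no real inference happens here


def split_into_regions(width: int, height: int,
                       size: int, stride: int) -> List[Region]:
    """Enumerate square regions of side `size`, stepping by `stride`."""
    return [
        (x, y, size, size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def detect_by_regions(net: CountingNetwork, width: int, height: int,
                      size: int = 64, stride: int = 8) -> int:
    """Older style: run the classifier once per region."""
    for region in split_into_regions(width, height, size, stride):
        net.evaluate(region)
    return net.evaluations


def detect_in_one_look(net: CountingNetwork, width: int, height: int) -> int:
    """Single-network style: one evaluation over the whole image."""
    net.evaluate((0, 0, width, height))
    return net.evaluations


region_evals = detect_by_regions(CountingNetwork(), 640, 480)
single_evals = detect_in_one_look(CountingNetwork(), 640, 480)
print("region-based evaluations:", region_evals)
print("single-pass evaluations:", single_evals)
```

Under these assumed settings the region-based loop evaluates the network thousands of times for one image, while the single-pass version evaluates it once, which is the contrast the talk describes.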
So with this speed, we're not just limited to images;
所以,有了這樣的辨識速度, 我們不只可以偵測圖片;
we can process video in real time.
還可以處理即時的影片。
And now, instead of just seeing that cat and dog,
現在各位看到的不是 貓、狗的靜態圖片,
we can see them move around and interact with each other.
而是有牠們在移動、 互動的動態影片。
This is a detector that we trained
這是我們用微軟 COCO 資料集裡
on 80 different classes
80 種不同的類別
in Microsoft's COCO dataset.
訓練出來的辨識器。
It has all sorts of things like spoon and fork, bowl,
它包含各種東西, 像是湯匙、叉子、碗
common objects like that.
這類的日常用品。
It has a variety of more exotic things:
它還有很多奇妙的東西:
animals, cars, zebras, giraffes.
動物、車子、斑馬、長頸鹿。
And now we're going to do something fun.
現在我們要進行一件好玩的事。
We're just going to go out into the audience
我們會進到觀眾席,
and see what kind of things we can detect.
去看看能辨識到哪些東西。
Does anyone want a stuffed animal?
有誰要填充娃娃?
There are some teddy bears out there.
這邊還有一些泰迪熊。
And we can turn down our threshold for detection a little bit,
我們現在降低一下 對偵測結果的精確度的要求,
so we can find more of you guys out in the audience.
這樣我們可以在觀眾席中 找到更多東西。
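Turning down the detection threshold can be sketched as a simple filter over candidate detections. This is an assumed simplification, not Darknet's code: the `Detection` class, the `keep_confident` helper, the confidence values, and the choice of `>=` for the comparison are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """A candidate detection with a confidence score between 0 and 1."""
    label: str
    confidence: float


def keep_confident(detections: List[Detection],
                   threshold: float) -> List[Detection]:
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up scores for illustration only, not real model output.
candidates = [
    Detection("person", 0.92),
    Detection("teddy bear", 0.61),
    Detection("backpack", 0.34),
    Detection("stop sign", 0.27),
    Detection("person", 0.12),
]

strict = keep_confident(candidates, threshold=0.5)
relaxed = keep_confident(candidates, threshold=0.25)

print("threshold 0.50:", [d.label for d in strict])
print("threshold 0.25:", [d.label for d in relaxed])
```

A lower threshold lets more candidates through, so more objects show up on screen, at the cost of admitting less certain guesses.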
Let's see if we can get these stop signs.
我們來看看能不能偵測到停止標誌。
We find some backpacks.
我們有偵測到一些背包。
Let's just zoom in a little bit.
現在把鏡頭拉近一點。
And this is great.
這真的很厲害。
And all of the processing is happening in real time
所有的偵測流程
on the laptop.
都可以在筆電裡即時呈現。
And it's important to remember
更重要的是,
that this is a general purpose object detection system,
這只是一個一般用的物件偵測系統,
so we can train this for any image domain.
我們還可以訓練它 辨別任何領域的照片。
The same code that we use
同樣的程式碼, 放在自動駕駛車裡,
to find stop signs or pedestrians,
可以偵測到停止標誌、行人、
bicycles in a self-driving vehicle,
腳踏車,
can be used to find cancer cells
但放到組織切片
in a tissue biopsy.
就可以偵測出癌症細胞。
And there are researchers around the globe already using this technology
現在全球有很多研究人員 已經開始在使用這項技術
for advances in things like medicine, robotics.
做進一步的研究, 像是醫藥、機械人領域。
This morning, I read a paper
今天早上,我讀到一篇文章,
where they were taking a census of animals in Nairobi National Park
在奈洛比國家公園裡, 他們要對動物們進行統計調查,
with YOLO as part of this detection system.
YOLO 就是其使用的 偵測系統的一部分。
And that's because Darknet is open source
而這一切都是因為 暗黑網路是開放原始碼,
and in the public domain, free for anyone to use.
在公眾領域, 任何人都可以免費使用。
(Applause)
(掌聲)
But we wanted to make detection even more accessible and usable,
但我們希望偵測系統 可以更親民、更好用,
so through a combination of model optimization,
所以在經過模型優化、
network binarization and approximation,
網路二值化及近似度化的整合後,
we actually have object detection running on a phone.
我們終於可以在手機上偵測物件。
(Applause)
(掌聲)
And I'm really excited because now we have a pretty powerful solution
而我真的相當興奮,因為我們現在
to this low-level computer vision problem,
在低階的電腦影像處理問題上 有了相當強力的解決方式,
and anyone can take it and build something with it.
任何人都可以拿去並創造一些東西。
So now the rest is up to all of you
所以,接下來就看各位
and people around the world with access to this software,
以及全世界所有人 用這個軟體大展身手了,
and I can't wait to see what people will build with this technology.
我真的等不及想看看你們 用這項科技所做出來的產品。
Thank you.
謝謝。
(Applause)
(掌聲)