This is a three. It's sloppily written and rendered at an extremely low resolution of 28 by 28 pixels. But your brain has no trouble recognizing it as a three, and I want you to take a moment to appreciate how crazy it is that brains can do this so effortlessly. I mean, this, this, and this are also recognizable as threes, even though the specific values of each pixel are very different from one image to the next.
The particular light-sensitive cells in your eye that are firing when you see this three are very different from the ones firing when you see this three. But something in that crazy-smart visual cortex of yours resolves these as representing the same idea, while at the same time recognizing other images as their own distinct ideas.
But if I told you, "Hey, sit down and write for me a program that takes in a grid of 28 by 28 pixels like this and outputs a single number between 0 and 9, telling you what it thinks the digit is," well, the task goes from comically trivial to dauntingly difficult. Unless you've been living under a rock, I think I hardly need to motivate the relevance and importance of machine learning and neural networks to the present and to the future.
But what I want to do here is show you what a neural network actually is, assuming no background, and to help visualize what it's doing, not as a buzzword but as a piece of math. My hope is just that you come away feeling like the structure itself is motivated, and feeling like you know what it means when you read or hear about a neural network "learning." This video is just going to be devoted to the structure component of that, and the following one is going to tackle learning.
What we're going to do is put together a neural network that can learn to recognize handwritten digits. This is a somewhat classic example for introducing the topic, and I'm happy to stick with the status quo here, because at the end of the two videos I want to point you to a couple of good resources where you can learn more, and where you can download the code that does this and play with it on your own computer.
There are many, many variants of neural networks, and in recent years there's been sort of a boom in research toward these variants. But in these two introductory videos, you and I are just going to look at the simplest, plain-vanilla form with no added frills: a multilayer perceptron. This is kind of a necessary prerequisite for understanding any of the more powerful modern variants, and trust me, it still has plenty of complexity for us to wrap our minds around.
But even in this simplest form, it can learn to recognize handwritten digits, which is a pretty cool thing for a computer to be able to do. And at the same time, you'll see how it does fall short of a couple of hopes that we might have for it. As the name suggests, neural networks are inspired by the brain. But let's break that down: what are the neurons, and in what sense are they linked together?
Right now, when I say neuron, all I want you to think about is a thing that holds a number; specifically, a number between 0 and 1. It's really not more than that. For example, the network starts with a bunch of neurons corresponding to each of the 28 by 28 pixels of the input image, which is 784 neurons in total. Each one of these holds a number that represents the grayscale value of the corresponding pixel, ranging from 0 for black pixels up to 1 for white pixels.
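As a minimal sketch of that input layer (assuming NumPy and a hypothetical 28x28 array called image), the first layer is just the image unrolled into a vector:

```python
import numpy as np

# Hypothetical 28x28 grayscale image, each value from 0.0 (black) to 1.0 (white)
image = np.random.rand(28, 28)

# The 784 input-layer activations are just the pixel values, flattened into a vector
input_activations = image.flatten()  # shape: (784,)
```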
This number inside the neuron is called its activation, and the image you might have in mind here is that each neuron is lit up when its activation is a high number. So all of these 784 neurons make up the first layer of our network. Now jumping over to the last layer, this has ten neurons, each representing one of the digits. The activation in these neurons, again some number between 0 and 1, represents how much the system thinks that a given image corresponds with a given digit. There's also a couple of layers in between called the hidden layers, which for the time being should just be a giant question mark for how on earth this process of recognizing digits is going to be handled.
In this network, I chose two hidden layers, each one with 16 neurons, and admittedly that's kind of an arbitrary choice. To be honest, I chose two layers based on how I want to motivate the structure in just a moment, and 16, well, that was just a nice number to fit on the screen. In practice, there is a lot of room to experiment with the specific structure here.
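For concreteness, here is the shape of the network just described as a sketch; nothing about these sizes is forced except the 784 inputs and 10 outputs:

```python
# 784 input pixels, two hidden layers of 16 neurons each, 10 output digits
layer_sizes = [784, 16, 16, 10]
```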
The way the network operates, activations in one layer determine the activations of the next layer. And of course, the heart of the network as an information-processing mechanism comes down to exactly how those activations from one layer bring about activations in the next layer. It's meant to be loosely analogous to how, in biological networks of neurons, some groups of neurons firing cause certain others to fire.
Now, the network I'm showing here has already been trained to recognize digits, and let me show you what I mean by that. It means if you feed in an image, lighting up all 784 neurons of the input layer according to the brightness of each pixel in the image, that pattern of activations causes some very specific pattern in the next layer, which causes some pattern in the one after it, which finally gives some pattern in the output layer. And the brightest neuron of that output layer is the network's choice, so to speak, for what digit this image represents.
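In code terms, once we have the ten output activations (however they get computed), picking the "brightest neuron" is just an argmax; a sketch:

```python
# output_activations: the ten numbers between 0 and 1 produced by the last layer
prediction = np.argmax(output_activations)  # 0-9: index of the brightest output neuron
```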
And before jumping into the math for how one layer influences the next, or how training works, let's just talk about why it's even reasonable to expect a layered structure like this to behave intelligently. What are we expecting here? What is the best hope for what those middle layers might be doing?
Well, when you or I recognize digits, we piece together various components. A nine has a loop up top and a line on the right. An 8 also has a loop up top, but it's paired with another loop down low. A 4 basically breaks down into three specific lines, and things like that.
Now in a perfect world, we might hope that each neuron in the second-to-last layer corresponds with one of these subcomponents; that any time you feed in an image with, say, a loop up top, like a 9 or an 8, there's some specific neuron whose activation is going to be close to one. And I don't mean this specific loop of pixels; the hope would be that any generally loopy pattern toward the top sets off this neuron. That way, going from the third layer to the last one just requires learning which combination of subcomponents corresponds to which digits.
Of course, that just kicks the problem down the road, because how would you recognize these subcomponents, or even learn what the right subcomponents should be? And I still haven't even talked about how one layer influences the next. But run with me on this one for a moment.
Recognizing a loop can also break down into subproblems. One reasonable way to do this would be to first recognize the various little edges that make it up. Similarly, a long line, like the kind you might see in the digits 1 or 4 or 7, well, that's really just a long edge, or maybe you think of it as a certain pattern of several smaller edges.
So maybe our hope is that each neuron in the second layer of the network corresponds with the various relevant little edges. Maybe when an image like this one comes in, it lights up all of the neurons associated with around eight to ten specific little edges, which in turn light up the neurons associated with the upper loop and a long vertical line, and those light up the neuron associated with a nine.
Whether or not this is what our final network actually does is another question, one that I'll come back to once we see how to train the network. But this is a hope that we might have, a sort of goal with the layered structure like this.
Moreover, you can imagine how being able to detect edges and patterns like this would be really useful for other image-recognition tasks. And even beyond image recognition, there are all sorts of intelligent things you might want to do that break down into layers of abstraction. Parsing speech, for example, involves taking raw audio and picking out distinct sounds, which combine to make certain syllables, which combine to form words, which combine to make up phrases and more abstract thoughts, etc.
But getting back to how any of this actually works, picture yourself right now designing how exactly the activations in one layer might determine the activations in the next. The goal is to have some mechanism that could conceivably combine pixels into edges, or edges into patterns, or patterns into digits. And to zoom in on one very specific example, let's say the hope is for one particular neuron in the second layer to pick up on whether or not the image has an edge in this region here.
The question at hand is: what parameters should the network have? What dials and knobs should you be able to tweak so that it's expressive enough to potentially capture this pattern, or any other pixel pattern, or the pattern that several edges can make a loop, and other such things?
Well, what we'll do is assign a weight to each one of the connections between our neuron and the neurons from the first layer. These weights are just numbers. Then take all those activations from the first layer and compute their weighted sum according to these weights. I find it helpful to think of these weights as being organized into a little grid of their own, and I'm going to use green pixels to indicate positive weights and red pixels to indicate negative weights, where the brightness of that pixel is some loose depiction of the weight's value.
Now, if we made the weights associated with almost all of the pixels zero, except for some positive weights in this region that we care about, then taking the weighted sum of all the pixel values really just amounts to adding up the values of the pixels just in the region that we care about. And if you really wanted it to pick up on whether there's an edge here, what you might do is have some negative weights associated with the surrounding pixels. Then the sum is largest when those middle pixels are bright but the surrounding pixels are darker.
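A sketch of that weight layout in code, reusing the image array from earlier; the exact region is a made-up example:

```python
# Zero weights everywhere, except around one small horizontal strip (hypothetical coordinates)
weights = np.zeros((28, 28))
weights[10, 8:14] = 1.0    # positive weights on the pixels we care about
weights[9,  8:14] = -1.0   # negative weights on the row above...
weights[11, 8:14] = -1.0   # ...and the row below

# The weighted sum is largest when the middle row is bright and its neighbors are dark
weighted_sum = np.sum(weights * image)
```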
When you compute a weighted sum like this, you might come out with any number, but for this network, what we want is for activations to be some value between 0 and 1. So a common thing to do is to pump this weighted sum into some function that squishes the real number line into the range between 0 and 1. A common function that does this is called the sigmoid function, also known as a logistic curve. Basically, very negative inputs end up close to 0, very positive inputs end up close to 1, and it just steadily increases around the input 0.
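Written out, the sigmoid is σ(x) = 1 / (1 + e^(-x)); as code:

```python
def sigmoid(x):
    # Squishes the whole real number line into the range (0, 1):
    # very negative inputs land near 0, very positive inputs land near 1
    return 1.0 / (1.0 + np.exp(-x))
```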
So the activation of the neuron here is basically a measure of how positive the relevant weighted sum is. But maybe it's not that you want the neuron to light up when the weighted sum is bigger than 0. Maybe you only want it to be active when the sum is bigger than, say, 10. That is, you want some bias for it to be inactive. What we'll do then is just add in some other number, like negative 10, to this weighted sum before plugging it through the sigmoid squishification function. That additional number is called the bias. So the weights tell you what pixel pattern this neuron in the second layer is picking up on, and the bias tells you how high the weighted sum needs to be before the neuron starts getting meaningfully active.
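Putting the pieces together, one neuron's activation is the sigmoid of its weighted sum plus its bias; a sketch using the names from the earlier snippets (the bias of -10 is just the example value above):

```python
bias = -10.0  # the neuron only gets meaningfully active once the weighted sum exceeds 10
activation = sigmoid(np.dot(weights.flatten(), input_activations) + bias)
```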
And that is just one neuron. Every other neuron in this layer is going to be connected to all 784 pixel neurons from the first layer, and each one of those 784 connections has its own weight associated with it. Also, each one has some bias, some other number that you add on to the weighted sum before squishing it with the sigmoid. And that's a lot to think about! With this hidden layer of 16 neurons, that's a total of 784 times 16 weights, along with 16 biases. And all of that is just the connections from the first layer to the second. The connections between the other layers also have a bunch of weights and biases associated with them.
All said and done, this network has almost exactly 13,000 total weights and biases; 13,000 knobs and dials that can be tweaked and turned to make this network behave in different ways.
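That "almost exactly 13,000" follows directly from the layer sizes; the exact count is 13,002:

```python
num_weights = 784*16 + 16*16 + 16*10         # 12,960 weights
num_biases = 16 + 16 + 10                    # 42 biases
total_parameters = num_weights + num_biases  # 13,002
```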
So when we talk about learning, what that's referring to is getting the computer to find a valid setting for all of these many, many numbers so that it'll actually solve the problem at hand.
One thought experiment that is at once fun and kind of horrifying is to imagine sitting down and setting all of these weights and biases by hand, purposefully tweaking the numbers so that the second layer picks up on edges, the third layer picks up on patterns, etc. I personally find this satisfying, rather than just reading the network as a total black box, because when the network doesn't perform the way you anticipate, if you've built up a little bit of a relationship with what those weights and biases actually mean, you have a starting place for experimenting with how to change the structure to improve. Or when the network does work, but not for the reasons you might expect, digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions.
By the way, the actual function here is a little cumbersome to write down, don't you think? So let me show you a more notationally compact way that these connections are represented. This is how you'd see it if you choose to read up more about neural networks.
Organize all of the activations from one layer into a column as a vector. Then organize all of the weights as a matrix, where each row of that matrix corresponds to the connections between one layer and a particular neuron in the next layer. What that means is that taking the weighted sum of the activations in the first layer according to these weights corresponds to one of the terms in the matrix-vector product of everything we have on the left here.
By the way, so much of machine learning just comes down to having a good grasp of linear algebra, so for any of you who want a nice visual understanding of matrices and what matrix-vector multiplication means, take a look at the series I did on linear algebra, especially chapter three.
Back to our expression: instead of talking about adding the bias to each one of these values independently, we represent it by organizing all those biases into a vector, and adding the entire vector to the previous matrix-vector product. Then, as a final step, I'll wrap a sigmoid around the outside here, and what that's supposed to represent is that you're going to apply the sigmoid function to each specific component of the resulting vector inside.
So once you write down this weight matrix and these vectors as their own symbols, you can communicate the full transition of activations from one layer to the next in an extremely tight and neat little expression. And this makes the relevant code both a lot simpler and a lot faster, since many libraries optimize the heck out of matrix multiplication.
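In symbols, the transition from one layer's activations a to the next is σ(Wa + b); as a one-line sketch (the names W, a, and b are just for illustration):

```python
def layer_transition(W, a, b):
    # W: weight matrix (e.g. 16x784 for the first transition), a: previous layer's
    # activations, b: bias vector; sigmoid is applied to each component of W @ a + b
    return sigmoid(W @ a + b)
```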
Remember how earlier I said these neurons are simply things that hold numbers? Well, of course, the specific numbers that they hold depend on the image you feed in. So it's actually more accurate to think of each neuron as a function, one that takes in the outputs of all the neurons in the previous layer and spits out a number between 0 and 1. Really, the entire network is just a function, one that takes in 784 numbers as an input and spits out ten numbers as an output.
It's an absurdly complicated function, one that involves 13,000 parameters in the form of these weights and biases that pick up on certain patterns, and which involves iterating many matrix-vector products and the sigmoid squishification function.
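Structurally, that absurdly complicated function is nothing more than this composition; a sketch assuming lists of trained weight matrices and bias vectors:

```python
def feedforward(a, weight_matrices, bias_vectors):
    # Iterate the matrix-vector product and sigmoid once per layer transition:
    # 784 numbers in, 10 numbers out
    for W, b in zip(weight_matrices, bias_vectors):
        a = sigmoid(W @ a + b)
    return a
```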
But it's just a function nonetheless, and in a way, it's kind of reassuring that it looks complicated. I mean, if it were any simpler, what hope would we have that it could take on the challenge of recognizing digits? And how does it take on that challenge? How does this network learn the appropriate weights and biases just by looking at data? That's what I'll show in the next video, and I'll also dig a little more into what this particular network we're seeing is really doing.
Now is the point, I suppose, where I should say subscribe to stay notified about when that video or any new videos come out. But realistically, most of you don't actually receive notifications from YouTube, do you? Maybe more honestly, I should say subscribe so that the neural networks that underlie YouTube's recommendation algorithm are primed to believe that you want to see content from this channel get recommended to you. Anyway, stay posted for more.
Thank you very much to everyone supporting these videos on Patreon. I've been a little slow to progress in the probability series this summer, but I'm jumping back into it after this project, so patrons, you can look out for updates there.
To close things off here, I have with me Lisha Li, who did her PhD work on the theoretical side of deep learning, and who currently works at a venture capital firm called Amplify Partners, who kindly provided some of the funding for this video. So Lisha, one thing I think we should quickly bring up is this sigmoid function.
As I understand it, early networks used this to squish the relevant weighted sum into that interval between 0 and 1, you know, kind of motivated by this biological analogy of neurons either being inactive or active. (Lisha) - Exactly.
(3B1B) - But relatively few modern networks actually use sigmoid anymore; that's kind of old school, right? (Lisha) - Yeah, or rather, ReLU seems to be much easier to train. (3B1B) - And ReLU really stands for "rectified linear unit."
(Lisha) - Yes, it's this kind of function where you're just taking a max of 0 and a, where a is given by what you were explaining in the video. And what this was sort of motivated from, I think, was partially a biological analogy with how neurons would either be activated or not. So if it passes a certain threshold, it would be the identity function, but if it did not, then it would just not be activated, so it would be zero. So it's kind of a simplification.
(Lisha) - Using sigmoids didn't help training, or it was very difficult to train at some point, and people just tried ReLU, and it happened to work very well for these incredibly deep neural networks. (3B1B) - All right, thank you, Lisha.
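For reference, the ReLU that Lisha describes is just max(0, a); compare it with the sigmoid from earlier:

```python
def relu(a):
    # Identity above the threshold of 0, inactive (exactly zero) below it
    return np.maximum(0.0, a)
```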
For background, Amplify Partners, an early-stage VC, invests in technical founders building the next generation of companies focused on the applications of AI. If you or someone that you know has ever thought about starting a company someday, or if you're working on an early-stage one right now, the Amplify folks would love to hear from you. They even set up a specific email for this video, 3blue1brown@amplifypartners.com, so feel free to reach out to them through that.