After supervised learning, the most widely used form of machine learning is unsupervised learning.
Let's take a look at what that means.
We've talked about supervised learning, and this video is about unsupervised learning.
But don't let the name "unsupervised" fool you.
Unsupervised learning is, I think, just as super as supervised learning.
When we were looking at supervised learning in the last video, recall that it looked something like this.
In the case of a classification problem, each example was associated with an output label Y, such as benign or malignant, designated by the O's and crosses.
In unsupervised learning, we're given data that isn't associated with any output labels Y.
Say you're given data on patients, their tumor size, and their age, but not whether the tumor was benign or malignant.
So the dataset looks like this on the right.
We're not asked to diagnose whether the tumor is benign or malignant because we're not given any labels Y in the dataset.
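To make the difference concrete, here is a minimal sketch of how the two kinds of dataset might be laid out in Python. The numbers and the names `labeled` and `unlabeled` are invented for illustration and do not come from the video.

```python
# Supervised: every example pairs input features with an output label y.
# Features are (tumor size, patient age); all values are made up.
labeled = [
    ((1.0, 25), "benign"),
    ((4.2, 63), "malignant"),
]

# Unsupervised: the same kind of inputs, but with no label y at all.
unlabeled = [features for features, _label in labeled]
```

An unsupervised algorithm only ever sees something shaped like `unlabeled`.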
Instead, our job is to find some structure or some pattern, or just find something interesting in the data.
This is unsupervised learning.
We call it unsupervised because we're not trying to supervise the algorithm to give some "right answer" for every input.
Instead, we ask the algorithm to figure out all by itself what's interesting or what patterns or structures there might be in this data.
With this particular dataset, an unsupervised learning algorithm might decide that the data can be assigned to two different groups or two different clusters.
It might decide that there's one cluster or group over here, and there's another cluster or group over here.
This is a particular type of unsupervised learning called a clustering algorithm because it places the unlabeled data into different clusters, and this turns out to be used in many applications.
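The video does not name a specific clustering algorithm here. As one possible illustration, the sketch below uses plain k-means on made-up (age, tumor size) pairs. The function name `kmeans`, its parameters, and the data are all assumptions chosen for this example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with plain k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignments, centroids

# Made-up (age, tumor size) pairs; note there is no benign/malignant label.
patients = [(25, 1.0), (30, 1.2), (28, 0.9), (62, 4.1), (65, 3.8), (70, 4.5)]
labels, centers = kmeans(patients, k=2)
print(labels)
```

The cluster numbers themselves carry no meaning; only which points share a number matters. In this toy data age dominates the distance, so real use would normally rescale the features first.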
For example, clustering is used in Google News.
Every day, Google News looks at hundreds of thousands of news articles on the Internet and groups related stories together.
For example, here's a sample from Google News, where the headline of the top article is "Giant panda gives birth to rare twin cubs at Japan's oldest zoo."
This article actually caught my eye because my daughter loves pandas, so there are a lot of stuffed panda toys and a lot of watching panda videos in my house.
Looking at this, you might notice that below this are other related articles.
Maybe from the headlines alone, you can start to guess what clustering might be doing.
Notice that the word "panda" appears here, here, here, here, and here.
Notice that the word "twin" also appears in all five articles, and the word "zoo" also appears in all of these articles.
Out of all the hundreds of thousands of news articles on the Internet that day, the clustering algorithm is finding the articles that mention similar words and grouping them into clusters.
Now, what's cool is that this clustering algorithm figures out on its own which words suggest that certain articles are in the same group.
What I mean is, there isn't an employee at Google News who's telling the algorithm to find articles that contain the words "panda," "twins," and "zoo" and put them into the same cluster.
The news topics change every day, and there are so many news stories that it just isn't feasible to have people doing this every single day for all the topics the news covers.
Instead, the algorithm has to figure out on its own, without supervision, what the clusters of news articles are today.
That's why this clustering algorithm is a type of unsupervised learning algorithm.
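The idea of grouping articles that mention similar words can be sketched roughly as follows. This is not how Google News actually works. It is a toy greedy grouping based on word overlap (Jaccard similarity), and the headlines, function names, and threshold are invented for illustration.

```python
def words(headline):
    """Lowercase a headline and return its set of words, ignoring very short ones."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in headline.lower())
    return {w for w in cleaned.split() if len(w) > 2}

def jaccard(a, b):
    """Share of words two sets have in common (0.0 = none, 1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_headlines(headlines, threshold=0.2):
    """Greedily place each headline in the first cluster it resembles."""
    clusters = []  # each cluster: {"vocab": set of words, "items": [headlines]}
    for h in headlines:
        w = words(h)
        for cluster in clusters:
            if jaccard(w, cluster["vocab"]) >= threshold:
                cluster["items"].append(h)
                cluster["vocab"] |= w  # the cluster's vocabulary grows as it gains members
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append({"vocab": set(w), "items": [h]})
    return [c["items"] for c in clusters]

# Invented headlines for illustration only.
headlines = [
    "Giant panda gives birth to twin cubs at zoo",
    "Zoo celebrates panda twin cubs",
    "Central bank raises interest rates again",
    "Twin panda cubs born at Japan zoo",
    "Markets react as bank raises rates",
]
groups = group_headlines(headlines)
for group in groups:
    print(group)
```

Nobody tells the code that "panda" or "zoo" matters; the grouping falls out of which words the headlines happen to share.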
Let's look at a second example of unsupervised learning, applied to clustering genetic or DNA data.
This image shows DNA microarray data.
These look like tiny grids of a spreadsheet, and each tiny column represents the genetic or DNA activity of one person.
For example, this entire column here is from one person's DNA, and this other column is from another person.
Each row represents a particular gene.
Just as an example, perhaps this row here might represent a gene that affects eye color, or this row here is a gene that affects how tall someone is.
Researchers have even found a genetic link to whether someone dislikes certain vegetables such as broccoli or Brussels sprouts or asparagus.
Next time someone asks you, "Why didn't you finish your salad?"
You can tell them, maybe it's genetic.
For DNA microarrays, the idea is to measure how much certain genes are expressed for each individual person.
These colors, red, green, gray, and so on, show the degree to which different individuals do or do not have a specific gene active.
What you can do is then run a clustering algorithm to group individuals into different categories or different types of people.
Maybe these individuals are grouped together, and let's just call this type 1.
These people are grouped into type 2, and these people are grouped as type 3.
This is unsupervised learning because we're not telling the algorithm in advance that there is a type 1 person with certain characteristics or a type 2 person with certain characteristics.
Instead, what we're saying is, here's a bunch of data.
I don't know what the different types of people are, but can you automatically find structure in the data and automatically figure out what the major types of individuals are?
Since we're not giving the algorithm the right answer for the examples in advance, this is unsupervised learning.
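As a rough sketch of the layout described above (rows are genes, columns are individuals), the example below groups the columns of a tiny invented expression matrix. It uses simple single-linkage agglomerative clustering, which is one possible choice and not necessarily what real microarray studies use. The matrix values and the name `cluster_individuals` are assumptions.

```python
from itertools import combinations
from math import dist

# Rows are genes, columns are individuals; values are invented expression levels.
expression = [
    [0.9, 0.8, 0.1, 0.2, 0.5, 0.5],
    [0.1, 0.2, 0.9, 0.8, 0.5, 0.4],
    [0.8, 0.9, 0.2, 0.1, 0.9, 0.8],
]

def cluster_individuals(matrix, n_types):
    """Merge the two closest groups of columns until n_types groups remain."""
    profiles = list(zip(*matrix))  # one expression profile per person
    groups = [[i] for i in range(len(profiles))]

    def gap(g1, g2):
        # Single linkage: distance between the closest pair of members.
        return min(dist(profiles[a], profiles[b]) for a in g1 for b in g2)

    while len(groups) > n_types:
        # Find the pair of groups with the smallest gap and merge them.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda pair: gap(groups[pair[0]], groups[pair[1]]),
        )
        groups[i] += groups[j]
        del groups[j]
    return groups

types = cluster_individuals(expression, n_types=3)
print(types)
```

The code is told only how many types to look for, never which person belongs to which type.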
Here's a third example.
Many companies have huge databases of customer information.
Given this data, can you automatically group your customers into different market segments so that you can more efficiently serve your customers?
Quite briefly, the deeplearning.ai team did some research to better understand the deeplearning.ai community and why different individuals take these classes, subscribe to The Batch weekly newsletter, or attend our pioneer events.
Let's visualize the deeplearning.ai community as this collection of people.
Running clustering, that is, market segmentation, found a few distinct groups of individuals.
One group's primary motivation is seeking knowledge to grow their skills.
Perhaps this is you, and if so, that's great.
A second group's primary motivation is looking for a way to develop their career.
Maybe you want to get a promotion or a new job or make some career progression.
If this describes you, that's great too.
And yet another group wants to stay updated on how AI impacts their field of work.
Perhaps this is you.
That's great too.
This is a clustering that our team used to try to better serve our community as we were trying to figure out what the major categories of learners in the deeplearning.ai community are.
So if any of these is your top motivation for learning, that's great.
And I hope I'll be able to help you on your journey.
Or in case this is you and you want something totally different from the other three categories, that's fine too.
And I want you to know I love you all the same.
So to summarize, a clustering algorithm, which is a type of unsupervised learning algorithm, takes data without labels and tries to automatically group it into clusters.
And so maybe the next time you see or think of a panda, you'll think of clustering as well.
And besides clustering, there are other types of unsupervised learning as well.
Let's go on to the next video to take a look at some other types of unsupervised learning algorithms.