In this video, we'll introduce terms and notation that we'll use throughout this course.
在本視頻中,我們將介紹本課程中會用到的術語和符號。
Let's start with variable types.
讓我們從變量類型開始。
We'll compare and contrast two pairs of variable types.
我們將對比兩對變量類型。
Here's the first.
這是第一個。
The first pair is response variable versus explanatory variable.
第一對是響應變量與解釋變量。
The analyst is primarily interested in the response variable.
分析人員主要關注的是響應變量。
We want to know if, and how, we can understand the response variable better using other variables.
我們想知道是否以及如何利用其他變量更好地理解響應變量。
In the Commute and Chris setup, questions 1 and 2 have commute as the response variable.
在 "通勤 "和 "克里斯 "設置中,問題 1 和 2 將 "通勤 "作為響應變量。
Chris hopes to understand how commute is affected by other variables.
克里斯希望瞭解其他變量對通勤的影響。
The response variable goes by other names, like output variable and dependent variable.
響應變量還有其他名稱,如輸出變量和因變量。
In contrast, an explanatory variable is any variable used to study the response variable.
相反,解釋變量是用於研究響應變量的任何變量。
The goal here is to find potential relationships between the response variable and an explanatory variable.
這樣做的目的是找到響應變量與解釋變量之間的潛在關係。
In the Commute and Chris setup, question 1 uses departure as an explanatory variable to analyze commute.
在通勤和克里斯的設置中,問題 1 使用出發作為解釋變量來分析通勤情況。
We also call explanatory variables input variables or independent variables.
我們也稱解釋變量為輸入變量或自變量。
Sometimes, we may even call them predictors or features.
有時,我們甚至可以稱它們為預測因子或特徵。
Even though we can call response and explanatory variables dependent and independent variables, they are conceptually different from the independent random variables that we encountered in probability.
儘管我們可以把反應變量和解釋變量稱為因變量和自變量,但它們在概念上與我們在概率論中遇到的獨立隨機變量不同。
Okay, that's our first pair.
好了,這是我們的第一對。
Our second variable type pair is a quantitative variable versus a qualitative variable.
我們的第二對變量類型是定量變量與定性變量。
As the name suggests, quantitative variables take on numerical quantities. We can divide them into two main groups: count variables and continuous variables.
顧名思義,定量變量具有數量,我們可以將其分為兩大類,即計數變量和連續變量。
Count variables take on non-negative integers, while continuous variables take on values from an interval.
計數變量取值為非負整數,而連續變量取值為區間值。
We commonly call qualitative variables categorical variables.
我們通常將定性變量稱為分類變量。
These variables take on a small number of possible categories, also known as classes or levels.
這些變量有少量可能的類別,也稱為類別或級別。
It's common to assign numbers to the categories, but this does not convert the variables into count variables.
我們通常會給類別分配數字,但這並不會將變量轉換為計數變量。
Okay, let's review the commute and Chris setup and categorize the eight variables into count, continuous, and categorical variables.
好了,讓我們回顧一下通勤和克里斯的設置,並將八個變量分為計數變量、連續變量和分類變量。
Commute is measured in minutes.
通勤時間以分鐘計算。
Because any value exceeding zero is possible, commute is a continuous variable.
因為任何超過零的值都是可能的,所以通勤是一個連續變量。
For similar reasons, departure, temp, and precip chance are also continuous variables.
出於類似的原因,偏離、溫度和降水概率也是連續變量。
They just take on values from different intervals.
它們只是在不同的時間間隔內取值。
Next, precip, season, and accident all take on two or four possible outcomes, making them categorical variables.
其次,降水、季節和事故都有兩種或四種可能的結果,是以是分類變量。
Last is police, which takes on non-negative integers, making it a count variable.
最後是警察,它接受非負整數,是一個計數變量。
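As a rough illustration, the classification above can be written down as a small lookup table. This is only a sketch: the variable names follow the transcript, the identifier `precip_chance` and the helper functions are assumptions made for the example, and no actual data is involved.

```python
# Type labels restate the classification given in the transcript.
# The spelling "precip_chance" is an assumed identifier for "precip chance".
VARIABLE_TYPES = {
    "commute": "continuous",
    "departure": "continuous",
    "temp": "continuous",
    "precip_chance": "continuous",
    "precip": "categorical",
    "season": "categorical",
    "accident": "categorical",
    "police": "count",
}


def variables_of_type(kind):
    """Return the variable names whose type label equals `kind`."""
    return [name for name, label in VARIABLE_TYPES.items() if label == kind]


def is_quantitative(name):
    """Count and continuous variables are both quantitative."""
    return VARIABLE_TYPES[name] in {"count", "continuous"}


if __name__ == "__main__":
    for kind in ("continuous", "categorical", "count"):
        print(kind, variables_of_type(kind))
```

Keeping the labels in one place makes it easy to check that all eight variables are accounted for.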
We can subdivide categorical variables further into nominal and ordinal variables.
我們可以將分類變量進一步細分為名義變量和順序變量。
If there's not a meaningful order to the categories, then it's a nominal variable.
如果分類沒有一個有意義的順序,那麼它就是一個名義變量。
In the case where we assign numbers to categories, the numbers only act as labels.
在我們為類別分配數字的情況下,數字只起到標籤的作用。
If there is a meaningful order to the categories, then it's an ordinal variable.
如果類別有一個有意義的順序,那麼它就是一個序數變量。
In the case where numbers are assigned to the categories, the numbers communicate the order.
在為類別分配數字的情況下,數字表示順序。
Let's use season from the commute and Chris setup as an example.
讓我們以通勤中的季節和克里斯的設置為例。
If we assign 1 to winter, 2 to spring, 3 to summer, and 4 to fall, then the categories follow the calendar seasons in sequence and therefore have meaningful order.
如果我們把 1 指定為冬季,2 指定為春季,3 指定為夏季,4 指定為秋季,那麼這些類別就會按照日曆上的季節順序排列,是以就有了有意義的順序。
This makes season an ordinal variable.
這使得季節成為一個順序變量。
If instead we assign numbers based on alphabetical order of the seasons, then we do not have meaningful order to the categories.
如果我們根據季節的字母順序來分配數字,那麼我們的分類順序就沒有意義了。
And this makes season a nominal variable.
這使得季節成為一個名義變量。
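The two numbering schemes for season can be sketched in a few lines. The helper `encode` is a hypothetical name introduced for this example; the point is only that the same four categories can receive codes that either follow the calendar or are arbitrary labels.

```python
SEASONS_IN_CALENDAR_ORDER = ["winter", "spring", "summer", "fall"]


def encode(categories):
    """Assign the codes 1, 2, 3, ... to categories in the order given."""
    return {name: code for code, name in enumerate(categories, start=1)}


# Codes follow the calendar, so comparing codes compares positions in the year.
calendar_codes = encode(SEASONS_IN_CALENDAR_ORDER)

# Codes follow alphabetical order, so they are labels with no meaningful order.
alphabetical_codes = encode(sorted(SEASONS_IN_CALENDAR_ORDER))

if __name__ == "__main__":
    print(calendar_codes)
    print(alphabetical_codes)
```

Under the alphabetical scheme, fall receives 1 and winter receives 4, which says nothing about where they fall in the year.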
Now that we've explored variable types, let's establish basic notation that we'll use throughout this course.
既然我們已經瞭解了變量類型,那麼我們就來建立本課程中將一直使用的基本符號。
We denote variables in general by the letter x.
我們一般用字母 x 來表示變量。
If there are multiple variables, we use the subscript j to distinguish between variables.
如果存在多個變量,我們使用下標 j 來區分變量。
However, it's common to use the letter y to denote response variables.
不過,通常使用字母 y 來表示響應變量。
We use p to represent the number of variables in a dataset, excluding the response variable if there is one.
我們用 p 表示數據集中的變量數量,如果有響應變量,則不包括響應變量。
This means j can take on integer values from 1 to p.
這意味著 j 可以取 1 到 p 的整數值。
For example, x sub 2 represents the second explanatory variable.
例如,x 子 2 代表第二個解釋變量。
Now, what if we want to refer to a specific observation of a variable?
現在,如果我們想引用變量的某個具體觀測值,該怎麼辦?
We use subscript i for this, and the letter n represents the total number of observations in the dataset.
我們使用下標 i 來表示,字母 n 代表數據集中的觀察結果總數。
This means i can take on integer values from 1 to n.
這意味著 i 可以取 1 到 n 的整數值。
Using the commute and Chris scenario as an example, the fifth observation contains values from the fifth recorded day.
以通勤和克里斯的情況為例,第五個觀測值包含第五個記錄日的值。
These include y sub 5, the response variable data point recorded on that day, and x sub 5 comma 1 through x sub 5 comma p, data points for the explanatory variables from the same day.
其中包括 y sub 5(當天記錄的響應變量數據點)和 x sub 5 逗號 1 至 x sub 5 逗號 p(當天的解釋變量數據點)。
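Written out in symbols, the spoken subscripts above would look roughly as follows, with the observation index first and the variable index second:

```latex
\[
  \text{observation } i = 5:\quad
  y_5,\; x_{5,1},\; x_{5,2},\; \dots,\; x_{5,p},
  \qquad i \in \{1, \dots, n\},\quad j \in \{1, \dots, p\}.
\]
```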
But a word of caution about subscripts.
但關於下標,還是要提醒一下。
When x has two numbers in its subscript, the first number is i, the second number is j, as we have shown.
當 x 的下標中有兩個數字時,第一個數字是 i,第二個數字是 j,如我們所示。
However, if x has only one number in its subscript, it can be either i or j.
但是,如果 x 的下標只有一個數字,那麼它可以是 i 或 j。
So, how can we identify which is which?
那麼,我們怎樣才能識別哪個是哪個呢?
Well, it depends on the context.
這要看具體情況。
Make sure you read carefully.
請務必仔細閱讀。
In general, if there is only one x variable, that is, p equals 1, then there is no purpose for subscript j.
一般來說,如果只有一個 x 變量,即 p 等於 1,那麼就不需要下標 j 了。
So, in this case, the one number in the subscript is usually i.
是以,在這種情況下,下標中的一個數字通常是 i。
However, if there are multiple x variables, then the one number in the subscript is usually j.
不過,如果有多個 x 變量,那麼下標中的一個數字通常就是 j。
You may recall from a probability course that we use uppercase letters to represent random variables, such as capital X and capital Y.
您可能還記得,在概率課程中,我們用大寫字母來表示隨機變量,如大寫 X 和大寫 Y。
We can add subscripts to these letters the same way.
我們可以用同樣的方法為這些字母添加下標。
Introducing subscript i adds clarity, but it can also make equations or expressions messy and difficult to read.
引入下標 i 會增加清晰度,但也會使等式或表達式變得混亂難讀。
We combat this problem by moving to vector and matrix notation.
不過,我們通過改用向量和矩陣符號來解決這個問題。
If we use a matrix to represent a data set, rows represent the observations, while columns represent the variables.
如果我們用矩陣來表示數據集,那麼行代表觀測值,列代表變量。
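A minimal sketch of this layout, using plain Python lists. The numbers are made up for illustration and are not data from the course; the helpers `x` and `column` are hypothetical names that mirror the subscript notation.

```python
# Made-up values for illustration only.
# Each inner list is one observation (a row); each position is one variable (a column).
X = [
    [7.5, 40.0],   # observation 1: x_{1,1}, x_{1,2}
    [8.0, 55.0],   # observation 2: x_{2,1}, x_{2,2}
    [7.25, 62.0],  # observation 3: x_{3,1}, x_{3,2}
]

n = len(X)      # number of observations (rows)
p = len(X[0])   # number of explanatory variables (columns)


def x(i, j):
    """Return x_{i,j} using the 1-based subscripts from the notation."""
    return X[i - 1][j - 1]


def column(j):
    """Return all n observed values of the j-th variable."""
    return [row[j - 1] for row in X]


if __name__ == "__main__":
    print(n, p, x(2, 1), column(2))
```

Python lists are indexed from 0, so the helpers subtract 1 to keep the subscripts i and j running from 1 as in the notation.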
We'll see more of this in future sections, but for now, let's review some basic facts about matrices.
我們將在以後的章節中看到更多這方面的內容,但現在,讓我們回顧一下有關矩陣的一些基本事實。
First, for matrix A, A superscript T is A's transpose.
首先,對於矩陣 A,A 的上標 T 是 A 的轉置。
Transposing simply means swapping the rows and columns so that the k-th column becomes the k-th row, and vice versa.
對換簡單地說就是交換行和列,使第 k 列變成第 k 行,反之亦然。
Notice that transposing a matrix also reverses its dimensions.
請注意,矩陣的轉置也會反轉其維度。
If A is an a-by-b matrix, then A transpose is a b-by-a matrix.
如果 A 是一個 A 乘 B 的矩陣,那麼 A 的轉置就是一個 B 乘 A 的矩陣。
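Here is a small sketch of transposition on a 2-by-3 matrix stored as a list of rows. The function names `transpose` and `shape` are chosen for this example only.

```python
def transpose(matrix):
    """Swap rows and columns: the k-th column becomes the k-th row."""
    return [list(col) for col in zip(*matrix)]


def shape(matrix):
    """Return (number of rows, number of columns)."""
    return (len(matrix), len(matrix[0]))


A = [
    [1, 2, 3],
    [4, 5, 6],
]
A_T = transpose(A)

if __name__ == "__main__":
    print(shape(A), shape(A_T))
    print(A_T)
```

Transposing twice returns the original matrix, which is a quick sanity check for any transpose routine.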
And second, A superscript negative one is A's inverse.
其次,A 的上標負一是 A 的倒數。
If we multiply a matrix by its inverse, in either order, we get the identity matrix.
如果我們以任何順序將矩陣與它的逆矩陣相乘,就會得到同一矩陣。
Note that the identity matrix has ones on its diagonal and zeros elsewhere.
請注意,同一矩陣的對角線上為 1,其他地方為 0。
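To make the inverse property concrete, the sketch below inverts a 2-by-2 matrix with the standard determinant formula and multiplies in both orders. It assumes a square matrix with a nonzero determinant; the example matrix and the function names are chosen for illustration.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(M):
    """Invert a 2-by-2 matrix using the determinant formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    det = Fraction(det)  # exact arithmetic, no rounding error
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [5, 3]]
A_inv = inverse_2x2(A)
IDENTITY = [[1, 0], [0, 1]]

if __name__ == "__main__":
    print(matmul(A, A_inv) == IDENTITY)
    print(matmul(A_inv, A) == IDENTITY)
```

Using `Fraction` keeps the products exact, so the comparison with the identity matrix does not depend on floating-point tolerance.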
One of the enemies in this course is confusion.
本課程的敵人之一就是混亂。
We'll try to minimize confusion by using clear and consistent notation.
我們將盡量使用清晰一致的符號,以減少混淆。
However, don't assume that the conventions that we use here are universal.
不過,不要以為我們在這裡使用的慣例是通用的。
Remember, notation only represents concepts.
記住,符號只代表概念。
Different authors may use different notation to suit their needs.
不過,作者可以根據自己的需要使用不同的符號。
They may even use the same notation for different but similar concepts.
他們甚至可能對不同但相似的概念使用相同的符號。
So, train yourself to distinguish the concept from the notation.
是以,要訓練自己區分概念和符號。