
  • Hello world. It's Siraj,

  • and what's the deal with vectors!

  • You're going to see this word a lot in machine learning

  • And it's one of the most crucial concepts to understand.

  • A huge part of machine learning is

  • finding a way to properly represent a data set

  • programmatically.

  • Let's say you're a manager at Tesla

  • And you're given a data set of some measurements

  • for each car that was produced in the past week.

  • Each car on the list has three measurements

  • or features: its length, width, and height.

  • So a given car can then be represented as a point

  • in three-dimensional space, where the value in each dimension

  • corresponds to one of the features we are measuring.

  • This same logic applies to data points that have 300 features.

  • We can represent them in

  • 300-dimensional space.

  • While this is intuitively hard for us to understand as three-dimensional beings,

  • machines can do this very well.

  • Robot: Right, what do you want t..... Mother *****

  • This data point X is considered a Vector.

  • A vector is a 1-dimensional array.

  • Think of it as a list of values or a row in a table.

  • A vector of n elements

  • is an n-dimensional vector,

  • with one dimension for each element.

  • So for a 4-dimensional data point,

  • we can use a 1-by-4 array to hold its 4 feature values,

  • and because it represents a set of features,

  • we call it a feature vector.
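As a quick illustration (this code is not from the video, and the numbers are made up), such a feature vector might look like this in NumPy:

```python
import numpy as np

# A hypothetical 4-dimensional feature vector for one car,
# e.g. [length, width, height, weight] -- the values are invented.
x = np.array([[4.7, 1.8, 1.4, 2100.0]])  # a 1-by-4 array

print(x.shape)  # (1, 4): one data point, four features
```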

  • More general than a Vector is a Matrix.

  • A Matrix is a rectangular array of numbers

  • and a Vector is a row or column of a Matrix.

  • So each row in a matrix could represent a different data point,

  • with each column holding one of its features.

  • Less general than a vector is a Scalar

  • which is a single number.

  • The most general term for all of these concepts is a Tensor.

  • A Tensor is a multi-dimensional array

  • so a First-order Tensor is a Vector,

  • a Second-order Tensor is a Matrix and

  • Tensors of order three and higher are

  • called higher-order Tensors.
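A minimal NumPy sketch of these orders (added for illustration; the values are arbitrary):

```python
import numpy as np

scalar  = np.array(5.0)                  # order 0: a single number
vector  = np.array([1.0, 2.0, 3.0])      # order 1: a 1-D array
matrix  = np.array([[1.0, 2.0],
                    [3.0, 4.0]])         # order 2: rows and columns
tensor3 = np.zeros((2, 3, 4))            # order 3: a higher-order tensor

for t in (scalar, vector, matrix, tensor3):
    print(t.ndim)  # prints 0, 1, 2, 3
```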

  • So if a 1-D tensor looks like a line...

  • Stop.

  • Who are you?

  • I think they get it.

  • I think they get it.

  • You could represent a social graph that contains

  • friends of friends of friends as a higher-order Tensor.

  • This is why Google built a library called

  • TensorFlow.

  • It allows you to create a computational graph

  • where Tensors created from data sets can

  • flow through a series of mathematical operations that optimize for an objective.

  • and why they even built an entirely new type of chip called a TPU, or Tensor Processing Unit.
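A tiny sketch of that idea in TensorFlow (assuming a recent version with eager execution; the values are arbitrary):

```python
import tensorflow as tf

# Two small tensors flow through a couple of mathematical operations.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0], [6.0]])

c = tf.matmul(a, b)   # matrix multiplication
d = tf.reduce_sum(c)  # collapse the result to a scalar

print(d.numpy())  # 56.0
```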

  • As computational power and the amount of data we have

  • increase, we are becoming more capable of processing

  • multi-dimensional data.

  • Vectors are typically represented in a multitude of ways,

  • and they're used in many different fields of science,

  • especially physics, since vectors act as a bookkeeping tool to keep track of two pieces of information,

  • typically a magnitude and a direction for a physical quantity.

  • For example, in Einstein's general theory of relativity,

  • the curvature of space-time,

  • which gives rise to gravity,

  • is described by what's called

  • the Riemann curvature tensor,

  • which is a tensor of order 4.

  • So badass.

  • So we can represent not only the fabric of reality this way,

  • but the gradient of our optimization problem as well.

  • During first order optimization,

  • the weights of our model are

  • updated incrementally after each pass over the training data set.

  • Given an error function like the sum of squared errors,

  • We can compute the magnitude and

  • direction of the weight update by taking a step in the opposite direction of the error gradient.
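A minimal sketch of one such update for a linear model, assuming the sum of squared errors as the error function (the data and names here are mine):

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])  # 3 data points, 2 features
y = np.array([3.0, 2.5, 4.0])                       # targets
w = np.zeros(2)                                     # model weights
lr = 0.01                                           # learning rate

# Gradient of the sum of squared errors with respect to the weights.
gradient = -2 * X.T @ (y - X @ w)

# Step in the opposite direction of the error gradient.
w = w - lr * gradient
print(w)
```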

  • This all comes from linear algebra.

  • Algebra

  • roughly means relationships,

  • and it explores the relationships between unknown numbers.

  • Linear Algebra roughly means line-like relationships.

  • It's the way of organizing information about vector spaces

  • that makes manipulating groups of numbers

  • simultaneously easy.

  • It defines structures like vectors and matrices

  • to hold these numbers and introduces new rules for how to add,

  • multiply, subtract, and divide them.

  • So given two arrays,

  • the algebraic way to multiply them would be to do it like this,

  • and the linear algebraic way would look like this:

  • we compute the dot product

  • instead of

  • multiplying each number one at a time.

  • The linear algebraic approach is

  • three times faster in this case.
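As a sketch (with arbitrary arrays), here is the element-by-element way versus the dot product in NumPy:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# The element-by-element way: multiply each pair of numbers in a loop.
total = 0
for i in range(len(a)):
    total += a[i] * b[i]

# The linear algebraic way: one dot product.
total_vectorized = np.dot(a, b)

print(total, total_vectorized)  # 32 32
```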

  • Any type of data can be represented as a vector:

  • images, videos, stock indices,

  • text, audio signals,

  • Dougie dancing.

  • No matter the type of data,

  • it can be broken down into a set of numbers.

  • The model is not really accepting the data.

  • It keeps throwing errors.

  • Let me see.

  • Oh, it looks like you've got to vectorize it.

  • What do you mean?

  • The model you wrote expected tensors of a certain size as its input.

  • So we basically have to reshape the input data

  • so it's in the right vector space,

  • and then once it is,

  • we can compute things like the cosine distance between data points and

  • the vector norm.

  • Is there a Python library to do that?

  • You gotta love NumPy.

  • Vectorization is essentially just a matrix operation,

  • and I can do it in a single line.
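A sketch of what that one-liner and the follow-up computations might look like in NumPy (the shapes here are made up):

```python
import numpy as np

data = np.arange(12.0)  # raw input data as a flat array

# "Vectorize" the input in one line: reshape it to the size the
# model expects, say 3 data points with 4 features each.
X = data.reshape(3, 4)

# Vector norm (Euclidean length) of the first data point.
norm = np.linalg.norm(X[0])

# Cosine distance between the first two data points.
cos_sim = np.dot(X[0], X[1]) / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
cos_dist = 1.0 - cos_sim

print(norm, cos_dist)
```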

  • Awesome.

  • Well, you vectorize it up.

  • I've gotta back-propagate out for today.

  • Cool, where to?

  • Tinder date

  • All right, yeah. See ya ...

  • A researcher named Mikolov

  • used a machine learning model called a neural network

  • to create vectors for words:

  • Word2Vec.

  • Given some input corpus of text,

  • like thousands of news articles,

  • it would try to predict the next word

  • in a sentence given the words around it.

  • So a given word is

  • encoded into a vector.

  • The model then uses that vector to try and predict the next word.

  • If its prediction doesn't match the actual next word,

  • the components of this vector are adjusted.

  • Each word's context in the corpus

  • acts as a teacher,

  • sending error signals back to adjust the vector.

  • The vectors of words that are judged

  • similar by their context are iteratively nudged closer together

  • by adjusting the numbers in the vector, and so after training,

  • the model learns thousands of vectors for words.

  • Give it a new word

  • and it will find its associated word vector

  • also called a word embedding.
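As a hedged sketch, here is how training and querying a Word2Vec model might look with the gensim library (gensim is my choice, not the video's, and the toy corpus is made up; real training would use thousands of documents):

```python
from gensim.models import Word2Vec

# A toy corpus: in practice this would be thousands of news articles.
sentences = [
    ["the", "car", "drove", "down", "the", "road"],
    ["the", "truck", "drove", "down", "the", "highway"],
]

# vector_size sets the dimensionality of each word embedding.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1)

# Give it a word and it returns the associated word vector (embedding).
vector = model.wv["car"]
print(vector.shape)  # (50,)
```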

  • Vectors don't just represent data.

  • They help represent our models too.

  • Many types of machine learning models represent what they learn as vectors.

  • All types of Neural networks do this.

  • Given some data, it will learn dense

  • representations of that data.

  • These representations are essentially

  • categories: if you have a data set of pictures of differently colored eyes,

  • it will learn a general

  • representation for all eye colors.

  • So given a new unlabeled eye picture,

  • it would be able to recognize it as an eye.

  • I see vectors

  • Good.

  • Once data is vectorized,

  • we can do so many things with it.

  • A trained Word2Vec model turns words into vectors,

  • then we can perform mathematical

  • operations on these vectors.

  • We can see how closely related words are

  • by computing the distance between their vectors.

  • The word Sweden, for example,

  • is closely related to other wealthy Northern European countries,

  • because the distance between them

  • is small when plotted on a graph.

  • Word vectors that are similar tend to

  • cluster together like types of animals.

  • Associations can be built like Rome is to Italy as

  • Beijing is to China

  • and operations can be performed, like hotel plus motel, which gives us Holiday Inn.

  • Incredibly, vectorizing words is able to

  • capture their semantic meanings numerically.
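A sketch of that arithmetic with invented 3-dimensional word vectors (real embeddings have hundreds of dimensions; these numbers are purely illustrative):

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: near 1 means similar.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Invented 3-D word vectors, for illustration only.
rome    = np.array([0.9, 0.1, 0.2])
italy   = np.array([0.8, 0.3, 0.7])
beijing = np.array([0.1, 0.9, 0.2])
china   = np.array([0.0, 1.0, 0.7])

# "Rome is to Italy as Beijing is to ?"  ->  italy - rome + beijing
guess = italy - rome + beijing
print(cosine_similarity(guess, china))  # close to 1 if the analogy holds
```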

  • The way we're able to compute the distance

  • between two vectors

  • is by using the notion of a vector norm.

  • A norm is any function g that maps vectors to real numbers

  • and satisfies the following conditions:

  • Lengths are always nonnegative.

  • A length of zero implies the vector itself is zero.

  • Scalar multiplication

  • extends lengths in a predictable way

  • and distances add

  • reasonably (the triangle inequality).
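Written out (in my notation), for a norm g, vectors x and y, and a scalar α, those conditions are:

```latex
g(x) \ge 0, \qquad
g(x) = 0 \iff x = 0, \qquad
g(\alpha x) = |\alpha|\, g(x), \qquad
g(x + y) \le g(x) + g(y)
```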

  • So in a basic one-dimensional vector space, the norm of a number

  • would be its absolute value,

  • and the distance between two numbers a and b would be |a − b|.

  • Usually the length of a vector is

  • calculated using the Euclidean norm

  • which is defined as the square root of the sum of the squared components: ‖x‖₂ = √(x₁² + … + xₙ²),

  • but this isn't the only way to define length.

  • There are others.

  • You'll see the terms L1 norm

  • and L2 norm used a lot in machine learning.

  • The L2 norm is the Euclidean norm.

  • The L1 norm is also called the Manhattan distance.

  • We can use either to normalize a vector to get its unit vector

  • and use that to compute the distance.
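A small NumPy sketch of both norms and of normalizing to a unit vector (the vector is arbitrary):

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)         # Euclidean (L2) norm: sqrt(9 + 16) = 5.0
l1 = np.linalg.norm(v, ord=1)  # Manhattan (L1) norm: |3| + |-4| = 7.0

unit = v / l2                  # normalize to get the unit vector
print(l2, l1, np.linalg.norm(unit))  # 5.0 7.0 1.0
```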

  • Computing the distance between vectors is useful

  • for showing users recommendations.

  • Both of these terms are also used in the process of regularization.

  • We train models to fit a set of training data,

  • but sometimes the model fits the training data so closely

  • that it doesn't have good prediction performance.

  • It can't generalize well to new data points.

  • To prevent this overfitting,

  • we have to regularize our model.

  • The common method for finding the best

  • model is to define a loss function that describes

  • how well the model fits the data.
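A sketch of how an L1 or L2 penalty on the weight vector might be added to such a loss for linear regression (the function and the name lam for the regularization strength are mine):

```python
import numpy as np

def loss(w, X, y, lam=0.1, penalty="l2"):
    # Mean squared error plus a norm penalty on the weights.
    errors = y - X @ w
    mse = np.mean(errors ** 2)
    if penalty == "l1":
        return mse + lam * np.sum(np.abs(w))  # L1 (lasso) regularization
    return mse + lam * np.sum(w ** 2)         # L2 (ridge) regularization

X = np.array([[1.0, 2.0], [2.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, 0.5])
print(loss(w, X, y, penalty="l1"), loss(w, X, y, penalty="l2"))
```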

  • To sum things up, feature vectors are used to represent numeric

  • or symbolic characteristics of data called features

  • in a mathematical way.

  • They can be represented in

  • multi-dimensional vector spaces where we can perform

  • operations on them

  • like computing their distance and adding them

  • and we can do this by computing the vector norm,

  • which describes the size of a vector

  • and is also useful for

  • preventing overfitting.

  • The Wizard of the Week award goes to Vishnu Kumar.

  • He implemented both gradient descent and

  • Newton's method to create a model

  • able to predict the number of calories burned cycling a certain distance.

  • The plots are great, and the

  • code is structured very legibly.

  • Check it out. Amazing work, Vishnu.

  • And the runner-up for the last-minute

  • entry is Hamad Shaikh.

  • I loved how detailed your notebook was.

  • This week's challenge is to implement both

  • L1 and L2 regularization on a linear regression model

  • Check the Github readme in the description for details and winners will be announced in a week.
