Vectors - The Math of Intelligence #3

Hello world, it's Siraj, and what's the deal with vectors? You're going to see this word a lot in machine learning, and it's one of the most crucial concepts to understand. A huge part of machine learning is finding a way to properly represent a data set programmatically. Let's say you're a manager at Tesla and you're given a data set of measurements for each car that was produced in the past week. Each car on the list has three measurements, or features: its length, width, and height. A given car can then be represented as a point in three-dimensional space, where the value in each dimension corresponds to one of the features we are measuring. The same logic applies to data points that have 300 features: we can represent them in 300-dimensional space. While this is intuitively hard for us to picture as three-dimensional beings, machines handle it very well.

Robot: Right, what do you want t... Mother *****

This data point X is considered a vector. A vector is a one-dimensional array; think of it as a list of values, or a row in a table. A vector of n elements is an n-dimensional vector, with one dimension for each element. So for a 4-dimensional data point, we can use a 1-by-4 array to hold its four feature values, and because it represents a set of features, we call it a feature vector.

More general than a vector is a matrix. A matrix is a rectangular array of numbers, and a vector is a row or column of a matrix. So each row in a matrix could represent a different data point, with each column being one of its features. Less general than a vector is a scalar, which is a single number. The most general term for all of these concepts is a tensor. A tensor is a multi-dimensional array: a first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three and higher are called higher-order tensors. So if a 1-D tensor looks like a line...

Stop. Who are you? I think they get it. I think they get it.

You could represent a social graph that contains friends of friends of friends as a higher-order tensor. This is why Google built a library called TensorFlow: it lets you create a computational graph where tensors created from data sets flow through a series of mathematical operations that optimize for an objective. They even built an entirely new type of chip, called a TPU, or Tensor Processing Unit. As computational power and the amount of data we have increase, we become more capable of processing multi-dimensional data.

Vectors are represented in a multitude of ways and are used in many fields of science, especially physics, where vectors act as a bookkeeping tool to keep track of two pieces of information, typically a magnitude and a direction for a physical quantity. For example, in Einstein's general theory of relativity, the curvature of spacetime, which gives rise to gravity, is described by what's called the Riemann curvature tensor, which is a tensor of order 4. So badass. So we can represent not only the fabric of reality this way, but the gradient of our optimization problem as well. During first-order optimization, the weights of our model are updated incrementally after each pass over the training data set. Given an error function like the sum of squared errors, we can compute the magnitude and direction of the weight update by taking a step in the opposite direction of the error gradient.
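To make the scalar, vector, matrix, and tensor hierarchy described above concrete, here is a minimal NumPy sketch (NumPy comes up later in the episode; the numbers and variable names are made up for illustration):

    import numpy as np

    scalar = 3.0                                      # 0th-order tensor: a single number
    feature_vec = np.array([[4.7, 1.8, 1.4, 2.1]])    # 1-by-4 array holding a data point's 4 feature values
    matrix = np.array([[4.7, 1.8, 1.4],               # 2nd-order tensor: each row is a car,
                       [4.6, 1.9, 1.5]])              # each column a feature (length, width, height)
    tensor3 = np.zeros((2, 3, 4))                     # 3rd-order tensor: a stack of matrices

    print(feature_vec.shape, matrix.shape, tensor3.ndim)  # (1, 4) (2, 3) 3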
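The weight-update idea above, taking a step in the opposite direction of the error gradient, can also be sketched in a few lines. This is only an illustration, assuming a linear model X·w with a sum-of-squared-errors loss; the data and learning rate are placeholders:

    import numpy as np

    X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])  # 3 training points, 2 features each
    y = np.array([5.0, 4.5, 7.0])                        # targets
    w = np.zeros(2)                                      # model weights (a vector)
    lr = 0.01                                            # learning rate (step size)

    for _ in range(100):                 # one update per pass over the training set
        error = X @ w - y                # prediction error for every data point
        grad = 2 * X.T @ error           # gradient of the sum of squared errors w.r.t. w
        w -= lr * grad                   # step in the opposite direction of the gradient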
This all comes from linear algebra. Algebra roughly means relationships, and it explores the relationships between unknown numbers. Linear algebra roughly means line-like relationships: it's a way of organizing information about vector spaces that makes it easy to manipulate groups of numbers simultaneously. It defines structures like vectors and matrices to hold these numbers and introduces rules for how to add, multiply, subtract, and divide them. So given two arrays, the plain algebraic way to multiply them would be to do it like this, and the linear algebraic way would look like this: we compute the dot product instead of multiplying each pair of numbers one at a time. The linear algebraic approach is three times faster in this case. Any type of data can be represented as a vector: images, videos, stock indices, text, audio signals, dougie dancing. No matter the type of data, it can be broken down into a set of numbers.

The model is not really accepting the data; it keeps throwing errors. Let me see. Oh, it looks like you've got to vectorize it. What do you mean? The model you wrote expects tensors of a certain size as its input, so we basically have to reshape the input data so it's in the right vector space, and once it is, we can compute things like the cosine distance between data points and the vector norm. Is there a Python library to do that? You gotta love NumPy. Vectorization is essentially just a matrix operation, and I can do it in a single line. Awesome. Well, you vectorize it up; I've gotta back-propagate out for today. Cool, where to? Tinder date. All right, yeah. See ya.

A researcher named Mikolov used a machine learning model called a neural network to create vectors for words: Word2Vec. Given some input corpus of text, like thousands of news articles, it tries to predict a word in a sentence given the words around it. A given word is encoded into a vector, the model uses that vector to make its prediction, and if the prediction doesn't match the actual word, the components of the vector are adjusted. Each word's context in the corpus acts as a teacher, sending error signals back to adjust the vector. The vectors of words that are judged similar by their context are iteratively nudged closer together, and so after training the model has learned thousands of vectors for words. Give it a word and it will return its associated word vector, also called a word embedding.

Vectors don't just represent data; they help represent our models too. Many types of machine learning models represent what they learn as vectors, and all types of neural networks do this. Given some data, a network learns dense representations of that data. These representations are essentially categories: if you have a data set of pictures of eyes of different colors, it will learn a general representation for all eye colors, so given a new, unlabeled eye picture, it will be able to recognize it as an eye. I see vectors. Good.

Once data is vectorized, we can do so many things with it. A trained Word2Vec model turns words into vectors, and then we can perform mathematical operations on those vectors. We can see how closely related words are by computing the distance between their vectors. The word Sweden, for example, is closely related to other wealthy northern European countries, because the distance between their vectors is small when plotted on a graph. Word vectors that are similar tend to cluster together, like types of animals.
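The dialogue above mentions using NumPy to compute the cosine distance between data points and the vector norm. Here is a minimal sketch of both, with two made-up vectors standing in for real data:

    import numpy as np

    a = np.array([1.0, 3.0, 5.0])
    b = np.array([2.0, 4.0, 6.0])

    dot = a @ b                                    # dot product: the linear algebraic multiply
    norm_a = np.linalg.norm(a)                     # Euclidean (L2) norm, the vector's length
    norm_b = np.linalg.norm(b)
    cosine_similarity = dot / (norm_a * norm_b)
    cosine_distance = 1.0 - cosine_similarity      # small distance => the vectors point the same way

This same cosine distance is one way a trained Word2Vec model can report that Sweden sits close to other northern European countries: similar words end up with vectors that are a small distance apart.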
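The episode doesn't show the Word2Vec code itself, but a rough sketch of training word vectors on a toy corpus could look like the following, assuming the third-party gensim library (with gensim 4.x parameter names), which is not something the video uses:

    from gensim.models import Word2Vec

    # Toy corpus: a real run would use thousands of news articles, not three sentences.
    corpus = [["sweden", "is", "a", "wealthy", "northern", "european", "country"],
              ["norway", "is", "a", "wealthy", "northern", "european", "country"],
              ["rome", "is", "the", "capital", "of", "italy"]]

    model = Word2Vec(sentences=corpus, vector_size=25, window=3, min_count=1, epochs=200)

    vec = model.wv["sweden"]                        # the learned word embedding (a NumPy vector)
    print(model.wv.similarity("sweden", "norway"))  # cosine similarity between two word vectors
    print(model.wv.most_similar("sweden", topn=2))  # nearest neighbours in the vector space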
Associations can be built, like Rome is to Italy as Beijing is to China, and operations like adding hotel plus motel give us Holiday Inn. Incredibly, vectorizing words captures their semantic meaning numerically.

The way we compute the distance between two vectors is by using the notion of a vector norm. A norm is any function g that maps vectors to real numbers and satisfies the following conditions: lengths are always positive, only the zero vector has length zero, scaling a vector scales its length in a predictable way, and distances add reasonably (the triangle inequality). So in a basic one-dimensional vector space, the norm of a vector is just its absolute value, and the distance between two numbers is the absolute value of their difference. Usually the length of a vector is calculated using the Euclidean norm, which is defined like so, but this isn't the only way to define length; there are others. You'll see the terms L1 norm and L2 norm used a lot in machine learning. The L2 norm is the Euclidean norm, and the L1 norm is also called the Manhattan distance. We can use either to normalize a vector to get its unit vector and use that to compute distances. Computing the distance between vectors is useful, for example, for showing users recommendations.

Both of these norms are also used in the process of regularization. We train models to fit a set of training data, but sometimes a model gets so fit to the training data that it no longer predicts well; it can't generalize to new data points. To prevent this overfitting, we have to regularize our model. The common method is to define a loss function that describes how well the model fits the data and then add a penalty term based on the L1 or L2 norm of the model's weights.

To sum things up: feature vectors represent numeric or symbolic characteristics of data, called features, in a mathematical way. They live in multi-dimensional vector spaces, where we can perform operations on them like adding them and computing their distance, and we can do that by computing the vector norm, which describes the size of a vector and is also useful for preventing overfitting.

The Wizard of the Week award goes to Vishnu Kumar. He implemented both gradient descent and Newton's method to create a model that predicts the number of calories burned when cycling a certain distance. The plots are great and the code is architected very legibly; check it out. Amazing work, Vishnu. And the runner-up, for a last-minute entry, is Hamad Shaikh; I loved how detailed your notebook was. This week's challenge is to implement both L1 and L2 regularization on a linear regression model. Check the GitHub readme in the description for details, and winners will be announced in a week.
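As a rough starting point for the challenge (a sketch, not the official solution), adding an L1 or L2 penalty to a sum-of-squared-errors loss for linear regression can look like this in NumPy; the data, weights, and regularization strength are placeholders:

    import numpy as np

    X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])  # toy design matrix
    y = np.array([5.0, 4.5, 7.0])                        # toy targets
    w = np.array([0.5, -1.2])                            # current weight vector
    lam = 0.1                                            # regularization strength

    sse = np.sum((X @ w - y) ** 2)                 # how well the model fits the training data
    l1_penalty = lam * np.linalg.norm(w, 1)        # L1 (Manhattan) norm of the weights
    l2_penalty = lam * np.linalg.norm(w, 2) ** 2   # squared L2 (Euclidean) norm of the weights

    lasso_loss = sse + l1_penalty                  # L1-regularized loss
    ridge_loss = sse + l2_penalty                  # L2-regularized loss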