Hi, everybody. I'm Laurence Moroney from the TensorFlow team at Google, and today we're going to talk about text classification. It's part one of a two-part series, and we'll focus on the data and getting it ready to train a neural network. You'll do this hands-on, using a workbook that you can find at the link in the description below, and I'll step you through it.

Text classification has some unique challenges, so before you get coding, let me step you through some of these. First of all, neural networks typically deal with numbers and not text when learning patterns that can be used for prediction or classification. In this case, we're looking at learning from movie reviews to see if those reviews are positive or negative. The first step, of course, is to change the words into numbers that represent them. There'll be a little bit more processing of these words into vectors determining their sentiment, and we'll cover that in the next video. So let's get coding.

First things first. I'll have to check the licenses before I begin, and now I'll import TensorFlow and NumPy. I'll also use Keras and print out the version of TensorFlow that I'm using.

Okay, now it's time to get the dataset. The IMDB dataset is included with Keras, so let's download it and take a look at what's in there. Note that in this case, the nice folks at Keras have done the work for us of converting the words into integers. They've also sorted them into a dictionary so that lower numbers are the most common words and higher numbers are the least common words. So when we load it and specify 10,000 words, this will give us the top 10,000 words that are used across all of the reviews.

Okay, now we've loaded the data, and we have our training data and labels as well as our test data and labels. It's also nicely sorted into integers for us, which is a great first step for learning. Let's see what the data looks like next.
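To make the "lower numbers are the most common words" idea concrete, here is a minimal sketch in plain Python. It is not the Keras code itself: in the workbook this step is presumably a single call along the lines of `keras.datasets.imdb.load_data(num_words=10000)`. The helper names `build_word_index` and `keep_top_words` and the three toy reviews are hypothetical, made up for illustration only.

```python
from collections import Counter


def build_word_index(reviews):
    """Rank every word by frequency: rank 1 is the most common word."""
    counts = Counter(word for review in reviews for word in review.lower().split())
    # Sort by descending count; break ties alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def keep_top_words(word_index, num_words):
    """Keep only the num_words most common words (hypothetical helper)."""
    return {word: rank for word, rank in word_index.items() if rank <= num_words}


# Toy data standing in for the real reviews (an assumption for illustration).
toy_reviews = [
    "this film was great",
    "this film was dull",
    "this was fun",
]

word_index = build_word_index(toy_reviews)
top_two = keep_top_words(word_index, num_words=2)

print(word_index)
print(top_two)
```

In this toy corpus "this" and "was" appear most often, so they get the lowest numbers, and asking for the top two words keeps only those.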
First, we'll look at our training data. You'll see that we have a total of 25,000 items of data and 25,000 labels describing them. The labels are very simple: it's zero for a negative review and one for a positive one. The reviews look like this. It's just a long set of numbers, and these are the indexes into the array of words. The review will start with a one, indicating the start of the review. So the first word in the review is word number 14, which translates to the word "this", followed by the value 22, which translates to the word "film".

The next bit of code is a handy-dandy way of decoding the review. Note that the values zero through three are reserved, with one being the start of the review, as we mentioned a moment ago, and zero being for padding. Now, this is important, and you'll see why in a moment. I can now decode the review and see that 1, 14, 22 are the start character, "this", and then "film". Pretty cool, right?

Now, earlier, I skipped over this piece of code showing me the length of the reviews. So, for example, the first review was 218 words long, and the second was 189 words long. That's really awkward, and it's confusing to train a neural network if all of the training data is of different lengths. So let's pick a standard length for every review: if it's longer, we'll trim it to that length, and if it's shorter, we'll pad it to that length. The Keras preprocessing APIs make this really easy. Here you can see I'm taking the training and test data and making sure it's 256 words long. If I need to pad it, then I'll pad it with the pad character, which is the zero that we saw earlier.

A quick look will now show that it worked: they're all 256 words long. And if I now look at my first set of training data, you'll see that it's padded by zeros. Remember, it had been 218 words long, so the extras get padded out to make it 256. Great. Our training and test data is now ready.
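The decoding and padding steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the workbook's code: the names `make_decoder`, `decode_review` and `pad_or_trim` are hypothetical, the tiny `toy_word_index` holds made-up ranks chosen so that an offset of 3 yields the 14 and 22 mentioned above, and the offset of 3 itself is an assumption based on values zero through three being reserved. The sketch pads and trims at the end of the review. In Keras, the equivalent step is handled by `pad_sequences`, whose `padding` and `truncating` parameters choose the side and both default to `'pre'`.

```python
# Reserved values; only PAD (0) and START (1) are named in the video.
# The labels for 2 and 3 are assumptions for illustration.
PAD, START, UNKNOWN, UNUSED = 0, 1, 2, 3


def make_decoder(word_index, offset=3):
    """Build an integer -> word lookup, shifting ranks past the reserved values."""
    reverse_index = {rank + offset: word for word, rank in word_index.items()}
    reverse_index.update(
        {PAD: "<PAD>", START: "<START>", UNKNOWN: "<UNK>", UNUSED: "<UNUSED>"}
    )
    return reverse_index


def decode_review(encoded, reverse_index):
    """Turn a list of integers back into text; '?' marks anything not in the lookup."""
    return " ".join(reverse_index.get(index, "?") for index in encoded)


def pad_or_trim(review, maxlen, value=PAD):
    """Trim a review to maxlen, or pad it at the end with `value` up to maxlen."""
    trimmed = list(review[:maxlen])
    return trimmed + [value] * (maxlen - len(trimmed))


# Toy ranks (hypothetical) so that 11 + 3 = 14 and 19 + 3 = 22.
toy_word_index = {"this": 11, "film": 19}
reverse_index = make_decoder(toy_word_index)

decoded = decode_review([1, 14, 22], reverse_index)
print(decoded)

# A stand-in for a 218-word review: padding adds 256 - 218 = 38 zeros.
fake_review = [14] * 218
padded = pad_or_trim(fake_review, maxlen=256)
print(len(padded), padded.count(PAD))
```

Padding with zero is safe here because zero is reserved and never stands for a real word, which is why the reserved values matter.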
So in the next episode, you'll take a look at how to design a neural network to accept this data and to train a model to determine the sentiment of movie reviews.
Prepare your data for ML | Text Classification Tutorial Pt. 1 (Coding TensorFlow), posted by 林宜悉 on 2020/03/25