  • Alright, once we have our data, it would be good to know how to get a general sense of

  • it.

  • There are six essential functions we can use to grasp the shape of what we’re working

  • with.

  • I personally like to keep these close at hand because I use them often.

  • These are the nrow(), ncol(), colnames(), rownames(), str(), which you already know,

  • and the summary() function.

  • Alright, let’s see what insights each of these gives us.

  • I’ll use the Pokémon data from the past two lessons again.

  • It is big and full of wonder!

  • As you can guess, nrow() gives us the number of rows our data has, not counting the column

  • names.

  • Likewise, ncol() gives us the number of columns.

  • Let’s do these two together.

  • So, our data has 811 observations, and 14 variables.

  • Now we have an idea of how large our current data set is.

  • And that’s something!
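As a rough sketch of those two calls: the lesson's actual file is not reproduced here, so the small data frame below is a hypothetical stand-in for my.pok, with invented column names and illustrative values. On the real data the same calls would report 811 and 14.

```r
# Hypothetical stand-in for my.pok (column names and values are illustrative only)
my.pok <- data.frame(
  id         = 1:3,
  pokemon    = c("bulbasaur", "charmander", "squirtle"),
  species_id = 1:3,
  height     = c(7, 6, 5),
  stringsAsFactors = FALSE
)

n.rows <- nrow(my.pok)  # number of observations; the header row is not counted
n.cols <- ncol(my.pok)  # number of variables

print(n.rows)  # 3
print(n.cols)  # 4
```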

  • Moving on, we know we have 14 columns, but it would be better if we knew what’s inside

  • these columns.

  • Let’s look at their names with colnames().

  • Cool, so we have id, Pokémon, what the species id of that Pokémon is (which is probably

  • just the same as id), and so on. Knowing the names of our variables makes it a lot

  • easier when we need to slice and subset our data to do operations on specific values from

  • a column.

  • Right.
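Here is a minimal sketch of why the names matter, again on a hypothetical stand-in for my.pok (the column names are assumptions, not the ones in the lesson's file). Once colnames() has shown us a name, we can use it to pull that column out; the [[ ]] accessor below is just one way to do that.

```r
# Hypothetical stand-in for my.pok (column names and values are illustrative only)
my.pok <- data.frame(
  id         = 1:3,
  pokemon    = c("bulbasaur", "charmander", "squirtle"),
  species_id = 1:3,
  height     = c(7, 6, 5),
  stringsAsFactors = FALSE
)

pok.names <- colnames(my.pok)  # character vector of variable names
print(pok.names)

# Knowing a name lets us pick out that one column
heights <- my.pok[["height"]]
print(heights)
```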

  • The rownames() function here is a little useless (and probably often will be, because it’s

  • not a common practice to name your rows, especially if your data set is large), but!

  • It is the natural counterpart for colnames() and I must mention it.
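To illustrate why rownames() is rarely informative, here is a small sketch on a hypothetical stand-in for my.pok. When the rows have not been named, R simply numbers them, and rownames() hands those numbers back as text.

```r
# Hypothetical stand-in for my.pok (column names and values are illustrative only)
my.pok <- data.frame(
  id      = 1:3,
  pokemon = c("bulbasaur", "charmander", "squirtle"),
  stringsAsFactors = FALSE
)

row.labels <- rownames(my.pok)  # automatic row names: "1", "2", "3"
print(row.labels)
```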

  • Finally, we have the str() and summary() functions.

  • You are already familiar with str(): it gives you the compact version of your data structure.

  • It really comes in handy when you want to have a quick look at your data and how it’s

  • organised.

  • R returns the structure of our data, with row and column numbers, and for each column,

  • or variable, the basic data type, as well as a couple of value instances.

  • Awesome.

  • If, when importing the data, we hadn’t set our stringsAsFactors = argument to FALSE,

  • we would have a bunch of factors where we have character data now.

  • Great.
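The effect of stringsAsFactors can be sketched by reading the same text twice. The CSV content below is a made-up stand-in, not the lesson's file, and the argument is set explicitly in both calls because its default differs between R versions (it has been FALSE since R 4.0.0).

```r
# Made-up CSV content standing in for the lesson's file
csv.text <- "id,pokemon,type_1
1,bulbasaur,grass
2,charmander,fire
3,squirtle,water"

# Same data, read once with text kept as character and once converted to factors
pok.chr <- read.csv(text = csv.text, stringsAsFactors = FALSE)
pok.fct <- read.csv(text = csv.text, stringsAsFactors = TRUE)

str(pok.chr)  # pokemon and type_1 are listed as chr
str(pok.fct)  # pokemon and type_1 are listed as Factor
```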

  • And last but not least, the summary() function.

  • Now, this one is truly a multipurpose function and it should be one of the first things you

  • consult when starting to work on a new data set.

  • The summary() function provides an excellent, well, summary, of the object you pass into it.

  • It is a bit more useful with numerical data, because it provides the essential descriptive

  • statistics (but we will go over this in a bit more detail in the statistics section

  • of the course).

  • Let’s see what it will tell us about my.pok.

  • Okay, so it took every variable in our data set and computed a bunch of useful things,

  • like the means and medians of each.

  • It also provided us with scope information like minimum and maximum values.

  • Awesome.

  • Our character variables, like the Pokémons and their types, are a bit less represented.

  • As you can see, all we get out of the function is information about the length, class and mode of

  • the objects.

  • Okay.
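A small sketch of that contrast, using a hypothetical stand-in for my.pok with one numeric and one character column (names and values are illustrative only). The numeric column gets the descriptive statistics, while the character column only gets its length, class and mode.

```r
# Hypothetical stand-in for my.pok (column names and values are illustrative only)
my.pok <- data.frame(
  pokemon = c("bulbasaur", "charmander", "squirtle"),
  height  = c(7, 6, 5),
  stringsAsFactors = FALSE
)

# Whole data frame: one block of output per column
pok.summary <- summary(my.pok)
print(pok.summary)

# A character column on its own reports Length, Class and Mode only
chr.summary <- summary(my.pok$pokemon)
print(chr.summary)
```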

  • As usual, we can use summary() on a single variable or only a selection of variables,

  • if we are not interested in getting these basic descriptive stats for everything, en

  • masse, but we will learn how to slice through a data frame in the next lesson.
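As a quick preview, here is a minimal sketch of summary() on a single variable. The data frame is a hypothetical stand-in for my.pok, and the $ accessor is just one way to pick out a column.

```r
# Hypothetical stand-in for my.pok (column names and values are illustrative only)
my.pok <- data.frame(
  pokemon = c("bulbasaur", "charmander", "squirtle"),
  height  = c(7, 6, 5),
  stringsAsFactors = FALSE
)

# Descriptive statistics for one numeric column only
height.summary <- summary(my.pok$height)
print(height.summary)

# Individual statistics can be pulled out by name
print(height.summary[["Median"]])  # 6
```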

  • Alright, that’s it for this video, guys!

  • Have a play around with some of the data we have provided or with R’s pre-packaged data

  • sets, which you can find by calling data().
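If you want a starting point, the sketch below lists the pre-packaged data sets and tries the lesson's functions on one of them. Picking iris is my own choice for illustration; it ships with R in the datasets package.

```r
# data() lists the available data sets; storing the result keeps the output short
available <- data()
head(available$results[, c("Item", "Title")])

# Load one pre-packaged data set and get a general sense of it
data(iris)
iris.rows <- nrow(iris)  # 150
iris.cols <- ncol(iris)  # 5
print(colnames(iris))
str(iris)
summary(iris)
```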

  • Alright!

  • I’ll see you next time!
