  • Hello again!

  • In this video we will learn how to import data into R, or how-to-do-that-thing-you’ll-be-doing-all-the-time-as-data-scientists.

  • As you can tell, this is an important lesson, so buckle up, and let’s get to it.

  • If you want to work in parallel with me and still haven’t downloaded the resources for

  • the course, please do it now.

  • The resources for this lesson include a couple of data sets which we will be loading into

  • R during the lesson.

  • If you have downloaded them, check where they are saved, or copy them into your working

  • directory so they’re easy to locate.

  • The data file names are: pokRdex_comma, and pokRdex_delim.

  • Do you remember how to check and set your working directory?

  • The working directory is where we will be loading data from and saving data to.

  • To check what your current working directory is, type getwd() into the console.

  • If you’re not happy with where this takes you, you can set your directory by typing

  • setwd() and passing a path to the location on your computer that works best for you.

  • Alternatively, you can use RStudio’s Session tab in the ribbon on top and set your directory

  • from there.
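As a minimal sketch of checking and setting the working directory: the snippet below uses a temporary directory as a stand-in for your own data folder (that choice, and the names old_wd and data_dir, are assumptions for illustration and not part of the lesson).

```r
# Remember the current working directory so we can restore it later.
old_wd <- getwd()
print(old_wd)

# Assumption: a temporary directory stands in for the folder holding your data.
data_dir <- tempdir()
setwd(data_dir)

# Confirm the change, then list the files R can now find by name alone.
print(getwd())
print(list.files())

# Go back to where we started.
setwd(old_wd)
```

In your own session you would pass the path of the folder that holds the course files to setwd() instead of a temporary directory.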

  • Cool.

  • Now that we’re in the directory where our files are, we can load data easily.

  • Although there are tons of different data file types, we will stick to the most commonly

  • used ones: text and .csv; you can find information about importing other file types in the resources

  • for this lesson.

  • Okay, let's get to it!

  • The general-purpose data reading function in R is the read.table() function.

  • To load data in the form of a text file from your working directory, pass the name of the

  • file in quotes, and then specify at least the following: first, the separator for your

  • data (this tells R what distinguishes your values); second, whether your data set has

  • a header row or not; and third, whether R should encode your string variables as factors.

  • Using our Pokémon data, the command looks like this...

  • So, why is it important to specify all these things?

  • Well.

  • Separators tell R how to look at your data and how to structure the data frame.

  • The header signals whether the first row of the data file is values or variable names.

  • If your data doesn’t have variable names and begins directly with the first observation,

  • then set that argument to FALSE.
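To make the role of the separator and the header concrete, here is a small sketch. It writes a tiny made-up file to a temporary location rather than using the course data, and it assumes a semicolon separator purely for illustration; the actual separator in the lesson files may differ.

```r
# Hypothetical sample data: one header row and two observations, semicolon-separated.
sample_file <- tempfile(fileext = ".txt")
writeLines(c("name;type;hp",
             "Bulbasaur;Grass;45",
             "Charmander;Fire;39"),
           sample_file)

# Matching separator, header = TRUE: the first row supplies the variable names.
with_header <- read.table(sample_file, sep = ";", header = TRUE)
print(dim(with_header))
print(names(with_header))

# header = FALSE: the first row is treated as an observation,
# and R invents the names V1, V2, V3.
no_header <- read.table(sample_file, sep = ";", header = FALSE)
print(dim(no_header))
print(names(no_header))

# Wrong separator: R finds no commas, so each line lands in a single column.
wrong_sep <- read.table(sample_file, sep = ",", header = TRUE)
print(dim(wrong_sep))
```

Note that with header = FALSE the hp column is no longer numeric, because the text "hp" is read as one of its values.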

  • Finally, the stringsAsFactors = argument tells R whether to convert all string variables

  • to factors or not.

  • Since our string variables include Pokémon names and these are clearly not factors, we

  • set the argument to FALSE.
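The effect of stringsAsFactors can be seen by reading the same file twice. This is a sketch on a made-up, semicolon-separated file written to a temporary location; the file contents and the names as_text and as_factors are assumptions for illustration.

```r
# Hypothetical sample data with two string columns.
sample_file <- tempfile(fileext = ".txt")
writeLines(c("name;type",
             "Bulbasaur;Grass",
             "Charmander;Fire",
             "Squirtle;Water"),
           sample_file)

# Strings kept as plain text.
as_text <- read.table(sample_file, sep = ";", header = TRUE,
                      stringsAsFactors = FALSE)

# Strings converted to factors (categorical variables with fixed levels).
as_factors <- read.table(sample_file, sep = ";", header = TRUE,
                         stringsAsFactors = TRUE)

print(class(as_text$name))     # "character"
print(class(as_factors$name))  # "factor"

# A factor stores its distinct values as levels.
print(levels(as_factors$type))
```

Setting the argument explicitly is a reasonable habit, since the default changed from TRUE to FALSE in R 4.0.0 and the same script can otherwise behave differently across versions.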

  • Of course, there are tons more arguments for the read.table() function but these are the

  • crucial ones to remember.

  • Okay, this covers the basic architecture of reading a data file into R. You can use the

  • read.table() function with a lot of different file types because of its flexibility and

  • argument set-up.

  • Next, we’ll talk about reading specific data types: comma-separated-values, or CSV

  • files, and tab-delimited files.

  • See you there.
