Data frames in R - Import a CSV in R - VoiceTube


Alright.
So, now we know how read.table() works, and that paves the way for learning some shortcuts.
The file we loaded into R in the previous lesson is a CSV file: a simple text document
in which the values are separated by commas.
CSV files are extremely common, so R’s brainy developers have given us a shortcut function
for loading them faster.
This function is from the read.table() family, and it is called read.csv().
With read.csv() you need to pass fewer arguments than with read.table(), because its defaults are set in a very
convenient way: header is set to TRUE, and the separator (sep) is set to a comma. All we need to do in order to read
a table is pass the name of the file and specify whether we want our strings to be
factors or not (since R 4.0.0, stringsAsFactors defaults to FALSE).
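A minimal sketch of that equivalence, assuming a small hypothetical file written to a temporary path (the lesson's Pokémon file is not reproduced here). For simple data like this, read.csv() should return the same data frame as read.table() with header = TRUE and sep = "," spelled out.

```r
# Hypothetical sample data written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,type,hp",
             "Bulbasaur,Grass,45",
             "Charmander,Fire,39",
             "Squirtle,Water,44"), csv_path)

# The long way: spell out the header and the separator
via_table <- read.table(csv_path, header = TRUE, sep = ",",
                        stringsAsFactors = FALSE)

# The shortcut: header = TRUE and sep = "," are already read.csv()'s defaults
via_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Both calls should give the same data frame for this file
identical(via_table, via_csv)
```

read.csv() also changes a few other defaults (quote, fill, comment.char), so on files containing apostrophes or "#" characters the two calls can differ.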
Sweet!
Apart from commas, values in a text data file can also be separated by tabs; these
types of documents are called tab-delimited files.
And just as with CSVs, there is a read.table() shortcut for reading them: read.delim().
What’s happening behind the scenes here is that the sep = argument is set to "\t" (a tab), header
is again TRUE, and a bunch of other useful arguments default to their most
commonly used values.
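Here is a small sketch with read.delim(), again assuming a hypothetical tab-delimited file created in a temporary location; the "\t" escapes inside the strings below are literal tab characters.

```r
# Hypothetical tab-delimited sample file
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("name\ttype\thp",
             "Pikachu\tElectric\t35",
             "Eevee\tNormal\t55"), tsv_path)

# read.delim() defaults to header = TRUE and sep = "\t"
pokemon_tab <- read.delim(tsv_path, stringsAsFactors = FALSE)

# The equivalent long form with read.table()
pokemon_tab_long <- read.table(tsv_path, header = TRUE, sep = "\t",
                               stringsAsFactors = FALSE)
```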
Now, just before we wrap this up, I want to mention a few important things.
First, for those of you in Europe or anywhere else in the world where the decimal
mark is a comma, and therefore comma-separated files don’t really work for you, there is a read.csv2()
function designed to deal with this problem.
It reads CSV files with a semicolon as the separator and a comma as the decimal mark.
The same goes for read.delim(), which also has a version 2, read.delim2(), with the exact same purpose: it keeps the tab separator but reads commas as decimal marks.
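A short sketch of both "version 2" functions, assuming hypothetical European-style files in which "6,9" means six point nine.

```r
# Hypothetical European-style file: ";" separates fields, "," marks decimals
csv2_path <- tempfile(fileext = ".csv")
writeLines(c("name;weight_kg",
             "Bulbasaur;6,9",
             "Charmander;8,5"), csv2_path)

# read.csv2() defaults to sep = ";" and dec = ","
eu <- read.csv2(csv2_path)
class(eu$weight_kg)   # "numeric": "6,9" was read as 6.9

# read.delim2() keeps sep = "\t" but also defaults to dec = ","
tsv2_path <- tempfile(fileext = ".txt")
writeLines(c("name\tweight_kg", "Squirtle\t9,0"), tsv2_path)
eu_tab <- read.delim2(tsv2_path)
```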
Second.
Often, data files from external sources come with additional text, either as an introduction
or a sign-off, which will only cause havoc in your data if you end up importing it.
Fortunately, we can tell R to completely ignore the first few lines
of text in our data file with the skip = argument.
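A minimal sketch of skip =, assuming a hypothetical file with two lines of introductory text above the real header row.

```r
# Hypothetical file: two intro lines, then the actual header and data
messy_path <- tempfile(fileext = ".csv")
writeLines(c("Pokemon data export",
             "Generated for this lesson",
             "name,type,hp",
             "Bulbasaur,Grass,45",
             "Charmander,Fire,39"), messy_path)

# skip = 2 discards the first two lines, so the third line becomes the header
clean <- read.csv(messy_path, skip = 2)
```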
If you want to control where R stops reading the data file, you can tell it to read a precise
number of rows with the nrows = argument.
For example, our Pokémon data is way too large, and I may only be interested in the
first 100 Pokémon.
If I set nrows = 100, this is exactly what I will get.
Pay attention to what happened here: the header doesn’t count towards the number of rows
specified.
nrows = counts rows of observations, not lines in the file.
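A sketch of nrows = under the assumption of a hypothetical stand-in for a large file (a header plus 500 generated rows), since the lesson's Pokémon file isn't included here.

```r
# Hypothetical stand-in for a large file: one header line plus 500 data rows
big_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:500, name = paste0("pokemon_", 1:500)),
          big_path, row.names = FALSE)

# nrows = 100 reads 100 observations; the header line is not counted
first_100 <- read.csv(big_path, nrows = 100)
nrow(first_100)
```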
Okay, let’s break it off here.
Super good job, everyone!
The next lesson will be very short, and it will complete the data import/export circle:
we will be talking about exporting data.
See you there!
And…
May the Force Be with You
