R Only Reading First Column of Csv
Reading and Writing CSV Files
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How do I read data from a CSV file into R?
How do I write data to a CSV file?
Objectives
Read in a .csv, and explore the arguments of the csv reader.
Write the altered data set to a new .csv, and explore the arguments.
The most common way that scientists store data is in Excel spreadsheets. While there are R packages designed to access data from Excel spreadsheets (e.g., gdata, RODBC, XLConnect, xlsx, RExcel), users often find it easier to save their spreadsheets in comma-separated values files (CSV) and then use R's built-in functionality to read and manipulate the data. In this short lesson, we'll learn how to read data from a .csv and write to a new .csv, and explore the arguments that allow you to read and write the data correctly for your needs.
Read a .csv and Explore the Arguments
Let's start by opening a .csv file containing information on the speeds at which cars of different colors were clocked in 45 mph zones in the four-corners states (car-speeds.csv). We will use the built-in read.csv(...) function call, which reads the data in as a data frame, and assign the data frame to a variable (using <-) so that it is stored in R's memory. Then we will explore some of the basic arguments that can be supplied to the function. First, open the RStudio project containing the scripts and data you were working on in episode 'Analyzing Patient Data'.
# Import the data and look at the first six rows
carSpeeds <- read.csv(file = 'data/car-speeds.csv')
head(carSpeeds)

  Color Speed     State
1  Blue    32 NewMexico
2   Red    45   Arizona
3  Blue    35  Colorado
4 White    34   Arizona
5   Red    25   Arizona
6  Blue    41   Arizona
Changing Delimiters
The default delimiter of the read.csv() function is a comma, but you can use other delimiters by supplying the 'sep' argument to the function (e.g., typing sep = ';' allows a semi-colon separated file to be correctly imported - see ?read.csv for more information on this and other options for working with different file types).
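As a minimal sketch of the sep argument, the example below reads a small semicolon-separated table from an inline character vector through the text argument of read.csv(), so it runs without any file on disk. The sample values and the name semi_lines are made up for illustration and are not part of the lesson data.

```r
# Hypothetical semicolon-separated data, one element per line
semi_lines <- c("Color;Speed;State",
                "Blue;32;NewMexico",
                "Red;45;Arizona")

# With the default sep = ',' every line is read as a single column
one_col <- read.csv(text = semi_lines)
ncol(one_col)

# Supplying sep = ';' splits each line into three columns
three_col <- read.csv(text = semi_lines, sep = ";")
ncol(three_col)
names(three_col)
```

If a file unexpectedly loads as a single column, a mismatched delimiter is a likely cause. The related wrapper read.csv2() defaults to sep = ';' together with a decimal comma.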
The first read.csv() call above will import the data, but we have not taken advantage of several handy arguments that can be helpful in loading the data in the format we want. Let's explore some of these arguments.
The header Argument
The default for read.csv(...) is to set the header argument to TRUE. This means that the first row of values in the .csv is set as header information (column names). If your data set does not have a header, set the header argument to FALSE:
# The first row of the data without setting the header argument:
carSpeeds[1, ]

  Color Speed     State
1  Blue    32 NewMexico

# The first row of the data if the header argument is set to FALSE:
carSpeeds <- read.csv(file = 'data/car-speeds.csv', header = FALSE)
carSpeeds[1, ]

     V1    V2    V3
1 Color Speed State
Clearly this is not the desired behavior for this data set, but it may be useful if you have a dataset without headers.
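To illustrate the case where header = FALSE is the right choice, here is a small sketch that uses made-up lines with no header row, passed through the text argument of read.csv(). The name no_header and the values are assumptions for illustration only.

```r
# Hypothetical data lines with no header row
no_header <- c("Blue,32,NewMexico",
               "Red,45,Arizona")

# header = FALSE keeps the first line as data; R names the columns V1, V2, V3
cars_raw <- read.csv(text = no_header, header = FALSE)
names(cars_raw)
nrow(cars_raw)

# With the default header = TRUE the first data line is consumed as names
cars_lost <- read.csv(text = no_header)
nrow(cars_lost)
```

Comparing the two row counts is a quick way to check whether a row of data was silently turned into column names.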
The stringsAsFactors Argument
In older versions of R (prior to 4.0) this was perhaps the most important argument in read.csv(), particularly if you were working with categorical data. This is because the default behavior of R was to convert character strings into factors, which may make it difficult to do such things as replace values. It is important to be aware of this behavior, which we will demonstrate. For example, let's say we find out that the data collector was color blind, and accidentally recorded green cars as being blue. In order to correct the data set, let's replace 'Blue' with 'Green' in the $Color column:
# Here we will use R's `ifelse` function, in which we provide the test phrase,
# the outcome if the result of the test is 'TRUE', and the outcome if the
# result is 'FALSE'. We will also assign the results to the Color column,
# using '<-'

# First - reload the data with a header
carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)

carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
carSpeeds$Color

  [1] "Green" "1"     "Green" "5"     "4"     "Green" "Green" "2"     "5"
 [10] "4"     "4"     "5"     "Green" "Green" "2"     "4"     "Green" "Green"
 [19] "5"     "Green" "Green" "Green" "4"     "Green" "4"     "4"     "4"
 [28] "4"     "5"     "Green" "4"     "5"     "2"     "4"     "2"     "2"
 [37] "Green" "4"     "2"     "4"     "2"     "2"     "4"     "4"     "5"
 [46] "2"     "Green" "4"     "4"     "2"     "2"     "4"     "5"     "4"
 [55] "Green" "Green" "2"     "Green" "5"     "2"     "4"     "Green" "Green"
 [64] "5"     "2"     "4"     "4"     "2"     "Green" "5"     "Green" "4"
 [73] "5"     "5"     "Green" "Green" "Green" "Green" "Green" "5"     "2"
 [82] "Green" "5"     "2"     "2"     "4"     "4"     "5"     "5"     "5"
 [91] "5"     "4"     "4"     "4"     "5"     "2"     "5"     "2"     "2"
[100] "5"
What happened?!? It looks like 'Blue' was replaced with 'Green', but every other color was turned into a number (as a character string, given the quote marks before and after). This is because the colors of the cars were loaded as factors, and the integer code of each factor level was reported following replacement.
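The same effect can be reproduced on a tiny factor, which may make the mechanism easier to see. This is only an illustrative sketch: the vector colors below is made up and stands in for the $Color column.

```r
# A small factor standing in for the Color column (values are illustrative)
colors <- factor(c("Blue", "Red", "White"))

# Levels are sorted alphabetically and stored internally as integer codes
levels(colors)
as.integer(colors)

# ifelse() drops the factor attributes, so the entries that are not
# replaced come back as their integer codes, converted to strings
replaced <- ifelse(colors == "Blue", "Green", colors)
replaced
```

Here 'Red' and 'White' are the second and third levels, so they come back as "2" and "3", just as the other colors in the cars data came back as numbers.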
To see the internal structure, we can use another function, str(). In this case, the dataframe's internal structure includes the format of each column, which is what we are interested in. str() will be reviewed a little more in the lesson Data Types and Structures.
# Reload the data with a header (the previous ifelse call modifies attributes)
carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)
str(carSpeeds)

'data.frame':	100 obs. of  3 variables:
 $ Color: Factor w/ 5 levels " Red","Black",..: 3 1 3 5 4 3 3 2 5 4 ...
 $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
 $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...
We can see that the $Color and $State columns are factors and $Speed is a numeric column.
Now, let's load the dataset using stringsAsFactors=FALSE, and see what happens when we try to replace 'Blue' with 'Green' in the $Color column:
carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = FALSE)
str(carSpeeds)

'data.frame':	100 obs. of  3 variables:
 $ Color: chr  "Blue" " Red" "Blue" "White" ...
 $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
 $ State: chr  "NewMexico" "Arizona" "Colorado" "Arizona" ...

carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
carSpeeds$Color

  [1] "Green" " Red"  "Green" "White" "Red"   "Green" "Green" "Black" "White"
 [10] "Red"   "Red"   "White" "Green" "Green" "Black" "Red"   "Green" "Green"
 [19] "White" "Green" "Green" "Green" "Red"   "Green" "Red"   "Red"   "Red"
 [28] "Red"   "White" "Green" "Red"   "White" "Black" "Red"   "Black" "Black"
 [37] "Green" "Red"   "Black" "Red"   "Black" "Black" "Red"   "Red"   "White"
 [46] "Black" "Green" "Red"   "Red"   "Black" "Black" "Red"   "White" "Red"
 [55] "Green" "Green" "Black" "Green" "White" "Black" "Red"   "Green" "Green"
 [64] "White" "Black" "Red"   "Red"   "Black" "Green" "White" "Green" "Red"
 [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
 [82] "Green" "White" "Black" "Black" "Red"   "Red"   "White" "White" "White"
 [91] "White" "Red"   "Red"   "Red"   "White" "Black" "White" "Black" "Black"
[100] "White"
That's better! And we can see how the data now is read as character instead of factor. From R version 4.0 onwards we do not have to specify stringsAsFactors=FALSE; this is the default behavior.
The as.is Argument
This is an extension of the stringsAsFactors argument, but gives you control over individual columns. For example, if we want the colors of cars imported as strings, but we want the names of the states imported as factors, we would load the data set as:
carSpeeds <- read.csv(file = 'data/car-speeds.csv', as.is = 1)
# Note, the 1 applies as.is to the first column only
Now we can see that if we try to replace 'Blue' with 'Green' in the $Color column everything looks fine, while trying to replace 'Arizona' with 'Ohio' in the $State column returns the factor numbers for the names of states that we haven't replaced:
str(carSpeeds)

'data.frame':	100 obs. of  3 variables:
 $ Color: chr  "Blue" " Red" "Blue" "White" ...
 $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
 $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...

carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
carSpeeds$Color

  [1] "Green" " Red"  "Green" "White" "Red"   "Green" "Green" "Black" "White"
 [10] "Red"   "Red"   "White" "Green" "Green" "Black" "Red"   "Green" "Green"
 [19] "White" "Green" "Green" "Green" "Red"   "Green" "Red"   "Red"   "Red"
 [28] "Red"   "White" "Green" "Red"   "White" "Black" "Red"   "Black" "Black"
 [37] "Green" "Red"   "Black" "Red"   "Black" "Black" "Red"   "Red"   "White"
 [46] "Black" "Green" "Red"   "Red"   "Black" "Black" "Red"   "White" "Red"
 [55] "Green" "Green" "Black" "Green" "White" "Black" "Red"   "Green" "Green"
 [64] "White" "Black" "Red"   "Red"   "Black" "Green" "White" "Green" "Red"
 [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
 [82] "Green" "White" "Black" "Black" "Red"   "Red"   "White" "White" "White"
 [91] "White" "Red"   "Red"   "Red"   "White" "Black" "White" "Black" "Black"
[100] "White"

carSpeeds$State <- ifelse(carSpeeds$State == 'Arizona', 'Ohio', carSpeeds$State)
carSpeeds$State

  [1] "3"    "Ohio" "2"    "Ohio" "Ohio" "Ohio" "3"    "2"    "Ohio" "2"
 [11] "4"    "4"    "4"    "4"    "4"    "3"    "Ohio" "3"    "Ohio" "4"
 [21] "4"    "4"    "3"    "2"    "2"    "3"    "2"    "4"    "2"    "4"
 [31] "3"    "2"    "2"    "4"    "2"    "2"    "3"    "Ohio" "4"    "2"
 [41] "2"    "3"    "Ohio" "4"    "Ohio" "2"    "3"    "3"    "3"    "2"
 [51] "Ohio" "4"    "4"    "Ohio" "3"    "2"    "4"    "2"    "4"    "4"
 [61] "4"    "2"    "3"    "2"    "3"    "2"    "3"    "Ohio" "3"    "4"
 [71] "4"    "2"    "Ohio" "4"    "2"    "2"    "2"    "Ohio" "3"    "Ohio"
 [81] "4"    "2"    "2"    "Ohio" "Ohio" "Ohio" "4"    "Ohio" "4"    "4"
 [91] "4"    "Ohio" "Ohio" "3"    "2"    "2"    "4"    "3"    "Ohio" "4"
We can see that the $Color column is a character while $State is a factor.
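Per-column control can also be sketched with the colClasses argument of read.csv(), which states the class of every column explicitly. This is an alternative to as.is and not the approach used in this lesson. The sample rows and the name car_lines are made up for illustration.

```r
# Hypothetical sample rows in the same layout as car-speeds.csv
car_lines <- c("Color,Speed,State",
               "Blue,32,NewMexico",
               "Red,45,Arizona",
               "White,34,Arizona")

# One class per column, given in column order
cars <- read.csv(text = car_lines,
                 colClasses = c("character", "integer", "factor"))

# Check the class that each column ended up with
sapply(cars, class)
```

Spelling out the classes makes the import independent of the stringsAsFactors default, which differs between R versions before and after 4.0.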
Updating Values in a Factor
Suppose we want to keep the colors of cars as factors for some other operations we want to perform. Write code for replacing 'Blue' with 'Green' in the $Color column of the cars dataset without importing the data with stringsAsFactors=FALSE.
Solution
carSpeeds <- read.csv(file = 'data/car-speeds.csv')
# Replace 'Blue' with 'Green' in cars$Color without using the stringsAsFactors
# or as.is arguments
carSpeeds$Color <- ifelse(as.character(carSpeeds$Color) == 'Blue',
                          'Green',
                          as.character(carSpeeds$Color))
# Convert colors back to factors
carSpeeds$Color <- as.factor(carSpeeds$Color)
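Another possible approach, sketched here on a made-up factor and not on the lesson data, is to rename the level itself with levels(). The values never stop being a factor, so no conversion back is needed.

```r
# A small factor standing in for the Color column (values are illustrative)
colors <- factor(c("Blue", "Red", "Blue", "White"))

# Rename the 'Blue' level; every element with that level changes with it
levels(colors)[levels(colors) == "Blue"] <- "Green"

colors
levels(colors)
```

This changes only the level label, so it is suited to relabelling a whole category and not to editing individual elements.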
The strip.white Argument
It is not uncommon for mistakes to have been made when the data were recorded, for example a space (whitespace) may have been inserted before a data value. By default this whitespace will be kept in the R environment, such that ' Red' will be recognized as a different value than 'Red'. In order to avoid this type of error, use the strip.white argument. Let's see how this works by checking for the unique values in the $Color column of our dataset:
Here, the data recorder added a space before the color of the car in one of the cells:
# We use the built-in unique() function to extract the unique colors in our dataset
unique(carSpeeds$Color)

[1] Green  Red  White Red   Black
Levels:  Red Black Green Red White

Oops, we see two values for red cars.
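Let's try again, this time importing the data using the strip.white argument. NOTE - strip.white only takes effect when a delimiter is set through the sep argument, by which we indicate the type of delimiter in the file (the comma for most .csv files). read.csv() already defaults to sep = ',', and we pass it explicitly here to make the delimiter clear.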
carSpeeds <- read.csv(file = 'data/car-speeds.csv',
                      stringsAsFactors = FALSE,
                      strip.white = TRUE,
                      sep = ',')
unique(carSpeeds$Color)

[1] "Blue"  "Red"   "White" "Black"

That's better!
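The effect of strip.white can also be checked on a few inline lines, without the lesson file. In this sketch the lines in messy_lines are made up, and trimws() is shown as one possible way to clean a column that has already been loaded.

```r
# Hypothetical data in which one color has a leading space
messy_lines <- c("Color,Speed",
                 "Blue,32",
                 " Red,45",
                 "Red,25")

# By default the leading space is kept, so ' Red' and 'Red' differ
kept <- read.csv(text = messy_lines)
unique(as.character(kept$Color))

# strip.white = TRUE removes leading and trailing whitespace on import
stripped <- read.csv(text = messy_lines, strip.white = TRUE, sep = ",")
unique(as.character(stripped$Color))

# For data that is already loaded, trimws() does the same clean-up
unique(trimws(as.character(kept$Color)))
```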
Specify Missing Data When Loading
It is common for data sets to have missing values, or mistakes. The convention for recording missing values often depends on the individual who collected the data and can be recorded as n.a., --, or empty cells " ". R recognises the reserved character string NA as a missing value, but not some of the examples above. Let's say the inflammation scale in the data set we used earlier, inflammation-01.csv, actually starts at 1 for no inflammation and the zero values (0) were a missed observation. Looking at the ?read.csv help page, is there an argument we could use to ensure all zeros (0) are read in as NA? Perhaps the car-speeds.csv data contains mistakes and the person measuring the car speeds could not accurately distinguish between "Black" or "Blue" cars. Is there a way to specify more than one 'string', such as "Black" and "Blue", to be replaced by NA?
Solution
read.csv(file = "data/inflammation-01.csv", na.strings = "0")
or, in car-speeds.csv, use a character vector for multiple values.
read.csv(file = 'data/car-speeds.csv', na.strings = c("Black", "Blue"))
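The same argument can cover the recording conventions mentioned in the exercise. The sketch below uses made-up inline lines (na_lines) in which a missing value is written as n.a., as --, or left empty.

```r
# Hypothetical data using several conventions for missing values
na_lines <- c("Color,Speed",
              "Blue,32",
              "n.a.,--",
              "Red,")

# Every string listed in na.strings is read in as NA, in any column
cars <- read.csv(text = na_lines, na.strings = c("n.a.", "--", ""))

# Count the missing values per column
colSums(is.na(cars))
```

Because the markers are converted on import, the Speed column is still read as numeric and not as text.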
Write a New .csv and Explore the Arguments
After altering our cars dataset by replacing 'Blue' with 'Green' in the $Color column, we now want to save the output. There are several arguments for the write.csv(...) function call, a few of which are particularly important for how the data are exported. Let's explore these now.
# Export the data. The write.csv() function requires a minimum of two
# arguments, the data to be saved and the name of the output file.
write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv')

If you open the file, you'll see that it has header names, because the data had headers within R, but that there are numbers in the first column.
The row.names Argument
This argument allows us to set the names of the rows in the output data file. R's default for this argument is TRUE, and since it does not know what else to name the rows for the cars data set, it resorts to using row numbers. To correct this, we can set row.names to FALSE:
write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)
Now we see that the row numbers are gone from the first column of the output file.
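The difference between the two settings can be inspected without opening the file in a spreadsheet. This is a small sketch that writes a made-up data frame to temporary files (via tempfile()) and prints the raw lines; the object names are illustrative.

```r
# A small stand-in for the cars data (values are illustrative)
cars <- data.frame(Color = c("Green", "Red"), Speed = c(32L, 45L))

with_rows <- tempfile(fileext = ".csv")
no_rows <- tempfile(fileext = ".csv")

# Default: row names are written as an unnamed first column
write.csv(cars, file = with_rows)
readLines(with_rows)

# row.names = FALSE: only the data columns are written
write.csv(cars, file = no_rows, row.names = FALSE)
readLines(no_rows)

# Reading the default output back in shows the extra column, named X
names(read.csv(with_rows))
```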
Setting Column Names
There is also a col.names argument, which can be used to set the column names for a data set without headers. If the data set already has headers (e.g., we used the header = TRUE argument when importing the data) then a col.names argument will be ignored.
The na Argument
There are times when we want to specify certain values for NAs in the data set (e.g., we are going to pass the data to a program that only accepts -9999 as a nodata value). In this case, we want to set the NA value of our output file to the desired value, using the na argument. Let's see how this works:
# First, replace the speed in the 3rd row with NA, by using an index (square
# brackets to indicate the position of the value we want to replace)
carSpeeds$Speed[3] <- NA
head(carSpeeds)

  Color Speed     State
1  Blue    32 NewMexico
2   Red    45   Arizona
3  Blue    NA  Colorado
4 White    34   Arizona
5   Red    25   Arizona
6  Blue    41   Arizona

write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)
Now we'll set NA to -9999 when we write the new .csv file:
# Note - the na argument requires a string input
write.csv(carSpeeds,
          file = 'data/car-speeds-cleaned.csv',
          row.names = FALSE,
          na = '-9999')
And we see that the missing speed in the third row is written as -9999 in the output file.
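A round trip makes the behavior easy to verify. The sketch below writes a made-up data frame with one missing speed to a temporary file, prints the raw lines, and then reads the file back with a matching na.strings value so that -9999 becomes NA again; all object names are illustrative.

```r
# A small stand-in for the cars data with one missing speed
cars <- data.frame(Color = c("Blue", "Red"), Speed = c(32L, NA))

out_file <- tempfile(fileext = ".csv")

# Missing values are written as the string given to the na argument
write.csv(cars, file = out_file, row.names = FALSE, na = "-9999")
readLines(out_file)

# Reading back with the same marker restores the NA
restored <- read.csv(out_file, na.strings = "-9999")
restored$Speed
```

Without the na.strings argument on the way back in, -9999 would be read as an ordinary number and would distort any summary of the speeds.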
Key Points
Import data from a .csv file using the read.csv(...) function.
Understand some of the key arguments available for importing the data properly, including header, stringsAsFactors, as.is, and strip.white.
Write data to a new .csv file using the write.csv(...) function.
Understand some of the key arguments available for exporting the data properly, such as row.names, col.names, and na.
Source: https://swcarpentry.github.io/r-novice-inflammation/11-supp-read-write-csv/