R Only Reading First Column of Csv

Reading and Writing CSV Files

Overview

Teaching: 30 min
Exercises: 0 min

Questions

  • How do I read data from a CSV file into R?

  • How do I write data to a CSV file?

Objectives

  • Read in a .csv, and explore the arguments of the csv reader.

  • Write the altered data set to a new .csv, and explore the arguments.

The most common way that scientists store data is in Excel spreadsheets. While there are R packages designed to access data from Excel spreadsheets (e.g., gdata, RODBC, XLConnect, xlsx, RExcel), users often find it easier to save their spreadsheets as comma-separated values (CSV) files and then use R's built-in functionality to read and manipulate the data. In this short lesson, we'll learn how to read data from a .csv and write to a new .csv, and explore the arguments that let you read and write the data correctly for your needs.

Read a .csv and Explore the Arguments

Let's start by opening a .csv file containing information on the speeds at which cars of different colors were clocked in 45 mph zones in the four-corners states (CarSpeeds.csv). We will use the built-in read.csv(...) function call, which reads the data in as a data frame, and assign the data frame to a variable (using <-) so that it is stored in R's memory. Then we will explore some of the basic arguments that can be supplied to the function. First, open the RStudio project containing the scripts and data you were working on in episode 'Analyzing Patient Data'.

    # Import the data and look at the first six rows
    carSpeeds <- read.csv(file = 'data/car-speeds.csv')
    head(carSpeeds)

      Color Speed     State
    1  Blue    32 NewMexico
    2   Red    45   Arizona
    3  Blue    35  Colorado
    4 White    34   Arizona
    5   Red    25   Arizona
    6  Blue    41   Arizona

Changing Delimiters

The default delimiter of the read.csv() function is a comma, but you can use other delimiters by supplying the 'sep' argument to the function (e.g., typing sep = ';' allows a semi-colon separated file to be correctly imported - see ?read.csv for more information on this and other options for working with different file types).
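As a minimal sketch of the sep argument, the example below reads a few semicolon-separated rows from an inline string instead of a file. The sample rows and the name semi_text are made up for illustration; the text argument is passed through read.csv() to read.table().

```r
# Hypothetical semicolon-separated sample (not the lesson's data file)
semi_text <- "Color;Speed;State
Blue;32;NewMexico
Red;45;Arizona"

# sep = ';' tells read.csv() to split fields on semicolons instead of commas
semi_data <- read.csv(text = semi_text, sep = ';')
str(semi_data)
```

Without sep = ';', each row would be read as a single field, so the data frame would end up with one column instead of three.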

The read.csv(file = 'data/car-speeds.csv') call above will import the data, but we have not taken advantage of several handy arguments that can be helpful in loading the data in the format we want. Let's explore some of these arguments.

The default for read.csv(...) is to set the header argument to TRUE. This means that the first row of values in the .csv is set as header information (column names). If your data set does not have a header, set the header argument to FALSE:

    # The first row of the data without setting the header argument:
    carSpeeds[1, ]

      Color Speed     State
    1  Blue    32 NewMexico

    # The first row of the data if the header argument is set to FALSE:
    carSpeeds <- read.csv(file = 'data/car-speeds.csv', header = FALSE)
    carSpeeds[1, ]

         V1    V2    V3
    1 Color Speed State

Clearly this is not the desired behavior for this data set, but it may be useful if you have a dataset without headers.
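For a file that really has no header row, a small sketch like the one below may help. The inline sample and the name no_header_text are hypothetical; col.names is an argument of read.table() that read.csv() passes along, used here to supply the missing column names.

```r
# Hypothetical headerless sample: every row is data
no_header_text <- "Blue,32,NewMexico
Red,45,Arizona"

# header = FALSE keeps the first row as data;
# col.names supplies the column names that the input lacks
no_header <- read.csv(text = no_header_text, header = FALSE,
                      col.names = c('Color', 'Speed', 'State'))
no_header
```

If col.names is left out, R falls back to the V1, V2, V3 names shown earlier.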

The stringsAsFactors Argument

In older versions of R (prior to 4.0) this was perhaps the most important argument in read.csv(), particularly if you were working with categorical data. This is because the default behavior of R was to convert character strings into factors, which may make it difficult to do such things as replace values. It is important to be aware of this behaviour, which we will demonstrate. For example, let's say we find out that the data collector was color blind, and accidentally recorded green cars as being blue. In order to correct the data set, let's replace 'Blue' with 'Green' in the $Color column:

    # Here we will use R's `ifelse` function, in which we provide the test phrase,
    # the outcome if the result of the test is 'TRUE', and the outcome if the
    # result is 'FALSE'. We will also assign the results to the Color column,
    # using '<-'

    # First - reload the data with a header
    carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)

    carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
    carSpeeds$Color

      [1] "Green" "1"     "Green" "5"     "4"     "Green" "Green" "2"     "5"
     [10] "4"     "4"     "5"     "Green" "Green" "2"     "4"     "Green" "Green"
     [19] "5"     "Green" "Green" "Green" "4"     "Green" "4"     "4"     "4"
     [28] "4"     "5"     "Green" "4"     "5"     "2"     "4"     "2"     "2"
     [37] "Green" "4"     "2"     "4"     "2"     "2"     "4"     "4"     "5"
     [46] "2"     "Green" "4"     "4"     "2"     "2"     "4"     "5"     "4"
     [55] "Green" "Green" "2"     "Green" "5"     "2"     "4"     "Green" "Green"
     [64] "5"     "2"     "4"     "4"     "2"     "Green" "5"     "Green" "4"
     [73] "5"     "5"     "Green" "Green" "Green" "Green" "Green" "5"     "2"
     [82] "Green" "5"     "2"     "2"     "4"     "4"     "5"     "5"     "5"
     [91] "5"     "4"     "4"     "4"     "5"     "2"     "5"     "2"     "2"
    [100] "5"

What happened?!? It looks like 'Blue' was replaced with 'Green', but every other color was turned into a number (as a character string, given the quote marks before and after). This is because the colors of the cars were loaded as factors, and the integer code of each factor level was reported following replacement.

To see the internal structure, we can use another function, str(). In this case, the dataframe's internal structure includes the format of each column, which is what we are interested in. str() will be reviewed a little more in the lesson Data Types and Structures.

    # Reload the data with a header (the previous ifelse call modifies attributes)
    carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = TRUE)
    str(carSpeeds)

    'data.frame':	100 obs. of  3 variables:
     $ Color: Factor w/ 5 levels " Red","Black",..: 3 1 3 5 4 3 3 2 5 4 ...
     $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
     $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...

We can see that the $Color and $State columns are factors and $Speed is a numeric (integer) column.
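The following sketch, built on a small made-up factor (car_colors is a hypothetical name, not part of the lesson data), illustrates why the replacement produced numbers: a factor stores integer codes plus a set of level labels, and ifelse() hands back the codes for the values it does not replace.

```r
# A small factor with made-up colors
car_colors <- factor(c('Blue', 'Red', 'Blue', 'White'))

levels(car_colors)        # the labels, sorted alphabetically
as.integer(car_colors)    # the integer code stored for each value
as.character(car_colors)  # the labels recovered as plain strings

# ifelse() drops the factor attributes of its 'no' branch,
# so values that are not replaced show up as their integer codes
codes_result <- ifelse(car_colors == 'Blue', 'Green', car_colors)
codes_result
```

Here 'Red' and 'White' come back as "2" and "3", their positions in the sorted levels, which mirrors what happened to the car colors above.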

Now, let's load the dataset using stringsAsFactors=FALSE, and see what happens when we try to replace 'Blue' with 'Green' in the $Color column:

    carSpeeds <- read.csv(file = 'data/car-speeds.csv', stringsAsFactors = FALSE)
    str(carSpeeds)

    'data.frame':	100 obs. of  3 variables:
     $ Color: chr  "Blue" " Red" "Blue" "White" ...
     $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
     $ State: chr  "NewMexico" "Arizona" "Colorado" "Arizona" ...

    carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
    carSpeeds$Color

      [1] "Green" " Red"  "Green" "White" "Red"   "Green" "Green" "Black" "White"
     [10] "Red"   "Red"   "White" "Green" "Green" "Black" "Red"   "Green" "Green"
     [19] "White" "Green" "Green" "Green" "Red"   "Green" "Red"   "Red"   "Red"
     [28] "Red"   "White" "Green" "Red"   "White" "Black" "Red"   "Black" "Black"
     [37] "Green" "Red"   "Black" "Red"   "Black" "Black" "Red"   "Red"   "White"
     [46] "Black" "Green" "Red"   "Red"   "Black" "Black" "Red"   "White" "Red"
     [55] "Green" "Green" "Black" "Green" "White" "Black" "Red"   "Green" "Green"
     [64] "White" "Black" "Red"   "Red"   "Black" "Green" "White" "Green" "Red"
     [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
     [82] "Green" "White" "Black" "Black" "Red"   "Red"   "White" "White" "White"
     [91] "White" "Red"   "Red"   "Red"   "White" "Black" "White" "Black" "Black"
    [100] "White"

That's better! And we can see how the data is now read as character instead of factor. From R version 4.0 onwards we do not have to specify stringsAsFactors=FALSE; this is the default behavior.

The as.is Argument

This is an extension of the stringsAsFactors argument, but gives you control over individual columns. For example, if we want the colors of cars imported as strings, but we want the names of the states imported as factors, we would load the data set as:

    carSpeeds <- read.csv(file = 'data/car-speeds.csv', as.is = 1)

    # Note, the 1 applies as.is to the first column only

Now we can see that if we try to replace 'Blue' with 'Green' in the $Color column everything looks fine, while trying to replace 'Arizona' with 'Ohio' in the $State column returns the factor numbers for the names of states that we haven't replaced:

    str(carSpeeds)

    'data.frame':	100 obs. of  3 variables:
     $ Color: chr  "Blue" " Red" "Blue" "White" ...
     $ Speed: int  32 45 35 34 25 41 34 29 31 26 ...
     $ State: Factor w/ 4 levels "Arizona","Colorado",..: 3 1 2 1 1 1 3 2 1 2 ...

    carSpeeds$Color <- ifelse(carSpeeds$Color == 'Blue', 'Green', carSpeeds$Color)
    carSpeeds$Color

      [1] "Green" " Red"  "Green" "White" "Red"   "Green" "Green" "Black" "White"
     [10] "Red"   "Red"   "White" "Green" "Green" "Black" "Red"   "Green" "Green"
     [19] "White" "Green" "Green" "Green" "Red"   "Green" "Red"   "Red"   "Red"
     [28] "Red"   "White" "Green" "Red"   "White" "Black" "Red"   "Black" "Black"
     [37] "Green" "Red"   "Black" "Red"   "Black" "Black" "Red"   "Red"   "White"
     [46] "Black" "Green" "Red"   "Red"   "Black" "Black" "Red"   "White" "Red"
     [55] "Green" "Green" "Black" "Green" "White" "Black" "Red"   "Green" "Green"
     [64] "White" "Black" "Red"   "Red"   "Black" "Green" "White" "Green" "Red"
     [73] "White" "White" "Green" "Green" "Green" "Green" "Green" "White" "Black"
     [82] "Green" "White" "Black" "Black" "Red"   "Red"   "White" "White" "White"
     [91] "White" "Red"   "Red"   "Red"   "White" "Black" "White" "Black" "Black"
    [100] "White"

    carSpeeds$State <- ifelse(carSpeeds$State == 'Arizona', 'Ohio', carSpeeds$State)
    carSpeeds$State

      [1] "3"    "Ohio" "2"    "Ohio" "Ohio" "Ohio" "3"    "2"    "Ohio" "2"
     [11] "4"    "4"    "4"    "4"    "4"    "3"    "Ohio" "3"    "Ohio" "4"
     [21] "4"    "4"    "3"    "2"    "2"    "3"    "2"    "4"    "2"    "4"
     [31] "3"    "2"    "2"    "4"    "2"    "2"    "3"    "Ohio" "4"    "2"
     [41] "2"    "3"    "Ohio" "4"    "Ohio" "2"    "3"    "3"    "3"    "2"
     [51] "Ohio" "4"    "4"    "Ohio" "3"    "2"    "4"    "2"    "4"    "4"
     [61] "4"    "2"    "3"    "2"    "3"    "2"    "3"    "Ohio" "3"    "4"
     [71] "4"    "2"    "Ohio" "4"    "2"    "2"    "2"    "Ohio" "3"    "Ohio"
     [81] "4"    "2"    "2"    "Ohio" "Ohio" "Ohio" "4"    "Ohio" "4"    "4"
     [91] "4"    "Ohio" "Ohio" "3"    "2"    "2"    "4"    "3"    "Ohio" "4"

We can see that the $Color column is a character while $State is a factor.
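According to the read.table() documentation, as.is also accepts column names in place of positions. The sketch below uses a made-up inline sample (sample_text is a hypothetical name) to show the by-name form, which can be easier to read than a column number.

```r
# Hypothetical inline sample with the same columns as the lesson data
sample_text <- "Color,Speed,State
Blue,32,NewMexico
Red,45,Arizona"

# as.is = 'Color' leaves Color as character; the remaining
# character column (State) is converted to a factor
by_name <- read.csv(text = sample_text, as.is = 'Color')
str(by_name)
```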

Updating Values in a Factor

Suppose we want to keep the colors of cars as factors for some other operations we want to perform. Write code for replacing 'Blue' with 'Green' in the $Color column of the cars dataset without importing the data with stringsAsFactors=FALSE.

Solution

    carSpeeds <- read.csv(file = 'data/car-speeds.csv')

    # Replace 'Blue' with 'Green' in carSpeeds$Color without using the
    # stringsAsFactors or as.is arguments
    carSpeeds$Color <- ifelse(as.character(carSpeeds$Color) == 'Blue',
                              'Green',
                              as.character(carSpeeds$Color))

    # Convert colors back to factors
    carSpeeds$Color <- as.factor(carSpeeds$Color)
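An alternative that never leaves the factor representation is to rename the level itself. This is only a sketch on a made-up factor (paint is a hypothetical name), not part of the lesson's solution:

```r
# A small made-up factor standing in for carSpeeds$Color
paint <- factor(c('Blue', 'Red', 'Blue', 'White'))

# Rename the level; every value that was 'Blue' follows automatically
levels(paint)[levels(paint) == 'Blue'] <- 'Green'
paint
```

Because only the label changes, the result is still a factor and no conversion to character and back is needed.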

The strip.white Argument

It is not uncommon for mistakes to have been made when the data were recorded; for example, a space (whitespace) may have been inserted before a data value. By default this whitespace will be kept in the R environment, such that ' Red' will be recognized as a different value than 'Red'. In order to avoid this type of error, use the strip.white argument. Let's see how this works by checking for the unique values in the $Color column of our dataset:

Here, the data recorder added a space before the color of the car in one of the cells:

    # We use the built-in unique() function to extract the unique colors in our dataset
    unique(carSpeeds$Color)

    [1] Green  Red  White Red   Black
    Levels:  Red Black Green Red White

Oops, we see two values for red cars.

Let's try again, this time importing the data using the strip.white argument. NOTE - strip.white only takes effect when a delimiter is specified through the sep argument. read.csv() already defaults to sep = ',', but we state it explicitly here (the comma is the delimiter for most .csv files).

    carSpeeds <- read.csv(
      file = 'data/car-speeds.csv',
      stringsAsFactors = FALSE,
      strip.white = TRUE,
      sep = ','
    )

    unique(carSpeeds$Color)

    [1] "Blue"  "Red"   "White" "Black"

That's better!
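The effect can be reproduced on a tiny inline sample. The rows and the names spaced_text, kept, and stripped below are made up for illustration; the last line also shows base R's trimws(), which offers a similar cleanup for data that is already loaded.

```r
# Hypothetical sample in which one color has a leading space
spaced_text <- "Color,Speed
Blue,32
 Red,45
Red,25"

kept <- read.csv(text = spaced_text, stringsAsFactors = FALSE)
stripped <- read.csv(text = spaced_text, stringsAsFactors = FALSE,
                     strip.white = TRUE, sep = ',')

unique(kept$Color)      # ' Red' and 'Red' are treated as distinct values
unique(stripped$Color)  # the leading space has been removed

# trimws() removes leading and trailing whitespace after loading
unique(trimws(kept$Color))
```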

Specify Missing Data When Loading

It is common for data sets to have missing values, or mistakes. The convention for recording missing values often depends on the individual who collected the data, and they can be recorded as n.a., --, or empty cells " ". R recognises the reserved character string NA as a missing value, but not some of the examples above. Let's say the inflammation scale in the data set we used earlier, inflammation-01.csv, actually starts at 1 for no inflammation and the zero values (0) were a missed observation. Looking at the ?read.csv help page, is there an argument we could use to ensure all zeros (0) are read in as NA? Perhaps the car-speeds.csv data contains mistakes and the person measuring the car speeds could not accurately distinguish between "Black" and "Blue" cars. Is there a way to specify more than one 'string', such as "Black" and "Blue", to be replaced by NA?

Solution

    read.csv(file = "data/inflammation-01.csv", na.strings = "0")

or, in car-speeds.csv, use a character vector for multiple values.

    read.csv(
      file = 'data/car-speeds.csv',
      na.strings = c("Black", "Blue")
    )
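The same argument can cover the recording conventions mentioned earlier (n.a., --, and empty cells). The sketch below runs on a made-up inline sample; missing_text and with_na are hypothetical names.

```r
# Hypothetical sample using three different markers for missing values
missing_text <- "Color,Speed
Blue,32
n.a.,45
Red,--
,25"

# Each string listed in na.strings is read in as NA
with_na <- read.csv(text = missing_text, na.strings = c('n.a.', '--', ''))
with_na

# Count the missing values per column
colSums(is.na(with_na))
```

Note that supplying na.strings replaces the default of "NA", so add "NA" to the vector if the file uses that marker as well.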

Write a New .csv and Explore the Arguments

After altering our cars dataset by replacing 'Blue' with 'Green' in the $Color column, we now want to save the output. There are several arguments for the write.csv(...) function call, a few of which are particularly important for how the data are exported. Let's explore these now.

    # Export the data. The write.csv() function requires a minimum of two
    # arguments, the data to be saved and the name of the output file.
    write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv')

If you open the file, you'll see that it has header names, because the data had headers within R, but that there are numbers in the first column.

[Image: csv written without row.names argument]

The row.names Argument

This argument allows us to set the names of the rows in the output data file. R's default for this argument is TRUE, and since it does not know what else to name the rows for the cars data set, it resorts to using row numbers. To correct this, we can set row.names to FALSE:

    write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)

Now we see:

[Image: csv written with row.names argument]
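To check the difference without opening the files by hand, a round trip through temporary files can be sketched as follows. The data frame small and both file names are made up for illustration.

```r
# A small made-up data frame standing in for carSpeeds
small <- data.frame(Color = c('Green', 'Red'), Speed = c(32L, 45L))

with_rows <- tempfile(fileext = '.csv')
without_rows <- tempfile(fileext = '.csv')

write.csv(small, file = with_rows)                        # default row.names = TRUE
write.csv(small, file = without_rows, row.names = FALSE)

names(read.csv(with_rows))     # an extra first column holds the row numbers
names(read.csv(without_rows))  # only the original columns
```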

Setting Column Names

There is also a col.names argument, which can be used to set the column names for a data set without headers. If the data set already has headers (e.g., we used the header = TRUE argument when importing the data) then a col.names argument will be ignored.

The na Argument

There are times when we want to specify certain values for NAs in the data set (e.g., we are going to pass the data to a program that only accepts -9999 as a nodata value). In this case, we want to set the NA value of our output file to the desired value, using the na argument. Let's see how this works:

    # First, replace the speed in the 3rd row with NA, by using an index (square
    # brackets to indicate the position of the value we want to replace)
    carSpeeds$Speed[3] <- NA
    head(carSpeeds)

      Color Speed     State
    1  Blue    32 NewMexico
    2   Red    45   Arizona
    3  Blue    NA  Colorado
    4 White    34   Arizona
    5   Red    25   Arizona
    6  Blue    41   Arizona

    write.csv(carSpeeds, file = 'data/car-speeds-cleaned.csv', row.names = FALSE)

Now we'll set NA to -9999 when we write the new .csv file:

    # Note - the na argument requires a string input
    write.csv(carSpeeds,
              file = 'data/car-speeds-cleaned.csv',
              row.names = FALSE,
              na = '-9999')

And we see:

[Image: csv written with -9999 as NA]
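A round trip makes the effect of na visible from within R. This is a sketch on a made-up data frame (speeds, out_file, and restored are hypothetical names); the na.strings argument on the way back in is what turns -9999 into NA again.

```r
# A small made-up data frame with one missing speed
speeds <- data.frame(Color = c('Blue', 'Red'), Speed = c(32L, NA))

out_file <- tempfile(fileext = '.csv')
write.csv(speeds, file = out_file, row.names = FALSE, na = '-9999')

# Inspect the raw lines: the missing speed is written as -9999
readLines(out_file)

# Reading the file back with a matching na.strings restores the NA
restored <- read.csv(out_file, na.strings = '-9999')
restored$Speed
```

Without na.strings = '-9999' on import, the value would come back as the ordinary number -9999 and could distort later calculations.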

Key Points

  • Import data from a .csv file using the read.csv(...) function.

  • Understand some of the key arguments available for importing the data properly, including header, stringsAsFactors, as.is, and strip.white.

  • Write data to a new .csv file using the write.csv(...) function.

  • Understand some of the key arguments available for exporting the data properly, such as row.names, col.names, and na.

Source: https://swcarpentry.github.io/r-novice-inflammation/11-supp-read-write-csv/
