15 vroom

The vroom package provides a fast way to read and write data. Try vroom when you’re working with very large data sets. Install it with install.packages("vroom") if you haven’t already.

15.1 Read

vroom::vroom() reads in delimited data (csv, tsv, etc.).

#> Observations: 3,496
#> Variables: 9
#> chr [6]: city, state, city_size, mode, state_abb, state_region
#> dbl [3]: n, percent, moe
#> 
#> Call `spec()` for a copy-pastable column specification
#> Specify the column types with `col_types` to quiet this message
#> # A tibble: 3,496 x 9
#>   city       state    city_size mode      n percent   moe state_abb state_region
#>   <chr>      <chr>    <chr>     <chr> <dbl>   <dbl> <dbl> <chr>     <chr>       
#> 1 Aberdeen … South D… Small     Bike    110     0.8   0.5 SD        North Centr…
#> 2 Acworth c… Georgia  Small     Bike      0     0     0.4 GA        South       
#> 3 Addison v… Illinois Small     Bike     43     0.2   0.3 IL        North Centr…
#> 4 Adelanto … Califor… Small     Bike      0     0     0.5 CA        West        
#> 5 Adrian ci… Michigan Small     Bike    121     1.5   1   MI        North Centr…
#> 6 Agawam To… Massach… Small     Bike      0     0     0.2 MA        Northeast   
#> # … with 3,490 more rows

vroom::vroom() is fast because it doesn’t read all the data up front. It first indexes where each value is located in the file, then reads the values lazily, only when you actually use them. This all happens automatically, and you shouldn’t need to think much about how it works.

vroom has the same interface as the readr functions. The arguments you’re familiar with from read_csv(), such as col_types and skip, also work in vroom::vroom().
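As a rough illustration of the readr-style arguments, here is a minimal sketch that builds a small, made-up file and reads it with skip and col_types. The file contents, the temporary path, and the name commute are assumptions for this example only; they are not the data set shown in the output above.

```r
library(vroom)

# Hypothetical example file: one junk line, then a header and two rows.
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "exported for illustration only",
    "city,mode,n",
    "Aberdeen,Bike,110",
    "Acworth,Bike,0"
  ),
  path
)

# skip drops the junk line; col_types sets the column types explicitly,
# just as these arguments would in readr::read_csv().
commute <- vroom(
  path,
  delim = ",",
  skip = 1,
  col_types = cols(
    city = col_character(),
    mode = col_character(),
    n = col_double()
  )
)

commute
```

Because col_types is supplied, vroom should not print the column-guessing message for this read.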

If you want to see how the columns were parsed, call spec().

#> Observations: 3,496
#> Variables: 9
#> chr [6]: city, state, city_size, mode, state_abb, state_region
#> dbl [3]: n, percent, moe
#> 
#> Call `spec()` for a copy-pastable column specification
#> Specify the column types with `col_types` to quiet this message
#> cols(
#>   city = col_character(),
#>   state = col_character(),
#>   city_size = col_character(),
#>   mode = col_character(),
#>   n = col_double(),
#>   percent = col_double(),
#>   moe = col_double(),
#>   state_abb = col_character(),
#>   state_region = col_character()
#> )

You can then copy this output and supply it, with or without changes, to col_types.

#> # A tibble: 3,496 x 9
#>   city       state    city_size mode      n percent   moe state_abb state_region
#>   <chr>      <chr>    <chr>     <chr> <dbl>   <dbl> <dbl> <chr>     <chr>       
#> 1 Aberdeen … South D… Small     Bike    110     0.8   0.5 SD        North Centr…
#> 2 Acworth c… Georgia  Small     Bike      0     0     0.4 GA        South       
#> 3 Addison v… Illinois Small     Bike     43     0.2   0.3 IL        North Centr…
#> 4 Adelanto … Califor… Small     Bike      0     0     0.5 CA        West        
#> 5 Adrian ci… Michigan Small     Bike    121     1.5   1   MI        North Centr…
#> 6 Agawam To… Massach… Small     Bike      0     0     0.2 MA        Northeast   
#> # … with 3,490 more rows

This is useful if you want to ensure that vroom() parses the data exactly as you want.
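The workflow described above might look something like the following sketch. It uses a tiny made-up file rather than the data set in the output, and the names path, guessed, col_spec, and strict are hypothetical. The second read supplies a hand-edited specification, changing n from a double to an integer, to show the "with changes" case.

```r
library(vroom)

# Hypothetical example file with two columns.
path <- tempfile(fileext = ".csv")
writeLines(c("city,n", "Aberdeen,110", "Acworth,0"), path)

# First read: let vroom guess the column types.
guessed <- vroom(path, delim = ",")

# Print the specification vroom settled on, ready to copy.
col_spec <- spec(guessed)
col_spec

# Second read: paste the specification into col_types, editing it as needed.
# Here n is changed from col_double() to col_integer().
strict <- vroom(
  path,
  delim = ",",
  col_types = cols(
    city = col_character(),
    n = col_integer()
  )
)

strict
```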

15.2 Write

You can also write with vroom.

vroom::vroom_write() is the speedy analogue of the readr write functions (write_delim(), write_csv(), etc.). Note that, by default, vroom::vroom_write() uses tabs as its delimiter, creating a .tsv file.

If you want to write to a .csv file, use delim = ",".
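Here is a minimal sketch of both cases, writing a small made-up data frame to temporary files. The data frame commute and the two paths are assumptions for illustration.

```r
library(vroom)

# Hypothetical data to write out.
commute <- data.frame(
  city = c("Aberdeen", "Acworth"),
  n = c(110, 0)
)

tsv_path <- tempfile(fileext = ".tsv")
csv_path <- tempfile(fileext = ".csv")

# Default: tab-delimited output.
vroom_write(commute, tsv_path)

# Comma-delimited output.
vroom_write(commute, csv_path, delim = ",")

# Inspect the raw lines that were written.
readLines(tsv_path)
readLines(csv_path)
```

Note that vroom_write() does not pick the delimiter from the file extension in this sketch, so the delim argument and the extension are kept in agreement by hand.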