I’ve posted an issue on the dplyr GitHub page. I can reproduce the results using the code below. It has to do with whether the csv contains a column of row names without a header. read_csv and read.csv handle this differently, and so produce different results with filter.
First, the case when it works
Going from write_csv to read_csv or read.csv, both work fine with filter:
    library(readr)
    library(dplyr)
    mtcars %>% write_csv("~/Desktop/test.csv")
    test_r <- read_csv("~/Desktop/test.csv") %>% filter(hp > 100)
    test.r <- read.csv("~/Desktop/test.csv") %>% filter(hp > 100)
Now for when it fails
When the csv is generated through a process like write.csv, it includes a column of row names without a header, unless the person changes the default of row.names to FALSE. When reading the data back in, read_csv does not fill in the header where the row names are, but read.csv imputes an X. Thus, filter on a read.csv import sees a name for every column, while filter on a read_csv import sees an empty column name, at least where the row names are.
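To see the header difference directly, here is a minimal sketch that uses only base R. The temporary file and the variable names (csv_path, header_line, df_base) are my own and not part of the original report. It writes mtcars with the default row.names = TRUE, prints the raw header line, and prints the name that read.csv assigns to the unnamed column.

```r
# Write mtcars with the write.csv defaults (row.names = TRUE) to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)

# The raw header line begins with an empty quoted field for the row names
header_line <- readLines(csv_path, n = 1)
print(header_line)

# read.csv (check.names = TRUE by default) turns the empty name into "X"
df_base <- read.csv(csv_path)
print(names(df_base)[1])
```

What read_csv does with that empty field may depend on the installed readr version, so it is worth checking names() on the imported data before filtering.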
The following code should fail at test1_r %>% filter(hp > 100) with this error:
    Error in filter_impl(.data, dots) : attempt to use zero-length variable name
Again, the big difference is how write.csv produces the csv.
    mtcars %>% write.csv("~/Desktop/test1.csv")
    test1_r <- read_csv("~/Desktop/test1.csv")
    test1_r %>% str()
    # should fail here
    test1_r %>% filter(hp > 100)
    test1.r <- read.csv("~/Desktop/test1.csv")
    test1.r %>% str()
    test1.r %>% filter(hp > 100)
To solve the problem, you can use read.csv, as mentioned above by @hackR. Or you can drop the first column when you know the csv behaves like this:
    test1_r <- read_csv("~/Desktop/test1.csv")[-1]
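If you do not know in advance whether a file has the unnamed row-name column, one option is to check the header before dropping anything. The sketch below is only an illustration: has_rowname_col is a hypothetical helper of my own, not a readr or dplyr function, and it assumes a comma-separated file whose first header field is either blank or an empty pair of quotes when row names were written.

```r
# Hypothetical helper: TRUE when the first header field of a csv is blank,
# which is what write.csv produces when row.names = TRUE
has_rowname_col <- function(path) {
  header <- readLines(path, n = 1)
  first_field <- strsplit(header, ",", fixed = TRUE)[[1]][1]
  first_field %in% c("", '""')
}

# One file with row names, one without
with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")
write.csv(mtcars, with_rn)
write.csv(mtcars, without_rn, row.names = FALSE)

# Drop the first column only when the header says it is the row-name column
df <- read.csv(with_rn)
if (has_rowname_col(with_rn)) df <- df[-1]
print(names(df))
```

The same check could be placed in front of a read_csv call; it is shown with read.csv here so the sketch runs without extra packages.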
Or, if you are in control of the csv-creation step, you can add the option row.names = FALSE to write.csv:
    mtcars %>% write.csv("~/Desktop/test2.csv", row.names = FALSE)
    test2.r <- read_csv("~/Desktop/test2.csv")
    test2.r %>% str()
    test2.r %>% filter(hp > 100)
or use write_csv as shown above.