Well then: welcome to the R world 😉
Here you go
Setting up the code
urls <- c( "http://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html", "http://en.wikipedia.org/wiki/Xz", "xxxxx" ) readUrl <- function(url) { out <- tryCatch( { # Just to highlight: if you want to use more than one # R expression in the "try" part then you'll have to # use curly brackets. # 'tryCatch()' will return the last evaluated expression # in case the "try" part was completed successfully message("This is the 'try' part") readLines(con=url, warn=FALSE) # The return value of `readLines()` is the actual value # that will be returned in case there is no condition # (e.g. warning or error). # You don't need to state the return value via `return()` as code # in the "try" part is not wrapped inside a function (unlike that # for the condition handlers for warnings and error below) }, error=function(cond) { message(paste("URL does not seem to exist:", url)) message("Here's the original error message:") message(cond) # Choose a return value in case of error return(NA) }, warning=function(cond) { message(paste("URL caused a warning:", url)) message("Here's the original warning message:") message(cond) # Choose a return value in case of warning return(NULL) }, finally={ # NOTE: # Here goes everything that should be executed at the end, # regardless of success or error. # If you want more than one expression to be executed, then you # need to wrap them in curly brackets ({...}); otherwise you could # just have written 'finally=<expression>' message(paste("Processed URL:", url)) message("Some other message at the end") } ) return(out) }
Applying the code
> y <- lapply(urls, readUrl) Processed URL: http://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html Some other message at the end Processed URL: http://en.wikipedia.org/wiki/Xz Some other message at the end URL does not seem to exist: xxxxx Here's the original error message: cannot open the connection Processed URL: xxxxx Some other message at the end Warning message: In file(con, "r") : cannot open file 'xxxxx': No such file or directory
Investigating the output
> head(y[[1]]) [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">" [2] "<html><head><title>R: Functions to Manipulate Connections</title>" [3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">" [4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">" [5] "</head><body>" [6] "" > length(y) [1] 3 > y[[3]] [1] NA
Additional remarks
tryCatch
tryCatch
returns the value associated to executing expr
unless there’s an error or a warning. In this case, specific return values (see return(NA)
above) can be specified by supplying a respective handler function (see arguments error
and warning
in ?tryCatch
). These can be functions that already exist, but you can also define them within tryCatch()
(as I did above).
The implications of choosing specific return values of the handler functions
As we’ve specified that NA
should be returned in case of error, the third element in y
is NA
. If we’d have chosen NULL
to be the return value, the length of y
would just have been 2
instead of 3
as lapply()
will simply “ignore” return values that are NULL
. Also note that if you don’t specify an explicit return value via return()
, the handler functions will return NULL
(i.e. in case of an error or a warning condition).
“Undesired” warning message
As warn=FALSE
doesn’t seem to have any effect, an alternative way to suppress the warning (which in this case isn’t really of interest) is to use
suppressWarnings(readLines(con=url))
instead of
readLines(con=url, warn=FALSE)
Multiple expressions
Note that you can also place multiple expressions in the “actual expressions part” (argument expr
of tryCatch()
) if you wrap them in curly brackets (just like I illustrated in the finally
part).