What is integer overflow in R and how can it happen?

You can answer many of your questions by reading the help page ?integer. It says:

R uses 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9.
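As a minimal sketch of what that limit means in practice (the variable names here are illustrative, not from the help page), adding 1L to the largest integer yields NA with a warning, while the same sum carried out in double arithmetic succeeds:

```r
# Largest representable integer on the 32-bit integer type: 2^31 - 1
int_max <- .Machine$integer.max
int_max                      # 2147483647

# Integer + integer stays integer, so this overflows.
# R returns NA and emits a warning; the warning is suppressed here.
overflowed <- suppressWarnings(int_max + 1L)
is.na(overflowed)            # TRUE

# Integer + double is computed as double, so there is no overflow.
as_double <- int_max + 1
typeof(as_double)            # "double"
as_double == 2147483648      # TRUE
```

The only difference between the two sums is the `L` suffix: `1L` is an integer literal, whereas a bare `1` is a double.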

Expanding to larger integers is under consideration by R Core but it’s not going to happen in the near future.

If you want a “bignum” capacity, install Martin Maechler’s Rmpfr package. I recommend ‘Rmpfr’ because of its author’s reputation: Martin Maechler is also heavily involved in the development of the Matrix package and is a member of R Core. There are alternatives, including arithmetic packages such as ‘gmp’ and ‘Brobdingnag’, and ‘Ryacas’ (the last also offers a symbolic math interface).
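A rough sketch of what these packages look like in use, assuming they are installed (each call is guarded so the snippet still runs if they are not). It uses `mpfr()` from ‘Rmpfr’ and `as.bigz()` from ‘gmp’; the 128-bit precision and the variable names are arbitrary choices for illustration:

```r
# Arbitrary-precision floating point with Rmpfr (skipped if not installed)
if (requireNamespace("Rmpfr", quietly = TRUE)) {
  m <- Rmpfr::mpfr(.Machine$integer.max, precBits = 128)
  print(m + 1)   # no overflow: the value is held with 128 bits of precision
}

# Exact big-integer arithmetic with gmp (skipped if not installed)
two_to_64 <- NULL
if (requireNamespace("gmp", quietly = TRUE)) {
  big <- gmp::as.bigz(.Machine$integer.max)
  print(big + 1)                      # exact integer result past the 32-bit limit
  two_to_64 <- gmp::as.bigz(2)^64     # far beyond what a double holds exactly
  print(two_to_64)
}
```

‘Rmpfr’ gives multiple-precision floating point, while ‘gmp’ gives exact integers and rationals, so the choice depends on whether you need exactness or just more digits.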

Next, to respond to the critical comments in the answer you linked to, and to help you assess their relevance to your work, consider this: if the same statistical functionality were available in one of those “modern” languages as there is in R, you would probably see users migrating in that direction. But I would say that migration, and certainly growth, is in the R direction at the moment. R was built by statisticians for statistics.

There was at one time a Lisp variant with a statistics package, Xlisp-Stat, but its main developer and proponent is now a member of R Core. On the other hand, one of the earliest R developers, Ross Ihaka, suggests working toward development in a Lisp-like language. There is a compiled language called Clojure (pronounced as English speakers would say “closure”) with an experimental interface to R, Rincanter.

Update:

Newer versions of R (3.0.0 and later) have 53-bit integers of a sort, using the mantissa of the numeric (double) type. When an element of an “integer” vector is assigned a value in excess of ‘.Machine$integer.max’, the entire vector is coerced to “numeric”, a.k.a. “double”. The maximum value for integers remains as it was; however, integer vectors may be coerced to doubles to preserve accuracy in cases that would formerly have generated overflow. Unfortunately, each matrix and array dimension is still capped at ‘.Machine$integer.max’, although on 64-bit builds the total length of a vector or list may now exceed it (“long vectors”).
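The coercion described here can be seen in a short sketch (the vector `x` is just an illustration). It also shows where the “53 bit” limit bites: beyond 2^53 a double can no longer distinguish adjacent whole numbers.

```r
x <- 1:3
type_before <- typeof(x)            # "integer"

# The right-hand side is a double, so the whole vector becomes double
x[2] <- .Machine$integer.max + 1
type_after <- typeof(x)             # "double"
x[2] == 2147483648                  # TRUE

# Doubles hold whole numbers exactly only up to 2^53
exact_limit_hit <- (2^53 == 2^53 + 1)
exact_limit_hit                     # TRUE: the +1 is lost
```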

When reading large values from files, it is probably safer to read them in as character and then convert. If the conversion coerces any values to NA, there will be a warning.
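One way this might look, as a sketch with made-up inline data standing in for a file (the column names and values are hypothetical): read everything as character with `colClasses`, then convert explicitly and check for NA.

```r
# Hypothetical data; the second value is too large for a 32-bit integer
csv_text <- "id,value\n1,2147483647\n2,9007199254740993\n"

# Read every column as character so nothing is converted silently
df <- read.csv(text = csv_text, colClasses = "character")
col_type <- typeof(df$value)        # "character"

# Converting to integer turns the out-of-range value into NA.
# R warns about this; the warning is suppressed here only to keep output clean.
as_int <- suppressWarnings(as.integer(df$value))
na_flags <- is.na(as_int)           # FALSE TRUE

# Converting to double avoids NA, but a value this large cannot be
# stored exactly as a double, so the original string is still worth keeping.
as_dbl <- as.numeric(df$value)
```

Keeping the character column around means the original digits are never lost, whichever numeric type you convert to afterwards.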
