What do these R glm error messages mean: “Error: no valid set of coefficients has been found: please supply starting values”

Here are two related questions. They are not duplicates of mine: the first has a solution specific to its data set, and the second involves glm failing when start is supplied together with an offset.

https://stackoverflow.com/questions/31342637/error-please-supply-starting-values
https://stackoverflow.com/questions/8212063/r-glm-starting-values-not-accepted-log-link

I have the following dataset:

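library(data.table)

# 10 groups; the first two have a true success probability of 0
df <- data.frame(names = factor(1:10))
set.seed(0)
df$probs <- c(0, 0, runif(8, 0, 1))
# 50 Bernoulli draws per group, stored as a list column
df$response <- lapply(df$probs, function(i) {
  rbinom(50, 1, i)
})

# One row per draw (10 groups x 50 draws = 500 rows)
dt <- data.table(df)
dt <- dt[, list(response = unlist(response)), by = c('names', 'probs')]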

such that dt is:

> dt
     names     probs response
  1:     1 0.0000000        0
  2:     1 0.0000000        0
  3:     1 0.0000000        0
  4:     1 0.0000000        0
  5:     1 0.0000000        0
 ---
496:    10 0.9446753        0
497:    10 0.9446753        1
498:    10 0.9446753        1
499:    10 0.9446753        1
500:    10 0.9446753        1

I am trying to fit a binomial regression model with the identity link (i.e. a linear probability model rather than a logistic one), using:

lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link = 'identity'))
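Written out, with p_i denoting the value of probs for observation i, the model being fitted is sketched below. Unlike the logit link, the identity link does not keep the fitted probability inside (0, 1) automatically, so the constraint has to be satisfied by the coefficients themselves:

```latex
y_i \sim \operatorname{Bernoulli}(\mu_i), \qquad
\mu_i = \beta_0 + \beta_1 p_i, \qquad
0 < \mu_i < 1 \quad \text{for every } i
```

The data above were generated with beta_0 = 0 and beta_1 = 1, so the true mu_i is exactly 0 for the first 100 observations, i.e. on the edge of that allowed region.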

This gives an error:

Error: no valid set of coefficients has been found: please supply starting values

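One way to see where this first error comes from is to replay the first iteration of the IRLS algorithm that glm.fit uses. The base-R sketch below assumes glm.fit's default binomial initialisation for 0/1 responses, mustart = (y + 0.5) / 2, and rebuilds the data without data.table (the RNG calls run in the same order, so the draws should match dt). The names dat, first_step and mu1 are illustrative only. With the identity link the working response reduces to y and every observation gets the same working weight, so the first step amounts to an ordinary least-squares fit of response on probs. glm.fit requires every fitted mu to lie strictly between 0 and 1; if that first fit already violates this, there are no earlier coefficients to fall back on and glm stops with "no valid set of coefficients has been found".

```r
# Rebuild the question's data in base R; the RNG calls happen in the same
# order as in the question, so the draws should be identical
set.seed(0)
probs <- c(0, 0, runif(8, 0, 1))
response <- unlist(lapply(probs, function(p) rbinom(50, 1, p)))
dat <- data.frame(names = factor(rep(1:10, each = 50)),
                  probs = rep(probs, each = 50),
                  response = response)

fam <- binomial(link = "identity")

# Default binomial starting values for 0/1 data: (y + 0.5) / 2, i.e. 0.25 or 0.75
mustart <- (dat$response + 0.5) / 2
eta <- fam$linkfun(mustart)

# IRLS working response and (squared) working weights, as in glm.fit.
# For the identity link mu.eta() is 1, so z reduces to y, and mu * (1 - mu)
# is 0.1875 for both 0.25 and 0.75, so every observation gets the same weight.
z <- eta + (dat$response - mustart) / fam$mu.eta(eta)
w <- fam$mu.eta(eta)^2 / fam$variance(mustart)

# Weighted least squares with equal weights is plain OLS of y on probs
first_step <- lm(z ~ probs, data = dat, weights = w)
mu1 <- fam$linkinv(fitted(first_step))

print(coef(first_step))
print(range(mu1))
# FALSE means the first step already left the region 0 < mu < 1
print(fam$validmu(mu1))
```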
I tried to fix this by supplying a start argument, but then I get a different error:

> lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link='identity'), start = c(0, 1))
Error: cannot find valid starting values: please specify some

At this point these errors make no sense to me, and I have no idea what to do.
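The second error is raised before any iteration takes place: when start is supplied, glm.fit computes the fitted probabilities implied by it and checks them with the family's validmu() function, which for the binomial family accepts only values strictly between 0 and 1. A minimal sketch, assuming the same probabilities as in dt (X, mu_bad and mu_ok are illustrative names), shows why c(0, 1) is rejected while c(0.5, 0.5) is accepted:

```r
set.seed(0)
probs <- rep(c(0, 0, runif(8, 0, 1)), each = 50)  # same probabilities as in dt
X <- cbind(1, probs)                               # design matrix: intercept + probs
fam <- binomial(link = "identity")

# Fitted probabilities implied by each candidate start vector
mu_bad <- fam$linkinv(drop(X %*% c(0, 1)))      # equals probs: 100 values are exactly 0
mu_ok  <- fam$linkinv(drop(X %*% c(0.5, 0.5)))  # 0.5 + 0.5 * probs, strictly inside (0, 1)

fam$validmu(mu_bad)  # FALSE -> "cannot find valid starting values"
fam$validmu(mu_ok)   # TRUE
```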

EDIT: @iraserd has shed some more light on this problem. Using start = c(0.5, 0.5), I get:

> lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link='identity'), start = c(0.5, 0.5))
There were 25 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: step size truncated: out of bounds
2: step size truncated: out of bounds
...
24: step size truncated: out of bounds
25: glm.fit: algorithm stopped at boundary value

and

> summary(lm2)

Call:
glm(formula = response ~ probs, family = binomial(link = "identity"), 
    data = dt, start = c(0.5, 0.5))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4023  -0.6710   0.3389   0.4641   1.7897  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.486e-08  1.752e-06   0.008    0.993    
probs       9.995e-01  2.068e-03 483.372   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 69312  on 49999  degrees of freedom
Residual deviance: 35984  on 49998  degrees of freedom
AIC: 35988

Number of Fisher Scoring iterations: 24
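The "step size truncated: out of bounds" warnings come from glm.fit's step-halving: whenever an iteration proposes coefficients whose fitted probabilities fall outside (0, 1), it moves them halfway back towards the previous valid coefficients, repeating until they are valid, and warns once for each iteration in which this happens. Because this happened, it also emits "glm.fit: algorithm stopped at boundary value" at the end. The helper below is a hypothetical, simplified sketch of that rule (not glm.fit's actual code), and the proposed step c(-0.1, 1.1) is made up for illustration:

```r
# Hypothetical helper: a simplified version of the step-halving loop in glm.fit
halve_until_valid <- function(coefold, coefnew, X, validmu, maxit = 25) {
  coef <- coefnew
  halvings <- 0
  while (!validmu(drop(X %*% coef))) {
    if (halvings >= maxit) stop("cannot correct step size")
    coef <- (coef + coefold) / 2  # move halfway back towards the last valid coefficients
    halvings <- halvings + 1
  }
  list(coef = coef, halvings = halvings)
}

set.seed(0)
probs <- rep(c(0, 0, runif(8, 0, 1)), each = 50)
X <- cbind(1, probs)
validmu <- binomial(link = "identity")$validmu

# Made-up proposed step that pushes the intercept below 0
res <- halve_until_valid(coefold = c(0.5, 0.5), coefnew = c(-0.1, 1.1),
                         X = X, validmu = validmu)
res$halvings  # 1
res$coef      # 0.2 0.8
```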

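I strongly suspect this has something to do with the fact that some of the responses were generated with a true probability of zero: as the fit approaches the true values (intercept 0, coefficient of probs 1), the fitted probabilities for those observations approach 0, which is the edge of the range the binomial family allows.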
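The suspicion can be checked on the observations from the two zero-probability groups alone. Their responses are all 0 and their probs are all 0, so their log-likelihood depends only on the intercept b0, as 100 * log(1 - b0). A short sketch (loglik_identity is a hypothetical helper) evaluates it for a few intercept values:

```r
# The 100 observations from the two groups with true probability 0:
# rbinom(50, 1, 0) always returns 0, and probs is 0 for all of them
y0 <- rep(0, 100)
p0 <- rep(0, 100)

fam <- binomial(link = "identity")

# Bernoulli log-likelihood under the identity link, mu = b0 + b1 * probs;
# returns NA where glm.fit's validity check (0 < mu < 1) would reject mu
loglik_identity <- function(b, y, probs) {
  mu <- b[1] + b[2] * probs
  if (!fam$validmu(mu)) return(NA_real_)
  sum(dbinom(y, size = 1, prob = mu, log = TRUE))
}

b0_grid <- c(0.1, 0.01, 1e-4, 1e-8, 0)
ll <- sapply(b0_grid, function(b0) loglik_identity(c(b0, 1), y0, p0))
print(ll)
```

The values increase towards 0 as the intercept shrinks, but an intercept of exactly 0 gives fitted probabilities of 0, which the validity check rejects. The best fit for these observations therefore sits on the boundary of the allowed region, consistent with the near-zero intercept in the summary above.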
