This example is an experiemnt studying the effect of gamma radiation on the number of chromosomal abnormalities (ca). The number of cells differ at each run of experiment. So it makes sense to normalize ca by the number of cells. The example comes from Faraway book chapter 5.3

data(dicentric, package="faraway")
dicentric

Visualization of data

dicentric$rate <- dicentric$ca / dicentric$cells
plot(rate ~ jitter(doseamt), dicentric)

plot(rate ~ doserate, dicentric, col = dicentric$doseamt)

Rate itself does not seem to linearly depend on predictors. Effect of predictors seems to be multiplicative

library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
dicentric$rate <- dicentric$ca / dicentric$cells
plot(log(rate) ~ jitter(doseamt), dicentric)

plot(log(rate) ~ log(doserate), dicentric, col = dicentric$doseamt)

Use a linear model

fit.linear <- lm(log(rate) ~ factor(doseamt) * log(doserate), dicentric)
summary(fit.linear)
## 
## Call:
## lm(formula = log(rate) ~ factor(doseamt) * log(doserate), data = dicentric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.16950 -0.05413  0.00585  0.06717  0.16343 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -2.76243    0.03352 -82.402  < 2e-16 ***
## factor(doseamt)2.5                1.64378    0.04741  34.672  < 2e-16 ***
## factor(doseamt)5                  2.77866    0.04741  58.610  < 2e-16 ***
## log(doserate)                     0.07561    0.02866   2.638 0.015364 *  
## factor(doseamt)2.5:log(doserate)  0.16483    0.04053   4.067 0.000553 ***
## factor(doseamt)5:log(doserate)    0.19480    0.04053   4.807 9.47e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1006 on 21 degrees of freedom
## Multiple R-squared:  0.9943, Adjusted R-squared:  0.9929 
## F-statistic: 729.4 on 5 and 21 DF,  p-value: < 2.2e-16
plot(residuals(fit.linear) ~ exp(fitted(fit.linear)), xlab = "fitted rate", ylab = "residuals")

In this dataset, the linear model actually seems fine if we transform the response (in the book the residual plot looks different as the response is not log-transformed)

Use Poisson regression

A more principled way is to use Poisson regression on the ca counts. We need to put log(cells) as a covariate as the effect of # of cells on ca count should be multiplicative

fit.poisson <- glm(ca ~ log(cells) + log(doserate)*factor(doseamt), family = poisson, data = dicentric)
summary(fit.poisson)
## 
## Call:
## glm(formula = ca ~ log(cells) + log(doserate) * factor(doseamt), 
##     family = poisson, data = dicentric)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.49901  -0.62229  -0.05021   0.76919   1.59525  
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      -2.76534    0.38116  -7.255 4.02e-13 ***
## log(cells)                        1.00252    0.05137  19.517  < 2e-16 ***
## log(doserate)                     0.07200    0.03547   2.030 0.042403 *  
## factor(doseamt)2.5                1.62984    0.10273  15.866  < 2e-16 ***
## factor(doseamt)5                  2.76673    0.12287  22.517  < 2e-16 ***
## log(doserate):factor(doseamt)2.5  0.16111    0.04837   3.331 0.000866 ***
## log(doserate):factor(doseamt)5    0.19316    0.04299   4.493 7.03e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 916.127  on 26  degrees of freedom
## Residual deviance:  21.748  on 20  degrees of freedom
## AIC: 211.15
## 
## Number of Fisher Scoring iterations: 4

The coefficient of log(cells) is very close to 1

Poisson regression for the rate directly

To improve intepretation, we can force of coefficient of log(cells) to be 1

fit.poisson.offset <- glm(ca ~ offset(log(cells)) + log(doserate)*factor(doseamt), family = poisson, data = dicentric)
summary(fit.poisson.offset)
## 
## Call:
## glm(formula = ca ~ offset(log(cells)) + log(doserate) * factor(doseamt), 
##     family = poisson, data = dicentric)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.49101  -0.62473  -0.05078   0.76786   1.59115  
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      -2.74671    0.03426 -80.165  < 2e-16 ***
## log(doserate)                     0.07178    0.03518   2.041 0.041299 *  
## factor(doseamt)2.5                1.62542    0.04946  32.863  < 2e-16 ***
## factor(doseamt)5                  2.76109    0.04349  63.491  < 2e-16 ***
## log(doserate):factor(doseamt)2.5  0.16122    0.04830   3.338 0.000844 ***
## log(doserate):factor(doseamt)5    0.19350    0.04243   4.561  5.1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 4753.00  on 26  degrees of freedom
## Residual deviance:   21.75  on 21  degrees of freedom
## AIC: 209.16
## 
## Number of Fisher Scoring iterations: 4

The deviance residuals look good. Also, the residual deviance is very small compared to the null deviance. We keep seeing a strong interaction effect.