odds.n.ends was created to take the results from a binary logistic
regression model estimated using the glm() function and compute model
significance, model fit, and the odds ratios and 95% confidence intervals
typically reported from binary logistic regression analyses.
The small demonstration data set includes three variables. The first
is a binary outcome variable (sick) with two values, 1 and 0, where 1
represents sick and 0 represents not sick. The second is an integer
predictor representing age in years (age). The third is a three-category
nominal predictor showing smoking status (smoke).
# enter demo data
sick <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0)
age <- c(23, 25, 26, 34, 54, 46, 48, 95, 81, 42, 62, 25, 31, 49, 57, 52, 54, 63, 61, 50,
43, 35, 26, 74, 34, 46, 43, 65, 81, 42, 62, 25, 21, 47, 51, 22, 34, 59, 26, 55)
smoke <- c('Former', 'Former', 'Former', 'Never', 'Current', 'Current', 'Current', 'Current', 'Never', 'Former', 'Never', 'Former', 'Current', 'Former', 'Never', 'Current', 'Current', 'Current', 'Former', 'Never','Former', 'Former', 'Former', 'Never', 'Current', 'Current', 'Current', 'Current', 'Never', 'Former', 'Never', 'Former', 'Current', 'Former', 'Never', 'Current', 'Current', 'Current', 'Former', 'Never')
# create data frame
smokeData <- data.frame(sick, age, smoke)
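Because smoke is a nominal variable, it helps to know which category R will treat as the reference group before estimating a model. The short sketch below uses a small made-up vector (smoke_example is hypothetical, not part of the demo data) to show that factor() orders levels alphabetically by default, so Current comes first, and that relevel() can move a different category to the front if another reference group is preferred.

```r
# Hypothetical mini-vector, for illustration only (not the demo data)
smoke_example <- c("Former", "Never", "Current", "Former")

# factor() sorts the unique values, so the first level is "Current"
smoke_factor <- factor(smoke_example)
levels(smoke_factor)

# relevel() moves the chosen category to the first (reference) position
smoke_releveled <- relevel(smoke_factor, ref = "Never")
levels(smoke_releveled)
```

In a regression model, the first level is the category the other levels are compared against.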
The glm() function will be used to estimate a binary logistic regression
model predicting the sick outcome based on age and smoke.
# estimate the logistic regression model object
logisticModel <- glm(formula = sick ~ age + smoke, data = smokeData, na.action = na.exclude, family = binomial(logit))
# print model summary for the logistic model object
summary(object = logisticModel)
##
## Call:
## glm(formula = sick ~ age + smoke, family = binomial(logit), data = smokeData,
## na.action = na.exclude)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.28649    1.58753  -2.070   0.0384 *
## age          0.10442    0.03711   2.814   0.0049 **
## smokeFormer -1.12544    0.94693  -1.189   0.2346
## smokeNever  -2.47194    1.25103  -1.976   0.0482 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 54.548 on 39 degrees of freedom
## Residual deviance: 37.896 on 36 degrees of freedom
## AIC: 45.896
##
## Number of Fisher Scoring iterations: 5
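The summary contains model coefficients, coefficient significance, and deviance and AIC, which are measures of lack of fit of the model. This information is useful in determining which of the predictors is significant and whether the deviance (lack of fit) was reduced between a null model with no predictors in it and the estimated model. However, it does not directly report the overall model significance, measures of fit such as the percent correctly predicted, or the odds ratios and 95% confidence intervals. The odds.n.ends() function computes these from the model object.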
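The comparison between the null and estimated models can be made concrete with a little arithmetic. The sketch below is not the package's internal code; it copies the two deviances and their degrees of freedom from the summary() output above and forms a likelihood-ratio chi-squared test with base R's pchisq(). The variable names are chosen here for illustration.

```r
# Values copied from the summary() output above
null_deviance     <- 54.548
residual_deviance <- 37.896
null_df           <- 39
residual_df       <- 36

# Reduction in deviance and the number of parameters that produced it
chi_squared <- null_deviance - residual_deviance
model_df    <- null_df - residual_df

# Upper-tail probability of a chi-squared value this large
p_value <- pchisq(chi_squared, df = model_df, lower.tail = FALSE)

round(c(chi_squared = chi_squared, d.f. = model_df, p = p_value), 3)
```

With a fitted model object, the same inputs are stored in the null.deviance, deviance, df.null, and df.residual components of the glm object.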
# open odds.n.ends package
library(package = "odds.n.ends")
# get the basics
odds.n.ends(mod = logisticModel)
## Waiting for profiling to be done...
## $`Logistic regression model significance`
## Chi-squared        d.f.           p
##      16.652       3.000       0.001
##
## $`Contingency tables (model fit): frequency predicted`
##                 Number observed
## Number predicted  1  0 Sum
##              1   19  4  23
##              0    4 13  17
##              Sum 23 17  40
##
## $`Count R-squared (model fit): percent correctly predicted`
## [1] 80
##
## $`Model sensitivity`
## [1] 0.826087
##
## $`Model specificity`
## [1] 0.7647059
##
## $`Predictor odds ratios and 95% CI`
##                     OR       2.5 %    97.5 %
## (Intercept) 0.03738466 0.001102466 0.6610966
## age         1.11006273 1.041062741 1.2081565
## smokeFormer 0.32450861 0.045942281 2.0537937
## smokeNever  0.08442065 0.005379054 0.8158007
The results show that the model was statistically significantly
better than a baseline model at explaining the outcome [χ2(3) = 16.652; p =
.001]. The model correctly predicted 19 of those who were sick (sick = 1)
and 13 of those who were not sick (sick = 0), for a total of 32 correctly
predicted out of 40 (Count-R2 = .80 or 80% correctly predicted). The model
was more sensitive, with 82.6% of those who were sick (the cases) correctly
predicted, and less specific, with 76.5% of the members of the reference
group correctly predicted.
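The fit statistics just described follow directly from the four cells of the contingency table. As a hand check (a sketch using counts copied from the output above, with variable names chosen here for illustration), they can be recomputed in base R:

```r
# Counts copied from the contingency table above
# (rows = number predicted, columns = number observed)
true_pos  <- 19  # predicted 1, observed 1
false_pos <- 4   # predicted 1, observed 0
false_neg <- 4   # predicted 0, observed 1
true_neg  <- 13  # predicted 0, observed 0

total <- true_pos + false_pos + false_neg + true_neg

# Share of all observations predicted correctly
count_r2 <- (true_pos + true_neg) / total

# Share of observed cases (sick = 1) predicted as cases
sensitivity <- true_pos / (true_pos + false_neg)

# Share of observed non-cases (sick = 0) predicted as non-cases
specificity <- true_neg / (true_neg + false_pos)

round(c(count_r2 = count_r2, sensitivity = sensitivity,
        specificity = specificity), 3)
```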
Age was a statistically significant predictor of the outcome; for every
one-year increase in age, the odds of being sick increased by 11% (OR =
1.11; 95% CI: 1.04 - 1.21). There was no statistically significant
difference in odds of being sick for former smokers compared to current
smokers (OR = .32; 95% CI: .05 - 2.05). Never smokers had 92% lower odds
of being sick compared to current smokers; this decrease was statistically
significant (OR = .08; 95% CI: .005 - .82).
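The odds ratios are the exponentiated model coefficients, and the percent changes quoted above come from subtracting 1 from each odds ratio. The sketch below reproduces that arithmetic from the coefficients printed by summary(); the variable names are chosen here for illustration and this is not the package's internal code.

```r
# Coefficients (log-odds) copied from the summary() output above
log_odds <- c(age = 0.10442, smokeFormer = -1.12544, smokeNever = -2.47194)

# Exponentiating a log-odds coefficient gives an odds ratio
odds_ratios <- exp(log_odds)

# Percent change in odds relative to no change (OR = 1)
percent_change <- (odds_ratios - 1) * 100

round(odds_ratios, 2)
round(percent_change)
```

With the fitted model object, exp(coef(logisticModel)) gives the same odds ratios at full precision. The "Waiting for profiling to be done..." message suggests the reported intervals are profile-likelihood intervals, so intervals built by hand from the standard errors would likely differ slightly.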
The odds.n.ends package has several additional options, including the
ability to get an ROC curve (use option rocPlot = TRUE) and histograms of
predicted probabilities (use option predProbPlot = TRUE). Colors for these
plots can be set with options color1 = and color2 =.
Finally, the threshold for a predicted probability being counted as a
case (outcome = 1) has a default value of .5, so any predicted
probability that is .5 or higher will be counted as a case, and any
predicted probability below .5 will be counted as a reference group
member (outcome = 0). This threshold can be adjusted using the thresh =
argument.
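To illustrate the thresholding rule described above, the sketch below applies it to a handful of made-up predicted probabilities. Both pred_prob and the helper classify_cases() are hypothetical names introduced here; the helper mimics the rule as described and is not the package's own code.

```r
# Hypothetical predicted probabilities, for illustration only
pred_prob <- c(0.15, 0.45, 0.50, 0.62, 0.91)

# Hypothetical helper: probabilities at or above the threshold count as cases (1),
# probabilities below it count as reference group members (0)
classify_cases <- function(p, thresh = 0.5) {
  as.integer(p >= thresh)
}

# Default threshold of .5: the probability of exactly .50 counts as a case
default_class <- classify_cases(pred_prob)
default_class

# Raising the threshold to .6 moves the .50 observation to the reference group
stricter_class <- classify_cases(pred_prob, thresh = 0.6)
stricter_class
```

Raising the threshold generally trades sensitivity for specificity, since fewer observations are counted as cases.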