Binary logistic regression analysis enables the estimation of the relationship between one or more independent (or explanatory) variables and a dependent (or outcome) variable with two categories. The regression coefficient ($\beta$) of a logistic regression is the estimated increase in the log odds of the outcome per unit increase in the value of the predictor variable.
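More formally, let $Y$ be the binary outcome variable indicating no/yes with 0/1, and let $p$ be the probability that $Y$ equals 1, so that $p = P(Y = 1)$. Let $X_1, \ldots, X_k$ be a set of explanatory variables. Then, the logistic regression of $Y$ on $X_1, \ldots, X_k$ estimates parameter values for $\beta_0, \ldots, \beta_k$ via the maximum likelihood method of the following equation:
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$$
Additionally, the exponential function of the regression coefficient ($e^{\beta}$) is obtained, which is the odds ratio ($OR$) associated with a one‑unit increase in the explanatory variable. Then, in terms of probabilities, the equation above is translated into the following:
$$p = \frac{e^{\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k}}$$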
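The link between a coefficient, its odds ratio and the predicted probability can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the coefficient values are hypothetical and chosen purely for illustration, not taken from any estimated model.

```python
import math

def predicted_probability(intercept, coefs, xs):
    """Probability implied by a logistic model.

    Computes 1 / (1 + exp(-z)), which is algebraically the same as
    exp(z) / (1 + exp(z)), where z is the linear predictor.
    """
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Hypothetical intercept and slope (illustrative assumptions only).
b0, b1 = -1.0, 0.7

p_at_0 = predicted_probability(b0, [b1], [0.0])
p_at_1 = predicted_probability(b0, [b1], [1.0])

# Raising the predictor by one unit multiplies the odds by exp(b1).
ratio = odds(p_at_1) / odds(p_at_0)
print(ratio, math.exp(b1))
```

The two printed values agree up to floating-point error, which is the sense in which the exponentiated coefficient is the odds ratio for a one-unit increase in the predictor.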
The transformation of log odds ($\beta$) into odds ratios ($e^{\beta}$; $OR$) makes the data more interpretable in terms of probability. The odds ratio ($OR$) is a measure of the relative likelihood of a particular outcome across two groups. The odds ratio for observing the outcome when an antecedent is present is:
$$OR = \frac{p_{11}/p_{12}}{p_{21}/p_{22}}$$
where $p_{11}/p_{12}$ represents the “odds” of observing the outcome when the antecedent is present, and $p_{21}/p_{22}$ represents the “odds” of observing the outcome when the antecedent is not present. Thus, an odds ratio indicates the degree to which an explanatory variable is associated with a categorical outcome variable with two categories (e.g. yes/no) or more than two categories. An odds ratio below one denotes a negative association; an odds ratio above one indicates a positive association; and an odds ratio of one means that there is no association. For instance, if the association between being a female teacher and having chosen teaching as a first choice as a career is being analysed, the following odds ratios would be interpreted as:
0.2: Female teachers are five times less likely to have chosen teaching as a first choice as a career than male teachers.
0.5: Female teachers are half as likely to have chosen teaching as a first choice as a career as male teachers.
0.9: Female teachers are 10% less likely to have chosen teaching as a first choice as a career than male teachers.
1: Female and male teachers are equally likely to have chosen teaching as a first choice as a career.
1.1: Female teachers are 10% more likely to have chosen teaching as a first choice as a career than male teachers.
2: Female teachers are twice as likely to have chosen teaching as a first choice as a career as male teachers.
5: Female teachers are five times more likely to have chosen teaching as a first choice as a career than male teachers.
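As an illustration of how such an odds ratio arises from a two-by-two table, the sketch below computes it from counts. The counts are made up for the example and the function name is hypothetical; sampling weights are ignored.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table of counts.

    a: antecedent present, outcome present
    b: antecedent present, outcome absent
    c: antecedent absent,  outcome present
    d: antecedent absent,  outcome absent
    """
    odds_present = a / b   # odds of the outcome when the antecedent is present
    odds_absent = c / d    # odds of the outcome when the antecedent is absent
    return odds_present / odds_absent

# Made-up counts: teachers who did / did not choose teaching as a first choice.
female_yes, female_no = 60, 40
male_yes, male_no = 75, 25

or_female = odds_ratio(female_yes, female_no, male_yes, male_no)

# For comparison: the ratio of the two proportions, which is a different quantity.
share_ratio = (female_yes / (female_yes + female_no)) / (male_yes / (male_yes + male_no))

print(or_female, share_ratio)
```

With these counts the odds for female teachers are 1.5 and the odds for male teachers are 3, so the odds ratio is 0.5. The ratio of the two proportions (60% against 75%) is 0.8, which is a reminder that, strictly speaking, "likely" in the interpretations above refers to odds rather than to probabilities.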
Odds ratios shown in bold are statistically significantly different from 1 at the 95% confidence level. To compute statistical significance around the value of 1 (the null hypothesis), the relative‑risk/odds‑ratio statistic is assumed to follow a log‑normal distribution, rather than a normal distribution, under the null hypothesis.
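Under the log-normal assumption, a test against 1 can be carried out on the log scale, where the statistic is treated as normally distributed. The sketch below assumes that a standard error of the log odds ratio is already available from the estimation procedure; the function names, the input values and the use of 1.96 as the critical value for a two-sided 95% interval are illustrative assumptions.

```python
import math

def odds_ratio_ci(odds_ratio, se_log_or, z=1.96):
    """Confidence interval for an odds ratio, built on the log scale.

    se_log_or is the standard error of log(odds_ratio); it is an input
    here, not something this sketch estimates.
    """
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return lower, upper

def differs_from_one(odds_ratio, se_log_or, z=1.96):
    """True if the interval excludes 1, i.e. the null value."""
    lower, upper = odds_ratio_ci(odds_ratio, se_log_or, z)
    return not (lower <= 1.0 <= upper)

# Hypothetical estimates sharing the same standard error on the log scale.
for estimate in (2.0, 1.1):
    low, high = odds_ratio_ci(estimate, 0.25)
    print(estimate, round(low, 3), round(high, 3), differs_from_one(estimate, 0.25))
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself: the upper bound lies further from the estimate than the lower bound.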
Binary logistic regressions cannot provide a goodness‑of‑fit measure that would be equivalent to the R‑squared (R²), which represents the proportion of the observed variation in the dependent (or outcome) variable that can be explained by the independent (or explanatory) variables. Unlike linear regressions with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximise the likelihood function of logistic regressions; thus, an iterative process must be used instead. Yet, the goodness-of-fit of binary logistic models can be evaluated by the pseudo‑R².¹ Like the R², the pseudo‑R² ranges from 0 to 1, with higher values indicating better model fit. Nevertheless, the pseudo‑R² cannot be interpreted in the same way as the R²: it does not represent a proportion of explained variation.
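The iterative estimation and a pseudo‑R² can be sketched as follows. This is a minimal illustration, not the procedure used for the reported results: it uses Newton-Raphson updates for a model with one predictor and an intercept, ignores sampling weights, and computes McFadden's pseudo‑R² (one of several variants, chosen here as an assumption). The data are made up and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the data under the fitted model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Newton-Raphson estimation for an intercept and one slope."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to the intercept
            g1 += (y - p) * x      # gradient with respect to the slope
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 linear system (information matrix) * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

def mcfadden_r2(b0, b1, xs, ys):
    """1 - LL(model) / LL(intercept-only model)."""
    p_bar = sum(ys) / len(ys)
    ll_null = sum(y * math.log(p_bar) + (1 - y) * math.log(1.0 - p_bar) for y in ys)
    return 1.0 - log_likelihood(b0, b1, xs, ys) / ll_null

# Made-up data: x = 1 for 100 cases (60 with y = 1), x = 0 for 100 cases (75 with y = 1).
xs = [1] * 100 + [0] * 100
ys = [1] * 60 + [0] * 40 + [1] * 75 + [0] * 25

b0, b1 = fit_logistic(xs, ys)
r2 = mcfadden_r2(b0, b1, xs, ys)
print(math.exp(b1), r2)
```

With a single binary predictor the fitted odds ratio `exp(b1)` matches the odds ratio computed directly from the corresponding two-by-two table, which gives a simple check that the iteration has converged. The pseudo‑R² printed for these data is small even though the odds ratio is far from 1, illustrating why it should not be read as a share of explained variation.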