KEY POINT: Logistic regression is used to estimate the relationship between one or more independent variables and a binary (dichotomous) outcome variable.

Related Article, see p 367

In this issue of Anesthesia & Analgesia, Park et al1 report the results of an observational study on the risk of hypoxemia (defined as a peripheral oxygen saturation <90%) during rapid sequence induction (RSI) versus a modified RSI technique in infants and neonates undergoing pyloromyotomy. The authors used logistic regression to analyze the association between the induction technique and the risk of hypoxemia while controlling for potential confounders.

Logistic regression is used to estimate the association of one or more independent (predictor) variables with a binary dependent (outcome) variable.2 A binary (or dichotomous) variable is a categorical variable that can take only 2 different values or levels, such as “positive for hypoxemia versus negative for hypoxemia” or “dead versus alive.” A simple example with only one independent variable (X) is shown in the Figure, where the dependent variable can have a value of either 0 or 1. In this example, as the value of the independent variable increases, the probability that the dependent variable takes a value of 1 also seems to increase. More formally, logistic regression can be used to estimate the probability (or risk) of a particular outcome given the value(s) of the independent variable(s).

Figure: Relationship between a continuous independent variable X and a binary outcome that can take values 0 (eg, “no”) or 1 (eg, “yes”). As shown, the probability that the value of the outcome is 1 seems to increase with increasing values of X. A, Using a straight line to model the relationship of the independent variable with the probability provides a poor fit, results in estimated probabilities <0 and >1, and grossly violates the assumptions of linear regression. Logistic regression instead models a linear relationship of the independent variable with the natural logarithm (ln) of the odds of the outcome. B, This translates to a sigmoid relationship between the independent variable and the probability of the outcome being 1, with predicted probabilities appropriately constrained between 0 and 1.

Logistic regression is actually an extension of linear regression.2,3 Rather than modeling a linear relationship between the independent variable (X) and the probability of the outcome (Figure A), which is unnatural because it would allow predicted probabilities outside the range of 0–1, it assumes a linear (straight line) relationship with the logit (the natural logarithm of the odds) of the outcome:

ln(P / [1 − P]) = b0 + b1X

The regression coefficients represent the intercept (b0) and slope (b1) of this line. When solving this equation for the probability (P), the probability has a sigmoidal relationship with the independent variable (Figure B), and the estimated probabilities are now appropriately constrained between 0 and 1. As with linear regression, logistic regression can easily be extended to accommodate >1 independent (predictor) variable. Researchers can then study the relationship between each variable and the binary (dichotomous) outcome while holding constant the values of the other independent variables. This is particularly useful not only to understand the independent relationship of each variable with the outcome, but also, as done by Park et al,1 to adjust the estimates for the effects of confounding variables4 in observational research. A major advantage of logistic regression compared with similar approaches such as probit regression, and therefore a reason for its popularity among medical researchers, is that the exponentiated logistic regression slope coefficient (eb) can be conveniently interpreted as an odds ratio.
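The contrast between the naive straight-line model (Figure A) and the logit formulation (Figure B) can be sketched numerically. The coefficients b0 and b1 below are hypothetical values chosen purely for illustration, not estimates from any study.

```python
import math

# Hypothetical coefficients for illustration only (not from Park et al).
b0, b1 = -2.0, 0.8

def linear_probability(x):
    # Naive straight-line model of the probability itself:
    # unconstrained, so predictions can fall below 0 or above 1.
    return b0 + b1 * x

def logistic_probability(x):
    # Logistic model: ln(P / (1 - P)) = b0 + b1*x, solved for P (the sigmoid).
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logit(p):
    # Natural log of the odds.
    return math.log(p / (1.0 - p))

xs = [-5, 0, 2.5, 5, 10]
print([round(linear_probability(x), 2) for x in xs])    # strays outside [0, 1]
print([round(logistic_probability(x), 3) for x in xs])  # stays within (0, 1)

# Round trip: the logit of a predicted probability recovers b0 + b1*x,
# ie, the model is linear on the logit scale.
x = 3.0
print(abs(logit(logistic_probability(x)) - (b0 + b1 * x)) < 1e-9)
```

The round trip at the end is the whole point of the logit link: the relationship is a straight line on the log-odds scale, and a sigmoid on the probability scale.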
The odds ratio indicates how much the odds of a particular outcome change for a 1-unit increase in the independent variable (for continuous independent variables) or versus a reference category (for categorical variables). For example, the adjusted odds ratio of 2.8 (95% confidence interval [CI], 1.5–5.3) reported by Park et al1 indicates that the odds of hypoxemia are estimated to be almost 3 times higher with conventional RSI than with the modified technique (the reference category in their analysis), after controlling for potential confounders.

Valid inferences from logistic regression rely on its assumptions being met, which include the following:

- For simple logistic regression with a continuous independent variable, its relationship with the logit is assumed to be linear. This is basically also true for multivariable logistic regression, but the model can be specified to accommodate a curved relationship.
- Observations must be independent of each other (eg, they must not be repeated measurements within the same subjects). Other techniques, such as generalized linear mixed-effects models, are required for correlated data.5
- The model needs to be specified correctly, as explained in more detail in the Statistical Minute on linear regression in the previous issue of Anesthesia & Analgesia.3

Several methods are available to assess (a) the calibration of the logistic regression model (how closely the observed risk matches the predicted risk), commonly assessed with the Hosmer-Lemeshow goodness-of-fit test; and (b) its discrimination (how well the binary outcome can be predicted), commonly assessed by the area under the receiver operating characteristic curve (also referred to as the c-statistic).6

This Statistical Minute focuses on binary logistic regression, which is usually simply referred to as “logistic regression.” Additional techniques are available for categorical data (multinomial logistic regression) and ordinal data (ordinal logistic regression).
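The odds-ratio interpretation of the exponentiated slope coefficient can be sketched as follows. The coefficients are hypothetical, chosen only so that the resulting odds ratio is similar in magnitude to the 2.8 reported by Park et al; they are not the actual estimates from that study.

```python
import math

# Hypothetical fitted coefficients (illustration only; NOT the actual
# estimates from Park et al).
b0 = -1.5   # intercept
b1 = 1.03   # slope for a binary predictor, eg, conventional RSI (1) vs modified RSI (0)

def odds(x):
    # odds = P / (1 - P) = e^(b0 + b1*x)
    return math.exp(b0 + b1 * x)

# The exponentiated slope coefficient is the odds ratio.
odds_ratio = math.exp(b1)
print(round(odds_ratio, 2))         # 2.8

# The same number falls out of the model directly: a 1-unit increase in x
# (here, conventional vs modified RSI) multiplies the odds by e^b1.
print(round(odds(1) / odds(0), 2))  # 2.8
```

This is why the odds ratio is the natural summary for logistic regression: on the log scale a 1-unit change adds b1, so on the odds scale it multiplies by e^b1, regardless of the starting value of x.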
Published in: Anesthesia & Analgesia
Volume 132, Issue 2, pp. 365-366