Prediction models in obstetrics: understanding the treatment paradox and potential solutions to the threat it poses
Obstetrics focuses on the early identification of pregnancies at risk of adverse outcomes to plan targeted intervention. Clinicians often use probabilistic reasoning, intuitively based on clinical history and tests, to assess the risk of complications in a mother or fetus; however, they need to be aware of false-positive and false-negative test results in their clinical decision-making. Prediction models (also known as prognostic models) provide individualised risk estimates for clinically important outcomes in patients with a particular disease or condition.1 Derived using statistical models, they include multiple predictors, such as age, previous history, and, increasingly, biomarkers. Although clinicians’ intuition has a role in prediction, it has been shown that statistical prediction models give more accurate prognoses than clinicians can achieve working on their own.2
As in other clinical fields, the development and use of prediction models in obstetrics has limitations.3 In the development phase there are many statistical challenges, including: the ascertainment of a suitable sample size; the choice of candidate predictors; reliable measurement of the outcome and predictors; the identification of important predictors and their functional form; and internal validation, potentially including bootstrap resampling and cross-validation, as well as shrinkage for potential over-optimism in model performance.4 Most, if not all, models perform well when they are internally validated; however, the use of prediction models has been hampered by a distinct lack of research into their external validation in populations distinct from those in which they were developed, and into the impact of models on the behaviour of doctors and on patient outcomes. Another issue that limits clinical use, and which is the focus of this commentary, is the handling of interventions in the prediction model.
To understand the challenges in obstetric prediction, we focus this commentary on models for complications in pre-eclampsia as an exemplar. Pre-eclampsia, particularly with an early onset before 34 weeks of gestation, is one of the most common causes of adverse maternal and fetal outcomes. The management of women with pre-eclampsia includes an assessment of clinical history, the elicitation of specific signs, and investigations for maternal and fetal wellbeing.5 Delivery reduces the probability of maternal complications, but prolonging the pregnancy is usually desirable for the fetus, especially when pre-eclampsia is diagnosed at preterm gestations. The administration of treatments with known effectiveness, e.g. magnesium sulphate and parenteral antihypertensives, minimises the risk of eclampsia and intracerebral haemorrhage.6
The development of a prediction model for adverse outcomes in women with pre-eclampsia poses challenges typical in obstetrics. A strong predictor of a common complication may trigger an effective treatment, which in turn prevents a proportion of the adverse outcomes. In this situation the predictor that triggered the treatment in the first place will appear poorer in its predictive performance when used in a simple model, a phenomenon described as the treatment paradox.7, 8 This problem applies widely in obstetrics: for example, when predicting preterm birth, effective tocolysis may mask the accuracy of a predictor.
A recently published prediction model for adverse maternal outcomes, pre-eclampsia integrated estimate of risk (PIERS), did not identify high blood pressure as a predictor of adverse outcomes.9 This may be caused by the dilution of the predictive value of blood pressure as a result of the treatment paradox. During model development, women with high blood pressure are likely to receive effective interventions (antihypertensive or delivery). Although they may be at high risk, with effective treatment many of them will not develop the outcome. If blood pressure is truly a predictor of high(er) risk in untreated patients, effective intervention will make it look like a poor predictor during the development of a model in treated patients. This problem frequently jeopardises obstetric modelling, altering the true predictor–outcome association and the natural outcome rate (incidence). Ignoring such a predictor in modelling has important consequences.
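The dilution described above can be made concrete with a small expected-value calculation. The following is a minimal sketch, not an analysis of PIERS or any real data: the risks, the treated fraction, and the treatment effect are all invented for illustration, and `observed_risk` is a hypothetical helper.

```python
def observed_risk(untreated_risk, p_treated, treatment_rr):
    """Expected outcome risk in a group where a fraction `p_treated` receives
    a treatment that multiplies the untreated risk by `treatment_rr`."""
    treated_part = p_treated * untreated_risk * treatment_rr
    untreated_part = (1 - p_treated) * untreated_risk
    return treated_part + untreated_part


# Hypothetical values, invented for illustration only.
RISK_NORMAL_BP = 0.05     # untreated risk with normal blood pressure
RISK_HIGH_BP = 0.30       # untreated risk with high blood pressure
P_TREATED_HIGH_BP = 0.9   # fraction of women with high blood pressure treated
TREATMENT_RR = 0.2        # treatment multiplies the risk by 0.2

# Association that would be seen if nobody were treated.
true_risk_ratio = RISK_HIGH_BP / RISK_NORMAL_BP

# Association seen under current care, where only high blood pressure is treated.
observed_high = observed_risk(RISK_HIGH_BP, P_TREATED_HIGH_BP, TREATMENT_RR)
observed_normal = observed_risk(RISK_NORMAL_BP, 0.0, TREATMENT_RR)
apparent_risk_ratio = observed_high / observed_normal

print(f"risk ratio without treatment: {true_risk_ratio:.2f}")
print(f"risk ratio under current care: {apparent_risk_ratio:.2f}")
```

With these invented numbers, the risk ratio for high blood pressure falls from 6 to about 1.7 once most high-risk women are treated, even though the underlying biology is unchanged.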
In pre-eclampsia, a prediction model without blood pressure risks overlooking patients with high blood pressure, and may actually underestimate the outcome risks (as an important predictor is missed). Incorporating the predictor in a simple model has other consequences. Data from routine care provide estimates of outcome risks in the context of current care, where treatment decisions are made based on some of the predictors, so a model built on such data may produce falsely low predictions. How should these challenges be handled to produce appropriate analyses and outputs?
We convened a panel of experts in pre-eclampsia and prognostic research, to explore the potential solutions in the development of a valid prediction model for adverse maternal or fetal outcomes. The 24-member panel comprised obstetricians, statisticians, clinical epidemiologists, and researchers (Appendix S1). The consensus panel focused in particular on the methodological challenges that arise from the treatment paradox in the development of a prediction model for complications in women with early-onset pre-eclampsia for the PREP study (Development and validation of a prediction model for risk of complications in women with early-onset pre-eclampsia).10
The basics: population, predictors, and outcomes
First, the panel agreed upon relevant methodological issues regarding population, predictors, and outcomes in models for predicting complications in early-onset pre-eclampsia. A definition of the study sample at the inception of the cohort is essential for generalisability, and for the assessment of the usefulness of any model. With the multisystemic nature of pre-eclampsia, different predictor groups may be pathophysiologically associated with different outcomes. For example, abnormal liver function tests and haematological variables may be expected to be more predictive of placental abruption and postpartum haemorrhage. The case mix affects the distribution of the predictors and the prevalence of the outcomes, and this in turn affects predictor–outcome associations, thereby influencing the accuracy of the model.
The composition of the cohort should allow the capture of all relevant predictors and outcomes. In women with early-onset pre-eclampsia, more than one outcome, such as eclampsia or abruption, is clinically important. As only around 1% of all pregnant mothers develop early-onset pre-eclampsia, the rates of occurrence of individual outcomes are very low. Robust models require large sample sizes such that there are 10–15 outcome events per predictor.11 For a feasible study, composite maternal and fetal outcomes would be required and competing risk models could be considered. The outcomes, which were developed using a Delphi consensus survey (Table 1),9 should be prioritised for clinical importance to allow for the selection of one outcome per case when there are several recorded.
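The events-per-predictor rule translates directly into a required cohort size, which shows why a composite outcome makes a study more feasible. This is a rough sketch under stated assumptions: the number of predictors and both outcome rates are invented, and `required_sample_size` is a hypothetical helper rather than a formal sample size method.

```python
import math
from fractions import Fraction


def required_sample_size(n_predictors, events_per_predictor, outcome_rate):
    """Return (events_needed, women_needed) for an events-per-predictor rule.

    `outcome_rate` is passed as a string (e.g. "0.05") so that it converts to
    an exact fraction and the rounding up is not disturbed by floating point.
    """
    events_needed = n_predictors * events_per_predictor
    women_needed = math.ceil(Fraction(events_needed) / Fraction(outcome_rate))
    return events_needed, women_needed


# Hypothetical scenario: 10 candidate predictors; a single rare outcome (5%)
# versus a composite outcome (20%). Both rates are invented for illustration.
for label, rate in [("single outcome", "0.05"), ("composite outcome", "0.20")]:
    for epp in (10, 15):
        events, women = required_sample_size(10, epp, rate)
        print(f"{label}, {epp} events per predictor: {events} events, {women} women")
```

Under these assumptions, moving from the rarer single outcome to the composite reduces the number of women needed fourfold for the same number of predictors.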
Predictors | Headache, epigastric pain, chest pain, visual disturbance, hyperreflexia, nausea, vomiting, dyspnoea, clonus, pre-existing medical conditions, number of fetuses, platelet count, haemoglobin, blood pressure, serum uric acid, serum urea, serum creatinine, proteinuria level, liver function tests, renal function tests, pulse oximetry, maternal age, and gestational age at diagnosis of early-onset pre-eclampsia |
Maternal outcomes | Mortality, hepatic dysfunction, hepatic haematoma or rupture, Glasgow Coma Scale <13, eclamptic seizures, posterior reversible encephalopathy, Bell's palsy, stroke, cortical blindness, reversible ischaemic neurological deficit (RIND), retinal detachment, acute renal insufficiency, renal dialysis, transfusion of blood products, positive inotropic support, myocardial ischaemia or infarction, need for >50% oxygen for more than 1 hour, need for intubation, pulmonary oedema, abruption, postpartum haemorrhage, and delivery at <34 weeks of gestation |
Fetal outcomes | Perinatal or infant mortality, bronchopulmonary dysplasia (defined as oxygen requirement at 36 weeks of corrected gestation unrelated to an acute respiratory episode), necrotising enterocolitis (including only Bell's stage 2 or 3; defined as evidence of pneumatosis intestinalis on abdominal X-ray and/or surgical intervention), grade III/IV intraventricular haemorrhage, cystic periventricular leukomalacia, stage 3–5 retinopathy of prematurity, hypoxic ischaemic encephalopathy [Apgar score ≤5 at 10 minutes and/or pH 7.00 in first 60 minutes of life and/or base deficit ≥−16 in first 60 minutes associated with abnormal conscious level (lethargy, stupor, or coma) and seizures, and/or poor/weak suck, and/or hypotonia, and/or abnormal reflexes] |
Prediction models are population-specific and may produce different results in different populations. This is particularly true in settings where all women with predisposing conditions are treated with the same protocol.
Minimising bias from the treatment paradox in models developed in the context of current care
The following methods could address bias from the treatment paradox when withholding treatment during the development of a prediction model is not possible or would be unethical.
Standardisation of treatment
The standardisation of treatment across predictor levels will ensure that the predictor of interest is always used for initiating treatment (creating complete collinearity in the modelling). When a treatment becomes specific for a predictor, such as high blood pressure and the administration of antihypertensives, it becomes possible to interpret the results (e.g. normal blood pressure goes with an absence of antihypertensives). This, however, requires fully standardised care with no variation in treatment: i.e. the same antihypertensive medication and dosage must be provided at exactly the same treatment thresholds. Although the management of early-onset pre-eclampsia, such as the commencement of antihypertensives and magnesium sulphate, is somewhat standardised by guidelines [e.g. from the National Institute for Health and Care Excellence (NICE) in the UK], the threshold for commencing treatment varies between clinicians and centres. Furthermore, the response to a specific antihypertensive and dosage varies between individual patients. This limits the applicability of such a strategy, although one could consider the use of multilevel models to allow for any differences between clinicians and treatment centres.
Predictor substitution
Bias arising from the treatment paradox may be minimised by removing all of the predictors upon which the decision to treat is based, and substituting an alternative predictor for them. This may be possible in certain settings. For example, Cukjati et al.12 found it difficult to predict wound healing based on the attributes of the initial wound, patient, and treatment. Instead, they used weekly follow-up measurements of the wound area to predict successful wound healing. In women with early-onset pre-eclampsia, however, this poses a problem, as clinicians rely on more than one predictor, such as blood pressure, symptoms, and haematological and biochemical indices, when deciding on treatment. Removal of these will severely restrict the ability to include any meaningful predictors in the model. Furthermore, other predictors are potentially correlated with the predictors used to make treatment decisions; for example, a high body mass index (BMI) is correlated with high blood pressure. Therefore, the substitution of predictors is difficult.
Treatment as a predictor
The ‘decision to treat’ or the actual treatment, such as the administration of an antihypertensive, could itself be considered a predictor; however, the ‘decision to treat’ may be influenced by other modelled predictors, such as hypertension, so differentiating treatment effects from predictor effects becomes difficult. We could adjust for the interaction between ‘decision to treat’ as a predictor and each of the other prognostic factors in the model; however, when many predictors are involved, or when ‘decision to treat’ is based on multiple predictors, this approach becomes complex. In such situations, extremely large sample sizes are needed for the reliable assessment of interactions.
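One way to write down the approach described above is as a logistic model with a treatment term and treatment-by-predictor interactions. The notation is an assumption introduced here for illustration: Y is the adverse outcome, X_1 to X_p are the prognostic factors, and T is the ‘decision to treat’ indicator.

```latex
\mathrm{logit}\,\Pr(Y = 1 \mid X, T)
  = \beta_0 + \sum_{j=1}^{p} \beta_j X_j + \gamma T + \sum_{j=1}^{p} \delta_j X_j T
```

Written this way, the model has 2p + 2 parameters rather than p + 1, which is why the number of outcome events needed grows quickly as predictors are added.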
Treatment as an outcome
When starting a treatment is likely to prevent an adverse outcome, those who received the treatment could also be considered to have experienced the outcome. For example, in a prediction model used to predict complications in patients with chronic pulmonary disease, the authors did not focus on mortality as an outcome but instead used hospitalisation and need for treatment, i.e. symptomatic deterioration requiring pulsed oral steroid use.13 In women with early-onset pre-eclampsia, if a large proportion of women are delivered at an early preterm gestation (before 34 weeks), then delivery itself could be considered as an outcome (replacing complications that would have occurred in the absence of delivery). In the absence of a standardised protocol for decision to deliver at early preterm gestation, such an approach could help to overcome the limitations in the model as a result of delivery preventing the occurrence of an adverse outcome.
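Treating delivery as an outcome amounts to redefining the event indicator. The following is a minimal sketch with invented records; the `Pregnancy` class, its field names, and `composite_event` are hypothetical and not taken from the PREP study.

```python
from dataclasses import dataclass


@dataclass
class Pregnancy:
    gestation_at_delivery_weeks: float
    complication: bool  # any adverse maternal outcome recorded


def composite_event(p, threshold_weeks=34.0):
    """Count delivery before `threshold_weeks` as an outcome event as well."""
    return p.complication or p.gestation_at_delivery_weeks < threshold_weeks


# Hypothetical records, invented for illustration only.
cohort = [
    Pregnancy(31.5, False),  # delivered early, no complication recorded
    Pregnancy(33.0, True),
    Pregnancy(36.0, False),
    Pregnancy(37.5, True),
    Pregnancy(32.0, False),  # delivered early, no complication recorded
    Pregnancy(38.0, False),
]

complication_events = sum(p.complication for p in cohort)
composite_events = sum(composite_event(p) for p in cohort)

print(f"complications only: {complication_events} of {len(cohort)}")
print(f"complications or delivery before 34 weeks: {composite_events} of {len(cohort)}")
```

The two women delivered early without a recorded complication count as events under the composite definition, so the intervention no longer hides them from the model.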
Propensity scores
Another possible solution for the treatment paradox is to use propensity scores. The propensity score is the probability of a treatment being assigned to an individual based on observed pre-treatment variables.9 One can assign a propensity score to each individual in the study, taking into account the multiway interactions with other predictors (on which the decision to treat was based).
Such a model provides risk estimates for an individual taking into account the probability of requiring treatment. It requires a prior knowledge of the propensity score, and has limited clinical applicability for making decisions on whether to treat.
A related option is to take into account each participant's propensity score by using it to define a weight for the contribution of that particular participant towards the logistic model analysis and development of the prognostic model. Women who received little or no treatment (and hence less treatment effect) should carry more weight than those who had a high probability of being treated.
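As a sketch of how a propensity score can be turned into a weight, the example below estimates the outcome risk that would be seen without treatment by weighting untreated women by 1 / (1 - propensity). This inverse-probability scheme is a standard technique swapped in for illustration; it is not necessarily the weighting the panel had in mind, and unlike the description above it gives treated women zero weight. It assumes that all drivers of treatment are measured. The aggregated cohort is invented, and the propensity score is simply the fraction treated within each stratum.

```python
from collections import defaultdict

# Hypothetical aggregated cohort, invented for illustration only.
# Each row: (high_bp, symptoms, treated, n_women, n_events)
COHORT = [
    (1, 1, 1, 90, 9), (1, 1, 0, 10, 5),
    (1, 0, 1, 50, 3), (1, 0, 0, 50, 10),
    (0, 1, 1, 20, 1), (0, 1, 0, 80, 8),
    (0, 0, 0, 200, 6),
]


def propensity_by_stratum(rows):
    """Propensity score: fraction treated within each (high_bp, symptoms) stratum."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for bp, sym, trt, n, _ in rows:
        total[(bp, sym)] += n
        treated[(bp, sym)] += n * trt
    return {key: treated[key] / total[key] for key in total}


def naive_risk(rows, high_bp):
    """Observed outcome risk under current care, ignoring treatment."""
    n = sum(r[3] for r in rows if r[0] == high_bp)
    events = sum(r[4] for r in rows if r[0] == high_bp)
    return events / n


def weighted_untreated_risk(rows, high_bp):
    """Outcome risk among untreated women, each weighted by 1 / (1 - propensity)."""
    ps = propensity_by_stratum(rows)
    weighted_n = weighted_events = 0.0
    for bp, sym, trt, n, events in rows:
        if bp != high_bp or trt == 1:
            continue  # treated women contribute no weight in this scheme
        weight = 1.0 / (1.0 - ps[(bp, sym)])
        weighted_n += weight * n
        weighted_events += weight * events
    return weighted_events / weighted_n


for bp, label in [(1, "high blood pressure"), (0, "normal blood pressure")]:
    print(f"{label}: observed {naive_risk(COHORT, bp):.3f}, "
          f"weighted untreated {weighted_untreated_risk(COHORT, bp):.3f}")
```

In this invented cohort the observed risk with high blood pressure is 0.135, whereas the weighted estimate of the untreated risk is 0.350, so the weighting recovers much of the association that treatment had masked.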
In the absence of these considerations and appropriate adjustments, models affected by the treatment paradox may yield invalid outputs.
Conclusion
Prediction modelling in obstetric research is pivotal, but faces many challenges. Clinicians and researchers should be aware of the potential problems, particularly the so-called treatment paradox, in the development of models using routine care data, and how this affects the interpretation and use of such models for clinical management. New models that are developed, validated, or updated should address these methodological issues.
Disclosure of interests
None declared. Completed disclosure of interests form available to view online as supporting information.
Contribution to authorship
All authors contributed to the drafts and final version of the article.
Details of ethics approval
The PREP study received ethical approval from the National Research Ethics Service Committee West Midlands (approval number 11/WM/0248).
Funding
The National Institute for Health Research (NIHR) – Health Technology Assessment programme funded the PREP study and the prognostic expert meeting (HTA 09/22/163).
Acknowledgements
We would like to thank members of the PREP prognostic meeting expert panel for their contribution in prioritising the composite outcomes, and for their help in addressing the challenges encountered with developing the PREP prediction models.