Volume 55, Issue 3 p. 357-367
Original Paper

Development and validation of predictive models for QUiPP App v.2: tool for predicting preterm birth in women with symptoms of threatened preterm labor

J. Carter (corresponding author)
Department of Women and Children's Health, School of Life Course Sciences, King's College London, London, UK
Correspondence to: Dr J. Carter, Department of Women and Children's Health, School of Life Course Sciences, King's College London, 10th Floor, North Wing, St Thomas' Hospital Campus, Westminster Bridge Road, London, SE1 7EH, UK

P. T. Seed
Department of Women and Children's Health, School of Life Course Sciences, King's College London, London, UK

H. A. Watson
Department of Women and Children's Health, School of Life Course Sciences, King's College London, London, UK

A. L. David
Institute for Women's Health, University College London, London, UK
National Institute for Health Research, University College London Hospitals, Biomedical Research Centre, London, UK

J. Sandall
Department of Women and Children's Health, School of Life Course Sciences, King's College London, London, UK

A. H. Shennan
Department of Women and Children's Health, School of Life Course Sciences, King's College London, London, UK

R. M. Tribe
Department of Women and Children's Health, School of Life Course Sciences, King's College London, London, UK

A.H.S. and R.M.T. are joint senior authors.
First published: 06 August 2019

ABSTRACT


Objective

To develop enhanced prediction models to update the QUiPP App prototype, a tool providing individualized risk of spontaneous preterm birth (sPTB), for use in women with symptoms of threatened preterm labor (TPTL), incorporating risk factors, transvaginal ultrasound assessment of cervical length (CL) and cervicovaginal fluid quantitative fetal fibronectin (qfFN) test results.

Methods

Participants were pregnant women between 23 + 0 and 34 + 6 weeks' gestation with symptoms of TPTL, recruited as part of four prospective cohort studies carried out at 16 UK hospitals between October 2010 and October 2017. The training set comprised all women whose outcomes were known in May 2017 (n = 1032). The validation set comprised women whose outcomes were gathered between June 2017 and March 2018 (n = 506). Parametric survival models were developed for three combinations of predictors: risk factors plus qfFN test results alone, risk factors plus CL alone, and risk factors plus both qfFN and CL. The best models were selected using the Akaike and Bayesian information criteria. The estimated probability of sPTB < 30, < 34 or < 37 weeks' gestation and within 1 or 2 weeks of testing was calculated and receiver-operating-characteristics (ROC) curves were created to demonstrate the diagnostic ability of the prediction models.

Results

Predictive statistics were similar between the training and the validation sets at most outcome time points and for each combination of predictors. Areas under the ROC curves (AUC) demonstrated that all three algorithms had good accuracy for the prediction of sPTB at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks' post-testing in the validation set, particularly the model combining risk factors plus qfFN alone (AUC: 0.96 at < 30 weeks; 0.85 at < 34 weeks; 0.77 at < 37 weeks; 0.91 at < 1 week from testing; and 0.92 at < 2 weeks from testing).

Conclusions

Validation of the new prediction models suggests that the QUiPP App v.2 can reliably calculate risk of sPTB in women with TPTL. Use of the QUiPP App in practice could lead to better targeting of intervention, while providing reassurance and avoiding unnecessary intervention in women at low risk. Copyright © 2019 ISUOG. Published by John Wiley & Sons Ltd.

ABSTRACT (SPANISH VERSION)

Development and validation of predictive models for the QUiPP App v.2: a tool for predicting preterm birth in women with symptoms of threatened preterm labor

Objective

To develop enhanced prediction models to update the prototype of the QUiPP App, a tool that provides the individualized risk of spontaneous preterm birth (sPTB), for use in women with symptoms of threatened preterm labor (TPTL), by incorporating risk factors, transvaginal ultrasound assessment of cervical length (CL) and cervicovaginal fluid quantitative fetal fibronectin (qfFN) test results.

Methods

Participants were pregnant women between 23 + 0 and 34 + 6 weeks' gestation with symptoms of TPTL, recruited as part of four prospective cohort studies carried out in 16 UK hospitals between October 2010 and October 2017. The training group comprised all women whose outcomes were known in May 2017 (n = 1032). The validation group comprised women whose outcomes were collected between June 2017 and March 2018 (n = 506). Parametric survival models were developed for three combinations of predictors: risk factors plus qfFN test results only, risk factors plus CL only, and risk factors plus both qfFN and CL. The best models were selected using the Akaike and Bayesian information criteria. The estimated probability of sPTB at < 30, < 34 or < 37 weeks' gestation and within 1 or 2 weeks of testing was calculated, and receiver-operating-characteristics (ROC) curves were constructed to demonstrate the diagnostic ability of the prediction models.

Results

Predictive statistics were similar between the training and validation groups at most outcome time points and for each combination of predictors. The areas under the ROC curves (AUC) showed that all three algorithms had good accuracy for the prediction of sPTB at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks after testing in the validation group, particularly the model combining risk factors plus qfFN alone (AUC: 0.96 at < 30 weeks; 0.85 at < 34 weeks; 0.77 at < 37 weeks; 0.91 at < 1 week from testing; and 0.92 at < 2 weeks from testing).

Conclusions

Validation of the new prediction models suggests that the QUiPP App v.2 can reliably calculate the risk of sPTB in women with TPTL. Use of the QUiPP App in practice could lead to better targeting of intervention, while providing reassurance and avoiding unnecessary interventions in women at low risk.

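ABSTRACT (CHINESE VERSION)

Development and validation of predictive models for QUiPP App v.2: a preterm birth prediction tool for women with symptoms of threatened preterm labor

Objective

To develop enhanced prediction models to improve the QUiPP App prototype for women with symptoms of threatened preterm labor (TPTL), combining risk factors, transvaginal ultrasound assessment of cervical length (CL) and cervicovaginal fluid quantitative fetal fibronectin (qfFN) test results to predict the individualized risk of spontaneous preterm birth (sPTB).

Methods

Participants were pregnant women at 23 + 0 to 34 + 6 weeks' gestation with symptoms of TPTL, recruited to one or more of four prospective cohort studies conducted at 16 UK hospitals between October 2010 and October 2017. The training set comprised all women whose outcomes were available in May 2017 (n = 1032). The validation set comprised women whose outcomes were collected between June 2017 and March 2018 (n = 506). Parametric survival models were developed for three combinations of predictors: risk factors plus qfFN test results, risk factors plus CL, and risk factors plus both qfFN and CL. The best models were selected according to the Akaike and Bayesian information criteria. The estimated probability of sPTB at < 30, < 34 or < 37 weeks' gestation and within 1 or 2 weeks of testing was calculated, and receiver-operating-characteristics (ROC) curves were drawn to demonstrate the diagnostic ability of the prediction models.

Results

At most outcome time points and for each combination of predictors, the predictive statistics were similar between the training and validation sets. The areas under the ROC curves (AUC) showed that all three algorithms estimated sPTB at < 30, < 34 or < 37 weeks' gestation and within 1 or 2 weeks after testing with high accuracy in the validation set, particularly the model combining risk factors and qfFN (AUC at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks after testing: 0.96, 0.85, 0.77, 0.91 and 0.92, respectively).

Conclusions

Validation of the new prediction models indicates that QUiPP App v.2 can accurately estimate the risk of sPTB in women with TPTL. Use of the QUiPP App in clinical practice could enable precisely targeted intervention, while reassuring women at low risk and avoiding unnecessary intervention.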

CONTRIBUTION

What are the novel findings of this work?

The QUiPP App v.2 is an enhanced, reliable risk-assessment tool that combines risk factors with fetal fibronectin and cervical length and calculates a simple percentage risk of spontaneous preterm birth.

What are the clinical implications of this work?

Use of the QUiPP App v.2 could increase confidence in clinical decisions, improve targeting and timing of interventions to reduce preterm birth and its associated morbidities, and limit unnecessary intervention and women's anxiety.

INTRODUCTION

Although advances in healthcare have resulted in more, and younger, babies surviving early delivery, preterm birth (PTB) remains hard to predict, even in women with symptoms of threatened preterm labor (TPTL). Symptoms of TPTL are not accurate predictors of PTB1, 2, yet many symptomatic women receive unnecessary interventions because the consequences of not treating those in true preterm labor could be devastating. Overtreatment results in avoidable exposure to the risk of adverse effects, particularly with repeat doses of steroids for fetal lung maturation, which have been associated with reduced birth weight3-5. It also results in significant unnecessary healthcare expenditure6-8.

In the absence of a definitive diagnostic test for early labor, management decisions are based on assessment of risk; however, risk assessment for TPTL is difficult because of its multifactorial nature. Risk factors, such as previous spontaneous PTB (sPTB) and cervical surgery, need to be considered along with gestational age and the nature of the symptoms. Additional tests, such as transvaginal ultrasound (TVS) measurement of cervical length (CL) and fetal fibronectin (fFN) testing, can aid clinical decision-making9-13. The potential of fFN as a predictive marker for sPTB has long been established14-19. In the UK, until relatively recently, fFN test results were presented as dichotomous (i.e. positive or negative) based on a threshold of 50 ng/mL. Newer fFN analyzers report results as concentrations in ng/mL, and it has been suggested that using alternative thresholds, i.e. < 10 and > 200 ng/mL, rather than 50 ng/mL, may improve positive prediction14, 20.
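The difference between the traditional dichotomous reading and the quantitative thresholds mentioned above can be sketched in a few lines. This is an illustration only, assuming the 10, 50 and 200 ng/mL cut-offs quoted in the text; the function names and band labels are hypothetical and do not describe how the QUiPP App itself uses qfFN.

```python
def qffn_band(concentration_ng_ml: float) -> str:
    """Place a quantitative fFN result (ng/mL) relative to the 10/50/200 ng/mL cut-offs.

    Hypothetical helper for illustration; the band labels are not taken from the paper.
    """
    if concentration_ng_ml < 0:
        raise ValueError("concentration cannot be negative")
    if concentration_ng_ml < 10:
        return "<10"      # below the lower alternative threshold
    if concentration_ng_ml < 50:
        return "10-49"    # 'negative' under the traditional 50 ng/mL rule
    if concentration_ng_ml <= 200:
        return "50-200"   # 'positive' under the traditional 50 ng/mL rule
    return ">200"         # above the upper alternative threshold


def dichotomous_result(concentration_ng_ml: float, threshold: float = 50.0) -> str:
    """Traditional positive/negative reading at a single threshold."""
    return "positive" if concentration_ng_ml >= threshold else "negative"


for value in (5, 30, 120, 250):
    print(value, dichotomous_result(value), qffn_band(value))
```

A result of 30 ng/mL, for example, reads simply as "negative" under the 50 ng/mL rule but falls in the 10-49 ng/mL band; that extra resolution is what quantitative analyzers make available.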

We developed the QUiPP mobile phone application (www.quipp.org) to support clinical decision-making for women at increased risk of sPTB. This easy-to-use risk-assessment tool combines risk factors and test results to calculate a simple individualized percentage risk of delivery. The algorithms used to generate the risk prediction scores for the first version of the QUiPP App have been reported previously21, 22. The present paper reports further development and validation of algorithms for calculating sPTB risk in women with symptoms of TPTL, incorporated in the second version of the App (QUiPP App v.2). These new algorithms improve the utility and flexibility of the QUiPP App by introducing additional risk factors, including twin pregnancy, and by calculating risk using quantitative fFN (qfFN) (ng/mL) alone, TVS-CL (mm) alone or a combination of the two tests.

METHODS

The upgrade of the QUiPP App was originally planned to use data from all patients recruited to the PETRA study, an observational study designed to collect data for this purpose (REC Ref. 14/LO/1988). However, before participant recruitment was complete, a funding application for a clinical trial evaluating the QUiPP App's usability, acceptability and effect on management was successful (EQUIPTT, REC Ref. 17/LO/1802; ISRCTN trial registry number ISRCTN17846337). A decision was therefore taken to update the predictive algorithms before completion of the PETRA study, using all outcome data already gathered by the end of May 2017, together with relevant participant data (i.e. matching eligibility criteria and symptoms of TPTL) from the earlier prospective cohort studies EQUIPP (REC Ref. 10/H0806/68), POPPY (REC Ref. 09/H0802/97) and INSIGHT (REC Ref. 13/LO/0393). These studies all investigated PTB prediction and, like the PETRA study, used our Preterm Birth Studies database (www.medscinet.net/ptbstudies). A summary of study characteristics and the number of participants from each study included in the training set is shown in Figure S1.

Participants were pregnant women between 23 + 0 and 34 + 6 weeks' gestation with symptoms of TPTL, e.g. abdominal pain or tightening. Women were excluded if diagnosed with established labor, ruptured membranes or antepartum hemorrhage. qfFN test results and CL measurements were known to attending clinicians, and management, e.g. hospital admission and administration of steroids and tocolytics, was according to local protocols. All data were gathered between October 2010 and October 2017 from 16 UK hospitals.

The training set for this analysis (n = 1032) included data from 382 (37%) participants of the EQUIPP study, which had also been used to develop the algorithms for symptomatic women in the first version of the QUiPP App21; that earlier analysis, however, was limited to women with a singleton pregnancy who had fFN test results only. As there was no statistical reason to exclude these data from the current analysis, they were incorporated to increase the predictive ability of the new models. Inclusion of additional data on CL and twin pregnancies, collected since the first prediction models were created, increases the flexibility of the QUiPP App in clinical practice. The new models were tested by calibration before being applied in the second version of the QUiPP App. CE Marking as a Class 1 Medical Device was granted before general release of the App in September 2017 (MHRA Reference Number for Medical Device/standalone software Z301 registration is A015030). Formal validation of the prediction models was carried out on a different subset of women (n = 506) after completion of the PETRA study (April 2018). Statistical analysis was carried out using Stata SE software (version 14.2; StataCorp LP, College Station, TX, USA).

Power calculation

A power calculation was performed before the PETRA study began. We anticipated that clinicians would be willing to regard women in the lower-risk group as being closer to normal (i.e. standard risk) if the true rate of preterm labor in this group could be shown (with 95% confidence) to be lower than the expected rate for women with TPTL symptoms (i.e. lower than 10%, with a best estimate of 6.7%). On this basis, we concluded that full data on 550 standard-risk and 61 high-risk women (611 women in total) in the proposed validation would be sufficient to achieve 80% power in the PETRA study. Allowing for 95% compliance and completion, a recruitment target of 643 women was considered adequate to validate the predictive value of each test (qfFN and CL), with an additional 300 women to be used as a training set. The final numbers used in the development and validation of the new algorithms provided sufficient power to achieve our objectives.
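The arithmetic of the power calculation can be checked with a short sketch. The totals (550 + 61 = 611 women, inflated for 95% compliance and completion) come from the paragraph above. The power function is an assumption: the paper does not state which test underlies its 80% figure, so this sketch uses a one-sided normal-approximation test of whether the preterm rate in the standard-risk group lies below 10% when the true rate is 6.7%, and is not expected to reproduce the authors' calculation exactly.

```python
import math
from statistics import NormalDist


def recruitment_target(n_required: int, completion_rate: float) -> int:
    """Inflate the required sample size for expected loss to follow-up."""
    # round() matches the reported target of 643; math.ceil would give 644
    return round(n_required / completion_rate)


def power_rate_below(n: int, p_null: float, p_true: float, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test of H0: p >= p_null versus H1: p < p_null.

    Normal approximation to the binomial; an assumption, since the paper does not
    report the test its power calculation was based on.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    se_null = math.sqrt(p_null * (1 - p_null) / n)
    se_true = math.sqrt(p_true * (1 - p_true) / n)
    critical = p_null - z_alpha * se_null  # reject H0 if the observed rate falls below this
    return NormalDist().cdf((critical - p_true) / se_true)


n_standard, n_high = 550, 61
total = n_standard + n_high
print("women needed:", total, "recruitment target:", recruitment_target(total, 0.95))
print("approximate power:", round(power_rate_below(n_standard, 0.10, 0.067), 2))
```

Under these assumptions the approximate power comes out somewhat above 80%; setting `alpha=0.025` (the upper limit of a two-sided 95% interval) brings it below 80%, so the exact figure depends on how "95% confidence" is operationalized.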

Model generation

In total, six prediction algorithms were needed for the development of the new version of the QUiPP App: three for symptomatic women and three for asymptomatic high-risk women. The algorithm is selected according to whether the woman is asymptomatic but at high risk of sPTB (e.g. with a previous history of PTB, preterm prelabor rupture of membranes (PPROM), late miscarriage or cervical surgery) or symptomatic of TPTL (any risk status), and whether her risk assessment includes qfFN concentration alone, CL measurement alone or both test results. Data were therefore split and tested in six groups: asymptomatic high-risk women with (1) qfFN test, (2) CL measurement, and (3) both test results; and women with symptoms of TPTL (any risk status) with (4) qfFN test, (5) CL measurement, and (6) both test results. In this paper, we report findings from the validation of the three algorithms appropriate for the cohorts with symptoms of TPTL. Development and validation of the predictive models for asymptomatic high-risk women is reported in another paper by our group published in the same issue of the Journal23.
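The selection logic described above amounts to a small lookup on three facts: whether the woman is symptomatic, and which of the two tests are available. A minimal sketch follows; the enum, the function name and the decision to reject inputs with no test result are hypothetical, since the App's code and interface are not described in the paper.

```python
from enum import Enum


class Algorithm(Enum):
    # Numbering follows the six groups listed in the paragraph above
    ASYMPTOMATIC_QFFN = 1
    ASYMPTOMATIC_CL = 2
    ASYMPTOMATIC_BOTH = 3
    SYMPTOMATIC_QFFN = 4
    SYMPTOMATIC_CL = 5
    SYMPTOMATIC_BOTH = 6


def select_algorithm(symptomatic: bool, high_risk: bool,
                     has_qffn: bool, has_cl: bool) -> Algorithm:
    """Pick one of the six prediction models (hypothetical helper)."""
    if not (has_qffn or has_cl):
        raise ValueError("at least one test result (qfFN or CL) is required")
    if not symptomatic and not high_risk:
        raise ValueError("asymptomatic women are assessed only if at high risk")
    offset = 3 if symptomatic else 0  # symptomatic women: any risk status
    if has_qffn and has_cl:
        index = 3
    elif has_qffn:
        index = 1
    else:
        index = 2
    return Algorithm(offset + index)


print(select_algorithm(symptomatic=True, high_risk=False, has_qffn=True, has_cl=False))
```

Risk status only matters for asymptomatic women; for symptomatic women it is carried into the model as a risk factor rather than used to choose the algorithm.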

Women with incomplete data, invalid visits (outside gestational-age range or inappropriate symptoms), invalid or missing test results, sexual intercourse within the past 24 h or major fetal abnormality were excluded from the analysis. Women with a twin pregnancy were included, but triplets and higher-order multiples were excluded due to inadequate numbers. In twin pregnancies, the gestational age at delivery and the outcomes of the first baby were used in the analysis. Women whose labor was induced or who had Cesarean section following PPROM were regarded as having had sPTB.
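The exclusion rules and outcome coding above can be expressed as two small predicates. This is a sketch under stated assumptions: the record fields and onset labels are hypothetical (the study database schema is not described), gestation is held in days, and "incomplete data" is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical field names; not the Preterm Birth Studies database schema
    gestation_days: int            # gestational age at the assessment visit
    symptomatic: bool              # appropriate symptoms of TPTL
    test_result_valid: bool        # qfFN and/or CL result valid and present
    intercourse_within_24h: bool
    major_fetal_abnormality: bool
    fetuses: int                   # 1 = singleton, 2 = twins, ...


def eligible(v: Visit) -> bool:
    """Apply the exclusion criteria listed in the paragraph above."""
    in_range = 23 * 7 <= v.gestation_days <= 34 * 7 + 6  # 23 + 0 to 34 + 6 weeks
    return (in_range and v.symptomatic and v.test_result_valid
            and not v.intercourse_within_24h
            and not v.major_fetal_abnormality
            and v.fetuses <= 2)  # triplets and higher-order multiples excluded


def is_spontaneous_ptb(onset: str, pprom: bool, delivery_days: int) -> bool:
    """Code the outcome for the first (or only) baby.

    onset is 'spontaneous', 'induced' or 'cesarean' (hypothetical labels).
    Induction or Cesarean section following PPROM counts as sPTB.
    """
    if delivery_days >= 37 * 7:
        return False
    return onset == "spontaneous" or pprom


print(eligible(Visit(28 * 7, True, True, False, False, 2)))
print(is_spontaneous_ptb("induced", pprom=True, delivery_days=32 * 7))
```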

Women who had received an intervention intended to reduce the risk of sPTB (i.e. tocolysis, progesterone, cerclage or Arabin pessary) were not excluded from analysis. This is justified because, as this was not a randomized trial, any estimated treatment differences were likely to have been misleading, and could even have been in the wrong direction when compared with the true treatment effect. There is a treatment paradox such that patients undergoing treatment typically have more severe disease and worse outcomes than do those not selected for treatment, which could potentially create the impression that treatment has a negative effect on the condition. A decision to treat to prevent early delivery is therefore a marker of greater risk for early delivery, even if the treatment itself tends to lengthen the gestation. Looking solely at untreated women would mean selecting the women perceived as at lowest risk, and so underestimating the risk of prematurity. While not ideal, including all women irrespective of treatment gives us the best estimate of risk in the clinical setting in which the data were gathered. Additionally, if we included the changes associated with treatment in the App, there would be a real risk of clinicians using it to decide not to offer treatment, even in cases in which treatment would be beneficial.

Cox proportional hazards regression was used to determine which risk factors to include in the model. Factors tested included demographic characteristics (i.e. maternal age, body mass index (BMI), ethnicity, deprivation score and smoking), clinical risk factors (i.e. previous history of PTB or PPROM, previous late miscarriage, previous cervical surgery or twin pregnancy) and test results (qfFN and TVS-CL). Simpler regression methods were not adequate for the QUiPP App prediction models, which need to estimate time to delivery after testing precisely and produce smooth survival curves; parametric survival analysis was therefore used. This involved fitting several parametric survival distributions to the data, namely exponential, gamma, Gompertz, log-logistic, log-normal and Weibull.
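The models themselves were fitted in Stata, and the paper does not publish code. As a minimal, dependency-free sketch of the idea, assuming no covariates and only two of the six candidate distributions (exponential and Weibull), the following shows how right-censored observations enter a parametric likelihood and how the information criteria (AIC = 2p − 2 log L, BIC = p ln n − 2 log L) rank the fits. The follow-up times are invented, and the crude grid search over the Weibull shape stands in for a proper optimizer.

```python
import math


def exponential_loglik(times, events):
    """Maximised log-likelihood of an exponential model with right censoring."""
    d = sum(events)            # number of observed events
    total_time = sum(times)    # total follow-up time
    rate = d / total_time      # closed-form maximum-likelihood estimate
    return d * math.log(rate) - rate * total_time


def weibull_loglik_at_shape(times, events, shape):
    """Weibull log-likelihood at a fixed shape, with the scale profiled out.

    Cumulative hazard H(t) = lam * t**shape. Events contribute log h(t) - H(t);
    censored observations contribute only -H(t), i.e. log S(t).
    """
    d = sum(events)
    lam = d / sum(t ** shape for t in times)  # MLE of lam for this shape
    loglik = 0.0
    for t, event in zip(times, events):
        if event:
            loglik += math.log(lam * shape) + (shape - 1) * math.log(t)
        loglik -= lam * t ** shape
    return loglik


def weibull_loglik(times, events):
    """Crude grid search over the shape parameter (0.20 to 5.00)."""
    shapes = [0.2 + 0.01 * i for i in range(481)]
    return max(weibull_loglik_at_shape(times, events, k) for k in shapes)


def aic_bic(loglik, n_params, n_obs):
    """Lower values indicate a better trade-off between fit and complexity."""
    return 2 * n_params - 2 * loglik, n_params * math.log(n_obs) - 2 * loglik


# Invented follow-up times (weeks) and event indicators (1 = sPTB, 0 = censored)
times = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 12.0]
events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

fits = {
    "exponential": (exponential_loglik(times, events), 1),
    "Weibull": (weibull_loglik(times, events), 2),
}
for name, (ll, k) in fits.items():
    aic, bic = aic_bic(ll, k, len(times))
    print(f"{name}: logL={ll:.2f} AIC={aic:.2f} BIC={bic:.2f}")
best = min(fits, key=lambda m: aic_bic(*fits[m], len(times))[0])
print("lowest AIC:", best)
```

Because the exponential is the Weibull with shape 1, the Weibull log-likelihood can never be lower; the criteria decide whether the extra parameter earns its place.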

When women presented more than once in the same pregnancy for TPTL assessment, later results were introduced as time-updated covariates, i.e. if delivery had not occurred before the next visit, the predicted risk was recalculated based on the gestational age at that visit. In survival analysis, data are ‘censored’ if the outcome of interest has not occurred during the follow-up period24. In this study, data were censored if sPTB had not occurred by 37 weeks' gestation. Iatrogenic PTB was treated as a non-event, so data relating to women with this pregnancy outcome were also censored at 37 weeks. Fractional polynomials were used to check whether the data needed to be transformed before analysis. The entire procedure was repeated for each of the three combinations of predictors (i.e. risk factors plus qfFN only, risk factors plus CL only, and risk factors plus both tests), producing a different model in each case. The best models were then identified using Akaike's information criterion and the Bayesian information criterion, under which the model with the lowest value is considered to fit the data best25. These criteria were developed for comparing non-nested regression models, for which significance tests are not available.
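Time-updated covariates and the censoring rules above are commonly handled by splitting each pregnancy into (start, stop] intervals, each carrying the most recent test result. A sketch follows, assuming a gestational-age time scale in weeks (the paper does not state which time axis was used) and hypothetical names; it follows the stated rules that sPTB after 37 weeks does not count and that iatrogenic births are censored at 37 weeks.

```python
from dataclasses import dataclass
from typing import List, Tuple

CENSOR_WEEKS = 37.0


@dataclass
class Interval:
    start: float  # gestation (weeks) from which this covariate value applies
    stop: float   # gestation (weeks) at the next visit, sPTB or censoring
    qffn: float   # test result carried forward from the visit at `start`
    event: int    # 1 if sPTB occurred at `stop`, else 0


def to_intervals(visits: List[Tuple[float, float]], delivery_weeks: float,
                 spontaneous: bool) -> List[Interval]:
    """Split one pregnancy into intervals with a time-updated qfFN value.

    visits: (gestation in weeks, qfFN in ng/mL) for each assessment, any order.
    Births at or after 37 weeks, and iatrogenic births, are censored at 37 weeks.
    """
    visits = sorted(visits)
    is_event = spontaneous and delivery_weeks < CENSOR_WEEKS
    end = delivery_weeks if is_event else CENSOR_WEEKS
    intervals = []
    for i, (ga, qffn) in enumerate(visits):
        if ga >= end:
            break
        stop = min(visits[i + 1][0], end) if i + 1 < len(visits) else end
        last = stop == end
        intervals.append(Interval(ga, stop, qffn, int(is_event and last)))
    return intervals


for row in to_intervals([(28.0, 120.0), (30.0, 300.0)], delivery_weeks=31.0, spontaneous=True):
    print(row)
```

Only the final interval of a pregnancy can carry the event, so each woman contributes at most one sPTB while every visit still updates her covariates.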

None of the predetermined demographic factors (maternal age, BMI, ethnicity, deprivation score and smoking) affected prediction of sPTB in the models. This was not because they have no value as predictors in themselves, but because the other predictors (i.e. major risk factors, qfFN and CL) were much stronger, so addition of the demographic factors did not affect the overall score. In testing the model for women who had both fFN and CL test results, multivariate regression analysis showed that only previous cervical surgery provided additional predictive power to fFN and CL test results in women with symptoms of TPTL. However, the composite of risk factors used in the QUiPP App v.2 predictive algorithms for high-risk asymptomatic women (i.e. multiple pregnancy, history of sPTB or PPROM, or previous late miscarriage or cervical surgery) was tested to establish whether it affected the prediction in symptomatic women. There was little difference, so a decision was made to use this composite of risk factors for consistency.

The prediction models were then tested by simple calibration. This involved comparing observed event rates with predicted probabilities within clinically important groups, to confirm that the two were consistent. A predicted risk of 5% for sPTB within 7 days of testing was used as the threshold because it was the lowest value in the range of 5–15% suggested as the threshold for recommending intervention in our TPTL Delphi consensus survey26. The calibration tests provided reassurance that the models were acceptable for use in developing the QUiPP App v.2 before formal validation was undertaken.
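The paper does not detail the calibration tests, but the comparison it describes can be sketched as follows, assuming the groups are simply women above and below the 5% predicted-risk threshold. The toy inputs are invented; in practice, each group's observed rate would be compared with its mean predicted risk.

```python
def calibration_by_threshold(predicted, observed, threshold=0.05):
    """Compare mean predicted risk with the observed event rate on each side of a threshold.

    predicted: model risks in [0, 1]; observed: 1 if sPTB within 7 days, else 0.
    """
    groups = {"< threshold": [], ">= threshold": []}
    for p, y in zip(predicted, observed):
        key = ">= threshold" if p >= threshold else "< threshold"
        groups[key].append((p, y))
    summary = {}
    for key, rows in groups.items():
        if rows:  # skip empty groups
            n = len(rows)
            summary[key] = {
                "n": n,
                "mean_predicted": sum(p for p, _ in rows) / n,
                "observed_rate": sum(y for _, y in rows) / n,
            }
    return summary


toy = calibration_by_threshold([0.01, 0.02, 0.03, 0.10, 0.20], [0, 0, 0, 0, 1])
for group, stats in toy.items():
    print(group, stats)
```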

Model validation

Validation was carried out on a later subset of women from the PETRA study for whom outcomes were gathered between June 2017 and March 2018 (n = 506). Predictive statistics, including sensitivity, specificity, balanced accuracy ((sensitivity + specificity)/2), likelihood ratios, positive (PPV) and negative (NPV) predictive values, and separation probabilities (PPV + NPV − 100%), were calculated using a percentage risk of ≥ 5% as an indication of a positive test. This threshold was chosen, as in the calibration exercise, in accordance with the findings of our TPTL Delphi consensus survey26. Results are tabulated with statistics for both the training and validation sets, by test group (risk factors plus qfFN alone, CL alone or both tests), for prediction of sPTB at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks' post-testing. These time points were chosen because, first, the gestations at delivery are clinically important indicators for likely neonatal morbidity, and second, they are useful in guiding appropriate management, such as the timing of steroids. Receiver-operating-characteristics (ROC) curves were drawn and areas under the curve (AUC) were calculated.
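All the predictive statistics listed above follow from a 2 × 2 table of test result (risk ≥ 5%) against outcome. The sketch below computes them; the input counts (9 true positives, 24 false positives, 1 false negative, 238 true negatives) are not reported in the paper but are back-calculated from the Table 3 row for qfFN alone, sPTB < 30 weeks, validation set (272 tests, 10 events, sensitivity 90.0%, specificity 90.8%).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # sPTB, predicted risk >= 5%
    fp: int  # no sPTB, predicted risk >= 5%
    fn: int  # sPTB, predicted risk < 5%
    tn: int  # no sPTB, predicted risk < 5%


def predictive_statistics(t: TwoByTwo) -> dict:
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    ppv = t.tp / (t.tp + t.fp)
    npv = t.tn / (t.tn + t.fn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_negative": (1 - sens) / spec if spec > 0 else float("inf"),
        "ppv": ppv,
        "npv": npv,
        "separation": ppv + npv - 1,  # PPV + NPV - 100%, as a proportion
    }


stats = predictive_statistics(TwoByTwo(tp=9, fp=24, fn=1, tn=238))
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

These counts reproduce the reported row to its stated precision (PPV 27.3%, NPV 99.6%, LR+ 9.83, LR− 0.11). They also illustrate the comparison made in the Results: the NPV (238/239) exceeds the proportion of women without sPTB < 30 weeks (262/272, about 96.3%).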

RESULTS

During the study period, 1760 pregnancies between 23 + 0 and 34 + 6 weeks' gestation with symptoms of TPTL were seen. Following exclusions, the training dataset comprised 1173 observations from 1032 women with fFN test results and 229 observations from 204 women with both qfFN and CL measurements (Figure 1). The validation set comprised 588 observations (n = 506 women): 576 qfFN tests from 502 women, 155 CL measurements from 132 women, and 143 observations with both qfFN and CL measurements from 128 women. The training set included 41 sets of twins and the validation set 33 sets. Demographic characteristics and risk status of participants in the training and validation sets are shown in Table 1.

Figure 1. Flowchart showing inclusion of women with symptoms of threatened preterm labor (TPTL) in training and validation datasets for development and validation of predictive models for QUiPP App v.2. CL, cervical length; qfFN, quantitative fetal fibronectin.
Table 1. Demographic characteristics and risk factors for spontaneous preterm birth (sPTB) in women with symptoms of threatened preterm labor included in training and validation datasets for generation and validation of models for QUiPP App v.2

| Parameter | Training set (n = 1032) | Validation set (n = 506) | Both groups combined (n = 1538) |
|---|---|---|---|
| Maternal age (years) | 29.9 ± 5.7 | 29.8 ± 6.0 | 29.9 ± 5.8 |
| BMI (kg/m²)* | 26.1 ± 5.9 | 26.0 ± 6.1 | 26.1 ± 5.9 |
| IMD score† | 31.3 ± 13.5 | 30.2 ± 15.2 | 30.9 ± 14.1 |
| Ethnicity‡ | | | |
| European | 562 (54.9) | 326 (64.4) | 888 (58.0) |
| African or Caribbean | 70 (6.8) | 33 (6.5) | 103 (6.7) |
| Asian (India/Pakistan/Bangladesh) | 277 (27.1) | 92 (18.2) | 369 (24.1) |
| Other (incl. Chinese) | 115 (11.2) | 55 (10.9) | 170 (11.1) |
| Risk factor for sPTB | | | |
| Previous PTB < 37 weeks | 158 (15.3) | 83 (16.4) | 241 (15.7) |
| Previous PPROM < 37 weeks | 74 (7.2) | 34 (6.7) | 108 (7.0) |
| Previous late miscarriage | 79 (7.7) | 13 (2.6) | 92 (6.0) |
| Cervical surgery | 65 (6.3) | 26 (5.1) | 91 (5.9) |
| Twin pregnancy | 41 (4.0) | 33 (6.5) | 74 (4.8) |

  • Data are mean ± SD or n (%).
  • Data available for:
  • * 1025 patients in training and 506 in validation set;
  • † 947 in training and 504 in validation set;
  • ‡ 1024 patients in training and 506 in validation set.
  • BMI, body mass index; IMD, index of multiple deprivation; PTB, preterm birth; PPROM, preterm prelabor rupture of membranes.

In cases in which intervention status was known, 30.3% (310/1024) of women received steroids for fetal lung maturation and 8.2% (115/1405) received tocolysis. Most women (92.4%, 1292/1399) received no prophylactic intervention (e.g. cerclage or progesterone) for PTB risk.

The incidence of sPTB < 30, < 34 and < 37 weeks' gestation and < 1 and < 2 weeks after testing was similar between the training and validation sets for each test group (Table 2).

Table 2. Incidence of spontaneous preterm birth (sPTB) at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks from testing in training and validation sets, according to whether women underwent quantitative fetal fibronectin (qfFN) testing alone, cervical-length measurement (CL) alone or both tests

| sPTB | Training: tests (n) | Training: sPTB, n | Training: sPTB, % (95% CI) | Validation: tests (n) | Validation: sPTB, n | Validation: sPTB, % (95% CI) |
|---|---|---|---|---|---|---|
| qfFN alone | | | | | | |
| < 30 weeks* | 574 | 22 | 3.8 (2.4–5.7) | 272 | 10 | 3.7 (1.8–6.7) |
| < 34 weeks† | 1066 | 60 | 5.6 (4.3–7.2) | 520 | 26 | 5.0 (3.3–7.2) |
| < 37 weeks | 1173 | 144 | 12.3 (10.5–14.3) | 576 | 68 | 11.8 (9.3–14.7) |
| < 1 week from testing | 1173 | 15 | 1.3 (0.7–2.1) | 576 | 13 | 2.3 (1.2–3.8) |
| < 2 weeks from testing | 1173 | 38 | 3.2 (2.3–4.4) | 576 | 18 | 3.1 (1.9–4.9) |
| CL alone | | | | | | |
| < 30 weeks* | 147 | 17 | 11.6 (6.9–17.9) | 92 | 9 | 9.8 (4.6–17.8) |
| < 34 weeks† | 214 | 41 | 19.2 (14.1–25.1) | 150 | 17 | 11.3 (6.7–17.5) |
| < 37 weeks | 229 | 69 | 30.1 (24.3–36.5) | 155 | 32 | 20.6 (14.6–27.9) |
| < 1 week from testing | 229 | 8 | 3.5 (1.5–6.8) | 155 | 7 | 4.5 (1.8–9.1) |
| < 2 weeks from testing | 229 | 21 | 9.2 (5.8–13.7) | 155 | 8 | 5.2 (2.3–9.9) |
| qfFN and CL | | | | | | |
| < 30 weeks* | 147 | 17 | 11.6 (6.9–17.9) | 83 | 8 | 9.6 (4.3–18.1) |
| < 34 weeks† | 214 | 41 | 19.2 (14.1–25.1) | 138 | 16 | 11.6 (6.8–18.1) |
| < 37 weeks | 229 | 69 | 30.1 (24.3–36.5) | 143 | 31 | 21.7 (15.2–29.3) |
| < 1 week from testing | 229 | 8 | 3.5 (1.5–6.8) | 143 | 7 | 4.9 (2.0–9.8) |
| < 2 weeks from testing | 229 | 21 | 9.2 (5.8–13.7) | 143 | 8 | 5.6 (2.4–10.7) |

  • * Some women were recruited after 30 weeks and therefore not included here.
  • † Some women were recruited after 34 weeks and therefore not included here.
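Table 2 does not say how its 95% CIs were computed. Assuming the exact (Clopper–Pearson) binomial interval, a common choice for incidence proportions, the sketch below recovers an interval close to the first row (22 of 574 tests, 3.8% (2.4–5.7)). The bisection solver is a plain-Python stand-in for the beta-quantile formula usually used.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(x: int, n: int, level: float = 0.95) -> tuple:
    """Exact confidence interval for a binomial proportion x / n."""
    alpha = 1 - level

    def solve(f, target):
        # Bisection for f(p) = target, where f increases with p on (0, 1)
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: P(X <= x | p) = alpha / 2, i.e. 1 - cdf = 1 - alpha / 2
    upper = 1.0 if x == n else solve(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


lo, hi = clopper_pearson(22, 574)
print(f"{100 * 22 / 574:.1f}% ({100 * lo:.1f}-{100 * hi:.1f})")
```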

Predictive statistics

Three algorithms were developed so that the QUiPP App could be used in different symptomatic TPTL scenarios, i.e. when a woman has (1) qfFN testing alone, (2) CL measurement alone, or (3) both tests. The resulting prediction models yield formulae that provide individual risk scores based on risk factors and test results (Appendix S1).
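The actual formulae are in Appendix S1 and are not reproduced here. As a hedged illustration of how a fitted parametric survival model turns into a percentage risk, the sketch below assumes a Weibull proportional-hazards model on the gestational-age time scale with entirely hypothetical coefficients. The risk of sPTB before a horizon, given that the woman is undelivered at testing, is the conditional probability 1 − S(horizon)/S(testing).

```python
import math

# Hypothetical parameters for illustration only; NOT the QUiPP App v.2 coefficients
SHAPE = 6.0             # Weibull shape
BASE_SCALE = 55.0       # baseline Weibull scale, in weeks of gestation
BETA_LOG_QFFN = 0.3     # log hazard ratio per unit of log(qfFN + 1)
BETA_RISK_FACTOR = 0.7  # log hazard ratio for any major risk factor


def survival(weeks: float, linear_predictor: float) -> float:
    """Weibull proportional-hazards survival: exp(-(t / scale)**shape * exp(lp))."""
    return math.exp(-((weeks / BASE_SCALE) ** SHAPE) * math.exp(linear_predictor))


def conditional_risk(test_ga: float, horizon_ga: float, qffn: float, risk_factor: int) -> float:
    """P(sPTB before horizon_ga | undelivered at test_ga)."""
    if horizon_ga <= test_ga:
        return 0.0
    lp = BETA_LOG_QFFN * math.log(qffn + 1) + BETA_RISK_FACTOR * risk_factor
    return 1 - survival(horizon_ga, lp) / survival(test_ga, lp)


test_ga = 28.0
horizons = [("< 30 weeks", 30.0), ("< 34 weeks", 34.0), ("< 37 weeks", 37.0),
            ("within 1 week", test_ga + 1), ("within 2 weeks", test_ga + 2)]
for label, horizon in horizons:
    risk = conditional_risk(test_ga, horizon, qffn=250, risk_factor=1)
    print(f"{label}: {100 * risk:.1f}%")
```

Conditioning on survival to the test date matters: the same model gives a different risk for "< 34 weeks" at a 24-week visit than at a 32-week visit.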

Table 3 presents predictive statistics of the three algorithms tested on the training and validation sets for prediction of sPTB at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks of testing. We observed a reasonable similarity between the training and validation sets at most outcome time points and for each combination of predictors. In the qfFN group (the largest group), the ability of the algorithm to predict sPTB at < 30 weeks' gestation had the highest balanced accuracy with a sensitivity of 90.0%, specificity of 90.8%, positive likelihood ratio of 9.83, negative likelihood ratio of 0.11, PPV of 27.3% and NPV of 99.6% in the validation set. Although NPV is always high when prevalence is low, the finding that the NPV is greater than the overall proportion of women unaffected, as is the case for all combinations of predictors for sPTB < 30 weeks' gestation (risk factors plus qfFN alone: 99.6% vs 96.3%; risk factors plus CL alone: 98.1% vs 90.2%; risk factors plus qfFN and CL: 100% vs 90.4%), demonstrates the usefulness of the QUiPP App as a predictive test.

Table 3. Predictive statistics for spontaneous preterm birth (sPTB) at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks from testing for QUiPP App v.2 algorithms used in women with cervicovaginal fluid quantitative fetal fibronectin (qfFN) results alone, transvaginal ultrasound cervical-length (CL) measurement alone or both test results
sPTB Training set Validation set
Sens (%) Spec (%) Balanced accuracy* LR+ LR− PPV (%) NPV (%) Sens (%) Spec (%) Balanced accuracy* LR+ LR− PPV (%) NPV (%)
qfFN alone
< 30 weeks 81.8(59.7–94.8) 92.9(90.5–94.9) 87.38(76.66–93.58) 11.58(8.07–16.62) 0.20(0.08–0.47) 31.6(19.9–45.2) 99.2(98.0–99.8) 90.0(55.5–99.7) 90.8(86.7–94.0) 90.42(75.96–96.57) 9.83(6.37–15.16) 0.11(0.02–0.71) 27.3(13.3–45.5) 99.6(97.7–100)
< 34 weeks 80.0(67.7–89.2) 74.2(71.3–76.8) 77.08(71.41–81.91) 3.10(2.63–3.65) 0.27(0.16–0.45) 15.6(11.7–20.1) 98.4(97.3–99.2) 84.6(65.1–95.6) 70.9(66.6–74.8) 77.73(69.54–84.22) 2.90(2.34–3.60) 0.22(0.09–0.54) 13.3(8.5–19.4) 98.9(97.1–99.7)
< 37 weeks 82.6(75.4–88.4) 62.7(59.6–65.6) 72.66(68.98–76.05) 2.21(1.99–2.47) 0.28(0.19–0.40) 23.7(20.0–27.6) 96.3(94.5–97.6) 80.9(69.5–89.4) 56.9(52.5–61.2) 68.89(63.28–73.99) 1.88(1.61–2.19) 0.34(0.20–0.55) 20.1(15.5–25.3) 95.7(92.8–97.7)
< 1 week from testing 73.3(44.9–92.2) 94.0(92.4–95.3) 83.64(68.78–92.23) 12.13(8.29–17.75) 0.28(0.12–0.66) 13.6(7.0–23.0) 99.6(99.1–99.9) 53.8(25.1–80.8) 92.0(89.5–94.1) 72.93(56.27–84.94) 6.74(3.79–11.98) 0.50(0.28–0.90) 13.5(5.6–25.8) 98.9(97.5–99.6)
< 2 weeks from testing 81.6(65.7–92.3) 86.6(84.5–88.5) 84.09(76.81–89.41) 6.09(4.93–7.53) 0.21(0.11–0.42) 16.9(11.8–23.2) 99.3(98.5–99.7) 83.3(58.6–96.4) 84.2(80.9–87.2) 83.78(73.07–90.77) 5.28(3.99–7.00) 0.20(0.07–0.56) 14.6(8.4–22.9) 99.4(98.2–99.9)
CL alone
< 30 weeks 94.1(71.3–99.9) 63.8(55.0–72.1) 78.98(69.94–85.86) 2.60(2.01–3.37) 0.09(0.01–0.62) 25.4(15.3–37.9) 98.8(93.5–100) 88.9(51.8–99.7) 61.4(50.1–71.9) 75.17(60.75–85.55) 2.31(1.61–3.29) 0.18(0.03–1.16) 20.0(9.1–35.6) 98.1(89.7–100)
< 34 weeks 92.7(80.1–98.5) 35.8(28.7–43.5) 64.26(56.75–71.13) 1.44(1.25–1.66) 0.20(0.07–0.62) 25.5(18.7–33.3) 95.4(87.1–99.0) 100(80.5–100) 34.6(26.6–43.3) 67.29(57.50–75.78) 1.53(1.35–1.73) 0.00 16.3(9.8–24.9) 100(92.3–100)
< 37 weeks 100(94.8–100) 8.1(4.4–13.5) 54.06(47.19–60.78) 1.09(1.04–1.14) 0.00 31.9(25.8–38.6) 100(75.3–100) 100(89.1–100) 5.7(2.3–11.4) 52.85(43.43–62.06) 1.06(1.02–1.11) 0.00 21.6(15.3–29.1) 100(59.0–100)
< 1 week from testing 87.5(47.3–99.7) 81.0(75.2–85.9) 84.25(68.68–92.88) 4.60(3.16–6.72) 0.15(0.02–0.97) 14.3(5.9–27.2) 99.4(96.9–100) 57.1(18.4–90.1) 78.4(70.9–84.7) 67.76(46.69–83.45) 2.64(1.30–5.38) 0.55(0.23–1.29) 11.1(3.1–26.1) 97.5(92.8–99.5)
< 2 weeks from testing 81.0(58.1–94.6) 66.8(60.0–73.2) 73.89(63.78–81.97) 2.44(1.84–3.24) 0.29(0.12–0.69) 19.8(12.0–29.8) 97.2(93.0–99.2) 75.0(34.9–96.8) 63.3(54.9–71.1) 69.13(51.77–82.37) 2.04(1.30–3.21) 0.40(0.12–1.32) 10.0(3.8–20.5) 97.9(92.6–99.7)
qfFN and CL
< 30 weeks 100(80.5–100) 68.5(59.7–76.3) 84.23(77.72–89.11) 3.17(2.46–4.08) 0.00 29.3(18.1–42.7) 100(95.9–100) 100(63.1–100) 60.0(48.0–71.1) 80.00(69.22–87.68) 2.50(1.89–3.30) 0.00 21.1(9.6–37.3) 100(92.1–100)
< 34 weeks 92.7(80.1–98.5) 44.5(37.0–52.2) 68.60(61.40–74.99) 1.67(1.43–1.96) 0.16(0.05–0.49) 28.4(20.9–36.8) 96.3(89.4–99.2) 93.8(69.8–99.8) 39.3(30.6–48.6) 66.55(55.66–75.91) 1.55(1.28–1.87) 0.16(0.02–1.07) 16.9(9.8–26.3) 98.0(89.1–99.9)
< 37 weeks 94.2(85.8–98.4) 16.3(10.9–22.9) 55.23(48.37–61.89) 1.12(1.03–1.23) 0.36(0.13–0.98) 32.7(26.2–39.7) 86.7(69.3–96.2) 100(88.8–100) 16.1(9.8–24.2) 58.04(48.82–66.72) 1.19(1.10–1.29) 0.00 24.8(17.5–33.3) 100(81.5–100)
< 1 week from testing 100(63.1–100) 78.7(72.7–83.9) 89.37(83.77–91.19) 4.70(3.65–6.06) 0.00 14.5(6.5–26.7) 100(97.9–100) 85.7(42.1–99.6) 75.7(67.6–82.7) 80.72(63.55–90.96) 3.53(2.31–5.40) 0.19(0.03–1.16) 15.4(5.9–30.5) 99.0(94.8–100)
< 2 weeks from testing 90.5(69.6–98.8) 69.2(62.5–75.4) 79.85(71.41–86.28) 2.94(2.30–3.76) 0.14(0.04–0.52) 22.9(14.4–33.4) 98.6(95.1–99.8) 100(63.1–100) 60.0(51.2–68.3) 80.00(70.46–87.03) 2.50(2.03–3.07) 0.00 12.9(5.7–23.9) 100(95.5–100)

  • Values in parentheses are 95% CI.
  • For each gestational time point, an individual percentage risk of ≥ 5% was considered a positive test result.
  • * Balanced accuracy, calculated as (Sens + Spec)/2.
  • LR−/+, negative/positive likelihood ratio; NPV, negative predictive value; PPV, positive predictive value; Sens, sensitivity; Spec, specificity.
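The footnote definitions can be made concrete with a short sketch that derives each point estimate in Table 3 from a 2×2 table of test result (risk ≥ 5% vs < 5%) against outcome. The helper name `diagnostic_stats` and the counts in the usage example are hypothetical, and the 95% CIs that Table 3 also reports are not computed here.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticStats:
    sensitivity: float
    specificity: float
    balanced_accuracy: float
    lr_positive: float
    lr_negative: float
    ppv: float
    npv: float


def diagnostic_stats(tp: int, fp: int, fn: int, tn: int) -> DiagnosticStats:
    """Point estimates from a 2x2 table (hypothetical helper, not from the paper).

    tp: risk >= 5% and sPTB;  fp: risk >= 5%, no sPTB;
    fn: risk < 5% and sPTB;   tn: risk < 5%, no sPTB.
    Assumes every row and column total is non-zero.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return DiagnosticStats(
        sensitivity=sens,
        specificity=spec,
        balanced_accuracy=(sens + spec) / 2,
        # LR+ is unbounded when specificity is 1
        lr_positive=sens / (1 - spec) if spec < 1 else float("inf"),
        # LR- is 0 when sensitivity is 1, as in several Table 3 rows
        lr_negative=(1 - sens) / spec,
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# Hypothetical counts, not study data
print(diagnostic_stats(tp=9, fp=30, fn=1, tn=160))
```

A sensitivity of 100% forces LR− to 0.00, which is why several rows above report LR− without a confidence interval.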

While the balanced accuracy statistic (Table 3) summarizes sensitivity and specificity at the single ≥ 5% risk cut-off for a positive test, the ROC curves in Figures 2–4 show the overall performance of each algorithm in the validation set across all percentage-risk thresholds. The area under the ROC curve (AUC) measures how well the percentage risk distinguishes between two groups, in this case women who did and did not have sPTB.
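One way to read the AUC: it equals the probability that a randomly chosen woman who had sPTB was assigned a higher percentage risk than a randomly chosen woman who did not, with ties counted as one half (the Mann–Whitney interpretation). The sketch below computes it directly from that definition. The function name and the risk values are hypothetical, and the pairwise loop is meant for illustration rather than as the software used in the study.

```python
from itertools import product


def auc_by_pairs(risks_sptb: list[float], risks_no_sptb: list[float]) -> float:
    """AUC as P(risk of a case > risk of a non-case), ties counted as 0.5."""
    if not risks_sptb or not risks_no_sptb:
        raise ValueError("need at least one woman in each outcome group")
    score = 0.0
    for case, non_case in product(risks_sptb, risks_no_sptb):
        if case > non_case:
            score += 1.0
        elif case == non_case:
            score += 0.5
    return score / (len(risks_sptb) * len(risks_no_sptb))


# Hypothetical percentage risks, not study data
cases = [42.0, 18.5, 7.0]
non_cases = [1.2, 3.0, 7.0, 0.8, 12.0]
print(auc_by_pairs(cases, non_cases))  # prints 0.9 (13.5 of 15 pairs)
```

An AUC of 0.5 means the risk score ranks cases no better than chance; 1.0 means every case outranks every non-case.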

Figure 2 Receiver-operating-characteristics (ROC) curves showing prediction by QUiPP App v.2 of spontaneous preterm birth at < 30 (a), < 34 (b) and < 37 (c) weeks' gestation and within 1 (d) and 2 (e) weeks after testing in patients from the validation set who underwent quantitative fetal fibronectin testing alone. Area under ROC curve: (a) 0.96 (95% CI, 0.94–0.99), n = 272, standard error (SE) = 0.013; (b) 0.85 (95% CI, 0.78–0.92), n = 520, SE = 0.035; (c) 0.77 (95% CI, 0.71–0.83), n = 576, SE = 0.030; (d) 0.91 (95% CI, 0.87–0.96), n = 576, SE = 0.022; (e) 0.92 (95% CI, 0.88–0.96), n = 576, SE = 0.022.
Figure 3 Receiver-operating-characteristics (ROC) curves showing prediction by QUiPP App v.2 of spontaneous preterm birth at < 30 (a), < 34 (b) and < 37 (c) weeks' gestation and within 1 (d) and 2 (e) weeks after testing in patients from the validation set who underwent cervical-length measurement alone. Area under ROC curve: (a) 0.86 (95% CI, 0.71–0.99), n = 92, standard error (SE) = 0.073; (b) 0.79 (95% CI, 0.69–0.90), n = 150, SE = 0.053; (c) 0.72 (95% CI, 0.62–0.82), n = 155, SE = 0.050; (d) 0.70 (95% CI, 0.45–0.95), n = 155, SE = 0.127; (e) 0.73 (95% CI, 0.51–0.95), n = 155, SE = 0.114.
Figure 4 Receiver-operating-characteristics (ROC) curves showing prediction by QUiPP App v.2 of spontaneous preterm birth at < 30 (a), < 34 (b) and < 37 (c) weeks' gestation and within 1 (d) and 2 (e) weeks after testing in patients from the validation set who had both quantitative fetal fibronectin testing and cervical-length measurement. Area under ROC curve: (a) 0.95 (95% CI, 0.90–1.00), n = 83, standard error (SE) = 0.028; (b) 0.83 (95% CI, 0.73–0.93), n = 138, SE = 0.052; (c) 0.73 (95% CI, 0.63–0.83), n = 143, SE = 0.051; (d) 0.88 (95% CI, 0.77–0.98), n = 143, SE = 0.056; (e) 0.89 (95% CI, 0.79–0.99), n = 143, SE = 0.049.
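The 95% CIs in the captions appear consistent with a normal-approximation (Wald) interval, AUC ± 1.96 × SE, truncated at 1. This is an inference from the published numbers, not a method the paper states, and because the AUCs and SEs are rounded, a recomputed limit can differ from the published one by about 0.01. A quick check on three caption values:

```python
def wald_ci(auc: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% CI for an AUC, clipped to [0, 1].

    Assumption: this is how the published intervals were formed;
    the paper does not say so explicitly.
    """
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# (AUC, SE, published CI) taken from the ROC captions above
published = [
    (0.85, 0.035, (0.78, 0.92)),  # qfFN alone, < 34 weeks
    (0.70, 0.127, (0.45, 0.95)),  # CL alone, < 1 week after testing
    (0.95, 0.028, (0.90, 1.00)),  # qfFN and CL, < 30 weeks
]

for auc, se, (lo, hi) in published:
    low, high = wald_ci(auc, se)
    # Rounded inputs mean agreement is only expected to about +/-0.01
    print(f"AUC {auc:.2f}: recomputed {low:.3f}-{high:.3f}, published {lo:.2f}-{hi:.2f}")
```

The truncation matters only for AUCs close to 1, such as the combined-test AUC at < 30 weeks, whose unclipped upper limit would exceed 1.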

For the algorithm based on risk factors and qfFN alone, the AUC for predicting sPTB at < 30 weeks' gestation (0.96) indicates good prediction, with similarly high AUCs for predicting sPTB within 1 and 2 weeks after testing (0.91 and 0.92, respectively). The algorithm using CL alone performed best for the prediction of sPTB < 30 weeks (AUC, 0.86), but its performance was inferior to that of the algorithm using qfFN results alone. When qfFN and CL results were combined, prediction improved relative to CL alone but remained inferior to that of the algorithm using qfFN results alone at all time points.

To compare directly the predictive ability of the different combinations of predictors, we compared AUCs in the subset of the validation set comprising women who underwent both tests (Table 4 and Figure S2). Although adding CL to qfFN appeared to be useful, these comparisons showed no significant difference between either individual test and the combination of the two tests for predicting sPTB at < 30, < 34 and < 37 weeks' gestation. For prediction of sPTB within 1 and 2 weeks after testing, however, qfFN alone appeared to be a better predictor than CL alone, and its AUC did not differ significantly from that of the combination of qfFN and CL results. CL alone showed reduced ability to predict sPTB within 1 and 2 weeks after testing, with AUCs of 0.698 and 0.731, respectively, significantly lower than those of the combined algorithm (P = 0.01 and P = 0.02). The number of women included in this analysis (i.e. those who had both qfFN and CL tests) was small, particularly for the prediction of sPTB < 30 weeks (n = 83), so these results should be interpreted with caution.

Table 4. Comparison of areas under the receiver-operating-characteristics curves (AUC) for prediction by QUiPP App v.2 of spontaneous preterm birth (sPTB) at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks after testing, in women who underwent both quantitative fetal fibronectin (qfFN) testing and cervical length (CL) measurement, using results of qfFN test alone, CL alone and both tests combined
Algorithm AUC (95% CI) Standard error P > χ2*
sPTB < 30 weeks (n = 83)
Both qfFN and CL 0.953 (0.899–1.000) 0.028 Standard
qfFN alone 0.907 (0.832–0.982) 0.038 0.14
CL alone 0.848 (0.693–1.000) 0.079 0.17
sPTB < 34 weeks (n = 138)
Both qfFN and CL 0.831 (0.729–0.933) 0.052 Standard
qfFN alone 0.783 (0.662–0.905) 0.062 0.09
CL alone 0.789 (0.683–0.896) 0.055 0.35
sPTB < 37 weeks (n = 143)
Both qfFN and CL 0.731 (0.630–0.831) 0.051 Standard
qfFN alone 0.692 (0.589–0.796) 0.053 0.24
CL alone 0.719 (0.619–0.819) 0.050 0.75
sPTB < 1 week post-testing (n = 143)
Both qfFN and CL 0.875 (0.766–0.984) 0.056 Standard
qfFN alone 0.893 (0.811–0.975) 0.042 0.65
CL alone 0.698 (0.450–0.945) 0.126 0.01
sPTB < 2 weeks post-testing (n = 143)
Both qfFN and CL 0.889 (0.793–0.985) 0.049 Standard
qfFN alone 0.904 (0.833–0.975) 0.036 0.67
CL alone 0.731 (0.510–0.951) 0.113 0.02
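  • * P > χ2: P-value from the χ2 test comparing the AUC of each single-test algorithm with that of the combined qfFN and CL algorithm (the reference, labelled 'Standard').
  • n indicates number of observations; n is smaller for the earlier gestational outcomes because some women were recruited at later gestations.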
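Because the three algorithms in Table 4 are evaluated on the same women, their AUCs are correlated, and a valid comparison has to account for that pairing. The paper does not name the test it used; one widely used option is the nonparametric method of DeLong et al. (1988), sketched below with NumPy under that assumption. The function names and toy data are illustrative only. The squared z statistic is a χ2 statistic with 1 degree of freedom, matching the form of the P > χ2 column.

```python
import math

import numpy as np


def _components(pos: np.ndarray, neg: np.ndarray):
    """AUC plus DeLong structural components for one score vector."""
    diff = pos[:, None] - neg[None, :]
    # psi = 1 if the case scores higher, 0.5 for a tie, 0 otherwise
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)


def delong_paired(y, score_a, score_b):
    """Two-sided DeLong test of AUC(a) == AUC(b) on the same subjects.

    Returns (auc_a, auc_b, chi2 with 1 df, p-value).
    """
    y = np.asarray(y, dtype=bool)
    a = np.asarray(score_a, dtype=float)
    b = np.asarray(score_b, dtype=float)
    m, n = int(y.sum()), int((~y).sum())
    if m < 2 or n < 2:
        raise ValueError("need at least two cases and two non-cases")
    auc_a, v10_a, v01_a = _components(a[y], a[~y])
    auc_b, v10_b, v01_b = _components(b[y], b[~y])
    # 2x2 covariance matrix of the two paired AUC estimates
    s = np.cov(np.vstack([v10_a, v10_b])) / m + np.cov(np.vstack([v01_a, v01_b])) / n
    var_diff = s[0, 0] + s[1, 1] - 2 * s[0, 1]
    if var_diff <= 0:
        raise ValueError("AUC difference has zero variance (identical rankings?)")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    # Two-sided normal p-value, equal to P(chi2_1 > z**2)
    return auc_a, auc_b, z * z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy data: 1 = sPTB, scores from two algorithms on the same women
y = [1, 1, 1, 0, 0, 0, 0]
combined = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.4]
single = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1, 0.4]
print(delong_paired(y, combined, single))
```

With samples as small as the < 30 weeks subset (n = 83, few events), such normal approximations are themselves imprecise, which reinforces the caution expressed above.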

As the model development included data from women who received interventions intended to reduce the likelihood of sPTB (i.e. tocolysis, progesterone, cerclage and Arabin pessary), we wanted to confirm that risk was not underestimated. We therefore compared the performance of the models between women who received these interventions and those who did not. In women who received an intervention, the performance of the App was either similar to that in women without intervention or showed significantly poorer discrimination (i.e. smaller AUC) or higher risk (odds ratio > 1 by logistic regression).

DISCUSSION

Our findings demonstrate an improved ability of the new QUiPP App algorithms to predict sPTB in women with symptoms of TPTL compared with the algorithms created for the first version of the App [21]. The previous version included data from 382 symptomatic women (190 in the training set and 192 in the validation set) and used only the combination of risk factors and qfFN test results, as TVS-CL data were unavailable. The ability of the previous model to predict sPTB at < 30, < 34 and < 37 weeks' gestation and within 2 and 4 weeks of testing was demonstrated by predictive statistics using a threshold of > 10% to indicate a positive result. For the new algorithms created for QUiPP App v.2, prediction of sPTB at < 30, < 34 and < 37 weeks' gestation and within 1 and 2 weeks of testing was investigated using a threshold of ≥ 5%, in a substantially larger cohort (1032 in the training set and 506 in the validation set). Although the new algorithms can be compared with the previous ones only for the qfFN group, our findings demonstrate a significant increase in sensitivity (the test's ability to correctly identify women who will have sPTB) at all outcome time points. PPVs, i.e. the probability that a woman with a positive test (in this case, a ≥ 5% or > 10% risk of sPTB) will have sPTB, were lower in the current cohort, while NPVs, i.e. the probability that a woman with a negative test (percentage risk < 5% or ≤ 10%) will not have sPTB, were similar to those in the previous cohort [21]. Unlike sensitivity and specificity, PPV and NPV depend on prevalence: when prevalence is low, NPV tends to be high and PPV low. In the study of Kuhrt et al. [21], the prevalence of sPTB was higher at all time points, so it is not surprising that the PPVs in that validation set were higher than those in the current one.
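The dependence of PPV and NPV on prevalence follows from Bayes' theorem. With sensitivity (Sens) and specificity (Spec) held fixed and prevalence p, PPV = Sens·p / (Sens·p + (1 − Spec)(1 − p)) and NPV = Spec(1 − p) / (Spec(1 − p) + (1 − Sens)p). The sketch below applies these formulas; the sensitivity, specificity and prevalence values are hypothetical and are not those of either cohort.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from Bayes' theorem (all arguments as proportions)."""
    tp = sens * prevalence                 # true positives per woman tested
    fp = (1 - spec) * (1 - prevalence)     # false positives
    tn = spec * (1 - prevalence)           # true negatives
    fn = (1 - sens) * prevalence           # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test characteristics at different prevalences
for prev in (0.02, 0.05, 0.10, 0.20):
    ppv, npv = predictive_values(sens=0.80, spec=0.85, prevalence=prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Holding test characteristics fixed, PPV rises and NPV falls as prevalence increases, which is the pattern invoked above when comparing this cohort with the earlier, higher-prevalence one.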

The AUCs demonstrating the predictive ability of the QUiPP App v.2 algorithms can also be compared with our previous findings [21]. In that earlier study, we found AUCs of 0.88, 0.83, 0.77, 0.77 and 0.78 for prediction of sPTB at < 30, < 34 and < 37 weeks' gestation and within 2 and 4 weeks of testing, respectively. This represented an overall improvement compared with an earlier systematic review of the ability of fFN to predict sPTB [15], which included data from 40 studies and 26 876 women and reported AUCs of 0.71 and 0.77 for prediction by fFN of sPTB at < 34 and < 37 weeks' gestation, respectively. In the QUiPP App v.2 validation set (qfFN alone), AUCs were higher than previously reported [15, 21] at all time points except < 37 weeks, at which the AUC matched both earlier estimates (AUC: 0.96 at < 30 weeks; 0.85 at < 34 weeks; 0.77 at < 37 weeks; 0.91 at < 1 week post-testing and 0.92 at < 2 weeks post-testing).

Comparison of the predictive statistics of the new algorithms with our earlier work in symptomatic women [21] demonstrates improved prediction, but the earlier analysis was based on algorithms developed using risk factors and qfFN only. The QUiPP App v.2 algorithms were created and validated for predicting sPTB using risk factors in combination with qfFN alone, TVS-CL alone or both tests. Being able to predict sPTB from risk factors and either or both tests increases the utility and flexibility of the App: it can be used when fFN testing is unavailable, and TVS-CL is becoming increasingly available as training becomes more widespread.

When comparing the models between women who received interventions to reduce the risk of sPTB and those who did not, we found little difference, or reduced AUCs in the higher-risk group receiving intervention. Such a reduction in AUC is typical when a model is evaluated within a more homogeneous subgroup, in which predicted risks span a narrower range, and does not by itself indicate poor model performance.
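The case-mix effect described here, where AUC falls within a subgroup whose risks span a narrower range even though the model is unchanged, can be illustrated with a small simulation. Everything in it is hypothetical: a latent risk score, outcomes drawn from a perfectly calibrated logistic model, and a 'higher-risk' subgroup defined by an arbitrary cut on that score.

```python
import numpy as np


def rank_auc(scores: np.ndarray, outcome: np.ndarray) -> float:
    """Mann-Whitney AUC via ranks (assumes continuous scores, i.e. no ties)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    m = outcome.sum()
    n = len(outcome) - m
    return (ranks[outcome].sum() - m * (m + 1) / 2) / (m * n)


rng = np.random.default_rng(0)
latent = rng.normal(0.0, 1.5, size=200_000)   # hypothetical risk score
risk = 1 / (1 + np.exp(-(latent - 3.0)))      # model-predicted probability
outcome = rng.random(latent.size) < risk      # outcomes follow the model exactly

high_risk = latent > 1.0                      # arbitrary 'higher-risk' subgroup
auc_all = rank_auc(risk, outcome)
auc_sub = rank_auc(risk[high_risk], outcome[high_risk])
print(f"whole cohort AUC {auc_all:.2f}; higher-risk subgroup AUC {auc_sub:.2f}")
```

Because the outcomes are generated from the predicted risks themselves, the lower subgroup AUC reflects only the narrower spread of risk, not any miscalibration of the model.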

Limitations

As in other studies of TPTL, the prevalence of PTB in our cohort was low: only 17 women (1.6%) of the total PETRA cohort (n = 1037) delivered at < 30 weeks' gestation. One reason for the low prevalence was that, because of the prospective study design, women had to be recruited before the outcome (gestation at delivery) was known. Many women whose TPTL symptoms progressed quickly to established labor may have been missed because research staff were unable to approach them before delivery. Despite the low prevalence, however, the overall cohort is larger than those previously reported, so the number of events is greater, which allows increased confidence in the findings.

Implications for practice

The UK's National Institute for Health and Care Excellence (NICE) Preterm Birth guideline [27] recommends that, in women over 30 weeks' gestation, TVS-CL should be offered first, followed by fFN testing only if TVS-CL is unavailable. Combining both tests is not recommended. While some investigators have found added value in combining the tests [28–32], others have not [33, 34]. In this project, we also examined the effect on predictive ability of combining CL with qfFN. Our findings indicated that combining the two tests did not significantly improve prediction compared with qfFN alone, whereas TVS-CL alone was inferior to the two tests combined in predicting sPTB within 1 (AUC 0.698 vs 0.875, P = 0.01) or 2 (AUC 0.731 vs 0.889, P = 0.02) weeks after testing; qfFN alone performed as well as the combination at these time points (P = 0.65 and P = 0.67, respectively). This suggests that fFN has superior predictive ability and, on the basis of these findings, fFN should be recommended over TVS-CL as the first-choice test in TPTL.

For women with suspected preterm labor under 30 weeks' gestation, NICE recommends a ‘treat all’ strategy, without reference to either fFN or CL tests. Using the QUiPP App, we modelled the effect of this strategy in a cohort of 188 symptomatic women at < 30 weeks' gestation and found that 90% (n = 169) of hospital admissions could have been safely avoided if a threshold of 5% risk of delivery within 7 days had been used to guide clinical practice [35].
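The admission-avoidance figure can be understood as a simple threshold rule: women whose predicted risk of delivery within 7 days is below 5% count as avoidable admissions, and any of them who did deliver within 7 days count as missed. The sketch below is an illustrative reconstruction under that assumption, not the analysis of the cited study [35], whose method may differ; the function name and the ten-woman cohort are hypothetical.

```python
from typing import NamedTuple


class TriageResult(NamedTuple):
    avoidable_admissions: int    # women with risk below the threshold
    missed_deliveries: int       # below threshold but delivered within 7 days
    proportion_avoidable: float


def triage(risks_7d: list[float], delivered_7d: list[bool], threshold: float = 0.05) -> TriageResult:
    """Count admissions a risk threshold would avoid, and deliveries it would miss."""
    if len(risks_7d) != len(delivered_7d) or not risks_7d:
        raise ValueError("need equal-length, non-empty inputs")
    below = [d for r, d in zip(risks_7d, delivered_7d) if r < threshold]
    return TriageResult(len(below), sum(below), len(below) / len(risks_7d))


# Hypothetical cohort of ten symptomatic women (risk as a proportion)
risks = [0.01, 0.02, 0.03, 0.01, 0.04, 0.02, 0.12, 0.30, 0.01, 0.06]
delivered = [False, False, False, False, False, False, True, True, False, False]
print(triage(risks, delivered))
```

An admission counts as "safely" avoided only when `missed_deliveries` is zero (or acceptably small) at the chosen threshold; in the published cohort, 169 of 188 women (90%) fell below 5%.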

Conclusions

The QUiPP App v.2 is a reliable, simple-to-use tool that combines risk factors and test results into a single percentage risk score. Its use could increase confidence in management decisions and improve the targeting and timing of interventions to reduce sPTB and its associated morbidities, while limiting unnecessary interventions and women's anxiety. The ability of the new algorithms to predict sPTB < 30 weeks is particularly important and should inform revision of the current NICE ‘treat all’ strategy for symptomatic women at < 30 weeks. Results of the EQUIPTT trial [36] may also provide evidence for review of the NICE recommendations.

ACKNOWLEDGMENTS

The authors wish to thank all the women who took part in this study and Tommy's charity, which supports all the research in the Department of Women and Children's Health at St Thomas' Hospital. This is a summary of independent research funded by the National Institute for Health Research (NIHR)'s NIHR/HEE CAT Clinical Doctoral Research Fellowship Programme (Ref. CDRF-2013-04-026). P.T.S. is partly funded by Tommy's (Registered Charity No. 1060508) and by NIHR Collaboration for Leadership in Applied Health Research and Care, South London. J.S. is a National Institute for Health Research (NIHR) Senior Investigator and also supported by the NIHR Collaboration for Leadership in Applied Health Research and Care, South London at King's College Hospital NHS Foundation Trust. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This research is supported by the NIHR BRC at Guy's & St Thomas' NHS Foundation Trust and King's College London, and at University College London Hospitals NHS Foundation Trust and University College London.