Relative accuracy of computerized intrapartum fetal heart rate pattern recognition by ultrasound and abdominal electrocardiogram detection

Introduction: Noninvasive fetal heart rate monitoring using transabdominal


| INTRODUCTION
[2][3][4][5][6][7][8] We demonstrated previously that intrapartum fetal heart rate (FHR) detection using fetal ECG signals obtained by maternal abdominal surface electrodes (afECG) was more accurate and reliable than external ultrasound (US)-based monitoring when each external technique was compared with FHR data obtained simultaneously from a fetal scalp electrode (FSE). 8 That study assessed the ability of the two types of external monitoring to identify each fetal heartbeat detected by the FSE. It did not evaluate directly the accuracy of the external devices in identifying the FHR baseline, its variability, or the presence of accelerations and decelerations, central features of FHR pattern interpretation. Moreover, although a close correspondence was demonstrated in heart beat identification between afECG and FSE-derived data, the specific hypothesis that afECG is not inferior to traditional US-based monitoring in identifying FHR patterns (baseline FHR, variability, decelerations, and accelerations) was not tested.
To address this issue we applied a standardized computer-based approach for recognition of FHR patterns to data obtained simultaneously from a direct FSE, US, and the afECG technique. We used a computer-based approach to eliminate concerns about inter- and intra-observer variability. We compared FHR patterns generated by the external devices to those from the FSE. The FSE is considered the most accurate and reliable technique used in current practice, and was, therefore, used as the standard for comparison. The program quantified the baseline FHR, and long- and short-term variability, and it noted the presence and magnitude of accelerations and decelerations. Although most of these terms are used by clinicians to assess fetal well-being, for the purposes of this study we used them solely to describe the morphology of the FHR tracings, and not to infer anything about fetal condition.

| MATERIAL AND METHODS
We undertook a secondary analysis of data from a prospective 3-center trial designed to determine the performance of 2 modes of external FHR monitoring (afECG and US) compared with information obtained from an FSE. 8 For this report we examined data from 30 patients of one hospital (Queens Hospital Center, Jamaica, NY, USA) in the parent study. The hospital used the Model 50XM system (Philips Healthcare, Andover, MA, USA) for standard FHR monitoring in all participants. The study design required the simultaneous use of 3 methods of FHR detection in each participant (afECG, US, and FSE). At the time of the study the afECG monitor (Model AN24; Monica Healthcare Ltd, Nottingham, UK) was being evaluated by the US Food and Drug Administration; it has since become commercially available.
In the parent investigation, 36 women entered the trial at the study hospital. We analyzed 30 of these in the first stage of labor; 24 were studied in the second stage. Six potential participants were excluded because the first-stage monitoring period for the 3 coincident tracings was <30 minutes (4 participants), or the US transducer malfunctioned, leaving an uninterpretable tracing (2 participants).
Each woman had a singleton term (≥37 weeks) pregnancy in cephalic presentation and was recruited early in spontaneous labor, or when she presented for labor induction. Each was monitored initially with the US technique. The tracing from the US monitor was available for clinical decision-making. The transducer position was adjusted as necessary by the attendant nurse every 20-30 minutes. The 5 afECG electrodes were applied to the maternal abdomen when it was determined that the US device was working appropriately. Data from the afECG were not visible to the care team.
An FSE was inserted later in labor in some patients if clinically indicated. The 30 women in whom simultaneous tracings from all 3 monitors were available for at least 30 minutes were the study participants.
FHR information from the US and the afECG monitors, along with that from the FSE, was transmitted to a bedside computer for analysis. No externally derived FHR data were visible to the care team once the FSE was applied; the US record was evaluated at least every 20-30 minutes by a research assistant, who repositioned the transducer if necessary.

traditional ultrasound-based monitoring. Abdominal-fetal electrocardiogram should, therefore, be considered a primary option for externally monitored patients.

| Data collection and analysis
[20][21][22][23] Engineers at Monica Healthcare, Ltd. created an algorithm (Monica Decision Support [MDS]) based on the principles established by Dawes and his colleagues (and as published by Dobbe et al) 24 and it was applied to assess the FHR patterns in our study. This algorithm was validated using a series of 87 hours of FHR data by the Divisie Perinatologie en Gynaecologie UMCU Utrecht (Utrecht, The Netherlands). This approach allowed us to examine the FHR patterns electronically in a uniform manner, avoiding the subjective observer bias that can confound consistent interpretation.
In order to generate the MDS parameters, FHR data were organized before analysis into 3.75-second epochs. The mean FHR ($\mathrm{FHR}_{av(i)}$) in beats/minute (bpm) in each epoch was converted to a pulse interval in milliseconds for analysis, ie, $T_{av(i)} = 60\,000 / \mathrm{FHR}_{av(i)}$.
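The conversion from epoch-mean rate to pulse interval can be illustrated with a minimal sketch. The function and constant names here are illustrative assumptions and are not taken from the MDS software.

```python
EPOCH_SECONDS = 3.75                         # epoch length stated in the text
EPOCHS_PER_MINUTE = int(60 / EPOCH_SECONDS)  # 60 / 3.75 = 16 epochs per minute


def pulse_interval_ms(fhr_bpm: float) -> float:
    """Convert an epoch-mean heart rate (bpm) to a pulse interval (ms)."""
    if fhr_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return 60000.0 / fhr_bpm


print(pulse_interval_ms(150.0))  # 400.0 ms
print(EPOCHS_PER_MINUTE)         # 16
```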
The software then applied a rejection algorithm to each FHR pattern. $\mathrm{FHR}_{av(i)}$ values <30 or >200 bpm and any $T_{av(i)}$ value >1.55 or <0.6 times the average of the preceding three $T_{av(i)}$ values were excluded. Also, any 30-minute window that contained <15 minutes of FHR data was not analyzed. After applying the rejection algorithm we identified the baseline FHR by deploying an exponential low-pass smoothing filter 24 having coefficients that resulted in a cut-off frequency of 1/600 Hz. The resulting time constant provided a slowly varying heart rate from which prominent changes (ie, accelerations and decelerations, the characteristics of which are described below) were removed, leaving behind the baseline FHR.
Otherwise stated, the baseline FHR is the average FHR excluding accelerations and decelerations over a 30-minute period referred to below as a "frame." Our software also quantified long-term and short-term variability, and noted the presence of accelerations and decelerations.
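The rejection rule and the smoothing step might be sketched as follows. This is not the MDS implementation. Several details are assumptions made for illustration: the "preceding three" values are taken to be the last three accepted epochs, the filter is a first-order exponential smoother applied to pulse intervals, and its coefficient is derived from the 1/600 Hz cut-off through the time constant tau = 1/(2*pi*fc). The removal of accelerations and decelerations from the baseline is not reproduced here.

```python
import math

EPOCH_SECONDS = 3.75


def reject_epochs(fhr_bpm):
    """Return pulse intervals (ms) per epoch, with None where an epoch is rejected."""
    accepted = []  # accepted pulse intervals so far (assumed reference set)
    result = []
    for fhr in fhr_bpm:
        if fhr is None or fhr < 30 or fhr > 200:
            result.append(None)
            continue
        interval = 60000.0 / fhr
        if len(accepted) >= 3:
            reference = sum(accepted[-3:]) / 3
            if interval > 1.55 * reference or interval < 0.6 * reference:
                result.append(None)
                continue
        accepted.append(interval)
        result.append(interval)
    return result


def smooth_baseline(intervals_ms, cutoff_hz=1 / 600, dt=EPOCH_SECONDS):
    """First-order exponential low-pass filter; rejected epochs hold the last output."""
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)  # assumed cut-off to time-constant relation
    alpha = 1.0 - math.exp(-dt / tau)
    state = None
    smoothed = []
    for interval in intervals_ms:
        if interval is not None:
            state = interval if state is None else state + alpha * (interval - state)
        smoothed.append(state)
    return smoothed


intervals = reject_epochs([140, 140, 140, 140, 60, 250, 20, 140])
baseline_ms = smooth_baseline(intervals)
```

In this example the epochs at 60, 250, and 20 bpm are rejected (the first by the ratio rule, the others by the absolute limits), and the smoothed baseline stays at the interval corresponding to 140 bpm.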
Long-term variability (LTV) was assessed by determining over each minute the minute range, which is the absolute difference between the longest and shortest 3.75-second epoch pulse interval ($T_{av(i)}$), expressed in milliseconds over each sequential minute of baseline tracing. Minutes with more than 10 seconds of signal loss or that contained any part of a deceleration were excluded from LTV analysis. Then, the average LTV value over each 30-minute frame was calculated for analysis, thus generating the "mean minute range." Short-term variability (STV) was measured by analyzing 25 the pulse interval of each 3.75-second epoch. The absolute difference from one epoch's pulse interval ($T_{av(i)}$) to the immediately following epoch's pulse interval ($T_{av(i+1)}$) expresses the short-term variation.
Over 1 minute an average short-term variation is derived from these absolute differences.Then in each 30-minute frame the average STV was calculated and used in our analyses.
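The two variability measures can be sketched under stated assumptions. Rejected epochs are represented as `None`, minutes are taken as consecutive blocks of 16 epochs, and the exclusion rule described for LTV is also applied to STV, which the text does not state explicitly. All names are illustrative.

```python
from statistics import mean

EPOCH_SECONDS = 3.75
EPOCHS_PER_MINUTE = 16
MAX_LOSS_SECONDS = 10.0


def minute_range(minute):
    """Longest minus shortest epoch pulse interval (ms) within one minute."""
    valid = [t for t in minute if t is not None]
    return max(valid) - min(valid)


def minute_stv(minute):
    """Mean absolute difference (ms) between successive epoch pulse intervals."""
    diffs = [abs(b - a) for a, b in zip(minute, minute[1:])
             if a is not None and b is not None]
    return mean(diffs) if diffs else None


def frame_variability(intervals_ms, decel_minutes=frozenset()):
    """Return (mean minute range, mean STV) for one frame, or None if no valid minute."""
    ltv_values, stv_values = [], []
    for m in range(len(intervals_ms) // EPOCHS_PER_MINUTE):
        minute = intervals_ms[m * EPOCHS_PER_MINUTE:(m + 1) * EPOCHS_PER_MINUTE]
        lost_seconds = sum(t is None for t in minute) * EPOCH_SECONDS
        if lost_seconds > MAX_LOSS_SECONDS or m in decel_minutes:
            continue  # excluded minute
        ltv_values.append(minute_range(minute))
        stv = minute_stv(minute)
        if stv is not None:
            stv_values.append(stv)
    return (mean(ltv_values) if ltv_values else None,
            mean(stv_values) if stv_values else None)
```

For instance, a two-minute tracing that alternates between 400 and 410 ms in the first minute and between 400 and 420 ms in the second gives a mean minute range of 15 ms and an STV of 15 ms.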
FHR accelerations and decelerations were identified by deviations from the baseline.A small acceleration was recorded when there was a rise in FHR from baseline of >10 bpm but ≤ 15 bpm that lasted >15 seconds; a large acceleration required a rise of more than 15 bpm and a duration of >15 seconds.
A small deceleration was noted when the FHR dropped between 10 and 20 beats on average for 60 seconds, or fell more than 20 beats from the baseline for 30 seconds. A large deceleration had a drop of at least 20 beats from baseline and a duration of at least 60 seconds or at least 40 beats for 30 seconds. The number of large and small decelerations and accelerations detected in each 30-minute block of the FHR tracing was compared between methods of FHR detection. A false-positive acceleration or deceleration was one that was noted on an external method but not by the FSE; conversely, a false-negative acceleration or deceleration was recognized by the FSE but not by the comparison external device.
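The size criteria for periodic events can be expressed as a small classifier. This is a sketch that assumes each candidate event has already been summarized as a deviation from baseline (bpm) and a duration (seconds). Two further assumptions are made: "for 60 seconds" and "for 30 seconds" are read as minimum durations, and the large-deceleration criteria are checked first, because some events satisfy both definitions as worded.

```python
def classify_acceleration(rise_bpm, duration_s):
    """Return 'small', 'large', or None for a rise above baseline."""
    if duration_s <= 15 or rise_bpm <= 10:
        return None
    return "large" if rise_bpm > 15 else "small"


def classify_deceleration(drop_bpm, duration_s):
    """Return 'small', 'large', or None for a drop below baseline."""
    # Large criteria take precedence (assumption, see lead-in).
    if (drop_bpm >= 20 and duration_s >= 60) or (drop_bpm >= 40 and duration_s >= 30):
        return "large"
    if (drop_bpm >= 10 and duration_s >= 60) or (drop_bpm > 20 and duration_s >= 30):
        return "small"
    return None


print(classify_acceleration(12, 20))  # small
print(classify_deceleration(25, 60))  # large
```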

| Statistical analyses
The FHR data from all three sources were synchronized to within 0.25 seconds before analysis. As noted above, the FHR values over

For each participant we identified all paired frame data points, ie, those in which the FSE standard and the comparison external monitor extracted an FHR analysis term. The paired data were used to determine differences between the distributions of data in the study groups. We calculated two sets of results for each participant: one compared the afECG against the FSE; the second compared the US method against the FSE. In each case the comparisons were made of the errors in the measurements, ie, the difference between the standard FSE value and the external value. Hence, the larger the error value, the more the external device measurement deviated from that of the FSE. Because the distributions of these error measurements were not normal, we compared them using the nonparametric Wilcoxon signed-rank test.
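The paired comparison of error distributions can be illustrated with a simplified Wilcoxon signed-rank test. The study used SPSS; the sketch below is a stand-in that uses the large-sample normal approximation without tie or continuity corrections, and the sample values are hypothetical.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Return (W+, z, two-sided p) for paired samples; zero differences are dropped."""
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ties share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - expected) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p_value


# Hypothetical per-frame errors (bpm) for two external methods against the FSE.
us_errors = [4.0, 6.5, 3.2, 8.1, 5.5, 7.3, 2.9, 9.4]
afecg_errors = [0.5, 1.0, 0.8, 1.2, 0.4, 0.9, 0.7, 1.1]
w_plus, z, p_value = wilcoxon_signed_rank(us_errors, afecg_errors)
```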
Accuracy of the external monitoring modes in identifying each FHR parameter in comparison with the FSE standard was also assessed by Bland-Altman analysis, a standard approach to determining the degree of agreement between 2 measurements. [26][27] In this manner a regression of the results from each pair of devices could be determined. This provided bias and limit of agreement values for each FHR data set comparison (FSE vs. afECG; FSE vs. US). The spread of values around the regression line is expressed mathematically by their root mean square error, limits of agreement, and bias.
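The summary quantities of a Bland-Altman comparison can be sketched as follows. The 1.96 multiplier for the limits of agreement is the conventional choice and is an assumption here, since the text does not state it. The regression line mentioned above is not fitted in this sketch, and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman(fse_values, external_values):
    """Return bias, limits of agreement, and RMSE of the errors (FSE minus external)."""
    if len(fse_values) != len(external_values) or len(fse_values) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    errors = [f - e for f, e in zip(fse_values, external_values)]
    bias = mean(errors)
    sd = stdev(errors)  # sample standard deviation of the errors
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "rmse": math.sqrt(mean(d * d for d in errors)),
    }


# Hypothetical baseline values (bpm) for three frames.
summary = bland_altman([140, 150, 160], [141, 149, 162])
```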

| Ethical approval
The original study was approved by the Institutional Review Board of each participating institution, and conformed to the guidelines of the World Medical Association Declaration of Helsinki. For this report we examined data from 30 patients of one hospital (Queens Hospital Center, Jamaica, NY, USA) from the parent study. The Institutional Review Board of the Mount Sinai (NY) Medical Center approved the protocol (study #09-2213) on 5 January 2010.

| Overall sample attributes
The gestational age (mean ± SD) of the participants was 39.4 ± 1.1 wk; maternal age was 25.9 ± 4.4 y; and body mass index

TABLE 3 False-positive and false-negative periodic event rates
The results of all Bland-Altman analyses are summarized in Table 1.
For each comparison, afECG gave results closer to those of the FSE than did US. Results of the comparison of the error distributions (Wilcoxon test) are shown in Table 2. For each parameter, with the exception of identifying large decelerations in the second stage, afECG was superior to US.

| Baseline rate
The spread of average baseline heart rate values around the nearly horizontal Bland-Altman regression line is quite narrow for afECG, as can be seen visually (Figure 1A) and confirmed by the minimal bias, narrow limits of agreement, and small root mean square error (Table 1).
When, however, US was compared with the FSE the results were quite different (Figure 1B). The spread of values around the regression line was substantially greater, with wider limits of agreement, and greater bias and root mean square error. An error of more than 10 bpm occurred only 1.0% of the time with afECG; it occurred about 14% of the time with US. These differences were similarly distributed in stages I and II of labor (Figures 1C and 1D). For each comparison of errors (the difference between paired external and FSE-derived heart rates), the differences between the afECG and US baseline FHR values were statistically significant (Table 2).

| Variability
For measures of both STV and LTV, the difference between the external method and the FSE-derived variability was substantially greater for the US than the afECG method, differences most evident at diminished variability levels, when error has the greatest clinical importance.This was true in both first and second stages of labor.
When the STV from the FSE was low (<5.0 ms) the afECG error remained stable, but the US error increased considerably (Figures 2-3).
The Bland-Altman LTV plots (Figures 3A-3D) were similar to those for STV in that the afECG error was minimal for low values of LTV.

| Accelerations and decelerations
The afECG method identified periodic small and large accelerations and small decelerations with greater reliability than did US in both stage I and stage II of labor. That is to say, the differences in the error frequencies between the external monitoring modes and the FSE were statistically significant, and Bland-Altman patterns of agreement showed a wider spread around the regression line for US than for afECG (Table 1). Large deceleration identifications in stage I showed that the errors between afECG and the FSE were smaller than those between US and the FSE (P = 0.015) (Table 2); however, for stage II they were not significantly different (P = 0.255).
Figures 4A-4D indicate the large deceleration error count as a function of frame number, demonstrating false positive identifications in Stages I and II for afECG and US. These results (Table 3) demonstrate overall a low rate of false events for both external monitoring methods, generally lower than 2 per 30 minutes. For both methods, the false-negative and false-positive rates were significantly higher in the second stage than in the first. afECG had a significantly higher rate of false-negative accelerations in both labor stages, and a higher false-negative rate for small decelerations in the second. False-positive accelerations and decelerations were significantly more common during the first stage with US than with afECG monitoring.

| DISCUSSION
We pattern interpretation, [28][29][30] it is critically important that the technique used to assess rate and variability provide the most accurate available information. It has been assumed since the introduction of autocorrelation techniques to cardiotachometer software that US-derived FHR patterns display true FHR variability. [32][33][34] On the contrary, our results show that the representation of FHR variability may be altered considerably by US-based monitoring, and should be interpreted with considerable caution. Moreover, the situations in which US is most likely to exaggerate true variability are those in which it is actually diminished, potentially leading to false reassurance and failure to intervene in a timely manner. If an external FHR monitoring technique is chosen, afECG will depict STV and LTV more accurately than will US techniques.
Our results also indicate that the clinician can be confident that periodic changes in the heart rate pattern (accelerations and decelerations of various descriptions) will be as recognizable with afECG-derived monitoring as with US. In fact, the error in identifying accelerations and decelerations was consistently smaller with afECG.
A large randomized trial would be necessary to determine whether use of this kind of monitoring technique results in neonatal outcomes that are equivalent or superior to monitoring with traditional US-derived heart rate patterns. This is the first study to address whether afECG monitoring produces FHR patterns that would be interpreted in the same manner as those produced by US-based cardiotachometry. Of concern in designing this study was the fact that there is considerable inter- and intra-observer variation in the visual interpretation of FHR patterns. 35,36 We addressed this potential pitfall by using an established electronic means to identify FHR characteristics. This substituted an objective and completely uniform diagnostic tool for the inherent subjectivity of human visual interpretation.
This was a secondary analysis of a portion of data from a previously published study. The parent study was not designed or powered to address the specific hypothesis of this analysis. Nevertheless, we had a substantial number of observations, and almost all our results were statistically significant. This approach to FHR analysis does not distinguish among various types of decelerations, as visual observation can, and not every practitioner would agree with all features of the algorithm we used for FHR pattern identification. That notwithstanding, it provided us with a purely objective and unbiased

Another potential limitation was the fact that the US device was evaluated and adjusted (if necessary) every 20-30 minutes, and we did not document the number of adjustments made in each participant. This could have resulted in misdirected insonation and the potential for error in the inter-evaluation intervals, reinforcing the dangers of relying on US-derived measures of fetal well-being in a busy ward. This issue is not a concern with afECG monitoring, where the electrodes do not require adjustment during the entire labor, presenting the medical team with increased time to care for the mother rather than the electronic fetal monitoring instrument.

| CONCLUSION
We conclude that the afECG-based technique for continuous FHR monitoring is an appropriate choice for noninvasive monitoring. The heart rate patterns it generated were interpretable, and were generally superior to those produced by a traditional US method in terms of their conformity with patterns derived from a direct fetal electrode. The observed differences were generally of sufficient magnitude to influence the clinical interpretation of FHR patterns.

ACKNOWLEDGMENTS
We are grateful to Drs. Sophia Ommani and Molham Solomon for

each 3.75-second epoch were averaged into a single FHR value for that epoch; therefore, there were 16 FHR epoch values in every minute. Using the MDS algorithm we extracted the baseline heart rate, STV, LTV, and the number of accelerations and decelerations in the simultaneous FHR recordings for each patient over all 30-minute frames in the first and second stages of labor. The median number of frame intervals analyzed per patient in stage I was 5 (interquartile range 2-9), and in stage II was 3 (interquartile range 2-4). In total, there were 263 stage I and 82 stage II frames for afECG, and 228 stage I and 85 stage II frames for US.
A Z-test for difference of proportions was used to compare the rates of false-positive and false-negative accelerations and decelerations between monitoring modalities. The statistical processing software deployed was IBM SPSS Statistics v. 24.0 (IBM Corp; Armonk, NY, USA). Matlab R2016A (The MathWorks, Natick, MA, USA) was used to create graphic Bland-Altman and box plots. All analyses were performed for all measured parameters on the complete pooled data from each monitoring modality.
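A pooled two-proportion Z-test of the kind named above can be sketched as follows. Treating each 30-minute frame as one trial is an assumption made for illustration. The event counts in the example are hypothetical; only the frame totals (263 and 228 in stage I) come from the text.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (z, two-sided p) for the difference of two proportions, pooled variance."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts of frames containing a false-positive event.
z, p_value = two_proportion_z_test(12, 263, 30, 228)
```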
was 32.1 ± 8.3 kg/m². All 30 study participants were monitored in the first stage of labor and 24 in the second stage. The monitoring duration per participant was 218 ± 187 min in the first stage and 91 ± 59 min in the second stage of labor.

FIGURE 1
(A-D) Bland-Altman plots for baseline fetal heart rate (FHR) in stage I of labor comparing agreement of abdominal-fetal electrocardiogram (afECG) method with fetal scalp electrode (FSE) method (A) and ultrasound (US) with FSE (B). Stage II results are also shown for afECG (C) and US (D). In both stages the afECG displayed better agreement with the FSE than did US [Color figure can be viewed at wileyonlinelibrary.com]

FIGURE 2
(A-D) Bland-Altman plots for short-term variability (STV) in stage I of labor comparing agreement of abdominal-fetal electrocardiogram (afECG) method with fetal scalp electrode (FSE) method (A) and Doppler ultrasound (US) with FSE (B). Stage II results are also shown for afECG (C) and US (D). In both stages the afECG displayed better agreement with the FSE than did US [Color figure can be viewed at wileyonlinelibrary.com]

found the computerized identification of FHR characteristics for the afECG technique was in most instances more faithful to results from the scalp electrode than US-based FHR detection in the identification of baseline heart rate, and heart rate variability, accelerations, and decelerations. These results considerably expand our understanding of afECG-derived FHR patterns. Although the high accuracy and reliability of this technique in recognizing fetal cardiac activity was demonstrated previously, we have now shown that fundamental characteristics of heart rate patterns used in assessment of the fetus (rate, variability, periodic changes) are readily recognizable when using an afECG system. This system, therefore, proved equivalent or superior to US in replicating heart rate patterns derived from an FSE. Our results have important clinical implications. Current recommendations for the clinical assessment of variability in the interpretation of FHR patterns require making fine distinctions in the degree of variability observed. Because such distinctions have a potentially substantial influence on decision-making, and because they are a component of all extant systems for FHR

FIGURE 3
(A-D) Bland-Altman plots for long-term variability (LTV) in stage I of labor comparing agreement of abdominal-fetal electrocardiogram (afECG) method with fetal scalp electrode (FSE) method (A) and Doppler ultrasound (US) with FSE (B). Stage II results are also shown for afECG (C) and US (D). In both stages the afECG displayed better agreement with the FSE than did US [Color figure can be viewed at wileyonlinelibrary.com]

FIGURE 4
(A-D) Large deceleration error count as a function of frame number for abdominal-fetal electrocardiogram (afECG) (A and C) and ultrasound (US) (B and D) for both stages I and II. afECG shows consistently lower error count than US [Color figure can be viewed at wileyonlinelibrary.com]

means of analyzing and comparing the outputs of the three modes of monitoring used in the study.

their work on the parent study, and to Dennis Chanter, D Phil of Statisfaction Statistical Consultancy for his advice concerning the statistical analysis. A grant (University of Nottingham EPSRC Impact Accelerator EP/511730/1 and RA45V) supported the analyses done by Dr. Chong Liu for this work.

CONFLICTS OF INTEREST
Barrie Hayes-Gill and Terrence Martin were once Executive Director and Director of Marketing, respectively, at Monica Healthcare, Ltd. Neither has any current affiliation with the company or with GE Healthcare. Wayne R. Cohen has been a paid consultant to Monica Healthcare, Ltd. and to GE Healthcare.

Modality Parameter False-negative rate (number/30 min) False-positive rate (number/30 min)
Note: n, number of 30-min frames analyzed in stage I/stage II; afECG, abdominal-fetal ECG method; US, Doppler ultrasound method. Data are the mean number of false-positive or false-negative accelerations or decelerations per 30-minute frame. A false-positive event is one that occurred in the test device but not in the scalp electrode data; a false-negative event was seen in the scalp electrode but not in the test device.
*Stage II significantly different from stage I using Z-test.
**Significantly different from US for same parameter using Z-test.