Comparison of Lerner score, Doppler ultrasound examination, and their combination for discrimination between benign and malignant adnexal masses
Abstract
Objective To determine whether the combined use of Lerner’s morphologic score and color Doppler ultrasound examination results in better discrimination of benign and malignant adnexal masses than the use of Lerner’s score alone or Doppler variables alone.
Design One hundred and seventy-three consecutive women with a pelvic mass judged clinically to be of adnexal origin underwent preoperative ultrasound examination including color and spectral Doppler techniques. One hundred and forty-nine tumors were benign and 24 malignant. The sensitivity and false-positive rate with regard to malignancy were calculated for Lerner’s score, six Doppler variables and combinations of Lerner’s score and Doppler variables. Previously defined gray scale and Doppler criteria of malignancy were used and tested prospectively. The best method was defined as that detecting most malignancies with the lowest false-positive rate.
Results Lerner’s score had a sensitivity of 92% and a false-positive rate of 36%. The best Doppler variable—time-averaged maximum velocity—had similar diagnostic properties with a sensitivity of 100% and a false-positive rate of 41%. Combining Lerner’s score with Doppler measurement of time-averaged maximum velocity—i.e. requiring both Lerner’s score and time-averaged maximum velocity to indicate malignancy for a malignant diagnosis to be made—had a sensitivity of 92% and a false-positive rate of 19%.
Conclusions The combined use of Lerner’s score and measurement of time-averaged maximum velocity is a better method for discrimination of benign and malignant adnexal masses than the use of Lerner’s score alone or Doppler ultrasound examination alone. The clinical value of the combined method needs to be cross-validated prospectively in a new series of tumors.
Introduction
Most experienced ultrasound examiners probably base their diagnosis of benignity or malignancy in an adnexal mass on subjective evaluation of the gray scale ultrasound image, combined with clinical information about the patient. However, less experienced ultrasound examiners might prefer to use a morphologic classification system, e.g. that suggested by Granberg and co-workers [1] or by Valentin and colleagues [2], or a morphologic scoring system, such as Sassone’s score [3] or Lerner’s score [4].
The role of color Doppler ultrasound examination in the differential diagnosis of benign and malignant adnexal masses is still controversial. Almost certainly, the extent to which Doppler examination contributes to a correct diagnosis of an adnexal mass depends on which gray scale imaging method is used [5]. To the best of my knowledge, the contribution of Doppler ultrasound examination to the correct diagnosis of an adnexal mass when the preliminary diagnosis is based on Lerner’s score has not been investigated.
The purpose of this study was to test prospectively previously defined cut-off values for Lerner’s score and Doppler variables and to determine whether the combined use of Lerner’s score and color Doppler ultrasound examination results in better sensitivity and specificity with regard to malignancy than the use of Lerner’s score alone or Doppler variables alone.
Patients and methods
The study was approved by the Ethics Committee of the Medical Faculty, Lund University, Sweden.
One hundred and ninety-nine consecutive women scheduled for laparotomy or laparoscopic surgery because of a pelvic mass judged clinically to be of adnexal origin were recruited for the study and underwent preoperative ultrasound examination as described below. Twenty-six women were excluded for the following reasons: in 10 women surgery was cancelled and replaced by biopsy with cytological diagnosis or clinical follow-up; in three women, no histopathologic diagnosis was obtained despite laparotomy; in 13 women the preoperative ultrasound examination revealed normal adnexa and no pelvic tumor, this finding being confirmed at subsequent laparoscopy (n = 9), or clinical and ultrasound follow-up (n = 4). Thus, 173 women were included in the study, 98 of whom were premenopausal (median age 37.5 years, range 18–54) and 70 postmenopausal (median age 66 years, range 51–88; median 15 years past menopause, range 1–44); four had undergone hysterectomy (median age 51.5 years, range 44–66) and in the remaining case (a 47-year-old woman) menopausal status was unknown. One hundred and twenty women had one tumor, 52 women had two pelvic tumors and one woman had three pelvic tumors. To minimize bias, each woman contributed only one tumor to the study.
The women underwent ultrasound examination within 8 days preceding the operative procedure (laparotomy or laparoscopic surgery), but without regard to the day of the menstrual cycle. All examinations were performed by the author and were carried out in a systematic and predetermined manner. They started with transabdominal and/or transvaginal real-time gray scale ultrasound examination of the pelvis. Transvaginal examination was carried out with the woman in the lithotomy position, after first emptying the bladder. One hundred and ten women underwent both transabdominal and transvaginal gray scale ultrasound examination, whereas 63 women underwent transvaginal gray scale examination only. The length (L), depth (D) and width (W) of each tumor were measured in cm with calipers on the frozen ultrasound image, tumor volume (cm³) being calculated according to the formula: L × D × W × 0.5. After completion of the examination a Lerner score [4] was assigned to each tumor.
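The volume formula above can be sketched in a few lines. This is only an illustration, assuming the three caliper diameters are already available in centimeters; the function name `tumor_volume_cm3` and the example diameters are hypothetical and do not come from the study.

```python
def tumor_volume_cm3(length_cm: float, depth_cm: float, width_cm: float) -> float:
    """Approximate tumor volume as L x D x W x 0.5, the formula stated in the text."""
    for name, value in (("length", length_cm), ("depth", depth_cm), ("width", width_cm)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return length_cm * depth_cm * width_cm * 0.5


# Illustrative diameters only (not data from the study).
volume = tumor_volume_cm3(5.0, 4.0, 6.0)
print(f"{volume:.0f} cm3")
```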
The Doppler ultrasound examination technique has been described in detail in a previous publication [6]. Briefly, tumor vascularization was visualized with the color Doppler technique, each tumor being characterized by the color content of the tumor scan, as rated subjectively on a visual analogue scale of 0–100 arbitrary units. The number of units assigned to the tumor as a whole was designated the ‘tumor color score’, reflecting both the proportion of tumor tissue that was colored and the color scale. Having assigned a color score to the tumor, the examiner identified the tumor artery with the highest blood flow velocity. Blood flow velocity waveforms were obtained by placing the Doppler gate over the colored area(s) still visible after reducing the color Doppler sensitivity, and then activating the pulsed Doppler function. Several attempts were made to ensure that the vessel with the highest velocity had indeed been identified. Angle correction was not possible due to the smallness and tortuosity of the vessels. Instead, the highest achievable Doppler shift signals were sought for each vessel examined. The examinations were documented on videotape and on hard copy.
The ultrasound examinations were carried out using an Acuson 128 XP ultrasound system equipped with a 4-MHz transabdominal probe and a 7-MHz transvaginal probe (Acuson Inc., Mountain View, CA, USA). In both color and spectral modes, the Doppler ultrasound frequency was 5 MHz. A high-pass filter with a cut-off level of 125 Hz was used. The output energy of the Doppler instrument did not exceed 500 mW/cm² (spatial peak temporal average intensity).
The arterial Doppler spectra were analyzed off-line from the videotapes, using the built-in software of the ultrasound system. Three uniform consecutive heart beats were analyzed, and the resulting values averaged. The analysis was based on the envelope of the Doppler shift spectrum, and the waveforms were characterized by the peak systolic velocity, the time-averaged maximum velocity, the pulsatility index (PI; the peak velocity minus the minimum velocity after the peak, divided by the mean velocity over the cycle) [7], and the resistance index (RI; the peak velocity minus the minimum velocity after the peak, divided by the peak velocity) [8]. The blood flow velocity waveform with the highest time-averaged maximum velocity was selected to characterize the tumor.
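The waveform indices defined above can be illustrated with a short sketch. It is not the built-in software of the ultrasound system; it assumes one heart beat is given as evenly spaced samples of the maximum-velocity envelope (in cm/s) starting at or before the systolic peak, and it takes the "mean velocity over the cycle" in the PI definition to be the time average of that envelope. The function names and the sample values are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def waveform_indices(envelope_cm_s: Sequence[float]) -> Dict[str, float]:
    """Characterize one cardiac cycle of the maximum-velocity envelope.

    Assumption: samples are evenly spaced in time, so the arithmetic mean
    equals the time average over the cycle.
    """
    if len(envelope_cm_s) < 2:
        raise ValueError("need at least two samples per heart beat")
    peak_index = max(range(len(envelope_cm_s)), key=envelope_cm_s.__getitem__)
    psv = envelope_cm_s[peak_index]
    if psv <= 0:
        raise ValueError("peak velocity must be positive")
    # Minimum velocity *after* the peak, as in the definitions of PI and RI.
    minimum_after_peak = min(envelope_cm_s[peak_index:])
    tamxv = fmean(envelope_cm_s)
    return {
        "PSV": psv,
        "TAMXV": tamxv,
        "PI": (psv - minimum_after_peak) / tamxv,
        "RI": (psv - minimum_after_peak) / psv,
    }


def average_over_beats(beats: List[Sequence[float]]) -> Dict[str, float]:
    """Average each index over several heart beats (three in the text)."""
    per_beat = [waveform_indices(beat) for beat in beats]
    return {key: fmean([result[key] for result in per_beat]) for key in per_beat[0]}


# Made-up envelope samples for three uniform beats (not data from the study).
beats = [[6.0, 12.0, 10.0, 8.0, 6.0, 6.0]] * 3
print(average_over_beats(beats))
```

Averaging the per-beat indices, rather than pooling all samples into one long trace, mirrors the description that three beats were analyzed and the resulting values averaged.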
The results of the ultrasound examinations were compared with those of histologic examination of the respective specimens. Borderline tumors were classified as malignant tumors. Staging of malignant ovarian tumors was done by the attending physician in accordance with the classification system recommended by the International Federation of Gynecology and Obstetrics [9].
The sensitivity and false-positive rate (defined as 100 − specificity, in %) with regard to malignancy were calculated for Lerner’s score, six Doppler variables and the combined use of Lerner’s score and Doppler variables. The respective cut-off values recommended in previous publications were used. Thus, malignancy was indicated by a Lerner score ≥ 3 [4], a tumor color score ≥ 40 or ≥ 62 [6], a PI < 1.0 [10], an RI < 0.4 [11], a time-averaged maximum velocity ≥ 7.2 cm/s [6] or a peak systolic velocity ≥ 14.4 cm/s [6]. In 29 tumors no arterial Doppler shift signals could be detected, and therefore no values for time-averaged maximum velocity, peak systolic velocity, PI or RI could be calculated. In a first analysis, these tumors were excluded. In a second analysis, the 29 tumors were assumed to be characterized by a PI ≥ 1.0, an RI ≥ 0.4, a time-averaged maximum velocity < 7.2 cm/s and a peak systolic velocity < 14.4 cm/s, i.e. the missing Doppler variables were presumed to indicate benignity. Two types of combination of Lerner’s score and Doppler variables were tested: (1) malignancy was diagnosed if both Lerner’s score and the Doppler variable indicated malignancy; and (2) malignancy was diagnosed if either Lerner’s score or the Doppler variable indicated malignancy. The sensitivity of each method was plotted against its false-positive rate. The best diagnostic method was defined as that detecting most malignancies with the lowest false-positive rate [12].
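The two combination rules and the handling of undetectable Doppler signals can be sketched as follows. This is a minimal illustration, not the analysis code of the study: the `Tumor` record, the function names and the six example tumors are hypothetical, and only the Lerner score (cut-off 3) and the time-averaged maximum velocity (cut-off 7.2 cm/s) are shown. A missing velocity is treated as benign, as in the second analysis.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

LERNER_CUTOFF = 3        # Lerner score >= 3 indicates malignancy
TAMXV_CUTOFF_CM_S = 7.2  # TAMXV >= 7.2 cm/s indicates malignancy


@dataclass(frozen=True)
class Tumor:  # hypothetical record type, for illustration only
    lerner_score: int
    tamxv_cm_s: Optional[float]  # None when no arterial Doppler signal was detected
    malignant: bool              # histologic diagnosis (reference standard)


def lerner_positive(t: Tumor) -> bool:
    return t.lerner_score >= LERNER_CUTOFF


def tamxv_positive(t: Tumor) -> bool:
    # Second analysis: an undetectable signal is presumed to indicate benignity.
    return t.tamxv_cm_s is not None and t.tamxv_cm_s >= TAMXV_CUTOFF_CM_S


def both_positive(t: Tumor) -> bool:    # combination type (1)
    return lerner_positive(t) and tamxv_positive(t)


def either_positive(t: Tumor) -> bool:  # combination type (2)
    return lerner_positive(t) or tamxv_positive(t)


def sensitivity_and_fpr(tumors: Iterable[Tumor],
                        test: Callable[[Tumor], bool]) -> Tuple[float, float]:
    """Return (sensitivity %, false-positive rate %) of `test` against histology."""
    tumors = list(tumors)
    malignant = [t for t in tumors if t.malignant]
    benign = [t for t in tumors if not t.malignant]
    sensitivity = 100 * sum(test(t) for t in malignant) / len(malignant)
    false_positive_rate = 100 * sum(test(t) for t in benign) / len(benign)
    return sensitivity, false_positive_rate


# Made-up tumors (not data from the study).
tumors = [
    Tumor(5, 12.0, True), Tumor(2, 9.0, True),
    Tumor(4, 5.0, False), Tumor(3, 8.0, False),
    Tumor(1, None, False), Tumor(0, 7.5, False),
]
for rule in (lerner_positive, tamxv_positive, both_positive, either_positive):
    print(rule.__name__, sensitivity_and_fpr(tumors, rule))
```

By construction, the "both" rule can never raise the false-positive rate above that of either single test, and the "either" rule can never lower the sensitivity; what each rule costs on the other axis is the empirical question.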
McNemar’s χ² test, the large-sample approximation of the sign test, was used to test the statistical significance of differences in false-positive rate. Two-tailed P-values are given, with 5% as the level of significance. All statistical calculations and analyses except McNemar’s χ² test were carried out using the StatView SE + Graphics™ statistical program (Abacus Concepts, Inc., Berkeley, CA). The StatXact-3 statistical program (Cytel Software Corporation, Cambridge, MA) was used to carry out McNemar’s χ² test.
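For readers unfamiliar with the test, the large-sample McNemar statistic depends only on the discordant pairs, i.e. the benign tumors that the two methods classify differently. The sketch below uses only the Python standard library and is not the StatXact procedure. The text does not say whether a continuity correction was applied, so it is left as an option. The example counts are derived, not quoted: because the "both" rule can only turn Lerner-positive tumors negative, a fall from 53 to 28 false positives among the 149 benign tumors (the all-tumor analysis in the Results) implies 25 discordant tumors in one direction and none in the other.

```python
import math
from typing import Tuple


def mcnemar_chi2(b: int, c: int, continuity_correction: bool = False) -> Tuple[float, float]:
    """Large-sample McNemar test on the discordant pair counts b and c.

    Returns (chi-square statistic with 1 degree of freedom, two-tailed P-value).
    """
    n_discordant = b + c
    if n_discordant == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    difference = abs(b - c)
    if continuity_correction:
        difference = max(difference - 1, 0)
    chi2 = difference ** 2 / n_discordant
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Lerner score alone vs. Lerner score and TAMXV, benign tumors only.
b, c = 53 - 28, 0
for corrected in (False, True):
    chi2, p = mcnemar_chi2(b, c, continuity_correction=corrected)
    print(f"continuity correction={corrected}: chi2={chi2:.2f}, P={p:.1e}")
```

With or without the correction, the P-value in this sketch falls below 0.0001, in line with the level reported in the Results.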
Results
The histopathologic diagnoses are shown in Table 1. Median tumor size was 96 cm³ (range 2–4312 cm³).
Histopathologic diagnosis | n |
---|---|
Benign tumors | |
Endometriosis | 26 |
Dermoid cyst | 20 |
Mucinous cystoma | 19 |
Serous cystoma | 15 |
‘Benign cyst’ | 13 |
Adenofibroma | 12 |
Fibroma/fibrothecoma | 9 |
Myoma | 7 |
Hydro-, pyo- or haemato-salpinx | 8 |
Paraovarian cyst | 6 |
Pelvic abscess | 3 |
Peritoneal cyst | 3 |
Struma ovarii | 3 |
Adhesions | 2 |
Follicular cyst | 1 |
Corpus luteum cyst | 1 |
Polycystic ovary | 1 |
Malignant tumors | |
Borderline ovarian tumor, stage I | 5 |
Ovarian cancer | |
Stage I | 4 |
Stage II | 1 |
Stage III | 10 |
Stage IV | 2 |
Gonadal stroma cell tumor in ovary, Stage I | 1 |
Primary colon cancer | 1 |
Total | 173 |
The sensitivity and false-positive rate of the methods when only tumors with detectable arterial Doppler shift signals were taken into account are shown in Table 2. Lerner’s score was by far the best diagnostic test. No single Doppler variable performed better. Requiring both Lerner’s score and time-averaged maximum velocity to indicate malignancy for a diagnosis of malignancy to be made decreased the false-positive rate from 37% (when Lerner’s score was used alone) to 23% (P < 0.0001), whereas the sensitivity remained unchanged at 92%. The combination of Lerner’s score with Doppler variables other than time-averaged maximum velocity did not result in any substantial improvement in the balance between sensitivity and false-positive rate, as compared to the use of Lerner’s score alone.
Method | Sensitivity (%) | False-positive rate (%) | Positive predictive value (%) | Negative predictive value (%) |
---|---|---|---|---|
Single variable | ||||
Lerner score ≥ 3 | 92 (22/24) | 37 (44/120) | 33 (22/66) | 97 (76/78) |
TAMXV ≥ 7.2 cm/s | 100 (24/24) | 51 (61/120) | 28 (24/85) | 100 (59/59) |
PSV ≥ 14.4 cm/s | 79 (19/24) | 36 (43/120) | 31 (19/62) | 94 (77/82) |
Color score ≥ 40 * | 58 (14/24) | 21 (24/117) | 37 (14/38) | 90 (93/103) |
Color score ≥ 62 * | 33 (8/24) | 9 (11/117) | 42 (8/19) | 87 (106/122) |
PI < 1.0 | 88 (21/24) | 67 (81/120) | 21 (21/102) | 93 (39/42) |
RI < 0.4 | 25 (6/24) | 6 (7/120) | 46 (6/13) | 86 (113/131) |
Combined variables | ||||
Lerner score ≥ 3 and † | ||||
TAMXV ≥ 7.2 cm/s | 92 (22/24) | 23 (28/120) | 44 (22/50) | 98 (92/94) |
PI < 1.0 | 83 (20/24) | 30 (36/120) | 36 (20/56) | 95 (84/88) |
PSV ≥ 14.4 cm/s | 71 (17/24) | 17 (20/120) | 46 (17/37) | 93 (100/107) |
Color score ≥ 40 * | 54 (13/24) | 9 (11/117) | 54 (13/24) | 91 (106/117) |
Color score ≥ 62 * | 33 (8/24) | 6 (7/117) | 53 (8/15) | 87 (110/126) |
RI < 0.4 | 21 (5/24) | 4 (5/120) | 50 (5/10) | 86 (115/134) |
Lerner score ≥ 3 or ‡ | ||||
RI < 0.4 | 96 (23/24) | 38 (46/120) | 33 (23/69) | 99 (74/75) |
Color score ≥ 62 * | 92 (22/24) | 39 (46/117) | 32 (22/68) | 97 (71/73) |
Color score ≥ 40 * | 96 (23/24) | 47 (55/117) | 30 (23/78) | 98 (62/63) |
PSV ≥ 14.4 cm/s | 100 (24/24) | 56 (67/120) | 26 (24/91) | 100 (53/53) |
TAMXV ≥ 7.2 cm/s | 100 (24/24) | 64 (77/120) | 24 (24/101) | 100 (43/43) |
PI < 1.0 | 96 (23/24) | 74 (89/120) | 21 (23/112) | 97 (31/32) |
- * No color score given in three tumors.
- † A malignancy diagnosis is made if both Lerner’s score and the Doppler variable indicate malignancy.
- ‡ A malignancy diagnosis is made if either Lerner’s score or the Doppler variable indicates malignancy.
- The methods are listed with regard to their diagnostic properties, the best method in each section being shown first and the poorest last.
- TAMXV = time-averaged maximum velocity; PSV = peak systolic velocity; PI = pulsatility index; RI = resistance index.
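Each percentage in Table 2 follows from the counts given in parentheses. As a minimal sketch of that arithmetic (the function name `diagnostic_summary` is hypothetical), the four measures can be recomputed from the cells of a 2 × 2 table. The example uses the counts of the row "Lerner score ≥ 3 and TAMXV ≥ 7.2 cm/s": 22 of 24 malignant tumors test-positive and 28 of 120 benign tumors test-positive.

```python
from typing import Dict


def diagnostic_summary(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Percentages derived from a 2 x 2 table of test result vs. histology."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "false_positive_rate": 100 * fp / (fp + tn),  # = 100 - specificity
        "positive_predictive_value": 100 * tp / (tp + fp),
        "negative_predictive_value": 100 * tn / (tn + fn),
    }


# Lerner score >= 3 and TAMXV >= 7.2 cm/s, tumors with detectable Doppler signals.
summary = diagnostic_summary(tp=22, fn=2, fp=28, tn=120 - 28)
for name, value in summary.items():
    print(f"{name}: {value:.0f}%")
```

Unlike sensitivity and false-positive rate, the predictive values depend on the proportion of malignant tumors in the series, so they do not transfer directly to populations with a different prevalence.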
The sensitivity and false-positive rate of the methods when all tumors were taken into account are shown in Table 3. When an undetectable arterial Doppler shift spectrum was considered to indicate ‘benign’ values for blood flow velocity, PI and RI, time-averaged maximum velocity and Lerner’s score had similar diagnostic properties in terms of balance between sensitivity and false-positive rate (sensitivity 100% vs. 92%, false-positive rate 41% vs. 36%). Requiring both Lerner’s score and time-averaged maximum velocity to indicate malignancy for a diagnosis of malignancy to be made improved the balance between sensitivity and false-positive rate, as compared to using Lerner’s score alone or time-averaged maximum velocity alone. The improvement was due to a statistically significant decrease in false-positive rate (from 36% and 41%, respectively, to 19%; P < 0.0001). The combination of Lerner’s score with Doppler variables other than time-averaged maximum velocity did not result in any substantial improvement in the balance between sensitivity and false-positive rate, as compared to the use of Lerner’s score alone or time-averaged maximum velocity alone.
Method | Sensitivity (%) | False-positive rate (%) | Positive predictive value (%) | Negative predictive value (%) |
---|---|---|---|---|
Single variable | ||||
TAMXV ≥ 7.2 cm/s | 100 (24/24) | 41 (61/149) | 28 (24/85) | 100 (88/88) |
Lerner score ≥ 3 | 92 (22/24) | 36 (53/149) | 29 (22/75) | 98 (96/98) |
PSV ≥ 14.4 cm/s | 79 (19/24) | 29 (43/149) | 31 (19/62) | 96 (106/111) |
Color score ≥ 40 * | 58 (14/24) | 17 (24/145) | 37 (14/38) | 92 (121/131) |
Color score ≥ 62 * | 33 (8/24) | 8 (11/145) | 42 (8/19) | 89 (134/150) |
PI < 1.0 | 88 (21/24) | 54 (81/149) | 21 (21/102) | 96 (68/71) |
RI < 0.4 | 25 (6/24) | 5 (7/149) | 46 (6/13) | 89 (142/160) |
Combined variables | ||||
Lerner score ≥ 3 and † | ||||
TAMXV ≥ 7.2 cm/s | 92 (22/24) | 19 (28/149) | 44 (22/50) | 98 (121/123) |
PI < 1.0 | 83 (20/24) | 24 (36/149) | 36 (20/56) | 97 (113/117) |
PSV ≥ 14.4 cm/s | 71 (17/24) | 13 (20/149) | 46 (17/37) | 95 (129/136) |
Color score ≥ 40 * | 54 (13/24) | 8 (11/145) | 54 (13/24) | 92 (134/145) |
Color score ≥ 62 * | 33 (8/24) | 5 (7/145) | 53 (8/15) | 90 (138/154) |
RI < 0.4 | 21 (5/24) | 3 (5/149) | 50 (5/10) | 88 (144/163) |
Lerner score ≥ 3 or ‡ | ||||
RI < 0.4 | 96 (23/24) | 37 (55/149) | 29 (23/78) | 99 (94/95) |
Color score ≥ 62 * | 92 (22/24) | 37 (54/145) | 29 (22/76) | 98 (91/93) |
Color score ≥ 40 * | 96 (23/24) | 43 (63/145) | 27 (23/86) | 99 (82/83) |
PSV ≥ 14.4 cm/s | 100 (24/24) | 51 (76/149) | 24 (24/100) | 100 (73/73) |
TAMXV ≥ 7.2 cm/s | 100 (24/24) | 58 (86/149) | 22 (24/110) | 100 (63/63) |
PI < 1.0 | 96 (23/24) | 66 (98/149) | 19 (23/121) | 98 (51/52) |
- * No color score given in three tumors.
- † A malignancy diagnosis is made if both Lerner’s score and the Doppler variable indicate malignancy.
- ‡ A malignancy diagnosis is made if either Lerner’s score or the Doppler variable indicates malignancy.
- The methods are listed with regard to their diagnostic properties, the best method in each section being shown first and the poorest last.
- TAMXV = time-averaged maximum velocity; PSV = peak systolic velocity; PI = pulsatility index; RI = resistance index.
Discussion
In this study, Lerner’s score and measurement of time-averaged maximum velocity were the best diagnostic tests. Both tests manifested high sensitivity, but unfortunately were also associated with high false-positive rates. By combining the two methods, i.e. by requiring both to indicate malignancy for a malignant diagnosis to be made, the false-positive rate decreased without any substantial change in sensitivity. Thus, the combined use of the two methods resulted in better balance between sensitivity and false-positive rate than the use of either method alone.
Experienced ultrasound examiners probably prefer to use their own subjective evaluation of the ultrasound image (‘pattern recognition’) for determining the risk of malignancy in an adnexal mass instead of using a morphologic classification system or a morphologic score. Subjective evaluation of the gray scale ultrasound image by an experienced examiner using a good ultrasound system is a better method than Lerner’s score for distinguishing benign and malignant adnexal masses [13]. In fact, it is so good that combining it with Doppler ultrasound examination does not improve sensitivity or the false-positive rate [13]. Lerner’s score might be a suitable diagnostic tool for less experienced ultrasound examiners, even though testing of Lerner’s score in previous studies was probably performed only by experienced examiners [4,14]. The results of this study suggest that the high false-positive rate of Lerner’s score may decrease if it is combined with measurement of the highest time-averaged maximum velocity in the tumor. However, measurement of the highest time-averaged maximum velocity requires considerable skill. Therefore, Doppler ultrasound examination might contribute less to a correct diagnosis if performed by an inexperienced ultrasound examiner. The definition of ‘experienced ultrasound examiner’ could be discussed. Timmerman suggests that the ability of an ultrasound examiner to distinguish benign and malignant adnexal masses on the basis of representative still images taken by an expert is unlikely to improve substantially after the examiner has carried out 1000 gynecologic ultrasound examinations [15]. The number of examinations needed to learn to obtain high-quality representative images probably varies between individuals but is likely to exceed 1000.
It is a strength of this study that both the gray scale imaging method and the Doppler methods were tested prospectively using predefined criteria of benignity and malignancy. However, the way of combining the two methods is suggested on the basis of the results of this study. Therefore, to confirm the clinical value of combining Lerner’s score with Doppler measurement of time-averaged maximum velocity, the suggested combination needs to be tested prospectively in a new series of tumors.
The material was analyzed in two ways. The first analysis included only tumors with detectable arterial Doppler shift signals; the second included all tumors, with the missing Doppler variables presumed to indicate benignity. Both types of analysis may be criticized: the first because it does not allow all tumors to be classified on the basis of Doppler results, the second because it does not represent a true prospective evaluation of previously suggested cut-off values (the cut-off values being suggested exclusively on the basis of tumors with detectable arterial Doppler shift signals) and because it entails assumptions about unmeasurable Doppler signals. However, both analyses yielded similar results, even though measurement of time-averaged maximum velocity performed better when undetectable arterial blood flow was considered to indicate benignity. It seems reasonable to assume that tumors in which arterial Doppler shift signals cannot be elicited are characterized by time-averaged maximum blood flow velocity < 7.2 cm/s and are benign, unless the inability to detect arterial Doppler shift spectra is due to nonoptimal examination conditions. Therefore, analysis including all tumors is possibly the most appropriate.
The definition of ‘best method’ used in this study is open to question, because in certain situations high sensitivity is preferable to a low false-positive rate, whereas in other situations the reverse is true. However, the balance between sensitivity and false-positive rate (used here) is a measure of the overall performance of a test enabling comparison of different methods. ‘Accuracy’ is another possible measure of overall test performance. However, in this study where most tumors were benign, accuracy mainly reflected specificity.
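The remark about accuracy can be made concrete with a little arithmetic. Accuracy is a prevalence-weighted average of sensitivity and specificity, and with 24 malignant and 149 benign tumors the weight on specificity is 149/173, about 86%. The sketch below is illustrative only: accuracy is not tabulated in the text, the function names are hypothetical, and the true-positive and true-negative counts are read off Table 3 (all tumors).

```python
N_MALIGNANT, N_BENIGN = 24, 149


def accuracy_pct(tp: int, tn: int) -> float:
    """Percentage of all tumors classified correctly."""
    return 100 * (tp + tn) / (N_MALIGNANT + N_BENIGN)


def weighted_form_pct(tp: int, tn: int) -> float:
    """Same quantity, written as prevalence-weighted sensitivity and specificity."""
    prevalence = N_MALIGNANT / (N_MALIGNANT + N_BENIGN)
    sensitivity = 100 * tp / N_MALIGNANT
    specificity = 100 * tn / N_BENIGN
    return prevalence * sensitivity + (1 - prevalence) * specificity


# (true positives, true negatives) taken from Table 3.
methods = {
    "Lerner score >= 3": (22, 96),
    "Lerner score >= 3 and TAMXV >= 7.2 cm/s": (22, 121),
    "RI < 0.4": (6, 142),
}
for name, (tp, tn) in methods.items():
    print(f"{name}: accuracy {accuracy_pct(tp, tn):.1f}%")
```

In this calculation RI < 0.4 has the highest accuracy of the three even though it detects only 6 of the 24 malignancies, which illustrates why accuracy was not used to rank the methods.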
In conclusion, the combined use of Lerner’s score and measurement of time-averaged maximum velocity seems to be a better method for discrimination of benign and malignant adnexal masses than the use of Lerner’s score alone or Doppler ultrasound examination alone. The clinical value of the combined method needs to be cross-validated prospectively in a new series of tumors. Preferably, this testing should be done by less experienced ultrasound examiners.
Acknowledgements
The study was supported by grants from the Malmö General Hospital Cancer Foundation, Funds administered by the Malmö Health Care Administration, the Faculty of Medicine of Lund University, the Anna-Lisa och Sven-Erik Lundgren Foundation for Medical Research, the Ingabritt and Arne Lundberg Research Foundation and the Swedish Medical Research Council (grant nos. B96-17X-11605-01A and K98-17X-11605-03A).