Accuracy and correlates of maternal recall of birthweight and gestational age

Objective: To determine the accuracy of maternal recall of children's birthweight (BW) and gestational age (GA), using the Danish Medical Birth Register (MBR) as the reference, and to examine the reliability of recalled BW and its potential correlates.
Design: Comparison of data from the MBR and the European Youth Heart Study (EYHS).
Setting: Schools in Odense, Denmark.
Population: A total of 1271 and 678 mothers of schoolchildren provided information for the accuracy studies of BW and GA, respectively. The reliability sample for BW comprised 359 women.
Method: Agreement between the two sources was evaluated by mean differences (MD), intraclass correlation coefficients (ICC) and Bland–Altman plots. Misclassification into the various BW and GA categories was also estimated.
Main outcome measures: Differences between recalled and registered BW and GA.
Results: There was high agreement between recalled and registered BW (MD = −0.2 g; ICC = 0.94) and GA (MD = 0.3 weeks; ICC = 0.76). Based on maternal recall, only 1.6% of children would have been misclassified into low, normal or high BW, and 16.5% into preterm, term or post-term GA. Logistic regression revealed that the most important variables in the discordance between recalled and registered BW were ethnicity and parity. Maternal recall of BW was highly reliable (MD = −5.5 g; ICC = 0.93), and reliability remained high across subgroups.
Conclusion: Maternal recall of BW and GA seems to be sufficiently accurate for clinical and epidemiological use.
Please cite this paper as: Adegboye A, Heitmann B. Accuracy and correlates of maternal recall of birthweight and gestational age. BJOG 2008;115:886–893.


Introduction
Birthweight (BW) and gestational age (GA) are recognised as important measures of pregnancy outcome. 1 Evidence is accumulating that BW and GA are also associated with health throughout the lifespan, supporting the fetal origins hypothesis of adult disease. 2 BW and GA can be measured and registered as part of the routine medical record. However, recorded information may be unavailable if a child was born many years ago, at home, or in areas where hospital birth records are not kept or where there are problems with data quality. Additionally, hospital or state records are not available for deliveries occurring outside the country. For these reasons, maternal recall is often the only feasible means of obtaining this information in epidemiological studies. It is therefore important to assess the accuracy and reliability of maternal recall by comparison with direct measurement.
To date, the accuracy of maternal report of BW 3-8 has received more attention than that of maternal report of GA. 9,10 Furthermore, previous studies have focused mainly on the accuracy of maternal recall, and only a few studies have examined both the accuracy and the reliability of recalled BW. 3,9 Epidemiological studies have demonstrated that the accuracy of maternal recall differs significantly among populations. 7 Agreement between recalled and registered BW ranges markedly, from 71% exact agreement (USA) 11 to 16% agreement within BW groups (China). 12 It is therefore desirable to validate questionnaire responses in subsamples of the target population.
The aims of this study were to examine the accuracy of maternal recall of BW and GA in a Danish population and to identify their correlates. In addition, reliability of maternal recall of BW was assessed.

Study design
The Danish part of the European Youth Heart Study (EYHS) is a longitudinal study of the associations between lifestyle and risk factors for cardiovascular disease in children; boys and girls in the third and ninth grades were recruited in 1997-98 in Odense County, Denmark. The third-grade children were followed up after 6 years (2003-04), when a new third-grade cohort was also introduced. Complete information on the cohorts is presented elsewhere. 13,14 Mothers of participating children reported the child's BW in grams in a questionnaire at both baseline and follow up. Maternal information on the child's GA in weeks was collected at follow up only. Parents were also asked about their socio-demographic characteristics, lifestyle, and current weight and height.
The study was approved by the local scientific ethics committee. All parents gave written informed consent and all children gave verbal consent.

Data linkage
Recalled information on BW and GA was compared with registered information in the national Danish Medical Birth Register (MBR). The EYHS was linked with the MBR database by matching the national identification number (a unique 10-digit identity number). The MBR was established in 1968 and has been computerised since 1973. It contains information on all births in Denmark and is considered to be of good quality. 15

Study population
A total of 1537 children and parents participated in the EYHS. Of these, 1448 respondents were the biological mother of the participating child; 1428 mothers recalled their child's BW at least once, and 359 recalled it both at baseline and at the 6-year follow up. The first maternal recall of the child's BW was used preferentially in the accuracy analyses; recall at follow up was used when BW information was missing at baseline (32%).
For the accuracy analysis of BW, 157 children were excluded owing to missing BW information in the MBR database (incomplete identity number or birth outside Denmark), leaving a final accuracy sample of 1271 women. The reliability sample for BW comprised 359 women.
Of 749 biological mothers who provided information on children's GA, a total of 72 children were excluded due to missing information in the MBR database. Therefore, 678 women constituted the accuracy sample for GA.

Statistical analysis
Women who did not remember their children's BW were compared with the remaining women, who recalled BW at least once, using chi-square tests and analysis of variance. The same procedure was applied to women who did not recall GA; this analysis was restricted to follow-up data because maternal information on children's GA was collected at follow up only.
The discrepancy between recalled and registered BW and GA was assessed by Student's t test and summarised as the mean difference (MD) and its SD. Positive values represent overestimation and negative values underestimation of the registered value. Student's t test was also used to test the difference between the second and the first maternal recall of BW.
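As an illustration of the discrepancy measure, the Python sketch below computes MD, the SD of the differences and a t statistic. It assumes the t test is a paired test (a one-sample t test on the recalled - registered differences, df = n - 1); the function name and the data are hypothetical, and the p-value is left to a statistics package rather than computed here.

```python
import math
from statistics import mean, stdev


def mean_discrepancy(recalled, registered):
    """Mean discrepancy (recalled - registered), its SD and a paired t statistic.

    Positive MD = overestimation, negative MD = underestimation of the
    registered value. The t statistic treats the test as a one-sample
    t test on the paired differences (df = n - 1).
    """
    if len(recalled) != len(registered):
        raise ValueError("recalled and registered must be paired")
    diffs = [a - b for a, b in zip(recalled, registered)]
    md = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = md / (sd / math.sqrt(len(diffs)))
    return md, sd, t


# Hypothetical birthweights in grams (not study data)
recalled = [3400, 3250, 2900, 4100, 3600]
registered = [3420, 3250, 2950, 4080, 3600]
md, sd, t = mean_discrepancy(recalled, registered)
print(f"MD = {md:.1f} g, SD = {sd:.1f} g, t = {t:.2f}, df = {len(recalled) - 1}")
```

Here the differences are -20, 0, -50, 20 and 0 g, so MD is -10 g, a slight underestimation in the sign convention used above.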
Correlation and agreement between recalled and registered BW and GA, and between the second and first maternal recall of BW, were investigated for the overall sample and across subgroups using Pearson's correlation coefficient (r) and the intraclass correlation coefficient (ICC).
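The text does not state which form of ICC was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form (Shrout and Fleiss ICC(2,1)), computed from ANOVA mean squares, alongside Pearson's r. Function names and data are hypothetical; the point is to show how the two statistics respond differently to a constant offset.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation coefficient: linear association, not agreement."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(x, y):
    """Two-way random-effects, absolute-agreement, single-measure ICC
    (Shrout and Fleiss ICC(2,1)) for two paired measurements per subject."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)    # between subjects
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (x, y))  # between sources
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols                      # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical registered birthweights (g) and two hypothetical recall patterns
registered = [2800, 3100, 3300, 3500, 3900, 4200]
close_recall = [2790, 3120, 3300, 3480, 3910, 4200]
biased_recall = [g + 300 for g in registered]  # every value 300 g too high

for label, recall in [("close", close_recall), ("biased", biased_recall)]:
    print(label, round(pearson_r(recall, registered), 3), round(icc_2_1(recall, registered), 3))
```

In the "biased" case r is 1 while the ICC drops to roughly 0.86, because the absolute-agreement ICC penalises the systematic 300 g shift that the correlation ignores.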
Recalled and registered BW and GA were also compared using Bland-Altman plots, 16 which display the differences (recalled - registered) against the mean of the two measurements ((recalled + registered)/2), together with the 95% limits of agreement and their confidence intervals. The 95% limits of agreement (MD ± 1.96 SD of the differences) give the range within which 95% of the differences between the two measurement methods are expected to fall. The Bland-Altman plots were also used to identify causes of discrepancies (observations outside the limits of agreement) between maternally recalled and registered information.
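A minimal Bland-Altman sketch, assuming paired lists of recalled and registered values: it returns the plot coordinates (means on the x-axis, differences on the y-axis), the 95% limits of agreement and the indices of observations outside them. The confidence intervals use Bland and Altman's approximate standard error of a limit, sqrt(3 SD² / n), with a normal multiplier, which is an assumption suited to large samples; plotting itself is omitted and the data are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman(recalled, registered, z=1.96):
    """Bland-Altman summary for paired measurements.

    x-axis of the plot: mean of the two methods, (recalled + registered) / 2
    y-axis of the plot: difference, recalled - registered
    Limits of agreement: MD +/- z * SD of the differences.
    """
    diffs = [a - b for a, b in zip(recalled, registered)]
    means = [(a + b) / 2 for a, b in zip(recalled, registered)]
    n = len(diffs)
    md, sd = mean(diffs), stdev(diffs)
    lower, upper = md - z * sd, md + z * sd
    # Approximate SE of each limit, sqrt(3 * SD^2 / n); normal multiplier,
    # so the CIs are only reasonable for large n.
    se_limit = math.sqrt(3 * sd ** 2 / n)
    return {
        "md": md,
        "loa": (lower, upper),
        "loa_ci": ((lower - z * se_limit, lower + z * se_limit),
                   (upper - z * se_limit, upper + z * se_limit)),
        "means": means,
        "diffs": diffs,
        "outside": [i for i, d in enumerate(diffs) if d < lower or d > upper],
    }


# Hypothetical data (g): the last child is recalled 300 g too heavy
registered = [3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700]
recalled = [3000, 3110, 3190, 3300, 3405, 3495, 3600, 4000]
res = bland_altman(recalled, registered)
print(res["md"], [round(v, 1) for v in res["loa"]], res["outside"])
```

The "outside" list is what the text uses to look for causes of discrepancy; here only the last, hypothetical child falls outside the limits.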
Linear regression analysis was used to assess the relationship between registered (dependent variable) and recalled BW and GA (independent variable). We tested the hypothesis that the intercept (b0) and slope (b1) equal 0 and 1, respectively, which would indicate a perfect fit. Stepwise multiple logistic regression was used to identify factors associated with the odds of a difference outside the limits of agreement and of a discrepancy of more than 100 g in BW or 2 weeks in GA. Cutoffs of 100 g and 2 weeks were chosen because they represent differences in BW and GA of physiological significance. 7 The sensitivity of maternal recall for detecting BW and GA groups, classified according to cutoffs of clinical significance, was also assessed: low (<2500 g), normal (2500-4000 g) and high BW (>4000 g); and preterm (<37 weeks), term (37-41 weeks) and post-term (≥42 weeks). Based on gestational age- and sex-specific weight categories from a Danish reference population, 17 we classified infants born between 27 and 43 weeks of gestation as small for gestational age (SGA) (≤10th percentile), adequate for gestational age (AGA) or large for gestational age (LGA) (≥90th percentile). Kappa coefficients were applied to evaluate the magnitude of agreement: kappa values above 0.8 represented 'excellent' agreement, 0.61-0.8 'substantial' and 0.41-0.6 'moderate' agreement. 18 Moreover, we evaluated the quality of BW and GA information in the MBR. The percentage of implausible values for GA and BW-GA combinations was calculated according to cutoffs proposed by Alexander et al. 19,20 The analyses were performed using Stata 9.0 (StataCorp, TX, USA).
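The grouping and kappa steps can be sketched as follows, using the BW and GA cutoffs stated above. Assumptions: GA is in completed weeks, kappa is the unweighted Cohen's kappa (the text does not say whether a weighted form was used), and all names and data are hypothetical. SGA/AGA/LGA classification needs the Danish reference percentiles and is not reproduced here.

```python
def bw_group(grams):
    """Low (<2500 g), normal (2500-4000 g) or high (>4000 g) birthweight."""
    if grams < 2500:
        return "low"
    if grams <= 4000:
        return "normal"
    return "high"


def ga_group(weeks):
    """Preterm (<37), term (37-41) or post-term (>=42) completed weeks."""
    if weeks < 37:
        return "preterm"
    if weeks < 42:
        return "term"
    return "post-term"


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two paired lists of category labels."""
    n = len(a)
    categories = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_exp == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical birthweights (g), not study data
registered = [2400, 3000, 3500, 4100, 3900, 2600]
recalled = [2450, 3000, 3500, 4000, 3900, 2500]
reg_groups = [bw_group(g) for g in registered]
rec_groups = [bw_group(g) for g in recalled]
misclassified = sum(x != y for x, y in zip(reg_groups, rec_groups)) / len(reg_groups)
print(f"misclassified: {misclassified:.1%}, kappa: {cohen_kappa(rec_groups, reg_groups):.2f}")
```

One of six hypothetical children changes group (4100 g recalled as 4000 g), giving 16.7% misclassification and kappa = 0.60, which the scale above would call 'moderate'.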

Results
Except for current body mass index (BMI), no differences in socio-demographic or birth-related characteristics were found between women who recalled and those who did not recall their children's BW; nonresponders had a higher BMI than responders. Regarding missing recall of GA, significant differences were detected in relation to the child's age, ethnicity, maternal age, education and parity. Child's age, maternal age and parity were higher in the nonrespondent group, and the proportions of nonwhite children and less-educated women were also higher among nonrespondents than among respondents (Table 1). BW, as recorded in the MBR, ranged from 866 to 5200 g (mean 3388 g, SD 567.1), with some evidence that values had been rounded to end in 0. GA ranged from 26 to 45 weeks (mean 39.6 weeks, SD 1.9). No implausible values for GA (<20 weeks or >50 weeks) 19 or BW-GA combinations 20 were found.
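Two of the register checks above can be sketched briefly: the share of BW values that are exact multiples of 10 or 100 g, and GA values outside the 20-50 week plausibility range. If final digits were uniformly distributed, about 10% of values would be multiples of 10 and about 1% multiples of 100, so a clear excess suggests rounding. This is a hypothetical illustration with made-up data; the BW-GA combination cutoffs of Alexander et al. are not reproduced.

```python
def share_multiple_of(values, base):
    """Share of values that are exact multiples of `base` (e.g. 10 or 100 g)."""
    return sum(int(v) % base == 0 for v in values) / len(values)


def implausible_ga(weeks, low=20, high=50):
    """GA values outside the plausibility range used in the text (<20 or >50 weeks)."""
    return [w for w in weeks if w < low or w > high]


bw = [3400, 3410, 3415, 3402, 3500, 2990]  # hypothetical register values (g)
ga = [39, 40, 26, 45, 41]                  # hypothetical GA (weeks)
print(share_multiple_of(bw, 10), share_multiple_of(bw, 100), implausible_ga(ga))
```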

Correlations and discrepancies across groups
In total, 68 and 42% of the maternally recalled BWs and GAs, respectively, were identical to those recorded in the MBR. Additionally, 92% of the BWs were recalled within 100 g of the registered BWs, and 94% of the GAs were recalled within 2 weeks of the registered GAs (Table 2).
[Table note: Analyses are restricted to biological mothers. Analysis of variance (mean [SD]) and chi-square tests were performed. *Comparison between women who did not recall and those who recalled their children's BW at least once (EYHS; n = 1448). **Comparison between women who did not recall and those who recalled their children's gestational age at birth; analyses restricted to those who participated at follow up (EYHS; n = 846).]
Overall, there was a slight tendency for women to underestimate their children's BWs (MD -0.2 g, SD 142.4) and to overestimate GA (MD 0.3 weeks, SD 1.9). No significant differences in mean discrepancies between recalled and registered BW were detected across subgroups. A significant underestimation of GA was detected among mothers of nonwhite children and among single mothers. Mothers of SGA babies underestimated GA (MD -0.1 weeks, SD 1.5) compared with mothers of AGA babies (MD 0.3 weeks, SD 1.1; P = 0.048) and LGA babies (MD 0.4 weeks, SD 1.2; P = 0.006). Significant differences were also detected across maternal BMI and educational groups: overweight and more highly educated mothers significantly overestimated GA compared with the others. The overall Pearson's correlation coefficient and ICC for BW were 0.97 and 0.94, respectively, and varied only slightly across subgroups. The overall correlation coefficient and ICC for GA were 0.85 and 0.76, respectively, and were markedly lower (r = 0.37; ICC = 0.42) among children born after 41 weeks (Table 3). Table 4 shows that among the 359 women who reported their children's BWs twice (6 years apart), maternal recall was highly reliable (r = 0.97; ICC = 0.93; MD = -5.5 g, SD 132.4). Reliability of recall remained high when considered separately by subgroups.
[Table note: MD, mean discrepancy (recalled - registered information); r, Pearson's correlation coefficient; -, not estimated owing to few observations. *Statistically significant (P < 0.05).]

Linear regression analyses and graphic display of agreement
In the linear regression analyses with registered BW and GA as dependent variables and maternal recall as the independent variable, the tests of the intercept and slope were significant (P = 0.0001). Hence, the hypothesis of perfect fit (intercept 0, slope 1) was rejected, which suggests that maternally recalled BW and GA should be adjusted using the fitted regression equation.
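One standard way to test perfect fit jointly is an F test comparing the residual sum of squares of the fitted line registered = b0 + b1 × recalled with that of the line registered = recalled (b0 = 0, b1 = 1). The text does not specify the exact test used in Stata, so this is an assumed approach; the function name and data are hypothetical, and the p-value would come from the F distribution with (2, n - 2) degrees of freedom (e.g. `scipy.stats.f.sf`).

```python
from statistics import mean


def perfect_fit_f_test(recalled, registered):
    """Fit registered = b0 + b1 * recalled by OLS and return (b0, b1, F, df)
    for the joint null hypothesis b0 = 0 and b1 = 1."""
    x, y = list(recalled), list(registered)
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual sum of squares of the fitted (unrestricted) line
    rss_full = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    # Residual sum of squares under H0, i.e. the line registered = recalled
    rss_h0 = sum((b - a) ** 2 for a, b in zip(x, y))
    if rss_full == 0:
        raise ValueError("perfect linear fit: F is undefined")
    f = ((rss_h0 - rss_full) / 2) / (rss_full / (n - 2))
    return b0, b1, f, (2, n - 2)


# Hypothetical GA values (weeks), not study data
recalled_ga = [37, 38, 39, 40, 41]
registered_ga = [37.1, 37.9, 39.2, 39.8, 41.0]
b0, b1, f, df = perfect_fit_f_test(recalled_ga, registered_ga)
print(round(b0, 2), round(b1, 2), round(f, 3), df)
```

With these five hypothetical points F is small and perfect fit would not be rejected; in a sample of over 1000 children even a slight departure from b0 = 0, b1 = 1 can be significant, which is consistent with the rejection reported above despite high agreement.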
The Bland-Altman plots for BW and GA are given in Figures 1 and 2, respectively. Figure 1 shows only small differences between recalled and registered BW: most points lie close to the horizontal line at 0. Only 3.6% of the differences were outside the limits of agreement (-285 to 284.5 g). Comparing the concentration of points above and below 0 (perfect agreement) revealed a slight tendency towards underestimation among children of normal BW, especially from 2900 to 3600 g (around the mean value of 3400 g). Figure 2 also shows that most of the differences between maternally recalled and registered GA lay within the 95% limits of agreement (5.6% outside the limits; limits -2.1 to 2.6 weeks). However, a distinct pattern of agreement according to GA at birth was apparent, with a trend towards underestimation between 36 and 39 weeks of gestation and a slight trend towards overestimation for post-term infants.

Explanations for discrepancies
The stepwise analysis including child's age, ethnicity, gender, BW, GA, BW-for-GA group, maternal age, education, civil status, parity and BMI showed that the only factors related to BW differences outside the 95% limits of agreement were having a nonwhite child and having had a previous birth. The same variables were associated with a discrepancy of more than 100 g. A variable indicating whether BW recall was collected at baseline or at follow up was also introduced into the models, but no significant effect was observed.
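The stepwise multivariable logistic models are best fitted in a statistics package. As a smaller, hedged sketch, the Python below shows how the two binary outcomes (discrepancy above a threshold, and difference outside the limits of agreement) could be derived from paired data, and how a crude, unadjusted odds ratio for a single candidate factor is computed from a 2x2 table. This is not the stepwise model; the `multiparous` indicator and all data are hypothetical, while the 100 g threshold and the limits (-285 to 284.5 g) are taken from the text.

```python
def discrepancy_outcomes(recalled, registered, threshold, loa):
    """Return two 0/1 outcome lists per child:
    |recalled - registered| > threshold, and difference outside the limits of agreement."""
    lower, upper = loa
    diffs = [a - b for a, b in zip(recalled, registered)]
    over_threshold = [int(abs(d) > threshold) for d in diffs]
    outside_loa = [int(d < lower or d > upper) for d in diffs]
    return over_threshold, outside_loa


def crude_odds_ratio(outcome, exposure):
    """Unadjusted odds ratio (a*d)/(b*c) from a 2x2 table of 0/1 lists."""
    pairs = list(zip(outcome, exposure))
    a = sum(1 for o, e in pairs if o and e)          # exposed, discrepant
    b = sum(1 for o, e in pairs if not o and e)      # exposed, concordant
    c = sum(1 for o, e in pairs if o and not e)      # unexposed, discrepant
    d = sum(1 for o, e in pairs if not o and not e)  # unexposed, concordant
    if b == 0 or c == 0:
        raise ValueError("empty cell: crude odds ratio is undefined")
    return (a * d) / (b * c)


# Hypothetical data (g); `multiparous` is a hypothetical 0/1 parity indicator
registered = [3000, 3350, 3500, 3100, 2900, 3600, 3100, 3250]
recalled = [3000, 3200, 3500, 3300, 2900, 3600, 3100, 3400]
multiparous = [0, 1, 1, 1, 0, 0, 1, 0]
over_100g, outside = discrepancy_outcomes(recalled, registered, threshold=100, loa=(-285, 284.5))
print(over_100g, crude_odds_ratio(over_100g, multiparous))
```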
The variables significantly associated with discrepancies in GA were maternal civil status and maternal BMI. Single mothers were more likely than married mothers to recall GA discrepantly, and the likelihood of discrepant recall rose with increasing maternal BMI (Table 5).
[Table note: MD, mean discrepancy (second recall - first recall); r, Pearson's correlation coefficient; -, not estimated owing to few observations. No comparison was statistically significant (P > 0.05).]
[Figure 1. Agreement between recalled and registered BW, with 95% limits of agreement, confidence intervals and regression line. x-axis: mean of recalled and registered BW (g).]

Sensitivity for detecting BW and GA groups
Examining how errors in maternal recall would affect the classification of children into low-, normal- and high-BW groups shows that only 1.6% (20/1271) of births would have been misclassified (kappa = 0.95; P < 0.001). Misclassification was higher for GA groups (preterm, term and post-term delivery; 17%) and for BW-for-GA groups (SGA, AGA and LGA; 21%) than for BW groups. Nevertheless, agreement for both GA and BW-for-GA groups was moderate (kappa = 0.56 and 0.58, respectively; P < 0.001) (Table 6).

Discussion
In this study, we demonstrate a high degree of accuracy and reliability of maternal recall of children's BW. Maternally recalled GA was also accurate, although the degree of accuracy varied according to the child's ethnicity, BW-for-GA group, maternal civil status, BMI and education. Unexpectedly, more highly educated mothers overestimated GA compared with less-educated mothers. Although the mean differences were statistically significant, the discrepancies between recalled and registered information were less than 1 week in all groups, which appears to be of little clinical relevance.
Although the overall underestimation of BW (-0.2 g) and overestimation of GA (0.3 weeks) were very small and had little impact on classification into BW and GA groups, it is important to note that, for the evaluation of fetal growth, combining the two (BW-for-GA groups) resulted in a larger misclassification (21%).
The results demonstrate that the magnitude of agreement was higher for BW than for GA. This is not surprising, because a child's BW is information eagerly awaited by parents after delivery, is often repeated to family and friends, and is therefore more likely to be remembered. GA is not often mentioned after delivery, especially if the child was born at term. Moreover, estimation of GA is more complex, and its accuracy varies according to the method used. 21 Although the linear regression analyses suggested that maternally recalled BW and GA should be adjusted using the fitted regression equation, the models showed high correlations and good explanatory capacity: recalled BW and GA explained 94 and 72% of the variance in the registered information, respectively. Whether correction for measurement error would be appropriate on the basis of this validation study is still debatable; if a reference standard that is itself measured with error is used to correct another imperfect measurement, new bias can be introduced. 22 The MBR was used as the standard source of information to validate maternal recall. The MBR is not a perfect gold standard; however, it appears to be of good quality. 15 The logistic regression analysis showed that the most important variables in the discordance between recall-based and register-based information on BW were the child's ethnicity and having a previous child. These findings agree with other studies showing that multiparous mothers may confuse their children's BWs and that mothers of nonwhite infants recall their children's BW less accurately than mothers of white infants. 7,23 The variables predicting inaccurate recall of GA in the logistic models were maternal civil status and BMI. Maternal civil status might be considered a proxy for socioeconomic class and family support, and maternal BMI a proxy for lifestyle factors and health concern.
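For a simple linear regression with a single predictor, the proportion of variance explained equals the square of Pearson's r, so the variance-explained figures follow directly from the overall correlations reported in the Results:

```latex
R^2 = r^2, \qquad r_{\mathrm{BW}}^2 = 0.97^2 \approx 0.94, \qquad r_{\mathrm{GA}}^2 = 0.85^2 \approx 0.72
```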
Although several methods have been proposed to validate maternal recall of BW and GA, 3-5,10 validation studies are often analysed inappropriately, notably by relying on correlation coefficients, which may be misleading. Pearson's correlation coefficient measures the strength of the linear relationship between two variables, not their agreement. Consequently, it can be high even when two measurements disagree, if the bias is systematic. 16 This problem can be overcome by using the ICC, which combines a measure of correlation with a test of the difference in means (within and between subjects). In the present validation study, using several methods simultaneously gave a fuller view of the contribution and sensitivity of each.
Combining the graphical approach of Bland-Altman with the ICC allows heterogeneous patterns of agreement to be identified. Heterogeneity in accuracy across different levels of the registered information can be spotted quickly on the graph. The presence of heterogeneity indicates the need to estimate the ICC for different levels of the variable studied (e.g. the trend towards underestimation of BW between 2900 and 3600 g). The two methods may complement each other in pilot studies evaluating agreement between recalled and registered BW and GA. Identifying groups with reliable information on these variables may justify the use of self-reported values, making fieldwork cheaper and easier.

Conclusion
Although the quality of maternal recall of BW and GA may be of limited importance for clinical practice, it is relevant to future epidemiological research, which may in turn yield clinically useful information.
The small mean differences between recall-based and register-based information and the low rate of misclassification into BW and GA groups suggest that maternal recall of BW and GA can provide accurate information for epidemiological studies of fetal and infant growth.