Systematic analysis of copy-number variations associated with early pregnancy loss
ABSTRACT
Objectives
Embryonic numerical and structural chromosomal abnormalities are the most common cause of early pregnancy loss. However, the role of submicroscopic copy-number variations (CNVs) in early pregnancy loss is unclear, and little is known about the critical regions and candidate genes for miscarriage, because of the large size of structural chromosomal abnormalities. The aim of this study was to identify potential miscarriage-associated submicroscopic CNVs and critical regions of large CNVs as well as candidate genes for miscarriage.
Methods
Over a 5-year period, 5180 fresh miscarriage specimens were investigated using quantitative fluorescent polymerase chain reaction/CNV sequencing or chromosomal microarray analysis. Statistically significant submicroscopic CNVs were identified by comparing the frequency of recurrent submicroscopic CNVs between cases and a published control cohort. Furthermore, genes within critical regions of miscarriage-associated CNVs were prioritized by integrating the Residual Variation Intolerance Score and the human gene expression dataset for identification of potential miscarriage candidate genes.
Results
Results without significant maternal-cell contamination were obtained in 5003 of the 5180 (96.6%) cases. Clinically significant chromosomal abnormalities were identified in 59.1% (2955/5003) of these cases. Three recurrent submicroscopic CNVs (microdeletions in 22q11.21, 2q37.3 and 9p24.3p24.2) were significantly more frequent in miscarriage cases, and were considered to be associated with miscarriage. Moreover, 44 critical regions of large CNVs were observed, including 14 deletions and 30 duplications. There were 309 genes identified as potential miscarriage candidate genes through gene-prioritization analysis.
Conclusions
We identified potential miscarriage candidate CNVs and genes. These data demonstrate the importance of CNVs in the etiology of miscarriage and highlight the importance of ongoing analysis of CNVs in the study of miscarriage. Copyright © 2019 ISUOG. Published by John Wiley & Sons Ltd.
CONTRIBUTION
What are the novel findings of this work?
It is known that embryonic major chromosomal abnormalities are the most common cause of miscarriage. Our results demonstrate the role of copy-number variations (CNVs) in the etiology of miscarriage.
What are the clinical implications of this work?
We identified potential miscarriage candidate CNVs and genes. This work highlights the importance of ongoing analysis of CNVs in the study of miscarriage.
INTRODUCTION
Early pregnancy loss, or miscarriage, is the most frequent complication in first-trimester pregnancy, occurring in 10–15% of all clinically recognized pregnancies1.
Embryonic or parental chromosomal abnormalities, antiphospholipid syndrome, uterine anomalies, thrombophilias, endocrine disorders, infectious disorders, autoimmune diseases and sperm quality, as well as lifestyle issues, are all considered potential causes of miscarriage2. Among these, embryonic numerical and structural chromosomal abnormalities are the most common cause, accounting for more than 50% of miscarriages.
With the wide application of high-resolution molecular techniques, including chromosomal microarray analysis (CMA) and next-generation sequencing (NGS), submicroscopic copy-number variations (CNVs) have been recognized to be associated with a wide range of human diseases, including congenital anomalies and neurodevelopmental disorders3-5. In recent years, submicroscopic CNVs have also been observed in cases of miscarriage6-18. However, previous studies have focused mainly on evaluating the diagnostic yield and efficacy of CMA and NGS in samples of products of conception (POCs) and mostly included small sample sizes8-17. Large-cohort data on the association between submicroscopic CNVs and miscarriage are limited. In addition, although large CNVs are known to cause miscarriage, few reports have delineated the specific regions critical for miscarriage or the miscarriage candidate genes within CNV regions.
In the current study, we aimed to evaluate systematically the incidence and distribution of chromosomal abnormalities detected by CMA and the NGS-based method, CNV sequencing (CNV-seq), in more than 5000 cases of miscarriage. We analyzed potential miscarriage-associated submicroscopic CNVs and critical regions of large CNVs. Moreover, we sought to identify potential miscarriage candidate genes from critical regions of miscarriage-associated CNVs using gene-prioritization analysis.
METHODS
Study subjects
This retrospective analysis of cases of early pregnancy loss before 13 gestational weeks included subjects referred for genetic analysis by three centers in mainland China between 2013 and 2018; a small number of these cases have been reported in our previous publication17. Informed consent was obtained from all patients. Fresh chorionic villi were separated from maternal decidua by a standardized method, as described previously19. For comparison of the frequencies of CNVs, we used a control cohort of Asian populations from a previously published study20.
Testing strategies
Analysis for chromosomal abnormalities was performed using quantitative fluorescent polymerase chain reaction (QF-PCR)/CNV-seq or CMA. For the QF-PCR/CNV-seq strategy, suitable samples were first investigated using QF-PCR assays for significant maternal-cell contamination (MCC) and triploidy. Significant MCC was defined as levels exceeding 30%; these cases were excluded from the study. For the CMA strategy, all cases were tested directly by CMA, since the single-nucleotide polymorphism (SNP) array used in our study can simultaneously detect aneuploidies, triploidies, CNVs, uniparental disomy (UPD) and MCC. The testing strategies and numbers of cases are summarized in Figure 1.

QF-PCR
DNA was extracted from samples of POCs using a QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). QF-PCR was performed using 15 highly polymorphic short tandem repeat markers, including three markers (D21S1435, D21S11 and D21S1411) for chromosome 21, four markers (D18S1002, D18S391, D18S535 and D18S386) for chromosome 18, four markers (D13S628, D13S742, D13S634 and D13S305) for chromosome 13, and four markers (DXS981, DXS6809, X22 and AMXY) for sex chromosomes. Amplification of microsatellite markers was carried out following the manufacturer's instructions. PCR products were electrophoresed on an ABI 3500 DX genetic analyzer and results were analyzed using ABI GeneMapper software (Applied Biosystems, Foster City, CA, USA).
CNV-seq
CNV-seq was carried out as reported previously, with minor modifications21. In brief, ∼ 50–100 ng genomic DNA was fragmented and used for construction of DNA libraries by adapter ligation and PCR amplification. DNA libraries were sequenced by an Ion Proton Sequencer (Thermo Fisher Scientific, Waltham, MA, USA) to generate about 4–5 million raw single-end sequencing reads of approximately 200 base pairs in length. Reads were aligned to the University of California Santa Cruz (UCSC) Human Genome Build 19 (hg19) (Genome Reference Consortium (GRC) Build 37) using the Burrows–Wheeler algorithm22, and the resulting 2.8–3.2 million uniquely mapped reads were allocated to 20-kilobase (kb) bins along each chromosome. GC correction was performed to eliminate the effect of GC bias between different samples using a three-step process (LOESS regression, intrarun normalization and linear model regression) as described previously23. CNVs were identified by circular binary segmentation (CBS algorithm)24.
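The binning step described above can be illustrated with a minimal Python sketch. It assumes that the start positions of uniquely mapped reads on one chromosome are already available as integers; the function names `bin_read_counts` and `log2_ratios` and the median-based normalization are illustrative assumptions, and the sketch does not reproduce the GC-correction or CBS steps of the published pipeline.

```python
from collections import Counter
from math import log2
from statistics import median

BIN_SIZE = 20_000  # 20-kb bins, as stated in the text


def bin_read_counts(positions, chrom_length, bin_size=BIN_SIZE):
    """Count read start positions (0-based) in fixed-width bins along one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def log2_ratios(counts, pseudocount=0.5):
    """Log2 ratio of each bin to the median bin count (hypothetical normalization)."""
    reference = median(counts) + pseudocount
    return [log2((c + pseudocount) / reference) for c in counts]
```

In such a scheme, a run of consecutive bins with clearly negative or positive ratios would be the raw signal that a segmentation algorithm later turns into a deletion or duplication call.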
CMA
Genomic DNA was isolated from tissue samples according to standard procedures. Two CMA platforms were used for identification of chromosomal abnormalities in this study, the HumanCytoSNP-12 array (Illumina, San Diego, CA, USA) and the Affymetrix CytoScan 750K array (Affymetrix, Santa Clara, CA, USA). SNP array experiments and molecular karyotype analysis for the Illumina platform were performed as reported previously17. The Affymetrix CytoScan 750K array, including approximately 550 000 oligonucleotide probes and 200 000 SNP probes, with average marker spacing of roughly one probe every 4.1 kb, was used for the whole-genome scan. Genomic DNA was fragmented, labeled and hybridized to CytoScan 750K arrays according to the manufacturer's protocol. Molecular karyotype analysis was performed using ChAS 3.2 software (Affymetrix). CNVs were reported according to the UCSC hg19. CNVs detected by the two platforms had an effective minimum resolution of 100 kb. Regions of allelic homozygosity (ROHs) were displayed at a threshold of 5 megabases (Mb). A single large ROH (≥ 10 Mb) or multiple ROHs on a single chromosome (suggestive of mixed isodisomy and heterodisomy), as well as ROHs covering the entire chromosome, were taken to indicate UPD. Mosaicism for CNVs ≥ 5 Mb was reported when the detection threshold of 30% was exceeded. Significant MCC was reported when the level of MCC exceeded 30%, and subsequent QF-PCR was carried out to validate the proportion of fetal and maternal DNA in cases with a female fetus. Cases with significant MCC were excluded from this study.
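The ROH criteria for suspecting UPD can be sketched as a simple rule, shown here in Python. The input format (a list of `(chromosome, start, end)` tuples) and the function name `chromosomes_suggesting_upd` are hypothetical; the thresholds are the 5-Mb display threshold and 10-Mb single-ROH threshold given above. This is only an illustration of the stated rule, not the logic of the ChAS software.

```python
from collections import defaultdict

ROH_DISPLAY_MB = 5.0   # ROHs are displayed at >= 5 Mb
LARGE_ROH_MB = 10.0    # a single ROH of >= 10 Mb is taken to suggest UPD


def chromosomes_suggesting_upd(roh_segments):
    """Return the set of chromosomes whose ROH pattern suggests UPD.

    roh_segments: iterable of (chrom, start_bp, end_bp) tuples (assumed format).
    A chromosome is flagged if it carries one ROH >= 10 Mb or at least two
    displayed ROHs. A whole-chromosome ROH is covered by the first condition.
    """
    sizes_by_chrom = defaultdict(list)
    for chrom, start, end in roh_segments:
        size_mb = (end - start) / 1e6
        if size_mb >= ROH_DISPLAY_MB:  # ignore ROHs below the display threshold
            sizes_by_chrom[chrom].append(size_mb)
    return {
        chrom
        for chrom, sizes in sizes_by_chrom.items()
        if max(sizes) >= LARGE_ROH_MB or len(sizes) >= 2
    }
```

A rule of this kind would also flag ROHs arising from parental consanguinity, which typically appear on several chromosomes at once, so flagged chromosomes would still need review in the context of the whole genome.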
Evaluation of CNVs
Large CNVs (≥ 10 Mb) were defined as partial aneuploidies, whereas submicroscopic CNVs (< 10 Mb) were classified as microdeletions and microduplications. Large CNVs detected in three or more cases were defined as recurrent large CNVs, while submicroscopic CNVs found in two or more cases were defined as recurrent submicroscopic CNVs. Pathogenicity of submicroscopic CNVs was evaluated according to the American College of Medical Genetics guidelines25. CNVs were classified into three major categories: pathogenic CNVs, variants of uncertain significance (VOUS) and benign CNVs. Only pathogenic CNVs and VOUS are reported in this study.
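The size and recurrence definitions above can be expressed compactly in Python. In this sketch, each CNV call is assumed to carry a pre-assigned locus label (for example cytoband plus type); how calls from different cases were matched to the same locus is not specified in the text, so the labels and the function names `size_class` and `recurrent_loci` are illustrative assumptions.

```python
from collections import defaultdict

LARGE_CNV_BP = 10_000_000  # >= 10 Mb: partial aneuploidy (large CNV)
MIN_CASES = {"large": 3, "submicroscopic": 2}  # recurrence thresholds from the text


def size_class(start_bp, end_bp):
    """Classify a CNV as 'large' (>= 10 Mb) or 'submicroscopic' (< 10 Mb)."""
    return "large" if end_bp - start_bp >= LARGE_CNV_BP else "submicroscopic"


def recurrent_loci(calls):
    """Return {(locus, size_class): n_cases} for loci meeting the recurrence threshold.

    calls: iterable of (case_id, locus_label, start_bp, end_bp) tuples (assumed format).
    """
    cases = defaultdict(set)
    for case_id, locus, start, end in calls:
        cases[(locus, size_class(start, end))].add(case_id)
    return {
        key: len(ids)
        for key, ids in cases.items()
        if len(ids) >= MIN_CASES[key[1]]
    }
```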
Statistical analysis
Chi-square or Fisher's exact test was used to compare the frequency of CNVs between the case and the control cohorts. P-value < 0.05 was considered statistically significant in all tests, which were performed using SPSS statistical software (Version 22.0, IBM Corp., Armonk, NY, USA).
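For readers who wish to see how a case–control comparison of this kind works, the following is a minimal, dependency-free Python sketch of a two-sided Fisher's exact test on a 2 × 2 table. It is not the SPSS procedure used in the study, and the function name `fisher_exact_two_sided` is an illustrative assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cases with the CNV,    b: cases without the CNV
    c: controls with the CNV, d: controls without the CNV
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k CNV carriers among the cases
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min, k_max = max(0, col1 - row2), min(col1, row1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(pk for pk in map(prob, range(k_min, k_max + 1)) if pk <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# 3 of 2124 cases vs 0 of 6533 controls
print(round(fisher_exact_two_sided(3, 2121, 0, 6533), 3))  # 0.015
```

With three carriers among 2124 cases and none among 6533 controls, the only table as extreme as the observed one is the observed table itself, so the P-value is simply the probability that all three carriers fall in the case group by chance.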
Identification of candidate genes from CNVs in miscarriage
Candidate genes related to miscarriage were identified by integrating Residual Variation Intolerance Score (RVIS) percentiles and the human gene expression dataset of the genes in the smallest overlapping regions (SORs) of miscarriage-associated CNVs. Large CNVs found in at least three miscarriage cases were evaluated for SORs in this study. RVIS, which is based on allele frequency from the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP), is a gene-level scoring system commonly used to prioritize candidate genes26, 27. The more intolerant a gene is to functional variation, the more likely it is to be associated with disease when mutated or present at an abnormal level. In this study, genes with an RVIS percentile ≤ 25 were retained as candidates; scores were obtained from the RVIS website (http://genic-intolerance.org/). Gene expression levels of human tissues were obtained from the Human U133A/GNF1H Gene Atlas on the BioGPS website (http://biogps.org/) using a microarray dataset of previous research in humans28. Genes with an expression value in any human fetal or placental tissue higher than the upper quartile of all tissues were defined as having high expression in fetal or placental tissue. The gene ontology (GO) analysis was performed using the DAVID Bioinformatics Database (https://david.ncifcrf.gov/).
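The two prioritization filters can be sketched in Python as follows. The sketch assumes one reading of the expression criterion, namely that a gene's expression in at least one fetal or placental tissue must exceed the upper quartile of that gene's expression across all tissues; the tissue names, the data layout and the function name `is_candidate` are hypothetical and are not taken from the RVIS or BioGPS resources.

```python
from statistics import quantiles

RVIS_MAX_PERCENTILE = 25.0  # RVIS percentile cut-off stated in the text


def is_candidate(rvis_percentile, expression_by_tissue, fetal_placental_tissues):
    """Apply the RVIS and expression filters to one gene.

    rvis_percentile: RVIS percentile of the gene, or None if unavailable.
    expression_by_tissue: dict mapping tissue name -> expression value
        (needs at least two tissues for the quartile to be defined).
    fetal_placental_tissues: names of tissues counted as fetal or placental.
    """
    if rvis_percentile is None or rvis_percentile > RVIS_MAX_PERCENTILE:
        return False
    # Third cut point of the quartiles = upper quartile across all tissues (assumed reading)
    upper_quartile = quantiles(list(expression_by_tissue.values()), n=4)[2]
    return any(
        expression_by_tissue[t] > upper_quartile
        for t in fetal_placental_tissues
        if t in expression_by_tissue
    )
```

A gene passing both filters is both intolerant to variation and highly expressed in fetal or placental tissue, which is the combination the study uses to nominate candidates within each SOR.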
RESULTS
Specimen characteristics
Initially, 5180 cases of miscarriage were included in this study. We excluded 177 cases because of significant MCC (Figure 1). Of the 5003 miscarriage cases ultimately included in our analysis, 535 cases were from our previous study17. All cases were miscarriages occurring before 13 weeks' gestation, with a mean gestational age of 9.5 (range, 4–13) weeks. Maternal age was < 30 years in 2241 (44.8%) cases, 30–34 years in 1568 (31.3%) cases, 35–39 years in 897 (17.9%) cases and ≥ 40 years in 297 (5.9%) cases.
Chromosomal abnormalities detected by QF-PCR/CNV-seq and CMA
A total of 5003 cases with fetal results were available for further analysis, including 1902 cases detected by QF-PCR/CNV-seq and 3101 cases tested by CMA. Overall, normal results were identified in 1963 (39.2%) cases, VOUS were identified in 85 (1.7%) cases (Table S1) and abnormal results were identified in 2955 (59.1%) cases (Table 1 and Figure 2). Of the 2955 cases with abnormal results, 2879 (97.4%) could theoretically be detected by traditional cytogenetic G-banding analysis (> 10 Mb).
Chromosomal abnormality | CNV-seq combined with QF-PCR (n = 1902), n | CNV-seq combined with QF-PCR, frequency (%) | CMA (n = 3101), n | CMA, frequency (%) | Total (n = 5003), n | Total, frequency (%) |
---|---|---|---|---|---|---|
Single aneuploidy | 871 | 45.8 | 1308 | 42.2 | 2179 | 43.6 |
Autosomal trisomy | 688 | 36.2 | 1085 | 35.0 | 1773 | 35.4 |
Autosomal monosomy | 15 | 0.8 | 14 | 0.5 | 29 | 0.6 |
Monosomy X | 165 | 8.7 | 203 | 6.5 | 368 | 7.4 |
Sex chromosome trisomy | 3 | 0.2 | 6 | 0.2 | 9 | 0.2 |
Multiple aneuploidy | 64 | 3.4 | 94 | 3.0 | 158 | 3.2 |
Double aneuploidy | 55 | 2.9 | 80 | 2.6 | 135 | 2.7 |
Triple aneuploidy | 7 | 0.4 | 13 | 0.4 | 20 | 0.4 |
Quadruple aneuploidy | 2 | 0.1 | 1 | 0.0 | 3 | 0.1 |
Polyploidy | 121 | 6.4 | 233 | 7.5 | 354 | 7.1 |
Triploidy | 121 | 6.4 | 221 | 7.1 | 342 | 6.8 |
Tetraploidy | 0 | 0.0 | 12 | 0.4 | 12 | 0.2 |
Partial aneuploidy (large CNV) | 76 | 4.0 | 112 | 3.6 | 188 | 3.8 |
Terminal deletion/duplication | 28 | 1.5 | 42 | 1.4 | 70 | 1.4 |
Terminal deletion + terminal duplication (suggestive of unbalanced translocation) | 30 | 1.6 | 47 | 1.5 | 77 | 1.5 |
Terminal deletion + interstitial duplication in one chromosome | 8 | 0.4 | 15 | 0.5 | 23 | 0.5 |
Interstitial deletion/duplication | 10 | 0.5 | 8 | 0.3 | 18 | 0.4 |
Microdeletion/microduplication (submicroscopic CNV) | 34 | 1.8 | 101 | 3.3 | 135 | 2.7 |
Pathogenic CNV | 11 | 0.6 | 39 | 1.3 | 50 | 1.0 |
VOUS | 23 | 1.2 | 62 | 2.0 | 85 | 1.7 |
UPD | 0 | 0.0 | 26 | 0.8 | 26 | 0.5 |
Whole-genome UPD | 0 | 0.0 | 19 | 0.6 | 19 | 0.4 |
Single-chromosome UPD | 0 | 0.0 | 4 | 0.1 | 4 | 0.1 |
Segmental UPD | 0 | 0.0 | 3 | 0.1 | 3 | 0.1 |
- CMA, chromosomal microarray analysis; CNV-seq, copy-number-variation sequencing; QF-PCR, quantitative fluorescent polymerase chain reaction; UPD, uniparental disomy; VOUS, variant of uncertain significance.

Aneuploidy was the most common abnormal finding, with 2179 (43.6%) single aneuploidies and 158 (3.2%) multiple aneuploidies being identified (Table 1). With the exception of chromosome 1, aneuploidies were observed in all chromosomes. Trisomies accounted for the majority of single aneuploidies (1782/2179; 81.8%). Of the single trisomies, trisomy 16 was the most frequent (511/1782, 28.7%), and trisomy 22 the second most frequent (289/1782, 16.2%) (Figure S1). Monosomy was found in 18.2% (397/2179) of cases with single aneuploidy. Monosomy X represented 92.7% (368/397) of these cases, and autosomal monosomy comprised the remaining 7.3% (29/397), with 22 cases of monosomy 21, three of monosomy 22, two of monosomy 4 and two of monosomy 13. Polyploidy was found in 354 (7.1%) cases: triploidy in 342 (6.8%) cases and tetraploidy in 12 (0.2%).
Partial aneuploidy (large CNV) was observed in 188 (3.8%) cases (Tables 1 and S2). Terminal deletion coupled with terminal duplication, suggestive of unbalanced translocation, accounted for the largest proportion of partial aneuploidies (77 cases; 41.0% of all partial aneuploidies and 1.5% of all cases analyzed). Microdeletions and microduplications (submicroscopic CNVs) were identified in 135 (2.7%) cases. Among these, 50 (1.0%) were considered as pathogenic CNVs (Table S3) and the remainder were classified as VOUS.
UPD was detected in 26 (0.5%) cases, none of which had known history of parental consanguinity. Whole-genome UPD, suggestive of molar pregnancy, was observed in 19 cases (Figure S2). Single-chromosome UPD (involving chromosomes 6, 21 and 22) was identified in four cases, presumably reflecting a monosomy rescue event. In addition, three cases of segmental UPD (involving chromosomes 1q, 10p and 22q) were observed. The most likely mechanism contributing to segmental UPD in these cases would be a meiotic/mitotic error, in which a meiotic non-disjunction event was followed by mitotic crossing-over between the paternal and maternal homologues, with subsequent trisomy rescue29. The incidence of UPD could have been underestimated, since UPD cannot be identified in specimens evaluated by CNV-seq.
We investigated the relationship between maternal age and chromosomal abnormalities, finding that the frequency of aneuploidy increased with maternal age, whereas the frequencies of other types of chromosomal abnormalities were not correlated with maternal age (Figure S3).
Identification of recurrent CNVs associated with miscarriage
To identify significant CNVs related to miscarriage, cases with numerical chromosomal abnormalities were excluded from CNV analysis. As a result, a total of 416 CNVs in 323 cases were subjected to further analysis, including 274 large CNVs in 188 cases, 52 submicroscopic pathogenic CNVs in 50 cases and 90 VOUS in 85 cases. The distribution of large CNVs and submicroscopic pathogenic CNVs in miscarriage cases is shown in Figure 3. For the purpose of identifying miscarriage-associated submicroscopic CNVs and critical regions of large CNVs, large CNVs and submicroscopic CNVs were analyzed separately.

We identified 16 recurrent (n ≥ 2) submicroscopic CNVs, including nine pathogenic CNVs and seven VOUS. The frequencies of recurrent submicroscopic pathogenic CNVs are summarized in Table 2. Of these pathogenic CNVs, three were identified with significantly higher frequencies in miscarriage cases than in healthy controls20, whereas no VOUS were statistically more common in miscarriage cases. These three statistically significant recurrent pathogenic CNVs involved microdeletions in 22q11.21, 2q37.3 and 9p24.3p24.2, and were considered to be associated with miscarriage.
Chr | Cytoband | Type | Genomic coordinates of SORs (hg19) | Size range (kb) | Cases (n = 2124*), n | Cases, P | Controls20 (n = 6533), n | Cases in meta-analysis18, n |
---|---|---|---|---|---|---|---|---|
22 | 22q11.21 | Del | chr22:18,938,367-21,460,220 | 2524–3300 | 6 | < 0.0001 | 0 | 4 |
2 | 2q37.3 | Del | chr2:239,900,000-243,020,000 | 3120–8692 | 3 | 0.015 | 0 | 0 |
9 | 9p24.3p24.2 | Del | chr9:208,454-2,407,101 | 2361–7416 | 3 | 0.015 | 0 | 0 |
7 | 7q11.23 | Del | chr7:72,394,456-74,138,121 | 1744–1931 | 2 | 0.06 | 0 | 2 |
7 | 7q11.23 | Dup | chr7:72,861,782-74,097,488 | 1236–1420 | 2 | 0.06 | 0 | 0 |
1 | 1p36.33p36.31 | Del | chr1:752,566-5,555,529 | 4803–6000 | 2 | 0.06 | 0 | 2 |
15 | 15q11.2 | Del | chr15:22,761,271-23,164,315 | 403–753 | 4 | 0.236 | 5 | 0 |
19 | 19p13.3 | Del | chr19:267,039-1,771,582 | 1505–3886 | 2 | 0.06 | 0 | 0 |
Y | Yp11.31p11.2 | Del | chrY:2,654,900-6,144,599 | 3494–7010 | 2 | — | — | 0 |
- * Count excludes samples with numerical and structural chromosomal abnormalities.
- Chr, chromosome; Del, deletion; Dup, duplication; SORs, smallest overlapping regions.
We identified 44 recurrent large CNVs (14 deletions and 30 duplications) which existed in at least three miscarriage cases. The SORs of deletions and duplications ranged in size from 10.2 to 22.8 Mb and from 0.86 to 49.5 Mb, respectively (Table S4). Large deletions occurred most commonly in chromosome 8, followed by chromosomes 4 and 1. Large duplications occurred most frequently in chromosome 8, followed by chromosomes 7 and 11.
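The notion of a smallest overlapping region can be made concrete with a short Python sketch. It assumes that the SOR is the simple intersection of all CNV intervals of the same type on the same chromosome; the text does not describe the exact procedure used, so this is one plausible reading, and the function name `smallest_overlapping_region` is hypothetical.

```python
def smallest_overlapping_region(intervals):
    """Return the region shared by all intervals, or None if there is none.

    intervals: list of (start_bp, end_bp) tuples for CNVs of the same type
    on the same chromosome (assumed input format).
    """
    if not intervals:
        return None
    start = max(s for s, _ in intervals)  # latest start
    end = min(e for _, e in intervals)    # earliest end
    return (start, end) if start < end else None


# Three hypothetical overlapping duplications on one chromosome
print(smallest_overlapping_region(
    [(0, 30_000_000), (5_000_000, 25_000_000), (10_000_000, 40_000_000)]
))  # (10000000, 25000000)
```

Because the shared region can only shrink as more cases are added, an SOR can be much smaller than any of the individual CNVs that define it.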
Identification of miscarriage candidate genes
To identify miscarriage candidate genes, we carried out a gene-prioritization process by integrating genes in critical regions of miscarriage-associated CNVs with RVIS percentiles and the human gene expression dataset. A total of 5736 genes within the SORs of large CNVs and statistically significant submicroscopic CNVs were subjected to gene-prioritization analysis. We identified 309 genes in these CNV regions with low RVIS percentile and high expression in fetal or placental tissues; these were considered to be potential miscarriage candidate genes (Table S5). The GO analysis showed that these 309 genes were enriched in four major functional categories: localization, signaling, developmental process and biological adhesion (P < 0.05). In particular, genes involved in nervous-system development were significantly over-represented. Moreover, 62.5% (193/309) of the candidate genes had mammalian phenotypes, according to Mouse Genome Informatics (MGI), of which 24.4% (47/193) could contribute to murine embryonic lethality when mutated or at abnormal levels.
DISCUSSION
In this study, we applied CMA and QF-PCR/CNV-seq to investigate the incidence and distribution of chromosomal abnormalities in a large cohort of miscarriage cases. Based on these results, we identified miscarriage-associated submicroscopic CNVs and critical regions of miscarriage-associated large CNVs. We also identified potential miscarriage candidate genes from recurrent miscarriage-associated CNVs through gene prioritization analysis. Overall, the frequency of pathogenic chromosomal abnormalities was 59.1% (2955/5003), the majority (97.4%) of which were > 10 Mb and should theoretically be detectable by traditional cytogenetic analysis. The detection rate of VOUS in our study was 1.7% (85/5003). Both the frequency of pathogenic chromosomal abnormalities and that of VOUS were consistent with findings of previous studies6, 7.
Although the role of submicroscopic CNVs has been well studied in patients with structural anomalies and neurodevelopmental disorders3-5, it is still unclear whether these abnormalities contribute to miscarriage. In this study, submicroscopic pathogenic CNVs were observed at a frequency of 1.0% (50/5003), which is slightly higher than frequencies reported previously (0.6% by Levy et al.6 and 0.8% by Sahoo et al.7). We identified three recurrent submicroscopic CNVs considered to be associated with miscarriage (microdeletions at 22q11.21, 2q37.3 and 9p24.3p24.2) that were significantly more common in cases of miscarriage than in the control cohort of Asian populations20.
The most common miscarriage-associated submicroscopic CNV was the 22q11.21 microdeletion, which has been observed frequently in patients with congenital heart disease, neurodevelopmental disorder or other structural abnormality30. In our study, the prevalence of this microdeletion was 0.28% (6/2124) in cases with normal results at the karyotypic level, which is markedly higher than that in the general population (0.013% of live births)31. Recently, Pauta et al.18 conducted a meta-analysis of the incremental yield of CMA over karyotyping in miscarriage, which revealed that the incremental yield of the 22q11.21 microdeletion was 0.22% (4/1813) in miscarriage cases. This microdeletion has also been identified in cases of stillbirth and neonatal death32, 33. Accordingly, we suggest that 22q11.21 microdeletion is a miscarriage-causing locus, and that early embryonic death could be caused by major heart malformations resulting from the 22q11.21 deletion.
Another miscarriage-associated submicroscopic CNV identified in this study was 2q37.3 microdeletion, which was found in three miscarriage cases but in no controls. This microdeletion is associated with a rare congenital syndrome known as brachydactyly-mental retardation syndrome, characterized by neurodevelopmental disorder, obesity and skeletal and craniofacial abnormality34, 35. Additionally, it has been reported that 20% of cases have congenital heart defects that may lead to miscarriage35. However, this CNV has not been reported previously in miscarriage cases, probably because of the low incidence of this microdeletion and the limited number of miscarriage cases evaluated by high-resolution technology. Our finding of isolated 2q37.3 microdeletion in three cases suggests that this genomic imbalance could be associated with miscarriage.
Microdeletions in 9p24.3p24.2 were also identified as significant CNVs related to miscarriage, occurring in three miscarriage cases but in no controls. This microdeletion has been associated previously with intellectual disability and craniofacial and genital abnormalities but not with miscarriage36. In addition to microdeletions in 9p24.3p24.2, we also found large CNVs encompassing this interval in four cases of miscarriage, one of them being isolated CNV. Therefore, we suggest that 9p24.3p24.2 microdeletion may contribute to miscarriage.
In addition to these three microdeletions, we identified genomic imbalances in 7q11.23 (two deletions and two duplications) in four miscarriage cases. Microdeletions in this region are associated with Williams syndrome37, a developmental disorder with an estimated prevalence of 1 in 10 000. Microduplications resulting in 7q11.23 duplication syndrome have been reported to be present in one in 7500–20 000 births38. The clinical manifestations of both syndromes are variable, with shared phenotypes, including neurobehavioral disorders, cardiovascular abnormality and facial dysmorphism37, 38. Although neither the deletion nor the duplication alone reached statistical significance (2/2124 vs 0/6533, P = 0.06 for both), this locus had a significant burden of CNVs in miscarriage cases when the two were combined (4/2124 vs 0/6533, P = 0.004). In a previous study, 7q11.23 microdeletion was observed at a frequency of 0.05% (1/1861) in miscarriage cases6. Our findings provide additional evidence that CNVs at the 7q11.23 locus may be associated with an increased risk of miscarriage.
Large CNVs are known to be causative of miscarriage. Interestingly, large CNVs were identified at a prevalence of 3.8% (188/5003) in our study, which is higher than reported previously (2.0% by Levy et al.6 and 1.7% by Sahoo et al.7). Furthermore, we found that large CNVs were markedly more common at certain chromosomal regions. In total, 44 critical regions associated with miscarriage were identified from large CNVs, including 14 deletions and 30 duplications. Among these, chromosome 8 showed the highest incidence of both large deletions and large duplications, which is consistent with the findings of a previous study6. One possible explanation is that maternal 8p inversion, which is delimited by the olfactory receptor (OR) gene clusters, is the most common genomic polymorphism on autosomal chromosomes and may confer susceptibility to unequal crossovers between two OR gene clusters39-41. This can lead to the formation of recurrent chromosome rearrangements, including inverted duplication with terminal deletion of 8p, 8p deletion, isochromosome 8q, a supernumerary marker chromosome and interstitial 8p23.3 duplication39-41. These critical regions identified in miscarriage cases provide a unique source for prioritizing miscarriage candidate genes.
In general, large chromosomal abnormalities which contain numerous genes are expected to be lethal and can lead to miscarriage, whereas small imbalances could be viable. Nevertheless, recent studies have shown that dosage changes or mutations of genes that play an important role in early embryonic development could also contribute to miscarriage42. Identification of specific genes which lead to miscarriage could provide information for etiologic analysis. However, identifying the candidate genes for miscarriage is challenging, because most of the CNVs are large. By using gene prioritization analysis, we identified 309 miscarriage candidate genes from critical regions of 44 large CNVs and three statistically significant submicroscopic CNVs. The genes were significantly enriched in the developmental process, especially nervous-system development. Furthermore, 62.5% (193/309) of the candidate genes had mammalian phenotypes in MGI and 24.4% (47/193) of these can lead to murine embryonic lethality when mutated or at abnormal levels. Among these 309 miscarriage candidate genes, six (MFSD10, VLDLR, STT3A, BRPF1, SLC4A2 and SLITRK6) were reported previously to be candidate genes for human early development26, suggesting that these genes may be associated with the pathogenesis of miscarriage.
Our study has some limitations. The sample size is not sufficiently large to identify all miscarriage-associated CNVs. Therefore, further large cohort studies using high-resolution methods are required to validate the CNVs identified in this study and to identify other novel CNVs associated with miscarriage. In addition, no gene functional study was performed. Further functional studies will be necessary to demonstrate the role of these candidate genes in the pathogenesis of miscarriage.
In conclusion, the results of this study suggest that CNVs make a significant contribution to the pathogenesis of miscarriage. Furthermore, our study highlights the importance of ongoing analysis of CNVs in the study of miscarriage.
ACKNOWLEDGMENTS
The study was supported by the National Natural Science Foundation of China (81770236, 81801445, 81602300, 81701427 and 81801373), the Project on Maternity and Child Health of Jiangsu Province (F201818), the National Key R&D Program of China (2018YFC1002402) and the Natural Science Foundation of Jiangsu Province (BK20160139 and BK20181121).