Assessing the effect of childbearing on blood DNA methylation through comparison of parous and nulliparous females

Background Pregnancy and childbirth have been connected to modified risk of a wide variety of conditions in later life, including neurodegenerative disorders and cancers. The presence, extent, and direction of the effect that childbearing status has on decreasing or increasing the risk of these conditions differs depending on the disease. The mechanisms by which pregnancy and childbirth modify the risk of diseases are still unknown. DNA methylation (DNAm) alterations that occur during pregnancy and persist after childbirth may help us understand this phenomenon. Results Blood DNAm was available from 89 women (28 parous; 61 nulliparous) at ages 18 and 26 years in the Isle of Wight birth cohort; no significant differences in the population characteristics were present between the analyzed population and the full cohort. We performed an epigenome-wide association study on 389,355 CpGs and identified 184 CpGs to be significantly differentially methylated between parous and nulliparous women after adjusting for confounders and multiple testing. Of these CpGs, 105 had regression coefficients in the same direction in an independent Mexico City based ELEMENT cohort, of which 13 were significant (replication P < 0.05). These 13 CpGs were associated with 16 unique genes. DNAm levels tracked with gene expression in 3 of the replicated genes, one of which ( TM2D3 ) was differentially expressed in parous vs nulliparous women. Gene disease association analysis identified a network of parous-associated diseases. Conclusions Our results suggest that pregnancy and childbirth lead to DNAm changes in parous women and these changes persist at least 6 months and up to 8 years postpartum. Parous-related CpG sites may play a role in how childbearing status modifies risk of later life diseases in women. Further studies are needed to explore the linkage and mechanism.


Background
Women's reproductive history may be linked to future health outcomes, as bearing children provides protection from some conditions and increases the risk of others.For example, childbearing can decrease the risk of subsequent hormone receptor positive breast, ovarian, and endometrial cancers in parous women (those who have given birth) as compared to nulliparous women (those who have not given birth) [11,24,28,38,39,49].In contrast, even when pregnancy is healthy with no gestational complications, parous women have been shown to have increased risk of subsequent Alzheimer's disease, stroke, and myocardial infarction as compared to their nulliparous counterparts, with parity (the number of pregnancies) also influencing cardiovascular disease risk [15,59].
Pregnancy has also been linked to changes in DNA methylation (DNAm) -the addition or removal of methyl groups to cytosine-phosphate-guanine (CpG) sites on DNA, frequently in promoter regions -which regulate gene expression.DNAm changes throughout pregnancy are considered as an important part of gene regulation and normal cellular control mechanisms [14,16].Fradin et al. investigated DNAm changes in maternal blood of a longitudinal cohort of pregnant women from early pregnancy (up to 18 weeks) to late pregnancy (35 weeks and later) and found a gain of methylation in genes involved in morphogenesis, such as ezrin, and a loss of methylation in genes promoting maternal-infant bonding [22].Gruzieva et al. identified 196 CpGs displaying significant longitudinal DNAm change from pre-pregnancy (0.86 to 3.35 years before conception), through early (weeks 10 to 14) and late (weeks 26 to 28) pregnancy, to post-pregnancy (2 to 4 days after delivery); 17 of these CpGs were significant in an independent replication cohort.Many of these CpGs are involved in metabolism-related pathways, with some specific to physiological changes in pregnancy, such as mammary gland development [25].Other studies have focused on the association of adverse pregnancy conditions, such as gestational diabetes [18,21] and preeclampsia [3,26,53] on DNAm.Finally, patterns of DNAm from blood samples have been associated with various diseases including breast cancer [7,9,27,41,57,58] and degenerative neuropathies [1,56].Furthermore, neurodegenerative diseases are increasingly being recognized as related to reproductive history [6,17,35,52].
Thus, the elements of the evidence connecting childbearing status, DNAm, and subsequent health outcomes for women may be summarized as: 1) bearing children alters the risk for certain diseases later in life, 2) childbearing changes DNAm, and 3) patterns of DNAm are associated with certain diseases.These connections are predicated on the idea that pregnancy leaves a "footprint" such that pregnancy-associated DNAm changes are maintained post pregnancy and subsequently impact health outcomes.Indeed, a growing number of studies have examined if the DNAm changes that occur during pregnancy remain from 2-4 days [25], to an average at 10 months [46], to a median at 16.7 years [8] after childbirth.However, knowledge gaps remain as Campagna et al. did not examine DNAm before or during pregnancy.Furthermore, Lin et al. and Gruzieva et al. made contradicting conclusions with Lin et al. reporting that DNAm after delivery was overall higher than that during pregnancy in a pilot study of 10 women, while Gruzieva et al. showed a decrease in average methylation levels in 91% of identified CpGs over the studied period (before, during and 2-4 days after pregnancy).
There are two major limitations to the current knowledge of childbearing related DNAm changes.First, studies of parous women have not examined nulliparous controls.Second, only a single pilot study [46] has examined if DNAm changes that occur during pregnancy remain beyond the immediate post-partum period; it is important to provide further evidence as to whether these changes are stable and maintained over time or soon return to their prepartum status.This study aimed to fill these two gaps by identifying differential DNAm CpGs between parous and nulliparous women and by evaluating DNAm from pre-pregnancy samples and samples collected at least 6 months after delivery.Specifically, we used data and samples from the Isle of Wight (IOW) birth cohort to conduct an epigenome-wide association study to find differential methylation at two time points in young adulthood between parous and nulliparous women.Many DNA regions experience changes in methylation levels during periods of physiologic changes, such as puberty and pregnancy; in order to focus only on DNAm changes attributable to pregnancy, the methylome comparisons were made between parous and nulliparous women at ages 18 and 26 years, timepoints that were pre-and post-pregnancy for the parous subjects.We assessed replication in the Early Life Exposure in Mexico to ENvironmental Toxicants (ELEMENT) cohort.Further validation of our results was pursued by assessing gene expression, GOmeth gene ontology, KEGG pathway gene ontology, and differentially methylated regions in the IOW cohort.

Participant characteristics
Characteristics of the IOW study population are provided in Table 1 (left panel).Of the 750 female study subjects, 144 had DNAm samples at both ages 18 and 26 years.Among these 144 women, 89 had information on their childbearing history ("yes" vs "no") validated from their medical records.Figure 1 depicts how analyzed participants were selected from the parent cohorts.Characteristics, including covariates adjusted in the downstream analyses, of the 89 analyzed subjects are also provided and compared with the population in Table 1 to check whether the analytical sample represents the cohort population.There were no significant differences (p-value > 0.05) in the characteristics of the cohort population and analyzed subset (Table 1; left panel).Of those 89 subjects included in analyses, 28 (31.46%) had given birth at least 6 months before age 26 (and after age 18) and 61 (68.54%) had not had a child prior to 26 years.The same characteristics were also compared between the 28 parous and 61 nulliparous women in Table 2 (left panel).There were significant differences in socioeconomic status (SES) between parous and nulliparous IOW women (p-value = 0.0125, Table 2; left panel).Women with higher income/housing and lower education tended to be nulliparous up to age 26.Smoking status was marginally significantly different (p-value = 0.0634) between parous and nulliparous IOW women, with a higher proportion of nulliparous women being non-smokers (including no environmental tobacco smoke exposure inside the home) at age 18 compared to parous women.Similar characteristics of the ELEMENT female study population (N = 275) and the DNAm subsample (n = 54) are detailed in Table 1 (right panel).Participants from ELEMENT were on average younger than IOW participants and their ages of data collection at T1 and T2 had a small range.Due to their younger age, a smaller proportion (n = 23, ~ 8% of 275) of ELEMENT participants had given birth at least 6 months before the second DNAm measurement.The nulliparous comparisons (n = 31) were selected to match parous women with ages as close as possible, which explains that (a) the mean age of analyzed samples at both T1 and T2 was significantly older than the study population at both T1 and T2 (p-values < 0.0001) and (b) the proportion of active and passive smokers was significantly higher in analyzed samples (p-value = 0.0001).As shown in Table 2, the mean ages of parous and nulliparous women in ELEMENT at T1 (pre-pregnancy) were 16.3 and 16.22 yrs old, respectively, and did not differ significantly from each other.The mean age of nulliparous women was significantly younger than parous women at T2 (postpartum) even though we tried to match ages between parous women and nulliparous controls (mean age of nulliparous women: 18.6; mean age of parous women: 20; p-value < 0.0001).Similar to the IOW study, SES was also significantly different between parous and nulliparous women in the analyzed ELEMENT cohort (p = 0.017).Women with higher SES at T1 tended to be nulliparous up to T2.Smoking status was not significantly different between parous and nulliparous women in ELEMENT (p-value = 0.5638).In both, the IOW and ELEMENT cohorts, body mass index (BMI) changes were not significantly different between parous and nulliparous women.

ttScreening and linear regression analyses
To  of BMI from age 18 to 26 yrs, smoking categories (treated as an ordinal variable), and socioeconomic status (treated as a categorical variable) at age 18 were adjusted for in the linear regression model as covariates.184 CpGs were significantly associated with childbearing (FDR-adjusted p-value < 0.05 from the linear regression).The regression coefficients and p-values of the 184 CpGs are provided in the Appendix (ATable 1).

Replication analyses
The 184 significant CpGs identified in the IOW study were extracted from the ELEMENT DNAm dataset (n = 54) and their association with childbearing was estimated using linear regression models, adjusting for equivalent or comparable covariates as in discovery analyses in IOW.Age at T2 was included in the replication analysis since women in ELEMENT subsample demonstrated variability in age.105 of 184 CpGs had regression coefficients in the same direction as the IOW study, of which 13 were significantly replicated in the ELEMENT study (raw p-value < 0.05).Table 3 provides the regression coefficients, standard errors (SE) and p-values of these 13 CpGs for the IOW and ELEMENT samples, and their 16 associated unique genes.Boxplots in Fig. 2 demonstrate the mean methylation levels (in beta values) of these 13 replicated CpGs, 7 of which had negative regression coefficients and 6 of which had positive regression coefficients in parous compared with nulliparous women in both cohorts.

Gene enrichment analysis and gene-disease association
We conducted gene set enrichment analysis on the 16 genes (associated with 13 replicated CpGs) using GOmeth method, which performs unbiased gene set testing following differential methylation analysis.No KEGG biological pathways survived after FDR adjustment.We then searched these 16 genes in the disease gene database DisGeNET v7.0 (10 of the 16 genes were present in the database) and identified a number of diseases associated with parous-related genes with gene disease association score greater than 0.3 (AFig.1).The 10 genes found in the DisGeNET v7.0 include SEMA3A, AKAP13, SLC15A2, DOK2, ADAMTS17, ADARB2, NUP37, PLD5, TPK1, and SLC15A3, which were most notably linked with neoplasms, mental disorders and nervous system diseases (e.g.Parkinson's disease, schizophrenia, autism), and substance use (e.g.prescription drug abuse, substance use disorder).

RNAseq analysis
Of the 16 genes associated with the 13 identified and replicated parous-related CpGs, 13 genes were present in the gene expression data generated from blood samples at age 26 in the IOW study.We checked the Pearson correlations between the DNAm (M-value) of each of the 13 CpGs and the log-transformed gene expression level of their corresponding genes (n = 165) and found the methylation levels of cg19086905 (ADAMTS17; r = -0.14;p-value = 0.064) and cg13375690 (PDE7A; r = -0.21;p-value = 0.006) were negatively associated with gene expression levels, while the methylation level of cg01537571 (TM2D3; r = 0.16; p-value = 0.037) was positively associated with the gene expression level of TM2D3.Considering the non-normality of DNAm and gene expression levels, we also checked Spearman correlations, which provide similar results.We further tested the differential gene expression at age 26 between parous and nulliparous women (n = 75; 25 parous vs 50 nulliparous women) for the 16 genes associated with dmCpGs.Only one gene, TM2D3, had significantly lower gene expression levels in parous women compared to the nulliparous counterpart (regression coefficient = -0.17;p-value = 0.0085).

Differentially Methylated Regions (DMRs)
Using the dmrff R package, we identified 1 DMR that was marginally significantly associated with childbearing (FDR adjusted p-value = 0.1024; Table 4).The DMR contained 1 gene and 12 unique CpG sites as listed in Table 4. None of these 12 CpG sites overlapped with the individual CpG sites identified in the EWAS study using ttScreening and linear regression.However, 5 out of the 12 CpGs were significantly differentially methylated between parous and nulliparous women with raw p-values < 0.05 using the IOW dataset (Table 4; column 8).None of these 5 CpGs were significantly replicated in ELEMENT.

Discussion
Our aim was to identify parous-related DNAm changes that are sustained for at least 6 months after childbirth, as a step toward understanding how childbearing impacts disease risk later in life.To identify CpGs that potentially underlie this relationship, we performed an epigenomewide association study in the IOW birth cohort to test the hypothesis that parous women have significantly different DNAm as compared to nulliparous women and that these changes persist at least 6 months after giving birth.We identified 184 CpGs that were significantly differentially methylated between parous and nulliparous women.We pursued replication of our findings in a subsample of the ELEMENT cohort.Out of the 184 CpGs identified in the IOW, 105 of these CpGs had regression coefficients in the same direction in the ELEMENT samples, of which 13 were significantly associated in the ELEMENT participants using linear regression while adjusting for the same covariates.Among these 13 parous-related CpGs, 7 had significantly lower levels of methylation in parous versus nulliparous women in both IOW and ELEMENT cohorts.The remaining 6 CpGs had higher levels of methylation in parous women compared to their nulliparous counterparts.
The gene ontology analysis of the 16 genes associated with 13 parous-related CpGs using GOmeth did not identify any significant KEGG biological pathways.We further searched the 16 parous-related genes in the disease gene database DisGeNET v7.0 and identified a genedisease network of genes that were mainly linked with neoplasms, mental disorders and nervous system diseases (e.g.Parkinson disease, schizophrenia, autism), and substance use (e.g.prescription drug abuse, substance use disorder).
The gene expression data of 13 (out of 16) genes were available from whole blood samples (age 26) in the IOW cohort and were used to further test (i) the association between DNAm of the 13 parous-related CpGs and gene expression of the 13 genes and (ii) the differential gene expression between parous and nulliparous women at age 26.Lower methylation of cg19086905 (ADAMTS17) and cg13375690 (PDE7A; Body) were linked with higher gene expression levels.Only one gene, TM2 domain containing 3 (TM2D3), had significantly lower gene expression level in parous women compared to the nulliparous counterpart (regression coefficient = -0.1104;p-value = 0.0028).Similarly, the methylation level of cg01537571, which is located within a CpG island and near the transcription start site of TM2D3, was significantly lower in parous women than their nulliparous counterparts in both IOW and ELEMENT cohorts.Our association analysis indicated that subjects with higher methylation level of cg01537571 have higher gene expression level at age 26 in IOW study.Genetic variants in TM2D3 have been reported to be significantly associated with increased risk of developing late-onset Alzheimer's disease through an exome-wide association analysis [34].
increased parity has been shown to be a risk factor for Alzheimer's disease [15], which may be related to the lower methylation of cg01537571 in parous women.
We searched the genes associated with the 13 identified and replicated CpGs through GTEx Portal (https:// gtexp ortal.org/ home/) and found that less than half of them (TM2D3, DOK2, PDE7A, AKAP13, SLC15A3, TPK1) are routinely expressed in whole blood samples, with the remainder are primarily expressed in other tissues.With regard to our gene enrichment analysis findings, TM2D3, SLC15A2, PLD5, ADARB2 and PARPBP are expressed in the brain.MAP1LC3C, NUP37 and ADAMTS17 are expressed in breast tissue.Further studies are needed to investigate how these parous-related genes are differentially expressed across a variety of tissues between parous and nulliparous women.
Among the genes identified in these analyses, SEMA3A (semaphorin 3A) is of interest based on its potential role in neoplasias and other diseases that may have a modified risk in parous women.SEMA3A had decreased DNAm in parous women (Fig. 2, Table 3) and was associated with neoplasms as well as mental health and reproductive disorders in the DisGeNET gene disease-class heatmap (AFig 1).Semaphorins are guidance cue signaling molecules that direct developing neurons and other tissues during embryogenesis.They also persist in adulthood, playing a role in the regulation of neuroplasticity [13].Changes in SEMA3A gene expression levels have been noted in several nervous systems diseases, including schizophrenia (increased in cerebellar Purkinje cells) [23], which was a disorder associated with SEMA3A in our gene-disease network.With regard to neoplasia, SEMA3A plays paradoxical roles in the tumor microenvironment in different types of tumors [10,36,55].For example, SEMA3A promotes tumor progression in hepatocellular carcinoma by means of enhanced proliferation, migration, and invasion and was most commonly upregulated in patients with recurrent hepatocellular carcinoma [32].In contrast, SEMA3A serves as a tumor suppressor in head, neck, and breast cancers [20,31].Specifically in breast cancer, SEMA3A increases antitumoral M1 macrophage proliferation leading to recruitment and activation of CD8 + T cells and NK cells, which repress tumor growth [20].While more studies are needed to explore the link and mechanism, it may be that parous women may have a lower risk of developing breast cancers through lower DNAm near the transcription start site of SEMA3A.Studies investigating whether blood DNAm levels change similarly in blood (assessed here) and breast tissue (target tissue of interest) following childbearing are also needed to understand the implications of this finding.
The key strength of this study is its design that compares DNAm profiles of parous women at two timepoints, pre-and post-pregnancy, to those of nulliparous women.Studies that focus on the health effects of pregnancy and parity commonly include data points pre-and post-pregnancy, but less commonly do they have comparable data on nulliparous women.Being able to directly compare methylation at CpGs epigenome-wide between parous and nulliparous women in both the discovery and replication cohorts is a clear asset to advancing our understanding of DNAm as it may relate to long-term health outcomes for women.Furthermore, our study investigated the methylation changes from pre-pregnancy to months or years after childbirth and identified CpGs at which methylation changes persisted.These changes, in turn could be related to later-life diseases if our assertions hold up in future studies.Another strength is our comprehensive approach to validation with DNAm followed by measurement of gene expression, assessment of DMRs, and gene enrichment analyses in the discovery cohort.The mapping of diseases onto the genes associated with the identified CpGs helps to put our findings into context and provides direction in terms of genes of interest.Additionally, replication of our findings generated from the IOW cohort (98% White in the United Kingdom) in a subsample from an independent cohort from another geographic location (Mexico) and of a different ethnic background (all Hispanic) supports both the internal validity and the generalizability of the original findings.Finally, our study is disease-independent, so as not to limit our outcomes to a predetermined set of conditions.While we were a priori interested in any potential relationships between DNAm and reproductive cancers, we did not anticipate that substance abuse disorders, for example, would be among the outcomes related to the associated genes and thus our results open new avenues to explore.
Our study design has some notable strengths relative to other studies examining the impact of reproductive history on DNAm.Specifically, we elected to use at least a 6-month window following parturition before post-pregnancy samples were collected.In contrast, Gruzieva et al. [25] collected samples 2-4 days after delivery, which may be too short an interval to identify DNAm changes that persist.This time in the follow up interval for sample collection may explain the contrasting results in that none of the CpGs identified in this study were among the CpGs identified by Gruzieva et.al. [25].
There are several limitations to note in our study.First, the nulliparous status of women in the discovery cohort was obtained from the medical record.While unlikely, if a woman received prenatal care and/or gave birth off the Isle of Wight it may not have been recorded in the medical records to which we had access.Second, we excluded women who were within 6 months of childbirth from our study to avoid any transient methylation changes due to childbirth.However, it is unknown how long it takes methylation to stabilize or to diminish postpartum.Third, the sample size of parous women is relatively small (28 parous women in the IOW study and 20 parous women in the ELEMENT study), which would potentially reduce the capability of detecting CpGs that are truly associated with childbearing status.Fourth, the discovery (IOW) and replication (ELEMENT) cohorts are different in terms of population distribution in race/ethnicity, age, and geography, which could limit the power of discovering replicable parous-related epigenetic biomarkers.Fifth, factors/exposures that potentially change during pregnancy may contribute to DNAm changes persisting after childbirth.However, we are unable to assess these effects since these factors/exposures were not measured in nulliparous women at similar times as parous women.Finally, as common practice in epidemiological studies, blood DNA was used a proxy for changes in other tissues, and it is known that many genes have different methylation and expression profiles that vary by tissue (i.e.breast, brain).Similarly, the gene expression data is limited to whole blood samples in IOW, and as noted above, approximately half of the identified genes are not expressed in this tissue.

Conclusion
Our results suggest that epigenetic changes may be associated with childbearing status in women.If these findings in methylation of blood leukocyte DNA reflect epigenetic changes in other relevant tissues, it may provide a possible mechanism of how childbearing is linked to various future health outcomes.Diseases associated with the identified genes involved cancers, the nervous system, and substance uses.Additional studies are needed in cohorts with follow up in later life to investigate whether and how parous-related DNAm changes are associated with later-life health outcomes in women.

Discovery cohort -IOW
The Isle of Wight (IOW) is a 3-generation birth cohort established in the United Kingdom to prospectively study the natural history of asthma and allergic conditions [4].Study participants were enrolled in the birth cohort study between 1989 and 1990 by contacting potential parents (first generation, IOW-F0) while they were carrying the second generation (IOW-F1).The IOW-F1 generation has been followed up for 26 years.IOW-F1 female participants of the birth cohort have been followed through their pregnancy occurring between the years 2011 and 2015.When F1-women became pregnant, information was gathered about their lifestyle and health during pregnancy.The racial identity of most participants (98%) in the F1 generation was white.

Replication cohort -ELEMENT
ELEMENT (Early Life Exposures in Mexico to ENvironmental Toxicants), is a multi-generation birth cohort of Hispanic women from Mexico City maternity hospitals that has been following participants for nearly three decades with a focus on studying how dietary and environmental toxicant exposures affect women and children.ELEMENT originally consisted of three sequential birth cohorts and multiple early childhood follow-up visits, as previously described [48].A subset of the original participants (550 mother-child pairs) from the second and third cohorts attended two follow-up visits, approximately two years apart, during adolescence adolescent years.In 2019, ELEMENT launched a third generation study ('E3Gen'), following up ELEMENT participants that had become parents.Extensive questionnaire (demographic, dietary, and more) and anthropometry data along with blood and urine samples were collected from mothers and their offspring at each study visit.

Childbearing status
For IOW, participants' childbearing status as either parous ("yes" indicating giving at least one live birth) or nulliparous ("no" indicating no reported pregnancy or childbirth) was assessed up to age 26 by their participation in the 3rd generation IOW study (inclusion requires having a child) as well as from the medical records.Parous women who were pregnant or within 6 months postpartum at the time of DNAm measurement at age 26 were excluded from this study.Women with documented early pregnancy termination, infertility treatment, and late miscarriage were excluded from the analyses.
For the ELEMENT replication subsample, participants who had children ages 6 months to 5 years of age in 2019 were invited to attend the E3Gen visit with their child or children.Twenty-three eligible female participants with one or two children each participated; this is referred to as T2 for the parous group.All had epigenetic analyses, e.g., DNAm of blood leukocytes, completed prior to pregnancy at the adolescent T1 visit.For the nulliparous group, women were selected that had a) T1 epigenetic analysis completed, b) attended a second adolescent visit > 2 years later without a known pregnancy in between (their T2 timepoint), and c) did not have missing data for necessary covariates.44 women met these criteria; from among them 31 were selected to match the age range at T1 of the parous group cohort of origin as closely as possible.

DNA methylation measurements and cell estimation
In both cohorts, peripheral whole blood samples were collected twice.In the IOW cohort study, the first sample (T1) was collected at age 18 years, prior to any pregnancy.The second sample (T2) was collected at age 26 years, which was at least 6 months after childbirth for the parous subjects.DNA was extracted from blood samples using a standard salting procedure [47] or commercial kits (Qiagen, Germantown, MD, USA).DNA concentration was determined using fluorometry (Qubit, Invitrogen ™ , Thermo Fisher Scientific,Waltham, MA, USA).To measure epigenome-wide methylation levels, age 18 samples were processed on either Illumina Infinium HumanMethylation450 arrays or Infinium Methylatio-nEPIC BeadChips (Illumina, Inc., San Diego, CA, USA).All the age 26 samples were processed on MethylationE-PIC BeadChips.Multiple identical control samples were assigned to each bisulfite conversion batch for assessment of assay variability.DNAm data were preprocessed (including background level correction and quantile normalization) using the CPACOR pipeline for data from both platforms (i.e.HumanMethylation450 and Methyla-tionEPIC) [43,44].Specifically, an R package, minfi was used to quantile-normalize the DNAm intensity data [5,44].Quantile-normalized intensities were then used to calculate beta (β) values, which represent proportions of intensity of methylated (M) over the sum of methylated and unmethylated (U) sites/probes (β where c is a constant to prevent zero in the denominator if M + U is too small).Beta values close to 0 or 1 tend to suffer from severe heteroscedasticity, and base-2 logit transformed beta values (denoted as M-values) have been demonstrated to perform better in differential analysis of methylation levels [19].Therefore, methylation levels in the analysis are represented using M-values.As quality control, probes that did not reach a p-value of 10 -16 in at least 95% of samples were excluded.The same criterion was applied to exclude samples, i.e., samples with p-value > 10 -16 in at least 95% of the CpGs [44].CpGs on sex chromosome were excluded.Use of both Human-Methylation450 (containing > 450 K CpGs) and Meth-ylationEPIC (containing > 850 K CpGs) platforms is a consequence of technology evolution (HumanMethyla-tion450 arrays are no longer available).The HumanMeth-ylation450 and MethylationEPIC arrays were initially processed (quality control, quantile normalization of background-corrected intensities) separately for beta estimation and then analysis of 389,355 common CpG sites between 450 K and EPIC was undertaken.Methylation data were quantile normalized using the minfi R package and batch corrected using ComBat in R [37].
In the ELEMENT cohort study, peripheral blood samples were collected via venipuncture and stored in EDTA-preservative after all study visits.For both parous and nulliparous women, the T1 sample for epigenetic analyses came from the early adolescent study visit; participants were ages 11 to 18 years (mean age: 16.3).The T2 sample came from the second adolescent visit for nulliparous subjects and from the E3Gen post-birth visit for parous subjects (age range 16 to 21 years, mean age: 19.23).DNA was isolated from blood leukocytes via the Flexigene kit (Qiagen) and DNAm was quantified via the MethylationEPIC platform.DNAm data were preprocessed using the same pipeline for data as the IOW study.
Since DNAm was analyzed in whole blood, there was a need to assess changes of DNAm while adjusting for changes in cell composition of whole blood.Proportions of the cell types CD4 + T-cells, CD8 + T-cells, natural killer cells, B-cells, neutrophils, monocytes, and eosinophils were estimated using the approach by [29,33] through estimateCellCounts function from the R package minfi [5].Prior studies suggested that DNA methylation analyzed in whole blood may be substantially confounded by cell composition effects [30,33,42].Estimated cell compositions were thus included in the analyses as confounders.

Confounding variables
In both discovery (IOW) and replication (ELEMENT) cohorts, confounding variables included BMI, smoking status (including active and second-hand/passive smoking), and socioeconomic status (SES) at both time points when DNAm measurements were collected (denoted as T1 and T2).Active smoking was recorded as "yes" for a current smoker.Second-hand smoking was recorded as "yes" if anyone in the household smokes inside the home.Considering that active and passive smoking and SES may be affected by reproductive status (for example, mothers or family may quit smoking during and after pregnancy; household income may reduce if mothers do not work outside the home beyond a typical maternity leave), smoking status and SES at T2 may serve as mediators between childbearing status and DNAm at T2, and thus were not included as confounding variables.K-mean clustering analysis was applied on the active and second-hand smoking at T1 and grouped female participants into 4 smoking categories (the number of clusters K = 4 is chosen due to the largest CCC value) interpreted as "non-active and non-passive smokers at T1, " "passive and non-active smokers at T1, " "active and non-passive smokers at T1, " and "active and passive smokers at T1, " which were coded as 1, 2, 3, and 4 respectively and were treated as an ordinal variable in the regression analysis.The K-mean clustering also imputed the missing values in smoking status at T1.
In the IOW study, SES categories were generated by K-mean clustering using education, housing information and income, which were extracted from the questionnaire.Education was recorded as "School, " "6th Form College, " "Further Education, " and "Other." Housing information was recorded in ordinal data as "Rented Council/Housing Assoc., " "Rented Private, " "Lives with Parents, " "Owned Private, " and "Other." Income was also recorded in the ordinal groups as "Less than £12,000, " "£12,000 to £17,999, " "£18,000 to £29,999, " "£30,000 to £41,999, " "Greater than £42,000, " and "Prefer not to say." In the cluster analysis, "Other" in education and housing information, and "Prefer not to say" in the income were treated as missing values.In the replication study with ELEMENT cohort data, socio-economic status is measured through a scale created by the Mexican Association of Marketing Research (AMAI) that classifies a household's socioeconomic status based on 6 variables: level of education of head of household, number of bedrooms, number of bathrooms, employed people over the age of 14, number of vehicles, and access to internet [12].The standardized criteria for socio-economic levels in Mexico "AMAI 13 × 6" was utilized for analyses."AMAI 13 × 6" rule classified the households in Mexico into 6 levels: A (highest), B, C, D, E, and F (lowest).
The change of BMI from T1 to T2, smoking categories at T1, and socioeconomic status at T1 were included in the model as covariates.Age of the participants at T2 was also included as a covariate in the replication analysis with the ELEMENT data.

Gene expression measurement
RNAseq was used to assess gene expression in blood samples collected at age 26 years.Total RNA was extracted using PAXgene Blood RNA kits (Qiagen, Germantown, PA, USA) and brought up in PAXgene Blood miRNA PreAnalytic elution buffer.RNA was quantified by Qubit dsDNA High Sensitivity assays (Invitrogen ™ , Thermo Fisher Scientific, Waltham, MA, USA) and RIN scores were determined on a 2100 Bioanalyzer instrument (Agilent Technologies, Santa Clara, CA, USA).Sequencing libraries were prepared using the Illumina TruSeq Stranded mRNA Library Preparation Kits with IDT for Illumina Unique Dual Index barcode primers following the manufacturer's recommendations (Illumina, Inc., San Diego, CA, USA).Completed libraries were assessed for quantity and quality using a combination of Qubit dsDNA High Sensitivity assays and either an Advanced Analytical Fragment Analyzer High Sensitivity DNA NGS or Agilent 4200 TapeStation High Sensitivity DNA 1000 assays.Libraries were pooled in equimolar amounts for multiplexed sequencing.Pools were quantified using the KAPA Library Quantification qPCR kits (Roche Diagnostics Corp., Indianapolis, IN, USA) or Invitrogen Collibri Quantification qPCR kits.Each pool was loaded onto one lane of either an Illumina HiSeq 4000 flow cell, an Illumina NovaSeq 6000 S4 flow cell, or an Illumina NovaSeq 6000 SP flow cell and sequencing was performed in a 2 × 75 bp or 2 × 150 bp paired end format.Base calling was done by Illumina Real Time Analysis (RTA) (v2.7.7 or v3.4.4) and output of RTA was demultiplexed and converted to FastQ format with Illumina Bcl2fastq (v2.19.1 or v2.20.0).HISAT2 (v2.1.0)aligner was used to map reads against the Human Genome GRch37 version 75 [40] and alignment files were produced in the Sequence Alignment Map (SAM) format.Files were converted into the Binary Alignment Map (BAM) format using SAMtools (v1.3.1)[45].The number of reads mapped to each gene in the same reference genome used for alignment were counted using HTse (v0.00.1) [2].The countToFPKM package (https:// github.com/ AAlhe ndi17 07/ count ToF-PKM) was used to calculate normalized read count FPKM (Fragments Per Kilobase of transcript per Million mapped reads) and their log transformed values were used for data analysis.

Statistical analysis
The study aimed to examine whether DNAm changes are associated with childbearing status between ages 18 and 26 years old.A set of analyses was carried out to achieve this goal.First, ttScreening approach was applied to screen for significant CpG sites with differential DNAm (M-values) at age 26 that are potentially associated with childbearing status up to age 26 while adjusting for composition of cell types CD8 + T cells, CD4 + T cells, natural killer cells, B cells, monocytes, and granulocytes at age 26.ttScreening is a screening approach that repeatedly splits samples into training and test data and utilizes the training and testing data to filter out uninformative DNAm sites [51].CpGs showing statistical significance in at least 50% of randomly-split "training and testing" datasets were selected as potentially outcome-associated CpGs and were then considered for further downstream analysis.Second, focusing on the potential significant CpGs identified by ttScreening, linear regressions were implemented with DNAm (M-values) at age 26 as the dependent variable, childbearing status as the predictor, and cell-adjusted DNAm at age 18 or T1 (i.e.residuals from linear regressions of DNAm at age 18 on the six cell compositions at age 18), six cell compositions at age 26, BMI changes from age 18 to 26 yrs, smoking categories at age 18, and socioeconomic status at age 18 as covariates.Third, in addition to identification of individual parous-associated CpGs, we performed regional DNAm analyses using DMRff [54] to identify differentially methylated regions (DMRs) associated with childbirth in women.We first conducted EWAS using linear regression with DNAm in M values at age 26 as a dependent variable, childbearing status as an independent variable, and with the following covariates: 1) cell-adjusted DNAm at age 18 (i.e.residuals from linear regressions of DNAm at age 18 on the six cell compositions at age 18), 2) six cell compositions at age 26, 3) BMI changes from age 18 to 26, 4) smoking categories at age 18, and 5) SES at age 18.We then performed regional DNAm analysis using DMRff with the regression coefficients, standard errors, and p-values of childbearing status generated from the EWAS analysis.Fourth, we conducted gene ontology analyses of differentially methylated individual CpGs using the GOmeth function from the R package Miss-Methyl [50].Fifth, we searched the disease gene database, DisGeNET v7.0, which is a platform containing large collections of genes and variants associated with health status, to characterize genes and related molecular pathways of the CpGs of interest.Sixth, differential gene expression analysis was performed on the genes (mapping from the identified CpGs in the Second Step) to further test (i) the association between DNAm of identified CpGs and gene expression of the corresponding genes at age 26 using Pearson's and Spearman's correlation and (ii) the differential gene expression at age 26 between parous and nulliparous women using linear regressions.

Replication analysis
To replicate the findings identified in the discovery cohort (IOW), linear regressions were implemented with DNAm at T2 as the dependent variable, childbearing status as the predictor, and cell-adjusted DNAm at T1 (i.e.residuals from linear regressions of DNAm at T1 on age and cell composition at T1), six cell compositions and age at T2, BMI changes between T1 and T2, smoking categories at T1 and T2, as well as socioeconomic status (SES) at T2 as covariates.

Fig. 1
Fig.1Flow diagram depicting how analyzed participants were selected from the parent cohorts (*: only included are parous women who were at least 6 months postpartum at the time of DNAm measurement at T2; **: nulliparous women were selected to match the age range at T1 of the parous group cohort of origin as closely as possible.Details can be found in "Childbearing status" of Sect.5)

Fig. 2
Fig. 2 Violin and boxplots of DNAm (in beta values representing 0 to 1 proportion methylated) measured at age yrs (IOW) and time T2 (ELEMENT) of 13 statistically significant parous-related CpGs with the same coefficient direction in IOW and ELEMENT cohorts

Table 1
Descriptive characteristics of study populations and analyzed samples (IOW and ELEMENT)

Table 2
Descriptive characteristics of parous and nulliparous women (IOW and ELEMENT) a The two smoking categories with no subjects were excluded from the chi-square test b Smoking clusters were generated from passive and active smoking categories at T1 using K-mean clustering analysis (see "Confounding variables" in Method section for more details)

Table 3
CpG sites significantly associated with childbearing history in discovery and replication analyses and with the same coefficient direction a Nulliparous is reference; negative values indicate a decrease in DNAm in parous women

Table 4
DMRs associated with childbearing status Differentially methylated positions (DMPs) are the individual CpG sites identified in EWAS study using ttScreening and linear regression * raw p-value < 0.05 a