A longitudinal epigenome-wide association study of preeclamptic and normotensive pregnancy

Background While preeclampsia (PE) is a leading cause of pregnancy-related morbidity/mortality, its underlying mechanisms are not fully understood. DNA methylation (DNAm) is a dynamic regulator of gene expression that may offer insight into PE pathophysiology and/or serve as a biomarker (e.g., risk, subtype, a therapeutic response). This study’s purpose was to evaluate for differences in blood-based DNAm across all trimesters between individuals eventually diagnosed with PE (cases) and individuals who remained normotensive throughout pregnancy, did not develop proteinuria, and birthed a normally grown infant (controls). Results In the discovery phase, longitudinal, genome-wide DNAm data were generated across three trimesters of pregnancy in 56 participants (n=28 cases, n=28 controls) individually matched on self-identified race, pre-pregnancy body mass index, smoking, and gestational age at sample collection. An epigenome-wide association study (EWAS) was conducted, using surrogate variable analysis to account for unwanted sources of variation. No CpGs met the genome-wide significance p value threshold of 9×10−8, but 16 CpGs (trimester 1: 5; trimester 2: 1; trimester 3: 10) met the suggestive significance threshold of 1×10−5. DNAm data were also evaluated for differentially methylated regions (DMRs) by PE status. Three DMRs in each trimester were significant after Bonferonni-adjustment. Since only third-trimester samples were available from an independent replication sample (n=64 cases, n=50 controls), the top suggestive hits from trimester 3 (cg16155413 and cg21882990 associated with TRAF3IP2-AS1/TRAF3IP2 genes, which also made up the top DMR) were carried forward for replication. During replication, DNAm data were also generated for validation purposes from discovery phase third trimester samples. While significant associations between DNAm and PE status were observed at both sites in the validation sample, no associations between DNAm and PE status were observed in the independent replication sample. Conclusions The discovery phase findings for cg16155413/cg21882990 (TRAF3IP2-AS1/TRAF3IP2) were validated with a new platform but were not replicated in an independent sample. Given the differences in participant characteristics between the discovery and replication samples, we cannot rule out important signals for these CpGs. Additional research is warranted for cg16155413/cg21882990, as well as top hits in trimesters 1–2 and significant DMRs that were not examined in the replication phase.


Quality Control
Discovery phase sample availability Figure S1. Overview of discovery phase data collection and quality control

Quality control report and issues
We used the R minfi and ENmix packages to examine the density plot of values, mean detection of p-values for each sample, averaged bisulfite conversion intensity plot, and performed data quality control (1)(2)(3)(4). CpG probes are multimodal and have two peaks as shown in the value density plot ( Figure S3).

Figure S3. Beta density plot for raw data
Most samples had low mean detection p-values, with only one sample having mean detection pvalues larger than 0.01 ( Figure S4).

Figure S4. Mean detection p-values for each sample
In examining that sample in more detail, it was identified as "bad" since it did not cluster with other samples based on median intensities ( Figure S5).
In total, 5 samples were identified as "bad" samples in Enmix quality control checks where 4 samples were low quality samples with the percentage of low quality CpGs greater than 0.01, and 1 sample was an outlier in the beta value distribution.

Figure S5. Clustering of samples based on median intensities
Next, DNA methylation levels between technical replicates were checked. In the raw data, discrepancies between methylation levels of some technical replicates indicated potential sources of unwanted variation (Figure S6 and S7).

Figure S6. Raw methylation beta values of technical replicates before normalization
Adjacent samples of the same color are replicates. The black horizontal line is the mean beta value for each sample.

Figure S7. Raw methylation M values of technical replicates before normalization
Adjacent samples of the same color are replicates. The black horizontal line is the mean beta value for each sample.
Next we used the R lumi package to take a more detailed look at the relationship between technical replicates using Multi-Dimensional Scaling (MDS) or hierarchical clustering methods 23 using clustering plots (5). Based on our laboratory design and the inclusion of 28 technical replicates, we expected 28 "true pairs" (i.e., perfect matches). In looking at the raw data, however, the initial clustering prior to normalization did not align well. Out of the 28 duplicate pairs, there were only 15 true pairs identified ( Figure S8). Figure S8. Clustering for all duplicates before normalization A "true" or "correct" pairing occurs when the last four digits of the sample ID match.

Normalization
Next, functional normalization was performed using the funtooNorm package which was designed to not only perform this critical quality control step, but also adjust for multiple cell types and tissues (6). This approach is an extension of the functional normalization method available in the minfi package (1).
The funtooNorm package allows the flexibility to select the ideal number of components using cross-validation, which are computed based on the control probes and cell type data. Principal component regression (PCR) or partial least square regression (PLS) model is fit for the funtooNorm normalization (6). Our DNA methylation data were unique in that they were longitudinal across three trimesters of pregnancy. Thus, we treated trimester as the "cell type" variable to retain variation due to time.
This package has a function which can be used to choose the number of components based on the cross-validation for each methylation channel. Based on our results, setting the number of components equal to 3 was an ideal choice as it produced a significantly lower RMSE compared to setting the number of components equal to 1 or 2 ( Figure S9).

Removal of bad probes
In terms of CpG probes, raw DNA methylation data were returned for 865,859 CpGs before quality control procedures by minfi. Our quality control procedures included (1) removal of probes with SNPs at CpG or single base extension (SBE) sites; (2) removal of cross-reactive probes; (3) since our study only included females, CpG probes on the Y chromosome were removed; and (4) removal of "bad" probes identified by ENmix (percentage of low quality CpG sites across samples greater than 0.05 whose detection p-values were greater than 0.01 or number of beads less than 3). Table S1 shows number of CpGs removed by each quality control step.
The density plot of normalized data after quality control is shown in Figure S10. * Enmix bad probes refer to any remaining bad probes that were identified by R Enmix package. The "bad probes" was identified if the percentage of a CpG site across samples having detection p-values greater than 0.01 was greater than 0.05 or the number of beads less than 3 .

Figure S10. Post-quality control beta density plot
Clustering analysis of duplicate pairs Next, duplicate pairs were again examined to assess overall performance of quality control procedures. In repeating the examination of clustering for all technical replicates across all trimesters, 18 out of 28 pairs were true pairs. Thus, the funtooNorm method improved the clustering results compared to the raw data clustering results.
To examine the clustering of samples and replicates across each trimester, we created clustering and bar plots to compare each sample and its replicate within each trimester. Perfect matching (9 out of 9 duplicate pairs) was displayed for pregnancy trimester 1 ( Figure S11). For trimester 2, 6 out of 10 were true pairs ( Figure S13) and for trimester 3, 8 out of 9 were true pairs ( Figure  S15). These results demonstrated that funtooNorm successfully reduced unwanted sources of variation.
Boxplots depicting the methylation levels between technical replicates support this as most duplicate pairs had the same or very similar levels of methylation within each trimester ( Figure  S12, S14, and S16).

Figure S11. Post-quality control clustering plot for trimester 1 based on beta values
A "true" or "correct" pairing occurs when the last four digits of the sample ID match.         Table S5. Sensitivity analysis examining Trimester 3 DNAm data in the discovery (i.e., validation), replication, and combined samples of replication phase while including samples with "Pass" or "Check" laboratory designation (Table S3 (Table 5), include only samples with a "Pass" laboratory designation. This table presents a sensitivity analysis including both "Pass" and "Check" samples.  Validation sample, those with DNAm data from both the discovery (Infinium® MethylationEPIC Beadchip) and replication (pyrosequencing platforms); Independent sample, an independent sample of participants completely separate from the discovery phase; Combined sample, combination of validation and independent samples.