Ethics
The present study was approved by the Ethics Committees of the National Cancer Center (Tokyo, Japan), Keio University, and Iwate Medical University (Approval ID: HG H25-19). All experiments were performed in accordance with the approved guidelines. All participants provided written informed consent.
Study participants and sample collection
The study was divided into two phases: discovery and replication. In the discovery phase, whole-blood-derived DNA from 50 ccRCC patients was provided by the National Cancer Center (NCC; Tokyo, Japan), and that from 50 sex- and age-matched healthy controls was provided by the Tohoku Medical Megabank Community-Based Cohort Study (TMM CommCohort) [37]. In the replication phase, whole-blood-derived DNA from 48 individuals (independent of the discovery phase) was provided by the NCC and TMM CommCohort for the ccRCC and control samples, respectively. All blood samples were collected in EDTA blood collection tubes, and blood-derived DNA was purified using the Gentra Puregene Blood Kit for the NCC samples and QIAGEN Autopure LS for the TMM CommCohort samples. ccRCC was diagnosed by imaging (MRI or CT), as well as by microscopic and gross observations by a skilled pathologist. In both phases, the ccRCC and control samples were age-matched within ±2 years, and body mass index (BMI) was matched whenever possible. Individual health checkups and a self-reported questionnaire were used to define smoking status, alcohol consumption status, and prevalent diseases (i.e., chronic kidney disease, hypertension, dyslipidemia, and diabetes). Participant characteristics were compared between groups using paired t-tests for continuous variables (e.g., laboratory values) and chi-square tests for categorical variables (e.g., disease prevalence).
Preparation of sequencing libraries and TB-seq
Aliquots of genomic DNA (gDNA; 1 μg), eluted in 50 μL of TE buffer, were sheared into 150–200 bp fragments using a Covaris LE220 Focused-ultrasonicator (Thermo Fisher Scientific, Waltham, MA, USA). Sequencing libraries for TB-seq were prepared using Agilent SureSelect Human Methyl-Seq Capture Library and Reagent Kits on an Agilent Bravo automated library preparation system (Agilent Technologies, Santa Clara, CA, USA) according to the manufacturer's instructions. In the replication phase, we used an Agilent SureSelect Human Methyl-Seq Custom Capture Kit with customized probes (i.e., the common DNA methylation variations (CDMV) [22] probe set). Bisulfite treatment for all sequencing libraries was performed using an EZ DNA Methylation-Gold Kit (Zymo Research, Irvine, CA, USA). The pooled 17-pM libraries were spiked with 20% PhiX Control v3 (Illumina Inc., San Diego, CA, USA) and subjected to paired-end sequencing (2 × 125 bp) on a HiSeq 2500 system (Illumina).
DNA methylation profiling in targeted CpGs
Raw sequencing data were converted to FASTQ format using Illumina bcl2fastq2 Conversion software v2.20. The sequencing quality of the raw data was assessed using FastQC software v0.11.5, adapters were trimmed using Trim Galore software v0.4.2, and short reads (< 20 bp) were removed. The remaining reads were aligned to the Genome Reference Consortium Human Reference 38 (GRCh38) build, downloaded from the UCSC Genome Browser website [38], using Novoalign software v3.6.5. The aligned data were processed using bioinformatics tools, as previously reported [25]. Methylated CpGs were detected by NovoMethyl software v1.4, and the methylation levels in targeted CpGs were calculated as beta values using R software v3.3.1.
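The beta-value step can be illustrated with a minimal sketch. The text does not give the formula, so the sketch assumes the definition commonly used for bisulfite sequencing data: the fraction of reads at a CpG that support methylation. The function name, the `min_depth` parameter, and the read counts are hypothetical and are not taken from the study's R code.

```python
def beta_value(methylated_reads, unmethylated_reads, min_depth=1):
    """Return the methylation level (beta value) of one CpG.

    Assumed definition: methylated / (methylated + unmethylated) reads.
    Returns None when coverage is below min_depth (hypothetical filter).
    """
    depth = methylated_reads + unmethylated_reads
    if depth < min_depth:
        return None
    return methylated_reads / depth


# Hypothetical per-CpG read counts: (methylated, unmethylated)
counts = {"cpg_a": (18, 2), "cpg_b": (5, 15), "cpg_c": (0, 0)}
betas = {cpg: beta_value(m, u) for cpg, (m, u) in counts.items()}
```

A CpG without coverage yields `None` rather than a division error, so downstream steps can treat it as missing.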
Epigenome-wide association study
EWAS was performed using a linear regression model to identify differentially methylated CpGs associated with ccRCC. The analysis was adjusted for age, sex, and the estimated cell-type composition. Cell-type composition was estimated using the estimateCellCounts function in the minfi Bioconductor package [39, 40] with modifications. Specifically, instead of using the Illumina Infinium HumanMethylation450 data on sorted blood cell populations implemented in the FlowSorted.Blood.450k package in Bioconductor, we referred to the DNAm data from six sorted leukocyte populations (B cells, CD4+ T lymphocytes, CD8+ T lymphocytes, monocytes, NK cells, and neutrophils) from the whole-genome bisulfite sequencing of 12 individuals [22, 23] and selected the top 50 hypermethylated and hypomethylated CpGs in each cell type for further analysis. All analyses were conducted under the same conditions in both phases. The genome-wide suggestive threshold was defined as p < 1.00 × 10⁻⁶, and the Bonferroni-corrected significance threshold was set to p < 1.59 × 10⁻⁸ (0.05/3145479) in the discovery phase and p < 3.42 × 10⁻⁸ (0.05/1460699) in the replication phase. The statistical analysis scripts used in this study are available in our GitHub repository (https://github.com/H-Ohmomo/ccRCC_EWASscript_20220323).
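The significance thresholds follow directly from dividing 0.05 by the number of tests in each phase. The short sketch below reproduces that arithmetic and shows how a p-value would be classified against the two thresholds. The function names and the three category labels are illustrative and do not come from the study's scripts.

```python
ALPHA = 0.05
SUGGESTIVE_THRESHOLD = 1.00e-6
# Number of tests per phase, as given in the text
TESTED_CPGS = {"discovery": 3145479, "replication": 1460699}


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Bonferroni-corrected significance threshold: alpha / number of tests."""
    return alpha / n_tests


def classify(p_value, n_tests):
    """Label a p-value as significant, suggestive, or not significant."""
    if p_value < bonferroni_threshold(n_tests):
        return "significant"
    if p_value < SUGGESTIVE_THRESHOLD:
        return "suggestive"
    return "not significant"


for phase, n_tests in TESTED_CPGS.items():
    print(f"{phase}: p < {bonferroni_threshold(n_tests):.2e}")
# discovery: p < 1.59e-08
# replication: p < 3.42e-08
```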
The predictive accuracy of the identified CpGs as whole-blood-based DNAm biomarkers for ccRCC was evaluated by plotting receiver operating characteristic (ROC) curves. Specificity, sensitivity, and the area under the curve (AUC) were calculated from the DNAm levels using the ROCR [41] and pROC [42] packages in R.
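To make the ROC quantities concrete, the following sketch computes the AUC as the probability that a randomly chosen case has a higher DNAm level than a randomly chosen control (ties count one half), together with sensitivity and specificity at one cutoff. It is a plain-Python illustration, not the ROCR or pROC implementation. It assumes that higher methylation indicates ccRCC; for a hypomethylated CpG the comparison would be reversed. All values are hypothetical.

```python
def roc_auc(case_values, control_values):
    """AUC as P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def sensitivity_specificity(case_values, control_values, cutoff):
    """Call a sample positive when its DNAm level is >= cutoff."""
    true_positives = sum(v >= cutoff for v in case_values)
    true_negatives = sum(v < cutoff for v in control_values)
    return (true_positives / len(case_values),
            true_negatives / len(control_values))


# Hypothetical DNAm levels (beta values) at one CpG
cases = [0.82, 0.75, 0.64, 0.91]
controls = [0.55, 0.68, 0.47, 0.60]

auc = roc_auc(cases, controls)  # 15 of 16 case-control pairs are ordered correctly
sens, spec = sensitivity_specificity(cases, controls, cutoff=0.65)
```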
To examine trends in the DNAm levels of CpGs significantly differentially methylated in ccRCC across ccRCC stages, we performed a Jonckheere–Terpstra trend test using the DescTools package [43] in R. A statistically significant trend was defined as Ptrend < 0.05.
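The Jonckheere–Terpstra statistic counts, over all pairs of ordered groups, how often a value in the earlier group is smaller than a value in the later group. The sketch below computes the statistic and a two-sided p-value from the large-sample normal approximation without tie correction. It is an illustration under those assumptions and may differ from what the DescTools function reports, particularly for small samples or tied values. The stage groups and DNAm levels are hypothetical.

```python
import math


def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra trend test for groups listed in their natural order.

    Returns (JT statistic, z score, two-sided p-value) using the normal
    approximation without tie correction.
    """
    jt = 0.0
    for i in range(len(groups) - 1):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        jt += 1.0
                    elif x == y:
                        jt += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    # Mean and variance of JT under the null hypothesis of no trend
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    variance = (n * n * (2 * n + 3)
                - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (jt - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return jt, z, p_two_sided


# Hypothetical DNAm levels at one CpG, grouped by increasing cancer stage
by_stage = [
    [0.40, 0.42, 0.45],
    [0.47, 0.50, 0.52],
    [0.55, 0.58, 0.60],
]
jt, z, p_trend = jonckheere_terpstra(by_stage)
```

In this example every value increases with stage, so JT reaches its maximum of 27 and z equals 3.0.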
Quantitative trait methylation analysis
To evaluate the relationships between ccRCC-associated CpGs and gene expression, we conducted cis-eQTM analyses with a simple linear regression model using the iMETHYL database [44]. The expression level [log10(FPKM + 0.1)] of each protein-coding gene was specified as the dependent variable, and the DNAm level of each ccRCC-associated CpG site was specified as the independent variable. The cis region was defined as the region within 1 million base pairs upstream or downstream of each ccRCC-associated CpG.
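The three ingredients of the cis-eQTM analysis (the 1-Mb window, the expression transform, and the simple linear regression) can be sketched as follows. How a gene's position is compared with the window is not stated in the text; the sketch assumes a gene is in cis when any part of it overlaps the window on the same chromosome. Function names, coordinates, and values are hypothetical.

```python
import math

CIS_WINDOW_BP = 1_000_000


def is_cis(cpg_pos, gene_start, gene_end, window=CIS_WINDOW_BP):
    """True when the gene overlaps [cpg_pos - window, cpg_pos + window].

    Assumes the CpG and the gene are on the same chromosome.
    """
    return gene_start <= cpg_pos + window and gene_end >= cpg_pos - window


def log_expression(fpkm):
    """Expression transform given in the text: log10(FPKM + 0.1)."""
    return math.log10(fpkm + 0.1)


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical DNAm levels (independent) and FPKM values (dependent)
dnam = [0.2, 0.4, 0.6]
fpkm = [99.9, 9.9, 0.9]
expression = [log_expression(v) for v in fpkm]
slope, intercept = simple_linear_regression(dnam, expression)
```

A negative slope, as in this example, would indicate lower expression with higher methylation at the CpG.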