
Epigenetic scores of blood-based proteins as biomarkers of general cognitive function and brain health

Abstract

Background

Epigenetic Scores (EpiScores) for blood protein levels have been associated with disease outcomes and measures of brain health, highlighting their potential usefulness as clinical biomarkers. They are typically derived via penalised regression, whereby a linear weighted sum of DNA methylation (DNAm) levels at CpG sites is predictive of protein levels. Here, we examine 84 previously published protein EpiScores as possible biomarkers of cross-sectional and longitudinal measures of general cognitive function and brain health, and incident dementia across three independent cohorts.

Results

Using 84 protein EpiScores as candidate biomarkers, associations with general cognitive function (both cross-sectionally and longitudinally) were tested in three independent cohorts: Generation Scotland (GS), and the Lothian Birth Cohorts of 1921 and 1936 (LBC1921 and LBC1936, respectively). A meta-analysis of general cognitive functioning results in all three cohorts identified 18 EpiScore associations (absolute meta-analytic standardised estimates ranged from 0.03 to 0.14, median of 0.04, PFDR < 0.05). Several associations were also observed between EpiScores and global brain volumetric measures in the LBC1936. An EpiScore for the S100A9 protein (a known Alzheimer disease biomarker) was associated with general cognitive functioning (meta-analytic standardised beta: − 0.06, P = 1.3 × 10−9), and with time-to-dementia in GS (Hazard ratio 1.24, 95% confidence interval 1.08–1.44, P = 0.003), but not in LBC1936 (Hazard ratio 1.11, P = 0.32).

Conclusions

EpiScores might contribute to the risk profile of poor general cognitive function and global brain health, and risk of dementia; however, these scores require replication in further studies.

Introduction

A projected 152 million people worldwide will have dementia by 2050 [1]. Dementia is characterised by cognitive decline, which seriously limits performance of everyday activities, independence and quality of life in older age; such decline has these consequences even in the absence of dementia [2,3,4]. Stable, consistent biological markers (biomarkers) of these outcomes might facilitate early detection, opening up a window for possible intervention [5]. Biomarkers can also be used for monitoring progression, understanding the molecular mechanism of a phenotype, and identifying candidate drug targets. Proteins are commonly used as biomarkers, as changes in their levels can be indicative of disease status or risk [6]. Discovery of blood-based biomarkers is desirable because blood is easily accessible, can be taken at routine appointments and is cost-effective.

The term epigenetics refers to chemical modifications to DNA that do not affect the underlying sequence. The dynamic nature of these modifications can affect gene expression levels, in turn affecting protein expression levels [7, 8]. DNA methylation (DNAm) is the most commonly studied epigenetic modification, and is typically characterised by the addition of a methyl group to the cytosine base in a cytosine-guanine motif (CpG). Epigenetic scores (EpiScores) for proteins are typically derived from a linear weighted sum of DNAm levels at CpG sites that, in combination, are predictive of protein levels. The selection of CpGs for EpiScores is typically performed via penalised regression models whereby all sites on a genome-wide array are input as potential features. A recent study directly compared measured CRP and CRP EpiScore levels, showing higher test–retest reliability for the EpiScore [9]. For inflammatory proteins such as CRP, it may be that EpiScores for protein levels provide a more stable reflection of chronic inflammation. Additionally, the CRP EpiScore was found to have, on average, 6.4-fold stronger effect estimates than measured CRP in associations with brain imaging measures [10]. EpiScores for CRP and IL6 were inversely associated with general cognitive function in studies where the measured protein association was weaker or less significant [9,10,11]. These studies suggest that protein EpiScores might represent useful markers of brain health.

Gadd et al. [12] trained 84 protein EpiScores in the German cohort KORA, each of which had a Pearson correlation (r) > 0.1 and P < 0.05 when compared with measured protein levels in a test cohort. Several of these EpiScores were found to associate with a number of disease outcomes including stroke, type 2 diabetes and lung cancer, highlighting their potential usefulness as clinical biomarkers of disease [12].

In this study, we examined whether the same 84 EpiScores were associated with a general factor for cognitive function, longitudinal cognitive change, and magnetic resonance imaging (MRI) measures of global brain health and longitudinal brain changes in up to three independent cohorts (depending on data availability): Generation Scotland (GS), the Lothian Birth Cohorts of 1921 (LBC1921) and 1936 (LBC1936). We also investigated whether the EpiScores were associated with an incident (binary) dementia diagnosis and time-to-dementia (Fig. 1).

Fig. 1

Study overview. A study summary figure highlighting the data available for cognitive testing (maximum N for one cognitive test at wave 1), dementia diagnosis (N for cases and controls with methylation data) and brain imaging (maximum N for one MRI measure at wave 2) across the LBC1921, LBC1936 and GS cohorts. Created with BioRender.com

Methods

The Generation Scotland cohort

The Generation Scotland: Scottish Family Health Study (GS) has been previously described in detail by Smith et al. [13]. In brief, GS is a cohort study of > 20,000 individuals and their families living in Scotland. GS provides a resource with genome-wide genetic, epigenetic, clinical, lifestyle and sociodemographic data. Participants in GS were aged between 17 and 99 years at the study baseline, with a mean age of 47.5 years (SD: 14.93). 58.8% of the GS cohort is female. Recruitment took place between 2006 and 2011.

Lothian Birth Cohorts of 1921 and 1936

The Lothian Birth Cohorts of 1921 and 1936 (LBC1921 and LBC1936) comprise older community-dwelling adults born in 1921 and 1936 [14, 15]. Most of these individuals sat a test of general intelligence—the Moray House Test No.12—at about age 11 years while at school in Scotland in 1932 and 1947, respectively. Subsequently, individuals residing in the Lothian area later in life were invited to join the LBC studies (at age ~ 79 for LBC1921 and age ~ 70 for LBC1936). Participants underwent a series of physical, cognitive and medical assessments at regular intervals (age ~ 79, 83, 87, 90, 92 for LBC1921, and age ~ 70, 73, 76, 79, and 82 for LBC1936). The participants provided blood samples from which genetic, epigenetic and biomarker data were obtained. Beginning at the second assessment (age 73), LBC1936 participants also underwent whole brain structural MRI scans. The mean age at wave 1 in the LBC1936 is 69.5 years (SD: 0.83) and 49.77% of the cohort is female. The mean age at wave 1 in the LBC1921 is 79.1 (SD: 0.58) and 58.17% of the cohort is female.

EpiScores in the Generation Scotland and the Lothian Birth Cohorts

The training and testing of the 84 EpiScores used in this study have been described previously [12]. Briefly, the 84 EpiScores are the result of penalised regression models (one model for each protein) that select CpG sites that, in weighted combination, are predictive of individual protein levels. These 84 EpiScores met a testing threshold of Pearson r > 0.1 and p < 0.05 when projected into a subset of the GS cohort (STRADL: N = 778 [16]) and compared with measured protein levels [12]. EpiScores were projected into methylation data (beta values) in the LBCs (nLBC1921 = 436; nLBC1936 = 895) and the GS cohort (n = 18,413) before being corrected for technical covariates through linear regression. Details of DNAm profiling and processing are provided in Additional file 1. In GS, EpiScores were corrected for set and batch. In LBC1921 and LBC1936, EpiScores were corrected for set, array and hybridization date. Residuals from these regression models were extracted and used for all downstream analyses.
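The projection and correction steps described above can be sketched in a few lines. This is a minimal illustration in Python (the study itself used R), and the CpG identifiers, weights, beta values and batch labels are all hypothetical. It also assumes a single categorical technical covariate, in which case the residuals from an ordinary least squares fit reduce to deviations from the group mean; the study adjusted for several technical covariates at once.

```python
from collections import defaultdict
from statistics import mean


def project_episcore(betas, weights, intercept=0.0):
    """Weighted sum of methylation beta values over the CpGs in `weights`."""
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


def residualise_on_batch(scores, batches):
    """Residuals from regressing scores on one categorical covariate.

    With an intercept and a single factor, the OLS fitted value for each
    sample is the mean of its group, so the residual is score - group mean.
    """
    by_batch = defaultdict(list)
    for score, batch in zip(scores, batches):
        by_batch[batch].append(score)
    batch_mean = {b: mean(v) for b, v in by_batch.items()}
    return [s - batch_mean[b] for s, b in zip(scores, batches)]


# Hypothetical CpG weights for one protein EpiScore (not from the paper).
weights = {"cg_example_1": 0.8, "cg_example_2": -0.5}

# Hypothetical methylation beta values for four samples.
samples = [
    {"cg_example_1": 0.50, "cg_example_2": 0.20},
    {"cg_example_1": 0.60, "cg_example_2": 0.10},
    {"cg_example_1": 0.40, "cg_example_2": 0.40},
    {"cg_example_1": 0.70, "cg_example_2": 0.30},
]
batches = ["A", "A", "B", "B"]

scores = [project_episcore(s, weights) for s in samples]
residuals = residualise_on_batch(scores, batches)
```

The `residuals` list plays the role of the corrected EpiScores carried into downstream models.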

Cognitive test data

Cognitive testing in the GS cohort and LBC studies has been described previously [13,14,15, 17]. Briefly, cross-sectional scores are available for four tests in GS, while longitudinal data were considered for 13 tests in LBC1936 and for four tests in LBC1921 (full details in Additional file 1 with summary data presented in Additional file 3: Tables S1–S3).

MRI measures of brain health in LBC1936

Protocols for magnetic resonance imaging (MRI) acquisition and processing carried out in the LBC1936 cohort have been described previously [18]. Four measures of global brain health were considered: total brain volume, grey matter volume, normal appearing white matter volume, and white matter hyperintensity volume. These were assessed across four waves of data collection, starting at wave 2 (age 73). Intracranial volume was included as a covariate for baseline (intercept) analyses to account for any previous volume loss. Full details are presented in Additional file 1 with summary data in Additional file 3: Table S4.

Dementia diagnosis information

Dementia diagnosis data were obtained in all three cohorts. Full details are provided in Additional file 1. Briefly, GS data were obtained via linkage to primary and secondary care records (235 incident cases, 7555 controls—filtered so all were aged 65 or above at the time of diagnosis/censoring, Additional file 3: Table S5).

Dementia diagnosis information for LBC1921 and LBC1936 was obtained through electronic health record (EHR) review [19]. Clinician home visits were also carried out by request in LBC1921 and LBC1936 when a participant showed signs of cognitive impairment, self-reported dementia, or an LBC researcher suspected the participant may have dementia. Consensus meetings were held to discuss each participant and determine whether they had dementia, probable dementia, possible dementia or had no dementia diagnosis, as well as dementia subtype (where possible) [19]. Of the participants with methylation data, there were 108 and 110 participants with a dementia diagnosis (692 and 452 controls) in LBC1936 and LBC1921, respectively (Additional file 3: Table S5). Date of diagnosis/time-to-event information was only available in LBC1936.

Statistical analysis

All statistical analyses were performed in R version 4.0.3 (2020-10-10) [20].

Descriptive statistics

The sample sizes for cognitive measures, brain MRI measures and dementia shown in Fig. 1 reflect the maximum data available. Sample sizes vary across tests and decrease over follow-up in both LBC cohorts; the data available for each test/measure at each wave can be found in Additional file 3: Tables S1–S5.

Predictors of cognitive function, cognitive change and MRI brain health measures

All analyses in this study included basic- and fully-adjusted models. Outcomes of interest were latent intercept and slope variables for brain and cognitive outcomes (see Additional file 1 for details and Additional file 3: Tables S6–S9). Regression analyses were performed within the structural equation framework. Continuous covariates were scaled to aid in model convergence and to obtain standardised regression coefficients.

$$\text{Basic model:}\quad \text{Outcome of interest} \sim \text{EpiScore} + \text{Age at baseline} + \text{Sex}$$
$$\begin{aligned} \text{Full model:}\quad & \text{Outcome of interest} \sim \text{EpiScore} + \text{Age at baseline} + \text{Sex} + \text{Scottish Index of Multiple Deprivation (SIMD)} \\ & + \text{Epigenetic smoking score (EpiSmoker)} + \text{Body Mass Index (BMI)} + \text{Alcohol units per week} \end{aligned}$$

Information regarding alcohol intake (weekly units) was obtained via a self-reported questionnaire. Socioeconomic position was captured by the Scottish Index of Multiple Deprivation (SIMD, 2006) in LBC1936 and GS, and by social grades determined by highest reached occupation in LBC1921 [21, 22]. The SIMD ranged from 1 (most deprived) to 6505 (least deprived). Body Mass Index (BMI in kg/m²) was obtained via an in-clinic physical assessment. Epigenetic smoking scores were calculated for each participant from their DNAm profiles using the R package EpiSmokEr [23].

Descriptive statistics for all covariates in GS, LBC1936 and LBC1921 can be found in Additional file 3: Tables S10–S12.
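To illustrate why scaling continuous variables yields standardised regression coefficients, the sketch below z-scores a predictor and an outcome and compares the raw and standardised slopes. It is a single-predictor simplification in Python with made-up values; the study fitted multi-covariate models within a structural equation framework in R, which this does not reproduce.

```python
from statistics import mean, stdev


def zscore(values):
    """Centre on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical values, for illustration only.
episcore = [0.10, 0.40, 0.35, 0.80, 0.60]
cognition = [102.0, 98.0, 101.0, 90.0, 95.0]

raw_slope = ols_slope(episcore, cognition)
std_slope = ols_slope(zscore(episcore), zscore(cognition))
```

With one predictor, `std_slope` equals `raw_slope` multiplied by the ratio of the predictor's standard deviation to the outcome's, so it is expressed in standard deviation units and is bounded by -1 and 1.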

Dementia analysis

Associations between the EpiScores and incident dementia (binary outcome) were tested in all three cohorts using logistic regression models with the “glm” function (with family set to binomial) from the R stats package (version: 4.0.3) [20]. Time-to-dementia analyses were also run in LBC1936 and GS using Cox proportional hazards (CoxPH) models through the R survival package (version: 3.3.1) [24]. Sensitivity analyses to account for related individuals (GS) and death as a competing risk (GS and LBC1936) were also considered (details in Additional file 1).
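Both logistic and Cox models estimate coefficients on the log scale, and the odds ratios and hazard ratios reported in the Results are the exponentiated coefficients. The following Python sketch shows that conversion with a Wald-type 95% confidence interval. The function name and the input values are hypothetical, and the Wald interval is an assumption about how such intervals are commonly formed, not a statement of how the authors computed theirs.

```python
import math


def ratio_with_ci(beta, se, z=1.959964):
    """Exponentiate a log-scale coefficient and its Wald interval.

    beta: log odds ratio or log hazard ratio
    se:   standard error of beta
    z:    normal quantile for the desired coverage (default about 95%)
    """
    estimate = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return estimate, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
ratio, lower, upper = ratio_with_ci(beta=0.20, se=0.07)
```

Because the interval is symmetric on the log scale, it is asymmetric around the ratio itself, which is why reported intervals extend further above the estimate than below it.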

In GS, baseline appointments were from 2006 to 2011 and the dementia censor date was set to April 2022 resulting in a maximum of ~ 11–16 years lag time between sample collection and dementia. In LBC1936, sample collection was carried out at baseline appointment where participants were ~ age 70 and maximum age at the last dementia ascertainment is 86 years resulting in a maximum lag time of 16 years between sample collection and dementia. In LBC1921, sample collection was carried out at baseline appointment where participants were ~ age 79 years. The consensus meeting was in 2016 meaning the maximum age at dementia diagnosis could be 95; therefore, the maximum lag time between sample collection and dementia is ~ 16 years.

Meta-analyses

Meta-analyses were performed to obtain effect sizes weighted by sample size using results from the general cognitive function, dementia diagnosis (binary) and time-to-dementia models using the R package metafor (version: 4.2-0) [25].
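As a rough illustration of how cohort-level estimates can be pooled, the sketch below implements a fixed-effect, inverse-variance weighted average in Python. This is an assumption chosen for simplicity: the text does not state which metafor model or weighting scheme was used, and the estimates and standard errors here are invented. Larger cohorts usually have smaller standard errors and so receive more weight under this scheme.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical standardised betas and standard errors from three cohorts.
betas = [-0.05, -0.10, -0.08]
ses = [0.01, 0.05, 0.04]

pooled_beta, pooled_se = fixed_effect_pool(betas, ses)
z_score = pooled_beta / pooled_se  # Wald statistic for the pooled effect
```

In this example the first cohort dominates the pooled estimate because its standard error is much smaller than the others.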

Gene ontology enrichment and biological function/pathway look-up

Gene ontology (GO) analysis was performed on the statistically significant protein EpiScores using Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) software [26]. Specifically, we analysed the genes that code for the proteins that the EpiScores are proxies for. Benjamini–Hochberg False Discovery Rate (FDR) correction was used at a threshold of PFDR < 0.05. A gene list covering all of the 84 EpiScores was used as the background set of genes to test against. The UniProt database (Release 2024_01) [27] and Reactome database (Release 87) [28] were used to look up the biological function/pathways for the proteins mapping to the significant EpiScores across each analysis.
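The Benjamini–Hochberg procedure used throughout for multiple-testing correction can be sketched as follows. This is a plain Python illustration with made-up P values; in R, the same adjustment is available through `p.adjust` with `method = "BH"`.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the same order as the input.

    Each P value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values from four tests.
p_raw = [0.01, 0.04, 0.03, 0.005]
p_fdr = benjamini_hochberg(p_raw)
significant = [p < 0.05 for p in p_fdr]
```

An association is then declared significant when its adjusted value falls below the chosen threshold, here 0.05.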

Results

EpiScore associations with general cognitive function

A latent factor of general cognitive function (intercept) generated in three separate cohorts was regressed on 84 EpiScores in separate linear models. Benjamini–Hochberg false discovery rate correction (PFDR < 0.05) was applied to the results to account for multiple testing. In the basic models (adjusted for age and sex), 20 (GS), 13 (LBC1921) and 31 (LBC1936) EpiScores were significantly associated with general cognitive function (Absolute standard effect size range: 0.09–0.41, PFDR < 0.05, Additional file 3: Table S13). In the fully adjusted models, no significant associations were found in LBC1936, 5 associations were found in LBC1921, and 40 associations were found in GS (Absolute standard effect size range: 0.02–0.53, PFDR < 0.05, Additional file 3: Table S13). A meta-analysis of effect sizes for general cognitive function in all three cohorts was performed for basic- and fully-adjusted model results (Additional file 3: Table S14). In the meta-analysis of the basic results for general cognitive function, 36 EpiScores were found to be significantly associated (Absolute standard effect size range: 0.06–0.22, PFDR < 0.05). In the meta-analysis of the fully adjusted results, 18 EpiScore associations were significant (Absolute standard effect size range: 0.03–0.14, PFDR < 0.05, Fig. 2). The biological functions and pathways of the 18 genes that these significant protein EpiScores correspond to were explored via GO enrichment analysis and database look-up (Additional file 3: Table S15). No enriched biological processes or pathways were found (PFDR > 0.05). The most common Reactome pathway identifier was neutrophil degranulation (R-HSA-6798695), which was shared by six of the proteins.

Fig. 2

Meta-analysis of EpiScore associations with general cognitive function in three cohorts. The plot shows the meta-analysed regression coefficients for each EpiScore from the fully adjusted models, found to be significantly associated with general cognitive function after FDR correction. Error bars indicate 95% confidence intervals [95% CI]

Next, in LBC1921 and LBC1936, a general latent factor of cognitive change (slope) was regressed on the 84 EpiScores in separate linear models. No EpiScores were significantly associated with a general factor of cognitive change in either LBC1921 or LBC1936 in models with basic adjustments after FDR correction. However, three EpiScores were nominally associated (Absolute standard effect size range: 0.19–0.2, P < 0.05) with slope in the LBC1921. In the fully adjusted models, no FDR significant associations were found in either cohort (Additional file 3: Table S16). One and three EpiScores were nominally associated with slope in LBC1936 and LBC1921, respectively (Absolute standard effect size range: 0.09–0.25, P < 0.05) in the fully adjusted models. The number of associations with cognitive function and change is summarised in Additional file 3: Table S17.

EpiScore associations with MRI measures of global brain health

The 84 EpiScores were then studied in relation to four MRI markers of brain health (total brain volume, grey matter volume, normal appearing white matter volume, and white matter hyperintensity volume) and their changes over time in LBC1936. In basic models adjusted for age and sex, 21 EpiScores were significantly associated with total brain volume, 28 with grey matter volume, 16 with normal appearing white matter volume and 3 with white matter hyperintensity volume (Absolute standard effect size range: 0.04–0.21, PFDR < 0.05). Eleven EpiScores were found to associate with three or more MRI measures of brain health in the basic models (PFDR < 0.05, Fig. 3). Among the proteins that these eleven EpiScores are proxies for, no Reactome identifier was shared by more than two proteins (Additional file 3: Table S15). SELL and PIGR had the Reactome identifier for neutrophil degranulation (R-HSA-6798695), while SELL and ICAM5 had the Reactome identifier for immunoregulatory interactions between a lymphoid and non-lymphoid cell (R-HSA-198933). No biological pathways were found to be significantly enriched for these results (PFDR > 0.05). Fully adjusted models were examined to determine if associations were attenuated when covariates relevant to brain health were included in the model (Additional file 3: Table S18). One EpiScore, CRP, was found to be associated with grey matter volume at baseline (Standard effect size: − 0.09, PFDR < 0.05).

Fig. 3

EpiScore associations with cross-sectional MRI measures of brain health in the LBC1936 cohort. Plot shows the standardised regression coefficients for each EpiScore found to be significantly associated with three or more MRI measures of brain health in basic models in the LBC1936 cohort (FDR < 0.05). Error bars indicate 95% confidence intervals [95% CI]. Direction of effect sizes and 95% CI have been recoded for white matter hyperintensity volume

EpiScore associations with the slope (change over ~ 9.5 years) for each MRI measure were tested; no FDR significant results were observed. However, nominally significant associations were observed for all four measures in models with basic adjustments (Absolute standard effect size range: 0.12–0.3, P < 0.05, Additional file 3: Table S19). A summary table for the number of associations observed with cross-sectional and longitudinal MRI measures for basic and fully adjusted models can be found in Additional file 3: Table S20.

EpiScore associations with incident dementia

EpiScore associations with a binary dementia diagnosis and time-to-dementia were examined. As age at dementia diagnosis was not available in the LBC1921, this cohort was only included in logistic regression models testing the binary outcome for dementia. In the logistic regression models with basic adjustments, three significant associations were observed in LBC1921 (PFDR < 0.05): SEMA3E (OR 1.54), ICAM5 (OR 0.66), and PIGR (OR 0.66). Of these associations, the ICAM5 EpiScore (OR 1.2) was nominally significant in GS (P < 0.05). The remaining two associations were not nominally significant in GS or LBC1936 (Fig. 4, Panel A; Additional file 3: Table S21).

Fig. 4

EpiScore associations with incident dementia (binary) and time-to-dementia. Panel A: FDR significant odds ratios for EpiScores with dementia status (binary) in LBC1921. The odds ratios for GS and LBC1936 for the same EpiScores have been included for comparison despite being only nominally significant or non-significant. Panel B: FDR significant hazard ratios for EpiScores with incident dementia for the mixed effects Cox models in GS. The hazard ratios for LBC1936 from the CoxPH model for the same EpiScores have been included for comparison despite being non-significant. All error bars represent 95% confidence intervals [95% CI]

CoxPH models were used to test the association between EpiScores and time-to-dementia in GS and LBC1936 (Additional file 3: Table S22). Additionally, mixed effects Cox models were run in GS to account for relatedness (Additional file 3: Table S23). In the basic mixed effects models for GS, 13 significant (PFDR < 0.05) associations were observed; these were not found to be significant in the LBC1936 cohort (Fig. 4, Panel B). The biological function/pathways of the 13 proteins that these significant EpiScores correspond to were explored (Additional file 3: Table S15). Seven Reactome pathway identifiers were each shared by two proteins, including platelet degranulation (R-HSA-114608), collagen degradation (R-HSA-1442490), degradation of the extracellular matrix (R-HSA-1474228), and regulation of insulin-like growth factor transport and uptake by insulin-like growth factor binding proteins (R-HSA-381426). No biological pathways were found to be significantly enriched (PFDR > 0.05). Only the MMP2 EpiScore was significant (HR 0.71, PFDR < 0.05) after full adjustments were made to the mixed effects models in GS. No significant findings were observed in the competing risk models for either cohort (Additional file 3: Table S24). However, there was good agreement between the hazard ratios from the cause-specific and competing risk models (Additional file 2: Fig. S4). Additionally, separate meta-analyses of the results obtained from the logistic regression models and the time-to-event analyses were carried out (Additional file 3: Tables S25 and S26). No EpiScores were found to be significant in either analysis after FDR correction.

Discussion

In this study, we identified multiple associations between protein EpiScores and measures of cognitive function, MRI proxies of brain health and dementia in three independent cohorts.

EpiScore associations with general cognitive function and global brain volume

Eighteen EpiScores were significantly associated with general cognitive function in the meta-analysis of the fully adjusted results. Several of the proteins that these EpiScores are proxies for are involved in overlapping biological pathways including neutrophil degranulation (S100A9, LYZ, MMP9, PIGR, RETN, and MPO). Three of the eighteen EpiScores (for CRP, PIGR, and NTRK3) were also associated with total brain volume, grey matter volume, and normal appearing white matter volume at baseline in models with basic adjustments. The EpiScore for PIGR was associated with incident dementia (as a binary outcome) in the LBC1921 basic-adjusted model (OR 0.66, PFDR < 0.05) and time-to-dementia in the GS mixed effects Cox models with basic adjustments but in the opposite direction (HR 1.34, PFDR < 0.05). The EpiScore for CRP was also found to associate with time-to-dementia in the GS mixed effects model with basic adjustments (HR 1.35, PFDR < 0.05).

CRP is an acute-phase inflammatory protein, mainly transcribed in response to high levels of inflammatory proteins [29,30,31]. In previous studies performed with the LBC1936 and GS, an EpiScore for CRP was found to be negatively associated with cognitive function [9, 10]. The methodology of this study differs from that of the previous studies in LBC1936 and GS. Nevertheless, this comparison provides strong replication for the association, particularly in GS, where the sample size is over an order of magnitude greater than in previous CRP EpiScore–cognition studies. To our knowledge, no associations of PIGR or NTRK3 with general cognitive function or global volumetric MRI measures of brain health have been described previously. PIGR is expressed in the endothelial cells of the blood–brain barrier and binds to the bacterium Streptococcus pneumoniae (pneumococcus), a leading cause of bacterial meningitis [32]. According to the World Health Organisation, one in five individuals who previously had meningitis suffer from long-term complications including cognitive impairments [33]. NTRK3 binds Neurotrophin-3, an important neuro-growth factor. A reduction in transcript levels of NTRK3, also known as tyrosine kinase receptor C (trkC), has been observed in patients with schizophrenia [34, 35]. Previous studies have also highlighted a potential association between NTRK3 and hippocampal function in both mice and humans [36,37,38].

S100A9 EpiScore associates with time-to-dementia in GS

Thirteen EpiScores were significantly associated with time-to-dementia in the GS Cox mixed effects model (basic adjustments). Some of the proteins that these thirteen scores are proxies for had overlapping Reactome pathway identifiers. RARRES2 and LGALS3BP had identifiers for platelet degranulation (R-HSA-114608), while MMP12 and MMP2 had identifiers for collagen degradation (R-HSA-1442490). Of these thirteen EpiScores, four (PIGR, S100A9, C5, and CRP) were found to overlap with associated EpiScores in the meta-analyses of the general cognitive function results (fully adjusted models). Seven EpiScores (NCAM1, SLITRK5, IGFBP4, MMP12, PIGR, CRP, and ICAM5) overlapped with associations observed in three or more of the MRI global volumetric measures at baseline in LBC1936 (basic adjustments). The S100A9 EpiScore is of particular interest as the S100A9 protein has been previously identified as a potential biomarker of Alzheimer’s disease [39]. A significant inverse association was also observed between the S100A9 EpiScore and general cognitive function in the meta-analysis, in the fully adjusted model. S100A9 is known to co-localise with amyloid beta and is thought to contribute to plaque formation [39, 40]. A reduction in S100A9 in an Alzheimer’s disease mouse model resulted in fewer amyloid beta plaques and less cognitive impairment [41]. In cerebrospinal fluid of patients with Alzheimer’s disease, significantly lower levels of S100A9 protein have been observed compared with controls [39]. The lack of replication observed in the LBC1936 could be due to the use of all-cause dementia as a phenotype, as biomarkers may be specific to certain subtypes of dementia. Future work could investigate the subtypes of dementia to determine if different EpiScores associate with a specific subtype.

Strengths and limitations

Strengths of this study include the large sample sizes and multi-cohort analyses. Further, inclusion of the longitudinal Lothian Birth Cohorts facilitated the study of EpiScores as biomarkers of cognitive change over time. Different sample sizes and lifestyle factors as well as age profiles may explain why some EpiScore associations did not replicate across all cohorts. However, the scores that did replicate across all cohorts are potential biomarkers of cognitive function across mid-to-late life.

A limitation of this study is that the population base is of European ancestries and living in Scotland, and so the findings may not generalise to other populations. Further work is needed to investigate if the findings are generalisable across the life course. This is important to understand because early life immune dysregulation contributes to some neurodevelopmental disorders. For example, a recent study found that a DNAm-based proxy of CRP correlates with inflammation burden and MRI markers of encephalopathy of prematurity after preterm birth [42]. Other limitations of this study are the lack of replication for MRI findings and the consideration of EpiScores from a single time-point. The absence of the measured proteins in these cohorts is also a limitation, as we were unable to compare EpiScore performance against measured protein.

Conclusion

In conclusion, 84 protein EpiScores were tested against measures of general cognitive function, brain health and incident dementia across three human cohorts. Several EpiScores analysed in this study may augment typical risk factors of brain health; however, further replication studies are required.

Availability of data and materials

The source datasets from the cohorts that were analysed during the current study are not publicly available due to them containing information that could compromise participant consent and confidentiality. Data can be obtained from the data owners. Instructions for accessing Generation Scotland data can be found here: https://www.ed.ac.uk/generation-scotland/for-researchers/access; the ‘GS Access Request Form’ can be downloaded from this site. Completed request forms must be sent to access@generationscotland.org to be approved by the Generation Scotland Access Committee. According to the terms of consent for GS participants, access to data must be reviewed by the GS Access Committee. Instructions for accessing Lothian Birth Cohort data, alongside a Data Request Form template, Data Summary Tables and Data Dictionaries can be found here: https://www.ed.ac.uk/lothian-birth-cohorts/data-access-collaboration.

Code availability

All R code used in analyses is provided at: https://github.com/hmsmith22/EpiScore_biomarkers_of_brain_health.

References

  1. Nichols E, Steinmetz JD, Vollset SE, Fukutaki K, Chalek J, Abd-Allah F, et al. Estimation of the global prevalence of dementia in 2019 and forecasted prevalence in 2050: an analysis for the Global Burden of Disease Study 2019. Lancet Public Health. 2022;7(2):e105–25.

  2. Tucker-Drob EM, Briley DA, Starr JM, Deary IJ. Structure and correlates of cognitive aging in a narrow age cohort. Psychol Aging. 2014;29:236–49.

  3. Plassman BL, Langa KM, Fisher GG, Heeringa SG, Weir DR, Ofstedal MB, et al. Prevalence of cognitive impairment without dementia in the United States. Ann Intern Med. 2008;148(6):427–34.

  4. Bárrios H, Narciso S, Guerreiro M, Maroco J, Logsdon R, de Mendonça A. Quality of life in patients with mild cognitive impairment. Aging Ment Health. 2013;17(3):287–92.

  5. Lleó A, Cavedo E, Parnetti L, Vanderstichele H, Herukka SK, Andreasen N, et al. Cerebrospinal fluid biomarkers in trials for Alzheimer and Parkinson diseases. Nat Rev Neurol. 2015;11(1):41–55.

  6. Pase MP, Beiser AS, Himali JJ, Satizabal CL, Aparicio HJ, DeCarli C, et al. Assessment of plasma total tau level as a predictive biomarker for dementia and related endophenotypes. JAMA Neurol. 2019;76(5):598–606.

  7. Lea AJ, Vockley CM, Johnston RA, Del Carpio CA, Barreiro LB, Reddy TE, et al. Genome-wide quantification of the effects of DNA methylation on human gene regulation. Elife. 2018;7:e37513.

  8. Hillary RF, McCartney DL, Harris SE, Stevenson AJ, Seeboth A, Zhang Q, et al. Genome and epigenome wide studies of neurological protein biomarkers in the Lothian birth cohort 1936. Nat Commun. 2019;10(1):3160.

  9. Stevenson AJ, McCartney DL, Hillary RF, Campbell A, Morris SW, Bermingham ML, et al. Characterisation of an inflammation-related epigenetic score and its association with cognitive ability. Clin Epigenetics. 2020;12(1):113.

  10. Conole ELS, Stevenson AJ, Maniega SM, Harris SE, Green C, Hernández MDCV, et al. DNA methylation and protein markers of chronic inflammation and their associations with brain and cognitive aging. Neurology. 2021;97(23):e2340–52.

  11. Stevenson AJ, Gadd DA, Hillary RF, McCartney DL, Campbell A, Walker RM, et al. Creating and validating a DNA methylation-based proxy for interleukin-6. J Gerontol A Biol Sci Med Sci. 2021;76(12):2284–92.

  12. Gadd DA, Hillary RF, McCartney DL, Zaghlool SB, Stevenson AJ, Cheng Y, et al. Epigenetic scores for the circulating proteome as tools for disease prediction. Elife. 2022;11:e71802.

  13. Smith BH, Campbell A, Linksted P, Fitzpatrick B, Jackson C, Kerr SM, et al. Cohort profile: generation Scotland: Scottish family health study (GS:SFHS). The study, its participants and their potential for genetic research on health and illness. Int J Epidemiol. 2013;42(3):689–700.

  14. Deary IJ, Gow AJ, Pattie A, Starr JM. Cohort profile: the Lothian birth cohorts of 1921 and 1936. Int J Epidemiol. 2012;41(6):1576–84.

  15. Taylor AM, Pattie A, Deary IJ. Cohort profile update: the Lothian birth cohorts of 1921 and 1936. Int J Epidemiol. 2018;47(4):1042.

  16. Navrady LB, Wolters MK, MacIntyre DJ, Clarke TK, Campbell AI, Murray AD, et al. Cohort profile: stratifying resilience and depression longitudinally (STRADL): a questionnaire follow-up of Generation Scotland: Scottish family health study (GS:SFHS). Int J Epidemiol. 2018;47(1):13–4.

  17. McCartney DL, Hillary RF, Conole ELS, Banos DT, Gadd DA, Walker RM, et al. Blood-based epigenome-wide analyses of cognitive abilities. Genome Biol. 2022;23(1):26.

  18. Wardlaw JM, Bastin ME, Valdés Hernández MC, Maniega SM, Royle NA, Morris Z, et al. Brain aging, cognition in youth and old age and vascular disease in the Lothian birth cohort 1936: rationale, design and methodology of the imaging protocol. Int J Stroke. 2011;6(6):547–59.

  19. Mullin DS, Stirland LE, Buchanan E, Convery C-A, Cox SR, Deary IJ, et al. Identifying dementia using medical data linkage in a longitudinal cohort study: Lothian birth cohort 1936. BMC Psychiatry. 2023;23(1):303.

  20. R Core Team. R: A language and environment for statistical computing. MSOR connections. 2014;1.

  21. Office of the Chief Statistician, the Scottish Government. Scottish Index of Multiple Deprivation: 2009 General report. 2009.

  22. General Register Office. Census 1951: classification of occupations. London: HMSO; 1956.

  23. Bollepalli S, Korhonen T, Kaprio J, Anders S, Ollikainen M. EpiSmokEr: a robust classifier to determine smoking status from DNA methylation data. Epigenomics. 2019;11(13):1469–86.

  24. Therneau TM. A package for survival analysis in R. 2023.

  25. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.

  26. Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8(1):1826.

  27. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucl Acids Res. 2023;51(D1):D523–31.

  28. Milacic M, Beavers D, Conley P, Gong C, Gillespie M, Griss J, et al. The reactome pathway knowledgebase 2024. Nucl Acids Res. 2024;52(D1):D672–8.

  29. Sproston NR, Ashworth JJ. Role of C-reactive protein at sites of inflammation and infection. Front Immunol. 2018;9:754.

  30. Depraetere S, Willems J, Joniau M. Stimulation of CRP secretion in HepG2 cells: cooperative effect of dexamethasone and interleukin 6. Agents Actions. 1991;34(3):369–75.

  31. Szalai AJ, van Ginkel FW, Dalrymple SA, Murray R, McGhee JR, Volanakis JE. Testosterone and IL-6 requirements for human C-reactive protein gene expression in transgenic mice. J Immunol. 1998;160(11):5294–9.

  32. Iovino F, Engelen-Lee J-Y, Brouwer M, van de Beek D, van der Ende A, Valls Seron M, et al. pIgR and PECAM-1 bind to pneumococcal adhesins RrgA and PspC mediating bacterial brain invasion. J Exp Med. 2017;214(6):1619–30.

  33. World Health Organization. Meningitis. 2023. Available from: https://www.who.int/news-room/fact-sheets/detail/meningitis

  34. Weickert CS, Ligons DL, Romanczyk T, Ungaro G, Hyde TM, Herman MM, et al. Reductions in neurotrophin receptor mRNAs in the prefrontal cortex of patients with schizophrenia. Mol Psychiatry. 2005;10(7):637–50.

  35. Schramm M, Falkai P, Feldmann N, Knable MB, Bayer TA. Reduced tyrosine kinase receptor C mRNA levels in the frontal cortex of patients with schizophrenia. Neurosci Lett. 1998;257(2):65–8.

  36. Albert MN, Soledad A, Víctor A, José ADRO, Joan B, Raquel O, et al. TrkB and TrkC signaling are required for maturation and synaptogenesis of hippocampal connections. J Neurosci. 1998;18(18):7336.

  37. Otal R, Martínez A, Soriano E. Lack of TrkB and TrkC signaling alters the synaptogenesis and maturation of mossy fiber terminals in the hippocampus. Cell Tissue Res. 2005;319(3):349–58.

  38. Otnæss MK, Djurovic S, Rimol LM, Kulle B, Kähler AK, Jönsson EG, et al. Evidence for a possible association of neurotrophin receptor (NTRK-3) gene polymorphisms with hippocampal function and schizophrenia. Neurobiol Dis. 2009;34(3):518–24.

  39. Horvath I, Jia X, Johansson P, Wang C, Moskalenko R, Steinau A, et al. Pro-inflammatory S100A9 protein as a robust biomarker differentiating early stages of cognitive impairment in Alzheimer’s disease. ACS Chem Neurosci. 2016;7(1):34–9.

  40. Wang C, Klechikov AG, Gharibyan AL, Wärmländer SK, Jarvet J, Zhao L, et al. The role of pro-inflammatory S100A9 in Alzheimer’s disease amyloid-neuroinflammatory cascade. Acta Neuropathol. 2014;127(4):507–22.

  41. Ha TY, Chang KA, Kim J, Kim HS, Kim S, Chong YH, et al. S100a9 knockdown decreases the memory impairment and the neuropathology in Tg2576 mice, AD animal model. PLoS ONE. 2010;5(1):e8840.

  42. Conole ELS, Vaher K, Cabez MB, Sullivan G, Stevenson AJ, Hall J, et al. Immuno-epigenetic signature derived in saliva associates with the encephalopathy of prematurity and perinatal inflammatory disorders. Brain Behav Immun. 2023;110:322–38.

Acknowledgements

Not applicable.

Funding

This research was funded in whole, or in part, by the Wellcome Trust. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. H.M.S. and D.A.G. are students on the Translational Neuroscience PhD programme funded by Wellcome [21843/Z/19/Z, 108890/Z/15/Z]. R.F.H. is supported through an MRC IEU Short-term Fellowship. R.E.M. is supported by Alzheimer’s Society major project grant AS-PG-19b-010. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6) and the Scottish Funding Council (HR03006). Genotyping and DNA methylation profiling of the GS samples were carried out by the Genetics Core Laboratory at the Edinburgh Clinical Research Facility, Edinburgh, Scotland and were funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award STratifying Resilience and Depression Longitudinally (STRADL); Reference 104036/Z/14/Z). The DNA methylation data assayed for Generation Scotland was partially funded by a 2018 NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation (Ref: 27404; awardee: Dr David M Howard) and by a JMAS SIM fellowship from the Royal College of Physicians of Edinburgh (Awardee: Dr Heather C Whalley). LBC1921 was supported by the UK’s Biotechnology and Biological Sciences Research Council (BBSRC), The Royal Society, and The Chief Scientist Office of the Scottish Government. LBC1936 is supported by the BBSRC, and the Economic and Social Research Council [BB/W008793/1] (which supports SEH), Age UK (Disconnected Mind project), the Milton Damerel Trust, and the University of Edinburgh. SRC, JEM and IJD are supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (221890/Z/20/Z). Methylation typing in the LBCs was supported by the Centre for Cognitive Ageing and Cognitive Epidemiology (Pilot Fund award), Age UK, The Wellcome Trust Institutional Strategic Support Fund, The University of Edinburgh, and The University of Queensland. JMW is supported by the Dementia Research Institute funded by the UK Medical Research Council, Alzheimer’s Society and Alzheimer’s Research UK. MVH is supported by the Row Fogo Charitable Trust; SMM is supported by Age UK and the BBSRC. KMG was supported by an MRC University Unit grant to the MRC Human Genetics Unit. DSM’s PhD Fellowship was funded by the Royal College of Psychiatrists and the Masonic Charitable Foundation. JPB is supported by an MRC UKRI Programme Grant MR/X003434/1. A.D.C. is supported by a Medical Research Council PhD Studentship in Precision Medicine with funding by the Medical Research Council Doctoral Training Programme and the University of Edinburgh College of Medicine and Veterinary Medicine. TCR is supported by the BBSRC and ESRC (BB/W008793/1), Chief Scientist Office and NHS Research Scotland, the Royal Society of Edinburgh, and Alzheimer Scotland. SEH is supported by the BBSRC/ESRC [BB/W008793/1]. PR is funded by the ASDRC (Alzheimer Scotland Dementia Research Centre). MEB is funded by a BBSRC grant and an NIH connectome grant.

Author information

Contributions

HMS, REM, JEM, and SRC were responsible for the conception and design of the study. HMS carried out the data analyses. HMS drafted the article. ADC, KMG, RFH, and DLMC contributed to methodology. AC, MVH, SMM, MEB, JMW, JC, AT, DP, IJD, DSM, and TCR contributed to the data collection and preparation. REM, SRC, JEM, and JPB supervised the project. All authors read and approved the manuscript.

Corresponding author

Correspondence to Riccardo E. Marioni.

Ethics declarations

Ethics approval and consent to participate

All components of GS received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89). GS has also been granted Research Tissue Bank status by the East of Scotland Research Ethics Service (REC Reference Number: 20-ES-0021), providing generic ethical approval for a wide range of uses within medical research. Ethics permission for the Lothian Birth Cohort 1936 (LBC1936) was obtained from the Multi-Centre Research Ethics Committee for Scotland (Wave 1: MREC/01/0/56), the Lothian Research Ethics Committee (Wave 1: LREC/2003/2/29), and the Scotland A Research Ethics Committee (Waves 2, 3, 4 & 5: 07/MRE00/58). Ethics permission for the Lothian Birth Cohort 1921 (LBC1921) was obtained from the Lothian Research Ethics Committee (Wave 1: LREC/1998/4/183; Wave 2: LREC/2003/7/23; Wave 3: LREC1702/98/4/183) and the Scotland A Research Ethics Committee (Wave 4: 10/S1103/6; Wave 5: 10/MRE00/87).

Consent for publication

Not applicable.

Competing interests

R.E.M. is an advisor to the Epigenetic Clock Development Foundation and Optima Partners. R.F.H. has received consultant fees from Illumina and Optima Partners. D.A.G. has received consultant fees from, and is currently employed in a part-time capacity by, Optima Partners. D.L.McC. is currently employed in a part-time capacity by Optima Partners. All other authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Supplementary file containing additional information on methodology used in this study.

Additional file 2:

Figure S1. General Cognitive function in GS. Path diagram describing the measurement model of general cognitive function in GS. Model fit measures can be found in Additional file 3: Table S6 and loadings in Additional file 3: Table S7. Figure S2. General Cognitive function in LBC1936. Path diagram describing the measurement model of general cognitive function and change in LBC1936. Model fit measures can be found in Additional file 3: Table S6 and loadings in Additional file 3: Table S7. Figure S3. General Cognitive function in LBC1921. Path diagram describing the measurement model of general cognitive function and change in LBC1921. Model fit measures can be found in Additional file 3: Table S6 and loadings in Additional file 3: Table S7. Figure S4. Time-to-dementia in GS and LBC1936. FDR significant Hazard ratio for EpiScores with incident dementia for the mixed effects Cox models in GS. The Hazard ratios for LBC1936 from the coxPH model for the same EpiScores have been included for comparison despite being non-significant. The Hazard ratios for the competing risk models for GS and LBC1936 are also shown for comparison. All error bars represent 95% confidence intervals [95% CI].

Additional file 3:

Table S1. Descriptive statistics for cognitive tests in Generation Scotland. Table S2. Descriptive statistics for cognitive tests in LBC1936. Table S3. Descriptive statistics for cognitive tests in LBC1921. Table S4. Descriptive statistics for brain MRI measures in LBC1936. Table S5. Descriptive statistics for Dementia. Table S6. Fit measures for measurement models of general cognition in all three cohorts. Table S7. Cognitive test loadings on General cognitive function in all three cohorts. Table S8. Fit measures for MRI measure of brain health measurement models in LBC1936. Table S9. Loadings for MRI measures of brain health onto intercept and slope in LBC1936. Table S10. Descriptive statistics for Generation Scotland covariates. Table S11. Descriptive statistics for LBC1936 covariates. Table S12. Descriptive statistics for LBC1921 covariates. Table S13. Results from cognitive function models in GS, LBC1936 and LBC1921. Table S14. Meta-analysis of cognitive function associations with 84 EpiScores in GS, LBC1921, and LBC1936. Table S15. Biological function and pathway look-up in UniProt and Reactome databases of significant protein EpiScores. Table S16. Results from cognitive change models in LBC1936 and LBC1921. Table S17. Summary of associations with cognitive function in all 3 cohorts. Table S18. Results from LBC1936 global MRI measures at baseline (intercept). Table S19. LBC1936 global MRI measures across four waves in association with 84 EpiScores. Table S20. Summary of LBC1936 EpiScore associations with MRI measures of Brain health. Table S21. Logistic regression analysis results testing EpiScore associations with binary dementia diagnosis in GS, LBC1936 and LBC1921. Table S22. coxPH analysis results testing associations between 84 EpiScores and time-to-dementia in GS and LBC1936. Table S23. Mixed effects cox analysis results testing associations between 84 EpiScores and time-to-dementia in GS while accounting for family structure. Table S24. Competing risk analysis results testing associations between 84 EpiScores and time-to-dementia while accounting for the competing event of all-cause mortality in GS and LBC1936. Table S25. Meta-analysis of dementia (binary) associations with 84 EpiScores in GS, LBC1921 and LBC1936. Table S26. Meta-analysis of EpiScore associations with time-to-dementia in GS and LBC1936.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Smith, H.M., Moodie, J.E., Monterrubio-Gómez, K. et al. Epigenetic scores of blood-based proteins as biomarkers of general cognitive function and brain health. Clin Epigenet 16, 46 (2024). https://0-doi-org.brum.beds.ac.uk/10.1186/s13148-024-01661-7

  • DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/s13148-024-01661-7

Keywords