Novel genes and variants associated with longevity in Bulgarian centenarians revealed by whole exome sequencing DNA pools: a pilot study

Aim: To determine specific genetic loci that might be associated with longevity in the Bulgarian population by analyzing exome pool-seq data from centenarians and a control group. Methods: We performed whole-exome sequencing of two DNA pools, composed of 32 Bulgarian centenarians and 61 young healthy controls, respectively; 59,935 quality-filtered variants were detected in both pools. Fisher's exact test was employed to establish the significance of allele frequency differences between the pools. Results: Forty-seven variants showed significantly higher allele frequency in the centenarian pool than in the control pool, and these can be considered positively associated with longevity in the Bulgarian population. Based on their assigned functional role, three genes containing three of these variants were further investigated. These genes, RNF43, WNK1 and NADSYN1, are involved in evolutionarily conserved processes with well-ascertained association with longevity, i.e., the Wnt signaling pathway, the insulin/IGF-1 signaling pathway and redox balancing processes, respectively. For the remaining genes exhibiting variants with significantly higher allele frequency in the Bulgarian centenarian pool there is not enough evidence about their functional role in determining longevity and further research is needed. Conclusion: The results confirm the importance of studying centenarians in different populations to discover those combinations of variants that associate with longer health span.

Serbezov et al. J Transl Genet Genom 2020;4:[Online First]. http://dx.doi.org/10.20517/jtgg.2020.41

INTRODUCTION
The European Union is currently undergoing significant demographic change. The average life expectancy for both men and women continues to increase while the birth rate is declining [1]. As a result of these trends, European countries now have the lowest birth rates and some of the highest life expectancies in the world. The European Commission predicts that this trend will continue over the next few decades, resulting in a significant increase in the proportion of retired people in the population and a concordant decrease in the proportion of working-age individuals. In 50 years, the ratio between workers and retirees is expected to fall from 4:1 to 2:1. Such population aging will unavoidably have consequences for the economic and health sectors. Bulgaria is no exception to these demographic trends: it ranks seventh on the list of countries with the highest proportion of people over the age of 65, after Japan, Italy, Greece, Germany, Portugal and Finland. According to the National Statistical Institute, at the end of 2015, 20.4% of the Bulgarian population was over this age, and this percentage is increasing each year [2].
Aging is a genetic process that leads to a decline in the ability of the human body to maintain homeostatic balance. It is one of the most significant risk factors for a number of diseases as it leads to progressive degeneration of tissues and organs. The biological mechanisms behind the complex aging process are not yet fully understood, in humans or in model organisms. Studies of long-lived twins, including centenarian siblings, have shown that longevity has a strong genetic component [3]. However, only a handful of genetic factors have so far been repeatedly shown to be associated with extreme longevity [4,5].
Genetic association studies in centenarians from different populations have revealed that a variety of metabolic, cellular and tissue maintenance mechanisms influence aging [6,7]. Longevity has been shown to be affected by variations in the sequence or expression of genes related to the preservation of telomere length, DNA damage repair, tolerance to stress and heat-shock response, as well as the degree of accumulation or restriction of free radicals. In addition, clinical trials on centenarians have found that variation in genes involved in lipoprotein metabolism (e.g., APOE and APOB), cardiovascular homeostasis, immunity, and inflammatory processes may also contribute to prolonging life and preventing diseases.
Centenarians are individuals who, having attained the full extent of human longevity, present a unique opportunity to gain medical and genetic insight. Their genome is presumably devoid of pathological variants with significant penetrance and/or carries alleles that protect against various environmental risk factors. The centenarian genome can thus be considered the "gold standard" for establishing genetic factors predisposing to longevity. Whole-genome sequencing could facilitate uncovering some of these factors, and it might even unravel the physiological mechanisms impacting longevity. Decoding the molecular mechanisms that govern ageing could facilitate the development of strategies and therapies for prolonging healthy life expectancy, a key aspect of which is the preservation of cognitive abilities and physical activity. Extending the working age has major social implications, as older people gain the opportunity to be an active and contributing part of society.
Ageing is a complex process regulated by diverse physiological mechanisms, and longevity is determined by the combined effect of multiple genetic and environmental factors. A multifactorial approach is needed to analyze the complex network of interactions between different genes, as well as the contribution of each gene to longevity. Populations differ in genomic characteristics, and it is conceivable that centenarians in America could have different adaptive mechanisms than centenarians in Europe. Studying the genomes of long-lived individuals at the population level will contribute to a more in-depth understanding of genetic factors related to health and longevity.
The population frequency of centenarians is very low worldwide (0.006%) and there are marked differences between countries, e.g., 0.002% in India and as high as 0.048% in Japan. In Bulgaria, their frequency is 0.0036% [8].
The cost-effectiveness of sequencing pools of individuals (Pool-seq) underlies the popularity and widespread use of this method for many research questions, e.g., unravelling the genetic basis of complex traits. Pool-seq methods have been repeatedly shown to provide reliable estimates of allele frequencies [9]. Pool-seq also has the great advantage of allowing flexible re-analysis of new genes from the same data without the need for repeated sequencing.
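As a rough illustration of how pooled read counts stand in for allele frequencies, and of how sequencing depth affects the chance of detecting a rare allele, the following minimal sketch assumes that every chromosome in the pool contributes equally and that reads are sampled binomially without sequencing error. The function names, the two depths (30 and 250) and the threshold of three supporting reads are illustrative assumptions and are not taken from the study's pipeline.

```python
from math import comb


def allele_frequency(alt_reads, total_reads):
    """Pool-seq estimate: fraction of reads carrying the alternative allele."""
    return alt_reads / total_reads


def detection_probability(true_freq, depth, min_alt_reads):
    """Probability of observing at least min_alt_reads alternative reads
    at a site covered by `depth` reads, under binomial read sampling."""
    p_fewer = sum(
        comb(depth, k) * true_freq**k * (1 - true_freq) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# An allele carried by one of the 64 chromosomes in a pool of 32 diploid individuals.
singleton = 1 / 64
for depth in (30, 250):  # illustrative depths, assumed for this sketch
    print(depth, round(detection_probability(singleton, depth, 3), 3))
```

Under these assumptions the detection probability rises steeply with depth, which is the intuition behind sequencing pools at high coverage.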
The aim of the present study is to determine specific genetic loci that might be associated with longevity by analyzing exome pool-seq data of Bulgarian centenarians and a control group.

METHODS
The study was approved by the Ethics Committee of the Medical University of Sofia and was found to be in accordance with the requirements of national and international legislation for conducting research involving human subjects. To ensure full compliance with the principles of information regulations, personal data protection and the right to privacy, participants received prior information about the aim, objectives and methods used in the project, e.g., sampling, data analyses, etc., so that they could acquaint themselves with the details and make an informed decision about taking part. The subjects were also required to give signed consent before biological samples were taken.
Tissue samples (saliva or blood) were collected from a group of 32 unrelated Bulgarian centenarians (100-106 years old) from different geographical regions, selected to be capable of walking independently after the age of 90. The control group was composed of 61 young healthy individuals (25-30 years old), ethnically matched but unrelated, neither to each other nor to the subjects in the centenarian group. The subjects in this study were selected in such a way as to minimize the effects of the environment and of potential population stratification or admixture. The control subjects have survived the prevailing childhood diseases, yet the probability that they will reach extreme longevity is low, as they remain susceptible to lifestyle-related diseases and to diseases affecting middle-aged and older individuals.
The clinical team, with the assistance of gerontologists, designed a questionnaire to gather information about the subjects' lifestyle, medical history, neurological status, movement independence, cardiovascular disease, cancer, diabetes, etc. The questionnaire included questions on nutrition, tobacco and alcohol consumption and physical activity, alongside other potentially significant factors, e.g., social contacts, positive mood, periods of stress experienced, financial problems and the presence of long-lived family members.
DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen) and equimolar amounts of DNA were used to prepare the two pools. These were whole-exome sequenced using BGI v4 chemistry on a BGISEQ-500 platform (by BGI Genomics) at a mean coverage of 250x. Such high coverage is required for pool-seq sequencing to ensure that alleles with low frequency are also detected [10]. Variant calling was performed using GATK and the obtained VCF files were annotated using the web-based platform wANNOVAR [11]. Following the 'best practice' recommendations for pool-seq data [9], we performed robust filtering on variant calling: genotype quality ≥ 99, mapping quality ≥ 60, number of reads per MAF > 2, total depth of coverage above 30 and below 500. The number of variants annotated in both pools after applying these filters was 59,935 (52,870 SNPs and 7,065 indels). The number of allele reads for each variant was used to construct contingency tables, and Fisher's exact test was then used to evaluate the significance of the allele frequency differences. The allele frequency estimates obtained were compared with values taken from the publicly available resource for exome sequencing data, the Genome Aggregation Database (gnomAD) [12]. The false discovery rate (FDR) adjustment of Benjamini and Hochberg [13] was used to reduce the number of false positives. All statistical analyses were performed using R scripts [14].
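The statistical step described above (a 2x2 contingency table of allele read counts per variant, Fisher's exact test, then Benjamini-Hochberg adjustment) can be sketched as follows. This is a minimal Python illustration using only the standard library, not the R scripts used in the study; the function names, the variant labels and the read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two pools; columns are alternative and reference read counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x alternative reads in the first pool.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (alt, ref) read counts: (centenarian pool, control pool).
variants = {
    "variant_A": ((120, 130), (60, 190)),
    "variant_B": ((50, 200), (48, 202)),
}
names = list(variants)
raw = [fisher_exact_two_sided(*variants[n][0], *variants[n][1]) for n in names]
for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
    print(name, p, q)
```

In this made-up example the first variant has a much higher alternative-allele read fraction in the centenarian pool and receives a small adjusted p-value, whereas the second does not differ between pools.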
The following criteria were used to identify variants that are likely to be positively associated with longevity in Bulgarian centenarians: (1) significantly higher estimated frequency in centenarians compared to the control group; (2) prioritization of genes harboring the variant according to their molecular function, e.g., being part of a signaling network, or evidence for association with longevity in humans or model organisms; (3) evidence for interaction with variants in other genes known to be associated with longevity; and (4) the impact of the specific nucleotide change according to various software algorithms.

RESULTS
Centenarian lifestyle, living environment and medical history
Only a small proportion of the sampled Bulgarian centenarians follow a vegetarian diet (2%); 84% consume salty foods, 86% sugar-containing foods and 40% foods containing animal fat. They report that they consume fish, albeit only occasionally, in contrast to centenarians in the Mediterranean region. Overall, the diet of the Bulgarian centenarians is similar to the typical diet of the country's population. They are also generally of normal body habitus, consistent with excess body fat being a risk factor that reduces longevity. As many as 76% report the presence of long-lived family members, whereas the remaining 24% are uncertain. These data support the significance of genetic factors in determining human longevity. All centenarians have well-preserved memory function, and 93% report a positive attitude to life. Seven percent report that they are occasional smokers, in contrast to the country's general population, where 30% are regular smokers [15], and 63% state that they refrain from consuming alcohol, or do so only occasionally. All interviewed centenarians claim that they maintain moderate (62%) or even high (38%) physical activity that includes agricultural or domestic activities, sports, long walks, etc. This also contrasts with the general population, where 40% report moderate and only 9% high physical activity [15].

Whole exome pool-seq data
As an initial step in our analysis, we calculated the Pearson correlation coefficient between the estimated allele frequencies of the detected variants in the two pools (r = 0.89), as well as between the Bulgarian control pool and estimates for the non-Finnish European population (r = 0.93) [Figure 1].
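For reference, the correlation reported above is the ordinary Pearson coefficient computed over per-variant allele frequency estimates. The sketch below shows the calculation in plain Python; the function name and the five frequency pairs are hypothetical and serve only to illustrate the computation, not to reproduce the study's values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical allele frequency estimates for the same five variants in two pools.
centenarian_freqs = [0.10, 0.45, 0.80, 0.30, 0.62]
control_freqs = [0.12, 0.40, 0.75, 0.35, 0.50]
print(pearson_r(centenarian_freqs, control_freqs))
```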
Fisher's exact test identified a number of variants that differ significantly between the Bulgarian centenarian and control pools, and these are visualized on a Manhattan plot [Figure 2].
Of the 91 variants estimated to differ most significantly between the two pools, i.e., those above the line (FDR-adjusted P-value = 5.0 × 10⁻⁸) [Figure 2], 47 had significantly higher frequency in the centenarian pool (black dots above the identity line in Figure 3; Supplementary Table 1) and 44 in the control pool (blank dots below the identity line in Figure 3).
It can be speculated that variants found to have significantly higher frequency in the centenarian pool are positively associated with longevity, whereas variants found to have significantly lower frequency in the centenarian pool are negatively associated with longevity in the Bulgarian population. We used publicly available databases and performed a literature survey in order to establish the functional role of the genes with variants positively associated with longevity. As a result, three variants were nominated as putatively associated with longevity: rs2526374 in the RNF43 gene, rs956868 in the WNK1 gene and rs2276362 in the NADSYN1 gene [Table 1].

DISCUSSION
In this study, whole exome sequencing was performed on two DNA pools, one composed of Bulgarian centenarians and the other of young and healthy individuals serving as a control group. As an initial step in our analyses we compared the allele frequency estimates from the Bulgarian control group with estimates from the non-Finnish European population [12]. The high correlation coefficient of allele frequencies between the two groups testifies to the reliability of the pool-seq method used; at the same time, the correlation is not perfect (r = 0.93), indicating that a considerable proportion of variants differ markedly in frequency between the two populations. This inter-population heterogeneity calls for the establishment of population-specific allele frequency databases that centenarian exomes can be compared to.
Of approximately sixty thousand variants detected in both pools, 91 variants differed significantly in allele frequency between the two pools. Among them, 47 variants in 43 genes had higher frequency in the centenarian pool and could therefore putatively be associated with extreme human longevity, acting alone or in combination with environmental factors in the Bulgarian population. These variants have high estimated population frequencies [Supplementary Table 1], as is often the case with variants associated with complex traits, since such traits are shaped by many genetic factors of small effect. Of these 43 genes, only one (GSTZ1) is listed in the LongevityMap database, a comprehensive online database of longevity-associated genes and variants [16]. In order to establish common molecular mechanisms for these 43 genes, we performed pathway analyses using the ToppGene [17] and Reactome [18] platforms. We could not, however, establish any common significant pathways associated with this set of genes. Based on their known molecular function established from a literature survey, we nominated three variants in genes linked to familiar genetic pathways for longevity. Below we discuss each of these genes and variants, and consider the mechanisms through which they might extend longevity in Bulgarian centenarians.

RNF43 (rs2526374)
The protein encoded by this gene is thought to negatively regulate Wnt signaling, an important, evolutionarily conserved molecular pathway involved in cell proliferation, tissue homeostasis and maintenance of stem cells in adults. As organisms age, proliferating cells stop dividing and stem cells are lost, but Wnt signaling counteracts this by maintaining stem cells in their niches, with positive effects on neurogenesis and bone regeneration [19,20]. Mutations in the Wnt signaling pathway have been shown to lead to a variety of developmental defects in animals [21]. Mutations in the RNF43 gene have been reported in multiple tumor types, including colorectal and endometrial cancers [22]. Identifying evolutionarily conserved genes, pathways and mechanisms involved in regulating lifespan and life history is a central goal of aging research and has direct implications for human health, not least because most mechanistic aging research is carried out in model organisms.

WNK1 (rs956868)
The protein encoded by this gene is a serine/threonine kinase that plays a major role in cell signaling, proliferation and cell survival. This gene is a downstream effector of the insulin/IGF-1 signaling pathway, which has been shown to regulate longevity in various model organisms. WNK1 has also been shown to regulate the activity of the longevity-associated transcription factor FOXO4 [23], which regulates lifespan extension, tumor suppression, and energy metabolism [24]. The encoded protein may also be a key regulator of blood pressure by controlling the transport of sodium and chloride ions. Specifically, the rs956868 SNP in the WNK1 gene is a missense variant that has been shown to be associated with high blood pressure [25,26].

NADSYN1 (rs2276362)
The gene NADSYN1 regulates NAD+, an important cofactor that plays a central role in metabolism, best known as a coenzyme in redox reactions and as a signaling molecule. NAD+ levels fall with age, and the level of NAD+ may itself be a determinant of extended longevity [27]. The NADSYN1 gene has been shown to regulate longevity in model organisms [28]. The promoter of NADSYN1 has a FOXO3A recognition sequence and NADSYN1 transcription is promoted through phosphorylation of FOXO3A [29], one of the handful of genes repeatedly shown to be associated with longevity [7]. The rs2276362 SNP has been shown to be linked to higher vitamin D status in the human body [30], which in turn has been shown to promote protein homeostasis and hence also longevity [31].
In conclusion, the results of this study reveal new aspects of the complex genetic regulation of longevity and suggest 47 new variants that could potentially be associated with longevity in the Bulgarian population, as these are found at significantly higher allele frequency in the Bulgarian centenarian sample. Based on their assigned functional role, three of these genes were further investigated. These genes, RNF43, WNK1 and NADSYN1, are involved in evolutionarily conserved processes with well-ascertained association with longevity, i.e., the Wnt signaling pathway, the insulin/IGF-1 signaling pathway and redox balancing processes, respectively. Genes containing new longevity variants are linked to major subnets of longevity genes. This is only a pilot study, and these results will be augmented with data from whole genome sequencing of the same subjects that we plan to perform in the near future.

Authors' contributions
Made substantial contributions to conception and design of the study and performed data analysis and interpretation: Serbezov D, Balabanski L, Karachanak-Yankova S, Toncheva D
Performed data acquisition, as well as provided administrative, technical, and material support: Vazharova R, Nesheva D, Hammoudeh Z, Staneva R, Mihaylova M, Damyanova V, Antonova O, Nikolova D, Hadjidekova S

Availability of data and materials
Data used for the analyses in this article can be provided by the authors upon request.

Financial support and sponsorship
The Bulgarian centenarians project was funded by the National Science Fund of Bulgaria (DN 03/7/18.12.2016), and by the Bulgarian Ministry of Education and Science under the National Program for Research "Young Scientists and Postdoctoral Students".