EHMT1 pathogenic variants and 9q34.3 microdeletions share altered DNA methylation patterns in patients with Kleefstra syndrome

Aim: Kleefstra syndrome (KS) is a rare neurodevelopmental disorder caused by haploinsufficiency of the euchromatic histone lysine methyltransferase 1 gene, EHMT1, due to either a submicroscopic 9q34.3 deletion or a pathogenic EHMT1 variant. KS is characterized by intellectual disability, autistic-like features, heart defects, hypotonia and distinctive facial features. Here, we aimed to (1) identify a unique DNA methylation signature in patients with KS, and (2) demonstrate the efficacy of DNA methylation in predicting the pathogenicity of copy number and sequence variants. Methods: We assayed genome-wide DNA methylation at > 850,000 CpG sites in the blood of KS patients (n = 10) carrying pathogenic variants in EHMT1 or 9q34.3 deletions, as compared to neurotypical controls (n = 42). Differentially methylated sites were validated using additional KS patients (n = 10) and controls (n = 29) to assess specificity and sensitivity of these patterns. Results: The DNA methylation signature of KS demonstrated high sensitivity and specificity; controls and KS patients with a confirmed molecular diagnosis were classified correctly. In additional individuals with EHMT1 alterations, including frameshift or missense variants and partial gene duplications, DNA methylation classifications were consistent with clinical presentation. Furthermore, genes containing differentially methylated CpG sites were enriched for functions related to KS features, including heart formation and synaptic activity. Conclusion: The KS DNA methylation signature did not differ between patients with deletions and those with variants, supporting haploinsufficiency of EHMT1 as the likely causative mechanism. Beyond this finding, it provides new insights into epigenetic dysregulation associated with KS and can be used to classify individuals with uncertain genomic findings or ambiguous clinical presentations.


INTRODUCTION
Epigenetic regulators, including chromatin remodelers and enzymes that write, read or erase epigenetic marks, are essential to healthy human development [1,2]. Chromatin states and epigenetic patterns play a key role in regulating transcriptional profiles specific to cellular identity and developmental timing [3,4]. This is especially important during neurodevelopment, in which developmental processes necessitate a complex orchestration of gene expression and environmental signals [5,6]. Recent research into the role of epigenetic regulators in human disease has demonstrated that genes encoding this machinery, termed "epigenes", are linked to Mendelian neurodevelopmental disorders (NDDs) [2,7]. To date, approximately 70 epigenes have been implicated in 82 distinct conditions, many of which are characterized by intellectual disability (ID) and growth dysregulation [8,9].
At the molecular level, our group and others have found that these NDDs are characterized by aberrant DNA methylation (DNAm) patterns [10,11]. DNAm refers to the addition of a methyl group to cytosine, typically in the context of a cytosine-guanine dinucleotide or CpG. CpG dense regions, known as CpG islands, are found at 70% of gene promoters; in this context, DNAm is usually a repressive mark, corresponding to the silencing of gene activity [12]. DNAm found in enhancers, gene bodies and intergenic regions has a more complex relationship with gene activity and may associate with the repression or activation of genes. Importantly, there is cross-talk between epigenetic marks, such as DNAm and histone 3 lysine 9 (H3K9) trimethylation, which exhibit spatial and temporal co-localization [13,14]. As such, mutations in epigenes encoding enzymes involved in regulation of histone modifications and chromatin packaging often result in downstream DNA methylation alterations in patients carrying these mutations [10,11,15-17]. These altered DNAm patterns reflect genome-wide transcriptional changes at the downstream target genes that constitute the molecular underpinnings of the pathophysiology of the associated NDD. We therefore hypothesize that, through a similar cross-talk mechanism between DNAm and H3K9 methylation, pathogenic variants in EHMT1 can affect DNAm patterns.
Kleefstra syndrome or KS (MIM 610253) is caused by haploinsufficiency of the euchromatic histone lysine methyltransferase 1 gene, EHMT1 [18,19]. EHMT1 encodes a histone methyltransferase that catalyzes mono- and dimethylation of H3K9. As an important transcriptional repressor, EHMT1 is expressed in most human tissues and is overexpressed in a variety of human cancers [20]. KS is caused by either a heterozygous 9q subtelomeric deletion, which overlaps with part or all of EHMT1 (50%), or a heterozygous pathogenic variant in EHMT1, including frameshift, missense and nonsense mutations (50%) [21,22].
Features of KS include moderate to severe ID, childhood hypotonia, seizures, heart defects and characteristic facial features including brachy(-micro)cephaly, synophrys, cupid bowed upper lip and prominent jaw [18,23-25]. Urogenital and renal complications, psychiatric disorders, and features of autism spectrum disorder (ASD) are also often present. Males and females are affected equally [22]. There is some evidence of genotype-phenotype correlation in that individuals with EHMT1 pathogenic variants and those with a small 9q34.3 deletion (< 1 Mb) have similar clinical findings, whereas individuals with larger deletions (> 1 Mb) can have more severe ID and more medical problems [19,21,26].
Here, we report the differential DNAm patterns associated with KS. We use these patterns to derive a predictive model to classify individuals with EHMT1 sequence variants of uncertain significance or ambiguous clinical presentations. We found that patients with KS caused by either 9q34.3 microdeletions or pathogenic EHMT1 variants exhibit a specific DNAm signature that is distinct from both typically developing controls and individuals with similar epigene-related disorders. In addition, these differentially methylated CpG sites map to genes enriched for heart and brain development.
KS patients were divided into groups based on age, diagnosis and molecular findings [Table 1] and availability of clinical phenotype data; importantly, the frequency of deletions and variants in this sample is reflective of the reported frequency in the greater population of KS cases. The first group encompassed individuals over the age of one year with a confirmed clinical diagnosis of KS and a pathogenic EHMT1 sequence variant or 9q34 deletion reported by a molecular diagnostic laboratory. These individuals were separated into discovery and validation groups (n = 10 and n = 5, respectively), such that molecular underpinnings, sex, and age were represented within each group. For all patients in the discovery cohort, clinical and molecular diagnoses were defined by certified clinical and molecular geneticists, respectively. Individuals younger than 1 year represented an additional validation group (n = 5), and individuals with EHMT1 variants but little or no phenotypic information available at the time of data analysis (n = 5) represented an "unknown" test group. Two additional individuals who both carried duplications of chromosome 9q34 were also included in analyses. The presence of specific KS features, such as hypotonia, microcephaly, or heart defects, was limited to a subset of KS patients, and as such, these features were not included as variables in the statistical analysis. Also, since only one individual with KS, patient KS17_I, carried a deletion greater than 1 Mb, deletion size was not considered in the analysis.

Research participants
Banked DNA samples from age- and sex-matched neurotypical participants (n = 42) were included as a control group. These individuals were recruited from the Hospital for Sick Children and were deemed typically developing by physician or parental questionnaires. DNA methylation data for individuals with Nicolaides-Baraitser syndrome, a neurodevelopmental disorder (NDD) caused by pathogenic variants in SMARCA2, were downloaded from the Gene Expression Omnibus database, accession number: GSE125367 [11]. These individuals were included as an additional "control" group to assess whether the DNAm patterns are specific to KS, since Nicolaides-Baraitser syndrome has similar features and is also caused by epigenetic dysregulation.

DNAm data generation and preprocessing
Genomic DNA was extracted from peripheral blood samples and bisulfite converted using the protocols described in Chater-Diehl et al. [11]. Converted DNA was then assayed for DNAm levels on the Illumina Infinium MethylationEPIC array (EPIC array; > 850,000 CpG sites) at The Center for Applied Genomics. Raw data were then processed in R statistical software, using the package minfi [27]. Quality control measures included removing probes that failed the detection P-value threshold, meaning the signal was not significantly above background noise, as well as probes mapping to X and Y chromosomes, cross-reactive probes and SNP probes [28,29]. All criteria and methods for pre-processing are fully described in Chater-Diehl et al. [11]. Following these steps, data underwent background signal subtraction and control normalization, also using minfi [29]. The normalized data consisted of 774,583 methylation sites or CpGs for each sample. DNAm, measured in β values, ranges from 0 to 1, representing the proportion of methylation (0-100%).
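To make the β value and the probe filtering step concrete, the following is a minimal Python sketch rather than the minfi pipeline used in this study. It assumes the commonly used definition β = M / (M + U + offset) with an offset of 100, and a detection P-value cutoff of 0.01; neither value is stated in the text above, and the function and probe names are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Return the proportion of methylation from the two probe intensities.

    The offset (assumed here to be 100) stabilizes the ratio when both
    intensities are low.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


def filter_probes(detection_p, sex_chrom_probes, threshold=0.01):
    """Keep probes that pass detection in every sample and are autosomal.

    detection_p maps a probe ID to its per-sample detection P-values.
    The 0.01 threshold is an assumption for illustration only.
    """
    kept = []
    for probe, p_values in detection_p.items():
        if probe in sex_chrom_probes:
            continue  # drop probes mapping to the X or Y chromosome
        if all(p < threshold for p in p_values):
            kept.append(probe)
    return kept
```

For example, intensities of 300 (methylated) and 600 (unmethylated) give β = 300 / 1000 = 0.3, i.e., roughly 30% methylation at that site.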

DNAm signature derivation
Prior to statistical analysis, underlying proportions of monocytes, neutrophils, CD4T, CD8T, natural killer cells and B cells were estimated from the DNAm data using the Houseman algorithm [30]. At each CpG site, a two-group comparison of KS discovery cases vs. controls was performed using limma regression, accounting for sex, age, batch and estimated blood cell proportion covariates [31]. CpG sites found to be differentially methylated between cases and controls were reported if they met both a statistical significance threshold [false discovery rate (FDR)-corrected P-value < 0.01] and a minimum effect size (absolute Δβ > 10%). Δβ represents the difference in average DNAm (β) between groups. Principal component analysis (PCA) and hierarchical clustering were generated using Qlucore Omics Explorer (QOE, www.qlucore.com).
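The two reporting thresholds can be illustrated with a short Python sketch. It is not the limma analysis itself: it takes per-site P-values and group mean differences as given, and it assumes the FDR correction is the Benjamini-Hochberg procedure, which the text does not specify. All names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_signature(sites, p_values, delta_beta, fdr=0.01, min_effect=0.10):
    """Keep sites with adjusted P < fdr and absolute delta-beta > min_effect.

    delta_beta is the case-minus-control difference in mean beta,
    expressed as a proportion (0.10 corresponds to 10%).
    """
    q_values = benjamini_hochberg(p_values)
    return [
        site
        for site, q, d in zip(sites, q_values, delta_beta)
        if q < fdr and abs(d) > min_effect
    ]
```

Requiring both criteria means that a site with a very small P-value but a 5% methylation difference is excluded, as is a site with a large difference that does not survive multiple-testing correction.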

SVM model classification
Statistically significant CpG sites, i.e., the DNAm signature, were used as input into a machine-learning algorithm, support vector machine (SVM), to generate a predictive classification model. To remove noise and to filter out information that did not improve the efficacy of the model, we first removed redundant sites. Any methylation site that was highly correlated (r > 0.9) with any other site was removed, leaving 429 CpG sites. We then built an SVM model using the R package caret [32] (for details of model training and validation, see Butcher et al. [10]). The classification model generated by SVM was then applied to all remaining samples. The output of this model was a probability score indicating the likelihood of having KS or a genomic alteration that causes KS.
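The redundancy filter described above can be sketched in Python as follows. This is an illustration only: it assumes a greedy rule in which a site is dropped when its Pearson correlation with an already retained site exceeds 0.9, so one member of each highly correlated group is kept. The exact rule applied in the study (implemented in R with caret) may differ, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of beta values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant site carries no correlation information
    return sxy / sqrt(sxx * syy)


def drop_redundant(site_values, cutoff=0.9):
    """Greedily keep sites whose r with every kept site is <= cutoff.

    site_values maps a CpG ID to its beta values across samples.
    """
    kept = []
    for site, values in site_values.items():
        if all(pearson(values, site_values[k]) <= cutoff for k in kept):
            kept.append(site)
    return kept
```

Because the rule is greedy, the result depends on the order in which sites are visited; sorting sites by effect size beforehand would be one way to make the retained set deterministic and biased toward the strongest sites.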

GO analysis
Gene ontology (GO) enrichment analysis was performed on the KS signature sites using GREAT (Genomic Regions Enrichment of Annotations Tool) [33]. We used a custom "background" that included all 774,583 CpG sites that passed quality control. "Basal+extension" was used to identify associated genes, using the following modified parameters: constitutive 5.0 kb upstream and 1.0 kb downstream, up to 10.0 kb maximum extension. We also refined the output by requiring that significant terms contain two or more gene hits.
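As a rough illustration of what these parameters imply, the Python sketch below checks whether a CpG falls inside the widest window a gene could be assigned: 5 kb + 10 kb = 15 kb upstream and 1 kb + 10 kb = 11 kb downstream of the transcription start site (TSS). It is a simplification and not GREAT's implementation: it assumes the extension is added to the basal domain and ignores the truncation of a gene's domain at neighbouring genes' basal domains. The function name and coordinates are hypothetical.

```python
def within_max_domain(cpg_pos, tss, strand,
                      basal_up=5_000, basal_down=1_000, max_ext=10_000):
    """Return True if cpg_pos lies in the widest possible regulatory window.

    Offsets are strand-aware: negative values are upstream of the TSS,
    positive values are downstream.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    offset = cpg_pos - tss if strand == "+" else tss - cpg_pos
    return -(basal_up + max_ext) <= offset <= basal_down + max_ext
```

A site that fails this check for every gene would be treated as intergenic under these settings.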

Identifying a DNA methylation signature for Kleefstra syndrome
To define a DNAm signature associated with KS, DNA from KS patients and neurotypical controls was extracted from blood and assayed using the EPIC array, generating high-quality measurements at 774,583 CpG sites. Ten unrelated individuals with a confirmed clinical diagnosis of KS, samples KS1_T-KS10_T, and pathogenic variants in EHMT1 or microdeletions of 9q34.3, which included partial or full deletions of EHMT1 (n = 3 and n = 7, respectively; n = 6 females; age 1-25 years) were compared to 42 neurotypical controls (n = 21 females; age 1-28 years). Since we combined data from patients with pathogenic variants in EHMT1 and those with 9q34.3 microdeletions, our analyses identified DNAm changes common to both molecular causes (analysis of this cohort stratified by molecular alteration did not identify differential methylation; see below). Figure 1 shows the genomic position of all deletions and variants, converted to hg38 genome build. For patients KS5_T and KS7_T, deletion coordinates were not available. KS5_T carried a 0.4-Mb deletion and KS7_T carried a 0.32-Mb deletion, both of which overlapped EHMT1.
Differentially methylated CpG sites were identified using a linear regression which included sex, age, blood cell proportions and array batch as covariates. There was no significant difference in estimated blood cell proportions between groups (all FDR-corrected P-values > 0.05). A total of 598 CpG sites differed significantly between individuals with KS and controls at an FDR-corrected P-value < 0.01 and a minimum mean difference between groups (Δβ) of 10% [Supplementary Table 1]. Also, no CpG site was significantly associated with age (FDR P-value > 0.05). The 598 CpG sites constituted a "DNAm signature" of KS. Both hierarchical clustering and principal component analysis of DNAm values at the signature sites clearly distinguished the KS case group from neurotypical controls [Figure 2]. Of note, 484 signature sites (81%) were hypomethylated in KS patients as compared to controls. Furthermore, CpG sites mapping to CpG islands, commonly found at gene promoters, were significantly underrepresented as compared to all CpG sites that passed quality control [Supplementary Figure 1].

Ontology of KS signature genes
A number of genes containing multiple differentially methylated CpG sites or "signature sites" functioned in pathways related to the KS phenotype. These included: ELAVL4 (4 promoter CpGs; mean Δβ = -11.4%), involved in neuron-specific RNA processing; glutamate receptor GRIA1 (2 promoter CpGs; mean Δβ = -11.4%); CHRND (2 promoter CpGs; mean Δβ = -11.4%), a muscle acetylcholine receptor subunit, involved in neuromuscular transmission; and PDE4D. Pathogenic variants in the latter are associated with acrodysostosis 2, a skeletal disorder characterized by facial anomalies and ID [34]. PDE4D contained two hypomethylated CpGs approximately 110 bp apart and located within 1.5 kb upstream of the transcription start site, cg18804667 (Δβ = -16.1%) and cg00322656 (Δβ = -10.7%). As such, we performed gene ontology (GO) analysis on the 598 signature sites using GREAT [33]. GREAT identified 130 genes proximal to the signature sites, which met the criteria of less than 15 kb upstream or 11 kb downstream of the gene's transcription start site, revealing that the majority of signature sites (408) were intergenic. Enriched terms or pathways in cellular components and human phenotype ontology are listed in Table 2. The most prominent finding was the identification of pathways and processes involved in neuronal and synaptic function across all ontologies.
Figure 1 (caption): Variants are colored by analysis group: discovery (n = 8; 2 not depicted), validation older than 1 year (n = 4; 1 not depicted), validation samples younger than 1 year (n = 4; 1 not depicted), unknown test samples with no phenotype information and partial EHMT1 duplication (n = 5 and n = 2, respectively). Genomic coordinates were not available for KS15_T, KS15_T, KS15_V, and KS20_I.
The single enriched human phenotype was hypoplastic heart. Congenital heart defects are a core feature of KS, including one reported severe presentation of KS with hypoplastic left heart syndrome [35]. Two genes containing differentially methylated CpG sites, CHRND and DTNA, were annotated to this term.

Independent validation of Kleefstra syndrome DNA methylation signature
Using the KS DNAm signature, we developed a machine learning classification model capable of categorizing individuals as positive or negative for KS on the basis of their DNAm levels at signature sites.
We trained an SVM model on data from the KS discovery group (n = 10) and neurotypical controls (n = 42) used to derive the signature. To test the specificity of the KS DNAm signature, we assessed an additional 29 neurotypical controls (n = 14 female, age 1 month to 16 years), all of which were classified as negative for KS (i.e., with controls), demonstrating 100% specificity of the signature [Figure 3]. Additionally, we assessed whether the KS DNAm signature could be used to classify patients (n = 8) with Nicolaides-Baraitser syndrome (NCBRS), a neurodevelopmental disorder with some clinical features that overlap KS, including ID, ASD and seizures, but with distinct facial characteristics. NCBRS is caused by haploinsufficiency of SMARCA2, which encodes a protein that is part of another epigenetic regulator (SWI/SNF chromatin remodeling complex). Pathogenic variants in SMARCA2 have been shown to be associated with a distinct DNAm signature [11]. All eight samples (n = 5 female, age 4-15 years) were classified as negative for KS, with controls, providing further evidence of the specificity of the KS DNAm signature.
Finally, we tested two individuals carrying partial duplications of EHMT1, both of which mapped to the last two exons of the gene [26,27]. Duplications with similar boundaries in individuals with variable phenotypes, including ID and dysmorphic features, have been previously reported as benign [36]. Both patients, Dup1 and Dup2, were classified as negative for KS. Patient Dup1 had dysmorphic features and also carried an unbalanced 31.8-Mb complex rearrangement on chromosome 6p, as well as a microdeletion on 6q27 [37]. DNAm-based classification of these individuals suggests that neither 9q34.3 CNV disrupted EHMT1 gene function in a way that resulted in haploinsufficiency of the EHMT1 protein.
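The sensitivity and specificity figures discussed in this section follow directly from the classifier's probability scores once a decision threshold is fixed. The Python sketch below shows the arithmetic; the threshold of 0.5 and the example scores are assumptions for illustration and are not values reported in this study.

```python
def sensitivity_specificity(scores, is_case, threshold=0.5):
    """Compute (sensitivity, specificity) from probability scores.

    A sample is called KS-positive when its score is >= threshold.
    Returns None for a metric whose denominator is zero.
    """
    tp = fn = tn = fp = 0
    for score, case in zip(scores, is_case):
        predicted_positive = score >= threshold
        if case and predicted_positive:
            tp += 1
        elif case:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

With 29 controls that all score below the threshold, there are 29 true negatives and no false positives, so specificity is 29 / 29 = 100%; sensitivity is undefined for a control-only set and must be estimated from the KS validation cases.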

EHMT1 variant classification
Having illustrated the efficacy of the KS DNAm signature in appropriately classifying individuals with a clinical diagnosis of KS, we next assessed five individuals with EHMT1 variants for whom we had limited or no phenotypic data available at the time of analysis, samples U1-U5. Following classification, if phenotype information was available (beyond an ASD diagnosis), it was accessed. Four of these individuals, samples U1, U3, U4 and U5, had a diagnosis of ASD and had undergone whole-genome sequencing; the remaining individual, U2, had no phenotypic or clinical information available but carried an EHMT1 variant identified by targeted EHMT1 testing. The variants identified in these five patients were assessed for predicted pathogenicity using Alamut variant annotation software, which applies multiple prediction algorithms [Table 3], and were found to be novel in gnomAD (v3).
Using the KS DNAm signature, the two patients carrying frameshift mutations, U1 and U2, were classified as positive for KS. Following classification, additional clinical information for U1 was obtained and included mild ID, pulmonary stenosis, genital malformations, dysmorphic facial features and mild hypotonia. Such features support a clinical diagnosis of KS. No additional phenotypic information was available for U2. However, given that this individual underwent targeted EHMT1 gene testing, it is likely that her healthcare providers had a high clinical suspicion of KS.
All three individuals with missense variants, U3-U5, were classified as negative for KS [Table 3]. For patient U3, this classification was supported by all pathogenicity prediction algorithms. This individual was reported to have ASD, macrocephaly, and obesity; macrocephaly is not typically reported in KS. For patients U4 and U5, sequence-based pathogenicity predictions were inconsistent. Clinically, patient U4 was described as having ASD, obesity and asthma. Patient U5 was diagnosed with PDD-NOS (pervasive developmental disorder, not otherwise specified), with no history of motor delay or growth abnormalities (50th percentile for height, weight and head circumference at approximately three years). Both phenotypes were inconsistent with KS clinical features, thus supporting non-KS classification.

Comparing DNAm patterns in individuals with EHMT1 variants and 9q34.3 microdeletions
Using all individuals who were classified as positive, we next assessed whether DNAm patterns varied by molecular finding. We compared individuals with EHMT1 variants (n = 6) and deletions (n = 16) at all 774,583 CpG sites and found no CpG sites to be differentially methylated between these two groups of KS individuals (all FDR P-values > 0.05). Furthermore, average methylation changes compared to controls, measured as Δβ, were comparable between individuals with variants and deletions [Figure 4], further supporting EHMT1 haploinsufficiency as the underlying cause of the KS-associated DNAm signature.

DISCUSSION
We identified a genome-wide DNAm signature associated with haploinsufficiency of the EHMT1 gene product in the peripheral blood of individuals with KS. The signature enabled the classification of both 9q34.3 microdeletions encompassing all or part of EHMT1 and pathogenic EHMT1 variants. Of the 598 signature sites identified, 81% exhibited a loss of methylation in individuals with KS as compared to controls. EHMT1 acts as an H3K9 methyltransferase, typically depositing repressive marks, H3K9me1/2, in euchromatin. However, it has also been shown to methylate non-histone targets including DNA ligase 1 (LIG1), which, once methylated, plays a role in recruiting DNMT1 to hemi-methylated DNA during replication [38]. More specifically, the methylated LIG1 protein more readily binds UHRF1; this binding event recruits UHRF1 to replication sites, where it binds hemi-methylated DNA and promotes maintenance of DNA methylation by DNMT1 [38]. Furthermore, mouse embryonic stem cells in which Uhrf1 has been knocked down progressively lose DNAm [39,40]. Therefore, we propose that the loss of DNAm observed in individuals with KS may be a consequence of dysregulated DNA methylation maintenance by DNMT1 due to EHMT1 haploinsufficiency. We also demonstrated that individuals with deletions and variants had indistinguishable DNAm patterns genome-wide. This has previously been reported in other NDDs associated with epigenetic regulators including: pathogenic variants in ARID1B and 6q25.2 deletions in Coffin-Siris syndrome; pathogenic variants in NSD1 or 5q35.3 deletions in Sotos syndrome; and pathogenic SETD1B variants and 12q24.31 deletions in SETD1B-related syndrome [15,16,41].
Table 3. Predicted pathogenicity of variants in subjects with limited phenotypic information. Footnotes: 1, least likely deleterious; FS: frame shift.
An important aim of the present study was to generate a broad KS signature that captured the underlying pathophysiological effects of EHMT1 haploinsufficiency on the epigenome, in addition to comparing the impact of CNVs vs. SNVs. To that end, our reported KS signature sites differed somewhat from a DNAm signature of KS recently reported as part of a bioinformatics pipeline of 34 signatures designed to uniquely classify NDDs [42]; this signature was developed using 15 patients with KS, including three individuals with 9q34.3 CNVs, 11 individuals with EHMT1 SNVs, and one individual with a clinical diagnosis of KS but no molecular data. Importantly, the reported signature was constrained to 107 CpGs to reduce "noise" and redundancy, and to optimize output for potential use in diagnostic testing [42]. Of the 107 CpG sites in this signature, 28 overlapped with the 598 sites in the signature reported here [Supplementary Table 1]. This difference can, in part, be attributed to the platforms used to generate and validate the respective signatures. The signature presented here was generated and validated using only EPIC array data, as compared to Aref-Eshghi et al. [42], who analyzed data from both the EPIC array (assays ~850,000 CpGs) and Infinium HumanMethylation450 array (450K array; assays ~450,000 CpGs, 90% of which are represented on the EPIC array). Therefore, all 107 CpG sites in their signature are present on the 450K array [42]. Of our 598 signature sites, approximately half (317) are not represented on the 450K array. Beyond this technical difference, different statistical methods were employed for signature generation. The Aref-Eshghi et al. [42] signature was generated by initially ranking CpG sites on an interaction between P-value and effect size, with no required minimum P-value or multiple testing correction; the top 1000 CpG sites then underwent further analysis, including receiver operating characteristic curve analysis, to select a final set of 100-150 CpGs [42]. Our signature sites were required to meet stringent significance and effect size thresholds (FDR-corrected P-value < 0.01, absolute Δβ > 10%), with no restrictions on signature size. The differences in methods and outcomes highlight two important and overlapping applications of DNAm data, i.e., understanding pathophysiology and use of DNAm data in diagnostics.
GO analysis of the genes identified by our KS signature sites recognized enriched functions and pathways related to KS pathophysiology. Several processes related to neuronal and synaptic function were identified, relevant to the high frequency of ID observed in patients with KS. The top term for GO biological processes was "homophilic cell adhesion via plasma membrane adhesion molecules". This term was enriched due to signature sites mapping to CDH4, CDH5, and seven γ-protocadherin genes. Protocadherins are neuronal cell surface proteins serving several functions including avoidance of dendritic self-synapsing by conferring self-identity [43]; these genes are crucial for normal synaptic development. Aberrant epigenetic regulation of these genes has also been associated with many NDDs, including Down syndrome and Williams-Beuren syndrome [44,45]. Furthermore, epigenetic dysregulation of protocadherins has been previously implicated in KS pathophysiology; brain tissue of Ehmt1+/- mice displays increased H3K9 methylation at protocadherin genes that exhibit dysregulated expression [46]. While both DNA methylation loss and H3K9me2/3 gain support the role of protocadherin dysregulation in KS, we propose that these outcomes may be paradoxically independent, since EHMT1 has been shown to silence transcription by independently acting on both H3K9 and DNAm [47]. In keeping with this model, loci with the greatest changes to H3K9 methylation in Ehmt1+/- mice showed no consistent DNAm changes [46]. Future studies in human cells are necessary to show whether changes to H3K9 and DNAm marks in individuals with KS are indeed uncoupled and result from different EHMT1 functions, as proposed here. Such studies would also greatly benefit from expression assays in relevant tissue types to directly measure the functional consequences of these dysregulated epigenetic patterns.
In addition to the 10 KS patients in the validation groups, including five infants, who were positively classified by our DNAm signature, we tested an additional seven individuals of interest. Within the validation cases, individuals did not cluster by age [Supplementary Figure 2]; this suggests that the KS signature is not strongly affected by age and is likely independent of the dynamic DNAm changes that occur in the first year of life, which occur in part due to normal developmental shifts in blood cell composition [48]. The remaining seven individuals tested all carried genomic variants at 9q34.3: two individuals with partial duplications of EHMT1 and five with single nucleotide variants but limited or no clinical information available. Both patients with duplications were classified as negative for KS, with controls. Such copy number variants are commonly reported as uncertain, as their impact on gene function cannot be ascertained via cytogenetic analysis. On the basis of DNAm classifications, we suggest that the clinical phenotypes of these individuals are likely not related to their EHMT1-associated CNVs; however, a functional protein assay in these cases would be valuable to confirm our findings.
Patient U1 was one of five individuals with limited or no clinical information available. This individual had a diagnosis of ASD and had undergone whole-genome sequencing, which identified a de novo EHMT1 frameshift variant. Following positive classification of this individual, we learned that he exhibited many features of KS, including mild ID, pulmonary stenosis, dysmorphic facial features and mild hypotonia. This finding speaks to the potential of DNAm signatures for clinical translation, i.e., to be used in concert with genome or exome sequencing to enhance interpretability of genome diagnostics.
A considerable strength of previously reported DNAm signatures is their utility in predicting the pathogenicity of variants of uncertain significance (VUS) [10,11,15]. Pathogenic missense variants are rare but present in KS [49]. Thus, we sought to assess the utility of the KS signature in classifying missense variants. We included one patient who carried a pathogenic missense variant, P809R, in the discovery group used to derive the DNAm signature, and therefore expected the KS DNAm signature to have clinical utility in this regard. Of note, this variant was predicted to be pathogenic/damaging in Polyphen (1.000), Align GVGD (C65), and SIFT (0). We included three individuals with EHMT1 missense variants in the "test" group and found that each was classified as negative for KS, with controls. The clinical information for these individuals, although limited, was consistent with the DNAm signature prediction.
Although Kleefstra syndrome has a clinically recognizable phenotype, affected individuals exhibit a range of cognitive and behavioral characteristics. Currently, there is little understanding of the relationship between genotype and phenotype. The complex genotype-phenotype relationship in KS will require further study using a large KS cohort with well characterized phenotypes. Our work presented here demonstrates that EHMT1 variants and 9q34.3 deletions share a DNAm signature, further supporting the underlying cause of KS as EHMT1 haploinsufficiency. Further epigenetic research in KS has the potential to elucidate the relationship between genotype and phenotype by refining the DNAm signature and identifying DNAm alterations associated with specific features of KS or specific molecular variants. Therefore, building upon this work to identify genes with altered regulation and expression patterns in KS will provide novel insights into the molecular pathophysiology of this disorder.