Exome sequencing identifies FLNC and ADD3 variants in a family with cardiomyopathy

Aim: Idiopathic cardiomyopathy is often genetic in origin, typically autosomal dominant, and restrictive cardiomyopathy (RCM) is the rarest form. Clinically, RCM prognosis is poor with most patients requiring heart transplant due to impaired diastolic function leading to heart failure. In some cases, desminopathy is also observed, whereby desmin protein aggregates in the myocardium. Many genes are known to be involved in cardiomyopathy, and we sought to find the pathogenic mutation of a four-generation family with RCM and desminopathy. Methods: We employed whole exome sequence analysis of four RCM patients from the family, to identify the underlying pathogenic mutation(s). Results: Analysis of the exome data led to identification of two putative pathogenic variants. One, a nonsense mutation in ADD3, encoding adducin 3, was found in the proband and his affected daughter, but not other family members. The other was a missense mutation (G1546S) in the gene encoding filamin C (FLNC), found in all of the affected individuals. Both of these proteins interact with proteins involved in sarcomere function, and are expressed in both heart and skeletal muscle. FLNC has recently been implicated as a cardiomyopathy gene, but ADD3 has not at this point. Conclusion: We present bioinformatic analyses that suggest that the FLNC variant is the major pathogenic variant affecting this family, but cannot rule out some contribution of the ADD3 mutation in at least two patients.


INTRODUCTION
Restrictive cardiomyopathy (RCM) is a cardiac disease in which the diastolic function of the heart is impaired by increased myocardial stiffness, decreased ventricular filling and biatrial enlargement [1-3]. Prognosis is very poor, often requiring cardiac transplantation as a result of heart failure. RCM is the rarest of the classical types of cardiomyopathy, accounting for approximately 5% of all primary cardiomyopathy cases [4-6]. The other primary cardiomyopathies are classified on a morphological basis as hypertrophic (HCM), dilated (DCM), and arrhythmogenic right ventricular (ARVCM). In RCM, there is diastolic dysfunction but the overall size of the heart is typically normal, with minor enlargement of the atria. Ventricular enlargement and other features can also occur, overlapping with features of the other primary cardiomyopathies. This phenotypic overlap demonstrates the clinical complexity of cardiomyopathy, and the variable expressivity from well over 1000 mutations in over 100 known cardiomyopathy genes [6]. Typically, these genes encode proteins of the sarcomere, Z-disk, intermediate filaments, ion channels, and intercalated disk, as well as cytoskeletal proteins [3,7-10]. This variable expressivity in cardiomyopathy is seen even within the same family, with multiple diagnoses (including this family and other examples) [7,9,11,12]. RCM can also arise from infiltrative processes, such as amyloid or desmin deposition, often due to genetic mutations. Although a few cases of recessive inheritance in RCM have been described, most RCM pedigrees best fit autosomal dominant inheritance, consistent with the other types of cardiomyopathy [13-15]. Such gene mutations can occur de novo as well, with no prior family history. Mutations associated with RCM have been found in MYH7, TNNI3, TNNT2, ACTC, BAG3, DES, MYL2, MYL3, MYPN, TPM1, TTN, and TTR [16]. More recently, the genes FLNC, TNNC1, and MYBPC3 have also been implicated [13,17,18].
In 2001, our group described a four-generation family presenting with RCM and desminopathy, but not skeletal myopathy. Known cardiomyopathy-causing genes were eliminated at the time, and the family was subsequently studied in low-density linkage analysis using 430 microsatellite markers across the genome. The study assumed an autosomal dominant model as ascertained from the pedigree. The most promising locus was at chromosome 10q23, reaching significance (LOD score 4.0, i.e., 10,000:1 odds in favor of linkage) for two-point haplotype analysis, but the overall linkage was weak because no single polymorphism met the minimum LOD of 3.0 [19]. Candidate gene analysis at the time was prohibitive because the linked interval was very large and contained over 40 genes (none of which had been implicated in cardiomyopathy). In addition, it was among the last of the genomic regions characterized in the Human Genome Project, with the sequence continuing to be updated even after 2005. Thus, a genome-wide exome sequencing (also known as whole exome sequencing, WES) analysis was undertaken in four affected individuals from the family, not only to provide the sequence of the genes in or near the linked interval, but also to analyze all other protein-coding genes in an unbiased fashion, since the linkage data were not very strong given the size of the family. Using this approach, we identified two variants of interest. One, a novel nonsense mutation in ADD3, was in the linked interval on chromosome 10 but was present only in the proband and his affected child; the other was a missense mutation on chromosome 7, in the filamin C gene (FLNC), which appears to be the major pathogenic variant in this family, as described here.
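As a reading aid for the linkage figures quoted above, the following minimal Python sketch converts between a LOD score and the corresponding odds, using the standard definition of the LOD score as the base-10 logarithm of the likelihood ratio. The function names are illustrative and are not taken from the original linkage study.

```python
import math


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score (log10 likelihood ratio) to odds in favor of linkage."""
    return 10.0 ** lod


def odds_to_lod(odds: float) -> float:
    """Convert odds in favor of linkage back to a LOD score."""
    return math.log10(odds)


# LOD 4.0 (two-point haplotype analysis) and the conventional threshold of 3.0.
print(lod_to_odds(4.0))  # 10000.0
print(lod_to_odds(3.0))  # 1000.0
```

A LOD of 3.0 therefore corresponds to 1,000:1 odds, which is why it is used as the conventional minimum for declaring linkage.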

METHODS
Pedigree and subject clinical data
Cardiac phenotype and family structure information was presented previously; affected individuals with available medical records had a condition consistent with a diagnosis of RCM, with one individual also having left ventricular hypertrophy, overlapping the HCM phenotype [19,20]. In this report, an additional affected family member, a daughter of the proband, is included [Figure 1], and a table of clinical data is shown as well [Table 1]. The proband (code 141) had RCM with no ventricular hypertrophy and received a heart transplant prior to age 10. The cardiac condition of the proband's daughter best fit a diagnosis of HCM, rather than RCM, and she underwent cardiac transplantation prior to the age of 5 years, the youngest of any family member to need a transplant. Family member 139 was reported to have features of RCM and HCM.

Exome data generation
Family members were consented for this study per approved IRB protocol, and DNA was extracted from blood samples as previously reported [20]. Genomic DNA samples from the proband and three other affected members (the proband's mother and two first cousins once removed) were prepared for exome sequencing using the Kapa Biosystems library preparation kit and 8 cycles of PCR amplification. WES employed the Illumina TruSeq exome kit v.1 (62.2 megabases, 20,794 genes, 201,121 exons), and the resulting enriched library was sequenced on an Illumina HiSeq 2000 machine at the Center for Pediatric Genomic Medicine, Children's Mercy Hospital, Kansas City, to an average read depth of 80X.

Bioinformatic analysis of exome sequence reads
Alignment of reads with the human reference genome sequence NCBI v37 utilized the Genomic Short-read Nucleotide Alignment Program. Variants were called using the Genome Analysis Toolkit. In-house software, called Rapid Understanding of Nucleotide variant Effect Software, provided variant annotation, including: dbSNP entry number for previously reported variants, Blocks Substitution Matrix (BLOSUM) scores for predicting whether missense changes are damaging to protein function, Human Gene Mutation Database annotation for previously reported mutations considered pathogenic, heterozygosity for the variant, variant type, and chromosomal location including the name of the gene the variant lies within (if any). These data were first sorted to identify novel, putatively pathogenic variants in the proband's sequence for an initial view of potential hits in the linked chromosome 10 region, which led to the identification of the ADD3 variant. Because this variant was not present in the other exomes, all four sets of exome data were filtered using custom Python scripts to screen the combined data for heterozygous variants common to the four affected subjects, with a minor allele frequency less than 1% based on data in dbSNP and within the Center's own cumulative exome data. Candidate variants were further scrutinized through literature and database reviews on the functions of the host genes, to prioritize genes expressed in cardiac muscle, previously reported in association with a genetic basis of cardiomyopathy, or with other evidence of putative involvement in sarcomere function. For example, the dbSTRING database was employed to find evidence of known interactions that would implicate involvement in cardiac function. Variants (regardless of type) that appeared novel or very rare (allele frequency less than 1%) were prioritized after the filtering, with further prioritization based on any pathogenicity information from BLOSUM analysis. Candidates were also screened for presence in the Exome Aggregation Consortium database (ExAC: http://exac.broadinstitute.org), gnomAD (gnomad.broadinstitute.org), the NHLBI Exome Sequencing Project database (http://evs.gs.washington.edu/EVS/), and the NHLBI whole genome sequencing project (TOPMED, https://bravo.sph.umich.edu/freeze5/hg38/) for further evidence as to whether these were previously reported and could be neutral polymorphisms. Multiple sequence alignments obtained with the online Clustal Omega tool (http://www.ebi.ac.uk/Tools/msa/clustalo/) were also used to screen for conservation of candidate missense variants through evolution, and further pathogenicity prediction employed the online software packages SNPs&GO (http://snps.biofold.org/snps-and-go/), PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), and Protein Variation Effect Analyzer (PROVEAN) (http://provean.jcvi.org/index.php).
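The shared-variant filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the custom scripts used in the study: the `Variant` record, the `shared_rare_het` function, the subject labels, and all coordinates and frequencies in the toy data are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    zygosity: str  # "het" or "hom"
    maf: float     # population minor allele frequency (0.0 if unreported)


Key = Tuple[str, int, str, str]


def shared_rare_het(exomes: Dict[str, List[Variant]],
                    max_maf: float = 0.01) -> Set[Key]:
    """Return variants that are heterozygous and rare in every subject."""
    per_subject = [
        {(v.chrom, v.pos, v.ref, v.alt)
         for v in variants
         if v.zygosity == "het" and v.maf < max_maf}
        for variants in exomes.values()
    ]
    return set.intersection(*per_subject) if per_subject else set()


# Hypothetical toy data: coordinates, frequencies and labels are placeholders.
shared = Variant("7", 1000, "G", "A", "het", 0.00002)
common_snp = Variant("1", 2000, "C", "T", "het", 0.25)
private = Variant("10", 3000, "C", "T", "het", 0.0)
exomes = {
    "subject_1": [shared, common_snp, private],
    "subject_2": [shared, common_snp],
    "subject_3": [shared],
    "subject_4": [shared, common_snp],
}
print(shared_rare_het(exomes))  # {('7', 1000, 'G', 'A')}
```

In this toy example the common SNP is removed by the frequency threshold and the private variant is removed by the intersection, which mirrors how a variant seen only in the proband would be excluded from the shared candidate list.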

RESULTS
Over 100,000 variants were identified in each subject, representing loci where the subject was heterozygous or homozygous for the non-reference allele of known SNPs, or for a previously unannotated or novel variant. After bioinformatic analysis of the proband's data first, the variant of highest interest was a nonsense mutation in ADD3 (NM_011903.3, c.169C>T Arg57X) [Figure 1], due to its being novel, clearly disruptive, within the linked interval, and the fact that its encoded protein, gamma adducin, is involved in actin/spectrin assembly.
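To make the variant nomenclature concrete, the sketch below maps a coding-sequence position to its codon and lists which reference codons are compatible with a reported amino acid change under the standard genetic code. It is an illustrative check only; the helper names are hypothetical, and the reference codons are inferred from the genetic code rather than read from the transcript sequences.

```python
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code; "*" marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cdna_to_codon(cdna_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within codon 1-3) for a 1-based position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def compatible_codons(ref_aa: str, alt_aa: str, offset: int,
                      ref_base: str, alt_base: str) -> List[str]:
    """List reference codons that give ref_aa and become alt_aa after the change."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[offset - 1] != ref_base:
            continue
        mutated = codon[:offset - 1] + alt_base + codon[offset:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append(codon)
    return sorted(hits)


# ADD3 c.169C>T, Arg57X: first base of codon 57, arginine to stop.
print(cdna_to_codon(169))                          # (57, 1)
print(compatible_codons("R", "*", 1, "C", "T"))    # ['CGA']

# FLNC c.4636G>A, G1546S: first base of codon 1546, glycine to serine.
print(cdna_to_codon(4636))                         # (1546, 1)
print(compatible_codons("G", "S", 1, "G", "A"))    # ['GGC', 'GGT']
```

Both positions fall on the first base of their codons, and the residue numbers computed from the cDNA positions agree with the reported protein-level changes.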
To validate this variant, we first designed a PCR-RFLP test by altering one nearby base in one of the PCR primers. However, restriction digestion patterns were faint and difficult to resolve [Figure 2A], and thus DNA sequencing of this region (amplified with more distant primers) was performed to genotype the family [Figure 2B]. Surprisingly, this mutation was present only in the proband and his affected daughter (both heterozygous) and in no other family members. We concluded that this mutation was not the main causative mutation in this family.
The other candidate variants were prioritized for further analysis by mutation type (frameshift, nonsense and missense mutations being of greatest interest), predicted pathogenicity, gene expression in cardiac muscle (as evidenced in expression data at genecards.org), and function that was known or could feasibly be considered to be involved in heart structure or function. The top variants that were validated by sequencing and further considered (some in genes previously implicated in cardiomyopathy), but were ruled out because they did not segregate with the condition in the family, were: BAG3 rs199682693 missense (Arg473Gly), RBM20 rs1417635 missense (Trp768Ser), MYOF rs79732285 (splice donor variant), DNMBP rs769805859 c.3907G>T (Asp1303Tyr), LDB3 (also called ZASP, 5 bp del in a polypyrimidine tract in intron c.756-10 to -15), and XIRP1 rs138653994 (Gln1832Arg).
DNA sequencing was used to validate the FLNC (NM_001127487.1) c.4636G>A G1546S variant [Figure 3], and this mutation was found to segregate with the condition in the family through multiple informative meioses. Based on the correct segregation pattern, this variant was further evaluated to assess factors bearing on the probability of it being the pathogenic mutation in the family. These included the frequency of the FLNC c.4636G>A G1546S variant in the general population, its location in conserved domains, and software-based predictions of functional impact, as detailed below.
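The segregation criterion used to keep or discard candidates can be expressed as a simple check: every affected, genotyped family member carries the allele and no unaffected member does. The Python sketch below is illustrative only; the pedigree labels and genotypes are hypothetical and do not reproduce Figure 1, and the strict rule ignores incomplete penetrance and age-dependent onset.

```python
from typing import Dict


def segregates(genotypes: Dict[str, str],
               affected: Dict[str, bool],
               allele: str) -> bool:
    """True if allele carriers are exactly the affected genotyped members."""
    for member, genotype in genotypes.items():
        carries = allele in genotype
        if carries != affected[member]:
            return False
    return True


# Hypothetical four-member pedigree (labels and genotypes are placeholders).
affected = {"I-1": True, "II-1": True, "II-2": False, "III-1": True}
variant_a = {"I-1": "GA", "II-1": "GA", "II-2": "GG", "III-1": "GA"}
variant_b = {"I-1": "CC", "II-1": "CT", "II-2": "CC", "III-1": "CT"}

print(segregates(variant_a, affected, "A"))  # True
print(segregates(variant_b, affected, "T"))  # False
```

The second toy variant fails because an affected individual lacks the allele, which is the same pattern that excluded a variant carried by only two affected relatives.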
FLNC c.4636G>A G1546S was novel at first discovery, but more recently was entered in the NCBI dbSNP database (rs774263134) with an extremely low MAF of 0.00002 (three heterozygotes among 62,784 genomes in TOPMED; no phenotype data available). dbSNP indicates that ExAC has found that allele once (frequency 0.000008), but it is not listed in the ExAC database when searched directly for FLNC. This variant is not reported in gnomAD or the NHLBI Exome Sequencing Project database. Thus, it is very rare, consistent with autosomal dominant pathogenicity for a rare disorder, but the presence of the allele in at least one database leaves open the possibility that it is a neutral polymorphism. Since many of the exomes and genomes in the databases thus far come from studies of people suspected of having genetic conditions, it is possible that the heterozygous individuals (for example, in TOPMED, funded by NHLBI) do have heart-related conditions.
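The reported TOPMED frequency can be reproduced from the carrier counts, assuming a diploid autosomal locus so that each genome contributes two alleles. The function name below is illustrative.

```python
def allele_frequency(het_carriers: int, hom_carriers: int,
                     n_individuals: int) -> float:
    """Allele frequency at a diploid autosomal locus from carrier counts."""
    alleles = het_carriers + 2 * hom_carriers
    return alleles / (2 * n_individuals)


# Three heterozygotes among 62,784 TOPMED genomes.
freq = allele_frequency(het_carriers=3, hom_carriers=0, n_individuals=62784)
print(f"{freq:.7f}")  # 0.0000239
```

Three alleles out of 125,568 chromosomes gives roughly 0.000024, which rounds to the MAF of 0.00002 quoted from dbSNP.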
Next, predicted pathogenicity of c.4636G>A G1546S was analyzed using multiple software packages. FLNC residue 1546 is in the 14th (out of 24) Ig-like domain of filamin C, close to a recently identified RCM missense mutation, S1624L [17], and an HCM-related mutation, A1539T [25]. The Clustal alignment of this 14th domain peptide sequence, containing c.4636G>A G1546S, showed perfect conservation at the Gly 1546 residue among phylogenetically diverse organisms and filamin C isoforms [Figure 4]. The variant was then found to be deleterious by three programs based on algorithms considering biochemical change of the residue, conservation via multiple alignments, and 3D structural change. PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2) found c.4636G>A G1546S to be "probably damaging". This prediction used two different datasets of known damaging variants, HumDiv and HumVar, to compare the query variant with similar polymorphisms. Additionally, the PolyPhen-2 algorithm used sequences from UniProt and 3D structural models from the Protein Data Bank to investigate evolutionary importance and potential disturbance of any binding sites. PROVEAN (http://provean.jcvi.org/index.php) computationally aligned 260 sequences and predicted the FLNC variant to be "deleterious". Finally, SNPs&GO (http://snps.biofold.org) used sequence alignments, 3D structure and function analyses to generate a prediction of "disease-causing".
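The conservation criterion applied to the alignment (an identical residue in every sequence at a given column) can be illustrated with a short Python sketch. The aligned strings below are hypothetical toy sequences, not the filamin sequences of Figure 4, and the function name is an assumption made for illustration.

```python
from typing import List


def conserved_columns(aligned: List[str]) -> List[int]:
    """Return 0-based alignment columns where all sequences share one residue.

    Columns containing a gap character ("-") are not counted as conserved.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in aligned}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


# Hypothetical toy alignment of three sequences.
toy = ["ACDGK", "ACEGK", "SCDGR"]
print(conserved_columns(toy))  # [1, 3]
```

A residue such as Gly 1546 would appear in this output if its column were identical across all aligned species and paralogs, corresponding to the asterisk that Clustal Omega prints beneath fully conserved columns.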

DISCUSSION
Filamin C is a dimeric actin-cross-linking protein highly expressed in both skeletal and cardiac muscle, localized to the sarcolemma, Z-disk, and myotendinous junctions, and is associated with structural stability, mechanosensation, and intracellular signaling [21-23]. Recently, filamin C was also shown to mediate fast repair of myofibrillar microdamage in cardiomyocytes, in addition to Z-disk assembly [24]. Filamin C predominantly consists of 24 immunoglobulin-like domains spanning over 80% of the protein, with an actin-binding domain at the amino terminus and a dimerization domain at the carboxy terminus. The Ig-like domain region has been found to interact with over 90 proteins [24]. Initially, dominant disruptive mutations in FLNC were found in association with skeletal muscle phenotypes (OMIM no. 614065). More recently, FLNC mutations have been found in cardiac-only phenotypes, including HCM, DCM, RCM, and ARVCM [17,24-29]. The mutation described here, c.4636G>A G1546S, is located in the 14th Ig-like domain, which is thought to dimerize with the 15th domain just before the hinge region. Interference in the ability of these two domains to dimerize has been hypothesized to destabilize protein folding of the entire filamin C protein [22]. Because filamin C is a highly dynamic protein that interacts with many substrates, missense alterations in FLNC could feasibly destabilize interactions with other proteins and alter cytoplasmic distribution in cardiomyocytes [17,30,31]. The rarity of c.4636G>A G1546S, its bioinformatic analyses (including complete conservation of this residue through phylogenetically diverse organisms, and its predicted deleterious designation), its co-segregation with disease in the family, and reported FLNC mutations in HCM and RCM together support the hypothesis that this is the pathogenic mutation in this family. This is the first large pedigree described with RCM and HCM bearing an FLNC mutation. Examination of this gene region in the original linkage data showed no strong evidence of linkage; however, by chance, none of the microsatellite markers available in that panel were near FLNC. Thus, the moderate linkage to chromosome 10 appears to be coincidental, although ADD3 is in that region.
Abnormal cardiac tissue aggregates including filamin C protein were seen in a recently reported FLNC mutation case affecting only the heart, although this was not addressed in other FLNC mutation studies [17]. This has been termed "filaminopathy", similar to the situation in "desminopathy". While the latter term now applies to a condition with abnormal tissue deposits in patients with DES mutations, including mutant desmin protein (e.g., Maerkens et al. [32], 2013), in early studies it was a descriptive term indicating presence and/or dominance of desmin in the tissue aggregate, without knowledge of the underlying genetic cause. In the family described here, the desmin gene sequence is normal [19], yet cardiac aggregates containing desmin were seen. It is possible that mutant filamin C is also present and is precipitating the aggregation phenotype, which could attract other proteins to the complex as well as bystanders. This would be consistent with the finding of Maerkens et al. [32] (2013) that filamin C (presumably of normal protein sequence) was found in abundance in desminopathy deposits, along with a multitude of other proteins that appear to have also been trapped. It may be the gradual accumulation of these aggregates that is the underlying cause of cardiac muscle tissue failure, with such aggregates forming only in the presence of an abnormal protein, as seen in cardiac amyloidosis associated with missense mutations in the transthyretin (TTR) gene [33]. Although transthyretin deposits are extracellular and desmin deposits intracellular, both can stress cardiomyocytes. Such cardiomyopathy cases would fall within the realm of pathogenesis via the toxic effect of misfolded protein accumulation, among the growing category of protein misfolding diseases, and could potentially benefit from therapies being developed for this molecular phenotype.
It is unclear why some FLNC mutations affect only skeletal muscle while others affect the heart (separately or in addition to skeletal muscle). This may be related to the location or type of FLNC mutation. Among the reports of FLNC-related cardiomyopathy where the patients were ascertained due to heart phenotype, there was only one report of patients also having symptoms of skeletal myopathy or detection of protein deposits in skeletal muscle [30]. The mutations with predominantly cardiac phenotypes include missense substitutions and truncating mutations in functional domains scattered across the gene; the truncating mutations are thought to exert pathogenesis through haploinsufficiency rather than protein aggregation [24]. In patients with predominantly (or only) skeletal muscle myopathy, the mutations reported (A193T, M251T, in-frame p.Val930_Thr933del, F1720LfsX63, T2419M, and W2710X) are in regions and functional domains distinct from those in the cardiac-only reports, thus likely affecting the protein function differently such that cardiac muscle is preferentially spared [34,35].
There is no current evidence to support the contribution of the heterozygous ADD3 loss-of-function mutation to the phenotype, although this gene is down-regulated in dilated cardiomyopathy in the left ventricle compared to the right ventricle [36]. This would be consistent with loss of some ADD3 expression being contributory to a cardiomyopathy phenotype, as a modifier locus. Adducin 3 is known to interact with actin, and also with ubiquitin C and ubiquitin-conjugating enzyme E2I (which also interacts with desmin) (BioGRID analysis, thebiogrid.org). The proband and his daughter were the youngest in this family to require heart transplants, with the exception of one of his second cousins, who needed a transplant prior to age 10. The other living affected family members have not yet required transplantation, or did not need a transplant until later adulthood. This supports the possibility that this novel ADD3 mutation is contributing to the earlier onset of RCM in the proband and daughter. However, it could also be a neutral variant. It was not inherited from the proband's mother, suggesting that it was either inherited from his father (DNA unavailable, but reported clinically unaffected) or arose de novo in one of the germ cells of the proband's parents. There are no other clinical features that are specific to the proband and his daughter. The other candidate variants initially examined are unlikely to be contributing to the phenotype because they are not as rare (present in dbSNP) and/or were predicted to have lower pathogenic potential.
The power and efficiency of WES are demonstrated by the identification of the FLNC mutation in this family, and of other pathogenic genes in other RCM pedigrees [8,11,28], in dilated cardiomyopathy [37], and in disease-gene discovery in general [38]. Although there are commercially available panels to test for mutations in known cardiomyopathy genes, WES is an unbiased search for such mutations and has the power to identify new cardiomyopathy genes. However, the disadvantage is that many variants have to be filtered and validated to identify the likely candidate, with the risk of missing a pathogenic variant if filtering choices are not appropriate.

Figure 1. Pedigree of the family, with the proband indicated by an arrow. Black symbols represent individuals who are clinically affected with cardiomyopathy. Open symbols indicate individuals who do not have features of cardiomyopathy. Subject numbers in italics (lower left) indicate the two individuals who are heterozygous for the ADD3 variant c.169C>T R57X (all other family members have the wild-type genotype, CC). The FLNC genotype at c.4636G>A G1546S is shown for each family member in blue; the A allele segregates with the clinical condition.

Figure 2. Panel A shows a photograph of an ethidium bromide-stained polyacrylamide gel of the BspDI digests used to validate and genotype the ADD3 mutation c.169C>T Arg57X. M is the molecular weight marker lane; U contains uncut PCR product. The arrow indicates the smaller fragment validating the mutation in the proband 141 and his daughter 1661 and not in three other family members. Panel B shows the DNA sequencing chromatogram demonstrating the heterozygous ADD3 mutation in the DNA of the proband's daughter [red (T)/blue (C) double peaks in top two lines; control sequence on the bottom line].

Figure 3. DNA sequencing chromatogram showing the heterozygous FLNC mutation c.4636G>A G1546S in the proband's DNA [green (A)/black (G) double peak under the blue box].

Figure 4. Clustal Omega alignment of filamin C protein sequences (3rd line and below; residues 1500 to 1559) among phylogenetically diverse organisms, aligned to human filamin A and filamin B (first two lines). The red box shows that the glycine at residue 1546 is completely conserved among species and in the human protein family. NCBI accession numbers are on the left, with species indicator. FICAL is flycatcher bird; PANTR is Pan troglodytes (chimpanzee); CANLF is Canis familiaris (dog). Asterisks below the residue indicate complete conservation. Colons below the residue indicate differences when compared with human filamin A and/or B.