The (epi)genomic landscape of splenic marginal zone lymphoma, biological implications, clinical utility, and future questions

Splenic marginal zone lymphoma (SMZL) is an indolent B-cell lymphoma comprising less than 2% of lymphoid neoplasms. Approximately 70% of patients have progressive disease requiring treatment and up to 30% of patients relapse or transform to diffuse large B-cell lymphoma. Whilst research over the last decade has transformed our understanding of many B-cell tumours, it is only beginning to shed light on the molecular pathogenesis of SMZL. Expansive immunogenetic investigations have shown biases in the immunoglobulin gene repertoire with distinct patterns of somatic hypermutation, suggesting a pathogenic role for antigen selection. In parallel, cytogenetic studies have found a number of recurrent chromosomal lesions, in particular a deletion of the long arm of chromosome 7, though causative genes have not been identified. Our understanding of the mutational landscape of SMZL is built on a limited number of index cases, but has highlighted recurrent mutations in KLF2, NOTCH2 and TP53, and a spectrum of genes that cluster within biological pathways of importance in B-cell differentiation. Meanwhile, preliminary DNA methylation profiling has shown epigenetically distinct patient sub-groups, including a group defined by elevated expression of polycomb repressor complex 2 components. This review will provide an overview of our current understanding of the molecular basis of SMZL, and how this information impacts patient outcomes. Furthermore, we will outline: (1) the knowledge gaps that still exist; (2) a potential future research direction; and (3) how a detailed molecular understanding of the disease will ultimately provide patients with improved management and treatment choices.

Oquendo et al. J Transl Genet Genom 2021;5:89-111 https://dx.doi.org/10.20517/jtgg.2021.04


INTRODUCTION
The World Health Organization classification of tumours of the hematopoietic and lymphoid tissues defines three marginal zone (MZ) lymphoma entities: splenic marginal zone lymphoma (SMZL), nodal MZL (NMZL) and extranodal MZL [1]. In addition, a number of provisional entities are emerging; these include splenic diffuse red pulp lymphoma (SDRPL), hairy cell leukaemia-variant (HCL-v) and clonal B-cell lymphocytosis of marginal zone origin (CBL-MZ), the latter of which is clonally related to SMZL in a proportion of cases [2][3][4][5]. SMZL comprises less than 2% of lymphoid neoplasms, typically involving the spleen, bone marrow and peripheral blood, although a minority of patients have low-volume extra-hilar intra-abdominal lymphadenopathy detectable on CT scanning. Patients most commonly present with abdominal discomfort due to massive splenomegaly, with anaemia secondary to hypersplenism or auto-immune haemolysis, or incidentally following clinical or radiological detection of asymptomatic splenomegaly or an abnormal blood count. The median age at diagnosis is 65 years and patients have a median survival of 10 years [6,7]. The diagnosis is most secure in patients who undergo splenectomy, now performed almost always for therapeutic reasons, but it increasingly relies on a combination of clinical features together with an assessment of lymphocyte morphology and immunophenotype, bone marrow histology and immunohistochemistry [7]. In the absence of splenic histology, the differential diagnosis from common low-grade B-cell disorders such as chronic lymphocytic leukaemia (CLL), follicular lymphoma (FL), and mantle cell lymphoma (MCL) is usually straightforward, but SMZL lacks a disease-specific immunophenotype and the distinction between SMZL and some cases of HCL-v, SDRPL and lymphoplasmacytic lymphoma (LPL) may be more difficult. Approximately 70% of patients have progressive disease requiring treatment.
Standard treatment options include splenectomy, Rituximab or Rituximab and bendamustine, all of which achieve high response rates with prolonged progression-free survival [8,9] . However, up to 30% of patients relapse and are intolerant of, or resistant to, standard therapies or transform to diffuse large B-cell lymphoma (DLBCL) with dismal survival [10] . Therefore, there is an unmet need for well tolerated, preferably oral, novel therapies.
Whilst SMZL is still a relatively under-studied malignancy, several studies published over the last decade or so have begun to unravel the intrinsic molecular defects present in the SMZL B-cell, and the extrinsic cellular mechanisms that reflect micro-environmental and antigenic interactions [Figure 1]. This review will focus on providing a summary of the (epi)genetic and immunogenetic landscape of the disease and the associated clinico-biological implications of these aberrations, as well as supplying an overview of key unanswered questions that might direct future research in this area.

ANALYSIS OF THE IMMUNOGLOBULIN GENES
The most important molecule expressed on the cell surface of SMZL B-cells is the B-cell receptor (BCR), which allows interaction with antigens and antigenic elements within the tumour microenvironment. Seminal sequencing studies of the immunoglobulin (IG) genes have been performed across the range of mature B-cell neoplasms, including SMZL, where they have identified key features of the BCR immunoglobulin repertoire indicating that clonal B-cell selection by antigens/superantigens is an important feature of SMZL pathophysiology. Investigation of large SMZL cohorts has demonstrated remarkably biased usage of immunoglobulin heavy chain variable (IGHV) genes, namely enrichment of the IGHV1-02 (30%), IGHV4-34 (11%) and IGHV3-23 (9%) genes, which collectively represent almost half of SMZL cases [20,21]. What is noteworthy is the scarcity of the IGHV1-02 gene in other mature B-cell tumours, even tumours of the spleen, such as SDRPL [20], suggesting that a distinctive selection process may predominate in SMZL pathogenesis.
The majority (90%) of IGHV1-02 patients carry the IGHV1-02*04 allele [20][21][22], extending the restriction to the level of polymorphic variation. Evidence of somatic hypermutation (SHM) is seen in the majority of IGHV1-02*04 SMZL cases (~95%), suggesting that exposure to antigen is both important in progenitor tumour cell selection and relevant to ongoing evolution [22]. IGHV1-02*04 cases carry extended VH CDR3 sequences, biased IGHD and IGHJ gene usage, and distinct patterns of SHM [20]. IGHV1-02 cases with allele *04 may have unique immunogenetic features as the polymorphism encodes a tryptophan (W) residue rather than arginine (R) at position 75 of VH FR3, with the latter playing a role in IG structural stability and conformation. As will be discussed in more detail later, key somatic mutations and acquired copy number changes define the disease but are also associated with specific immunological backgrounds. Most notable is the enrichment of 7q deletions and KLF2 and NOTCH2 mutations in IGHV1-02*04 SMZL [23,24]. These analyses, with preliminary DNA methylation and transcriptomic studies [25,26], suggest that cases with IGHV1-02*04 represent a distinct patient sub-group, which is likely to have emerged from a cell with a distinct ancestry and/or unique immune activation process followed by transformation and ongoing antigen exposure, and which is characterised by shared genomic lesions and poor survival [24].
The critical part that the BCR plays in SMZL is further emphasized by the presence of somatic hypermutations, as only 12% of cases lack any evidence of SHM at the IGHV locus, and might be considered truly unmutated [20,21]. The remaining cases exhibit evidence of SHM at the IGHV locus, with 38% and 50% defined as borderline (97%-99.9% IGHV gene identity to germline) and significantly mutated (< 97% IGHV gene identity to germline), respectively. The majority (71%) of IGHV1-02*04 cases harbour borderline levels of SHM, with mutations that are often benign in nature and cluster in the framework region [20,22]. Whilst levels of SHM represent a critically important prognostic and predictive biomarker in CLL [27,28], their clinical utility in SMZL is less established. Studies using SHM cut-offs established for CLL, although SHM levels are likely to be disease-specific, showed no difference in progression-free and overall survival between mutated and unmutated IGHV cases [29,30]. In contrast, we have shown that a physiological definition, based on the complete absence of SHM (truly unmutated), is able to provide independent prognostic information, with truly unmutated cases exhibiting reduced time to treatment [24]. Taken together, these observations show that we continue to lack a definitive understanding of the clinical utility of SHM in SMZL, likely the result of small patient cohorts, heterogeneous treatment, and an inaccurate definition of the most biologically and clinically relevant SHM cut-offs.
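The three SHM categories described above can be expressed as a simple rule on percent identity to germline. The following is a minimal sketch only: the function name `classify_shm` is hypothetical, and the handling of values between 99.9% and 100% (treated here as borderline, since any mismatch is evidence of SHM) is an assumption rather than a definition taken from the cited studies.

```python
def classify_shm(identity_percent: float) -> str:
    """Classify an IGHV sequence by percent identity to its germline gene.

    Categories follow the thresholds quoted in the text:
      - truly unmutated:       100% identity (no evidence of SHM)
      - borderline mutated:    97% up to (but not including) 100% identity
      - significantly mutated: below 97% identity
    Assumption: values between 99.9% and 100% are treated as borderline.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent == 100.0:
        return "truly unmutated"
    if identity_percent >= 97.0:
        return "borderline mutated"
    return "significantly mutated"


# Hypothetical identity values, for illustration only.
example_cases = {"case_A": 100.0, "case_B": 98.6, "case_C": 93.2}
example_labels = {case: classify_shm(pct) for case, pct in example_cases.items()}
```

A cut-off defined this way separates the truly unmutated cases that the text associates with reduced time to treatment from cases carrying any level of SHM.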
repeatedly gained are enriched for genes that promote cell proliferation, and similar aneuploidy patterns are seen in tumours derived from the same cell types [34,35]. The latter observation would be consistent with the frequency of trisomy 3, 12 and 18 across low-grade B-cell tumours.
Studies of 5q deletions in myelodysplastic syndromes and acute myeloid leukaemia both illustrate the heterogeneity of their genomic landscape and provide a paradigm for the investigation of large deletions. However, unlike 7q deletions in SMZL, different 5q minimally deleted regions (MDRs) are associated with distinct clinical phenotypes, which aided the identification of causative genes [39,40]. Deletions of 7q are rare in chronic lymphoid malignancies apart from SMZL and HCL-v. In SMZL, there is no 7q minus syndrome comparable to 5q minus and 7q deletions have no obvious prognostic significance. However, 7q deletion is associated with a number of other biological features such as IGHV1-02*04 usage, mutations of NOTCH2 and KLF2 and a specific DNA methylation signature that will be outlined later in this review. Intriguingly, splenic and nodal marginal zone lymphomas share remarkably similar genomic landscapes other than the presence of 7q deletions in the former and PTPRD mutations in the latter. Numerous studies have shown that deletion breakpoints at 7q are heterogeneous, with q21 the most proximal and q36 the most terminal breakpoints [19,31,41-45]. Using array comparative genomic hybridization (aCGH) in patient samples or cell lines, two studies identified remarkably similar MDRs of 2.7-2.8 Mb in 7q32.1-q32.2 [36,38]. Compared to SMZL cases without deletion of 7q, expression of coding genes and micro-RNAs (miRNAs) within the MDRs is reduced, but neither study identified microdeletions, pathogenic mutations or disease-specific promoter hypermethylation of genes in the retained allele that would indicate a classical tumour suppressor gene. A further aCGH study showed another 1.51 Mb MDR at q22.1 in 49% of cases studied [46], which included CUX1, a gene implicated in tumorigenesis.
While it is possible that 7q deletions have no impact on phenotype, it seems more likely that future studies integrating several omic technologies as well as a genome-wide approach will be necessary to fully comprehend the role of the 7q deletion in SMZL pathogenesis.
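The idea of a minimally deleted region can be illustrated as the segment shared by every deletion in a cohort. The sketch below is one simple way to compute it, not the method of the cited aCGH studies, which may for instance accept a region deleted in most rather than all cases. The function name `minimal_deleted_region` and all coordinates are hypothetical and do not correspond to real 7q positions.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) of a deletion, in Mb


def minimal_deleted_region(deletions: Iterable[Interval]) -> Optional[Interval]:
    """Return the interval common to all deletions, or None if there is none.

    The shared segment starts at the largest start coordinate and ends at
    the smallest end coordinate across all deletions.
    """
    starts, ends = [], []
    for start, end in deletions:
        if start > end:
            raise ValueError(f"invalid interval: ({start}, {end})")
        starts.append(start)
        ends.append(end)
    if not starts:
        return None  # no deletions supplied
    mdr_start, mdr_end = max(starts), min(ends)
    # A non-empty overlap requires the latest start to precede the earliest end.
    return (mdr_start, mdr_end) if mdr_start < mdr_end else None


# Hypothetical deletions with heterogeneous breakpoints, as described in the text.
cohort = [(100.0, 150.0), (120.0, 159.0), (127.0, 129.8), (110.0, 140.0)]
mdr = minimal_deleted_region(cohort)  # (127.0, 129.8)
```

Because the result depends entirely on the smallest deletion in the cohort, a single atypical case can shrink or abolish the MDR, which is one reason why such regions are usually interpreted alongside expression and mutation data.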

MUTATIONS IN KEY GENES AND PATHWAYS
Whole genome/exome sequencing (WGS/WES) has been employed to scan the entire cancer genome at base-pair resolution and identify somatically acquired gene mutations across a broad spectrum of mature B-cell tumours [48][49][50][51][52]. The analysis of large patient cohorts, often conducted as part of large international sequencing consortia, has provided a wealth of genomic information on these neoplasms, identifying a plethora of germ-line SNPs associated with disease risk, panels of genes targeted by somatic mutations and a number of acquired mutational mechanisms. Unlike more prevalent mature B-cell malignancies, SMZL is currently absent from these international sequencing projects, the result being an incomplete catalogue of tumour-associated genomic lesions and mutational processes, drawn from a paucity of published genomic data. Only six genome-wide studies have been conducted, on only 35 patients [53]. The only study employing WGS was performed by Kiel et al. [54] but was limited to six cases without matched germline DNA. Five WES studies have been carried out on discovery cases, often with subsequent targeted resequencing of relevant genes in additional samples. To date, none of these studies have reported mutational signatures or mechanisms such as kataegis and chromothripsis [55]. Even the somatic mutational burden remains disputed, with a range of between 9 and 82 somatic mutations per patient (mean 25) [23,56,57]. This limited agreement is likely due to the low patient numbers, and experimental and computational differences in WGS/WES processing, but could also reflect disease heterogeneity and statistical power insufficient to catalogue the complete mutational landscape of the disease [23,53]. Targeted re-sequencing approaches have helped elucidate recurrently mutated genes, but these studies have often included only small numbers of cases with matched germ-line material for analysis.
Whilst we currently have only a limited picture of the somatic landscape of SMZL, several recurrently mutated genes have been identified, which preferentially target physiologically important cellular processes, such as MZ B-cell maturation and migration, and cell cycle control [Figure 2]. A number of genomic studies, with limited functional validation, all agree that the most important recurrently mutated genes are KLF2, NOTCH2 and TP53.

KLF2 mutations are present in up to 40% of SMZL cases
KLF2 is the most frequently mutated gene in SMZL (20%-40% of cases) [24,33,58,59]. The gene belongs to the family of Kruppel-like transcription factors, a subfamily of the zinc-finger class of DNA binding transcriptional regulators [60]. KLF2 directly binds to promoters regulating the expression of genes involved in cell homing, NF-κB signalling and cell cycle control [23,61]. In murine systems, loss of KLF2 drives the germinal cells to a MZ-like phenotype and precludes migration to the splenic MZ [62][63][64], thereby preventing germinal centre B-cell responses to antigens in the MZ. DNA binding and nuclear localization of the KLF2 protein require three C-terminal highly conserved zinc finger domains and two nuclear localization sequences, respectively. Gene mutations can be missense substitutions or truncating events [Figure 3], where the latter often result in the removal of the nuclear localization sequences [23,24,33,58]. Missense substitutions result in amino acid changes within the nuclear localization sequences of KLF2 or within highly conserved regions of the first zinc finger domain. These KLF2 mutants are displaced from the nucleus and so cannot exert their transcriptional activity, thereby preventing KLF2 from suppressing NF-κB induction by upstream signalling pathways [23,33]. The p.Q24X (stopgain) variant is a hotspot mutation [23,24,33,59]. Although no functional evidence on this specific variant is available, it is very likely that, due to its position in the first exon, it would result in a truncated and non-functional protein. Furthermore, this variant has a scaled Combined Annotation Dependent Depletion (CADD) [65] score of 36, indicating that it is predicted to be among the 0.1% most deleterious substitutions possible in the human genome. KLF2 mutations are early, clonal events, enriched in patients with deletions of 7q and IGHV1-02*04 gene usage, and are associated with a short median time to first treatment in univariate survival analysis [24].
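The link between a scaled CADD score and a percentile follows from the PHRED-like scaling used by CADD, in which the scaled score equals -10 × log10(rank/total) across all possible substitutions, so that scores of 10, 20 and 30 mark the top 10%, 1% and 0.1%, respectively. The short sketch below applies that conversion to the score of 36 quoted above; the function names are hypothetical and are not part of any CADD software.

```python
import math


def cadd_phred_to_top_fraction(scaled_score: float) -> float:
    """Convert a PHRED-like scaled CADD score to the fraction of all possible
    substitutions predicted to be at least as deleterious."""
    return 10 ** (-scaled_score / 10)


def top_fraction_to_cadd_phred(fraction: float) -> float:
    """Inverse conversion: fraction of top-ranked substitutions to scaled score."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    return -10 * math.log10(fraction)


# A scaled score of 30 is the boundary of the top 0.1%; a score of 36 lies
# well inside it, at roughly the top 0.025% of possible substitutions.
fraction_for_36 = cadd_phred_to_top_fraction(36)
percent_for_36 = 100 * fraction_for_36
```

On this scale the p.Q24X score therefore sits comfortably within the 0.1% most deleterious substitutions, consistent with the statement in the text.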

NOTCH2 mutations occur in 10%-25% of cases
In murine models, and to a lesser extent in humans, NOTCH2 plays a key role in MZ B-cell maturation and MZ retention [66][67][68][69]. NOTCH2 is a cell-surface receptor belonging to a family of evolutionarily conserved trans-membrane proteins. Notch pathways regulate cell proliferation, cell fate, differentiation, and cell death [70]. When a ligand binds to the extracellular domain of a Notch receptor, it initiates a cascade of proteolytic cleavages that lead to the release of the Notch intracellular domain, which then moves into the nucleus to interact with target transcription factors [70,71]. NOTCH2 is the second most frequently mutated gene, present in 10%-25% of SMZL cases [23,24,54,56,72]. In SMZL, NOTCH2 mutations target the C-terminal PEST domain in exon 34 [53], necessary for the regulation of the intracellular domain and consequent transcriptional regulation [Figure 3]. Shanmugam et al. [73] propose that in the case of SMZL, NOTCH2 mutations are not initiating events, but that the sustained signalling provided by these mutations provides a selective advantage in tumours that have already established themselves in a ligand-rich microenvironment. This is supported by evidence in mice, where animals with activating Notch2 mutations in mature B-cells displayed expansion of the marginal zone but did not develop lymphoma [74]. Additionally, NOTCH2 mutated tumours exhibit sub-clonal heterogeneity, with consequent aggressive clones [73], and inferior survival that has been demonstrated in clinical studies [24,54]. Expression of the Notch intracellular domain 2 (NICD2) can be detected in SMZL cases and is a common feature of both NOTCH2 wild-type and mutated SMZLs [73], similar to prior findings with NOTCH1 in CLL [75], suggesting that Notch activation is a general feature of SMZL tumour cells. The work by Shanmugam et al. [73] showed a higher frequency of NICD2+ cells in mutated versus wild-type tumours, and a higher frequency in the marginal zones of the white pulp. It is yet to be determined if enhanced NICD2 expression in wild-type tumours is explained fully by mutations in other Notch regulators, such as NOTCH1 (~5%) and SPEN (~5%) [24,57], by structural or copy number aberrations, or by the enrichment of NOTCH2 in the normal counterpart of SMZL.

TP53 mutations account for up to 15% of SMZL
TP53 is one of the main SMZL-associated genes implicated in cell cycle control, along with CCND3 (~5%) and ATM (~4%). However, the lack of germline material in many of the studies makes it difficult to confirm whether ATM mutations are truly somatic, and these should be interpreted with caution. TP53 is disrupted in 10%-15% of SMZL cases [24,57]. Most TP53 mutations are missense mutations within the DNA binding domain [Figure 3], attenuating or eliminating its function as a tumour suppressor, since mutant proteins lose the ability to activate canonical p53 target genes. This leads to uncontrolled cell proliferation and permissive accumulation of genomic mutations that may culminate in tumour growth [76]. Mutations in TP53 are associated with poor prognosis and are more common in cases with unmutated IGHV genes [29,31,36,38,56].

Other low prevalence genes cluster into key biological pathways
Additional recurrently mutated genes are continuing to emerge from high-throughput sequencing studies, though at lower frequencies [ Table 2 and Figure 3]. As previously noted, these genes often cluster within biological pathways of known importance to B-cell differentiation and in cancer more generally. Some of the most well-established genes and pathways are summarized below: NF-κB signalling plays an essential role in MZ B-cell development and differentiation [77] . When normal B lymphocytes respond to antigens, NF-κB signalling is activated, reprogramming cells to favour cell cycle progression, survival, cytokine secretion and inflammation [78,79] . NF-κB activation, through either the canonical or non-canonical pathway, is transient in normal cells and depends on external stimuli including ligands for the BCR and for the Toll-like receptors [78] , while termination of signalling is dependent on negative feedback mechanisms including re-accumulation of IκBα and induction of A20 [79] . Both unbiased and targeted approaches have uncovered molecular lesions affecting genes belonging to the NF-κB pathway, as well as upstream pathways connected to NF-κB activation [78] . Most notably TNFAIP3 (A20), a negative regulator of NF-κB signalling, has been found mutated in ~13% of SMZL cases [23,24,29,32,56,57,80] . Parry et al. [24] also reported higher frequency of TNFAIP3 mutations in cases that subsequently transformed, along with truly unmutated IGHV genes. Other negative regulators harbouring mutations include TRAF3 (8%) and BIRC3 (5%-11%). Both TRAF3 and BIRC3 are part of the regulatory system that negatively regulates MAP3K14, a central activator of noncanonical signalling and another target of mutations in SMZL [32] . Activating mutations in positive regulators have also been found mainly in CARD11 (~5%) and IKBKB (~3%). 
Furthermore, mutations in this pathway seem to be mutually exclusive, pointing to multiple independent molecular mechanisms targeting NF-κB signalling in SMZLs [32].

Table 2 (excerpt; earlier rows not recovered)
Gene | Frequency | Pathway/function | Clinical and biological associations
NOTCH2 | 10%-25% | NOTCH pathway/marginal zone development | Worse OS in univariate analysis. Associated with 7q deletions, KLF2 somatic mutations and IGHV1-02*04 usage. Independent risk factor for TTFT (multivariate analysis) [24]
TP53 | ~15% | Cell cycle control/genome maintenance | Associated with shorter EFS in univariate analysis but is an independent risk factor for OS (multivariate analysis) [24,38]
TNFAIP3 | 7%-15% | Canonical NF-κB signaling/MZ differentiation | Higher frequency in small number of transformed cases [24]
MYD88 | 5%-15% | TLR and BCR signaling/MZ differentiation | Somatic mutations associated with longer OS in univariate analysis [24]. Mutually exclusive to 7q deletion, KLF2 somatic mutations and IGHV1-02*04 usage [24]
KMT2D | 11%-15% | Chromatin remodeling - H3K4 methylation | High prevalence in other lymphomas, particularly FL and DLBCL

Toll-like receptor (TLR) signalling plays a key role in SMZL biology, as cellular proliferation is driven by TLR activation. MYD88, an adaptor protein essential for proper TLR signal transduction, has several structural domains including a death domain responsible for oligomerization and interactions with IRAK1-4, which together lead to activation of NF-κB. The toll/interleukin-1 receptor homology (TIR) domain, at the protein's C-terminus, is responsible for the activation of downstream signalling. MYD88 mutations have been identified in 3%-15% of SMZL and 10% of CBL-MZ as well as across a spectrum of other B-cell tumours [24], where they occur as early clonal events, mutually exclusive to other driver mutations and in the presence of mutated IGHV genes, suggesting that these mutations define a group of homogeneous disease subtypes [23,80]. MYD88 also harbours a recurrent variant (L265P) in the TIR domain, frequently seen in a number of mature B-cell tumours [81]. This missense mutation promotes cell survival by enhancing NF-κB signalling [82]. As in CLL, MYD88 mutations have also been shown to be linked to favourable clinical outcome in SMZL using univariate approaches, though their independent prognostic relevance has yet to be confirmed [24,80].
Mutations in KMT2D (MLL2), a histone methyltransferase that modifies lysine-4 of histone 3, contribute to lymphomagenesis [85], with an established tumour suppressor role in DLBCL and FL [86][87][88]. In these same tumours, mutations in CREBBP and, more rarely, EP300, two highly similar histone acetyltransferases, result in defective acetylation-mediated inactivation of BCL6 and activation of p53 [89]. ARID1A (BAF250a) is a component of the SWI/SNF (SWItch/Sucrose Non-Fermentable) family of evolutionarily conserved, multi-subunit chromatin remodelling complexes, and is found mutated in various cancers and in SMZL [90]. SWI/SNF regulates DNA accessibility to other proteins involved in replication and repair, allowing the activation or suppression of gene transcription [83]. It will be interesting to investigate the relationship between these mutations and any epigenetic fingerprint associated with the cell of origin in SMZL.

TRANSCRIPTOMICS
At the transcriptional level, limited gene expression profiling of SMZL cases revealed the expected signatures; most notably B-cell genes, such as those within the BCR (BTK, PKCA, NFATC1) and NF-κB (CD40, REL, BIRC3, TRAF3) pathways, and genes critical to MZ development and migration (NOTCH2) [25,91,92]. Existing data do not yield a robust gene expression signature for SMZL, but rather genes expressed across related mature B-cell tumours. Navarro et al. [93] constructed a gene expression classifier from a panel of B-cell tumours, which was not able to identify distinct gene signatures for SMZL, LPL and SDRPL. Of note though, a recent meta-analysis of gene expression profiles from SMZL, normal splenic material and other B-cell tumours was able to define 135 genes that discriminate SMZL from other B-cell lymphomas [94]. It will be important to validate this early observation through the inclusion of larger patient cohorts, cell sub-populations from the spleen and CBL-MZ patients to track progression to overt SMZL.

EPIGENETICS
Epigenetic modifications that are perturbed in cancer include dysregulation of higher order chromatin structure, such as chemical modifications to histones (thereby affecting charge, decreasing histone/DNA affinity and altering accessibility of tertiary DNA structure), DNA methylation (regulation of gene expression by the transfer of methyl groups onto the C5 position of cytosine) and changes to miRNA expression patterns. These modifications are crucial mechanisms that control gene transcription and genome stability, and contribute to normal B-cell maturation. DNA methylation is the most well-studied epigenetic mark in both normal and malignant B-cells, and such studies have outlined clinically relevant changes associated with the cell-of-origin and malignant progression itself [95][96][97][98][99]. Murine studies first demonstrated the crucial importance of epigenetic modifiers in various aspects of B-cell differentiation [100,101] and, more recently, high-resolution genome-wide analysis of sorted B-cell subsets has mapped the DNA methylome during critical stages of human B-cell development [97,102,103]. These studies suggest that B-cells are the lineage most defined by changes in DNA methylation, with up to 30% of the human DNA methylome modified during cell maturation, particularly global changes affecting heterochromatin, DNA repeats and polycomb-repressed regions, and specific alterations targeting lineage-defining enhancers and promoters. A number of mature B-cell tumours have been analysed at the DNA methylation level, exhibiting a global reduction in methylation levels, principally the result of hypomethylation in heterochromatic sequences that correlates with the maturation state of the putative founding normal cell type. Specific DNA hypermethylation is observed in these tumours and is highly correlated with sequences known to display polycomb repression in normal cells.
Unlike other mature B-cell tumours, SMZL is under-studied at the DNA methylation level, with only a single study published that employed microarray-based promoter methylation and gene expression analysis, correlated with genomic and clinical outcome data [25] .
The multicentre study performed by Arribas et al. [25] profiled the methylation status of 27,000 promoter CpGs in 134 DNA samples extracted from spleen-derived SMZL tumour cells of high purity using the Infinium HumanMethylation27 arrays. These data were integrated with gene expression, copy number, somatic mutational and immunogenetic data. The authors identified a patient subgroup characterised by high genome-wide promoter methylation (High-M), inferior survival, an elevated risk of histologic transformation, and a raised prevalence of NOTCH2 mutations, 7q31-32 deletion and IGHV1-02 usage. In addition to these distinct (immuno)genetic features, the High-M subgroup was defined by significant DNA methylation and transcriptional disruption of genes involved in epigenetic regulation. Most importantly, the High-M subgroup exhibited hypomethylation and consequent elevated transcription of genes involved in B-cell activation and NF-κB signalling, and of those encoding the polycomb repressor complex 2 (PRC2) components, including EED, SUZ12 and EZH2. Hypomethylation of key PRC2 components was concomitant with hypermethylation and reduced expression of PRC2 target genes and those with H3K27me3 marks, supporting the hypothesis that the expression of key differentiation genes defines this High-M subtype [25,104]. Collectively, these data support a distinct sub-group of SMZL, driven by key IGHV usage, defined by consistent epigenetic and genomic lesions, with the likely functional impact being dysregulated lineage specification and cellular proliferation signalling [Figure 4]. The same study performed parallel in vitro experiments with the demethylating agent decitabine, which partially rescued key tumour suppressor genes and down-regulated survival and proliferation pathways, suggesting that the High-M sub-group might be responsive to epigenetic therapies.
MiRNAs are a key group of short noncoding RNA molecules, approximately 22 nucleotides in length, which anneal to target mRNAs, principally through partial complementarity to the 3'-untranslated region, where they repress protein translation or promote mRNA decay. Aberrant miRNA expression is a hallmark of human tumours and, across B-cell lymphomas, deregulated miRNAs have been identified due to their impact on B-cell differentiation and on established cancer pathways. A number of candidate and genome-wide studies have been performed, identifying novel miRNAs and those that are causally implicated in other B-cell lymphomas, such as miR-155, miR-21 and miR-34a [105].
As briefly mentioned previously, several candidate miRNAs have been defined based on their proximity to the minimally deleted region on 7q. Whilst based on limited cohorts and traditional molecular approaches, miRNAs including miR-593, miR-129, miR-182, miR-96, miR-183, miR-335, miR-29a and miR-29b-1 have been shown to be under-expressed in the presence of 7q deletions [106,107]. MiR-29a and miR-29b-1, both located on 7q, have putative roles in immune regulation, cell proliferation and B-cell differentiation. In particular, a key miR-29 target is TCL1A, a gene that is over-expressed in SMZL [92], where miRNA binding deactivates the oncogenic function of TCL1A. Several of the other 7q miRNAs may also contribute to disease biology through the dysregulation of target gene expression and function; miR-129 and miR-335 have been shown to target BCL2, RB1 and BCL-w [108,109]. Beyond miRNAs located on 7q, genome-wide studies have identified a spectrum of miRNAs, some of which target disease-relevant genes and are associated with clinical outcome. Key studies have compared miRNA expression in SMZL with that in other B-cell lymphoma sub-types or in normal/reactive spleen samples [110,111]. Bouteloup et al. [111] demonstrated that reduced miR-29a and elevated miR-21 levels correlate with disease aggressiveness. Peveling-Oberhag et al. [112] found miR-26b to be differentially expressed between SMZL arising in hepatitis C virus (HCV)-positive and HCV-negative patients, though it remains unknown to what extent miR-26b expression is associated with SMZL pathophysiology or HCV infection. However, there is only limited concordance between these studies, perhaps reflecting biological disease heterogeneity, but also experimental differences, such as cohort size, experimental design, technology of choice and statistical analysis. Clearly, further investigations are needed to identify and functionally validate causative miRNAs involved in SMZL pathogenesis.

THE CLINICAL UTILITY OF MOLECULAR LESIONS IN SMZL
Previous sections of this review have highlighted in SMZL the biased use of the IGHV1-2*04 allele, a cytogenetic and genomic landscape which overlaps that seen in other marginal zone and splenic lymphomas but is distinguished by a higher incidence of 7q deletion, KLF2 and NOTCH2 mutations, and a subgroup characterised by high genome-wide promoter methylation. The latter, together with NOTCH2 mutations, unmutated IGHV genes, TP53 abnormalities and a complex karyotype, has been reported in one or more studies to have adverse prognostic significance [24,31,38]. The European Society for Medical Oncology has recently published guidelines on the diagnosis, treatment and follow-up of MZ lymphomas. While these acknowledge the potential utility of some of the molecular features listed above as biomarkers, none is currently recommended for establishing the diagnosis, assessing prognosis or determining the timing or choice of therapy.
There are a number of reasons for this apparent paradox: firstly, there is a lack of routine access to many of the potential novel biomarkers; secondly, the rarity of SMZL and the paucity of clinical trial data dictate that studies have been relatively small and retrospective; and thirdly, many cohorts have included a large percentage of non-splenectomised cases in whom establishing a definitive diagnosis may be difficult, especially in cases presenting before the diagnostic features of HCL-v and SDRPL were well recognized. Confining cohorts to cases who have undergone splenectomy overcomes this problem but excludes cases with indolent disease not requiring treatment and those treated with other modalities. Nevertheless, preliminary data from ongoing studies suggest that one or more combinations of IGHV1-2*04 usage, deletion of 7q, NOTCH2 and KLF2 mutations and aberrant methylation will be sufficient to identify a subgroup of SMZL with clearly defined diagnostic and prognostic features. Whilst the role of minimal residual disease monitoring in the management of SMZL patients remains unclear [113], (epi)genomic lesions may have utility to track treatment response, perhaps in conjunction with the use of positron emission tomography scans.
A further benefit of genomic analyses is that, by demonstrating the importance of dysregulated NF-κB signalling, partially mediated through increased BCR signalling, they have provided the rationale for clinical trials of BTK and PI3K inhibitors in all three subtypes of MZL. Table 3 summarises response data for patients with SMZL who were refractory to, or relapsed after, one or more prior therapies, usually including an anti-CD20 antibody [114][115][116][117][118]. Overall response rates of 50%-65% are comparable to those seen in other MZL subtypes. It is not yet reported whether primary or acquired resistance to these agents is associated with mutations downstream of the PI3K and BCR pathways, but in vitro studies using MZL cell lines have demonstrated synergy between copanlisib and the BCL2 inhibitor venetoclax [119], and have shown that acquired copanlisib resistance, induced by prolonged exposure, was associated with decreased sensitivity to other PI3K inhibitors and to ibrutinib, and with upregulation of multiple signalling pathways [120].

FUTURE PERSPECTIVES
Our understanding of the molecular basis of many mature B-cell tumour types has deepened over recent decades, principally through the systematic application of novel technologies with ever increasing resolution and throughput. This information has revolutionized our understanding of the most prevalent mature B-cell lymphomas, yielding novel biological pathways and mechanisms, the ultimate result being improvements in patient stratification, management, and outcomes. In CLL and DLBCL, these studies have included several thousand patients analysed with WGS/WES, complemented with expansive transcriptomic and epigenomic datasets, and distilled with sophisticated computational approaches. Our understanding of SMZL lags far behind and will require significant investment to realise the translational potential of a granular understanding of the SMZL genome and its regulation. The next section will outline key areas for future research focus (summarised in Figure 5), culminating with an overview of two ongoing projects that hope to answer many of these questions.

Key research questions
The research community needs to continue to collaborate, creating bio-banked resources with high-quality fresh-frozen tumour material and matched normal cells. Currently, the literature does not include a single patient with genome-wide somatic mutational data. Kiel et al. [54] reported WGS without the analysis of matched germline material, precluding a meaningful analysis of the presence of non-coding somatic variation and the documentation of underlying molecular mechanisms, such as the presence of key mutational signatures and regions of chromothripsis and kataegis. There is a critical need to analyse germline material to identify SNPs associated with disease risk, and to profile sequential samples from SMZL patients as their disease evolves: at pre-treatment, with the development of refractory disease and, ultimately, at transformation. There might also be utility in using circulating free DNA as a non-invasive approach in the management of patients and in the identification of mutations, as has been done in other lymphomas [121,122].
Consequently, we should persist in our approach to extend our DNA methylation profiling of high-quality, purified SMZL material from large clinically annotated cohorts reflecting the clinical and biological heterogeneity of the disease, using high-resolution microarrays and reduced representation bisulfite sequencing approaches. Parallel analysis of expansive published data sets and the detailed fluorescence-activated cell sorting purification of additional normal B-cell populations that reflect both the maturation state and the MZ origin of the disease will be critical. This will likely identify further disease subgroups, extending the important study from Arribas et al. [25], which identified the PRC2 epitype using low-resolution array-based promoter DNA methylation and expression profiling. Moreover, a comparison between the methylome of SMZL and other mature B-cell neoplasms will likely provide valuable information with utility for disease classification, particularly for patients that are currently difficult to precisely diagnose. DNA methylation also has potential in identifying the proliferative history of a tumour cell, as passive accumulation of DNA methylation in repressed regions without detectable function is a feature that can be used as a clock of cellular proliferation [95,123].
Experiments focusing on complex karyotype SMZL cases, particularly given their high frequency, will provide novel biological and clinically relevant information, as complexity points to a key dysregulation of appropriate cell cycle control or DNA damage response pathways in these patients, and has been linked to poor survival in SMZL [31] and in other mature B-cell lymphomas [124][125][126], including in the context of novel targeted therapies [127,128]. Systematic analysis of the complex landscape of the SMZL genome, with high-density arrays or WGS, would provide a more granular view of the levels and types of complexity that define these patients, and would help clarify a disease-specific definition of complexity. This would both aid appropriate patient prognostication and identify key cases for functional analysis, particularly those exhibiting genomic complexity in the absence of established drivers, such as TP53 or ATM dysfunction.
Given its intricate association with genomic complexity, another attractive approach is the analysis of telomere structure and dynamics. Acute telomere attrition leads to the uncapping of chromatid ends, and in normal tissues results in the activation of senescence checkpoints, with a key pre-malignant tumour-suppressor function. Telomere attrition leads to intra- and inter-chromosomal end fusions, the formation of dicentric chromosomes with consequential breakage during anaphase, and genomic complexity, through the mechanisms of breakage-fusion-bridge formation [129]. The majority of human tumours exhibit eroded telomere length compared to the corresponding normal tissue [130]. In line with this, several mature B-cell tumour cohorts, particularly CLL, have been shown to contain a proportion of patients with the shortest telomeres, who are more likely to exhibit poor-risk genomic lesions, have unmutated IGHV genes, and harbour a complex genome and significantly poorer clinical outcome [131,132]. However, to date, preliminary telomere length analysis of SMZL has only been published in abstract form, showing an enrichment of short telomeres in patients with progressive disease [133]. This approach should be extended to large patient cohorts, and is likely to be highly informative, particularly given the aforementioned enrichment of cases with karyotypic complexity and the significant clinical heterogeneity that exists. There are a number of approaches to quantify telomere length, but one likely choice would be a PCR-based method, such as the MMQ-PCR assay [134], as this is highly scalable, cost effective and requires a small amount of DNA for analysis. However, given that MMQ-PCR quantifies only a mean telomere length, the single telomere length analysis (STELA) assay would be the ideal choice, as this allows the quantification of telomere length for a single chromosome in a single cell [135].
There are also approaches being developed, such as TeSLA and TCA.

Figure 5. The diagram outlines key areas for future research, starting with unbiased whole genome sequencing of matched germline-tumour samples across the disease history. Each time point will allow for the identification of key immunogenetic and cytogenetic subgroups, as well as mutational signatures. Whole genome sequencing is essential for the identification of non-coding mutations, with large collaborative cohorts enabling the identification of SNPs associated with disease risk. This unbiased approach will form the basis for more targeted studies, analysing additional contributory factors such as the epigenome of SMZL and telomere attrition mechanisms. The identification and analysis of suitable control MZ B-cells will be important work that will feed into wider studies, particularly epigenetic studies and determining the cell of origin. Synthesis of new information will allow the identification of biological sub-groups and direct downstream functional analyses. The ultimate goal is to translate newly acquired knowledge of the underlying molecular mechanism of SMZL for direct patient benefit: to improve differential diagnosis and aid in the discovery of novel therapeutic targets. SMZL: Splenic marginal zone lymphoma; MZ: marginal zone; CNAs: copy number alterations; WGS: whole genome sequencing; CBL-MZ: clonal B cell lymphocytosis of marginal zone origin; IGHV: immunoglobulin heavy chain variable region; FACS: fluorescence-activated cell sorting; TME: tumour microenvironment.

Ongoing studies
Whilst it is clear that SMZL research remains in its infancy, a close and collaborative network of clinicians and scientists is continuing to study the disease. Two ongoing projects are particularly exciting and promise to significantly advance our biological understanding of the disease. Firstly, IELSG46 (NCT02945319) is an observational study, led by Professor Davide Rossi, which aims to develop and validate an accurate prognostication model using integrated molecular profiling of > 300 treatment-naive SMZL patients, where diagnosis is based on spleen histology and > 5 years of follow-up is available on all patients. The initial plans were to perform targeted sequencing to detect somatic gene mutations and permit IGHV analysis, and FISH to define 7q deletion status, culminating with complex statistical methods to develop hierarchical models to predict overall survival. However, the project has now developed to include the genome-wide analysis of mutations, CNAs and gene expression. This exciting multicentre study is ongoing and has only been presented in abstract form [138,139]. In 2019, the consortium presented their early work. Profiling a cohort of 382 SMZL patients and employing machine learning approaches, the authors defined four molecular clusters (MC) with distinct outcomes. MC1 and MC2 were driven by KLF2 and NOTCH2 mutations, respectively, and enriched for IGHV1-2*04 and deletion of 7q. MC3 and MC4 were defined by KMT2D and TP53/ATM mutations, respectively. Clinical correlations demonstrated that MC1 and MC2 were those that exhibited inferior survival. The authors also reported a more restricted analysis of gene expression data suggesting that patients can be clustered based on transcriptomic signatures. This study, and a host of subsequent manuscripts, will be truly important in defining the prognostic landscape of the disease, and will be an important step towards identifying patients for precision medicine approaches.
Our group is also coordinating a large international study, distinct from IELSG46, focusing on detailed DNA methylation profiling of SMZL cases and positioning these cases within a framework including a spectrum of normal B-cell sub-sets and a myriad of other mature B-cell tumours. These data will then be integrated with genome-wide mutational and CNA data, telomere dynamics and, for key biological subgroups, the proliferative history based on methylation arrays, transcriptomic and chromatin accessibility maps. The aims are to: (1) further characterize the epigenetic landscape of SMZL, and define new patient subgroups based on state-of-the-art approaches; (2) define patient subgroups based on their relationship to normal B-cell populations, providing insights into the potential cell-of-origin of key patient subgroups; (3) compare SMZL to the DNA methylation patterns of other mature B-cell tumours, detailing approaches that might aid in differential diagnosis; and (4) compare CBL-MZ to SMZL and aligned conditions, allowing more accurate disease classification and improved prediction of those CBL-MZ cases destined to progress to overt lymphoma. These are two examples of key data sets that will emerge in the literature in the coming months and should advance our understanding of the molecular lesions that define the disease and provide further support for their clinical utility.

CONCLUSIONS
Over recent years, technological advances have provided exponential increases in sensitivity and specificity, thereby transforming our understanding of many of the more prevalent mature B-cell tumours, providing compelling evidence of their clinical utility and facilitating the development of cost-effective diagnostic procedures with high accuracy. In SMZL, the picture is less clear. Most cases of SMZL remain indolent with protracted overall survival, but biomarkers are needed to help define the third of patients that exhibit aggressive disease with the possibility of transformation to a more lethal lymphoma. The work outlined herein has improved our understanding of the biological basis of SMZL, with key discoveries being: (1) a highly restricted IGHV gene repertoire, including selective usage of the IGHV1-2*04 allele in 30% of cases; (2) recurrent CNAs (deletion of 7q being the most frequent) and somatic mutations, the latter of which appear to cluster within biologically relevant pathways, such as Notch, NF-κB, BCR and TLR signalling; and (3) an epigenetically defined disease sub-group with PRC2 activation.
These observations are translationally important with clear clinical utility and will help with the diagnosis and risk-adapted patient stratification in the coming years if the appropriate testing procedures can be agreed upon. This is a challenging aim, needing the most appropriate assays that identify the most clinically informative immunogenetic and (epi)genomic information. In our lab, we employ two resequencing panels, one for IGHV analysis and another to identify gene mutations associated with survival (KLF2, NOTCH2, TP53, TNFAIP3) and those that might provide information on differential diagnosis [WM (MYD88), HCL (BRAF), CLL (NOTCH1, SF3B1), NMZL (PTPRD), HCL-v (MAP2K1)]. We spike our panel design with a backbone of genome-wide probes (200 kb resolution) for CNA identification. Our view is that this approach is a suitable balance between genome coverage and the sequencing depth needed to identify sub-clonal mutations, with appropriate reagent and sequencing costs, and the time and computational resources required for analysis. As our understanding of the genome increases, these gene panels could be expanded with further prognostically relevant lesions, those that might be therapeutic targets, and those that might help monitor response. Ultimately, it might be that comprehensive multi-omics analysis (including epigenetic profiling) will most accurately identify patients for precision medicine, but its cost will need to be offset by financial savings through selection of the most appropriate treatment choice. Independent of the technical approach, each molecular marker would need to be validated across multiple discovery and validation cohorts, in retrospective historical studies and in prospective clinical trials, resulting in a robust and reproducible association with a disease outcome. Assays would then need to be analytically and clinically validated, before international harmonization, regulatory approval, and ongoing assessment through accreditation.
Guidelines for interpretation and reporting would be required, and training would be needed to facilitate appropriate clinical referral and accurate communication with patients.