Enrichment of B cell receptor signaling and epidermal growth factor receptor pathways in monoclonal gammopathy of undetermined significance: a genome-wide genetic interaction study

Background Recent identification of 10 germline variants predisposing to monoclonal gammopathy of undetermined significance (MGUS) explicates genetic dependency of this asymptomatic precursor condition with multiple myeloma (MM). Yet much of genetic burden as well as functional links remain unexplained. We propose a workflow to expand the search for susceptibility loci with genome-wide interaction and for subsequent identification of genetic clusters and pathways. Methods Polygenic interaction analysis on 243 cases/1285 controls identified 14 paired risk loci belonging to unique chromosomal bands which were then replicated in two independent sets (case only study, 82 individuals; case/control study 236 cases/ 2484 controls). Further investigation on gene-set enrichment, regulatory pathway and genetic network was carried out with stand-alone in silico tools separately for both interaction and genome-wide association study-detected risk loci. Results Intronic-PREX1 (20q13.13), a reported locus predisposing to MM was confirmed to have contribution to excess MGUS risk in interaction with SETBP1, a well-established candidate predisposing to myeloid malignancies. Pathway enrichment showed B cell receptor signaling pathway (P < 5.3 × 10− 3) downstream to allograft rejection pathway (P < 5.6 × 10− 4) and autoimmune thyroid disease pathway (P < 9.3 × 10− 4) as well as epidermal growth factor receptor regulation pathway (P < 2.4 × 10− 2) to be differentially regulated. Oncogene ALK and CDH2 were also identified to be moderately interacting with rs10251201 and rs16966921, two previously reported risk loci for MGUS. Conclusions We described novel pathways and variants potentially causal for MGUS. The methodology thus proposed to facilitate our search streamlines risk locus-based interaction, genetic network and pathway enrichment analyses. Electronic supplementary material The online version of this article (10.1186/s10020-018-0031-8) contains supplementary material, which is available to authorized users.


Background
Monoclonal gammopathy of undetermined significance (MGUS) is a premalignant phase of multiple myeloma (MM) and the most common plasma cell dyscrasia present in as high as 3.2% of general population below 50 years of age and up to 5.3% for population aged 70 years or older (Kyle et al. 2018). At an approximate annual rate of 1% MGUS progresses to MM, lymphoplasmacytic lymphoma/ Waldenström macroglobulinemia or amyloid light chain amyloidosis (AL amyloidosis) (Kyle et al. 2006;Dispenzieri et al. 2010). Apart from the knowledge of familial clustering from population studies, there has not been much development in deciphering the genetic architecture of MGUS (Greenberg et al. 2012;Frank et al. 2016;Landgren et al. 2009). Of late, 17 risk loci have been found predisposing to MM among which 9 are supposed to share association with MGUS (Mitchell et al. 2016; Thomsen et al. 2017;Weinhold et al. 2014;Greenberg et al. 2013).
A GWAS from our group on 243 German individuals with MGUS has recently discovered 10 susceptibility loci with varying degree of significance (Thomsen et al. 2017). GWAS on a genetic heterogeneous disorder such as MGUS would possibly be subject to 'missing heritability' which asserts, an association study can account for merely a small proportion of true causal genetic variations due to low detection power (Manolio et al. 2009). Linear interactions on paired single nucleotide polymorphisms (SNPs) can thus be used to inflate genomic resolution of test space and find novel risk locus pairs which otherwise would remain undetected. In a case/control approach statistical interaction models explain extra additive effects due to co-occurrences of two variants on top of the fixed effects (Cordell 2009). We use an unbiased statistic with high convergence rate, and observe increased detection power for genome-wide interacting pairs (Wellek and Ziegler 2009). Gene-set enrichment along with genome-wide pathway analysis that measures association of a phenotype to a predefined genetic cluster has become rather common in extending biological understanding of differentially regulated pathways affecting quantitative traits and phenotype variation (Ramanan et al. 2012;Khatri et al. 2012). This approach of mapping cancer susceptibility regions and detection of novel pathways in site-specific cancers has opened opportunities to new therapeutic approaches (Yi et al. 2017;Lee and Gyu Song 2015). Here we report a case/ control interaction study on a discovery population of 243 MGUS cases and 1285 controls. The findings are subsequently supported by a case-only interaction study on 82 cases and finally confirmed with case/control interaction analysis on another population of 236 cases and 2484 controls. We pursue pathway enrichment analysis on a subsequent stage, based on both our current interaction study and the previous GWAS. To this end we use three in silico tools to discover novel pathways, corresponding genetic loci and clusters predisposing to MGUS taking into account significance of previously detected risk loci.

Ethics
Collection of samples and clinicopathological information from subjects was undertaken with the relevant ethical board approvals in accordance with the tenets of the Declaration of Helsinki. All subjects provided written informed consent. Ethical approval was obtained from Ethics committee of medicine faculty, University of Heidelberg, Heidelberg, Germany.

Datasets
The University Clinic of Heidelberg and the University Clinic Ulm discovered 243 MGUS cases among which 114 (47%) were males with a mean age at diagnosis of 62 years, SD ± 11 years. The Ig isotype distribution was 72% IgG, 12% IgA, and 16% other Ig isotypes (Thomsen et al. 2017;Weinhold et al. 2014). These MGUS cases were identified during diagnostic work-up of a different unrelated disease. Out of the 243 cases, two developed MM within 3 years after sampling and 46 individuals were seen only at the time of sampling. IgM MGUS cases were excluded from the Heidelberg/Ulm cohort. For replication, 236/82 MGUS patients were identified for case-control/case-only replication in Essen within the Heinz -Nixdorf Recall (HNR) study (Schmermund et al. 2002). About 61% of the replication set were males with the mean age at diagnosis of 64 years, SD ± 9 years. Detection of MGUS was based on internationally accepted criteria (Criteria for the classification of monoclonal gammopathies 2003): monoclonal protein concentration less than 30 g/l, less than 10% monoclonal plasma cells in bone marrow, normal plasma calcium and kidney function and no bone destruction or anemia. The reference population for the Heidelberg/Ulm set consisted of 1285 German individuals from the HNR study of whom 59% were males with a mean age at sampling of 60 years, SD ± 8 years (Schmermund et al. 2002). The reference population for the Essen set was also recruited within the HNR study, adding up to 2484 individuals, not overlapping with the reference population for the Heidelberg set.
Illumina HumanOmniExpress-12v1.1 chip arrays were used for genotyping the Heidelberg/Ulm MGUS set and the corresponding control set was genotyped using the Illumina HumanOmniExpress-12v.1.0 chip array (Schmermund et al. 2002 The amount of overlaps among the SNPs genotyped is reported in Additional file 2. Quality control assessment for genotyping has previously been described and all variants and samples considered for the final analysis passed the predefined thresholds (Broderick et al. 2011;Chubb et al. 2013).

Quality control of GWAS samples
We excluded SNPs with less than 1% minor allele frequency, and also if the call rate was less than 99% in cases and controls. Genotype distribution in controls was tested for Hardy-Weinberg equilibrium with a χ2 test with 1 degree of freedom or Fisher's exact test and SNPs with P value lower than 10 − 5 was removed. This stringent quality control filtering produced 489,555 autosomal SNPs from Heidelberg/Ulm GWAS set for further analysis common to all of the 243 cases and 1285 controls and also 82 HNR samples (cases) genotyped on the same array. Application of similar quality control protocol rendered 195,490 autosomal SNPs present in 236 cases and 2484 controls of Essen GWAS set.

Statistical and bioinformatics analysis
Genome-wide case-control interaction analysis with CASSI and INTERSNP Statistical analyses were performed with R version 3.3.1, CASSI version 3, and INTERSNP v1.15 (Table 1). All pairwise combinations of post-quality controlled selected SNPs from Heidelberg/Ulm GWAS were tested for interactions on genome-wide scale. The interaction term of a pairwise fixed effects logistic regression model involving binary regressors may be interpreted to be the ratio of the odds ratios of association between alleles in cases to the odds ratios of association between alleles in controls. Hence the odds ratio of association between alleles in cases will be equal to that of the population interaction odds ratio given the odds ratio of association between alleles in controls are equal to 1. Epistasis enforces a logistic regression with linearity assumption among the fixed single marker effects and interaction effect of the two markers (Karkkainen et al. 2015;Purcell et al. 2007). This includes all 489,555 SNPs and approximately N = 1.2 × 10 11 interactions. However, linkage disequilibrium (LD) among the SNPs, if not taken into account in argument space, possibly renders a large portion of tests redundant which destabilizes the test statistic. To avoid this deflation of test power, we employed Wellek-Ziegler statistic which promises high variance stability and unbiased estimate on the condition of Hardy-Weinberg equilibrium conformity (Wellek and Ziegler 2009). CASSI Genome-Wide Interaction Analysis software is used to this end and we benefit from the computational efficiency thus achieved. Interaction test in CASSI (−wz) was applied with default initial pruning on single marker association P value at 10 − 3 level of significance which decreases the computational burden (Ueki and Cordell 2012). This step selected 2.8 × 10 7 SNP pairs to be tested for interaction. For replication of interactions in the discovery set, reanalysis with full log-linear model was obtained with INTERSNP which selects a user-predefined number of top single marker hits as candidate SNPs for subsequent interaction tests. On a predefined level of 5000 selections, INTERSNP produced approximately N = 1.25 × 10 7 variant pairs (Herold et al. 2009).

Case-only analysis with CASSI
We chose the additional smaller cohort of 82 individuals with MGUS for a validation study and adopt a case-only approach. Case-only approach in SNP -SNP interaction studies answers the question of association (or correlation) between alleles of two loci irrespective of phenotypic categorization. Assuming the general population (control group) is devoid of any correlation between the specific loci, the case-only approach ensures evidence of gene-gene functional pair interactions with higher power in detection due to reduced number of multiple testing.
As the risk loci are tested against each other rather than a reference set, allelic co-occurrence indicates bi-allelic interaction jointly predisposing phenotype changes (Ueki and Cordell 2012). However, case-only approach makes a quite strong assumption of no apparent genetic correlation among controls which is difficult to justify under the effects of LD which calls for cautious investigation of the results which further needs to be tested against a control population as reference to avoid misinterpretation due to the assumption. Case-only interaction test with CASSI with Wellek-Ziegler statistics (−wz-cc-only) on post quality-controlled SNPs rendered transformed Fisher's Z statistics on 4.4 × 10 5 out of a total of 1.2 × 10 11 interaction pairs. We re-observed 31 genome-wide significant pairs from the discovery set. The number of overlaps detected between the two tested interaction population sets with high power extracted from the case-only set ensured viability of the replication GWAS data being used for replication (Wu et al. 2010).

Case-control replication study with CASSI
The Essen GWAS data consisting of 236 cases and 2484 controls was chosen for the case-control replication of the results obtained. The interaction tests for replication were performed similarly as explained before using Wellek-Ziegler statistic. Approximately N = 8.2 × 10 6 tests were performed among the 195,490 SNPs after application of initial quality control. Among the quality-controlled SNPs from the Essen GWAS, we observed 188,198 overlaps with that of the Heidelberg/ Ulm discovery GWAS. With approximately 35% cross table commonality we do not expect to see chance findings overlapping with high significance which takes care of the 'winner's curse' (Shi et al. 2016;Poirier et al. 2015;Jiang and Yu 2016).

Network analysis with STRING
Two-locus epistasis evidence is verified by network construction with the in silico tool STRING. It is a web-based data repository dedicated to protein-protein interactions among 2.5 million proteins among 630 organisms including mammalians (version 8.0) (Jensen et al. 2009). Scores for each of the interacting protein pair is computed with combined scores integrating discovery probabilities from different sources adjusting for arbitrary false positives. Subsequently pathway enrichment is performed with the background repository for the interaction identified loci.

Gene prioritization and pathway analysis of the MGUS GWAS
Due to LD, causal regulatory elements often remain unidentified in genome-wide association studies. To gain biological insight, pathway analysis via gene-set enrichment is traditionally carried out to integrate signals from different sentinel variants and linked SNPs. We applied three tools designed for biological interpretation of GWAS: Pathway Scoring Algorithm (PASCAL), Data-driven Expression Prioritized Integration for Complex Traits (DE-PICT) and Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA). PASCAL is a pathway analysis tool developed for association summary statistics for variants annotated to genes. PASCAL uses maximum of chi-squares (MOCS) or sum of chi-squares (SOCS) statistics with null distribution as Gamma with varying degrees of freedom (Lamparter et al. 2016). Although pathway scoring was performed using both MOCS and SOCS statistics, the results were comparable and SOCS produced deflated significance levels with similar order as of that by MOCS.
With empirical sampling and subsequent supervised clustering according to the significance levels introduced by single marker association tests, it utilizes the idea of gene fusions i.e. clusters of correlated genes, for which the variants are in LD. We performed gene set enrichment analysis (GSEA) using single marker P values from our published GWAS on MGUS and SNPs were mapped to closest genes searched 20 kb upstream and downstream from the gene. However, SNPs, corresponding to several genes responsible for regulating a single pathway, if they were in LD, were used to cluster the corresponding genes as single genetic entities for the given pathway and were called 'gene fusions'. Kyoto encyclopedia of genes and genomes (KEGG), REACTOME and BioCarta libraries were used for pathway enrichment. For further investigation of the distinguished pathways, we used DEPICT, a gene prioritization, tissue enrichment and pathway analysis tool for biological interpretation of GWASs distributed by Broad institute, which employs Python shell script on a Java platform for efficiency (Pers et al. 2015). This framework is built on sophisticated predictive modeling employing guilt by association on reconstituted gene sets to perform gene prioritization and GSEA (van Rheenen et al. 2016). Depict derives enrichment analysis viability from 77,840 gene expression datasets. We used DEPICT's gene set knowledge base derived from Gene ontology (GO), Ensembl, The Mammalian Phenotype (MP), KEGG and REACTOME and followed analogous analyses.
Literature on pathway analysis with MAGENTA is populous (Koster et al. 2014;Wang et al. 2012;Duncan et al. 2014). We executed GSEA in MAGENTA from MATLAB platform as a replication tool. For pathway analysis, single marker association P values and chromosomal regions were annotated to genes corresponding to a pre-existing chromosomal range and enrichment computation was applied on non-confounders (Segrè et al. 2010;Shim et al. 2015). Pathway annotations were extracted in back end from GO, KEGG, Protein Analysis through Evolutionary Relationships (PANTHER), BioCarta and REAC-TOME databases. Across the three platforms thus used to prioritize pathways, p values are pooled subject to a total overlap and are combined using empirical Brown's method. Traditionally p values thus analyzed are assumed to be dependent and thus the method applied provided a conservative restriction. All additional analyses were performed with R version 3.3.1.

Genome-wide interaction analysis identifies 23 variant pairs
Two machine level in silico tools, CASSI and INTERSNP were employed to explore curated genome-wide interaction on a set of 243 German individuals diagnosed with MGUS in a case-control design. Among the 489,555 genotyped post-quality control SNPs, a brute search algorithm required 1.2 × 10 11 tests at a whole-genome scale to perform a multivariate log-linear association test. However, CASSI restricted the runs to approximately 2.8 × 10 7 overall tests at the system-defined single marker association test threshold of P = 10 −3 . As approximately 2.8 × 10 7 tests were performed, Bonferroni corrected level of global threshold of significance was determined to be 5 × 10 −10 which restricts the family-wise error rate (FWER) at 1% ( Table 1). Selection of chance findings over significant variants was thus avoided by applying Bonferroni correction for multiple testing. At this empirically determined P < 5 × 10 −10 , CASSI reported 561 significant variant pairs. The top ranked interaction (rs12471071 [2q37] -rs1385453 [9p24]) from the discovery set had a Wellek-Zeigler (W-Z) P = 4.19 × 10 −13 and a simple logistic regression P = 8.67 × 10 −11 (Additional file 3). Although the CASSI algorithm detected several common variant SNP pairs to be genome-wide significant, previous researches demonstrate that such findings were often subject to false discovery.
Interaction tests in INTERSNP were employed according to subjective pre-selection of the 5000 most significantly associated SNPs from the single marker tests (P = 2.47 × 10 −3 ) which restricted the number of tests performed to an approximate 1.25 × 10 7 pairs. Subsequently, 693 unique interaction pairs were identified at 5 × 10 −5 significance level where none of the observations reached genome-wide threshold of approximate 8 × 10 −10 calculated taking Bonferroni correction into account on 99% confidence level. The top interaction was found to be between rs10099120 and rs3738270 with P = 9.05 × 10 −8 and we note, rs10099120 is located in the intronic region of RALYL and rs3738270 corresponds to a missense mutation on IGFN1 (Additional file 4). Overall 52 common variant pairs were co-discovered for both INTERSNP and CASSI.
A follow-up case-only analysis by CASSI on the 82 cases genotyped, rendered approximately 4.4 × 10 5 overall tests after initial single marker test shrinkage similar to that described above. The most significant interaction (rs4433825 [16p13] -rs2295179 [20p12]) showed a case-only P = 3.35 × 10 −24 against W-Z case-control P = 1.5 × 10 −12 . At 5 × 10 −10 level we detected 352 variant pairs replicated in the discovery set with varying levels of significance (Additional file 5). The order of significance observed in the follow-up analysis is consistent with the literature of case-only studies bolstering higher detection power due to the inherent mathematical assumption although we decided not to interpret the results as evidence of functional relation between variants due to the difference in number of overlapping SNP pairs tested with the discovery set. As the single marker pre-selection criteria prunes significant number of SNPs, the observed overlaps are ensured to have presumably higher individual fixed effects which is devoid of the hypotheses (Ueki and Cordell 2012). Nonetheless it confirms viability of the case-control replication study with a larger genetically overlapping sample(s).
Next, we evaluated W-Z interactions by CASSI in the case-control replication set consisting 8.2 × 10 6 test pairs with an inflation factor of 1.0151 (Additional file 6). We were able to replicate 23 out of all 561 genome-wide significant variant pairs of the discovery set which are annotated to same chromosomal regions (Table 2, Fig. 1). The top interaction was found among variants annotated to TNC and CRYL1 corresponding to 9q33 and 13q12 (rs10118040 -rs7337130, W-Z P = 6.9 × 10 −11 and rs1330368 -rs7337231, W-Z P = 2.48 × 10 −8 , respectively). Among the 23 replications, 14 were unique regions and there were 5 regions with multiple unique interactions. Interestingly, SETBP1 and PREX1 interaction at 18q12 and 20q13 were represented by 6 SNP-SNP pairwise overlaps with LD coefficient of r 2 < 0.2 between SNPs belonging to corresponding regions. The locus at 20q13 has already been identified as a predisposing locus for MM and as an expression and methylation quantitative trait locus at PREX1 without affecting an active promoter site (Mitchell et al. 2016). SETBP1 is a well-established candidate gene harboring somatic mutation in various myeloid malignancies including secondary acute myeloid leukemia (sAML) and chronic myelomonocytic leukemia (CMML) (Makishima et al. 2013). Previously our group had identified 10 common variant risk loci for MGUS (Thomsen et al. 2017), among which two SNPs showed noteworthy interactions in our analysis: rs10251201 (7p21, GLCCI1) with rs1104869 (2p23, ALK), W-Z P = 8.75 × 10 −7 and rs16966921 (18q12, GALNT1) with rs8092870 (18q12, CDH2), W-Z P = 1.71 × 10 −7 . Although both the latter two SNPs are located at 18q12, they are not in LD ( r 2 < 0.2).

ErbB signaling and B cell receptor signaling are enriched based on genetic network analysis
A partnership dependence structure of functional network was constructed with the risk variants from the final overlapping set subject to identifiable annotation from the interaction analyses. Twenty-six such reconstituted genes were used as nodes which together with first order interacting genes created scaffolding for further enrichment analysis (Fig. 2). Thirty-six potentially differentially regulated pathways were identified (Additional file 7). Among them were 18 enriched pathways at 0.01 level of significance, with as many as 5 gene nodes downstream to KEGG ErbB signaling pathway (P = 7.09 × 10 −5 ) and 3 gene nodes downstream to KEGG B cell receptor signaling pathway (P = 5.32 × 10 −3 ) were found to be the two most significant pathways.
GSEA and pathway analysis confirms enrichment of ErbB cascade and detects pathways upstream to B cell receptor signaling in the MGUS GWAS Similar to genome wide-association and interaction studies, pathway enrichment for MGUS is still unexplored in literature due to obvious limitations regarding caveats in identification and inclusion of adequate MGUS cases. We identified 65 enriched pathways at 95% confidence (19 at 99%) employing PASCAL (Additional file 8).
For confirmation of our results we employed MAGENTA on the same set and detected 111 functionally enriched pathways significant at 95% level (22 at 99%) (Additional file 9). Although 28 overlapping pathways between the two algorithms used were discovered with a combined corrected P < 0.025 (Additional file 10), we wanted to extend our search introducing more detection power with curated microarray data. Further gene set enrichment analysis (GSEA) and pathway enrichment with DEPICT identified 99 pathways at the suggestive threshold of 10 −5 (4 at With a combined analysis throughout the three algorithms, we observe 9 pathways with varying levels of significance (Table 3). Among the overlapping pathways, KEGG allograft rejection pathway (combined P = 1.60 × 10 −6 ) and KEGG autoimmune thyroid disease pathway (combined P = 5.41 × 10 −6 ), both related to B cell receptor signaling pathway, were the two most significant ones. Thus we determine that the B-cell receptor signaling pathway and EGFR regulatory pathway are present in both the interaction and genome-wide association study-oriented pathway analyses with statistical significance. Fig. 2 Genetic Interaction network. A network of 26 identified genes annotated to risk loci with added predicted genes in interaction. All nodes represent first order interaction. Colored edges convey status of predicted network edge correspondingly cyan, curated database; magenta, experimentally determined; forest green, gene neighborhood; red, gene fusion; navy blue, gene co-occurrence; lawn green, text mining; black, co-expression; lavender indigo, protein homology. Node color signifies protein functionality. Additional nodes are considered based on prediction score ≥ 0.9 (for more details, refer to STRING data base)

Discussion
Here we investigated MGUS, the preliminary phase of the lymphoproliferative neoplasm MM, assuming biological evidence of genetic burden to be spread across the whole genome spectrum. Performing genome-wide interaction analysis with case/control and case-only data together with subsequent follow-ups, our design essentially narrows down brute-force search to rather sizable genomic regions. Extending from the methodology developed by Ueki and colleagues for systematic implementation of Wellek -Zeigler statistics in interaction tests (Ueki and Cordell 2012), we propose a workflow to integrate statistical findings with biological knowledge base. Streamlining detection of risk loci with enriched protein-protein interacting networks to discover differentially regulated novel pathways facilitates understanding of disease mechanisms and congregates statistical evidence with biologically interpretable information.
Rather than simple bi-allelic association estimated with a dichotomous logistic model, we employed Pearson's product-moment correlation coefficient r 2 for association test. In simple linear association models, loss in precision is surmountable subject to non-conformity to Hardy-Weinberg equilibrium, especially while estimating for variant pairs in high LD. Wellek and Zeigler established genotype-based estimator of pearsonian r to be unbiased and circumstantial loss in information for unphased genotypes to be considerably lower than the haplotype-based estimators (Wellek and Ziegler 2009). Hence, we used genotype-based W-Z statistic, which employs a variance stabilizing Fisher's z transformation and produces robust estimates with high convergence rate.
Literature recognizes statistical interaction and functional interaction among genes/proteins exclusively, as the former stands for genetic association and the latter signifies biological process dependency. Thus, translation of statistical evidence to interpretable genetic functional involvement is of utmost importance. Creating node-based protein networks with STRING enabled us to visualize statistical clusters of genes in inter-play among the interacting genomic loci and observe the enriched biological processes that confer to them relating small genetic hubs created by clusters of genes specific to pathways. Consequently, we executed gene-prioritization and enrichment analysis accumulating tests in the previously published MGUS GWAS data from three different algorithms, PASCAL, MAGENTA and DEPICT, that engage different repositories. Observed overlaps therefore served as strong evidence connecting interacting common variant loci to the pathways via genetic networks.
We detected KEGG allograft rejection (hsa05330) and autoimmune thyroid disease (hsa05320) pathways with highest combined power from all three GWAS enrichment algorithms. Conspicuously B cell receptor signaling which is enriched in our genetic network is also downstream to both of the pathways. This pathway is shown to be dependent on mitogen-activated protein kinase signaling (MAPK), phosphatidylinositol-3 kinase and protein kinase B signaling (PI3K-Akt), nuclear factor κB signaling (NF-κB) and calcium signaling pathways. It is also dependent on adhesion molecule induced mechanisms that mediate immune tolerance in T cell receptor signaling cascade. Detection of ErbB signaling (hsa04012) in genetic network enrichment is also supported by detection of EGFR downregulation pathway from our pathway analysis. We identified ErbB4 (HER4) to be a high-risk locus interacting with 15q22 and 9p23. ErbB4 is one of the four epidermal growth factor receptor (EGFR) family members with tyrosine kinase activity. In both cancerous and non-cancerous cells, EGFR plays a crucial role in controlling key cellular pathways influencing cell proliferation, differentiation and development through MAPK and PI3K/Akt pathways and overexpression of which is associated to multiple cancer types (Yarden and Sliwkowski 2001). PTPRD has been characterized as a tumor suppressor gene in MM and a homozygous deletion in PTPRD encoding locus is known to modulate phosphorylation of STAT3 that promotes IL6 signaling (Lohr et al. 2014). Whereas, RORɑ is a regulator of circadian clock, responsible for cytokine secretion especially interleukins (Paiva et al. n.d.). Contextually, with a mechanistic aggression, circulating MM tumor cells undergo circadian rhythm dependent selective egression. Circadian rhythm was one of the suggestively selected enriched pathways encompassing the genetic network (Additional file 7, p = 1.38 × 10 −2 ). Involvement of HER4 in cancer has yet not been comprehensively addressed, although it has been shown to be overexpressed in colorectal cancer and postulated to promote carcinogenesis in general (Lau et al. 2014;Williams et al. n.d.).
We find rs10251201, one of the previously identified risk loci for MGUS with moderate significance, annotated to 36 kb 5′ to GLCCT1 in interaction with rs1104869 mapping to an intronic region of an oncogene, anaplastic lymphoma receptor tyrosine kinase (ALK). Notably ALK amplification, mutations and especially chromosomal rearrangements have been found in several cancers (Chiarle et al. 2008). Another previously identified MGUS risk locus (rs16966921) annotated to GALNT1 was shown in our data to have moderately significant interaction with (rs8092870) annotated to cadherin 2 (CDH2), an adhesion molecule and downstream target of FGFR3 signaling pathway (Takehara et al. 2015). Cell adhesion is an integral part of cell surface interaction and is of two major types: cell-to-cell and cell-to-extracellular matrix (ECM). Cadherin family cell adhesion molecules play important roles in the formation and functions of cell-cell adhesions. CDH2 is associated with several neoplasias and has been reported to be overexpressed in MM co-existing with the t(4,14) translocation (Dring et al. 2004). Interestingly, we also identified a novel risk locus on another cadherin gene, CDH13 in an interaction with 9p22. CDH13 is not involved in cell-cell adhesion, but protects vascular endothelial cells from apoptosis due to oxidative stress and is found to be hypermethylated in myeloid leukemia, B-cell lymphomas among several other cancers (Andreeva and Kutuzov 2010).
At cell-ECM adhesions, the major transmembrane proteins are integrin heterodimers. Several integrins, including integrin-β1, −β7 and -α8 have been shown to play a role in MM cell adhesion, migration, invasion, bone marrow homing and drug resistance. We observed REACTOME integrin cell surface interactions pathway as a comparably significant hit from all three of our pathway analyses. Our assumption of interplay between cell adhesion and integrin pathways is also supported by the discovery of the REACTOME platelet aggregation (plug information) pathway, a crucial adhesion mechanism not only in normal hemostasis but also in pathophysiological processes such as inflammation, immune-mediated host defense and cancer metastasis. Integrin-αIIbβ3 plays a key role in platelet adhesion and aggregation (Ruggeri and Mendolicchio 2007).
Functional epistasis traditionally derives its appeal on the assumption of polygenic risk where genes in ensemble are supposed to accumulate larger deregulating impact than considered individually. Although direct aggregation of interaction test and enrichment analysis may seem tempting following a step-wise implementation procedure, caveats due to low statistical power render it unfeasible. Hence, a major limitation of this study is the low case numbers which mostly is due to the scarcity of identifiable individuals. MGUS being an asymptomatic condition, studies are dependent on indirect diagnosis in its entirety. We admit that detection power of our analysis is marginally compromised accounting for stringent selection criteria since pairwise interaction algorithms require large statistical power to avoid high false discovery rate and that may result subsequent rejection of false negative results. However, we have minimized this loss of information by analyzing gene set and pathway enrichment on GWAS summary statistics parallel to the interaction test. Fang et al. 2017 have recently proposed such a procedure on building pathway map based on linear interaction model. Although to make the later analysis viable, their study proposes selection of risk loci at a very nominal level of significance (Fang et al. 2017). Unfortunately, this rather permissive selection criterion makes their proposed BridGE algorithm unreliable as it may allow a large proportion of false positive findings. This calls for further careful inspection in the statistics to achieve robustness in inference.

Conclusion
In summary, we developed a method that unifies variant pair interaction with genetic networks and pathway enrichment. We also discovered evidence which supports that several signaling cascades including B cell receptor, epidermal growth receptor and cell adhesion related pathways play a role and are regulated via several interacting loci in development of MGUS that possibly explicates its further progression to MM.