Key genes and regulatory networks involved in the initiation, progression and invasion of colorectal cancer

Aim: Until now, identification of drug targets for treatment of patients with specific stages of colorectal cancer (CRC) has remained a challenging field of research. Herein, we aimed to identify the key genes and regulatory networks involved in each stage of CRC. Results: The results of gene expression profiles were integrated with protein–protein interaction networks, and topologically analyzed. The most important regulatory genes (e.g., CDK1, UBC, ESR1 and ATXN1) and signaling pathways (e.g., Wnt, MAPK and JAK-STAT) in CRC initiation, progression and metastasis were identified. In vitro analysis confirmed some in silico findings. Conclusion: Our study introduces functional hub genes, subnetworks, prioritizes signaling pathways and novel biomarkers in CRC that may guide further development of targeted therapy programs.

a comparative proteome analysis of primary CRC cell lines HCT-116 and its metastatic derivative E1 [6]. Most recently, we have identified HOXB family as the most significant functional modules in metastatic CRC samples using a network biology approach. Consequently, we proposed a rationally designed anti-HOXB7 peptide for preventing CRC metastasis [7]. Despite, these valuable studies, the benign-to-malignant transition of CRC is a complicated multistep process and needs to be further elucidated in greater details.
Nowadays, computational systems biology has become a focus of many researchers to decipher complex biological systems through integrating the experimental and computational approaches [8]. Among the currently available methods, reconstruction of protein-protein interaction (PPI) networks are widely used for identification of key genes and functional modules involved in the cancer initiation and progression [9,10]. There are several networkbased methodologies for modeling of biological systems that result in an identification of novel biomarkers in the human cancers and complex diseases [11]. To the best of our knowledge, no attempt has yet been made to monitor the dynamic changes of regulatory networks from normal to distant metastasis of the patients with CRC. Accordingly, no effective biomarkers have been introduced for detecting the early events of CRC metastasis. Hence, we assessed the gene expression changes of normal versus stage I (N-S I ), stage I versus stage II (S I -S II ), stage II versus stage III (S II -S III ) and stage III versus stage IV (S III -S IV ) of CRC cells to propose a biological model for CRC initiation, progression and invasion to other tissues. Additionally, the overlapping mechanisms between four studied CRC stages were elaborated in more details The observations uncovered that several common genes and subnetworks seem to be associated with CRC initiation and metastasis. These findings would help to gain insights into the major contributors involved in the regulation of CRC initiation, progression and invasion. Our study also provides novel potential biomarkers which can be considered to be therapeutic candidates for designing novel drugs with anti-CRC activity.

Materials & methods
Microarray gene expression data A gene expression dataset (Gene Expression Omnibus [GEO] accession number: GSE21510) consisting of total 148 microarray samples were chosen from (GEO; www.ncbi.nlm.nih.gov/geo/). The samples had been accumulated from LCM and homogenized tissues of patients with colorectal cancer. Subsequently, the samples separately normalized using robust multiarray average method under R 2.6.2 statistical software with affy package from BioConductor. There was only one sample for stage 0, therefore, we excluded this stage from the study. Differentially expressed genes (DEGs) were determined by using the GEO2R tool [12]. Fold change > 1 or < −1 and pvalue < 0.05 are the thresholds used in DEGs determination.

PPI networks
A comprehensive PPI was constructed for each stage using BisoGenet plugin of Cytoscape [13]. This plugin is a multitier application that builds the PPI based on the biomolecular relationships data curated from several PPI databases, including Database of Interacting Proteins, Biological General Repository for Interaction Datasets, Human Protein Reference Database, Biomolecular Interaction Network Database, Molecular INTeraction database (MINT) and IntAct [14][15][16][17][18][19]. In parallel, an intensive literature search was conducted to retrieve new PPIs from the data of most recent studies. The PPI networks were visualized and analyzed by Cytoscape and FunRich packages [20].

Identification of hub genes
Topological properties of the PPI networks were interpreted to detect the paramount functional hub genes. We used Network Analyzer, a network analysis plug-in of Cytoscape, to calculate the different topological parameters of resulted PPI networks. To this end, nine measures including Number of Undirected Edges, Degree, Betweenness Centrality, Clustering Coefficient, Closeness Centrality, Eccentricity, Neighborhood Connectivity, Topological Coefficient and Average Shortest Path Length were calculated for each PPI network.
Gene ontology (GO) & signaling pathway enrichment Gene ontology (GO) studies were carried out utilizing Biological Networks Gene Ontology tool, a flexible and extendable Cytoscape's plugin [21]. This plugin finds GO terms over-represented in the given biological networks. Enrichr server was used as an alternative tool for validating the results of Biological Networks Gene Ontology tool [22]. A p-value less than 0.05 was chosen as the cut-off criterion for statistically over-represented GO terms. A signaling pathway enrichment was performed using SPEED web tool to identify the paramount signaling pathways underlying CRC initiation and invasion [23]. This server is a signaling pathway annotation tool that performs pathway enrichments based on the data obtained from previous pathway perturbation experiments.In parallel, the signaling pathways corresponding to the DEGs were also checked with the Kyoto Encyclopedia of Genes and Genomes [24]. This database consists of throughput information about genomic, cellular pathways and chemical compounds.
Construction of DEG-GO network for S IV of CRC After determining DEGs of stage IV gene expression profile compared with the normal samples, a DEG-GO network was constructed to distinguish the major regulators of significantly over-represented GO terms. First, the up-regulated DEGs were separated and their biological roles were determined using DAVID server to understand the activated molecular mechanisms and signaling pathways in metastatic CRC cells compared with their adjacent normal tissue. The GO terms with p-value < 0.05 containing at least ten genes were considered to be significantly over-represented categories. Finally, a DEG-GO network was constructed and topological analysis of the network was performed using out-degree (the number of outgoing edges) and in-degree (the number of incoming edges) measures.

Identification of common regulatory networks between CRC stages
Stages 1-4 of CRC may share several common genes or modules and identification of these similarities can be helpful in CRC therapy programs. To this end, all DEGs obtained from the comparison of four stages with their adjacent normal tissues (with fold change >1 and <−1 and p-value < 0.05) were clustered using Venny 2.1 web tool [25]. The main subnetwork for both common up and down-regulated DEGs were separately extracted and interpreted. In parallel, GO analysis of common DEGs was conducted to understand the regulatory functions of these genes.

Cell culture, RNA extraction & real-time PCR
In order to validate the in silico findings, the expression level ATXN1 and CDK1, identified as hub genes in S IV of CRC, was quantified in human metastatic colon carcinoma cell line (SW620 cells) compared with NCM460, a normal human colon mucosal epithelial cell line. Briefly, SW620 and NCM460 cell lines were seeded in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin. Total RNA was extracted from SW620 and NCM460 cell lines (15-30 mg) using the miRNeasy Kit according to the manufacturer's instructions (Qiagen, USA). Subsequently, cDNA synthesis was performed utilizing a commercial cDNA synthesis kit (2-step RT-PCR kit, Vivantis, USA). The standard SYBR green-based quantitative real-time PCR (qRT-PCR) analyses were carried out in Real-time PCR Detection System ABI 7300 (Applied Biosystems, USA). Primer sequences of ATXN1, CDK1 and GAPDH, as internal control, are as follows: ATXN1-F (5 CCCACCTTTCCCCTGCTGCG-3 ), ATXN1-R (5 -GGTTCCTCCCCCGGGTCTCC-3 ), CDK1-F (5 -GGAAACCAGGAAGCCTAGCA-3 ), CDK1-R (5 -TGATTCAGTGCCATTTTGCC-3 ), GAPDH-F (5 -TCCACCACCCTGTTGCTGTAG-3 ) and GAPDH-R (5 -GGTTCCTCCCCCGGGTCTCC-3 ). PCR cycling conditions were conducted in a total volume of 20 μl and containing 10 pmol of each primer and 2 μl of diluted cDNA template (800 ng cDNA). The thermal cycler conditions were composed of an initial step at 95 • C for 5 min followed by 45 cycles of 95 • C for 15 s, 60 • C for 1 min and 72 • C for 30 s and a step of 82 • C for 5 s. To generate standard curves, the tenfold serial dilutions covering a range of the eluted cDNA was considered as a template in SYBR green qRT-PCR.

Statistical analysis
Statistical analysis of in vitro experiment was done using GraphPad Prism version 4 (GraphPad Software, CA, USA). P-values were calculated by two-sided Student's t-test and the values below 0.05 were considered to be significant. PPI networks (data not shown). The main subnetworks were extracted from each PPI network which is illustrated in Supplementary Figure 1. CDK1, UBC, ESR1 and ATXN1 genes had the highest degree of connectivity in the main subnetworks of S I , S II , S III and S IV , respectively.

PPI networks properties
Hub genes of each PPI network Topological analysis of the PPI networks revealed several hub genes as major regulators of CRC initiation, progression and invasion. The results of all topological measures had a relatively high overlap. Interestingly, many of cell cyclerelated genes were identified as hub genes in their corresponding PPI networks. The top ten hub genes derived from interpretation of each PPI network are listed in Table 1.

Cell cycle genes are remarkably involved in CRC initiation & progression
The results of GO demonstrated that, except for S II , the cell cycle pathway is remarkably involved in CRC initiation as well as its progression. The UBI-proteasome pathway, an intracellular process involved in protein degradation, was identified as the most significant enriched term for the PPI network which made from the DEGs of S II . Moreover, signal transduction seems to be critical in the regulation of gene expression in S II and S III of CRC ( Figure 1). Signaling pathway enrichment was carried out utilizing SPEED and Kyoto Encyclopedia of Genes and Genomes tools. In this step, only up-regulated DEGs were considered for pathway enrichment analyses to decipher activated signaling pathways during CRC progression. The results showed that the Wnt signaling pathway was the most significant signaling pathway in S I -S III implying the indispensable role in initiation and progression of CRC from S I to S III . However, this signaling pathway was not significantly enriched in S IV suggesting that the metastasis of CRC is not considerably dependent on Wnt signaling pathway activity. The MAPK signaling pathway was shown to be especially active in S I and S III . In contrast, the JAK-STAT signaling pathway was substantially enriched in the DEGs of S II and S IV (Figure 2).

The S I -S IV of CRC share a high number of common genes
Clustering analysis of the DEGs obtained from each CRC stage compared to the respective normal tissues revealed that the four studied CRC stages (S I -S IV ) share a notable number of overlapped genes involved in multiple cellular processes. Prior to clustering, up-regulated and down-regulated DEGs were determined and separated. The results of clustering identified 817 (24.2%) common up-regulated DEGs and 490 (19%) common down-regulated DEGs in all studied CRC stages. Nevertheless, the highest proportion of common genes was observed between S I , S II and S III (1274 genes among up-regulated DEGs and 774 genes among down-regulated DEGs) ( Figure 5).  Functional annotation of these common genes indicated that the majority of common up-regulated genes were actively involved in regulation of cell cycle (94 genes) and mitosis (57 genes). However, several other pathways such as regulation of RNA processing, DNA repair and protein metabolism were also significantly enriched. In contrast, the common down-regulated DEGs obviously participated in the regulation of apoptosis and intracellular signaling cascade (results not shown). Detailed analysis of these genes indicated that 21 genes (out of 94 genes) have a direct role in cell cycle pathway ( Figure 6).
The main subnetwork was extracted from the PPI network constructed for common up-regulated DEGs and is depicted in Supplementary Figure 2. Calculation of several centrality measures such as Betweenness centrality and degree showed that CAND1 gene is the most significant hub gene in the subnetwork. However, some other genes including CDK2, EED, BRCA1, CDK1, BARD1, SRPK1, NPM1, PCNA, SRSF2, EZH2, HNRNPA1, HIST1H4C and HNRNPU were also distinguished as important hub genes.
The enriched cellular functions for the SI-SIV common genes were examined. Our observations revealed that cell proliferation and apoptosis are the most significant functional categories enriched in common up-regulated and down-regulated genes, respectively. We further compared the expression levels of these genes during S I -S IV of CRC. The results indicated that cell proliferation genes had the highest expression levels in S I and S IV . However, apoptosis-related genes are down-regulated in S III of CRC. ANLN and CDK1 genes had the highest overexpression level in all CRC stages compared with adjacent normal cells, whereas STAT1 was the most down-regulated gene in all CRC stages (Supplementary Figure 3).

ATXN1 &CDK1 have significantly deregulated in metastatic SW620 cells
The expression levels of ATXN1 and CDK1 were normalized to the level of the GAPDH housekeeping gene which showed the most homogenous expression in both SW620 and NCM460 cell lines. Consistent with our bioinformatics analyses, the results of real-time PCR indicated that the expression of ATXN1 (down-regulated, p-value < 0.001) and CDK1 (up-regulated, p-value < 0.05) have significantly altered in metastatic SW620 cells compared with the normal colon NCM460 cells (Figure 7). This in vitro validation suggests that the introduced novel biomarkers of CRC can be reliably considered to be promising candidates for therapeutic applications.

Discussion
Our study was aimed at modeling CRC initiation and invasion using deep analyses of microarray data. The results of this in silico study propose several potential candidates for the treatment of CRC as well as early diagnosis of this cancer. Consistent with previous studies, we found numerous important regulatory genes and transcription factors associated with CRC progression. These findings provide support for the efficiency of systems biology approaches in the identification of novel CRC biomarkers.
We found out that CDK1, UBC, ESR1 and ATXN1 are the most important hub genes of S I , S II , S III and S IV of CRC, respectively. Among these hub genes, CDK1 and UBC were up-regulated in CRC samples compared with adjacent normal cells, while ESR1 and ATXN1 were down-regulated. CDK1 is a member of Cyclin-dependent kinases (CDKs) family, which are serine/threonine kinases whose deregulation is associated with cancer progression. Hence, these cell cycle regulators have been targeted in a variety of human cancers [27]. Studies have shown that inhibition of CDK1 enhances the sensitivity of CRC cells to anticancer compounds through caspase-dependent signaling pathways [28,29]. The UBC, also called second ubiquitin-conjugating enzyme or E2, is a member of ubiquitin-proteasome degradation system which contributes to the regulation of cell proliferation by mediating the selective degradation of target proteins. This enzyme also catalyzes a particular type of polyubiquitination, 'Lys-63'-linked polyubiquitination, which does not result in protein degradation and leads to an enhanced cell survival through regulating cell cycle and DNA repair pathways [30,31]. Therefore, up-regulation of UBC may enhance the survival of CRC cell through maintaining the cell cycle and DNA repair abilities of the cell. ESR1 is a nuclear hormone receptor that encodes estrogen receptor α involved in the regulation eukaryotic cell proliferation and differentiation. Although, ESR1 is frequently amplified in breast cancer cells [32], methylation-dependent silencing of this gene by CGI (CpG island) has been previously reported [33]. CGI regions are unmethylated short and dispersed sequences of DNA with a high frequency of CpG dinucleotides which are highly associated with the bulk genome. Furthermore, evaluation of the ESR1 methylation status uncovered that significant methylation of this gene occurred in patients with CRC and is associated with markedly diminished ESR1 expression in colorectal tumors examined [34]. This suggests the possibility of tumor suppressor activity for ESR1 in CRC. Accordingly, estrogens might reduce CRC development via preventing the DNA methylation of ESR1 [35]. ATXN1 is a chromatinbinding factor that functions as a corepressor involved in transcriptional repression by directly repressing CBF1, a transcription factor that is crucial for the Notch signaling pathway [36]. It is now widely accepted that Notch pathway is substantially required for initiation of CRC as well as maintenance and self-renewal of CRC initiating cells [37]. Furthermore, Sonoshita et al. found that activation of DAB1/Dab1, a Notch signaling target gene, can promote CRC invasion and metastasis [38]. Kim and colleagues have found a negative crosstalk between Notch and Wnt/β-catenin signaling pathways [39]. Interestingly, the signaling pathway enrichment revealed that Wnt signaling is actively involved in regulation of S I , S II and S III of CRC, but the activity of this signaling pathway has significantly diminished during S IV .
Wnt signaling pathway was found to be a key player in the initiation and progression on CRC, but not in metastasis stage. It is now well understood that Wnt signaling is a critical event for initiating the genesis of most CRCs, especially through loss of its negative regulator, Adenomatous Polyposis Coli [40]. Moreover, both β-catenin-dependent and β-catenin-independent Wnt signaling enhance CRC progression via regulating epithelial-mesenchymal transition [41]. For decades, components of this signaling pathway served as interesting targets in human CRC. In addition to the importance of Wnt signaling pathway, we found a negative correlation between JAK-STAT and MAPK signaling pathways during CRC progression suggesting a dual regulatory role of JAK-STAT/MAPK cascades in CRC progression and invasion. Wu and colleagues reported that cooperation of PAR2/MAPKs/NF-κB signal transduction pathways promotes proliferation and migration of colon cancer cell line SW620 [42]. The crosstalk between these signaling pathways is complex and each pathway can regulate the activation of the other at multiple levels. Consistent with this, Krasilnikov and colleagues have found that PI3K and MAPK signaling down-regulate JAK-STAT signaling in human melanoma cells via inhibiting tyrosine phosphorylation of JAK/STATs [43]. However, STAT1 was identified as the most down-regulated gene in all CRC stages. In our previous study, this gene was also found to be down-regulated in metastatic CRC samples compared with the nonmetastatic samples. Modular analyses of CRC metastasis-derived PPI network corroborated that STAT1 functions in one of the most important protein complexes correlated with CRC metastasis [7]. Considering the role of MAPK and JAK-STAT signaling pathways in the regulation of cell proliferation, we suggest that down-regulation of each pathway is recouped by activation of the other pathway. This mechanism of regulation may be indispensable for providing the favorable microenvironment for CRC development [44,45].
Interestingly, the clustering analyses demonstrated a remarkable overlap between all studied CRC stages. Functional annotation of common DEGs revealed that majority of common up-and down-regulated genes are obviously contributed to the regulation of cell proliferation and apoptosis, respectively. CAND1, CDK2, EED, BRCA1 and BARD1 were identified as the most important hub genes up-regulated in all CRC stages. Several evidence have indicated the disruption of CAND1 in prostate cancer [46]. Nevertheless, the function of this gene in CRC has poorly been investigated. CDK2 is a serine/threonine kinase which its activity is dispensable for controlling cell cycle and mitosis [47]. Inhibition of CDK2 in CRC is substantially associated with enhanced cell death and suppressed cell proliferation implying its crucial function in CRC development [48,49]. Consistent with our results, EED and BARD1 have been proposed as promising prognostic markers in CRC and their expression correlate with CRC progression and aggressive clinical behavior [50,51]. BARD1 is an E3 ubiquitin-protein ligase which acts as a heterodimer with BRCA1 (BRCA1-BARD1). This protein heterodimer mediates the formation of 'Lys-6'-linked polyubiquitin chains, and subsequently regulates a wide range of intracellular pathways such as DNA repair, regulation of protein degradation, transcription and genomic stability [52,53].

Conclusion
Our findings suggest a dynamic model of CRC initiation, progression and invasion. The results were highly compatible with previous studies. Moreover, several novel biomarkers were also proposed for each stage of CRC. The cell cycle and signal transduction genes were apparently involved in regulation of CRC stages. Moreover, Wnt, MAPK and JAK-STAT signaling pathways seemed to be crucial in the regulation of CRC stages. However, the results of this study need to be investigated in greater details by further experimental methods.

Future perspective
Despite decades of investigations, clinicians are still faced with numerous challenges in the diagnosis and treatment of patients with a specific stage of CRC. This might be due to deregulation of particular gene signatures in each CRC stage. Thus, identification of drug targets based on molecular mechanisms of each CRC stage can be very helpful in targeted therapy of CRC. In this study, we separately analyzed gene expression profiles of each CRC stage (i.e., S I -S IV ) and integrated the results of gene expression analysis with PPI networks. Several genes and regulatory networks were found in each CRC stage. Cell cycle and signal transduction genes showed critical roles in early stages of CRC. Intriguingly, a crosstalk between MAPK and JAK-STAT signaling pathways during CRC progression was also predicted in silico. Although, a part of in silico findings was validated in vitro, there is a need to further validate this study's results. Moreover, parallel analysis of other datasets (e.g., microarray or next generation sequencing data) would increase the reliability of these predictions.

Background
• Rapid and robust screening for biomarkers in each stage of colorectal cancer (CRC) may result in early diagnosis and reliable treatment programs.
• Computational integration of gene expression profiles with protein-protein interaction networks have shown robustness in identifying key target genes and regulatory networks in human cancers. Experiment • Microarray dataset GSE21510 was obtained from gene expression omnibus database and analyzed with GEO2R.
• Cytoscape and FunRich packages were used for construction and analyses of protein-protein interaction networks.
• BiNGO, SPEED and DAVID tools were used for functional enrichment of differentially expressed genes.

Results & discussion
• CDK1, UBC, ESR1 and ATXN1 genes were detected as the most significant hub genes of S I , S II , S III and S IV of CRC, respectively.
• Cell cycle genes were highly involved in S I -S III of CRC suggesting their crucial role in CRC initiation and progression.
• Wnt signaling pathway was particularly enriched in S I -S III , but not in S IV . Whereas, MAPK and JAK-STAT signaling pathways had a negative correlation during CRC stages.
• A high number of common genes were identified between S I , S II and S III implying their close biological properties.
• Some in silico findings were validated in vitro implicating that the results of this study can be used in CRC diagnosis and prevention programs.