Estimation of risk factors associated with colorectal cancer: an application of knowledge discovery in databases


  • Feyza Firat
  • Ahmet Kadir Arslan Inonu University, Department of Biostatistics and Medical Informatics.
  • Cemil Colak
  • Hakan Harputluoglu


Artificial neural networks, colorectal cancer, knowledge discovery in databases, risk factors.


Colorectal cancer (CRC) is one of the leading causes of cancer death worldwide. The goal of this study is to predict important risk factors of CRC using knowledge discovery in databases (KDD) methods. This study used retrospective data from patients who had been diagnosed with CRC. Records dated between 1 January 2010 and 1 March 2014 were selected at random from the Turgut Ozal Medical Centre databases. The study included 160 individuals: 80 patients admitted to the Department of Oncology and diagnosed with CRC, and 80 control subjects without CRC. The groups were matched for age and gender. We mined retrospective CRC data from large integrated health systems with electronic health records. Specific demographic and clinical variables, including calcium, hemoglobin, white blood cells, platelets, potassium, sodium, glucose, creatinine and total bilirubin, were used in multilayer perceptron (MLP) artificial neural network (ANN) modeling. In each group, 45 individuals (56.3%) were male and 35 (43.7%) were female. The mean age of both the CRC patients and the control group was 58.6±13.0 years. Accuracy was 71.31% on the training dataset (n=122) and 81.82% on the testing dataset. Area under the curve (AUC) values for the training and testing datasets were 0.73 and 0.81, respectively. The suggested MLP ANN model identified calcium, creatinine, potassium, platelets, sodium, hemoglobin and total bilirubin as significant factors. Taken together, the suggested MLP ANN model might be used for the estimation of risk factors associated with CRC as an application of medical KDD.
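The two performance measures reported in the abstract, accuracy and AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers below are unrelated to the study data. AUC is computed here as the proportion of (case, control) pairs in which the case receives the higher model score, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of (case, control) pairs ranked correctly.

    A pair counts 1 if the case (label 1) scores higher than the
    control (label 0), and 0.5 if the two scores are tied.
    """
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = CRC, 0 = control; scores stand in for model outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason studies of this kind often report both.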

