Simple empirical models for classifying patients from microarray data
Keywords: genes, microarray data, disease classification, cancer, gene filtering
Abstract
Bioinformatics has advanced considerably in recent years, and one such advance is the use of microarrays to collect Big Data. This paper reports on models devised by the author to classify patients by analysing microarray data. The problem is to determine which class each patient belongs to; for example, one class may be ‘has the disease’ and the other ‘does not have the disease’. Knowing which class a patient belongs to can help in giving a prognosis. Often only a small number of genes are significantly affected by the presence of a disease, so a patient can be classified by examining just those genes. Two models for classifying patients from gene expression microarray data were developed: one is based on an existing algorithm and the other on a new algorithm. The models use simple mathematical techniques, namely the two-sample Student’s t-test and Diagonal Linear Discriminant Analysis, together with a newly developed technique that the author calls Multiplicative Probabilistic Discriminant Analysis. Each model has been implemented as a computer program. The research was restricted to a single dataset, and the raw data must be pre-processed before the models can be applied.
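The abstract does not give implementation details, so the following is only a minimal sketch of a generic pipeline of the kind it describes: rank genes by the absolute two-sample Student's t statistic (pooled variance), keep the top k, and classify a new patient with Diagonal Linear Discriminant Analysis (DLDA). It is not the author's program and does not attempt Multiplicative Probabilistic Discriminant Analysis, whose details are not given here. The function names, the data layout (samples[i][g] is the pre-processed expression of gene g in patient i, labels are 0 or 1), the equal class priors and the toy data are all assumptions made for illustration.

```python
import math
from statistics import mean


def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic, assuming equal variances (pooled)."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se if se > 0 else 0.0


def select_genes(samples, labels, k):
    """Indices of the k genes with the largest |t| between class 0 and class 1.

    Hypothetical helper: samples[i][g] is gene g in patient i; labels[i] is 0 or 1.
    """
    n_genes = len(samples[0])
    abs_t = []
    for g in range(n_genes):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        abs_t.append(abs(pooled_t_statistic(a, b)))
    # Stable sort: ties keep the lower gene index first.
    return sorted(range(n_genes), key=lambda g: -abs_t[g])[:k]


def fit_dlda(samples, labels, genes):
    """Class means plus one pooled variance per selected gene (diagonal covariance)."""
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        rows = [s for s, y in zip(samples, labels) if y == c]
        means[c] = [mean(r[g] for r in rows) for g in genes]
    dof = len(samples) - len(classes)
    variances = []
    for j, g in enumerate(genes):
        ss = sum((s[g] - means[y][j]) ** 2 for s, y in zip(samples, labels))
        variances.append(max(ss / dof, 1e-12))  # guard against zero variance
    return means, variances


def predict_dlda(model, x, genes):
    """Assign x to the class with the smallest variance-scaled squared distance.

    Equal class priors are assumed, so no log-prior term is added.
    """
    means, variances = model

    def distance(c):
        return sum((x[g] - means[c][j]) ** 2 / variances[j]
                   for j, g in enumerate(genes))

    return min(means, key=distance)


# Toy data (invented for illustration): 6 patients x 4 genes.
samples = [
    [5.0, 1.0, 2.0, 3.0],  # class 0
    [5.2, 1.3, 2.1, 2.9],  # class 0
    [4.9, 0.8, 1.9, 3.2],  # class 0
    [8.0, 1.1, 3.0, 3.1],  # class 1
    [8.3, 0.9, 3.2, 2.8],  # class 1
    [7.9, 1.2, 2.8, 3.0],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]

genes = select_genes(samples, labels, k=2)
model = fit_dlda(samples, labels, genes)
print(genes)  # -> [0, 2]
print(predict_dlda(model, [5.1, 1.0, 2.05, 3.0], genes))  # -> 0
print(predict_dlda(model, [8.1, 1.0, 3.1, 3.0], genes))  # -> 1
```

DLDA ignores covariances between genes, so only one variance per gene has to be estimated; this is a common choice when there are far more genes than patients. Filtering genes by |t| before fitting reflects the abstract's point that a small number of genes may be enough to classify a patient.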