Improving the performance of Bayesian networks in non-ignorable missing data imputation
Keywords:
Bayesian networks, imputation, Kullback-Leibler information, nonignorable mechanism, value of information analysis
Abstract
Missing data are a common problem for researchers who collect data. Bayesian networks are among the methods recently proposed for missing data imputation. The main objective of this research is to improve the efficiency of Bayesian networks in imputing nonignorable missing data by adding a missing-indicator node for each incomplete variable and constructing an augmented Bayesian network. A second objective is to examine the effect of the missingness mechanism (ignorable or nonignorable) on the performance of imputation methods. Four imputation methods are compared: random overall hot-deck imputation, within-class random hot-deck imputation, imputation using Bayesian networks, and imputation using the proposed augmented Bayesian networks. The comparison uses two indices: (1) a distance function and (2) the minimum Kullback-Leibler index. The results indicate that the methods based on Bayesian networks produce higher-quality imputations than the other imputation methods.
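As a rough illustration of the ingredients named in the abstract, the following Python sketch shows (a) how missing-indicator columns could be added to a data set before learning an augmented network, (b) the two hot-deck baselines, and (c) a Kullback-Leibler divergence between the empirical distributions of true and imputed values. The function names, the `R_` prefix, the coding 1 = missing, and the smoothing constant `eps` are assumptions made for this example; the abstract does not specify how the indicators are coded or how the minimum Kullback-Leibler index is computed, and the Bayesian network learning step itself is not shown.

```python
import math
from collections import Counter


def add_missing_indicators(rows, incomplete):
    """Copy rows, adding a 0/1 column R_<var> per incomplete variable.

    Assumption: missing values are stored as None and coded 1 = missing.
    """
    out = []
    for row in rows:
        new = dict(row)
        for var in incomplete:
            new["R_" + var] = 1 if row[var] is None else 0
        out.append(new)
    return out


def hot_deck(rows, var, rng, class_var=None):
    """Fill missing values of `var` with a randomly drawn observed donor value.

    class_var=None gives random overall hot-deck; otherwise donors are
    restricted to records in the same class (within-class hot-deck).
    Assumes every class with a missing value has at least one donor.
    """
    donors = {}
    for row in rows:
        if row[var] is not None:
            key = row[class_var] if class_var else None
            donors.setdefault(key, []).append(row[var])
    filled = []
    for row in rows:
        new = dict(row)
        if new[var] is None:
            key = row[class_var] if class_var else None
            new[var] = rng.choice(donors[key])
        filled.append(new)
    return filled


def kl_divergence(true_values, imputed_values, eps=1e-9):
    """KL(P || Q) between the empirical distributions of two categorical samples.

    eps guards against log(0) when a category never appears among the
    imputed values; it is an illustrative choice, not taken from the paper.
    """
    p = Counter(true_values)
    q = Counter(imputed_values)
    n_p, n_q = len(true_values), len(imputed_values)
    total = 0.0
    for category, count in p.items():
        p_i = count / n_p
        q_i = max(q.get(category, 0) / n_q, eps)
        total += p_i * math.log(p_i / q_i)
    return total
```

For example, `hot_deck(rows, "smoker", random.Random(0), class_var="sex")` would draw each donor only from records with the same value of `sex`, and a smaller `kl_divergence` between the true and imputed values indicates that the imputed distribution is closer to the true one.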