Multi-Level Mining of Association Rules from Warehouse Schema
Keywords:
Association rule mining, data warehouses, interestingness measures, knowledge discovery, multidimensional schema.
Abstract
The integration of data mining techniques with data warehousing is attracting growing interest, largely because it allows knowledge to be extracted from large data sets. However, currently available techniques place heavy emphasis on solutions in which data mining acts as a front end to the data warehouse. Very little work has applied data mining techniques to the design of data warehouses themselves. Although techniques such as data clustering have been applied to multidimensional data to enhance the knowledge discovery process, a number of issues related to multidimensional schema design remain unresolved. These include the manual selection of important facts and dimensions in high-dimensional data environments, a challenging task for human designers when the data is large in volume and has many related variables. Moreover, the interestingness measures in use are specific to transactional databases. In this research we propose a technique that selects a subset of informative dimensions and fact variables with which to start the mining process. This selection leads to the mining of association rules, whose interestingness is evaluated using advanced diversity measures. Experimental results on two real-world data sets taken from the UCI Machine Learning Repository show that the rules discovered from the generated schema were more diverse and informative than the rules discovered by a typical data mining process applied to the original data without an imposed schema. We also compared the results with a similar approach and found a marked improvement in importance and diversity deviation.
References
Asuncion, A. & Newman, D.J. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Chung, S.M. & Mangamuri, M. (2005). Mining association rules from the star schema on a parallel NCR Teradata database system. International Conference on Information Technology, Coding and Computing (ITCC05). Dayton, OH, USA.
Goil, S. & Choudhary, A. (2001). Parsimony: An infrastructure for parallel multidimensional analysis and data mining. Journal of Parallel and Distributed Computing, 61:285-321.
Han, J., Kamber, M. & Chiang, J. (1997). Mining multi-dimensional association rules using data cubes. Technical report, Database Systems Research Laboratory, School of Science, Simon Fraser University, Burnaby, BC, Canada.
Kamber, M., Han, J. & Chiang, J.Y. (1997). Metarule-guided mining of multi-dimensional association rules using data cubes. KDD. Burnaby, BC, Canada.
Messaoud, R.B., Rabaséda, S.L., Boussaid, O. & Missaoui, R. (2006). Enhanced mining of association rules from data cubes. DOLAP 06: Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP, New York, USA.
Messaoud, R.B., Loudcher, S., Missaoui, R. & Boussaid, O. (2007). OLEMAR: an on-line environment for mining association rules in multidimensional data. Advances in Data Warehousing and Mining, IGI Global, Vol. 2, pp. 1-35. DOI: 10.4018/978-1-59904-960-1.ch001
Moses, D. & Deisy, C. (2015). A survey of data mining algorithms used in cardiovascular disease diagnosis from multi-lead ECG data. Kuwait Journal of Science, 42(2):206-235.
Ng, E.K.K., Fu, A.W.C. & Wang, K. (2002). Mining association rules from stars. Proceedings of the IEEE International Conference on Data Mining (IEEE ICDM), Maebashi City, Japan, pp. 322-329.
Psaila, G. & Lanzi, P.L. (2000). Hierarchy-based mining of association rules in data warehouses. Proceedings of the 2000 ACM symposium on Applied computing, Como, Italy.
Rosario, G.E., Rundensteiner, E.A., Brown, D.C., Ward, M.O. & Huang, S. (2004). Mapping nominal values to numbers for effective visualization. Information Visualization, 3:80-95.
Schlimmer, J.C. (1985). Automobile dataset. University of California, Irvine, School of Information and Computer Sciences. Retrieved 20 June 2012 from http://archive.ics.uci.edu/ml/datasets/Automobile.
Tjioe, H.C. & Taniar, D. (2005). Mining association rules in data warehouses. International Journal of Data Warehousing and Mining, (3):28-62.
Usman, M., Asghar, S. & Fong, S. (2009). Conceptual model for combining enhanced OLAP and data mining systems. Fifth international joint conference on INC, IMS and IDC 09, Seoul, Korea, pp. 1958-1963.
Usman, M., Pears, R. & Fong, S. (2013). Discovering diverse association rules from multidimensional schema. Expert Systems with Applications, 40(15):5975-5996.
Zhen, L. & Minyi, G. (2001). A proposal of integrating data mining and on-line analytical processing in data warehouse. Proceedings of the international conference on info-tech and info-net, Beijing, China, pp. 146-151.