Multi-Level Mining of Association Rules from Warehouse Schema

Authors

  • Muhammad Usman, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology
  • Muhammad Usman, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology

Keywords:

Association rule mining, data warehouses, interestingness measures, knowledge discovery, multidimensional schema.

Abstract

The integration of data mining techniques with data warehousing is attracting growing interest because it enables knowledge to be extracted from large data sets. However, currently available techniques place heavy emphasis on solutions in which data mining acts as a front end to the data warehouse. Very little work has applied data mining techniques to the design of data warehouses themselves. Although techniques such as data clustering have been applied to multidimensional data to enhance the knowledge discovery process, a number of issues related to multidimensional schema design remain unresolved. These issues include the manual selection of important facts and dimensions in a high-dimensional data environment, a challenging task for human designers when data is available in large volumes with many related variables. Moreover, the interestingness measures in use are specific to transactional databases. In this research we propose a technique that selects a subset of informative dimensions and fact variables with which to start the mining process. This selection leads to the mining of association rules, whose interestingness is measured using advanced diversity measures. Experimental results from implementing the method on two real-world data sets taken from the UCI machine learning repository show that the rules discovered from the generated schema were more diverse and informative than the rules discovered by a typical data mining process applied to the original data with no schema imposed on it. We also compared the results with a similar approach, and the proposed method showed a marked improvement in importance and diversity deviation.

Author Biographies

Muhammad Usman, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology

Assistant Professor, Department of Computing

Muhammad Usman, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology

Master Student, Department of Computer Science

References

Asuncion, A. & Newman, D.J. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.

Chung, S.M. & Mangamuri, M. (2005). Mining association rules from the star schema on a parallel NCR Teradata database system. International Conference on Information Technology, Coding and Computing (ITCC05). Dayton, OH, USA.

Goil, S. & Choudhary, A. (2001). Parsimony: An infrastructure for parallel multidimensional analysis and data mining. Journal of Parallel and Distributed Computing, 61:285-321.

Han, J., Kamber, M. & Chiang, J. (1997). Mining multi-dimensional association rules using data cubes. Technical report, Database Systems Research Laboratory, School of Science, Simon Fraser University, Burnaby, BC, Canada.

Kamber, M., Han, J. & Chiang, J.Y. (1997). Metarule-guided mining of multi-dimensional association rules using data cubes. KDD. Burnaby, BC, Canada.

Messaoud, R.B., Rabasda, S.L., Boussaid, O. & Missaoui, R. (2006). Enhanced mining of association rules from data cubes. DOLAP 06 Proceedings of the 9th ACM international workshop on Data warehousing and OLAP, New York, USA.

Messaoud, R.B., Loudcher, S., Missaoui, R. & Boussaid, O. (2007). OLEMAR: an on-line environment for mining association rules in multidimensional data. Advances in Data Warehousing and Mining, IGI Global, Vol. 2, 2007, pp. 1-35. DOI: 10.4018/978-1-59904-960-1.ch001

Moses, D. & Deisy, C. (2015). A survey of data mining algorithms used in cardiovascular disease diagnosis from multi-lead ECG data. Kuwait Journal of Science, 42(2):206-235.

Ng, E.K.K., Fu, A.W.C. & Wang, K. (2002). Mining association rules from stars. Proceedings of IEEE International Conference on Data Mining IEEE-ICDM, Maebashi City, Japan, pp. 322-329.

Psaila, G. & Lanzi, P.L. (2000). Hierarchy-based mining of association rules in data warehouses. Proceedings of the 2000 ACM symposium on Applied computing, Como, Italy.

Rosario, G.E., Rundensteiner, E.A., Brown, D.C., Ward, M.O. & Huang, S. (2004). Mapping nominal values to numbers for effective visualization. Information Visualization, 3:80-95.

Schlimmer, J.C. (1985). Automobile dataset. University of California, Irvine, School of Information and Computer Sciences. Retrieved 20 June 2012 from http://archive.ics.uci.edu/ml/datasets/Automobile.

Tjioe, H.C. & Taniar, D. (2005). Mining association rules in data warehouses. International Journal of Data Warehousing and Mining, (3):28-62.

Usman, M., Asghar, S. & Fong, S. (2009). Conceptual model for combining enhanced OLAP and data mining systems. Fifth international joint conference on INC, IMS and IDC 09, Seoul, Korea, pp. 1958-1963.

Usman, M., Pears, R. & Fong, S. (2013). Discovering diverse association rules from multidimensional schema. Expert Systems with Applications, 40(15):5975-5996.

Zhen, L. & Minyi, G. (2001). A proposal of integrating data mining and on-line analytical processing in data warehouse. Proceedings of the international conference on info-tech and info-net, Beijing, China, pp. 146-151.

Published

28-01-2017