Postdiffset: an Eclat-like algorithm for frequent itemset mining

  • Authors

    • W. A. W. A. Bakar
    • M. A. Jalil
    • M. Man
    • Z. Abdullah
    • F. Mohd
    2018-05-16
    https://doi.org/10.14419/ijet.v7i2.28.12911
  • Association rule mining, data mining, eclat algorithm, frequent itemset, vertical data format.
  • Frequent itemset mining is a major field within data mining because it deals with the sets of items that commonly occur together in the transactions of a database. Originating in market basket analysis, frequent itemset generation leads to the formulation of association rules, from which correlations or patterns are derived. Association rule mining remains one of the most prominent areas of data mining; it aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases. Association rule mining algorithms are built on either a horizontal or a vertical data format. Both formats have been widely discussed, with a few example algorithms given for each. Horizontal approaches suffer from generating many candidates and from multiple database scans, both of which contribute to higher memory consumption. Vertical approaches were developed to improve on the horizontal ones, and Eclat is one example of an algorithm that uses the vertical data format. Motivated by its ‘fast intersection’, in this paper we review and analyze the fundamental Eclat and Eclat variants such as tidset, diffset, and sortdiffset. As a continuation of these Eclat extensions, we propose postdiffset, a new member of the Eclat variants that uses the tidset format in the first loop and the diffset format in the later loops. We report the execution time of postdiffset to indicate that some improvement has been achieved in frequent itemset mining.
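
    The idea of tidsets in the first loop and diffsets in the later loops can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify where the switch happens, so the sketch assumes that 1-itemsets are kept as tidsets and that every longer itemset is stored as a diffset, using the standard diffset identities d(XY) = t(X) − t(Y) and d(PXY) = d(PY) − d(PX). The names `to_vertical`, `mine_postdiffset_sketch`, `_extend`, and `min_support` are hypothetical.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Turn horizontal data (one item set per transaction) into
    vertical tidsets: item -> set of transaction ids containing it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def mine_postdiffset_sketch(transactions, min_support):
    """Return {itemset as sorted tuple: absolute support}.

    Assumption (not confirmed by the source): the "first loop" is the
    1-itemset level, and diffsets are used from 2-itemsets onward.
    """
    tidsets = to_vertical(transactions)

    # First loop: frequent 1-itemsets are represented by their tidsets.
    level1 = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    frequent = {(item,): len(tids) for item, tids in level1}

    # Switch to diffsets: d(XY) = t(X) - t(Y), sup(XY) = |t(X)| - |d(XY)|.
    for i, (x, tx) in enumerate(level1):
        prefix_class = []
        for y, ty in level1[i + 1:]:
            diff = tx - ty
            support = len(tx) - len(diff)
            if support >= min_support:
                frequent[(x, y)] = support
                prefix_class.append(((x, y), diff, support))
        _extend(prefix_class, min_support, frequent)
    return frequent


def _extend(prefix_class, min_support, frequent):
    """Later loops: all members share a prefix P and carry diffsets.

    d(PXY) = d(PY) - d(PX), sup(PXY) = sup(PX) - |d(PXY)|.
    """
    for i, (px, dx, sx) in enumerate(prefix_class):
        next_class = []
        for py, dy, _ in prefix_class[i + 1:]:
            diff = dy - dx
            support = sx - len(diff)
            if support >= min_support:
                itemset = px + (py[-1],)
                frequent[itemset] = support
                next_class.append((itemset, diff, support))
        _extend(next_class, min_support, frequent)
```

    For example, for the five toy transactions {a, b, c}, {a, b}, {a, c}, {b, c}, {a, b, c} with `min_support=2`, the sketch reports support 3 for each pair and support 2 for (a, b, c). Diffsets tend to shrink as itemsets grow, which is the usual motivation for switching away from tidsets after the first level.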

     

  • How to Cite

    Bakar, W. A. W. A., Jalil, M. A., Man, M., Abdullah, Z., & Mohd, F. (2018). Postdiffset: an Eclat-like algorithm for frequent itemset mining. International Journal of Engineering & Technology, 7(2.28), 197-199. https://doi.org/10.14419/ijet.v7i2.28.12911