Efficient data cleaning algorithm using decision tree classification model approach and modified new unique user identification algorithm using hashing techniques with a new error factor

  • Authors

    • Ranjena Sriram
    • S Sheeja
    • I Henry Alexander
    2018-03-01
    https://doi.org/10.14419/ijet.v7i1.9.9736
  • The study focuses on the preprocessing techniques of web mining. Within this scope, it proposes and implements an efficient data cleaning algorithm and a unique user identification algorithm. The previously proposed data cleaning algorithm was a generalized approach and lacked transparency, so an appropriate model was needed to implement the new data cleaning algorithm. After analyzing related studies and suggestions from experts, the study selected the decision tree classification model. The choice was motivated by the model's simplicity, the ease of framing rules, and its ability to break a complex decision into smaller ones. The study also modifies the previously proposed hash function, which is used to locate existing web users in the web server log. A new error factor is introduced to remove memory address discrepancies. The modified hash function, together with a binary search technique, is used to design the new unique user identification algorithm. Experiments were conducted on web server logs from universities and colleges in the United Arab Emirates and India. The results indicate improved performance for both the new rule-based data cleaning algorithm and the modified unique user identification algorithm.

  • How to Cite

    Sriram, R., Sheeja, S., & Henry Alexander, I. (2018). Efficient data cleaning algorithm using decision tree classification model approach and modified new unique user identification algorithm using hashing techniques with a new error factor. International Journal of Engineering & Technology, 7(1.9), 54-63. https://doi.org/10.14419/ijet.v7i1.9.9736
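
The two preprocessing steps described in the abstract (rule-based cleaning, then unique user identification with a hash function and binary search) can be illustrated with a minimal Python sketch. Everything specific here is an assumption: the abstract does not give the paper's decision-tree rules, its hash function, or the definition of its error factor, so the rules in `keep_entry`, the SHA-256 key in `user_key`, and the log field names are hypothetical stand-ins, and the error factor is not modeled at all.

```python
import bisect
import hashlib

# Hypothetical rule set: the paper's actual decision-tree rules are not
# given in the abstract. These are common web-log cleaning rules.
IRRELEVANT_SUFFIXES = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")


def keep_entry(entry):
    """Walk a small hand-written decision tree; True means the entry is kept."""
    if entry["method"] != "GET":
        return False  # non-GET requests are dropped
    if not 200 <= entry["status"] < 300:
        return False  # failed or redirected requests are dropped
    if entry["url"].lower().endswith(IRRELEVANT_SUFFIXES):
        return False  # embedded resources are dropped
    if "bot" in entry["agent"].lower():
        return False  # crude robot check
    return True


def user_key(entry):
    """Hash IP + user agent to a 64-bit integer (assumed identity fields)."""
    raw = f'{entry["ip"]}|{entry["agent"]}'.encode("utf-8")
    return int.from_bytes(hashlib.sha256(raw).digest()[:8], "big")


def identify_users(entries):
    """Return sorted unique user keys, locating existing users by binary search."""
    known = []
    for entry in entries:
        key = user_key(entry)
        pos = bisect.bisect_left(known, key)
        if pos == len(known) or known[pos] != key:
            known.insert(pos, key)  # new user: insert and keep the list sorted
    return known


# Made-up sample log entries, for illustration only.
log = [
    {"ip": "10.0.0.1", "agent": "Mozilla/5.0", "method": "GET", "status": 200, "url": "/index.html"},
    {"ip": "10.0.0.1", "agent": "Mozilla/5.0", "method": "GET", "status": 200, "url": "/logo.png"},
    {"ip": "10.0.0.1", "agent": "Mozilla/5.0", "method": "GET", "status": 200, "url": "/courses.html"},
    {"ip": "10.0.0.2", "agent": "Googlebot/2.1", "method": "GET", "status": 200, "url": "/index.html"},
    {"ip": "10.0.0.3", "agent": "Mozilla/5.0", "method": "GET", "status": 404, "url": "/missing.html"},
    {"ip": "10.0.0.4", "agent": "Opera/9.8", "method": "GET", "status": 200, "url": "/about.html"},
]

cleaned = [e for e in log if keep_entry(e)]
users = identify_users(cleaned)
print(len(cleaned), "entries kept,", len(users), "unique users")
```

Keeping the keys in a sorted list makes each lookup a binary search; insertion into a Python list is still linear, so a larger log would likely call for a different container.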