Malicious Website Collection System Using Machine Learning
Keywords: Internet, malicious websites, blacklist, machine learning, Hidden Markov Model (HMM), legitimate websites.
Malicious websites are sites that host malicious content or files. They lure users into clicking a link, which either redirects them to an unrelated site or downloads malicious content onto the user's system without the user's knowledge. Such sites appear to be legitimate, but they carry content such as spam, phishing pages, drive-by downloads, viruses, and ransomware, and they can cause heavy losses to organizations and individual users alike. Malicious websites are typically detected with a blacklisting mechanism, but blacklists cannot efficiently identify every kind of malicious site and are easily evaded by attackers. To overcome these limitations, a machine learning approach is used to detect and handle all kinds of malicious content in web pages; such an approach is much harder for an attacker to evade. Both supervised and unsupervised learning are used: the supervised approach detects known attacks, whereas unsupervised learning detects previously unknown malicious websites. To classify websites, we use a Hidden Markov Model (HMM), which is safe and reliable for operating on the Internet and works efficiently to find inter-dependencies among resources. Fast feature extraction is used to obtain the attributes, and the Baum-Welch and Viterbi algorithms of the Markov model are used to detect malicious URLs more accurately and precisely. Applying the HMM improves classification performance on the data sets and yields more accurate results. The model can be applied to social media platforms.
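As an illustration of the Viterbi step mentioned above, the following is a minimal sketch and not the system described in this paper. It assumes a hypothetical feature extractor that maps each URL character to one of three classes (letter, digit, symbol), two hypothetical hidden states (benign, suspicious), and hand-picked probabilities. In practice these parameters would be estimated from data, for example with the Baum-Welch algorithm.

```python
import math

# Hypothetical hidden states, chosen only for this illustration.
STATES = ("benign", "suspicious")

# Hypothetical, hand-picked parameters. A real system would estimate
# them from training data (e.g. with Baum-Welch).
START = {"benign": 0.7, "suspicious": 0.3}
TRANS = {
    "benign": {"benign": 0.9, "suspicious": 0.1},
    "suspicious": {"benign": 0.2, "suspicious": 0.8},
}
EMIT = {
    "benign": {"letter": 0.8, "digit": 0.1, "symbol": 0.1},
    "suspicious": {"letter": 0.3, "digit": 0.4, "symbol": 0.3},
}


def char_class(ch):
    """Map one character to a coarse observation symbol."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "symbol"


def extract_features(url):
    """Toy feature extraction: one observation symbol per character."""
    return [char_class(ch) for ch in url]


def viterbi(observations):
    """Return the most likely hidden-state path and its log-probability."""
    if not observations:
        return [], 0.0
    # Log-probability of the best path ending in each state so far.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for a path that ends in state s.
            best_prev = max(
                STATES, key=lambda p: scores[p] + math.log(TRANS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(STATES, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]
```

Calling `viterbi(extract_features(url))` returns one hidden-state label per character of `url` together with the log-probability of that path. Working in log space avoids numerical underflow on long URLs. A classifier could, for instance, flag a URL when the share of "suspicious" labels exceeds a threshold, although that decision rule is an assumption and is not taken from the paper.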