A Study on Detecting Misleading Online News Using Bigram and Cosine Similarity


  • Normala Che Eembi
  • Iskandar Ishak
  • Fatimah Sidi
  • Lilly Suriani Affendey




Fake news, Deception, Lies, Misleading headlines, Deceiving news


Fake news can create negative perceptions of businesses, organizations, and governments. One way fake news is created is through deceptive news writing. Many researchers have developed machine-learning approaches for detecting deceptive news content, each with its own focus. Previous research emphasizes components of the news content such as grammar, humor, punctuation, and body-dependent and body-independent features. This paper presents a new approach to detecting deceptive news in the form of misleading news, which measures the similarity between a news item's content and its headline using bigrams and cosine similarity. In the experiments, the proposed approach performed better at detecting deceptive news.
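As an illustration of the general idea only, the following minimal Python sketch scores how closely a headline matches its body text using bigram counts and cosine similarity. It is not the authors' implementation: the choice of word-level (rather than character-level) bigrams, the simple regular-expression tokenizer, and the function names `bigrams`, `cosine_similarity`, and `headline_body_similarity` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs (assumed: word-level bigrams, lowercase)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A text with fewer than two tokens has no bigrams to compare.
        return 0.0
    return dot / (norm_a * norm_b)


def headline_body_similarity(headline, body):
    """Hypothetical helper: bigram cosine similarity of headline vs. body."""
    return cosine_similarity(bigrams(headline), bigrams(body))
```

A low score would suggest that the headline shares few word pairs with the body, which is one possible signal of a misleading headline. Any decision threshold on this score is not given in the abstract and would have to be chosen empirically.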



