Language Independent Models for COVID-19 Fake News Detection: Black Box versus White Box Models
Keywords
Fake news, black box model, white box model, machine learning, COVID-19
Abstract
In an era where massive amounts of information can spread easily through social media, fake news detection is increasingly used to limit widespread misinformation, especially fake news about COVID-19. Databases have been built, and machine learning algorithms have been used to identify patterns in news content and filter out false information. This paper provides a brief overview ranging from public domain datasets through feature extraction methods to the deployment of several machine learning models. As a case study, a mixed-language dataset is presented. The dataset consists of tweets about COVID-19 that have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features offer numerical inputs that are invariant to the language type; thus, they are suitable for investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed using black box or white box models, each having its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.
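To make the idea of language-independent features concrete, the following is a minimal sketch of how a tweet might be mapped to numerical inputs that do not depend on its vocabulary. The function name and the specific features (length, casing, digit, punctuation, URL, and hashtag counts) are illustrative assumptions; the abstract does not list the features actually used in the paper.

```python
import string
from typing import Dict


def language_independent_features(text: str) -> Dict[str, float]:
    """Map a tweet to numeric features that ignore its vocabulary.

    Hypothetical feature set chosen for illustration only; it is not
    taken from the paper.
    """
    words = text.split()
    n_chars = len(text)
    n_words = len(words)
    safe_chars = max(n_chars, 1)  # avoid division by zero on empty input
    return {
        "n_chars": float(n_chars),
        "n_words": float(n_words),
        "avg_word_len": sum(len(w) for w in words) / max(n_words, 1),
        "upper_ratio": sum(c.isupper() for c in text) / safe_chars,
        "digit_ratio": sum(c.isdigit() for c in text) / safe_chars,
        # string.punctuation covers ASCII punctuation only
        "punct_ratio": sum(c in string.punctuation for c in text) / safe_chars,
        "n_urls": float(sum(w.startswith(("http://", "https://")) for w in words)),
        "n_hashtags": float(sum(w.startswith("#") for w in words)),
    }


# Two made-up tweets in different languages yield vectors with the same keys,
# so a single classifier can consume both without language-specific steps.
english = language_independent_features("BREAKING: vaccine cures all!!! #covid19")
malay = language_independent_features("TERKINI: vaksin sembuh semua!!! #covid19")
```

A vector like this could be passed to either kind of classifier that the abstract compares, an interpretable white box model or an opaque black box model, since both only need numerical inputs.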