Clustering Social Media Data for Marketing Strategies: Literature Review Using Topic Modelling Techniques
Keywords
social media data analytics, marketing strategies, technology, clustering, topic modelling
Abstract
With the rise of social media platforms for marketing purposes, researchers and policymakers face a central dilemma: choosing effective data analysis tools to improve marketing decisions. Numerous articles in the academic literature discuss clustering techniques for analysing social media data, from the perspective of either data mining or social media marketing. However, few studies have attempted to synthesise the results obtained from both perspectives. This research aims to (1) offer a structured overview of the existing literature on clustering methods for marketing strategies and (2) compare three topic modelling techniques applied to extract the main topics raised in the corpus of papers. Topic modelling is a valuable tool for extracting relevant information from big data in general and from large bodies of scientific literature in particular. Based on a thematic analysis, the extracted topics were classified into three categories: fields, marketing strategies and technologies. The results indicate that latent Dirichlet allocation (LDA) is the most effective of the three techniques in this context. The study also provides an overview of the clustering techniques and technologies used for marketing strategies in the fields studied. These findings help researchers and practitioners select the most suitable techniques and technologies for extracting marketing knowledge from big data.