TY - JOUR
T1 - Multimodal model for the Spanish sentiment analysis in a tourism domain
AU - Monsalve-Pulido, Julian
AU - Parra, Carlos Alberto
AU - Aguilar, Jose
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2024.
PY - 2024/12
Y1 - 2024/12
N2 - Sentiment analysis of tourism data must account for the multimodal characteristics of the content that tourists generate on each platform or social network. Their opinions typically combine text, images, or numbers (ratings), which poses an important challenge for sentiment analysis and calls for new models or multimodal data classification techniques. This work proposes a multimodal sentiment analysis model for Spanish-language data in the tourism domain, composed of four main phases (extraction, classification, fusion, visualization) and a transversal phase that evaluates the quality of the multimodal sentiment analysis process. The model thus integrates a data quality model to improve multimodal sentiment analysis tasks; in addition, the linguistic resource "SenticNet 5" is adapted to Spanish. The model was validated with several classification metrics, and the classification results were compared against a manually labeled dataset (TASS) using two machine learning classification algorithms. The first was Random Forest, with which the manually labeled dataset obtained an F1 score of 50%, whereas the dataset generated automatically with the adapted SenticNet obtained an F1 score of 71% and an accuracy of 70%; the F1 score for the SenticNet-generated dataset is therefore 21 percentage points higher than that for the TASS dataset. The second was Support Vector Machine (SVM), which classified the SenticNet-generated dataset with an F1 score of 72% versus 57.7% for the manually labeled dataset (a difference of 14.3 percentage points). In the fusion tests of the multimodal sentiment inputs, accuracy was 65% for text, 33% for images, and 71% for the fusion of both. Overall, for user opinions composed of Spanish text and images, polarity identification improves when each modality is classified independently and a polarity fusion process is then applied.
AB - Sentiment analysis of tourism data must account for the multimodal characteristics of the content that tourists generate on each platform or social network. Their opinions typically combine text, images, or numbers (ratings), which poses an important challenge for sentiment analysis and calls for new models or multimodal data classification techniques. This work proposes a multimodal sentiment analysis model for Spanish-language data in the tourism domain, composed of four main phases (extraction, classification, fusion, visualization) and a transversal phase that evaluates the quality of the multimodal sentiment analysis process. The model thus integrates a data quality model to improve multimodal sentiment analysis tasks; in addition, the linguistic resource "SenticNet 5" is adapted to Spanish. The model was validated with several classification metrics, and the classification results were compared against a manually labeled dataset (TASS) using two machine learning classification algorithms. The first was Random Forest, with which the manually labeled dataset obtained an F1 score of 50%, whereas the dataset generated automatically with the adapted SenticNet obtained an F1 score of 71% and an accuracy of 70%; the F1 score for the SenticNet-generated dataset is therefore 21 percentage points higher than that for the TASS dataset. The second was Support Vector Machine (SVM), which classified the SenticNet-generated dataset with an F1 score of 72% versus 57.7% for the manually labeled dataset (a difference of 14.3 percentage points). In the fusion tests of the multimodal sentiment inputs, accuracy was 65% for text, 33% for images, and 71% for the fusion of both. Overall, for user opinions composed of Spanish text and images, polarity identification improves when each modality is classified independently and a polarity fusion process is then applied.
KW - Multimodal model
KW - Opinion mining
KW - Sentiment analysis
KW - Spanish language
KW - Tourism
UR - http://www.scopus.com/inward/record.url?scp=85186121071&partnerID=8YFLogxK
U2 - 10.1007/s13278-024-01202-3
DO - 10.1007/s13278-024-01202-3
M3 - Article
AN - SCOPUS:85186121071
SN - 1869-5450
VL - 14
JO - Social Network Analysis and Mining
JF - Social Network Analysis and Mining
IS - 1
M1 - 46
ER -