TY - JOUR
T1 - Celebrity profiling through linguistic analysis of digital social networks
AU - Moreno-Sandoval, Luis G.
AU - Pomares-Quimbaya, Alexandra
AU - Alvarado-Valencia, Jorge A.
N1 - Publisher Copyright: © 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Digital social networks have become an essential source of information because celebrities use them to share their opinions, ideas, thoughts, and feelings. This makes digital social networks one of the preferred means for celebrities to promote themselves and attract new followers. This paper proposes a feature selection model for classifying celebrity profiles based on their use of the digital social network Twitter. The model analyzes lexical, syntactic, symbolic, participation, and complementary information features of celebrities' posts to estimate their demographic and influence characteristics. Classification with these new features achieves an F1-score of 0.65 in Fame, 0.88 in Gender, 0.37 in Birth year, and 0.57 in Occupation. With these new features, the average accuracy improves by up to 0.14. As a result, features extracted from linguistic cues improved the performance of the predictive models of Fame and Gender and facilitated explanations of the model results. In particular, the use of the third-person singular was highly predictive in the Fame model.
AB - Digital social networks have become an essential source of information because celebrities use them to share their opinions, ideas, thoughts, and feelings. This makes digital social networks one of the preferred means for celebrities to promote themselves and attract new followers. This paper proposes a feature selection model for classifying celebrity profiles based on their use of the digital social network Twitter. The model analyzes lexical, syntactic, symbolic, participation, and complementary information features of celebrities' posts to estimate their demographic and influence characteristics. Classification with these new features achieves an F1-score of 0.65 in Fame, 0.88 in Gender, 0.37 in Birth year, and 0.57 in Occupation. With these new features, the average accuracy improves by up to 0.14. As a result, features extracted from linguistic cues improved the performance of the predictive models of Fame and Gender and facilitated explanations of the model results. In particular, the use of the third-person singular was highly predictive in the Fame model.
KW - Author profile
KW - Celebrity profile
KW - Demographic features
KW - Influential feature
KW - Natural Language Processing
UR - http://www.scopus.com/inward/record.url?scp=85113437805&partnerID=8YFLogxK
U2 - 10.1186/s40649-021-00097-w
DO - 10.1186/s40649-021-00097-w
M3 - Article
AN - SCOPUS:85113437805
SN - 2197-4314
VL - 8
JO - Computational Social Networks
JF - Computational Social Networks
IS - 1
M1 - 16
ER -