TY - GEN
T1 - Age classification from Spanish tweets: The variable age analyzed by using linear classifiers
AU - Moreno-Sandoval, Luis G.
AU - Mendoza-Molina, Joan Felipe
AU - Puertas, Edwin Alexander
AU - Duque-Marín, Arturo
AU - Pomares-Quimbaya, Alexandra
AU - Alvarado-Valencia, Jorge A.
N1 - Publisher Copyright:
Copyright © 2018 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Text classification, or text categorization, in social networks such as Twitter has gained great importance as applications of this process have grown across diverse domains of society. The literature on text classifiers is extensive, especially for languages such as English; however, this is not the case for age classification, where studies have focused mainly on image recognition and analysis. This paper presents the results of testing the performance of linear classifiers on the task of identifying the age of Twitter users from their profile descriptions and tweets. For this purpose, a Spanish lexicon of 45 words around the concept "cumpleaños" (birthday) was created, and a gold standard of 1,541 users with correctly identified ages was obtained. The experiments are presented together with a description of the algorithms used, which finally yielded the seven best models, identifying the user's age with accuracies between 66% and 69%. Considering the information-retrieval layer, accuracy increased from 69.09% to 72.96%.
KW - Age Classification
KW - Classification Problem
KW - SGD
KW - SVM
KW - Spanish
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85047761819&partnerID=8YFLogxK
DO - 10.5220/0006811102750281
M3 - Conference contribution
AN - SCOPUS:85047761819
T3 - ICEIS 2018 - Proceedings of the 20th International Conference on Enterprise Information Systems
SP - 275
EP - 281
BT - ICEIS 2018 - Proceedings of the 20th International Conference on Enterprise Information Systems
A2 - Hammoudi, Slimane
A2 - Smialek, Michal
A2 - Camp, Olivier
A2 - Filipe, Joaquim
PB - SciTePress
T2 - 20th International Conference on Enterprise Information Systems, ICEIS 2018
Y2 - 21 March 2018 through 24 March 2018
ER -