TY - JOUR
T1 - DOCUMENT CLASSIFICATION SYSTEM FOR THE SPANISH LANGUAGE
AU - Sandoval, Luis Gabriel Moreno
AU - Rojas, Liliana Maria Pantoja
AU - Cristancho, Nelson Giovanni Agudelo
AU - Meneses, Cristina Ramírez
N1 - Publisher Copyright: © 2021 Little Lion Scientific
PY - 2022/1/15
Y1 - 2022/1/15
N2 - Document classification is a relevant task for companies because it saves time in managing the information contained in specific documents; the health sector, in particular, seeks to prioritize documents in order to trace any process within its network. This article presents a document classification system, built as a tool divided into software components, that addresses the challenges of working with the Spanish language. The system uses public sources such as Google and Wikipedia and is applied to long documents related to the health sector in Colombia. A set of Machine Learning classifiers is evaluated and compared on F1-score, Precision, and Recall, with the Logistic Regression classifier achieving the best performance. In addition, the article presents a theoretical survey of the relationships that Text Mining, Information Retrieval, and Text Summarization have with document classification.
AB - Document classification is a relevant task for companies because it saves time in managing the information contained in specific documents; the health sector, in particular, seeks to prioritize documents in order to trace any process within its network. This article presents a document classification system, built as a tool divided into software components, that addresses the challenges of working with the Spanish language. The system uses public sources such as Google and Wikipedia and is applied to long documents related to the health sector in Colombia. A set of Machine Learning classifiers is evaluated and compared on F1-score, Precision, and Recall, with the Logistic Regression classifier achieving the best performance. In addition, the article presents a theoretical survey of the relationships that Text Mining, Information Retrieval, and Text Summarization have with document classification.
KW - Document Classification
KW - Information Retrieval
KW - Spanish Language
KW - Text Mining
KW - Text Summarization
UR - http://www.scopus.com/inward/record.url?scp=85124460132&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85124460132
SN - 1992-8645
VL - 100
SP - 100
EP - 112
JO - Journal of Theoretical and Applied Information Technology
JF - Journal of Theoretical and Applied Information Technology
IS - 1
ER -