DOCUMENT CLASSIFICATION SYSTEM FOR THE SPANISH LANGUAGE

Luis Gabriel Moreno Sandoval, Liliana Maria Pantoja Rojas, Nelson Giovanni Agudelo Cristancho, Cristina Ramírez Meneses

Research output: Contribution to journal > Article > peer-review

Abstract

Document classification is a relevant task for companies because it saves time when managing the information contained in specific documents. Accordingly, the health sector seeks to prioritize documents so that any process within its network can be traced. This article presents a document classification system, built as a tool divided into software components, that addresses the challenges of working with the Spanish language. It uses public sources such as Google and Wikipedia and is applied to long documents related to the health sector in Colombia. For this purpose, a set of machine learning classifiers is compared on F1-score, precision, and recall, with the Logistic Regression classifier achieving the best performance. In addition, the article surveys the theoretical relationships that text mining, information retrieval, and text summarization have with document classification.
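The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, TF-IDF features, and macro-averaged metrics. Apart from Logistic Regression, the abstract does not name the classifiers, so the other two are stand-ins. The toy Spanish corpus, the labels, and the `evaluate` helper are hypothetical and are not taken from the article.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (not the article's data): short Spanish health-sector texts.
TRAIN = [
    ("El paciente presenta fiebre y dolor de cabeza", "clinical"),
    ("Se receta tratamiento con antibióticos al paciente", "clinical"),
    ("Diagnóstico de hipertensión y control de presión arterial", "clinical"),
    ("Resultados de laboratorio del paciente con diabetes", "clinical"),
    ("Factura de servicios del hospital para la aseguradora", "administrative"),
    ("Contrato de suministro de insumos para la clínica", "administrative"),
    ("Solicitud de autorización de pago a la aseguradora", "administrative"),
    ("Informe de presupuesto anual del hospital", "administrative"),
]
TEST = [
    ("El paciente tiene fiebre y recibe tratamiento", "clinical"),
    ("Factura y contrato de pago del hospital", "administrative"),
]

# Assumed candidate set; only Logistic Regression is named in the abstract.
CLASSIFIERS = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
}


def evaluate(classifiers, train, test):
    """Fit each classifier on TF-IDF features and return macro P/R/F1 per classifier."""
    x_train, y_train = [t for t, _ in train], [y for _, y in train]
    x_test, y_test = [t for t, _ in test], [y for _, y in test]
    results = {}
    for name, clf in classifiers.items():
        # A fresh vectorizer per pipeline keeps the runs independent of each other.
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(x_train, y_train)
        predicted = model.predict(x_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, predicted, average="macro", zero_division=0
        )
        results[name] = {"precision": precision, "recall": recall, "f1": f1}
    return results


if __name__ == "__main__":
    for name, scores in evaluate(CLASSIFIERS, TRAIN, TEST).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

On a corpus this small the scores carry no meaning. The sketch only shows the shape of the comparison: the same features and the same held-out documents for every classifier, so that the three metrics are directly comparable.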

Original language: English
Pages (from-to): 100-112
Number of pages: 13
Journal: Journal of Theoretical and Applied Information Technology
Volume: 100
Issue number: 1
Publication status: Published - 15 Jan 2022
Externally published
