
DOCUMENT CLASSIFICATION SYSTEM FOR THE SPANISH LANGUAGE

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Document classification is an important task for companies because it saves time in managing the information contained in specific documents; the health sector, in particular, seeks to prioritize documents in order to trace any process within its network. This article presents a document classification system, organized into software components, that addresses the challenges of working with the Spanish language. It uses public sources such as Google and Wikipedia and is applied to long documents related to the health sector in Colombia. A set of Machine Learning classifiers is evaluated and compared on F1-score, Precision, and Recall, with the Logistic Regression classifier achieving the best performance. In addition, the article presents a theoretical survey of the relationships between document classification and text mining, Information Retrieval, and Text Summarization.
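The metrics named in the abstract can be illustrated with a short sketch. The abstract does not say how the scores were computed or averaged, so the per-class definitions and the macro average below are assumptions, and the function names and toy labels are hypothetical rather than taken from the article.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for each class label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical toy labels, not data from the article.
y_true = ["invoice", "invoice", "report", "report"]
y_pred = ["invoice", "report", "report", "report"]

for label, (p, r, f1) in per_class_scores(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Comparing several classifiers in this way amounts to running each one on the same held-out documents and passing its predictions through `per_class_scores` and `macro_average`.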

Original language: English
Pages (from-to): 100-112
Number of pages: 13
Journal: Journal of Theoretical and Applied Information Technology
Volume: 100
Issue number: 1
State: Published - 15 Jan 2022
Externally published: Yes

Keywords

  • Document Classification
  • Information Retrieval
  • Spanish Language
  • Text Mining
  • Text Summarization
