Abstract
Document classification helps companies save time when managing the information contained in specific documents; the health sector, in particular, seeks to prioritize documents in order to trace any process within its network. This article presents a document classification system, organized as a set of software components, that addresses the challenges tied to the Spanish language by using public sources such as Google and Wikipedia and by working with long documents from the health sector in Colombia. To this end, a set of Machine Learning classifiers is evaluated and compared on the F1-score, Precision, and Recall metrics, with the Logistic Regression classifier achieving the best performance. In addition, the article surveys the theoretical relationships between document classification and Text Mining, Information Retrieval, and Text Summarization.
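The metric comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `macro_prf`, the document labels, and the two sets of predictions are hypothetical, and macro averaging is an assumption, since the abstract does not say how the metrics were aggregated.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical gold labels and predictions from two unnamed classifiers.
y_true = ["invoice", "invoice", "report", "report", "referral", "referral"]
predictions = {
    "classifier_a": ["invoice", "invoice", "report", "referral", "referral", "referral"],
    "classifier_b": ["invoice", "report", "report", "referral", "invoice", "referral"],
}

# Score every classifier, then pick the one with the highest macro F1.
scores = {name: macro_prf(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=lambda name: scores[name][2])

for name, (precision, recall, f1) in scores.items():
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print("best by macro F1:", best)
```

Ranking by F1 rather than by precision or recall alone is one reasonable choice when neither false positives nor false negatives should dominate the comparison.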
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 100-112 |
| Number of pages | 13 |
| Journal | Journal of Theoretical and Applied Information Technology |
| Volume | 100 |
| Issue number | 1 |
| State | Published - 15 Jan 2022 |
| Externally published | Yes |
Keywords
- Document Classification
- Information Retrieval
- Spanish Language
- Text Mining
- Text Summarization
Fingerprint
Dive into the research topics of 'DOCUMENT CLASSIFICATION SYSTEM FOR THE SPANISH LANGUAGE'. Together they form a unique fingerprint.