Abstract
Transfer learning has demonstrated its potential in natural language processing, where models are pre-trained on large corpora and then fine-tuned for specific tasks. We applied pre-trained models to a Spanish biomedical document classification task. The main goal is to analyze the performance of text classification by clinical specialty using state-of-the-art language models for Spanish, and to compare them with the corresponding models in English and with the most important pre-trained model for the biomedical domain. The outcomes offer interesting perspectives on the performance of language models pre-trained for a particular domain. In particular, we found that BioBERT achieved better results on Spanish texts translated into English than both the general-domain model in Spanish and the state-of-the-art multilingual model.
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Public Health and Informatics |
| Subtitle of host publication | Proceedings of MIE 2021 |
| Publisher | IOS Press |
| Pages | 377-381 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781643681856 |
| ISBN (Print) | 9781643681849 |
| DOIs | |
| State | Published - 01 Jul 2021 |
Keywords
- Classification
- Natural Language Processing
- Spanish
- Transfer learning