LABAS-TS: A system for assisting labeling of training sets for text classification

Alejandro Sierra-Múnera, Alexandra Pomares-Quimbaya, Rafael Andrés González Rivera, Julián Camilo Daza Rodríguez, Oscar Mauricio Muñoz Velandia, Angel Alberto Garcia Peña

Research output: Chapter in book/report/conference proceeding, Conference contribution, peer-reviewed

Abstract

Most text classification techniques rely on labeled training data sets to build models. However, in many text classification projects, previously labeled texts are rarely available because of differences in language (e.g. Spanish), domain (e.g. healthcare), and regional or institutional writing culture (e.g. a specific hospital). To help address this problem, this paper presents LABAS-TS, a web-enabled system for assisting the open, collaborative labeling of training sets for text classification. LABAS-TS is framed within a named entity recognition approach that identifies important entities in a domain-specific corpus, based on gazetteers, and uses a language-specific sentence analyzer to extract the portions of text that should be annotated. LABAS-TS was evaluated by generating training data sets for classifying whether an electronic health record text contains a diagnosis, a test, or a procedure. It proved useful in reducing the time required to build a reliable training set, with an average of eleven seconds between two labels.
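The gazetteer-plus-sentence-analyzer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gazetteer entries, the regex-based sentence splitter (a stand-in for a real language-specific analyzer), and the names `GAZETTEER`, `split_sentences`, and `find_candidates` are all assumptions made for illustration.

```python
import re

# Hypothetical gazetteer: entity type -> known surface forms (illustrative only).
GAZETTEER = {
    "diagnosis": ["diabetes", "hipertensión"],
    "test": ["hemograma", "radiografía"],
    "procedure": ["apendicectomía"],
}


def split_sentences(text):
    """Naive splitter standing in for a language-specific sentence analyzer."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def find_candidates(text, gazetteer=GAZETTEER):
    """Return sentences that mention a gazetteer term, to be shown to a human labeler."""
    candidates = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for entity_type, terms in gazetteer.items():
            for term in terms:
                # Whole-word match so that short terms do not fire inside longer words.
                if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                    candidates.append(
                        {"sentence": sentence, "entity_type": entity_type, "term": term}
                    )
    return candidates


if __name__ == "__main__":
    record = "Paciente con diabetes tipo 2. Se solicita hemograma. Sin otros hallazgos."
    for candidate in find_candidates(record):
        print(candidate)
```

In this sketch the gazetteer match only proposes a sentence and a suggested entity type; the human annotator would still confirm or reject the label, which is the step a system like the one described aims to speed up.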

Original language: English
Title of host publication: IC3K 2017 - Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Editors: Kecheng Liu, Ana Carolina Salgado, Jorge Bernardino, Joaquim Filipe
Publisher: SciTePress
Pages: 174-180
Number of pages: 7
ISBN (print): 9789897582738
DOI
Publication status: Published - 2017
Event: 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2017 - Funchal, Madeira, Portugal
Duration: 1 Nov 2017 to 3 Nov 2017

Publication series

Name: IC3K 2017 - Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Volume: 3

Conference

Conference: 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2017
Country/Territory: Portugal
City: Funchal, Madeira
Period: 1 Nov 2017 to 3 Nov 2017
