Leveraging class hierarchy for detecting missing annotations on hierarchical multi-label classification

Miguel Romero, Felipe Kenji Nakano, Jorge Finke, Camilo Rocha, Celine Vens

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

With the development of new sequencing technologies, the availability of genomic data has grown exponentially. Over the past decade, numerous studies have used genomic data to identify associations between genes and biological functions. While these studies have shown success in annotating genes with functions, they often assume that genes are completely annotated and fail to take into account that datasets are sparse and noisy. This work proposes a method to detect missing annotations in the context of hierarchical multi-label classification. More precisely, our method exploits the relations of functions, represented as a hierarchy, by computing probabilities based on the paths of functions in the hierarchy. By performing several experiments on a variety of rice (Oryza sativa Japonica), we show that the proposed method accurately detects missing annotations and yields superior results when compared to state-of-the-art methods from the literature.

Original language: English
Article number: 106423
Journal: Computers in Biology and Medicine
Volume: 152
State: Published - Jan 2023

Keywords

  • Detecting missing annotations
  • Gene function prediction
  • Gene ontology hierarchy
  • Hierarchical multi-label classification
  • Random forest
  • Structured output prediction
  • Tree ensembles
