TY - GEN
T1 - Supervised Gene Function Prediction Using Spectral Clustering on Gene Co-expression Networks
AU - Romero, Miguel
AU - Ramírez, Óscar
AU - Finke, Jorge
AU - Rocha, Camilo
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Gene annotation addresses the problem of predicting unknown functions that are associated with the genes of a specific organism (e.g., biological processes). Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents an in silico approach to the annotation of genes that follows a network-based representation, and combines techniques from multivariate statistics (spectral clustering) and machine learning (gradient boosting). Spectral clustering is used to enrich the gene co-expression network (GCN) with currently known gene annotations. Gradient boosting is trained on features of the GCN to build an estimator of the probability that a gene is involved in a given biological process. The proposed approach is applied to a case study on Zea mays, one of the world’s most dominant and productive crops. Broadly speaking, the main results illustrate how computational experimentation narrows down the time and costs in efforts to annotate the functions of genes. More specifically, the results highlight the importance of network science, multivariate statistics, and machine learning techniques in reducing types I and II prediction errors.
AB - Gene annotation addresses the problem of predicting unknown functions that are associated with the genes of a specific organism (e.g., biological processes). Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents an in silico approach to the annotation of genes that follows a network-based representation, and combines techniques from multivariate statistics (spectral clustering) and machine learning (gradient boosting). Spectral clustering is used to enrich the gene co-expression network (GCN) with currently known gene annotations. Gradient boosting is trained on features of the GCN to build an estimator of the probability that a gene is involved in a given biological process. The proposed approach is applied to a case study on Zea mays, one of the world’s most dominant and productive crops. Broadly speaking, the main results illustrate how computational experimentation narrows down the time and costs in efforts to annotate the functions of genes. More specifically, the results highlight the importance of network science, multivariate statistics, and machine learning techniques in reducing types I and II prediction errors.
UR - http://www.scopus.com/inward/record.url?scp=85122498077&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-93413-2_54
DO - 10.1007/978-3-030-93413-2_54
M3 - Conference contribution
AN - SCOPUS:85122498077
SN - 9783030934125
T3 - Studies in Computational Intelligence
SP - 652
EP - 663
BT - Complex Networks and Their Applications X - Volume 2, Proceedings of the 10th International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2021
A2 - Benito, Rosa Maria
A2 - Cherifi, Chantal
A2 - Cherifi, Hocine
A2 - Moro, Esteban
A2 - Rocha, Luis M.
A2 - Sales-Pardo, Marta
PB - Springer Science and Business Media Deutschland GmbH
T2 - 10th International Conference on Complex Networks and Their Applications, COMPLEX NETWORKS 2021
Y2 - 30 November 2021 through 2 December 2021
ER -