TY - JOUR
T1 - Gene Expression Datasets for Two Versions of the Saccharum spontaneum AP85-441 Genome
AU - López-Rozo, Nicolás
AU - Ramirez-Castrillon, Mauricio
AU - Romero, Miguel
AU - Finke, Jorge
AU - Rocha, Camilo
N1 - Publisher Copyright: © 2022 by the authors.
PY - 2023/1
Y1 - 2023/1
N2 - Sugarcane is a species of tall grass with high biomass and sucrose production, and the world’s largest crop by production quantity. Its evolutionary environment adaptation and anthropogenic breeding response have resulted in a complex autopolyploid genome. Few efforts have been reported in the literature to document this organism’s gene co-expression and annotation, and, when available, use different gene identifiers that cannot be easily associated across studies. This data descriptor paper presents a dataset that consolidates expression matrices of two Saccharum spontaneum AP85-441 genome versions and an algorithm implemented in Python to mechanically obtain this dataset. The data are processed from the allele-level information of the two sources, with BLASTn used bidirectionally to suggest feasible mappings between the two sets of alleles, and a graph-matching optimization algorithm to maximize global identity and uniqueness of genes. Association tables are used to consolidate the expression values from alleles to genes. The contributed expression matrices comprise 96 experiments and 109,050 and 35,516 from the two genome versions. They can represent significant computational cost reduction for further research on, e.g., sugarcane co-expression network generation, functional annotation prediction, and stress-specific gene identification. Dataset: https://github.com/mauriciogeteg/sugarcane-gene-expression. Dataset License: CC-BY-NC.
AB - Sugarcane is a species of tall grass with high biomass and sucrose production, and the world’s largest crop by production quantity. Its evolutionary environment adaptation and anthropogenic breeding response have resulted in a complex autopolyploid genome. Few efforts have been reported in the literature to document this organism’s gene co-expression and annotation, and, when available, use different gene identifiers that cannot be easily associated across studies. This data descriptor paper presents a dataset that consolidates expression matrices of two Saccharum spontaneum AP85-441 genome versions and an algorithm implemented in Python to mechanically obtain this dataset. The data are processed from the allele-level information of the two sources, with BLASTn used bidirectionally to suggest feasible mappings between the two sets of alleles, and a graph-matching optimization algorithm to maximize global identity and uniqueness of genes. Association tables are used to consolidate the expression values from alleles to genes. The contributed expression matrices comprise 96 experiments and 109,050 and 35,516 from the two genome versions. They can represent significant computational cost reduction for further research on, e.g., sugarcane co-expression network generation, functional annotation prediction, and stress-specific gene identification. Dataset: https://github.com/mauriciogeteg/sugarcane-gene-expression. Dataset License: CC-BY-NC.
KW - allele expression
KW - expression matrix
KW - graph flow
KW - sugarcane
UR - http://www.scopus.com/inward/record.url?scp=85146888382&partnerID=8YFLogxK
U2 - 10.3390/data8010001
DO - 10.3390/data8010001
M3 - Article
AN - SCOPUS:85146888382
SN - 2306-5729
VL - 8
JO - Data
JF - Data
IS - 1
M1 - 1
ER -