TY - JOUR
T1 - Predicting the microalgae lipid profile obtained by supercritical fluid extraction using a machine learning model
AU - Rangel Pinto, Juan David
AU - Guerrero Santacruz, Jose Luis
AU - Rivera, Lorena
AU - Parada-Pinilla, María Paula
AU - Cala, Mónica P.
AU - López, Gina
AU - Gonzalez-Barrios, Andrés Fernando
N1 - Publisher Copyright:
Copyright © 2024 Rangel Pinto, Guerrero, Rivera, Parada-Pinilla, Cala, López and González Barrios.
PY - 2024/10/24
Y1 - 2024/10/24
N2 - In this study, a machine learning model was employed to predict the lipid profile obtained by supercritical fluid extraction (SFE) of the microalga Galdieria sp. USBA-GBX-832 under different temperature (40, 50, 60 °C), pressure (150, 250 bar), and ethanol flow (0.6, 0.9 mL min⁻¹) conditions. Six machine learning regression models were trained using 33 independent variables: 29 RDKit molecular descriptors, three extraction conditions, and the infinite dilution activity coefficient (IDAC). The lipidomic characterization identified 139 features and annotated 89 lipids, primarily glycerophospholipids and glycerolipids, which were used as model inputs. A methodology was proposed for selecting representative lipids from the lipidomic analysis using an unsupervised learning method; these results were compared with Tanimoto scores and IDAC calculations using the COSMO-SAC-HB2 model. The models based on decision trees, particularly XGBoost, outperformed the others (RMSE: 0.035, 0.095, 0.065; coefficient of determination (R²): 0.971, 0.933, 0.946 for training, test, and experimental validation, respectively), accurately predicting lipid profiles for unseen conditions. Machine learning methods provide a cost-effective way to optimize SFE conditions and are applicable to other biological samples.
AB - In this study, a machine learning model was employed to predict the lipid profile obtained by supercritical fluid extraction (SFE) of the microalga Galdieria sp. USBA-GBX-832 under different temperature (40, 50, 60 °C), pressure (150, 250 bar), and ethanol flow (0.6, 0.9 mL min⁻¹) conditions. Six machine learning regression models were trained using 33 independent variables: 29 RDKit molecular descriptors, three extraction conditions, and the infinite dilution activity coefficient (IDAC). The lipidomic characterization identified 139 features and annotated 89 lipids, primarily glycerophospholipids and glycerolipids, which were used as model inputs. A methodology was proposed for selecting representative lipids from the lipidomic analysis using an unsupervised learning method; these results were compared with Tanimoto scores and IDAC calculations using the COSMO-SAC-HB2 model. The models based on decision trees, particularly XGBoost, outperformed the others (RMSE: 0.035, 0.095, 0.065; coefficient of determination (R²): 0.971, 0.933, 0.946 for training, test, and experimental validation, respectively), accurately predicting lipid profiles for unseen conditions. Machine learning methods provide a cost-effective way to optimize SFE conditions and are applicable to other biological samples.
KW - Microalgae
KW - Supercritical fluid extraction
KW - lipids
KW - Galdieria sp.
KW - regression models
KW - lipidomic
KW - COSMO-SAC
KW - extremophile microalgae
UR - http://www.scopus.com/inward/record.url?scp=85208613966&partnerID=8YFLogxK
U2 - 10.3389/fchem.2024.1480887
DO - 10.3389/fchem.2024.1480887
M3 - Article
C2 - 39525962
AN - SCOPUS:85208613966
SN - 2296-2646
VL - 12
JO - Frontiers in Chemistry
JF - Frontiers in Chemistry
M1 - 1480887
ER -