TY - JOUR
T1 - Cross-validation to select the optimum rank for a reduced-rank approximation to multivariate data
AU - Arciniegas-Alarcón, Sergio
AU - García-Peña, Marisol
AU - Krzanowski, Wojtek J.
AU - Rengifo, Camilo
N1 - Publisher Copyright: © 2024 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2024
Y1 - 2024
N2 - In this paper we consider the Gabriel form of cross-validation (CV) and we investigate how to estimate the optimum rank for lower rank approximations of any dataset that can be written in matrix form, with particular application in multivariate analysis and in the analysis of multienvironment trials. The literature related to the method suggests that it can produce overfitting and poor-quality predictions, characteristics that result in overestimation of the rank. Because of this, it is proposed to change the rank selection criterion, testing thirteen statistics both in the original method and in four proposed extensions that seek to solve the above problems. A comparison is made with two gold standard methods for CV through a simulation study and through the analysis of seventeen real datasets, two of which are general multivariate and fifteen are from experiments with genotype-by-environment interaction. It is concluded that from a predictive point of view, the highest accuracy in estimating the rank is obtained by using a regularized singular value decomposition.
AB - In this paper we consider the Gabriel form of cross-validation (CV) and we investigate how to estimate the optimum rank for lower rank approximations of any dataset that can be written in matrix form, with particular application in multivariate analysis and in the analysis of multienvironment trials. The literature related to the method suggests that it can produce overfitting and poor-quality predictions, characteristics that result in overestimation of the rank. Because of this, it is proposed to change the rank selection criterion, testing thirteen statistics both in the original method and in four proposed extensions that seek to solve the above problems. A comparison is made with two gold standard methods for CV through a simulation study and through the analysis of seventeen real datasets, two of which are general multivariate and fifteen are from experiments with genotype-by-environment interaction. It is concluded that from a predictive point of view, the highest accuracy in estimating the rank is obtained by using a regularized singular value decomposition.
KW - eigenvalues
KW - eigenvectors
KW - matrix data
KW - regularization
KW - singular value decomposition
UR - http://www.scopus.com/inward/record.url?scp=85193508562&partnerID=8YFLogxK
U2 - 10.1080/15427528.2024.2349610
DO - 10.1080/15427528.2024.2349610
M3 - Article
AN - SCOPUS:85193508562
SN - 1542-7528
VL - 38
SP - 344
EP - 367
JO - Journal of Crop Improvement
JF - Journal of Crop Improvement
IS - 4
ER -