TY - JOUR
T1 - Missing-value imputation using the robust singular-value decomposition
T2 - Proposals and numerical evaluation
AU - García-Peña, Marisol
AU - Arciniegas-Alarcón, Sergio
AU - Krzanowski, Wojtek J.
AU - Duarte, Diego
N1 - Publisher Copyright:
© 2021 The Authors. Crop Science © 2021 Crop Science Society of America
PY - 2021/9/1
Y1 - 2021/9/1
N2 - A common problem in the analysis of data from multi-environment trials is imbalance caused by missing observations. To get around this problem, Yan proposed a method for imputing the missing values based on the singular-value decomposition (SVD) of a matrix. However, this SVD can be affected by outliers and produce low-quality imputations. In this article, we propose four extensions of the Yan method that are resistant to outliers, replacing the standard SVD method with four robust SVD extensions. We evaluate these methods using exclusively numerical criteria, in a simulation study and in a cross-validation study based on real data. We conclude that in the presence of outliers, the standard SVD method should not be used; instead, the best alternatives are the robust SVD methods based on sub-sampling when the percentage of contamination is less than 2% following a completely random missing-data mechanism. In any other case, methods that either minimize the L2 norm or that involve L1 regressions are preferable.
UR - http://www.scopus.com/inward/record.url?scp=85111730541&partnerID=8YFLogxK
U2 - 10.1002/csc2.20508
DO - 10.1002/csc2.20508
M3 - Article
AN - SCOPUS:85111730541
SN - 0011-183X
VL - 61
SP - 3288
EP - 3300
JO - Crop Science
JF - Crop Science
IS - 5
ER -