TY - JOUR
T1 - Beyond model interpretability using LDA and decision trees for α-amylase and α-glucosidase inhibitor classification studies
AU - Diéguez-Santana, Karel
AU - Rivera-Borroto, Oscar M.
AU - Puris, Amilkar
AU - Pham-The, Hai
AU - Le-Thi-Thu, Huong
AU - Rasulev, Bakhtiyor
AU - Casañola-Martin, Gerardo M.
N1 - Publisher Copyright: © 2019 John Wiley & Sons A/S
PY - 2019/7
Y1 - 2019/7
N2 - This report uses two data sets involving the main antidiabetic enzyme targets, α-amylase and α-glucosidase. The α-amylase and α-glucosidase inhibitory activity of candidate antidiabetic compounds is predicted using linear discriminant analysis (LDA) and classification trees (CT). Large data sets of 640 compounds for α-amylase and 1546 compounds for α-glucosidase were selected to develop the tree model. CT-J48 gives the better classification performance for both targets, with values above 80%–90% for the training and prediction sets, respectively. The best model shows an accuracy higher than 95% for the training set; it was also validated using a 10-fold cross-validation procedure and a test set, achieving accuracy values of 85.32% and 86.80%, respectively. Additionally, the obtained model is compared with other approaches previously published in the international literature and shows better results. Finally, the present results provide a double-target approach for improving the identification of antidiabetic chemicals, intended for a two-way workflow in virtual screening pipelines.
AB - This report uses two data sets involving the main antidiabetic enzyme targets, α-amylase and α-glucosidase. The α-amylase and α-glucosidase inhibitory activity of candidate antidiabetic compounds is predicted using linear discriminant analysis (LDA) and classification trees (CT). Large data sets of 640 compounds for α-amylase and 1546 compounds for α-glucosidase were selected to develop the tree model. CT-J48 gives the better classification performance for both targets, with values above 80%–90% for the training and prediction sets, respectively. The best model shows an accuracy higher than 95% for the training set; it was also validated using a 10-fold cross-validation procedure and a test set, achieving accuracy values of 85.32% and 86.80%, respectively. Additionally, the obtained model is compared with other approaches previously published in the international literature and shows better results. Finally, the present results provide a double-target approach for improving the identification of antidiabetic chemicals, intended for a two-way workflow in virtual screening pipelines.
KW - antidiabetic agents
KW - decision trees
KW - linear discriminant analysis
KW - QSAR
UR - http://www.scopus.com/inward/record.url?scp=85066010185&partnerID=8YFLogxK
U2 - 10.1111/cbdd.13518
DO - 10.1111/cbdd.13518
M3 - Article
C2 - 30908888
AN - SCOPUS:85066010185
SN - 1747-0277
VL - 94
SP - 1414
EP - 1421
JO - Chemical Biology and Drug Design
JF - Chemical Biology and Drug Design
IS - 1
ER -