Statistical confidence for variable selection in QSAR models via Monte Carlo cross-validation
Journal Publication
ResearchOnline@JCU
Abstract
A new variable selection wrapper method, named the Monte Carlo variable selection (MCVS) method, was developed within the framework of Monte Carlo cross-validation (MCCV). The MCVS method reports variable selection results as P-values, the most conventional and common measure in statistical hypothesis testing, thus allowing a clear and simple statistical interpretation of the results. The MCVS method is equally applicable to multiple-linear-regression (MLR)-based and non-MLR-based quantitative structure-activity relationship (QSAR) models. To demonstrate the workings of the new approach, the method was applied to blood-brain barrier (BBB) permeation and human intestinal absorption (HIA) QSAR problems using MLR. Starting from more than 1600 molecular descriptors, only two (TPSA(NO) and ALOGP) yielded acceptably low P-values for the BBB and HIA problems, respectively. The new method has been implemented in the QSAR-BENCH v2 program, which is freely available for academic use (including its Java source code) from www.dmitrykonovalov.org.
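The abstract does not spell out the MCVS algorithm, so the following is only a minimal sketch of the general idea behind Monte Carlo cross-validation with an MLR model, not the authors' method or the QSAR-BENCH implementation. The function names `mccv_errors` and `empirical_p_value`, the 50% test fraction, and the way the P-value-like score is computed (the fraction of random splits in which dropping a descriptor does not increase test error) are all assumptions made for illustration.

```python
import numpy as np


def mccv_errors(X, y, n_splits=200, test_fraction=0.5, seed=0):
    """Test-set mean squared errors of an MLR fit over random train/test splits.

    Hypothetical helper: plain Monte Carlo cross-validation (repeated random
    sub-sampling), not the published MCVS procedure.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    A = np.column_stack([np.ones(n), X])  # intercept column plus descriptors
    errors = np.empty(n_splits)
    for i in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        # Ordinary least-squares fit on the training part only.
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        resid = y[test] - A[test] @ coef
        errors[i] = np.mean(resid ** 2)
    return errors


def empirical_p_value(X, y, column, **kwargs):
    """Fraction of splits where removing `column` does not worsen test error.

    Assumed, illustrative score: both calls use the same seed, so the full
    and reduced models are compared on identical splits.
    """
    full = mccv_errors(X, y, **kwargs)
    reduced = mccv_errors(np.delete(X, column, axis=1), y, **kwargs)
    return float(np.mean(reduced <= full))
```

Under these assumptions, a descriptor that carries real signal should receive a score near zero, because removing it degrades the prediction in almost every split, while a pure-noise descriptor should receive a clearly larger score. Calling `empirical_p_value(X, y, column=j)` for each descriptor index `j` gives one score per descriptor.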
Journal
Journal of Chemical Information and Modeling
Volume
48
ISBN/ISSN
1549-9596
Issue
2
Page Count
14
Publisher
American Chemical Society
DOI
10.1021/ci700283s