Published Online: https://doi.org/10.4155/fmc.11.23

Background: Accuracy concerns a model's ability to make correct predictions, while interpretability concerns the degree to which the model allows for human understanding. Models exhibiting the former property are often more complex and opaque, while interpretable models may lack the necessary accuracy. This study investigates the trade-off between accuracy and interpretability in predictive in silico modeling.

Method: A number of state-of-the-art methods for generating accurate models are compared with state-of-the-art methods for generating transparent models.

Conclusion: Results on 16 biopharmaceutical classification tasks demonstrate that, although the opaque methods generally obtain higher accuracies than the transparent ones, choosing an interpretable model often incurs only a limited penalty in predictive performance.
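One way to make the "penalty in predictive performance" concrete is to compare per-task accuracies of an opaque and a transparent model and summarize the differences. The sketch below is only an illustration of that bookkeeping: the function name `accuracy_penalty`, the summary keys, and the input numbers are all hypothetical and are not taken from the study.

```python
from statistics import mean

def accuracy_penalty(opaque_acc, transparent_acc):
    """Summarize the accuracy given up by choosing the transparent model.

    Both arguments are equal-length lists of per-task accuracies in [0, 1].
    """
    if len(opaque_acc) != len(transparent_acc) or not opaque_acc:
        raise ValueError("need two non-empty lists of equal length")
    # A positive difference means the opaque model was more accurate on that task.
    diffs = [o - t for o, t in zip(opaque_acc, transparent_acc)]
    return {
        "mean_penalty": mean(diffs),
        "max_penalty": max(diffs),
        "opaque_wins": sum(d > 0 for d in diffs),
        "transparent_wins": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
    }

# Hypothetical accuracies for four tasks; these are NOT results from the study.
opaque = [0.90, 0.80, 0.75, 0.85]
transparent = [0.85, 0.80, 0.80, 0.80]
summary = accuracy_penalty(opaque, transparent)
print(summary)
```

A small mean penalty together with a non-trivial number of transparent wins or ties would correspond to the situation described in the conclusion, though a proper comparison over many data sets would also call for a statistical test rather than raw differences alone.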
