
On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics

Abstract · Apr 3, 2017 15:33

cs-lg physics-chem-ph stat-ml

Arxiv Abstract

  • Chao Lan
  • Sai Nivedita Chandrasekaran
  • Jun Huan

The study of compound-target binding profiles has been a central theme in cheminformatics. For data repositories that provide only positive binding profiles, a popular assumption is that all unreported profiles are negative. In this paper, we caution readers not to take this assumption for granted. In a problem setting where binding profiles are used as features to train predictive models, we present empirical evidence that (1) predictive performance degrades when the assumption fails and (2) explicit recovery of unreported profiles improves predictive performance. In particular, we propose a joint framework of profile recovery and supervised learning, which shows further performance improvement. Our study not only calls for more careful treatment of unreported profiles in cheminformatics, but also initiates a new machine learning problem, which we call Learning with Positive and Unknown Features.
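To make the distinction in the abstract concrete, here is a minimal sketch that contrasts the unreported-is-negative assumption with an explicit recovery step. Everything in it is an assumption made for illustration: the toy `profiles` matrix, the function names, and the similarity-weighted neighbor vote are hypothetical stand-ins, not the joint framework proposed in the paper.

```python
# Hypothetical toy data: rows are compounds, columns are targets.
# 1 = reported positive binding, None = unreported (unknown, not known negative).
profiles = [
    [1, 1, None, None],
    [1, 1, 1, None],
    [None, None, None, 1],
]


def zero_fill(matrix):
    """Unreported-is-negative assumption: every unknown entry becomes 0."""
    return [[0 if v is None else v for v in row] for row in matrix]


def jaccard(a, b):
    """Jaccard similarity between two sets of reported targets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_recover(matrix):
    """Illustrative recovery (NOT the paper's method): replace each unknown
    entry with a similarity-weighted vote from the other compounds."""
    reported = [{j for j, v in enumerate(row) if v == 1} for row in matrix]
    recovered = []
    for i, row in enumerate(matrix):
        others = [k for k in range(len(matrix)) if k != i]
        weights = [jaccard(reported[i], reported[k]) for k in others]
        total = sum(weights)
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(float(v))  # keep reported positives as they are
                continue
            votes = [1.0 if j in reported[k] else 0.0 for k in others]
            score = sum(w * x for w, x in zip(weights, votes))
            # With no similar neighbors, fall back to the zero-fill value.
            new_row.append(score / total if total else 0.0)
        recovered.append(new_row)
    return recovered


features_assumed = zero_fill(profiles)
features_recovered = neighbor_recover(profiles)
```

In this toy case, zero-filling marks compound 0 as a non-binder of target 2, while the neighbor vote assigns it a score of 1.0 because the only similar compound binds that target. Either feature matrix could then be passed to a downstream supervised learner, which is the setting the abstract describes.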
