New Preprint on Predicting Changes of Molecular Derivatives

New preprint from the lab: 

“DeepDelta: Predicting Pharmacokinetic Improvements of Molecular Derivatives with Deep Learning”

https://chemrxiv.org/engage/chemrxiv/article-details/642d823f0784a63aee949898

 


Molecular machine learning workflows are commonly used to triage experiments and are becoming increasingly accurate due to expanding data availability, growing computational power, and algorithmic developments.

(cf. https://pubmed.ncbi.nlm.nih.gov/32468207/ for context)

 

However, traditional algorithms require large datasets and have not been optimized to predict property differences between molecules. We can calculate property differences by subtracting individual predictions, but this approach does not always lead to accurate results. 
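To make the subtraction approach concrete, here is a minimal sketch. The molecule names and all numbers are hypothetical toy values, not data from the preprint, and `predicted_values` merely stands in for the output of any single-molecule model. It illustrates one way subtraction can go wrong: when the two individual errors point in opposite directions, the error of the difference is larger than either one.

```python
# Hypothetical toy values for illustration only; not data from the preprint.
true_values = {"mol_A": 1.20, "mol_B": 1.50}
# Stand-in for predictions from any single-molecule model.
predicted_values = {"mol_A": 1.05, "mol_B": 1.70}


def delta_by_subtraction(values, first, second):
    """Property difference (second - first) from two independent values."""
    return values[second] - values[first]


true_delta = delta_by_subtraction(true_values, "mol_A", "mol_B")
pred_delta = delta_by_subtraction(predicted_values, "mol_A", "mol_B")

# Per-molecule errors and the resulting error of the predicted difference.
error_a = predicted_values["mol_A"] - true_values["mol_A"]  # about -0.15
error_b = predicted_values["mol_B"] - true_values["mol_B"]  # about +0.20
delta_error = pred_delta - true_delta                       # error_b - error_a

print(f"errors: A={error_a:+.2f}, B={error_b:+.2f}, delta={delta_error:+.2f}")
```

In this toy case each molecule is off by at most 0.20, yet the predicted difference is off by about 0.35.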

DeepDelta Figure 1 v2

We developed DeepDelta, a pairwise deep learning approach that processes two molecules simultaneously and learns to predict the property difference between them from small datasets, to guide molecular optimization and prioritization. 
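One way to picture the pairwise setup is as a change to the training data: every ordered pair of molecules becomes one example, labeled with the difference of their measured values. The sketch below assumes that all ordered pairs (including a molecule paired with itself) are used; the helper `make_pairs`, the SMILES strings, and the property values are hypothetical and are not taken from the DeepDelta code base.

```python
from itertools import product


def make_pairs(dataset):
    """Build (smiles_a, smiles_b, delta) for every ordered pair.

    The label is delta = y_b - y_a, the property change when going
    from molecule a to molecule b.
    """
    return [
        (smiles_a, smiles_b, y_b - y_a)
        for (smiles_a, y_a), (smiles_b, y_b) in product(dataset, repeat=2)
    ]


# Hypothetical toy dataset: (SMILES, measured property value).
dataset = [("CCO", 0.5), ("CCN", 1.0), ("CCC", 2.0)]
pairs = make_pairs(dataset)

print(len(pairs))  # 3 molecules give 3 * 3 = 9 ordered pairs
print(pairs[1])    # ('CCO', 'CCN', 0.5)
```

Because n molecules yield n² ordered pairs, even a small dataset produces many training examples. To avoid leaking information, the pairing would presumably be done separately within the training and test splits.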

DeepDelta Figure 2 v2

 

We tested DeepDelta on 10 pharmacokinetic benchmark tasks and 2 external test sets and found that it outperforms two established molecular machine learning algorithms, ChemProp and Random Forest. 

DeepDelta Figure 3 v2

 

 

With low error, our pairwise model could: (1) predict zero property differences when provided the same molecule for both inputs, (2) predict the inverse of the original prediction when swapping molecule order, and (3) predict additive differences between three molecules. 
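These three checks need no measured labels, so they can be run on any set of molecules. The sketch below shows how they might be computed for an arbitrary pairwise predictor `predict_delta(a, b)`. The function names and the toy predictor are hypothetical and are not the DeepDelta implementation; the toy predictor carries a deliberate constant bias of 0.1 so that all three errors are non-zero.

```python
from itertools import permutations


def same_molecule_error(predict_delta, molecules):
    """Mean |f(x, x)|; ideally 0."""
    return sum(abs(predict_delta(m, m)) for m in molecules) / len(molecules)


def swap_error(predict_delta, molecules):
    """Mean |f(x, y) + f(y, x)|; ideally 0."""
    pairs = list(permutations(molecules, 2))
    total = sum(abs(predict_delta(a, b) + predict_delta(b, a)) for a, b in pairs)
    return total / len(pairs)


def additivity_error(predict_delta, molecules):
    """Mean |f(x, y) + f(y, z) - f(x, z)|; ideally 0."""
    triples = list(permutations(molecules, 3))
    total = sum(
        abs(predict_delta(a, b) + predict_delta(b, c) - predict_delta(a, c))
        for a, b, c in triples
    )
    return total / len(triples)


# Hypothetical stand-in for a trained pairwise model.
toy_values = {"CCO": 0.5, "CCN": 1.0, "CCC": 2.0}


def toy_predict_delta(a, b):
    # The constant bias of 0.1 deliberately violates all three properties.
    return toy_values[b] - toy_values[a] + 0.1


molecules = list(toy_values)
same_err = same_molecule_error(toy_predict_delta, molecules)
swap_err = swap_error(toy_predict_delta, molecules)
add_err = additivity_error(toy_predict_delta, molecules)
print(f"same={same_err:.2f}, swap={swap_err:.2f}, additivity={add_err:.2f}")
```

For a predictor that respected all three properties exactly, each of these values would be zero.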

 

DeepDelta Equations

 

A model’s ability to correctly predict no property change for same-molecule pairs correlated strongly with overall cross-validation performance, suggesting that these simple unsupervised calculations could serve as indicators of model performance and convergence.

DeepDelta Figure 5 v2

 

Overall, we believe that DeepDelta and extensions thereof will provide accurate and easily deployable predictions to steer molecular optimization and compound prioritization. Code and data are available on GitHub: https://github.com/RekerLab/DeepDelta.

This work was conducted by PhD student Zachary Fralish (BME) together with Duke undergraduates Ashley Chen (Comp Sci) and Paul Skaluba (BME & Math). We are hopeful about the promise of machine learning models tailored to predict property differences between molecules. Stay tuned for more analysis and applications of pairwise deep learning approaches coming out of the Reker lab.