New Preprint on Marginalized Graph Kernels
New preprint alert from our lab! “Interpretable Molecular Property Predictions Using Marginalized Graph Kernels” https://chemrxiv.org/engage/chemrxiv/article-details/63ef9e761d2d184063a...
Marginalized graph kernels are a new approach to quantifying molecular similarity directly from molecular graphs. They can serve as input to kernel machines (e.g., SVM or GPR), with performance similar to that of graph neural networks (figure below from https://doi.org/10.1021/acs.jcim.1c01118).
However, we cannot yet interpret graph-based similarity. Interpretability helps to build trust and detect biases. We derive two interpretations that identify (1) the most important atoms and (2) the most important training data points behind a given prediction.
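To make the idea of "important training data points" concrete, here is a minimal sketch of one generic way a kernel-machine prediction can be traced back to individual training points. It is not necessarily the attribution derived in the preprint. It assumes a zero-mean GPR with a precomputed similarity matrix, where the predicted mean is a sum of terms k(x*, x_i) * alpha_i with alpha = (K + noise * I)^-1 y. The function names, the noise value, and the toy numbers are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] so row operations apply to both.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def training_point_contributions(K_train, y_train, k_test, noise=1e-6):
    """Split a zero-mean GPR mean prediction into per-training-point terms.

    K_train: similarity matrix between training molecules (any kernel).
    k_test:  similarities between the query molecule and each training molecule.
    Returns a list whose sum is the predicted mean.
    """
    n = len(y_train)
    A = [[K_train[i][j] + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(A, y_train)
    return [k * a for k, a in zip(k_test, alpha)]


# Toy, made-up numbers: three training molecules and one query molecule.
K_train = [[1.0, 0.8, 0.1],
           [0.8, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
y_train = [-3.0, -2.5, 1.0]
k_test = [0.9, 0.7, 0.1]

contributions = training_point_contributions(K_train, y_train, k_test)
prediction = sum(contributions)
for i, c in enumerate(contributions):
    print(f"training molecule {i}: contribution {c:+.3f}")
print(f"predicted value: {prediction:+.3f}")
```

Because the prediction is an exact sum of these terms, ranking them by absolute value gives a simple view of which training molecules drive a prediction under these assumptions.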
On the “logic” benchmark, our atomic attribution performed similarly to state-of-the-art GNN atomic attribution and never fell below 95% of the best approach's performance on any dataset, while neural network performance varied more widely.
We evaluated the predictive performance of MGK-GPR on the FreeSolv benchmark and found that it outperformed standard RBF kernels.
When consulting our “molecular attribution” to see why MGK outperforms RBF, we found that MGK created more “chemically reasonable” similarities between molecules, while classic fingerprints were misled by bit collisions.
Intrigued by this finding, we calculated the average molecular attribution and found that MGK creates “more significant” nearest-neighbor relationships compared to RBF (p = 1e-116, paired t-test).
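For readers unfamiliar with the paired t-test, the sketch below shows how its test statistic is computed from two sets of matched measurements, such as one attribution value per molecule under each kernel. The function name and the numbers are hypothetical and are not data from the preprint. Converting the statistic to a p-value additionally requires the Student t distribution, which is left out here to keep the example within the standard library.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # The test works on per-pair differences, not on the two samples separately.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up example: one value per molecule for each of two kernels.
kernel_a = [0.9, 0.8, 0.85, 0.7]
kernel_b = [0.6, 0.7, 0.65, 0.5]

t, df = paired_t_statistic(kernel_a, kernel_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The pairing matters because each molecule serves as its own control: the test asks whether the mean per-molecule difference is distinguishable from zero.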
Overall, we hope that these measures of interpretability will help to further establish marginalized graph kernels for molecular machine learning to aid in drug development.
This work was conducted by our postdoc Yan together with colleagues Yu-Hang Tang and Guang Lin.
Congratulations everyone on this exciting study!