
Creating a Novel Deep Learning Pipeline to Generate and Screen Molecules for Hormone-Positive Breast Cancer Treatment
Nishank Raisinghani
Dougherty Valley High School

Corresponding Author: [email protected]



Considerable research has applied neural networks to bioinformatics, particularly to drug discovery. Despite many promising steps in this direction, much remains to be done. In this paper, we design a new architecture that generates candidate molecules for treating hormone-receptor-positive breast cancer. These molecules are intended to inhibit the aromatase, CDK4, CDK6, PI3K, and mTOR proteins. To do this, we used a variational autoencoder (VAE) based on natural language processing. Our model is trained on the open-source ZINC dataset, chosen for its library of about 250k drug-like molecules. To generate our molecules, we compiled a test set of about 68 molecules already shown to bind to these target proteins. As an initial measure of viability, we computed RDKit's Quantitative Estimate of Drug-likeness (QED) score for each generated molecule. Supplementary models predicted further properties of the generated molecules, specifically solubility, synthetic accessibility, and toxicity, to tighten our screening process. We used the AutoDock Vina framework to predict the Gibbs free energy of binding between each molecule and the target enzymes. Our experiments also extended and improved a previous solubility prediction architecture, yielding more accurate predictions of both solubility and synthetic accessibility. The goal of our research is to develop a novel high-throughput process to generate and screen hormone-positive breast cancer drug molecules that are feasible in the real world. Because the space of drug-like molecules is so large (approximately 10^60 molecules), neural networks are a valuable tool for cutting the time and cost of finding these molecules.
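The abstract does not give the VAE's equations, so the following is only a minimal sketch of the standard VAE latent-space step it relies on, written in plain Python. It assumes the encoder maps a molecule (e.g. a SMILES string from the test set of known binders) to a mean `mu` and log-variance `log_var`; new candidates are then drawn near that encoding with the reparameterization trick. The helper `sample_near_seed` and its `scale` knob are hypothetical illustrations, not the authors' method, and the NLP-based encoder and decoder themselves are not shown.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_divergence(mu: List[float], log_var: List[float]) -> float:
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior, summed over latent dims.

    This is the regularization term added to the reconstruction loss when training a VAE.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def sample_near_seed(mu: List[float], log_var: List[float], n: int,
                     scale: float = 1.0, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: draw n latent vectors around one seed molecule's encoding.

    `scale` multiplies the posterior standard deviation (>1 explores further from the
    seed, <1 stays closer). Scaling sigma by s adds 2*log(s) to the log-variance.
    Each returned vector would then be passed to the decoder to produce a SMILES string.
    """
    rng = random.Random(seed)
    widened = [lv + 2.0 * math.log(scale) for lv in log_var]
    return [reparameterize(mu, widened, rng) for _ in range(n)]
```

For example, `sample_near_seed([0.3, -1.2], [-2.0, -2.0], n=5, scale=1.5)` returns five 2-dimensional latent vectors scattered around the seed. Sampling around encodings of known binders, rather than from the prior, is one common way to bias generation toward a target, which is why it is used here as the illustrative assumption.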
By refining certain layers of the decoder in an existing VAE framework, we made a novel improvement that led to the generation of three molecules that passed our screening process and show high potential for suppressing hormone-positive breast cancer tumor growth.
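The screening stages named in the abstract (QED drug-likeness, predicted solubility, synthetic accessibility, toxicity, and an AutoDock Vina docking score against a target enzyme) can be chained as a simple all-must-pass filter. The sketch below assumes each property has already been computed by the respective model; every threshold is a hypothetical placeholder, since the abstract does not report the cut-offs actually used, and the candidate values in the usage note are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical cut-offs for illustration only; the paper does not report its thresholds.
MIN_QED = 0.5       # QED lies in [0, 1]; higher means more drug-like
MIN_LOG_S = -6.0    # predicted aqueous solubility, log10(mol/L)
MAX_SA = 6.0        # synthetic accessibility score: 1 (easy) to 10 (hard)
MAX_VINA = -8.0     # Vina score in kcal/mol; more negative means tighter predicted binding


@dataclass
class Candidate:
    name: str
    qed: float
    log_s: float
    sa_score: float
    toxic: bool
    # target name -> best docking score (kcal/mol) for that target
    vina_scores: Dict[str, float] = field(default_factory=dict)


def passes_screen(c: Candidate, target: str) -> bool:
    """Apply each filter in turn; a molecule must clear all of them."""
    if c.qed < MIN_QED:
        return False
    if c.log_s < MIN_LOG_S:
        return False
    if c.sa_score > MAX_SA:
        return False
    if c.toxic:
        return False
    score = c.vina_scores.get(target)
    # A molecule that was never docked against this target cannot pass.
    return score is not None and score <= MAX_VINA


def screen(candidates: List[Candidate], target: str) -> List[str]:
    """Return the names of candidates that pass every stage for one target."""
    return [c.name for c in candidates if passes_screen(c, target)]
```

With invented candidates such as `Candidate("cand_2", 0.72, -3.1, 2.8, False, {"CDK4": -9.1})`, `screen(candidates, "CDK4")` keeps only molecules that clear every filter for CDK4. Ordering the cheap property checks before the docking lookup mirrors the usual funnel design, where expensive docking runs are reserved for molecules that already look drug-like.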