About

DeepCodex uses a deep-learned embedding of gene expression profiles to find chemical and genetic perturbations with similar biological effects. Its deep neural network was trained on the NIH LINCS dataset, which comprises more than 1.5M expression profiles from over 40,000 perturbagens (compounds or genetic manipulations). NIH LINCS data were measured using the L1000 platform.
If you use DeepCodex in your research, please cite:
Yoni Donner, Stephane Kazmierczak, and Kristen Fortney
Drug repurposing using deep embeddings of gene expression profiles. Molecular Pharmaceutics.
DOI: 10.1021/acs.molpharmaceut.8b00284
Abstract
Computational drug repositioning requires assessment of the functional similarities among compounds. Here, we report a new method for measuring compound functional similarity based on gene expression data. This approach takes advantage of deep neural networks to learn an embedding that substantially denoises expression data, making replicates of the same compound more similar. Our method uses unlabeled data in the sense that it only requires compounds to be labeled by identity rather than detailed pharmacological information, which is often unavailable and costly to obtain. Similarity in the learned embedding space accurately predicted pharmacological similarities despite the lack of any such labels during training, and achieved substantially improved performance in comparison with previous similarity measures applied directly to gene expression measurements. Our method could identify drugs with shared therapeutic and biological targets even when the compounds were structurally dissimilar, thereby revealing previously unreported functional relationships between compounds. Thus, our approach provides an improved engine for drug repurposing based on expression data.
Requirements for CSV upload:
The uploaded CSV file should conform to the following:
  • No header row
  • Exactly 1 row
  • 978 columns (one per gene)
  • No missing values
  • All values are floating-point numbers
  • Genes are in the same order as in the LINCS level 4 data
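
The requirements above can be checked locally before uploading. The following is a minimal sketch in Python, not part of DeepCodex itself: the function name validate_profile_csv and the constant N_GENES are hypothetical, and treating NaN or infinite values as invalid is an assumption about what "no missing values" means. Gene order cannot be verified from the numbers alone, so it is not checked here.

```python
import csv
import io
import math

N_GENES = 978  # column count taken from the upload requirements


def validate_profile_csv(text):
    """Return a list of problems found in the CSV text (empty list means OK).

    Hypothetical helper: it checks row count, column count, missing values,
    and that every cell parses as a finite float. It does NOT check gene order.
    """
    # csv.reader yields [] for blank lines; drop those so a trailing
    # newline does not count as an extra row.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) != 1:
        # A header line plus a data line shows up here as 2 rows.
        return [f"expected exactly 1 row, found {len(rows)}"]

    problems = []
    row = rows[0]
    if len(row) != N_GENES:
        problems.append(f"expected {N_GENES} columns, found {len(row)}")

    for i, cell in enumerate(row):
        cell = cell.strip()
        if not cell:
            problems.append(f"column {i + 1}: missing value")
            continue
        try:
            value = float(cell)
        except ValueError:
            problems.append(f"column {i + 1}: not a number: {cell!r}")
            continue
        # Assumption: NaN and infinity are treated as invalid values.
        if not math.isfinite(value):
            problems.append(f"column {i + 1}: non-finite value {cell!r}")
    return problems
```

To use it, read the file contents (for example with open("profile.csv").read(), where the file name is a placeholder) and pass the text to validate_profile_csv; an empty list means none of the checked requirements were violated.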
Download the LINCS level 4 data.
Download the CSV mapping of DeepCodex IDs to Pert IDs.
BIOAGE develops drugs to treat aging and its associated diseases. We created DeepCodex, an algorithm that uses artificial intelligence (deep learning) to identify groups of small molecules that have a similar functional impact on human biology.