---
license: mit
tags:
- chemistry
- drug-discovery
- materials-science
- generative-ai
- diffusion-models
---
# MolecularDiffusion: Pre-trained Models and Datasets
Welcome to the repository for pre-trained models and datasets accompanying the **[MolecularDiffusion](https://github.com/pregHosh/MolCraftDiffusion)** framework.
MolecularDiffusion is a unified Generative AI framework designed to streamline the entire lifecycle of 3D molecular diffusion models, from efficient training to seamless deployment in data-driven computational chemistry pipelines.
**Find more details in our paper:**
[![ChemRxiv](https://img.shields.io/badge/arXiv-6909e50fef936fb4a23df237-b31b1b.svg)](https://chemrxiv.org/engage/chemrxiv/article-details/6909e50fef936fb4a23df237)
## Models
This repository hosts several pre-trained 3D molecular diffusion models described in our paper.
* **Pre-trained General Model:** A diffusion model trained on our compiled dataset of 3D molecules (listed under Training Datasets).
* **GEOM-Trained Models:** Diffusion models trained on the GEOM dataset. The paper describes the training variations behind each model.
## Datasets
We provide the datasets used for training our models, as well as novel datasets generated by our models.
### Training Datasets
* **QM9:** Small organic molecules.
* **FORMED:** Synthesizable molecules from the Cambridge Structural Database (CSD).
* **Compiled 3D Molecules:** Our custom-compiled dataset used for pre-training, combining GEOM, QMugs, COMPAS1, COMPAS3, FORMED, and OSCAR.
* **IFLP Dataset:** Intramolecular frustrated Lewis pairs (IFLPs) derived from the CoRE MOF 2019 database.
### Generated Datasets
These datasets were generated using the MolecularDiffusion models:
* **Asymmetric Cp Dataset:** A generated dataset focusing on asymmetric cyclopentadienyl ligands.
* **Target IFLP Dataset:** Generated IFLPs with the desired geometrical features for the catalytic hydrogenation of CO$_2$.
* **Singlet Fission Candidates:** A curated dataset of generated candidate molecules for singlet fission applications.
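
The models and datasets listed above can be fetched with the `huggingface_hub` client. The snippet below is a minimal sketch, not an official loader. The repository id `pregH/MolecularDiffusion`, the `model` repo type, and the default output directory are assumptions that this README does not confirm, so adjust them to the actual repository.

```python
from pathlib import Path
from typing import Optional, Sequence

# Assumption: this README does not state the repository id; replace as needed.
REPO_ID = "pregH/MolecularDiffusion"


def build_download_kwargs(
    repo_id: str,
    patterns: Optional[Sequence[str]] = None,
    local_dir: str = "molecular_diffusion_files",
) -> dict:
    """Assemble keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {
        "repo_id": repo_id,
        "repo_type": "model",  # assumption: hosted as a model repository
        "local_dir": str(Path(local_dir)),
    }
    if patterns:
        # Restrict the download to files matching these glob patterns.
        kwargs["allow_patterns"] = list(patterns)
    return kwargs


def download(
    repo_id: str = REPO_ID,
    patterns: Optional[Sequence[str]] = None,
    local_dir: str = "molecular_diffusion_files",
) -> str:
    """Download matching files and return the local folder path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(**build_download_kwargs(repo_id, patterns, local_dir))
```

Calling `download()` with no arguments fetches the whole repository. Passing a glob such as `patterns=["*.ckpt"]` limits the download to matching files. That pattern is purely illustrative, since the file names and formats in the repository are not documented here.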