---
license: mit
tags:
- chemistry
- drug-discovery
- materials-science
- generative-ai
- diffusion-models
---

# MolecularDiffusion: Pre-trained Models and Datasets

Welcome to the repository for pre-trained models and datasets accompanying the **[MolecularDiffusion](https://github.com/pregHosh/MolCraftDiffusion)** framework.

MolecularDiffusion is a unified generative AI framework designed to streamline the entire lifecycle of 3D molecular diffusion models, from efficient training to seamless deployment in data-driven computational chemistry pipelines.

**Find more details in our paper:** [ChemRxiv preprint](https://chemrxiv.org/engage/chemrxiv/article-details/6909e50fef936fb4a23df237)

## Models

This repository hosts several pre-trained 3D molecular diffusion models described in our paper.

* **Pre-trained General Model:** A diffusion model trained on our compiled dataset of 3D molecules (see Training Datasets).
* **GEOM-Trained Models:** Diffusion models trained on the GEOM dataset, with training variations as described in the paper.
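
As a minimal sketch of how the hosted files could be fetched, the snippet below uses `snapshot_download` from the `huggingface_hub` library. Everything specific to this repository is an assumption: `REPO_ID` is a placeholder to replace with the actual Hub ID, the repository is assumed to be a model repo, and the checkpoint extensions and the helper names `download_repo` and `find_checkpoints` are hypothetical, since the file layout is not documented here.

```python
from pathlib import Path

# Placeholder only: substitute the actual Hub ID of this repository.
REPO_ID = "<namespace>/<repo-name>"

# Assumed checkpoint extensions; the real file names may differ.
CHECKPOINT_SUFFIXES = {".ckpt", ".pt", ".pth"}


def download_repo(repo_id, patterns=None, local_dir="moldiff_assets"):
    """Fetch files from a Hugging Face Hub repo and return the local folder."""
    # Lazy import so the helper below works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(
        snapshot_download(
            repo_id=repo_id,
            allow_patterns=patterns,  # e.g. ["*.ckpt"]; None downloads everything
            local_dir=local_dir,
        )
    )


def find_checkpoints(root, suffixes=CHECKPOINT_SUFFIXES):
    """Return all files under `root` whose extension is in `suffixes`, sorted."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix in suffixes
    )
```

With a real repository ID in place, `find_checkpoints(download_repo(REPO_ID))` would download the repository and list any checkpoint-like files it contains. Loading a checkpoint into a model is handled by the MolecularDiffusion framework itself and is not shown here.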

## Datasets

We provide the datasets used for training our models, as well as novel datasets generated by our models.

### Training Datasets

* **QM9:** Small organic molecules.
* **FORMED:** Synthesizable molecules from the Cambridge Structural Database (CSD).
* **Compiled 3D Molecules:** Our custom-compiled dataset used for pre-training, combining GEOM, QMugs, COMPAS1, COMPAS3, FORMED, and OSCAR.
* **IFLP Dataset:** Intramolecular frustrated Lewis pairs (IFLPs) derived from the CoRE MOF 2019 database.

### Generated Datasets

These datasets were generated using the MolecularDiffusion models:

* **Asymmetric Cp Dataset:** A generated dataset focusing on asymmetric cyclopentadienyl ligands.
* **Target IFLP Dataset:** Generated IFLPs with the desired geometrical features for the catalytic hydrogenation of CO$_2$.
* **Singlet Fission Candidates:** A curated dataset of generated candidate molecules for singlet fission applications.