---
license: mit
tags:
- chemistry
- drug-discovery
- materials-science
- generative-ai
- diffusion-models
---

# MolecularDiffusion: Pre-trained Models and Datasets

Welcome to the repository for pre-trained models and datasets accompanying the **[MolecularDiffusion](https://github.com/pregHosh/MolCraftDiffusion)** framework. MolecularDiffusion is a unified Generative AI framework designed to streamline the entire lifecycle of 3D molecular diffusion models, from efficient training to seamless deployment in data-driven computational chemistry pipelines.

**Find more details in our paper:**
[![arXiv](https://img.shields.io/badge/arXiv-6909e50fef936fb4a23df237-b31b1b.svg)](https://chemrxiv.org/engage/chemrxiv/article-details/6909e50fef936fb4a23df237)

## Models

This repository hosts several pre-trained 3D molecular diffusion models described in our paper.

* **Pre-trained General Model:** A diffusion model trained on our comprehensive compiled dataset of 3D molecules.
* **GEOM-Trained Models:** Diffusion models trained on the GEOM dataset, potentially exploring different training methodologies or variations described in the paper.

## Datasets

We provide the datasets used for training our models, as well as novel datasets generated by our models.

### Training Datasets

* **QM9:** Small organic molecules
* **FORMED:** Synthesizable molecules from CSD
* **Compiled 3D Molecules:** Our custom-compiled dataset used for pre-training, combining GEOM, QMug, COMPAS1, COMPAS3, FORMED, and OSCAR.
* **IFLP Dataset:** Dataset of IFLP derived from the CoRE MOF 2019 database

### Generated Datasets

These datasets were generated using the MolecularDiffusion models:

* **Asymmetric Cp Dataset:** A generated dataset focusing on asymmetric cyclopentadienyl ligands.
* **Target IFLP Dataset:** Generated IFLPs with the desired geometrical features for the catalytic hydrogenation of CO$_2$.
* **Singlet Fission Candidates:** A curated dataset of generated candidate molecules for singlet fission applications.
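
## Downloading Files (Sketch)

The models and datasets listed above can in principle be fetched selectively rather than cloning the whole repository. The snippet below is a minimal sketch, not an official workflow: `REPO_ID` is a placeholder you must replace with this repository's actual Hub ID, the file names in the preview are made up for illustration, and `select_files` and `download` are hypothetical helpers. The only external call used is `snapshot_download` from the `huggingface_hub` package, with its `repo_id`, `repo_type`, `allow_patterns`, and `local_dir` parameters.

```python
from fnmatch import fnmatch

# Hypothetical placeholder: replace with this repository's actual Hub ID.
REPO_ID = "<user>/<repo>"


def select_files(filenames, patterns):
    """Return the filenames matching at least one glob pattern, in input order."""
    return [name for name in filenames if any(fnmatch(name, p) for p in patterns)]


def download(repo_id, patterns, repo_type="model", local_dir=None):
    """Download only the matching files from a Hub repository and return the local path."""
    # Imported lazily so select_files() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        allow_patterns=patterns,
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Preview which files a pattern would select.
    # These file names are invented for illustration; check the repository's file listing.
    example_listing = ["README.md", "models/general.ckpt", "datasets/qm9.tar.gz"]
    print(select_files(example_listing, ["models/*"]))
```

Once `REPO_ID` is set, a call such as `download(REPO_ID, ["models/*"], local_dir="./moldiff")` would fetch only the files under a `models/` folder, assuming the repository is organized that way. Previewing patterns with `select_files` first avoids pulling large dataset archives by accident.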