---
tags:
- ontology-embedding
- hyperbolic-space
- hierarchical-reasoning
- biomedical-ontology
- generated_from_trainer
- dataset_size:100000
- loss:HierarchyTransformerLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: cellular response to stimulus
  sentences:
  - response to stimulus
  - medial transverse frontopolar gyrus
  - biological regulation
- source_sentence: regulation of cell differentiation involved in embryonic placenta development
  sentences:
  - thoracic wall
  - ectoderm-derived structure
  - regulation of cell differentiation
- source_sentence: regulation of hippocampal neuron apoptotic process
  sentences:
  - external genitalia morphogenesis
  - compact layer of ventricle
  - biological regulation
- source_sentence: transitional myocyte of internodal tract
  sentences:
  - secretory epithelial cell
  - internodal tract myocyte
  - insect haltere disc
- source_sentence: alveolar atrium
  sentences:
  - organ part
  - superior recess of lesser sac
  - foramen of skull
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# OnT: Language Models as Ontology Encoders

This is an OnT (Ontology Transformer) model trained on the GALEN dataset, based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). OnT is a language-model-based framework for ontology embedding that represents concepts as points in hyperbolic space and axioms as hierarchical relationships between those points.

## Model Details

### Model Description

- **Model Type:** Ontology Transformer (OnT)
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Training Dataset:** GALEN
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Embedding Space:** hyperbolic space
- **Key Features:**
  - Hyperbolic embeddings for ontology concept encoding
  - Modeling of hierarchical relationships between concepts
  - Support for role embeddings as rotations over hyperbolic space
  - Representation of concept rotation, translation, and existential quantification

### Model Sources

- **Repository:** [OnT on GitHub](https://github.com/HuiYang1997/OnT)
- **Paper:** [Language Models as Ontology Encoders](https://arxiv.org/abs/2507.14334)

### Available Versions

This model is available in **4 versions** (Git branches) to suit different use cases:

| Branch | Training Dataset | Role Embedding | Use Case |
|--------|------------------|----------------|----------|
| **`main`** (default) | Prediction | ✅ With role embedding | Default version: trained on the prediction dataset with role embeddings |
| **`role-free`** | Prediction | ❌ Without role embedding | Trained on the prediction dataset without role embeddings |
| **`inference-default`** | Inference | ✅ With role embedding | Trained on the inference dataset with role embeddings |
| **`inference-role-free`** | Inference | ❌ Without role embedding | Trained on the inference dataset without role embeddings |

**How to use different versions:**

```python
from OnT import OntologyTransformer

# Default version (main branch - OnTr with role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-galen")

# Role-free version (without role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-galen", revision="role-free")

# Inference version with role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-galen", revision="inference-default")

# Inference version without role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-galen", revision="inference-role-free")
```

### Full Model Architecture

```
OntologyTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Installation

First, install the required dependencies:

```bash
pip install sentence-transformers==3.4.0.dev0
```

You also need to install [HierarchyTransformers](https://github.com/KRR-Oxford/HierarchyTransformers), following the instructions in that repository.

### Direct Usage

Load the model and use it to encode ontology concepts and roles:

```python
import torch
from OnT import OntologyTransformer

# Load the OnT model
path = "Hui97/OnT-MPNet-galen"
ont = OntologyTransformer.from_pretrained(path)

# Entity names to be encoded
entity_names = [
    'alveolar atrium',
    'organ part',
    'superior recess of lesser sac',
]

# Get the entity embeddings in hyperbolic space
entity_embeddings = ont.encode_concept(entity_names)
print(entity_embeddings.shape)  # [3, 768]

# Role sentences to be encoded
role_sentences = [
    "application attribute",
    "attribute",
    "chemical modifier",
]

# Get the role embeddings (rotations and scalings)
role_rotations, role_scalings = ont.encode_roles(role_sentences)
```
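Because these embeddings live in hyperbolic space, plain cosine similarity is not the natural way to compare them. As a rough illustration, the sketch below (continuing from the snippet above, so `ont` is already loaded) scores a candidate subsumption pair with the geodesic distance on a unit Poincaré ball plus a norm term, following the hierarchy-encoder intuition behind `HierarchyTransformerLoss` that more general concepts lie closer to the origin. The `poincare_distance` helper, the curvature `c`, and the weight `lam` are illustrative assumptions, not part of the OnT API; see the OnT repository for the exact scoring used in the paper.

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Geodesic distance on a Poincare ball of curvature -c.

    Assumption: the trained model may use a rescaled ball (curvature tied to
    the embedding dimension), so c=1.0 is illustrative, not the model's
    actual setting.
    """
    sq_dist = (u - v).pow(2).sum(-1)
    denom = (1 - c * u.pow(2).sum(-1)) * (1 - c * v.pow(2).sum(-1))
    x = 1 + 2 * c * sq_dist / denom
    return torch.acosh(x.clamp(min=1.0 + 1e-7)) / c ** 0.5

# Candidate subsumption pair: child concept first, parent concept second
child, parent = torch.as_tensor(
    ont.encode_concept(["cellular response to stimulus", "response to stimulus"])
)

# A true subsumption child ⊑ parent should have a small geodesic distance,
# with the parent sitting closer to the origin than the child.
lam = 1.0  # illustrative weight for the norm term
score = -(poincare_distance(child, parent) + lam * (parent.norm() - child.norm()))
print(score)  # higher (less negative) suggests child ⊑ parent
```

The `clamp` guards against numerical drift at the ball boundary, where the argument of `acosh` must stay at or above 1.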
## Citation

### BibTeX

If you use this model, please cite:

```bibtex
@article{yang2025language,
  title={Language Models as Ontology Encoders},
  author={Yang, Hui and Chen, Jiaoyan and He, Yuan and Gao, Yongsheng and Horrocks, Ian},
  journal={arXiv preprint arXiv:2507.14334},
  year={2025}
}
```