---
language: en
license: apache-2.0
tags:
- keyword-extraction
- research-papers
- t5
- text-generation
- academic
datasets:
- custom
widget:
- text: "extract keywords: Deep Learning for Computer Vision Applications"
example_title: "Computer Vision Example"
- text: "extract keywords: Quantum Machine Learning for Drug Discovery"
example_title: "Quantum Computing Example"
- text: "extract keywords: Blockchain Technology for Supply Chain Management"
example_title: "Blockchain Example"
---
# Research Paper Keyword Extractor
## Model Description
This is a T5-small model fine-tuned to extract keywords from research paper titles. Given a title as input, it generates relevant keywords that capture the main topics, methodologies, and application domains.
## Training Data
- **Total Training Examples**: 35
- **Validation Examples**: 9
- **Data Sources**: Manual curation + synthetic generation
- **Domains Covered**: Computer Science, Healthcare, Physics, Engineering, Mathematics, Biology, and more
## Training Configuration
- **Base Model**: t5-small
- **Epochs**: 3
- **Batch Size**: 2
- **Learning Rate**: 0.0005
- **Max Input Length**: 96 tokens
- **Max Output Length**: 48 tokens
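The training script itself is not included in this repository, so the snippet below is only a hypothetical sketch of how the configuration above could map onto Hugging Face `Seq2SeqTrainingArguments`; the preprocessing function and the `title`/`keywords` field names are illustrative assumptions, not the actual setup.
```python
from transformers import (
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

# Hypothetical reconstruction of the configuration listed above; the exact
# training script is not part of this repository.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def preprocess(example):
    # Field names "title" and "keywords" are assumed for illustration.
    # Inputs are truncated to 96 tokens and targets to 48, as listed above.
    model_inputs = tokenizer(
        "extract keywords: " + example["title"], truncation=True, max_length=96
    )
    labels = tokenizer(example["keywords"], truncation=True, max_length=48)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

training_args = Seq2SeqTrainingArguments(
    output_dir="research-keyword-extractor",
    num_train_epochs=3,            # Epochs: 3
    per_device_train_batch_size=2, # Batch Size: 2
    learning_rate=5e-4,            # Learning Rate: 0.0005
)
```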
## Usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("ZoeDuan/research-keyword-extractor")
model = T5ForConditionalGeneration.from_pretrained("ZoeDuan/research-keyword-extractor")
def extract_keywords(title):
    # Prefix the title with the same task instruction used during fine-tuning
    input_text = f"extract keywords: {title}"
    input_ids = tokenizer(
        input_text, return_tensors="pt", truncation=True, max_length=96
    ).input_ids
    outputs = model.generate(
        input_ids,
        max_length=48,
        num_beams=4,
        no_repeat_ngram_size=2,
        early_stopping=True,
        do_sample=True,
        temperature=0.8,
    )
    keywords = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return keywords

# Example usage
title = "Machine Learning for Natural Language Processing Applications"
keywords = extract_keywords(title)
print(keywords)
# Example output (sampling makes results vary):
# Machine Learning, Natural Language Processing, NLP, AI, Text Processing
```
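The model can also be loaded through the `text2text-generation` pipeline; the snippet below is a minimal sketch using the same generation settings as above (adjust them as needed).
```python
from transformers import pipeline

# Minimal sketch: the pipeline wraps tokenization, generation, and decoding.
extractor = pipeline(
    "text2text-generation",
    model="ZoeDuan/research-keyword-extractor",
)

result = extractor(
    "extract keywords: Deep Learning for Computer Vision Applications",
    max_length=48,
    num_beams=4,
    no_repeat_ngram_size=2,
)
print(result[0]["generated_text"])
```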
## Example Predictions
| Input Title | Generated Keywords |
|-------------|-------------------|
| Deep Learning for Computer Vision Applications | Deep Learning, Computer Vision, Neural Networks, AI, Image Processing |
| Quantum Computing in Cryptography and Security | Quantum Computing, Cryptography, Security, Quantum Algorithms, Cybersecurity |
| IoT and Edge Computing for Smart Cities | IoT, Edge Computing, Smart Cities, Internet of Things, Urban Technology |
## Model Performance
The model has been trained on diverse research domains and can extract:
- **Technical methodologies** (e.g., Machine Learning, Deep Learning)
- **Application domains** (e.g., Healthcare, Finance)
- **Specific technologies** (e.g., Transformer, CNN, Blockchain)
- **Research areas** (e.g., Computer Vision, NLP)
## Limitations
- Optimized for research paper titles in English
- May not perform well on highly specialized or emerging domains not covered in training
- Performs best on titles of 5 to 15 words
- May occasionally generate overlapping or redundant keywords; see the post-processing sketch below
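Since the model returns a single comma-separated string, a small post-processing helper (a sketch, not part of the model) can split the output into a list and drop duplicate keywords:
```python
def split_keywords(raw: str) -> list[str]:
    # Split the generated string on commas and drop duplicates,
    # comparing case-insensitively while keeping the original order.
    seen = set()
    keywords = []
    for kw in raw.split(","):
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    return keywords

print(split_keywords("Machine Learning, machine learning, NLP, NLP, AI"))
# ['Machine Learning', 'NLP', 'AI']
```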
## License
This model is released under the Apache 2.0 license.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{research-keyword-extractor,
  title={Research Paper Keyword Extractor},
  author={Zoe Duan},
  year={2025},
  url={https://huggingface.co/ZoeDuan/research-keyword-extractor}
}
```