---
language: en
license: apache-2.0
tags:
- keyword-extraction
- research-papers
- t5
- text-generation
- academic
datasets:
- custom
widget:
- text: "extract keywords: Deep Learning for Computer Vision Applications"
example_title: "Computer Vision Example"
- text: "extract keywords: Quantum Machine Learning for Drug Discovery"
example_title: "Quantum Computing Example"
- text: "extract keywords: Blockchain Technology for Supply Chain Management"
example_title: "Blockchain Example"
---
# Research Paper Keyword Extractor
## Model Description
This is a T5-small model fine-tuned to extract keywords from research paper titles. Given a title as input, it generates relevant keywords that capture the main topics, methodologies, and application domains.
## Training Data
- **Total Training Examples**: 35
- **Validation Examples**: 9
- **Data Sources**: Manual curation + synthetic generation
- **Domains Covered**: Computer Science, Healthcare, Physics, Engineering, Mathematics, Biology, and more
## Training Configuration
- **Base Model**: t5-small
- **Epochs**: 3
- **Batch Size**: 2
- **Learning Rate**: 0.0005
- **Max Input Length**: 96 tokens
- **Max Output Length**: 48 tokens
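The training script itself is not included in this repository, so the snippet below is only a hypothetical sketch of how the configuration above could map onto Hugging Face `Seq2SeqTrainingArguments`; the preprocessing function and the `title`/`keywords` field names are illustrative assumptions, not the actual setup.
```python
from transformers import (
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

# Hypothetical reconstruction of the configuration listed above; the exact
# training script is not part of this repository.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def preprocess(example):
    # Field names "title" and "keywords" are assumed for illustration.
    # Inputs are truncated to 96 tokens and targets to 48, as listed above.
    model_inputs = tokenizer(
        "extract keywords: " + example["title"], truncation=True, max_length=96
    )
    labels = tokenizer(example["keywords"], truncation=True, max_length=48)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

training_args = Seq2SeqTrainingArguments(
    output_dir="research-keyword-extractor",
    num_train_epochs=3,            # Epochs: 3
    per_device_train_batch_size=2, # Batch Size: 2
    learning_rate=5e-4,            # Learning Rate: 0.0005
)
```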
## Usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("ZoeDuan/research-keyword-extractor")
model = T5ForConditionalGeneration.from_pretrained("ZoeDuan/research-keyword-extractor")
def extract_keywords(title):
    # Prefix the title with the same task instruction used during fine-tuning
    input_text = f"extract keywords: {title}"
    input_ids = tokenizer(
        input_text, return_tensors="pt", truncation=True, max_length=96
    ).input_ids
    outputs = model.generate(
        input_ids,
        max_length=48,
        num_beams=4,
        no_repeat_ngram_size=2,
        early_stopping=True,
        do_sample=True,
        temperature=0.8,
    )
    keywords = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return keywords

# Example usage
title = "Machine Learning for Natural Language Processing Applications"
keywords = extract_keywords(title)
print(keywords)
# Example output (sampling makes results vary):
# Machine Learning, Natural Language Processing, NLP, AI, Text Processing
```
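The model can also be loaded through the `text2text-generation` pipeline; the snippet below is a minimal sketch using the same generation settings as above (adjust them as needed).
```python
from transformers import pipeline

# Minimal sketch: the pipeline wraps tokenization, generation, and decoding.
extractor = pipeline(
    "text2text-generation",
    model="ZoeDuan/research-keyword-extractor",
)

result = extractor(
    "extract keywords: Deep Learning for Computer Vision Applications",
    max_length=48,
    num_beams=4,
    no_repeat_ngram_size=2,
)
print(result[0]["generated_text"])
```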
## Example Predictions
| Input Title | Generated Keywords |
|-------------|-------------------|
| Deep Learning for Computer Vision Applications | Deep Learning, Computer Vision, Neural Networks, AI, Image Processing |
| Quantum Computing in Cryptography and Security | Quantum Computing, Cryptography, Security, Quantum Algorithms, Cybersecurity |
| IoT and Edge Computing for Smart Cities | IoT, Edge Computing, Smart Cities, Internet of Things, Urban Technology |
## Model Performance
The model has been trained on diverse research domains and can extract:
- **Technical methodologies** (e.g., Machine Learning, Deep Learning)
- **Application domains** (e.g., Healthcare, Finance)
- **Specific technologies** (e.g., Transformer, CNN, Blockchain)
- **Research areas** (e.g., Computer Vision, NLP)
## Limitations
- Optimized for research paper titles in English
- May not perform well on highly specialized or emerging domains not covered in training
- Performs best on titles of 5 to 15 words
- May occasionally generate overlapping or redundant keywords; see the post-processing sketch below
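Since the model returns a single comma-separated string, a small post-processing helper (a sketch, not part of the model) can split the output into a list and drop duplicate keywords:
```python
def split_keywords(raw: str) -> list[str]:
    # Split the generated string on commas and drop duplicates,
    # comparing case-insensitively while keeping the original order.
    seen = set()
    keywords = []
    for kw in raw.split(","):
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    return keywords

print(split_keywords("Machine Learning, machine learning, NLP, NLP, AI"))
# ['Machine Learning', 'NLP', 'AI']
```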
## License
This model is released under the Apache 2.0 license.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{research-keyword-extractor,
  title={Research Paper Keyword Extractor},
  author={Zoe Duan},
  year={2025},
  url={https://huggingface.co/ZoeDuan/research-keyword-extractor}
}
```