---
language: en
license: apache-2.0
tags:
- keyword-extraction
- research-papers
- t5
- text-generation
- academic
datasets:
- custom
widget:
- text: "extract keywords: Deep Learning for Computer Vision Applications"
  example_title: "Computer Vision Example"
- text: "extract keywords: Quantum Machine Learning for Drug Discovery"
  example_title: "Quantum Computing Example"
- text: "extract keywords: Blockchain Technology for Supply Chain Management"
  example_title: "Blockchain Example"
---

# Research Paper Keyword Extractor

## Model Description

This is a T5-small model fine-tuned to extract keywords from research paper titles. It takes a title as input, prefixed with `extract keywords: `, and generates keywords that capture the title's main topics, methodologies, and application domains.

## Training Data

- **Total Training Examples**: 35
- **Validation Examples**: 9
- **Data Sources**: Manual curation + synthetic generation
- **Domains Covered**: Computer Science, Healthcare, Physics, Engineering, Mathematics, Biology, and more

## Training Configuration

- **Base Model**: t5-small
- **Epochs**: 3
- **Batch Size**: 2
- **Learning Rate**: 0.0005
- **Max Input Length**: 96 tokens
- **Max Output Length**: 48 tokens

## Usage

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("ZoeDuan/research-keyword-extractor")
model = T5ForConditionalGeneration.from_pretrained("ZoeDuan/research-keyword-extractor")


def extract_keywords(title):
    # The model expects the same task prefix that was used during fine-tuning
    input_text = f"extract keywords: {title}"
    input_ids = tokenizer(
        input_text, return_tensors="pt", truncation=True, max_length=96
    ).input_ids

    # Beam search combined with sampling; outputs can differ between runs
    outputs = model.generate(
        input_ids,
        max_length=48,
        num_beams=4,
        no_repeat_ngram_size=2,
        early_stopping=True,
        do_sample=True,
        temperature=0.8,
    )

    keywords = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return keywords


# Example usage
title = "Machine Learning for Natural Language Processing Applications"
keywords = extract_keywords(title)
print(keywords)
# Example output (sampling is enabled, so results may vary between runs):
# Machine Learning, Natural Language Processing, NLP, AI, Text Processing
```

## Example Predictions

| Input Title | Generated Keywords |
|-------------|-------------------|
| Deep Learning for Computer Vision Applications | Deep Learning, Computer Vision, Neural Networks, AI, Image Processing |
| Quantum Computing in Cryptography and Security | Quantum Computing, Cryptography, Security, Quantum Algorithms, Cybersecurity |
| IoT and Edge Computing for Smart Cities | IoT, Edge Computing, Smart Cities, Internet of Things, Urban Technology |

## Model Performance

The model has been trained on diverse research domains and can extract:

- **Technical methodologies** (e.g., Machine Learning, Deep Learning)
- **Application domains** (e.g., Healthcare, Finance)
- **Specific technologies** (e.g., Transformer, CNN, Blockchain)
- **Research areas** (e.g., Computer Vision, NLP)

## Limitations

- Optimized for research paper titles in English
- May not perform well on highly specialized or emerging domains not covered in training
- Performs best on titles of 5 to 15 words
- May occasionally generate overlapping or redundant keywords

## License

This model is released under the Apache 2.0 license.

## Citation

If you use this model in your research, please cite:

```
@misc{research-keyword-extractor,
  title={Research Paper Keyword Extractor},
  author={Zoe Duan},
  year={2025},
  url={https://huggingface.co/ZoeDuan/research-keyword-extractor}
}
```
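
## Post-processing Sketch

The Limitations section notes that the model may occasionally generate overlapping or redundant keywords. The following is a minimal sketch of one way to clean up the output, assuming the generated string is comma-separated as in the example predictions. The helper name `split_keywords` is hypothetical and not part of the model or of `transformers`. It only removes case-insensitive exact duplicates, not near-synonyms such as "NLP" and "Natural Language Processing".

```python
def split_keywords(generated: str) -> list[str]:
    """Split a comma-separated keyword string and drop duplicates.

    Assumption: keywords are separated by commas, as in the example
    predictions. Duplicates are detected case-insensitively and the
    first spelling encountered is kept.
    """
    seen = set()
    keywords = []
    for part in generated.split(","):
        keyword = part.strip()
        key = keyword.casefold()
        # Skip empty fragments (e.g. from a trailing comma) and repeats
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return keywords


print(split_keywords("Machine Learning, NLP, machine learning, , AI"))
# ['Machine Learning', 'NLP', 'AI']
```

The returned string from `extract_keywords` could be passed straight to this helper to obtain a Python list of keywords.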