---
language:
- en
- ru
- uz
- multilingual
license: apache-2.0
tags:
- multi-task-learning
- token-classification
- text-classification
- ner
- named-entity-recognition
- intent-classification
- language-detection
- banking
- transactions
- financial
- multilingual
- bert
- pytorch
datasets:
- custom
metrics:
- precision
- recall
- f1
- accuracy
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
  example_title: "English Transaction"
- text: "Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321 за услуги"
  example_title: "Russian Transaction"
- text: "44380583609046995897 ҳисобга 170190.66 UZS ўтказиш Голден Стар ИНН 485232484"
  example_title: "Uzbek Cyrillic Transaction"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
  example_title: "Query Request"
library_name: transformers
pipeline_tag: token-classification
---

# Intentity AIBA - Multi-Task Banking Model 🏦🤖

## Model Description

**Intentity AIBA** is a multi-task model that simultaneously performs:

1. 🌐 **Language Detection** - Identifies the language of the input text
2. 🎯 **Intent Classification** - Determines the user's intent
3. 📋 **Named Entity Recognition** - Extracts key entities from banking transactions

Built on `google-bert/bert-base-multilingual-cased` with a shared encoder and three specialized output heads, the model provides comprehensive understanding of banking and financial transaction texts in multiple languages.

## 🎯 Capabilities

### Language Detection

Predicts one of 5 language labels:

- `en`
- `mixed`
- `ru`
- `uz_cyrl`
- `uz_latn`

### Intent Classification

Recognizes 5 intent types:

- `create_transaction`
- `help`
- `list_transaction`
- `partial_entities`
- `unknown`

### Named Entity Recognition

Extracts 10 entity types:

- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`

## 📊 Model Performance

| Task | Metric | Score |
|------|--------|-------|
| **NER** | F1 Score | 0.9994 |
| **NER** | Precision | 0.9994 |
| **Intent** | F1 Score | 1.0000 |
| **Intent** | Accuracy | 1.0000 |
| **Language** | Accuracy | 0.8978 |
| **Overall** | Average F1 | 0.9997 |

## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModel

# Load model and tokenizer
model_name = "primel/intentity-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Note: this is a custom multi-task model, so `AutoModel` alone does not
# expose the task heads. Use the inference code below for predictions.
```

### Complete Inference Code

```python
import torch
from transformers import AutoTokenizer, AutoModel


class IntentityAIBA:
    def __init__(self, model_name="primel/intentity-aiba"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)

        # NER label mapping, if the model config provides one.
        self.id2tag = getattr(self.model.config, "id2label", {})
        # Note: the intent and language label mappings should be loaded
        # from the model repository files.

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        self.model.eval()

    def predict(self, text):
        """Predict language, intent, and entities for the input text."""
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Extract predictions from the custom model heads.
        # (The exact decoding depends on the head implementation;
        # see the sketches below.)
        return {
            "language": "detected_language",
            "intent": "detected_intent",
            "entities": {},
        }


# Initialize
model = IntentityAIBA()

# Predict
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
result = model.predict(text)
print(result)
```
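
Because the task heads are custom modules, the snippet above can only outline the call; their logits are not part of the standard `AutoModel` output. For orientation, here is a minimal sketch of the three-head design described in the Model Architecture section below. The module names (`ner_head`, `intent_head`, `language_head`), the `[CLS]`-pooling choice, and the 21-tag BIO label space are illustrative assumptions, not the repository's actual implementation:

```python
import torch.nn as nn
from transformers import AutoModel


class MultiTaskBankingModel(nn.Module):
    """Sketch of a shared-encoder, three-head multi-task model.

    Module names and label sizes are assumptions for illustration;
    the published checkpoint may organize its heads differently.
    """

    def __init__(self, base_model="google-bert/bert-base-multilingual-cased",
                 num_tags=21, num_intents=5, num_languages=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model)
        hidden = self.encoder.config.hidden_size
        # Token-level head: one label per token (10 entity types give
        # 21 labels under a BIO scheme, counting "O").
        self.ner_head = nn.Linear(hidden, num_tags)
        # Sequence-level heads read the [CLS] representation.
        self.intent_head = nn.Linear(hidden, num_intents)
        self.language_head = nn.Linear(hidden, num_languages)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_states = out.last_hidden_state  # (batch, seq_len, hidden)
        cls_state = token_states[:, 0]        # (batch, hidden)
        return {
            "ner_logits": self.ner_head(token_states),
            "intent_logits": self.intent_head(cls_state),
            "language_logits": self.language_head(cls_state),
        }
```

Under this sketch, training would combine per-task cross-entropy losses with the weights listed in the Model Architecture section: `total_loss = 0.4 * ner_loss + 0.3 * intent_loss + 0.3 * language_loss`.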
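
Whatever the heads look like, turning per-token NER predictions into the entity dictionaries shown in the examples below means grouping BIO tags into character spans. The helper below is a self-contained sketch: it assumes `B-`/`I-`-prefixed labels (as in `config.id2label`) and the `(start, end)` character offsets the tokenizer returns with `return_offsets_mapping=True`; the tags and offsets in the usage lines are hand-written for illustration.

```python
def decode_entities(text, tags, offsets):
    """Group BIO tags into an {entity_type: surface_text} dictionary.

    `tags` holds one predicted label per token (e.g. "B-amount",
    "I-amount", "O"); `offsets` holds the tokenizer's (start, end)
    character offsets for the same tokens.
    """
    entities = {}
    current_type, start, end = None, None, None

    def flush():
        # Commit the span collected so far, if any.
        if current_type is not None:
            entities[current_type] = text[start:end]

    for tag, (tok_start, tok_end) in zip(tags, offsets):
        if tok_start == tok_end:  # special tokens such as [CLS]/[SEP]
            continue
        if tag.startswith("B-"):
            flush()
            current_type, start, end = tag[2:], tok_start, tok_end
        elif tag.startswith("I-") and current_type == tag[2:]:
            end = tok_end  # extend the current span
        else:  # "O" or a stray I- tag closes any open span
            flush()
            current_type = None
    flush()
    return entities


# Illustrative usage with hand-written tags and offsets:
text = "Transfer 12.5mln USD"
tags = ["O", "B-amount", "B-currency"]
offsets = [(0, 8), (9, 16), (17, 20)]
print(decode_entities(text, tags, offsets))
# {'amount': '12.5mln', 'currency': 'USD'}
```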

## 📝 Example Outputs

### Example 1: English Transaction

**Input**: `"Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"`

**Output**:

```python
{
    "language": "en",
    "intent": "create_transaction",
    "entities": {
        "amount": "12.5mln",
        "currency": "USD",
        "receiver_name": "Apex Industries",
        "receiver_hr": "27109477752047116719",
        "receiver_inn": "123456789",
        "bank_code": "01234",
        "description": "consulting"
    }
}
```

### Example 2: Russian Transaction

**Input**: `"Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321"`

**Output**:

```python
{
    "language": "ru",
    "intent": "create_transaction",
    "entities": {
        "amount": "150тыс",
        "currency": "рублей",
        "receiver_name": "ООО Ромашка",
        "receiver_hr": "40817810099910004312",
        "receiver_inn": "987654321"
    }
}
```

### Example 3: Query Request

**Input**: `"Show completed transactions from 01.12.2024 to 15.12.2024"`

**Output**:

```python
{
    "language": "en",
    "intent": "list_transaction",
    "entities": {
        "start_date": "01.12.2024",
        "end_date": "15.12.2024"
    }
}
```

## 🏗️ Model Architecture

- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Architecture**: Multi-task learning with a shared encoder
  - Shared mBERT encoder (the bulk of the ~178M total parameters)
  - NER head: token-level classifier
  - Intent head: sequence-level classifier
  - Language head: sequence-level classifier
- **Total Parameters**: ~178M
- **Loss Function**: Weighted combination (0.4 × NER + 0.3 × Intent + 0.3 × Language)

## 🎓 Training Details

- **Training Samples**: 219,273
- **Validation Samples**: 38,696
- **Epochs**: 6
- **Batch Size**: 16 (per device)
- **Learning Rate**: 3e-5
- **Warmup Ratio**: 0.15
- **Optimizer**: AdamW with weight decay
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch
- **Hardware**: Trained on a Tesla T4 GPU

## 💡 Use Cases

- **Banking Applications**: Transaction processing and validation
- **Chatbots**: Intent-aware financial assistants
- **Document Processing**: Automated extraction from transaction documents
- **Compliance**: KYC/AML data extraction
- **Analytics**: Transaction categorization and analysis
- **Multi-language Support**: Cross-border banking operations

## ⚠️ Limitations

- Designed for the banking/financial domain - may not generalize to other domains
- Performance may vary on formats that differ significantly from the training data
- Mixed-language texts may have lower accuracy
- Best results on transaction-style texts similar to the training distribution
- Requires fine-tuning for specific banking systems or regional variations

## 📚 Citation

```bibtex
@misc{intentity-aiba-2025,
  author       = {Primel},
  title        = {Intentity AIBA: Multi-Task Banking Language Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/intentity-aiba}}
}
```

## 📄 License

Apache 2.0

## 🤝 Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.

---

**Model Card Authors**: Primel

**Last Updated**: 2025

**Model Version**: 1.0