---
license: apache-2.0
datasets:
  - sovthpaw/senter-omni-data
language:
  - en
base_model:
  - Qwen/Qwen2.5-Omni-3B
pipeline_tag: any-to-any
---
![Senter-Omni banner](senter-fixed-banner.gif) 🤘🤖
- **🎯 ONE MODEL, ALL MODALITIES, CHAT & EMBED** - Unlike pipeline approaches, Senter-Omni is a single 4B parameter model that understands and reasons across text, images, audio, and video simultaneously.
- **🔓 OPEN & UNCENSORED** - Apache 2.0 licensed with unrestricted responses for maximum utility.
- **🧠 128K CONTEXT** - Extended RoPE scaling for handling massive documents and conversations.
- **💾 MEMORY EFFICIENT** - 4-bit quantized model that fits on consumer GPUs while maintaining full multimodal capabilities.

---

## 🚀 **Quick Start**

### **Installation**

```bash
git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt

# Download the quantized model (instructions below)
# Then run the demo:
python senter_omni_demo.py
```

### **Basic Usage**

```python
from omni import OmniClient

# Initialize Senter-Omni
client = OmniClient()

# Streaming chat
response = client.chat([
    {"role": "user", "content": "Hello Senter!"}
], stream=True)

# Multimodal chat with image
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "What do you see?"}
    ]}
])

# Cross-modal embeddings
embedding = client.embed("any content", modality="auto")
```

---

## 🎭 **Multimodal Capabilities**

### **Text Understanding & Generation**
- **Mathematical Reasoning**: Step-by-step problem solving
- **Code Generation**: Python, JavaScript, and more
- **Creative Writing**: Stories, scripts, poetry
- **Technical Analysis**: Complex explanations and documentation

### **Visual Understanding**
- **Image Analysis**: Detailed descriptions of visual content
- **Geometric Recognition**: Shapes, colors, spatial relationships
- **Creative Interpretation**: Stories inspired by images
- **Technical Diagrams**: Understanding charts, graphs, schematics

### **Audio Processing**
- **Sound Analysis**: Identifying audio content and patterns
- **Speech Understanding**: Transcribing and interpreting spoken content
- **Music Analysis**: Recognizing musical elements and genres
- **Environmental Audio**: Identifying sounds from various sources

### **Cross-Modal Reasoning**
- **Unified Understanding**: Connecting information across modalities
- **Contextual Analysis**: Using multiple inputs for better reasoning
- **Creative Synthesis**: Combining visual, audio, and text for rich responses

### **Model Specifications**
- **Parameters**: 4B (quantized to 4-bit)
- **Context Length**: 128K tokens (RoPE scaled)
- **Memory Usage**: ~8GB VRAM
- **Inference Speed**: Real-time streaming
- **Modalities**: Text, Image, Audio, Video

### **Embedding Capabilities**
- **Unified Space**: 1024D embeddings for all modalities
- **Cross-Modal Search**: Find similar content across text, images, and audio
- **Similarity Matching**: Cosine similarity in the unified space
- **Memory Efficient**: Same model for chat and embeddings

---

## 🎯 **Real Examples**

### **Image Analysis**

```python
# Analyze geometric shapes
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "test_assets/real_test_image.jpg"},
        {"type": "text", "text": "What geometric shapes do you see?"}
    ]}
])
# Output: "I see a red square, blue square, and green oval arranged vertically"
```

### **Audio Understanding**

```python
# Process audio content
response = client.chat([
    {"role": "user", "content": [
        {"type": "audio", "audio": "test_assets/real_test_audio.wav"},
        {"type": "text", "text": "What do you hear?"}
    ]}
])
# Output: "I hear an electric hum from a device like a radio or TV"
```
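The same content-list format also accepts several media items in one turn. The following is a minimal sketch of a combined request (the prompt wording is illustrative; the asset paths reuse the test files from the examples above):

```python
# Illustrative combined request: image + audio + text in a single user turn,
# using the same content-list message format as the examples above.
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "test_assets/real_test_image.jpg"},
        {"type": "audio", "audio": "test_assets/real_test_audio.wav"},
        {"type": "text", "text": "Does the sound match what is shown in the image?"}
    ]}
])
```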
### **Creative Multimodal Storytelling**

```python
# Create stories from images
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "shapes.jpg"},
        {"type": "text", "text": "Create a story inspired by this image"}
    ]}
])
# Output: Rich, creative stories combining visual elements with narrative
```

### **Cross-Modal Embeddings**

```python
# Embed different modalities
text_emb = client.embed("beautiful mountain landscape")
image_emb = client.embed("mountain_photo.jpg", modality="image")
audio_emb = client.embed("nature_sounds.wav", modality="audio")

# All embeddings are in the same 1024D space for comparison
```

A sketch for comparing these vectors with cosine similarity appears just before the API Reference below.

---

## 🔧 **Technical Architecture**

### **Model Details**
- **Base**: Qwen2.5-Omni-3B (Apache 2.0 licensed)
- **Quantization**: 4-bit NF4 for memory efficiency (a sample config is sketched before the API Reference below)
- **Context Extension**: YaRN RoPE scaling to 128K
- **Streaming**: Custom TimingStreamer for real-time output
- **Embeddings**: Hash-based unified 1024D space

### **Training Data**
- **131,893 samples** from multiple high-quality datasets:
  - 50,000 ShareGPT conversations (chat)
  - 30,000 AgentCode samples (function calling)
  - 20,000 Stack Overflow samples (coding)
  - 30,000 Hermes-3 samples (instruction tuning)
  - 1,893 Hermes function-calling samples

### **Key Features**
- **XML Tag Support**: Structured XML tags in responses
- **Uncensored Responses**: No content restrictions
- **Function Calling**: Tool integration capabilities
- **Memory Efficient**: Single model for chat and embeddings

---

## 📦 **Installation & Setup**

### **1. Clone Repository**

```bash
git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
```

### **2. Install Dependencies**

```bash
pip install -r requirements.txt
```

### **3. Download Model**

The quantized model (3.5GB) is hosted on Hugging Face because of GitHub's 100MB file limit:

- **Dataset**: https://huggingface.co/datasets/SouthpawIN/senter-omni-data

```bash
# Option 1: Download from Hugging Face (recommended)
git lfs install
git clone https://huggingface.co/SouthpawIN/senter-omni-model
cp -r senter-omni-model/* ./senter_omni_128k/

# Option 2: Manual download
# Download from: https://huggingface.co/SouthpawIN/senter-omni-model
```

---

## 🎮 **Interactive Demo**

The comprehensive demo showcases all capabilities:

```bash
python senter_omni_demo.py
```

**Demo Sections:**
1. **🎓 Training Capabilities** - Dataset overview and training features
2. **💬 Multimodal Chat** - Text, image, audio, and combined processing
3. **🔍 Cross-Modal Embeddings** - Unified embedding space demonstration
4. **🚀 Building Guide** - API usage and integration examples
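The 4-bit NF4 quantization listed under Model Details corresponds to the standard bitsandbytes integration in Transformers. The snippet below is an illustrative sketch of that kind of configuration, not necessarily the exact settings used to produce the released checkpoint; the compute dtype and double-quantization flag in particular are assumptions:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit NF4 quantization config (values are assumptions, not the
# exact settings used for the released Senter-Omni checkpoint).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NF4 data type, as noted under Model Details
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    bnb_4bit_use_double_quant=True,         # assumed nested quantization for extra savings
)
# Passed as `quantization_config=bnb_config` to the model's from_pretrained() call.
```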
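Because chat and embeddings share a single model and a single 1024-dimensional space, cross-modal comparison comes down to cosine similarity between the vectors returned by `client.embed`. The sketch below assumes those return values are array-like (convertible with NumPy); the image file name is a placeholder:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare content across modalities in the shared 1024D embedding space.
text_emb = client.embed("beautiful mountain landscape")
image_emb = client.embed("mountain_photo.jpg", modality="image")  # placeholder file
print(f"text-image similarity: {cosine_similarity(text_emb, image_emb):.3f}")
```

Scores near 1.0 indicate closely related content across modalities; `client.cross_search` (documented below) presumably builds on the same measure for cross-modal retrieval.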
---

## 🛠️ **API Reference**

### **Core Methods**

#### **`client.chat(messages, **kwargs)`**

```python
# Basic chat
response = client.chat([
    {"role": "user", "content": "Hello!"}
])

# With parameters
response = client.chat(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
    temperature=0.7,
    stream=True
)

# Multimodal
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "Describe this image"}
    ]}
])
```

#### **`client.embed(content, modality="auto")`**

```python
# Text embedding
emb = client.embed("sample text")

# Image embedding
emb = client.embed("image.jpg", modality="image")

# Audio embedding
emb = client.embed("audio.wav", modality="audio")

# Auto-detect modality
emb = client.embed("[IMAGE] photo.jpg")  # Detects as image
```

#### **`client.cross_search(query, top_k=5)`**

```python
# Search across modalities
results = client.cross_search("mountain landscape")
# Returns: {"text": [...], "image": [...], "audio": [...]}
```

#### **`client.retrieve_context(query, context_window=5)`**

```python
# Get relevant context
context = client.retrieve_context("nature scenes")
# Returns multimodal context items
```

---

### **Memory Usage**
- **Model Loading**: ~8GB VRAM
- **Inference**: ~10GB VRAM peak
- **Embeddings**: Shared model (no additional memory)
- **Context (128K)**: ~2GB additional for full context

### **Development Setup**

```bash
git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt
python senter_omni_demo.py  # Test installation
```

---

## 📄 **License**

**Apache 2.0 License** - See [LICENSE](LICENSE) for details.

This project uses:
- **Qwen2.5-Omni**: Apache 2.0 (Alibaba Cloud)
- **Training Datasets**: Various open licenses
- **Code**: Apache 2.0

---

## 🙏 **Acknowledgments**

- **Alibaba Cloud** for the Qwen2.5-Omni architecture
- **Nous Research** for the Hermes dataset and inspiration
- **Alignment Lab AI** for development and training
- **Unsloth** for the efficient training framework
- **HuggingFace** for model hosting and tools
- **Open Source Community** for datasets and tools

---
**🎭 EXPERIENCE THE FUTURE OF MULTIMODAL AI WITH SENTER-OMNI**

*Built with ❤️ by sovthpaw at Alignment Lab AI*

Donations: https://www.paypal.me/Sellgames1l