# Custom Embedding Model
This repository contains a custom embedding model based on Jina Embeddings V4, optimized for generating embeddings for text, images, and visual documents.
## Features
- Multimodal embeddings for text and images
- Multilingual support (30+ languages)
- Task-specific adapters (retrieval, text-matching, code)
- Flexible embedding dimensions
## Setup

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- You can use the model in different ways:
### Using the Handler

```python
from handler import ModelHandler

# Initialize the model
model_handler = ModelHandler()
model_handler.initialize(None)

# Process text inputs
text_inputs = ["Your text here", "Another example"]
features = model_handler.preprocess({"body": {"inputs": text_inputs}})
result = model_handler.inference(features)
print(result)  # {"embeddings": [...]}
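```

As a quick follow-up, the returned vectors can be compared directly. This is a minimal sketch that assumes `result["embeddings"]` contains one embedding per input (as the example output suggests) and that NumPy is installed:

```python
import numpy as np

# One row per input text; the exact dimensionality depends on the model configuration.
emb = np.asarray(result["embeddings"], dtype=np.float32)

# Cosine similarity between the two example inputs.
a, b = emb[0], emb[1]
score = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity: {score:.4f}")
```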
### Using the API

Run the API server:

```bash
python api.py
```
Then make API requests:
```python
import requests

response = requests.post(
    "http://localhost:8000/embeddings",
    json={
        "inputs": [{"text": "Your text here"}, {"text": "Another example"}],
        "task": "retrieval",
    },
)
print(response.json())  # {"embeddings": [...]}
```
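With the server running, a small end-to-end retrieval sketch looks like the following. It relies only on what is shown above (the `/embeddings` endpoint, the `inputs`/`task` payload, and an `embeddings` list in the response); NumPy is assumed to be available for the similarity math:

```python
import numpy as np
import requests

query = "How do I install the dependencies?"
documents = [
    "Run pip install -r requirements.txt to set up the project.",
    "The demo UI is started with python app.py.",
]

# Embed the query and the documents in a single request using the retrieval adapter.
resp = requests.post(
    "http://localhost:8000/embeddings",
    json={"inputs": [{"text": t} for t in [query] + documents], "task": "retrieval"},
)
resp.raise_for_status()
vectors = np.asarray(resp.json()["embeddings"], dtype=np.float32)

# Rank documents by cosine similarity to the query.
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
scores = vectors[1:] @ vectors[0]
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.4f}  {documents[idx]}")
```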
### Using the Pipeline

```python
from pipeline import load_pipeline

# Load the pipeline
pipeline = load_pipeline("path/to/model")

# Generate embeddings
embeddings = pipeline("Your text here", task="retrieval")
print(embeddings.shape)  # (1, 2048)
```
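The task-specific adapters and flexible embedding dimensions listed under Features can be exercised through the same call. This is a hedged sketch: the task names come from the feature list, and truncating an embedding by keeping its leading dimensions is an assumption about how the flexible (Matryoshka-style) output is intended to be used:

```python
# Pick the adapter that matches the use case (names taken from the feature list).
code_embedding = pipeline("def add(a, b):\n    return a + b", task="code")
match_embedding = pipeline("Your text here", task="text-matching")

# Flexible dimensions: keep only the first 512 of the 2048 values.
# (Assumes shorter prefixes of the embedding remain meaningful.)
truncated = pipeline("Your text here", task="retrieval")[:, :512]
print(truncated.shape)  # (1, 512)
```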
## Demo UI

You can also run a Gradio demo UI:

```bash
python app.py
```
This will start a web interface for testing embeddings and comparing similarities between text and images.
## License
This model is available under the same terms as the original model it is based on (Jina Embeddings V4). Please refer to the license information for details.