---
title: LoomRAG
emoji: 🔍
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: 🧠 Multimodal RAG that "weaves" together text and images 🪡
---
# 🔍 LoomRAG: Multimodal Retrieval-Augmented Generation for AI-Powered Search
<a href="https://huggingface.co/spaces/NotShrirang/LoomRAG"><img src="https://img.shields.io/badge/Streamlit%20App-red?style=flat-rounded-square&logo=streamlit&labelColor=white"/></a>
This project implements **LoomRAG**, a multimodal Retrieval-Augmented Generation (RAG) system that leverages OpenAI's CLIP model for neural cross-modal retrieval and semantic search. Users submit text queries and seamlessly retrieve both text and image results through vector embeddings. The system features a comprehensive annotation interface for creating custom datasets, supports CLIP fine-tuning with configurable parameters for domain-specific applications, and accepts image and PDF uploads for intelligent retrieval, all through a Streamlit-based interface.
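At its core, LoomRAG relies on CLIP mapping text and images into one shared latent space, so a text query can be scored directly against images. The sketch below illustrates that idea using the Hugging Face `transformers` CLIP implementation; the checkpoint name and image path are illustrative placeholders, not necessarily what LoomRAG ships with:

```python
# Minimal sketch: score a text query against an image in CLIP's shared space.
# "openai/clip-vit-base-patch32" and "sunset.jpg" are illustrative placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("sunset.jpg")
inputs = processor(text=["sunset over mountains"], images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

# Normalize so the dot product is cosine similarity in the shared space
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
print(f"query-image similarity: {(text_emb @ image_emb.T).item():.3f}")
```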
Experience the project in action:

[LoomRAG on Hugging Face Spaces](https://huggingface.co/spaces/NotShrirang/LoomRAG)
---

## 📸 Implementation Screenshots
| Data Upload Page     | Data Search / Retrieval |
| -------------------- | ----------------------- |
| Data Annotation Page | CLIP Fine-Tuning        |
---

## ✨ Features
- 🔍 **Cross-Modal Retrieval**: Search text to retrieve both text and image results using deep learning
- 🌐 **Streamlit Interface**: Provides a user-friendly web interface for interacting with the system
- 📤 **Upload Options**: Allows users to upload images and PDFs for AI-powered processing and retrieval
- 🧠 **Embedding-Based Search**: Uses OpenAI's CLIP model to align text and image embeddings in a shared latent space
- 📝 **Augmented Text Generation**: Enhances text results using LLMs for contextually rich outputs (see the sketch after this list)
- 🏷️ **Image Annotation**: Enables users to annotate uploaded images through an intuitive interface
- 🎯 **CLIP Fine-Tuning**: Supports custom model training with configurable parameters, including test dataset split size, learning rate, optimizer, and weight decay
- 🎨 **Fine-Tuned Model Integration**: Seamlessly load and utilize fine-tuned CLIP models for enhanced search and retrieval
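The augmented-generation step can be pictured as follows. This is a hedged sketch assuming an OpenAI-style chat API; the `augment` helper, model name, and prompt format are illustrative assumptions, not LoomRAG's actual implementation:

```python
# Illustrative sketch of LLM augmentation: retrieved passages are stitched
# into a prompt and a chat model produces a contextually rich answer.
# The helper name, model, and prompt are assumptions, not LoomRAG's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def augment(query: str, passages: list[str]) -> str:
    context = "\n\n".join(passages)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```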
---

## 🏗️ Architecture Overview
1. **Data Indexing**:
   - Text, images, and PDFs are preprocessed and embedded using the CLIP model
   - Embeddings are stored in a vector database for fast and efficient retrieval
2. **Query Processing**:
   - Text queries are converted into embeddings for semantic search
   - Uploaded images and PDFs are processed and embedded for comparison
   - The system performs a nearest-neighbor search in the vector database to retrieve relevant text and images (see the sketch after this overview)
3. **Response Generation**:
   - For text results: Optionally refined or augmented using a language model
   - For image results: Directly returned or enhanced with image captions
   - For PDFs: Extracts text content and provides relevant sections
4. **Image Annotation**:
   - Dedicated annotation page for managing uploaded images
   - Support for creating and managing multiple datasets simultaneously
   - Flexible annotation workflow for efficient data labeling
   - Dataset organization and management capabilities
5. **Model Fine-Tuning**:
   - Custom CLIP model training on annotated images
   - Configurable training parameters for optimization
   - Integration of fine-tuned models into the search pipeline
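A minimal sketch of the indexing and nearest-neighbor steps above, assuming FAISS as the vector database; the dimension and the random arrays merely stand in for real CLIP embeddings:

```python
# Sketch of steps 1-2: index normalized CLIP embeddings in FAISS, then
# run an exact nearest-neighbor search for a query embedding.
import faiss
import numpy as np

dim = 512                       # CLIP ViT-B/32 embedding size
index = faiss.IndexFlatIP(dim)  # inner product == cosine after L2 normalization

# 1. Data indexing: add text/image embeddings (random stand-ins here)
embeddings = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(embeddings)
index.add(embeddings)

# 2. Query processing: embed the query the same way, then search
query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 nearest neighbors
print(ids[0], scores[0])
```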
---

## 🚀 Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/NotShrirang/LoomRAG.git
   cd LoomRAG
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```
---

## 📖 Usage
1. **Running the Streamlit Interface**:
   - Start the Streamlit app:
     ```bash
     streamlit run app.py
     ```
   - Access the interface in your browser to:
     - Submit natural language queries
     - Upload images or PDFs to retrieve contextually relevant results
     - Annotate uploaded images
     - Fine-tune CLIP models with custom parameters
     - Use fine-tuned models for improved search results
2. **Example Queries**:
   - **Text Query**: "sunset over mountains"
     Output: An image of a sunset over mountains along with descriptive text
   - **PDF Upload**: Upload a PDF of a scientific paper
     Output: Extracted key sections or contextually relevant images
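For the PDF path, the flow is roughly: extract text, chunk it, embed each chunk, and index it like any other document. A hedged sketch assuming `pypdf` for extraction; LoomRAG's actual parser and chunking strategy may differ:

```python
# Sketch of PDF ingestion: extract page text with pypdf and chunk it for
# embedding. The fixed-size chunking here is a simplifying assumption.
from pypdf import PdfReader

reader = PdfReader("paper.pdf")  # placeholder path
text = "\n".join(page.extract_text() or "" for page in reader.pages)

chunk_size = 500  # characters per chunk; naive fixed-size splitting
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
# Each chunk would then pass through the CLIP text encoder and into the index
print(f"{len(chunks)} chunks ready for embedding")
```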
---

## ⚙️ Configuration
- 📊 **Vector Database**: Uses FAISS for efficient similarity search
- 🤖 **Model**: Uses OpenAI CLIP for neural embedding generation
- ✍️ **Augmentation**: Optional LLM-based augmentation for text responses
- 🎛️ **Fine-Tuning**: Configurable parameters for model training and optimization (sketched below)
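How those fine-tuning parameters plug together can be sketched as follows. This assumes the Hugging Face CLIP implementation and a hypothetical `annotated_pairs` list of (image, caption) tuples built on the annotation page; it illustrates the configurable knobs, not LoomRAG's exact training code:

```python
# Hedged sketch of configurable CLIP fine-tuning: the test split, learning
# rate, optimizer, and weight decay mirror the UI parameters.
# `annotated_pairs` (a list of (PIL.Image, caption) tuples) is a hypothetical
# stand-in for datasets created on the annotation page.
import torch
from torch.utils.data import DataLoader, random_split
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

test_split, lr, weight_decay, epochs = 0.2, 1e-5, 0.01, 3  # illustrative values

train_size = int(len(annotated_pairs) * (1 - test_split))
train_set, test_set = random_split(
    annotated_pairs, [train_size, len(annotated_pairs) - train_size]
)

def collate(batch):
    images, texts = zip(*batch)
    return processor(text=list(texts), images=list(images),
                     return_tensors="pt", padding=True)

loader = DataLoader(train_set, batch_size=16, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

model.train()
for _ in range(epochs):
    for batch in loader:
        loss = model(**batch, return_loss=True).loss  # CLIP's built-in contrastive loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

torch.save(model.state_dict(), "clip-finetuned.pt")  # reload later for search
```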
---

## 🗺️ Roadmap
- [x] Fine-tuning CLIP for domain-specific datasets
- [ ] Adding support for audio and video modalities
- [ ] Improving the re-ranking system for better contextual relevance
- [ ] Enhanced PDF parsing with semantic section segmentation
---

## 🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request for any feature requests or bug fixes.
---

## 📄 License

This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.
---

## 🙏 Acknowledgments

- [OpenAI CLIP](https://openai.com/research/clip)
- [FAISS](https://github.com/facebookresearch/faiss)
- [Hugging Face](https://huggingface.co/)