# In-Legal-IPC Documentation
## Table of Contents
1. [Overview](#overview)
2. [System Requirements](#system-requirements)
3. [Setup Guide](#setup-guide)
4. [Usage Guide](#usage-guide)
5. [Configuration](#configuration)
6. [Troubleshooting](#troubleshooting)
---
## Overview
**In-Legal-IPC** is an AI-powered legal assistant chatbot specializing in the Indian Penal Code (IPC). It uses the Llama 3.3 70B model via the Groq API to provide legal information, searches legal documents stored in Google Drive, and keeps conversational context so that follow-up questions are answered coherently.
### Key Features
- AI-powered responses using Llama 3.3 70B (Groq)
- Vector-based search using FAISS and InLegalBERT embeddings
- Google Drive integration for document retrieval
- Conversational memory for context-aware responses
- Quick access to commonly used legal PDFs
---
## System Requirements
### Hardware Requirements
- **Processor**: Dual-core CPU or higher
- **RAM**: Minimum 4GB (8GB recommended)
- **Storage**: At least 2GB free space
- **Internet**: Stable connection required
### Software Requirements
- **Operating System**: Windows 10/11, macOS 10.14+, or Linux (Ubuntu 18.04+)
- **Python**: Version 3.10 or higher
- **pip**: Latest version
---
## Setup Guide
### Step 1: Environment Preparation
#### 1.1 Install Python
Verify Python installation:
```bash
python --version
# Should show Python 3.10 or higher
```
If Python is not installed, download it from [python.org](https://www.python.org/downloads/).
#### 1.2 Clone or Download Project
```bash
# Option 1: Clone repository
git clone <your-repository-url>
cd in-legal-ipc
# Option 2: Download and extract ZIP
# Then navigate to the extracted folder
```
### Step 2: Install Dependencies
#### 2.1 Create Virtual Environment (Recommended)
```bash
# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
#### 2.2 Install Required Packages
```bash
pip install -r requirements.txt
```
**Required packages include:**
- streamlit==1.40.2
- langchain
- langchain-groq
- langchain-community
- faiss-cpu
- sentence-transformers
- google-api-python-client
- google-auth
- fuzzywuzzy
- python-Levenshtein
- reportlab
### Step 3: Google Drive API Setup
#### 3.1 Create Google Cloud Project
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select existing one
3. Enable **Google Drive API**:
   - Navigate to "APIs & Services" → "Library"
   - Search for "Google Drive API"
   - Click "Enable"
#### 3.2 Create Service Account
1. Go to "APIs & Services" → "Credentials"
2. Click "Create Credentials" → "Service Account"
3. Fill in service account details:
   - **Name**: `in-legal-ipc-service`
   - **Description**: Service account for legal chatbot
4. Click "Create and Continue"
5. Skip optional steps and click "Done"
#### 3.3 Generate Credentials JSON
1. Click on the created service account
2. Go to "Keys" tab
3. Click "Add Key" → "Create new key"
4. Select "JSON" format
5. Download the JSON file
6. Rename it to `credentials.json`
#### 3.4 Place Credentials File
```bash
# Create the data directory if it doesn't exist
mkdir -p data
# Move the downloaded key into data/ (adjust the source path if needed)
mv ~/Downloads/credentials.json data/credentials.json
```
### Step 4: Google Drive Folder Setup
#### 4.1 Create Drive Folder
1. Go to [Google Drive](https://drive.google.com/)
2. Create a new folder named "Legal Documents" (or any name)
3. Note the folder ID from the URL:
```
https://drive.google.com/drive/folders/1LZIx-1tt_GormpU8nF_I2WL88Oxa9juU
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                       This is your FOLDER_ID
```
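If you would rather not copy the ID by hand, it can be pulled out of the URL programmatically. The following is a minimal sketch that assumes the link has the `/drive/folders/<id>` shape shown above; the helper name `extract_folder_id` is hypothetical and is not part of `app.py`.

```python
from urllib.parse import urlparse


def extract_folder_id(url: str) -> str:
    """Return the path segment that follows 'folders' in a Drive folder URL."""
    # Split the URL path into non-empty segments, ignoring any query string.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" not in parts:
        raise ValueError(f"not a Drive folder URL: {url}")
    idx = parts.index("folders")
    if idx + 1 >= len(parts):
        raise ValueError(f"no folder ID after 'folders' in: {url}")
    return parts[idx + 1]


if __name__ == "__main__":
    print(extract_folder_id(
        "https://drive.google.com/drive/folders/1LZIx-1tt_GormpU8nF_I2WL88Oxa9juU"
    ))
```

The returned string is the value to assign to `FOLDER_ID`.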
#### 4.2 Share Folder with Service Account
1. Right-click the folder → "Share"
2. Add the service account email (found in credentials.json under `client_email`)
3. Set permission to "Viewer"
4. Click "Send"
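To find the address to share with, you can read the `client_email` field straight from the key file. This is a small sketch that assumes the file lives at `data/credentials.json` as described in Step 3.4; the function name `read_client_email` is hypothetical.

```python
import json
from pathlib import Path


def read_client_email(path: str = "data/credentials.json") -> str:
    """Return the service account email stored in a credentials JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Service account key files store the address under 'client_email'.
    return data["client_email"]


if __name__ == "__main__":
    key_file = Path("data/credentials.json")
    if key_file.is_file():
        print(read_client_email(str(key_file)))
    else:
        print("data/credentials.json not found")
```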
#### 4.3 Update Folder ID in Code
Open `app.py` and update line 13:
```python
FOLDER_ID = "YOUR_FOLDER_ID_HERE" # Replace with your actual folder ID
```
### Step 5: Prepare FAISS Database
#### 5.1 Verify Database Files
Ensure the following structure exists:
```
ipc_embed_db/
├── index.faiss
└── index.pkl
```
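The same check can be scripted. The sketch below only tests that the two files exist; it does not validate their contents. The helper name `missing_index_files` is an assumption for illustration.

```python
from pathlib import Path

# The two files the FAISS database directory is expected to contain.
REQUIRED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(db_dir: str = "ipc_embed_db") -> list[str]:
    """Return the names of required index files that are absent from db_dir."""
    base = Path(db_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_index_files()
    print("Database looks complete" if not missing else f"Missing: {missing}")
```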
#### 5.2 If Database Missing
If you don't have the pre-built database:
1. Prepare your IPC documents in text format
2. Create embeddings using InLegalBERT
3. Build FAISS index
4. Save to `ipc_embed_db/` directory
*Note: Database creation requires separate preprocessing. Contact the project maintainer if needed.*
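For orientation, the four steps above could look roughly like the sketch below. It is not the project's actual preprocessing script. It assumes the LangChain community wrappers `HuggingFaceEmbeddings` and `FAISS`, the Hugging Face model ID `law-ai/InLegalBERT`, and a simple character-based chunker; the chunk sizes and the function names `chunk_text` and `build_index` are hypothetical. Whatever embedding model you use here must match the one `app.py` loads at query time.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows (sizes are assumptions)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip whitespace-only windows
            chunks.append(piece)
    return chunks


def build_index(text_files: list[str], out_dir: str = "ipc_embed_db") -> int:
    """Embed the given text files and save a FAISS index; returns the chunk count."""
    # Imported lazily so chunk_text can be used without these packages installed.
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    chunks: list[str] = []
    for path in text_files:
        with open(path, encoding="utf-8") as fh:
            chunks.extend(chunk_text(fh.read()))

    # Assumed model ID; use the same model the application uses for queries.
    embeddings = HuggingFaceEmbeddings(model_name="law-ai/InLegalBERT")
    db = FAISS.from_texts(chunks, embeddings)
    db.save_local(out_dir)  # writes index.faiss and index.pkl
    return len(chunks)
```

Calling `build_index(["ipc.txt"])` (with `ipc.txt` standing in for your own file) downloads the embedding model on first use, so expect the first run to take a while.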
### Step 6: Groq API Configuration
The Groq API key is already set in the code. To use a different key, choose one of the following options:
#### Option 1: Direct Configuration (Current)
The key is hardcoded in `app.py`; replace the `groq_api_key` value there.
#### Option 2: Environment Variable (Recommended for Production)
```bash
# Windows (Command Prompt)
set GROQ_API_KEY=your_groq_api_key_here
# Windows (PowerShell)
$env:GROQ_API_KEY = "your_groq_api_key_here"
# macOS/Linux
export GROQ_API_KEY=your_groq_api_key_here
```
Then update `app.py`:
```python
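import os  # add at the top of app.py if it is not already imported
groq_api_key=os.getenv("GROQ_API_KEY")  # pass this as the groq_api_key argument of ChatGroq(...)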
```
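`os.getenv` returns `None` when the variable is missing, which tends to surface later as a confusing API error. A minimal sketch of a stricter lookup is shown here; the helper name `get_groq_api_key` is hypothetical and not part of `app.py`.

```python
import os


def get_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the app")
    return key
```

The result can then be passed as `groq_api_key=get_groq_api_key()` when the model is created.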
### Step 7: Verify Installation
Run the verification command:
```bash
python -c "import streamlit; import langchain; import faiss; print('All packages installed successfully!')"
```
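The one-liner stops at the first failed import. If you want a full list of what is missing, a sketch like the following checks each module without importing it. The import names are assumptions derived from the package list in Step 2.2 (for example, `faiss-cpu` is imported as `faiss` and `google-api-python-client` as `googleapiclient`); the function name `find_missing` is hypothetical.

```python
import importlib.util

# Import names assumed to correspond to the packages in requirements.txt.
REQUIRED_MODULES = [
    "streamlit", "langchain", "langchain_groq", "langchain_community",
    "faiss", "sentence_transformers", "googleapiclient", "google.auth",
    "fuzzywuzzy", "reportlab",
]


def find_missing(modules: list[str]) -> list[str]:
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("All packages found" if not missing else f"Missing: {', '.join(missing)}")
```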
---
## Usage Guide
### Starting the Application
#### 1. Activate Virtual Environment
```bash
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
```
#### 2. Launch Streamlit App
```bash
streamlit run app.py
```
#### 3. Access the Interface
- The app will automatically open in your default browser
- If not, manually navigate to: `http://localhost:8501`
### Using the Chatbot
#### Basic Legal Queries
Simply type your legal question in the chat input:
**Example queries:**
```
What is Section 302 IPC?
Explain provisions for theft in IPC
What are the punishments for defamation?
Difference between Section 420 and 406
```
**Response format:**
- Bullet-point structured answers
- Context-aware explanations
- Citations of relevant IPC sections
- Common misconceptions clarified
#### Searching for Documents
To find legal forms or documents stored in Google Drive:
**Example queries:**
```
I need bail bond form
Show me inspection form
Find commercial court application form
```
**Keywords that trigger document search:**
- "form"
- "document"
- Any specific form name
**Search behavior:**
- Uses fuzzy matching (75% threshold)
- Shows available files in Drive
- Provides direct download links
- Suggests folder link if no match found
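To make the threshold behavior concrete, here is a small sketch of threshold-based fuzzy matching. It uses the standard library's `difflib.SequenceMatcher` as a stand-in so that it runs without extra packages; the application itself uses `fuzzywuzzy`, whose scores will differ. The function names and the file names in the example are hypothetical.

```python
from difflib import SequenceMatcher
from pathlib import PurePath


def score(query: str, file_name: str) -> int:
    """Similarity between a query and a file name (extension ignored), 0-100."""
    stem = PurePath(file_name).stem.lower()
    return round(100 * SequenceMatcher(None, query.lower().strip(), stem).ratio())


def find_matches(query: str, file_names: list[str], threshold: int = 75) -> list[str]:
    """Return file names scoring at or above the threshold, best match first."""
    scored = [(score(query, name), name) for name in file_names]
    return [name for s, name in sorted(scored, reverse=True) if s >= threshold]


if __name__ == "__main__":
    files = ["Inspection Form.pdf", "Bail-Bond.pdf"]
    print(find_matches("inspection form", files))
    # Lowering the threshold lets weaker matches through.
    print(find_matches("inspection form", files, threshold=0))
```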
#### Quick Access PDFs
Click any of the PDF buttons to open commonly used documents:
- **Commercial Court Rules and Forms**
- **Bail-Bond**
- **Inspection Form**
- **Additional PDF**
#### Managing Conversations
**Conversation Memory:**
- Maintains last 5 exchanges
- Provides context-aware responses
- References previous questions
**Reset Conversation:**
- Click "🗑️ Reset All Chat" button
- Clears all messages and memory
- Starts fresh conversation
### Best Practices
#### Effective Querying
1. **Be Specific**: "Explain Section 304A IPC" vs "Tell me about death"
2. **Use Proper Terminology**: "What is culpable homicide?" vs "What is killing?"
3. **Ask Follow-ups**: The bot remembers context from previous messages
4. **For Documents**: Include the word "form" when searching for forms
#### Getting Better Results
- Start with general questions, then ask for specifics
- If response is unclear, rephrase your question
- Use the reset button if conversation context becomes confusing
- Verify important legal information with official sources
---
## Configuration
### Model Parameters
**Location:** `app.py` (lines 138-143)
```python
llm = ChatGroq(
    model="llama-3.3-70b-versatile",  # Model name
    temperature=0.5,                  # Randomness (0.0-1.0)
    max_tokens=1024,                  # Response length
    groq_api_key="your_key_here"      # API key
)
```
**Adjustable parameters:**
- `temperature`: Lower = more focused, Higher = more creative (recommended: 0.3-0.7)
- `max_tokens`: Maximum response length (recommended: 512-2048)
### Retriever Configuration
**Location:** `app.py` (line 131)
```python
db_retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Number of documents to retrieve
)
```
**Adjustments:**
- Increase `k` for more comprehensive answers (uses more context)
- Decrease `k` for faster, more focused responses
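As an illustration of what `k` controls, the sketch below ranks a handful of toy vectors by similarity to a query and keeps the top `k`. It uses cosine similarity in plain Python purely for illustration; the real retriever delegates this to FAISS, which may use a different distance metric, and the vectors and names here are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    toy_docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.1], toy_docs, k=2))
```

A larger `k` passes more retrieved text to the model, which is why it trades speed for coverage.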
### Memory Configuration
**Location:** `app.py` (line 116)
```python
ConversationBufferWindowMemory(
    k=5,  # Number of exchanges to remember
    memory_key="chat_history",
    return_messages=True
)
```
**Adjustments:**
- Increase `k` for longer conversation memory
- Decrease `k` to reduce memory usage
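The windowing idea can be sketched with a fixed-size queue: once `k` exchanges are stored, each new one pushes out the oldest. This is a simplified stand-in for illustration, not LangChain's implementation; the class name `WindowMemory` is hypothetical.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent k (user, bot) exchanges."""

    def __init__(self, k: int = 5):
        # A deque with maxlen drops the oldest item automatically when full.
        self.exchanges = deque(maxlen=k)

    def add(self, user: str, bot: str) -> None:
        self.exchanges.append((user, bot))

    def history(self) -> list[tuple[str, str]]:
        return list(self.exchanges)


if __name__ == "__main__":
    memory = WindowMemory(k=5)
    for i in range(1, 8):
        memory.add(f"question {i}", f"answer {i}")
    print(memory.history()[0])  # the oldest exchange still remembered
```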
### Document Search Threshold
**Location:** `app.py` (line 43)
```python
if score >= 75: # Fuzzy match threshold (0-100)
```
**Adjustments:**
- Lower threshold = more matches (may include irrelevant results)
- Higher threshold = fewer, more accurate matches
---
## Troubleshooting
### Installation Issues
#### Problem: "pip: command not found"
**Solution:**
```bash
# Ensure pip is installed
python -m ensurepip --upgrade
# Or use python -m pip
python -m pip install -r requirements.txt
```
#### Problem: Package installation fails
**Solution:**
```bash
# Update pip first
python -m pip install --upgrade pip
# Install packages one by one if needed
pip install streamlit
pip install langchain-groq
# etc.
```
#### Problem: "No module named 'langchain_groq'"
**Solution:**
```bash
# requirements.txt should list bare package names, one per line:
#   langchain-groq
# not shell commands such as:
#   pip install langchain-groq
# Then install the package:
pip install langchain-groq
```
### Google Drive API Issues
#### Problem: "Credentials file not found"
**Solution:**
- Verify `credentials.json` exists in `data/` folder
- Check file permissions (should be readable)
- Ensure path in code matches actual location
#### Problem: "Access denied" or "Insufficient permissions"
**Solution:**
1. Verify service account email in credentials.json
2. Check folder sharing settings in Google Drive
3. Ensure service account has at least "Viewer" permission
4. Re-share folder if necessary
#### Problem: "Folder ID not found"
**Solution:**
- Double-check folder ID in `app.py`
- Ensure folder isn't deleted or moved
- Verify you have access to the folder
### FAISS Database Issues
#### Problem: "FAISS database not found"
**Solution:**
```bash
# Verify directory structure
ls -la ipc_embed_db/
# Should contain:
# - index.faiss
# - index.pkl
```
#### Problem: "Deserialization error"
**Solution:**
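- Ensure `allow_dangerous_deserialization=True` is set where the code loads the index; `index.pkl` is a pickle file, so load only index files from a source you trust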
- Rebuild FAISS index if corrupted
- Check Python version compatibility
### Runtime Errors
#### Problem: Streamlit won't start
**Solution:**
```bash
# Check if port 8501 is already in use
# Windows
netstat -ano | findstr :8501
# macOS/Linux
lsof -i :8501
# Use different port
streamlit run app.py --server.port 8502
```
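A cross-platform alternative to `netstat` and `lsof` is to try connecting to the port from Python. This is a minimal sketch that assumes the app listens on `127.0.0.1`; the function name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("8501 in use" if port_in_use(8501) else "8501 is free")
```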
#### Problem: "Groq API error" or rate limiting
**Solution:**
- Verify API key is valid
- Check Groq API status
- Wait a moment if rate limited
- Consider upgrading Groq plan for higher limits
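"Wait a moment" can be automated with retries and exponential backoff. The sketch below is generic: it retries on any exception, whereas real code should catch only the rate-limit error raised by the client library in use. The function name `call_with_backoff` and its defaults are assumptions.

```python
import time


def call_with_backoff(fn, retries: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a call that keeps failing waits 1, 2, and 4 seconds between its four attempts before giving up.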
#### Problem: Slow responses
**Solution:**
1. Reduce `max_tokens` in model config
2. Decrease retriever `k` value
3. Check internet connection speed
4. Verify Groq API isn't experiencing issues
#### Problem: "Memory error" or crashes
**Solution:**
1. Reduce conversation memory window (`k` parameter)
2. Clear chat history more frequently
3. Restart the application
4. Check system RAM availability
### Document Search Issues
#### Problem: No documents found
**Solution:**
1. Verify documents are in the correct Google Drive folder
2. Check if service account has folder access
3. Try exact file name instead of partial match
4. Lower fuzzy match threshold (below 75)
#### Problem: Wrong documents returned
**Solution:**
- Use more specific search terms
- Include full or partial file name
- Check if multiple files have similar names
- Increase fuzzy match threshold
### Response Quality Issues
#### Problem: Responses are too generic
**Solution:**
- Ask more specific questions
- Provide more context in your query
- Reference specific IPC sections
- Use follow-up questions with context
#### Problem: Responses are incorrect
**Solution:**
- Cross-verify with official IPC sources
- Rephrase your question
- Reset conversation and ask again
- Remember: Bot provides information, not legal advice
---
## Support and Additional Resources
### Getting Help
- Check this documentation first
- Review error messages carefully
- Search for similar issues online
- Contact project maintainer
### Useful Links
- [Streamlit Documentation](https://docs.streamlit.io/)
- [LangChain Documentation](https://python.langchain.com/)
- [Groq API Documentation](https://console.groq.com/docs)
- [Google Drive API Guide](https://developers.google.com/drive/api/guides/about-sdk)
### Updating the Application
```bash
# Pull latest changes
git pull origin main
# Update dependencies
pip install -r requirements.txt --upgrade
# Restart application
streamlit run app.py
```
---
## Important Notes
⚠️ **Legal Disclaimer**: This application is for informational purposes only and does not constitute legal advice. Always consult with a qualified legal professional for specific legal matters.
⚠️ **API Key Security**: Never commit API keys or credentials to version control. Use environment variables in production.
⚠️ **Data Privacy**: Be mindful of what information you share. Do not input sensitive personal or case-specific information.
---
*Last Updated: November 2025*