
# Deployment Guide for Hugging Face Spaces

This guide will help you deploy the HandyHome OCR API to Hugging Face Spaces.

## Prerequisites

- A Hugging Face account (free at https://huggingface.co/join)
- Git and Git LFS installed on your machine (optional, only needed for command-line deployment)

## Deployment Options

### Option 1: Web UI Deployment (Easiest)

#### Step 1: Create a New Space

1. Go to https://huggingface.co/new-space
2. Fill in the details:
   - **Owner**: Your username
   - **Space name**: `handyhome-ocr-api` (or any name you prefer)
   - **License**: MIT
   - **Select the Space SDK**: Choose **Docker**
   - **Space hardware**: Start with **CPU basic** (free tier)
   - **Visibility**: Choose Public or Private

3. Click **Create Space**

#### Step 2: Upload Files via Web UI

1. In your new Space, click the **Files** tab
2. Click **Add file** → **Upload files**
3. Upload the following files from the `huggingface-ocr` folder:
   ```
   app.py
   requirements.txt
   Dockerfile
   README.md
   .gitignore
   extract_national_id.py
   extract_drivers_license.py
   extract_prc.py
   extract_umid.py
   extract_sss.py
   extract_passport.py
   extract_postal.py
   extract_phic.py
   extract_nbi_ocr.py
   extract_police_ocr.py
   extract_tesda_ocr.py
   analyze_document.py
   ```

4. Click **Commit changes to main**
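
Before committing, it can be worth checking that the folder really contains every file in the list, because a missing `extract_*.py` script only shows up later as a runtime error. Below is a minimal Python sketch. `REQUIRED_FILES` just copies the list above, and `missing_files` is a hypothetical helper, not part of the project:

```python
from pathlib import Path
from typing import List, Union

# Mirrors the upload list above; adjust it if your copy of the project differs.
REQUIRED_FILES = (
    "app.py",
    "requirements.txt",
    "Dockerfile",
    "README.md",
    ".gitignore",
    "extract_national_id.py",
    "extract_drivers_license.py",
    "extract_prc.py",
    "extract_umid.py",
    "extract_sss.py",
    "extract_passport.py",
    "extract_postal.py",
    "extract_phic.py",
    "extract_nbi_ocr.py",
    "extract_police_ocr.py",
    "extract_tesda_ocr.py",
    "analyze_document.py",
)


def missing_files(folder: Union[str, Path]) -> List[str]:
    """Return the required file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("huggingface-ocr")
    if missing:
        print("Missing files: " + ", ".join(missing))
    else:
        print("All required files are present.")
```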

#### Step 3: Wait for Build

1. Go to the **App** tab
2. You'll see the build progress
3. Initial build takes **5-10 minutes** due to:
   - Installing PaddleOCR and dependencies
   - Downloading OCR models (~500MB)
   - Building Docker container

4. Watch the build logs for any errors

#### Step 4: Verify Deployment

Once built, test your API:

```bash
# Check health
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health

# Expected response:
# {"status":"healthy","service":"handyhome-ocr-api","version":"1.0.0"}
```
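
The first build can take 5-10 minutes, and a sleeping Space needs time to wake up. A small polling loop can save you from re-running `curl` by hand. The sketch below uses only the Python standard library. `fetch_health` and `wait_until_healthy` are hypothetical names, and the check simply looks for the `"status": "healthy"` field from the expected response above:

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(url: str, timeout: float = 10.0) -> Optional[dict]:
    """GET the health endpoint and return its JSON body, or None if it is not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection errors, timeouts and HTTP errors such as 503
        # (urllib.error.HTTPError is a subclass); ValueError covers non-JSON bodies.
        return None
    return payload if isinstance(payload, dict) else None


def wait_until_healthy(
    base_url: str,
    fetch: Callable[[str], Optional[dict]] = fetch_health,
    interval: float = 15.0,
    max_attempts: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll <base_url>/health until it reports "healthy" or the attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(max_attempts):
        payload = fetch(url)
        if payload is not None and payload.get("status") == "healthy":
            return True
        if attempt < max_attempts - 1:
            sleep(interval)  # no point sleeping after the last attempt
    return False
```

For example, `wait_until_healthy("https://YOUR-USERNAME-handyhome-ocr-api.hf.space")` polls every 15 seconds for up to about ten minutes. Passing `fetch` and `sleep` in as parameters lets you test the loop without network access.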

### Option 2: Git Command Line Deployment

#### Step 1: Create Space on Web

Follow Step 1 from Option 1 above.

#### Step 2: Clone Space Repository

```bash
# Set up Git LFS (requires the git-lfs package to be installed already)
git lfs install

# Clone your space
git clone https://huggingface.co/spaces/YOUR-USERNAME/handyhome-ocr-api
cd handyhome-ocr-api
```

#### Step 3: Copy Files

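```bash
# Copy all files from the huggingface-ocr folder, including dotfiles such as .gitignore
# (copying "dir/." includes hidden files, which a "*" glob would skip)
cp -r ../huggingface-ocr/. .
```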

#### Step 4: Commit and Push

```bash
# Add all files
git add .

# Commit
git commit -m "Initial deployment of HandyHome OCR API"

# Push to Hugging Face
# (when prompted for a password, use a Hugging Face access token with write permission)
git push
```

#### Step 5: Monitor Build

Go to your Space URL to watch the build progress.

## Configuration

### Space Settings

In your Space settings, you can configure:

1. **Hardware**:
   - **CPU basic** (free): 2 vCPU, 16GB RAM - Suitable for testing
   - **CPU upgrade** (paid): Better performance
   - **GPU** (paid): Faster OCR processing

2. **Sleep time**:
   - Free tier: Sleeps after 48 hours of inactivity
   - Paid tier: Can disable sleep

3. **Secrets** (if needed):
   - Add secrets and environment variables in Settings → Variables and secrets; the running app sees them as environment variables
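
Inside the container, secrets and variables from the Space settings can be read at runtime like any other environment variable. Below is a minimal sketch of reading them in `app.py`. The names `OCR_API_KEY` and `OCR_REQUEST_TIMEOUT` are hypothetical, so substitute whatever names you actually define:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    request_timeout: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (hypothetical names)."""
    raw_timeout = env.get("OCR_REQUEST_TIMEOUT", "300")
    try:
        request_timeout = int(raw_timeout)
    except ValueError:
        raise ValueError(
            f"OCR_REQUEST_TIMEOUT must be an integer, got {raw_timeout!r}"
        ) from None
    # Treat an empty secret the same as a missing one.
    api_key = env.get("OCR_API_KEY") or None
    return Settings(api_key=api_key, request_timeout=request_timeout)


settings = load_settings()
```

Accepting any mapping instead of reading `os.environ` directly lets you check the parsing without touching the real environment.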

### Custom Domain (Optional)

For production, you can set up a custom domain in Space settings.

## Testing Your Deployment

### Test Health Endpoint

```bash
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health
```

### Test OCR Extraction

```bash
# Test National ID extraction
curl -X POST https://YOUR-USERNAME-handyhome-ocr-api.hf.space/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "YOUR_IMAGE_URL"}'
```

### Test in Python

```python
import requests

base_url = "https://YOUR-USERNAME-handyhome-ocr-api.hf.space"

# Test health
response = requests.get(f"{base_url}/health", timeout=30)
print(response.json())

# Test extraction (OCR can take a while, especially on a cold start)
response = requests.post(
    f"{base_url}/api/extract-national-id",
    json={"document_url": "YOUR_IMAGE_URL"},
    timeout=300,
)
print(response.status_code, response.json())
```

## Integration with Your Main App

Update your main Flask app (`handyhome-web-scripts/app.py`) to use the Hugging Face Space:

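```python
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)  # in your existing app.py, reuse the app you already have

HUGGINGFACE_OCR_API = "https://YOUR-USERNAME-handyhome-ocr-api.hf.space"

# Map document types to HF Space endpoints
ENDPOINT_MAPPING = {
    'National ID': '/api/extract-national-id',
    "Driver's License": '/api/extract-drivers-license',
    'PRC ID': '/api/extract-prc',
    'UMID': '/api/extract-umid',
    'SSS ID': '/api/extract-sss',
    'Passport': '/api/extract-passport',
    'Postal ID': '/api/extract-postal',
    'PHIC': '/api/extract-phic',
    'NBI Clearance': '/api/extract-nbi',
    'Police Clearance': '/api/extract-police-clearance',
    'TESDA': '/api/extract-tesda',
}


@app.route('/extract-document', methods=['POST'])
def extract_document():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    image_url = data.get('image_url')
    document_type = data.get('document_type')

    if not image_url:
        return jsonify({'error': 'image_url is required'}), 400

    endpoint = ENDPOINT_MAPPING.get(document_type)
    if not endpoint:
        return jsonify({'error': 'Unsupported document type'}), 400

    # Call Hugging Face Space API (generous timeout: a sleeping Space must wake up first)
    try:
        response = requests.post(
            f"{HUGGINGFACE_OCR_API}{endpoint}",
            json={'document_url': image_url},
            timeout=300,
        )
        # Pass the Space's status code through instead of always returning 200
        return jsonify(response.json()), response.status_code
    except (requests.RequestException, ValueError) as e:
        # Network errors, timeouts, or a non-JSON reply (e.g. an HTML error page)
        return jsonify({'error': str(e)}), 502
```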
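
On the free tier the Space may be asleep when your main app calls it, and the troubleshooting section lists `503 Service Unavailable` for that case. Below is a hedged sketch of retrying with exponential backoff. `post_with_retry` is a hypothetical helper. It accepts any zero-argument callable that returns an object with a `status_code` attribute, such as a `requests.Response`:

```python
import time
from typing import Any, Callable


def post_with_retry(
    send: Callable[[], Any],
    retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `send()` and retry with exponential backoff while it returns HTTP 503.

    `send` must return an object with a `status_code` attribute, such as a
    requests.Response. The last response is returned even if it is still 503.
    """
    response = send()
    for attempt in range(retries):
        if response.status_code != 503:
            break
        sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s with the defaults
        response = send()
    return response
```

Inside `extract_document`, the call could become `response = post_with_retry(lambda: requests.post(f"{HUGGINGFACE_OCR_API}{endpoint}", json={'document_url': image_url}, timeout=300))`. With the defaults, three retries add 35 seconds of sleeping on top of the request timeouts, so keep the caller's own timeout in mind.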

## Monitoring and Maintenance

### Check Space Status

1. Go to your Space URL
2. Click **Settings** → **Usage**
3. Monitor:
   - Request count
   - Error rate
   - Response times
   - Memory usage

### View Logs

1. In your Space, click the **App** tab
2. Scroll down to see real-time logs
3. Useful for debugging errors

### Update Deployment

To update your deployment:

**Web UI Method:**
1. Click the **Files** tab
2. Click the file you want to edit
3. Make changes
4. Click **Commit changes**

**Git Method:**
```bash
cd handyhome-ocr-api
# Make changes to files
git add .
git commit -m "Update description"
git push
```

## Troubleshooting

### Build Fails

**Error: Out of memory**
- Solution: Reduce workers in Dockerfile or upgrade hardware

**Error: Timeout during build**
- Solution: A long first build is normal. Wait, or restart the build.

**Error: Missing dependencies**
- Solution: Check requirements.txt and Dockerfile

### Runtime Errors

**Error: Script not found**
- Solution: Ensure all `extract_*.py` files are uploaded

**Error: PaddleOCR model download fails**
- Solution: Models download on first use. Check internet connectivity.

**Error: 503 Service Unavailable**
- Solution: Space is sleeping. Wake it up by accessing the URL.

### Performance Issues

**Slow response times**
- Upgrade to better hardware tier
- Increase Gunicorn workers (may need more RAM)
- Consider caching frequently accessed documents
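
One way to implement caching is a small in-process cache keyed by endpoint and document URL, so repeat requests for the same image skip OCR. The sketch below makes several assumptions: results are JSON-like values, a 10-minute expiry is acceptable, and each Gunicorn worker keeps its own separate cache (nothing is shared between workers). These are identity documents, so keep the TTL short and hold the cache in memory only:

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Small in-memory LRU cache whose entries expire after `ttl` seconds."""

    def __init__(
        self,
        max_items: int = 128,
        ttl: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_items = max_items
        self.ttl = ttl
        self._clock = clock
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the least recently used entry
```

For example, check `cache.get((endpoint, image_url))` before running OCR and call `cache.set((endpoint, image_url), result)` after a successful extraction. Python's `functools.lru_cache` is simpler but has no expiry.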

**Out of memory errors**
- Reduce Gunicorn workers in Dockerfile
- Upgrade to higher memory tier
- Process smaller images
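
If the Dockerfile starts Gunicorn, the worker count and timeout can also live in a config file instead of command-line flags. Below is a hypothetical `gunicorn.conf.py`. This guide does not show the Dockerfile, so treat the values as assumptions. The file binds to port 7860, the default port that Hugging Face Docker Spaces route to. It reads the worker count from `WEB_CONCURRENCY`, so you can change it through a Space variable without editing the Dockerfile:

```python
# gunicorn.conf.py (hypothetical; Gunicorn config files are plain Python)
import os

# Hugging Face Docker Spaces send traffic to port 7860 unless app_port says otherwise.
bind = "0.0.0.0:7860"

# Each worker typically loads its own copy of the OCR models, so memory use grows
# roughly with this number. Read from WEB_CONCURRENCY, falling back to one worker.
workers = max(1, int(os.environ.get("WEB_CONCURRENCY", "1")))

# OCR on large images can be slow; keep this in line with the client-side timeout.
timeout = 300
```

Recent Gunicorn versions pick up `./gunicorn.conf.py` from the working directory automatically. Otherwise, pass it explicitly with `-c gunicorn.conf.py`.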

## Cost Considerations

### Free Tier
- CPU basic hardware
- 48-hour sleep timeout
- Suitable for testing and low-traffic use

### Paid Tiers
- **CPU upgrade**: $0.03/hour (~$22/month)
- **GPU T4**: $0.60/hour (~$432/month)
- No sleep timeout by default (a custom sleep time can be configured)
- Better performance
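
The monthly figures above are the hourly rate multiplied by about 720 hours (24 hours × 30 days) of uptime. A small helper makes it easy to estimate other schedules. It assumes you are billed only for the hours the Space is running, and it uses the rates quoted above, which may change:

```python
def monthly_cost(hourly_rate: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Estimated monthly cost in USD for hardware billed per running hour."""
    return round(hourly_rate * hours_per_day * days, 2)


# Hourly rates quoted in this guide (subject to change).
RATES = {"CPU upgrade": 0.03, "GPU T4": 0.60}

for name, rate in RATES.items():
    print(f"{name}: ${monthly_cost(rate):,.2f}/month if it never sleeps")
```

For example, `monthly_cost(0.60, hours_per_day=8, days=22)` estimates a T4 that runs only during working hours on weekdays.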

### Optimization Tips
- Use CPU for cost-effective deployment
- Enable sleep timeout for development
- Only upgrade if you need 24/7 availability or high performance

## Security Best Practices

1. **Use Private Spaces** for sensitive data
2. **Add authentication** if needed (custom middleware)
3. **Rate limiting** - Add to prevent abuse
4. **HTTPS only** - Hugging Face provides this by default
5. **Input validation** - Already implemented in scripts
6. **Secrets management** - Use HF Space secrets for API keys
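
For items 2 and 3, here is a framework-agnostic sketch of an API-key check and a per-client sliding-window rate limiter. The `X-API-Key` header name, the limits, and both helpers are hypothetical. The expected key itself would come from a Space secret:

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Compare a hypothetical X-API-Key header with the configured key.

    Fails closed: if no key is configured, every request is rejected.
    """
    if not expected_key:
        return False
    supplied = headers.get("X-API-Key", "")
    # compare_digest avoids content-based short-circuiting, which limits timing attacks
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(
        self,
        limit: int = 30,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self._clock()
        hits = self._hits[client]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In Flask, both checks could run in an `@app.before_request` hook that returns `jsonify(...), 401` or `jsonify(...), 429`. Returning a response from that hook stops the request before it reaches the view. The limiter state lives in one process, so each Gunicorn worker counts separately. Behind the Hugging Face proxy, `request.remote_addr` may not be the real client address, so keying on the API key can be more reliable. For anything beyond a sketch, an extension such as Flask-Limiter is worth a look.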

## Support Resources

- **Hugging Face Spaces Docs**: https://huggingface.co/docs/hub/spaces
- **Docker SDK Guide**: https://huggingface.co/docs/hub/spaces-sdks-docker
- **Community Forum**: https://discuss.huggingface.co/

## Next Steps

After successful deployment:

1. ✅ Update your main app to use the HF Space API
2. ✅ Test all document types thoroughly
3. ✅ Set up monitoring and alerts
4. ✅ Document the API endpoints for your team
5. ✅ Consider setting up staging and production spaces

---

Happy deploying! 🚀