Deployment Guide for Hugging Face Spaces

This guide will help you deploy the HandyHome OCR API to Hugging Face Spaces.

Prerequisites

A Hugging Face account (free at https://huggingface.co/join)
Git installed on your machine (optional, for command-line deployment)

Deployment Options

Option 1: Web UI Deployment (Easiest)

Step 1: Create a New Space

Go to https://huggingface.co/new-space
Fill in the details:
- Owner: Your username
- Space name: handyhome-ocr-api (or any name you prefer)
- License: MIT
- Select the Space SDK: Choose Docker
- Space hardware: Start with CPU basic (free tier)
- Visibility: Choose Public or Private
Click Create Space

Step 2: Upload Files via Web UI

In your new Space, click the Files tab
Click Add file → Upload files

Upload the following files from the huggingface-ocr folder:

app.py
requirements.txt
Dockerfile
README.md
.gitignore
extract_national_id.py
extract_drivers_license.py
extract_prc.py
extract_umid.py
extract_sss.py
extract_passport.py
extract_postal.py
extract_phic.py
extract_nbi_ocr.py
extract_police_ocr.py
extract_tesda_ocr.py
analyze_document.py

Click Commit changes to main
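
Before committing, it can help to confirm that nothing from the list above is missing locally, since a missing extractor script only surfaces later as a runtime error. The following is a minimal sketch, not part of the project; the helper name `missing_files` is hypothetical, and it assumes the files sit together in one local folder.

```python
from pathlib import Path

# File names copied from the upload list in this guide.
REQUIRED_FILES = [
    "app.py", "requirements.txt", "Dockerfile", "README.md", ".gitignore",
    "extract_national_id.py", "extract_drivers_license.py", "extract_prc.py",
    "extract_umid.py", "extract_sss.py", "extract_passport.py",
    "extract_postal.py", "extract_phic.py", "extract_nbi_ocr.py",
    "extract_police_ocr.py", "extract_tesda_ocr.py", "analyze_document.py",
]


def missing_files(folder: str) -> list:
    """Return the required file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_files("huggingface-ocr")` from the parent directory returns an empty list when every file is in place.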

Step 3: Wait for Build

Go to the App tab
You'll see the build progress
Initial build takes 5-10 minutes due to:
- Installing PaddleOCR and dependencies
- Downloading OCR models (~500MB)
- Building Docker container
Watch the build logs for any errors

Step 4: Verify Deployment

Once built, test your API:

# Check health
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health

# Expected response:
# {"status":"healthy","service":"handyhome-ocr-api","version":"1.0.0"}
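
The same check can be scripted. This sketch only builds the URL from the USERNAME-SPACENAME.hf.space pattern used in this guide and inspects an already-parsed response; both function names are hypothetical, and it assumes the health endpoint returns the JSON shown above.

```python
import json


def space_base_url(username: str, space_name: str) -> str:
    """Build the direct Space URL using the pattern shown in this guide."""
    return f"https://{username}-{space_name}.hf.space"


def is_healthy(body: str) -> bool:
    """Return True when a /health response body reports status "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:
        # An HTML error page or an empty body is not a healthy response.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```

Checking only the `status` field keeps the test stable if the version string changes between releases.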

Option 2: Git Command Line Deployment

Step 1: Create Space on Web

Follow Step 1 from Option 1 above.

Step 2: Clone Space Repository

# Initialize Git LFS (the git-lfs package itself must already be installed)
git lfs install

# Clone your space
git clone https://huggingface.co/spaces/YOUR-USERNAME/handyhome-ocr-api
cd handyhome-ocr-api

Step 3: Copy Files

# Copy all files from the huggingface-ocr folder
cp -r ../huggingface-ocr/* .
# The * glob skips dotfiles, so copy .gitignore explicitly
cp ../huggingface-ocr/.gitignore .

Step 4: Commit and Push

# Add all files
git add .

# Commit
git commit -m "Initial deployment of HandyHome OCR API"

# Push to Hugging Face
git push

Step 5: Monitor Build

Go to your Space URL to watch the build progress.

Configuration

Space Settings

In your Space settings, you can configure:

Hardware:
- CPU basic (free): 2 vCPU, 16GB RAM - Suitable for testing
- CPU upgrade (paid): Better performance
- GPU (paid): Faster OCR processing
Sleep time:
- Free tier: Sleeps after 48 hours of inactivity
- Paid tier: Can disable sleep
Secrets (if needed):
- Add environment variables in Settings → Variables and secrets
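
Values added as secrets reach the running container as environment variables, so the application can read them at startup. A minimal sketch follows; the helper `load_setting` and the variable name `OCR_API_KEY` are hypothetical and do not come from the project.

```python
import os
from typing import Optional


def load_setting(name: str, default: Optional[str] = None) -> str:
    """Read a setting from the environment, failing loudly if it is required but unset."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example (hypothetical variable name):
# api_key = load_setting("OCR_API_KEY")
```

Failing at startup rather than on the first request makes a missing secret visible in the build and startup logs.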

Custom Domain (Optional)

For production, you can set up a custom domain in Space settings.

Testing Your Deployment

Test Health Endpoint

curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health

Test OCR Extraction

# Test National ID extraction
curl -X POST https://YOUR-USERNAME-handyhome-ocr-api.hf.space/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "YOUR_IMAGE_URL"}'

Test in Python

import requests

base_url = "https://YOUR-USERNAME-handyhome-ocr-api.hf.space"

# Test health
response = requests.get(f"{base_url}/health", timeout=30)
response.raise_for_status()
print(response.json())

# Test extraction (OCR can be slow on CPU, so allow a long timeout)
response = requests.post(
    f"{base_url}/api/extract-national-id",
    json={"document_url": "YOUR_IMAGE_URL"},
    timeout=300,
)
response.raise_for_status()
print(response.json())

Integration with Your Main App

Update your main Flask app (handyhome-web-scripts/app.py) to use the Hugging Face Space:

import requests
from flask import jsonify, request

HUGGINGFACE_OCR_API = "https://YOUR-USERNAME-handyhome-ocr-api.hf.space"

@app.route('/extract-document', methods=['POST'])
def extract_document():
    data = request.get_json(silent=True) or {}  # tolerate a missing or invalid JSON body
    image_url = data.get('image_url')
    document_type = data.get('document_type')
    
    # Map document types to HF Space endpoints
    endpoint_mapping = {
        'National ID': '/api/extract-national-id',
        "Driver's License": '/api/extract-drivers-license',
        'PRC ID': '/api/extract-prc',
        'UMID': '/api/extract-umid',
        'SSS ID': '/api/extract-sss',
        'Passport': '/api/extract-passport',
        'Postal ID': '/api/extract-postal',
        'PHIC': '/api/extract-phic',
        'NBI Clearance': '/api/extract-nbi',
        'Police Clearance': '/api/extract-police-clearance',
        'TESDA': '/api/extract-tesda'
    }
    
    endpoint = endpoint_mapping.get(document_type)
    if not endpoint:
        return jsonify({'error': 'Unsupported document type'}), 400
    
    # Call Hugging Face Space API
    try:
        response = requests.post(
            f"{HUGGINGFACE_OCR_API}{endpoint}",
            json={'document_url': image_url},
            timeout=300
        )
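        # Forward the Space's status code so upstream errors are not reported as 200
        return jsonify(response.json()), response.status_code
    except (requests.RequestException, ValueError) as e:
        # Network failure, timeout, or a non-JSON response from the Space
        return jsonify({'error': str(e)}), 502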

Monitoring and Maintenance

Check Space Status

Go to your Space URL
Click Settings → Usage
Monitor:
- Request count
- Error rate
- Response times
- Memory usage

View Logs

In your Space, click the App tab
Scroll down to see real-time logs
Useful for debugging errors

Update Deployment

To update your deployment:

Web UI Method:

Click the Files tab
Click the file you want to edit
Make changes
Click Commit changes

Git Method:

cd handyhome-ocr-api
# Make changes to files
git add .
git commit -m "Update description"
git push

Troubleshooting

Build Fails

Error: Out of memory

Solution: Reduce workers in Dockerfile or upgrade hardware

Error: Timeout during build

Solution: This is normal for the first build. Wait, or restart the build.

Error: Missing dependencies

Solution: Check requirements.txt and Dockerfile

Runtime Errors

Error: Script not found

Solution: Ensure all extract_*.py files are uploaded

Error: PaddleOCR model download fails

Solution: Models download on first use. Check internet connectivity.

Error: 503 Service Unavailable

Solution: Space is sleeping. Wake it up by accessing the URL.
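
A client can handle the sleeping case by requesting the health endpoint and polling until it answers. The sketch below uses only the standard library; the function names, attempt count, and delay are illustrative assumptions, and the time a Space needs to wake depends on the image and hardware.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a GET request, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the Space is still starting
    except OSError:
        return 0  # DNS failure, refused connection, or timeout


def wait_until_awake(url, attempts=10, delay=15.0, fetch=fetch_status, sleep=time.sleep):
    """Poll `url` until it answers 200; return True on success, False if attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_awake("https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health")` can run before the first extraction request of a session. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.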

Performance Issues

Slow response times

Upgrade to better hardware tier
Increase Gunicorn workers (may need more RAM)
Consider caching frequently accessed documents
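
As one way to approach the caching suggestion, the sketch below keeps results in memory, keyed by document URL, for a fixed time. The class `TTLCache` is hypothetical and not part of the project. It assumes the image behind a URL does not change within the time window, and it never evicts entries beyond expiry checks on lookup, so a production version would also need a size limit.

```python
import time


class TTLCache:
    """Keep results keyed by document URL for `ttl` seconds."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # url -> (expiry time, result)

    def get_or_compute(self, url, compute):
        """Return the cached result for `url`, calling `compute(url)` on a miss or after expiry."""
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]
        result = compute(url)
        self._store[url] = (now + self.ttl, result)
        return result
```

The cache lives inside a single process, so each Gunicorn worker would hold its own copy.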

Out of memory errors

Reduce Gunicorn workers in Dockerfile
Upgrade to higher memory tier
Process smaller images
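
Gunicorn worker settings can also live in a Python config file instead of command-line flags, which makes them adjustable through environment variables without rebuilding. This is a sketch under assumptions: the file name `gunicorn.conf.py` and the `GUNICORN_TIMEOUT` variable are hypothetical, and the project's Dockerfile may configure Gunicorn differently.

```python
# gunicorn.conf.py (hypothetical file, not part of the upload list)
import os

# Port 7860 is the default app port for Docker Spaces; change it if README.md sets app_port.
bind = "0.0.0.0:" + os.environ.get("PORT", "7860")

# Each worker may load its own copy of the OCR models, so memory use grows with worker count.
workers = max(1, int(os.environ.get("WEB_CONCURRENCY", "1")))

# OCR requests can be slow on CPU; Gunicorn's default 30-second worker timeout may be too short.
timeout = int(os.environ.get("GUNICORN_TIMEOUT", "300"))
```

It would be loaded with `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object in app.py is named `app`. Starting with one worker and raising the count only while memory allows is the conservative choice on the free tier.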

Cost Considerations

Free Tier

CPU basic hardware
48-hour sleep timeout
Suitable for testing and low-traffic use

Paid Tiers

CPU upgrade: $0.03/hour (~$22/month if running 24/7)
GPU T4: $0.60/hour (~$432/month if running 24/7)
No sleep timeout
Better performance
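
The monthly figures above follow from multiplying the hourly rate by the hours in a month. A small sketch of that arithmetic, assuming a 30-day month and the hourly rates quoted in this guide (check current pricing before budgeting):

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month with the Space running continuously


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated monthly cost in USD for a given hourly rate."""
    return round(hourly_rate * hours, 2)


cpu_upgrade = monthly_cost(0.03)        # 21.6, quoted above as ~$22
gpu_t4 = monthly_cost(0.60)             # 432.0
gpu_t4_workday = monthly_cost(0.60, hours=8 * 30)  # 144.0 if the Space runs 8 hours a day
```

The last line shows why letting a paid Space sleep when idle matters: billing follows running hours, so a GPU used only during working hours costs a third of the 24/7 figure.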

Optimization Tips

Use CPU for cost-effective deployment
Enable sleep timeout for development
Only upgrade if you need 24/7 availability or high performance

Security Best Practices

Use Private Spaces for sensitive data
Add authentication if needed (custom middleware)
Rate limiting - Add to prevent abuse
HTTPS only - Hugging Face provides this by default
Input validation - Already implemented in scripts
Secrets management - Use HF Space secrets for API keys
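
The authentication and rate-limiting items can be sketched with the standard library alone. Everything below is illustrative and not taken from the project: the function and class names are hypothetical, and the limiter keeps its counts in memory, so with several Gunicorn workers each worker enforces its own limit.

```python
import hmac
import time
from collections import defaultdict, deque


def api_key_is_valid(provided: str, expected: str) -> bool:
    """Compare keys in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(provided.encode(), expected.encode())


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the last `window` seconds."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        recent = self.hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

In a Flask app these checks would typically run in a `before_request` hook, returning 401 for a bad key and 429 when `allow` returns False, with the expected key read from a Space secret.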

Support Resources

Hugging Face Spaces Docs: https://huggingface.co/docs/hub/spaces
Docker SDK Guide: https://huggingface.co/docs/hub/spaces-sdks-docker
Community Forum: https://discuss.huggingface.co/

Next Steps

After successful deployment:

✅ Update your main app to use the HF Space API
✅ Test all document types thoroughly
✅ Set up monitoring and alerts
✅ Document the API endpoints for your team
✅ Consider setting up staging and production spaces

Happy deploying! 🚀