Deployment Guide for Hugging Face Spaces

This guide will help you deploy the HandyHome OCR API to Hugging Face Spaces.

Prerequisites

Deployment Options

Option 1: Web UI Deployment (Easiest)

Step 1: Create a New Space

  1. Go to https://huggingface.co/new-space

  2. Fill in the details:

    • Owner: Your username
    • Space name: handyhome-ocr-api (or any name you prefer)
    • License: MIT
    • Select the Space SDK: Choose Docker
    • Space hardware: Start with CPU basic (free tier)
    • Visibility: Choose Public or Private
  3. Click Create Space

Step 2: Upload Files via Web UI

  1. In your new Space, click the Files tab

  2. Click Add file → Upload files

  3. Upload the following files from the huggingface-ocr folder:

    app.py
    requirements.txt
    Dockerfile
    README.md
    .gitignore
    extract_national_id.py
    extract_drivers_license.py
    extract_prc.py
    extract_umid.py
    extract_sss.py
    extract_passport.py
    extract_postal.py
    extract_phic.py
    extract_nbi_ocr.py
    extract_police_ocr.py
    extract_tesda_ocr.py
    analyze_document.py
    
  4. Click Commit changes to main

Step 3: Wait for Build

  1. Go to the App tab

  2. You'll see the build progress

  3. Initial build takes 5-10 minutes due to:

    • Installing PaddleOCR and dependencies
    • Downloading OCR models (~500MB)
    • Building Docker container
  4. Watch the build logs for any errors

Step 4: Verify Deployment

Once built, test your API:

# Check health
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health

# Expected response:
# {"status":"healthy","service":"handyhome-ocr-api","version":"1.0.0"}

Option 2: Git Command Line Deployment

Step 1: Create Space on Web

Follow Step 1 from Option 1 above.

Step 2: Clone Space Repository

# Set up Git LFS hooks for your user account
# (the git-lfs package itself must already be installed)
git lfs install

# Clone your space
git clone https://huggingface.co/spaces/YOUR-USERNAME/handyhome-ocr-api
cd handyhome-ocr-api

Step 3: Copy Files

# Copy all files from the huggingface-ocr folder.
# The trailing "/." also copies dotfiles such as .gitignore, which "*" would skip.
cp -r ../huggingface-ocr/. .

Step 4: Commit and Push

# Add all files
git add .

# Commit
git commit -m "Initial deployment of HandyHome OCR API"

# Push to Hugging Face
git push

Step 5: Monitor Build

Go to your Space URL to watch the build progress.

Configuration

Space Settings

In your Space settings, you can configure:

  1. Hardware:

    • CPU basic (free): 2 vCPU, 16GB RAM - Suitable for testing
    • CPU upgrade (paid): Better performance
    • GPU (paid): Faster OCR processing
  2. Sleep time:

    • Free tier: Sleeps after 48 hours of inactivity
    • Paid tier: Can disable sleep
  3. Secrets (if needed):

    • Add environment variables in Settings → Repository secrets

Custom Domain (Optional)

For production, you can set up a custom domain in Space settings.

Testing Your Deployment

Test Health Endpoint

curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health

Test OCR Extraction

# Test National ID extraction
curl -X POST https://YOUR-USERNAME-handyhome-ocr-api.hf.space/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "YOUR_IMAGE_URL"}'

Test in Python

import requests

base_url = "https://YOUR-USERNAME-handyhome-ocr-api.hf.space"

# Test health
response = requests.get(f"{base_url}/health")
print(response.json())

# Test extraction
response = requests.post(
    f"{base_url}/api/extract-national-id",
    json={"document_url": "YOUR_IMAGE_URL"}
)
print(response.json())
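
To exercise every document type rather than a single endpoint, a small smoke test can loop over the endpoint paths listed in the integration mapping later in this guide. The sketch below is only an illustration: `post_json` and `smoke_test` are hypothetical helper names, the sample image URLs are placeholders you must supply, and it uses the standard library's `urllib` instead of `requests` so that it has no extra dependencies.

```python
import json
import urllib.request

# Endpoint paths as listed in this guide's integration mapping.
ENDPOINTS = [
    "/api/extract-national-id",
    "/api/extract-drivers-license",
    "/api/extract-prc",
    "/api/extract-umid",
    "/api/extract-sss",
    "/api/extract-passport",
    "/api/extract-postal",
    "/api/extract-phic",
    "/api/extract-nbi",
    "/api/extract-police-clearance",
    "/api/extract-tesda",
]


def post_json(url, payload, timeout=300):
    """POST a JSON payload and return (status_code, parsed_body)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def smoke_test(base_url, sample_urls, post=post_json):
    """Call each endpoint that has a sample image.

    sample_urls maps an endpoint path to a test image URL.
    Returns a dict mapping each endpoint path to a short result string.
    """
    results = {}
    for endpoint in ENDPOINTS:
        image_url = sample_urls.get(endpoint)
        if image_url is None:
            results[endpoint] = "skipped (no sample image)"
            continue
        try:
            status, _body = post(f"{base_url}{endpoint}", {"document_url": image_url})
            results[endpoint] = "ok" if status == 200 else f"failed (HTTP {status})"
        except Exception as exc:  # keep going so one failure does not hide the rest
            results[endpoint] = f"failed ({exc})"
    return results


# Example (placeholders, not real URLs):
# results = smoke_test(
#     "https://YOUR-USERNAME-handyhome-ocr-api.hf.space",
#     {"/api/extract-national-id": "YOUR_IMAGE_URL"},
# )
# for endpoint, outcome in results.items():
#     print(endpoint, outcome)
```

The `post` parameter exists so the loop can be tried with a stub before pointing it at a live Space.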

Integration with Your Main App

Update your main Flask app (handyhome-web-scripts/app.py) to use the Hugging Face Space:

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)  # skip this line if your main app already defines `app`

HUGGINGFACE_OCR_API = "https://YOUR-USERNAME-handyhome-ocr-api.hf.space"

@app.route('/extract-document', methods=['POST'])
def extract_document():
    # silent=True yields None (instead of raising) for a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    image_url = data.get('image_url')
    document_type = data.get('document_type')

    # Map document types to HF Space endpoints
    endpoint_mapping = {
        'National ID': '/api/extract-national-id',
        "Driver's License": '/api/extract-drivers-license',
        'PRC ID': '/api/extract-prc',
        'UMID': '/api/extract-umid',
        'SSS ID': '/api/extract-sss',
        'Passport': '/api/extract-passport',
        'Postal ID': '/api/extract-postal',
        'PHIC': '/api/extract-phic',
        'NBI Clearance': '/api/extract-nbi',
        'Police Clearance': '/api/extract-police-clearance',
        'TESDA': '/api/extract-tesda'
    }

    endpoint = endpoint_mapping.get(document_type)
    if not endpoint:
        return jsonify({'error': 'Unsupported document type'}), 400
    if not image_url:
        return jsonify({'error': 'Missing image_url'}), 400

    # Call Hugging Face Space API (long timeout: OCR on free hardware can be slow)
    try:
        response = requests.post(
            f"{HUGGINGFACE_OCR_API}{endpoint}",
            json={'document_url': image_url},
            timeout=300
        )
        return jsonify(response.json())
    except (requests.RequestException, ValueError) as e:
        # RequestException: network failure or timeout
        # ValueError: the Space returned a body that is not valid JSON
        return jsonify({'error': str(e)}), 500

Monitoring and Maintenance

Check Space Status

  1. Go to your Space URL
  2. Click Settings → Usage
  3. Monitor:
    • Request count
    • Error rate
    • Response times
    • Memory usage

View Logs

  1. In your Space, click the App tab
  2. Scroll down to see real-time logs
  3. Useful for debugging errors

Update Deployment

To update your deployment:

Web UI Method:

  1. Click the Files tab
  2. Click the file you want to edit
  3. Make your changes
  4. Click Commit changes

Git Method:

cd handyhome-ocr-api
# Make changes to files
git add .
git commit -m "Update description"
git push

Troubleshooting

Build Fails

Error: Out of memory

  • Solution: Reduce workers in Dockerfile or upgrade hardware

Error: Timeout during build

  • Solution: This is normal for the first build. Wait for it to finish, or restart the build.

Error: Missing dependencies

  • Solution: Check requirements.txt and Dockerfile

Runtime Errors

Error: Script not found

  • Solution: Ensure all extract_*.py files are uploaded

Error: PaddleOCR model download fails

  • Solution: Models download on first use. Check internet connectivity.

Error: 503 Service Unavailable

  • Solution: Space is sleeping. Wake it up by accessing the URL.
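
Waking a sleeping Space can also be scripted: request the health endpoint, and retry while it still answers with 503 or does not answer at all. The following is a minimal sketch under stated assumptions. `get_status` and `wait_until_healthy` are hypothetical helper names, the attempt count and delay are arbitrary defaults, and the code uses the standard library's `urllib` so it runs without extra packages.

```python
import time
import urllib.error
import urllib.request


def get_status(url, timeout=30):
    """Return the HTTP status code of a GET request, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 503 while the Space is still waking up
    except OSError:
        return None  # connection error or timeout


def wait_until_healthy(url, attempts=10, delay=15.0, status_fn=get_status, sleep=time.sleep):
    """Poll `url` until it returns HTTP 200; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if status_fn(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)  # give the container time to start before retrying
    return False


# Example (replace the placeholder with your own Space URL):
# ready = wait_until_healthy("https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health")
# print("Space is ready" if ready else "Space did not wake up in time")
```

Calling this before sending OCR requests avoids failing the first request after a long idle period.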

Performance Issues

Slow response times

  • Upgrade to better hardware tier
  • Increase Gunicorn workers (may need more RAM)
  • Consider caching frequently accessed documents
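
One way to approach the caching suggestion is a small in-memory cache in the calling application, keyed by endpoint and document URL, with entries that expire after a fixed time. This is a sketch only: `TTLCache` and `cached_extract` are hypothetical names, the one-hour default is arbitrary, and `fetch` stands in for whatever function actually calls the Space. Because OCR results for ID documents contain personal data, a short expiry is probably the safer choice.

```python
import time


class TTLCache:
    """A minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def cached_extract(cache, endpoint, document_url, fetch):
    """Return a cached result if present; otherwise call fetch and store its result."""
    key = (endpoint, document_url)
    result = cache.get(key)
    if result is None:
        result = fetch(endpoint, document_url)
        cache.set(key, result)
    return result
```

This cache lives in a single process, so each Gunicorn worker would keep its own copy; a shared store would be needed to avoid that.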

Out of memory errors

  • Reduce Gunicorn workers in Dockerfile
  • Upgrade to higher memory tier
  • Process smaller images

Cost Considerations

Free Tier

  • CPU basic hardware
  • 48-hour sleep timeout
  • Suitable for testing and low-traffic use

Paid Tiers

  • CPU upgrade: $0.03/hour (~$22/month)
  • GPU T4: $0.60/hour (~$432/month)
  • No sleep timeout
  • Better performance

Optimization Tips

  • Use CPU for cost-effective deployment
  • Enable sleep timeout for development
  • Only upgrade if you need 24/7 availability or high performance

Security Best Practices

  1. Use Private Spaces for sensitive data
  2. Add authentication if needed (custom middleware)
  3. Rate limiting - Add to prevent abuse
  4. HTTPS only - Hugging Face provides this by default
  5. Input validation - Already implemented in scripts
  6. Secrets management - Use HF Space secrets for API keys
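
As an illustration of items 2 and 6, the sketch below checks an API key sent in a request header against a value read from an environment variable, which is how Space secrets are typically exposed to the running container. Everything named here is an assumption rather than part of the existing API: the secret name `OCR_API_KEY`, the header `X-API-Key`, and the helper `is_authorized` are hypothetical.

```python
import hmac
import os

API_KEY_ENV = "OCR_API_KEY"   # hypothetical secret name set in the Space settings
API_KEY_HEADER = "X-API-Key"  # hypothetical request header carrying the key


def is_authorized(headers, expected_key):
    """Return True only if the request headers carry the expected API key."""
    if not expected_key:
        return False  # fail closed when no key is configured
    supplied = headers.get(API_KEY_HEADER, "")
    # compare_digest takes time independent of where the strings differ
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


# Possible wiring in the Flask app (not executed here):
#
# @app.before_request
# def require_api_key():
#     if not is_authorized(request.headers, os.environ.get(API_KEY_ENV)):
#         return jsonify({"error": "Unauthorized"}), 401
```

A real deployment would probably exempt the health endpoint from this check so that monitoring keeps working.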

Support Resources

Next Steps

After successful deployment:

  1. ✅ Update your main app to use the HF Space API
  2. ✅ Test all document types thoroughly
  3. ✅ Set up monitoring and alerts
  4. ✅ Document the API endpoints for your team
  5. ✅ Consider setting up staging and production spaces

Happy deploying! 🚀