---
title: HandyHome OCR API
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
---

# HandyHome OCR Extraction API

An OCR extraction service for Philippine IDs and documents, built on PaddleOCR.

## 🎯 Features

### Supported Documents

#### Philippine Government IDs
- **National ID** - 16-digit ID number (19 characters including hyphens), full name, birth date
- **Driver's License** - License number, full name, address, birth date
- **UMID** - CRN, full name, birth date
- **SSS ID** - SSS number, full name, birth date
- **PRC ID** - PRC number, profession, full name, validity
- **Postal ID** - PRN, full name, address, birth date
- **PhilHealth ID** - ID number, full name, birth date, sex, address

#### Clearances & Certificates
- **NBI Clearance** - ID number, full name, birth date
- **Police Clearance** - ID number, full name, address, birth date, status
- **TESDA Certificate** - Registry number, full name, qualification, date issued

#### Passport
- **Philippine Passport** - Passport number, surname, given names, birth date, nationality

### Additional Features
- **Document Analysis** - Automatic document type identification
- **Document Tampering Detection** - Analyze multiple documents for tampering using Error Level Analysis (ELA) and metadata inspection

## 🚀 Quick Start

### API Endpoints

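All extraction endpoints accept a POST request with a JSON body in the following format (the tampering endpoint `/api/analyze-documents` takes an `image_urls` array instead, as in the Python example under Usage Examples):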

```json
{
  "document_url": "https://example.com/document.jpg"
}
```

#### Philippine ID Endpoints
- `POST /api/extract-national-id` - Extract National ID
- `POST /api/extract-drivers-license` - Extract Driver's License
- `POST /api/extract-prc` - Extract PRC ID
- `POST /api/extract-umid` - Extract UMID
- `POST /api/extract-sss` - Extract SSS ID
- `POST /api/extract-passport` - Extract Passport
- `POST /api/extract-postal` - Extract Postal ID
- `POST /api/extract-phic` - Extract PhilHealth ID

#### Clearance Endpoints
- `POST /api/extract-nbi` - Extract NBI Clearance
- `POST /api/extract-police-clearance` - Extract Police Clearance
- `POST /api/extract-tesda` - Extract TESDA Certificate

#### Analysis Endpoints
- `POST /api/analyze-document` - Identify document type
- `POST /api/analyze-documents` - Analyze multiple documents for tampering (max 3)

#### Utility Endpoints
- `GET /health` - Health check
- `GET /` - API documentation
- `GET /api/routes` - List all routes
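
The extraction routes above all take the same `document_url` payload, so a client can pick the route from a lookup table. The sketch below is one way to do that; the short keys (`national_id`, `phic`, and so on) and the helper name `build_extract_request` are illustrative and not part of the API.

```python
# Paths copied from the endpoint list above; the dictionary keys are illustrative.
EXTRACT_ENDPOINTS = {
    "national_id": "/api/extract-national-id",
    "drivers_license": "/api/extract-drivers-license",
    "prc": "/api/extract-prc",
    "umid": "/api/extract-umid",
    "sss": "/api/extract-sss",
    "passport": "/api/extract-passport",
    "postal": "/api/extract-postal",
    "phic": "/api/extract-phic",
    "nbi": "/api/extract-nbi",
    "police_clearance": "/api/extract-police-clearance",
    "tesda": "/api/extract-tesda",
}


def build_extract_request(base_url, doc_type, document_url):
    """Return (url, json_payload) for an extraction call."""
    try:
        path = EXTRACT_ENDPOINTS[doc_type]
    except KeyError:
        raise ValueError(f"Unsupported document type: {doc_type!r}") from None
    # Strip a trailing slash so the joined URL has exactly one separator.
    return base_url.rstrip("/") + path, {"document_url": document_url}


url, payload = build_extract_request(
    "https://YOUR-SPACE.hf.space", "passport", "https://example.com/passport.jpg"
)
print(url)  # prints https://YOUR-SPACE.hf.space/api/extract-passport
```

The returned pair can be passed straight to `requests.post(url, json=payload)`.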

## 📝 Usage Examples

### Python Example

```python
import requests

# Extract National ID
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/extract-national-id',
    json={'document_url': 'https://example.com/national_id.jpg'}
)

result = response.json()
print(result)

# Expected response body (JSON):
# {
#     "success": true,
#     "id_number": "1234-5678-9012-3456",
#     "full_name": "Juan Dela Cruz",
#     "birth_date": "1990-01-15"
# }

# Analyze multiple documents for tampering
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/analyze-documents',
    json={'image_urls': [
        'https://example.com/id1.jpg',
        'https://example.com/id2.jpg'
    ]}
)

tampering_result = response.json()
print(tampering_result)

# Expected response body (JSON):
# {
#     "success": true,
#     "total_documents": 2,
#     "results": [
#         {
#             "document_id": "doc_1",
#             "tampering_results": {"tampered": "False", "brightness_ratio": 0.015},
#             "metadata_results": {"result": "success", "message": "..."}
#         },
#         ...
#     ]
# }
```
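
The endpoint list caps `/api/analyze-documents` at three documents per request. For longer lists, a client could split the URLs into batches first. A minimal sketch, assuming the cap applies to the length of `image_urls`; `batch_image_urls` is a hypothetical helper name.

```python
def batch_image_urls(image_urls, max_per_request=3):
    """Split URLs into request payloads of at most max_per_request URLs each."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    return [
        {"image_urls": image_urls[i:i + max_per_request]}
        for i in range(0, len(image_urls), max_per_request)
    ]


urls = [f"https://example.com/id{n}.jpg" for n in range(1, 6)]
payloads = batch_image_urls(urls)
print(len(payloads))  # prints 2 (one batch of three URLs, one of two)
```

Each payload can then be sent in its own POST request.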

### cURL Example

```bash
curl -X POST https://YOUR-SPACE.hf.space/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "https://example.com/national_id.jpg"}'
```

### JavaScript Example

```javascript
const response = await fetch('https://YOUR-SPACE.hf.space/api/extract-national-id', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    document_url: 'https://example.com/national_id.jpg'
  })
});

const result = await response.json();
console.log(result);
```

## 🛠️ Technical Details

### Technology Stack
- **OCR Engine**: PaddleOCR 2.7+
- **Framework**: Flask + Gunicorn
- **Image Processing**: OpenCV, Pillow
- **Runtime**: Python 3.9

### Performance
- Average response time: 2-5 seconds per document
- Supports images up to 10 MB
- Concurrent request handling with Gunicorn workers

### Resource Requirements
- RAM: 4 GB minimum
- Storage: 2 GB (includes PaddleOCR models)
- CPU: 2 cores recommended
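
Worker count is the main lever that ties these numbers together, since each Gunicorn worker would typically load its own copy of the OCR models. The following `gunicorn.conf.py` is a hypothetical starting point, not the configuration shipped with this project; the values are assumptions derived from the figures above (2 cores, 4 GB of RAM, 2-5 second responses).

```python
# gunicorn.conf.py (hypothetical; values are illustrative assumptions)

# Listen on the port used elsewhere in this README.
bind = "0.0.0.0:7860"

# One worker per recommended CPU core. Each worker holds its own OCR models,
# so memory use grows with this number.
workers = 2

# Gunicorn's default worker timeout is 30 seconds. Leave headroom above the
# 2-5 second average for large images and cold starts.
timeout = 120
```

Assuming the Flask object in `app.py` is named `app`, this could be started with `gunicorn -c gunicorn.conf.py app:app`.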

## 📦 Deployment to Hugging Face Spaces

### Step 1: Create a New Space

1. Go to [Hugging Face Spaces](https://huggingface.co/new-space)
2. Enter a Space name: `handyhome-ocr-api` (or your preferred name)
3. Select **Docker** as the SDK
4. Choose visibility: Public or Private
5. Click "Create Space"

### Step 2: Upload Files

Upload all files from this directory to your Space:
- `app.py`
- `requirements.txt`
- `Dockerfile`
- `README.md`
- All `extract_*.py` scripts
- `analyze_document.py`

### Step 3: Configure Space Settings

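1. Confirm the Space configuration:
   - **SDK**: Docker (set by `sdk: docker` in the `README.md` front matter)
   - **Port**: 7860 (the default for Docker Spaces; set `app_port` in the front matter if the app listens on a different port)
   - **Sleep time**: 48 hours (optional, in the Space settings)

2. The Space builds and deploys automatically once the files are uploaded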

### Step 4: Wait for Build

- Initial build takes 5-10 minutes
- PaddleOCR models are downloaded during build
- Check build logs for any errors

### Step 5: Test Your API

Once deployed, test the health endpoint:
```bash
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health
```
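
Because the first build can take several minutes, a script may need to poll the health endpoint instead of checking it once. The sketch below uses only the standard library and assumes that an HTTP 200 from `/health` means the service is ready (the response body is not documented here); `is_healthy` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.request


def is_healthy(health_url, timeout=10):
    """Return True if a GET on health_url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(health_url, attempts=30, delay=10.0,
                       check=is_healthy, sleep=time.sleep):
    """Poll until check() succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check(health_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

A deployment script could call `wait_until_healthy("https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health")` before sending the first extraction request.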

## 🔧 Local Development

### Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Run Flask development server
python app.py
```

### Testing

```bash
# Test with a document URL
curl -X POST http://localhost:7860/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "YOUR_IMAGE_URL"}'
```

## 📊 Response Format

### Successful Response

```json
{
  "success": true,
  "id_number": "1234-5678-9012-3456",
  "full_name": "Juan Dela Cruz",
  "birth_date": "1990-01-15",
  ...additional fields...
}
```

### Error Response

```json
{
  "success": false,
  "error": "Error description",
  "stderr": "Detailed error message"
}
```
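
On the client side, the two response shapes above can be told apart by the `success` flag. A minimal sketch, assuming the body has already been parsed from JSON into a dict; `ExtractionError` and `unwrap_result` are hypothetical names, and HTTP status codes are not considered because they are not documented here.

```python
class ExtractionError(Exception):
    """Raised when the API reports success == false."""

    def __init__(self, message, detail=None):
        super().__init__(message)
        self.detail = detail  # contents of the "stderr" field, if any


def unwrap_result(body):
    """Return the extracted fields, or raise ExtractionError on failure."""
    if body.get("success") is True:
        # Drop the flag and keep only the extracted fields.
        return {key: value for key, value in body.items() if key != "success"}
    raise ExtractionError(body.get("error", "Unknown error"), body.get("stderr"))


fields = unwrap_result({"success": True, "full_name": "Juan Dela Cruz"})
print(fields)  # prints {'full_name': 'Juan Dela Cruz'}
```

With `requests`, this would be called as `unwrap_result(response.json())`.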

## ⚠️ Limitations

- Requires clear, readable document images
- Works best with well-lit, high-resolution scans
- OCR accuracy depends on image quality
- Some fields may be null if not detected
- Processing time varies based on image size
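
Because undetected fields may come back as null, callers may want to check for them before trusting a result. A small sketch; `missing_fields` and the list of required names are hypothetical and should be adapted per document type.

```python
def missing_fields(result, required):
    """Return the names in `required` that are absent or None in `result`."""
    return [name for name in required if result.get(name) is None]


result = {"success": True, "id_number": "1234-5678-9012-3456", "full_name": None}
print(missing_fields(result, ["id_number", "full_name", "birth_date"]))
# prints ['full_name', 'birth_date']
```

A non-empty list is a reasonable signal to ask the user for a clearer image.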

## 🔐 Security Considerations

- Images are processed in memory and not stored permanently
- All processing happens server-side
- Sensitive data should be transmitted over HTTPS
- Consider rate limiting for production use
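
As an illustration of the rate-limiting point, here is a minimal in-memory sliding-window limiter. It is a sketch and not part of this project; `SlidingWindowLimiter` is a hypothetical name, and the client identifier (for example an IP address or API key) is left to the caller.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client_id -> timestamps of allowed calls

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The counters live in process memory, so each Gunicorn worker would keep its own; a shared store would be needed to enforce one limit across workers.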

## 📄 License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions welcome! Please submit issues and pull requests.

## 📞 Support

For issues and questions:
- Open an issue on GitHub
- Contact: [Your contact information]

---

Built with ❤️ using PaddleOCR and Flask