---

title: HandyHome OCR API
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
---


# HandyHome OCR Extraction API

Philippine ID and Document OCR Extraction Service using PaddleOCR

## 🎯 Features

### Supported Documents

#### Philippine Government IDs
- **National ID** - 16-digit ID number (19 characters including hyphens), full name, birth date
- **Driver's License** - License number, full name, address, birth date
- **UMID** - CRN, full name, birth date
- **SSS ID** - SSS number, full name, birth date
- **PRC ID** - PRC number, profession, full name, validity
- **Postal ID** - PRN, full name, address, birth date
- **PhilHealth ID** - ID number, full name, birth date, sex, address

#### Clearances & Certificates
- **NBI Clearance** - ID number, full name, birth date
- **Police Clearance** - ID number, full name, address, birth date, status
- **TESDA Certificate** - Registry number, full name, qualification, date issued

#### Passport
- **Philippine Passport** - Passport number, surname, given names, birth date, nationality

### Additional Features
- **Document Analysis** - Automatic document type identification
- **Document Tampering Detection** - Analyze multiple documents for tampering using Error Level Analysis (ELA) and metadata inspection

## 🚀 Quick Start

### API Endpoints

All extraction endpoints accept a POST request with a JSON body in the following format:

```json
{
  "document_url": "https://example.com/document.jpg"
}
```
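
A client may want to check the URL before sending it. The following is a minimal sketch of building the request body shown above; `build_payload` is a hypothetical helper, not part of this API, and the restriction to `http`/`https` URLs is an assumption about what the service can fetch.

```python
import json
from urllib.parse import urlparse


def build_payload(document_url: str) -> str:
    """Return the JSON request body for an extraction endpoint.

    Assumption: the service only fetches http(s) URLs, so anything else
    is rejected on the client before a request is made.
    """
    parsed = urlparse(document_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {document_url!r}")
    return json.dumps({"document_url": document_url})


print(build_payload("https://example.com/document.jpg"))
```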

#### Philippine ID Endpoints
- `POST /api/extract-national-id` - Extract National ID
- `POST /api/extract-drivers-license` - Extract Driver's License
- `POST /api/extract-prc` - Extract PRC ID
- `POST /api/extract-umid` - Extract UMID
- `POST /api/extract-sss` - Extract SSS ID
- `POST /api/extract-passport` - Extract Passport
- `POST /api/extract-postal` - Extract Postal ID
- `POST /api/extract-phic` - Extract PhilHealth ID

#### Clearance Endpoints
- `POST /api/extract-nbi` - Extract NBI Clearance
- `POST /api/extract-police-clearance` - Extract Police Clearance
- `POST /api/extract-tesda` - Extract TESDA Certificate
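
Because every extraction endpoint listed above takes the same request body, a client can select the route from a lookup table. The sketch below copies the paths from the lists above; the dictionary keys and the `endpoint_url` helper are hypothetical names chosen for illustration.

```python
# Paths copied from the endpoint lists above. The keys are hypothetical
# labels for this example, not identifiers used by the API.
EXTRACTION_ENDPOINTS = {
    "national_id": "/api/extract-national-id",
    "drivers_license": "/api/extract-drivers-license",
    "prc": "/api/extract-prc",
    "umid": "/api/extract-umid",
    "sss": "/api/extract-sss",
    "passport": "/api/extract-passport",
    "postal": "/api/extract-postal",
    "phic": "/api/extract-phic",
    "nbi": "/api/extract-nbi",
    "police_clearance": "/api/extract-police-clearance",
    "tesda": "/api/extract-tesda",
}


def endpoint_url(base_url: str, document_type: str) -> str:
    """Join the Space base URL with the path for the given document type."""
    try:
        path = EXTRACTION_ENDPOINTS[document_type]
    except KeyError:
        known = ", ".join(sorted(EXTRACTION_ENDPOINTS))
        raise ValueError(
            f"unknown document type {document_type!r}; expected one of: {known}"
        ) from None
    return base_url.rstrip("/") + path
```

For example, `endpoint_url("https://YOUR-SPACE.hf.space", "nbi")` yields the full URL of the NBI Clearance route.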

#### Analysis Endpoints
- `POST /api/analyze-document` - Identify document type
- `POST /api/analyze-documents` - Analyze multiple documents for tampering (max 3)
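
Since `/api/analyze-documents` accepts at most 3 documents per request, a client with more images has to split them into several requests. A minimal sketch, assuming the `image_urls` key used in the Python usage example in this README; `build_batch_payloads` is a hypothetical helper.

```python
import json

MAX_DOCUMENTS = 3  # limit stated for /api/analyze-documents


def build_batch_payloads(image_urls):
    """Split URLs into JSON request bodies of at most MAX_DOCUMENTS each."""
    urls = list(image_urls)
    return [
        json.dumps({"image_urls": urls[i:i + MAX_DOCUMENTS]})
        for i in range(0, len(urls), MAX_DOCUMENTS)
    ]
```

Each returned string can be sent as the body of one POST request.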

#### Utility Endpoints
- `GET /health` - Health check
- `GET /` - API documentation
- `GET /api/routes` - List all routes

## 📝 Usage Examples

### Python Example

```python
import requests

# Extract National ID
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/extract-national-id',
    json={'document_url': 'https://example.com/national_id.jpg'}
)

result = response.json()
print(result)

# Expected output:
# {
#     "success": true,
#     "id_number": "1234-5678-9012-3456",
#     "full_name": "Juan Dela Cruz",
#     "birth_date": "1990-01-15"
# }

# Analyze multiple documents for tampering
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/analyze-documents',
    json={'image_urls': [
        'https://example.com/id1.jpg',
        'https://example.com/id2.jpg'
    ]}
)

tampering_result = response.json()
print(tampering_result)

# Expected output:
# {
#     "success": true,
#     "total_documents": 2,
#     "results": [
#         {
#             "document_id": "doc_1",
#             "tampering_results": {"tampered": "False", "brightness_ratio": 0.015},
#             "metadata_results": {"result": "success", "message": "..."}
#         },
#         ...
#     ]
# }
```
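
Note that the expected output above shows `tampered` as the string `"False"` rather than a JSON boolean, so a plain truthiness check would treat every document as tampered. The sketch below is one way to read the flag safely; `flagged_documents` is a hypothetical helper, and the sample data is made up for illustration.

```python
def flagged_documents(result: dict) -> list:
    """Return the document_id of every entry reported as tampered."""
    flagged = []
    for entry in result.get("results", []):
        tampered = entry.get("tampering_results", {}).get("tampered")
        # Accept both a real boolean and the string form shown above.
        # Note that bool("False") is True, hence the explicit comparison.
        if tampered is True or str(tampered).strip().lower() == "true":
            flagged.append(entry.get("document_id"))
    return flagged


# Illustrative data only, not real API output.
sample = {
    "success": True,
    "total_documents": 2,
    "results": [
        {"document_id": "doc_1", "tampering_results": {"tampered": "False"}},
        {"document_id": "doc_2", "tampering_results": {"tampered": "True"}},
    ],
}
print(flagged_documents(sample))
```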

### cURL Example

```bash
curl -X POST https://YOUR-SPACE.hf.space/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "https://example.com/national_id.jpg"}'
```

### JavaScript Example

```javascript
const response = await fetch('https://YOUR-SPACE.hf.space/api/extract-national-id', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    document_url: 'https://example.com/national_id.jpg'
  })
});

const result = await response.json();
console.log(result);
```

## 🛠️ Technical Details

### Technology Stack
- **OCR Engine**: PaddleOCR 2.7+
- **Framework**: Flask + Gunicorn
- **Image Processing**: OpenCV, Pillow
- **Runtime**: Python 3.9

### Performance
- Average response time: 2-5 seconds per document
- Supports images up to 10MB
- Concurrent request handling with Gunicorn workers
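
Given the 10MB limit above, a client can reject oversized files before uploading them to a host and submitting the URL. A minimal sketch, assuming "10MB" means 10 × 1024 × 1024 bytes (the README does not say whether the limit is decimal or binary); `fits_size_limit` is a hypothetical helper.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def fits_size_limit(size_bytes: int) -> bool:
    """Return True if an image of this size is within the stated limit."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return size_bytes <= MAX_IMAGE_BYTES
```

For a local file, `fits_size_limit(os.path.getsize(path))` performs the check without reading the image.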

### Resource Requirements
- RAM: 4GB minimum
- Storage: 2GB (includes PaddleOCR models)
- CPU: 2 cores recommended
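
The figures above suggest a small Gunicorn setup. The following `gunicorn.conf.py` is a sketch under stated assumptions, not the project's actual configuration: the file name, the worker count, and the timeout are illustrative values, and each worker is assumed to load its own copy of the PaddleOCR models, which is why the count is kept low.

```python
# gunicorn.conf.py (hypothetical file; values are assumptions)

# Listen on all interfaces on the port used elsewhere in this README.
bind = "0.0.0.0:7860"

# Assumption: one worker per recommended CPU core. More workers would
# multiply memory use if each one loads the OCR models separately.
workers = 2

# Seconds before a silent worker is restarted. Assumption: generous
# headroom over the 2-5 second average to allow for slow downloads.
timeout = 120
```

Such a file would be passed with `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object in `app.py` is named `app`.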

## 📦 Deployment to Hugging Face Spaces

### Step 1: Create a New Space

1. Go to [Hugging Face Spaces](https://huggingface.co/new-space)
2. Enter space name: `handyhome-ocr-api` (or your preferred name)
3. Select **Docker** as SDK
4. Choose visibility: Public or Private
5. Click "Create Space"

### Step 2: Upload Files

Upload all files from this directory to your Space:
- `app.py`
- `requirements.txt`
- `Dockerfile`
- `README.md`
- All `extract_*.py` scripts
- `analyze_document.py`

### Step 3: Configure Space Settings

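1. Confirm the Space configuration:
   - **SDK**: Docker (`sdk: docker` in the README front matter)
   - **Port**: 7860 (the Docker Spaces default; override with `app_port` in the README front matter)
   - **Sleep time**: 48 hours (optional, set in the Space settings)

2. The Space will automatically build and deploy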

### Step 4: Wait for Build

- Initial build takes 5-10 minutes
- PaddleOCR models are downloaded during build
- Check build logs for any errors

### Step 5: Test Your API

Once deployed, test the health endpoint:

```bash
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health
```
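
A Space that is still building or waking from sleep may not answer the first request. The sketch below polls the health endpoint until it responds, using only the standard library; `http_ok` and `wait_until_healthy` are hypothetical helpers, and the attempt count and delay are assumptions to tune for your deployment.

```python
import time
import urllib.request


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False


def wait_until_healthy(url, attempts=10, delay=30.0, check=http_ok, sleep=time.sleep):
    """Poll `url` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if check(url):
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health")` returns `True` once the endpoint answers, or `False` after the last attempt. The `check` and `sleep` parameters exist so the loop can be tested without network access.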

## 🔧 Local Development

### Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Run Flask development server
python app.py
```

### Testing

```bash
# Test with a document URL
curl -X POST http://localhost:7860/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "YOUR_IMAGE_URL"}'
```

## 📊 Response Format

### Successful Response

```json
{
  "success": true,
  "id_number": "1234-5678-9012-3456",
  "full_name": "Juan Dela Cruz",
  "birth_date": "1990-01-15",
  ...additional fields...
}
```

### Error Response

```json
{
  "success": false,
  "error": "Error description",
  "stderr": "Detailed error message"
}
```
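
Both response shapes carry a `success` flag, so a client can turn failures into exceptions in one place. A minimal sketch, assuming the response has already been parsed into a dictionary; `ExtractionError` and `unwrap` are hypothetical names, not part of this API.

```python
class ExtractionError(Exception):
    """Raised when the API reports success: false."""

    def __init__(self, message, stderr=None):
        super().__init__(message)
        self.stderr = stderr  # detailed message from the "stderr" field, if any


def unwrap(result: dict) -> dict:
    """Return the extracted fields, or raise ExtractionError on failure."""
    if result.get("success") is True:
        return {key: value for key, value in result.items() if key != "success"}
    raise ExtractionError(result.get("error", "unknown error"), result.get("stderr"))
```

With the requests example earlier in this README, `fields = unwrap(response.json())` would give the extracted fields or raise with the server's error description.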

## ⚠️ Limitations

- Requires clear, readable document images
- Works best with well-lit, high-resolution scans
- OCR accuracy depends on image quality
- Some fields may be null if not detected
- Processing time varies based on image size
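
Because undetected fields may come back as null, a caller should check which required fields are present before using a result. A minimal sketch; `missing_fields` is a hypothetical helper, and treating an empty string as missing is an assumption.

```python
def missing_fields(result: dict, required) -> list:
    """Return the required field names that are absent, null, or empty."""
    return [name for name in required if result.get(name) in (None, "")]


# Illustrative data only: birth_date was not detected.
example = {"success": True, "id_number": "1234-5678-9012-3456",
           "full_name": "Juan Dela Cruz", "birth_date": None}
print(missing_fields(example, ["id_number", "full_name", "birth_date"]))
```

A non-empty list is a signal to ask the user for a clearer image or to fall back to manual entry.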

## 🔐 Security Considerations

- Images are processed in memory and not stored permanently
- All processing happens server-side
- Sensitive data should be transmitted over HTTPS
- Consider rate limiting for production use
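
As one way to approach the rate limiting mentioned above, the sketch below implements a per-client sliding-window limiter. It is an illustration, not part of this project: the class name is hypothetical, the state lives in process memory, and with several Gunicorn workers each worker would keep its own counters, so a shared store would be needed for a strict global limit.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` and answer with HTTP 429 when it returns `False`.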

## 📄 License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions welcome! Please submit issues and pull requests.

## 📞 Support

For issues and questions:
- Open an issue on GitHub
- Contact: [Your contact information]

---

Built with ❤️ using PaddleOCR and Flask