title: HandyHome OCR API
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
HandyHome OCR Extraction API
Philippine ID and Document OCR Extraction Service using PaddleOCR
🎯 Features
Supported Documents
Philippine Government IDs
- National ID - 16-digit ID number (e.g. 1234-5678-9012-3456), full name, birth date
- Driver's License - License number, full name, address, birth date
- UMID - CRN, full name, birth date
- SSS ID - SSS number, full name, birth date
- PRC ID - PRC number, profession, full name, validity
- Postal ID - PRN, full name, address, birth date
- PhilHealth ID - ID number, full name, birth date, sex, address
Clearances & Certificates
- NBI Clearance - ID number, full name, birth date
- Police Clearance - ID number, full name, address, birth date, status
- TESDA Certificate - Registry number, full name, qualification, date issued
Passport
- Philippine Passport - Passport number, surname, given names, birth date, nationality
Additional Features
- Document Analysis - Automatic document type identification
- Document Tampering Detection - Analyze multiple documents for tampering using Error Level Analysis (ELA) and metadata inspection
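Error Level Analysis works by re-saving an image as JPEG at a known quality and measuring how much each pixel changes. Regions that were pasted in or edited after the last save often recompress differently from the rest of the image. The sketch below shows the general technique with Pillow. It is not the service's own implementation: the JPEG quality of 90, the threshold of 50, and the bright_pixel_ratio metric are assumptions, and how the API actually derives its tampered and brightness_ratio fields is not documented here.

```python
import io

from PIL import Image, ImageChops


def error_level_analysis(image: Image.Image, quality: int = 90) -> Image.Image:
    """Return the per-pixel difference between an image and a JPEG re-save of it."""
    original = image.convert("RGB")
    buffer = io.BytesIO()
    # Re-compress in memory at a fixed (assumed) JPEG quality.
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")
    # Absolute difference per channel; edited regions tend to stand out.
    return ImageChops.difference(original, resaved)


def bright_pixel_ratio(ela_image: Image.Image, threshold: int = 50) -> float:
    """Hypothetical metric: fraction of pixels whose largest channel error exceeds threshold."""
    raw = ela_image.convert("RGB").tobytes()  # r, g, b bytes for each pixel in turn
    total = len(raw) // 3
    bright = sum(1 for i in range(0, len(raw), 3) if max(raw[i:i + 3]) > threshold)
    return bright / total if total else 0.0
```

A uniform image recompresses almost perfectly, so its ratio is close to zero; photos with spliced regions tend to show clusters of bright pixels in the difference image, which is usually inspected visually as well as summarized numerically.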
Quick Start
API Endpoints
All extraction endpoints accept POST requests with a JSON body of the following form:
{
"document_url": "https://example.com/document.jpg"
}
Philippine ID Endpoints
- POST /api/extract-national-id - Extract National ID
- POST /api/extract-drivers-license - Extract Driver's License
- POST /api/extract-prc - Extract PRC ID
- POST /api/extract-umid - Extract UMID
- POST /api/extract-sss - Extract SSS ID
- POST /api/extract-passport - Extract Passport
- POST /api/extract-postal - Extract Postal ID
- POST /api/extract-phic - Extract PhilHealth ID
Clearance Endpoints
- POST /api/extract-nbi - Extract NBI Clearance
- POST /api/extract-police-clearance - Extract Police Clearance
- POST /api/extract-tesda - Extract TESDA Certificate
Analysis Endpoints
- POST /api/analyze-document - Identify the document type
- POST /api/analyze-documents - Analyze up to 3 documents for tampering (the body takes an image_urls list instead of document_url)
Utility Endpoints
- GET /health - Health check
- GET / - API documentation
- GET /api/routes - List all routes
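Because every endpoint shares the same base URL and JSON shape, a client can keep a single lookup table from document type to path. The helper below is a hypothetical sketch: the ENDPOINTS keys and the build_* function names are illustrative, not part of the API. It also enforces the three-document limit of /api/analyze-documents before sending, using the image_urls field shown in the usage examples for that endpoint.

```python
from typing import Any, Dict, List, Tuple

# Path for each extraction endpoint, keyed by an illustrative document type name.
ENDPOINTS: Dict[str, str] = {
    "national_id": "/api/extract-national-id",
    "drivers_license": "/api/extract-drivers-license",
    "prc": "/api/extract-prc",
    "umid": "/api/extract-umid",
    "sss": "/api/extract-sss",
    "passport": "/api/extract-passport",
    "postal": "/api/extract-postal",
    "phic": "/api/extract-phic",
    "nbi": "/api/extract-nbi",
    "police_clearance": "/api/extract-police-clearance",
    "tesda": "/api/extract-tesda",
}

MAX_TAMPERING_DOCS = 3  # limit stated for /api/analyze-documents


def build_extraction_request(
    base_url: str, doc_type: str, document_url: str
) -> Tuple[str, Dict[str, Any]]:
    """Return (full URL, JSON body) for one extraction call."""
    if doc_type not in ENDPOINTS:
        raise ValueError(f"unknown document type: {doc_type!r}")
    return base_url.rstrip("/") + ENDPOINTS[doc_type], {"document_url": document_url}


def build_tampering_request(
    base_url: str, image_urls: List[str]
) -> Tuple[str, Dict[str, Any]]:
    """Return (full URL, JSON body) for /api/analyze-documents, checking the document limit."""
    if not 1 <= len(image_urls) <= MAX_TAMPERING_DOCS:
        raise ValueError(
            f"expected 1 to {MAX_TAMPERING_DOCS} image URLs, got {len(image_urls)}"
        )
    return base_url.rstrip("/") + "/api/analyze-documents", {"image_urls": list(image_urls)}
```

For example, url, body = build_extraction_request("https://YOUR-SPACE.hf.space", "umid", "https://example.com/umid.jpg") can be passed straight to requests.post(url, json=body).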
Usage Examples
Python Example
import requests
# Extract National ID
response = requests.post(
'https://YOUR-SPACE.hf.space/api/extract-national-id',
json={'document_url': 'https://example.com/national_id.jpg'}
)
result = response.json()
print(result)
# Expected output:
# {
# "success": true,
# "id_number": "1234-5678-9012-3456",
# "full_name": "Juan Dela Cruz",
# "birth_date": "1990-01-15"
# }
# Analyze multiple documents for tampering
response = requests.post(
'https://YOUR-SPACE.hf.space/api/analyze-documents',
json={'image_urls': [
'https://example.com/id1.jpg',
'https://example.com/id2.jpg'
]}
)
tampering_result = response.json()
print(tampering_result)
# Expected output:
# {
# "success": true,
# "total_documents": 2,
# "results": [
# {
# "document_id": "doc_1",
# "tampering_results": {"tampered": "False", "brightness_ratio": 0.015},
# "metadata_results": {"result": "success", "message": "..."}
# },
# ...
# ]
# }
cURL Example
curl -X POST https://YOUR-SPACE.hf.space/api/extract-national-id \
-H "Content-Type: application/json" \
-d '{"document_url": "https://example.com/national_id.jpg"}'
JavaScript Example
const response = await fetch('https://YOUR-SPACE.hf.space/api/extract-national-id', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
document_url: 'https://example.com/national_id.jpg'
})
});
const result = await response.json();
console.log(result);
🛠️ Technical Details
Technology Stack
- OCR Engine: PaddleOCR 2.7+
- Framework: Flask + Gunicorn
- Image Processing: OpenCV, Pillow
- Runtime: Python 3.9
Performance
- Average response time: 2-5 seconds per document
- Supports images up to 10MB
- Concurrent request handling with Gunicorn workers
Resource Requirements
- RAM: 4GB minimum
- Storage: 2GB (includes PaddleOCR models)
- CPU: 2 cores recommended
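Gunicorn reads its settings from a Python file (gunicorn.conf.py in the working directory by default), so worker count and timeouts can be matched to the resource figures above. The values below are assumptions, not this project's actual configuration: each worker process holds its own copy of the OCR models in memory, so on a 2-core, 4 GB machine a small worker count with a generous timeout is a reasonable starting point.

```python
# gunicorn.conf.py (hypothetical sketch; tune for your hardware)
import multiprocessing

# Hugging Face Docker Spaces route traffic to port 7860 by default.
bind = "0.0.0.0:7860"

# One worker per core, capped at 2, because every worker keeps its own OCR models in RAM.
workers = min(2, multiprocessing.cpu_count())

# Typical requests take 2-5 s; allow headroom for large images or cold starts.
timeout = 120

# Recycle workers periodically to bound memory growth (assumed values).
max_requests = 200
max_requests_jitter = 20
```

Adding workers increases throughput only while RAM allows; if the container runs out of memory, lowering workers is usually the first adjustment.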
📦 Deployment to Hugging Face Spaces
Step 1: Create a New Space
- Go to Hugging Face Spaces
- Enter a Space name: handyhome-ocr-api (or your preferred name)
- Select Docker as the SDK
- Choose visibility: Public or Private
- Click "Create Space"
Step 2: Upload Files
Upload all files from this directory to your Space:
- app.py
- requirements.txt
- Dockerfile
- README.md
- All extract_*.py scripts
- analyze_document.py
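Before uploading, it can help to confirm that the directory actually contains the files listed above. A minimal standard-library sketch; the function name missing_upload_files is hypothetical, and the file names come from the list above.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ["app.py", "requirements.txt", "Dockerfile", "README.md", "analyze_document.py"]


def missing_upload_files(directory: str = ".") -> List[str]:
    """Return the required files (or the extract_*.py pattern) absent from directory."""
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # At least one extract_*.py script is expected alongside app.py.
    if not any(root.glob("extract_*.py")):
        missing.append("extract_*.py")
    return missing


if __name__ == "__main__":
    problems = missing_upload_files()
    print("Ready to upload" if not problems else f"Missing: {', '.join(problems)}")
```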
Step 3: Configure Space Settings
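Confirm the Space is configured as follows:
- SDK: Docker
- Port: 7860 (the default for Docker Spaces; a different port can be set with app_port in the README metadata)
- Sleep time: 48 hours (optional)
The Space builds and deploys automatically once the files are uploaded.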
Step 4: Wait for Build
- Initial build takes 5-10 minutes
- PaddleOCR models are downloaded during build
- Check build logs for any errors
Step 5: Test Your API
Once deployed, test the health endpoint:
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health
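Because the first build can take several minutes, a deployment script may want to poll /health until the Space responds. A hedged sketch using urllib from the standard library: the attempt count and delay are arbitrary, it only checks for an HTTP 200 status (the body of the health response is not documented here), and the fetch parameter exists so the function can be tested without a network.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as exc:  # server answered with an error status
        return exc.code
    except OSError:  # DNS failure, refused connection, timeout, ...
        return 0


def wait_until_healthy(
    url: str,
    attempts: int = 30,
    delay_seconds: float = 20.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until it returns HTTP 200; give up after the given number of attempts."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Usage: wait_until_healthy("https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health") returns True once the Space is up, or False after roughly ten minutes with the defaults.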
🔧 Local Development
Setup
# Install dependencies
pip install -r requirements.txt
# Run Flask development server
python app.py
Testing
# Test with a document URL
curl -X POST http://localhost:7860/api/extract-national-id \
-H "Content-Type: application/json" \
-d '{"document_url": "YOUR_IMAGE_URL"}'
Response Format
Successful Response
{
"success": true,
"id_number": "1234-5678-9012-3456",
"full_name": "Juan Dela Cruz",
"birth_date": "1990-01-15",
...additional fields...
}
Error Response
{
"success": false,
"error": "Error description",
"stderr": "Detailed error message"
}
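Since both response shapes carry a success flag, a client can branch on it in one place. A minimal sketch (the OCRError exception and unwrap_result function are hypothetical names) that returns the extracted fields on success and raises with the error and stderr text otherwise:

```python
from typing import Any, Dict


class OCRError(Exception):
    """Raised when the API reports success: false."""

    def __init__(self, error: str, stderr: str = "") -> None:
        super().__init__(error)
        self.error = error
        self.stderr = stderr


def unwrap_result(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the extracted fields without the success flag, or raise OCRError."""
    if not body.get("success", False):
        raise OCRError(body.get("error", "Unknown error"), body.get("stderr", ""))
    return {key: value for key, value in body.items() if key != "success"}
```

With requests, this becomes fields = unwrap_result(requests.post(url, json=payload).json()), and callers handle failures in a single except OCRError block.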
⚠️ Limitations
- Requires clear, readable document images
- Works best with well-lit, high-resolution scans
- OCR accuracy depends on image quality
- Some fields may be null if not detected
- Processing time varies based on image size
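Because some fields may come back as null, callers can check which expected fields were not read and, for example, ask the user for a clearer photo. A sketch with a hypothetical missing_fields helper; the field names for the National ID come from the example output earlier in this README, and the expected fields for other document types would need to be checked against real responses.

```python
from typing import Any, Dict, Iterable, List


def missing_fields(result: Dict[str, Any], expected: Iterable[str]) -> List[str]:
    """Return expected field names that are absent, null, or blank in an API result."""
    missing = []
    for name in expected:
        value = result.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


NATIONAL_ID_FIELDS = ("id_number", "full_name", "birth_date")
```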
Security Considerations
- Images are processed in memory and not stored permanently
- All processing happens server-side
- Sensitive data should be transmitted over HTTPS
- Consider rate limiting for production use
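Rate limiting can be as simple as a per-client token bucket checked before the OCR work starts. The class below is a framework-agnostic sketch, not part of this project; the rate and burst values are placeholders. In a Flask app you would key buckets by client (for example by API key or IP address) and return HTTP 429 when the check fails. With several Gunicorn workers, in-process buckets are per worker, so a shared store would be needed for a global limit.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(
        self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: Dict[str, TokenBucket] = {}


def allow_request(client_id: str, rate: float = 0.5, capacity: float = 5) -> bool:
    """Look up (or create) the bucket for client_id and try to take one token."""
    bucket = buckets.get(client_id)
    if bucket is None:
        bucket = buckets[client_id] = TokenBucket(rate, capacity)
    return bucket.allow()
```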
License
MIT License - See LICENSE file for details
🤝 Contributing
Contributions welcome! Please submit issues and pull requests.
Support
For issues and questions:
- Open an issue on GitHub
- Contact: [Your contact information]
Built with ❤️ using PaddleOCR and Flask