---
title: HandyHome OCR API
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
---

# HandyHome OCR Extraction API

Philippine ID and Document OCR Extraction Service using PaddleOCR

## 🎯 Features

### Supported Documents

#### Philippine Government IDs

- **National ID** - ID number (16 digits in four hyphenated groups, 19 characters in all), full name, birth date
- **Driver's License** - License number, full name, address, birth date
- **UMID** - CRN, full name, birth date
- **SSS ID** - SSS number, full name, birth date
- **PRC ID** - PRC number, profession, full name, validity
- **Postal ID** - PRN, full name, address, birth date
- **PhilHealth ID** - ID number, full name, birth date, sex, address

#### Clearances & Certificates

- **NBI Clearance** - ID number, full name, birth date
- **Police Clearance** - ID number, full name, address, birth date, status
- **TESDA Certificate** - Registry number, full name, qualification, date issued

#### Passport

- **Philippine Passport** - Passport number, surname, given names, birth date, nationality

### Additional Features

- **Document Analysis** - Automatic document type identification
- **Document Tampering Detection** - Analyze multiple documents for tampering using Error Level Analysis (ELA) and metadata inspection

## 🚀 Quick Start

### API Endpoints

All extraction endpoints accept POST requests with the following JSON body:

```json
{
  "document_url": "https://example.com/document.jpg"
}
```

#### Philippine ID Endpoints

- `POST /api/extract-national-id` - Extract National ID
- `POST /api/extract-drivers-license` - Extract Driver's License
- `POST /api/extract-prc` - Extract PRC ID
- `POST /api/extract-umid` - Extract UMID
- `POST /api/extract-sss` - Extract SSS ID
- `POST /api/extract-passport` - Extract Passport
- `POST /api/extract-postal` - Extract Postal ID
- `POST /api/extract-phic` - Extract PhilHealth ID

#### Clearance Endpoints

- `POST /api/extract-nbi` - Extract NBI Clearance
- `POST /api/extract-police-clearance` - Extract Police Clearance
- `POST /api/extract-tesda` - Extract TESDA Certificate

#### Analysis Endpoints

- `POST /api/analyze-document` - Identify document type
- `POST /api/analyze-documents` - Analyze multiple documents for tampering (max 3; takes an `image_urls` array)

#### Utility Endpoints

- `GET /health` - Health check
- `GET /` - API documentation
- `GET /api/routes` - List all routes

## 📝 Usage Examples

### Python Example

```python
import requests

# Extract National ID
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/extract-national-id',
    json={'document_url': 'https://example.com/national_id.jpg'}
)
result = response.json()
print(result)

# Expected output:
# {
#   "success": true,
#   "id_number": "1234-5678-9012-3456",
#   "full_name": "Juan Dela Cruz",
#   "birth_date": "1990-01-15"
# }

# Analyze multiple documents for tampering
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/analyze-documents',
    json={'image_urls': [
        'https://example.com/id1.jpg',
        'https://example.com/id2.jpg'
    ]}
)
tampering_result = response.json()
print(tampering_result)

# Expected output:
# {
#   "success": true,
#   "total_documents": 2,
#   "results": [
#     {
#       "document_id": "doc_1",
#       "tampering_results": {"tampered": "False", "brightness_ratio": 0.015},
#       "metadata_results": {"result": "success", "message": "..."}
#     },
#     ...
#   ]
# }
```

### cURL Example

```bash
curl -X POST https://YOUR-SPACE.hf.space/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "https://example.com/national_id.jpg"}'
```

### JavaScript Example

```javascript
const response = await fetch('https://YOUR-SPACE.hf.space/api/extract-national-id', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    document_url: 'https://example.com/national_id.jpg'
  })
});

const result = await response.json();
console.log(result);
```

## 🛠️ Technical Details

### Technology Stack

- **OCR Engine**: PaddleOCR 2.7+
- **Framework**: Flask + Gunicorn
- **Image Processing**: OpenCV, Pillow
- **Runtime**: Python 3.9

### Performance

- Average response time: 2-5 seconds per document
- Supports images up to 10 MB
- Concurrent request handling with Gunicorn workers

### Resource Requirements

- RAM: 4 GB minimum
- Storage: 2 GB (includes PaddleOCR models)
- CPU: 2 cores recommended

## 📦 Deployment to Hugging Face Spaces

### Step 1: Create a New Space

1. Go to [Hugging Face Spaces](https://huggingface.co/new-space)
2. Enter space name: `handyhome-ocr-api` (or your preferred name)
3. Select **Docker** as SDK
4. Choose visibility: Public or Private
5. Click "Create Space"

### Step 2: Upload Files

Upload all files from this directory to your Space:

- `app.py`
- `requirements.txt`
- `Dockerfile`
- `README.md`
- All `extract_*.py` scripts
- `analyze_document.py`

### Step 3: Configure Space Settings

1. In your Space settings, set:
   - **SDK**: Docker
   - **Port**: 7860
   - **Sleep time**: 48 hours (optional)
2. The Space will automatically build and deploy

### Step 4: Wait for Build

- Initial build takes 5-10 minutes
- PaddleOCR models are downloaded during build
- Check build logs for any errors

### Step 5: Test Your API

Once deployed, test the health endpoint:

```bash
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health
```

## 🔧 Local Development

### Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Run Flask development server
python app.py
```

### Testing

```bash
# Test with a document URL
curl -X POST http://localhost:7860/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "YOUR_IMAGE_URL"}'
```

## 📊 Response Format

### Successful Response

```json
{
  "success": true,
  "id_number": "1234-5678-9012-3456",
  "full_name": "Juan Dela Cruz",
  "birth_date": "1990-01-15",
  ...additional fields...
}
```

### Error Response

```json
{
  "success": false,
  "error": "Error description",
  "stderr": "Detailed error message"
}
```

## ⚠️ Limitations

- Requires clear, readable document images
- Works best with well-lit, high-resolution scans
- OCR accuracy depends on image quality
- Some fields may be null if not detected
- Processing time varies based on image size

## 🔐 Security Considerations

- Images are processed in memory and not stored permanently
- All processing happens server-side
- Sensitive data should be transmitted over HTTPS
- Consider rate limiting for production use

## 📄 License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions welcome! Please submit issues and pull requests.

## 📞 Support

For issues and questions:

- Open an issue on GitHub
- Contact: [Your contact information]

---

Built with ❤️ using PaddleOCR and Flask
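
## 🧪 Appendix: Handling Responses in Python (Sketch)

The success and error shapes under Response Format, together with the note that some fields may be null, suggest a small client-side wrapper. The sketch below is one possible approach, not part of this service: `OcrApiError`, `parse_extraction`, and `missing_fields` are hypothetical names, and the code assumes the payload is the already-decoded JSON (for example the value of `response.json()`), with every key other than `success` treated as an extracted field.

```python
from typing import Any, Dict, List


class OcrApiError(Exception):
    """Raised when a response has "success": false (hypothetical helper)."""


def parse_extraction(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the extracted fields, or raise OcrApiError on an error response."""
    if not payload.get("success"):
        # Error responses carry "error" plus a more detailed "stderr" message.
        message = payload.get("error", "Unknown error")
        detail = payload.get("stderr")
        raise OcrApiError(f"{message}: {detail}" if detail else message)
    # Assumption: everything except the "success" flag is an extracted field.
    return {key: value for key, value in payload.items() if key != "success"}


def missing_fields(fields: Dict[str, Any]) -> List[str]:
    """List fields that came back as null (None after JSON decoding)."""
    return sorted(name for name, value in fields.items() if value is None)


if __name__ == "__main__":
    sample = {
        "success": True,
        "id_number": "1234-5678-9012-3456",
        "full_name": "Juan Dela Cruz",
        "birth_date": None,  # simulates a field the OCR step did not detect
    }
    fields = parse_extraction(sample)
    print("Fields needing manual review:", missing_fields(fields))
```

Keeping the parsing separate from the HTTP call means the same helper can be reused for any of the extraction endpoints and exercised without network access.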