---

title: HandyHome OCR API
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
---


# HandyHome OCR Extraction API

Philippine ID and Document OCR Extraction Service using PaddleOCR

## 🎯 Features

### Supported Documents

#### Philippine Government IDs
- **National ID** - 16-digit ID number (19 characters including hyphens), full name, birth date
- **Driver's License** - License number, full name, address, birth date
- **UMID** - CRN, full name, birth date
- **SSS ID** - SSS number, full name, birth date
- **PRC ID** - PRC number, profession, full name, validity
- **Postal ID** - PRN, full name, address, birth date
- **PhilHealth ID** - ID number, full name, birth date, sex, address

#### Clearances & Certificates
- **NBI Clearance** - ID number, full name, birth date
- **Police Clearance** - ID number, full name, address, birth date, status
- **TESDA Certificate** - Registry number, full name, qualification, date issued

#### Passport
- **Philippine Passport** - Passport number, surname, given names, birth date, nationality

### Additional Features
- **Document Analysis** - Automatic document type identification
- **Document Tampering Detection** - Analyze multiple documents for tampering using Error Level Analysis (ELA) and metadata inspection

## 🚀 Quick Start

### API Endpoints

All extraction endpoints accept POST requests with a JSON body of the following form:

```json
{
  "document_url": "https://example.com/document.jpg"
}
```

#### Philippine ID Endpoints
- `POST /api/extract-national-id` - Extract National ID
- `POST /api/extract-drivers-license` - Extract Driver's License
- `POST /api/extract-prc` - Extract PRC ID
- `POST /api/extract-umid` - Extract UMID
- `POST /api/extract-sss` - Extract SSS ID
- `POST /api/extract-passport` - Extract Passport
- `POST /api/extract-postal` - Extract Postal ID
- `POST /api/extract-phic` - Extract PhilHealth ID

#### Clearance Endpoints
- `POST /api/extract-nbi` - Extract NBI Clearance
- `POST /api/extract-police-clearance` - Extract Police Clearance
- `POST /api/extract-tesda` - Extract TESDA Certificate
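
The extraction routes listed above share a common `/api/extract-<name>` pattern, so a client can build the URL and request body from a document type. The following is a minimal sketch: the dictionary keys and the `extraction_url` and `extraction_request` helpers are illustrative names, not part of the API; only the route paths and the `document_url` field come from this README.

```python
# Route paths copied from the endpoint lists in this README.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXTRACT_ROUTES = {
    "national_id": "extract-national-id",
    "drivers_license": "extract-drivers-license",
    "prc": "extract-prc",
    "umid": "extract-umid",
    "sss": "extract-sss",
    "passport": "extract-passport",
    "postal": "extract-postal",
    "phic": "extract-phic",
    "nbi": "extract-nbi",
    "police_clearance": "extract-police-clearance",
    "tesda": "extract-tesda",
}


def extraction_url(base_url: str, document_type: str) -> str:
    """Return the full endpoint URL for a supported document type."""
    try:
        route = EXTRACT_ROUTES[document_type]
    except KeyError:
        raise ValueError(f"unsupported document type: {document_type!r}") from None
    return f"{base_url.rstrip('/')}/api/{route}"


def extraction_request(base_url: str, document_type: str, document_url: str):
    """Return a (url, json_body) pair ready to pass to an HTTP client."""
    return extraction_url(base_url, document_type), {"document_url": document_url}


if __name__ == "__main__":
    url, body = extraction_request(
        "https://YOUR-SPACE.hf.space", "nbi", "https://example.com/nbi.jpg"
    )
    print(url, body)
```

The returned pair can be passed straight to `requests.post(url, json=body)`.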

#### Analysis Endpoints
- `POST /api/analyze-document` - Identify document type
- `POST /api/analyze-documents` - Analyze multiple documents for tampering (max 3)
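
Because `/api/analyze-documents` accepts at most 3 documents per request, a longer list of images has to be split across several calls. A minimal sketch of that batching is shown here; the `tampering_payloads` helper is a hypothetical name, and the `image_urls` field is taken from the Python usage example in this README.

```python
from typing import Iterator, List

MAX_DOCUMENTS_PER_REQUEST = 3  # limit stated for /api/analyze-documents


def tampering_payloads(
    image_urls: List[str], batch_size: int = MAX_DOCUMENTS_PER_REQUEST
) -> Iterator[dict]:
    """Yield one JSON body per request, each holding at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(image_urls), batch_size):
        yield {"image_urls": image_urls[start:start + batch_size]}


if __name__ == "__main__":
    urls = [f"https://example.com/id{i}.jpg" for i in range(1, 6)]
    for payload in tampering_payloads(urls):
        print(payload)
```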

#### Utility Endpoints
- `GET /health` - Health check
- `GET /` - API documentation
- `GET /api/routes` - List all routes

## 📝 Usage Examples

### Python Example

```python
import requests

# Extract National ID
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/extract-national-id',
    json={'document_url': 'https://example.com/national_id.jpg'}
)

result = response.json()
print(result)

# Expected output:
# {
#     "success": true,
#     "id_number": "1234-5678-9012-3456",
#     "full_name": "Juan Dela Cruz",
#     "birth_date": "1990-01-15"
# }

# Analyze multiple documents for tampering
response = requests.post(
    'https://YOUR-SPACE.hf.space/api/analyze-documents',
    json={'image_urls': [
        'https://example.com/id1.jpg',
        'https://example.com/id2.jpg'
    ]}
)

tampering_result = response.json()
print(tampering_result)

# Expected output:
# {
#     "success": true,
#     "total_documents": 2,
#     "results": [
#         {
#             "document_id": "doc_1",
#             "tampering_results": {"tampered": "False", "brightness_ratio": 0.015},
#             "metadata_results": {"result": "success", "message": "..."}
#         },
#         ...
#     ]
# }
```

### cURL Example

```bash
curl -X POST https://YOUR-SPACE.hf.space/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "https://example.com/national_id.jpg"}'
```

### JavaScript Example

```javascript
const response = await fetch('https://YOUR-SPACE.hf.space/api/extract-national-id', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    document_url: 'https://example.com/national_id.jpg'
  })
});

const result = await response.json();
console.log(result);
```

## 🛠️ Technical Details

### Technology Stack
- **OCR Engine**: PaddleOCR 2.7+
- **Framework**: Flask + Gunicorn
- **Image Processing**: OpenCV, Pillow
- **Runtime**: Python 3.9

### Performance
- Average response time: 2-5 seconds per document
- Supports images up to 10MB
- Concurrent request handling with Gunicorn workers

### Resource Requirements
- RAM: 4GB minimum
- Storage: 2GB (includes PaddleOCR models)
- CPU: 2 cores recommended

## 📦 Deployment to Hugging Face Spaces

### Step 1: Create a New Space

1. Go to [Hugging Face Spaces](https://huggingface.co/new-space)
2. Enter space name: `handyhome-ocr-api` (or your preferred name)
3. Select **Docker** as SDK
4. Choose visibility: Public or Private
5. Click "Create Space"

### Step 2: Upload Files

Upload all files from this directory to your Space:
- `app.py`
- `requirements.txt`
- `Dockerfile`
- `README.md`
- All `extract_*.py` scripts
- `analyze_document.py`

### Step 3: Configure Space Settings

1. In your Space settings, set:
   - **SDK**: Docker
   - **Port**: 7860
   - **Sleep time**: 48 hours (optional)

2. The Space will automatically build and deploy

### Step 4: Wait for Build

- Initial build takes 5-10 minutes
- PaddleOCR models are downloaded during build
- Check build logs for any errors

### Step 5: Test Your API

Once deployed, test the health endpoint:
```bash
curl https://YOUR-USERNAME-handyhome-ocr-api.hf.space/health
```
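
Since the initial build can take several minutes, it may be convenient to poll the health endpoint from a script instead of retrying `curl` by hand. The sketch below uses only the standard library. It assumes that `/health` answers with HTTP 200 once the service is ready, which this README does not state explicitly; `health_ok` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.request
from typing import Callable


def health_ok(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET <base_url>/health responds with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(
            f"{base_url.rstrip('/')}/health", timeout=timeout
        ) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    base = "https://YOUR-USERNAME-handyhome-ocr-api.hf.space"
    print("ready" if wait_until_healthy(lambda: health_ok(base)) else "not ready")
```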

## 🔧 Local Development

### Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Run Flask development server
python app.py
```

### Testing

```bash
# Test with a document URL
curl -X POST http://localhost:7860/api/extract-national-id \
  -H "Content-Type: application/json" \
  -d '{"document_url": "YOUR_IMAGE_URL"}'
```

## 📊 Response Format

### Successful Response

```json
{
  "success": true,
  "id_number": "1234-5678-9012-3456",
  "full_name": "Juan Dela Cruz",
  "birth_date": "1990-01-15",
  ...additional fields...
}
```

### Error Response

```json
{
  "success": false,
  "error": "Error description",
  "stderr": "Detailed error message"
}
```
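
On the client side, the two response shapes above can be told apart by the `success` flag. The following sketch shows one way to do that; `ExtractionError` and `unwrap_result` are hypothetical names introduced for illustration, and only the `success`, `error`, and `stderr` keys come from this README.

```python
from typing import Optional


class ExtractionError(Exception):
    """Raised when the API reports "success": false (hypothetical client class)."""

    def __init__(self, message: str, detail: Optional[str] = None):
        super().__init__(message)
        self.detail = detail  # contents of the "stderr" field, if any


def unwrap_result(payload: dict) -> dict:
    """Return the extracted fields, or raise ExtractionError on an error response."""
    if payload.get("success") is not True:
        raise ExtractionError(
            payload.get("error", "unknown error"), payload.get("stderr")
        )
    # Drop the status flag so callers receive only the document fields.
    return {key: value for key, value in payload.items() if key != "success"}


if __name__ == "__main__":
    ok = {"success": True, "id_number": "1234-5678-9012-3456"}
    print(unwrap_result(ok))
    try:
        unwrap_result({"success": False, "error": "Error description"})
    except ExtractionError as exc:
        print("failed:", exc, exc.detail)
```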

## ⚠️ Limitations

- Requires clear, readable document images
- Works best with well-lit, high-resolution scans
- OCR accuracy depends on image quality
- Some fields may be null if not detected
- Processing time varies based on image size
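
Because undetected fields may come back as null, callers may want to check which fields are missing before trusting a result. A small sketch, assuming the National ID field names shown in the examples in this README; `missing_fields` is a hypothetical helper.

```python
from typing import Iterable, List

# Field names taken from the National ID example response in this README.
NATIONAL_ID_FIELDS = ("id_number", "full_name", "birth_date")


def missing_fields(result: dict, expected: Iterable[str]) -> List[str]:
    """Return the expected field names that are absent or null in the result."""
    return [name for name in expected if result.get(name) is None]


if __name__ == "__main__":
    result = {"success": True, "id_number": "1234-5678-9012-3456", "full_name": None}
    gaps = missing_fields(result, NATIONAL_ID_FIELDS)
    if gaps:
        print("Ask the user to rescan; not detected:", ", ".join(gaps))
```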

## 🔐 Security Considerations

- Images are processed in memory and not stored permanently
- All processing happens server-side
- Sensitive data should be transmitted over HTTPS
- Consider rate limiting for production use
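
As an illustration of the rate-limiting suggestion, the sketch below implements a simple in-memory sliding-window limiter. It is not part of this project: `SlidingWindowLimiter` is a hypothetical class, and because the counters live in process memory, each Gunicorn worker would keep its own counts. A shared store or a reverse proxy would be needed for a global limit.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
    print([limiter.allow("203.0.113.7") for _ in range(3)])
```

In a Flask app, `allow()` could be called from a `before_request` hook keyed on the client address, returning HTTP 429 when it yields `False`.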

## 📄 License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions welcome! Please submit issues and pull requests.

## 📞 Support

For issues and questions:
- Open an issue on GitHub
- Contact: [Your contact information]

---

Built with ❤️ using PaddleOCR and Flask