---
license: mit
tags:
- vision
- coreml
- apple-neural-engine
- ane
- perception-encoder
- clip
- image-embedding
library_name: coremltools
pipeline_tag: image-feature-extraction
---

# PE-Core ANE (Apple Neural Engine) Models

Perception Encoder (PE-Core) models converted to CoreML format and optimized for the Apple Neural Engine (ANE).

## Models

| Model | Params | Size | Input | Embedding dim | Accuracy (cos. sim. vs PyTorch) |
|-------|--------|------|-------|---------------|---------------------------------|
| PE-Core-G14-448-ANE | 2.4B | 3.5GB | 448x448 | 1280 | 1.0000 |
| PE-Core-L-14-336-ANE | 300M | 604MB | 336x336 | 1024 | 1.0000 |
| PE-Core-B-16-ANE | 86M | 178MB | 224x224 | 768 | 0.9998 |
| PE-Core-S-16-384-ANE | 22M | 45MB | 384x384 | 384 | 1.0000 |
| PE-Core-T-16-384-ANE | 6M | 12MB | 384x384 | 192 | 0.9999 |

## Performance (M3 Mac)

| Model | ANE Latency | MPS Latency | Speedup |
|-------|-------------|-------------|---------|
| PE-Core-bigG-14-448 | 783ms | 1049ms | 1.34x |
| PE-Core-L-14-336 | ~180ms | ~280ms | ~1.5x |
| PE-Core-B-16 | ~50ms | ~80ms | ~1.6x |

## Usage (Python)

```python
import coremltools as ct
import numpy as np

# Load the CoreML package (prediction requires macOS)
model = ct.models.MLModel("PE-Core-B-16-ANE.mlpackage")

# Prepare an image tensor of shape (1, 3, 224, 224), already normalized
image = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference to get the image embedding
output = model.predict({"image": image})
embedding = output["embedding"]  # shape (1, 768)

# L2-normalize for cosine-similarity search
embedding = embedding / np.linalg.norm(embedding)
```
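The snippet above feeds random data; real images must be resized to the model's input resolution and normalized before inference. A minimal numpy-only sketch of that normalization (the mean/std values below are the commonly used CLIP statistics, which is an assumption here; verify them against the preprocessing config of the source checkpoint):

```python
import numpy as np

# Commonly used CLIP normalization statistics. These are an assumption;
# check the preprocessing config shipped with the source PE-Core model.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 3) uint8 RGB image, already resized to the model's
    input resolution, into the (1, 3, H, W) float32 batch the model expects."""
    x = rgb.astype(np.float32) / 255.0        # scale [0, 255] -> [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD            # per-channel normalization
    return x.transpose(2, 0, 1)[None, ...]    # HWC -> NCHW with batch dim

batch = preprocess(np.zeros((224, 224, 3), dtype=np.uint8))
print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float32
```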

## Usage (Swift)

```swift
import CoreML

let model = try MLModel(contentsOf: modelURL)
let input = try MLDictionaryFeatureProvider(dictionary: ["image": pixelBuffer])
let output = try model.prediction(from: input)
let embedding = output.featureValue(for: "embedding")!.multiArrayValue!
```
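Once embeddings are L2-normalized (as in the Python example above), cosine similarity reduces to a dot product. A hypothetical nearest-neighbor lookup over a small in-memory index, sketched in numpy with random stand-in embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical index: 1000 stored embeddings (e.g. from PE-Core-B-16, dim 768),
# L2-normalized row-wise so that a dot product equals cosine similarity.
index = rng.standard_normal((1000, 768)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)

# Query embedding, normalized the same way.
query = rng.standard_normal(768).astype(np.float32)
query /= np.linalg.norm(query)

# Top-5 most similar entries by cosine similarity.
scores = index @ query                 # shape (1000,)
top5 = np.argsort(scores)[::-1][:5]    # indices of the 5 highest scores
print(top5, scores[top5])
```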

## Conversion Details

- **Source**: Meta's Perception Encoder via open_clip
- **Format**: CoreML mlpackage (FP16)
- **Target**: macOS 14+ (ANE optimized)
- **Accuracy**: >99.98% cosine similarity vs PyTorch

## Credits

- Original models: [Meta AI Perception Encoder](https://github.com/facebookresearch/perception_models)
- Loaded via: [open_clip](https://github.com/mlfoundations/open_clip)
- Converted with: [coremltools](https://github.com/apple/coremltools)