CLIP-LoRA
In this work, we show that lightweight tuning of vision–language foundation models, combined with domain-adapted face recognition networks, can effectively bridge the domain gap between photographs and paintings. Our fusion approach achieves state-of-the-art accuracy in sitter identification. Face recognition on artworks remains particularly difficult compared to conventional face recognition (FR) due to the scarcity of labelled data, stylistic variation, and the interpretive nature of portraiture. However, our results show that adapting modern architectures to this setting is feasible and promising. This opens up new research avenues, including synthetic data generation to augment the limited training set and heterogeneous domain adaptation techniques to improve generalisation across visual domains. Project page: https://www.idiap.ch/paper/artface/
Overview
- Training: ArtFace was trained on the Historical Faces dataset, which consists of 766 paintings of 210 different sitters.
- Backbone: CLIP-LoRA is adapted from CLIP (ViT-B/16) by OpenAI.
- Source model: https://github.com/openai/CLIP
- Base model: ViT-B-16.pt (MIT License)
- Parameters: 1M
- Task: Historical portrait face identification via model adaptation
- Framework: PyTorch
- Output structure: Batch of face embeddings (i.e., feature vectors); see the comparison sketch below.
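Since the model outputs a batch of embeddings, here is a quick sketch of how such a batch might be compared; the 512-dimensional embedding size is an assumption, not stated in this card:

```python
# Hypothetical sketch: comparing batches of face embeddings with cosine
# similarity. The embedding dimensionality (512) is an assumption.
import torch
import torch.nn.functional as F

emb_a = torch.randn(4, 512)  # stand-in for a batch of face embeddings
emb_b = torch.randn(4, 512)

# L2-normalise so the dot product equals cosine similarity.
emb_a = F.normalize(emb_a, dim=-1)
emb_b = F.normalize(emb_b, dim=-1)

scores = (emb_a * emb_b).sum(dim=-1)  # per-pair similarity in [-1, 1]
print(scores.shape)  # torch.Size([4])
```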
Evaluation of Models
Figure: Overview of the proposed method: (a) LoRA-based adaptation of the CLIP model, and (b) head adaptation using triplet loss.
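The training code is not reproduced in this card, so the following is only a minimal sketch of the two steps named in the caption: (a) a trainable low-rank (LoRA) update wrapped around a frozen linear projection, of the kind typically injected into CLIP's attention layers, and (b) a triplet-loss objective for head adaptation. The rank, scaling, and margin values are illustrative assumptions, not the values used by ArtFace.

```python
# Minimal sketch (not the authors' code): (a) LoRA adapter around a frozen
# linear layer, (b) triplet loss for head adaptation. Hyperparameters
# (rank=4, alpha=1.0, margin=0.2) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * F.linear(F.linear(x, self.lora_a), self.lora_b)

# (b) Triplet loss pulls same-sitter embeddings together and pushes
# different-sitter embeddings apart by at least the margin.
triplet = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.randn(8, 512) for _ in range(3))
loss = triplet(anchor, positive, negative)
```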
Figure: ROC curves of tuned and base CLIP, IResNet100, COTS, and the proposed fusion method. Fusion provides consistent improvements even at low FAR.
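For reference, the metrics reported in the table below (EER and TAR at a fixed FAR) can be computed from genuine and impostor similarity scores roughly as follows; this is a generic sketch, not the repository's evaluate.py:

```python
# Generic sketch of the reported metrics (EER, TAR @ fixed FAR), computed
# from genuine/impostor similarity scores. Not the repository's evaluate.py.
import numpy as np

def tar_at_far(genuine, impostor, far):
    # Threshold chosen so that the given fraction of impostor scores is accepted.
    thr = np.quantile(impostor, 1.0 - far)
    return np.mean(genuine >= thr)

def eer(genuine, impostor):
    # Sweep thresholds; EER is where false accept and false reject rates meet.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    fars = np.array([np.mean(impostor >= t) for t in thresholds])
    frrs = np.array([np.mean(genuine < t) for t in thresholds])
    idx = np.argmin(np.abs(fars - frrs))
    return (fars[idx] + frrs[idx]) / 2

genuine = np.random.normal(0.7, 0.1, 5000)   # toy same-sitter scores
impostor = np.random.normal(0.3, 0.1, 5000)  # toy different-sitter scores
print(f"EER={eer(genuine, impostor):.3f}, "
      f"TAR@1%FAR={tar_at_far(genuine, impostor, 0.01):.3f}")
```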
| Model | EER (%) | TAR @ 0.1% FAR | TAR @ 1% FAR |
|---|---|---|---|
| COTS FR system | 12.6 | 34.3% | 58.1% |
| CLIP-Base | 17.9 | 8.4% | 33.2% |
| IResNet100-Base | 14.0 | 29.9% | 55.1% |
| CLIP-Base + IResNet100-Base | 13.1 | 29.0% | 54.7% |
| CLIP-Base + IResNet100-Tuned | 12.6 | 35.1% | 57.9% |
| CLIP-LoRA + IResNet100-Base | 11.1 | 34.6% | 62.6% |
| CLIP-LoRA + IResNet100-Tuned | 10.7 | 39.7% | 62.15% |
| CLIP-LoRA + IResNet100-Base + IResNet100-Tuned | 9.9 | 39.7% | 65.9% |
Table: Performance comparison of base models, tuned models, fusion variants, and the COTS FR system. Fusion improves overall accuracy.
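The fused scores in the table come from the repository's generate-scores.py; the card does not specify the fusion rule, so the following shows only one common choice, score-level fusion with min-max normalisation and equal weights (both are assumptions, not the authors' method):

```python
# Illustrative score-level fusion of two FR systems. The min-max
# normalisation and equal weights are assumptions, not ArtFace's rule.
import numpy as np

def minmax(s):
    """Map scores to [0, 1] so systems with different ranges are comparable."""
    return (s - s.min()) / (s.max() - s.min())

clip_scores = np.random.rand(1000)      # stand-ins for per-pair similarities
iresnet_scores = np.random.rand(1000)   # from the two systems being fused

fused = 0.5 * minmax(clip_scores) + 0.5 * minmax(iresnet_scores)
```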
Running Code
- Minimal commands to align the images, evaluate the full model, and run inference:
```bash
# Align the face images before extracting embeddings.
python align.py -f [path_to_paintings]/* -o data/paintings

# Test the full model: generate scores, tabulate the metrics, and plot the ROC curve.
python generate-scores.py fusion
python evaluate.py table -f out/fusion.csv
python plot.py roc --log -f out/fusion.csv
```
To use the model directly:

```python
import torch
from PIL import Image

from lib.models import get_model

# Load the fused model and its matching preprocessing transform.
model, preprocess = get_model("fusion").torch()
model.eval()

image = Image.open("...")                # path to an aligned face image
inputs = preprocess(image)
with torch.no_grad():
    embedding = model(inputs).squeeze()  # face embedding (feature vector)
```
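To compare two portraits, the embeddings can then be scored against each other. A minimal sketch, reusing `model` and `preprocess` from the snippet above, with placeholder file names; cosine similarity is an assumed comparison rule, as the card does not state the scoring function:

```python
# Hypothetical follow-up to the snippet above (reuses `model` and `preprocess`).
# File names are placeholders; cosine similarity is an assumed comparison rule.
import torch
import torch.nn.functional as F
from PIL import Image

with torch.no_grad():
    emb_a = model(preprocess(Image.open("portrait_a.jpg")))
    emb_b = model(preprocess(Image.open("portrait_b.jpg")))

# Flatten to 1 x D so the comparison works with or without a batch dimension.
score = F.cosine_similarity(emb_a.reshape(1, -1), emb_b.reshape(1, -1)).item()
print(f"similarity: {score:.3f}")  # higher means more likely the same sitter
```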
License
Copyright (c) 2025, Francois Poh, Anjith George, Sébastien Marcel. Idiap Research Institute, Martigny 1920, Switzerland.
https://gitlab.idiap.ch/biometric/code.iccv2025artmetrics.artface/
Please refer to the link above for information about the license and copyright terms and conditions.
Citation
If you find our work useful, please cite the following publication:
```bibtex
@article{poh2025artface,
  title={ArtFace: Towards Historical Portrait Face Identification via Model Adaptation},
  author={Poh, Francois and George, Anjith and Marcel, S{\'e}bastien},
  journal={arXiv preprint arXiv:2508.20626},
  year={2025}
}
```

