CLIP-LoRA
In this work, we show that lightweight tuning of vision–language foundation models, combined with domain-adapted face recognition networks, can effectively bridge the domain gap between photographs and paintings. Our fusion approach achieves state-of-the-art accuracy in sitter identification. Face recognition on artworks remains particularly difficult compared to conventional face recognition (FR) due to the scarcity of labelled data, stylistic variation, and the interpretive nature of portraiture. However, our results show that adapting modern architectures to this setting is feasible and promising. This opens up new research avenues, including synthetic data generation to augment the limited training set and heterogeneous domain adaptation techniques to improve generalisation across visual domains. Project page: https://www.idiap.ch/paper/artface/
Overview
- Training: ArtFace was trained on the Historical Faces dataset, which consists of 766 paintings of 210 different sitters.
- Backbone: CLIP-LoRA is adapted from CLIP (ViT-B/16) by OpenAI.
- Source model: https://github.com/openai/CLIP
- Base model: ViT-B-16.pt (MIT License)
- Parameters: 1M
- Task: Historical portrait face identification via model adaptation
- Framework: PyTorch
- Output structure: Batch of face embeddings (i.e., feature vectors); see the comparison sketch below.
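Since the model outputs a batch of embeddings, here is a quick sketch of how such a batch might be compared; the 512-dimensional embedding size is an assumption, not stated in this card:

```python
# Hypothetical sketch: comparing batches of face embeddings with cosine
# similarity. The embedding dimensionality (512) is an assumption.
import torch
import torch.nn.functional as F

emb_a = torch.randn(4, 512)  # stand-in for a batch of face embeddings
emb_b = torch.randn(4, 512)

# L2-normalise so the dot product equals cosine similarity.
emb_a = F.normalize(emb_a, dim=-1)
emb_b = F.normalize(emb_b, dim=-1)

scores = (emb_a * emb_b).sum(dim=-1)  # per-pair similarity in [-1, 1]
print(scores.shape)  # torch.Size([4])
```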
Evaluation of Models
Figure: Overview of the proposed method: (a) LoRA-based adaptation of the CLIP model, and (b) head adaptation using triplet loss.
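The training code is not reproduced in this card, so the following is only a minimal sketch of the two steps named in the caption: (a) a trainable low-rank (LoRA) update wrapped around a frozen linear projection, of the kind typically injected into CLIP's attention layers, and (b) a triplet-loss objective for head adaptation. The rank, scaling, and margin values are illustrative assumptions, not the values used by ArtFace.

```python
# Minimal sketch (not the authors' code): (a) LoRA adapter around a frozen
# linear layer, (b) triplet loss for head adaptation. Hyperparameters
# (rank=4, alpha=1.0, margin=0.2) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * F.linear(F.linear(x, self.lora_a), self.lora_b)

# (b) Triplet loss pulls same-sitter embeddings together and pushes
# different-sitter embeddings apart by at least the margin.
triplet = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.randn(8, 512) for _ in range(3))
loss = triplet(anchor, positive, negative)
```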
Figure: ROC curves of tuned and base CLIP, IResNet100, COTS, and the proposed fusion method. Fusion provides consistent improvements even at low FAR.
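For reference, the metrics reported in the table below (EER and TAR at a fixed FAR) can be computed from genuine and impostor similarity scores roughly as follows; this is a generic sketch, not the repository's evaluate.py:

```python
# Generic sketch of the reported metrics (EER, TAR @ fixed FAR), computed
# from genuine/impostor similarity scores. Not the repository's evaluate.py.
import numpy as np

def tar_at_far(genuine, impostor, far):
    # Threshold chosen so that the given fraction of impostor scores is accepted.
    thr = np.quantile(impostor, 1.0 - far)
    return np.mean(genuine >= thr)

def eer(genuine, impostor):
    # Sweep thresholds; EER is where false accept and false reject rates meet.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    fars = np.array([np.mean(impostor >= t) for t in thresholds])
    frrs = np.array([np.mean(genuine < t) for t in thresholds])
    idx = np.argmin(np.abs(fars - frrs))
    return (fars[idx] + frrs[idx]) / 2

genuine = np.random.normal(0.7, 0.1, 5000)   # toy same-sitter scores
impostor = np.random.normal(0.3, 0.1, 5000)  # toy different-sitter scores
print(f"EER={eer(genuine, impostor):.3f}, "
      f"TAR@1%FAR={tar_at_far(genuine, impostor, 0.01):.3f}")
```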
| Model | EER (%) | TAR @ 0.1% FAR | TAR @ 1% FAR |
|---|---|---|---|
| COTS FR system | 12.6 | 34.3% | 58.1% |
| CLIP-Base | 17.9 | 8.4% | 33.2% |
| IResNet100-Base | 14.0 | 29.9% | 55.1% |
| CLIP-Base + IResNet100-Base | 13.1 | 29.0% | 54.7% |
| CLIP-Base + IResNet100-Tuned | 12.6 | 35.1% | 57.9% |
| CLIP-LoRA + IResNet100-Base | 11.1 | 34.6% | 62.6% |
| CLIP-LoRA + IResNet100-Tuned | 10.7 | 39.7% | 62.15% |
| CLIP-LoRA + IResNet100-Base + IResNet100-Tuned | 9.9 | 39.7% | 65.9% |
Table: Performance comparison of base models, tuned models, fusion variants, and the COTS FR system. Fusion improves overall accuracy.
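The fused scores in the table come from the repository's generate-scores.py; the card does not specify the fusion rule, so the following shows only one common choice, score-level fusion with min-max normalisation and equal weights (both are assumptions, not the authors' method):

```python
# Illustrative score-level fusion of two FR systems. The min-max
# normalisation and equal weights are assumptions, not ArtFace's rule.
import numpy as np

def minmax(s):
    """Map scores to [0, 1] so systems with different ranges are comparable."""
    return (s - s.min()) / (s.max() - s.min())

clip_scores = np.random.rand(1000)      # stand-ins for per-pair similarities
iresnet_scores = np.random.rand(1000)   # from the two systems being fused

fused = 0.5 * minmax(clip_scores) + 0.5 * minmax(iresnet_scores)
```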
Running Code
- Minimal commands to align the images, evaluate the full model, and run inference:
```bash
# Align the face images before extracting embeddings.
python align.py -f [path_to_paintings]/* -o data/paintings

# Test the full model: generate scores, tabulate the metrics, and plot the ROC curve.
python generate-scores.py fusion
python evaluate.py table -f out/fusion.csv
python plot.py roc --log -f out/fusion.csv
```
To use the model directly:

```python
import torch
from PIL import Image

from lib.models import get_model

# Load the fused model and its matching preprocessing transform.
model, preprocess = get_model("fusion").torch()
model.eval()

image = Image.open("...")                # path to an aligned face image
inputs = preprocess(image)
with torch.no_grad():
    embedding = model(inputs).squeeze()  # face embedding (feature vector)
```
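To compare two portraits, the embeddings can then be scored against each other. A minimal sketch, reusing `model` and `preprocess` from the snippet above, with placeholder file names; cosine similarity is an assumed comparison rule, as the card does not state the scoring function:

```python
# Hypothetical follow-up to the snippet above (reuses `model` and `preprocess`).
# File names are placeholders; cosine similarity is an assumed comparison rule.
import torch
import torch.nn.functional as F
from PIL import Image

with torch.no_grad():
    emb_a = model(preprocess(Image.open("portrait_a.jpg")))
    emb_b = model(preprocess(Image.open("portrait_b.jpg")))

# Flatten to 1 x D so the comparison works with or without a batch dimension.
score = F.cosine_similarity(emb_a.reshape(1, -1), emb_b.reshape(1, -1)).item()
print(f"similarity: {score:.3f}")  # higher means more likely the same sitter
```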
License
Copyright (c) 2025, Francois Poh, Anjith George, Sébastien Marcel. Idiap Research Institute, Martigny 1920, Switzerland.
https://gitlab.idiap.ch/biometric/code.iccv2025artmetrics.artface/
Please refer to the link above for information about the license and copyright terms and conditions.
Citation
If you find our work useful, please cite the following publication:
```bibtex
@article{poh2025artface,
  title={ArtFace: Towards Historical Portrait Face Identification via Model Adaptation},
  author={Poh, Francois and George, Anjith and Marcel, S{\'e}bastien},
  journal={arXiv preprint arXiv:2508.20626},
  year={2025}
}
```

