---
title: CLIP Score
tags:
- evaluate
- metric
description: "CLIPScore is a reference-free evaluation metric for image captioning that measures the alignment between images and their corresponding text descriptions."
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---

# Metric Card for CLIP Score

*This module calculates CLIPScore, a reference-free evaluation metric for image captioning.*

## Metric Description

CLIPScore is a reference-free evaluation metric for image captioning that measures the alignment between images and their corresponding text descriptions. It leverages the CLIP (Contrastive Language-Image Pretraining) model to compute a similarity score between the visual and textual modalities.
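
Concretely, the score for each image-text pair is derived from the cosine similarity between the CLIP image embedding and the CLIP text embedding. The sketch below illustrates that underlying computation with the `transformers` library; the checkpoint name and the exact post-processing are assumptions and may differ from what this module does internally.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical checkpoint choice; the module's actual backbone is not specified here.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["A cat sitting on a windowsill."]
images = [Image.open("cat.jpg")]

# Tokenize the text and preprocess the image in one call
inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)

with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# L2-normalize both embeddings, then take the cosine similarity per pair
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
cosine_per_pair = (image_emb * text_emb).sum(dim=-1)
print(cosine_per_pair)
```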

## How to Use

To use the CLIPScore metric, you need to provide a list of text predictions and a list of images. The metric will compute the CLIPScore for each pair of image and text.

### Inputs

- **predictions** (`list` of `str`): A list of text predictions to score. Each prediction should be a string.
- **references** (`list` of `PIL.Image.Image`): A list of images to score against. Each image should be a PIL image.

### Output Values

The CLIPScore metric outputs a dictionary with a single key-value pair:

- **clip_score** *(float)*: The average CLIPScore across all provided image-text pairs. The score ranges from -1 to 1, where higher scores indicate better alignment between the image and text.
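
As a rough illustration of the aggregation (an assumption, not taken from this module's source), the reported value can be thought of as the mean of the per-pair similarities:

```python
# Hypothetical per-pair cosine similarities for two (image, text) pairs
per_pair_scores = [0.82, 0.88]

# The reported clip_score is assumed to be their mean
clip_score = sum(per_pair_scores) / len(per_pair_scores)
print({"clip_score": clip_score})  # {'clip_score': 0.85}
```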

### Examples

```python
from PIL import Image
import evaluate

# Load the CLIPScore metric from the Hugging Face Hub
metric = evaluate.load("sunhill/clip_score")

# One caption per image, in matching order
predictions = ["A cat sitting on a windowsill.", "A dog playing with a ball."]
references = [Image.open("cat.jpg"), Image.open("dog.jpg")]

results = metric.compute(predictions=predictions, references=references)
print(results)
# Output: {'clip_score': 0.85}
```

## Citation

```bibtex
@article{DBLP:journals/corr/abs-2104-08718,
    author       = {Jack Hessel and
                    Ari Holtzman and
                    Maxwell Forbes and
                    Ronan Le Bras and
                    Yejin Choi},
    title        = {CLIPScore: {A} Reference-free Evaluation Metric for Image Captioning},
    journal      = {CoRR},
    volume       = {abs/2104.08718},
    year         = {2021},
    url          = {https://arxiv.org/abs/2104.08718},
    eprinttype   = {arXiv},
    eprint       = {2104.08718},
    timestamp    = {Sat, 29 Apr 2023 10:09:27 +0200},
    biburl       = {https://dblp.org/rec/journals/corr/abs-2104-08718.bib},
    bibsource    = {dblp computer science bibliography, https://dblp.org}
}
```

## Further References

- [clip-score](https://github.com/Taited/clip-score)