Updated short description
README.md CHANGED
@@ -11,8 +11,12 @@ tags:
 - evaluate
 - metric
 description: >-
-  HaRiM+ is reference-less metric for summary quality evaluation which hurls the
-
+  HaRiM+ is a reference-free metric for summary-quality evaluation which
+  harnesses the power of a summarization model to estimate the quality of a
+  summary-article pair. <br /> Note that this metric is reference-free and does
+  not require training: it is ready to use without reference text to compare
+  the generation against, and without any model training for scoring.
+short_description: HaRiM+ is a reference-free summary faithfulness measure.
 ---

@@ -83,4 +87,4 @@ Please cite as follows
 pages = "895--924",
 abstract = "One of the challenges of developing a summarization model arises from the difficulty in measuring the factual inconsistency of the generated text. In this study, we reinterpret the decoder overconfidence-regularizing objective suggested in (Miao et al., 2021) as a hallucination risk measurement to better estimate the quality of generated summaries. We propose a reference-free metric, HaRiM+, which only requires an off-the-shelf summarization model to compute the hallucination risk based on token likelihoods. Deploying it requires no additional training of models or ad-hoc modules, which usually need alignment to human judgments. For summary-quality estimation, HaRiM+ records state-of-the-art correlation to human judgment on three summary-quality annotation sets: FRANK, QAGS, and SummEval. We hope that our work, which merits the use of summarization models, facilitates the progress of both automated evaluation and generation of summary.",
 }
-```
+```
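Since the frontmatter tags the Space as an `evaluate` metric, it can presumably be loaded through the Hugging Face `evaluate` library. A minimal usage sketch, not an official example: the Space id `NCSOFT/harim_plus` and the `predictions`/`references` argument names are assumptions here, and "reference-free" means the source articles, not gold summaries, are what get passed alongside the predictions.

```python
import evaluate

# Assumption: the Space id hosting this metric; substitute the actual path.
harim = evaluate.load("NCSOFT/harim_plus")

articles = ["The festival drew 10,000 visitors over three days in Busan."]
summaries = ["The Busan festival attracted 10,000 visitors across three days."]

# Reference-free: source articles stand in for `references`; no gold summaries
# and no extra model training are required.
scores = harim.compute(predictions=summaries, references=articles)
print(scores)
```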
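To make the abstract's "hallucination risk based on token likelihoods" concrete, below is a schematic sketch of scoring a summary by the per-token likelihoods an off-the-shelf summarization model assigns to it. The model name is an illustrative assumption, and the mean log-likelihood aggregation is a crude stand-in, not the actual HaRiM+ formula.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumption: any off-the-shelf summarizer works; BART-large-CNN is one choice.
name = "facebook/bart-large-cnn"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name).eval()

def token_likelihoods(article: str, summary: str) -> torch.Tensor:
    """Per-token probabilities p(y_t | y_<t, article) for the summary tokens."""
    enc = tok(article, return_tensors="pt", truncation=True)
    labels = tok(summary, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        logits = model(**enc, labels=labels).logits  # (1, T, vocab)
    probs = logits.softmax(dim=-1)
    return probs.gather(-1, labels.unsqueeze(-1)).squeeze()

article = "The festival drew 10,000 visitors over three days in Busan."
summary = "The Busan festival attracted 10,000 visitors across three days."
p = token_likelihoods(article, summary)
# Crude proxy only: higher mean log-likelihood ~ lower hallucination risk.
print(float(p.log().mean()))
```

Per the abstract, the actual metric reinterprets a decoder overconfidence-regularizing objective rather than averaging raw likelihoods, so this sketch only illustrates the inputs involved, not the published scoring rule.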