Improve model card: correct pipeline tag, add library_name, link to paper

#1
by nielsr - opened
Files changed (1)
  1. README.md +7 -6
README.md CHANGED
```diff
@@ -1,24 +1,25 @@
 ---
+language:
+- en
 license: other
 license_name: cogvlm2
 license_link: https://huggingface.co/THUDM/cogvlm2-video-llama3-chat/blob/main/LICENSE
-
-language:
-- en
-pipeline_tag: text-generation
+pipeline_tag: feature-extraction
+library_name: transformers
 tags:
 - chat
 - cogvlm2
 - cogvlm--video
-
 inference: false
 ---
 
 # VisionReward-Video
 
+This repository contains the model described in the paper [VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation](https://huggingface.co/papers/2412.21059).
+
 ## Introduction
 We present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained, multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions that are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction.
 Here, we present the VisionReward-Video model.
 
 ## Using this model
-You can quickly install the Python package dependencies and run model inference via our [GitHub repository](https://github.com/THUDM/VisionReward).
+You can quickly install the Python package dependencies and run model inference via our [GitHub repository](https://github.com/THUDM/VisionReward).
```
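For intuition, the card's "series of judgment questions, linearly weighted and summed" amounts to a weighted sum over binary answers. A minimal sketch, with hypothetical question names and weights (the real questions and weights come from the paper's annotation scheme, not from this card):

```python
# Toy VisionReward-style scoring: binary judgment answers per dimension,
# combined by a linear weighted sum. Question names and weights are
# invented for illustration; see the paper for the real annotation scheme.
answers = {"stable_motion": 1, "no_flicker": 0, "follows_prompt": 1}
weights = {"stable_motion": 0.5, "no_flicker": 0.2, "follows_prompt": 0.3}

score = sum(weights[q] * answers[q] for q in answers)
print(f"score = {score:.2f}")  # 0.80
```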
 
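Since the PR sets `library_name: transformers` on what the license link and tags identify as a CogVLM2-Video-based checkpoint, loading should follow the usual `trust_remote_code` pattern. A minimal loading sketch, assuming the repo id `THUDM/VisionReward-Video` and bf16 weights (both assumptions; the supported video preprocessing and judgment-question prompts live in the GitHub repository linked above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "THUDM/VisionReward-Video"  # assumed repo id for this card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumed dtype, matching CogVLM2 releases
    trust_remote_code=True,      # checkpoint ships custom modeling code
    device_map="auto",
).eval()
```

Note that the card keeps `inference: false`, so the hosted inference widget stays disabled; the scripts in the GitHub repository remain the supported inference path.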