Improve model card: correct pipeline tag, add library_name, link to paper

#1
by nielsr - opened
Files changed (1)
  1. README.md +7 -6
README.md CHANGED
```diff
@@ -1,24 +1,25 @@
 ---
+language:
+- en
 license: other
 license_name: cogvlm2
 license_link: https://huggingface.co/THUDM/cogvlm2-video-llama3-chat/blob/main/LICENSE
-
-language:
-- en
-pipeline_tag: text-generation
+pipeline_tag: feature-extraction
+library_name: transformers
 tags:
 - chat
 - cogvlm2
 - cogvlm--video
-
 inference: false
 ---
 
 # VisionReward-Video
 
+This repository contains the model described in the paper [VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation](https://huggingface.co/papers/2412.21059).
+
 ## Introduction
 We present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained, multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions that are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction.
 Here, we present the VisionReward-Video model.
 
 ## Using this model
-You can quickly install the Python package dependencies and run model inference via our [GitHub repository](https://github.com/THUDM/VisionReward).
+You can quickly install the Python package dependencies and run model inference via our [GitHub repository](https://github.com/THUDM/VisionReward).
```
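For intuition, the card's "series of judgment questions, linearly weighted and summed" amounts to a weighted sum over binary answers. A minimal sketch, with hypothetical question names and weights (the real questions and weights come from the paper's annotation scheme, not from this card):

```python
# Toy VisionReward-style scoring: binary judgment answers per dimension,
# combined by a linear weighted sum. Question names and weights are
# invented for illustration; see the paper for the real annotation scheme.
answers = {"stable_motion": 1, "no_flicker": 0, "follows_prompt": 1}
weights = {"stable_motion": 0.5, "no_flicker": 0.2, "follows_prompt": 0.3}

score = sum(weights[q] * answers[q] for q in answers)
print(f"score = {score:.2f}")  # 0.80
```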
 
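Since the PR sets `library_name: transformers` on what the license link and tags identify as a CogVLM2-Video-based checkpoint, loading should follow the usual `trust_remote_code` pattern. A minimal loading sketch, assuming the repo id `THUDM/VisionReward-Video` and bf16 weights (both assumptions; the supported video preprocessing and judgment-question prompts live in the GitHub repository linked above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "THUDM/VisionReward-Video"  # assumed repo id for this card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumed dtype, matching CogVLM2 releases
    trust_remote_code=True,      # checkpoint ships custom modeling code
    device_map="auto",
).eval()
```

Note that the card keeps `inference: false`, so the hosted inference widget stays disabled; the scripts in the GitHub repository remain the supported inference path.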