---
language:
  - en
license: other
license_name: cogvlm2
license_link: https://huggingface.co/THUDM/cogvlm2-video-llama3-chat/blob/main/LICENSE
pipeline_tag: feature-extraction
library_name: transformers
tags:
  - chat
  - cogvlm2
  - cogvlm--video
inference: false
---

# VisionReward-Video

This repository contains the model described in the paper VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation.

## Introduction

We present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained, multi-dimensional framework. We decompose human preferences over images and videos into multiple dimensions, each represented by a series of judgment questions that are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance in video preference prediction. This repository hosts the VisionReward-Video model.
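The scoring scheme above can be sketched as follows. This is a minimal illustration, not the actual implementation: the question names and weights below are hypothetical placeholders, while the real checklist of judgment questions and the learned weights are defined in the paper and repository.

```python
# Minimal sketch of VisionReward's interpretable scoring: binary answers to
# judgment questions are linearly weighted and summed into a single score.
# Question names and weights here are hypothetical, for illustration only.

def vision_reward_score(judgments: dict, weights: dict) -> float:
    """Linearly combine binary judgment answers into a preference score."""
    return sum(weights[q] * judgments[q] for q in weights)

# 1 = "yes", 0 = "no", as answered by the VLM judge for one generated video.
judgments = {
    "is_subject_clear": 1,
    "is_motion_smooth": 0,
    "matches_prompt": 1,
}
weights = {
    "is_subject_clear": 0.5,
    "is_motion_smooth": 0.25,
    "matches_prompt": 0.25,
}

print(vision_reward_score(judgments, weights))  # 0.75
```

Because the final score is a weighted sum of explicit yes/no judgments, each dimension's contribution to the overall preference score can be inspected directly.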

## Using this model

You can quickly install the Python package dependencies and run model inference by following the instructions in our GitHub repository.