Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Paper • 2504.12626 • Published • 51
float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95