An experiment. See the repository for details: https://github.com/fakerybakery/ReverseBERT. Inspired by vec2text: https://github.com/vec2text/vec2text

Overview

Can you go from embeddings back to text?

The setup is pretty simple: take a sentence encoder and freeze it. Then train a small projection layer that maps those embeddings into "soft prompt" tokens for a language model. The LLM learns to reconstruct the original text from just those projected embeddings.
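
The projection step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes a single linear layer, uses plain Python lists so it runs without dependencies, and the names (`make_projection`, `project_to_soft_prompt`) and the toy sizes are all hypothetical. A real implementation would use a tensor library and train the weights while the encoder stays frozen.

```python
import random

def make_projection(embed_dim, num_soft_tokens, llm_dim, seed=0):
    """Randomly initialise a linear projection (hypothetical helper).

    The weight matrix has (num_soft_tokens * llm_dim) rows and embed_dim columns.
    """
    rng = random.Random(seed)
    out_dim = num_soft_tokens * llm_dim
    scale = embed_dim ** -0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(embed_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def project_to_soft_prompt(embedding, weights, bias, num_soft_tokens, llm_dim):
    """Map one sentence embedding to num_soft_tokens vectors of size llm_dim."""
    # Linear layer: flat = W @ embedding + b
    flat = [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]
    # Split the flat output into one vector per soft prompt token.
    return [flat[i * llm_dim:(i + 1) * llm_dim] for i in range(num_soft_tokens)]

# Toy sizes chosen for illustration only (assumptions, not the model's real sizes).
EMBED_DIM, NUM_SOFT_TOKENS, LLM_DIM = 8, 4, 6

weights, bias = make_projection(EMBED_DIM, NUM_SOFT_TOKENS, LLM_DIM)
# Stand-in for the output of the frozen sentence encoder.
sentence_embedding = [0.1 * i for i in range(EMBED_DIM)]

soft_prompt = project_to_soft_prompt(
    sentence_embedding, weights, bias, NUM_SOFT_TOKENS, LLM_DIM
)
print(len(soft_prompt), len(soft_prompt[0]))
```

In a setup like this, the soft prompt vectors would be placed in front of the target text's token embeddings, and the language model would be trained to predict the original text from them.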

It's far from perfect. You probably can't reconstruct the exact meaning of the text, but you can get the general idea/vibe of the original input.

Usage

See: https://github.com/fakerybakery/ReverseBERT/blob/main/infer.py

Reconstruction samples

| Original | Reconstructed |
| --- | --- |
| Young adult male, expressing contemplation and sadness, speaking in a calm, reflective tone with a steady pace and falling intonation. | young adult male, expressing Contemplation and Sadness, speaking in a calm, reflective tone with a steady pace and clear articulation. |
| Young adult male, expressing joy and excitement, speaking in an upbeat, energetic tone with a quick pace and rising intonation. | young adult male, expressing high Elation and Amusement, speaking in a cheerful and energetic tone with a moderate pace and rising intonation. |
| Young adult male, expressing anger and frustration, speaking in a tense, sharp tone with an uneven pace and emphatic intonation. | Young adult male, expressing Anger and Contempt, speaking in a tense, frustrated tone with a sharp, clipped pace. |
| Young adult male, expressing fear and anxiety, speaking in a hesitant, wavering tone with a rushed pace and uncertain intonation. | young adult male, expressing nervousness and confusion, speaking in a hesitant, paced tone with a slightly shaky delivery. |
| Young adult male, expressing serenity and contentment, speaking in a soft, gentle tone with a slow pace and level intonation. | young adult male, expressing calm contentment and gentle affection, speaking in a soft, soothing tone with a slow, deliberate pace. |
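
To put a rough number on how close a reconstruction is, one option is a simple word-overlap score. The sketch below computes Jaccard overlap on lowercased words for the first sample pair above. This metric is an illustrative choice, not something the project reports, and the helper names are hypothetical.

```python
import re

def word_set(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard_overlap(original, reconstructed):
    """Shared words divided by all distinct words across both texts."""
    a, b = word_set(original), word_set(reconstructed)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

original = (
    "Young adult male, expressing contemplation and sadness, speaking in a "
    "calm, reflective tone with a steady pace and falling intonation."
)
reconstructed = (
    "young adult male, expressing Contemplation and Sadness, speaking in a "
    "calm, reflective tone with a steady pace and clear articulation."
)

score = jaccard_overlap(original, reconstructed)
print(f"word overlap: {score:.2f}")
```

Word overlap only captures surface similarity. It would undercount reconstructions that paraphrase the original, so it is at best a coarse check.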

Credits

As always, huge thanks to Hugging Face 🤗 for supporting the compute used to train this model!

