WavCochCausalV64000100M

WavCoch is a causal waveform-to-cochleagram tokenizer by Greta Tuckute and Klemen Kotar.

Model Details

| Parameter | Value |
| --- | --- |
| Parameters | ~93.05M |
| Window Size | 1001 |
| Hop Length | 80 |
| Encoder Dim | 1536 |
| Vocabulary Size | 64000 |
| Includes Vocoder | False |
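
The table values imply a token rate and a per-token information bound, which the following minimal sketch works out. It assumes that Hop Length and Window Size are measured in samples at the 16 kHz rate used in the usage example; the model card does not state the units. The function names are hypothetical helpers and not part of the WavCoch API.

```python
import math

# Values copied from the Model Details table.
HOP_LENGTH = 80
WINDOW_SIZE = 1001
VOCAB_SIZE = 64000
# Taken from the usage example. ASSUMPTION: hop and window are in samples at this rate.
SAMPLING_RATE = 16000


def frames_per_second(sampling_rate=SAMPLING_RATE, hop_length=HOP_LENGTH):
    """Tokens emitted per second of audio if one token is produced per hop."""
    return sampling_rate / hop_length


def approx_num_tokens(num_samples, hop_length=HOP_LENGTH):
    """Rough token count for a clip; the exact count depends on the model's padding."""
    return math.ceil(num_samples / hop_length)


def window_ms(window_size=WINDOW_SIZE, sampling_rate=SAMPLING_RATE):
    """Duration of one analysis window in milliseconds."""
    return 1000 * window_size / sampling_rate


def bits_per_token(vocab_size=VOCAB_SIZE):
    """Upper bound on the information carried by a single token."""
    return math.log2(vocab_size)


if __name__ == "__main__":
    print(frames_per_second())      # tokens per second
    print(window_ms())              # window length in ms
    print(bits_per_token())         # bits per token (upper bound)
```

Under these assumptions the tokenizer emits about 200 tokens per second of audio, each covering a window of roughly 62.6 ms.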

Usage

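import torch
from transformers import AutoModel

# trust_remote_code=True lets transformers run the custom model code shipped in the repo.
wavcoch = AutoModel.from_pretrained(
    "TuKoResearch/WavCochCausalV64000100M",
    trust_remote_code=True,
)

# Placeholder input: one second of silence at 16 kHz.
# The (batch, samples) shape is an assumption; substitute your own audio.
waveform_tensor = torch.zeros(1, 16000)

# Waveform -> discrete token IDs.
codes = wavcoch.quantize(waveform_tensor)
# Token IDs -> cochleagram (not audio; this checkpoint has no vocoder).
coch = wavcoch.decode(codes)
# Post-FSQ embeddings for probing; hidden_states holds a single layer.
embeddings = wavcoch(
    input_values=waveform_tensor,
    output_hidden_states=True,
    sampling_rate=16000,
).hidden_states[0]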

Notes

This repo contains the WavCoch tokenizer/autoencoder only, so decode() returns a cochleagram rather than a waveform. Converting a cochleagram back to audio requires a vocoder-enabled checkpoint.

When called with output_hidden_states=True, WavCoch exposes a single hidden-state layer: the post-FSQ projected embedding sequence used for direct probing.
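
For readers unfamiliar with FSQ (finite scalar quantization), the sketch below illustrates the general idea in plain Python: each latent dimension is bounded and rounded to a small number of levels, and the per-dimension levels are combined into one token ID. This is a simplified variant for illustration, not WavCoch's implementation. The level configuration [8, 8, 8, 5, 5, 5] is an assumption chosen because its product equals the listed vocabulary size of 64000; the model card does not state which levels are used, and all function names are hypothetical.

```python
import math

# ASSUMPTION: 8*8*8*5*5*5 = 64000 matches the listed vocabulary size, but the
# model card does not say which level configuration WavCoch actually uses.
LEVELS = [8, 8, 8, 5, 5, 5]


def fsq_digits(z, levels=LEVELS):
    """Quantize each latent dimension to an integer level in [0, L - 1]."""
    digits = []
    for value, n_levels in zip(z, levels):
        bounded = math.tanh(value)                    # squash to (-1, 1)
        scaled = (bounded + 1) / 2 * (n_levels - 1)   # map to (0, L - 1)
        digits.append(round(scaled))                  # snap to nearest level
    return digits


def digits_to_code(digits, levels=LEVELS):
    """Combine per-dimension levels into one token ID (mixed-radix encoding)."""
    code = 0
    for digit, n_levels in zip(digits, levels):
        code = code * n_levels + digit
    return code


def code_to_digits(code, levels=LEVELS):
    """Invert digits_to_code: recover the per-dimension levels from a token ID."""
    digits = []
    for n_levels in reversed(levels):
        code, digit = divmod(code, n_levels)
        digits.append(digit)
    return digits[::-1]


if __name__ == "__main__":
    latent = [0.3, -1.2, 2.0, 0.0, -0.4, 5.0]  # made-up latent vector
    digits = fsq_digits(latent)
    code = digits_to_code(digits)
    print(digits, code, code_to_digits(code) == digits)
```

Because the codebook is implicit in the rounding grid, no learned codebook lookup is needed; the vocabulary size is simply the product of the per-dimension level counts.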
