| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| [LLaVA-1.5] Very low hallucination rate & weak attention correlation in "Attention Gap" experiment – Is my implementation of output_attentions correct? | 3 | 10 | February 12, 2026 |
| Gemma 3 12B: 4-bit Quantization failing/ignored in Transformers v5.1.0 (Gemma3ForConditionalGeneration) | 4 | 10 | February 11, 2026 |
| Confusion with freezing Whisper's feature encoder | 3 | 9 | February 11, 2026 |
| When using Whisper, pipeline notifies that generation_config default values have been modified, even for base models | 4 | 29 | February 8, 2026 |
| Hyperparameters vs message format prompt tuning | 2 | 25 | February 6, 2026 |
| SFT Conversation llama3-8b-Instruct fails with assistant_only_loss=True | 2 | 35 | February 5, 2026 |
| How to train T5 to distinguish task-relevant tokens from contextual noise? | 1 | 18 | February 5, 2026 |
| Fine-tuning Whisper: attention mask not set and cannot be inferred | 5 | 6171 | February 4, 2026 |
| Abnormal generation after multi-GPU | 4 | 33 | February 4, 2026 |
| 500 Internal Error - We're working hard to fix this as soon as possible | 46 | 3144 | February 1, 2026 |
| Caching image prototype embeddings for image-guided object detection using OWL-ViT | 3 | 490 | January 31, 2026 |
| [Question] How to specify 'model_type' of 'Qwen/Qwen3-VL-8B-Instruct-GGUF'? | 4 | 43 | January 30, 2026 |
| SAM3Video: CLIPTextModelOutput passed as tensor causes crash with text prompts | 0 | 38 | January 29, 2026 |
| Different lm_head size and vocab_size | 1 | 914 | January 28, 2026 |
| Custom KV Cache Steering Implementation Fails with IndexError in LLaVA Generation | 1 | 17 | January 28, 2026 |
| Transformers v5 timelines | 1 | 38 | January 28, 2026 |
| Issue: Discrepancy Between Layer-Wise Density Plots vs. Mean Trajectory Plots in LLaVA-1.5 Attention Analysis | 2 | 18 | January 25, 2026 |
| [Discussion] Validating Attention Map Visualization for Visual Fading in LLaVA-1.5 | 4 | 41 | January 23, 2026 |
| No fix for high vulnerabilities in the latest transformers package | 2 | 35 | January 22, 2026 |
| How to disable caching in .from_pretrained() | 4 | 1250 | January 18, 2026 |
| DetLLM – Deterministic Inference Checks | 0 | 25 | January 17, 2026 |
| Distributed LLaMA Inference Engine Built from Scratch (KV Cache, GQA, RoPE) | 0 | 29 | January 16, 2026 |
| Run name issue: run name differs between webpage and local file | 1 | 91 | January 16, 2026 |
| Whisper fine-tuned with custom tokens works with model.generate but doesn't with a pipeline() | 3 | 53 | January 14, 2026 |
| GPT-2 finetuning peaks at 8 GiB of VRAM | 7 | 92 | January 12, 2026 |
| model_accepts_loss_kwargs detection based on `**kwargs` is too permissive | 2 | 276 | January 5, 2026 |
| Seeking Advice🔥🔥 \| Strategy for Embedding Multiple Subjective Reviews in One-time Event Domain Recommendations | 2 | 50 | January 23, 2026 |
| TurboTensors: Optimizing CPU LLM Performance | 0 | 27 | December 31, 2025 |
| Significant generation degradation and repetition loops when enabling KV-cache for Qwen3-VL | 2 | 130 | December 29, 2025 |
| Injecting multimodal embeddings into a language model breaks the `generate` function | 1 | 93 | December 28, 2025 |