| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| [LLaVA-1.5] Very low hallucination rate & weak attention correlation in "Attention Gap" experiment – Is my implementation of output_attentions correct? | 3 | 10 | February 12, 2026 |
| Gemma 3 12B: 4-bit Quantization failing/ignored in Transformers v5.1.0 (Gemma3ForConditionalGeneration) | 4 | 10 | February 11, 2026 |
| Confusion with freezing Whisper's feature encoder | 3 | 9 | February 11, 2026 |
| When using Whisper, pipeline notifies that generation_config default values have been modified, even for base models | 4 | 29 | February 8, 2026 |
| Hyperparameters vs message format prompt tuning | 2 | 25 | February 6, 2026 |
| SFT Conversation llama3-8b-Instruct fails with assistant_only_loss=True | 2 | 35 | February 5, 2026 |
| How to train T5 to distinguish task-relevant tokens from contextual noise? | 1 | 18 | February 5, 2026 |
| Fine-tuning Whisper: attention mask not set and cannot be inferred | 5 | 6171 | February 4, 2026 |
| Abnormal generation after multi-GPU | 4 | 33 | February 4, 2026 |
| 500 Internal Error - We're working hard to fix this as soon as possible | 46 | 3144 | February 1, 2026 |
| Caching image prototype embeddings for image-guided object detection using OWL-ViT | 3 | 490 | January 31, 2026 |
| [Question] How to specify 'model_type' of 'Qwen/Qwen3-VL-8B-Instruct-GGUF'? | 4 | 43 | January 30, 2026 |
| SAM3Video: CLIPTextModelOutput passed as tensor causes crash with text prompts | 0 | 38 | January 29, 2026 |
| Different lm_head size and vocab_size | 1 | 914 | January 28, 2026 |
| Custom KV Cache Steering Implementation Fails with IndexError in LLaVA Generation | 1 | 17 | January 28, 2026 |
| Transformers v5 timelines | 1 | 38 | January 28, 2026 |
| Issue: Discrepancy Between Layer-Wise Density Plots vs. Mean Trajectory Plots in LLaVA-1.5 Attention Analysis | 2 | 18 | January 25, 2026 |
| [Discussion] Validating Attention Map Visualization for Visual Fading in LLaVA-1.5 | 4 | 41 | January 23, 2026 |
| No fix for high vulnerabilities in the latest transformers package | 2 | 35 | January 22, 2026 |
| How to disable caching in .from_pretrained() | 4 | 1250 | January 18, 2026 |
| DetLLM – Deterministic Inference Checks | 0 | 25 | January 17, 2026 |
| Distributed LLaMA Inference Engine Built from Scratch (KV Cache, GQA, RoPE) | 0 | 29 | January 16, 2026 |
| Run name issue: run name differs between webpage and local file | 1 | 91 | January 16, 2026 |
| Whisper fine-tuned with custom tokens works with model.generate but doesn't with a pipeline() | 3 | 53 | January 14, 2026 |
| GPT-2 finetuning peaks at 8 GiB of VRAM | 7 | 92 | January 12, 2026 |
| model_accepts_loss_kwargs detection based on `**kwargs` is too permissive | 2 | 276 | January 5, 2026 |
| Seeking Advice🔥🔥 \| Strategy for Embedding Multiple Subjective Reviews in One-time Event Domain Recommendations | 2 | 50 | January 23, 2026 |
| TurboTensors: Optimizing CPU LLM Performance | 0 | 27 | December 31, 2025 |
| Significant generation degradation and repetition loops when enabling KV-cache for Qwen3-VL | 2 | 130 | December 29, 2025 |
| Injecting multimodal embeddings into a language model breaks the `generate` function | 1 | 93 | December 28, 2025 |