New Refresh with added Tool Calling in calibration dataset and improved imatrix

#21
by danielhanchen - opened
Unsloth AI org

Just a refresh! Adds more tool calling to the calibration process and a better imatrix process.
Also updated the chat template, since Qwen changed it a bit.


Don't know what you guys have changed with your quants, but it is not good. I just tried out your refresh of Qwen 3 Coder and it behaves badly. In fact, it behaves just like your GLM 4.7 Flash quants, which is not a compliment. I installed Qwen3-Coder-30B-A3B-Instruct-UD-Q6_K_XL.gguf and ran it with llama.cpp b7872 on Vulkan. The old version (Aug. 4?) worked excellently! I have since reinstalled it. Here are the issues, and I'll give some prompts so you can compare the two yourself.

  • the new models have problems for lack of a better term "concentrating". If you give them a prompt with four sentences they will ignore at least one or more sentences. If you say don't round the last digit sometimes they will sometimes they wont.
  • the new models lack attention to visual detail of output. When they output a calendar for instance they might put a newline after every single character in the calendar, or pad with zeros, or eliminate spaces, or left justify everything. The old version just produced a nice calendar.
  • the new models can't focus. You ask them to fix something in the code they produced and they will blame something unrelated and try to fix it. For instance you tell them they are rounding the last digit and they change the precision. You can even provide them the offending line of code and they just ignore your prompt. The old version rarely needed to fix the code and one-shotted easy prompts. When they did fix code they just fix the real issue.

The weird thing is that your new Qwen quant here behaves exactly like the new GLM 4.7 Flash quant. Those I have tested on ROCm, Vulkan, and CUDA; all exhibit the same behavior.

So try these simple prompts and compare the output on the refreshed Qwen3 Coder and the older August version:

  • write a python program to print a calendar of the current month

  • write a python program that prints a calendar of the current month. Highlight the current date in blue on linux terminals. Use only standard libraries.

  • write a python program to compute the first 100 digits of pi using bbp. Use only standard libraries. You may use decimal. Do not round the last digit.

Hopefully you get this worked out, as your stuff before GLM 4.7 Flash has been outstanding!

I use llama-swap and here is the command line for Coder:

--jinja --temp 0.7 --min-p 0.0 --top-p 0.80 --top-k 20 --repeat-penalty 1.05 -c 32768 --no-direct-io -np 1

Yeah, I'm in the same boat. As an agentic coding tool, it was working in https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/tree/7ce945e58ed3f09f9cf9c33a2122d86ac979b457 but now it responds only with 'GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG'

I re-downloaded 7ce945e58ed3f09f9cf9c33a2122d86ac979b457 and used the gguf directly and it's working again.

I previously loaded it with:
llama-server -fa -ngl 99 --jinja --port 9001 --host 0.0.0.0 --ctx-size 120000 -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q8_0

Now I'm loading it with:
llama-server -fa -ngl 99 --jinja --port 9001 --host 0.0.0.0 --ctx-size 120000 -m Qwen3-Coder-30B-A3B-Instruct-Q8_0.gguf against version 7ce945e58ed3f09f9cf9c33a2122d86ac979b457

Unsloth AI org

Oh this is unexpected sorry everyone - let me recheck and see what happened

Unsloth AI org

I reverted it for now - I'll do more testing on our side - sorry again guys

That is weird, because I used it yesterday for a few hours with opencode without any issue, and it was actually doing really well!
It was the Q4_XL uploaded at https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/commit/75c8d15549f5509d1cf941d8ae429f909e0f9bd9
Going to try the new one.

Oh @danielhanchen, I see the sizes are closer to what they were before yesterday's uploads (which were a bit bigger than before for the UD quants)! Did you revert the dynamic quantization approach and keep only the new calibration imatrix and template changes?

At least for me it is working correctly again with the Q8 quants after pointing it back to the latest in the repository.

(It did write the correct stored procedures after that last bit of guidance)

So try these simple prompts and compare the output on the refreshed Qwen3 Coder and the older August version:

I'm using GLM Flash UD-Q4_K_XL.gguf and loving it (using it in llama-cli on Vulkan, because that works best in my experience :). Tried your prompts, and here is the result in the VS Code terminal:


With the second prompt, the model got confused by something and started thinking endlessly, but after regenerating, the thought process was very short.
-ngl 99 --multiline-input --jinja --temp 0.7 --top-p 1.0 --min-p 0.01 --threads -1 --parallel 1 --ctx-size 16384 --no-mmap
I already managed to create a chess game against a bot from scratch, step by step, in HTML, so to me it seems to work well. A 2D game with a board and pieces, of course, not just terminal output πŸ˜ƒ

It's a bit off-topic, but you talked about GLM Flash, so...

@danielhanchen, I know you folks must be busy, but the lessons from this one might make a really interesting blog post once you figure out what went wrong.
