New Refresh with added Tool Calling in calibration dataset and improved imatrix

#21
by danielhanchen - opened
Unsloth AI org

Just a refresh! Adds more tool calling to the calibration process and a better imatrix process.
Also updated the chat template, since Qwen changed it a bit.


Don't know what you guys have changed with your quants, but it is not good. I just tried out your refresh of Qwen 3 Coder and it behaves badly. In fact, it behaves just like your GLM 4.7 Flash quants, which is not a compliment. I installed Qwen3-Coder-30B-A3B-Instruct-UD-Q6_K_XL.gguf and ran it with llama.cpp b7872 on Vulkan. The old version (Aug. 4?) worked excellently! I have since reinstalled it. Here are the issues, and I'll give some prompts so you can compare the two yourself.

  • the new models have problems for lack of a better term "concentrating". If you give them a prompt with four sentences they will ignore at least one or more sentences. If you say don't round the last digit sometimes they will sometimes they wont.
  • the new models lack attention to visual detail of output. When they output a calendar for instance they might put a newline after every single character in the calendar, or pad with zeros, or eliminate spaces, or left justify everything. The old version just produced a nice calendar.
  • the new models can't focus. You ask them to fix something in the code they produced and they will blame something unrelated and try to fix it. For instance you tell them they are rounding the last digit and they change the precision. You can even provide them the offending line of code and they just ignore your prompt. The old version rarely needed to fix the code and one-shotted easy prompts. When they did fix code they just fix the real issue.

The weird thing is that your new Qwen quant here behaves exactly like the new GLM 4.7 Flash quant. Those I have tested on ROCm, Vulkan, and CUDA; all exhibit the same behavior.

So try these simple prompts and compare the output on the refreshed Qwen3 Coder and the older August version:

  • write a python program to print a calendar of the current month

  • write a python program that prints a calendar of the current month. Highlight the current date in blue on linux terminals. Use only standard libraries.

  • write a python program to compute the first 100 digits of pi using bbp. Use only standard libraries. You may use decimal. Do not round the last digit.

Hopefully you get this worked out, as your stuff before GLM 4.7 Flash has been outstanding!

I use llama-swap and here is the command line for Coder:

--jinja --temp 0.7 --min-p 0.0 --top-p 0.80 --top-k 20 --repeat-penalty 1.05 -c 32768 --no-direct-io -np 1

Yeah, I'm in the same boat. As an agentic coding tool, it was working in https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/tree/7ce945e58ed3f09f9cf9c33a2122d86ac979b457 but now it responds only with 'GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG'

I re-downloaded 7ce945e58ed3f09f9cf9c33a2122d86ac979b457 and used the gguf directly and it's working again.

I previously loaded it with:
llama-server -fa -ngl 99 --jinja --port 9001 --host 0.0.0.0 --ctx-size 120000 -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q8_0

Now I'm loading it with:
llama-server -fa -ngl 99 --jinja --port 9001 --host 0.0.0.0 --ctx-size 120000 -m Qwen3-Coder-30B-A3B-Instruct-Q8_0.gguf against version 7ce945e58ed3f09f9cf9c33a2122d86ac979b457

Unsloth AI org

Oh this is unexpected sorry everyone - let me recheck and see what happened

Unsloth AI org

I reverted it for now - I'll do more testing on our side - sorry again guys

That is weird, because I used it yesterday for a few hours with opencode without any issue, and it was actually doing really well!
It was the Q4_XL uploaded at https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/commit/75c8d15549f5509d1cf941d8ae429f909e0f9bd9
Going to try the new one.

Oh @danielhanchen, I see the sizes are closer to what they were before yesterday's uploads (which were a bit bigger than before for the UD quants)! Did you revert the dynamic quantization approach and keep only the new calibration imatrix and template changes?

At least for me it is working correctly again with the Q8 quants after pointing it back to the latest in the repository.

(It did write the correct stored procedures after that last bit of guidance)

So try these simple prompts and compare the output on the refreshed Qwen3 Coder and the older August version:

I'm using GLM Flash UD-Q4_K_XL.gguf and loving it (using it in llama-cli on Vulkan, because that works best in my experience :). Tried your prompts, and here is the result in the VS Code terminal:


With the second prompt, the model got confused by something and started thinking endlessly, but after regenerating, the thought process was very short.
-ngl 99 --multiline-input --jinja --temp 0.7 --top-p 1.0 --min-p 0.01 --threads -1 --parallel 1 --ctx-size 16384 --no-mmap
I already managed to create a chess game against a bot from scratch, step by step, in HTML, so to me it seems to work well. A 2D game with a board and pieces, of course, not just terminal output πŸ˜ƒ

It's a bit off-topic, but you talked about GLM Flash, so...

@danielhanchen, I know you folks must be busy, but the lessons from this one might make a really interesting blog post once you figure out what went wrong.
