Please update llama.cpp to see improved performance!
Hey guys, please update llama.cpp to pick up the latest changes from 2 days ago. According to many people and our own tests, you should see large improvements in Devstral 2 and similar models for use cases like tool calling. Looping should also be reduced.
We'll be reconverting today and everything should be reuploaded by tomorrow.
See this pull request and this issue:
https://github.com/ggml-org/llama.cpp/pull/17945
https://github.com/ggml-org/llama.cpp/issues/17980
When can we tell that it's been updated?
Just testing some arbitrary PHP code with Devstral 2 Small, Roo Code, and an updated llama.cpp; it works great so far (also better than Qwen3 Coder and Devstral 2507).
Tested in Roo Code with a Tetris game; it worked, no errors.
What backend do you use? I can't get Roo Code to work; it tells me the model doesn't support tool calling...
I tried koboldcpp with jinja and jinja-for-tools enabled, and got:
The model returned no assistant messages. This may indicate an issue with the API or the model's output. - koboldcpp over OpenAI Compatible
Date/time: 2026-01-10T17:03:04.836Z
Extension version: 3.39.2
Provider: ollama
Model: MODELFILE
The model provided text/reasoning but did not call any of the required tools. This usually indicates the model misunderstood the task or is having difficulty determining which tool to use. The model has been automatically prompted to retry with proper tool usage. - koboldcpp over ollama
Then I tried ollama with the same Modelfile and:
PARAMETER num_ctx 65536
PARAMETER temperature 0.7
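For context, those two lines belong in an Ollama Modelfile. A minimal sketch of the full file (the FROM path and quant filename here are placeholders, not the actual file used above):

```
# Minimal Ollama Modelfile sketch; the GGUF filename is hypothetical.
FROM ./Devstral-2-Small-Q4_K_M.gguf

# Context window and sampling temperature, as in the post above.
PARAMETER num_ctx 65536
PARAMETER temperature 0.7
```

Create the model with `ollama create devstral2-small -f Modelfile` and run it with `ollama run devstral2-small`.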
If you're using ollama, which quant did you use, and did you install it through ollama itself?
If you're using llama.cpp, can you tell me how? I've never served a model with it.
I barely use llama.cpp/ollama, only tabbyapi/koboldcpp.
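For the question above about serving a model with llama.cpp: a minimal sketch using its bundled OpenAI-compatible server, llama-server. The model filename and port here are placeholders; `--jinja` applies the model's chat template, which tool calling relies on.

```shell
# Serve a GGUF over llama.cpp's OpenAI-compatible HTTP server.
# -c sets the context length (matching num_ctx in the Modelfile above);
# --jinja enables the model's chat template, needed for tool calling.
llama-server -m ./Devstral-2-Small-Q4_K_M.gguf -c 65536 --jinja --port 8080

# Then point Roo Code at http://localhost:8080/v1 as an
# OpenAI-compatible provider.
```

This is just a sketch; check `llama-server --help` for the flags available in your build.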