Please update llama.cpp to see improved performance!

#7
by danielhanchen - opened

Hey guys, please update llama.cpp to use the latest updates from 2 days ago. According to many people's reports and our own tests, you should see large improvements in Devstral 2 and other models for use cases like tool calling. Looping should also be reduced.

We'll be reconverting today and all should be reuploaded by tomorrow.

See these 2 pull requests and issues:
https://github.com/ggml-org/llama.cpp/pull/17945
https://github.com/ggml-org/llama.cpp/issues/17980


How can we tell that it's been updated?

just testing some arbitrary PHP code with Devstral 2 Small, Roo code and updated llama.cpp, works great so far (also better than Qwen3 Coder and Devstral 2507)

tested in roo code for a tetris game, it worked, no errors

What backend do you use? I can't get Roo Code to work; it tells me the model doesn't support tool calling...
Tried koboldcpp with jinja and jinja for tools enabled:

The model returned no assistant messages. This may indicate an issue with the API or the model's output. - koboldcpp over OpenAI Compatible

Date/time: 2026-01-10T17:03:04.836Z
Extension version: 3.39.2
Provider: ollama
Model: MODELFILE

The model provided text/reasoning but did not call any of the required tools. This usually indicates the model misunderstood the task or is having difficulty determining which tool to use. The model has been automatically prompted to retry with proper tool usage. - koboldcpp over ollama

and ollama afterwards with the same file and:

PARAMETER num_ctx 65536
PARAMETER temperature 0.7
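For anyone else trying this with ollama: the two PARAMETER lines above go into a Modelfile together with a FROM line pointing at the GGUF. A minimal sketch (the file path and model name below are placeholders, not the actual quant used in this thread):

```
# Modelfile — FROM path is a placeholder for whichever GGUF quant you downloaded
FROM ./devstral-2-small.gguf
PARAMETER num_ctx 65536
PARAMETER temperature 0.7
```

Then `ollama create devstral2 -f Modelfile` registers the model and `ollama run devstral2` runs it.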

If you're using ollama, which quant did you use, and did you install it through ollama itself?

If you're using llama.cpp, can you tell me how? I've never served a model with it.
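llama.cpp ships an OpenAI-compatible HTTP server (`llama-server`), so serving a GGUF is a single command. A minimal sketch, assuming you've already built or installed the llama.cpp binaries; the model filename below is a placeholder for your own quant:

```shell
# Serve a local GGUF with an OpenAI-compatible API on port 8080.
# --jinja enables the model's embedded chat template (needed for tool calling);
# -c sets the context window size.
llama-server -m ./devstral-2-small.gguf -c 65536 --jinja --port 8080
```

Then point Roo Code (or any OpenAI-compatible client) at http://localhost:8080/v1.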

I barely use llama.cpp/ollama, only tabbyapi/koboldcpp.
