Kodify-Nano-GGUF
Kodify-Nano-GGUF is the GGUF version of MTSAIR/Kodify-Nano, optimized for CPU/GPU inference with Ollama or llama.cpp. It is a lightweight LLM for code development tasks with minimal resource requirements.
Using the Image
You can run Kodify Nano on OLLAMA in two ways:
- Using Docker
- Locally (provides faster responses than Docker)
Method 1: Running Kodify Nano on OLLAMA in Docker
Without NVIDIA GPU:
docker run -e OLLAMA_HOST=0.0.0.0:8985 -p 8985:8985 --name ollama -d ollama/ollama
With NVIDIA GPU:
docker run --runtime nvidia -e OLLAMA_HOST=0.0.0.0:8985 -p 8985:8985 --name ollama -d ollama/ollama
Important:
- Ensure Docker is installed and running
- If port 8985 is occupied, replace it with any available port in the command above and update the plugin configuration to match
Load the model:
docker exec ollama ollama pull hf.co/MTSAIR/Kodify-Nano-GGUF
Rename the model:
docker exec ollama ollama cp hf.co/MTSAIR/Kodify-Nano-GGUF kodify_nano
Start the model:
docker exec ollama ollama run kodify_nano
Method 2: Local Kodify Nano on OLLAMA
- Download OLLAMA:
https://ollama.com/download
- Set the port:
export OLLAMA_HOST=0.0.0.0:8985
Note: If port 8985 is occupied, replace it and update plugin configuration
- Start OLLAMA server:
ollama serve &
- Download the model:
ollama pull hf.co/MTSAIR/Kodify-Nano-GGUF
- Rename the model:
ollama cp hf.co/MTSAIR/Kodify-Nano-GGUF kodify_nano
- Run the model:
ollama run kodify_nano
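Whichever method you used, it can help to confirm that the server answers on port 8985 and that the renamed model is registered before configuring the plugin. The sketch below is one way to do that with the Python standard library. It assumes the server is reachable at http://localhost:8985 and relies on Ollama's `/api/tags` endpoint, which lists locally available models; the helper names are illustrative, not part of Kodify or Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:8985"  # assumption: same port as OLLAMA_HOST above
MODEL_NAME = "kodify_nano"            # name assigned with `ollama cp` above


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(names: list[str], wanted: str) -> bool:
    """Ollama appends a tag such as ':latest', so compare the part before ':'."""
    return any(n.split(":", 1)[0] == wanted for n in names)


def check_server(base_url: str = OLLAMA_URL) -> bool:
    """Query the running server; raises URLError if nothing listens on the port."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return has_model(installed_models(payload), MODEL_NAME)
```

With the server running, `check_server()` returns `True` once the `ollama cp` step has completed. If you picked a different port, change `OLLAMA_URL` accordingly.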
Plugin Installation
For Visual Studio Code
- Download the latest Kodify plugin for VS Code.
- Open the Extensions panel on the left sidebar.
- Click Install from VSIX... and select the downloaded plugin file.
For JetBrains IDEs
- Download the latest Kodify plugin for JetBrains.
- Open the IDE and go to Settings > Plugins.
- Click the gear icon (⚙️) and select Install Plugin from Disk....
- Choose the downloaded plugin file.
- Restart the IDE when prompted.
Changing the Port in Plugin Settings (for Visual Studio Code and JetBrains)
If you changed the port from 8985 (in the Docker command or in OLLAMA_HOST), update the plugin's config.json:
- Open any file in the IDE.
- Open the Kodify sidebar:
  - VS Code: Ctrl+L (Cmd+L on Mac).
  - JetBrains: Ctrl+J (Cmd+J on Mac).
- Access the config.json file:
  - Method 1: Click Open Settings (VS Code) or Kodify Config (JetBrains), then navigate to Configuration > Chat Settings > Open Config File.
  - Method 2: Click the gear icon (⚙️) in the Kodify sidebar.
- Modify the apiBase port under tabAutocompleteModel and models.
- Save the file (Ctrl+S or File > Save).
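If you prefer to script the port change, the edit described above can be sketched in Python. This is a minimal sketch under assumptions: it takes `models` to be a list of objects and `tabAutocompleteModel` to be a single object, each carrying an `apiBase` value that is a full URL such as `http://localhost:8985`. The actual layout of your config.json may differ, and the function names are hypothetical.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def with_port(url: str, port: int) -> str:
    """Return url with its port replaced, keeping scheme, host and path."""
    parts = urlsplit(url)
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def set_api_port(config: dict, port: int) -> dict:
    """Update apiBase under tabAutocompleteModel and every entry of models."""
    # Assumed layout: "models" is a list of objects, "tabAutocompleteModel" is one object.
    entries = list(config.get("models", []))
    if isinstance(config.get("tabAutocompleteModel"), dict):
        entries.append(config["tabAutocompleteModel"])
    for entry in entries:
        if "apiBase" in entry:
            entry["apiBase"] = with_port(entry["apiBase"], port)
    return config


def update_file(path: str, port: int) -> None:
    """Rewrite the config file in place; path is the file opened via Open Config File."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(set_api_port(config, port), fh, indent=2)
```

Calling `update_file(path, 9000)` with the path of your config.json would point both sections at port 9000. Keep a backup of the file first, since the script rewrites it.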
Available quantization variants:
- Kodify_Nano_q4_k_s.gguf (balanced)
- Kodify_Nano_q8_0.gguf (high quality)
- Kodify_Nano.gguf (best quality, unquantized)
Download using huggingface_hub:
pip install huggingface-hub
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='MTSAIR/Kodify-Nano-GGUF', filename='Kodify_Nano_q4_k_s.gguf', local_dir='./models')"
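The same download can be written as a small script that picks a file by the trade-off listed above. The keys `balanced`, `high_quality` and `unquantized` are illustrative labels taken from the descriptions in the list, and the helper names are hypothetical; `hf_hub_download` is the same call used in the one-liner.

```python
# Filenames from the list above, keyed by the trade-off each one is described with.
VARIANTS = {
    "balanced": "Kodify_Nano_q4_k_s.gguf",
    "high_quality": "Kodify_Nano_q8_0.gguf",
    "unquantized": "Kodify_Nano.gguf",
}


def variant_filename(kind: str) -> str:
    """Map a trade-off label to the GGUF filename in the repository."""
    try:
        return VARIANTS[kind]
    except KeyError:
        raise ValueError(f"unknown variant {kind!r}; choose from {sorted(VARIANTS)}")


def download(kind: str = "balanced", local_dir: str = "./models") -> str:
    """Download the chosen file and return its local path."""
    # Imported here so the helpers above work without huggingface-hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="MTSAIR/Kodify-Nano-GGUF",
        filename=variant_filename(kind),
        local_dir=local_dir,
    )
```

For example, `download("high_quality")` would fetch Kodify_Nano_q8_0.gguf into ./models.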
Python Integration
Install Ollama Python library:
pip install ollama
Example code:
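import ollama

# ollama.generate talks to the address in the OLLAMA_HOST environment variable
# (default http://localhost:11434); set it to the port used above before running.
response = ollama.generate(
    model="kodify_nano",  # the name assigned with `ollama cp`
    prompt="Write a Python function to calculate factorial",
    options={
        "temperature": 0.4,
        "top_p": 0.8,
        "num_ctx": 8192,
    },
)
print(response["response"])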
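If installing the `ollama` package is not an option, the same request can be sent to the server's REST API with the standard library. The sketch below assumes the server from the setup above is listening on http://localhost:8985 and uses Ollama's `/api/generate` endpoint with streaming turned off; the function names are illustrative.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "kodify_nano",
    base_url: str = "http://localhost:8985",  # assumption: port from the setup above
) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": 0.4, "top_p": 0.8, "num_ctx": 8192},
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=120) as resp:
        return json.load(resp)["response"]
```

Passing the port explicitly avoids depending on environment variables, which can be convenient when the server runs in Docker.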
Usage Examples
response = ollama.generate(
    model="kodify_nano",
    prompt="""<s>[INST]
Write a Python function that:
1. Accepts a list of numbers
2. Returns the median value
[/INST]""",
    options={"num_predict": 512},  # Ollama's option for the output-token limit
)
print(response["response"])
### Code Refactoring
response = ollama.generate(
    model="kodify_nano",
    prompt="""<s>[INST]
Refactor this Python code:
def calc(a,b):
    s = a + b
    d = a - b
    p = a * b
    return s, d, p
[/INST]""",
    options={"temperature": 0.3},
)
print(response["response"])
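Both examples wrap the request in the same `<s>[INST] ... [/INST]` markers, so a small helper can build the prompt string. This is a convenience sketch only; `build_prompt` is a hypothetical name, and the markers are copied from the examples above rather than from a documented template.

```python
def build_prompt(instruction: str, code: str = "") -> str:
    """Wrap an instruction (and optional code) in the [INST] markers used above."""
    body = instruction.strip()
    if code:
        # Keep the code's own indentation; only trim surrounding blank lines.
        body += "\n" + code.strip("\n")
    return f"<s>[INST]\n{body}\n[/INST]"
```

The result can be passed as the `prompt` argument, for example `build_prompt("Refactor this Python code:", source)` where `source` holds the code to refactor.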