This model is a fine-tuned version of cl-nagoya/ruri-v3-70m for retrieving semantically segmented code snippets using natural language queries. The dataset used for fine-tuning consists of (Japanese and English queries) × (14 programming languages) × 1000 source files, resulting in a total of approximately 270,000 code snippet-query pairs after semantic segmentation.
Supported Natural Languages
Japanese, English
Supported Programming Languages
C, CSharp, Cpp, Go, Java, JavaScript, PHP, Python, Ruby, Rust, SQL, Bash, Swift, TypeScript
Example Usage
For more details, please refer to the original model: https://huggingface.co/cl-nagoya/ruri-v3-70m
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("tokinasin/ruri-v3-70m-code-v0.2")

# Code snippets to be retrieved (the "documents")
code_snippets = [
    """def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a""",
    """import numpy as np
def normalize(v):
    norm = np.linalg.norm(v)
    return v / norm if norm != 0 else v""",
    """def is_prime(num):
    if num < 2:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True"""
]

# Natural language queries (Japanese), one per snippet in the same order
descriptions = [
    "フィボナッチ数列を計算する関数",    # "A function that computes the Fibonacci sequence"
    "ベクトルを正規化するユーティリティ",  # "A utility that normalizes a vector"
    "整数が素数かどうか判定する関数"      # "A function that checks whether an integer is prime"
]

# Encode. Documents take the prefix "検索文書: " ("search document") and
# queries take the prefix "検索クエリ: " ("search query").
code_embeddings = model.encode(["検索文書: " + s for s in code_snippets], normalize_embeddings=True)
desc_embeddings = model.encode(["検索クエリ: " + s for s in descriptions], normalize_embeddings=True)

# Calculate similarities (rows: descriptions, columns: code snippets)
similarities = util.cos_sim(desc_embeddings, code_embeddings)

# Print results
print("\nSimilarity Matrix (Description → Code):")
print("="*60)
print(f"{'Description':<40} | Code #1 Code #2 Code #3")
print("-"*60)
for i, desc in enumerate(descriptions):
    scores = [f"{similarities[i][j]:.4f}" for j in range(3)]
    best_match = similarities[i].argmax().item() + 1
    print(f"{desc[:37]:<40} | {scores[0]} {scores[1]} {scores[2]} → Best: #{best_match}")
print("="*60)

# Check accuracy: description i should rank code snippet i first
correct = 0
for i in range(len(descriptions)):
    if similarities[i].argmax().item() == i:
        correct += 1
accuracy = correct / len(descriptions)
print(f"\nAccuracy@1: {accuracy:.2%} ({correct}/{len(descriptions)})")
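The example above only reports the single best match per query. For retrieval over a larger snippet collection, a ranked top-k list is usually more useful. The following is a minimal sketch of one way to do that, not part of the original model card. The helper names `with_prefix` and `top_k` are hypothetical, the prefixes are copied from the example above, and the toy unit vectors stand in for the output of `model.encode(..., normalize_embeddings=True)` so the snippet runs without downloading the model.

```python
import numpy as np

# Prefixes copied from the usage example above.
QUERY_PREFIX = "検索クエリ: "
DOCUMENT_PREFIX = "検索文書: "


def with_prefix(texts, prefix):
    """Prepend the given prefix to every text (hypothetical helper)."""
    return [prefix + t for t in texts]


def top_k(query_embeddings, doc_embeddings, k=2):
    """Return (indices, scores) of the k most similar documents per query.

    Assumes both inputs are L2-normalized, so the dot product equals
    the cosine similarity. This is a hypothetical helper.
    """
    q = np.asarray(query_embeddings, dtype=np.float32)
    d = np.asarray(doc_embeddings, dtype=np.float32)
    scores = q @ d.T                             # shape: (n_queries, n_docs)
    k = min(k, d.shape[0])                       # never ask for more docs than exist
    order = np.argsort(-scores, axis=1)[:, :k]   # highest score first
    return order, np.take_along_axis(scores, order, axis=1)


# Toy unit vectors standing in for real embeddings (assumption for illustration).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
queries = np.array([[1.0, 0.0], [0.0, 1.0]])

indices, scores = top_k(queries, docs, k=2)
print(indices.tolist())  # [[0, 2], [1, 2]]
```

With the real model, `queries` and `docs` would be replaced by `desc_embeddings` and `code_embeddings` from the example above, with the inputs built via `with_prefix(descriptions, QUERY_PREFIX)` and `with_prefix(code_snippets, DOCUMENT_PREFIX)`.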
Acknowledgements
This model is based on cl-nagoya/ruri-v3-70m. Many thanks to those who developed the excellent original model.
Model tree for tokinasin/ruri-v3-70m-code-v0.2
Base model: sbintuitions/modernbert-ja-70m
Finetuned: cl-nagoya/ruri-v3-pt-70m
Finetuned: cl-nagoya/ruri-v3-70m