Himeyuri v0.1 12B

Base Model

This model is built on Mistral-Nemo-Instruct-2407.

Usage Notes

Low Temperature Recommendation: As noted in the Mistral-Nemo-Instruct-2407 repository, a low temperature is recommended. I don't know exactly why, but my rough guess, based on experience, is that Mistral Nemo tends to switch scenes drastically, and a low temperature mitigates this, somewhat akin to the "slow pace" style in AI Novelist.
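
A minimal sketch of sampling settings along these lines, using the standard transformers GenerationConfig (the values mirror the full example below; the upstream Mistral-Nemo card suggests a temperature of around 0.3):

from transformers import GenerationConfig

# Conservative sampling settings for Mistral-Nemo-based models.
# A low temperature helps keep the model from switching scenes abruptly.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.35,
    top_p=1.0,
    max_new_tokens=512,
)

# Later: model.generate(input_ids, generation_config=gen_config)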

Description

This is an experimental model with significant room for improvement.

Since Japanese Mistral 7B-based models appear to have plateaued, I've been exploring other architectures, such as LLaMA 3. Currently, I find Mistral-Nemo-Instruct-2407 the most promising base for Japanese open LLMs, so I built this model on it.

Example

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; torch_dtype="auto" picks up the
# checkpoint's native precision (BF16).
tokenizer = AutoTokenizer.from_pretrained("Elizezen/Himeyuri-v0.1-12B")
model = AutoModelForCausalLM.from_pretrained(
    "Elizezen/Himeyuri-v0.1-12B",
    torch_dtype="auto",
)
model.eval()

if torch.cuda.is_available():
    model = model.to("cuda")

# Encode the opening line of a story as a plain continuation prompt
# (no chat template, since the model targets novel generation).
input_ids = tokenizer.encode(
    "ๅนดๆœซใฎใƒœใƒผใƒŠใ‚นใ‚’ๅ—ๅ–ใฃใฆๅŠ ๅฅˆๆฑŸใŒ็คพใ‹ใ‚‰ๅธฐใ‚ใ†ใจใ—ใŸใจใใงใ‚ใฃใŸใ€‚",
    add_special_tokens=True,
    return_tensors="pt",
)

# Sample a continuation with a low temperature, as recommended above.
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=512,
    temperature=0.35,
    top_p=1,
    do_sample=True,
)

# Decode only the newly generated tokens.
out = tokenizer.decode(tokens[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
print(out)

"""
output example:
 ๅพŒใ‚ใ‹ใ‚‰่‚ฉใ‚’ๅฉใ‹ใ‚ŒใฆๆŒฏใ‚Šๅ‘ใใจใ€ใใ“ใซใฏ่ฆ‹็Ÿฅใ‚‰ใฌ็”ทใŒ็ซ‹ใฃใฆใ„ใŸใ€‚ใฉใ“ใ‹ใง่ฆ‹ใŸใ“ใจใŒใ‚ใ‚‹ใ‚ˆใ†ใช้ก”็ซ‹ใกใ‚’ใ—ใฆใ„ใ‚‹ใ€‚็”ทใฏใซใ“ใ‚„ใ‹ใซ็ฌ‘ใฃใฆๅŠ ๅฅˆๆฑŸใซ่ฉฑใ—ใ‹ใ‘ใฆใใŸใ€‚
 ใ€Œใ“ใ‚“ใฐใ‚“ใฏใ€‚ๅŠ ๅฅˆๆฑŸใ•ใ‚“ใงใ™ใ‚ˆใญ๏ผŸใ€€ใกใ‚‡ใฃใจใŠ่ฉฑใ—ใ—ใŸใ„ใ“ใจใŒใ‚ใ‚‹ใ‚“ใงใ™ใ‘ใฉใ€ใ„ใ„ใงใ™ใ‹๏ผŸใ€
 ใใฎ็”ทใฎ่จ€่‘‰ใซๅŠ ๅฅˆๆฑŸใฏ้ฆ–ใ‚’ๅ‚พใ’ใชใŒใ‚‰ใ‚‚ใ€ใจใ‚Šใ‚ใˆใš่ฉฑใ‚’่žใใ“ใจใซใ—ใŸใ€‚ๅ‘จใ‚Šใซใฏ็คพๅ“กใŒใพใ ๆฎ‹ใฃใฆใ„ใ‚‹ใฎใงใ€ๅคงใใชๅ•้กŒใฏใชใ„ใ ใ‚ใ†ใจๆ€ใฃใŸใ‹ใ‚‰ใ ใ€‚
 ใ€Œไฝ•ใงใ—ใ‚‡ใ†๏ผŸใ€€็งใซไฝ•ใ‹็”จใงใ™ใ‹๏ผŸใ€
 ๅŠ ๅฅˆๆฑŸใŒใใ†่žใใจใ€็”ทใฏใซใ“ใ‚„ใ‹ใช็ฌ‘ใฟใ‚’ๆตฎใ‹ในใŸใพใพใ€ๅฝผๅฅณใฎ่€ณๅ…ƒใงๅฐๅฃฐใงๅ›ใ„ใŸใ€‚
 ใ€ŒๅฎŸใฏใ€ใ‚ใชใŸใฎๆ—ฆ้‚ฃใ•ใ‚“ใฎใ“ใจใงๅฐ‘ใ—ใŠ่ฉฑใ—ใ—ใŸใ„ใ“ใจใŒใ‚ใ‚‹ใ‚“ใงใ™ใ€‚ๅ ดๆ‰€ใ‚’ๅค‰ใˆใฆ่ฉฑใ‚’่žใ„ใฆใ„ใŸใ ใ‘ใพใ›ใ‚“ใ‹๏ผŸใ€
 ใใฎ่จ€่‘‰ใซๅŠ ๅฅˆๆฑŸใฏ้ฉšใใฎ่กจๆƒ…ใ‚’ๆตฎใ‹ในใŸใ€‚ๅคซใฎใ“ใจใฏ็คพๅ†…ใงใฏใ‚ใพใ‚Š่ฉฑใ—ใŸใใชใ„ใ€‚ใ—ใ‹ใ—ใ€็”ทใฎๆง˜ๅญใ‹ใ‚‰ใฏใ€ๅคซใฎ่บซใซไฝ•ใ‹ใ‚ใฃใŸใฎใงใฏใชใ„ใ‹ใจใ„ใ†ไธๅฎ‰ใŒใ‚ˆใŽใฃใŸใ€‚
 ใ€Œๅˆ†ใ‹ใ‚Šใพใ—ใŸใ€‚ใ˜ใ‚ƒใ‚ใ€่ฟ‘ใใฎๅ–ซ่Œถๅบ—ใซ่กŒใใพใ—ใ‚‡ใ†ใ€
 ๅŠ ๅฅˆๆฑŸใŒใใ†่จ€ใ†ใจใ€็”ทใฏ้ ทใ„ใฆๅฝผๅฅณใฎๅพŒใซใคใ„ใฆใใŸใ€‚

ไบŒไบบใฏ่ฟ‘ใใฎๅ–ซ่Œถๅบ—ใซๅ…ฅใ‚Šใ€ๅฅฅใฎๅธญใซ่…ฐใ‚’ไธ‹ใ‚ใ—ใŸใ€‚
 ใ€Œใใ‚Œใงใ€ๅคซใฎ่บซใซไฝ•ใŒใ‚ใฃใŸใ‚“ใงใ™ใ‹๏ผŸใ€
 ๅŠ ๅฅˆๆฑŸใฏ็”ทใฎ้ก”ใ‚’่ฆ‹ใชใŒใ‚‰ใ€ไธๅฎ‰ใ’ใซ่žใ„ใŸใ€‚็”ทใฏใ‚ณใƒผใƒ’ใƒผใ‚’ไธ€ๅฃ้ฃฒใ‚€ใจใ€ๅฝผๅฅณใซๅ‘ใ‹ใฃใฆ่ฉฑใ—ๅง‹ใ‚ใŸใ€‚
 ใ€ŒๅฎŸใฏใ€ใ‚ใชใŸใฎๆ—ฆ้‚ฃใ•ใ‚“ใฏใ€ใ‚ใ‚‹็ง˜ๅฏ†ใฎ็ต„็น”ใซๆ‰€ๅฑžใ—ใฆใ„ใ‚‹ใ‚“ใงใ™ใ€‚ใใฎ็ต„็น”ใฏใ€ไธ–็•Œไธญใซๅฝฑ้ŸฟๅŠ›ใ‚’ๆŒใคใ‚ˆใ†ใชๅคง่ฆๆจกใช็ต„็น”ใงใ€ๆง˜ใ€…ใชๅ›ฝๅฎถใฎ่ฆไบบใ‚„ๆœ‰ๅŠ›่€…ใŸใกใŒๅŠ ๅ…ฅใ—ใฆใ„ใ‚‹ใจ่จ€ใ‚ใ‚Œใฆใ„ใพใ™ใ€‚ๆ—ฆ้‚ฃใ•ใ‚“ใ‚‚ใใฎไธ€ๅ“กใงใ€ใ‹ใชใ‚Š้ซ˜ใ„ๅœฐไฝใซใ„ใ‚‹ใ‚ˆใ†ใงใ™ใ€
 ใใฎ่จ€่‘‰ใซๅŠ ๅฅˆๆฑŸใฏ็›ฎใ‚’ไธธใใ—ใŸใ€‚ๅคซใŒใใ‚“ใช็ต„็น”ใซๆ‰€ๅฑžใ—ใฆใ„ใ‚‹ใ ใชใ‚“ใฆใ€ๅˆ่€ณใงใ‚ใ‚‹ใ€‚
 ใ€Œใใ‚“ใช้ฆฌ้นฟใชใ€‚ใ†ใกใฎไบบใฏใŸใ ใฎใ‚ตใƒฉใƒชใƒผใƒžใƒณใงใ™ใ‚ˆใ€‚ใใ‚“ใช
"""

Intended Use

Primarily designed for novel generation. Not optimized for:

  • Role-playing (RP) scenarios
  • Instruction-based responses