
DeciDPObyBB: a 7B DeciLM fine-tune using Direct Preference Optimization (DPO)

Built by fine-tuning DeciLM-7B-instruct with DPO on the Intel Orca DPO Pairs dataset.
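
For readers new to DPO, the per-pair objective from Rafailov et al. (2023) can be sketched in plain Python. This is an illustrative sketch, not the training code used for this model: the card does not say which trainer or beta value was used, so `beta=0.1` is an assumption, and the function names `log_sigmoid` and `dpo_loss` are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (sketch; beta=0.1 is an assumption).

    Each argument is the summed log-probability of a full response given the
    prompt, under the policy being trained or the frozen reference model.
    """
    # How much more (or less) likely each response became relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen response gains ground over the rejected one.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)
```

When the policy still matches the reference, the margin is zero and the loss is log 2 (about 0.693); it falls below that as the policy raises the chosen response's likelihood relative to the rejected one.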

Built for research and learning purposes.

Usage:

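```python
import torch
import transformers
from transformers import AutoTokenizer

new_model = "rohansolo/DeciDPObyBB"
user_input = "What is the capital of France?"  # example question; replace with your own

tokenizer = AutoTokenizer.from_pretrained(new_model)
# DeciLM uses a custom architecture, so trust_remote_code=True is required;
# device_map="auto" needs the accelerate package installed.
pipeline = transformers.pipeline(
    "text-generation", model=new_model, tokenizer=tokenizer,
    torch_dtype=torch.float16, device_map="auto", trust_remote_code=True,
)

message = [
    {"role": "system", "content": "You are a very helpful assistant chatbot that thinks step by step"},
    {"role": "user", "content": user_input},
]
# Render the conversation to a string that ends with the assistant-turn prefix.
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=1,
    num_beams=5,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```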
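
By default the text-generation pipeline returns the prompt followed by the completion in `generated_text`. A minimal sketch of a helper that keeps only the model's answer; the name `extract_reply` is hypothetical:

```python
def extract_reply(generated_text: str, prompt: str) -> str:
    """Return only the continuation produced after the prompt.

    If the output does not start with the prompt (for example when the
    pipeline was called with return_full_text=False), it is returned as-is,
    minus surrounding whitespace.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()
```

For example, `extract_reply(sequences[0]["generated_text"], prompt)`. Alternatively, passing `return_full_text=False` in the `pipeline(...)` call should make it return only the newly generated text.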

@misc{DeciFoundationModels,
  title = {DeciLM-7B-instruct},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-7B-instruct},
}

@misc{rafailov2023direct,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  eprint = {2305.18290},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}

More details to come soon.


Format: Safetensors • Model size: 7B params • Tensor type: F16

Dataset used to train rohansolo/DeciDPObyBB: Intel Orca DPO Pairs
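
The card does not show how the preference pairs were preprocessed. As a sketch under stated assumptions, the helper below maps one row (assumed to carry `system`, `question`, `chosen` and `rejected` fields) to the `prompt`/`chosen`/`rejected` columns that TRL's `DPOTrainer` reads. `to_dpo_record` and `toy_render` are hypothetical names, and `toy_render` only stands in for the tokenizer's real chat template.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def to_dpo_record(row: Dict[str, str],
                  render: Callable[[List[Message]], str]) -> Dict[str, str]:
    """Map one preference row to a prompt/chosen/rejected record (assumed schema).

    `render` turns a message list into a prompt string, e.g. a wrapper around
    tokenizer.apply_chat_template(..., add_generation_prompt=True, tokenize=False).
    """
    messages: List[Message] = []
    if row.get("system"):  # skip empty system prompts
        messages.append({"role": "system", "content": row["system"]})
    messages.append({"role": "user", "content": row["question"]})
    return {
        "prompt": render(messages),
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }


def toy_render(messages: List[Message]) -> str:
    """Hypothetical stand-in for a chat template, for illustration only."""
    return "".join(f"<{m['role']}>{m['content']}\n" for m in messages) + "<assistant>"
```

Rendering the prompt with the same chat template used at inference time keeps training and usage prompts aligned.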

Paper for rohansolo/DeciDPObyBB: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv 2305.18290, published May 29, 2023)