Adding Evaluation Results (#1)

- Adding Evaluation Results (c1a21523b282e1e7c1fc587d5ac5c908d058c1e2)
- Update README.md (e39d245239f253df2cad1b2598e2b429869a5a44)
- Update README.md (86cb4c41db25fa2b09bba1d7fcab58e72540ab6b)

Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)

README.md +162 -48

README.md CHANGED

@@ -1,56 +1,53 @@
 ---
 license: apache-2.0
 base_model: Felladrin/Minueza-32M-Base
 pipeline_tag: text-generation
-language:
-  - en
-datasets:
-  - HuggingFaceH4/ultrachat_200k
-  - Felladrin/ChatML-ultrachat_200k
 widget:
-  - messages:
-      - role: system
-        content: >-
-          You are a career counselor. The user will provide you with an individual
-          looking for guidance in their professional life, and your task is to assist
-          them in determining what careers they are most suited for based on their skills,
-          interests, and experience. You should also conduct research into the various
-          options available, explain the job market trends in different industries, and
-          advice on which qualifications would be beneficial for pursuing particular fields.
-      - role: user
-        content: Heya!
-      - role: assistant
-        content: Hi! How may I help you?
-      - role: user
-        content: >-
-          I am interested in developing a career in software engineering. What
-          would you recommend me to do?
-  - messages:
-      - role: user
-        content: Morning!
-      - role: assistant
-        content: Good morning! How can I help you today?
-      - role: user
-        content: Could you give me some tips for becoming a healthier person?
-  - messages:
-      - role: user
-        content: Write the specs of a game about mages in a fantasy world.
-  - messages:
-      - role: user
-        content: Tell me about the pros and cons of social media.
-  - messages:
-      - role: system
-        content: >-
-          You are a highly knowledgeable and friendly assistant.
-          Your goal is to understand and respond to user inquiries with clarity.
-          Your interactions are always respectful, helpful, and focused on
-          delivering the most accurate information to the user.
-      - role: user
-        content: Hey! Got a question for you!
-      - role: assistant
-        content: Sure! What's it?
-      - role: user
-        content: What are some potential applications for quantum computing?
 inference:
   parameters:
     max_new_tokens: 250
@@ -59,6 +56,109 @@ inference:
     top_p: 0.55
     top_k: 35
     repetition_penalty: 1.176
 ---
 # Minueza-32M-UltraChat: A chat model with 32 million parameters
@@ -145,3 +245,17 @@ This model was trained with [SFTTrainer](https://huggingface.co/docs/trl/main/en
 | Optimizer              | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
 | Scheduler              | cosine                                        |
 | Seed                   | 42                                            |

 ---
+language:
+- en
 license: apache-2.0
+datasets:
+- HuggingFaceH4/ultrachat_200k
+- Felladrin/ChatML-ultrachat_200k
 base_model: Felladrin/Minueza-32M-Base
 pipeline_tag: text-generation
 widget:
+- messages:
+  - role: system
+    content: You are a career counselor. The user will provide you with an individual
+      looking for guidance in their professional life, and your task is to assist
+      them in determining what careers they are most suited for based on their skills,
+      interests, and experience. You should also conduct research into the various
+      options available, explain the job market trends in different industries, and
+      advice on which qualifications would be beneficial for pursuing particular fields.
+  - role: user
+    content: Heya!
+  - role: assistant
+    content: Hi! How may I help you?
+  - role: user
+    content: I am interested in developing a career in software engineering. What
+      would you recommend me to do?
+- messages:
+  - role: user
+    content: Morning!
+  - role: assistant
+    content: Good morning! How can I help you today?
+  - role: user
+    content: Could you give me some tips for becoming a healthier person?
+- messages:
+  - role: user
+    content: Write the specs of a game about mages in a fantasy world.
+- messages:
+  - role: user
+    content: Tell me about the pros and cons of social media.
+- messages:
+  - role: system
+    content: You are a highly knowledgeable and friendly assistant. Your goal is to
+      understand and respond to user inquiries with clarity. Your interactions are
+      always respectful, helpful, and focused on delivering the most accurate information
+      to the user.
+  - role: user
+    content: Hey! Got a question for you!
+  - role: assistant
+    content: Sure! What's it?
+  - role: user
+    content: What are some potential applications for quantum computing?
 inference:
   parameters:
     max_new_tokens: 250
     top_p: 0.55
     top_k: 35
     repetition_penalty: 1.176
+model-index:
+- name: Minueza-32M-UltraChat
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 21.08
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 26.95
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 26.08
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 47.7
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 51.78
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.23
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
 ---
 # Minueza-32M-UltraChat: A chat model with 32 million parameters
 | Optimizer              | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
 | Scheduler              | cosine                                        |
 | Seed                   | 42                                            |
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Minueza-32M-UltraChat)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |28.97|
+|AI2 Reasoning Challenge (25-Shot)|21.08|
+|HellaSwag (10-Shot)              |26.95|
+|MMLU (5-Shot)                    |26.08|
+|TruthfulQA (0-shot)              |47.70|
+|Winogrande (5-shot)              |51.78|
+|GSM8k (5-shot)                   | 0.23|
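
The `Avg.` row in the table above appears to be the unweighted mean of the six benchmark scores. A minimal Python sketch to check that arithmetic follows. It assumes a plain mean rounded to two decimals, which matches the reported value but is not confirmed by this PR; the helper name `leaderboard_average` is hypothetical and is not part of any leaderboard tooling.

```python
# Scores copied from the results table in this PR (percentages).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 21.08,
    "HellaSwag (10-Shot)": 26.95,
    "MMLU (5-Shot)": 26.08,
    "TruthfulQA (0-shot)": 47.70,
    "Winogrande (5-shot)": 51.78,
    "GSM8k (5-shot)": 0.23,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed convention)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

If a benchmark score is later updated in the `model-index` block, rerunning the sketch with the new value is a quick way to confirm that the `Avg.` row still agrees with the per-benchmark rows.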