See axolotl config

axolotl version: 0.13.0.dev0

```yaml
# --- Base Model & Tokenizer Configuration ---
base_model: allenai/Olmo-3-1025-7B
trust_remote_code: true
hub_model_id: Auditt/O37BB # Push the model to the Hugging Face Hub
chat_template_jinja: /workspace/data/model-output/chat_template.jinja # Uses the template defined in tokenizer_config.json
# --- Dataset Configuration ---
# Assumes a standard conversation format (ShareGPT/ChatML style);
# see the example record shown after this config.
datasets:
  - path: dataset-tfs-mk-IMP-SOS-processed-olmo3-think.jsonl
    type: chat_template
    field_messages: messages         # The top-level key containing the list of turns
    message_field_role: role         # The key inside each turn for 'user'/'assistant'
    message_field_content: content   # The key inside each turn for the actual text
    # Role mapping:
    # the keys (left) are what Axolotl expects,
    # the values (right) are what exist in the raw JSONL file.
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
    # Supervision: loss is calculated ONLY on the "assistant" turns.
    roles_to_train: ["assistant"]
val_set_size: 0.1 # 10% Validation, 90% Training
dataset_prepared_path: last_run_prepared
# --- Training Strategy ---
sequence_len: 60000 # Max sequence length
sample_packing: true # Efficiently packs samples to fill sequence_len
pad_to_sequence_len: true
# Supervision Settings
train_on_inputs: false # False = Mask User prompts (Supervise Assistant only)
group_by_length: false # Usually false when sample_packing is true
# --- Hyperparameters & Training Loop ---
num_epochs: 2
micro_batch_size: 1 # Keep small due to 60k context
gradient_accumulation_steps: 4 # Adjust based on desired global batch size
learning_rate: 0.00001
optimizer: adamw_torch
# --- Distributed Training & Memory ---
context_parallel_size: 2 # Splits the 60k sequence across 2 GPUs
gradient_checkpointing: true # Essential for 60k context
flash_attention: true # Essential for speed/memory at this length
# --- Logging & Evaluation ---
logging_steps: 1 # Log training loss every step
evals_per_epoch: 1 # Run eval once per epoch
#eval_strategy: epoch
#save_strategy: epoch # Save checkpoint at end of epoch
#wandb_project: olmo3-finetune # Optional: Weights & Biases logging
#wandb_entity: your-entity # Optional
output_dir: /workspace/data/model-output-base
# --- Precision ---
bf16: true # Bfloat16 is recommended for OLMo
fp16: false
tf32: true
tokens: # Add these to the tokenizer
- "π²"
- "πΎ"
- "γ"
- "π"
- "β"
- "π "
- "π"
- "πΈ"
- "β§"
- "β₯"
- "π"
- "π"
- "β"
- "π"
- "β"
- "π£"
- "π"
- "π"
- "π"
- "Ο"
- "π"
- "γ"
- "π"
- "π»"
- "π"
- "π³"
- "β "
- "π·"
- "β€"
- "π"
- "π±"
- "π"
- "β¦"
- "π"
- "β"
- "π"
- "π°"
- "Ξ΅"
# O37BB
This model is a fine-tuned version of allenai/Olmo-3-1025-7B on the dataset-tfs-mk-IMP-SOS-processed-olmo3-think.jsonl dataset.
It achieves the following results on the evaluation set:

- Loss: 0.0019
- Memory, max active (GiB): 85.95
- Memory, max allocated (GiB): 82.72
- Memory, device reserved (GiB): 93.36
## Model description
More information needed
## Intended uses & limitations
More information needed
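No intended uses are documented yet. As a minimal, unofficial sketch, the checkpoint can presumably be loaded like any Hugging Face causal LM, reusing the bf16 precision, `trust_remote_code`, and chat-template settings from the config above; the prompt and generation parameters below are assumptions, not recommendations.

```python
# Minimal inference sketch (assumes the Auditt/O37BB checkpoint is accessible
# and that its tokenizer ships the chat template referenced in the config).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Auditt/O37BB"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # the model was fine-tuned in bf16
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Hello!"}]  # illustrative prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```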
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 348
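
The total batch sizes follow from the per-device batch size, gradient accumulation, and device count: 1 × 4 × 2 = 8 for training and 1 × 2 = 2 for evaluation. Likewise, the 348 training steps correspond to the 2 configured epochs at 174 optimizer steps each.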
### Training results
| Training Loss | Epoch | Step | Validation Loss | Memory Active (GiB) | Memory Allocated (GiB) | Memory Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.0680 | 58.72 | 55.5 | 65.44 |
| 0.0647 | 0.9943 | 174 | 0.0021 | 85.95 | 82.72 | 106.04 |
| 0.0296 | 1.9943 | 348 | 0.0019 | 85.95 | 82.72 | 93.36 |
### Framework versions
- Transformers 4.57.0
- Pytorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1