High-level developer guide
High-level user guide
Troubleshooting
num_processes=1 on a 2-GPU system causes accelerate to crash with:
ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`.
On a 1-GPU node it works (with or without any num_processes arg). This is because the ACCELERATE_USE_CPU=False env var is present there, which causes device_map to be unset (see choose_device below).
def choose_device(cfg):
    def get_device():
        try:
            if torch.cuda.is_available():
                return f"cuda:{cfg.local_rank}"
            if torch.backends.mps.is_available():
                return "mps"
            raise SystemError("No CUDA/mps device found")
        except Exception:  # pylint: disable=broad-exception-caught
            return "cpu"

    cfg.device = get_device()
    if cfg.world_size == 1:
        cfg.device_map = cfg.device_map or "auto"
    else:
        if cfg.device.startswith("cuda"):
            cfg.device_map = {"": torch.cuda.current_device()}
        else:
            cfg.device_map = {"": cfg.device}

    # in `accelerate launch`, we need to not pass through any device map and let
    # accelerate figure out which parts of the model to put on which gpu
    accelerate_vars = [var for var in os.environ if var.startswith("ACCELERATE_USE_")]
    print('!', cfg.device_map, accelerate_vars)
    if accelerate_vars:
        cfg.device_map = None


def normalize_config(cfg):
    # setup some derived config / hyperparams
    cfg.gradient_accumulation_steps = cfg.gradient_accumulation_steps or (
        cfg.batch_size // cfg.micro_batch_size
    )
    cfg.batch_size = (
        cfg.batch_size or cfg.micro_batch_size * cfg.gradient_accumulation_steps
    )
    if cfg.eval_batch_size is None:
        cfg.eval_batch_size = cfg.micro_batch_size
    cfg.world_size = int(os.environ.get("WORLD_SIZE", 1))
    cfg.local_rank = int(os.environ.get("LOCAL_RANK", 0))
    cfg.eval_table_size = cfg.eval_table_size or 0
    cfg.eval_table_max_new_tokens = cfg.eval_table_max_new_tokens or 128
    choose_device(cfg)
    cfg.ddp = cfg.ddp if cfg.ddp is not None else cfg.world_size != 1
    if cfg.ddp:
        cfg.device_map = {"": int(os.environ.get("LOCAL_RANK", 0))}
        cfg.batch_size = cfg.batch_size * cfg.world_size
    print('!', cfg.ddp, cfg.device_map, cfg.world_size)
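For reference, a toy trace of the device_map outcome from the logic above under the two launch scenarios in question (a simplified sketch, not the real code path; the env-var lookup is stubbed into a boolean):

def pick_device_map(world_size, local_rank, accelerate_env_present):
    # mirrors choose_device above: single process gets "auto", DDP pins one rank per GPU
    if world_size == 1:
        device_map = "auto"
    else:
        device_map = {"": local_rank}
    # any ACCELERATE_USE_* env var means accelerate should place the model itself
    if accelerate_env_present:
        device_map = None
    return device_map

print(pick_device_map(1, 0, accelerate_env_present=False))  # 'auto' -> triggers the ValueError above
print(pick_device_map(1, 0, accelerate_env_present=True))   # None   -> the working 1-GPU-node case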
Workaround: can instead set CUDA_VISIBLE_DEVICES=0, which just works (sketch below).
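A minimal sketch of that workaround in Python (assumption: hiding the second GPU before CUDA initializes has the same effect as prefixing the launch command with CUDA_VISIBLE_DEVICES=0):

import os

# must be set before torch initializes CUDA (or exported before `accelerate launch`)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # -> 1 even on the 2-GPU box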
Scratch observations
Downloads data
Estimates total progress time somehow? Really nice!
Something to do with this?
[2023-12-29 02:48:39,409] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:195] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 188373
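Presumably: given a total token count and a packing efficiency guess, you can predict how many packed batches (and therefore steps and wall time) the run will take. A rough sketch of that idea (an assumption about the mechanism, not axolotl's actual multipack code; the sequence_len and micro_batch_size values are made up):

import math

def estimate_packed_batches(total_num_tokens, sequence_len, micro_batch_size,
                            packing_efficiency_estimate=1.0):
    # tokens that fit into one packed batch, discounted by expected packing waste
    tokens_per_batch = sequence_len * micro_batch_size * packing_efficiency_estimate
    return math.floor(total_num_tokens / tokens_per_batch)

# e.g. with the token count logged above (seq_len / micro_batch_size are assumptions):
print(estimate_packed_batches(188_373, 2048, 1, 0.94))  # -> 97 packed batches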
While training openllama-3b/lora
While training openllama-3b/qlora
When training openllama-3b/config, on 2 GPUs:
# run 1, just ddp
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 1; 23.69 GiB total capacity; 22.87 GiB already allocated; 2.81 MiB free; 23.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
# run 2
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 1; 23.68 GiB total capacity; 22.85 GiB already allocated; 26.50 MiB free; 23.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
# with deepspeed/zero1
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.38 GiB (GPU 0; 23.69 GiB total capacity; 19.17 GiB already allocated; 4.14 GiB free; 19.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
# with deepspeed/zero2 works...
On deepspeed/zero2 it does use less VRAM than without deepspeed on 1x (below). Note it also takes half the total number of steps as 1x, due to the 2x card parallelism.
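That halving is just data-parallel arithmetic: each optimizer step consumes micro_batch_size * gradient_accumulation_steps samples on every rank, so for a fixed dataset the step count scales as 1/world_size. A back-of-the-envelope sketch (ignores sample packing; the batch settings are invented, only the 54,568-example count comes from the logs below):

import math

def total_optimizer_steps(num_samples, micro_batch_size, grad_accum, world_size, epochs=1):
    # samples consumed per optimizer step, summed across all ranks
    samples_per_step = micro_batch_size * grad_accum * world_size
    return math.ceil(num_samples / samples_per_step) * epochs

print(total_optimizer_steps(54_568, 1, 4, world_size=1))  # -> 13642
print(total_optimizer_steps(54_568, 1, 4, world_size=2))  # -> 6821 (half)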
On 1x (without deepspeed) it works:
163f6110:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/openllama-3b/config.yml
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/utils/hub.py:123: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/utils/hub.py:123: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-30 05:05:18,207] [INFO] [datasets.<module>:58] [PID:298] PyTorch version 2.0.1+cu118 available.
[2023-12-30 05:05:19,114] [INFO] [axolotl.validate_config:156] [PID:298] [RANK:0] bf16 support detected, but not enabled for this configuration.
[2023-12-30 05:05:19,114] [WARNING] [axolotl.validate_config:176] [PID:298] [RANK:0] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-30 05:05:19,284] [INFO] [axolotl.normalize_config:150] [PID:298] [RANK:0] GPU memory usage baseline: 0.000GB (+0.312GB misc)
dP dP dP
88 88 88
.d8888b. dP. .dP .d8888b. 88 .d8888b. d8888P 88
88' `88 `8bd8' 88' `88 88 88' `88 88 88
88. .88 .d88b. 88. .88 88 88. .88 88 88
`88888P8 dP' `dP `88888P' dP `88888P' dP dP
[2023-12-30 05:05:19,286] [WARNING] [axolotl.scripts.check_user_token:342] [PID:298] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
[2023-12-30 05:05:19,598] [DEBUG] [axolotl.load_tokenizer:184] [PID:298] [RANK:0] EOS: 2 / </s>
[2023-12-30 05:05:19,598] [DEBUG] [axolotl.load_tokenizer:185] [PID:298] [RANK:0] BOS: 1 / <s>
[2023-12-30 05:05:19,598] [DEBUG] [axolotl.load_tokenizer:186] [PID:298] [RANK:0] PAD: 2 / </s>
[2023-12-30 05:05:19,598] [DEBUG] [axolotl.load_tokenizer:187] [PID:298] [RANK:0] UNK: 0 / <unk>
[2023-12-30 05:05:19,598] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:298] [RANK:0] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-30 05:05:19,598] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:298] [RANK:0] Loading raw datasets...
[2023-12-30 05:05:19,598] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:298] [RANK:0] No seed provided, using default seed of 42
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
[2023-12-30 05:05:23,069] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:298] [RANK:0] merging datasets
[2023-12-30 05:05:23,075] [INFO] [axolotl.load_tokenized_prepared_datasets:369] [PID:298] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
Saving the dataset (1/1 shards): 100%|__________________________________________________________________| 54568/54568 [00:00<00:00, 393168.49 examples/s]
[2023-12-30 05:05:24,422] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] total_num_tokens: 188373
[2023-12-30 05:05:24,430] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] `total_supervised_tokens: 38104`
[2023-12-30 05:05:30,373] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:298] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 188373
[2023-12-30 05:05:30,373] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] data_loader_len: 181
[2023-12-30 05:05:30,374] [INFO] [axolotl.log:60] [PID:298] [RANK:0] sample_packing_eff_est across ranks: [0.9017549402573529]
[2023-12-30 05:05:30,374] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] sample_packing_eff_est: None
[2023-12-30 05:05:30,374] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] total_num_steps: 724
[2023-12-30 05:05:30,421] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] total_num_tokens: 10733491
[2023-12-30 05:05:30,786] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] `total_supervised_tokens: 6735490`
[2023-12-30 05:05:30,982] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:298] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 10733491
[2023-12-30 05:05:30,982] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] data_loader_len: 10376
[2023-12-30 05:05:30,982] [INFO] [axolotl.log:60] [PID:298] [RANK:0] sample_packing_eff_est across ranks: [0.8629229278577015]
[2023-12-30 05:05:30,982] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] sample_packing_eff_est: 0.87
[2023-12-30 05:05:30,982] [DEBUG] [axolotl.log:60] [PID:298] [RANK:0] total_num_steps: 41504
[2023-12-30 05:05:30,990] [DEBUG] [axolotl.train.log:60] [PID:298] [RANK:0] loading tokenizer... openlm-research/open_llama_3b_v2
[2023-12-30 05:05:31,307] [DEBUG] [axolotl.load_tokenizer:184] [PID:298] [RANK:0] EOS: 2 / </s>
[2023-12-30 05:05:31,307] [DEBUG] [axolotl.load_tokenizer:185] [PID:298] [RANK:0] BOS: 1 / <s>
[2023-12-30 05:05:31,307] [DEBUG] [axolotl.load_tokenizer:186] [PID:298] [RANK:0] PAD: 2 / </s>
[2023-12-30 05:05:31,307] [DEBUG] [axolotl.load_tokenizer:187] [PID:298] [RANK:0] UNK: 0 / <unk>
Has a deterministic total number of iterations, which is the same for the same number of GPUs and the same dataset: 22,072 on both a g5.xlarge (A10G) and a runpod RTX 3090
On an 8x RTX 3090, this is divided by 8 (the same 1/world_size scaling sketched above)
No automatic checkpointing happening by default
Some important log lines
[2023-12-29 03:37:18,194] [DEBUG] [axolotl.log:60] [PID:1983] [RANK:0] total_num_tokens: 188373
[2023-12-29 03:37:18,202] [DEBUG] [axolotl.log:60] [PID:1983] [RANK:0] `total_supervised_tokens: 38104`
[2023-12-29 03:37:24,256] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:1983] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546...
[2023-12-29 03:37:25,462] [DEBUG] [axolotl.log:60] [PID:1983] [RANK:0] total_num_steps: 43
[2023-12-29 03:37:25,509] [DEBUG] [axolotl.log:60] [PID:1983] [RANK:0] total_num_tokens: 10733491
[2023-12-29 03:37:25,864] [DEBUG] [axolotl.log:60] [PID:1983] [RANK:0] `total_supervised_tokens: 6735490`
...
[2023-12-29 03:37:26,107] [DEBUG] [axolotl.log:60] [PID:1983] [RANK:0] total_num_steps: 2591
...
[2023-12-29 03:38:28,484] [INFO] [axolotl.load_model:503] [PID:1984] [RANK:1] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
...
Running a mistral full finetune crashes with OOM on the GPUs, even though each GPU has 24GB:
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in <module>
fire.Fire(do_cli)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/workspace/axolotl/src/axolotl/train.py", line 136, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1672, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/accelerator.py", line 1288, in prepare
result = tuple(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/accelerator.py", line 1289, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/accelerator.py", line 1094, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/accelerator.py", line 1433, in prepare_model
model = torch.nn.parallel.DistributedDataParallel(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 688, in __init__
self._ddp_init_helper(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 825, in _ddp_init_helper
self.reducer = dist.Reducer(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.00 GiB (GPU 2; 23.69 GiB total capacity; 14.48 GiB already allocated; 8.07 GiB free; 14.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
root@e79203620abc:/workspace/axolotl# nvidia-smi
Fri Dec 29 03:33:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| 30% 28C P8 25W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:23:00.0 Off | N/A |
| 30% 28C P8 21W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 30% 29C P8 19W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... On | 00000000:61:00.0 Off | N/A |
| 30% 27C P8 27W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce ... On | 00000000:81:00.0 Off | N/A |
| 30% 27C P8 21W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce ... On | 00000000:A1:00.0 Off | N/A |
| 30% 29C P8 25W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA GeForce ... On | 00000000:C1:00.0 Off | N/A |
| 30% 27C P8 23W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA GeForce ... On | 00000000:E1:00.0 Off | N/A |
| 30% 28C P8 24W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
OOM when trying 5,6,7,8x on examples/llama-2/fft or mistral fft in a 124GB RAM box
OOM when trying 4x 24G GPU on mistral fft
OOM when trying 6x 24G GPU on mistral fft
llama-2/fft seems to get close to working on 6x 24G GPU with 192G RAM; not sure, it got stuck somewhere early during pack estimation
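Rough back-of-the-envelope for why these full finetunes blow past 24GB per card under plain DDP (hedged: the ~7B parameter count, fp16 weights/grads, and fp32 Adam states are assumptions; activations, DDP buckets, and fragmentation come on top):

GIB = 1024 ** 3
params = 7_000_000_000                  # ~7B for mistral / llama-2 7B

weights_fp16 = params * 2               # one full copy of the weights per rank
grads_fp16 = params * 2                 # gradients in the same dtype
adam_fp32 = params * 4 * 2              # Adam exp_avg + exp_avg_sq in fp32

total = weights_fp16 + grads_fp16 + adam_fp32
print(f"{total / GIB:.0f} GiB per GPU before activations")  # -> 78 GiB, far over 24 GiB

Under DDP every rank holds all of this, which is why sharding the optimizer states (deepspeed/zero1) or optimizer states plus gradients (deepspeed/zero2) across the 4-8 cards is what starts to make these runs fit.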
Full output of a fast run:
-- RUNPOD.IO --
Enjoy your Pod #eg6eikcnopftyk ^_^
root@8510995a57b3:/workspace/axolotl#
root@8510995a57b3:/workspace/axolotl# nvidia-smi
Fri Dec 29 02:01:48 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| 30% 27C P8 25W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:23:00.0 Off | N/A |
| 30% 27C P8 21W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 30% 28C P8 19W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... On | 00000000:61:00.0 Off | N/A |
| 30% 27C P8 27W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce ... On | 00000000:81:00.0 Off | N/A |
| 30% 27C P8 20W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce ... On | 00000000:A1:00.0 Off | N/A |
| 30% 29C P8 25W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA GeForce ... On | 00000000:C1:00.0 Off | N/A |
| 30% 27C P8 23W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA GeForce ... On | 00000000:E1:00.0 Off | N/A |
| 30% 27C P8 24W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@8510995a57b3:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `8`
More than one GPU was found, enabling multi-GPU training.
If this was unintended please pass in `--num_processes=1`.
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 02:02:14,078] [INFO] [datasets.<module>:58] [PID:162] PyTorch version 2.0.1+cu118 available.
[2023-12-29 02:02:14,080] [INFO] [datasets.<module>:58] [PID:168] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 02:02:14,110] [INFO] [datasets.<module>:58] [PID:164] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 02:02:14,159] [INFO] [datasets.<module>:58] [PID:166] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 02:02:14,168] [INFO] [datasets.<module>:58] [PID:163] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 02:02:14,188] [INFO] [datasets.<module>:58] [PID:161] PyTorch version 2.0.1+cu118 available.
[2023-12-29 02:02:14,209] [INFO] [datasets.<module>:58] [PID:165] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 02:02:14,278] [INFO] [datasets.<module>:58] [PID:167] PyTorch version 2.0.1+cu118 available.
[2023-12-29 02:02:15,083] [INFO] [axolotl.validate_config:156] [PID:162] [RANK:1] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,083] [WARNING] [axolotl.validate_config:176] [PID:162] [RANK:1] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,090] [INFO] [axolotl.validate_config:156] [PID:168] [RANK:7] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,091] [WARNING] [axolotl.validate_config:176] [PID:168] [RANK:7] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,092] [INFO] [axolotl.validate_config:156] [PID:164] [RANK:3] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,092] [WARNING] [axolotl.validate_config:176] [PID:164] [RANK:3] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,150] [INFO] [axolotl.validate_config:156] [PID:166] [RANK:5] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,150] [WARNING] [axolotl.validate_config:176] [PID:166] [RANK:5] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,166] [INFO] [axolotl.validate_config:156] [PID:161] [RANK:0] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,166] [WARNING] [axolotl.validate_config:176] [PID:161] [RANK:0] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,196] [INFO] [axolotl.validate_config:156] [PID:163] [RANK:2] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,196] [WARNING] [axolotl.validate_config:176] [PID:163] [RANK:2] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,266] [INFO] [axolotl.validate_config:156] [PID:167] [RANK:6] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,266] [WARNING] [axolotl.validate_config:176] [PID:167] [RANK:6] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,320] [INFO] [axolotl.validate_config:156] [PID:165] [RANK:4] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,320] [WARNING] [axolotl.validate_config:176] [PID:165] [RANK:4] `pad_to_sequence_len: true` is recommended when using sample_packing
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 506/506 [00:00<00:00, 78.3kB/s]
[2023-12-29 02:02:15,389] [INFO] [axolotl.normalize_config:150] [PID:162] [RANK:1] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,402] [INFO] [axolotl.normalize_config:150] [PID:168] [RANK:7] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,402] [INFO] [axolotl.normalize_config:150] [PID:164] [RANK:3] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,413] [INFO] [axolotl.normalize_config:150] [PID:166] [RANK:5] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,417] [INFO] [axolotl.normalize_config:150] [PID:163] [RANK:2] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,436] [INFO] [axolotl.normalize_config:150] [PID:167] [RANK:6] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,436] [INFO] [axolotl.normalize_config:150] [PID:161] [RANK:0] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,492] [INFO] [axolotl.normalize_config:150] [PID:165] [RANK:4] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,496] [WARNING] [axolotl.scripts.check_user_token:342] [PID:165] [RANK:4] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,497] [WARNING] [axolotl.scripts.check_user_token:342] [PID:168] [RANK:7] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,498] [WARNING] [axolotl.scripts.check_user_token:342] [PID:166] [RANK:5] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,500] [WARNING] [axolotl.scripts.check_user_token:342] [PID:167] [RANK:6] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
dP dP dP
88 88 88
.d8888b. dP. .dP .d8888b. 88 .d8888b. d8888P 88
88' `88 `8bd8' 88' `88 88 88' `88 88 88
88. .88 .d88b. 88. .88 88 88. .88 88 88
`88888P8 dP' `dP `88888P' dP `88888P' dP dP
[2023-12-29 02:02:15,500] [WARNING] [axolotl.scripts.check_user_token:342] [PID:161] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,501] [WARNING] [axolotl.scripts.check_user_token:342] [PID:163] [RANK:2] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,503] [WARNING] [axolotl.scripts.check_user_token:342] [PID:162] [RANK:1] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,503] [WARNING] [axolotl.scripts.check_user_token:342] [PID:164] [RANK:3] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 593/593 [00:00<00:00, 87.7kB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 512k/512k [00:00<00:00, 47.3MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 330/330 [00:00<00:00, 284kB/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
[2023-12-29 02:02:16,769] [DEBUG] [axolotl.load_tokenizer:184] [PID:164] [RANK:3] EOS: 2 / </s>
[2023-12-29 02:02:16,770] [DEBUG] [axolotl.load_tokenizer:185] [PID:164] [RANK:3] BOS: 1 / <s>
[2023-12-29 02:02:16,770] [DEBUG] [axolotl.load_tokenizer:186] [PID:164] [RANK:3] PAD: 2 / </s>
[2023-12-29 02:02:16,770] [DEBUG] [axolotl.load_tokenizer:187] [PID:164] [RANK:3] UNK: 0 / <unk>
[2023-12-29 02:02:16,774] [DEBUG] [axolotl.load_tokenizer:184] [PID:165] [RANK:4] EOS: 2 / </s>
[2023-12-29 02:02:16,774] [DEBUG] [axolotl.load_tokenizer:185] [PID:165] [RANK:4] BOS: 1 / <s>
[2023-12-29 02:02:16,774] [DEBUG] [axolotl.load_tokenizer:186] [PID:165] [RANK:4] PAD: 2 / </s>
[2023-12-29 02:02:16,775] [DEBUG] [axolotl.load_tokenizer:187] [PID:165] [RANK:4] UNK: 0 / <unk>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:184] [PID:166] [RANK:5] EOS: 2 / </s>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:185] [PID:166] [RANK:5] BOS: 1 / <s>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:186] [PID:166] [RANK:5] PAD: 2 / </s>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:187] [PID:166] [RANK:5] UNK: 0 / <unk>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:184] [PID:161] [RANK:0] EOS: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:184] [PID:168] [RANK:7] EOS: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:185] [PID:161] [RANK:0] BOS: 1 / <s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:186] [PID:161] [RANK:0] PAD: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:185] [PID:168] [RANK:7] BOS: 1 / <s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:186] [PID:168] [RANK:7] PAD: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:187] [PID:161] [RANK:0] UNK: 0 / <unk>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:187] [PID:168] [RANK:7] UNK: 0 / <unk>
[2023-12-29 02:02:16,815] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:161] [RANK:0] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:16,815] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:161] [RANK:0] Loading raw datasets...
[2023-12-29 02:02:16,815] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:161] [RANK:0] No seed provided, using default seed of 42
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:184] [PID:162] [RANK:1] EOS: 2 / </s>
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:185] [PID:162] [RANK:1] BOS: 1 / <s>
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:186] [PID:162] [RANK:1] PAD: 2 / </s>
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:187] [PID:162] [RANK:1] UNK: 0 / <unk>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:184] [PID:163] [RANK:2] EOS: 2 / </s>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:185] [PID:163] [RANK:2] BOS: 1 / <s>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:186] [PID:163] [RANK:2] PAD: 2 / </s>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:187] [PID:163] [RANK:2] UNK: 0 / <unk>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:184] [PID:167] [RANK:6] EOS: 2 / </s>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:185] [PID:167] [RANK:6] BOS: 1 / <s>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:186] [PID:167] [RANK:6] PAD: 2 / </s>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:187] [PID:167] [RANK:6] UNK: 0 / <unk>
Downloading readme: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 501/501 [00:00<00:00, 2.40MB/s]
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 36.0M/36.0M [00:01<00:00, 31.6MB/s]
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 4.91M/4.91M [00:00<00:00, 18.3MB/s]
Generating train split: 54568 examples [00:00, 113061.60 examples/s]
Map (num_proc=64): 82%|██████████████████████████████████████████████████████████████████▋              | 44909/54568 [00:01<00:00, 36136.89 examples/s]
[2023-12-29 02:02:24,963] [WARNING] [axolotl._tokenize:66] [PID:350] [RANK:0] Empty text requested for tokenization.
Map (num_proc=64): 100%|█████████████████████████████████████████████████████████████████████████████████| 54568/54568 [00:02<00:00, 26150.53 examples/s]
[2023-12-29 02:02:25,698] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:161] [RANK:0] merging datasets
[2023-12-29 02:02:25,704] [INFO] [axolotl.load_tokenized_prepared_datasets:369] [PID:161] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
Saving the dataset (1/1 shards): 100%|██████████████████████████████████████████████████████████████████| 54568/54568 [00:00<00:00, 549048.67 examples/s]
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:165] [RANK:4] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:167] [RANK:6] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:162] [RANK:1] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:165] [RANK:4] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:165] [RANK:4] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:168] [RANK:7] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:167] [RANK:6] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:162] [RANK:1] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:163] [RANK:2] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:167] [RANK:6] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:162] [RANK:1] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:168] [RANK:7] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:166] [RANK:5] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:163] [RANK:2] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:168] [RANK:7] No seed provided, using default seed of 42
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:163] [RANK:2] No seed provided, using default seed of 42
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:166] [RANK:5] Loading raw datasets...
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:166] [RANK:5] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:164] [RANK:3] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:164] [RANK:3] Loading raw datasets...
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:164] [RANK:3] No seed provided, using default seed of 42
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Filter (num_proc=96): 65%|█████████████████████████████████████████████████▋                           | 34538/53476 [00:00<00:00, 102958.81 examples/s]
[2023-12-29 02:02:30,804] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:167] [RANK:6] merging datasets
[2023-12-29 02:02:30,820] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:162] [RANK:1] merging datasets
[2023-12-29 02:02:30,849] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:164] [RANK:3] merging datasets
Filter (num_proc=96): 100%|██████████████████████████████████████████████████████████████████████████████| 53476/53476 [00:00<00:00, 77709.35 examples/s]
[2023-12-29 02:02:30,893] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:168] [RANK:7] merging datasets
[2023-12-29 02:02:30,994] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:166] [RANK:5] merging datasets
[2023-12-29 02:02:31,001] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:163] [RANK:2] merging datasets
[2023-12-29 02:02:31,092] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:165] [RANK:4] merging datasets
Filter (num_proc=96): 100%|█████████████████████████████████████████████████████████████████████████████████| 1092/1092 [00:00<00:00, 2167.47 examples/s]
Map (num_proc=96): 100%|█████████████████████████████████████████████████████████████████████████████████| 53476/53476 [00:01<00:00, 39921.95 examples/s]
Map (num_proc=96): 100%|████████████████████████████████████████████████████████████████████████████████████| 1092/1092 [00:00<00:00, 1936.55 examples/s]
[2023-12-29 02:02:46,895] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_tokens: 188373
[2023-12-29 02:02:46,903] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] `total_supervised_tokens: 38104`
[2023-12-29 02:02:52,372] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:52,372] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] data_loader_len: 87
[2023-12-29 02:02:53,430] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,468] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,473] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,473] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,475] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,599] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,832] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,873] [INFO] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est across ranks: [0.9385612607002258, 0.9482371807098389, 0.9482371807098389, 0.9482371807098389, 0.9385612607002258, 0.9385612607002258, 0.9482371807098389, 0.9385612607002258]
[2023-12-29 02:02:53,878] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est: None
[2023-12-29 02:02:53,878] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_steps: 43
[2023-12-29 02:02:53,920] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_tokens: 10733491
[2023-12-29 02:02:54,255] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] `total_supervised_tokens: 6735490`
[2023-12-29 02:02:54,408] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,408] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] data_loader_len: 5183
[2023-12-29 02:02:54,470] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,470] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,471] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,473] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,477] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,482] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,484] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,485] [INFO] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est across ranks: [0.9308992028236389, 0.9320580363273621, 0.9318923354148865, 0.9327215552330017, 0.9300732016563416, 0.9310645461082458, 0.9299081563949585, 0.9308992028236389]
[2023-12-29 02:02:54,490] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est: 0.94
[2023-12-29 02:02:54,490] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_steps: 2591
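A few of the numbers above can be cross-checked by hand: the 1341686 "total_num_tokens per device" is just the 10733491 total tokens split over the 8 ranks, and rounding the per-rank sample_packing_eff_est values (~0.93) up to two decimals gives the 0.94 that later lines report as packing_efficiency_estimate. Rough sketch of that arithmetic (my own check, not axolotl's actual code):

import math

world_size = 8
total_num_tokens = 10_733_491
per_device_tokens = total_num_tokens // world_size  # 1341686, matches "total_num_tokens per device"

# per-rank values from the "sample_packing_eff_est across ranks" line, rounded for readability
per_rank_eff = [0.9309, 0.9321, 0.9319, 0.9327, 0.9301, 0.9311, 0.9299, 0.9309]
eff_est = math.ceil(max(per_rank_eff) * 100) / 100  # 0.94, matches sample_packing_eff_est
print(per_device_tokens, eff_est)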
[2023-12-29 02:02:54,496] [DEBUG] [axolotl.train.log:60] [PID:161] [RANK:0] loading tokenizer... openlm-research/open_llama_3b_v2
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:184] [PID:161] [RANK:0] EOS: 2 / </s>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:185] [PID:161] [RANK:0] BOS: 1 / <s>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:186] [PID:161] [RANK:0] PAD: 2 / </s>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:187] [PID:161] [RANK:0] UNK: 0 / <unk>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.train.log:60] [PID:161] [RANK:0] loading model and peft_config...
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:184] [PID:162] [RANK:1] EOS: 2 / </s>
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:185] [PID:162] [RANK:1] BOS: 1 / <s>
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:186] [PID:162] [RANK:1] PAD: 2 / </s>
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:187] [PID:162] [RANK:1] UNK: 0 / <unk>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:184] [PID:163] [RANK:2] EOS: 2 / </s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:185] [PID:163] [RANK:2] BOS: 1 / <s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:186] [PID:163] [RANK:2] PAD: 2 / </s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:187] [PID:163] [RANK:2] UNK: 0 / <unk>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:184] [PID:168] [RANK:7] EOS: 2 / </s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:185] [PID:168] [RANK:7] BOS: 1 / <s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:186] [PID:168] [RANK:7] PAD: 2 / </s>
[2023-12-29 02:02:54,829] [DEBUG] [axolotl.load_tokenizer:187] [PID:168] [RANK:7] UNK: 0 / <unk>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:184] [PID:165] [RANK:4] EOS: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:185] [PID:165] [RANK:4] BOS: 1 / <s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:186] [PID:165] [RANK:4] PAD: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:187] [PID:165] [RANK:4] UNK: 0 / <unk>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:184] [PID:167] [RANK:6] EOS: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:185] [PID:167] [RANK:6] BOS: 1 / <s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:186] [PID:167] [RANK:6] PAD: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:187] [PID:167] [RANK:6] UNK: 0 / <unk>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:184] [PID:166] [RANK:5] EOS: 2 / </s>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:185] [PID:166] [RANK:5] BOS: 1 / <s>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:186] [PID:166] [RANK:5] PAD: 2 / </s>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:187] [PID:166] [RANK:5] UNK: 0 / <unk>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:184] [PID:164] [RANK:3] EOS: 2 / </s>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:185] [PID:164] [RANK:3] BOS: 1 / <s>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:186] [PID:164] [RANK:3] PAD: 2 / </s>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:187] [PID:164] [RANK:3] UNK: 0 / <unk>
[2023-12-29 02:02:54,975] [INFO] [axolotl.load_model:232] [PID:161] [RANK:0] patching with flash attention for sample packing
[2023-12-29 02:02:54,975] [INFO] [axolotl.load_model:278] [PID:161] [RANK:0] patching _expand_mask
[2023-12-29 02:02:54,980] [INFO] [axolotl.load_model:232] [PID:162] [RANK:1] patching with flash attention for sample packing
[2023-12-29 02:02:54,980] [INFO] [axolotl.load_model:278] [PID:162] [RANK:1] patching _expand_mask
[2023-12-29 02:02:54,987] [INFO] [axolotl.load_model:232] [PID:167] [RANK:6] patching with flash attention for sample packing
[2023-12-29 02:02:54,987] [INFO] [axolotl.load_model:278] [PID:167] [RANK:6] patching _expand_mask
[2023-12-29 02:02:54,989] [INFO] [axolotl.load_model:232] [PID:168] [RANK:7] patching with flash attention for sample packing
[2023-12-29 02:02:54,990] [INFO] [axolotl.load_model:278] [PID:168] [RANK:7] patching _expand_mask
[2023-12-29 02:02:54,998] [INFO] [axolotl.load_model:232] [PID:165] [RANK:4] patching with flash attention for sample packing
[2023-12-29 02:02:54,999] [INFO] [axolotl.load_model:278] [PID:165] [RANK:4] patching _expand_mask
[2023-12-29 02:02:55,012] [INFO] [axolotl.load_model:232] [PID:163] [RANK:2] patching with flash attention for sample packing
[2023-12-29 02:02:55,012] [INFO] [axolotl.load_model:278] [PID:163] [RANK:2] patching _expand_mask
[2023-12-29 02:02:55,078] [INFO] [axolotl.load_model:232] [PID:164] [RANK:3] patching with flash attention for sample packing
[2023-12-29 02:02:55,079] [INFO] [axolotl.load_model:278] [PID:164] [RANK:3] patching _expand_mask
[2023-12-29 02:02:55,111] [INFO] [axolotl.load_model:232] [PID:166] [RANK:5] patching with flash attention for sample packing
[2023-12-29 02:02:55,112] [INFO] [axolotl.load_model:278] [PID:166] [RANK:5] patching _expand_mask
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6.85G/6.85G [00:17<00:00, 394MB/s]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 116kB/s]
[2023-12-29 02:03:21,252] [INFO] [axolotl.load_model:503] [PID:161] [RANK:0] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:21,255] [INFO] [axolotl.load_model:526] [PID:161] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:21,258] [INFO] [axolotl.load_model:538] [PID:161] [RANK:0] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:21,289] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:161] CUDA extension not installed.
[2023-12-29 02:03:21,290] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:161] CUDA extension not installed.
[2023-12-29 02:03:21,304] [INFO] [axolotl.load_model:503] [PID:165] [RANK:4] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:21,307] [INFO] [axolotl.load_model:526] [PID:165] [RANK:4] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:21,311] [INFO] [axolotl.load_model:538] [PID:165] [RANK:4] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:21,342] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:165] CUDA extension not installed.
[2023-12-29 02:03:21,343] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:165] CUDA extension not installed.
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:21,524] [INFO] [axolotl.load_model:568] [PID:161] [RANK:0] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:21,559] [INFO] [axolotl.train.log:60] [PID:161] [RANK:0] Pre-saving adapter config to ./lora-out
[2023-12-29 02:03:21,562] [INFO] [axolotl.train.log:60] [PID:161] [RANK:0] Starting trainer...
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:21,585] [INFO] [axolotl.load_model:568] [PID:165] [RANK:4] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
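The "trainable params" lines look like PEFT's print_trainable_parameters output; the percentage is just trainable over total parameters. Quick check of the arithmetic (assumed relationship only):

trainable, total = 12_712_960, 3_439_186_560
print(100 * trainable / total)  # ~0.3697, matches the logged trainable%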
[2023-12-29 02:03:21,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:21,965] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:21,997] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,028] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,072] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,104] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,167] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,955] [INFO] [axolotl.load_model:503] [PID:163] [RANK:2] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:22,958] [INFO] [axolotl.load_model:526] [PID:163] [RANK:2] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:22,961] [INFO] [axolotl.load_model:538] [PID:163] [RANK:2] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:22,993] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:163] CUDA extension not installed.
[2023-12-29 02:03:22,993] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:163] CUDA extension not installed.
[2023-12-29 02:03:23,072] [INFO] [axolotl.load_model:503] [PID:166] [RANK:5] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,075] [INFO] [axolotl.load_model:526] [PID:166] [RANK:5] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,079] [INFO] [axolotl.load_model:538] [PID:166] [RANK:5] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,100] [INFO] [axolotl.load_model:503] [PID:162] [RANK:1] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,103] [INFO] [axolotl.load_model:526] [PID:162] [RANK:1] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,107] [INFO] [axolotl.load_model:538] [PID:162] [RANK:1] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,110] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:166] CUDA extension not installed.
[2023-12-29 02:03:23,110] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:166] CUDA extension not installed.
[2023-12-29 02:03:23,139] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:162] CUDA extension not installed.
[2023-12-29 02:03:23,139] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:162] CUDA extension not installed.
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,237] [INFO] [axolotl.load_model:568] [PID:163] [RANK:2] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,354] [INFO] [axolotl.load_model:568] [PID:166] [RANK:5] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,388] [INFO] [axolotl.load_model:568] [PID:162] [RANK:1] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:23,443] [INFO] [axolotl.load_model:503] [PID:167] [RANK:6] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,446] [INFO] [axolotl.load_model:526] [PID:167] [RANK:6] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,450] [INFO] [axolotl.load_model:538] [PID:167] [RANK:6] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,481] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:167] CUDA extension not installed.
[2023-12-29 02:03:23,482] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:167] CUDA extension not installed.
[2023-12-29 02:03:23,597] [INFO] [axolotl.load_model:503] [PID:168] [RANK:7] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,600] [INFO] [axolotl.load_model:526] [PID:168] [RANK:7] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,604] [INFO] [axolotl.load_model:538] [PID:168] [RANK:7] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,635] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:168] CUDA extension not installed.
[2023-12-29 02:03:23,636] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:168] CUDA extension not installed.
[2023-12-29 02:03:23,685] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,720] [INFO] [axolotl.load_model:568] [PID:167] [RANK:6] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:23,742] [INFO] [axolotl.load_model:503] [PID:164] [RANK:3] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,745] [INFO] [axolotl.load_model:526] [PID:164] [RANK:3] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,749] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,750] [INFO] [axolotl.load_model:538] [PID:164] [RANK:3] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,781] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,786] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:164] CUDA extension not installed.
[2023-12-29 02:03:23,786] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:164] CUDA extension not installed.
[2023-12-29 02:03:23,826] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,853] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,858] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,878] [INFO] [axolotl.load_model:568] [PID:168] [RANK:7] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:23,886] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,890] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,917] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,924] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,949] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:24,104] [INFO] [axolotl.load_model:568] [PID:164] [RANK:3] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:24,178] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,215] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,248] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,280] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,325] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,357] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,425] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,902] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,971] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,003] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
0%| | 0/2752 [00:00<?, ?it/s][2023-12-29 02:03:25,504] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,504] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,505] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,506] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,507] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,508] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,510] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,511] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,537] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,539] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,539] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,541] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,542] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,543] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
{'loss': 1.3883, 'learning_rate': 1e-05, 'epoch': 0.0}
0%| | 1/2752 [00:02<1:32:27, 2.02s/it][2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,517] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,517] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,517] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,518] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,518] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,768] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,769] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,770] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,770] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,032] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,033] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,296] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,297] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,547] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,547] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,827] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,827] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,337] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,338] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,596] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,597] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,853] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,853] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,123] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,123] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,381] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,382] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,641] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,641] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.6172605752944946, 'eval_runtime': 3.1392, 'eval_samples_per_second': 347.857, 'eval_steps_per_second': 21.98, 'epoch': 0.0}
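Side note: eval_samples_per_second * eval_runtime comes out to roughly 1092, so the eval split here appears to be around 1100 samples (assuming the Trainer metric is total samples over wall-clock runtime):

eval_runtime = 3.1392
eval_samples_per_second = 347.857
print(eval_runtime * eval_samples_per_second)  # ~1092 eval samples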
0%| [2023-12-29 02:03:31,652] [INFO] [axolotl.callbacks.on_step_end:122] [PID:161] [RANK:0] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
0%| | 2/2752 [00:06<2:30:21, 3.28s/it][2023-12-29 02:03:31,654] [INFO] [axolotl.callbacks.on_step_end:122] [PID:162] [RANK:1] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,655] [INFO] [axolotl.callbacks.on_step_end:122] [PID:166] [RANK:5] GPU memory usage while training: 3.554GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,656] [INFO] [axolotl.callbacks.on_step_end:122] [PID:168] [RANK:7] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,656] [INFO] [axolotl.callbacks.on_step_end:122] [PID:165] [RANK:4] GPU memory usage while training: 3.554GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,657] [INFO] [axolotl.callbacks.on_step_end:122] [PID:163] [RANK:2] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,658] [INFO] [axolotl.callbacks.on_step_end:122] [PID:167] [RANK:6] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,659] [INFO] [axolotl.callbacks.on_step_end:122] [PID:164] [RANK:3] GPU memory usage while training: 3.552GB (+3.813GB cache, +1.668GB misc)
{'loss': 1.2717, 'learning_rate': 2e-05, 'epoch': 0.0}
{'loss': 1.4102, 'learning_rate': 3e-05, 'epoch': 0.0}
{'loss': 1.2598, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.4164, 'learning_rate': 5e-05, 'epoch': 0.01}
{'loss': 1.3747, 'learning_rate': 6e-05, 'epoch': 0.01}
{'loss': 1.2655, 'learning_rate': 7e-05, 'epoch': 0.01}
{'loss': 1.3905, 'learning_rate': 8e-05, 'epoch': 0.01}
{'loss': 1.3399, 'learning_rate': 9e-05, 'epoch': 0.01}
{'loss': 1.2916, 'learning_rate': 0.0001, 'epoch': 0.01}
{'loss': 1.292, 'learning_rate': 0.00011000000000000002, 'epoch': 0.02}
{'loss': 1.2863, 'learning_rate': 0.00012, 'epoch': 0.02}
{'loss': 1.4121, 'learning_rate': 0.00013000000000000002, 'epoch': 0.02}
{'loss': 1.2416, 'learning_rate': 0.00014, 'epoch': 0.02}
{'loss': 1.1848, 'learning_rate': 0.00015000000000000001, 'epoch': 0.02}
{'loss': 1.2452, 'learning_rate': 0.00016, 'epoch': 0.02}
{'loss': 1.271, 'learning_rate': 0.00017, 'epoch': 0.02}
{'loss': 1.1541, 'learning_rate': 0.00018, 'epoch': 0.03}
{'loss': 1.1802, 'learning_rate': 0.00019, 'epoch': 0.03}
{'loss': 1.1818, 'learning_rate': 0.0002, 'epoch': 0.03}
{'loss': 1.157, 'learning_rate': 0.00019999993388373499, 'epoch': 0.03}
{'loss': 1.179, 'learning_rate': 0.00019999973553502733, 'epoch': 0.03}
{'loss': 1.2595, 'learning_rate': 0.00019999940495413936, 'epoch': 0.03}
{'loss': 1.1576, 'learning_rate': 0.00019999894214150818, 'epoch': 0.03}
{'loss': 1.1034, 'learning_rate': 0.00019999834709774576, 'epoch': 0.04}
{'loss': 1.1498, 'learning_rate': 0.000199997619823639, 'epoch': 0.04}
{'loss': 1.2154, 'learning_rate': 0.00019999676032014953, 'epoch': 0.04}
{'loss': 1.1614, 'learning_rate': 0.00019999576858841395, 'epoch': 0.04}
{'loss': 1.2372, 'learning_rate': 0.0001999946446297436, 'epoch': 0.04}
{'loss': 1.1168, 'learning_rate': 0.00019999338844562477, 'epoch': 0.04}
{'loss': 1.1751, 'learning_rate': 0.0001999920000377185, 'epoch': 0.05}
{'loss': 1.1843, 'learning_rate': 0.00019999047940786073, 'epoch': 0.05}
{'loss': 1.1122, 'learning_rate': 0.00019998882655806224, 'epoch': 0.05}
{'loss': 1.1952, 'learning_rate': 0.00019998704149050864, 'epoch': 0.05}
{'loss': 1.1519, 'learning_rate': 0.00019998512420756032, 'epoch': 0.05}
{'loss': 1.1951, 'learning_rate': 0.00019998307471175264, 'epoch': 0.05}
{'loss': 1.188, 'learning_rate': 0.00019998089300579558, 'epoch': 0.05}
{'loss': 1.2015, 'learning_rate': 0.0001999785790925742, 'epoch': 0.06}
{'loss': 1.1764, 'learning_rate': 0.00019997613297514816, 'epoch': 0.06}
{'loss': 1.0991, 'learning_rate': 0.00019997355465675205, 'epoch': 0.06}
{'loss': 1.1394, 'learning_rate': 0.0001999708441407952, 'epoch': 0.06}
{'loss': 1.1221, 'learning_rate': 0.00019996800143086188, 'epoch': 0.06}
{'loss': 1.1235, 'learning_rate': 0.000199965026530711, 'epoch': 0.06}
{'loss': 1.2169, 'learning_rate': 0.00019996191944427638, 'epoch': 0.06}
{'loss': 1.1213, 'learning_rate': 0.0001999586801756666, 'epoch': 0.07}
{'loss': 1.1409, 'learning_rate': 0.00019995530872916501, 'epoch': 0.07}
{'loss': 1.2117, 'learning_rate': 0.0001999518051092298, 'epoch': 0.07}
{'loss': 1.137, 'learning_rate': 0.00019994816932049383, 'epoch': 0.07}
{'loss': 1.1152, 'learning_rate': 0.00019994440136776484, 'epoch': 0.07}
{'loss': 1.1461, 'learning_rate': 0.0001999405012560253, 'epoch': 0.07}
{'loss': 1.1633, 'learning_rate': 0.00019993646899043238, 'epoch': 0.07}
{'loss': 1.0886, 'learning_rate': 0.0001999323045763181, 'epoch': 0.08}
{'loss': 1.0954, 'learning_rate': 0.00019992800801918914, 'epoch': 0.08}
{'loss': 1.1329, 'learning_rate': 0.00019992357932472693, 'epoch': 0.08}
{'loss': 1.1167, 'learning_rate': 0.00019991901849878766, 'epoch': 0.08}
{'loss': 1.0483, 'learning_rate': 0.00019991432554740225, 'epoch': 0.08}
{'loss': 1.0399, 'learning_rate': 0.0001999095004767763, 'epoch': 0.08}
{'loss': 1.1457, 'learning_rate': 0.0001999045432932901, 'epoch': 0.08}
{'loss': 1.1898, 'learning_rate': 0.00019989945400349866, 'epoch': 0.09}
{'loss': 1.0845, 'learning_rate': 0.0001998942326141317, 'epoch': 0.09}
{'loss': 1.1339, 'learning_rate': 0.00019988887913209355, 'epoch': 0.09}
{'loss': 1.1488, 'learning_rate': 0.00019988339356446334, 'epoch': 0.09}
{'loss': 1.1876, 'learning_rate': 0.00019987777591849468, 'epoch': 0.09}
{'loss': 1.1047, 'learning_rate': 0.000199872026201616, 'epoch': 0.09}
{'loss': 1.1951, 'learning_rate': 0.00019986614442143023, 'epoch': 0.09}
{'loss': 1.0934, 'learning_rate': 0.00019986013058571504, 'epoch': 0.1}
{'loss': 1.1218, 'learning_rate': 0.00019985398470242268, 'epoch': 0.1}
{'loss': 1.0293, 'learning_rate': 0.00019984770677968, 'epoch': 0.1}
{'loss': 1.1087, 'learning_rate': 0.00019984129682578842, 'epoch': 0.1}
{'loss': 1.1243, 'learning_rate': 0.00019983475484922406, 'epoch': 0.1}
{'loss': 1.1438, 'learning_rate': 0.00019982808085863745, 'epoch': 0.1}
{'loss': 1.1484, 'learning_rate': 0.00019982127486285384, 'epoch': 0.1}
{'loss': 1.0461, 'learning_rate': 0.00019981433687087295, 'epoch': 0.11}
{'loss': 1.142, 'learning_rate': 0.00019980726689186907, 'epoch': 0.11}
{'loss': 1.0223, 'learning_rate': 0.000199800064935191, 'epoch': 0.11}
{'loss': 1.0888, 'learning_rate': 0.0001997927310103621, 'epoch': 0.11}
{'loss': 1.1318, 'learning_rate': 0.00019978526512708013, 'epoch': 0.11}
{'loss': 0.9538, 'learning_rate': 0.00019977766729521753, 'epoch': 0.11}
{'loss': 1.1368, 'learning_rate': 0.000199769937524821, 'epoch': 0.11}
{'loss': 1.1407, 'learning_rate': 0.00019976207582611189, 'epoch': 0.12}
{'loss': 1.1756, 'learning_rate': 0.00019975408220948584, 'epoch': 0.12}
{'loss': 1.1115, 'learning_rate': 0.0001997459566855131, 'epoch': 0.12}
{'loss': 1.0728, 'learning_rate': 0.0001997376992649382, 'epoch': 0.12}
{'loss': 1.1101, 'learning_rate': 0.00019972930995868014, 'epoch': 0.12}
{'loss': 1.0907, 'learning_rate': 0.00019972078877783232, 'epoch': 0.12}
{'loss': 1.1709, 'learning_rate': 0.0001997121357336625, 'epoch': 0.12}
{'loss': 1.08, 'learning_rate': 0.0001997033508376129, 'epoch': 0.13}
{'loss': 1.1698, 'learning_rate': 0.0001996944341012999, 'epoch': 0.13}
{'loss': 1.1646, 'learning_rate': 0.00019968538553651437, 'epoch': 0.13}
{'loss': 1.1041, 'learning_rate': 0.00019967620515522146, 'epoch': 0.13}
{'loss': 1.244, 'learning_rate': 0.00019966689296956064, 'epoch': 0.13}
{'loss': 1.2206, 'learning_rate': 0.0001996574489918456, 'epoch': 0.13}
{'loss': 1.0939, 'learning_rate': 0.00019964787323456436, 'epoch': 0.14}
{'loss': 1.0881, 'learning_rate': 0.00019963816571037923, 'epoch': 0.14}
{'loss': 1.1767, 'learning_rate': 0.00019962832643212667, 'epoch': 0.14}
{'loss': 1.0859, 'learning_rate': 0.00019961835541281746, 'epoch': 0.14}
{'loss': 1.1681, 'learning_rate': 0.00019960825266563648, 'epoch': 0.14}
{'loss': 1.0445, 'learning_rate': 0.00019959801820394285, 'epoch': 0.14}
{'loss': 1.1215, 'learning_rate': 0.00019958765204126987, 'epoch': 0.14}
{'loss': 1.0394, 'learning_rate': 0.00019957715419132498, 'epoch': 0.15}
{'loss': 1.1211, 'learning_rate': 0.00019956652466798978, 'epoch': 0.15}
{'loss': 1.1266, 'learning_rate': 0.00019955576348531994, 'epoch': 0.15}
{'loss': 1.1694, 'learning_rate': 0.00019954487065754518, 'epoch': 0.15}
{'loss': 1.0359, 'learning_rate': 0.00019953384619906945, 'epoch': 0.15}
{'loss': 1.107, 'learning_rate': 0.00019952269012447064, 'epoch': 0.15}
{'loss': 1.0611, 'learning_rate': 0.0001995114024485007, 'epoch': 0.15}
{'loss': 1.0891, 'learning_rate': 0.00019949998318608561, 'epoch': 0.16}
{'loss': 1.093, 'learning_rate': 0.00019948843235232535, 'epoch': 0.16}
{'loss': 1.1103, 'learning_rate': 0.00019947674996249393, 'epoch': 0.16}
{'loss': 1.0403, 'learning_rate': 0.00019946493603203918, 'epoch': 0.16}
{'loss': 1.1186, 'learning_rate': 0.000199452990576583, 'epoch': 0.16}
{'loss': 0.9921, 'learning_rate': 0.0001994409136119212, 'epoch': 0.16}
{'loss': 1.1533, 'learning_rate': 0.00019942870515402345, 'epoch': 0.16}
{'loss': 1.0887, 'learning_rate': 0.00019941636521903321, 'epoch': 0.17}
{'loss': 1.0476, 'learning_rate': 0.00019940389382326802, 'epoch': 0.17}
{'loss': 1.0209, 'learning_rate': 0.00019939129098321904, 'epoch': 0.17}
{'loss': 1.1392, 'learning_rate': 0.00019937855671555132, 'epoch': 0.17}
{'loss': 1.089, 'learning_rate': 0.00019936569103710377, 'epoch': 0.17}
{'loss': 1.1601, 'learning_rate': 0.00019935269396488894, 'epoch': 0.17}
{'loss': 1.1188, 'learning_rate': 0.00019933956551609322, 'epoch': 0.17}
{'loss': 1.0759, 'learning_rate': 0.00019932630570807666, 'epoch': 0.18}
{'loss': 1.0615, 'learning_rate': 0.00019931291455837306, 'epoch': 0.18}
{'loss': 1.0169, 'learning_rate': 0.00019929939208468991, 'epoch': 0.18}
{'loss': 1.0956, 'learning_rate': 0.00019928573830490826, 'epoch': 0.18}
{'loss': 1.1011, 'learning_rate': 0.0001992719532370829, 'epoch': 0.18}
{'loss': 1.0581, 'learning_rate': 0.00019925803689944212, 'epoch': 0.18}
{'loss': 1.2119, 'learning_rate': 0.00019924398931038786, 'epoch': 0.18}
{'loss': 0.9725, 'learning_rate': 0.00019922981048849564, 'epoch': 0.19}
{'loss': 1.1215, 'learning_rate': 0.00019921550045251443, 'epoch': 0.19}
{'loss': 1.069, 'learning_rate': 0.00019920105922136678, 'epoch': 0.19}
{'loss': 1.0962, 'learning_rate': 0.00019918648681414868, 'epoch': 0.19}
{'loss': 1.0503, 'learning_rate': 0.00019917178325012963, 'epoch': 0.19}
{'loss': 1.1766, 'learning_rate': 0.00019915694854875246, 'epoch': 0.19}
{'loss': 1.1211, 'learning_rate': 0.00019914198272963352, 'epoch': 0.19}
{'loss': 1.099, 'learning_rate': 0.00019912688581256248, 'epoch': 0.2}
{'loss': 1.2161, 'learning_rate': 0.00019911165781750237, 'epoch': 0.2}
{'loss': 1.1908, 'learning_rate': 0.00019909629876458954, 'epoch': 0.2}
{'loss': 1.0902, 'learning_rate': 0.00019908080867413368, 'epoch': 0.2}
{'loss': 1.136, 'learning_rate': 0.0001990651875666177, 'epoch': 0.2}
{'loss': 1.0682, 'learning_rate': 0.00019904943546269785, 'epoch': 0.2}
{'loss': 1.0492, 'learning_rate': 0.00019903355238320346, 'epoch': 0.2}
{'loss': 1.0702, 'learning_rate': 0.0001990175383491372, 'epoch': 0.21}
{'loss': 1.1384, 'learning_rate': 0.00019900139338167473, 'epoch': 0.21}
{'loss': 1.0154, 'learning_rate': 0.00019898511750216505, 'epoch': 0.21}
{'loss': 0.9933, 'learning_rate': 0.00019896871073213007, 'epoch': 0.21}
{'loss': 1.109, 'learning_rate': 0.000198952173093265, 'epoch': 0.21}
{'loss': 1.0841, 'learning_rate': 0.00019893550460743788, 'epoch': 0.21}
{'loss': 1.0595, 'learning_rate': 0.0001989187052966899, 'epoch': 0.22}
{'loss': 1.0745, 'learning_rate': 0.0001989017751832352, 'epoch': 0.22}
{'loss': 1.1012, 'learning_rate': 0.00019888471428946094, 'epoch': 0.22}
{'loss': 1.1249, 'learning_rate': 0.00019886752263792714, 'epoch': 0.22}
{'loss': 1.1229, 'learning_rate': 0.00019885020025136677, 'epoch': 0.22}
{'loss': 1.0904, 'learning_rate': 0.00019883274715268564, 'epoch': 0.22}
{'loss': 1.1373, 'learning_rate': 0.00019881516336496243, 'epoch': 0.22}
{'loss': 1.1407, 'learning_rate': 0.00019879744891144864, 'epoch': 0.23}
{'loss': 0.9957, 'learning_rate': 0.0001987796038155685, 'epoch': 0.23}
{'loss': 1.2211, 'learning_rate': 0.00019876162810091908, 'epoch': 0.23}
{'loss': 1.1074, 'learning_rate': 0.00019874352179127014, 'epoch': 0.23}
{'loss': 1.1739, 'learning_rate': 0.00019872528491056405, 'epoch': 0.23}
{'loss': 1.1182, 'learning_rate': 0.0001987069174829159, 'epoch': 0.23}
{'loss': 1.127, 'learning_rate': 0.0001986884195326135, 'epoch': 0.23}
{'loss': 1.0611, 'learning_rate': 0.000198669791084117, 'epoch': 0.24}
{'loss': 1.1211, 'learning_rate': 0.0001986510321620594, 'epoch': 0.24}
{'loss': 1.1486, 'learning_rate': 0.00019863214279124608, 'epoch': 0.24}
{'loss': 1.1382, 'learning_rate': 0.0001986131229966549, 'epoch': 0.24}
{'loss': 1.1212, 'learning_rate': 0.0001985939728034362, 'epoch': 0.24}
{'loss': 1.0609, 'learning_rate': 0.00019857469223691276, 'epoch': 0.24}
{'loss': 1.1094, 'learning_rate': 0.00019855528132257984, 'epoch': 0.24}
{'loss': 1.1383, 'learning_rate': 0.0001985357400861049, 'epoch': 0.25}
{'loss': 1.0304, 'learning_rate': 0.00019851606855332787, 'epoch': 0.25}
{'loss': 1.0512, 'learning_rate': 0.00019849626675026087, 'epoch': 0.25}
{'loss': 1.1373, 'learning_rate': 0.00019847633470308833, 'epoch': 0.25}
6%|███████ | 172/2752 [02:57<43:30, 1.01s/it][2023-12-29 02:06:23,389] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,389] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,389] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,643] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,644] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,644] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,645] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,906] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,907] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,168] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,168] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,419] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,420] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,701] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,702] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,964] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,964] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,216] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,217] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,477] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,478] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,737] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,738] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,006] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,007] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,271] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,271] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,533] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,533] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.08314847946167, 'eval_runtime': 3.155, 'eval_samples_per_second': 346.12, 'eval_steps_per_second': 21.87, 'epoch': 0.25}
{'loss': 1.0455, 'learning_rate': 0.00019845627243816693, 'epoch': 0.25}
{'loss': 1.1284, 'learning_rate': 0.0001984360799820255, 'epoch': 0.25}
{'loss': 1.1126, 'learning_rate': 0.00019841575736136502, 'epoch': 0.25}
{'loss': 1.0239, 'learning_rate': 0.00019839530460305862, 'epoch': 0.26}
{'loss': 1.1152, 'learning_rate': 0.00019837472173415147, 'epoch': 0.26}
{'loss': 1.1221, 'learning_rate': 0.0001983540087818609, 'epoch': 0.26}
{'loss': 1.0601, 'learning_rate': 0.00019833316577357607, 'epoch': 0.26}
{'loss': 1.1294, 'learning_rate': 0.00019831219273685826, 'epoch': 0.26}
{'loss': 1.0331, 'learning_rate': 0.00019829108969944068, 'epoch': 0.26}
{'loss': 1.0731, 'learning_rate': 0.00019826985668922834, 'epoch': 0.26}
{'loss': 1.0465, 'learning_rate': 0.00019824849373429825, 'epoch': 0.27}
{'loss': 1.0436, 'learning_rate': 0.00019822700086289915, 'epoch': 0.27}
{'loss': 1.0638, 'learning_rate': 0.00019820537810345164, 'epoch': 0.27}
{'loss': 1.0589, 'learning_rate': 0.000198183625484548, 'epoch': 0.27}
{'loss': 1.1969, 'learning_rate': 0.0001981617430349523, 'epoch': 0.27}
{'loss': 1.0533, 'learning_rate': 0.00019813973078360025, 'epoch': 0.27}
{'loss': 0.9988, 'learning_rate': 0.0001981175887595992, 'epoch': 0.27}
{'loss': 1.0979, 'learning_rate': 0.0001980953169922281, 'epoch': 0.28}
{'loss': 1.085, 'learning_rate': 0.00019807291551093747, 'epoch': 0.28}
{'loss': 1.1527, 'learning_rate': 0.0001980503843453494, 'epoch': 0.28}
{'loss': 1.1165, 'learning_rate': 0.00019802772352525735, 'epoch': 0.28}
{'loss': 1.0262, 'learning_rate': 0.00019800493308062635, 'epoch': 0.28}
{'loss': 1.052, 'learning_rate': 0.00019798201304159282, 'epoch': 0.28}
{'loss': 1.1547, 'learning_rate': 0.00019795896343846437, 'epoch': 0.28}
{'loss': 1.1672, 'learning_rate': 0.00019793578430172022, 'epoch': 0.29}
{'loss': 1.1193, 'learning_rate': 0.00019791247566201063, 'epoch': 0.29}
{'loss': 1.0842, 'learning_rate': 0.00019788903755015724, 'epoch': 0.29}
{'loss': 1.0412, 'learning_rate': 0.00019786546999715285, 'epoch': 0.29}
{'loss': 1.0645, 'learning_rate': 0.00019784177303416148, 'epoch': 0.29}
{'loss': 1.0191, 'learning_rate': 0.00019781794669251817, 'epoch': 0.29}
{'loss': 1.0915, 'learning_rate': 0.0001977939910037291, 'epoch': 0.3}
{'loss': 1.0264, 'learning_rate': 0.00019776990599947147, 'epoch': 0.3}
{'loss': 1.0597, 'learning_rate': 0.00019774569171159353, 'epoch': 0.3}
{'loss': 1.0705, 'learning_rate': 0.00019772134817211442, 'epoch': 0.3}
{'loss': 1.0304, 'learning_rate': 0.00019769687541322422, 'epoch': 0.3}
{'loss': 1.1003, 'learning_rate': 0.00019767227346728392, 'epoch': 0.3}
{'loss': 1.0781, 'learning_rate': 0.00019764754236682524, 'epoch': 0.3}
{'loss': 1.1305, 'learning_rate': 0.00019762268214455072, 'epoch': 0.31}
{'loss': 1.1008, 'learning_rate': 0.00019759769283333377, 'epoch': 0.31}
{'loss': 1.075, 'learning_rate': 0.00019757257446621827, 'epoch': 0.31}
{'loss': 1.0276, 'learning_rate': 0.0001975473270764189, 'epoch': 0.31}
{'loss': 1.1116, 'learning_rate': 0.000197521950697321, 'epoch': 0.31}
{'loss': 1.107, 'learning_rate': 0.00019749644536248031, 'epoch': 0.31}
{'loss': 1.1041, 'learning_rate': 0.00019747081110562322, 'epoch': 0.31}
{'loss': 1.183, 'learning_rate': 0.00019744504796064653, 'epoch': 0.32}
{'loss': 1.1011, 'learning_rate': 0.00019741915596161756, 'epoch': 0.32}
{'loss': 1.011, 'learning_rate': 0.00019739313514277384, 'epoch': 0.32}
{'loss': 1.0174, 'learning_rate': 0.0001973669855385235, 'epoch': 0.32}
{'loss': 1.0124, 'learning_rate': 0.00019734070718344468, 'epoch': 0.32}
{'loss': 1.0547, 'learning_rate': 0.00019731430011228604, 'epoch': 0.32}
{'loss': 1.0363, 'learning_rate': 0.00019728776435996625, 'epoch': 0.32}
{'loss': 1.1824, 'learning_rate': 0.00019726109996157424, 'epoch': 0.33}
{'loss': 1.0507, 'learning_rate': 0.00019723430695236895, 'epoch': 0.33}
{'loss': 1.1741, 'learning_rate': 0.00019720738536777951, 'epoch': 0.33}
{'loss': 1.0863, 'learning_rate': 0.00019718033524340504, 'epoch': 0.33}
{'loss': 1.1066, 'learning_rate': 0.0001971531566150145, 'epoch': 0.33}
{'loss': 1.1112, 'learning_rate': 0.00019712584951854701, 'epoch': 0.33}
{'loss': 1.0474, 'learning_rate': 0.0001970984139901114, 'epoch': 0.33}
{'loss': 1.0878, 'learning_rate': 0.00019707085006598628, 'epoch': 0.34}
{'loss': 1.1319, 'learning_rate': 0.00019704315778262016, 'epoch': 0.34}
{'loss': 1.1191, 'learning_rate': 0.00019701533717663133, 'epoch': 0.34}
{'loss': 1.0319, 'learning_rate': 0.00019698738828480758, 'epoch': 0.34}
{'loss': 1.0866, 'learning_rate': 0.00019695931114410646, 'epoch': 0.34}
{'loss': 1.0531, 'learning_rate': 0.00019693110579165513, 'epoch': 0.34}
{'loss': 0.9935, 'learning_rate': 0.0001969027722647502, 'epoch': 0.34}
{'loss': 1.0671, 'learning_rate': 0.0001968743106008578, 'epoch': 0.35}
{'loss': 1.0987, 'learning_rate': 0.00019684572083761352, 'epoch': 0.35}
{'loss': 1.1184, 'learning_rate': 0.00019681700301282234, 'epoch': 0.35}
{'loss': 1.1598, 'learning_rate': 0.00019678815716445857, 'epoch': 0.35}
{'loss': 1.0762, 'learning_rate': 0.0001967591833306658, 'epoch': 0.35}
{'loss': 0.9739, 'learning_rate': 0.00019673008154975685, 'epoch': 0.35}
{'loss': 0.9894, 'learning_rate': 0.00019670085186021375, 'epoch': 0.35}
{'loss': 1.0197, 'learning_rate': 0.00019667149430068766, 'epoch': 0.36}
{'loss': 1.0673, 'learning_rate': 0.00019664200890999882, 'epoch': 0.36}
{'loss': 1.0144, 'learning_rate': 0.0001966123957271365, 'epoch': 0.36}
{'loss': 1.0572, 'learning_rate': 0.000196582654791259, 'epoch': 0.36}
{'loss': 1.1283, 'learning_rate': 0.00019655278614169345, 'epoch': 0.36}
{'loss': 1.0792, 'learning_rate': 0.00019652278981793596, 'epoch': 0.36}
{'loss': 1.1433, 'learning_rate': 0.00019649266585965145, 'epoch': 0.36}
{'loss': 1.0699, 'learning_rate': 0.00019646241430667353, 'epoch': 0.37}
{'loss': 1.0948, 'learning_rate': 0.00019643203519900465, 'epoch': 0.37}
{'loss': 1.1015, 'learning_rate': 0.0001964015285768158, 'epoch': 0.37}
{'loss': 1.0478, 'learning_rate': 0.00019637089448044676, 'epoch': 0.37}
{'loss': 1.1033, 'learning_rate': 0.0001963401329504057, 'epoch': 0.37}
{'loss': 1.0759, 'learning_rate': 0.0001963092440273694, 'epoch': 0.37}
{'loss': 1.1169, 'learning_rate': 0.00019627822775218303, 'epoch': 0.38}
{'loss': 1.092, 'learning_rate': 0.00019624708416586021, 'epoch': 0.38}
{'loss': 1.0935, 'learning_rate': 0.00019621581330958295, 'epoch': 0.38}
{'loss': 1.1252, 'learning_rate': 0.0001961844152247014, 'epoch': 0.38}
{'loss': 0.9732, 'learning_rate': 0.00019615288995273412, 'epoch': 0.38}
{'loss': 1.0818, 'learning_rate': 0.0001961212375353677, 'epoch': 0.38}
{'loss': 1.0424, 'learning_rate': 0.000196089458014457, 'epoch': 0.38}
{'loss': 1.0514, 'learning_rate': 0.00019605755143202488, 'epoch': 0.39}
{'loss': 1.1457, 'learning_rate': 0.00019602551783026216, 'epoch': 0.39}
{'loss': 1.1511, 'learning_rate': 0.00019599335725152775, 'epoch': 0.39}
{'loss': 1.2227, 'learning_rate': 0.00019596106973834835, 'epoch': 0.39}
{'loss': 1.0812, 'learning_rate': 0.00019592865533341858, 'epoch': 0.39}
{'loss': 1.147, 'learning_rate': 0.0001958961140796008, 'epoch': 0.39}
{'loss': 1.2058, 'learning_rate': 0.00019586344601992515, 'epoch': 0.39}
{'loss': 1.0079, 'learning_rate': 0.0001958306511975895, 'epoch': 0.4}
{'loss': 1.1207, 'learning_rate': 0.00019579772965595918, 'epoch': 0.4}
{'loss': 1.0082, 'learning_rate': 0.00019576468143856719, 'epoch': 0.4}
{'loss': 1.1827, 'learning_rate': 0.00019573150658911404, 'epoch': 0.4}
{'loss': 1.0825, 'learning_rate': 0.00019569820515146768, 'epoch': 0.4}
{'loss': 1.0575, 'learning_rate': 0.00019566477716966344, 'epoch': 0.4}
{'loss': 1.0498, 'learning_rate': 0.000195631222687904, 'epoch': 0.4}
{'loss': 1.1522, 'learning_rate': 0.00019559754175055925, 'epoch': 0.41}
{'loss': 1.0967, 'learning_rate': 0.0001955637344021664, 'epoch': 0.41}
{'loss': 1.0513, 'learning_rate': 0.00019552980068742977, 'epoch': 0.41}
{'loss': 1.1326, 'learning_rate': 0.0001954957406512207, 'epoch': 0.41}
{'loss': 1.185, 'learning_rate': 0.0001954615543385777, 'epoch': 0.41}
{'loss': 1.183, 'learning_rate': 0.00019542724179470616, 'epoch': 0.41}
{'loss': 1.1387, 'learning_rate': 0.00019539280306497844, 'epoch': 0.41}
{'loss': 1.0883, 'learning_rate': 0.00019535823819493374, 'epoch': 0.42}
{'loss': 1.1656, 'learning_rate': 0.0001953235472302781, 'epoch': 0.42}
{'loss': 1.1537, 'learning_rate': 0.0001952887302168842, 'epoch': 0.42}
{'loss': 1.0822, 'learning_rate': 0.00019525378720079147, 'epoch': 0.42}
{'loss': 1.1122, 'learning_rate': 0.00019521871822820598, 'epoch': 0.42}
{'loss': 1.1109, 'learning_rate': 0.0001951835233455003, 'epoch': 0.42}
{'loss': 1.0737, 'learning_rate': 0.00019514820259921352, 'epoch': 0.42}
{'loss': 0.9906, 'learning_rate': 0.0001951127560360511, 'epoch': 0.43}
{'loss': 0.9589, 'learning_rate': 0.00019507718370288503, 'epoch': 0.43}
{'loss': 1.0871, 'learning_rate': 0.0001950414856467534, 'epoch': 0.43}
{'loss': 1.0871, 'learning_rate': 0.00019500566191486075, 'epoch': 0.43}
{'loss': 0.9745, 'learning_rate': 0.00019496971255457765, 'epoch': 0.43}
{'loss': 1.1144, 'learning_rate': 0.00019493363761344086, 'epoch': 0.43}
{'loss': 1.1213, 'learning_rate': 0.00019489743713915316, 'epoch': 0.43}
{'loss': 1.0394, 'learning_rate': 0.00019486111117958342, 'epoch': 0.44}
{'loss': 0.898, 'learning_rate': 0.0001948246597827663, 'epoch': 0.44}
{'loss': 1.1053, 'learning_rate': 0.00019478808299690247, 'epoch': 0.44}
{'loss': 1.1651, 'learning_rate': 0.0001947513808703583, 'epoch': 0.44}
{'loss': 1.0712, 'learning_rate': 0.00019471455345166595, 'epoch': 0.44}
{'loss': 1.0605, 'learning_rate': 0.00019467760078952325, 'epoch': 0.44}
{'loss': 1.0684, 'learning_rate': 0.00019464052293279363, 'epoch': 0.44}
{'loss': 1.0278, 'learning_rate': 0.00019460331993050609, 'epoch': 0.45}
{'loss': 1.1612, 'learning_rate': 0.00019456599183185507, 'epoch': 0.45}
{'loss': 1.1316, 'learning_rate': 0.0001945285386862005, 'epoch': 0.45}
{'loss': 1.1126, 'learning_rate': 0.00019449096054306763, 'epoch': 0.45}
{'loss': 1.0819, 'learning_rate': 0.00019445325745214695, 'epoch': 0.45}
{'loss': 1.0392, 'learning_rate': 0.00019441542946329422, 'epoch': 0.45}
{'loss': 1.1946, 'learning_rate': 0.0001943774766265304, 'epoch': 0.45}
{'loss': 1.1179, 'learning_rate': 0.00019433939899204142, 'epoch': 0.46}
{'loss': 1.0504, 'learning_rate': 0.0001943011966101783, 'epoch': 0.46}
{'loss': 1.0482, 'learning_rate': 0.00019426286953145704, 'epoch': 0.46}
{'loss': 1.0551, 'learning_rate': 0.0001942244178065585, 'epoch': 0.46}
{'loss': 1.1729, 'learning_rate': 0.00019418584148632836, 'epoch': 0.46}
{'loss': 1.0714, 'learning_rate': 0.00019414714062177712, 'epoch': 0.46}
{'loss': 1.0875, 'learning_rate': 0.00019410831526407984, 'epoch': 0.47}
{'loss': 1.1164, 'learning_rate': 0.0001940693654645763, 'epoch': 0.47}
{'loss': 1.139, 'learning_rate': 0.0001940302912747708, 'epoch': 0.47}
{'loss': 1.037, 'learning_rate': 0.00019399109274633215, 'epoch': 0.47}
{'loss': 1.0673, 'learning_rate': 0.00019395176993109356, 'epoch': 0.47}
{'loss': 1.0884, 'learning_rate': 0.00019391232288105254, 'epoch': 0.47}
{'loss': 1.0848, 'learning_rate': 0.00019387275164837098, 'epoch': 0.47}
{'loss': 1.1198, 'learning_rate': 0.00019383305628537485, 'epoch': 0.48}
{'loss': 1.1119, 'learning_rate': 0.0001937932368445544, 'epoch': 0.48}
{'loss': 1.0566, 'learning_rate': 0.00019375329337856383, 'epoch': 0.48}
{'loss': 1.0883, 'learning_rate': 0.0001937132259402214, 'epoch': 0.48}
{'loss': 1.0596, 'learning_rate': 0.00019367303458250938, 'epoch': 0.48}
{'loss': 1.0067, 'learning_rate': 0.00019363271935857372, 'epoch': 0.48}
{'loss': 1.0583, 'learning_rate': 0.00019359228032172433, 'epoch': 0.48}
{'loss': 1.1095, 'learning_rate': 0.00019355171752543472, 'epoch': 0.49}
{'loss': 1.0321, 'learning_rate': 0.00019351103102334212, 'epoch': 0.49}
{'loss': 1.0552, 'learning_rate': 0.00019347022086924732, 'epoch': 0.49}
{'loss': 0.9958, 'learning_rate': 0.00019342928711711465, 'epoch': 0.49}
{'loss': 1.0808, 'learning_rate': 0.0001933882298210718, 'epoch': 0.49}
{'loss': 1.1052, 'learning_rate': 0.0001933470490354099, 'epoch': 0.49}
{'loss': 1.1816, 'learning_rate': 0.00019330574481458333, 'epoch': 0.49}
{'loss': 1.182, 'learning_rate': 0.00019326431721320973, 'epoch': 0.5}
{'loss': 1.0482, 'learning_rate': 0.0001932227662860698, 'epoch': 0.5}
{'loss': 1.0726, 'learning_rate': 0.00019318109208810746, 'epoch': 0.5}
{'loss': 1.01, 'learning_rate': 0.00019313929467442952, 'epoch': 0.5}
 12%|██████████████▏ | 344/2752 [05:55<40:27, 1.01s/it]
[2023-12-29 02:09:20,839] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,839] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,842] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,095] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,096] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,096] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,097] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,618] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,618] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,869] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,870] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,152] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,152] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,417] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,418] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,670] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,670] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,193] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,194] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,462] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,463] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,725] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,726] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,989] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,990] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0399237871170044, 'eval_runtime': 3.1624, 'eval_samples_per_second': 345.306, 'eval_steps_per_second': 21.819, 'epoch': 0.5}
{'loss': 1.0084, 'learning_rate': 0.00019309737410030578, 'epoch': 0.5}
{'loss': 1.1143, 'learning_rate': 0.00019305533042116883, 'epoch': 0.5}
{'loss': 1.0768, 'learning_rate': 0.00019301316369261414, 'epoch': 0.5}
{'loss': 1.0763, 'learning_rate': 0.00019297087397039984, 'epoch': 0.51}
{'loss': 1.0445, 'learning_rate': 0.00019292846131044664, 'epoch': 0.51}
{'loss': 1.0717, 'learning_rate': 0.00019288592576883793, 'epoch': 0.51}
{'loss': 1.0391, 'learning_rate': 0.00019284326740181952, 'epoch': 0.51}
{'loss': 1.2133, 'learning_rate': 0.00019280048626579962, 'epoch': 0.51}
{'loss': 1.0577, 'learning_rate': 0.00019275758241734886, 'epoch': 0.51}
{'loss': 1.1106, 'learning_rate': 0.00019271455591320007, 'epoch': 0.51}
{'loss': 1.1, 'learning_rate': 0.0001926714068102483, 'epoch': 0.52}
{'loss': 0.9612, 'learning_rate': 0.0001926281351655506, 'epoch': 0.52}
{'loss': 1.123, 'learning_rate': 0.00019258474103632625, 'epoch': 0.52}
{'loss': 1.0484, 'learning_rate': 0.00019254122447995645, 'epoch': 0.52}
{'loss': 1.0206, 'learning_rate': 0.00019249758555398412, 'epoch': 0.52}
{'loss': 1.1084, 'learning_rate': 0.0001924538243161142, 'epoch': 0.52}
{'loss': 1.0974, 'learning_rate': 0.00019240994082421326, 'epoch': 0.52}
{'loss': 1.1584, 'learning_rate': 0.0001923659351363096, 'epoch': 0.53}
{'loss': 1.041, 'learning_rate': 0.00019232180731059293, 'epoch': 0.53}
{'loss': 1.1318, 'learning_rate': 0.0001922775574054147, 'epoch': 0.53}
{'loss': 1.087, 'learning_rate': 0.00019223318547928762, 'epoch': 0.53}
{'loss': 1.0942, 'learning_rate': 0.0001921886915908859, 'epoch': 0.53}
{'loss': 1.1136, 'learning_rate': 0.0001921440757990448, 'epoch': 0.53}
{'loss': 1.2016, 'learning_rate': 0.00019209933816276102, 'epoch': 0.53}
{'loss': 1.0318, 'learning_rate': 0.00019205447874119224, 'epoch': 0.54}
{'loss': 1.0923, 'learning_rate': 0.00019200949759365718, 'epoch': 0.54}
{'loss': 0.984, 'learning_rate': 0.00019196439477963556, 'epoch': 0.54}
{'loss': 1.1257, 'learning_rate': 0.00019191917035876798, 'epoch': 0.54}
{'loss': 1.2465, 'learning_rate': 0.00019187382439085586, 'epoch': 0.54}
{'loss': 1.0907, 'learning_rate': 0.00019182835693586127, 'epoch': 0.54}
{'loss': 0.9928, 'learning_rate': 0.00019178276805390703, 'epoch': 0.55}
{'loss': 1.099, 'learning_rate': 0.00019173705780527642, 'epoch': 0.55}
{'loss': 1.1329, 'learning_rate': 0.00019169122625041328, 'epoch': 0.55}
{'loss': 1.0504, 'learning_rate': 0.00019164527344992186, 'epoch': 0.55}
{'loss': 1.1264, 'learning_rate': 0.00019159919946456667, 'epoch': 0.55}
{'loss': 1.051, 'learning_rate': 0.00019155300435527256, 'epoch': 0.55}
{'loss': 1.0525, 'learning_rate': 0.0001915066881831244, 'epoch': 0.55}
{'loss': 1.0701, 'learning_rate': 0.00019146025100936733, 'epoch': 0.56}
{'loss': 0.9844, 'learning_rate': 0.00019141369289540637, 'epoch': 0.56}
{'loss': 1.1251, 'learning_rate': 0.00019136701390280644, 'epoch': 0.56}
{'loss': 1.044, 'learning_rate': 0.00019132021409329242, 'epoch': 0.56}
{'loss': 1.0764, 'learning_rate': 0.00019127329352874886, 'epoch': 0.56}
{'loss': 1.0345, 'learning_rate': 0.00019122625227122002, 'epoch': 0.56}
{'loss': 1.1272, 'learning_rate': 0.00019117909038290974, 'epoch': 0.56}
{'loss': 1.2045, 'learning_rate': 0.00019113180792618132, 'epoch': 0.57}
{'loss': 1.0651, 'learning_rate': 0.00019108440496355767, 'epoch': 0.57}
{'loss': 1.0704, 'learning_rate': 0.00019103688155772082, 'epoch': 0.57}
{'loss': 1.0859, 'learning_rate': 0.00019098923777151222, 'epoch': 0.57}
{'loss': 1.0957, 'learning_rate': 0.00019094147366793243, 'epoch': 0.57}
{'loss': 1.134, 'learning_rate': 0.00019089358931014114, 'epoch': 0.57}
{'loss': 1.1122, 'learning_rate': 0.00019084558476145706, 'epoch': 0.57}
{'loss': 1.1211, 'learning_rate': 0.00019079746008535784, 'epoch': 0.58}
{'loss': 1.0487, 'learning_rate': 0.0001907492153454799, 'epoch': 0.58}
{'loss': 1.0375, 'learning_rate': 0.00019070085060561852, 'epoch': 0.58}
{'loss': 1.0904, 'learning_rate': 0.0001906523659297276, 'epoch': 0.58}
{'loss': 1.1081, 'learning_rate': 0.0001906037613819197, 'epoch': 0.58}
{'loss': 1.0338, 'learning_rate': 0.00019055503702646576, 'epoch': 0.58}
{'loss': 1.1025, 'learning_rate': 0.0001905061929277953, 'epoch': 0.58}
{'loss': 1.0348, 'learning_rate': 0.00019045722915049607, 'epoch': 0.59}
{'loss': 1.1084, 'learning_rate': 0.00019040814575931413, 'epoch': 0.59}
{'loss': 1.0338, 'learning_rate': 0.00019035894281915368, 'epoch': 0.59}
{'loss': 1.0363, 'learning_rate': 0.00019030962039507702, 'epoch': 0.59}
{'loss': 1.0279, 'learning_rate': 0.00019026017855230444, 'epoch': 0.59}
{'loss': 1.0918, 'learning_rate': 0.00019021061735621412, 'epoch': 0.59}
{'loss': 1.0521, 'learning_rate': 0.0001901609368723421, 'epoch': 0.59}
{'loss': 1.0885, 'learning_rate': 0.00019011113716638217, 'epoch': 0.6}
{'loss': 1.1895, 'learning_rate': 0.00019006121830418565, 'epoch': 0.6}
{'loss': 1.226, 'learning_rate': 0.00019001118035176162, 'epoch': 0.6}
{'loss': 1.018, 'learning_rate': 0.00018996102337527648, 'epoch': 0.6}
{'loss': 1.1071, 'learning_rate': 0.00018991074744105407, 'epoch': 0.6}
{'loss': 1.0587, 'learning_rate': 0.00018986035261557552, 'epoch': 0.6}
{'loss': 1.0012, 'learning_rate': 0.0001898098389654792, 'epoch': 0.6}
{'loss': 1.0688, 'learning_rate': 0.0001897592065575606, 'epoch': 0.61}
{'loss': 1.0061, 'learning_rate': 0.0001897084554587722, 'epoch': 0.61}
{'loss': 1.1045, 'learning_rate': 0.0001896575857362235, 'epoch': 0.61}
{'loss': 1.2132, 'learning_rate': 0.0001896065974571808, 'epoch': 0.61}
{'loss': 1.174, 'learning_rate': 0.00018955549068906717, 'epoch': 0.61}
{'loss': 1.0624, 'learning_rate': 0.0001895042654994624, 'epoch': 0.61}
{'loss': 1.1386, 'learning_rate': 0.00018945292195610288, 'epoch': 0.61}
{'loss': 1.0884, 'learning_rate': 0.00018940146012688146, 'epoch': 0.62}
{'loss': 1.0157, 'learning_rate': 0.0001893498800798474, 'epoch': 0.62}
{'loss': 1.0452, 'learning_rate': 0.00018929818188320635, 'epoch': 0.62}
{'loss': 1.1981, 'learning_rate': 0.00018924636560532006, 'epoch': 0.62}
{'loss': 1.0887, 'learning_rate': 0.00018919443131470658, 'epoch': 0.62}
{'loss': 1.0295, 'learning_rate': 0.0001891423790800399, 'epoch': 0.62}
{'loss': 1.0788, 'learning_rate': 0.00018909020897015004, 'epoch': 0.62}
{'loss': 1.0976, 'learning_rate': 0.00018903792105402282, 'epoch': 0.63}
{'loss': 1.0467, 'learning_rate': 0.00018898551540079989, 'epoch': 0.63}
{'loss': 1.0741, 'learning_rate': 0.00018893299207977857, 'epoch': 0.63}
{'loss': 1.1081, 'learning_rate': 0.00018888035116041175, 'epoch': 0.63}
{'loss': 1.1227, 'learning_rate': 0.0001888275927123079, 'epoch': 0.63}
{'loss': 1.0104, 'learning_rate': 0.00018877471680523082, 'epoch': 0.63}
{'loss': 1.1211, 'learning_rate': 0.00018872172350909968, 'epoch': 0.64}
{'loss': 1.0695, 'learning_rate': 0.00018866861289398883, 'epoch': 0.64}
{'loss': 1.0582, 'learning_rate': 0.0001886153850301278, 'epoch': 0.64}
{'loss': 1.0773, 'learning_rate': 0.00018856203998790112, 'epoch': 0.64}
{'loss': 1.0354, 'learning_rate': 0.0001885085778378483, 'epoch': 0.64}
{'loss': 1.0898, 'learning_rate': 0.00018845499865066372, 'epoch': 0.64}
{'loss': 1.1125, 'learning_rate': 0.00018840130249719644, 'epoch': 0.64}
{'loss': 1.0007, 'learning_rate': 0.00018834748944845028, 'epoch': 0.65}
{'loss': 1.0058, 'learning_rate': 0.00018829355957558362, 'epoch': 0.65}
{'loss': 1.1075, 'learning_rate': 0.00018823951294990923, 'epoch': 0.65}
{'loss': 1.1168, 'learning_rate': 0.0001881853496428944, 'epoch': 0.65}
{'loss': 1.0664, 'learning_rate': 0.00018813106972616055, 'epoch': 0.65}
{'loss': 1.0699, 'learning_rate': 0.00018807667327148345, 'epoch': 0.65}
{'loss': 1.0609, 'learning_rate': 0.00018802216035079293, 'epoch': 0.65}
{'loss': 1.0355, 'learning_rate': 0.00018796753103617278, 'epoch': 0.66}
{'loss': 1.0838, 'learning_rate': 0.0001879127853998607, 'epoch': 0.66}
{'loss': 1.1292, 'learning_rate': 0.00018785792351424827, 'epoch': 0.66}
{'loss': 1.2015, 'learning_rate': 0.0001878029454518807, 'epoch': 0.66}
{'loss': 1.0179, 'learning_rate': 0.00018774785128545694, 'epoch': 0.66}
{'loss': 1.0756, 'learning_rate': 0.00018769264108782933, 'epoch': 0.66}
{'loss': 1.0685, 'learning_rate': 0.00018763731493200375, 'epoch': 0.66}
{'loss': 1.1266, 'learning_rate': 0.00018758187289113937, 'epoch': 0.67}
{'loss': 1.0647, 'learning_rate': 0.00018752631503854864, 'epoch': 0.67}
{'loss': 1.1856, 'learning_rate': 0.00018747064144769703, 'epoch': 0.67}
{'loss': 1.0502, 'learning_rate': 0.0001874148521922032, 'epoch': 0.67}
{'loss': 1.0987, 'learning_rate': 0.00018735894734583867, 'epoch': 0.67}
{'loss': 0.9963, 'learning_rate': 0.00018730292698252785, 'epoch': 0.67}
{'loss': 1.1398, 'learning_rate': 0.0001872467911763479, 'epoch': 0.67}
{'loss': 1.0785, 'learning_rate': 0.00018719054000152855, 'epoch': 0.68}
{'loss': 1.0632, 'learning_rate': 0.00018713417353245223, 'epoch': 0.68}
{'loss': 1.0651, 'learning_rate': 0.00018707769184365367, 'epoch': 0.68}
{'loss': 1.0689, 'learning_rate': 0.0001870210950098201, 'epoch': 0.68}
{'loss': 1.0895, 'learning_rate': 0.00018696438310579093, 'epoch': 0.68}
{'loss': 1.0064, 'learning_rate': 0.00018690755620655774, 'epoch': 0.68}
{'loss': 1.0768, 'learning_rate': 0.00018685061438726414, 'epoch': 0.68}
{'loss': 1.0823, 'learning_rate': 0.00018679355772320585, 'epoch': 0.69}
{'loss': 1.0314, 'learning_rate': 0.00018673638628983018, 'epoch': 0.69}
{'loss': 1.0868, 'learning_rate': 0.00018667910016273648, 'epoch': 0.69}
{'loss': 1.0713, 'learning_rate': 0.00018662169941767562, 'epoch': 0.69}
{'loss': 1.1126, 'learning_rate': 0.00018656418413055007, 'epoch': 0.69}
{'loss': 1.1445, 'learning_rate': 0.00018650655437741368, 'epoch': 0.69}
{'loss': 1.0327, 'learning_rate': 0.00018644881023447177, 'epoch': 0.69}
{'loss': 1.0334, 'learning_rate': 0.00018639095177808095, 'epoch': 0.7}
{'loss': 1.0707, 'learning_rate': 0.0001863329790847488, 'epoch': 0.7}
{'loss': 1.1176, 'learning_rate': 0.00018627489223113422, 'epoch': 0.7}
{'loss': 0.9961, 'learning_rate': 0.0001862166912940468, 'epoch': 0.7}
{'loss': 1.1016, 'learning_rate': 0.00018615837635044716, 'epoch': 0.7}
{'loss': 1.1082, 'learning_rate': 0.0001860999474774466, 'epoch': 0.7}
{'loss': 1.1234, 'learning_rate': 0.00018604140475230715, 'epoch': 0.7}
{'loss': 1.0627, 'learning_rate': 0.0001859827482524413, 'epoch': 0.71}
{'loss': 1.0814, 'learning_rate': 0.00018592397805541205, 'epoch': 0.71}
{'loss': 1.0698, 'learning_rate': 0.00018586509423893267, 'epoch': 0.71}
{'loss': 1.0616, 'learning_rate': 0.00018580609688086678, 'epoch': 0.71}
{'loss': 1.0523, 'learning_rate': 0.000185746986059228, 'epoch': 0.71}
{'loss': 1.1709, 'learning_rate': 0.00018568776185218016, 'epoch': 0.71}
{'loss': 1.0101, 'learning_rate': 0.00018562842433803687, 'epoch': 0.72}
{'loss': 1.178, 'learning_rate': 0.00018556897359526162, 'epoch': 0.72}
{'loss': 1.0737, 'learning_rate': 0.0001855094097024677, 'epoch': 0.72}
{'loss': 1.0922, 'learning_rate': 0.00018544973273841784, 'epoch': 0.72}
{'loss': 1.088, 'learning_rate': 0.00018538994278202448, 'epoch': 0.72}
{'loss': 1.0477, 'learning_rate': 0.00018533003991234937, 'epoch': 0.72}
{'loss': 1.0246, 'learning_rate': 0.00018527002420860362, 'epoch': 0.72}
{'loss': 1.0326, 'learning_rate': 0.00018520989575014746, 'epoch': 0.73}
{'loss': 1.0096, 'learning_rate': 0.0001851496546164903, 'epoch': 0.73}
{'loss': 1.1709, 'learning_rate': 0.00018508930088729052, 'epoch': 0.73}
{'loss': 1.0801, 'learning_rate': 0.0001850288346423554, 'epoch': 0.73}
{'loss': 1.0792, 'learning_rate': 0.00018496825596164094, 'epoch': 0.73}
{'loss': 0.9567, 'learning_rate': 0.00018490756492525187, 'epoch': 0.73}
{'loss': 1.0395, 'learning_rate': 0.0001848467616134415, 'epoch': 0.73}
{'loss': 1.0034, 'learning_rate': 0.0001847858461066116, 'epoch': 0.74}
{'loss': 1.1287, 'learning_rate': 0.00018472481848531226, 'epoch': 0.74}
{'loss': 1.0392, 'learning_rate': 0.00018466367883024186, 'epoch': 0.74}
{'loss': 1.0144, 'learning_rate': 0.00018460242722224694, 'epoch': 0.74}
{'loss': 1.1198, 'learning_rate': 0.00018454106374232197, 'epoch': 0.74}
{'loss': 1.0975, 'learning_rate': 0.00018447958847160953, 'epoch': 0.74}
{'loss': 1.0869, 'learning_rate': 0.00018441800149139988, 'epoch': 0.74}
{'loss': 0.9778, 'learning_rate': 0.000184356302883131, 'epoch': 0.75}
{'loss': 1.0177, 'learning_rate': 0.0001842944927283886, 'epoch': 0.75}
{'loss': 1.0183, 'learning_rate': 0.00018423257110890574, 'epoch': 0.75}
{'loss': 1.11, 'learning_rate': 0.00018417053810656302, 'epoch': 0.75}
 19%|█████████████████████▏ | 516/2752 [08:53<37:43, 1.01s/it]
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,532] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,532] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,532] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,786] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,786] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,787] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,787] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,048] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,049] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,309] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,310] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,558] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,559] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,844] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,845] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,106] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,107] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,627] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,628] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,885] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,885] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,154] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,155] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,419] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,420] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,685] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0167537927627563, 'eval_runtime': 3.1655, 'eval_samples_per_second': 344.974, 'eval_steps_per_second': 21.798, 'epoch': 0.75}
{'loss': 1.0633, 'learning_rate': 0.00018410839380338817, 'epoch': 0.75}
{'loss': 1.0491, 'learning_rate': 0.00018404613828155623, 'epoch': 0.75}
{'loss': 1.0262, 'learning_rate': 0.00018398377162338924, 'epoch': 0.75}
{'loss': 1.077, 'learning_rate': 0.0001839212939113562, 'epoch': 0.76}
{'loss': 0.9969, 'learning_rate': 0.00018385870522807295, 'epoch': 0.76}
{'loss': 1.1163, 'learning_rate': 0.00018379600565630213, 'epoch': 0.76}
{'loss': 1.1191, 'learning_rate': 0.00018373319527895294, 'epoch': 0.76}
{'loss': 1.0903, 'learning_rate': 0.00018367027417908117, 'epoch': 0.76}
{'loss': 1.1652, 'learning_rate': 0.0001836072424398889, 'epoch': 0.76}
{'loss': 1.0696, 'learning_rate': 0.0001835441001447247, 'epoch': 0.76}
{'loss': 1.1756, 'learning_rate': 0.0001834808473770832, 'epoch': 0.77}
{'loss': 1.0519, 'learning_rate': 0.00018341748422060503, 'epoch': 0.77}
{'loss': 1.125, 'learning_rate': 0.000183354010759077, 'epoch': 0.77}
{'loss': 0.9652, 'learning_rate': 0.00018329042707643164, 'epoch': 0.77}
{'loss': 1.191, 'learning_rate': 0.0001832267332567473, 'epoch': 0.77}
{'loss': 1.0471, 'learning_rate': 0.00018316292938424787, 'epoch': 0.77}
{'loss': 1.204, 'learning_rate': 0.00018309901554330288, 'epoch': 0.77}
{'loss': 1.1012, 'learning_rate': 0.0001830349918184272, 'epoch': 0.78}
{'loss': 1.0106, 'learning_rate': 0.000182970858294281, 'epoch': 0.78}
{'loss': 1.0811, 'learning_rate': 0.00018290661505566963, 'epoch': 0.78}
{'loss': 1.1186, 'learning_rate': 0.00018284226218754363, 'epoch': 0.78}
{'loss': 1.0943, 'learning_rate': 0.0001827777997749984, 'epoch': 0.78}
{'loss': 1.0597, 'learning_rate': 0.0001827132279032742, 'epoch': 0.78}
{'loss': 1.036, 'learning_rate': 0.00018264854665775605, 'epoch': 0.78}
{'loss': 1.0876, 'learning_rate': 0.00018258375612397365, 'epoch': 0.79}
{'loss': 1.0605, 'learning_rate': 0.00018251885638760105, 'epoch': 0.79}
{'loss': 1.1032, 'learning_rate': 0.00018245384753445693, 'epoch': 0.79}
{'loss': 1.1089, 'learning_rate': 0.0001823887296505041, 'epoch': 0.79}
{'loss': 1.0279, 'learning_rate': 0.00018232350282184956, 'epoch': 0.79}
{'loss': 1.0117, 'learning_rate': 0.0001822581671347444, 'epoch': 0.79}
{'loss': 1.1945, 'learning_rate': 0.0001821927226755837, 'epoch': 0.8}
{'loss': 1.1076, 'learning_rate': 0.00018212716953090624, 'epoch': 0.8}
{'loss': 1.0742, 'learning_rate': 0.0001820615077873947, 'epoch': 0.8}
{'loss': 1.1111, 'learning_rate': 0.0001819957375318752, 'epoch': 0.8}
{'loss': 1.1079, 'learning_rate': 0.00018192985885131743, 'epoch': 0.8}
{'loss': 1.1471, 'learning_rate': 0.00018186387183283443, 'epoch': 0.8}
{'loss': 1.0644, 'learning_rate': 0.00018179777656368253, 'epoch': 0.8}
{'loss': 0.9979, 'learning_rate': 0.00018173157313126114, 'epoch': 0.81}
{'loss': 1.101, 'learning_rate': 0.00018166526162311276, 'epoch': 0.81}
{'loss': 1.0195, 'learning_rate': 0.00018159884212692274, 'epoch': 0.81}
{'loss': 1.0995, 'learning_rate': 0.00018153231473051933, 'epoch': 0.81}
{'loss': 1.0479, 'learning_rate': 0.00018146567952187333, 'epoch': 0.81}
{'loss': 1.0001, 'learning_rate': 0.00018139893658909817, 'epoch': 0.81}
{'loss': 1.0848, 'learning_rate': 0.00018133208602044972, 'epoch': 0.81}
{'loss': 1.0957, 'learning_rate': 0.0001812651279043262, 'epoch': 0.82}
{'loss': 1.0671, 'learning_rate': 0.000181198062329268, 'epoch': 0.82}
{'loss': 1.0095, 'learning_rate': 0.00018113088938395762, 'epoch': 0.82}
{'loss': 1.046, 'learning_rate': 0.00018106360915721956, 'epoch': 0.82}
{'loss': 1.1351, 'learning_rate': 0.00018099622173802014, 'epoch': 0.82}
{'loss': 1.0051, 'learning_rate': 0.00018092872721546754, 'epoch': 0.82}
{'loss': 1.1066, 'learning_rate': 0.00018086112567881137, 'epoch': 0.82}
{'loss': 1.0311, 'learning_rate': 0.0001807934172174429, 'epoch': 0.83}
{'loss': 1.0088, 'learning_rate': 0.0001807256019208947, 'epoch': 0.83}
{'loss': 0.981, 'learning_rate': 0.00018065767987884073, 'epoch': 0.83}
{'loss': 1.0371, 'learning_rate': 0.00018058965118109593, 'epoch': 0.83}
{'loss': 1.1011, 'learning_rate': 0.00018052151591761644, 'epoch': 0.83}
{'loss': 1.0709, 'learning_rate': 0.00018045327417849923, 'epoch': 0.83}
{'loss': 1.075, 'learning_rate': 0.00018038492605398205, 'epoch': 0.83}
{'loss': 1.0475, 'learning_rate': 0.00018031647163444339, 'epoch': 0.84}
{'loss': 1.0734, 'learning_rate': 0.0001802479110104022, 'epoch': 0.84}
{'loss': 1.0387, 'learning_rate': 0.000180179244272518, 'epoch': 0.84}
{'loss': 1.0572, 'learning_rate': 0.00018011047151159052, 'epoch': 0.84}
{'loss': 1.0918, 'learning_rate': 0.00018004159281855974, 'epoch': 0.84}
{'loss': 1.1378, 'learning_rate': 0.0001799726082845057, 'epoch': 0.84}
{'loss': 1.0488, 'learning_rate': 0.00017990351800064834, 'epoch': 0.84}
{'loss': 1.0926, 'learning_rate': 0.00017983432205834755, 'epoch': 0.85}
{'loss': 1.1045, 'learning_rate': 0.00017976502054910286, 'epoch': 0.85}
{'loss': 1.0701, 'learning_rate': 0.00017969561356455336, 'epoch': 0.85}
{'loss': 1.091, 'learning_rate': 0.00017962610119647777, 'epoch': 0.85}
{'loss': 1.0587, 'learning_rate': 0.00017955648353679398, 'epoch': 0.85}
{'loss': 1.1946, 'learning_rate': 0.00017948676067755916, 'epoch': 0.85}
{'loss': 1.1324, 'learning_rate': 0.00017941693271096966, 'epoch': 0.85}
{'loss': 1.0977, 'learning_rate': 0.00017934699972936075, 'epoch': 0.86}
{'loss': 1.1361, 'learning_rate': 0.00017927696182520658, 'epoch': 0.86}
{'loss': 1.0616, 'learning_rate': 0.00017920681909112008, 'epoch': 0.86}
{'loss': 1.0506, 'learning_rate': 0.00017913657161985268, 'epoch': 0.86}
{'loss': 1.0772, 'learning_rate': 0.00017906621950429443, 'epoch': 0.86}
{'loss': 1.1548, 'learning_rate': 0.00017899576283747373, 'epoch': 0.86}
{'loss': 1.0599, 'learning_rate': 0.0001789252017125572, 'epoch': 0.86}
{'loss': 1.0945, 'learning_rate': 0.0001788545362228496, 'epoch': 0.87}
{'loss': 1.0675, 'learning_rate': 0.00017878376646179368, 'epoch': 0.87}
{'loss': 1.0966, 'learning_rate': 0.00017871289252297011, 'epoch': 0.87}
{'loss': 1.1361, 'learning_rate': 0.0001786419145000973, 'epoch': 0.87}
{'loss': 1.0963, 'learning_rate': 0.00017857083248703126, 'epoch': 0.87}
{'loss': 1.0654, 'learning_rate': 0.00017849964657776552, 'epoch': 0.87}
{'loss': 0.9858, 'learning_rate': 0.00017842835686643108, 'epoch': 0.88}
{'loss': 1.0749, 'learning_rate': 0.00017835696344729605, 'epoch': 0.88}
{'loss': 1.0994, 'learning_rate': 0.00017828546641476578, 'epoch': 0.88}
{'loss': 1.0596, 'learning_rate': 0.0001782138658633826, 'epoch': 0.88}
{'loss': 0.9941, 'learning_rate': 0.00017814216188782577, 'epoch': 0.88}
{'loss': 1.0438, 'learning_rate': 0.00017807035458291122, 'epoch': 0.88}
{'loss': 1.1233, 'learning_rate': 0.0001779984440435916, 'epoch': 0.88}
{'loss': 1.0819, 'learning_rate': 0.000177926430364956, 'epoch': 0.89}
{'loss': 1.1729, 'learning_rate': 0.00017785431364222997, 'epoch': 0.89}
{'loss': 1.0372, 'learning_rate': 0.00017778209397077528, 'epoch': 0.89}
{'loss': 1.0326, 'learning_rate': 0.00017770977144608978, 'epoch': 0.89}
{'loss': 1.0828, 'learning_rate': 0.0001776373461638074, 'epoch': 0.89}
{'loss': 1.0522, 'learning_rate': 0.00017756481821969798, 'epoch': 0.89}
{'loss': 1.0151, 'learning_rate': 0.00017749218770966692, 'epoch': 0.89}
{'loss': 1.1541, 'learning_rate': 0.0001774194547297555, 'epoch': 0.9}
{'loss': 1.0455, 'learning_rate': 0.00017734661937614035, 'epoch': 0.9}
{'loss': 1.0421, 'learning_rate': 0.00017727368174513347, 'epoch': 0.9}
{'loss': 1.0451, 'learning_rate': 0.0001772006419331822, 'epoch': 0.9}
{'loss': 1.169, 'learning_rate': 0.00017712750003686883, 'epoch': 0.9}
{'loss': 1.1022, 'learning_rate': 0.00017705425615291084, 'epoch': 0.9}
{'loss': 1.0964, 'learning_rate': 0.00017698091037816042, 'epoch': 0.9}
{'loss': 1.0322, 'learning_rate': 0.00017690746280960454, 'epoch': 0.91}
{'loss': 1.0325, 'learning_rate': 0.0001768339135443648, 'epoch': 0.91}
{'loss': 1.0452, 'learning_rate': 0.00017676026267969728, 'epoch': 0.91}
{'loss': 1.0745, 'learning_rate': 0.0001766865103129923, 'epoch': 0.91}
{'loss': 1.1599, 'learning_rate': 0.00017661265654177454, 'epoch': 0.91}
{'loss': 0.9833, 'learning_rate': 0.00017653870146370267, 'epoch': 0.91}
{'loss': 1.0006, 'learning_rate': 0.00017646464517656943, 'epoch': 0.91}
{'loss': 1.0776, 'learning_rate': 0.0001763904877783013, 'epoch': 0.92}
{'loss': 1.011, 'learning_rate': 0.0001763162293669584, 'epoch': 0.92}
{'loss': 1.0316, 'learning_rate': 0.00017624187004073463, 'epoch': 0.92}
{'loss': 1.0778, 'learning_rate': 0.0001761674098979571, 'epoch': 0.92}
{'loss': 1.0309, 'learning_rate': 0.00017609284903708644, 'epoch': 0.92}
{'loss': 1.0083, 'learning_rate': 0.0001760181875567163, 'epoch': 0.92}
{'loss': 1.1088, 'learning_rate': 0.0001759434255555734, 'epoch': 0.92}
{'loss': 1.014, 'learning_rate': 0.00017586856313251756, 'epoch': 0.93}
{'loss': 1.0578, 'learning_rate': 0.00017579360038654114, 'epoch': 0.93}
{'loss': 1.0651, 'learning_rate': 0.00017571853741676932, 'epoch': 0.93}
{'loss': 1.1492, 'learning_rate': 0.00017564337432245976, 'epoch': 0.93}
{'loss': 1.0673, 'learning_rate': 0.00017556811120300253, 'epoch': 0.93}
{'loss': 1.0684, 'learning_rate': 0.00017549274815791994, 'epoch': 0.93}
{'loss': 1.0571, 'learning_rate': 0.00017541728528686645, 'epoch': 0.93}
{'loss': 1.0359, 'learning_rate': 0.00017534172268962852, 'epoch': 0.94}
{'loss': 1.1768, 'learning_rate': 0.00017526606046612452, 'epoch': 0.94}
{'loss': 1.0608, 'learning_rate': 0.0001751902987164045, 'epoch': 0.94}
{'loss': 1.0877, 'learning_rate': 0.00017511443754065012, 'epoch': 0.94}
{'loss': 1.0676, 'learning_rate': 0.00017503847703917455, 'epoch': 0.94}
{'loss': 1.073, 'learning_rate': 0.0001749624173124223, 'epoch': 0.94}
{'loss': 0.9852, 'learning_rate': 0.00017488625846096904, 'epoch': 0.94}
{'loss': 1.0669, 'learning_rate': 0.00017481000058552156, 'epoch': 0.95}
{'loss': 1.1724, 'learning_rate': 0.0001747336437869176, 'epoch': 0.95}
{'loss': 1.0379, 'learning_rate': 0.00017465718816612563, 'epoch': 0.95}
{'loss': 1.0634, 'learning_rate': 0.00017458063382424488, 'epoch': 0.95}
{'loss': 1.028, 'learning_rate': 0.00017450398086250513, 'epoch': 0.95}
{'loss': 1.0724, 'learning_rate': 0.00017442722938226647, 'epoch': 0.95}
{'loss': 1.0603, 'learning_rate': 0.00017435037948501935, 'epoch': 0.95}
{'loss': 1.0566, 'learning_rate': 0.00017427343127238439, 'epoch': 0.96}
{'loss': 0.9961, 'learning_rate': 0.00017419638484611206, 'epoch': 0.96}
{'loss': 1.0443, 'learning_rate': 0.00017411924030808284, 'epoch': 0.96}
{'loss': 1.0428, 'learning_rate': 0.0001740419977603069, 'epoch': 0.96}
{'loss': 1.0352, 'learning_rate': 0.00017396465730492406, 'epoch': 0.96}
{'loss': 1.0107, 'learning_rate': 0.00017388721904420352, 'epoch': 0.96}
{'loss': 1.0954, 'learning_rate': 0.00017380968308054385, 'epoch': 0.97}
{'loss': 1.129, 'learning_rate': 0.0001737320495164728, 'epoch': 0.97}
{'loss': 1.0696, 'learning_rate': 0.00017365431845464723, 'epoch': 0.97}
{'loss': 1.034, 'learning_rate': 0.0001735764899978529, 'epoch': 0.97}
{'loss': 1.0322, 'learning_rate': 0.0001734985642490043, 'epoch': 0.97}
{'loss': 1.0417, 'learning_rate': 0.00017342054131114465, 'epoch': 0.97}
{'loss': 1.0695, 'learning_rate': 0.00017334242128744568, 'epoch': 0.97}
{'loss': 1.1169, 'learning_rate': 0.0001732642042812074, 'epoch': 0.98}
{'loss': 0.9709, 'learning_rate': 0.00017318589039585816, 'epoch': 0.98}
{'loss': 0.9522, 'learning_rate': 0.00017310747973495446, 'epoch': 0.98}
{'loss': 1.0343, 'learning_rate': 0.00017302897240218065, 'epoch': 0.98}
{'loss': 1.0656, 'learning_rate': 0.00017295036850134893, 'epoch': 0.98}
{'loss': 1.1269, 'learning_rate': 0.0001728716681363993, 'epoch': 0.98}
{'loss': 1.0884, 'learning_rate': 0.00017279287141139918, 'epoch': 0.98}
{'loss': 0.979, 'learning_rate': 0.00017271397843054352, 'epoch': 0.99}
{'loss': 1.0008, 'learning_rate': 0.00017263498929815448, 'epoch': 0.99}
{'loss': 1.0743, 'learning_rate': 0.00017255590411868136, 'epoch': 0.99}
{'loss': 1.0851, 'learning_rate': 0.00017247672299670053, 'epoch': 0.99}
{'loss': 1.0137, 'learning_rate': 0.00017239744603691524, 'epoch': 0.99}
{'loss': 1.0706, 'learning_rate': 0.00017231807334415532, 'epoch': 0.99}
{'loss': 1.1174, 'learning_rate': 0.00017223860502337733, 'epoch': 0.99}
{'loss': 0.9967, 'learning_rate': 0.00017215904117966427, 'epoch': 1.0}
{'loss': 0.9764, 'learning_rate': 0.0001720793819182254, 'epoch': 1.0}
{'loss': 1.021, 'learning_rate': 0.00017199962734439618, 'epoch': 1.0}
{'loss': 1.0658, 'learning_rate': 0.00017191977756363808, 'epoch': 1.0}
 25%|████████████████████████████▎ | 688/2752 [11:50<34:55, 1.02s/it]
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,273] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,274] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,274] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,275] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,799] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,800] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,052] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,053] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,342] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,342] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,609] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,610] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,866] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,866] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,135] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,135] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,399] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,400] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,669] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,670] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:19,197] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:19,198] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0015047788619995, 'eval_runtime': 3.191, 'eval_samples_per_second': 342.217, 'eval_steps_per_second': 21.624, 'epoch': 1.0}
{'loss': 1.153, 'learning_rate': 0.00017183983268153849, 'epoch': 1.0}
{'loss': 1.1034, 'learning_rate': 0.00017175979280381056, 'epoch': 1.0}
{'loss': 1.1087, 'learning_rate': 0.00017167965803629307, 'epoch': 1.0}
{'loss': 1.072, 'learning_rate': 0.00017159942848495025, 'epoch': 1.01}
{'loss': 1.0972, 'learning_rate': 0.00017151910425587162, 'epoch': 1.01}
{'loss': 1.0909, 'learning_rate': 0.00017143868545527196, 'epoch': 1.01}
{'loss': 1.1009, 'learning_rate': 0.00017135817218949108, 'epoch': 1.01}
{'loss': 1.1206, 'learning_rate': 0.00017127756456499372, 'epoch': 1.01}
{'loss': 1.0852, 'learning_rate': 0.0001711968626883694, 'epoch': 1.01}
{'loss': 1.0235, 'learning_rate': 0.00017111606666633225, 'epoch': 1.01}
{'loss': 1.1133, 'learning_rate': 0.00017103517660572087, 'epoch': 1.02}
{'loss': 1.0687, 'learning_rate': 0.0001709541926134982, 'epoch': 1.02}
{'loss': 1.1827, 'learning_rate': 0.00017087311479675147, 'epoch': 1.02}
{'loss': 1.0509, 'learning_rate': 0.00017079194326269194, 'epoch': 1.02}
{'loss': 1.101, 'learning_rate': 0.00017071067811865476, 'epoch': 1.02}
 26%|████████████████████████████▊ | 703/2752 [12:09<34:57, 1.02s/it]
[2023-12-29 02:15:34,683] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,685] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,686] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,715] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,716] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,719] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,945] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
{'loss': 1.0668, 'learning_rate': 0.0001706293194720989, 'epoch': 1.0}
{'loss': 0.9812, 'learning_rate': 0.00017054786743060692, 'epoch': 1.0}
{'loss': 1.1106, 'learning_rate': 0.00017046632210188496, 'epoch': 1.0}
{'loss': 0.9909, 'learning_rate': 0.00017038468359376245, 'epoch': 1.01}
{'loss': 1.074, 'learning_rate': 0.00017030295201419206, 'epoch': 1.01}
{'loss': 1.1153, 'learning_rate': 0.0001702211274712495, 'epoch': 1.01}
{'loss': 0.9878, 'learning_rate': 0.00017013921007313348, 'epoch': 1.01}
{'loss': 1.0902, 'learning_rate': 0.00017005719992816546, 'epoch': 1.01}
{'loss': 1.0829, 'learning_rate': 0.00016997509714478944, 'epoch': 1.01}
{'loss': 1.0556, 'learning_rate': 0.00016989290183157206, 'epoch': 1.01}
{'loss': 1.0264, 'learning_rate': 0.0001698106140972023, 'epoch': 1.02}
{'loss': 1.0451, 'learning_rate': 0.00016972823405049124, 'epoch': 1.02}
{'loss': 1.194, 'learning_rate': 0.00016964576180037217, 'epoch': 1.02}
{'loss': 1.0291, 'learning_rate': 0.00016956319745590017, 'epoch': 1.02}
{'loss': 1.0089, 'learning_rate': 0.00016948054112625222, 'epoch': 1.02}
{'loss': 1.0672, 'learning_rate': 0.00016939779292072683, 'epoch': 1.02}
{'loss': 1.1044, 'learning_rate': 0.00016931495294874408, 'epoch': 1.02}
{'loss': 1.0185, 'learning_rate': 0.0001692320213198453, 'epoch': 1.03}
{'loss': 1.0347, 'learning_rate': 0.0001691489981436932, 'epoch': 1.03}
{'loss': 1.0306, 'learning_rate': 0.00016906588353007132, 'epoch': 1.03}
{'loss': 1.022, 'learning_rate': 0.00016898267758888423, 'epoch': 1.03}
{'loss': 1.0474, 'learning_rate': 0.00016889938043015726, 'epoch': 1.03}
{'loss': 1.1231, 'learning_rate': 0.0001688159921640364, 'epoch': 1.03}
{'loss': 1.0324, 'learning_rate': 0.000168732512900788, 'epoch': 1.03}
{'loss': 0.9465, 'learning_rate': 0.00016864894275079882, 'epoch': 1.04}
{'loss': 1.0117, 'learning_rate': 0.0001685652818245758, 'epoch': 1.04}
{'loss': 1.0863, 'learning_rate': 0.0001684815302327459, 'epoch': 1.04}
{'loss': 1.0413, 'learning_rate': 0.00016839768808605594, 'epoch': 1.04}
{'loss': 1.1089, 'learning_rate': 0.00016831375549537252, 'epoch': 1.04}
{'loss': 0.9819, 'learning_rate': 0.00016822973257168186, 'epoch': 1.04}
{'loss': 1.0439, 'learning_rate': 0.00016814561942608957, 'epoch': 1.05}
{'loss': 1.0662, 'learning_rate': 0.00016806141616982059, 'epoch': 1.05}
{'loss': 0.9847, 'learning_rate': 0.00016797712291421904, 'epoch': 1.05}
{'loss': 1.0726, 'learning_rate': 0.00016789273977074797, 'epoch': 1.05}
{'loss': 1.0325, 'learning_rate': 0.00016780826685098942, 'epoch': 1.05}
{'loss': 1.073, 'learning_rate': 0.00016772370426664402, 'epoch': 1.05}
{'loss': 1.0774, 'learning_rate': 0.00016763905212953102, 'epoch': 1.05}
{'loss': 1.0768, 'learning_rate': 0.00016755431055158807, 'epoch': 1.06}
{'loss': 1.0674, 'learning_rate': 0.00016746947964487116, 'epoch': 1.06}
{'loss': 0.9796, 'learning_rate': 0.0001673845595215543, 'epoch': 1.06}
{'loss': 1.029, 'learning_rate': 0.0001672995502939295, 'epoch': 1.06}
{'loss': 1.0137, 'learning_rate': 0.00016721445207440664, 'epoch': 1.06}
{'loss': 1.0265, 'learning_rate': 0.00016712926497551326, 'epoch': 1.06}
{'loss': 1.1064, 'learning_rate': 0.0001670439891098944, 'epoch': 1.06}
{'loss': 1.0185, 'learning_rate': 0.00016695862459031248, 'epoch': 1.07}
{'loss': 1.027, 'learning_rate': 0.00016687317152964718, 'epoch': 1.07}
{'loss': 1.1075, 'learning_rate': 0.00016678763004089527, 'epoch': 1.07}
{'loss': 1.02, 'learning_rate': 0.00016670200023717038, 'epoch': 1.07}
{'loss': 1.0088, 'learning_rate': 0.00016661628223170295, 'epoch': 1.07}
{'loss': 1.0402, 'learning_rate': 0.0001665304761378401, 'epoch': 1.07}
{'loss': 1.0554, 'learning_rate': 0.00016644458206904546, 'epoch': 1.07}
{'loss': 0.9861, 'learning_rate': 0.00016635860013889886, 'epoch': 1.08}
{'loss': 0.9996, 'learning_rate': 0.00016627253046109638, 'epoch': 1.08}
{'loss': 1.0255, 'learning_rate': 0.00016618637314945014, 'epoch': 1.08}
{'loss': 1.0184, 'learning_rate': 0.00016610012831788813, 'epoch': 1.08}
{'loss': 0.948, 'learning_rate': 0.00016601379608045406, 'epoch': 1.08}
{'loss': 0.9442, 'learning_rate': 0.0001659273765513073, 'epoch': 1.08}
{'loss': 1.0535, 'learning_rate': 0.0001658408698447225, 'epoch': 1.08}
{'loss': 1.0937, 'learning_rate': 0.00016575427607508974, 'epoch': 1.09}
{'loss': 0.9786, 'learning_rate': 0.00016566759535691406, 'epoch': 1.09}
{'loss': 1.0295, 'learning_rate': 0.00016558082780481563, 'epoch': 1.09}
{'loss': 1.0469, 'learning_rate': 0.00016549397353352938, 'epoch': 1.09}
{'loss': 1.0665, 'learning_rate': 0.0001654070326579049, 'epoch': 1.09}
{'loss': 1.0111, 'learning_rate': 0.0001653200052929063, 'epoch': 1.09}
{'loss': 1.0803, 'learning_rate': 0.00016523289155361204, 'epoch': 1.09}
{'loss': 0.9923, 'learning_rate': 0.00016514569155521493, 'epoch': 1.1}
{'loss': 1.013, 'learning_rate': 0.0001650584054130216, 'epoch': 1.1}
{'loss': 0.9378, 'learning_rate': 0.00016497103324245282, 'epoch': 1.1}
{'loss': 1.0145, 'learning_rate': 0.00016488357515904295, 'epoch': 1.1}
{'loss': 1.0332, 'learning_rate': 0.0001647960312784401, 'epoch': 1.1}
{'loss': 1.0591, 'learning_rate': 0.0001647084017164057, 'epoch': 1.1}
{'loss': 1.05, 'learning_rate': 0.00016462068658881456, 'epoch': 1.1}
{'loss': 0.9587, 'learning_rate': 0.0001645328860116546, 'epoch': 1.11}
{'loss': 1.0445, 'learning_rate': 0.00016444500010102676, 'epoch': 1.11}
{'loss': 0.9294, 'learning_rate': 0.00016435702897314478, 'epoch': 1.11}
{'loss': 0.9881, 'learning_rate': 0.00016426897274433513, 'epoch': 1.11}
{'loss': 1.0397, 'learning_rate': 0.00016418083153103683, 'epoch': 1.11}
{'loss': 0.8706, 'learning_rate': 0.00016409260544980115, 'epoch': 1.11}
{'loss': 1.0591, 'learning_rate': 0.0001640042946172917, 'epoch': 1.11}
{'loss': 1.0461, 'learning_rate': 0.00016391589915028417, 'epoch': 1.12}
{'loss': 1.0836, 'learning_rate': 0.0001638274191656661, 'epoch': 1.12}
{'loss': 1.0256, 'learning_rate': 0.00016373885478043672, 'epoch': 1.12}
{'loss': 0.9782, 'learning_rate': 0.00016365020611170712, 'epoch': 1.12}
{'loss': 1.024, 'learning_rate': 0.0001635614732766996, 'epoch': 1.12}
{'loss': 0.9892, 'learning_rate': 0.00016347265639274778, 'epoch': 1.12}
{'loss': 1.0796, 'learning_rate': 0.00016338375557729658, 'epoch': 1.12}
{'loss': 0.984, 'learning_rate': 0.00016329477094790168, 'epoch': 1.13}
{'loss': 1.0713, 'learning_rate': 0.00016320570262222983, 'epoch': 1.13}
{'loss': 1.0742, 'learning_rate': 0.00016311655071805822, 'epoch': 1.13}
{'loss': 1.0199, 'learning_rate': 0.00016302731535327474, 'epoch': 1.13}
{'loss': 1.1462, 'learning_rate': 0.00016293799664587755, 'epoch': 1.13}
{'loss': 1.1392, 'learning_rate': 0.00016284859471397503, 'epoch': 1.13}
{'loss': 1.0094, 'learning_rate': 0.00016275910967578558, 'epoch': 1.14}
{'loss': 1.0022, 'learning_rate': 0.00016266954164963763, 'epoch': 1.14}
{'loss': 1.0915, 'learning_rate': 0.00016257989075396916, 'epoch': 1.14}
{'loss': 0.9938, 'learning_rate': 0.00016249015710732785, 'epoch': 1.14}
{'loss': 1.0792, 'learning_rate': 0.00016240034082837078, 'epoch': 1.14}
{'loss': 0.9668, 'learning_rate': 0.00016231044203586422, 'epoch': 1.14}
{'loss': 1.0253, 'learning_rate': 0.00016222046084868373, 'epoch': 1.14}
{'loss': 0.9409, 'learning_rate': 0.00016213039738581362, 'epoch': 1.15}
{'loss': 1.0344, 'learning_rate': 0.00016204025176634712, 'epoch': 1.15}
{'loss': 1.0437, 'learning_rate': 0.0001619500241094861, 'epoch': 1.15}
{'loss': 1.0711, 'learning_rate': 0.00016185971453454078, 'epoch': 1.15}
{'loss': 0.9458, 'learning_rate': 0.0001617693231609299, 'epoch': 1.15}
{'loss': 1.0187, 'learning_rate': 0.00016167885010818017, 'epoch': 1.15}
{'loss': 0.9752, 'learning_rate': 0.00016158829549592647, 'epoch': 1.15}
{'loss': 0.9931, 'learning_rate': 0.0001614976594439114, 'epoch': 1.16}
{'loss': 1.0036, 'learning_rate': 0.00016140694207198534, 'epoch': 1.16}
{'loss': 1.0178, 'learning_rate': 0.00016131614350010614, 'epoch': 1.16}
{'loss': 0.9627, 'learning_rate': 0.00016122526384833907, 'epoch': 1.16}
{'loss': 1.0381, 'learning_rate': 0.00016113430323685658, 'epoch': 1.16}
{'loss': 0.9023, 'learning_rate': 0.00016104326178593818, 'epoch': 1.16}
{'loss': 1.068, 'learning_rate': 0.00016095213961597033, 'epoch': 1.16}
{'loss': 0.9922, 'learning_rate': 0.0001608609368474461, 'epoch': 1.17}
{'loss': 0.9681, 'learning_rate': 0.00016076965360096535, 'epoch': 1.17}
{'loss': 0.9458, 'learning_rate': 0.00016067828999723405, 'epoch': 1.17}
{'loss': 1.0536, 'learning_rate': 0.00016058684615706477, 'epoch': 1.17}
{'loss': 1.0079, 'learning_rate': 0.0001604953222013759, 'epoch': 1.17}
{'loss': 1.0715, 'learning_rate': 0.0001604037182511919, 'epoch': 1.17}
{'loss': 1.0338, 'learning_rate': 0.00016031203442764307, 'epoch': 1.17}
{'loss': 0.9989, 'learning_rate': 0.00016022027085196516, 'epoch': 1.18}
{'loss': 0.9745, 'learning_rate': 0.00016012842764549952, 'epoch': 1.18}
{'loss': 0.9336, 'learning_rate': 0.0001600365049296927, 'epoch': 1.18}
{'loss': 1.0223, 'learning_rate': 0.0001599445028260965, 'epoch': 1.18}
{'loss': 1.0088, 'learning_rate': 0.0001598524214563675, 'epoch': 1.18}
{'loss': 0.9775, 'learning_rate': 0.0001597602609422674, 'epoch': 1.18}
{'loss': 1.1261, 'learning_rate': 0.00015966802140566225, 'epoch': 1.18}
{'loss': 0.8982, 'learning_rate': 0.0001595757029685228, 'epoch': 1.19}
{'loss': 1.0443, 'learning_rate': 0.00015948330575292401, 'epoch': 1.19}
{'loss': 0.9859, 'learning_rate': 0.00015939082988104505, 'epoch': 1.19}
{'loss': 1.0136, 'learning_rate': 0.00015929827547516914, 'epoch': 1.19}
{'loss': 0.9593, 'learning_rate': 0.0001592056426576833, 'epoch': 1.19}
{'loss': 1.0756, 'learning_rate': 0.0001591129315510782, 'epoch': 1.19}
{'loss': 1.0454, 'learning_rate': 0.00015902014227794816, 'epoch': 1.19}
{'loss': 1.0189, 'learning_rate': 0.00015892727496099075, 'epoch': 1.2}
{'loss': 1.1202, 'learning_rate': 0.00015883432972300674, 'epoch': 1.2}
{'loss': 1.1035, 'learning_rate': 0.00015874130668690003, 'epoch': 1.2}
{'loss': 1.0029, 'learning_rate': 0.0001586482059756773, 'epoch': 1.2}
{'loss': 1.0449, 'learning_rate': 0.00015855502771244798, 'epoch': 1.2}
{'loss': 0.9882, 'learning_rate': 0.00015846177202042406, 'epoch': 1.2}
{'loss': 0.9717, 'learning_rate': 0.00015836843902291984, 'epoch': 1.2}
{'loss': 0.9821, 'learning_rate': 0.000158275028843352, 'epoch': 1.21}
{'loss': 1.0507, 'learning_rate': 0.00015818154160523911, 'epoch': 1.21}
{'loss': 0.9342, 'learning_rate': 0.00015808797743220175, 'epoch': 1.21}
{'loss': 0.9054, 'learning_rate': 0.00015799433644796216, 'epoch': 1.21}
{'loss': 1.0236, 'learning_rate': 0.0001579006187763442, 'epoch': 1.21}
{'loss': 1.0042, 'learning_rate': 0.00015780682454127312, 'epoch': 1.21}
{'loss': 0.9812, 'learning_rate': 0.00015771295386677543, 'epoch': 1.22}
{'loss': 0.9981, 'learning_rate': 0.00015761900687697865, 'epoch': 1.22}
{'loss': 1.0162, 'learning_rate': 0.00015752498369611133, 'epoch': 1.22}
{'loss': 1.03, 'learning_rate': 0.0001574308844485026, 'epoch': 1.22}
{'loss': 1.0454, 'learning_rate': 0.00015733670925858237, 'epoch': 1.22}
{'loss': 1.0052, 'learning_rate': 0.00015724245825088086, 'epoch': 1.22}
{'loss': 1.0582, 'learning_rate': 0.0001571481315500285, 'epoch': 1.22}
{'loss': 1.0621, 'learning_rate': 0.00015705372928075594, 'epoch': 1.23}
{'loss': 0.918, 'learning_rate': 0.00015695925156789366, 'epoch': 1.23}
{'loss': 1.1238, 'learning_rate': 0.00015686469853637192, 'epoch': 1.23}
31%| 860/2752 [14:49<31:45, 1.01s/it]
[2023-12-29 02:18:14,763] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,763] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,766] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,766] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,020] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,021] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,021] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,285] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,286] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,547] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,548] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,798] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,798] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,081] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,082] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,349] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,350] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,601] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,602] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,862] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,863] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,128] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,129] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,394] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,394] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,656] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,656] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,918] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,919] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.997471809387207, 'eval_runtime': 3.1658, 'eval_samples_per_second': 344.941, 'eval_steps_per_second': 21.796, 'epoch': 1.23}
{'loss': 1.0215, 'learning_rate': 0.0001567700703112206, 'epoch': 1.23}
{'loss': 1.0901, 'learning_rate': 0.00015667536701756903, 'epoch': 1.23}
{'loss': 1.043, 'learning_rate': 0.00015658058878064573, 'epoch': 1.23}
{'loss': 1.0507, 'learning_rate': 0.00015648573572577839, 'epoch': 1.23}
{'loss': 0.9744, 'learning_rate': 0.0001563908079783935, 'epoch': 1.24}
{'loss': 1.0352, 'learning_rate': 0.00015629580566401657, 'epoch': 1.24}
{'loss': 1.0486, 'learning_rate': 0.0001562007289082715, 'epoch': 1.24}
{'loss': 1.0542, 'learning_rate': 0.0001561055778368807, 'epoch': 1.24}
{'loss': 1.0303, 'learning_rate': 0.00015601035257566478, 'epoch': 1.24}
{'loss': 0.9847, 'learning_rate': 0.00015591505325054258, 'epoch': 1.24}
{'loss': 1.0166, 'learning_rate': 0.00015581967998753082, 'epoch': 1.24}
{'loss': 1.0479, 'learning_rate': 0.00015572423291274393, 'epoch': 1.25}
{'loss': 0.9412, 'learning_rate': 0.00015562871215239402, 'epoch': 1.25}
{'loss': 0.9782, 'learning_rate': 0.00015553311783279055, 'epoch': 1.25}
{'loss': 1.051, 'learning_rate': 0.00015543745008034042, 'epoch': 1.25}
{'loss': 0.9484, 'learning_rate': 0.00015534170902154742, 'epoch': 1.25}
{'loss': 1.0431, 'learning_rate': 0.0001552458947830124, 'epoch': 1.25}
{'loss': 1.0192, 'learning_rate': 0.000155150007491433, 'epoch': 1.25}
{'loss': 0.9414, 'learning_rate': 0.00015505404727360334, 'epoch': 1.26}
{'loss': 1.0227, 'learning_rate': 0.00015495801425641407, 'epoch': 1.26}
{'loss': 1.0344, 'learning_rate': 0.00015486190856685208, 'epoch': 1.26}
{'loss': 0.9855, 'learning_rate': 0.0001547657303320004, 'epoch': 1.26}
{'loss': 1.0361, 'learning_rate': 0.00015466947967903786, 'epoch': 1.26}
{'loss': 0.9577, 'learning_rate': 0.0001545731567352392, 'epoch': 1.26}
{'loss': 0.983, 'learning_rate': 0.00015447676162797465, 'epoch': 1.26}
{'loss': 0.9632, 'learning_rate': 0.00015438029448470991, 'epoch': 1.27}
{'loss': 0.9546, 'learning_rate': 0.00015428375543300599, 'epoch': 1.27}
{'loss': 0.9796, 'learning_rate': 0.00015418714460051875, 'epoch': 1.27}
{'loss': 0.9752, 'learning_rate': 0.0001540904621149993, 'epoch': 1.27}
{'loss': 1.1023, 'learning_rate': 0.0001539937081042933, 'epoch': 1.27}
{'loss': 0.9553, 'learning_rate': 0.00015389688269634098, 'epoch': 1.27}
{'loss': 0.9235, 'learning_rate': 0.00015379998601917704, 'epoch': 1.27}
{'loss': 1.0244, 'learning_rate': 0.00015370301820093042, 'epoch': 1.28}
{'loss': 1.0053, 'learning_rate': 0.00015360597936982416, 'epoch': 1.28}
{'loss': 1.0732, 'learning_rate': 0.0001535088696541751, 'epoch': 1.28}
{'loss': 1.0219, 'learning_rate': 0.0001534116891823939, 'epoch': 1.28}
{'loss': 0.9317, 'learning_rate': 0.00015331443808298473, 'epoch': 1.28}
{'loss': 0.9772, 'learning_rate': 0.00015321711648454524, 'epoch': 1.28}
{'loss': 1.0778, 'learning_rate': 0.00015311972451576618, 'epoch': 1.28}
{'loss': 1.0702, 'learning_rate': 0.00015302226230543147, 'epoch': 1.29}
{'loss': 1.0377, 'learning_rate': 0.00015292472998241778, 'epoch': 1.29}
{'loss': 1.0045, 'learning_rate': 0.00015282712767569463, 'epoch': 1.29}
{'loss': 0.9666, 'learning_rate': 0.00015272945551432398, 'epoch': 1.29}
{'loss': 0.9771, 'learning_rate': 0.00015263171362746026, 'epoch': 1.29}
{'loss': 0.9417, 'learning_rate': 0.00015253390214435, 'epoch': 1.29}
{'loss': 1.0094, 'learning_rate': 0.0001524360211943318, 'epoch': 1.3}
{'loss': 0.939, 'learning_rate': 0.0001523380709068361, 'epoch': 1.3}
{'loss': 0.9674, 'learning_rate': 0.0001522400514113851, 'epoch': 1.3}
{'loss': 0.9809, 'learning_rate': 0.00015214196283759238, 'epoch': 1.3}
{'loss': 0.9492, 'learning_rate': 0.00015204380531516298, 'epoch': 1.3}
{'loss': 1.0156, 'learning_rate': 0.0001519455789738931, 'epoch': 1.3}
{'loss': 1.0039, 'learning_rate': 0.00015184728394366988, 'epoch': 1.3}
{'loss': 1.0555, 'learning_rate': 0.00015174892035447134, 'epoch': 1.31}
{'loss': 1.0223, 'learning_rate': 0.00015165048833636616, 'epoch': 1.31}
{'loss': 0.9875, 'learning_rate': 0.0001515519880195135, 'epoch': 1.31}
{'loss': 0.9453, 'learning_rate': 0.00015145341953416271, 'epoch': 1.31}
{'loss': 1.0224, 'learning_rate': 0.00015135478301065352, 'epoch': 1.31}
{'loss': 1.0238, 'learning_rate': 0.00015125607857941547, 'epoch': 1.31}
{'loss': 1.029, 'learning_rate': 0.0001511573063709679, 'epoch': 1.31}
{'loss': 1.0911, 'learning_rate': 0.0001510584665159198, 'epoch': 1.32}
{'loss': 1.0157, 'learning_rate': 0.00015095955914496965, 'epoch': 1.32}
{'loss': 0.9321, 'learning_rate': 0.00015086058438890508, 'epoch': 1.32}
{'loss': 0.9421, 'learning_rate': 0.00015076154237860304, 'epoch': 1.32}
{'loss': 0.9346, 'learning_rate': 0.00015066243324502918, 'epoch': 1.32}
{'loss': 0.9771, 'learning_rate': 0.00015056325711923808, 'epoch': 1.32}
{'loss': 0.9584, 'learning_rate': 0.00015046401413237282, 'epoch': 1.32}
{'loss': 1.0972, 'learning_rate': 0.00015036470441566488, 'epoch': 1.33}
{'loss': 0.9689, 'learning_rate': 0.00015026532810043407, 'epoch': 1.33}
{'loss': 1.0963, 'learning_rate': 0.00015016588531808816, 'epoch': 1.33}
{'loss': 1.0024, 'learning_rate': 0.00015006637620012286, 'epoch': 1.33}
{'loss': 1.0214, 'learning_rate': 0.00014996680087812165, 'epoch': 1.33}
{'loss': 1.0303, 'learning_rate': 0.00014986715948375542, 'epoch': 1.33}
{'loss': 0.9603, 'learning_rate': 0.00014976745214878256, 'epoch': 1.33}
{'loss': 1.0109, 'learning_rate': 0.00014966767900504856, 'epoch': 1.34}
{'loss': 1.0529, 'learning_rate': 0.00014956784018448603, 'epoch': 1.34}
{'loss': 1.0332, 'learning_rate': 0.00014946793581911428, 'epoch': 1.34}
{'loss': 0.9636, 'learning_rate': 0.00014936796604103948, 'epoch': 1.34}
{'loss': 1.008, 'learning_rate': 0.00014926793098245415, 'epoch': 1.34}
{'loss': 0.9722, 'learning_rate': 0.00014916783077563716, 'epoch': 1.34}
{'loss': 0.9184, 'learning_rate': 0.00014906766555295358, 'epoch': 1.34}
{'loss': 0.9888, 'learning_rate': 0.0001489674354468544, 'epoch': 1.35}
{'loss': 1.0216, 'learning_rate': 0.00014886714058987642, 'epoch': 1.35}
{'loss': 1.0306, 'learning_rate': 0.0001487667811146421, 'epoch': 1.35}
{'loss': 1.0693, 'learning_rate': 0.00014866635715385927, 'epoch': 1.35}
{'loss': 1.0027, 'learning_rate': 0.00014856586884032108, 'epoch': 1.35}
{'loss': 0.8941, 'learning_rate': 0.00014846531630690582, 'epoch': 1.35}
{'loss': 0.9139, 'learning_rate': 0.00014836469968657659, 'epoch': 1.35}
{'loss': 0.9428, 'learning_rate': 0.0001482640191123813, 'epoch': 1.36}
{'loss': 0.9904, 'learning_rate': 0.00014816327471745244, 'epoch': 1.36}
{'loss': 0.924, 'learning_rate': 0.00014806246663500686, 'epoch': 1.36}
{'loss': 0.9746, 'learning_rate': 0.00014796159499834568, 'epoch': 1.36}
{'loss': 1.0442, 'learning_rate': 0.00014786065994085396, 'epoch': 1.36}
{'loss': 0.9975, 'learning_rate': 0.0001477596615960007, 'epoch': 1.36}
{'loss': 1.0596, 'learning_rate': 0.00014765860009733858, 'epoch': 1.36}
{'loss': 0.9934, 'learning_rate': 0.00014755747557850378, 'epoch': 1.37}
{'loss': 1.0132, 'learning_rate': 0.00014745628817321578, 'epoch': 1.37}
{'loss': 1.0168, 'learning_rate': 0.00014735503801527726, 'epoch': 1.37}
{'loss': 0.978, 'learning_rate': 0.00014725372523857386, 'epoch': 1.37}
{'loss': 1.0202, 'learning_rate': 0.00014715234997707404, 'epoch': 1.37}
{'loss': 0.988, 'learning_rate': 0.00014705091236482887, 'epoch': 1.37}
{'loss': 1.0283, 'learning_rate': 0.00014694941253597183, 'epoch': 1.38}
{'loss': 1.0172, 'learning_rate': 0.00014684785062471883, 'epoch': 1.38}
{'loss': 0.999, 'learning_rate': 0.0001467462267653676, 'epoch': 1.38}
{'loss': 1.0348, 'learning_rate': 0.00014664454109229808, 'epoch': 1.38}
{'loss': 0.8903, 'learning_rate': 0.00014654279373997172, 'epoch': 1.38}
{'loss': 0.9988, 'learning_rate': 0.00014644098484293164, 'epoch': 1.38}
{'loss': 0.965, 'learning_rate': 0.0001463391145358023, 'epoch': 1.38}
{'loss': 0.9749, 'learning_rate': 0.00014623718295328944, 'epoch': 1.39}
{'loss': 1.0572, 'learning_rate': 0.00014613519023017974, 'epoch': 1.39}
{'loss': 1.0616, 'learning_rate': 0.00014603313650134075, 'epoch': 1.39}
{'loss': 1.1367, 'learning_rate': 0.00014593102190172067, 'epoch': 1.39}
{'loss': 0.9937, 'learning_rate': 0.00014582884656634827, 'epoch': 1.39}
{'loss': 1.0623, 'learning_rate': 0.0001457266106303326, 'epoch': 1.39}
{'loss': 1.1189, 'learning_rate': 0.00014562431422886272, 'epoch': 1.39}
{'loss': 0.9141, 'learning_rate': 0.0001455219574972079, 'epoch': 1.4}
{'loss': 1.0323, 'learning_rate': 0.00014541954057071692, 'epoch': 1.4}
{'loss': 0.9206, 'learning_rate': 0.0001453170635848183, 'epoch': 1.4}
{'loss': 1.103, 'learning_rate': 0.00014521452667501996, 'epoch': 1.4}
{'loss': 1.0007, 'learning_rate': 0.00014511192997690905, 'epoch': 1.4}
{'loss': 0.981, 'learning_rate': 0.00014500927362615177, 'epoch': 1.4}
{'loss': 0.9768, 'learning_rate': 0.00014490655775849324, 'epoch': 1.4}
{'loss': 1.0751, 'learning_rate': 0.0001448037825097572, 'epoch': 1.41}
{'loss': 1.0242, 'learning_rate': 0.000144700948015846, 'epoch': 1.41}
{'loss': 0.9725, 'learning_rate': 0.00014459805441274028, 'epoch': 1.41}
{'loss': 1.0499, 'learning_rate': 0.00014449510183649886, 'epoch': 1.41}
{'loss': 1.0967, 'learning_rate': 0.00014439209042325856, 'epoch': 1.41}
{'loss': 1.095, 'learning_rate': 0.00014428902030923392, 'epoch': 1.41}
{'loss': 1.0553, 'learning_rate': 0.00014418589163071722, 'epoch': 1.41}
{'loss': 1.01, 'learning_rate': 0.00014408270452407807, 'epoch': 1.42}
{'loss': 1.0895, 'learning_rate': 0.0001439794591257634, 'epoch': 1.42}
{'loss': 1.067, 'learning_rate': 0.00014387615557229726, 'epoch': 1.42}
{'loss': 1.0042, 'learning_rate': 0.00014377279400028053, 'epoch': 1.42}
{'loss': 1.0315, 'learning_rate': 0.00014366937454639078, 'epoch': 1.42}
{'loss': 1.0287, 'learning_rate': 0.0001435658973473822, 'epoch': 1.42}
{'loss': 0.9953, 'learning_rate': 0.00014346236254008537, 'epoch': 1.42}
{'loss': 0.9153, 'learning_rate': 0.00014335877026140688, 'epoch': 1.43}
{'loss': 0.874, 'learning_rate': 0.00014325512064832953, 'epoch': 1.43}
{'loss': 1.0082, 'learning_rate': 0.00014315141383791175, 'epoch': 1.43}
{'loss': 0.9922, 'learning_rate': 0.0001430476499672877, 'epoch': 1.43}
{'loss': 0.903, 'learning_rate': 0.000142943829173667, 'epoch': 1.43}
{'loss': 1.0113, 'learning_rate': 0.00014283995159433444, 'epoch': 1.43}
{'loss': 1.0344, 'learning_rate': 0.00014273601736665, 'epoch': 1.43}
{'loss': 0.9561, 'learning_rate': 0.00014263202662804863, 'epoch': 1.44}
{'loss': 0.8212, 'learning_rate': 0.00014252797951603977, 'epoch': 1.44}
{'loss': 1.02, 'learning_rate': 0.00014242387616820762, 'epoch': 1.44}
{'loss': 1.0819, 'learning_rate': 0.0001423197167222107, 'epoch': 1.44}
{'loss': 0.9885, 'learning_rate': 0.00014221550131578162, 'epoch': 1.44}
{'loss': 0.9739, 'learning_rate': 0.00014211123008672712, 'epoch': 1.44}
{'loss': 0.9883, 'learning_rate': 0.0001420069031729276, 'epoch': 1.44}
{'loss': 0.9477, 'learning_rate': 0.00014190252071233727, 'epoch': 1.45}
{'loss': 1.0646, 'learning_rate': 0.0001417980828429836, 'epoch': 1.45}
{'loss': 1.0487, 'learning_rate': 0.0001416935897029675, 'epoch': 1.45}
{'loss': 1.038, 'learning_rate': 0.00014158904143046286, 'epoch': 1.45}
{'loss': 1.0058, 'learning_rate': 0.0001414844381637165, 'epoch': 1.45}
{'loss': 0.9658, 'learning_rate': 0.00014137978004104802, 'epoch': 1.45}
{'loss': 1.11, 'learning_rate': 0.0001412750672008494, 'epoch': 1.45}
{'loss': 1.0374, 'learning_rate': 0.0001411702997815852, 'epoch': 1.46}
{'loss': 0.9788, 'learning_rate': 0.00014106547792179196, 'epoch': 1.46}
{'loss': 0.9702, 'learning_rate': 0.00014096060176007827, 'epoch': 1.46}
{'loss': 0.952, 'learning_rate': 0.00014085567143512457, 'epoch': 1.46}
{'loss': 1.085, 'learning_rate': 0.00014075068708568284, 'epoch': 1.46}
{'loss': 0.9883, 'learning_rate': 0.00014064564885057657, 'epoch': 1.46}
{'loss': 1.0018, 'learning_rate': 0.0001405405568687005, 'epoch': 1.47}
{'loss': 1.0296, 'learning_rate': 0.00014043541127902037, 'epoch': 1.47}
{'loss': 1.0565, 'learning_rate': 0.00014033021222057283, 'epoch': 1.47}
{'loss': 0.9503, 'learning_rate': 0.00014022495983246534, 'epoch': 1.47}
{'loss': 0.9864, 'learning_rate': 0.00014011965425387573, 'epoch': 1.47}
{'loss': 1.0184, 'learning_rate': 0.00014001429562405225, 'epoch': 1.47}
{'loss': 1.0072, 'learning_rate': 0.00013990888408231333, 'epoch': 1.47}
{'loss': 1.0445, 'learning_rate': 0.00013980341976804726, 'epoch': 1.48}
{'loss': 1.0382, 'learning_rate': 0.00013969790282071217, 'epoch': 1.48}
{'loss': 0.9769, 'learning_rate': 0.00013959233337983582, 'epoch': 1.48}
38%| 1032/2752 [17:46<29:04, 1.01s/it]
[2023-12-29 02:21:12,185] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,443] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,444] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,444] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,445] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,705] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,706] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,969] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,970] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,220] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,221] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,513] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,514] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,777] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,777] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,032] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,032] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,300] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,301] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,559] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,559] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,829] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,829] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,091] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,091] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,356] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,356] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9937575459480286, 'eval_runtime': 3.1818, 'eval_samples_per_second': 343.202, 'eval_steps_per_second': 21.686, 'epoch': 1.48}
{'loss': 1.0046, 'learning_rate': 0.0001394867115850153, 'epoch': 1.48}
{'loss': 0.9755, 'learning_rate': 0.00013938103757591704, 'epoch': 1.48}
{'loss': 0.9396, 'learning_rate': 0.00013927531149227645, 'epoch': 1.48}
{'loss': 0.9849, 'learning_rate': 0.00013916953347389776, 'epoch': 1.48}
{'loss': 1.0334, 'learning_rate': 0.00013906370366065398, 'epoch': 1.49}
{'loss': 0.9628, 'learning_rate': 0.0001389578221924865, 'epoch': 1.49}
{'loss': 0.978, 'learning_rate': 0.00013885188920940506, 'epoch': 1.49}
{'loss': 0.9242, 'learning_rate': 0.0001387459048514876, 'epoch': 1.49}
{'loss': 1.0053, 'learning_rate': 0.00013863986925887983, 'epoch': 1.49}
{'loss': 1.0103, 'learning_rate': 0.00013853378257179535, 'epoch': 1.49}
{'loss': 1.105, 'learning_rate': 0.00013842764493051526, 'epoch': 1.49}
{'loss': 1.083, 'learning_rate': 0.00013832145647538799, 'epoch': 1.5}
{'loss': 0.9747, 'learning_rate': 0.00013821521734682933, 'epoch': 1.5}
{'loss': 0.9911, 'learning_rate': 0.00013810892768532186, 'epoch': 1.5}
{'loss': 0.9411, 'learning_rate': 0.00013800258763141518, 'epoch': 1.5}
{'loss': 0.9309, 'learning_rate': 0.00013789619732572538, 'epoch': 1.5}
{'loss': 1.0339, 'learning_rate': 0.00013778975690893503, 'epoch': 1.5}
{'loss': 1.0039, 'learning_rate': 0.00013768326652179307, 'epoch': 1.5}
{'loss': 1.0032, 'learning_rate': 0.00013757672630511434, 'epoch': 1.51}
{'loss': 0.9655, 'learning_rate': 0.00013747013639977973, 'epoch': 1.51}
{'loss': 0.9827, 'learning_rate': 0.00013736349694673576, 'epoch': 1.51}
{'loss': 0.9483, 'learning_rate': 0.00013725680808699444, 'epoch': 1.51}
{'loss': 1.1289, 'learning_rate': 0.00013715006996163317, 'epoch': 1.51}
{'loss': 0.9843, 'learning_rate': 0.0001370432827117945, 'epoch': 1.51}
{'loss': 1.0217, 'learning_rate': 0.00013693644647868586, 'epoch': 1.51}
{'loss': 1.0234, 'learning_rate': 0.00013682956140357954, 'epoch': 1.52}
{'loss': 0.8829, 'learning_rate': 0.00013672262762781242, 'epoch': 1.52}
{'loss': 1.0308, 'learning_rate': 0.0001366156452927856, 'epoch': 1.52}
{'loss': 0.9609, 'learning_rate': 0.00013650861453996465, 'epoch': 1.52}
{'loss': 0.9404, 'learning_rate': 0.00013640153551087902, 'epoch': 1.52}
{'loss': 1.0322, 'learning_rate': 0.000136294408347122, 'epoch': 1.52}
{'loss': 1.0137, 'learning_rate': 0.00013618723319035056, 'epoch': 1.52}
{'loss': 1.0723, 'learning_rate': 0.00013608001018228512, 'epoch': 1.53}
{'loss': 0.9559, 'learning_rate': 0.00013597273946470937, 'epoch': 1.53}
{'loss': 1.0487, 'learning_rate': 0.0001358654211794701, 'epoch': 1.53}
{'loss': 1.0032, 'learning_rate': 0.000135758055468477, 'epoch': 1.53}
{'loss': 1.0157, 'learning_rate': 0.00013565064247370248, 'epoch': 1.53}
{'loss': 1.0161, 'learning_rate': 0.00013554318233718136, 'epoch': 1.53}
{'loss': 1.1203, 'learning_rate': 0.00013543567520101106, 'epoch': 1.53}
{'loss': 0.9438, 'learning_rate': 0.00013532812120735087, 'epoch': 1.54}
{'loss': 1.0074, 'learning_rate': 0.00013522052049842216, 'epoch': 1.54}
{'loss': 0.9111, 'learning_rate': 0.0001351128732165081, 'epoch': 1.54}
{'loss': 1.0367, 'learning_rate': 0.00013500517950395348, 'epoch': 1.54}
{'loss': 1.1566, 'learning_rate': 0.0001348974395031643, 'epoch': 1.54}
{'loss': 1.0114, 'learning_rate': 0.00013478965335660798, 'epoch': 1.54}
{'loss': 0.9228, 'learning_rate': 0.00013468182120681278, 'epoch': 1.55}
{'loss': 1.0196, 'learning_rate': 0.000134573943196368, 'epoch': 1.55}
{'loss': 1.0479, 'learning_rate': 0.00013446601946792334, 'epoch': 1.55}
{'loss': 0.9756, 'learning_rate': 0.00013435805016418913, 'epoch': 1.55}
{'loss': 1.0486, 'learning_rate': 0.00013425003542793596, 'epoch': 1.55}
{'loss': 0.9684, 'learning_rate': 0.00013414197540199436, 'epoch': 1.55}
{'loss': 0.9797, 'learning_rate': 0.00013403387022925488, 'epoch': 1.55}
{'loss': 0.9891, 'learning_rate': 0.0001339257200526677, 'epoch': 1.56}
{'loss': 0.9018, 'learning_rate': 0.0001338175250152426, 'epoch': 1.56}
{'loss': 1.0394, 'learning_rate': 0.00013370928526004855, 'epoch': 1.56}
{'loss': 0.9698, 'learning_rate': 0.00013360100093021376, 'epoch': 1.56}
{'loss': 0.9922, 'learning_rate': 0.00013349267216892529, 'epoch': 1.56}
{'loss': 0.9479, 'learning_rate': 0.00013338429911942908, 'epoch': 1.56}
{'loss': 1.0514, 'learning_rate': 0.00013327588192502948, 'epoch': 1.56}
{'loss': 1.1081, 'learning_rate': 0.00013316742072908927, 'epoch': 1.57}
{'loss': 0.9938, 'learning_rate': 0.00013305891567502953, 'epoch': 1.57}
{'loss': 0.9916, 'learning_rate': 0.0001329503669063292, 'epoch': 1.57}
{'loss': 1.0116, 'learning_rate': 0.000132841774566525, 'epoch': 1.57}
{'loss': 1.0194, 'learning_rate': 0.0001327331387992114, 'epoch': 1.57}
{'loss': 1.0529, 'learning_rate': 0.0001326244597480402, 'epoch': 1.57}
{'loss': 1.032, 'learning_rate': 0.00013251573755672047, 'epoch': 1.57}
{'loss': 1.0313, 'learning_rate': 0.0001324069723690183, 'epoch': 1.58}
{'loss': 0.9669, 'learning_rate': 0.00013229816432875664, 'epoch': 1.58}
{'loss': 0.9497, 'learning_rate': 0.00013218931357981514, 'epoch': 1.58}
{'loss': 1.0103, 'learning_rate': 0.0001320804202661299, 'epoch': 1.58}
{'loss': 1.0261, 'learning_rate': 0.0001319714845316933, 'epoch': 1.58}
{'loss': 0.9605, 'learning_rate': 0.00013186250652055378, 'epoch': 1.58}
{'loss': 1.0291, 'learning_rate': 0.00013175348637681575, 'epoch': 1.58}
{'loss': 0.959, 'learning_rate': 0.00013164442424463935, 'epoch': 1.59}
{'loss': 1.0226, 'learning_rate': 0.0001315353202682401, 'epoch': 1.59}
{'loss': 0.957, 'learning_rate': 0.00013142617459188899, 'epoch': 1.59}
{'loss': 0.964, 'learning_rate': 0.00013131698735991217, 'epoch': 1.59}
{'loss': 0.9521, 'learning_rate': 0.0001312077587166906, 'epoch': 1.59}
{'loss': 1.0107, 'learning_rate': 0.00013109848880666014, 'epoch': 1.59}
{'loss': 0.9698, 'learning_rate': 0.0001309891777743111, 'epoch': 1.59}
{'loss': 1.0156, 'learning_rate': 0.00013087982576418823, 'epoch': 1.6}
{'loss': 1.1068, 'learning_rate': 0.00013077043292089054, 'epoch': 1.6}
{'loss': 1.1366, 'learning_rate': 0.00013066099938907085, 'epoch': 1.6}
{'loss': 0.9279, 'learning_rate': 0.00013055152531343592, 'epoch': 1.6}
{'loss': 1.0342, 'learning_rate': 0.00013044201083874612, 'epoch': 1.6}
{'loss': 0.9757, 'learning_rate': 0.00013033245610981516, 'epoch': 1.6}
{'loss': 0.9279, 'learning_rate': 0.00013022286127151007, 'epoch': 1.6}
{'loss': 0.9938, 'learning_rate': 0.00013011322646875088, 'epoch': 1.61}
{'loss': 0.9187, 'learning_rate': 0.00013000355184651045, 'epoch': 1.61}
{'loss': 1.0253, 'learning_rate': 0.0001298938375498143, 'epoch': 1.61}
{'loss': 1.1222, 'learning_rate': 0.00012978408372374048, 'epoch': 1.61}
{'loss': 1.0878, 'learning_rate': 0.00012967429051341913, 'epoch': 1.61}
{'loss': 0.9866, 'learning_rate': 0.0001295644580640327, 'epoch': 1.61}
{'loss': 1.0562, 'learning_rate': 0.00012945458652081535, 'epoch': 1.61}
{'loss': 1.0038, 'learning_rate': 0.00012934467602905304, 'epoch': 1.62}
{'loss': 0.9432, 'learning_rate': 0.00012923472673408322, 'epoch': 1.62}
{'loss': 0.9757, 'learning_rate': 0.00012912473878129454, 'epoch': 1.62}
{'loss': 1.1197, 'learning_rate': 0.0001290147123161269, 'epoch': 1.62}
{'loss': 1.012, 'learning_rate': 0.0001289046474840711, 'epoch': 1.62}
{'loss': 0.9468, 'learning_rate': 0.0001287945444306686, 'epoch': 1.62}
{'loss': 0.9955, 'learning_rate': 0.00012868440330151152, 'epoch': 1.62}
{'loss': 1.0147, 'learning_rate': 0.0001285742242422422, 'epoch': 1.63}
{'loss': 0.9599, 'learning_rate': 0.00012846400739855324, 'epoch': 1.63}
{'loss': 1.0064, 'learning_rate': 0.00012835375291618716, 'epoch': 1.63}
{'loss': 1.0272, 'learning_rate': 0.0001282434609409362, 'epoch': 1.63}
{'loss': 1.047, 'learning_rate': 0.00012813313161864228, 'epoch': 1.63}
{'loss': 0.9432, 'learning_rate': 0.00012802276509519666, 'epoch': 1.63}
{'loss': 1.0305, 'learning_rate': 0.00012791236151653973, 'epoch': 1.64}
{'loss': 0.9884, 'learning_rate': 0.00012780192102866098, 'epoch': 1.64}
{'loss': 0.9868, 'learning_rate': 0.00012769144377759866, 'epoch': 1.64}
{'loss': 1.0013, 'learning_rate': 0.00012758092990943962, 'epoch': 1.64}
{'loss': 0.9558, 'learning_rate': 0.00012747037957031916, 'epoch': 1.64}
{'loss': 1.0111, 'learning_rate': 0.00012735979290642076, 'epoch': 1.64}
{'loss': 1.0384, 'learning_rate': 0.000127249170063976, 'epoch': 1.64}
{'loss': 0.9203, 'learning_rate': 0.00012713851118926426, 'epoch': 1.65}
{'loss': 0.9246, 'learning_rate': 0.00012702781642861253, 'epoch': 1.65}
{'loss': 1.0293, 'learning_rate': 0.00012691708592839533, 'epoch': 1.65}
{'loss': 1.0329, 'learning_rate': 0.00012680631983503436, 'epoch': 1.65}
{'loss': 0.9835, 'learning_rate': 0.00012669551829499852, 'epoch': 1.65}
{'loss': 0.9903, 'learning_rate': 0.00012658468145480337, 'epoch': 1.65}
{'loss': 0.9837, 'learning_rate': 0.0001264738094610114, 'epoch': 1.65}
{'loss': 0.9596, 'learning_rate': 0.0001263629024602313, 'epoch': 1.66}
{'loss': 1.0152, 'learning_rate': 0.00012625196059911834, 'epoch': 1.66}
{'loss': 1.065, 'learning_rate': 0.00012614098402437366, 'epoch': 1.66}
{'loss': 1.1273, 'learning_rate': 0.00012602997288274444, 'epoch': 1.66}
{'loss': 0.9468, 'learning_rate': 0.0001259189273210235, 'epoch': 1.66}
{'loss': 1.0009, 'learning_rate': 0.00012580784748604922, 'epoch': 1.66}
{'loss': 0.994, 'learning_rate': 0.00012569673352470523, 'epoch': 1.66}
{'loss': 1.049, 'learning_rate': 0.00012558558558392038, 'epoch': 1.67}
{'loss': 0.9821, 'learning_rate': 0.0001254744038106684, 'epoch': 1.67}
{'loss': 1.1094, 'learning_rate': 0.00012536318835196773, 'epoch': 1.67}
{'loss': 0.9801, 'learning_rate': 0.00012525193935488137, 'epoch': 1.67}
{'loss': 1.0237, 'learning_rate': 0.00012514065696651674, 'epoch': 1.67}
{'loss': 0.9143, 'learning_rate': 0.00012502934133402533, 'epoch': 1.67}
{'loss': 1.065, 'learning_rate': 0.00012491799260460265, 'epoch': 1.67}
{'loss': 0.9975, 'learning_rate': 0.00012480661092548786, 'epoch': 1.68}
{'loss': 0.9822, 'learning_rate': 0.00012469519644396385, 'epoch': 1.68}
{'loss': 0.9908, 'learning_rate': 0.0001245837493073568, 'epoch': 1.68}
{'loss': 0.9924, 'learning_rate': 0.00012447226966303605, 'epoch': 1.68}
{'loss': 1.0077, 'learning_rate': 0.00012436075765841396, 'epoch': 1.68}
{'loss': 0.9311, 'learning_rate': 0.00012424921344094566, 'epoch': 1.68}
{'loss': 1.0039, 'learning_rate': 0.0001241376371581289, 'epoch': 1.68}
{'loss': 0.995, 'learning_rate': 0.0001240260289575039, 'epoch': 1.69}
{'loss': 0.9583, 'learning_rate': 0.00012391438898665287, 'epoch': 1.69}
{'loss': 0.9996, 'learning_rate': 0.00012380271739320027, 'epoch': 1.69}
{'loss': 0.9926, 'learning_rate': 0.00012369101432481224, 'epoch': 1.69}
{'loss': 1.0325, 'learning_rate': 0.00012357927992919657, 'epoch': 1.69}
{'loss': 1.0725, 'learning_rate': 0.00012346751435410248, 'epoch': 1.69}
{'loss': 0.9411, 'learning_rate': 0.00012335571774732044, 'epoch': 1.69}
{'loss': 0.9459, 'learning_rate': 0.0001232438902566819, 'epoch': 1.7}
{'loss': 0.9839, 'learning_rate': 0.0001231320320300592, 'epoch': 1.7}
{'loss': 1.0347, 'learning_rate': 0.0001230201432153653, 'epoch': 1.7}
{'loss': 0.9203, 'learning_rate': 0.00012290822396055355, 'epoch': 1.7}
{'loss': 1.018, 'learning_rate': 0.00012279627441361772, 'epoch': 1.7}
{'loss': 1.0189, 'learning_rate': 0.00012268429472259143, 'epoch': 1.7}
{'loss': 1.0424, 'learning_rate': 0.00012257228503554835, 'epoch': 1.7}
{'loss': 0.9882, 'learning_rate': 0.00012246024550060166, 'epoch': 1.71}
{'loss': 0.9952, 'learning_rate': 0.0001223481762659041, 'epoch': 1.71}
{'loss': 0.9864, 'learning_rate': 0.00012223607747964766, 'epoch': 1.71}
{'loss': 0.9798, 'learning_rate': 0.00012212394929006336, 'epoch': 1.71}
{'loss': 0.9661, 'learning_rate': 0.00012201179184542115, 'epoch': 1.71}
{'loss': 1.0852, 'learning_rate': 0.00012189960529402971, 'epoch': 1.71}
{'loss': 0.942, 'learning_rate': 0.00012178738978423612, 'epoch': 1.72}
{'loss': 1.0777, 'learning_rate': 0.00012167514546442576, 'epoch': 1.72}
{'loss': 0.996, 'learning_rate': 0.00012156287248302219, 'epoch': 1.72}
{'loss': 1.0227, 'learning_rate': 0.00012145057098848673, 'epoch': 1.72}
{'loss': 1.0104, 'learning_rate': 0.00012133824112931858, 'epoch': 1.72}
{'loss': 0.9697, 'learning_rate': 0.00012122588305405434, 'epoch': 1.72}
{'loss': 0.9514, 'learning_rate': 0.00012111349691126785, 'epoch': 1.72}
{'loss': 0.9427, 'learning_rate': 0.00012100108284957028, 'epoch': 1.73}
{'loss': 0.9278, 'learning_rate': 0.0001208886410176095, 'epoch': 1.73}
{'loss': 1.085, 'learning_rate': 0.0001207761715640702, 'epoch': 1.73}
44%| 1204/2752 [20:44<26:11, 1.02s/it]
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,745] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,746] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,747] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,999] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,999] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,000] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,001] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
(identical message repeated ~23 times from RANK:0 between 02:24:10,001 and 02:24:12,908)
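These lines all come from axolotl.utils.samplers.multipack._len_est and only their timestamps differ. If the repetition makes runs hard to read, one option is to hang a dedup filter off the standard logging module — just a sketch, not an axolotl feature, and the logger name "axolotl.utils.samplers.multipack" is inferred from the log prefix above:

import logging

class DedupFilter(logging.Filter):
    """Suppress a record whose message is identical to the previous one."""

    def __init__(self):
        super().__init__()
        self._last = None

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        if msg == self._last:
            return False  # drop the consecutive duplicate
        self._last = msg
        return True

# Logger name inferred from the prefix above; adjust if axolotl names it differently.
logging.getLogger("axolotl.utils.samplers.multipack").addFilter(DedupFilter())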
{'eval_loss': 0.986236572265625, 'eval_runtime': 3.1796, 'eval_samples_per_second': 343.443, 'eval_steps_per_second': 21.701, 'epoch': 1.73}
{'loss': 1.0051, 'learning_rate': 0.00012066367463767361, 'epoch': 1.73}
{'loss': 1.0065, 'learning_rate': 0.00012055115038717722, 'epoch': 1.73}
{'loss': 0.8877, 'learning_rate': 0.00012043859896137472, 'epoch': 1.73}
{'loss': 0.9551, 'learning_rate': 0.00012032602050909574, 'epoch': 1.73}
{'loss': 0.9067, 'learning_rate': 0.00012021341517920555, 'epoch': 1.74}
{'loss': 1.0503, 'learning_rate': 0.00012010078312060504, 'epoch': 1.74}
{'loss': 0.9591, 'learning_rate': 0.00011998812448223049, 'epoch': 1.74}
{'loss': 0.9306, 'learning_rate': 0.00011987543941305321, 'epoch': 1.74}
{'loss': 1.0437, 'learning_rate': 0.00011976272806207956, 'epoch': 1.74}
{'loss': 1.0226, 'learning_rate': 0.00011964999057835055, 'epoch': 1.74}
{'loss': 1.0021, 'learning_rate': 0.00011953722711094189, 'epoch': 1.74}
{'loss': 0.899, 'learning_rate': 0.00011942443780896351, 'epoch': 1.75}
{'loss': 0.9354, 'learning_rate': 0.00011931162282155953, 'epoch': 1.75}
{'loss': 0.9396, 'learning_rate': 0.00011919878229790813, 'epoch': 1.75}
{'loss': 1.0293, 'learning_rate': 0.00011908591638722115, 'epoch': 1.75}
{'loss': 0.9847, 'learning_rate': 0.00011897302523874405, 'epoch': 1.75}
{'loss': 0.9713, 'learning_rate': 0.00011886010900175564, 'epoch': 1.75}
{'loss': 0.9512, 'learning_rate': 0.00011874716782556794, 'epoch': 1.75}
{'loss': 0.9958, 'learning_rate': 0.0001186342018595259, 'epoch': 1.76}
{'loss': 0.918, 'learning_rate': 0.0001185212112530073, 'epoch': 1.76}
{'loss': 1.0313, 'learning_rate': 0.00011840819615542247, 'epoch': 1.76}
{'loss': 1.0363, 'learning_rate': 0.00011829515671621412, 'epoch': 1.76}
{'loss': 1.0067, 'learning_rate': 0.00011818209308485717, 'epoch': 1.76}
{'loss': 1.0817, 'learning_rate': 0.0001180690054108585, 'epoch': 1.76}
{'loss': 0.9825, 'learning_rate': 0.00011795589384375686, 'epoch': 1.76}
{'loss': 1.1017, 'learning_rate': 0.00011784275853312245, 'epoch': 1.77}
{'loss': 0.9762, 'learning_rate': 0.00011772959962855704, 'epoch': 1.77}
{'loss': 1.0414, 'learning_rate': 0.00011761641727969343, 'epoch': 1.77}
{'loss': 0.8992, 'learning_rate': 0.0001175032116361956, 'epoch': 1.77}
{'loss': 1.107, 'learning_rate': 0.00011738998284775815, 'epoch': 1.77}
{'loss': 0.9726, 'learning_rate': 0.00011727673106410642, 'epoch': 1.77}
{'loss': 1.1176, 'learning_rate': 0.00011716345643499608, 'epoch': 1.77}
{'loss': 1.0255, 'learning_rate': 0.00011705015911021307, 'epoch': 1.78}
{'loss': 0.9226, 'learning_rate': 0.00011693683923957328, 'epoch': 1.78}
{'loss': 0.9997, 'learning_rate': 0.00011682349697292245, 'epoch': 1.78}
{'loss': 1.0464, 'learning_rate': 0.00011671013246013596, 'epoch': 1.78}
{'loss': 1.0169, 'learning_rate': 0.0001165967458511185, 'epoch': 1.78}
{'loss': 0.99, 'learning_rate': 0.00011648333729580412, 'epoch': 1.78}
{'loss': 0.9599, 'learning_rate': 0.0001163699069441558, 'epoch': 1.78}
{'loss': 1.0039, 'learning_rate': 0.00011625645494616535, 'epoch': 1.79}
{'loss': 0.9786, 'learning_rate': 0.00011614298145185323, 'epoch': 1.79}
{'loss': 1.0318, 'learning_rate': 0.00011602948661126828, 'epoch': 1.79}
{'loss': 1.0288, 'learning_rate': 0.00011591597057448769, 'epoch': 1.79}
{'loss': 0.9541, 'learning_rate': 0.0001158024334916165, 'epoch': 1.79}
{'loss': 0.9439, 'learning_rate': 0.00011568887551278768, 'epoch': 1.79}
{'loss': 1.115, 'learning_rate': 0.00011557529678816188, 'epoch': 1.8}
{'loss': 1.0287, 'learning_rate': 0.00011546169746792705, 'epoch': 1.8}
{'loss': 1.0036, 'learning_rate': 0.00011534807770229845, 'epoch': 1.8}
{'loss': 1.0283, 'learning_rate': 0.00011523443764151842, 'epoch': 1.8}
{'loss': 1.0294, 'learning_rate': 0.00011512077743585603, 'epoch': 1.8}
{'loss': 1.0725, 'learning_rate': 0.0001150070972356071, 'epoch': 1.8}
{'loss': 0.9904, 'learning_rate': 0.00011489339719109378, 'epoch': 1.8}
{'loss': 0.9253, 'learning_rate': 0.00011477967745266453, 'epoch': 1.81}
{'loss': 1.0266, 'learning_rate': 0.00011466593817069391, 'epoch': 1.81}
{'loss': 0.949, 'learning_rate': 0.00011455217949558217, 'epoch': 1.81}
{'loss': 1.0171, 'learning_rate': 0.00011443840157775527, 'epoch': 1.81}
{'loss': 0.9657, 'learning_rate': 0.00011432460456766471, 'epoch': 1.81}
{'loss': 0.9154, 'learning_rate': 0.00011421078861578709, 'epoch': 1.81}
{'loss': 1.0106, 'learning_rate': 0.00011409695387262416, 'epoch': 1.81}
{'loss': 1.0194, 'learning_rate': 0.00011398310048870247, 'epoch': 1.82}
{'loss': 0.9796, 'learning_rate': 0.00011386922861457319, 'epoch': 1.82}
{'loss': 0.9294, 'learning_rate': 0.00011375533840081202, 'epoch': 1.82}
{'loss': 0.9676, 'learning_rate': 0.00011364142999801887, 'epoch': 1.82}
{'loss': 1.049, 'learning_rate': 0.00011352750355681772, 'epoch': 1.82}
{'loss': 0.9453, 'learning_rate': 0.00011341355922785634, 'epoch': 1.82}
{'loss': 1.0379, 'learning_rate': 0.00011329959716180622, 'epoch': 1.82}
{'loss': 0.9617, 'learning_rate': 0.00011318561750936232, 'epoch': 1.83}
{'loss': 0.9331, 'learning_rate': 0.00011307162042124277, 'epoch': 1.83}
{'loss': 0.9028, 'learning_rate': 0.00011295760604818882, 'epoch': 1.83}
{'loss': 0.9642, 'learning_rate': 0.00011284357454096457, 'epoch': 1.83}
{'loss': 1.0175, 'learning_rate': 0.00011272952605035674, 'epoch': 1.83}
{'loss': 1.003, 'learning_rate': 0.00011261546072717454, 'epoch': 1.83}
{'loss': 1.0012, 'learning_rate': 0.00011250137872224946, 'epoch': 1.83}
{'loss': 0.9744, 'learning_rate': 0.00011238728018643499, 'epoch': 1.84}
{'loss': 1.0007, 'learning_rate': 0.00011227316527060651, 'epoch': 1.84}
{'loss': 0.9524, 'learning_rate': 0.00011215903412566111, 'epoch': 1.84}
{'loss': 0.9723, 'learning_rate': 0.00011204488690251725, 'epoch': 1.84}
{'loss': 1.0132, 'learning_rate': 0.00011193072375211468, 'epoch': 1.84}
{'loss': 1.0582, 'learning_rate': 0.00011181654482541428, 'epoch': 1.84}
{'loss': 0.9719, 'learning_rate': 0.00011170235027339766, 'epoch': 1.84}
{'loss': 1.0145, 'learning_rate': 0.0001115881402470672, 'epoch': 1.85}
{'loss': 1.0237, 'learning_rate': 0.0001114739148974457, 'epoch': 1.85}
{'loss': 0.9978, 'learning_rate': 0.00011135967437557626, 'epoch': 1.85}
{'loss': 1.0125, 'learning_rate': 0.00011124541883252198, 'epoch': 1.85}
{'loss': 0.9764, 'learning_rate': 0.00011113114841936584, 'epoch': 1.85}
{'loss': 1.1135, 'learning_rate': 0.00011101686328721053, 'epoch': 1.85}
{'loss': 1.0534, 'learning_rate': 0.00011090256358717819, 'epoch': 1.85}
{'loss': 1.0163, 'learning_rate': 0.00011078824947041016, 'epoch': 1.86}
{'loss': 1.0513, 'learning_rate': 0.00011067392108806692, 'epoch': 1.86}
{'loss': 0.9866, 'learning_rate': 0.00011055957859132773, 'epoch': 1.86}
{'loss': 0.9761, 'learning_rate': 0.00011044522213139064, 'epoch': 1.86}
{'loss': 1.0012, 'learning_rate': 0.00011033085185947208, 'epoch': 1.86}
{'loss': 1.0798, 'learning_rate': 0.00011021646792680667, 'epoch': 1.86}
{'loss': 0.9703, 'learning_rate': 0.00011010207048464729, 'epoch': 1.86}
{'loss': 1.007, 'learning_rate': 0.00010998765968426449, 'epoch': 1.87}
{'loss': 0.9898, 'learning_rate': 0.00010987323567694661, 'epoch': 1.87}
{'loss': 1.0196, 'learning_rate': 0.00010975879861399938, 'epoch': 1.87}
{'loss': 1.0584, 'learning_rate': 0.00010964434864674584, 'epoch': 1.87}
{'loss': 1.0112, 'learning_rate': 0.00010952988592652611, 'epoch': 1.87}
{'loss': 0.9821, 'learning_rate': 0.00010941541060469712, 'epoch': 1.87}
{'loss': 0.9078, 'learning_rate': 0.00010930092283263243, 'epoch': 1.88}
{'loss': 1.0014, 'learning_rate': 0.00010918642276172218, 'epoch': 1.88}
{'loss': 1.0205, 'learning_rate': 0.00010907191054337269, 'epoch': 1.88}
{'loss': 0.9852, 'learning_rate': 0.00010895738632900636, 'epoch': 1.88}
{'loss': 0.9204, 'learning_rate': 0.00010884285027006147, 'epoch': 1.88}
{'loss': 0.96, 'learning_rate': 0.0001087283025179919, 'epoch': 1.88}
{'loss': 1.0313, 'learning_rate': 0.00010861374322426714, 'epoch': 1.88}
{'loss': 1.006, 'learning_rate': 0.00010849917254037174, 'epoch': 1.89}
{'loss': 1.0996, 'learning_rate': 0.00010838459061780546, 'epoch': 1.89}
{'loss': 0.9643, 'learning_rate': 0.0001082699976080829, 'epoch': 1.89}
{'loss': 0.9693, 'learning_rate': 0.00010815539366273327, 'epoch': 1.89}
{'loss': 0.9897, 'learning_rate': 0.00010804077893330023, 'epoch': 1.89}
{'loss': 0.9793, 'learning_rate': 0.0001079261535713418, 'epoch': 1.89}
{'loss': 0.943, 'learning_rate': 0.00010781151772842993, 'epoch': 1.89}
{'loss': 1.0706, 'learning_rate': 0.00010769687155615055, 'epoch': 1.9}
{'loss': 0.9709, 'learning_rate': 0.00010758221520610321, 'epoch': 1.9}
{'loss': 0.9712, 'learning_rate': 0.00010746754882990082, 'epoch': 1.9}
{'loss': 0.9715, 'learning_rate': 0.00010735287257916972, 'epoch': 1.9}
{'loss': 1.0767, 'learning_rate': 0.00010723818660554913, 'epoch': 1.9}
{'loss': 1.0162, 'learning_rate': 0.00010712349106069131, 'epoch': 1.9}
{'loss': 1.0161, 'learning_rate': 0.00010700878609626102, 'epoch': 1.9}
{'loss': 0.9651, 'learning_rate': 0.00010689407186393552, 'epoch': 1.91}
{'loss': 0.962, 'learning_rate': 0.0001067793485154044, 'epoch': 1.91}
{'loss': 0.9621, 'learning_rate': 0.00010666461620236922, 'epoch': 1.91}
{'loss': 0.9946, 'learning_rate': 0.00010654987507654341, 'epoch': 1.91}
{'loss': 1.0797, 'learning_rate': 0.00010643512528965207, 'epoch': 1.91}
{'loss': 0.9162, 'learning_rate': 0.00010632036699343178, 'epoch': 1.91}
{'loss': 0.9236, 'learning_rate': 0.00010620560033963025, 'epoch': 1.91}
{'loss': 0.9982, 'learning_rate': 0.00010609082548000642, 'epoch': 1.92}
{'loss': 0.9382, 'learning_rate': 0.00010597604256632994, 'epoch': 1.92}
{'loss': 0.9533, 'learning_rate': 0.0001058612517503812, 'epoch': 1.92}
{'loss': 0.9939, 'learning_rate': 0.00010574645318395095, 'epoch': 1.92}
{'loss': 0.9483, 'learning_rate': 0.00010563164701884027, 'epoch': 1.92}
{'loss': 0.9357, 'learning_rate': 0.00010551683340686027, 'epoch': 1.92}
{'loss': 1.0208, 'learning_rate': 0.00010540201249983188, 'epoch': 1.92}
{'loss': 0.9353, 'learning_rate': 0.00010528718444958567, 'epoch': 1.93}
{'loss': 0.9885, 'learning_rate': 0.00010517234940796173, 'epoch': 1.93}
{'loss': 0.9881, 'learning_rate': 0.00010505750752680926, 'epoch': 1.93}
{'loss': 1.0692, 'learning_rate': 0.00010494265895798665, 'epoch': 1.93}
{'loss': 0.9999, 'learning_rate': 0.00010482780385336106, 'epoch': 1.93}
{'loss': 0.9933, 'learning_rate': 0.00010471294236480827, 'epoch': 1.93}
{'loss': 0.9862, 'learning_rate': 0.00010459807464421257, 'epoch': 1.93}
{'loss': 0.9535, 'learning_rate': 0.0001044832008434664, 'epoch': 1.94}
{'loss': 1.1075, 'learning_rate': 0.00010436832111447034, 'epoch': 1.94}
{'loss': 0.9865, 'learning_rate': 0.00010425343560913277, 'epoch': 1.94}
{'loss': 1.0135, 'learning_rate': 0.00010413854447936966, 'epoch': 1.94}
{'loss': 0.9973, 'learning_rate': 0.00010402364787710451, 'epoch': 1.94}
{'loss': 0.9933, 'learning_rate': 0.00010390874595426794, 'epoch': 1.94}
{'loss': 0.9078, 'learning_rate': 0.0001037938388627977, 'epoch': 1.94}
{'loss': 0.9869, 'learning_rate': 0.00010367892675463837, 'epoch': 1.95}
{'loss': 1.0976, 'learning_rate': 0.0001035640097817411, 'epoch': 1.95}
{'loss': 0.9593, 'learning_rate': 0.00010344908809606353, 'epoch': 1.95}
{'loss': 0.9876, 'learning_rate': 0.0001033341618495695, 'epoch': 1.95}
{'loss': 0.9538, 'learning_rate': 0.0001032192311942289, 'epoch': 1.95}
{'loss': 0.996, 'learning_rate': 0.00010310429628201743, 'epoch': 1.95}
{'loss': 0.984, 'learning_rate': 0.00010298935726491648, 'epoch': 1.95}
{'loss': 0.9882, 'learning_rate': 0.00010287441429491274, 'epoch': 1.96}
{'loss': 0.9246, 'learning_rate': 0.0001027594675239983, 'epoch': 1.96}
{'loss': 0.9562, 'learning_rate': 0.00010264451710417011, 'epoch': 1.96}
{'loss': 0.9681, 'learning_rate': 0.00010252956318743006, 'epoch': 1.96}
{'loss': 0.9678, 'learning_rate': 0.0001024146059257846, 'epoch': 1.96}
{'loss': 0.9447, 'learning_rate': 0.00010229964547124464, 'epoch': 1.96}
{'loss': 1.0149, 'learning_rate': 0.0001021846819758253, 'epoch': 1.97}
{'loss': 1.0448, 'learning_rate': 0.0001020697155915457, 'epoch': 1.97}
{'loss': 0.9901, 'learning_rate': 0.0001019547464704288, 'epoch': 1.97}
{'loss': 0.9507, 'learning_rate': 0.00010183977476450117, 'epoch': 1.97}
{'loss': 0.9689, 'learning_rate': 0.00010172480062579287, 'epoch': 1.97}
{'loss': 0.9805, 'learning_rate': 0.000101609824206337, 'epoch': 1.97}
{'loss': 1.002, 'learning_rate': 0.00010149484565816992, 'epoch': 1.97}
{'loss': 1.0156, 'learning_rate': 0.00010137986513333055, 'epoch': 1.98}
{'loss': 0.9005, 'learning_rate': 0.00010126488278386063, 'epoch': 1.98}
{'loss': 0.8677, 'learning_rate': 0.00010114989876180423, 'epoch': 1.98}
 50%|████████████████████████████████████████████████████████                | 1376/2752 [23:41<23:01,  1.00s/it]
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
(same message logged twice by each of RANK:0–7 at 02:27:06, then repeated from RANK:0 roughly every 0.25 s through 02:27:10,096)
{'eval_loss': 0.9859890937805176, 'eval_runtime': 3.1694, 'eval_samples_per_second': 344.542, 'eval_steps_per_second': 21.77, 'epoch': 1.98}
{'loss': 0.9542, 'learning_rate': 0.00010103491321920757, 'epoch': 1.98}
{'loss': 0.994, 'learning_rate': 0.000100919926308119, 'epoch': 1.98}
{'loss': 1.0406, 'learning_rate': 0.00010080493818058859, 'epoch': 1.98}
{'loss': 1.0105, 'learning_rate': 0.00010068994898866804, 'epoch': 1.98}
{'loss': 0.8996, 'learning_rate': 0.00010057495888441046, 'epoch': 1.99}
{'loss': 0.9243, 'learning_rate': 0.00010045996801987023, 'epoch': 1.99}
{'loss': 0.9922, 'learning_rate': 0.00010034497654710266, 'epoch': 1.99}
{'loss': 1.017, 'learning_rate': 0.00010022998461816389, 'epoch': 1.99}
{'loss': 0.93, 'learning_rate': 0.00010011499238511062, 'epoch': 1.99}
{'loss': 0.9934, 'learning_rate': 0.0001, 'epoch': 1.99}
{'loss': 1.0425, 'learning_rate': 9.988500761488941e-05, 'epoch': 1.99}
{'loss': 0.9271, 'learning_rate': 9.977001538183616e-05, 'epoch': 2.0}
{'loss': 0.8973, 'learning_rate': 9.965502345289733e-05, 'epoch': 2.0}
{'loss': 0.9481, 'learning_rate': 9.954003198012977e-05, 'epoch': 2.0}
{'loss': 0.9896, 'learning_rate': 9.942504111558956e-05, 'epoch': 2.0}
{'loss': 1.0777, 'learning_rate': 9.9310051011332e-05, 'epoch': 2.0}
{'loss': 1.0304, 'learning_rate': 9.919506181941146e-05, 'epoch': 2.0}
{'loss': 1.0172, 'learning_rate': 9.908007369188105e-05, 'epoch': 2.0}
{'loss': 1.0018, 'learning_rate': 9.896508678079244e-05, 'epoch': 2.01}
{'loss': 1.0292, 'learning_rate': 9.88501012381958e-05, 'epoch': 2.01}
{'loss': 1.0116, 'learning_rate': 9.873511721613938e-05, 'epoch': 2.01}
{'loss': 1.0264, 'learning_rate': 9.862013486666947e-05, 'epoch': 2.01}
{'loss': 1.0555, 'learning_rate': 9.850515434183013e-05, 'epoch': 2.01}
{'loss': 1.0142, 'learning_rate': 9.839017579366299e-05, 'epoch': 2.01}
{'loss': 0.9613, 'learning_rate': 9.827519937420716e-05, 'epoch': 2.01}
{'loss': 1.0444, 'learning_rate': 9.816022523549885e-05, 'epoch': 2.02}
{'loss': 0.9935, 'learning_rate': 9.804525352957124e-05, 'epoch': 2.02}
{'loss': 1.0946, 'learning_rate': 9.793028440845434e-05, 'epoch': 2.02}
{'loss': 0.9827, 'learning_rate': 9.781531802417473e-05, 'epoch': 2.02}
{'loss': 1.0292, 'learning_rate': 9.770035452875537e-05, 'epoch': 2.02}
 51%|█████████████████████████████████████████████████████████▏              | 1406/2752 [24:15<22:39,  1.01s/it]
[2023-12-29 02:27:40,645] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
(same message logged twice by each of RANK:0–7 at 02:27:40, this time with total_num_tokens per device: 1341686 rather than the 23546 seen elsewhere)
{'loss': 0.9939, 'learning_rate': 9.758539407421542e-05, 'epoch': 2.0}
{'loss': 0.8951, 'learning_rate': 9.747043681256996e-05, 'epoch': 2.0}
{'loss': 1.0295, 'learning_rate': 9.735548289582992e-05, 'epoch': 2.0}
{'loss': 0.9203, 'learning_rate': 9.724053247600175e-05, 'epoch': 2.01}
{'loss': 0.9928, 'learning_rate': 9.712558570508726e-05, 'epoch': 2.01}
{'loss': 1.0479, 'learning_rate': 9.701064273508356e-05, 'epoch': 2.01}
{'loss': 0.9144, 'learning_rate': 9.68957037179826e-05, 'epoch': 2.01}
{'loss': 1.0052, 'learning_rate': 9.678076880577114e-05, 'epoch': 2.01}
{'loss': 1.0168, 'learning_rate': 9.666583815043054e-05, 'epoch': 2.01}
{'loss': 0.9791, 'learning_rate': 9.65509119039365e-05, 'epoch': 2.01}
{'loss': 0.944, 'learning_rate': 9.643599021825892e-05, 'epoch': 2.02}
{'loss': 0.9631, 'learning_rate': 9.632107324536165e-05, 'epoch': 2.02}
{'loss': 1.107, 'learning_rate': 9.620616113720232e-05, 'epoch': 2.02}
{'loss': 0.9464, 'learning_rate': 9.609125404573211e-05, 'epoch': 2.02}
{'loss': 0.9397, 'learning_rate': 9.59763521228955e-05, 'epoch': 2.02}
{'loss': 0.989, 'learning_rate': 9.586145552063035e-05, 'epoch': 2.02}
{'loss': 1.0245, 'learning_rate': 9.574656439086725e-05, 'epoch': 2.02}
{'loss': 0.9392, 'learning_rate': 9.563167888552968e-05, 'epoch': 2.03}
{'loss': 0.9548, 'learning_rate': 9.551679915653362e-05, 'epoch': 2.03}
{'loss': 0.956, 'learning_rate': 9.540192535578748e-05, 'epoch': 2.03}
{'loss': 0.9523, 'learning_rate': 9.528705763519176e-05, 'epoch': 2.03}
{'loss': 0.9789, 'learning_rate': 9.517219614663896e-05, 'epoch': 2.03}
{'loss': 1.0522, 'learning_rate': 9.505734104201336e-05, 'epoch': 2.03}
{'loss': 0.9545, 'learning_rate': 9.494249247319077e-05, 'epoch': 2.03}
{'loss': 0.8591, 'learning_rate': 9.482765059203834e-05, 'epoch': 2.04}
{'loss': 0.9401, 'learning_rate': 9.471281555041432e-05, 'epoch': 2.04}
{'loss': 1.0116, 'learning_rate': 9.459798750016813e-05, 'epoch': 2.04}
{'loss': 0.9633, 'learning_rate': 9.448316659313975e-05, 'epoch': 2.04}
{'loss': 1.031, 'learning_rate': 9.436835298115975e-05, 'epoch': 2.04}
{'loss': 0.9011, 'learning_rate': 9.425354681604907e-05, 'epoch': 2.04}
{'loss': 0.9536, 'learning_rate': 9.413874824961883e-05, 'epoch': 2.05}
{'loss': 0.9879, 'learning_rate': 9.402395743367008e-05, 'epoch': 2.05}
{'loss': 0.8898, 'learning_rate': 9.390917451999359e-05, 'epoch': 2.05}
{'loss': 0.9827, 'learning_rate': 9.379439966036977e-05, 'epoch': 2.05}
{'loss': 0.9504, 'learning_rate': 9.367963300656827e-05, 'epoch': 2.05}
{'loss': 0.9886, 'learning_rate': 9.356487471034796e-05, 'epoch': 2.05}
{'loss': 1.0051, 'learning_rate': 9.34501249234566e-05, 'epoch': 2.05}
{'loss': 0.9898, 'learning_rate': 9.333538379763079e-05, 'epoch': 2.06}
{'loss': 0.9908, 'learning_rate': 9.32206514845956e-05, 'epoch': 2.06}
{'loss': 0.8935, 'learning_rate': 9.310592813606449e-05, 'epoch': 2.06}
{'loss': 0.9603, 'learning_rate': 9.2991213903739e-05, 'epoch': 2.06}
{'loss': 0.9367, 'learning_rate': 9.28765089393087e-05, 'epoch': 2.06}
{'loss': 0.9518, 'learning_rate': 9.276181339445088e-05, 'epoch': 2.06}
{'loss': 1.0235, 'learning_rate': 9.26471274208303e-05, 'epoch': 2.06}
{'loss': 0.9373, 'learning_rate': 9.253245117009919e-05, 'epoch': 2.07}
{'loss': 0.943, 'learning_rate': 9.241778479389683e-05, 'epoch': 2.07}
{'loss': 1.0316, 'learning_rate': 9.230312844384943e-05, 'epoch': 2.07}
{'loss': 0.941, 'learning_rate': 9.218848227157007e-05, 'epoch': 2.07}
{'loss': 0.926, 'learning_rate': 9.207384642865824e-05, 'epoch': 2.07}
{'loss': 0.9663, 'learning_rate': 9.195922106669981e-05, 'epoch': 2.07}
{'loss': 0.9753, 'learning_rate': 9.18446063372668e-05, 'epoch': 2.07}
{'loss': 0.9101, 'learning_rate': 9.173000239191713e-05, 'epoch': 2.08}
{'loss': 0.925, 'learning_rate': 9.161540938219454e-05, 'epoch': 2.08}
{'loss': 0.9318, 'learning_rate': 9.150082745962828e-05, 'epoch': 2.08}
{'loss': 0.9457, 'learning_rate': 9.138625677573289e-05, 'epoch': 2.08}
{'loss': 0.8729, 'learning_rate': 9.127169748200812e-05, 'epoch': 2.08}
{'loss': 0.8718, 'learning_rate': 9.115714972993858e-05, 'epoch': 2.08}
{'loss': 0.9763, 'learning_rate': 9.104261367099365e-05, 'epoch': 2.08}
{'loss': 1.0157, 'learning_rate': 9.092808945662733e-05, 'epoch': 2.09}
{'loss': 0.8974, 'learning_rate': 9.081357723827785e-05, 'epoch': 2.09}
{'loss': 0.9423, 'learning_rate': 9.069907716736761e-05, 'epoch': 2.09}
{'loss': 0.9599, 'learning_rate': 9.058458939530295e-05, 'epoch': 2.09}
{'loss': 0.9702, 'learning_rate': 9.047011407347389e-05, 'epoch': 2.09}
{'loss': 0.9297, 'learning_rate': 9.035565135325414e-05, 'epoch': 2.09}
{'loss': 0.9849, 'learning_rate': 9.024120138600063e-05, 'epoch': 2.09}
{'loss': 0.9112, 'learning_rate': 9.01267643230534e-05, 'epoch': 2.1}
{'loss': 0.9173, 'learning_rate': 9.001234031573553e-05, 'epoch': 2.1}
{'loss': 0.8652, 'learning_rate': 8.989792951535276e-05, 'epoch': 2.1}
{'loss': 0.9371, 'learning_rate': 8.978353207319332e-05, 'epoch': 2.1}
{'loss': 0.9531, 'learning_rate': 8.966914814052796e-05, 'epoch': 2.1}
{'loss': 0.9788, 'learning_rate': 8.955477786860937e-05, 'epoch': 2.1}
{'loss': 0.966, 'learning_rate': 8.944042140867229e-05, 'epoch': 2.1}
{'loss': 0.8829, 'learning_rate': 8.932607891193315e-05, 'epoch': 2.11}
{'loss': 0.9596, 'learning_rate': 8.921175052958985e-05, 'epoch': 2.11}
{'loss': 0.8538, 'learning_rate': 8.909743641282183e-05, 'epoch': 2.11}
{'loss': 0.9055, 'learning_rate': 8.898313671278948e-05, 'epoch': 2.11}
{'loss': 0.9679, 'learning_rate': 8.886885158063416e-05, 'epoch': 2.11}
{'loss': 0.7991, 'learning_rate': 8.875458116747806e-05, 'epoch': 2.11}
{'loss': 0.9862, 'learning_rate': 8.864032562442374e-05, 'epoch': 2.11}
{'loss': 0.9596, 'learning_rate': 8.852608510255429e-05, 'epoch': 2.12}
{'loss': 1.0007, 'learning_rate': 8.841185975293282e-05, 'epoch': 2.12}
{'loss': 0.9466, 'learning_rate': 8.829764972660237e-05, 'epoch': 2.12}
{'loss': 0.8998, 'learning_rate': 8.818345517458576e-05, 'epoch': 2.12}
{'loss': 0.9441, 'learning_rate': 8.806927624788535e-05, 'epoch': 2.12}
{'loss': 0.9134, 'learning_rate': 8.795511309748276e-05, 'epoch': 2.12}
{'loss': 0.9951, 'learning_rate': 8.78409658743389e-05, 'epoch': 2.12}
{'loss': 0.9091, 'learning_rate': 8.772683472939351e-05, 'epoch': 2.13}
{'loss': 0.9919, 'learning_rate': 8.761271981356504e-05, 'epoch': 2.13}
{'loss': 0.9902, 'learning_rate': 8.749862127775058e-05, 'epoch': 2.13}
{'loss': 0.9323, 'learning_rate': 8.738453927282548e-05, 'epoch': 2.13}
{'loss': 1.0634, 'learning_rate': 8.727047394964328e-05, 'epoch': 2.13}
{'loss': 1.0659, 'learning_rate': 8.715642545903546e-05, 'epoch': 2.13}
{'loss': 0.9359, 'learning_rate': 8.704239395181121e-05, 'epoch': 2.14}
{'loss': 0.9258, 'learning_rate': 8.692837957875725e-05, 'epoch': 2.14}
{'loss': 1.0116, 'learning_rate': 8.681438249063767e-05, 'epoch': 2.14}
{'loss': 0.9114, 'learning_rate': 8.670040283819376e-05, 'epoch': 2.14}
{'loss': 1.002, 'learning_rate': 8.658644077214368e-05, 'epoch': 2.14}
{'loss': 0.8892, 'learning_rate': 8.647249644318232e-05, 'epoch': 2.14}
{'loss': 0.9424, 'learning_rate': 8.635857000198114e-05, 'epoch': 2.14}
{'loss': 0.845, 'learning_rate': 8.6244661599188e-05, 'epoch': 2.15}
{'loss': 0.949, 'learning_rate': 8.613077138542684e-05, 'epoch': 2.15}
{'loss': 0.9641, 'learning_rate': 8.601689951129757e-05, 'epoch': 2.15}
{'loss': 0.9768, 'learning_rate': 8.590304612737587e-05, 'epoch': 2.15}
{'loss': 0.8609, 'learning_rate': 8.578921138421294e-05, 'epoch': 2.15}
{'loss': 0.9284, 'learning_rate': 8.567539543233532e-05, 'epoch': 2.15}
{'loss': 0.8856, 'learning_rate': 8.556159842224472e-05, 'epoch': 2.15}
{'loss': 0.9045, 'learning_rate': 8.544782050441785e-05, 'epoch': 2.16}
{'loss': 0.9271, 'learning_rate': 8.533406182930613e-05, 'epoch': 2.16}
{'loss': 0.9371, 'learning_rate': 8.522032254733548e-05, 'epoch': 2.16}
{'loss': 0.8904, 'learning_rate': 8.510660280890625e-05, 'epoch': 2.16}
{'loss': 0.9564, 'learning_rate': 8.499290276439293e-05, 'epoch': 2.16}
{'loss': 0.8299, 'learning_rate': 8.4879222564144e-05, 'epoch': 2.16}
{'loss': 0.9851, 'learning_rate': 8.47655623584816e-05, 'epoch': 2.16}
{'loss': 0.9057, 'learning_rate': 8.465192229770156e-05, 'epoch': 2.17}
{'loss': 0.8881, 'learning_rate': 8.4538302532073e-05, 'epoch': 2.17}
{'loss': 0.8754, 'learning_rate': 8.442470321183817e-05, 'epoch': 2.17}
{'loss': 0.9749, 'learning_rate': 8.43111244872123e-05, 'epoch': 2.17}
{'loss': 0.9256, 'learning_rate': 8.41975665083835e-05, 'epoch': 2.17}
{'loss': 0.9899, 'learning_rate': 8.408402942551234e-05, 'epoch': 2.17}
{'loss': 0.9544, 'learning_rate': 8.397051338873172e-05, 'epoch': 2.17}
{'loss': 0.9213, 'learning_rate': 8.38570185481468e-05, 'epoch': 2.18}
{'loss': 0.9002, 'learning_rate': 8.374354505383467e-05, 'epoch': 2.18}
{'loss': 0.8544, 'learning_rate': 8.363009305584424e-05, 'epoch': 2.18}
{'loss': 0.9532, 'learning_rate': 8.351666270419589e-05, 'epoch': 2.18}
{'loss': 0.9327, 'learning_rate': 8.340325414888152e-05, 'epoch': 2.18}
{'loss': 0.9033, 'learning_rate': 8.328986753986409e-05, 'epoch': 2.18}
{'loss': 1.0464, 'learning_rate': 8.317650302707754e-05, 'epoch': 2.18}
{'loss': 0.8241, 'learning_rate': 8.306316076042673e-05, 'epoch': 2.19}
{'loss': 0.9649, 'learning_rate': 8.294984088978694e-05, 'epoch': 2.19}
{'loss': 0.9017, 'learning_rate': 8.283654356500394e-05, 'epoch': 2.19}
{'loss': 0.9327, 'learning_rate': 8.272326893589362e-05, 'epoch': 2.19}
{'loss': 0.87, 'learning_rate': 8.261001715224188e-05, 'epoch': 2.19}
{'loss': 0.9762, 'learning_rate': 8.249678836380442e-05, 'epoch': 2.19}
{'loss': 0.9656, 'learning_rate': 8.238358272030658e-05, 'epoch': 2.19}
{'loss': 0.9379, 'learning_rate': 8.227040037144297e-05, 'epoch': 2.2}
{'loss': 1.0153, 'learning_rate': 8.215724146687756e-05, 'epoch': 2.2}
{'loss': 1.0173, 'learning_rate': 8.204410615624318e-05, 'epoch': 2.2}
{'loss': 0.9253, 'learning_rate': 8.193099458914148e-05, 'epoch': 2.2}
{'loss': 0.957, 'learning_rate': 8.181790691514284e-05, 'epoch': 2.2}
{'loss': 0.9088, 'learning_rate': 8.17048432837859e-05, 'epoch': 2.2}
{'loss': 0.8977, 'learning_rate': 8.159180384457757e-05, 'epoch': 2.2}
{'loss': 0.898, 'learning_rate': 8.147878874699274e-05, 'epoch': 2.21}
 56%|███████████████████████████████████████████████████████████████           | 1548/2752 [26:38<20:08,  1.00s/it]
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
(same message logged twice by each of RANK:0–7 at 02:30:03, then repeated from RANK:0 roughly every 0.25 s through 02:30:07,087)
{'eval_loss': 0.9910922646522522, 'eval_runtime': 3.1651, 'eval_samples_per_second': 345.012, 'eval_steps_per_second': 21.8, 'epoch': 2.21}
{'loss': 0.9692, 'learning_rate': 8.136579814047409e-05, 'epoch': 2.21}
{'loss': 0.8631, 'learning_rate': 8.125283217443207e-05, 'epoch': 2.21}
{'loss': 0.8234, 'learning_rate': 8.113989099824438e-05, 'epoch': 2.21}
{'loss': 0.9396, 'learning_rate': 8.102697476125597e-05, 'epoch': 2.21}
{'loss': 0.922, 'learning_rate': 8.091408361277888e-05, 'epoch': 2.21}
{'loss': 0.8982, 'learning_rate': 8.080121770209191e-05, 'epoch': 2.22}
{'loss': 0.9227, 'learning_rate': 8.068837717844047e-05, 'epoch': 2.22}
{'loss': 0.9319, 'learning_rate': 8.057556219103653e-05, 'epoch': 2.22}
{'loss': 0.9389, 'learning_rate': 8.046277288905814e-05, 'epoch': 2.22}
{'loss': 0.9657, 'learning_rate': 8.035000942164947e-05, 'epoch': 2.22}
{'loss': 0.9241, 'learning_rate': 8.023727193792048e-05, 'epoch': 2.22}
{'loss': 0.9862, 'learning_rate': 8.012456058694678e-05, 'epoch': 2.22}
{'loss': 0.9788, 'learning_rate': 8.001187551776952e-05, 'epoch': 2.23}
{'loss': 0.8368, 'learning_rate': 7.989921687939497e-05, 'epoch': 2.23}
{'loss': 1.0245, 'learning_rate': 7.978658482079449e-05, 'epoch': 2.23}
{'loss': 0.9395, 'learning_rate': 7.967397949090431e-05, 'epoch': 2.23}
{'loss': 1.0009, 'learning_rate': 7.956140103862527e-05, 'epoch': 2.23}
{'loss': 0.9677, 'learning_rate': 7.944884961282279e-05, 'epoch': 2.23}
{'loss': 0.9715, 'learning_rate': 7.933632536232642e-05, 'epoch': 2.23}
{'loss': 0.8948, 'learning_rate': 7.922382843592984e-05, 'epoch': 2.24}
{'loss': 0.948, 'learning_rate': 7.911135898239055e-05, 'epoch': 2.24}
{'loss': 0.9526, 'learning_rate': 7.899891715042976e-05, 'epoch': 2.24}
{'loss': 0.9683, 'learning_rate': 7.888650308873213e-05, 'epoch': 2.24}
{'loss': 0.9407, 'learning_rate': 7.87741169459457e-05, 'epoch': 2.24}
{'loss': 0.9093, 'learning_rate': 7.866175887068143e-05, 'epoch': 2.24}
{'loss': 0.9218, 'learning_rate': 7.854942901151328e-05, 'epoch': 2.24}
{'loss': 0.9578, 'learning_rate': 7.843712751697786e-05, 'epoch': 2.25}
{'loss': 0.8599, 'learning_rate': 7.832485453557424e-05, 'epoch': 2.25}
{'loss': 0.909, 'learning_rate': 7.821261021576391e-05, 'epoch': 2.25}
{'loss': 0.9667, 'learning_rate': 7.81003947059703e-05, 'epoch': 2.25}
{'loss': 0.8572, 'learning_rate': 7.798820815457886e-05, 'epoch': 2.25}
{'loss': 0.958, 'learning_rate': 7.787605070993668e-05, 'epoch': 2.25}
{'loss': 0.9316, 'learning_rate': 7.776392252035237e-05, 'epoch': 2.25}
{'loss': 0.8547, 'learning_rate': 7.765182373409591e-05, 'epoch': 2.26}
{'loss': 0.9308, 'learning_rate': 7.753975449939835e-05, 'epoch': 2.26}
{'loss': 0.9408, 'learning_rate': 7.742771496445167e-05, 'epoch': 2.26}
{'loss': 0.9128, 'learning_rate': 7.731570527740856e-05, 'epoch': 2.26}
{'loss': 0.9414, 'learning_rate': 7.720372558638233e-05, 'epoch': 2.26}
{'loss': 0.8834, 'learning_rate': 7.709177603944645e-05, 'epoch': 2.26}
{'loss': 0.8925, 'learning_rate': 7.697985678463476e-05, 'epoch': 2.26}
{'loss': 0.8767, 'learning_rate': 7.686796796994084e-05, 'epoch': 2.27}
{'loss': 0.868, 'learning_rate': 7.675610974331813e-05, 'epoch': 2.27}
{'loss': 0.8945, 'learning_rate': 7.66442822526796e-05, 'epoch': 2.27}
{'loss': 0.8905, 'learning_rate': 7.653248564589751e-05, 'epoch': 2.27}
{'loss': 0.9968, 'learning_rate': 7.642072007080343e-05, 'epoch': 2.27}
{'loss': 0.8581, 'learning_rate': 7.630898567518778e-05, 'epoch': 2.27}
{'loss': 0.8514, 'learning_rate': 7.619728260679975e-05, 'epoch': 2.27}
{'loss': 0.9472, 'learning_rate': 7.608561101334714e-05, 'epoch': 2.28}
{'loss': 0.9261, 'learning_rate': 7.597397104249613e-05, 'epoch': 2.28}
{'loss': 0.9951, 'learning_rate': 7.586236284187106e-05, 'epoch': 2.28}
{'loss': 0.9294, 'learning_rate': 7.575078655905434e-05, 'epoch': 2.28}
{'loss': 0.8349, 'learning_rate': 7.563924234158607e-05, 'epoch': 2.28}
{'loss': 0.8989, 'learning_rate': 7.552773033696398e-05, 'epoch': 2.28}
{'loss': 1.0002, 'learning_rate': 7.541625069264324e-05, 'epoch': 2.28}
{'loss': 0.9754, 'learning_rate': 7.530480355603615e-05, 'epoch': 2.29}
{'loss': 0.9512, 'learning_rate': 7.519338907451215e-05, 'epoch': 2.29}
{'loss': 0.923, 'learning_rate': 7.508200739539739e-05, 'epoch': 2.29}
{'loss': 0.8903, 'learning_rate': 7.49706586659747e-05, 'epoch': 2.29}
{'loss': 0.8961, 'learning_rate': 7.485934303348327e-05, 'epoch': 2.29}
{'loss': 0.8639, 'learning_rate': 7.474806064511863e-05, 'epoch': 2.29}
{'loss': 0.9267, 'learning_rate': 7.46368116480323e-05, 'epoch': 2.3}
{'loss': 0.8483, 'learning_rate': 7.452559618933164e-05, 'epoch': 2.3}
{'loss': 0.8689, 'learning_rate': 7.441441441607964e-05, 'epoch': 2.3}
{'loss': 0.8956, 'learning_rate': 7.43032664752948e-05, 'epoch': 2.3}
{'loss': 0.8652, 'learning_rate': 7.419215251395078e-05, 'epoch': 2.3}
{'loss': 0.9348, 'learning_rate': 7.408107267897651e-05, 'epoch': 2.3}
{'loss': 0.9277, 'learning_rate': 7.397002711725558e-05, 'epoch': 2.3}
{'loss': 0.9744, 'learning_rate': 7.385901597562637e-05, 'epoch': 2.31}
{'loss': 0.9454, 'learning_rate': 7.374803940088171e-05, 'epoch': 2.31}
{'loss': 0.9003, 'learning_rate': 7.36370975397687e-05, 'epoch': 2.31}
{'loss': 0.8666, 'learning_rate': 7.352619053898864e-05, 'epoch': 2.31}
{'loss': 0.9264, 'learning_rate': 7.341531854519664e-05, 'epoch': 2.31}
{'loss': 0.9439, 'learning_rate': 7.33044817050015e-05, 'epoch': 2.31}
{'loss': 0.9582, 'learning_rate': 7.319368016496564e-05, 'epoch': 2.31}
{'loss': 1.0029, 'learning_rate': 7.308291407160472e-05, 'epoch': 2.32}
{'loss': 0.9322, 'learning_rate': 7.29721835713875e-05, 'epoch': 2.32}
{'loss': 0.8516, 'learning_rate': 7.286148881073578e-05, 'epoch': 2.32}
{'loss': 0.8623, 'learning_rate': 7.275082993602402e-05, 'epoch': 2.32}
{'loss': 0.8561, 'learning_rate': 7.264020709357927e-05, 'epoch': 2.32}
{'loss': 0.8975, 'learning_rate': 7.25296204296809e-05, 'epoch': 2.32}
{'loss': 0.8716, 'learning_rate': 7.241907009056039e-05, 'epoch': 2.32}
{'loss': 1.0045, 'learning_rate': 7.230855622240136e-05, 'epoch': 2.33}
{'loss': 0.8901, 'learning_rate': 7.219807897133906e-05, 'epoch': 2.33}
{'loss': 1.0119, 'learning_rate': 7.208763848346029e-05, 'epoch': 2.33}
{'loss': 0.9225, 'learning_rate': 7.197723490480338e-05, 'epoch': 2.33}
{'loss': 0.9379, 'learning_rate': 7.186686838135774e-05, 'epoch': 2.33}
{'loss': 0.949, 'learning_rate': 7.17565390590638e-05, 'epoch': 2.33}
{'loss': 0.8753, 'learning_rate': 7.164624708381285e-05, 'epoch': 2.33}
{'loss': 0.9395, 'learning_rate': 7.153599260144677e-05, 'epoch': 2.34}
{'loss': 0.9719, 'learning_rate': 7.142577575775782e-05, 'epoch': 2.34}
{'loss': 0.9445, 'learning_rate': 7.13155966984885e-05, 'epoch': 2.34}
{'loss': 0.8849, 'learning_rate': 7.120545556933138e-05, 'epoch': 2.34}
{'loss': 0.9214, 'learning_rate': 7.109535251592892e-05, 'epoch': 2.34}
{'loss': 0.8934, 'learning_rate': 7.098528768387311e-05, 'epoch': 2.34}
{'loss': 0.849, 'learning_rate': 7.087526121870548e-05, 'epoch': 2.34}
{'loss': 0.9069, 'learning_rate': 7.076527326591682e-05, 'epoch': 2.35}
{'loss': 0.939, 'learning_rate': 7.065532397094695e-05, 'epoch': 2.35}
{'loss': 0.9456, 'learning_rate': 7.054541347918464e-05, 'epoch': 2.35}
{'loss': 0.982, 'learning_rate': 7.043554193596732e-05, 'epoch': 2.35}
{'loss': 0.9272, 'learning_rate': 7.03257094865809e-05, 'epoch': 2.35}
{'loss': 0.8147, 'learning_rate': 7.021591627625958e-05, 'epoch': 2.35}
{'loss': 0.8368, 'learning_rate': 7.010616245018573e-05, 'epoch': 2.35}
{'loss': 0.8645, 'learning_rate': 6.999644815348956e-05, 'epoch': 2.36}
{'loss': 0.9121, 'learning_rate': 6.988677353124913e-05, 'epoch': 2.36}
{'loss': 0.8353, 'learning_rate': 6.977713872848995e-05, 'epoch': 2.36}
{'loss': 0.8928, 'learning_rate': 6.966754389018487e-05, 'epoch': 2.36}
{'loss': 0.9564, 'learning_rate': 6.955798916125393e-05, 'epoch': 2.36}
{'loss': 0.9131, 'learning_rate': 6.94484746865641e-05, 'epoch': 2.36}
{'loss': 0.9782, 'learning_rate': 6.933900061092919e-05, 'epoch': 2.36}
{'loss': 0.9148, 'learning_rate': 6.92295670791095e-05, 'epoch': 2.37}
{'loss': 0.9299, 'learning_rate': 6.912017423581179e-05, 'epoch': 2.37}
{'loss': 0.9317, 'learning_rate': 6.901082222568895e-05, 'epoch': 2.37}
{'loss': 0.8967, 'learning_rate': 6.890151119333988e-05, 'epoch': 2.37}
{'loss': 0.9378, 'learning_rate': 6.87922412833094e-05, 'epoch': 2.37}
{'loss': 0.9033, 'learning_rate': 6.868301264008785e-05, 'epoch': 2.37}
{'loss': 0.9331, 'learning_rate': 6.857382540811101e-05, 'epoch': 2.38}
{'loss': 0.9384, 'learning_rate': 6.846467973175993e-05, 'epoch': 2.38}
{'loss': 0.9054, 'learning_rate': 6.835557575536071e-05, 'epoch': 2.38}
{'loss': 0.9476, 'learning_rate': 6.824651362318425e-05, 'epoch': 2.38}
{'loss': 0.8062, 'learning_rate': 6.813749347944625e-05, 'epoch': 2.38}
{'loss': 0.9072, 'learning_rate': 6.802851546830674e-05, 'epoch': 2.38}
{'loss': 0.8875, 'learning_rate': 6.791957973387013e-05, 'epoch': 2.38}
{'loss': 0.8945, 'learning_rate': 6.781068642018488e-05, 'epoch': 2.39}
{'loss': 0.9672, 'learning_rate': 6.770183567124337e-05, 'epoch': 2.39}
{'loss': 0.9715, 'learning_rate': 6.759302763098172e-05, 'epoch': 2.39}
{'loss': 1.0443, 'learning_rate': 6.748426244327957e-05, 'epoch': 2.39}
{'loss': 0.9025, 'learning_rate': 6.737554025195984e-05, 'epoch': 2.39}
{'loss': 0.9726, 'learning_rate': 6.726686120078862e-05, 'epoch': 2.39}
{'loss': 1.0293, 'learning_rate': 6.715822543347502e-05, 'epoch': 2.39}
{'loss': 0.8217, 'learning_rate': 6.704963309367083e-05, 'epoch': 2.4}
{'loss': 0.949, 'learning_rate': 6.694108432497048e-05, 'epoch': 2.4}
{'loss': 0.8321, 'learning_rate': 6.683257927091074e-05, 'epoch': 2.4}
{'loss': 1.0148, 'learning_rate': 6.672411807497057e-05, 'epoch': 2.4}
{'loss': 0.9201, 'learning_rate': 6.661570088057097e-05, 'epoch': 2.4}
{'loss': 0.8982, 'learning_rate': 6.65073278310747e-05, 'epoch': 2.4}
{'loss': 0.9065, 'learning_rate': 6.639899906978626e-05, 'epoch': 2.4}
{'loss': 0.9881, 'learning_rate': 6.629071473995147e-05, 'epoch': 2.41}
{'loss': 0.9452, 'learning_rate': 6.618247498475744e-05, 'epoch': 2.41}
{'loss': 0.8942, 'learning_rate': 6.60742799473323e-05, 'epoch': 2.41}
{'loss': 0.9652, 'learning_rate': 6.596612977074515e-05, 'epoch': 2.41}
{'loss': 1.0082, 'learning_rate': 6.585802459800566e-05, 'epoch': 2.41}
{'loss': 1.0078, 'learning_rate': 6.574996457206408e-05, 'epoch': 2.41}
{'loss': 0.9739, 'learning_rate': 6.564194983581089e-05, 'epoch': 2.41}
{'loss': 0.9248, 'learning_rate': 6.553398053207671e-05, 'epoch': 2.42}
{'loss': 1.011, 'learning_rate': 6.542605680363204e-05, 'epoch': 2.42}
{'loss': 0.9858, 'learning_rate': 6.53181787931872e-05, 'epoch': 2.42}
{'loss': 0.9249, 'learning_rate': 6.521034664339204e-05, 'epoch': 2.42}
{'loss': 0.9466, 'learning_rate': 6.510256049683571e-05, 'epoch': 2.42}
{'loss': 0.9436, 'learning_rate': 6.499482049604656e-05, 'epoch': 2.42}
{'loss': 0.9149, 'learning_rate': 6.488712678349189e-05, 'epoch': 2.42}
{'loss': 0.8384, 'learning_rate': 6.477947950157785e-05, 'epoch': 2.43}
{'loss': 0.7928, 'learning_rate': 6.467187879264916e-05, 'epoch': 2.43}
{'loss': 0.9252, 'learning_rate': 6.456432479898897e-05, 'epoch': 2.43}
{'loss': 0.897, 'learning_rate': 6.445681766281863e-05, 'epoch': 2.43}
{'loss': 0.8298, 'learning_rate': 6.434935752629758e-05, 'epoch': 2.43}
{'loss': 0.915, 'learning_rate': 6.4241944531523e-05, 'epoch': 2.43}
{'loss': 0.9462, 'learning_rate': 6.413457882052991e-05, 'epoch': 2.43}
{'loss': 0.8737, 'learning_rate': 6.402726053529065e-05, 'epoch': 2.44}
{'loss': 0.7485, 'learning_rate': 6.391998981771492e-05, 'epoch': 2.44}
{'loss': 0.9372, 'learning_rate': 6.381276680964947e-05, 'epoch': 2.44}
{'loss': 1.002, 'learning_rate': 6.3705591652878e-05, 'epoch': 2.44}
{'loss': 0.9004, 'learning_rate': 6.359846448912099e-05, 'epoch': 2.44}
{'loss': 0.8896, 'learning_rate': 6.349138546003534e-05, 'epoch': 2.44}
{'loss': 0.9036, 'learning_rate': 6.338435470721442e-05, 'epoch': 2.44}
{'loss': 0.8637, 'learning_rate': 6.327737237218765e-05, 'epoch': 2.45}
{'loss': 0.9706, 'learning_rate': 6.317043859642049e-05, 'epoch': 2.45}
{'loss': 0.9648, 'learning_rate': 6.306355352131414e-05, 'epoch': 2.45}
{'loss': 0.9592, 'learning_rate': 6.295671728820553e-05, 'epoch': 2.45}
{'loss': 0.9285, 'learning_rate': 6.284993003836686e-05, 'epoch': 2.45}
{'loss': 0.8922, 'learning_rate': 6.27431919130056e-05, 'epoch': 2.45}
{'loss': 1.0248, 'learning_rate': 6.263650305326429e-05, 'epoch': 2.45}
{'loss': 0.9553, 'learning_rate': 6.252986360022029e-05, 'epoch': 2.46}
 62%|██████████████████████████████████████████████████████████████████████ | 1720/2752 [29:34<17:19, 1.01s/it]
[2023-12-29 02:33:00,457] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[... the same packing_efficiency_estimate line repeats 41 more times here (twice per rank 0-7, then 26 more times from rank 0 over the next ~3 s), identical values each time; elided ...]
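These repeated multipack lines make the log hard to scan. A throwaway filter for folding consecutive duplicates in a saved log — just a sketch, not part of the run; it strips the timestamp/PID/RANK prefix before comparing, and the script/log filenames are made up:

import re
import sys

# matches the "[timestamp] [INFO] ", "[PID:n] " and "[RANK:n] " fragments so that
# otherwise-identical log lines compare equal across ranks and timestamps
PREFIX = re.compile(r"^\[[^\]]+\] \[INFO\] |\[PID:\d+\] |\[RANK:\d+\] ")

def collapse(lines):
    last_key, count = None, 0
    for line in lines:
        key = PREFIX.sub("", line.rstrip("\n"))
        if key == last_key:
            count += 1
            continue
        if count > 1:
            yield f"    ... last line repeated {count - 1} more times\n"
        yield line
        last_key, count = key, 1
    if count > 1:
        yield f"    ... last line repeated {count - 1} more times\n"

if __name__ == "__main__":
    # e.g. python collapse_logs.py < train.log
    sys.stdout.writelines(collapse(sys.stdin))

Loss lines and anything else non-repeating pass through unchanged; only back-to-back duplicates get folded.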
{'eval_loss': 0.9947829246520996, 'eval_runtime': 3.1664, 'eval_samples_per_second': 344.866, 'eval_steps_per_second': 21.791, 'epoch': 2.46}
{'loss': 0.9065, 'learning_rate': 6.242327369488568e-05, 'epoch': 2.46}
{'loss': 0.889, 'learning_rate': 6.231673347820694e-05, 'epoch': 2.46}
{'loss': 0.8549, 'learning_rate': 6.221024309106498e-05, 'epoch': 2.46}
{'loss': 0.9912, 'learning_rate': 6.210380267427467e-05, 'epoch': 2.46}
{'loss': 0.9013, 'learning_rate': 6.199741236858483e-05, 'epoch': 2.46}
{'loss': 0.9185, 'learning_rate': 6.189107231467814e-05, 'epoch': 2.47}
{'loss': 0.9443, 'learning_rate': 6.17847826531707e-05, 'epoch': 2.47}
{'loss': 0.9674, 'learning_rate': 6.167854352461202e-05, 'epoch': 2.47}
{'loss': 0.861, 'learning_rate': 6.15723550694848e-05, 'epoch': 2.47}
{'loss': 0.9042, 'learning_rate': 6.146621742820471e-05, 'epoch': 2.47}
{'loss': 0.9436, 'learning_rate': 6.136013074112018e-05, 'epoch': 2.47}
{'loss': 0.9288, 'learning_rate': 6.125409514851243e-05, 'epoch': 2.47}
{'loss': 0.9631, 'learning_rate': 6.114811079059495e-05, 'epoch': 2.48}
{'loss': 0.9584, 'learning_rate': 6.104217780751353e-05, 'epoch': 2.48}
{'loss': 0.8961, 'learning_rate': 6.0936296339346054e-05, 'epoch': 2.48}
{'loss': 0.9219, 'learning_rate': 6.083046652610224e-05, 'epoch': 2.48}
{'loss': 0.8926, 'learning_rate': 6.072468850772357e-05, 'epoch': 2.48}
{'loss': 0.8685, 'learning_rate': 6.061896242408298e-05, 'epoch': 2.48}
{'loss': 0.901, 'learning_rate': 6.051328841498473e-05, 'epoch': 2.48}
{'loss': 0.9529, 'learning_rate': 6.040766662016424e-05, 'epoch': 2.49}
{'loss': 0.8898, 'learning_rate': 6.0302097179287844e-05, 'epoch': 2.49}
{'loss': 0.8969, 'learning_rate': 6.019658023195276e-05, 'epoch': 2.49}
{'loss': 0.8563, 'learning_rate': 6.009111591768668e-05, 'epoch': 2.49}
{'loss': 0.9268, 'learning_rate': 5.998570437594775e-05, 'epoch': 2.49}
{'loss': 0.9143, 'learning_rate': 5.9880345746124265e-05, 'epoch': 2.49}
{'loss': 1.0232, 'learning_rate': 5.977504016753468e-05, 'epoch': 2.49}
{'loss': 0.9814, 'learning_rate': 5.9669787779427155e-05, 'epoch': 2.5}
{'loss': 0.8978, 'learning_rate': 5.9564588720979655e-05, 'epoch': 2.5}
{'loss': 0.9057, 'learning_rate': 5.945944313129953e-05, 'epoch': 2.5}
{'loss': 0.8697, 'learning_rate': 5.935435114942345e-05, 'epoch': 2.5}
{'loss': 0.8556, 'learning_rate': 5.924931291431719e-05, 'epoch': 2.5}
{'loss': 0.957, 'learning_rate': 5.914432856487544e-05, 'epoch': 2.5}
{'loss': 0.928, 'learning_rate': 5.903939823992174e-05, 'epoch': 2.5}
{'loss': 0.9249, 'learning_rate': 5.8934522078208066e-05, 'epoch': 2.51}
{'loss': 0.8818, 'learning_rate': 5.882970021841483e-05, 'epoch': 2.51}
{'loss': 0.9018, 'learning_rate': 5.8724932799150586e-05, 'epoch': 2.51}
{'loss': 0.861, 'learning_rate': 5.8620219958952e-05, 'epoch': 2.51}
{'loss': 1.0394, 'learning_rate': 5.851556183628348e-05, 'epoch': 2.51}
{'loss': 0.9121, 'learning_rate': 5.8410958569537146e-05, 'epoch': 2.51}
{'loss': 0.9268, 'learning_rate': 5.830641029703254e-05, 'epoch': 2.51}
{'loss': 0.9448, 'learning_rate': 5.82019171570164e-05, 'epoch': 2.52}
{'loss': 0.8035, 'learning_rate': 5.8097479287662756e-05, 'epoch': 2.52}
{'loss': 0.943, 'learning_rate': 5.79930968270724e-05, 'epoch': 2.52}
{'loss': 0.8752, 'learning_rate': 5.788876991327288e-05, 'epoch': 2.52}
{'loss': 0.8614, 'learning_rate': 5.778449868421836e-05, 'epoch': 2.52}
{'loss': 0.9529, 'learning_rate': 5.768028327778932e-05, 'epoch': 2.52}
{'loss': 0.9355, 'learning_rate': 5.757612383179238e-05, 'epoch': 2.52}
{'loss': 0.9904, 'learning_rate': 5.747202048396023e-05, 'epoch': 2.53}
{'loss': 0.8688, 'learning_rate': 5.73679733719514e-05, 'epoch': 2.53}
{'loss': 0.9654, 'learning_rate': 5.726398263335e-05, 'epoch': 2.53}
{'loss': 0.9239, 'learning_rate': 5.71600484056656e-05, 'epoch': 2.53}
{'loss': 0.9396, 'learning_rate': 5.705617082633306e-05, 'epoch': 2.53}
{'loss': 0.9185, 'learning_rate': 5.695235003271231e-05, 'epoch': 2.53}
{'loss': 1.0378, 'learning_rate': 5.684858616208826e-05, 'epoch': 2.53}
{'loss': 0.8558, 'learning_rate': 5.674487935167049e-05, 'epoch': 2.54}
{'loss': 0.9194, 'learning_rate': 5.664122973859313e-05, 'epoch': 2.54}
{'loss': 0.8405, 'learning_rate': 5.653763745991467e-05, 'epoch': 2.54}
{'loss': 0.9421, 'learning_rate': 5.643410265261784e-05, 'epoch': 2.54}
{'loss': 1.0677, 'learning_rate': 5.633062545360925e-05, 'epoch': 2.54}
{'loss': 0.9314, 'learning_rate': 5.622720599971952e-05, 'epoch': 2.54}
{'loss': 0.851, 'learning_rate': 5.6123844427702775e-05, 'epoch': 2.55}
{'loss': 0.94, 'learning_rate': 5.602054087423663e-05, 'epoch': 2.55}
{'loss': 0.9605, 'learning_rate': 5.591729547592195e-05, 'epoch': 2.55}
{'loss': 0.8999, 'learning_rate': 5.5814108369282824e-05, 'epoch': 2.55}
{'loss': 0.9691, 'learning_rate': 5.571097969076611e-05, 'epoch': 2.55}
{'loss': 0.8829, 'learning_rate': 5.5607909576741445e-05, 'epoch': 2.55}
{'loss': 0.9018, 'learning_rate': 5.550489816350113e-05, 'epoch': 2.55}
{'loss': 0.9032, 'learning_rate': 5.540194558725973e-05, 'epoch': 2.56}
{'loss': 0.8197, 'learning_rate': 5.5299051984153995e-05, 'epoch': 2.56}
{'loss': 0.9565, 'learning_rate': 5.5196217490242793e-05, 'epoch': 2.56}
{'loss': 0.8936, 'learning_rate': 5.5093442241506784e-05, 'epoch': 2.56}
{'loss': 0.9086, 'learning_rate': 5.4990726373848243e-05, 'epoch': 2.56}
{'loss': 0.8598, 'learning_rate': 5.488807002309098e-05, 'epoch': 2.56}
{'loss': 0.9722, 'learning_rate': 5.478547332498007e-05, 'epoch': 2.56}
{'loss': 1.005, 'learning_rate': 5.468293641518172e-05, 'epoch': 2.57}
{'loss': 0.924, 'learning_rate': 5.458045942928309e-05, 'epoch': 2.57}
{'loss': 0.9149, 'learning_rate': 5.447804250279213e-05, 'epoch': 2.57}
{'loss': 0.9354, 'learning_rate': 5.437568577113727e-05, 'epoch': 2.57}
{'loss': 0.94, 'learning_rate': 5.4273389369667436e-05, 'epoch': 2.57}
{'loss': 0.967, 'learning_rate': 5.417115343365171e-05, 'epoch': 2.57}
{'loss': 0.9475, 'learning_rate': 5.4068978098279336e-05, 'epoch': 2.57}
{'loss': 0.9334, 'learning_rate': 5.396686349865929e-05, 'epoch': 2.58}
{'loss': 0.8909, 'learning_rate': 5.3864809769820315e-05, 'epoch': 2.58}
{'loss': 0.8644, 'learning_rate': 5.37628170467106e-05, 'epoch': 2.58}
{'loss': 0.9284, 'learning_rate': 5.366088546419771e-05, 'epoch': 2.58}
{'loss': 0.9473, 'learning_rate': 5.3559015157068404e-05, 'epoch': 2.58}
{'loss': 0.89, 'learning_rate': 5.3457206260028324e-05, 'epoch': 2.58}
{'loss': 0.9504, 'learning_rate': 5.3355458907701925e-05, 'epoch': 2.58}
{'loss': 0.8861, 'learning_rate': 5.325377323463239e-05, 'epoch': 2.59}
{'loss': 0.9384, 'learning_rate': 5.315214937528121e-05, 'epoch': 2.59}
{'loss': 0.8827, 'learning_rate': 5.3050587464028136e-05, 'epoch': 2.59}
{'loss': 0.882, 'learning_rate': 5.2949087635171144e-05, 'epoch': 2.59}
{'loss': 0.8757, 'learning_rate': 5.284765002292598e-05, 'epoch': 2.59}
{'loss': 0.9264, 'learning_rate': 5.2746274761426176e-05, 'epoch': 2.59}
{'loss': 0.8916, 'learning_rate': 5.2644961984722796e-05, 'epoch': 2.59}
{'loss': 0.9372, 'learning_rate': 5.254371182678424e-05, 'epoch': 2.6}
{'loss': 1.0207, 'learning_rate': 5.244252442149624e-05, 'epoch': 2.6}
{'loss': 1.0514, 'learning_rate': 5.234139990266143e-05, 'epoch': 2.6}
{'loss': 0.8366, 'learning_rate': 5.224033840399931e-05, 'epoch': 2.6}
{'loss': 0.959, 'learning_rate': 5.213934005914607e-05, 'epoch': 2.6}
{'loss': 0.8958, 'learning_rate': 5.203840500165434e-05, 'epoch': 2.6}
{'loss': 0.8551, 'learning_rate': 5.1937533364993143e-05, 'epoch': 2.6}
{'loss': 0.9172, 'learning_rate': 5.1836725282547585e-05, 'epoch': 2.61}
{'loss': 0.831, 'learning_rate': 5.173598088761874e-05, 'epoch': 2.61}
{'loss': 0.9451, 'learning_rate': 5.163530031342347e-05, 'epoch': 2.61}
{'loss': 1.0276, 'learning_rate': 5.153468369309424e-05, 'epoch': 2.61}
{'loss': 1.0002, 'learning_rate': 5.1434131159678945e-05, 'epoch': 2.61}
{'loss': 0.9118, 'learning_rate': 5.133364284614077e-05, 'epoch': 2.61}
{'loss': 0.9769, 'learning_rate': 5.123321888535795e-05, 'epoch': 2.61}
{'loss': 0.9228, 'learning_rate': 5.113285941012358e-05, 'epoch': 2.62}
{'loss': 0.8696, 'learning_rate': 5.103256455314562e-05, 'epoch': 2.62}
{'loss': 0.9039, 'learning_rate': 5.093233444704641e-05, 'epoch': 2.62}
{'loss': 1.0408, 'learning_rate': 5.083216922436284e-05, 'epoch': 2.62}
{'loss': 0.9339, 'learning_rate': 5.073206901754586e-05, 'epoch': 2.62}
{'loss': 0.86, 'learning_rate': 5.063203395896052e-05, 'epoch': 2.62}
{'loss': 0.9069, 'learning_rate': 5.053206418088572e-05, 'epoch': 2.62}
{'loss': 0.9286, 'learning_rate': 5.043215981551398e-05, 'epoch': 2.63}
{'loss': 0.8741, 'learning_rate': 5.033232099495144e-05, 'epoch': 2.63}
{'loss': 0.933, 'learning_rate': 5.023254785121746e-05, 'epoch': 2.63}
{'loss': 0.9479, 'learning_rate': 5.0132840516244604e-05, 'epoch': 2.63}
{'loss': 0.9658, 'learning_rate': 5.003319912187838e-05, 'epoch': 2.63}
{'loss': 0.8772, 'learning_rate': 4.993362379987716e-05, 'epoch': 2.63}
{'loss': 0.936, 'learning_rate': 4.9834114681911846e-05, 'epoch': 2.64}
{'loss': 0.9086, 'learning_rate': 4.9734671899565955e-05, 'epoch': 2.64}
{'loss': 0.9157, 'learning_rate': 4.963529558433514e-05, 'epoch': 2.64}
{'loss': 0.9237, 'learning_rate': 4.953598586762722e-05, 'epoch': 2.64}
{'loss': 0.8762, 'learning_rate': 4.9436742880761964e-05, 'epoch': 2.64}
{'loss': 0.934, 'learning_rate': 4.933756675497082e-05, 'epoch': 2.64}
{'loss': 0.9644, 'learning_rate': 4.923845762139699e-05, 'epoch': 2.64}
{'loss': 0.8441, 'learning_rate': 4.913941561109493e-05, 'epoch': 2.65}
{'loss': 0.8449, 'learning_rate': 4.904044085503041e-05, 'epoch': 2.65}
{'loss': 0.9557, 'learning_rate': 4.894153348408021e-05, 'epoch': 2.65}
{'loss': 0.9485, 'learning_rate': 4.884269362903212e-05, 'epoch': 2.65}
{'loss': 0.9048, 'learning_rate': 4.874392142058456e-05, 'epoch': 2.65}
{'loss': 0.9076, 'learning_rate': 4.8645216989346466e-05, 'epoch': 2.65}
{'loss': 0.91, 'learning_rate': 4.8546580465837274e-05, 'epoch': 2.65}
{'loss': 0.881, 'learning_rate': 4.844801198048654e-05, 'epoch': 2.66}
{'loss': 0.9445, 'learning_rate': 4.834951166363385e-05, 'epoch': 2.66}
{'loss': 1.0006, 'learning_rate': 4.825107964552864e-05, 'epoch': 2.66}
{'loss': 1.0497, 'learning_rate': 4.815271605633012e-05, 'epoch': 2.66}
{'loss': 0.8729, 'learning_rate': 4.8054421026106913e-05, 'epoch': 2.66}
{'loss': 0.9184, 'learning_rate': 4.7956194684837045e-05, 'epoch': 2.66}
{'loss': 0.9196, 'learning_rate': 4.785803716240767e-05, 'epoch': 2.66}
{'loss': 0.972, 'learning_rate': 4.775994858861492e-05, 'epoch': 2.67}
{'loss': 0.9013, 'learning_rate': 4.7661929093163905e-05, 'epoch': 2.67}
{'loss': 1.0315, 'learning_rate': 4.756397880566823e-05, 'epoch': 2.67}
{'loss': 0.909, 'learning_rate': 4.7466097855650025e-05, 'epoch': 2.67}
{'loss': 0.9478, 'learning_rate': 4.7368286372539775e-05, 'epoch': 2.67}
{'loss': 0.8349, 'learning_rate': 4.727054448567601e-05, 'epoch': 2.67}
{'loss': 0.9896, 'learning_rate': 4.7172872324305395e-05, 'epoch': 2.67}
{'loss': 0.9205, 'learning_rate': 4.7075270017582254e-05, 'epoch': 2.68}
{'loss': 0.9018, 'learning_rate': 4.697773769456859e-05, 'epoch': 2.68}
{'loss': 0.9135, 'learning_rate': 4.688027548423386e-05, 'epoch': 2.68}
{'loss': 0.9179, 'learning_rate': 4.678288351545478e-05, 'epoch': 2.68}
{'loss': 0.9328, 'learning_rate': 4.6685561917015276e-05, 'epoch': 2.68}
{'loss': 0.8566, 'learning_rate': 4.658831081760614e-05, 'epoch': 2.68}
{'loss': 0.9334, 'learning_rate': 4.6491130345824906e-05, 'epoch': 2.68}
{'loss': 0.9076, 'learning_rate': 4.639402063017585e-05, 'epoch': 2.69}
{'loss': 0.8876, 'learning_rate': 4.629698179906958e-05, 'epoch': 2.69}
{'loss': 0.9201, 'learning_rate': 4.6200013980822954e-05, 'epoch': 2.69}
{'loss': 0.9125, 'learning_rate': 4.610311730365904e-05, 'epoch': 2.69}
{'loss': 0.9517, 'learning_rate': 4.600629189570672e-05, 'epoch': 2.69}
{'loss': 0.9998, 'learning_rate': 4.590953788500071e-05, 'epoch': 2.69}
{'loss': 0.8575, 'learning_rate': 4.5812855399481256e-05, 'epoch': 2.69}
{'loss': 0.8623, 'learning_rate': 4.571624456699404e-05, 'epoch': 2.7}
{'loss': 0.8983, 'learning_rate': 4.561970551529008e-05, 'epoch': 2.7}
{'loss': 0.9505, 'learning_rate': 4.5523238372025356e-05, 'epoch': 2.7}
{'loss': 0.8463, 'learning_rate': 4.542684326476082e-05, 'epoch': 2.7}
{'loss': 0.9392, 'learning_rate': 4.533052032096217e-05, 'epoch': 2.7}
{'loss': 0.9328, 'learning_rate': 4.523426966799965e-05, 'epoch': 2.7}
{'loss': 0.9663, 'learning_rate': 4.5138091433147925e-05, 'epoch': 2.7}
{'loss': 0.9191, 'learning_rate': 4.504198574358596e-05, 'epoch': 2.71}
 69%|█████████████████████████████████████████████████████████████████████████████ | 1892/2752 [32:31<14:30, 1.01s/it]
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[... the same packing_efficiency_estimate line repeats 41 more times here (twice per rank 0-7, then 26 more times from rank 0 over the next ~3 s), identical values each time; elided ...]
{'eval_loss': 0.9938868880271912, 'eval_runtime': 3.1649, 'eval_samples_per_second': 345.036, 'eval_steps_per_second': 21.802, 'epoch': 2.71}
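Quick self-consistency check of the eval record above — the trainer doesn't log the eval sample count here, but it can be backed out of the reported rates (rough arithmetic only, the exact counts depend on the eval set and batch size):

eval_runtime = 3.1649            # seconds, from the eval record above
samples = eval_runtime * 345.036 # eval_samples_per_second -> ~1092 eval samples
steps = eval_runtime * 21.802    # eval_steps_per_second   -> ~69 eval batches
print(round(samples), round(steps))  # 1092 69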
{'loss': 0.9106, 'learning_rate': 4.4945952726396714e-05, 'epoch': 2.71}
{'loss': 0.9026, 'learning_rate': 4.484999250856706e-05, 'epoch': 2.71}
{'loss': 0.9029, 'learning_rate': 4.475410521698764e-05, 'epoch': 2.71}
{'loss': 0.8842, 'learning_rate': 4.465829097845261e-05, 'epoch': 2.71}
{'loss': 1.003, 'learning_rate': 4.4562549919659625e-05, 'epoch': 2.71}
{'loss': 0.8727, 'learning_rate': 4.4466882167209464e-05, 'epoch': 2.72}
{'loss': 0.9795, 'learning_rate': 4.4371287847606e-05, 'epoch': 2.72}
{'loss': 0.9245, 'learning_rate': 4.427576708725609e-05, 'epoch': 2.72}
{'loss': 0.9606, 'learning_rate': 4.418032001246917e-05, 'epoch': 2.72}
{'loss': 0.9373, 'learning_rate': 4.408494674945739e-05, 'epoch': 2.72}
{'loss': 0.8952, 'learning_rate': 4.3989647424335214e-05, 'epoch': 2.72}
{'loss': 0.879, 'learning_rate': 4.389442216311933e-05, 'epoch': 2.72}
{'loss': 0.8557, 'learning_rate': 4.3799271091728525e-05, 'epoch': 2.73}
{'loss': 0.8478, 'learning_rate': 4.370419433598345e-05, 'epoch': 2.73}
{'loss': 1.0037, 'learning_rate': 4.360919202160648e-05, 'epoch': 2.73}
{'loss': 0.9289, 'learning_rate': 4.351426427422165e-05, 'epoch': 2.73}
{'loss': 0.9311, 'learning_rate': 4.341941121935429e-05, 'epoch': 2.73}
{'loss': 0.8235, 'learning_rate': 4.332463298243099e-05, 'epoch': 2.73}
{'loss': 0.8741, 'learning_rate': 4.3229929688779414e-05, 'epoch': 2.73}
{'loss': 0.8201, 'learning_rate': 4.313530146362809e-05, 'epoch': 2.74}
{'loss': 0.9737, 'learning_rate': 4.304074843210637e-05, 'epoch': 2.74}
{'loss': 0.8822, 'learning_rate': 4.29462707192441e-05, 'epoch': 2.74}
{'loss': 0.8463, 'learning_rate': 4.285186844997154e-05, 'epoch': 2.74}
{'loss': 0.9696, 'learning_rate': 4.275754174911921e-05, 'epoch': 2.74}
{'loss': 0.9475, 'learning_rate': 4.266329074141764e-05, 'epoch': 2.74}
{'loss': 0.9222, 'learning_rate': 4.256911555149742e-05, 'epoch': 2.74}
{'loss': 0.8247, 'learning_rate': 4.247501630388873e-05, 'epoch': 2.75}
{'loss': 0.852, 'learning_rate': 4.2380993123021385e-05, 'epoch': 2.75}
{'loss': 0.8594, 'learning_rate': 4.2287046133224584e-05, 'epoch': 2.75}
{'loss': 0.9492, 'learning_rate': 4.219317545872689e-05, 'epoch': 2.75}
{'loss': 0.9106, 'learning_rate': 4.209938122365579e-05, 'epoch': 2.75}
{'loss': 0.8924, 'learning_rate': 4.200566355203784e-05, 'epoch': 2.75}
{'loss': 0.8769, 'learning_rate': 4.191202256779827e-05, 'epoch': 2.75}
{'loss': 0.9155, 'learning_rate': 4.181845839476091e-05, 'epoch': 2.76}
{'loss': 0.8438, 'learning_rate': 4.172497115664803e-05, 'epoch': 2.76}
{'loss': 0.9502, 'learning_rate': 4.163156097708014e-05, 'epoch': 2.76}
{'loss': 0.954, 'learning_rate': 4.153822797957596e-05, 'epoch': 2.76}
{'loss': 0.9277, 'learning_rate': 4.144497228755203e-05, 'epoch': 2.76}
{'loss': 0.9976, 'learning_rate': 4.1351794024322724e-05, 'epoch': 2.76}
{'loss': 0.8987, 'learning_rate': 4.1258693313099996e-05, 'epoch': 2.76}
{'loss': 1.0284, 'learning_rate': 4.1165670276993254e-05, 'epoch': 2.77}
{'loss': 0.9039, 'learning_rate': 4.1072725039009275e-05, 'epoch': 2.77}
{'loss': 0.9622, 'learning_rate': 4.097985772205186e-05, 'epoch': 2.77}
{'loss': 0.8291, 'learning_rate': 4.088706844892182e-05, 'epoch': 2.77}
{'loss': 1.0228, 'learning_rate': 4.079435734231676e-05, 'epoch': 2.77}
{'loss': 0.9032, 'learning_rate': 4.070172452483091e-05, 'epoch': 2.77}
{'loss': 1.0341, 'learning_rate': 4.0609170118954965e-05, 'epoch': 2.77}
{'loss': 0.9567, 'learning_rate': 4.051669424707602e-05, 'epoch': 2.78}
{'loss': 0.8364, 'learning_rate': 4.042429703147723e-05, 'epoch': 2.78}
{'loss': 0.921, 'learning_rate': 4.033197859433777e-05, 'epoch': 2.78}
{'loss': 0.9741, 'learning_rate': 4.0239739057732614e-05, 'epoch': 2.78}
{'loss': 0.9435, 'learning_rate': 4.014757854363249e-05, 'epoch': 2.78}
{'loss': 0.9182, 'learning_rate': 4.005549717390352e-05, 'epoch': 2.78}
{'loss': 0.8817, 'learning_rate': 3.996349507030731e-05, 'epoch': 2.78}
{'loss': 0.9229, 'learning_rate': 3.987157235450051e-05, 'epoch': 2.79}
{'loss': 0.8952, 'learning_rate': 3.977972914803486e-05, 'epoch': 2.79}
{'loss': 0.9602, 'learning_rate': 3.9687965572356935e-05, 'epoch': 2.79}
{'loss': 0.9466, 'learning_rate': 3.9596281748808086e-05, 'epoch': 2.79}
{'loss': 0.8834, 'learning_rate': 3.950467779862411e-05, 'epoch': 2.79}
{'loss': 0.8743, 'learning_rate': 3.9413153842935255e-05, 'epoch': 2.79}
{'loss': 1.0362, 'learning_rate': 3.9321710002765956e-05, 'epoch': 2.8}
{'loss': 0.9543, 'learning_rate': 3.92303463990347e-05, 'epoch': 2.8}
{'loss': 0.9374, 'learning_rate': 3.9139063152553864e-05, 'epoch': 2.8}
{'loss': 0.9471, 'learning_rate': 3.9047860384029675e-05, 'epoch': 2.8}
{'loss': 0.9507, 'learning_rate': 3.895673821406183e-05, 'epoch': 2.8}
{'loss': 1.0041, 'learning_rate': 3.8865696763143447e-05, 'epoch': 2.8}
{'loss': 0.9205, 'learning_rate': 3.877473615166097e-05, 'epoch': 2.8}
{'loss': 0.8563, 'learning_rate': 3.868385649989388e-05, 'epoch': 2.81}
{'loss': 0.9562, 'learning_rate': 3.859305792801469e-05, 'epoch': 2.81}
{'loss': 0.8823, 'learning_rate': 3.850234055608863e-05, 'epoch': 2.81}
{'loss': 0.9413, 'learning_rate': 3.841170450407358e-05, 'epoch': 2.81}
{'loss': 0.8878, 'learning_rate': 3.832114989181988e-05, 'epoch': 2.81}
{'loss': 0.8403, 'learning_rate': 3.8230676839070134e-05, 'epoch': 2.81}
{'loss': 0.9446, 'learning_rate': 3.814028546545924e-05, 'epoch': 2.81}
{'loss': 0.9467, 'learning_rate': 3.804997589051394e-05, 'epoch': 2.82}
{'loss': 0.8977, 'learning_rate': 3.795974823365287e-05, 'epoch': 2.82}
{'loss': 0.8595, 'learning_rate': 3.7869602614186395e-05, 'epoch': 2.82}
{'loss': 0.8949, 'learning_rate': 3.77795391513163e-05, 'epoch': 2.82}
{'loss': 0.9651, 'learning_rate': 3.768955796413577e-05, 'epoch': 2.82}
{'loss': 0.8827, 'learning_rate': 3.759965917162925e-05, 'epoch': 2.82}
{'loss': 0.9665, 'learning_rate': 3.750984289267217e-05, 'epoch': 2.82}
{'loss': 0.8955, 'learning_rate': 3.7420109246030866e-05, 'epoch': 2.83}
{'loss': 0.8589, 'learning_rate': 3.733045835036241e-05, 'epoch': 2.83}
{'loss': 0.8211, 'learning_rate': 3.724089032421441e-05, 'epoch': 2.83}
{'loss': 0.8969, 'learning_rate': 3.7151405286025e-05, 'epoch': 2.83}
{'loss': 0.9402, 'learning_rate': 3.706200335412248e-05, 'epoch': 2.83}
{'loss': 0.9404, 'learning_rate': 3.6972684646725283e-05, 'epoch': 2.83}
{'loss': 0.9332, 'learning_rate': 3.688344928194181e-05, 'epoch': 2.83}
{'loss': 0.9071, 'learning_rate': 3.6794297377770196e-05, 'epoch': 2.84}
{'loss': 0.9343, 'learning_rate': 3.670522905209832e-05, 'epoch': 2.84}
{'loss': 0.8765, 'learning_rate': 3.661624442270346e-05, 'epoch': 2.84}
{'loss': 0.8928, 'learning_rate': 3.652734360725224e-05, 'epoch': 2.84}
{'loss': 0.9422, 'learning_rate': 3.6438526723300446e-05, 'epoch': 2.84}
{'loss': 0.9866, 'learning_rate': 3.6349793888292915e-05, 'epoch': 2.84}
{'loss': 0.8925, 'learning_rate': 3.626114521956327e-05, 'epoch': 2.84}
{'loss': 0.9413, 'learning_rate': 3.617258083433396e-05, 'epoch': 2.85}
{'loss': 0.9447, 'learning_rate': 3.6084100849715876e-05, 'epoch': 2.85}
{'loss': 0.9265, 'learning_rate': 3.59957053827083e-05, 'epoch': 2.85}
{'loss': 0.9366, 'learning_rate': 3.590739455019888e-05, 'epoch': 2.85}
{'loss': 0.9024, 'learning_rate': 3.581916846896318e-05, 'epoch': 2.85}
{'loss': 1.037, 'learning_rate': 3.573102725566485e-05, 'epoch': 2.85}
{'loss': 0.978, 'learning_rate': 3.564297102685522e-05, 'epoch': 2.85}
{'loss': 0.9384, 'learning_rate': 3.555499989897326e-05, 'epoch': 2.86}
{'loss': 0.9749, 'learning_rate': 3.546711398834543e-05, 'epoch': 2.86}
{'loss': 0.919, 'learning_rate': 3.5379313411185453e-05, 'epoch': 2.86}
{'loss': 0.9048, 'learning_rate': 3.5291598283594316e-05, 'epoch': 2.86}
{'loss': 0.933, 'learning_rate': 3.520396872155992e-05, 'epoch': 2.86}
{'loss': 1.0065, 'learning_rate': 3.5116424840957065e-05, 'epoch': 2.86}
{'loss': 0.8885, 'learning_rate': 3.502896675754722e-05, 'epoch': 2.86}
{'loss': 0.9302, 'learning_rate': 3.494159458697843e-05, 'epoch': 2.87}
{'loss': 0.916, 'learning_rate': 3.485430844478509e-05, 'epoch': 2.87}
{'loss': 0.9507, 'learning_rate': 3.476710844638795e-05, 'epoch': 2.87}
{'loss': 0.9827, 'learning_rate': 3.467999470709373e-05, 'epoch': 2.87}
{'loss': 0.9284, 'learning_rate': 3.459296734209514e-05, 'epoch': 2.87}
{'loss': 0.9059, 'learning_rate': 3.450602646647066e-05, 'epoch': 2.87}
{'loss': 0.8338, 'learning_rate': 3.441917219518438e-05, 'epoch': 2.88}
{'loss': 0.9415, 'learning_rate': 3.433240464308597e-05, 'epoch': 2.88}
{'loss': 0.943, 'learning_rate': 3.4245723924910315e-05, 'epoch': 2.88}
{'loss': 0.92, 'learning_rate': 3.415913015527753e-05, 'epoch': 2.88}
{'loss': 0.8524, 'learning_rate': 3.407262344869272e-05, 'epoch': 2.88}
{'loss': 0.8813, 'learning_rate': 3.3986203919545945e-05, 'epoch': 2.88}
{'loss': 0.946, 'learning_rate': 3.389987168211187e-05, 'epoch': 2.88}
{'loss': 0.9317, 'learning_rate': 3.381362685054987e-05, 'epoch': 2.89}
{'loss': 1.0254, 'learning_rate': 3.3727469538903646e-05, 'epoch': 2.89}
{'loss': 0.8975, 'learning_rate': 3.3641399861101165e-05, 'epoch': 2.89}
{'loss': 0.9077, 'learning_rate': 3.355541793095456e-05, 'epoch': 2.89}
{'loss': 0.9046, 'learning_rate': 3.3469523862159856e-05, 'epoch': 2.89}
{'loss': 0.9072, 'learning_rate': 3.338371776829705e-05, 'epoch': 2.89}
{'loss': 0.8743, 'learning_rate': 3.3297999762829655e-05, 'epoch': 2.89}
{'loss': 0.9907, 'learning_rate': 3.3212369959104774e-05, 'epoch': 2.9}
{'loss': 0.9043, 'learning_rate': 3.312682847035284e-05, 'epoch': 2.9}
{'loss': 0.9034, 'learning_rate': 3.3041375409687526e-05, 'epoch': 2.9}
{'loss': 0.9034, 'learning_rate': 3.295601089010562e-05, 'epoch': 2.9}
{'loss': 0.993, 'learning_rate': 3.287073502448675e-05, 'epoch': 2.9}
{'loss': 0.9328, 'learning_rate': 3.278554792559337e-05, 'epoch': 2.9}
{'loss': 0.9436, 'learning_rate': 3.2700449706070534e-05, 'epoch': 2.9}
{'loss': 0.899, 'learning_rate': 3.2615440478445715e-05, 'epoch': 2.91}
{'loss': 0.8965, 'learning_rate': 3.2530520355128854e-05, 'epoch': 2.91}
{'loss': 0.8862, 'learning_rate': 3.2445689448411934e-05, 'epoch': 2.91}
{'loss': 0.9173, 'learning_rate': 3.236094787046901e-05, 'epoch': 2.91}
{'loss': 1.003, 'learning_rate': 3.2276295733356024e-05, 'epoch': 2.91}
{'loss': 0.8534, 'learning_rate': 3.2191733149010594e-05, 'epoch': 2.91}
{'loss': 0.8589, 'learning_rate': 3.2107260229252036e-05, 'epoch': 2.91}
{'loss': 0.925, 'learning_rate': 3.202287708578097e-05, 'epoch': 2.92}
{'loss': 0.8715, 'learning_rate': 3.193858383017942e-05, 'epoch': 2.92}
{'loss': 0.8844, 'learning_rate': 3.185438057391045e-05, 'epoch': 2.92}
{'loss': 0.9201, 'learning_rate': 3.1770267428318154e-05, 'epoch': 2.92}
{'loss': 0.8743, 'learning_rate': 3.168624450462746e-05, 'epoch': 2.92}
{'loss': 0.8695, 'learning_rate': 3.160231191394407e-05, 'epoch': 2.92}
{'loss': 0.9371, 'learning_rate': 3.151846976725412e-05, 'epoch': 2.92}
{'loss': 0.8625, 'learning_rate': 3.143471817542422e-05, 'epoch': 2.93}
{'loss': 0.9185, 'learning_rate': 3.13510572492012e-05, 'epoch': 2.93}
{'loss': 0.9125, 'learning_rate': 3.1267487099212e-05, 'epoch': 2.93}
{'loss': 0.9906, 'learning_rate': 3.118400783596361e-05, 'epoch': 2.93}
{'loss': 0.9378, 'learning_rate': 3.110061956984275e-05, 'epoch': 2.93}
{'loss': 0.9237, 'learning_rate': 3.10173224111158e-05, 'epoch': 2.93}
{'loss': 0.9199, 'learning_rate': 3.093411646992873e-05, 'epoch': 2.93}
{'loss': 0.8795, 'learning_rate': 3.085100185630685e-05, 'epoch': 2.94}
{'loss': 1.0392, 'learning_rate': 3.0767978680154684e-05, 'epoch': 2.94}
{'loss': 0.9213, 'learning_rate': 3.0685047051255946e-05, 'epoch': 2.94}
{'loss': 0.9464, 'learning_rate': 3.060220707927319e-05, 'epoch': 2.94}
{'loss': 0.9369, 'learning_rate': 3.051945887374782e-05, 'epoch': 2.94}
{'loss': 0.9182, 'learning_rate': 3.0436802544099862e-05, 'epoch': 2.94}
{'loss': 0.8392, 'learning_rate': 3.035423819962785e-05, 'epoch': 2.94}
{'loss': 0.9129, 'learning_rate': 3.027176594950878e-05, 'epoch': 2.95}
{'loss': 1.0331, 'learning_rate': 3.0189385902797705e-05, 'epoch': 2.95}
{'loss': 0.8897, 'learning_rate': 3.0107098168427937e-05, 'epoch': 2.95}
{'loss': 0.9169, 'learning_rate': 3.002490285521059e-05, 'epoch': 2.95}
{'loss': 0.8883, 'learning_rate': 2.9942800071834554e-05, 'epoch': 2.95}
{'loss': 0.9268, 'learning_rate': 2.9860789926866504e-05, 'epoch': 2.95}
{'loss': 0.9142, 'learning_rate': 2.977887252875049e-05, 'epoch': 2.95}
{'loss': 0.9245, 'learning_rate': 2.9697047985807958e-05, 'epoch': 2.96}
 75%|████████████████████████████████████████████████████████████████████████████████████ | 2064/2752 [35:28<11:32, 1.01s/it]
[2023-12-29 02:38:53,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[... the same packing_efficiency_estimate line repeats 41 more times here (twice per rank 0-7, then 26 more times from rank 0 over the next ~3 s), identical values each time; elided ...]
{'eval_loss': 0.991502583026886, 'eval_runtime': 3.1683, 'eval_samples_per_second': 344.661, 'eval_steps_per_second': 21.778, 'epoch': 2.96}
{'loss': 0.8612, 'learning_rate': 2.961531640623757e-05, 'epoch': 2.96}
{'loss': 0.8814, 'learning_rate': 2.9533677898115063e-05, 'epoch': 2.96}
{'loss': 0.899, 'learning_rate': 2.9452132569393077e-05, 'epoch': 2.96}
{'loss': 0.9123, 'learning_rate': 2.9370680527901116e-05, 'epoch': 2.96}
{'loss': 0.8835, 'learning_rate': 2.9289321881345254e-05, 'epoch': 2.96}
{'loss': 0.9375, 'learning_rate': 2.9208056737308074e-05, 'epoch': 2.97}
{'loss': 0.9662, 'learning_rate': 2.9126885203248554e-05, 'epoch': 2.97}
{'loss': 0.918, 'learning_rate': 2.904580738650181e-05, 'epoch': 2.97}
{'loss': 0.8745, 'learning_rate': 2.8964823394279174e-05, 'epoch': 2.97}
{'loss': 0.9056, 'learning_rate': 2.888393333366778e-05, 'epoch': 2.97}
{'loss': 0.9258, 'learning_rate': 2.880313731163061e-05, 'epoch': 2.97}
{'loss': 0.9394, 'learning_rate': 2.872243543500629e-05, 'epoch': 2.97}
{'loss': 0.932, 'learning_rate': 2.864182781050895e-05, 'epoch': 2.98}
{'loss': 0.8368, 'learning_rate': 2.856131454472807e-05, 'epoch': 2.98}
{'loss': 0.7922, 'learning_rate': 2.8480895744128422e-05, 'epoch': 2.98}
{'loss': 0.881, 'learning_rate': 2.840057151504979e-05, 'epoch': 2.98}
{'loss': 0.9326, 'learning_rate': 2.832034196370693e-05, 'epoch': 2.98}
{'loss': 0.9605, 'learning_rate': 2.824020719618944e-05, 'epoch': 2.98}
{'loss': 0.9439, 'learning_rate': 2.8160167318461506e-05, 'epoch': 2.98}
{'loss': 0.8284, 'learning_rate': 2.8080222436361934e-05, 'epoch': 2.99}
{'loss': 0.8589, 'learning_rate': 2.8000372655603847e-05, 'epoch': 2.99}
{'loss': 0.9226, 'learning_rate': 2.7920618081774618e-05, 'epoch': 2.99}
{'loss': 0.954, 'learning_rate': 2.784095882033575e-05, 'epoch': 2.99}
{'loss': 0.8516, 'learning_rate': 2.7761394976622658e-05, 'epoch': 2.99}
{'loss': 0.9274, 'learning_rate': 2.768192665584468e-05, 'epoch': 2.99}
{'loss': 0.9747, 'learning_rate': 2.7602553963084776e-05, 'epoch': 2.99}
{'loss': 0.866, 'learning_rate': 2.7523277003299463e-05, 'epoch': 3.0}
{'loss': 0.8241, 'learning_rate': 2.7444095881318656e-05, 'epoch': 3.0}
{'loss': 0.8822, 'learning_rate': 2.736501070184556e-05, 'epoch': 3.0}
{'loss': 0.9198, 'learning_rate': 2.728602156945649e-05, 'epoch': 3.0}
{'loss': 1.0102, 'learning_rate': 2.720712858860083e-05, 'epoch': 3.0}
{'loss': 0.9658, 'learning_rate': 2.712833186360072e-05, 'epoch': 3.0}
{'loss': 0.9391, 'learning_rate': 2.7049631498651085e-05, 'epoch': 3.0}
{'loss': 0.9419, 'learning_rate': 2.69710275978194e-05, 'epoch': 3.01}
{'loss': 0.97, 'learning_rate': 2.6892520265045552e-05, 'epoch': 3.01}
{'loss': 0.9433, 'learning_rate': 2.6814109604141848e-05, 'epoch': 3.01}
{'loss': 0.9582, 'learning_rate': 2.6735795718792646e-05, 'epoch': 3.01}
{'loss': 0.9987, 'learning_rate': 2.665757871255439e-05, 'epoch': 3.01}
{'loss': 0.9493, 'learning_rate': 2.6579458688855362e-05, 'epoch': 3.01}
{'loss': 0.9051, 'learning_rate': 2.6501435750995727e-05, 'epoch': 3.01}
{'loss': 0.984, 'learning_rate': 2.6423510002147113e-05, 'epoch': 3.02}
{'loss': 0.9262, 'learning_rate': 2.6345681545352773e-05, 'epoch': 3.02}
{'loss': 1.0209, 'learning_rate': 2.6267950483527216e-05, 'epoch': 3.02}
{'loss': 0.9266, 'learning_rate': 2.619031691945618e-05, 'epoch': 3.02}
{'loss': 0.9631, 'learning_rate': 2.611278095579651e-05, 'epoch': 3.02}
 77%|█████████████████████████████████████████████████████████████████████████████████████▊ | 2109/2752 [36:17<10:48, 1.01s/it]
[2023-12-29 02:39:42,702] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[... the same line repeats 15 more times here (twice per rank 0-7); note total_num_tokens per device is now 1341686 rather than 23546; elided ...]
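The tqdm ETA in the progress line above checks out against simple arithmetic (it can differ by a second or so because tqdm smooths the iteration rate):

steps_done, steps_total, sec_per_it = 2109, 2752, 1.01  # from the 77% progress line
eta_sec = (steps_total - steps_done) * sec_per_it
print(f"{int(eta_sec // 60)}:{int(eta_sec % 60):02d}")  # 10:49 vs the logged 10:48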
{'loss': 0.925, 'learning_rate': 2.6035342695075937e-05, 'epoch': 3.0}
{'loss': 0.8235, 'learning_rate': 2.5958002239693092e-05, 'epoch': 3.0}
{'loss': 0.959, 'learning_rate': 2.588075969191718e-05, 'epoch': 3.0}
{'loss': 0.86, 'learning_rate': 2.5803615153887983e-05, 'epoch': 3.01}
{'loss': 0.9208, 'learning_rate': 2.5726568727615662e-05, 'epoch': 3.01}
{'loss': 0.9867, 'learning_rate': 2.5649620514980644e-05, 'epoch': 3.01}
{'loss': 0.8472, 'learning_rate': 2.5572770617733544e-05, 'epoch': 3.01}
{'loss': 0.9361, 'learning_rate': 2.5496019137494908e-05, 'epoch': 3.01}
{'loss': 0.9576, 'learning_rate': 2.5419366175755145e-05, 'epoch': 3.01}
{'loss': 0.913, 'learning_rate': 2.5342811833874423e-05, 'epoch': 3.01}
{'loss': 0.8749, 'learning_rate': 2.5266356213082433e-05, 'epoch': 3.02}
{'loss': 0.8932, 'learning_rate': 2.518999941447846e-05, 'epoch': 3.02}
{'loss': 1.0326, 'learning_rate': 2.5113741539030987e-05, 'epoch': 3.02}
{'loss': 0.8748, 'learning_rate': 2.503758268757773e-05, 'epoch': 3.02}
{'loss': 0.8779, 'learning_rate': 2.496152296082548e-05, 'epoch': 3.02}
{'loss': 0.9256, 'learning_rate': 2.4885562459349888e-05, 'epoch': 3.02}
{'loss': 0.9557, 'learning_rate': 2.480970128359552e-05, 'epoch': 3.02}
{'loss': 0.866, 'learning_rate': 2.4733939533875472e-05, 'epoch': 3.03}
{'loss': 0.882, 'learning_rate': 2.465827731037147e-05, 'epoch': 3.03}
{'loss': 0.892, 'learning_rate': 2.458271471313357e-05, 'epoch': 3.03}
{'loss': 0.8872, 'learning_rate': 2.4507251842080092e-05, 'epoch': 3.03}
{'loss': 0.9151, 'learning_rate': 2.443188879699747e-05, 'epoch': 3.03}
{'loss': 0.9867, 'learning_rate': 2.4356625677540233e-05, 'epoch': 3.03}
{'loss': 0.8839, 'learning_rate': 2.4281462583230686e-05, 'epoch': 3.03}
{'loss': 0.7848, 'learning_rate': 2.4206399613458875e-05, 'epoch': 3.04}
{'loss': 0.8742, 'learning_rate': 2.413143686748247e-05, 'epoch': 3.04}
{'loss': 0.9463, 'learning_rate': 2.405657444442657e-05, 'epoch': 3.04}
{'loss': 0.8966, 'learning_rate': 2.3981812443283723e-05, 'epoch': 3.04}
{'loss': 0.9582, 'learning_rate': 2.3907150962913584e-05, 'epoch': 3.04}
{'loss': 0.8336, 'learning_rate': 2.3832590102042895e-05, 'epoch': 3.04}
{'loss': 0.8765, 'learning_rate': 2.3758129959265407e-05, 'epoch': 3.05}
{'loss': 0.9209, 'learning_rate': 2.3683770633041613e-05, 'epoch': 3.05}
{'loss': 0.8095, 'learning_rate': 2.3609512221698725e-05, 'epoch': 3.05}
{'loss': 0.9049, 'learning_rate': 2.3535354823430577e-05, 'epoch': 3.05}
{'loss': 0.8767, 'learning_rate': 2.3461298536297328e-05, 'epoch': 3.05}
{'loss': 0.9175, 'learning_rate': 2.33873434582255e-05, 'epoch': 3.05}
{'loss': 0.9369, 'learning_rate': 2.331348968700775e-05, 'epoch': 3.05}
{'loss': 0.9144, 'learning_rate': 2.3239737320302756e-05, 'epoch': 3.06}
{'loss': 0.9236, 'learning_rate': 2.3166086455635218e-05, 'epoch': 3.06}
{'loss': 0.8233, 'learning_rate': 2.3092537190395457e-05, 'epoch': 3.06}
{'loss': 0.9012, 'learning_rate': 2.3019089621839597e-05, 'epoch': 3.06}
{'loss': 0.8681, 'learning_rate': 2.2945743847089174e-05, 'epoch': 3.06}
{'loss': 0.8885, 'learning_rate': 2.2872499963131155e-05, 'epoch': 3.06}
{'loss': 0.951, 'learning_rate': 2.279935806681782e-05, 'epoch': 3.06}
{'loss': 0.8665, 'learning_rate': 2.272631825486653e-05, 'epoch': 3.07}
{'loss': 0.8761, 'learning_rate': 2.2653380623859665e-05, 'epoch': 3.07}
{'loss': 0.9654, 'learning_rate': 2.258054527024451e-05, 'epoch': 3.07}
{'loss': 0.8756, 'learning_rate': 2.2507812290333097e-05, 'epoch': 3.07}
{'loss': 0.8569, 'learning_rate': 2.243518178030206e-05, 'epoch': 3.07}
{'loss': 0.9044, 'learning_rate': 2.2362653836192603e-05, 'epoch': 3.07}
{'loss': 0.9117, 'learning_rate': 2.2290228553910242e-05, 'epoch': 3.07}
{'loss': 0.8473, 'learning_rate': 2.2217906029224757e-05, 'epoch': 3.08}
{'loss': 0.8593, 'learning_rate': 2.2145686357770046e-05, 'epoch': 3.08}
{'loss': 0.8527, 'learning_rate': 2.2073569635044e-05, 'epoch': 3.08}
{'loss': 0.8815, 'learning_rate': 2.2001555956408428e-05, 'epoch': 3.08}
{'loss': 0.8078, 'learning_rate': 2.1929645417088805e-05, 'epoch': 3.08}
{'loss': 0.8098, 'learning_rate': 2.1857838112174267e-05, 'epoch': 3.08}
{'loss': 0.9116, 'learning_rate': 2.178613413661743e-05, 'epoch': 3.08}
{'loss': 0.9506, 'learning_rate': 2.1714533585234244e-05, 'epoch': 3.09}
{'loss': 0.8248, 'learning_rate': 2.164303655270399e-05, 'epoch': 3.09}
{'loss': 0.8722, 'learning_rate': 2.1571643133568964e-05, 'epoch': 3.09}
{'loss': 0.8872, 'learning_rate': 2.1500353422234475e-05, 'epoch': 3.09}
{'loss': 0.8939, 'learning_rate': 2.142916751296876e-05, 'epoch': 3.09}
{'loss': 0.8646, 'learning_rate': 2.1358085499902725e-05, 'epoch': 3.09}
{'loss': 0.9108, 'learning_rate': 2.1287107477029878e-05, 'epoch': 3.09}
{'loss': 0.8471, 'learning_rate': 2.121623353820632e-05, 'epoch': 3.1}
{'loss': 0.8401, 'learning_rate': 2.114546377715042e-05, 'epoch': 3.1}
{'loss': 0.8044, 'learning_rate': 2.107479828744282e-05, 'epoch': 3.1}
{'loss': 0.8743, 'learning_rate': 2.1004237162526296e-05, 'epoch': 3.1}
{'loss': 0.8842, 'learning_rate': 2.093378049570558e-05, 'epoch': 3.1}
{'loss': 0.9091, 'learning_rate': 2.0863428380147344e-05, 'epoch': 3.1}
{'loss': 0.8988, 'learning_rate': 2.079318090887996e-05, 'epoch': 3.1}
{'loss': 0.8222, 'learning_rate': 2.072303817479343e-05, 'epoch': 3.11}
{'loss': 0.8944, 'learning_rate': 2.0653000270639268e-05, 'epoch': 3.11}
{'loss': 0.7921, 'learning_rate': 2.0583067289030335e-05, 'epoch': 3.11}
{'loss': 0.8369, 'learning_rate': 2.0513239322440847e-05, 'epoch': 3.11}
{'loss': 0.9107, 'learning_rate': 2.0443516463206048e-05, 'epoch': 3.11}
{'loss': 0.7424, 'learning_rate': 2.037389880352225e-05, 'epoch': 3.11}
{'loss': 0.9282, 'learning_rate': 2.030438643544663e-05, 'epoch': 3.11}
{'loss': 0.8903, 'learning_rate': 2.0234979450897184e-05, 'epoch': 3.12}
{'loss': 0.9372, 'learning_rate': 2.016567794165246e-05, 'epoch': 3.12}
{'loss': 0.8833, 'learning_rate': 2.0096481999351678e-05, 'epoch': 3.12}
{'loss': 0.8388, 'learning_rate': 2.0027391715494347e-05, 'epoch': 3.12}
{'loss': 0.8807, 'learning_rate': 1.9958407181440286e-05, 'epoch': 3.12}
{'loss': 0.8558, 'learning_rate': 1.988952848840948e-05, 'epoch': 3.12}
{'loss': 0.9237, 'learning_rate': 1.982075572748201e-05, 'epoch': 3.12}
{'loss': 0.8507, 'learning_rate': 1.9752088989597795e-05, 'epoch': 3.13}
{'loss': 0.9246, 'learning_rate': 1.9683528365556637e-05, 'epoch': 3.13}
{'loss': 0.9273, 'learning_rate': 1.961507394601797e-05, 'epoch': 3.13}
{'loss': 0.8661, 'learning_rate': 1.95467258215008e-05, 'epoch': 3.13}
{'loss': 0.9966, 'learning_rate': 1.9478484082383562e-05, 'epoch': 3.13}
{'loss': 1.0066, 'learning_rate': 1.9410348818904078e-05, 'epoch': 3.13}
{'loss': 0.8767, 'learning_rate': 1.9342320121159295e-05, 'epoch': 3.14}
{'loss': 0.8655, 'learning_rate': 1.9274398079105316e-05, 'epoch': 3.14}
{'loss': 0.9439, 'learning_rate': 1.9206582782557136e-05, 'epoch': 3.14}
{'loss': 0.845, 'learning_rate': 1.913887432118866e-05, 'epoch': 3.14}
{'loss': 0.9405, 'learning_rate': 1.9071272784532468e-05, 'epoch': 3.14}
{'loss': 0.8258, 'learning_rate': 1.9003778261979843e-05, 'epoch': 3.14}
{'loss': 0.8778, 'learning_rate': 1.893639084278046e-05, 'epoch': 3.14}
{'loss': 0.7739, 'learning_rate': 1.8869110616042407e-05, 'epoch': 3.15}
{'loss': 0.8811, 'learning_rate': 1.880193767073204e-05, 'epoch': 3.15}
{'loss': 0.902, 'learning_rate': 1.8734872095673817e-05, 'epoch': 3.15}
{'loss': 0.9088, 'learning_rate': 1.86679139795503e-05, 'epoch': 3.15}
{'loss': 0.7956, 'learning_rate': 1.8601063410901852e-05, 'epoch': 3.15}
{'loss': 0.8558, 'learning_rate': 1.853432047812671e-05, 'epoch': 3.15}
{'loss': 0.8145, 'learning_rate': 1.8467685269480705e-05, 'epoch': 3.15}
{'loss': 0.8353, 'learning_rate': 1.8401157873077257e-05, 'epoch': 3.16}
{'loss': 0.8619, 'learning_rate': 1.8334738376887262e-05, 'epoch': 3.16}
{'loss': 0.8716, 'learning_rate': 1.826842686873885e-05, 'epoch': 3.16}
{'loss': 0.83, 'learning_rate': 1.820222343631748e-05, 'epoch': 3.16}
{'loss': 0.8917, 'learning_rate': 1.8136128167165578e-05, 'epoch': 3.16}
{'loss': 0.7738, 'learning_rate': 1.8070141148682584e-05, 'epoch': 3.16}
{'loss': 0.9152, 'learning_rate': 1.80042624681248e-05, 'epoch': 3.16}
{'loss': 0.8365, 'learning_rate': 1.7938492212605306e-05, 'epoch': 3.17}
{'loss': 0.8271, 'learning_rate': 1.787283046909376e-05, 'epoch': 3.17}
{'loss': 0.8139, 'learning_rate': 1.7807277324416338e-05, 'epoch': 3.17}
{'loss': 0.9108, 'learning_rate': 1.7741832865255625e-05, 'epoch': 3.17}
{'loss': 0.8615, 'learning_rate': 1.7676497178150464e-05, 'epoch': 3.17}
{'loss': 0.9262, 'learning_rate': 1.7611270349495924e-05, 'epoch': 3.17}
{'loss': 0.8908, 'learning_rate': 1.7546152465543088e-05, 'epoch': 3.17}
{'loss': 0.8593, 'learning_rate': 1.7481143612398955e-05, 'epoch': 3.18}
{'loss': 0.8398, 'learning_rate': 1.7416243876026396e-05, 'epoch': 3.18}
{'loss': 0.7941, 'learning_rate': 1.735145334224394e-05, 'epoch': 3.18}
{'loss': 0.8993, 'learning_rate': 1.728677209672581e-05, 'epoch': 3.18}
{'loss': 0.8728, 'learning_rate': 1.7222200225001616e-05, 'epoch': 3.18}
{'loss': 0.8441, 'learning_rate': 1.7157737812456386e-05, 'epoch': 3.18}
{'loss': 0.9832, 'learning_rate': 1.7093384944330393e-05, 'epoch': 3.18}
81%|███████████████████████████████████████████████████████████████████████████████████████████          | 2236/2752 [38:26<08:43,  1.02s/it]
[2023-12-29 02:41:51,716] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,719] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,719] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,720] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,973] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,973] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,974] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,975] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,236] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,236] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,510] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,511] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,760] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,761] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,046] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,046] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,311] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,312] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,565] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,565] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,826] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,826] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,359] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,620] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,621] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,884] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,884] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9991366267204285, 'eval_runtime': 3.1792, 'eval_samples_per_second': 343.486, 'eval_steps_per_second': 21.704, 'epoch': 3.18}
{'loss': 0.7692, 'learning_rate': 1.7029141705719064e-05, 'epoch': 3.19}
{'loss': 0.9054, 'learning_rate': 1.696500818157284e-05, 'epoch': 3.19}
{'loss': 0.8379, 'learning_rate': 1.6900984456697145e-05, 'epoch': 3.19}
{'loss': 0.8711, 'learning_rate': 1.6837070615752115e-05, 'epoch': 3.19}
{'loss': 0.8025, 'learning_rate': 1.6773266743252703e-05, 'epoch': 3.19}
{'loss': 0.905, 'learning_rate': 1.670957292356835e-05, 'epoch': 3.19}
{'loss': 0.9015, 'learning_rate': 1.6645989240922987e-05, 'epoch': 3.19}
{'loss': 0.8721, 'learning_rate': 1.6582515779394968e-05, 'epoch': 3.2}
{'loss': 0.9382, 'learning_rate': 1.6519152622916843e-05, 'epoch': 3.2}
{'loss': 0.9502, 'learning_rate': 1.6455899855275303e-05, 'epoch': 3.2}
{'loss': 0.8633, 'learning_rate': 1.6392757560111093e-05, 'epoch': 3.2}
{'loss': 0.8858, 'learning_rate': 1.632972582091884e-05, 'epoch': 3.2}
{'loss': 0.8456, 'learning_rate': 1.6266804721047058e-05, 'epoch': 3.2}
{'loss': 0.8411, 'learning_rate': 1.6203994343697882e-05, 'epoch': 3.2}
{'loss': 0.8317, 'learning_rate': 1.6141294771927062e-05, 'epoch': 3.21}
{'loss': 0.9077, 'learning_rate': 1.6078706088643836e-05, 'epoch': 3.21}
{'loss': 0.8067, 'learning_rate': 1.60162283766108e-05, 'epoch': 3.21}
{'loss': 0.7608, 'learning_rate': 1.5953861718443774e-05, 'epoch': 3.21}
{'loss': 0.8801, 'learning_rate': 1.5891606196611843e-05, 'epoch': 3.21}
{'loss': 0.8622, 'learning_rate': 1.5829461893437015e-05, 'epoch': 3.21}
{'loss': 0.833, 'learning_rate': 1.576742889109427e-05, 'epoch': 3.22}
{'loss': 0.865, 'learning_rate': 1.570550727161144e-05, 'epoch': 3.22}
{'loss': 0.8657, 'learning_rate': 1.5643697116869004e-05, 'epoch': 3.22}
{'loss': 0.8733, 'learning_rate': 1.558199850860016e-05, 'epoch': 3.22}
{'loss': 0.9039, 'learning_rate': 1.55204115283905e-05, 'epoch': 3.22}
{'loss': 0.863, 'learning_rate': 1.5458936257678014e-05, 'epoch': 3.22}
{'loss': 0.9268, 'learning_rate': 1.539757277775308e-05, 'epoch': 3.22}
{'loss': 0.9167, 'learning_rate': 1.533632116975814e-05, 'epoch': 3.23}
{'loss': 0.7777, 'learning_rate': 1.527518151468773e-05, 'epoch': 3.23}
{'loss': 0.9511, 'learning_rate': 1.5214153893388405e-05, 'epoch': 3.23}
{'loss': 0.8834, 'learning_rate': 1.51532383865585e-05, 'epoch': 3.23}
{'loss': 0.9316, 'learning_rate': 1.5092435074748146e-05, 'epoch': 3.23}
{'loss': 0.9127, 'learning_rate': 1.5031744038359097e-05, 'epoch': 3.23}
{'loss': 0.9103, 'learning_rate': 1.4971165357644613e-05, 'epoch': 3.23}
{'loss': 0.84, 'learning_rate': 1.491069911270948e-05, 'epoch': 3.24}
{'loss': 0.8856, 'learning_rate': 1.48503453835097e-05, 'epoch': 3.24}
{'loss': 0.8837, 'learning_rate': 1.4790104249852554e-05, 'epoch': 3.24}
{'loss': 0.9049, 'learning_rate': 1.4729975791396411e-05, 'epoch': 3.24}
{'loss': 0.8747, 'learning_rate': 1.4669960087650625e-05, 'epoch': 3.24}
{'loss': 0.8525, 'learning_rate': 1.4610057217975526e-05, 'epoch': 3.24}
{'loss': 0.8555, 'learning_rate': 1.4550267261582173e-05, 'epoch': 3.24}
{'loss': 0.8905, 'learning_rate': 1.4490590297532346e-05, 'epoch': 3.25}
{'loss': 0.8001, 'learning_rate': 1.4431026404738391e-05, 'epoch': 3.25}
{'loss': 0.8561, 'learning_rate': 1.4371575661963143e-05, 'epoch': 3.25}
{'loss': 0.9014, 'learning_rate': 1.4312238147819857e-05, 'epoch': 3.25}
{'loss': 0.7926, 'learning_rate': 1.425301394077201e-05, 'epoch': 3.25}
{'loss': 0.8945, 'learning_rate': 1.4193903119133256e-05, 'epoch': 3.25}
{'loss': 0.8685, 'learning_rate': 1.4134905761067329e-05, 'epoch': 3.25}
{'loss': 0.7927, 'learning_rate': 1.407602194458797e-05, 'epoch': 3.26}
{'loss': 0.8627, 'learning_rate': 1.4017251747558712e-05, 'epoch': 3.26}
{'loss': 0.8743, 'learning_rate': 1.395859524769284e-05, 'epoch': 3.26}
{'loss': 0.8629, 'learning_rate': 1.3900052522553397e-05, 'epoch': 3.26}
{'loss': 0.8783, 'learning_rate': 1.384162364955286e-05, 'epoch': 3.26}
{'loss': 0.8307, 'learning_rate': 1.3783308705953224e-05, 'epoch': 3.26}
{'loss': 0.8257, 'learning_rate': 1.3725107768865787e-05, 'epoch': 3.26}
{'loss': 0.8161, 'learning_rate': 1.3667020915251173e-05, 'epoch': 3.27}
{'loss': 0.8053, 'learning_rate': 1.3609048221919064e-05, 'epoch': 3.27}
{'loss': 0.8356, 'learning_rate': 1.3551189765528217e-05, 'epoch': 3.27}
{'loss': 0.826, 'learning_rate': 1.3493445622586343e-05, 'epoch': 3.27}
{'loss': 0.9188, 'learning_rate': 1.3435815869449964e-05, 'epoch': 3.27}
{'loss': 0.7895, 'learning_rate': 1.3378300582324387e-05, 'epoch': 3.27}
{'loss': 0.7949, 'learning_rate': 1.3320899837263524e-05, 'epoch': 3.27}
{'loss': 0.8875, 'learning_rate': 1.3263613710169831e-05, 'epoch': 3.28}
{'loss': 0.8677, 'learning_rate': 1.3206442276794207e-05, 'epoch': 3.28}
{'loss': 0.9383, 'learning_rate': 1.3149385612735876e-05, 'epoch': 3.28}
{'loss': 0.8617, 'learning_rate': 1.3092443793442277e-05, 'epoch': 3.28}
{'loss': 0.7634, 'learning_rate': 1.303561689420909e-05, 'epoch': 3.28}
{'loss': 0.8413, 'learning_rate': 1.297890499017992e-05, 'epoch': 3.28}
{'loss': 0.9422, 'learning_rate': 1.2922308156346353e-05, 'epoch': 3.28}
{'loss': 0.9099, 'learning_rate': 1.2865826467547825e-05, 'epoch': 3.29}
{'loss': 0.8902, 'learning_rate': 1.2809459998471462e-05, 'epoch': 3.29}
{'loss': 0.867, 'learning_rate': 1.2753208823652141e-05, 'epoch': 3.29}
{'loss': 0.8341, 'learning_rate': 1.269707301747215e-05, 'epoch': 3.29}
{'loss': 0.8372, 'learning_rate': 1.2641052654161333e-05, 'epoch': 3.29}
{'loss': 0.8073, 'learning_rate': 1.2585147807796815e-05, 'epoch': 3.29}
{'loss': 0.8703, 'learning_rate': 1.2529358552302972e-05, 'epoch': 3.3}
{'loss': 0.7844, 'learning_rate': 1.2473684961451381e-05, 'epoch': 3.3}
{'loss': 0.8001, 'learning_rate': 1.2418127108860623e-05, 'epoch': 3.3}
{'loss': 0.8309, 'learning_rate': 1.236268506799625e-05, 'epoch': 3.3}
{'loss': 0.8018, 'learning_rate': 1.2307358912170686e-05, 'epoch': 3.3}
{'loss': 0.8752, 'learning_rate': 1.2252148714543088e-05, 'epoch': 3.3}
{'loss': 0.868, 'learning_rate': 1.2197054548119302e-05, 'epoch': 3.3}
{'loss': 0.9158, 'learning_rate': 1.2142076485751751e-05, 'epoch': 3.31}
{'loss': 0.8873, 'learning_rate': 1.2087214600139308e-05, 'epoch': 3.31}
{'loss': 0.8359, 'learning_rate': 1.2032468963827249e-05, 'epoch': 3.31}
{'loss': 0.8097, 'learning_rate': 1.197783964920709e-05, 'epoch': 3.31}
{'loss': 0.8627, 'learning_rate': 1.1923326728516549e-05, 'epoch': 3.31}
{'loss': 0.8869, 'learning_rate': 1.1868930273839473e-05, 'epoch': 3.31}
{'loss': 0.9054, 'learning_rate': 1.181465035710565e-05, 'epoch': 3.31}
{'loss': 0.9408, 'learning_rate': 1.1760487050090796e-05, 'epoch': 3.32}
{'loss': 0.8753, 'learning_rate': 1.170644042441642e-05, 'epoch': 3.32}
{'loss': 0.7947, 'learning_rate': 1.1652510551549723e-05, 'epoch': 3.32}
{'loss': 0.8062, 'learning_rate': 1.1598697502803568e-05, 'epoch': 3.32}
{'loss': 0.8017, 'learning_rate': 1.1545001349336315e-05, 'epoch': 3.32}
{'loss': 0.843, 'learning_rate': 1.14914221621517e-05, 'epoch': 3.32}
{'loss': 0.8078, 'learning_rate': 1.1437960012098892e-05, 'epoch': 3.32}
{'loss': 0.9372, 'learning_rate': 1.1384614969872221e-05, 'epoch': 3.33}
{'loss': 0.8341, 'learning_rate': 1.1331387106011172e-05, 'epoch': 3.33}
{'loss': 0.9547, 'learning_rate': 1.1278276490900319e-05, 'epoch': 3.33}
{'loss': 0.862, 'learning_rate': 1.1225283194769176e-05, 'epoch': 3.33}
{'loss': 0.884, 'learning_rate': 1.1172407287692099e-05, 'epoch': 3.33}
{'loss': 0.8948, 'learning_rate': 1.1119648839588258e-05, 'epoch': 3.33}
{'loss': 0.8137, 'learning_rate': 1.1067007920221439e-05, 'epoch': 3.33}
{'loss': 0.8918, 'learning_rate': 1.1014484599200125e-05, 'epoch': 3.34}
{'loss': 0.9144, 'learning_rate': 1.0962078945977195e-05, 'epoch': 3.34}
{'loss': 0.8809, 'learning_rate': 1.090979102984998e-05, 'epoch': 3.34}
{'loss': 0.8329, 'learning_rate': 1.085762091996011e-05, 'epoch': 3.34}
{'loss': 0.8637, 'learning_rate': 1.0805568685293422e-05, 'epoch': 3.34}
{'loss': 0.8378, 'learning_rate': 1.0753634394679934e-05, 'epoch': 3.34}
{'loss': 0.7993, 'learning_rate': 1.0701818116793672e-05, 'epoch': 3.34}
{'loss': 0.8492, 'learning_rate': 1.06501199201526e-05, 'epoch': 3.35}
{'loss': 0.8809, 'learning_rate': 1.0598539873118552e-05, 'epoch': 3.35}
{'loss': 0.8883, 'learning_rate': 1.054707804389713e-05, 'epoch': 3.35}
{'loss': 0.9208, 'learning_rate': 1.0495734500537591e-05, 'epoch': 3.35}
{'loss': 0.8741, 'learning_rate': 1.0444509310932848e-05, 'epoch': 3.35}
{'loss': 0.7554, 'learning_rate': 1.0393402542819231e-05, 'epoch': 3.35}
{'loss': 0.7831, 'learning_rate': 1.0342414263776512e-05, 'epoch': 3.35}
{'loss': 0.8131, 'learning_rate': 1.0291544541227804e-05, 'epoch': 3.36}
{'loss': 0.8611, 'learning_rate': 1.0240793442439411e-05, 'epoch': 3.36}
{'loss': 0.7758, 'learning_rate': 1.0190161034520795e-05, 'epoch': 3.36}
{'loss': 0.8395, 'learning_rate': 1.0139647384424477e-05, 'epoch': 3.36}
{'loss': 0.9013, 'learning_rate': 1.008925255894595e-05, 'epoch': 3.36}
{'loss': 0.8523, 'learning_rate': 1.0038976624723539e-05, 'epoch': 3.36}
{'loss': 0.9207, 'learning_rate': 9.988819648238379e-06, 'epoch': 3.36}
{'loss': 0.863, 'learning_rate': 9.938781695814337e-06, 'epoch': 3.37}
{'loss': 0.8753, 'learning_rate': 9.888862833617862e-06, 'epoch': 3.37}
{'loss': 0.8762, 'learning_rate': 9.83906312765791e-06, 'epoch': 3.37}
{'loss': 0.8391, 'learning_rate': 9.789382643785895e-06, 'epoch': 3.37}
{'loss': 0.8798, 'learning_rate': 9.739821447695585e-06, 'epoch': 3.37}
{'loss': 0.8436, 'learning_rate': 9.690379604922983e-06, 'epoch': 3.37}
{'loss': 0.8689, 'learning_rate': 9.641057180846324e-06, 'epoch': 3.38}
{'loss': 0.8863, 'learning_rate': 9.591854240685882e-06, 'epoch': 3.38}
{'loss': 0.8453, 'learning_rate': 9.542770849503946e-06, 'epoch': 3.38}
{'loss': 0.8856, 'learning_rate': 9.493807072204718e-06, 'epoch': 3.38}
{'loss': 0.7519, 'learning_rate': 9.444962973534244e-06, 'epoch': 3.38}
{'loss': 0.8428, 'learning_rate': 9.396238618080322e-06, 'epoch': 3.38}
{'loss': 0.8327, 'learning_rate': 9.347634070272404e-06, 'epoch': 3.38}
{'loss': 0.8393, 'learning_rate': 9.299149394381501e-06, 'epoch': 3.39}
{'loss': 0.9051, 'learning_rate': 9.250784654520106e-06, 'epoch': 3.39}
{'loss': 0.9085, 'learning_rate': 9.202539914642182e-06, 'epoch': 3.39}
{'loss': 0.9799, 'learning_rate': 9.154415238542946e-06, 'epoch': 3.39}
{'loss': 0.8416, 'learning_rate': 9.106410689858857e-06, 'epoch': 3.39}
{'loss': 0.9105, 'learning_rate': 9.058526332067586e-06, 'epoch': 3.39}
{'loss': 0.9668, 'learning_rate': 9.010762228487813e-06, 'epoch': 3.39}
{'loss': 0.7611, 'learning_rate': 8.963118442279205e-06, 'epoch': 3.4}
{'loss': 0.8875, 'learning_rate': 8.915595036442349e-06, 'epoch': 3.4}
{'loss': 0.7758, 'learning_rate': 8.868192073818671e-06, 'epoch': 3.4}
{'loss': 0.9575, 'learning_rate': 8.820909617090289e-06, 'epoch': 3.4}
{'loss': 0.8685, 'learning_rate': 8.773747728780001e-06, 'epoch': 3.4}
{'loss': 0.8455, 'learning_rate': 8.726706471251156e-06, 'epoch': 3.4}
{'loss': 0.8608, 'learning_rate': 8.679785906707582e-06, 'epoch': 3.4}
{'loss': 0.9329, 'learning_rate': 8.632986097193573e-06, 'epoch': 3.41}
{'loss': 0.8921, 'learning_rate': 8.586307104593672e-06, 'epoch': 3.41}
{'loss': 0.8398, 'learning_rate': 8.539748990632701e-06, 'epoch': 3.41}
{'loss': 0.9086, 'learning_rate': 8.493311816875615e-06, 'epoch': 3.41}
{'loss': 0.9483, 'learning_rate': 8.446995644727473e-06, 'epoch': 3.41}
{'loss': 0.9482, 'learning_rate': 8.40080053543334e-06, 'epoch': 3.41}
{'loss': 0.9169, 'learning_rate': 8.354726550078152e-06, 'epoch': 3.41}
{'loss': 0.8638, 'learning_rate': 8.308773749586728e-06, 'epoch': 3.42}
{'loss': 0.9535, 'learning_rate': 8.2629421947236e-06, 'epoch': 3.42}
{'loss': 0.9294, 'learning_rate': 8.217231946092984e-06, 'epoch': 3.42}
{'loss': 0.871, 'learning_rate': 8.171643064138735e-06, 'epoch': 3.42}
{'loss': 0.8875, 'learning_rate': 8.12617560914416e-06, 'epoch': 3.42}
{'loss': 0.8881, 'learning_rate': 8.080829641232013e-06, 'epoch': 3.42}
{'loss': 0.8584, 'learning_rate': 8.03560522036445e-06, 'epoch': 3.42}
{'loss': 0.7881, 'learning_rate': 7.990502406342836e-06, 'epoch': 3.43}
{'loss': 0.742, 'learning_rate': 7.945521258807776e-06, 'epoch': 3.43}
{'loss': 0.8735, 'learning_rate': 7.900661837238977e-06, 'epoch': 3.43}
{'loss': 0.8317, 'learning_rate': 7.8559242009552e-06, 'epoch': 3.43}
{'loss': 0.7784, 'learning_rate': 7.811308409114138e-06, 'epoch': 3.43}
{'loss': 0.8541, 'learning_rate': 7.766814520712384e-06, 'epoch': 3.43}
{'loss': 0.8901, 'learning_rate': 7.72244259458531e-06, 'epoch': 3.43}
88%|██████████████████████████████████████████████████████████████████████████████████████████████████    | 2408/2752 [41:23<05:49,  1.02s/it]
[2023-12-29 02:44:49,086] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,086] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,092] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,093] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,343] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,344] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,344] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,345] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,608] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,609] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,873] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,873] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,124] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,124] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,410] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,410] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,675] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,675] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,927] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,928] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,189] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,189] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,451] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,451] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,721] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,721] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,984] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,985] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:52,249] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:52,249] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0039585828781128, 'eval_runtime': 3.1719, 'eval_samples_per_second': 344.273, 'eval_steps_per_second': 21.754, 'epoch': 3.43}
{'loss': 0.8212, 'learning_rate': 7.678192689407082e-06, 'epoch': 3.44}
{'loss': 0.6998, 'learning_rate': 7.634064863690448e-06, 'epoch': 3.44}
{'loss': 0.8835, 'learning_rate': 7.590059175786746e-06, 'epoch': 3.44}
{'loss': 0.9484, 'learning_rate': 7.546175683885814e-06, 'epoch': 3.44}
{'loss': 0.844, 'learning_rate': 7.502414446015893e-06, 'epoch': 3.44}
{'loss': 0.8346, 'learning_rate': 7.45877552004357e-06, 'epoch': 3.44}
{'loss': 0.8501, 'learning_rate': 7.415258963673732e-06, 'epoch': 3.44}
{'loss': 0.8103, 'learning_rate': 7.371864834449405e-06, 'epoch': 3.45}
{'loss': 0.9139, 'learning_rate': 7.328593189751754e-06, 'epoch': 3.45}
{'loss': 0.9085, 'learning_rate': 7.285444086799942e-06, 'epoch': 3.45}
{'loss': 0.9066, 'learning_rate': 7.2424175826511286e-06, 'epoch': 3.45}
{'loss': 0.8778, 'learning_rate': 7.199513734200369e-06, 'epoch': 3.45}
{'loss': 0.841, 'learning_rate': 7.156732598180505e-06, 'epoch': 3.45}
{'loss': 0.9662, 'learning_rate': 7.114074231162082e-06, 'epoch': 3.45}
{'loss': 0.8999, 'learning_rate': 7.071538689553381e-06, 'epoch': 3.46}
{'loss': 0.8571, 'learning_rate': 7.029126029600197e-06, 'epoch': 3.46}
{'loss': 0.8339, 'learning_rate': 6.986836307385858e-06, 'epoch': 3.46}
{'loss': 0.7903, 'learning_rate': 6.944669578831176e-06, 'epoch': 3.46}
{'loss': 0.9298, 'learning_rate': 6.902625899694237e-06, 'epoch': 3.46}
{'loss': 0.8453, 'learning_rate': 6.860705325570494e-06, 'epoch': 3.46}
{'loss': 0.8637, 'learning_rate': 6.818907911892558e-06, 'epoch': 3.47}
{'loss': 0.8856, 'learning_rate': 6.777233713930198e-06, 'epoch': 3.47}
{'loss': 0.9112, 'learning_rate': 6.7356827867902984e-06, 'epoch': 3.47}
{'loss': 0.8057, 'learning_rate': 6.694255185416687e-06, 'epoch': 3.47}
{'loss': 0.8563, 'learning_rate': 6.652950964590121e-06, 'epoch': 3.47}
{'loss': 0.8958, 'learning_rate': 6.611770178928223e-06, 'epoch': 3.47}
{'loss': 0.8808, 'learning_rate': 6.570712882885355e-06, 'epoch': 3.47}
{'loss': 0.9118, 'learning_rate': 6.529779130752678e-06, 'epoch': 3.48}
{'loss': 0.9069, 'learning_rate': 6.488968976657894e-06, 'epoch': 3.48}
{'loss': 0.844, 'learning_rate': 6.448282474565303e-06, 'epoch': 3.48}
{'loss': 0.8704, 'learning_rate': 6.407719678275703e-06, 'epoch': 3.48}
{'loss': 0.8371, 'learning_rate': 6.3672806414262765e-06, 'epoch': 3.48}
{'loss': 0.8223, 'learning_rate': 6.326965417490638e-06, 'epoch': 3.48}
{'loss': 0.8485, 'learning_rate': 6.286774059778599e-06, 'epoch': 3.48}
{'loss': 0.9, 'learning_rate': 6.246706621436205e-06, 'epoch': 3.49}
{'loss': 0.8453, 'learning_rate': 6.206763155445627e-06, 'epoch': 3.49}
{'loss': 0.8473, 'learning_rate': 6.166943714625173e-06, 'epoch': 3.49}
{'loss': 0.8125, 'learning_rate': 6.127248351629056e-06, 'epoch': 3.49}
{'loss': 0.8782, 'learning_rate': 6.087677118947455e-06, 'epoch': 3.49}
{'loss': 0.853, 'learning_rate': 6.0482300689064466e-06, 'epoch': 3.49}
{'loss': 0.9692, 'learning_rate': 6.008907253667839e-06, 'epoch': 3.49}
{'loss': 0.9187, 'learning_rate': 5.969708725229195e-06, 'epoch': 3.5}
{'loss': 0.8498, 'learning_rate': 5.930634535423696e-06, 'epoch': 3.5}
{'loss': 0.8456, 'learning_rate': 5.891684735920167e-06, 'epoch': 3.5}
{'loss': 0.8237, 'learning_rate': 5.852859378222897e-06, 'epoch': 3.5}
{'loss': 0.8031, 'learning_rate': 5.81415851367163e-06, 'epoch': 3.5}
{'loss': 0.9069, 'learning_rate': 5.77558219344152e-06, 'epoch': 3.5}
{'loss': 0.8812, 'learning_rate': 5.737130468542972e-06, 'epoch': 3.5}
{'loss': 0.8748, 'learning_rate': 5.698803389821728e-06, 'epoch': 3.51}
{'loss': 0.8297, 'learning_rate': 5.6606010079586215e-06, 'epoch': 3.51}
{'loss': 0.8528, 'learning_rate': 5.622523373469635e-06, 'epoch': 3.51}
{'loss': 0.8046, 'learning_rate': 5.58457053670578e-06, 'epoch': 3.51}
{'loss': 0.9836, 'learning_rate': 5.546742547853067e-06, 'epoch': 3.51}
{'loss': 0.8655, 'learning_rate': 5.509039456932385e-06, 'epoch': 3.51}
{'loss': 0.8693, 'learning_rate': 5.471461313799497e-06, 'epoch': 3.51}
{'loss': 0.8959, 'learning_rate': 5.434008168144944e-06, 'epoch': 3.52}
{'loss': 0.7543, 'learning_rate': 5.396680069493953e-06, 'epoch': 3.52}
{'loss': 0.8862, 'learning_rate': 5.359477067206397e-06, 'epoch': 3.52}
{'loss': 0.8258, 'learning_rate': 5.322399210476781e-06, 'epoch': 3.52}
{'loss': 0.8141, 'learning_rate': 5.2854465483340725e-06, 'epoch': 3.52}
{'loss': 0.8996, 'learning_rate': 5.248619129641707e-06, 'epoch': 3.52}
{'loss': 0.8866, 'learning_rate': 5.211917003097544e-06, 'epoch': 3.52}
{'loss': 0.9403, 'learning_rate': 5.175340217233704e-06, 'epoch': 3.53}
{'loss': 0.816, 'learning_rate': 5.1388888204165875e-06, 'epoch': 3.53}
{'loss': 0.9147, 'learning_rate': 5.102562860846827e-06, 'epoch': 3.53}
{'loss': 0.8745, 'learning_rate': 5.066362386559154e-06, 'epoch': 3.53}
{'loss': 0.8936, 'learning_rate': 5.030287445422366e-06, 'epoch': 3.53}
{'loss': 0.862, 'learning_rate': 4.9943380851392604e-06, 'epoch': 3.53}
{'loss': 0.9836, 'learning_rate': 4.958514353246602e-06, 'epoch': 3.53}
{'loss': 0.8044, 'learning_rate': 4.9228162971149846e-06, 'epoch': 3.54}
{'loss': 0.866, 'learning_rate': 4.887243963948895e-06, 'epoch': 3.54}
{'loss': 0.7985, 'learning_rate': 4.851797400786506e-06, 'epoch': 3.54}
{'loss': 0.8844, 'learning_rate': 4.816476654499713e-06, 'epoch': 3.54}
{'loss': 1.0106, 'learning_rate': 4.781281771794033e-06, 'epoch': 3.54}
{'loss': 0.8814, 'learning_rate': 4.746212799208527e-06, 'epoch': 3.54}
{'loss': 0.802, 'learning_rate': 4.7112697831158126e-06, 'epoch': 3.55}
{'loss': 0.8892, 'learning_rate': 4.676452769721917e-06, 'epoch': 3.55}
{'loss': 0.911, 'learning_rate': 4.641761805066258e-06, 'epoch': 3.55}
{'loss': 0.8486, 'learning_rate': 4.607196935021574e-06, 'epoch': 3.55}
{'loss': 0.9191, 'learning_rate': 4.572758205293848e-06, 'epoch': 3.55}
{'loss': 0.8329, 'learning_rate': 4.53844566142232e-06, 'epoch': 3.55}
{'loss': 0.8538, 'learning_rate': 4.504259348779316e-06, 'epoch': 3.55}
{'loss': 0.8515, 'learning_rate': 4.470199312570256e-06, 'epoch': 3.56}
{'loss': 0.7707, 'learning_rate': 4.4362655978336e-06, 'epoch': 3.56}
{'loss': 0.9054, 'learning_rate': 4.4024582494407556e-06, 'epoch': 3.56}
{'loss': 0.8464, 'learning_rate': 4.368777312096006e-06, 'epoch': 3.56}
{'loss': 0.86, 'learning_rate': 4.3352228303365605e-06, 'epoch': 3.56}
{'loss': 0.8042, 'learning_rate': 4.3017948485323255e-06, 'epoch': 3.56}
{'loss': 0.927, 'learning_rate': 4.2684934108859765e-06, 'epoch': 3.56}
{'loss': 0.9424, 'learning_rate': 4.235318561432844e-06, 'epoch': 3.57}
{'loss': 0.8785, 'learning_rate': 4.2022703440408486e-06, 'epoch': 3.57}
{'loss': 0.8664, 'learning_rate': 4.169348802410522e-06, 'epoch': 3.57}
{'loss': 0.8852, 'learning_rate': 4.136553980074842e-06, 'epoch': 3.57}
{'loss': 0.8903, 'learning_rate': 4.10388592039922e-06, 'epoch': 3.57}
{'loss': 0.9148, 'learning_rate': 4.071344666581456e-06, 'epoch': 3.57}
{'loss': 0.9007, 'learning_rate': 4.038930261651674e-06, 'epoch': 3.57}
{'loss': 0.8743, 'learning_rate': 4.006642748472278e-06, 'epoch': 3.58}
{'loss': 0.8454, 'learning_rate': 3.974482169737859e-06, 'epoch': 3.58}
{'loss': 0.8139, 'learning_rate': 3.9424485679751546e-06, 'epoch': 3.58}
{'loss': 0.8817, 'learning_rate': 3.910541985543014e-06, 'epoch': 3.58}
{'loss': 0.9012, 'learning_rate': 3.878762464632313e-06, 'epoch': 3.58}
{'loss': 0.8465, 'learning_rate': 3.847110047265911e-06, 'epoch': 3.58}
{'loss': 0.9034, 'learning_rate': 3.81558477529862e-06, 'epoch': 3.58}
{'loss': 0.8386, 'learning_rate': 3.7841866904170798e-06, 'epoch': 3.59}
{'loss': 0.8854, 'learning_rate': 3.752915834139781e-06, 'epoch': 3.59}
{'loss': 0.8416, 'learning_rate': 3.7217722478169903e-06, 'epoch': 3.59}
{'loss': 0.8349, 'learning_rate': 3.690755972630622e-06, 'epoch': 3.59}
{'loss': 0.8278, 'learning_rate': 3.6598670495943123e-06, 'epoch': 3.59}
{'loss': 0.8793, 'learning_rate': 3.629105519553255e-06, 'epoch': 3.59}
{'loss': 0.847, 'learning_rate': 3.598471423184202e-06, 'epoch': 3.59}
{'loss': 0.891, 'learning_rate': 3.5679648009953935e-06, 'epoch': 3.6}
{'loss': 0.9706, 'learning_rate': 3.537585693326484e-06, 'epoch': 3.6}
{'loss': 0.9981, 'learning_rate': 3.5073341403485727e-06, 'epoch': 3.6}
{'loss': 0.789, 'learning_rate': 3.477210182064039e-06, 'epoch': 3.6}
{'loss': 0.9186, 'learning_rate': 3.447213858306564e-06, 'epoch': 3.6}
{'loss': 0.8503, 'learning_rate': 3.4173452087410187e-06, 'epoch': 3.6}
{'loss': 0.8136, 'learning_rate': 3.3876042728635092e-06, 'epoch': 3.6}
{'loss': 0.8728, 'learning_rate': 3.357991090001189e-06, 'epoch': 3.61}
{'loss': 0.7827, 'learning_rate': 3.3285056993123455e-06, 'epoch': 3.61}
{'loss': 0.9014, 'learning_rate': 3.2991481397862568e-06, 'epoch': 3.61}
{'loss': 0.9747, 'learning_rate': 3.269918450243159e-06, 'epoch': 3.61}
{'loss': 0.9531, 'learning_rate': 3.2408166693342123e-06, 'epoch': 3.61}
{'loss': 0.866, 'learning_rate': 3.211842835541423e-06, 'epoch': 3.61}
{'loss': 0.9305, 'learning_rate': 3.1829969871776555e-06, 'epoch': 3.61}
{'loss': 0.8745, 'learning_rate': 3.1542791623864863e-06, 'epoch': 3.62}
{'loss': 0.8253, 'learning_rate': 3.125689399142229e-06, 'epoch': 3.62}
{'loss': 0.8647, 'learning_rate': 3.0972277352498303e-06, 'epoch': 3.62}
{'loss': 0.993, 'learning_rate': 3.0688942083448967e-06, 'epoch': 3.62}
{'loss': 0.8882, 'learning_rate': 3.0406888558935476e-06, 'epoch': 3.62}
{'loss': 0.8133, 'learning_rate': 3.012611715192437e-06, 'epoch': 3.62}
{'loss': 0.859, 'learning_rate': 2.984662823368689e-06, 'epoch': 3.62}
{'loss': 0.8799, 'learning_rate': 2.9568422173798294e-06, 'epoch': 3.63}
{'loss': 0.8258, 'learning_rate': 2.929149934013742e-06, 'epoch': 3.63}
{'loss': 0.8953, 'learning_rate': 2.901586009888624e-06, 'epoch': 3.63}
{'loss': 0.9015, 'learning_rate': 2.874150481452975e-06, 'epoch': 3.63}
{'loss': 0.9216, 'learning_rate': 2.846843384985476e-06, 'epoch': 3.63}
{'loss': 0.842, 'learning_rate': 2.8196647565949864e-06, 'epoch': 3.63}
{'loss': 0.8822, 'learning_rate': 2.7926146322204914e-06, 'epoch': 3.64}
{'loss': 0.8613, 'learning_rate': 2.7656930476310683e-06, 'epoch': 3.64}
{'loss': 0.8748, 'learning_rate': 2.7389000384257955e-06, 'epoch': 3.64}
{'loss': 0.8796, 'learning_rate': 2.7122356400337667e-06, 'epoch': 3.64}
{'loss': 0.8332, 'learning_rate': 2.6856998877139773e-06, 'epoch': 3.64}
{'loss': 0.8884, 'learning_rate': 2.6592928165553143e-06, 'epoch': 3.64}
{'loss': 0.9207, 'learning_rate': 2.633014461476524e-06, 'epoch': 3.64}
{'loss': 0.8051, 'learning_rate': 2.6068648572261543e-06, 'epoch': 3.65}
{'loss': 0.8006, 'learning_rate': 2.5808440383824796e-06, 'epoch': 3.65}
{'loss': 0.9132, 'learning_rate': 2.554952039353475e-06, 'epoch': 3.65}
{'loss': 0.9026, 'learning_rate': 2.5291888943767992e-06, 'epoch': 3.65}
{'loss': 0.8566, 'learning_rate': 2.5035546375197006e-06, 'epoch': 3.65}
{'loss': 0.8617, 'learning_rate': 2.47804930267902e-06, 'epoch': 3.65}
{'loss': 0.87, 'learning_rate': 2.4526729235810896e-06, 'epoch': 3.65}
{'loss': 0.8373, 'learning_rate': 2.427425533781746e-06, 'epoch': 3.66}
{'loss': 0.9077, 'learning_rate': 2.4023071666662624e-06, 'epoch': 3.66}
{'loss': 0.9626, 'learning_rate': 2.377317855449268e-06, 'epoch': 3.66}
{'loss': 1.0056, 'learning_rate': 2.3524576331747762e-06, 'epoch': 3.66}
{'loss': 0.8332, 'learning_rate': 2.3277265327160904e-06, 'epoch': 3.66}
{'loss': 0.8724, 'learning_rate': 2.3031245867757734e-06, 'epoch': 3.66}
{'loss': 0.8799, 'learning_rate': 2.2786518278855807e-06, 'epoch': 3.66}
{'loss': 0.9257, 'learning_rate': 2.2543082884064815e-06, 'epoch': 3.67}
{'loss': 0.8547, 'learning_rate': 2.2300940005285374e-06, 'epoch': 3.67}
{'loss': 0.9901, 'learning_rate': 2.2060089962709253e-06, 'epoch': 3.67}
{'loss': 0.8688, 'learning_rate': 2.182053307481857e-06, 'epoch': 3.67}
{'loss': 0.904, 'learning_rate': 2.158226965838539e-06, 'epoch': 3.67}
{'loss': 0.7944, 'learning_rate': 2.134530002847146e-06, 'epoch': 3.67}
{'loss': 0.9484, 'learning_rate': 2.1109624498427794e-06, 'epoch': 3.67}
{'loss': 0.8784, 'learning_rate': 2.0875243379893883e-06, 'epoch': 3.68}
{'loss': 0.8592, 'learning_rate': 2.0642156982798144e-06, 'epoch': 3.68}
{'loss': 0.8724, 'learning_rate': 2.0410365615356365e-06, 'epoch': 3.68}
{'loss': 0.8775, 'learning_rate': 2.0179869584072254e-06, 'epoch': 3.68}
{'loss': 0.8888, 'learning_rate': 1.995066919373645e-06, 'epoch': 3.68}
{'loss': 0.8184, 'learning_rate': 1.9722764747426515e-06, 'epoch': 3.68}
{'loss': 0.8942, 'learning_rate': 1.9496156546506274e-06, 'epoch': 3.68}
94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████ | 2580/2752 [44:21<02:54,  1.01s/it]
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,589] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,589] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,589] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,843] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,843] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,844] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,845] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,108] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,108] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,373] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,374] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,624] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,624] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,910] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,910] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,180] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,180] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,432] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,433] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,696] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,697] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,959] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,959] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,228] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,229] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,494] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,495] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,758] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,759] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0016659498214722, 'eval_runtime': 3.183, 'eval_samples_per_second': 343.073, 'eval_steps_per_second': 21.678, 'epoch': 3.68}
{'loss': 0.8634, 'learning_rate': 1.927084489062547e-06, 'epoch': 3.69}
{'loss': 0.8513, 'learning_rate': 1.9046830077719236e-06, 'epoch': 3.69}
{'loss': 0.876, 'learning_rate': 1.8824112404008275e-06, 'epoch': 3.69}
{'loss': 0.8708, 'learning_rate': 1.8602692163997681e-06, 'epoch': 3.69}
{'loss': 0.9077, 'learning_rate': 1.8382569650477133e-06, 'epoch': 3.69}
{'loss': 0.9603, 'learning_rate': 1.8163745154520129e-06, 'epoch': 3.69}
{'loss': 0.8131, 'learning_rate': 1.7946218965483763e-06, 'epoch': 3.69}
{'loss': 0.82, 'learning_rate': 1.7729991371008502e-06, 'epoch': 3.7}
{'loss': 0.8551, 'learning_rate': 1.7515062657017632e-06, 'epoch': 3.7}
{'loss': 0.9039, 'learning_rate': 1.7301433107716592e-06, 'epoch': 3.7}
{'loss': 0.8059, 'learning_rate': 1.708910300559341e-06, 'epoch': 3.7}
{'loss': 0.8944, 'learning_rate': 1.6878072631417386e-06, 'epoch': 3.7}
{'loss': 0.8878, 'learning_rate': 1.6668342264239522e-06, 'epoch': 3.7}
{'loss': 0.9237, 'learning_rate': 1.6459912181391312e-06, 'epoch': 3.7}
{'loss': 0.8837, 'learning_rate': 1.6252782658485178e-06, 'epoch': 3.71}
{'loss': 0.8663, 'learning_rate': 1.6046953969413915e-06, 'epoch': 3.71}
{'loss': 0.8583, 'learning_rate': 1.5842426386349917e-06, 'epoch': 3.71}
{'loss': 0.8618, 'learning_rate': 1.5639200179745184e-06, 'epoch': 3.71}
{'loss': 0.8405, 'learning_rate': 1.543727561833086e-06, 'epoch': 3.71}
{'loss': 0.9588, 'learning_rate': 1.5236652969116804e-06, 'epoch': 3.71}
{'loss': 0.8364, 'learning_rate': 1.5037332497391588e-06, 'epoch': 3.72}
{'loss': 0.9264, 'learning_rate': 1.4839314466721599e-06, 'epoch': 3.72}
{'loss': 0.8804, 'learning_rate': 1.4642599138951163e-06, 'epoch': 3.72}
{'loss': 0.9264, 'learning_rate': 1.444718677420176e-06, 'epoch': 3.72}
{'loss': 0.8977, 'learning_rate': 1.4253077630872357e-06, 'epoch': 3.72}
{'loss': 0.8575, 'learning_rate': 1.4060271965638194e-06, 'epoch': 3.72}
{'loss': 0.838, 'learning_rate': 1.3868770033451328e-06, 'epoch': 3.72}
{'loss': 0.812, 'learning_rate': 1.367857208753931e-06, 'epoch': 3.73}
{'loss': 0.8061, 'learning_rate': 1.3489678379405956e-06, 'epoch': 3.73}
{'loss': 0.9596, 'learning_rate': 1.3302089158829912e-06, 'epoch': 3.73}
{'loss': 0.8883, 'learning_rate': 1.3115804673865306e-06, 'epoch': 3.73}
{'loss': 0.8919, 'learning_rate': 1.2930825170840877e-06, 'epoch': 3.73}
{'loss': 0.7877, 'learning_rate': 1.2747150894359738e-06, 'epoch': 3.73}
{'loss': 0.8302, 'learning_rate': 1.256478208729883e-06, 'epoch': 3.73}
{'loss': 0.7751, 'learning_rate': 1.2383718990809146e-06, 'epoch': 3.74}
{'loss': 0.934, 'learning_rate': 1.2203961844315048e-06, 'epoch': 3.74}
{'loss': 0.8424, 'learning_rate': 1.2025510885513847e-06, 'epoch': 3.74}
{'loss': 0.8013, 'learning_rate': 1.1848366350375895e-06, 'epoch': 3.74}
{'loss': 0.9296, 'learning_rate': 1.1672528473143818e-06, 'epoch': 3.74}
{'loss': 0.9107, 'learning_rate': 1.1497997486332513e-06, 'epoch': 3.74}
{'loss': 0.8821, 'learning_rate': 1.1324773620728702e-06, 'epoch': 3.74}
{'loss': 0.7834, 'learning_rate': 1.1152857105390602e-06, 'epoch': 3.75}
{'loss': 0.8085, 'learning_rate': 1.0982248167647923e-06, 'epoch': 3.75}
{'loss': 0.8174, 'learning_rate': 1.0812947033101207e-06, 'epoch': 3.75}
{'loss': 0.9087, 'learning_rate': 1.0644953925621482e-06, 'epoch': 3.75}
{'loss': 0.8706, 'learning_rate': 1.0478269067350166e-06, 'epoch': 3.75}
{'loss': 0.8524, 'learning_rate': 1.0312892678699281e-06, 'epoch': 3.75}
{'loss': 0.8352, 'learning_rate': 1.0148824978349792e-06, 'epoch': 3.75}
{'loss': 0.8716, 'learning_rate': 9.986066183252818e-07, 'epoch': 3.76}
{'loss': 0.8081, 'learning_rate': 9.824616508628315e-07, 'epoch': 3.76}
{'loss': 0.9079, 'learning_rate': 9.664476167965397e-07, 'epoch': 3.76}
{'loss': 0.9137, 'learning_rate': 9.505645373021455e-07, 'epoch': 3.76}
{'loss': 0.8851, 'learning_rate': 9.348124333822706e-07, 'epoch': 3.76}
{'loss': 0.9555, 'learning_rate': 9.191913258663199e-07, 'epoch': 3.76}
{'loss': 0.8582, 'learning_rate': 9.03701235410459e-07, 'epoch': 3.76}
{'loss': 0.989, 'learning_rate': 8.883421824976479e-07, 'epoch': 3.77}
{'loss': 0.864, 'learning_rate': 8.731141874375403e-07, 'epoch': 3.77}
{'loss': 0.9209, 'learning_rate': 8.580172703664846e-07, 'epoch': 3.77}
{'loss': 0.7957, 'learning_rate': 8.430514512475452e-07, 'epoch': 3.77}
{'loss': 0.9754, 'learning_rate': 8.282167498703918e-07, 'epoch': 3.77}
{'loss': 0.8648, 'learning_rate': 8.135131858513223e-07, 'epoch': 3.77}
{'loss': 0.9934, 'learning_rate': 7.989407786332393e-07, 'epoch': 3.77}
{'loss': 0.9181, 'learning_rate': 7.844995474855843e-07, 'epoch': 3.78}
{'loss': 0.7916, 'learning_rate': 7.701895115043822e-07, 'epoch': 3.78}
{'loss': 0.8838, 'learning_rate': 7.560106896121522e-07, 'epoch': 3.78}
{'loss': 0.9393, 'learning_rate': 7.419631005579075e-07, 'epoch': 3.78}
{'loss': 0.9027, 'learning_rate': 7.280467629171339e-07, 'epoch': 3.78}
{'loss': 0.8829, 'learning_rate': 7.142616950917446e-07, 'epoch': 3.78}
{'loss': 0.8453, 'learning_rate': 7.00607915310103e-07, 'epoch': 3.78}
{'loss': 0.8823, 'learning_rate': 6.870854416269334e-07, 'epoch': 3.79}
{'loss': 0.853, 'learning_rate': 6.736942919233436e-07, 'epoch': 3.79}
{'loss': 0.9244, 'learning_rate': 6.604344839068021e-07, 'epoch': 3.79}
{'loss': 0.9075, 'learning_rate': 6.473060351110727e-07, 'epoch': 3.79}
{'loss': 0.8477, 'learning_rate': 6.343089628962462e-07, 'epoch': 3.79}
{'loss': 0.84, 'learning_rate': 6.214432844486861e-07, 'epoch': 3.79}
{'loss': 0.9991, 'learning_rate': 6.087090167809839e-07, 'epoch': 3.8}
{'loss': 0.9178, 'learning_rate': 5.961061767320142e-07, 'epoch': 3.8}
{'loss': 0.903, 'learning_rate': 5.836347809668019e-07, 'epoch': 3.8}
{'loss': 0.9061, 'learning_rate': 5.712948459765887e-07, 'epoch': 3.8}
{'loss': 0.9127, 'learning_rate': 5.590863880788111e-07, 'epoch': 3.8}
{'loss': 0.9721, 'learning_rate': 5.470094234169998e-07, 'epoch': 3.8}
{'loss': 0.8808, 'learning_rate': 5.35063967960836e-07, 'epoch': 3.8}
{'loss': 0.8217, 'learning_rate': 5.232500375060956e-07, 'epoch': 3.81}
{'loss': 0.9174, 'learning_rate': 5.115676476746489e-07, 'epoch': 3.81}
{'loss': 0.8476, 'learning_rate': 5.000168139143946e-07, 'epoch': 3.81}
{'loss': 0.9031, 'learning_rate': 4.885975514993147e-07, 'epoch': 3.81}
{'loss': 0.854, 'learning_rate': 4.773098755293747e-07, 'epoch': 3.81}
{'loss': 0.8018, 'learning_rate': 4.661538009305577e-07, 'epoch': 3.81}
{'loss': 0.9106, 'learning_rate': 4.5512934245481865e-07, 'epoch': 3.81}
{'loss': 0.9109, 'learning_rate': 4.442365146800853e-07, 'epoch': 3.82}
{'loss': 0.8609, 'learning_rate': 4.3347533201022474e-07, 'epoch': 3.82}
{'loss': 0.8284, 'learning_rate': 4.2284580867500976e-07, 'epoch': 3.82}
{'loss': 0.8601, 'learning_rate': 4.1234795873013045e-07, 'epoch': 3.82}
{'loss': 0.9274, 'learning_rate': 4.0198179605716033e-07, 'epoch': 3.82}
{'loss': 0.8526, 'learning_rate': 3.9174733436353475e-07, 'epoch': 3.82}
{'loss': 0.9332, 'learning_rate': 3.8164458718255025e-07, 'epoch': 3.82}
{'loss': 0.8634, 'learning_rate': 3.7167356787332073e-07, 'epoch': 3.83}
{'loss': 0.821, 'learning_rate': 3.6183428962077716e-07, 'epoch': 3.83}
{'loss': 0.7866, 'learning_rate': 3.5212676543563416e-07, 'epoch': 3.83}
{'loss': 0.8669, 'learning_rate': 3.4255100815442365e-07, 'epoch': 3.83}
{'loss': 0.9061, 'learning_rate': 3.3310703043938354e-07, 'epoch': 3.83}
{'loss': 0.9099, 'learning_rate': 3.237948447785466e-07, 'epoch': 3.83}
{'loss': 0.9021, 'learning_rate': 3.146144634856407e-07, 'epoch': 3.83}
{'loss': 0.874, 'learning_rate': 3.0556589870012196e-07, 'epoch': 3.84}
{'loss': 0.901, 'learning_rate': 2.966491623871193e-07, 'epoch': 3.84}
{'loss': 0.8373, 'learning_rate': 2.8786426633747863e-07, 'epoch': 3.84}
{'loss': 0.8567, 'learning_rate': 2.792112221676857e-07, 'epoch': 3.84}
{'loss': 0.9109, 'learning_rate': 2.7069004131987653e-07, 'epoch': 3.84}
{'loss': 0.9547, 'learning_rate': 2.623007350618267e-07, 'epoch': 3.84}
{'loss': 0.8617, 'learning_rate': 2.540433144869292e-07, 'epoch': 3.84}
{'loss': 0.9077, 'learning_rate': 2.4591779051416075e-07, 'epoch': 3.85}
{'loss': 0.9076, 'learning_rate': 2.379241738881377e-07, 'epoch': 3.85}
{'loss': 0.8944, 'learning_rate': 2.300624751790048e-07, 'epoch': 3.85}
{'loss': 0.9022, 'learning_rate': 2.223327047824908e-07, 'epoch': 3.85}
{'loss': 0.8671, 'learning_rate': 2.1473487291986395e-07, 'epoch': 3.85}
{'loss': 1.0033, 'learning_rate': 2.0726898963793205e-07, 'epoch': 3.85}
{'loss': 0.9429, 'learning_rate': 1.9993506480900926e-07, 'epoch': 3.85}
{'loss': 0.9008, 'learning_rate': 1.9273310813093804e-07, 'epoch': 3.86}
{'loss': 0.9401, 'learning_rate': 1.8566312912706718e-07, 'epoch': 3.86}
{'loss': 0.8867, 'learning_rate': 1.78725137146174e-07, 'epoch': 3.86}
{'loss': 0.8705, 'learning_rate': 1.7191914136256427e-07, 'epoch': 3.86}
{'loss': 0.9031, 'learning_rate': 1.6524515077597224e-07, 'epoch': 3.86}
{'loss': 0.9782, 'learning_rate': 1.58703174211583e-07, 'epoch': 3.86}
{'loss': 0.8514, 'learning_rate': 1.5229322032002115e-07, 'epoch': 3.86}
{'loss': 0.8936, 'learning_rate': 1.4601529757732878e-07, 'epoch': 3.87}
{'loss': 0.8864, 'learning_rate': 1.3986941428496548e-07, 'epoch': 3.87}
{'loss': 0.9202, 'learning_rate': 1.3385557856977483e-07, 'epoch': 3.87}
{'loss': 0.9492, 'learning_rate': 1.27973798384029e-07, 'epoch': 3.87}
{'loss': 0.8922, 'learning_rate': 1.2222408150532882e-07, 'epoch': 3.87}
{'loss': 0.8704, 'learning_rate': 1.1660643553668138e-07, 'epoch': 3.87}
{'loss': 0.7991, 'learning_rate': 1.111208679064446e-07, 'epoch': 3.88}
{'loss': 0.9127, 'learning_rate': 1.0576738586831614e-07, 'epoch': 3.88}
{'loss': 0.9113, 'learning_rate': 1.0054599650135555e-07, 'epoch': 3.88}
{'loss': 0.891, 'learning_rate': 9.545670670991769e-08, 'epoch': 3.88}
{'loss': 0.8205, 'learning_rate': 9.049952322370824e-08, 'epoch': 3.88}
{'loss': 0.8465, 'learning_rate': 8.567445259775042e-08, 'epoch': 3.88}
{'loss': 0.91, 'learning_rate': 8.09815012123294e-08, 'epoch': 3.88}
{'loss': 0.8982, 'learning_rate': 7.642067527308116e-08, 'epoch': 3.89}
{'loss': 0.9924, 'learning_rate': 7.199198081087044e-08, 'epoch': 3.89}
{'loss': 0.8688, 'learning_rate': 6.769542368190162e-08, 'epoch': 3.89}
{'loss': 0.8771, 'learning_rate': 6.353100956761893e-08, 'epoch': 3.89}
{'loss': 0.8629, 'learning_rate': 5.949874397470634e-08, 'epoch': 3.89}
{'loss': 0.8763, 'learning_rate': 5.559863223515427e-08, 'epoch': 3.89}
{'loss': 0.8457, 'learning_rate': 5.183067950617071e-08, 'epoch': 3.89}
{'loss': 0.9548, 'learning_rate': 4.819489077021455e-08, 'epoch': 3.9}
{'loss': 0.8712, 'learning_rate': 4.469127083498448e-08, 'epoch': 3.9}
{'loss': 0.872, 'learning_rate': 4.1319824333407864e-08, 'epoch': 3.9}
{'loss': 0.8723, 'learning_rate': 3.808055572362967e-08, 'epoch': 3.9}
{'loss': 0.9563, 'learning_rate': 3.4973469289012465e-08, 'epoch': 3.9}
{'loss': 0.8961, 'learning_rate': 3.199856913813637e-08, 'epoch': 3.9}
{'loss': 0.9116, 'learning_rate': 2.915585920479913e-08, 'epoch': 3.9}
{'loss': 0.8716, 'learning_rate': 2.6445343247982755e-08, 'epoch': 3.91}
{'loss': 0.8707, 'learning_rate': 2.3867024851853546e-08, 'epoch': 3.91}
{'loss': 0.8533, 'learning_rate': 2.142090742580649e-08, 'epoch': 3.91}
{'loss': 0.8833, 'learning_rate': 1.9106994204409755e-08, 'epoch': 3.91}
{'loss': 0.9717, 'learning_rate': 1.6925288247393588e-08, 'epoch': 3.91}
{'loss': 0.823, 'learning_rate': 1.4875792439683623e-08, 'epoch': 3.91}
{'loss': 0.8259, 'learning_rate': 1.2958509491389769e-08, 'epoch': 3.91}
{'loss': 0.8943, 'learning_rate': 1.1173441937772922e-08, 'epoch': 3.92}
{'loss': 0.8425, 'learning_rate': 9.52059213927825e-09, 'epoch': 3.92}
{'loss': 0.8533, 'learning_rate': 7.999962281513006e-09, 'epoch': 3.92}
{'loss': 0.8883, 'learning_rate': 6.61155437524652e-09, 'epoch': 3.92}
{'loss': 0.8401, 'learning_rate': 5.355370256410197e-09, 'epoch': 3.92}
{'loss': 0.8426, 'learning_rate': 4.231411586064216e-09, 'epoch': 3.92}
{'loss': 0.9038, 'learning_rate': 3.2396798504752414e-09, 'epoch': 3.92}
{'loss': 0.8293, 'learning_rate': 2.3801763610165064e-09, 'epoch': 3.93}
{'loss': 0.8884, 'learning_rate': 1.6529022542455252e-09, 'epoch': 3.93}
{'loss': 0.8839, 'learning_rate': 1.0578584918374823e-09, 'epoch': 3.93}
{'loss': 0.9561, 'learning_rate': 5.950458606518439e-10, 'epoch': 3.93}
{'loss': 0.9134, 'learning_rate': 2.6446497266574555e-10, 'epoch': 3.93}
{'loss': 0.8959, 'learning_rate': 6.611626501840107e-11, 'epoch': 3.93}
{'loss': 0.891, 'learning_rate': 0.0, 'epoch': 3.93}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2752/2752 [47:18<00:00, 1.01s/it][2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,101] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,354] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,355] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,355] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,356] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,619] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,619] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,889] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,889] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,134] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,135] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,419] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,420] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,687] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,688] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,940] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,200] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,201] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,462] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,463] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,732] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,733] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,994] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,995] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:47,258] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:47,259] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0020508766174316, 'eval_runtime': 3.171, 'eval_samples_per_second': 344.368, 'eval_steps_per_second': 21.76, 'epoch': 3.93}
{'train_runtime': 2842.4239, 'train_samples_per_second': 75.254, 'train_steps_per_second': 0.968, 'train_loss': 0.9769066730947342, 'epoch': 3.93}
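Quick sanity check on those numbers (my arithmetic, not from the log): 2752 steps / 2842.4 s ≈ 0.968 steps/s, which matches train_steps_per_second, and 75.25 samples/s ÷ 0.968 steps/s ≈ 78 packed samples per optimizer step summed over the 8 ranks, assuming the HF Trainer metrics mean what I think they mean.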
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2752/2752 [47:26<00:00, 1.03s/it]
[2023-12-29 02:50:51,874] [INFO] [axolotl.train.log:60] [PID:161] [RANK:0] Training Completed!!! Saving pre-trained model to ./lora-out
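With the adapter saved to ./lora-out, the natural next step would be a quick smoke test of the trained LoRA. A minimal sketch, assuming the axolotl.cli.inference entrypoint and --lora_model_dir flag behave as described in the axolotl README (not actually run here; path/to/config.yml stands for whatever config the run above used):

# hypothetical smoke test of the freshly saved adapter
accelerate launch -m axolotl.cli.inference path/to/config.yml --lora_model_dir="./lora-out"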
root@8510995a57b3:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/u^C
root@8510995a57b3:/workspace/axolotl# ls -lh
total 88K
-rw-r--r-- 1 root root 648 Dec 28 07:07 FAQS.md
-rw-r--r-- 1 root root 12K Dec 28 07:07 LICENSE
-rw-r--r-- 1 root root 40K Dec 28 07:07 README.md
-rw-r--r-- 1 root root 262 Dec 28 07:07 TODO.md
drwxr-xr-x 2 root root 103 Dec 29 01:58 deepspeed
drwxr-xr-x 2 root root 88 Dec 29 01:58 docker
-rw-r--r-- 1 root root 701 Dec 28 07:07 docker-compose.yaml
drwxr-xr-x 2 root root 96 Dec 29 01:58 docs
drwxr-xr-x 20 root root 4.0K Dec 29 01:58 examples
drwxr-xr-x 2 root root 95 Dec 29 01:58 image
drwxr-xr-x 3 root root 54 Dec 29 02:02 last_run_prepared
drwxr-xr-x 7 root root 332 Dec 29 02:50 lora-out
-rw-r--r-- 1 root root 22 Dec 28 07:07 requirements-dev.txt
-rw-r--r-- 1 root root 7 Dec 28 07:07 requirements-tests.txt
-rw-r--r-- 1 root root 552 Dec 28 07:07 requirements.txt
drwxr-xr-x 2 root root 65 Dec 29 01:58 scripts
-rw-r--r-- 1 root root 1.8K Dec 28 07:07 setup.py
drwxr-xr-x 4 root root 57 Dec 29 01:58 src
drwxr-xr-x 5 root root 4.0K Dec 29 01:58 tests
root@8510995a57b3:/workspace/axolotl# ls ~/..cacache
ls: cannot access '/root/..cacache': No such file or directory
root@8510995a57b3:/workspace/axolotl# ls ~/.cache
conda huggingface matplotlib pip
root@8510995a57b3:/workspace/axolotl# ls ~/.cache/huggingface/
datasets hub
root@8510995a57b3:/workspace/axolotl# ls ~/.cache/huggingface/datasets
_root_.cache_huggingface_datasets_teknium___gpt4-llm-cleaned_default_0.0.0_b4e7d42750cbc1d81f9b85b98b13b48c88092adb.lock teknium___gpt4-llm-cleaned
downloads
root@8510995a57b3:/workspace/axolotl# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 20G 6.9G 14G 35% /
tmpfs 64M 0 64M 0% /dev
tmpfs 252G 0 252G 0% /sys/fs/cgroup
shm 251G 0 251G 0% /dev/shm
/dev/mapper/vg0-root 16G 5.4G 9.4G 37% /usr/bin/nvidia-smi
/dev/mapper/vg0-docker 100G 485M 100G 1% /workspace
tmpfs 252G 12K 252G 1% /proc/driver/nvidia
tmpfs 252G 4.0K 252G 1% /etc/nvidia/nvidia-application-profiles-rc.d
tmpfs 51G 25M 51G 1% /run/nvidia-persistenced/socket
tmpfs 252G 0 252G 0% /proc/asound
tmpfs 252G 0 252G 0% /proc/acpi
tmpfs 252G 0 252G 0% /proc/scsi
tmpfs 252G 0 252G 0% /sys/firmware
root@8510995a57b3:/workspace/axolotl# ls ~/.cache/
conda huggingface matplotlib pip
root@8510995a57b3:/workspace/axolotl# cp -r !$/huggingface /workspace/
cp -r ~/.cache//huggingface /workspace/
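Copying the cache to /workspace makes sense given the df output above: the root overlay is only 20G while /workspace has 100G. A cleaner variant (my assumption, not what was done in this session) is to point the Hugging Face cache at the big volume before launching, so model and dataset downloads land there directly:

# hypothetical: keep Hugging Face downloads on the large /workspace volume instead of the 20G overlay
export HF_HOME=/workspace/huggingface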
root@8510995a57b3:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/mistral/config.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `8`
More than one GPU was found, enabling multi-GPU training.
If this was unintended please pass in `--num_processes=1`.
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
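The defaults warning goes away if the listed flags are passed explicitly (or after running `accelerate config` once). Something like the following, mirroring the values the defaults resolved to:

# hypothetical explicit invocation; values mirror the defaults reported above
accelerate launch --num_processes=8 --num_machines=1 --mixed_precision=no --dynamo_backend=no \
  -m axolotl.cli.train examples/mistral/config.yml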
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:
================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================
warn((f'\\n\\n{"="*80}\\n'
[the identical bitsandbytes BNB_CUDA_VERSION warning block is printed 7 more times, once by each of the remaining accelerate processes]
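This warning is expected here since BNB_CUDA_VERSION=118 is presumably set on purpose in this image; if it ever shows up unintentionally, the fix is exactly what the warning says:

# clear the manual override so bitsandbytes picks the CUDA version matching PyTorch
export BNB_CUDA_VERSION=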
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 03:05:46,462] [INFO] [datasets.<module>:58] [PID:5112] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 03:05:46,491] [INFO] [datasets.<module>:58] [PID:5115] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 03:05:46,506] [INFO] [datasets.<module>:58] [PID:5110] PyTorch version 2.0.1+cu118 available.
[2023-12-29 03:05:46,528] [INFO] [datasets.<module>:58] [PID:5117] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 03:05:46,560] [INFO] [datasets.<module>:58] [PID:5114] PyTorch version 2.0.1+cu118 available.
[2023-12-29 03:05:46,566] [INFO] [datasets.<module>:58] [PID:5113] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 03:05:46,725] [INFO] [datasets.<module>:58] [PID:5116] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2023-12-29 03:05:46,815] [INFO] [datasets.<module>:58] [PID:5111] PyTorch version 2.0.1+cu118 available.
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 571/571 [00:00<00:00, 141kB/s]
[2023-12-29 03:05:47,838] [INFO] [axolotl.normalize_config:150] [PID:5115] [RANK:5] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,853] [INFO] [axolotl.normalize_config:150] [PID:5110] [RANK:0] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,860] [INFO] [axolotl.normalize_config:150] [PID:5112] [RANK:2] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,874] [INFO] [axolotl.normalize_config:150] [PID:5117] [RANK:7] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,874] [INFO] [axolotl.normalize_config:150] [PID:5113] [RANK:3] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,877] [INFO] [axolotl.normalize_config:150] [PID:5114] [RANK:4] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:48,239] [INFO] [axolotl.normalize_config:150] [PID:5116] [RANK:6] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:48,274] [INFO] [axolotl.normalize_config:150] [PID:5111] [RANK:1] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:48,277] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5111] [RANK:1] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,280] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5114] [RANK:4] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,280] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5115] [RANK:5] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
dP dP dP
88 88 88
.d8888b. dP. .dP .d8888b. 88 .d8888b. d8888P 88
88' `88 `8bd8' 88' `88 88 88' `88 88 88
88. .88 .d88b. 88. .88 88 88. .88 88 88
`88888P8 dP' `dP `88888P' dP `88888P' dP dP
[2023-12-29 03:05:48,285] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5112] [RANK:2] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,285] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5116] [RANK:6] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,285] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5110] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,286] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5113] [RANK:3] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,287] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5117] [RANK:7] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
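The token warning is harmless for the public model and dataset used here; it disappears after authenticating once on the node, exactly as the message suggests:

# only needed for gated models/datasets; token from https://huggingface.co/settings/tokens
huggingface-cli login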
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 967/967 [00:00<00:00, 172kB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 51.8MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 72.0/72.0 [00:00<00:00, 91.1kB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 14.8MB/s]
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:184] [PID:5114] [RANK:4] EOS: 2 / </s>
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:185] [PID:5114] [RANK:4] BOS: 1 / <s>
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:186] [PID:5114] [RANK:4] PAD: 2 / </s>
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:187] [PID:5114] [RANK:4] UNK: 0 / <unk>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:184] [PID:5113] [RANK:3] EOS: 2 / </s>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:185] [PID:5113] [RANK:3] BOS: 1 / <s>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:186] [PID:5113] [RANK:3] PAD: 2 / </s>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:187] [PID:5113] [RANK:3] UNK: 0 / <unk>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:184] [PID:5115] [RANK:5] EOS: 2 / </s>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:185] [PID:5115] [RANK:5] BOS: 1 / <s>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:186] [PID:5115] [RANK:5] PAD: 2 / </s>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:187] [PID:5115] [RANK:5] UNK: 0 / <unk>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:184] [PID:5110] [RANK:0] EOS: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:185] [PID:5110] [RANK:0] BOS: 1 / <s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:186] [PID:5110] [RANK:0] PAD: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:184] [PID:5112] [RANK:2] EOS: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:185] [PID:5112] [RANK:2] BOS: 1 / <s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:187] [PID:5110] [RANK:0] UNK: 0 / <unk>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:186] [PID:5112] [RANK:2] PAD: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:187] [PID:5112] [RANK:2] UNK: 0 / <unk>
[2023-12-29 03:05:49,807] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5110] [RANK:0] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:49,807] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5110] [RANK:0] Loading raw datasets...
[2023-12-29 03:05:49,807] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5110] [RANK:0] No seed provided, using default seed of 42
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:184] [PID:5117] [RANK:7] EOS: 2 / </s>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:185] [PID:5117] [RANK:7] BOS: 1 / <s>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:186] [PID:5117] [RANK:7] PAD: 2 / </s>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:187] [PID:5117] [RANK:7] UNK: 0 / <unk>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:184] [PID:5116] [RANK:6] EOS: 2 / </s>
[2023-12-29 03:05:49,813] [DEBUG] [axolotl.load_tokenizer:185] [PID:5116] [RANK:6] BOS: 1 / <s>
[2023-12-29 03:05:49,813] [DEBUG] [axolotl.load_tokenizer:186] [PID:5116] [RANK:6] PAD: 2 / </s>
[2023-12-29 03:05:49,813] [DEBUG] [axolotl.load_tokenizer:187] [PID:5116] [RANK:6] UNK: 0 / <unk>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:184] [PID:5111] [RANK:1] EOS: 2 / </s>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:185] [PID:5111] [RANK:1] BOS: 1 / <s>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:186] [PID:5111] [RANK:1] PAD: 2 / </s>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:187] [PID:5111] [RANK:1] UNK: 0 / <unk>
Downloading readme: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 173kB/s]
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1.76M/1.76M [00:01<00:00, 1.14MB/s]
Generating train split: 2000 examples [00:00, 99498.37 examples/s]
Map (num_proc=64): 100%|████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 3086.84 examples/s]
[2023-12-29 03:05:58,120] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5110] [RANK:0] merging datasets
[2023-12-29 03:05:58,126] [INFO] [axolotl.load_tokenized_prepared_datasets:369] [PID:5110] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
Saving the dataset (1/1 shards): 100%|████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 109410.44 examples/s]
[2023-12-29 03:05:59,853] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5114] [RANK:4] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,853] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5115] [RANK:5] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5114] [RANK:4] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5115] [RANK:5] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5114] [RANK:4] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5115] [RANK:5] No seed provided, using default seed of 42
[2023-12-29 03:05:59,853] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5112] [RANK:2] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5112] [RANK:2] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5112] [RANK:2] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5113] [RANK:3] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5113] [RANK:3] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5113] [RANK:3] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5111] [RANK:1] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5117] [RANK:7] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5111] [RANK:1] Loading raw datasets...
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5117] [RANK:7] Loading raw datasets...
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5111] [RANK:1] No seed provided, using default seed of 42
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5117] [RANK:7] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5116] [RANK:6] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5116] [RANK:6] Loading raw datasets...
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5116] [RANK:6] No seed provided, using default seed of 42
Filter (num_proc=96): 46%|█████████████████████████████████████▉ | 880/1900 [00:00<00:00, 3558.07 examples/s][2023-12-29 03:06:03,178] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5114] [RANK:4] merging datasets
[2023-12-29 03:06:03,222] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5117] [RANK:7] merging datasets
[2023-12-29 03:06:03,229] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5116] [RANK:6] merging datasets
[2023-12-29 03:06:03,235] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5115] [RANK:5] merging datasets
[2023-12-29 03:06:03,271] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5111] [RANK:1] merging datasets
Filter (num_proc=96): 100%|█████████████████████████████████████████████████████████████████████████████████| 1900/1900 [00:00<00:00, 3799.02 examples/s]
[2023-12-29 03:06:03,538] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5112] [RANK:2] merging datasets
[2023-12-29 03:06:03,746] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5113] [RANK:3] merging datasets
Filter (num_proc=96): 100%|████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 209.73 examples/s]
Map (num_proc=96): 100%|████████████████████████████████████████████████████████████████████████████████████| 1900/1900 [00:01<00:00, 1753.75 examples/s]
[2023-12-29 03:06:18,321] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] total_num_tokens: 405259
[2023-12-29 03:06:18,336] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] `total_supervised_tokens: 282059`
[2023-12-29 03:06:24,148] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5110] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,149] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] data_loader_len: 23
[2023-12-29 03:06:24,621] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5112] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,634] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5114] [RANK:4] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,656] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5111] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,749] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5113] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,773] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5115] [RANK:5] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,792] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5117] [RANK:7] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,995] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5116] [RANK:6] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:25,057] [INFO] [axolotl.log:60] [PID:5110] [RANK:0] sample_packing_eff_est across ranks: [0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9513479471206665, 0.9894018769264221]
[2023-12-29 03:06:25,061] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] sample_packing_eff_est: 0.99
[2023-12-29 03:06:25,062] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] total_num_steps: 11
[2023-12-29 03:06:25,075] [DEBUG] [axolotl.train.log:60] [PID:5110] [RANK:0] loading tokenizer... mistralai/Mistral-7B-v0.1
[2023-12-29 03:06:25,398] [DEBUG] [axolotl.load_tokenizer:184] [PID:5114] [RANK:4] EOS: 2 / </s>
[2023-12-29 03:06:25,398] [DEBUG] [axolotl.load_tokenizer:185] [PID:5114] [RANK:4] BOS: 1 / <s>
[2023-12-29 03:06:25,399] [DEBUG] [axolotl.load_tokenizer:186] [PID:5114] [RANK:4] PAD: 2 / </s>
[2023-12-29 03:06:25,399] [DEBUG] [axolotl.load_tokenizer:187] [PID:5114] [RANK:4] UNK: 0 / <unk>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:184] [PID:5115] [RANK:5] EOS: 2 / </s>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:185] [PID:5115] [RANK:5] BOS: 1 / <s>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:186] [PID:5115] [RANK:5] PAD: 2 / </s>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:187] [PID:5115] [RANK:5] UNK: 0 / <unk>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:184] [PID:5117] [RANK:7] EOS: 2 / </s>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:185] [PID:5117] [RANK:7] BOS: 1 / <s>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:186] [PID:5117] [RANK:7] PAD: 2 / </s>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:187] [PID:5117] [RANK:7] UNK: 0 / <unk>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:184] [PID:5112] [RANK:2] EOS: 2 / </s>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:185] [PID:5112] [RANK:2] BOS: 1 / <s>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:186] [PID:5112] [RANK:2] PAD: 2 / </s>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:187] [PID:5112] [RANK:2] UNK: 0 / <unk>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:184] [PID:5116] [RANK:6] EOS: 2 / </s>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:185] [PID:5116] [RANK:6] BOS: 1 / <s>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:186] [PID:5116] [RANK:6] PAD: 2 / </s>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:187] [PID:5116] [RANK:6] UNK: 0 / <unk>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:184] [PID:5113] [RANK:3] EOS: 2 / </s>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:185] [PID:5113] [RANK:3] BOS: 1 / <s>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:186] [PID:5113] [RANK:3] PAD: 2 / </s>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:187] [PID:5113] [RANK:3] UNK: 0 / <unk>
[2023-12-29 03:06:25,425] [DEBUG] [axolotl.load_tokenizer:184] [PID:5111] [RANK:1] EOS: 2 / </s>
[2023-12-29 03:06:25,426] [DEBUG] [axolotl.load_tokenizer:185] [PID:5111] [RANK:1] BOS: 1 / <s>
[2023-12-29 03:06:25,426] [DEBUG] [axolotl.load_tokenizer:186] [PID:5111] [RANK:1] PAD: 2 / </s>
[2023-12-29 03:06:25,426] [DEBUG] [axolotl.load_tokenizer:187] [PID:5111] [RANK:1] UNK: 0 / <unk>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:184] [PID:5110] [RANK:0] EOS: 2 / </s>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:185] [PID:5110] [RANK:0] BOS: 1 / <s>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:186] [PID:5110] [RANK:0] PAD: 2 / </s>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:187] [PID:5110] [RANK:0] UNK: 0 / <unk>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.train.log:60] [PID:5110] [RANK:0] loading model
[2023-12-29 03:06:25,545] [INFO] [axolotl.load_model:256] [PID:5114] [RANK:4] patching with flash attention
[2023-12-29 03:06:25,545] [INFO] [axolotl.load_model:256] [PID:5115] [RANK:5] patching with flash attention
[2023-12-29 03:06:25,548] [INFO] [axolotl.load_model:256] [PID:5112] [RANK:2] patching with flash attention
[2023-12-29 03:06:25,556] [INFO] [axolotl.load_model:256] [PID:5116] [RANK:6] patching with flash attention
[2023-12-29 03:06:25,560] [INFO] [axolotl.load_model:256] [PID:5113] [RANK:3] patching with flash attention
[2023-12-29 03:06:25,569] [INFO] [axolotl.load_model:256] [PID:5111] [RANK:1] patching with flash attention
[2023-12-29 03:06:25,577] [INFO] [axolotl.load_model:256] [PID:5110] [RANK:0] patching with flash attention
[2023-12-29 03:06:25,767] [INFO] [axolotl.load_model:256] [PID:5117] [RANK:7] patching with flash attention
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 4.31MB/s]
Downloading shards: 0%| | 0/2 [00:00<?, ?it/s]
model-00001-of-00002.safetensors: 1%|▌ | 73.4M/9.94G [00:01<03:01, 54.2MB/s]