This is mostly about building things using LLMs, and about fine-tuning.
- A recipe to fine-tune DeepSeekMath-Base 7B to act as a "reasoning agent" that can solve mathematical problems via a mix of natural language reasoning and the use of the Python REPL to compute intermediate results.
- A novel decoding algorithm for tool-integrated reasoning (TIR) with code execution feedback to generate solution candidates during inference (see the sketch after this list).
- A variety of internal validation sets that we used to guide model selection and avoid overfitting to the public leaderboard.
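A rough sketch of what that TIR decoding loop could look like (here `generate` and `exec_python` are hypothetical stand-ins for the model call and a sandboxed interpreter, not the actual competition code):

```python
# Minimal sketch of tool-integrated reasoning (TIR) decoding: alternate between
# model generation and Python execution feedback until a final answer appears.
import re

MAX_ROUNDS = 4  # cap on code-execution rounds per problem (arbitrary choice)

def tir_solve(problem: str, generate, exec_python) -> str:
    """generate(prompt, stop=...) -> text; exec_python(code) -> stdout. Both hypothetical."""
    prompt = f"Problem: {problem}\nSolution:"
    for _ in range(MAX_ROUNDS):
        # Let the model continue the solution; it may emit a ```python block.
        completion = generate(prompt, stop=["```output"])
        prompt += completion
        code_blocks = re.findall(r"```python\n(.*?)```", completion, re.DOTALL)
        if not code_blocks:
            break  # no tool call: the model finished with a natural-language answer
        # Run the last code block and feed its output back to the model.
        result = exec_python(code_blocks[-1])
        prompt += f"```output\n{result}\n```\n"
    return prompt  # the full reasoning trace, ending in the candidate answer
```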
PyTorch .bin: most models uploaded to HF are in this format, which PyTorch dumps via pickle (sometimes called .pk, e.g. by llama2.c)
safetensors: HF's library/format, used instead of the normal (unsafe) pickle files for PyTorch tensors
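Quick illustration of the loading difference (a minimal sketch; filenames are placeholders):

```python
import torch
from safetensors.torch import load_file

# .bin checkpoints are pickle-based: loading one can execute arbitrary code,
# so only load files you trust.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# safetensors is a flat, pickle-free tensor container: loading it cannot
# run arbitrary code.
state_dict = load_file("model.safetensors", device="cpu")
```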
GGUF: llama.cpp's format. TheBloke's quantized outputs are in this format. llama.cpp ships a conversion script (see the note below).
It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models.
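Minimal sketch of consuming a GGUF file with the llama-cpp-python bindings (the model path and prompt template here are placeholders):

```python
from llama_cpp import Llama

# A single .gguf file carries weights, tokenizer and hyperparameters,
# so nothing else is needed to load the model.
llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

# The prompt template is supplied by hand (per the note further down,
# GGUF metadata does not cover it).
out = llm("[INST] Explain GGUF in one sentence. [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
```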
GPTQ is the equivalent quantized format for running models on GPU
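A hedged sketch of loading a GPTQ checkpoint on GPU via Transformers; this assumes the auto-gptq/optimum packages are installed, and the repo id is just a placeholder example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TheBloke/Llama-2-7B-Chat-GPTQ"  # placeholder GPTQ repo id

tokenizer = AutoTokenizer.from_pretrained(repo)
# The quantization config ships inside the repo, so from_pretrained picks it
# up and loads the 4-bit weights onto the GPU.
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```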
Quote
- Llama.cpp (by Georgi Gerganov)
- GGUF (new)
- GGML (old)
- Transformers (by Huggingface)
- bin (unquantized)
- safetensors (safer unquantized)
- safetensors (quantized using GPTQ algorithm via AutoGPTQ)
- AutoGPTQ (quantization library based on GPTQ algorithm, also available via Transformers)
- safetensors (quantized using GPTQ algorithm)
- koboldcpp (fork of Llama.cpp)
- bin (using GGML algorithm)
- ExLlama v2 (extremely optimized GPTQ backend for LLaMA models)
- safetensors (quantized using GPTQ algorithm)
- AWQ (low-bit quantization (INT3/4))
- safetensors (using AWQ algorithm)
- GGUF contains all the metadata it needs in the model file (no need for other files like tokenizer_config.json) except the prompt template
- llama.cpp has a script to convert *.safetensors model files into .gguf
- Transformers & Llama.cpp support CPU, GPU and MPS (Apple silicon) inference
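For that last point, a minimal device-picking sketch (preferring CUDA, then MPS, then plain CPU):

```python
import torch

# Prefer CUDA, then Apple-silicon MPS, then fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(f"running inference on {device}")
```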
Quote