This is mostly about building things using LLMs, and about fine-tuning.
PyTorch .bin: most models uploaded to HF are in this format, which PyTorch dumps as a pickle (sometimes called .pk, e.g. by llama2.c)
safetensors: HF library/format, used instead of the normal (unsafe) pickle files for PyTorch tensors
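The safety comes from the format itself: a safetensors file is a little-endian uint64 header length, a JSON table describing each tensor, then raw bytes, so loading never executes code the way unpickling can. A minimal stdlib-only sketch of parsing that header (the layout follows the published safetensors spec; the tensor name `w` and its contents are made up for illustration):

```python
import json
import struct

def read_safetensors_header(buf: bytes) -> dict:
    """Parse a safetensors header: uint64 LE length, then that many bytes of JSON."""
    (n,) = struct.unpack_from("<Q", buf, 0)
    return json.loads(buf[8 : 8 + n].decode("utf-8"))

# Build a tiny in-memory "file" with one float32 tensor named "w" (illustrative).
table = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
encoded = json.dumps(table).encode("utf-8")
blob = struct.pack("<Q", len(encoded)) + encoded + struct.pack("<2f", 1.0, 2.0)
print(read_safetensors_header(blob)["w"]["shape"])  # tensor shape recovered without unpickling
```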
GGUF: llama.cpp's format. TheBloke's quantized outputs are in this format. There is a conversion script.
It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGUF without breaking compatibility with older models.
GPTQ is an analogous quantized file format for models that run on GPU
Quote
- Llama.cpp (by Georgi Gerganov)
  - GGUF (new)
  - GGML (old)
- Transformers (by Huggingface)
  - bin (unquantized)
  - safetensors (safer unquantized)
  - safetensors (quantized using GPTQ algorithm via AutoGPTQ)
- AutoGPTQ (quantization library based on GPTQ algorithm, also available via Transformers)
  - safetensors (quantized using GPTQ algorithm)
- koboldcpp (fork of Llama.cpp)
  - bin (using GGML algorithm)
- ExLlama v2 (extremely optimized GPTQ backend for LLaMA models)
  - safetensors (quantized using GPTQ algorithm)
- AWQ (low-bit quantization (INT3/4))
  - safetensors (using AWQ algorithm)
- GGUF contains all the metadata it needs in the model file (no need for other files like tokenizer_config.json) except the prompt template
- llama.cpp has a script to convert *.safetensors model files into .gguf
- Transformers & Llama.cpp support CPU, GPU, and MPS (Apple Metal) inference
Quote
- GGML: used with llama.cpp; outdated, support has been dropped or will be soon. cpu+gpu inference
- GGUF: "new version" of the GGML file format, used with llama.cpp. cpu+gpu inference. offers 2-8bit quantization
- GPTQ: pure gpu inference, used with AutoGPTQ, exllama, exllamav2, offers only 4 bit quantization
- EXL2: pure gpu inference, used with exllamav2, offers 2-8bit quantization
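All the k-bit schemes above store weights as small integer codes plus a per-group scale; the formats differ mainly in how codes are chosen and packed. A toy round-to-nearest sketch of symmetric k-bit quantization (illustrative only; GPTQ/EXL2 pick codes to minimize layer output error rather than rounding each weight independently):

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one weight group.

    Maps each weight to an integer code in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    via a single shared scale; dequantization is just code * scale.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_group(codes, scale):
    return [c * scale for c in codes]

group = [0.7, -0.21, 0.05, -0.66]
codes, scale = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale)
# Rounding error per weight is at most half a quantization step (scale / 2).
assert all(abs(a - b) <= scale / 2 + 1e-12 for a, b in zip(group, restored))
```

Fewer bits means fewer codes per group, so the half-step error bound grows; that is the quality/size trade-off behind the 2-8 bit options.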
- replicate
- huggingface
- textsynth
- nlpcloud
- banana.dev
- forefront (has some open source models for playing with)