- Completely open source, including training code and data, under truly open licenses
  - Pythia
  - LLM360
  - OLMo
  - OpenChat? (Alignment Lab)
- HuggingFace/BigCode StarCoder
- Mistral-7B
- Dolly: Databricks applies the Alpaca training data to the older GPT-J 6B https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
- GPT-Neo, GPT-J by EleutherAI
- TinyLlama: 1.1B, Llama 2 arch
- DeepSeek, Qwen, Yi https://twitter.com/jeremyphoward/status/1730105499834847385?t=RKyFyg5bhajeI1If5IGwiA&s=19
- BLOOM (HN)
- Code-focused models
- LLMs for coding
  - https://blog.continue.dev/what-llm-to-use/
  - Code Llama is an LLM trained by Meta for generating and discussing code, built on top of Llama 2. Even though it sits below WizardCoder and Phind-CodeLlama on the Big Code Models Leaderboard, it is the base model for both of them. It also comes in a variety of sizes: 7B, 13B, and 34B, which makes it popular on local machines as well as with hosted providers. At this point, it is the most well-known open-source base model for coding and is leading the open-source effort to create coding-capable LLMs.
  - Code Llama: the 7B and 13B sizes support FIM (fill-in-the-middle) infilling, but the 34B does not (FIM sketch after this list)
  - WizardCoder is an LLM built on top of Code Llama by the WizardLM team. The Evol-Instruct method is adapted for coding tasks to create a training dataset, which is used to fine-tune Code Llama (Evol-Instruct sketch after this list). It comes in the same sizes as Code Llama: 7B, 13B, and 34B. It is the most popular open-source instruction-tuned LLM for coding so far.
  - Phind-CodeLlama is an LLM built on top of Code Llama by Phind. A proprietary dataset of ~80k high-quality programming problems and solutions was used to fine-tune Code Llama, and that model was then further fine-tuned on 1.5B additional tokens. It currently leads the Big Code Models Leaderboard. However, it is only available as a 34B parameter model, so it requires more memory to run.
  - Phind-CodeLlama: “poor” (HN)
  - Mistral is a 7B parameter LLM trained by Mistral AI. It is the most recently released model on this list, having dropped at the end of September. Mistral AI says that it “approaches CodeLlama 7B performance on code, while remaining good at English tasks”. Despite only being available in one small size, people were quite excited about it in the first couple of weeks after release. The first fine-tuned LLMs that use it as their base are now beginning to emerge, and we are likely to see more going forward.
  - StarCoder is a 15B parameter LLM trained by BigCode, which was ahead of its time when it was released in May. It was trained on 80+ programming languages from The Stack (v1.2), with opt-out requests excluded. It is not an instruction-tuned model, so commands like "Write a function that computes the square root" do not work well. However, by using the Tech Assistant prompt you can make it more helpful (prompt sketch after this list).
  - Llama 2 is an LLM trained by Meta on 2 trillion tokens. It is the most popular open-source LLM overall, so some developers use it, despite it not being as good as many of the models above at making code edits. It also matters because Code Llama, the most popular LLM for coding, is built on top of it and is in turn the foundation for WizardCoder and Phind-CodeLlama.
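The FIM support in the 7B and 13B sizes is what makes Code Llama usable for editor-style autocompletion. A minimal infilling sketch, assuming the Hugging Face `transformers` Code Llama integration, where a `<FILL_ME>` placeholder in the prompt is expanded by the tokenizer into the model's `<PRE>`/`<SUF>`/`<MID>` infilling tokens:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# 7B base model; the 13B base also supports infilling, the 34B does not.
checkpoint = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # needs a GPU in practice

# <FILL_ME> marks the hole between the prefix and the suffix.
prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
inputs = tokenizer(prompt, return_tensors="pt")
generated = model.generate(inputs["input_ids"], max_new_tokens=128)

# Decode only the newly generated middle and splice it back into the hole.
middle = tokenizer.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(prompt.replace("<FILL_ME>", middle))
```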
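For WizardCoder, Evol-Instruct boils down to repeatedly asking a strong LLM to rewrite each coding instruction into a harder variant, then fine-tuning on the accumulated set. A rough sketch: `complete()` is a hypothetical wrapper around any instruction-following model, and the templates only paraphrase the paper's evolution heuristics, not its exact prompts:

```python
import random

# Paraphrased evolution strategies; the production prompt wording is an assumption.
EVOLVE_TEMPLATES = [
    "Rewrite this programming problem so it requires handling more edge cases:\n{instruction}",
    "Rewrite this programming problem to add a time or space complexity requirement:\n{instruction}",
    "Rewrite this programming problem by replacing a common requirement with a rarer one:\n{instruction}",
    "Rewrite this programming problem by adding a piece of erroneous code as misdirection:\n{instruction}",
]

def complete(prompt: str) -> str:
    """Hypothetical call into any instruction-following LLM."""
    raise NotImplementedError

def evolve_dataset(seed_instructions: list[str], rounds: int = 3) -> list[str]:
    """Each round, evolve every instruction once with a randomly chosen strategy."""
    dataset = list(seed_instructions)
    current = seed_instructions
    for _ in range(rounds):
        current = [
            complete(random.choice(EVOLVE_TEMPLATES).format(instruction=ins))
            for ins in current
        ]
        dataset.extend(current)  # keep every generation for fine-tuning
    return dataset
```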
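The Tech Assistant prompt for StarCoder is plain few-shot prompting: prepend a dialogue-style preamble so the base model completes in an assistant voice. A sketch using the `transformers` pipeline; the preamble here is an abbreviated stand-in, not BigCode's full prompt:

```python
from transformers import pipeline

# Abbreviated stand-in for BigCode's Tech Assistant prompt; the real preamble
# also includes several example Human/Assistant exchanges.
TECH_ASSISTANT_PREAMBLE = (
    "Below are a series of dialogues between a human and an AI technical "
    "assistant. The assistant gives helpful, detailed answers.\n\n"
)

# Gated model: requires accepting the license on the Hub, plus a GPU.
generator = pipeline("text-generation", model="bigcode/starcoder")

def ask(question: str) -> str:
    prompt = f"{TECH_ASSISTANT_PREAMBLE}Human: {question}\n\nAssistant:"
    out = generator(prompt, max_new_tokens=200, do_sample=False)[0]["generated_text"]
    # Strip the prompt and cut the completion off at the next turn marker.
    return out[len(prompt):].split("Human:")[0].strip()

print(ask("Write a function that computes the square root."))
```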
- LLMs that use other machine learning models as tools for vision etc.
- https://vicuna.lmsys.org/ : Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT
- Toolformer: from Meta; embeds API calls in a specific inline format (executor sketch below); requires thousands of examples per API; works on weaker models (GPT-J 6B)
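Toolformer's format embeds calls like `[Calculator(400 / 1400)]` directly in the text; at inference time each call is executed and the result spliced back in as `[Calculator(400 / 1400) → 0.29]` before decoding continues. A minimal executor sketch for a calculator-only toolset; the regex-plus-eval approach is illustrative, not from the paper:

```python
import re

# Matches Toolformer-style calls such as [Calculator(400 / 1400)].
CALL_PATTERN = re.compile(r"\[Calculator\(([^)]*)\)\]")

def run_calculator(expression: str) -> str:
    # eval() with builtins stripped, arithmetic only; a real system
    # would use a proper expression parser instead.
    result = eval(expression, {"__builtins__": {}})
    return f"{result:.2f}"

def execute_tool_calls(text: str) -> str:
    """Replace each [Calculator(expr)] with [Calculator(expr) → result]."""
    return CALL_PATTERN.sub(
        lambda m: f"[Calculator({m.group(1)}) → {run_calculator(m.group(1))}]",
        text,
    )

print(execute_tool_calls(
    "Out of 1400 participants, 400 (or [Calculator(400 / 1400)]) passed."
))
```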
- ReAct: from Google Brain (loop sketch below)
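ReAct alternates free-text `Thought:` steps with `Action: tool[argument]` calls whose results are fed back as `Observation:` lines until the model emits a final answer. A minimal loop sketch; `complete()` is again a hypothetical LLM call assumed to stop after each Action line, and the parsing is deliberately naive:

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM call, assumed to stop after emitting one Action line."""
    raise NotImplementedError

# Stub tools keyed by the name the model writes in its Action line.
TOOLS = {
    "search": lambda q: f"(top search result for {q!r})",
    "calculate": lambda e: str(eval(e, {"__builtins__": {}})),
}

def react(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        step = complete(prompt)  # e.g. " I should search.\nAction: search[query]"
        prompt += step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse "Action: tool[argument]" and append the tool's observation.
        name, _, arg = step.rsplit("Action:", 1)[-1].strip().partition("[")
        observation = TOOLS[name.strip()](arg.rstrip("]").strip())
        prompt += f"\nObservation: {observation}\nThought:"
    return "(no final answer within the step budget)"
```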
- LaMDA: similar to Toolformer; launched in Bard
- ChatGPT Plugins: rely on the human-written descriptions of endpoints and parameters in each plugin's OpenAPI spec (example below)
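Concretely, the model chooses an endpoint and fills its parameters from nothing but the natural-language `description` strings in the spec. A hypothetical trimmed fragment, written as a Python dict to match the other sketches:

```python
# Hypothetical fragment of a plugin's OpenAPI spec; the description strings
# are the only guidance the model gets for choosing and filling this call.
openapi_fragment = {
    "paths": {
        "/todos": {
            "get": {
                "operationId": "listTodos",
                "description": "List the user's todo items, optionally filtered.",
                "parameters": [
                    {
                        "name": "status",
                        "in": "query",
                        "description": "Filter by status: 'open' or 'done'.",
                        "schema": {"type": "string"},
                    }
                ],
            }
        }
    }
}
```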
- Comparisons
  - ReAct vs Toolformer vs LaMDA