- Completely open source, including training code and data, under truly open licenses
  - Pythia
  - LLM360
  - OLMo
  - OpenChat? (Alignment Lab)
- HuggingFace/BigCode StarCoder
- Mistral-7B
- Dolly: Databricks applies the Alpaca training data to the older GPT-J 6B https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
- GPT-Neo, GPT-J by EleutherAI
- TinyLlama: 1.1B, Llama 2 arch
- DeepSeek, Qwen, Yi https://twitter.com/jeremyphoward/status/1730105499834847385?t=RKyFyg5bhajeI1If5IGwiA&s=19
- BLOOM (HN)
- Code-focused models
- LLMs for coding
  - https://blog.continue.dev/what-llm-to-use/
  - Code Llama is an LLM trained by Meta for generating and discussing code, built on top of Llama 2. Even though it sits below WizardCoder and Phind-CodeLlama on the Big Code Models Leaderboard, it is the base model for both of them. It also comes in a variety of sizes: 7B, 13B, and 34B, which makes it popular on local machines as well as with hosted providers. At this point, it is the most well-known open-source base model for coding and is leading the open-source effort to create coding-capable LLMs.
  - Code Llama: the 7B and 13B sizes support FIM (fill-in-the-middle) infilling, but the 34B does not (FIM sketch after this list)
  - WizardCoder is an LLM built on top of Code Llama by the WizardLM team. The Evol-Instruct method is adapted for coding tasks to create a training dataset, which is used to fine-tune Code Llama (Evol-Instruct sketch after this list). It comes in the same sizes as Code Llama: 7B, 13B, and 34B. It is the most popular open-source instruction-tuned LLM for coding so far.
  - Phind-CodeLlama is an LLM built on top of Code Llama by Phind. A proprietary dataset of ~80k high-quality programming problems and solutions was used to fine-tune Code Llama, and that model was then further fine-tuned on 1.5B additional tokens. It currently leads the Big Code Models Leaderboard. However, it is only available as a 34B parameter model, so it requires more memory to run.
  - Phind-CodeLlama: “poor” (HN)
  - Mistral is a 7B parameter LLM trained by Mistral AI. It is the most recently released model on this list, having dropped at the end of September. Mistral AI says that it “approaches CodeLlama 7B performance on code, while remaining good at English tasks”. Despite only being available in one small size, people were quite excited about it in the first couple of weeks after release. The first fine-tuned LLMs that use it as their base are now beginning to emerge, and we are likely to see more going forward.
  - StarCoder is a 15B parameter LLM trained by BigCode, which was ahead of its time when it was released in May. It was trained on 80+ programming languages from The Stack (v1.2), with opt-out requests excluded. It is not an instruction-tuned model, so commands like "Write a function that computes the square root" do not work well. However, by using the Tech Assistant prompt you can make it more helpful (prompt sketch after this list).
  - Llama 2 is an LLM trained by Meta on 2 trillion tokens. It is the most popular open-source LLM overall, so some developers use it, despite it not being as good as many of the models above at making code edits. It also matters because Code Llama, the most popular LLM for coding, is built on top of it and is in turn the foundation for WizardCoder and Phind-CodeLlama.
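The FIM support in the 7B and 13B sizes is what makes Code Llama usable for editor-style autocompletion. A minimal infilling sketch, assuming the Hugging Face `transformers` Code Llama integration, where a `<FILL_ME>` placeholder in the prompt is expanded by the tokenizer into the model's `<PRE>`/`<SUF>`/`<MID>` infilling tokens:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# 7B base model; the 13B base also supports infilling, the 34B does not.
checkpoint = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # needs a GPU in practice

# <FILL_ME> marks the hole between the prefix and the suffix.
prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
inputs = tokenizer(prompt, return_tensors="pt")
generated = model.generate(inputs["input_ids"], max_new_tokens=128)

# Decode only the newly generated middle and splice it back into the hole.
middle = tokenizer.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(prompt.replace("<FILL_ME>", middle))
```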
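For WizardCoder, Evol-Instruct boils down to repeatedly asking a strong LLM to rewrite each coding instruction into a harder variant, then fine-tuning on the accumulated set. A rough sketch: `complete()` is a hypothetical wrapper around any instruction-following model, and the templates only paraphrase the paper's evolution heuristics, not its exact prompts:

```python
import random

# Paraphrased evolution strategies; the production prompt wording is an assumption.
EVOLVE_TEMPLATES = [
    "Rewrite this programming problem so it requires handling more edge cases:\n{instruction}",
    "Rewrite this programming problem to add a time or space complexity requirement:\n{instruction}",
    "Rewrite this programming problem by replacing a common requirement with a rarer one:\n{instruction}",
    "Rewrite this programming problem by adding a piece of erroneous code as misdirection:\n{instruction}",
]

def complete(prompt: str) -> str:
    """Hypothetical call into any instruction-following LLM."""
    raise NotImplementedError

def evolve_dataset(seed_instructions: list[str], rounds: int = 3) -> list[str]:
    """Each round, evolve every instruction once with a randomly chosen strategy."""
    dataset = list(seed_instructions)
    current = seed_instructions
    for _ in range(rounds):
        current = [
            complete(random.choice(EVOLVE_TEMPLATES).format(instruction=ins))
            for ins in current
        ]
        dataset.extend(current)  # keep every generation for fine-tuning
    return dataset
```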
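The Tech Assistant prompt for StarCoder is plain few-shot prompting: prepend a dialogue-style preamble so the base model completes in an assistant voice. A sketch using the `transformers` pipeline; the preamble here is an abbreviated stand-in, not BigCode's full prompt:

```python
from transformers import pipeline

# Abbreviated stand-in for BigCode's Tech Assistant prompt; the real preamble
# also includes several example Human/Assistant exchanges.
TECH_ASSISTANT_PREAMBLE = (
    "Below are a series of dialogues between a human and an AI technical "
    "assistant. The assistant gives helpful, detailed answers.\n\n"
)

# Gated model: requires accepting the license on the Hub, plus a GPU.
generator = pipeline("text-generation", model="bigcode/starcoder")

def ask(question: str) -> str:
    prompt = f"{TECH_ASSISTANT_PREAMBLE}Human: {question}\n\nAssistant:"
    out = generator(prompt, max_new_tokens=200, do_sample=False)[0]["generated_text"]
    # Strip the prompt and cut the completion off at the next turn marker.
    return out[len(prompt):].split("Human:")[0].strip()

print(ask("Write a function that computes the square root."))
```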
- LLMs that use other machine learning models as tools for vision etc.
- https://vicuna.lmsys.org/ : Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT
- Toolformer: from Meta; embeds API calls in a specific inline format (executor sketch below); requires thousands of examples per API; works on weaker models (GPT-J 6B)
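Toolformer's format embeds calls like `[Calculator(400 / 1400)]` directly in the text; at inference time each call is executed and the result spliced back in as `[Calculator(400 / 1400) → 0.29]` before decoding continues. A minimal executor sketch for a calculator-only toolset; the regex-plus-eval approach is illustrative, not from the paper:

```python
import re

# Matches Toolformer-style calls such as [Calculator(400 / 1400)].
CALL_PATTERN = re.compile(r"\[Calculator\(([^)]*)\)\]")

def run_calculator(expression: str) -> str:
    # eval() with builtins stripped, arithmetic only; a real system
    # would use a proper expression parser instead.
    result = eval(expression, {"__builtins__": {}})
    return f"{result:.2f}"

def execute_tool_calls(text: str) -> str:
    """Replace each [Calculator(expr)] with [Calculator(expr) → result]."""
    return CALL_PATTERN.sub(
        lambda m: f"[Calculator({m.group(1)}) → {run_calculator(m.group(1))}]",
        text,
    )

print(execute_tool_calls(
    "Out of 1400 participants, 400 (or [Calculator(400 / 1400)]) passed."
))
```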
- ReAct: from Google Brain (loop sketch below)
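ReAct alternates free-text `Thought:` steps with `Action: tool[argument]` calls whose results are fed back as `Observation:` lines until the model emits a final answer. A minimal loop sketch; `complete()` is again a hypothetical LLM call assumed to stop after each Action line, and the parsing is deliberately naive:

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM call, assumed to stop after emitting one Action line."""
    raise NotImplementedError

# Stub tools keyed by the name the model writes in its Action line.
TOOLS = {
    "search": lambda q: f"(top search result for {q!r})",
    "calculate": lambda e: str(eval(e, {"__builtins__": {}})),
}

def react(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        step = complete(prompt)  # e.g. " I should search.\nAction: search[query]"
        prompt += step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse "Action: tool[argument]" and append the tool's observation.
        name, _, arg = step.rsplit("Action:", 1)[-1].strip().partition("[")
        observation = TOOLS[name.strip()](arg.rstrip("]").strip())
        prompt += f"\nObservation: {observation}\nThought:"
    return "(no final answer within the step budget)"
```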
- LaMDA: similar to Toolformer; launched in Bard
- ChatGPT Plugins: rely on the human-written descriptions of endpoints and parameters in each plugin's OpenAPI spec (example below)
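Concretely, the model chooses an endpoint and fills its parameters from nothing but the natural-language `description` strings in the spec. A hypothetical trimmed fragment, written as a Python dict to match the other sketches:

```python
# Hypothetical fragment of a plugin's OpenAPI spec; the description strings
# are the only guidance the model gets for choosing and filling this call.
openapi_fragment = {
    "paths": {
        "/todos": {
            "get": {
                "operationId": "listTodos",
                "description": "List the user's todo items, optionally filtered.",
                "parameters": [
                    {
                        "name": "status",
                        "in": "query",
                        "description": "Filter by status: 'open' or 'done'.",
                        "schema": {"type": "string"},
                    }
                ],
            }
        }
    }
}
```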
- Comparisons
  - ReAct vs Toolformer vs LaMDA