Applied LLMs

These notes are mostly about building applications with LLMs, and about fine-tuning.

References
- Concise FT guide: https://medium.com/@mayaakim/complete-guide-to-llm-fine-tuning-for-beginners-c2c38a3252be
- Concise FT videos
  - https://www.deeplearning.ai/short-courses/finetuning-large-language-models/
  - Fine-tuning Large Language Models (LLMs) | w/ Example Code (YouTube)
- Hugging Face course, for learning the libraries
Leaderboards (caveat: these have known issues)
- TODO
Competition write-ups
- AIMO
  - AIMO Winner
    - Consisted of:
      - A recipe to fine-tune DeepSeekMath-Base 7B to act as a "reasoning agent" that can solve mathematical problems via a mix of natural language reasoning and the use of the Python REPL to compute intermediate results.
      - A novel decoding algorithm for tool-integrated reasoning (TIR) with code execution feedback to generate solution candidates during inference.
        - For each problem, copy the input N times to define the initial batch of prompts to feed vLLM. This effectively defines the number of candidates one uses for majority voting.
        - Sample N diverse completions until a complete block of Python code is produced.
        - Execute each Python block and concatenate the output, including tracebacks if they appear.
        - Repeat M times to produce a batch of generations of size N and depth M, allowing the model to self-correct code errors using the traceback. If a sample fails to produce sensible outputs (e.g., incomplete code blocks), prune that result.
        - Postprocess the solution candidates and then apply majority voting to select the final answer.
      - A variety of internal validation sets that the team used to guide model selection and avoid overfitting to the public leaderboard.
  - AIMO second place
  - AIMO getting started with DeepSeekMath
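The TIR decoding loop from the AIMO winner's write-up can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the team's code: `generate` is a hypothetical caller-supplied sampler (the write-up uses vLLM) that returns the model's next completion; code arrives in Markdown `python` fences; final answers are wrapped in `\boxed{...}`; and candidates run one at a time instead of as one batched call. `run_python`, `extract_boxed`, and `tir_majority_vote` are illustrative names.

```python
import re
import subprocess
import sys
from collections import Counter
from typing import Callable, List, Optional

# A complete fenced Python block inside a completion (assumed output format).
CODE_RE = re.compile(r"```python\n(.*?)```", re.DOTALL)
# Assumed answer format: the final answer sits inside \boxed{...}.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def run_python(code: str, timeout: float = 5.0) -> str:
    """Run a code block in a fresh interpreter; return stdout plus any traceback.

    In practice, sandbox this: model-written code runs unrestricted here.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError: execution took too long\n"
    return proc.stdout + proc.stderr


def extract_boxed(text: str) -> Optional[str]:
    """Postprocess a candidate: return its last \\boxed{...} answer, if any."""
    found = BOXED_RE.findall(text)
    return found[-1].strip() if found else None


def tir_majority_vote(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical sampler (e.g. a vLLM call)
    n: int = 8,  # width N: number of candidates / voters
    m: int = 4,  # depth M: max rounds of code execution feedback
) -> Optional[str]:
    candidates: List[str] = [prompt] * n  # copy the input N times
    answers: List[str] = []
    for _ in range(m):
        still_running = []
        for text in candidates:
            completion = generate(text)
            text += completion
            blocks = CODE_RE.findall(completion)
            if blocks:
                # Append the execution output (or traceback) so the model can
                # use it, including to self-correct, in the next round.
                text += "```output\n" + run_python(blocks[-1]) + "```\n"
                still_running.append(text)
            else:
                answer = extract_boxed(completion)
                if answer is not None:
                    answers.append(answer)
                # Candidates with neither code nor an answer are pruned.
        candidates = still_running  # unfinished candidates after M rounds are dropped
    if not answers:
        return None
    # Majority vote over the final answers of the surviving candidates.
    return Counter(answers).most_common(1)[0][0]
```

Width `n` sets how many candidates vote; depth `m` caps how many times a candidate can see its own execution output or traceback and try again.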
Applications
Composition tools and libraries
- https://github.com/hwchase17/langchain
  - Overengineered, but consolidates up-to-date techniques
    - Prompt templates = string interpolation; LLM classes = thin wrappers around model-provider APIs; chains = function composition
  - Agents: a dynamic chain driven by an overarching prompt pattern that registers tools the model can call
  - When processing docs, use “combine chains”: one of “stuff” (concatenate all chunks), “map-reduce”, “refine” (fold over chunks), or “map-rerank” (score each chunk)
- https://github.com/jerryjliu/llama_index: Easier than LangChain for IR/QA tasks (source)
- https://github.com/deepset-ai/haystack
- https://github.com/chroma-core/chroma: simple document-embedding database with pluggable embedding models and an in-memory (DuckDB) mode for testing
- https://github.com/TimDettmers/bitsandbytes: a lightweight wrapper around custom CUDA functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.
- xformers: Meta’s building blocks for transformers
- https://einops.rocks/ - Flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, jax, and others.
- Hugging Face
  - datasets
  - transformers[sentencepiece]: AutoTokenizer, AutoModel*, Trainer, DataCollatorWithPadding, etc.
  - evaluate: load/compute metrics
  - https://pypi.org/project/accelerate/: avoid hardcoding device details
  - peft: parameter-efficient fine-tuning (e.g., LoRA)
- gradio, streamlit: for building simple web UIs in python
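The LangChain notes reduce its abstractions to plain Python (templates are string interpolation, chains are function composition). A minimal sketch of that mental model and of the four document-combining strategies, written without LangChain: none of these names are LangChain's API, and `llm` and `score` are hypothetical callables standing in for a model call and an answer scorer.

```python
from functools import reduce
from typing import Callable, List

# Hypothetical stand-in for a model call: prompt in, completion out.
LLM = Callable[[str], str]


def template(fmt: str) -> Callable[..., str]:
    """A prompt template is just string interpolation."""
    return lambda **fields: fmt.format(**fields)


def compose(*steps: Callable) -> Callable:
    """A chain is just function composition: each step's output feeds the next."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


QA = template("Context:\n{context}\n\nQuestion: {question}\nAnswer:")


def stuff(llm: LLM, chunks: List[str], question: str) -> str:
    """Concatenate all chunks into a single prompt (one LLM call)."""
    return llm(QA(context="\n".join(chunks), question=question))


def map_reduce(llm: LLM, chunks: List[str], question: str) -> str:
    """Answer against each chunk separately, then combine the partial answers."""
    partials = [llm(QA(context=c, question=question)) for c in chunks]
    return llm(QA(context="\n".join(partials), question=question))


def refine(llm: LLM, chunks: List[str], question: str) -> str:
    """Fold over the chunks, updating a running answer with each new chunk."""
    def step(answer: str, chunk: str) -> str:
        context = f"Existing answer: {answer}\nNew context: {chunk}" if answer else chunk
        return llm(QA(context=context, question=question))
    return reduce(step, chunks, "")


def map_rerank(llm: LLM, score: Callable[[str], float],
               chunks: List[str], question: str) -> str:
    """Answer against each chunk, score every answer, and keep the best one."""
    answers = [llm(QA(context=c, question=question)) for c in chunks]
    return max(answers, key=score)
```

The trade-off: `stuff` makes a single call but needs every chunk to fit in the context window, while the other three make one call per chunk (plus a final combining call for map-reduce).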
RLHF how-tos
- https://huggingface.co/blog/trl-peft
Implementations
- https://github.com/Lightning-AI/lit-gpt
- mingpt
- nanogpt
- ‣: minimal code to train 1-10B-parameter models
- https://github.com/lucidrains
- Llama and Alpaca
  - Context length: 2048 tokens
  - https://github.com/ggerganov/llama.cpp
  - https://github.com/antimatter15/alpaca.cpp
  - https://github.com/cocktailpeanut/dalai frontend UI
  - https://github.com/nsarrazin/serge frontend UI
  - https://news.ycombinator.com/item?id=35378683
Other modality models
- https://news.ycombinator.com/item?id=35352452
- Proprietary research
  - https://news.ycombinator.com/item?id=35365399
- Services
  - https://news.ycombinator.com/item?id=35358873
Pretrained embeddings
- GPT embeddings
- all-MiniLM-L6-v2: fastest
- all-mpnet-base-v2: best quality
- “The OpenAI text similarity models perform poorly and much worse than the state of the art (all-mpnet-base-v2 / all-roberta-large-v1).” (source)
- List from Chroma
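Whichever model produces the embeddings, similarity search over them usually reduces to cosine similarity plus a top-k sort. A minimal pure-Python sketch; the 3-dimensional vectors in the usage example are made up and stand in for real model output (all-MiniLM-L6-v2, for instance, produces 384-dimensional vectors). `cosine` and `top_k` are illustrative names, not a library API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Usage with made-up toy embeddings:
docs = {"cats": [0.9, 0.1, 0.0], "dogs": [0.8, 0.3, 0.0], "tax": [0.0, 0.1, 0.9]}
print(top_k([1.0, 0.2, 0.0], docs))
```

A vector database such as Chroma does the same scoring, but with an index so it does not have to compare the query against every stored vector.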
File formats
Generic inference platforms
Quantization
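The bitsandbytes entry under Composition tools mentions 8-bit quantization. The simplest scheme in that family is absmax (symmetric) int8 quantization: scale values so the largest magnitude maps to 127, round to integers, and multiply by the scale again to dequantize. A minimal pure-Python sketch of that generic technique; it is not bitsandbytes' implementation, whose kernels are more involved (LLM.int8(), for instance, keeps outlier features in 16-bit). The function names are illustrative.

```python
from typing import List, Tuple


def quantize_absmax(xs: List[float]) -> Tuple[List[int], float]:
    """Map floats to int8 codes in [-127, 127] using a single absmax scale."""
    absmax = max((abs(x) for x in xs), default=0.0)
    scale = absmax / 127 if absmax else 1.0  # avoid dividing by zero on all-zero input
    # Python's round() rounds half to even; the clamp guards against float error.
    codes = [max(-127, min(127, round(x / scale))) for x in xs]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats; per-value error is at most scale / 2."""
    return [c * scale for c in codes]


# Usage: quantize a small vector and measure the reconstruction error.
values = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_absmax(values)
restored = dequantize(codes, scale)
print(codes, max(abs(a - b) for a, b in zip(values, restored)))
```

A single scale for the whole tensor means one large outlier squeezes every other value into few integer levels, which is why practical schemes use per-row or per-block scales.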