Advice
Hardware architectures
CUDA model
Tools
Usage tips
Sparse matrices
CSR (compressed sparse row), CSC (compressed sparse column)
Block sparse: BSR, BSC. Same as CSR and CSC, but each stored entry is an entire (uniformly sized) dense block
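To make the two layouts concrete, here is a minimal pure-Python sketch that converts a small dense matrix to CSR and to BSR. The helper names `dense_to_csr` and `dense_to_bsr` are hypothetical (they are not from any library), and the `(data, indices, indptr)` naming is an assumption borrowed from the convention common in sparse libraries.

```python
def dense_to_csr(dense):
    """Return (data, indices, indptr) for a dense matrix given as a list of rows."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # nonzero value
                indices.append(col)   # its column index
        indptr.append(len(data))      # where the next row starts in data
    return data, indices, indptr


def dense_to_bsr(dense, block):
    """Same idea as CSR, but each stored entry is a dense block x block tile."""
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % block == 0 and n_cols % block == 0, "blocks must tile the matrix"
    data, indices, indptr = [], [], [0]
    for bi in range(0, n_rows, block):
        for bj in range(0, n_cols, block):
            tile = [row[bj:bj + block] for row in dense[bi:bi + block]]
            if any(v != 0 for r in tile for v in r):
                data.append(tile)            # whole tile, zeros included
                indices.append(bj // block)  # block-column index
        indptr.append(len(data))             # where the next block row starts
    return data, indices, indptr


dense = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 0],
]
csr = dense_to_csr(dense)
bsr = dense_to_bsr(dense, block=2)
print(csr)
print(bsr)
```

Note that BSR stores the explicit zeros inside any block that has at least one nonzero, so it pays off mainly when nonzeros cluster into dense tiles.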
Interconnects
| | Second Generation | Third Generation | Fourth Generation | Fifth Generation |
|---|---|---|---|---|
| NVLink bandwidth per GPU | 300GB/s | 600GB/s | 900GB/s | 1,800GB/s |
| Maximum Number of Links per GPU | 6 | 12 | 18 | 18 |
| Supported NVIDIA Architectures | NVIDIA Volta™ architecture | NVIDIA Ampere architecture | NVIDIA Hopper™ architecture | NVIDIA Blackwell architecture |
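As a quick sanity check on the table, dividing the per-GPU bandwidth by the link count gives the per-link bandwidth. The sketch below does this for the second through fourth generations, using only the figures listed above; the dictionary layout is just an illustration.

```python
# (bandwidth per GPU in GB/s, links per GPU), copied from the table above.
nvlink = {
    "Second Generation": (300, 6),
    "Third Generation": (600, 12),
    "Fourth Generation": (900, 18),
}

# Per-link bandwidth = total per-GPU bandwidth / number of links.
per_link = {gen: bw / links for gen, (bw, links) in nvlink.items()}

for gen, bw in per_link.items():
    print(f"{gen}: {bw:.0f} GB/s per link")
```

For these three generations the per-link figure is the same, so the per-GPU increase comes from the larger number of links.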
Scripts
pip install cuda-python
conda install pytorch==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install nvidia/label/cuda-12.1.0::cuda-toolkit
conda install nvidia/label/cuda-12.1.0::cuda-cudart
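After running the installs above, it can help to confirm what the environment actually sees. The following is a minimal sketch, assuming PyTorch is the package of interest; `describe_cuda_setup` is a hypothetical helper name, and it returns early if `torch` is not importable.

```python
import importlib.util


def describe_cuda_setup():
    """Summarize what the installed PyTorch build can see (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if info["cuda_available"]:
        # List the name of every GPU visible to this process.
        info["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return info


report = describe_cuda_setup()
print(report)
```

If `built_for_cuda` is set but `cuda_available` is false, the usual suspects are a driver mismatch or no visible GPU rather than the conda packages themselves.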
Tools