/Users/yang/Library/Application Support/hatch/env/virtual/PROJECT/s85Jsv-M/PROJECT/bin/python
Pass --data-dir . to hatch, or set HATCH_DATA_DIR (source).
gpt-neox: requires CUDA 11.7, so I installed cu117 from the runfile installer (chose to omit the kernel module and chose an alternate install path, but it seems to ignore the alt path). Then added the toolkit to PATH and LD_LIBRARY_PATH as the post-install instructions say.
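The post-install env setup can be sketched like this (the install prefix is an assumption; adjust to wherever the runfile actually put the toolkit):

```shell
# Assumed toolkit location for a CUDA 11.7 runfile install; adjust as needed.
export CUDA_HOME=/usr/local/cuda-11.7
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```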
axolotl:
This fails to build:
$ pip3 install packaging
$ pip3 install -e '.[flash-attn,deepspeed]'
...
Collecting flash-attn==2.3.3 (from axolotl==0.3.0)
Downloading flash_attn-2.3.3.tar.gz (2.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 256.1 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
Traceback (most recent call last):
File "/home/ubuntu/axolotl2/.direnv/python-3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/home/ubuntu/axolotl2/.direnv/python-3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/axolotl2/.direnv/python-3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-q4j7s6bv/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-q4j7s6bv/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-q4j7s6bv/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 480, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "/tmp/pip-build-env-q4j7s6bv/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 8, in <module>
ModuleNotFoundError: No module named 'packaging'
[end of output]
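The root cause is pip's build isolation: the temporary build environment contains only what the package's build-system.requires declares, so a top-level import of packaging in setup.py fails regardless of what's installed in your own venv. A toy model of that behavior (the isolation here is simulated with an import hook, not pip's actual mechanism):

```python
# Toy model (not pip's real mechanism) of build isolation: while the
# "setup.py" code runs, only modules in available_modules may be imported.
import builtins

def run_setup_in_isolated_env(setup_code, available_modules):
    real_import = builtins.__import__

    def isolated_import(name, *args, **kwargs):
        if name.split(".")[0] not in available_modules:
            raise ModuleNotFoundError(f"No module named '{name}'")
        return real_import(name, *args, **kwargs)

    builtins.__import__ = isolated_import
    try:
        exec(setup_code, {})
    finally:
        builtins.__import__ = real_import

# flash-attn's setup.py effectively imports packaging at module level, but
# packaging isn't in its declared build requirements, so the isolated env
# lacks it even though it's installed in the surrounding venv:
try:
    run_setup_in_isolated_env("import packaging", {"setuptools", "wheel"})
except ModuleNotFoundError as e:
    print("build fails:", e)  # -> build fails: No module named 'packaging'
```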
# Can't just run the whole pip install with --no-build-isolation:
$ pip3 install --no-build-isolation --no-cache-dir -e '.[flash-attn,deepspeed]'
Obtaining file:///home/ubuntu/axolotl2
Checking if build backend supports build_editable ... done
Preparing editable metadata (pyproject.toml) ... error
error: subprocess-exited-with-error
× Preparing editable metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
running dist_info
creating /tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info
writing /tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-modern-metadata-rzwoxot9/axolotl.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-rzwoxot9/axolotl-0.3.0.dist-info'
error: invalid command 'bdist_wheel'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
# Need to do this instead:
pip3 install packaging wheel
pip3 install --no-build-isolation flash-attn==2.3.3
pip3 install -e '.[flash-attn,deepspeed]'
At import time, I end up with an error importing flash_attn:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/axolotl/.direnv/python-3.11/lib/python3.11/site-packages/flash_attn/__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File "/home/ubuntu/axolotl/.direnv/python-3.11/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 8, in <module>
import flash_attn_2_cuda as flash_attn_cuda
ImportError: /home/ubuntu/axolotl/.direnv/python-3.11/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
The culprit: the installed flash-attn wheel's flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so depends on the symbol _ZN3c104cuda9SetDeviceEi, which this torch build's libtorch doesn't export.
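That symbol is just a mangled C++ name; `c++filt` decodes it to c10::cuda::SetDevice(int), i.e. a function in torch's c10 library that the installed libtorch evidently doesn't export. For this simple case the decoding can even be done by hand (a tiny sketch handling only nested names with int parameters):

```python
import re

def demangle_simple(sym):
    """Tiny subset of Itanium C++ demangling: _ZN<len><name>...<len><name>E
    followed by parameter codes (only 'i' = int handled here)."""
    m = re.match(r"_ZN(.+)E(i*)$", sym)
    if not m:
        raise ValueError(f"unsupported mangled name: {sym}")
    body, params = m.groups()
    parts, i = [], 0
    while i < len(body):
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1  # read the decimal length prefix of the next component
        if j == i:
            raise ValueError(f"bad length prefix at {i} in {body!r}")
        n = int(body[i:j])
        parts.append(body[j:j + n])
        i = j + n
    args = ", ".join("int" for _ in params)
    return "::".join(parts) + "(" + args + ")"

print(demangle_simple("_ZN3c104cuda9SetDeviceEi"))  # -> c10::cuda::SetDevice(int)
```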
The axolotl Docker image builds with pytorch 2.0.1+cu117, though, seemingly without any additional magic pip commands.
In Docker, libtorch doesn't have this symbol either, so the flash-attn there evidently isn't even looking for it - how is flash-attn getting installed?
root@7d356900a935:/workspace/axolotl# python -c 'import torch; print(torch.__version__)'
2.0.1+cu117
root@7d356900a935:/workspace/axolotl# python -c 'import torch; print(torch.__file__)'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/__init__.py
root@7d356900a935:/workspace/axolotl# find /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/ -name libtorch_cuda.so
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so
root@7d356900a935:/workspace/axolotl# nm -u /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so | grep -i setdevice
U cudaSetDevice@@libcudart.so.11.0
On the host, I can make it work with:
# installs torch 2.1.1+cu121
pip install --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ auto-gptq==0.5.1
pip install flash-attn==2.3.3
Got it working on the host also with: (LYING - not sure how I actually got it working, but this alone isn't sufficient)
# maybe having torch did it? or maybe having specifically torch 2.1.2 and cuda 12.1?
$ pip3 install --no-build-isolation torch packaging wheel
$ pip3 install --no-build-isolation --no-cache-dir flash-attn==2.3.3
$ python -c 'import flash_attn' # no error!
$ pip3 install --no-cache-dir -e '.[flash-attn,deepspeed]'
For some reason, inside Docker, flash-attn builds/installs just fine - it can see packaging!
Thought maybe there was something funky with conda, but the same thing happens even with a normal Ubuntu python: just (as root) pip install packaging and then pip install flash-attn - the build can see packaging.
With conda, these are the paths it sees:
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/packaging/__init__.py
sys.path is ['', '/root/miniconda3/envs/py3.9/lib/python39.zip', '/root/miniconda3/envs/py3.9/lib/python3.9', '/root/miniconda3/envs/py3.9/lib/python3.9/lib-dynload', '/root/miniconda3/envs/py3.9/lib/python3.9/site-packages']
With normal ubuntu python, these are the paths it sees:
/usr/local/lib/python3.10/dist-packages/packaging/__init__.py
sys.path is ['', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/tmp/pip-install-r7e9o_tz/flash-attn_4a95b88dd57f426096ebebb3b173b1a3']
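Comparisons like the two above can be produced with a snippet along these lines, which prints the interpreter's search path and where a given module resolves (json is used as a stand-in for packaging so it runs anywhere):

```python
# Show the module search path and resolve one module against it; run the
# same snippet under each interpreter (conda, system, venv) to diff them.
import sys
import importlib.util

print("in a venv:", sys.prefix != sys.base_prefix)
print("sys.path:", sys.path)

# Swap "json" for "packaging" to see which copy an interpreter would import.
spec = importlib.util.find_spec("json")
print("json comes from:", spec.origin)
```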
Not sure what you'd see with a venv, and not sure how to run `pip install .` in a way that targets a particular venv.
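One way to target a particular venv without activating it is to invoke that venv's interpreter directly; `python -m pip` then always installs into that interpreter's environment (the venv path below is just an example):

```shell
# Create a throwaway venv and inspect it via its own interpreter;
# <venv>/bin/python -m pip always installs into <venv>.
python3 -m venv /tmp/demo-venv
/tmp/demo-venv/bin/python -c 'import sys; print(sys.prefix)'
# The equivalent of "pip install ." targeting that venv would be:
# /tmp/demo-venv/bin/python -m pip install .
```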
But I can definitely see this is where things break down
So this seems like something specific to using venvs