
Error during installation #2

Closed
zhudy opened this issue Dec 9, 2024 · 6 comments

zhudy commented Dec 9, 2024

I tried this on both x86 and ARM VMs. Following the Dockerfile in the docker directory, the container starts fine in both cases: docker run -it -v ~/zhihu:/mnt nvidia/cuda:12.5.1-devel-ubuntu20.04. But I hit the same few problems (a consolidated shell sketch of the workarounds follows the build log at the end of this comment):

  1. The nvidia/cuda:12.5.1-devel-ubuntu20.04 image does not come with Python preinstalled on either ARM or x86, so after starting the container you additionally need: apt install python3 python3-pip
  2. Running the dependency install inside the container, root@8140d29f1925:/mnt/ZhiLight# pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple,
    fails (for some reason numpy 2.1.3 cannot be found; the latest 2.2.0 cannot be found either, and dropping the Tsinghua mirror gives the same error):
    Requirement already satisfied: torch==2.4.1 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 1)) (2.4.1)
    ERROR: Could not find a version that satisfies the requirement numpy==2.1.3 (from -r requirements.txt (line 2)) (from versions: 1.3.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.8.1, 1.8.2, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.10.0.post2, 1.10.1, 1.10.2, 1.10.4, 1.11.0, 1.11.1, 1.11.2, 1.11.3, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 1.13.3, 1.14.0, 1.14.1, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6, 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.15.4, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4, 1.18.5, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.20.0, 1.20.1, 1.20.2, 1.20.3, 1.21.0, 1.21.1, 1.21.2, 1.21.3, 1.21.4, 1.21.5, 1.21.6, 1.22.0, 1.22.1, 1.22.2, 1.22.3, 1.22.4, 1.23.0, 1.23.1, 1.23.2, 1.23.3, 1.23.4, 1.23.5, 1.24.0, 1.24.1, 1.24.2, 1.24.3, 1.24.4)
    ERROR: No matching distribution found for numpy==2.1.3 (from -r requirements.txt (line 2))
  3. Ignoring the second error and continuing with the build inside the container, root@8140d29f1925:/mnt/ZhiLight# python3 setup.py bdist_wheel, fails with: error: [Errno 2] No such file or directory: 'cmake'. Installing it via apt install -y cmake does not fix this, because it then reports: CMake 3.18 or higher is required. You are running version 3.16.3. So I removed it (apt remove cmake) and switched to a newer release: wget https://cmake.org/files/v3.31/cmake-3.31.2-linux-x86_64.tar.gz && tar xzvf cmake-3.31.2-linux-x86_64.tar.gz && export PATH=$PATH:/mnt/cmake-3.31.2-linux-x86_64/bin
  4. Continuing the build inside the container: root@8140d29f1925:/mnt/ZhiLight# python3 setup.py bdist_wheel
    running bdist_wheel
    running build
    running build_py
    running egg_info
    writing zhilight.egg-info/PKG-INFO
    writing dependency_links to zhilight.egg-info/dependency_links.txt
    writing requirements to zhilight.egg-info/requires.txt
    writing top-level names to zhilight.egg-info/top_level.txt
    reading manifest file 'zhilight.egg-info/SOURCES.txt'
    writing manifest file 'zhilight.egg-info/SOURCES.txt'
    running build_ext
    -- The C compiler identification is GNU 9.4.0
    -- The CXX compiler identification is GNU 9.4.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: /usr/bin/cc - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++ - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- The CUDA compiler identification is NVIDIA 12.5.82 with host compiler GNU 9.4.0
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
    -- Detecting CUDA compile features
    -- Detecting CUDA compile features - done
    CUDA Version: 12.5.82
    -- Found Python: /usr/bin/python3.8 (found version "3.8.10") found components: Interpreter Development Development.Module Development.Embed
    Will link against CUDA 12 complied libraries
    CMAKE_CXX_FLAGS -D_GLIBCXX_USE_CXX11_ABI=0
    -- CMAKE_INSTALL_RPATH:
    -- Submodule update
    -- USE_STATIC_NCCL is set. Linking with static NCCL library.
    -- Found NCCL: /usr/include
    -- Determining NCCL version from /usr/include/nccl.h...
    -- Looking for NCCL_VERSION_CODE
    -- Looking for NCCL_VERSION_CODE - not found
    -- NCCL version < 2.3.5-5
    -- Found NCCL (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnccl_static.a)
    Traceback (most recent call last):
    File "", line 1, in
    ModuleNotFoundError: No module named 'pybind11'
    CMake Error at CMakeLists.txt:68 (find_package):
    By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has
    asked CMake to find a package configuration file provided by "pybind11",
    but CMake did not find one.

Could not find a package configuration file provided by "pybind11" with any
of the following names:

pybind11Config.cmake
pybind11-config.cmake

Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
"pybind11_DIR" to a directory containing one of the above files. If
"pybind11" provides a separate development package or SDK, be sure it has
been installed.

-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
File "setup.py", line 138, in
setup(
File "/usr/lib/python3/dist-packages/setuptools/init.py", line 144, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 223, in run
self.run_command('build')
File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 87, in run
_build_ext.run(self)
File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "setup.py", line 102, in build_extension
subprocess.check_call(["cmake", ext.sourcedir] + cmake_args, cwd=build_temp)
File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/mnt/ZhiLight', '-DCMAKE_CXX_STANDARD=17', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/mnt/ZhiLight/build/lib.linux-x86_64-3.8/zhilight/', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DPYTHON_VERSION=3.8', '-DCMAKE_BUILD_TYPE=Release', '-DWITH_TESTING=ON', '-DEXAMPLE_VERSION_INFO=0.4.7', '-GNinja', '-DCMAKE_MAKE_PROGRAM:FILEPATH=ninja', '-DPython_ROOT_DIR=/usr']' returned non-zero exit status 1.
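For reference, here are the workarounds from this comment collected into one shell sketch (image name, mount path, and cmake version are taken verbatim from the commands above; adjust them to your setup):

# start the container as described above
docker run -it -v ~/zhihu:/mnt nvidia/cuda:12.5.1-devel-ubuntu20.04

# inside the container: the image ships without Python
apt update && apt install -y python3 python3-pip wget

# the apt cmake (3.16.3) is too old for this project; use a prebuilt 3.18+ release instead
cd /mnt
wget https://cmake.org/files/v3.31/cmake-3.31.2-linux-x86_64.tar.gz
tar xzvf cmake-3.31.2-linux-x86_64.tar.gz
export PATH=$PATH:/mnt/cmake-3.31.2-linux-x86_64/bin

# retry the dependency install and the wheel build
# note: with the image's default Python 3.8, numpy==2.1.3 from requirements.txt cannot be resolved (see the error above)
cd /mnt/ZhiLight
pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python3 setup.py bdist_wheel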

unix1986 (Collaborator) commented Dec 9, 2024

@zhudy Try pip install pybind11; the installation of dependencies may not have completed.

zhudy (Author) commented Dec 9, 2024

@unix1986 Thanks for your suggestion, it works for the pybind11 issue, but now there is a new one:
root@8140d29f1925:/mnt/ZhiLight# export PATH=$PATH:/mnt/cmake-3.31.2-linux-x86_64/bin
root@8140d29f1925:/mnt/ZhiLight# cmake --version
cmake version 3.31.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).
root@8140d29f1925:/mnt/ZhiLight# pip install pybind11
Collecting pybind11
Downloading pybind11-2.13.6-py3-none-any.whl (243 kB)
|████████████████████████████████| 243 kB 2.7 MB/s
Installing collected packages: pybind11
Successfully installed pybind11-2.13.6
root@8140d29f1925:/mnt/ZhiLight# python3 setup.py bdist_wheel
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/zhilight
copying zhilight/loader.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/lazy_unpickling.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/convert.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/version.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/__init__.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/dynamic_batch.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/config.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/llama.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/load_tensor_util.py -> build/lib.linux-x86_64-3.8/zhilight
copying zhilight/quant.py -> build/lib.linux-x86_64-3.8/zhilight
creating build/lib.linux-x86_64-3.8/zhilight/server
copying zhilight/server/__init__.py -> build/lib.linux-x86_64-3.8/zhilight/server
creating build/lib.linux-x86_64-3.8/zhilight/server/openai
copying zhilight/server/openai/__init__.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai
creating build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/serving_completion.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/cli_args.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/api_server.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/serving_chat.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/__init__.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/preparse_cli_args.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/middleware.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/protocol.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
copying zhilight/server/openai/entrypoints/serving_engine.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/entrypoints
creating build/lib.linux-x86_64-3.8/zhilight/server/openai/lora
copying zhilight/server/openai/lora/request.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/lora
copying zhilight/server/openai/lora/__init__.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/lora
creating build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
copying zhilight/server/openai/basic/utils.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
copying zhilight/server/openai/basic/sequence.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
copying zhilight/server/openai/basic/outputs.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
copying zhilight/server/openai/basic/__init__.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
copying zhilight/server/openai/basic/logger.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
copying zhilight/server/openai/basic/config.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
copying zhilight/server/openai/basic/sampling_params.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/basic
creating build/lib.linux-x86_64-3.8/zhilight/server/openai/engine
copying zhilight/server/openai/engine/metrics.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/engine
copying zhilight/server/openai/engine/llm_engine.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/engine
copying zhilight/server/openai/engine/__init__.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/engine
copying zhilight/server/openai/engine/async_llm_engine.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/engine
copying zhilight/server/openai/engine/arg_utils.py -> build/lib.linux-x86_64-3.8/zhilight/server/openai/engine
running egg_info
creating zhilight.egg-info
writing zhilight.egg-info/PKG-INFO
writing dependency_links to zhilight.egg-info/dependency_links.txt
writing requirements to zhilight.egg-info/requires.txt
writing top-level names to zhilight.egg-info/top_level.txt
writing manifest file 'zhilight.egg-info/SOURCES.txt'
reading manifest file 'zhilight.egg-info/SOURCES.txt'
writing manifest file 'zhilight.egg-info/SOURCES.txt'
running build_ext
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 12.5.82 with host compiler GNU 9.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
CUDA Version: 12.5.82
-- Found Python: /usr/bin/python3.8 (found version "3.8.10") found components: Interpreter Development Development.Module Development.Embed
Will link against CUDA 12 complied libraries
CMAKE_CXX_FLAGS -D_GLIBCXX_USE_CXX11_ABI=0
-- CMAKE_INSTALL_RPATH:
-- Submodule update
-- USE_STATIC_NCCL is set. Linking with static NCCL library.
-- Found NCCL: /usr/include
-- Determining NCCL version from /usr/include/nccl.h...
-- Looking for NCCL_VERSION_CODE
-- Looking for NCCL_VERSION_CODE - not found
-- NCCL version < 2.3.5-5
-- Found NCCL (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnccl_static.a)
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /usr/local/lib/python3.8/dist-packages/pybind11/include (found version "2.13.6")
PYTORCH_CMAKE_PREFIX_PATH /usr/local/lib/python3.8/dist-packages/torch/share/cmake
CMAKE_PREFIX_PATH /usr/local/lib/python3.8/dist-packages/torch/share/cmake/Torch
-- Found CUDA: /usr/local/cuda (found version "12.5")
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.5.82")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Caffe2: CUDA detected: 12.5
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 12.5
-- /usr/local/cuda/lib64/libnvrtc.so shorthash is a50b0e02
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Added CUDA NVCC flags for: -gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_80,code=compute_80
CMake Warning at /usr/local/lib/python3.8/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/usr/local/lib/python3.8/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
tests/py_export_internal/CMakeLists.txt:13 (find_package)

-- Found Torch: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch.so
-- Configuring done (41.9s)
CMake Warning (dev) in CMakeLists.txt:
Policy CMP0111 is not set: An imported target missing its location property
fails during generation. Run "cmake --help-policy CMP0111" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

IMPORTED_LOCATION or IMPORTED_IMPLIB not set for imported target
"flash_attn" configuration "Release".
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning (dev) in CMakeLists.txt:
Policy CMP0111 is not set: An imported target missing its location property
fails during generation. Run "cmake --help-policy CMP0111" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

IMPORTED_LOCATION or IMPORTED_IMPLIB not set for imported target
"flash_attn" configuration "Release".
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning at /mnt/cmake-3.31.2-linux-x86_64/share/cmake-3.31/Modules/FindPython/Support.cmake:4240 (add_library):
Cannot generate a safe runtime search path for target C because files in
some directories may conflict with libraries in implicit directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.
Call Stack (most recent call first):
/mnt/cmake-3.31.2-linux-x86_64/share/cmake-3.31/Modules/FindPython.cmake:692 (__Python_add_library)
/usr/local/lib/python3.8/dist-packages/pybind11/share/cmake/pybind11/pybind11NewTools.cmake:267 (python_add_library)
CMakeLists.txt:108 (pybind11_add_module)

CMake Warning (dev) in CMakeLists.txt:
Policy CMP0111 is not set: An imported target missing its location property
fails during generation. Run "cmake --help-policy CMP0111" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

IMPORTED_LOCATION or IMPORTED_IMPLIB not set for imported target
"flash_attn" configuration "Release".
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:17 (add_executable):
Cannot generate a safe runtime search path for target test_allocator
because files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:20 (add_executable):
Cannot generate a safe runtime search path for target test_gemm because
files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:23 (add_executable):
Cannot generate a safe runtime search path for target test_print_tensor
because files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:26 (add_executable):
Cannot generate a safe runtime search path for target test_bitonic because
files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:29 (add_executable):
Cannot generate a safe runtime search path for target test_softmax because
files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:32 (add_executable):
Cannot generate a safe runtime search path for target test_index_select
because files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:35 (add_executable):
Cannot generate a safe runtime search path for target test_nccl because
files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning at 3rd/bmengine/tests/CMakeLists.txt:38 (add_executable):
Cannot generate a safe runtime search path for target test_thread_pool
because files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning (dev) in CMakeLists.txt:
Policy CMP0111 is not set: An imported target missing its location property
fails during generation. Run "cmake --help-policy CMP0111" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

IMPORTED_LOCATION or IMPORTED_IMPLIB not set for imported target
"flash_attn" configuration "Release".
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning at /mnt/cmake-3.31.2-linux-x86_64/share/cmake-3.31/Modules/FindPython/Support.cmake:4240 (add_library):
Cannot generate a safe runtime search path for target internals_ because
files in some directories may conflict with libraries in implicit
directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/lib64
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/lib64
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.
Call Stack (most recent call first):
/mnt/cmake-3.31.2-linux-x86_64/share/cmake-3.31/Modules/FindPython.cmake:692 (__Python_add_library)
/usr/local/lib/python3.8/dist-packages/pybind11/share/cmake/pybind11/pybind11NewTools.cmake:267 (python_add_library)
tests/py_export_internal/CMakeLists.txt:20 (pybind11_add_module)

CMake Warning (dev) in CMakeLists.txt:
Policy CMP0111 is not set: An imported target missing its location property
fails during generation. Run "cmake --help-policy CMP0111" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

IMPORTED_LOCATION or IMPORTED_IMPLIB not set for imported target
"flash_attn" configuration "Release".
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning (dev) in CMakeLists.txt:
Policy CMP0111 is not set: An imported target missing its location property
fails during generation. Run "cmake --help-policy CMP0111" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

IMPORTED_LOCATION or IMPORTED_IMPLIB not set for imported target
"flash_attn" configuration "Release".
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning at src/nn/tests/CMakeLists.txt:6 (add_executable):
Cannot generate a safe runtime search path for target
test_attention_rag_buffer because files in some directories may conflict
with libraries in implicit directories:

runtime library [libcublas.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib
runtime library [libcublasLt.so.12] in /usr/local/cuda/targets/x86_64-linux/lib/stubs may be hidden by files in:
  /usr/local/cuda/targets/x86_64-linux/lib

Some of these libraries may not be found correctly.

CMake Warning (dev) in CMakeLists.txt:
Policy CMP0111 is not set: An imported target missing its location property
fails during generation. Run "cmake --help-policy CMP0111" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

IMPORTED_LOCATION or IMPORTED_IMPLIB not set for imported target
"flash_attn" configuration "Release".
This warning is for project developers. Use -Wno-dev to suppress it.

-- Generating done (0.4s)
-- Build files have been written to: /mnt/ZhiLight/build/temp.linux-x86_64-3.8/zhilight.C
ninja: error: 'flash_attn-NOTFOUND', needed by '/mnt/ZhiLight/build/lib.linux-x86_64-3.8/zhilight/C.cpython-38-x86_64-linux-gnu.so', missing and no known rule to make it
Traceback (most recent call last):
File "setup.py", line 138, in
setup(
File "/usr/lib/python3/dist-packages/setuptools/init.py", line 144, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 223, in run
self.run_command('build')
File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 87, in run
_build_ext.run(self)
File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "setup.py", line 103, in build_extension
subprocess.check_call(["cmake", "--build", "."] + build_args, cwd=build_temp)
File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target C']' returned non-zero exit status 1.

zhudy (Author) commented Dec 9, 2024

To sum up, the root cause of this issue is that the nvidia/cuda:12.5.1-devel-ubuntu20.04 image ships without Python preinstalled on both ARM and x86, and the Python you get from apt install python3 is 3.8.10. The fix is to upgrade to Python 3.10 or later before continuing; I switched to Miniconda, whose default Python version is 3.12.7.
But installing flash-attn (pip install flash-attn==2.7.0.post2) then fails (a sketch of the CUDA_HOME fix follows the error log below):
(base) root@e08c4f4c64c5:/mnt/ZhiLight# pip install flash-attn==2.7.0.post2
Collecting flash-attn==2.7.0.post2
Using cached flash_attn-2.7.0.post2.tar.gz (2.7 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
fatal: not a git repository (or any of the parent directories): .git
/tmp/pip-install-a1go4zb7/flash-attn_5ea399dc26b74c9e96a71e8fc981817b/setup.py:99: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-a1go4zb7/flash-attn_5ea399dc26b74c9e96a71e8fc981817b/setup.py", line 183, in
CUDAExtension(
File "/root/miniconda3/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 1076, in CUDAExtension
library_dirs += library_paths(cuda=True)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 1207, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2416, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
...
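For what it's worth, a minimal sketch of the workaround for the CUDA_HOME error above, assuming the CUDA toolkit from the devel image lives at /usr/local/cuda (as the earlier CMake output shows):

# point the flash-attn build at the CUDA toolkit inside the devel image
export CUDA_HOME=/usr/local/cuda
# put nvcc on PATH so setup.py stops warning that it is missing
export PATH=$CUDA_HOME/bin:$PATH

# retry; building flash-attn from source can take quite a while
pip install flash-attn==2.7.0.post2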

unix1986 (Collaborator) commented:

@zhudy I will push a ready-to-use image and a reference Dockerfile today. The base image provided earlier required you to install python/nccl and a few dependencies from requirements.txt yourself, which was not convenient enough.

# Docker image
# CUDA: 12.4.1, Driver: 550.54.15 and compatible versions
docker pull ghcr.io/zhihu/zhilight/zhilight:0.4.8-cu124
# CUDA: 12.5.1, Driver: 555.42.02 and compatible versions
docker pull ghcr.io/zhihu/zhilight/zhilight:0.4.8-cu125
# The Dockerfile below can be used as a reference; pick the official cuDNN image to build your own image for a specific version
# Dockerfile: https://github.com/zhihu/ZhiLight/blob/main/docker/Dockerfile
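
A quick way to sanity-check the pulled image, as a hedged sketch: the mount path is a placeholder, and the api_server invocation below is inferred from the module layout seen in the build log above rather than from documented usage, so check its --help output first.

# run the prebuilt image with GPU access; pick the tag matching your driver
docker run --gpus all -it --rm -v ~/models:/mnt ghcr.io/zhihu/zhilight/zhilight:0.4.8-cu125 bash

# inside the container: confirm the driver and CUDA runtime are visible
nvidia-smi

# OpenAI-compatible server entrypoint (module path from the build log); list its options
python3 -m zhilight.server.openai.entrypoints.api_server --help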

zhudy (Author) commented Dec 10, 2024

Thanks for the update, I will close this issue for now. Thank you.

zhudy closed this as completed Dec 10, 2024
unix1986 added a commit that referenced this issue Dec 12, 2024
unix1986 pinned this issue Jan 1, 2025