windows ninja error #360
I also get the error after setting os.environ['TORCH_CUDA_ARCH_LIST'] = '8.9' / os.environ['TORCH_CUDA_ARCH_LIST'] = '6.0;6.1;7.0;7.5;8.0;8.6;8.9' (RTX 4090).
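For reference, a minimal sketch of how the variable would need to be set: it must be assigned before torch builds the extension (so before the pipeline triggers the JIT compile), and '8.9' is the compute capability of the RTX 4090. Note this only narrows the target architectures; it does not avoid the compilation itself.

```python
import os

# Must be set before torch's JIT extension build runs (ideally before
# importing torch / optimum-quanto). "8.9" targets the RTX 4090.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.9"
```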
@difsch2077 I never tested quanto extensions on Windows, to be honest.
Ok, but this is not mentioned in the README.md.
Right, how do you solve this problem? I ran into it too.
As I understand it, optimum-quanto attempts to compile its CUDA files when running on Windows, specifically the file optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu. For this, the NVIDIA CUDA compiler, nvcc.exe, is used. However, gemm_cuda.cu contains inline assembly (`asm volatile`) syntax that the MSVC host compiler, cl.exe, does not accept. From what I've gathered, there doesn't seem to be a way to force nvcc.exe to use anything other than cl.exe as the host compiler on Windows. There might be a chance to compile everything with GCC on Windows and then "feed" it to nvcc, but I'm not sure that would work (highly unlikely). Perhaps someone has a more specific solution to this tricky issue?
Facing the same problem. Please confirm whether this library is not supposed to work on Windows, or whether there is simply no solution available for Windows. https://learn.microsoft.com/en-us/cpp/assembler/inline/inline-assembler?view=msvc-170
quanto can work on Windows, but all custom kernels need to be disabled. The inclusion of the custom kernels happens in this file. By checking the system type, it should be possible to avoid importing the custom extensions.
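A minimal sketch of such a system-type guard (the function name here is hypothetical; the actual import logic lives in quanto's extension module):

```python
import platform

def cuda_extensions_supported() -> bool:
    # Hypothetical guard: the AWQ kernels contain inline PTX assembly that
    # the MSVC host compiler (cl.exe) rejects, so skip them on Windows.
    return platform.system() != "Windows"

if cuda_extensions_supported():
    # the custom CUDA extension imports would go here
    pass
```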
:( |
#365 should fix the issue. You can test using the corresponding branch. |
Thank you @dacorvo I will try and post results. |
I am getting another error. Sample code here:
@nitinmukesh thank you for testing this: I pushed a fix for that error too (sorry). Let me know the result if you have the time to test with that change. |
No issues at all, yes I will test it. Or is there another pull request? [EDIT]
Just the same pull-request |
I merged both PRs, #365 and #366, locally and it's working fine on Windows 11.
You can also verify and then close this.
"C:/Users/aiqqq/AppData/Local/Temp/params_4e7035be-7fee-49b7-af09-dcb79fa4f3fa.json"; C:/Users/aiqqq/AppData/Local/Programs/Python/Python310/python.exe -u tool.py
Downloading Model to directory: C:\Users\aiqqq\.cache\modelscope\hub\black-forest-labs/FLUX.1-dev
2024-12-16 04:39:39,469 - modelscope - INFO - Creating symbolic link [C:\Users\aiqqq\.cache\modelscope\hub\black-forest-labs/FLUX.1-dev].
2024-12-16 04:39:39,470 - modelscope - WARNING - Failed to create symbolic link C:\Users\aiqqq\.cache\modelscope\hub\black-forest-labs/FLUX.1-dev.
Initializing Flux Pipeline. This may take a few minutes...
Loading checkpoint shards: 100%|█████████████████████████████| 2/2 [00:03<00:00, 1.68s/it]
Loading pipeline components...: 20%|████▊ | 1/5 [00:00<00:00, 9.63it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|████████████████████████| 5/5 [00:00<00:00, 8.75it/s]
Using device: cuda
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:1964: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Traceback (most recent call last):
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2104, in _run_ninja_build
subprocess.run(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\pycoze-workspace\User\Local\tool\bzliezjhupowjuudulzlufuqhazzkdgr\tool.py", line 87, in <module>
images = generate_images("A futuristic city",
File "D:\pycoze-workspace\User\Local\tool\bzliezjhupowjuudulzlufuqhazzkdgr\tool.py", line 64, in generate_images
init_pipeline(device)
File "D:\pycoze-workspace\User\Local\tool\bzliezjhupowjuudulzlufuqhazzkdgr\tool.py", line 49, in init_pipeline
pipe.to(device)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 454, in to
module.to(device, dtype)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2724, in to
return super().to(*args, **kwargs)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
return self.apply(convert)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 900, in apply
module.apply(fn)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 900, in apply
module.apply(fn)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 900, in apply
module.apply(fn)
[Previous line repeated 4 more times]
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 927, in apply
param_applied = fn(param)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
return t.to(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 272, in __torch_function__
return func(*args, **kwargs)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 298, in __torch_dispatch__
return WeightQBytesTensor.create(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 139, in create
return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\qbits.py", line 79, in __init__
data_packed = MarlinF8PackedTensor.pack(data) # pack fp8 data to int32, and apply marlin re-ordering.
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\packed.py", line 183, in pack
data_int32 = torch.ops.quanto.pack_fp8_marlin(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_ops.py", line 1116, in __call__
return self._op(*args, **(kwargs or {}))
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\__init__.py", line 167, in gptq_marlin_repack
return ext.lib.gptq_marlin_repack(b_q_weight, perm, size_k, size_n, num_bits)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\extension.py", line 44, in lib
self.lib = load(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1314, in load
return _jit_compile(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1721, in _jit_compile
_write_ninja_file_and_build_library(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1833, in _write_ninja_file_and_build_library
_run_ninja_build(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2120, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'quanto_cuda': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\TH -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
FAILED: gemm_cuda.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\TH -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(95): error: identifier "asm" is undefined
asm volatile(
^
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(98): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(104): error: identifier "asm" is undefined
asm volatile(
^
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(107): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(126): error: identifier "asm" is undefined
asm volatile(
^
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(129): error: expected a ")"
: "=f"(((float *)C_warp)[0]), "=f"(((float *)C_warp)[1]), "=f"(((float *)C_warp)[2]), "=f"(((float *)C_warp)[3])
^
6 errors detected in the compilation of "C:/Users/aiqqq/AppData/Local/Programs/Python/Python310/lib/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu".
gemm_cuda.cu
ninja: build stopped: subcommand failed.
How to fix?
device:4090
system:win11