
windows ninja error #360

Open
difsch2077 opened this issue Dec 15, 2024 · 16 comments · Fixed by #365

@difsch2077

:/Users/aiqqq/AppData/Local/Temp/params_4e7035be-7fee-49b7-af09-dcb79fa4f3fa.json"; C:/Users/aiqqq/AppData/Local/Programs/Python/Python310/python.exe -u tool.py
Downloading Model to directory: C:\Users\aiqqq\.cache\modelscope\hub\black-forest-labs/FLUX.1-dev
2024-12-16 04:39:39,469 - modelscope - INFO - Creating symbolic link [C:\Users\aiqqq\.cache\modelscope\hub\black-forest-labs/FLUX.1-dev].
2024-12-16 04:39:39,470 - modelscope - WARNING - Failed to create symbolic link C:\Users\aiqqq\.cache\modelscope\hub\black-forest-labs/FLUX.1-dev.
Initializing Flux Pipeline. This may take a few minutes...
Loading checkpoint shards: 100%|█████████████████████████████| 2/2 [00:03<00:00, 1.68s/it]
Loading pipeline components...: 20%|████▊ | 1/5 [00:00<00:00, 9.63it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|████████████████████████| 5/5 [00:00<00:00, 8.75it/s]
Using device: cuda
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:1964: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Traceback (most recent call last):
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2104, in _run_ninja_build
subprocess.run(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\pycoze-workspace\User\Local\tool\bzliezjhupowjuudulzlufuqhazzkdgr\tool.py", line 87, in
images = generate_images("A futuristic city",
File "D:\pycoze-workspace\User\Local\tool\bzliezjhupowjuudulzlufuqhazzkdgr\tool.py", line 64, in generate_images
init_pipeline(device)
File "D:\pycoze-workspace\User\Local\tool\bzliezjhupowjuudulzlufuqhazzkdgr\tool.py", line 49, in init_pipeline
pipe.to(device)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 454, in to
module.to(device, dtype)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2724, in to
return super().to(*args, **kwargs)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
return self._apply(convert)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
[Previous line repeated 4 more times]
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
param_applied = fn(param)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
return t.to(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 272, in torch_function
return func(*args, **kwargs)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 298, in torch_dispatch
return WeightQBytesTensor.create(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 139, in create
return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\qbits.py", line 79, in init
data_packed = MarlinF8PackedTensor.pack(data) # pack fp8 data to in32, and apply marlier re-ordering.
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\packed.py", line 183, in pack
data_int32 = torch.ops.quanto.pack_fp8_marlin(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch_ops.py", line 1116, in call
return self.op(*args, **(kwargs or {}))
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda_init
.py", line 167, in gptq_marlin_repack
return ext.lib.gptq_marlin_repack(b_q_weight, perm, size_k, size_n, num_bits)
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\extension.py", line 44, in lib
self.lib = load(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1314, in load
return _jit_compile(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1721, in _jit_compile
_write_ninja_file_and_build_library(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1833, in _write_ninja_file_and_build_library
_run_ninja_build(
File "C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2120, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'quanto_cuda': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\TH -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
FAILED: gemm_cuda.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\TH -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IC:\Users\aiqqq\AppData\Local\Programs\Python\Python310\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(95): error: identifier "asm" is undefined
asm volatile(
^

C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(98): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^

C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(104): error: identifier "asm" is undefined
asm volatile(
^

C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(107): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^

C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(126): error: identifier "asm" is undefined
asm volatile(
^

C:\Users\aiqqq\AppData\Local\Programs\Python\Python310\lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(129): error: expected a ")"
: "=f"(((float *)C_warp)[0]), "=f"(((float *)C_warp)[1]), "=f"(((float *)C_warp)[2]), "=f"(((float *)C_warp)[3])
^

6 errors detected in the compilation of "C:/Users/aiqqq/AppData/Local/Programs/Python/Python310/lib/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu".
gemm_cuda.cu
ninja: build stopped: subcommand failed.

How can I fix this?
Device: RTX 4090
System: Windows 11
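For context, a minimal sketch of the kind of code that triggers this JIT build (an assumption on my side: qfloat8 weights via optimum-quanto, which route through the Marlin FP8 CUDA kernel; this is not the exact tool.py):

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
quantize(pipe.transformer, weights=qfloat8)  # quantize the transformer weights to fp8
freeze(pipe.transformer)
pipe.to("cuda")  # first move to CUDA JIT-compiles the 'quanto_cuda' extension
```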

@difsch2077
Author

difsch2077 commented Dec 15, 2024

I also get the error after setting os.environ['TORCH_CUDA_ARCH_LIST'] = '8.9' or os.environ['TORCH_CUDA_ARCH_LIST'] = '6.0;6.1;7.0;7.5;8.0;8.6;8.9'.

GPU: RTX 4090
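For reference, a minimal sketch of how the variable would be set (assuming it is set before torch triggers any extension build; '8.9' is the RTX 4090's Ada sm_89 architecture):

```python
import os

# Must be set before torch.utils.cpp_extension starts compiling anything.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.9"

import torch  # noqa: E402 -- imported after the environment variable is set
```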

@dacorvo
Collaborator

dacorvo commented Dec 16, 2024

@difsch2077 To be honest, I have never tested the quanto extensions on Windows.

@difsch2077
Author

OK, but this is not mentioned in the README.md.

@chenxf85

Yeah, how do we solve this problem? I've run into it too.

@KMiNT21

KMiNT21 commented Dec 28, 2024

As I understand it, optimum-quanto attempts to compile its files for CUDA when running on Windows. Specifically, the file optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu.

For this, the NVIDIA CUDA compiler, nvcc.exe, is used. That compiler relies on a host compiler, and on Windows that means cl.exe (the Microsoft C/C++ Optimizing Compiler) only.

However, the file gemm_cuda.cu contains assembly syntax with __asm__, which is not supported by cl.exe.

From what I’ve gathered, there doesn’t seem to be a way to force nvcc.exe to use anything other than cl.exe.

It might be possible to compile everything with GCC on Windows and then "feed" it to nvcc, but I'm not sure that would work (highly unlikely).

Perhaps someone has a more specific solution to this tricky issue?
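In the meantime, a quick way to check which toolchain nvcc will pick up (a hypothetical diagnostic, not taken from the logs above):

```python
import shutil
import subprocess

# On Windows, nvcc delegates all host-side compilation to MSVC's cl.exe,
# so both binaries must be discoverable on PATH.
for tool in ("nvcc", "cl"):
    print(f"{tool}: {shutil.which(tool) or 'NOT FOUND'}")

if shutil.which("nvcc"):
    print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```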

@nitinmukesh

nitinmukesh commented Jan 6, 2025

Facing the same problem: huggingface/diffusers#10467

Please confirm: is this library not supposed to work on Windows, or is there just no solution available for Windows yet?

https://learn.microsoft.com/en-us/cpp/assembler/inline/inline-assembler?view=msvc-170

@dacorvo
Collaborator

dacorvo commented Jan 6, 2025

quanto can work on Windows, but all the custom kernels would need to be disabled. The custom kernels are pulled in by this file; by checking the system type, it should be possible to avoid importing the custom extensions.
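A minimal sketch of that idea (hypothetical wiring; the actual change is in #365):

```python
import importlib
import platform

def load_custom_extensions():
    """Import quanto's compiled kernels only on platforms where they build."""
    if platform.system() == "Windows":
        return None  # fall back to the pure-PyTorch implementations of each op
    # Module path taken from the traceback above; the real guard lives in
    # optimum-quanto's extensions package.
    return importlib.import_module("optimum.quanto.library.extensions")

ext = load_custom_extensions()
```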

@nitinmukesh

:(
That would be difficult for a non-developer. If possible, please provide the changes and I can test them.

@dacorvo
Collaborator

dacorvo commented Jan 6, 2025

#365 should fix the issue. You can test using the corresponding branch.

@nitinmukesh

Thank you @dacorvo

I will try and post results.

@nitinmukesh

@dacorvo

I am getting another error after installing from the PR branch:

pip install git+https://github.com/huggingface/optimum-quanto.git@refs/pull/365/head

Sample code here: huggingface/diffusers#10467

(venv) C:\ai1\diffuser_t2i>python FLUX_FP8.py
Downloading shards: 100%|███████████████████████████████████████████████████| 2/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:00<00:00,  3.08it/s]
Loading pipeline components...:  80%|████████████████████████▊      | 4/5 [00:01<00:00,  2.13it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:02<00:00,  2.38it/s]
Traceback (most recent call last):
  File "C:\ai1\diffuser_t2i\FLUX_FP8.py", line 24, in <module>
    image = pipe(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\diffusers\pipelines\flux\pipeline_flux.py", line 783, in __call__
    ) = self.encode_prompt(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\diffusers\pipelines\flux\pipeline_flux.py", line 370, in encode_prompt
    prompt_embeds = self._get_t5_prompt_embeds(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\diffusers\pipelines\flux\pipeline_flux.py", line 256, in _get_t5_prompt_embeds
    prompt_embeds = self.text_encoder_2(text_input_ids.to(device), output_hidden_states=False)[0]
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 708, in pre_forward
    module.to(self.execution_device)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\transformers\modeling_utils.py", line 3164, in to
    return super().to(*args, **kwargs)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 272, in __torch_function__
    return func(*args, **kwargs)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 298, in __torch_dispatch__
    return WeightQBytesTensor.create(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 139, in create
    return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\qbits.py", line 79, in __init__
    data_packed = MarlinF8PackedTensor.pack(data)  # pack fp8 data to in32, and apply marlier re-ordering.
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\packed.py", line 183, in pack
    data_int32 = torch.ops.quanto.pack_fp8_marlin(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\torch\_ops.py", line 1225, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'quanto' object has no attribute 'pack_fp8_marlin'
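One way to check whether the CUDA-only ops were registered at all (a hypothetical diagnostic; pack_fp8_marlin only exists when the compiled extension loads):

```python
import torch
import optimum.quanto  # importing registers whichever quanto ops are available

# With the custom kernels disabled on Windows, CUDA-only ops are absent.
print(hasattr(torch.ops.quanto, "pack_fp8_marlin"))
```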

@dacorvo
Collaborator

dacorvo commented Jan 10, 2025

@nitinmukesh thank you for testing this: I pushed a fix for that error too (sorry). Let me know the result if you have the time to test with that change.

@nitinmukesh

nitinmukesh commented Jan 10, 2025

@dacorvo

No issues at all. Yes, I will test it.
Please let me know if I should reinstall using

pip install git+https://github.com/huggingface/optimum-quanto.git@refs/pull/365/head

or another pull request.

[EDIT] Found the answer: https://github.com/huggingface/optimum-quanto/pull/366/files

@dacorvo
Collaborator

dacorvo commented Jan 10, 2025

Just the same pull request.

@nitinmukesh

I merged both PRs, #365 and #366, locally and it's working fine on Windows 11.
Thank you.

@nitinmukesh

@difsch2077

Please verify on your end as well, and then this issue can be closed.
