
Fused-RoPE Attention with q_offset and k_offset #701

Open
qiyuxinlin opened this issue Dec 26, 2024 · 1 comment

Comments

@qiyuxinlin

As mentioned in your first article, I have a requirement to modify the KV cache. The best way to do this is with your Fused-RoPE Attention, but it can currently only be applied when the positions of Q and K are sequential. Studying your code, I noticed that prefill.cuh already contains interfaces for passing the positions of Q and K, and when I run them with a single batch they work correctly. I would like to ask whether these two interfaces are left unexposed because of any particular bug.
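For clarity, below is a minimal PyTorch sketch (not FlashInfer's API) of what "RoPE with explicit q_offset / k_offset" means: rotary embedding applied to Q and K at arbitrary, possibly non-sequential absolute positions before attention, rather than assuming positions 0..seq_len-1. The function name apply_rope and the interleaved even/odd layout are illustrative assumptions.

```python
import torch

def apply_rope(x: torch.Tensor, positions: torch.Tensor, rope_theta: float = 1e4) -> torch.Tensor:
    """Apply rotary position embedding to x at the given absolute positions.

    x:         [seq_len, num_heads, head_dim] with head_dim even
    positions: [seq_len], absolute position of each token (need not be contiguous)
    """
    head_dim = x.shape[-1]
    # Per-dimension rotation frequencies (interleaved RoPE layout, assumed here).
    inv_freq = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = positions.float()[:, None] * inv_freq[None, :]          # [seq_len, head_dim/2]
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]    # broadcast over heads
    x1, x2 = x[..., 0::2].float(), x[..., 1::2].float()
    out = torch.empty_like(x, dtype=torch.float32)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out.to(x.dtype)

# Queries start at an arbitrary offset (q_offset) and the keys sit at their own,
# non-sequential positions (k_offset), e.g. after the KV cache has been rearranged.
q = torch.randn(4, 8, 64)
k = torch.randn(16, 8, 64)
q_pos = torch.arange(100, 104)      # q_offset = 100
k_pos = torch.arange(0, 16) * 2     # non-contiguous key positions
q_rot, k_rot = apply_rope(q, q_pos), apply_rope(k, k_pos)
```

The fused kernel would perform this rotation inside attention itself; the point of the request is only to expose the per-query and per-key position arguments that make the positions non-sequential.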

@yzh119
Collaborator

yzh119 commented Dec 26, 2024

whether these two interfaces are left unexposed because of any particular bug.

It's only because I haven't had time to work on that... MLC-LLM uses the C++ APIs, but we haven't exposed them in Python.
We welcome contributions from the community :)

Added to roadmap: #675
