As you mentioned in your first article, I now need to modify the KV cache. The best way to do this is with your Fused-RoPE Attention, but it can only be applied when the positions of Q and K are sequential. While studying the code in prefill.cuh, I noticed that you actually left interfaces for passing in the positions of Q and K. When I run it with a batch size of 1, it works correctly. Is there a particular reason, such as a known bug, that these two interfaces are not exposed?
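To illustrate why explicit positions matter once the KV cache is modified, here is a minimal pure-Python sketch. It assumes the standard RoPE formulation with interleaved dimension pairs and base 10000. The helpers `rope_rotate` and `rope_scores` and the eviction scenario are hypothetical illustrations, not the kernel or API in prefill.cuh. After two cached tokens are evicted, the surviving keys keep their original positions (0, 2, 4, 5). A kernel that assumes sequential positions would instead rotate them as 0, 1, 2, 3, which changes the attention scores.

```python
import math
from typing import List, Sequence


def rope_rotate(vec: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * base**(-2i/d).

    Assumes the interleaved-pair RoPE convention (an illustrative choice)."""
    d = len(vec)
    assert d % 2 == 0, "head dimension must be even"
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def rope_scores(q: Sequence[float], q_pos: int,
                keys: Sequence[Sequence[float]], k_pos: Sequence[int]) -> List[float]:
    """Unscaled dot-product scores of one query against keys, each rotated at its own position."""
    q_rot = rope_rotate(q, q_pos)
    return [sum(a * b for a, b in zip(q_rot, rope_rotate(k, p)))
            for k, p in zip(keys, k_pos)]


# Hypothetical cache: 6 tokens originally at positions 0..5; tokens 1 and 3 were evicted.
q = [0.3, -1.2, 0.8, 0.5]
keys = [
    [1.0, 0.2, -0.4, 0.7],
    [0.1, 0.9, 0.3, -0.6],
    [-0.5, 0.4, 1.1, 0.2],
    [0.6, -0.3, 0.2, 0.9],
]
kept_positions = [0, 2, 4, 5]                  # true positions of the surviving keys
sequential_positions = list(range(len(keys)))  # what a "positions are contiguous" kernel assumes
query_pos = 6                                  # the new token being decoded

scores_true = rope_scores(q, query_pos, keys, kept_positions)
scores_seq = rope_scores(q, query_pos, keys, sequential_positions)
print("explicit positions:  ", scores_true)
print("sequential positions:", scores_seq)
```

The first key sits at position 0 in both cases, so its score matches; the others differ. Because RoPE scores depend only on the offset between the query and key positions, passing explicit per-token positions is what keeps a fused kernel correct after the cache is compacted.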
> if these two interfaces are not exposed for any particular bug reasons.
It's only because I haven't had time to work on it. MLC-LLM uses the C++ APIs, but we haven't exposed them in Python yet.
We welcome contributions from the community :)