[Question] How to support custom stride of paged_kv for hopper prefill attention #702

Open
jianfei-wangg opened this issue Dec 27, 2024 · 3 comments


@jianfei-wangg (Contributor)

Hello, I found that in 0.2.0.post1, paged_kv_t supports custom strides, which means we can use a page layout of [max_num_pages, num_layer, 2, page_size, num_heads, head_dim]. This works fine with the sm < 90 prefill kernels.

However, when I try the above configuration with the sm90 prefill kernel, the Hopper params struct (flashinfer/attention/hopper/params.cuh) does not have fields like stride_page or custom_strides. So, how can I use non-contiguous paged_kv for sm90 prefill?
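For concreteness, a minimal sketch of the layout I mean (torch, with illustrative sizes); the per-layer K/V views taken from such a cache are non-contiguous across pages:

import torch

# Layout: [max_num_pages, num_layer, 2, page_size, num_heads, head_dim];
# all sizes below are illustrative.
max_num_pages, num_layers, page_size, num_heads, head_dim = 128, 32, 16, 8, 128
kv_cache = torch.zeros(
    max_num_pages, num_layers, 2, page_size, num_heads, head_dim,
    dtype=torch.float16, device="cuda",
)

# Per-layer view: consecutive pages of one layer are separated by
# num_layers * 2 * page_size * num_heads * head_dim elements, so the
# view is non-contiguous and needs a custom page stride.
layer_id = 0
k_cache = kv_cache[:, layer_id, 0]  # [max_num_pages, page_size, num_heads, head_dim]
v_cache = kv_cache[:, layer_id, 1]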

@yzh119 (Collaborator) commented Dec 27, 2024

Hi @jianfei-wangg, yes, we can use non-contiguous paged_kv for sm90 prefill.

The stride_page is converted to offsets here (we assume stride_block % stride_n == 0):

sparse_indices = block_sparse_indices_to_vector_sparse_offsets(
    self._paged_kv_indices_buf,
    self._paged_kv_indptr_buf,
    self._vector_sparse_indices_buffer,  # output
    self._vector_sparse_indptr_buffer,
    self._kv_lens_buffer,
    stride_block // stride_n,
    1,  # stride_n // stride_n
    page_size,
)

The reason we preprocess these offsets instead of computing them inside the kernels is that we found the producer becomes slow when it has to compute offsets on the fly, so we want to avoid pointer arithmetic as much as possible and pre-compute the offsets.
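For illustration, a pure-PyTorch sketch of the semantics this preprocessing implements (not the actual flashinfer kernel, just a reference under the stride_block % stride_n == 0 assumption): each page index is expanded into per-token row offsets expressed in multiples of stride_n:

import torch

def vector_sparse_offsets_ref(page_indices, page_indptr, kv_lens,
                              stride_block, stride_n, page_size):
    # Token j of a request stored in page p gets row offset
    # p * (stride_block // stride_n) + (j % page_size), i.e. the offset
    # of that token's row measured in units of stride_n.
    offsets, indptr = [], [0]
    for req in range(len(page_indptr) - 1):
        pages = page_indices[page_indptr[req]:page_indptr[req + 1]]
        kv_len = int(kv_lens[req])
        pos = torch.arange(kv_len)
        token_pages = pages[pos // page_size]  # page holding each token
        offsets.append(token_pages * (stride_block // stride_n) + pos % page_size)
        indptr.append(indptr[-1] + kv_len)
    return torch.cat(offsets), torch.tensor(indptr)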

@jianfei-wangg (Contributor, Author)

@yzh119 Thanks for the prompt response; I can integrate Hopper attention into our framework now.
Besides, I wonder whether block_sparse_indices_to_vector_sparse_offsets could be moved into PrefillSM90Plan, so that we can use sm90 attention in the same way as pre-sm90 attention.

@yzh119 (Collaborator) commented Jan 1, 2025

> I wonder whether block_sparse_indices_to_vector_sparse_offsets could be moved into PrefillSM90Plan, so that we can use sm90 attention in the same way as pre-sm90 attention.

It depends on whether we use the same indices for every layer; if so, we can move the logic to plan.
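For example, a hedged sketch of that flow, assuming every layer shares one set of page indices (run_sm90_prefill and the bare buffer names are illustrative placeholders, not flashinfer's actual API):

# Hypothetical: run the index-to-offset conversion once, at plan time ...
shared_sparse_indices = block_sparse_indices_to_vector_sparse_offsets(
    paged_kv_indices_buf,
    paged_kv_indptr_buf,
    vector_sparse_indices_buffer,  # output
    vector_sparse_indptr_buffer,
    kv_lens_buffer,
    stride_block // stride_n,
    1,  # stride_n // stride_n
    page_size,
)

# ... then reuse the result for every layer's sm90 prefill call instead of
# re-running the conversion per layer. run_sm90_prefill is a placeholder
# for the per-layer attention invocation.
for layer_id in range(num_layers):
    run_sm90_prefill(q[layer_id], kv_cache[:, layer_id],
                     shared_sparse_indices, vector_sparse_indptr_buffer)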
