Draft: Port hw-native flash attention example to PTODSL #449
jimmychou0 wants to merge 2 commits into
23a5955 to 8a19a6e
Zhendong404 requested changes on May 29, 2026
if CAUSAL:
    raise ValueError("causal masking is not part of the hw-native source port yet")

c0 = pto.const(0)
Collaborator
PTODSL now supports writing literals directly, so there is no need to construct them with pto.const.
qk_slot_ptr = gm_slot_buffer
pv_slot_ptr = pto.addptr(gm_slot_buffer, gm_pv_off_f32)
p_slot_ptr = pto.addptr(gm_slot_buffer_fp16, gm_p_off_f32 * 2)
Collaborator
make_tensor_view accepts an offset parameter, so there is no need to construct the ptr here.
Collaborator
Correction: I misspoke. You can construct the slice with partition_view + offset, which fits the PTO style better.
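For readers following the offset arithmetic in the hunk above, here is a minimal NumPy sketch of why the fp16 slot offset is `gm_p_off_f32 * 2`. It is an analogy only and does not use PTODSL API: the same bytes viewed as fp16 hold twice as many elements, so an offset counted in f32 elements doubles when counted in fp16 elements. The buffer size and offset values are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
SLOT_F32_ELEMS = 16   # total slot size, in float32 elements
PV_OFF_F32 = 4        # offset of the PV region, in float32 elements
P_OFF_F32 = 8         # offset of the P region, in float32 elements

# One flat backing buffer, analogous to gm_slot_buffer.
slot_f32 = np.zeros(SLOT_F32_ELEMS, dtype=np.float32)

# The same bytes reinterpreted as float16, analogous to gm_slot_buffer_fp16.
slot_f16 = slot_f32.view(np.float16)

# Slices are views: no copy, just an offset into the shared buffer.
qk_region = slot_f32[0:PV_OFF_F32]
pv_region = slot_f32[PV_OFF_F32:P_OFF_F32]
p_region = slot_f16[P_OFF_F32 * 2:]   # the offset doubles in float16 elements


def byte_offset(region, base):
    """Byte distance from the start of `base` to the start of `region`."""
    return (region.__array_interface__["data"][0]
            - base.__array_interface__["data"][0])


pv_byte_off = byte_offset(pv_region, slot_f32)
p_byte_off = byte_offset(p_region, slot_f32)
```

Expressing each region as an offset slice of one view, rather than as a separately computed pointer, keeps the element-size bookkeeping in one place.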
    blayout="RowMajor",
    slayout="ColMajor",
)
k_right = pto.alloc_tile(shape=[HEAD, CUBE_S1], dtype=pto.f16, memory_space=pto.MemorySpace.RIGHT)
Collaborator
MemorySpace can be given directly as either a string or an Enum; the string form is a bit more concise.
)
for static_row_slice in range(row_slice_count):
    with pto.if_(row_slice == pto.const(static_row_slice)) as br:
        with br.then_:
Collaborator
The current branch syntax is still too painful to use. Let's think later about whether there is a way to improve it.
}

def emit_flash_attention_mlir(
Collaborator
The function names could be renamed consistently. There is no need to call them emit_xxx; just name them after the computation they express.
@pto.simd
def finalize_and_store_output(o_tile: pto.Tile, running_sum: pto.Tile):

def emit_softmax(tile_id, ring_id, is_init):
    slot_id = tile_id % cSLOT_NUM
    with pto.for_(0, row_slice_count, step=1) as row_slice:
Collaborator
This softmax is computed one row at a time, which I don't think is very efficient. We will replace it with another version later.
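To illustrate the concern, here is a minimal NumPy sketch contrasting a per-row softmax loop with a whole-tile formulation that performs the row reductions across the tile in one pass. It is not PTODSL code and not the planned replacement version; the function names and the tile shape are made up for illustration.

```python
import numpy as np


def softmax_row_by_row(tile):
    """Numerically stable softmax, computed one row at a time."""
    out = np.empty_like(tile)
    for r in range(tile.shape[0]):
        row = tile[r]
        shifted = row - row.max()   # subtract the row max for stability
        exps = np.exp(shifted)
        out[r] = exps / exps.sum()
    return out


def softmax_whole_tile(tile):
    """Same math, with the row reductions applied to the whole tile at once."""
    shifted = tile - tile.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
scores = rng.standard_normal((8, 16)).astype(np.float32)
per_row = softmax_row_by_row(scores)
whole = softmax_whole_tile(scores)
```

Both functions return the same values up to floating-point rounding; the difference is only in how much work each step hands to the vector unit at once.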
1ff1dd8 to 75bd970
75bd970 to 74d0d1c
Summary
Scope
Validation