
Conversation

@shubhamchandak94
Contributor

Issue #, if available:

None, although closely related to aws-neuron/aws-neuron-sdk#1156.

Description of changes:

The existing dropout implementation in the flash attention backward kernel had a couple of issues:

  1. It used the post-dropout softmax_y when computing softmax_dx_local.
  2. It did not apply dropout to softmax_dy before using it to compute softmax_dx_local (which is subsequently used to compute dq and dk).

The CR updates the implementation to match the reference pseudocode in https://arxiv.org/pdf/2205.14135 (Section B.4, Algorithm 4); a sketch of the corrected ordering follows.
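
For concreteness, here is a minimal NumPy sketch (not NKI code) of the corrected softmax backward with dropout, following Algorithm 4 of the paper. The names softmax_y, softmax_dy, and softmax_dx_local mirror the kernel's variables; dropout_mask and dropout_p are assumed inputs describing the forward dropout, introduced only for illustration.

```python
import numpy as np

def softmax_backward_with_dropout(softmax_y, softmax_dy, dropout_mask, dropout_p):
    """Reference softmax backward with dropout (assumed shapes: all (n, m)).

    softmax_y    : pre-dropout softmax output P
    softmax_dy   : gradient w.r.t. the post-dropout output, i.e. dO @ V.T
    dropout_mask : 0/1 keep mask used in the forward pass
    dropout_p    : dropout probability
    """
    # Fix 2: back-propagate through dropout first, so the softmax backward
    # sees the gradient w.r.t. the pre-dropout probabilities.
    dy_pre_dropout = softmax_dy * dropout_mask / (1.0 - dropout_p)

    # Fix 1: the softmax Jacobian uses the pre-dropout softmax_y,
    # not the dropped-out values.
    row_dot = np.sum(softmax_y * dy_pre_dropout, axis=-1, keepdims=True)
    softmax_dx_local = softmax_y * (dy_pre_dropout - row_dot)
    return softmax_dx_local
```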

Testing:

Please see detailed unit test requirements in the CONTRIBUTING.md

  • The change is covered by numeric check using nki.baremetal
  • The change is covered by performance benchmark test using nki.benchmark
  • The change is covered by end-to-end integration test

I tested locally against a golden function to verify that the output is accurate and that performance is as expected, both with and without dropout; the shape of that golden check is sketched below.
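
For reference, a minimal single-head NumPy sketch of the kind of golden backward used for that comparison, assuming the dropout mask (or seed) used by the kernel can be reproduced on the host. Softmax scaling and attention masking are omitted for brevity, and the function name is illustrative, not a symbol in this repository.

```python
import numpy as np

def flash_attn_bwd_golden(q, k, v, do, dropout_mask, dropout_p):
    """Dense reference backward for attention with dropout (illustrative only)."""
    # Forward pieces needed by the backward pass.
    s = q @ k.T
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    p_dropped = p * dropout_mask / (1.0 - dropout_p)

    # Backward, following Algorithm 4: dv uses the dropped probabilities,
    # while the softmax backward uses the pre-dropout p and the
    # dropout-corrected gradient dp.
    dv = p_dropped.T @ do
    dp = (do @ v.T) * dropout_mask / (1.0 - dropout_p)
    ds = p * (dp - np.sum(p * dp, axis=-1, keepdims=True))
    dq = ds @ k
    dk = ds.T @ q
    return dq, dk, dv
```

The kernel outputs can then be compared against these references with np.allclose at a tolerance appropriate for the kernel's precision.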

Pull Request Checklist

  • I have filled in all the required fields in the template
  • I have tested locally that all the tests pass
  • By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.
