Skip to content

feat(compilation): Extend sequence parallelism to activations#1523

Open
AlpinDale wants to merge 1 commit into
mainfrom
feat/sequence-parallel-activations
Open

feat(compilation): Extend sequence parallelism to activations#1523
AlpinDale wants to merge 1 commit into
mainfrom
feat/sequence-parallel-activations

Conversation

@AlpinDale
Copy link
Copy Markdown
Collaborator

Adds a new pattern to the sequence parallelism pass to support activations like SiLU and GELU.

This transforms "AllReduce -> Activation" into "ReduceScatter -> Activation -> AllGather", enabling further fusion opportunities.

Adds a new pattern to the sequence parallelism pass to support activations like SiLU and GELU.

This transforms "AllReduce -> Activation" into "ReduceScatter -> Activation -> AllGather", enabling further fusion opportunities.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @AlpinDale, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the compilation process by extending sequence parallelism to include activation functions. By introducing a specialized pattern for operations like SiLU and GELU, it reorders the communication and computation steps from a simple AllReduce followed by activation to a more optimized ReduceScatter, Activation, and then AllGather sequence. This strategic restructuring is designed to create new opportunities for operation fusion, ultimately leading to more efficient model execution.

Highlights

  • New Optimization Pattern: Introduced a new AllReduceActivationPattern class to extend sequence parallelism optimizations to activation functions.
  • Transformation Logic: Implemented a transformation that converts the AllReduce -> Activation sequence into ReduceScatter -> Activation -> AllGather.
  • Activation Support: Enabled this new optimization pattern for common activation functions, specifically SiLU and GELU, within the SequenceParallelismPass.
  • Performance Enhancement: The change aims to unlock further fusion opportunities and improve compilation efficiency for models utilizing these activations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new pattern to the sequence parallelism pass to handle activations like SiLU and GELU, transforming AllReduce -> Activation into ReduceScatter -> Activation -> AllGather. The implementation is solid and follows the existing structure of the file.

I have two suggestions to improve maintainability:

  1. Refactor the hardcoded list of activation functions into a module-level constant for better readability.
  2. A suggestion for a future refactoring of the helper class hierarchy to better separate concerns, as the new pattern is forced to handle a parameter (epsilon) that is irrelevant to it.

Overall, this is a great addition that enables more fusion opportunities.

device: str,
**kwargs):
# activation_op is e.g. torch.ops.aten.silu.default
super().__init__(epsilon=0.0, dtype=dtype, device=device, **kwargs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Passing epsilon=0.0 here is functionally correct, but it indicates a minor issue with the class hierarchy. AllReduceActivationPattern does not use epsilon, but it's forced to provide it because its parent class, _SequenceParallelPatternHelper, is tightly coupled with RMSNorm-specific parameters.

For better long-term maintainability, consider refactoring the helper classes. A new base class could be introduced for sequence parallelism communication helpers, which AllReduceActivationPattern could inherit from directly. The existing _SequenceParallelPatternHelper could then combine both communication and RMSNorm functionalities.

This is not a blocker for the current change but a suggestion for future code health.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant