Add fuse_gelu_sigmoid_approximation optimization pass #2648
Rakshitha-Ireddi wants to merge 3 commits into apple:main
Conversation
This pass detects the pattern `x * sigmoid(1.702 * x)`, which is the sigmoid approximation of GELU, and fuses it into a single `gelu` op with `mode=SIGMOID_APPROXIMATION`.

Changes:
- Add optimize_gelu_sigmoid.py with the fuse_gelu_sigmoid_approximation pass
- Register the pass in __init__.py
- Add the pass to the default pass pipeline after fuse_gelu_exact
- Add comprehensive tests in test_gelu_sigmoid_pass.py
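For orientation, here is a minimal sketch of how such a fusion pass is typically written against the MIL pass APIs. This is not the PR's actual code: the structure and helpers (`register_pass`, `AbstractGraphPass`, `block_context_manager`, `mb.gelu`, `replace_uses_of_var_after_op`) mirror coremltools' existing `fuse_gelu_exact` pass, and nested blocks, dtype edge cases, and non-scalar constants are glossed over.

```python
import numpy as np

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.mil.passes.graph_pass import AbstractGraphPass
from coremltools.converters.mil.mil.passes.helper import block_context_manager
from coremltools.converters.mil.mil.passes.pass_registry import register_pass


def _try_fuse(sigmoid_op, block):
    """Match x -> mul(1.702) -> sigmoid -> mul(x) rooted at a sigmoid op."""
    inner_mul = sigmoid_op.x.op
    if inner_mul is None or inner_mul.op_type != "mul":
        return False

    # One operand of the inner mul is x, the other a scalar const 1.702.
    x, const = inner_mul.x, inner_mul.y
    if const.val is None:
        x, const = const, x
    if const.val is None or np.ndim(const.val) != 0:
        return False
    if not np.isclose(const.val, 1.702):
        return False

    # The sigmoid output must feed exactly one mul whose other operand is x.
    children = sigmoid_op.outputs[0].child_ops
    if len(children) != 1 or children[0].op_type != "mul":
        return False
    outer_mul = children[0]
    other = outer_mul.y if outer_mul.x is sigmoid_op.outputs[0] else outer_mul.x
    if other is not x:
        return False

    # Replace the three-op subgraph with one fused gelu op.
    fused = mb.gelu(x=x, mode="SIGMOID_APPROXIMATION", before_op=outer_mul)
    block.replace_uses_of_var_after_op(
        anchor_op=outer_mul, old_var=outer_mul.outputs[0], new_var=fused
    )
    block.remove_ops([inner_mul, sigmoid_op, outer_mul])
    return True


@block_context_manager
def _fuse_block(block):
    # Restart the scan after each fusion so iteration stays valid.
    changed = True
    while changed:
        changed = False
        for op in list(block.operations):
            if op.op_type == "sigmoid" and _try_fuse(op, block):
                changed = True
                break


@register_pass(namespace="common")
class fuse_gelu_sigmoid_approximation(AbstractGraphPass):
    """Fuse x * sigmoid(1.702 * x) into gelu(x, mode="SIGMOID_APPROXIMATION")."""

    def apply(self, prog):
        for f in prog.functions.values():
            _fuse_block(f)
```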
> Pattern 2: x -> mul(x) -> sigmoid(1.702 * x) -> output (less common)
It's not clear to me where this pattern is being detected.
I have removed the Pattern 2 reference from the docstring since only Pattern 1 (x -> mul(1.702) -> sigmoid -> mul(x)) is implemented.
The new unit tests for this PR are failing on CI. Please take a look.
Fixed! The issues were:
- Remove incorrect early-return check that rejected patterns where the final mul is a block output
- Change variable comparison to use == instead of .name comparison
- Remove unimplemented Pattern 2 from the docstring
- Fix expected_output_shapes to use dict format instead of set (see the sketch below)
- Remove unused imports (pytest, PASS_REGISTRY)
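For reference, here is a hypothetical test in the style of coremltools' existing pass tests, showing the dict form of `expected_output_shapes` mentioned above. Names, shapes, and the `common::` namespace are illustrative assumptions, not the PR's actual test code.

```python
import numpy as np

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.testing_utils import (
    apply_pass_and_basic_check,
    assert_model_is_valid,
    get_op_types_in_program,
)


def test_fuse_gelu_sigmoid_approximation():
    shape = (2, 4)

    @mb.program(input_specs=[mb.TensorSpec(shape=shape)])
    def prog(x):
        scaled = mb.mul(x=x, y=np.float32(1.702))  # x -> mul(1.702)
        sig = mb.sigmoid(x=scaled)                 # -> sigmoid
        return mb.mul(x=x, y=sig)                  # -> mul(x)

    prev_prog, _, block = apply_pass_and_basic_check(
        prog, "common::fuse_gelu_sigmoid_approximation"
    )
    assert get_op_types_in_program(prev_prog) == ["mul", "sigmoid", "mul"]
    assert get_op_types_in_program(prog) == ["gelu"]

    # expected_output_shapes is a dict keyed by output var name, not a set.
    assert_model_is_valid(
        prog,
        {"x": shape},
        expected_output_shapes={block.outputs[0].name: shape},
    )
```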
@TobyRoseman Could you please check and let me know what I can do about this?
@Rakshitha-Ireddi - I agree that failure is not related to your PR; I'm seeing the same failure on main.
Thanks for checking! I will hold off on changes and wait for your update on the main branch issue. |
Description
Add `fuse_gelu_sigmoid_approximation` optimization pass that detects and fuses the sigmoid approximation of the GELU activation function. The pattern `x * sigmoid(1.702 * x)` is fused into `gelu(x, mode="SIGMOID_APPROXIMATION")`.

Motivation
The sigmoid approximation of GELU is commonly used in neural networks (e.g., some transformer variants, GPT-style models). Currently, this pattern remains decomposed as `mul -> sigmoid -> mul`, missing the opportunity for backend optimizations. This pass enables the backend to recognize and potentially accelerate this activation pattern.
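As a quick sanity check on the approximation itself (not part of the PR): `x * sigmoid(1.702 * x)` tracks the exact GELU `x * Phi(x)` to within about 0.02 everywhere, which is why it is a popular drop-in during inference.

```python
import math

import numpy as np

x = np.linspace(-6.0, 6.0, 2001)
phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])
exact = x * phi                                  # GELU(x) = x * Phi(x)
approx = x * (1.0 / (1.0 + np.exp(-1.702 * x)))  # x * sigmoid(1.702 * x)
print(np.max(np.abs(exact - approx)))            # ~0.02 worst-case gap
```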
Changes

- Add `optimize_gelu_sigmoid.py` with the `fuse_gelu_sigmoid_approximation` pass
- Register the pass in `__init__.py`
- Add the pass to the default pass pipeline after `fuse_gelu_exact` (see the sketch below)
- Add comprehensive tests in `test_gelu_sigmoid_pass.py`
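The PR wires the pass into the default pipeline in pass_pipeline.py; as a rough illustration of the intended ordering, the same placement can be expressed at conversion time with the public PassPipeline API (a sketch, assuming the pass is registered as `common::fuse_gelu_sigmoid_approximation`).

```python
import coremltools as ct

# Place the new pass right after fuse_gelu_exact, mirroring
# where the PR inserts it in the default pass list.
pipeline = ct.PassPipeline.DEFAULT
idx = pipeline.passes.index("common::fuse_gelu_exact")
pipeline.insert_pass(idx + 1, "common::fuse_gelu_sigmoid_approximation")

# Then convert with: ct.convert(model, pass_pipeline=pipeline)
```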
Testing

Files Changed
coremltools/converters/mil/mil/passes/defs/optimize_gelu_sigmoid.py
coremltools/converters/mil/mil/passes/tests/test_gelu_sigmoid_pass.py
coremltools/converters/mil/mil/passes/__init__.py
coremltools/converters/mil/mil/passes/pass_pipeline.py

Contributors