gate mode diff #3

@theNiemand

Description

Would you consider supporting softplus as an alternative gate activation mode, for example via a flag like gate_mode: "lower_bound" | "softplus"?
I also compared fla's chunk_kda against FlashKDA end to end. With the lower_bound gate mode, the model's output becomes repetitive. Both fla's chunk_kda in lower_bound gate mode and FlashKDA show the same problem, whereas fla's chunk_kda in softplus gate mode works fine.
The model used is Kimi Linear (https://modelscope.cn/models/moonshotai/Kimi-Linear-48B-A3B-Instruct).
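
To make the request concrete, here is a rough sketch of what a `gate_mode` switch could look like. It is plain Python, not code from fla or FlashKDA. The function name `gate_log_decay`, the `a_log` and `lower_bound` parameters, the default of `-5.0`, and both formulas are assumptions for illustration only: softplus mode is taken to be `-exp(a_log) * softplus(x)` and lower_bound mode to be `lower_bound * sigmoid(exp(a_log) * x)`. The actual kernels may parameterize the gate differently.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate_log_decay(x: float, a_log: float, gate_mode: str = "softplus",
                   lower_bound: float = -5.0) -> float:
    """Hypothetical helper: map a raw gate input to a log-decay value <= 0.

    Both formulas are assumed for illustration, not copied from any library.
    """
    scale = math.exp(a_log)
    if gate_mode == "softplus":
        # Assumed form: unbounded below, so decay can become arbitrarily strong.
        return -scale * softplus(x)
    if gate_mode == "lower_bound":
        # Assumed form: log-decay is confined to the interval (lower_bound, 0).
        return lower_bound * sigmoid(scale * x)
    raise ValueError(f"unknown gate_mode: {gate_mode!r}")
```

Under these assumed formulas the two modes return different decay values for the same raw input, so a flag that selects between them would let the same weights be run in either mode.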
