About Knowledge Distillation #1180

@SakuraGoH

I plan to use a larger network to distill a smaller one, and I would like to know the theoretical upper limit on the strength of a weaker network trained on training data from a stronger network. For example, if I train b20c256 on the training data of b28c512nbt, will b20c256 become stronger? If so, how much stronger will it become? Can it become stronger than b40c256? At the very least, can it become stronger than the existing b20c256?