About Knowledge Distillation

**I plan to use a larger network to distill a smaller one, and I would like to know the theoretical upper limit on the strength of a weaker network trained on training data generated by a stronger network. For example, if I train b20c256 on the training data of b28c512nbt, will b20c256 become stronger? If so, how much stronger? Could it become stronger than b40c256? At the very least, could it become stronger than a b20c256 trained the usual way?**

About Knowledge Distillation #1180