support quantized models #812

@tharvik

currently, we use float32 tensors almost everywhere, which yields pretty large models.
after discussion with @martinjaggi: training is hard to do without float32, but inference can probably use uint8 tensors, shrinking trained models by up to 4x (one byte per weight instead of four).

note: check that the model still behaves correctly after quantization
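
to make the idea concrete, here is a minimal sketch of per-tensor affine quantization from float32 to uint8, plus a weight-level sanity check. this is an illustration under assumptions, not a proposed implementation: `QuantizedTensor`, `quantize`, `dequantize` and `maxAbsError` are hypothetical names, it works on plain typed arrays rather than on any tensor library, and min/max scaling is only one of several possible schemes.

```typescript
// Hypothetical container: names are illustrative, not an existing API.
interface QuantizedTensor {
  values: Uint8Array; // one byte per weight instead of four
  scale: number; // float width of one quantization step
  min: number; // float value that maps to the integer 0
}

// Map each float32 weight onto the 256 levels spanning [min, max].
function quantize(weights: Float32Array): QuantizedTensor {
  if (weights.length === 0) {
    return { values: new Uint8Array(0), scale: 1, min: 0 };
  }
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < weights.length; i++) {
    if (weights[i] < min) min = weights[i];
    if (weights[i] > max) max = weights[i];
  }
  // A constant tensor has no range; use scale 1 to avoid dividing by zero.
  const scale = max > min ? (max - min) / 255 : 1;
  const values = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const level = Math.round((weights[i] - min) / scale);
    values[i] = Math.min(255, Math.max(0, level)); // clamp against rounding drift
  }
  return { values, scale, min };
}

// Recover approximate float32 weights for inference.
function dequantize(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.values[i] * q.scale + q.min;
  }
  return out;
}

// Largest absolute difference between two weight arrays of equal length.
function maxAbsError(a: Float32Array, b: Float32Array): number {
  let worst = 0;
  for (let i = 0; i < a.length; i++) {
    worst = Math.max(worst, Math.abs(a[i] - b[i]));
  }
  return worst;
}
```

with this scheme the round-trip error per weight is about half a step (`scale / 2`) at most, so `maxAbsError(weights, dequantize(quantize(weights)))` gives a cheap first check. it only bounds the weight error, though; checking that the model still behaves correctly would also need comparing predictions or accuracy on held-out data before and after quantization.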

Metadata

Labels: feature (New feature or request)
