Clarification on training inputs and outputs #12

@DocDriven

The HLD is helpful, but what I still find hard to understand is how you actually train the model, i.e. what the inputs and outputs that are used to calculate the loss actually look like.

Following the diagram and assuming one input of each feature type (binary, numeric, categorical), you have three inputs that look like this:

[ 1      // binary
  44     // numeric
  9 ]    // categorical

These are the inputs to your model, which are first transformed and concatenated into a vector with 8 features, like:

[ 1        // binary
  0.22     // rescaled
  0.2      // rest is random embedding of emb size 6
 -0.12
  0.65
  0.11
 -0.96
 -1.01 ]
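
To make this concrete for myself, here is a minimal sketch of how I picture the transformation (PyTorch, the category count, and the min/max values of 0 and 200 are my assumptions, chosen only to reproduce the numbers above):

import torch
import torch.nn as nn

# Hypothetical sizes, chosen only to reproduce the vector above.
NUM_CATEGORIES = 20  # assumed cardinality of the categorical feature
EMB_DIM = 6          # embedding size from the example

embedding = nn.Embedding(NUM_CATEGORIES, EMB_DIM)  # randomly initialised

binary = torch.tensor([1.0])                 # passed through as-is
numeric = torch.tensor([44.0])
rescaled = (numeric - 0.0) / (200.0 - 0.0)   # assumed min-max scaling -> 0.22
categorical = torch.tensor([9])              # integer category id
emb = embedding(categorical).squeeze(0)      # 6 random values before training

x = torch.cat([binary, rescaled, emb])       # the concatenated 8-feature vector
print(x.shape)                               # torch.Size([8])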

As I understand it, you use an autoencoder to reconstruct the concatenated layer, i.e. you minimize the reconstruction error between the concatenated layer and the output layer, both with 8 features. But doesn't this completely ignore the training of the embedding?
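
To show where my confusion comes from, here is a minimal sketch of the training step as I currently picture it (continuing the snippet above; again, PyTorch, the layer sizes, and the plain MSE loss are my assumptions, not something I found in the HLD):

# A tiny autoencoder over the 8 concatenated features.
autoencoder = nn.Sequential(
    nn.Linear(8, 4),
    nn.ReLU(),
    nn.Linear(4, 8),
)
optimizer = torch.optim.Adam(
    list(autoencoder.parameters()) + list(embedding.parameters()), lr=1e-3
)

optimizer.zero_grad()
x = torch.cat([binary, rescaled, embedding(categorical).squeeze(0)])
x_hat = autoencoder(x)
loss = nn.functional.mse_loss(x_hat, x)  # reconstruct the concatenated vector
loss.backward()                          # gradients also reach the embedding
optimizer.step()

In this setup the embedding weights also define the reconstruction target, which is why I do not see how they would be trained in a meaningful way.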

Would you be so kind as to give a simple example of which errors you minimize between which inputs and outputs? (Concrete values are also fine!)

Thanks a lot!
