Experiment with contextual embeddings based on Transformer architectures. #5

Description

@dhonza

Perform initial experiments with contextual log line embeddings.

Our current embedding is based on aggregating (averaging) per-token fastText embeddings. Contextual embeddings are expected to improve performance on the downstream task, as they have across NLP.
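
For reference, the current baseline amounts to roughly the following sketch; the model path `log_tokens.bin` and whitespace tokenization are placeholder assumptions, not necessarily our actual pipeline:

```python
# Baseline sketch: average per-token fastText vectors for one log line.
import numpy as np
import fasttext

model = fasttext.load_model("log_tokens.bin")  # hypothetical model path

def embed_log_line(line: str) -> np.ndarray:
    tokens = line.split()  # real pipeline may use a log-specific tokenizer
    if not tokens:
        return np.zeros(model.get_dimension())
    return np.mean([model.get_word_vector(t) for t in tokens], axis=0)

print(embed_log_line("ERROR connection to node-7 timed out").shape)
```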

  • start with pre-trained BERT-like Transformer models (https://huggingface.co/, https://www.sbert.net/, https://simpletransformers.ai/), then:
    • continue with unsupervised pretraining using objectives such as masked language modeling (MLM) or next sentence prediction (NSP)
    • fine-tune on labeled log data
  • analyze the embeddings (clustering, t-SNE visualizations, ...)
  • add to the LAD benchmark suite and compare with other methods

Rough sketches of these steps follow below.
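
For the first step, encoding log lines with an off-the-shelf Sentence-Transformers model could look like this minimal sketch; `all-MiniLM-L6-v2` is just a common default, not a model we have settled on:

```python
# Sketch: contextual log-line embeddings from a pre-trained SBERT model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder choice
log_lines = [
    "ERROR connection to node-7 timed out",
    "INFO user admin logged in from 10.0.0.3",
]
embeddings = model.encode(log_lines)  # numpy array, (num_lines, 384) here
print(embeddings.shape)
```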
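Continued unsupervised pretraining with an MLM objective could be set up roughly as below with Hugging Face Transformers; the `bert-base-uncased` checkpoint, the `logs.txt` file, and all hyperparameters are assumptions for illustration:

```python
# Sketch: continued MLM pretraining on unlabeled log lines.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# One raw log line per row in logs.txt (hypothetical file).
dataset = load_dataset("text", data_files={"train": "logs.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-logs", num_train_epochs=1),
    train_dataset=dataset,
    # Dynamically masks 15% of tokens and pads each batch.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
trainer.save_model("mlm-logs")  # reused by the fine-tuning sketch below
```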
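Fine-tuning on labeled log data, framed here as binary anomaly classification; the label scheme, file name, and two-class setup are assumptions about the task, not its actual definition:

```python
# Sketch: fine-tune the MLM-adapted encoder on labeled log lines.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Loads the checkpoint saved by the MLM sketch; a fresh classification
# head is initialized on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained("mlm-logs", num_labels=2)

# Expects a CSV with "text" and "label" (0 = normal, 1 = anomalous) columns.
dataset = load_dataset("csv", data_files={"train": "labeled_logs.csv"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf-logs", num_train_epochs=3),
    train_dataset=dataset,
    tokenizer=tokenizer,  # enables dynamic padding of each batch
)
trainer.train()
```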
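For the analysis step, a quick k-means plus t-SNE look at the embedding space; the random array below stands in for real line embeddings, and the cluster count and perplexity are arbitrary placeholders:

```python
# Sketch: cluster the embeddings and project them to 2D with t-SNE.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Placeholder standing in for real (num_lines, dim) line embeddings.
embeddings = np.random.default_rng(0).normal(size=(200, 384))

labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(embeddings)
points = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)

plt.scatter(points[:, 0], points[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of contextual log-line embeddings")
plt.savefig("tsne_log_embeddings.png")
```

Clusters that align with log templates or anomaly labels would be a first qualitative signal that the contextual embeddings capture more structure than the fastText baseline.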
