Perform initial experiments with the contextual log line embeddings.
Our current embedding is based on averaging per-token fastText embeddings. Contextual embeddings are expected to improve performance on the downstream task, as they have for NLP tasks in general.
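For reference, a minimal sketch of the current baseline, averaging per-token fastText vectors into a fixed-size line embedding (the model path "logs_fasttext.bin" and whitespace tokenization are illustrative assumptions):

```python
import numpy as np
import fasttext

# Hypothetical fastText model trained on the log corpus.
ft_model = fasttext.load_model("logs_fasttext.bin")

def embed_log_line(line: str) -> np.ndarray:
    """Average the fastText vectors of all tokens in a log line."""
    tokens = line.strip().split()
    if not tokens:
        return np.zeros(ft_model.get_dimension())
    return np.mean([ft_model.get_word_vector(t) for t in tokens], axis=0)

print(embed_log_line("Connection to node 10 terminated unexpectedly").shape)
```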
- start with pre-trained BERT-like Transformer models (https://huggingface.co/, https://www.sbert.net/, https://simpletransformers.ai/; see the sentence-transformers sketch after this list), then:
- continue with unsupervised pretraining using objectives such as masked language modeling (MLM) or next sentence prediction (NSP); an MLM sketch is given after this list
- fine-tune on labeled log data
- analyze the embeddings (clustering, t-SNE visualizations, ...); see the analysis sketch after this list
- add to the LAD benchmark suite and compare with other methods
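A possible starting point for the first step, using sentence-transformers (https://www.sbert.net/) with a generic pre-trained checkpoint; the model name and example log lines are assumptions, not log-specific choices:

```python
from sentence_transformers import SentenceTransformer

# Generic pre-trained checkpoint; a log-specific model would replace this.
model = SentenceTransformer("all-MiniLM-L6-v2")

log_lines = [
    "Connection to node 10 terminated unexpectedly",
    "User admin logged in from 192.168.0.5",
]
embeddings = model.encode(log_lines)  # shape: (n_lines, embedding_dim)
print(embeddings.shape)
```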
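For the unsupervised pretraining step, a sketch of continued MLM pretraining on raw log lines with the Hugging Face Trainer; the checkpoint, corpus, and hyperparameters are placeholders:

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

raw_lines = ["..."]  # replace with the unlabeled log corpus

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Tokenize the log lines; short max_length since log lines are typically short.
dataset = Dataset.from_dict({"text": raw_lines}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
    batched=True, remove_columns=["text"])

# Collator that randomly masks tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="mlm-logs", num_train_epochs=1,
                         per_device_train_batch_size=32)

Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```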
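And a sketch of the analysis step: cluster the line embeddings and project them to 2D with t-SNE for visual inspection (scikit-learn and matplotlib; the embeddings file path and cluster count are arbitrary assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Hypothetical precomputed (n_lines, dim) array of log line embeddings.
embeddings = np.load("log_line_embeddings.npy")

cluster_ids = KMeans(n_clusters=5, n_init=10).fit_predict(embeddings)
points_2d = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)

plt.scatter(points_2d[:, 0], points_2d[:, 1], c=cluster_ids, s=5)
plt.title("t-SNE of log line embeddings")
plt.savefig("tsne_log_embeddings.png")
```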