Skip to content
This repository was archived by the owner on Jun 5, 2025. It is now read-only.
This repository was archived by the owner on Jun 5, 2025. It is now read-only.

Command embedding #730

@poppysec

Description

@poppysec

Aim: Explore a basic prototype for creating a vector database of malicious shell commands to compare LLM system commands to.

  1. Create vector database file
  2. Selective normalisation for file paths, usernames, variables etc
  3. Create embeddings of normalised command
  4. Read Promptwright JSONL dataset, input embeddings to DB
  5. Use some similarity metric to compare an input to the entries in the DB
  6. Test with known malicious and known benign cases

To do:

  • Look into specialised embeddings models for dealing with code rather than generic text
  • Consider options for normalisation
  • Consider options for similarity metric use
  • How to deal with the malicious command being surrounded by benign commands

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions