Command embedding

**Aim**: Explore a basic prototype for creating a vector database of malicious shell commands to compare LLM system commands to.

1. Create vector database file
2. Selective normalisation for file paths, usernames, variables etc
3. Create embeddings of normalised command
5. Read Promptwright JSONL dataset, input embeddings to DB
6. Use some similarity metric to compare an input to the entries in the DB
7. Test with known malicious and known benign cases

To do:
- Look into specialised embeddings models for dealing with code rather than generic text
- Consider options for normalisation
- Consider options for similarity metric use
- How to deal with the malicious command being surrounded by benign commands

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Command embedding #730

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Command embedding #730

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions