3 changes: 2 additions & 1 deletion .dockerignore
@@ -3,4 +3,5 @@
**/*.pyo
Dockerfile
.git
.gitignore
.gitignore
amber/.env
2 changes: 1 addition & 1 deletion .gitignore
@@ -28,4 +28,4 @@ app-route/*.txt
app-malt/=0.26.0
app-k8s/policies/*
app-k8s/pod_deployment/*
app-k8s/test.txt
app-k8s/test.txt
55 changes: 55 additions & 0 deletions amber/amber-manifest-green.json5
@@ -0,0 +1,55 @@
{
manifest_version: "0.1.0",
program: {
image: "ghcr.io/froot-netsys/malt_agent:latest",
entrypoint: [
"uv",
"run",
"malt_agent.py",
Collaborator

@Kolleida, Apr 2, 2026

The entrypoint should run ./start_route_agent.sh instead of uv run malt_agent.py, and it should use the route_agent image (I'm assuming this is for the route app?). Also, this container needs to run with --privileged and mount /lib/modules.

Author

Thanks @Kolleida! I believe it's not possible to run an image with --privileged in amber. This is why we went with the MALT agent instead. Is there another way we can run your benchmark via amber? Please let me know what image/entrypoint/config parameters to use. Thank you!

Collaborator

@CdavM If you are running MALT, then the role should be "malt_operator", not "route_operator" (this role was mainly used by the leaderboard query to filter MALT-specific results). Also, the config should look something like this:

assessment_config: {
  prompt_type: "zeroshot_base",
  num_queries: 3,
  complexity_level: ["level1", "level2", "level3"],
  output_dir: "dump",
  output_file: "query_output.jsonl",
  benchmark_path: "assessment_queries.jsonl",
  regenerate_benchmark: true
}

This config generates 30 queries in total, spread across the 3 levels. Each increment of num_queries adds 10 more queries (you can choose how many you think are appropriate for a good signal).
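
As a rough illustration of that arithmetic, the sketch below estimates the benchmark size from num_queries. The helper name `estimate_total_queries` and the constant `QUERIES_PER_UNIT` are hypothetical. The factor of 10 is only read off the numbers in this comment (num_queries of 3 giving 30 queries), not taken from the benchmark source, so treat it as an assumption.

```python
# Assumption: each unit of num_queries yields 10 queries in total
# (inferred from "num_queries: 3 -> 30 queries" in the comment above).
QUERIES_PER_UNIT = 10


def estimate_total_queries(config: dict) -> int:
    """Estimate how many queries an assessment_config would generate."""
    num_queries = config["num_queries"]
    if not isinstance(num_queries, int) or num_queries < 1:
        raise ValueError("num_queries must be a positive integer")
    return num_queries * QUERIES_PER_UNIT


# Hand-written dict mirroring the suggested assessment_config.
suggested_config = {
    "prompt_type": "zeroshot_base",
    "num_queries": 3,
    "complexity_level": ["level1", "level2", "level3"],
}

print(estimate_total_queries(suggested_config))  # 30
```

If the real generator distributes queries differently across levels, only the constant would need to change.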

Later I saw the agentbeats version of NetArena on the website, and the description references the K8s benchmark, but you guys are doing MALT instead. Is this also because the setup needed (e.g. bootstrapping a KIND cluster) is impossible or hard to express in amber?

"--host",
"0.0.0.0",
"--port",
"8081"
],
env: {
PROXY_URL: "${slots.proxy.url}",
LOG_LEVEL: "INFO",
},
network: {
endpoints: [
{ name: "endpoint", port: 8081 },
],
},
},
config_schema: {
type: "object",
properties: {
},
required: [],
additionalProperties: false,
},
slots: {
proxy: { kind: "a2a" },
},
provides: {
a2a: { kind: "a2a", endpoint: "endpoint" },
},
exports: {
a2a: "a2a",
},
metadata: {
assessment_config: {
prompt_type: "zeroshot_base",
num_queries: 2,
max_iterations: 10,
output_dir: "dump",
benchmark_path: "assessment_error_config.json",
regenerate_benchmark: true,
num_switches: 2,
num_hosts_per_subnet: 1
},
participant_roles: [
"route_operator"
]
}
}
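
The review thread on this file points out that a manifest running the MALT agent should declare the "malt_operator" role rather than "route_operator". A minimal sketch of a pre-submit check for that kind of mismatch is shown below. The function `check_roles` and the mapping `EXPECTED_ROLE_BY_IMAGE_HINT` are hypothetical, and the pairing of the route_agent image with "route_operator" is an assumption based on the thread, not on amber documentation. The manifest is written out as a plain dict here because parsing JSON5 would need a third-party library.

```python
# Hypothetical mapping: image-name fragment -> role the reviewer expects.
EXPECTED_ROLE_BY_IMAGE_HINT = {
    "malt_agent": "malt_operator",
    "route_agent": "route_operator",  # assumption, inferred from the thread
}


def check_roles(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no mismatch found."""
    image = manifest["program"]["image"]
    roles = manifest["metadata"]["participant_roles"]
    problems = []
    for hint, role in EXPECTED_ROLE_BY_IMAGE_HINT.items():
        if hint in image and role not in roles:
            problems.append(
                f"image {image!r} suggests role {role!r}, but roles are {roles!r}"
            )
    return problems


# Only the two fields the check reads, copied from the manifest in this diff.
manifest = {
    "program": {"image": "ghcr.io/froot-netsys/malt_agent:latest"},
    "metadata": {"participant_roles": ["route_operator"]},
}

for problem in check_roles(manifest):
    print(problem)
```

Run against the values in this diff, the check reports one problem, which matches the reviewer's remark about the role.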
1 change: 1 addition & 0 deletions amber/sample.env
@@ -0,0 +1 @@
AMBER_CONFIG_OPENAI_API_KEY=