Simulator for plan evaluation #339

[Proposal] Plan-level world-model simulator for AssetOpsBench

Hi @DhavalRepo18 and the AssetOpsBench team,

I am opening this issue to share that I am working on a simulator for AssetOpsBench. The main idea is to extend [SPIN](https://arxiv.org/abs/2605.14051)-style plan evaluation by treating the simulator not only as a tool emulator but as a plan-level world model: one that virtually executes an agent's plan before real execution, tracks intermediate states, and estimates failure risk.

## Background

In the current Plan-Execute workflow, the agent first generates a plan, then resolves tool arguments, calls MCP tools, and finally summarizes the results. Some failures only become visible after execution has started, for example incorrect argument grounding, missing upstream information, or an error in an early step propagating to later steps. A simulator could surface these risks before real execution.
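As an illustration, the first two failure types can partly be caught by a cheap static pass before any tool is called; error propagation needs simulated execution instead. The sketch below assumes a hypothetical plan format in which each step names a tool and passes either literal arguments or `$<step_id>` references to an earlier step's output. `PlanStep`, `find_plan_issues`, and the tool names are illustrative only, not part of AssetOpsBench.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    step_id: int
    tool: str
    # Literal values, or "$<step_id>" to use an earlier step's output (hypothetical convention).
    args: dict[str, str] = field(default_factory=dict)


def find_plan_issues(plan: list[PlanStep], required_params: dict[str, set[str]]) -> list[str]:
    """Report issues detectable before execution: unknown tools, missing required
    arguments, and references to outputs that no earlier step produces."""
    issues: list[str] = []
    completed: set[int] = set()  # ids of steps that would already have run
    for step in plan:
        required = required_params.get(step.tool)
        if required is None:
            issues.append(f"step {step.step_id}: unknown tool '{step.tool}'")
        else:
            for name in sorted(required - set(step.args)):
                issues.append(f"step {step.step_id}: missing required argument '{name}'")
        for name, value in step.args.items():
            if not value.startswith("$"):
                continue  # literal argument, nothing to resolve
            ref = value[1:]
            if not ref.isdigit() or int(ref) not in completed:
                issues.append(
                    f"step {step.step_id}: argument '{name}' depends on '{value}', "
                    "which no earlier step produces"
                )
        completed.add(step.step_id)
    return issues


if __name__ == "__main__":
    # Hypothetical tool names and required parameters.
    required = {
        "get_sensor_data": {"asset_id", "start"},
        "detect_anomaly": {"series"},
        "create_work_order": {"asset_id"},
    }
    plan = [
        PlanStep(1, "get_sensor_data", {"asset_id": "CH-1"}),
        PlanStep(2, "detect_anomaly", {"series": "$3"}),  # forward reference
        PlanStep(3, "create_work_order", {}),
    ]
    for issue in find_plan_issues(plan, required):
        print(issue)
```

Running it reports three issues: the missing `start` argument in step 1, the forward reference `$3` in step 2, and the missing `asset_id` in step 3.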

## Proposed contribution

I am designing the AssetOpsBench simulator together with the database it needs. The database is intended to hold two kinds of information: tool metadata (server names, tool names, descriptions, and parameter schemas) and domain state (sites, assets, sensors, work orders, events, and other simulated environment state).
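One possible layout for this database, sketched with Python's built-in `sqlite3` module. The table and column names, the `iot` server, and the `get_sensor_data` tool are assumptions for illustration. Parameter schemas are stored as JSON Schema text so the simulator can read off required arguments, and the domain tables let it check whether an argument such as an asset ID refers to something that exists in the simulated world.

```python
import json
import sqlite3

# Hypothetical schema: tool metadata plus a few domain-state tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tools (
    server        TEXT NOT NULL,
    name          TEXT NOT NULL,
    description   TEXT,
    params_schema TEXT NOT NULL,  -- JSON Schema describing the tool's arguments
    PRIMARY KEY (server, name)
);
CREATE TABLE IF NOT EXISTS sites (site_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE IF NOT EXISTS assets (
    asset_id TEXT PRIMARY KEY, site_id TEXT REFERENCES sites(site_id), asset_type TEXT);
CREATE TABLE IF NOT EXISTS sensors (
    sensor_id TEXT PRIMARY KEY, asset_id TEXT REFERENCES assets(asset_id), unit TEXT);
CREATE TABLE IF NOT EXISTS work_orders (
    wo_id TEXT PRIMARY KEY, asset_id TEXT REFERENCES assets(asset_id), status TEXT);
CREATE TABLE IF NOT EXISTS events (
    event_id INTEGER PRIMARY KEY, asset_id TEXT REFERENCES assets(asset_id), ts TEXT, kind TEXT);
"""


def open_sim_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # reject assets pointing at unknown sites, etc.
    conn.executescript(SCHEMA)
    return conn


def register_tool(conn: sqlite3.Connection, server: str, name: str,
                  description: str, params_schema: dict) -> None:
    conn.execute(
        "INSERT INTO tools VALUES (?, ?, ?, ?)",
        (server, name, description, json.dumps(params_schema)),
    )


def required_params(conn: sqlite3.Connection, server: str, name: str) -> set[str]:
    """Read the required argument names from the stored JSON Schema."""
    row = conn.execute(
        "SELECT params_schema FROM tools WHERE server = ? AND name = ?", (server, name)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown tool {server}/{name}")
    return set(json.loads(row[0]).get("required", []))


def asset_exists(conn: sqlite3.Connection, asset_id: str) -> bool:
    """Grounding check: does this ID refer to an asset in the simulated world?"""
    row = conn.execute("SELECT 1 FROM assets WHERE asset_id = ?", (asset_id,)).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = open_sim_db()
    register_tool(
        conn, "iot", "get_sensor_data", "Fetch sensor readings for an asset.",
        {"type": "object",
         "properties": {"asset_id": {"type": "string"}, "start": {"type": "string"}},
         "required": ["asset_id", "start"]},
    )
    conn.execute("INSERT INTO sites VALUES ('S1', 'Main plant')")
    conn.execute("INSERT INTO assets VALUES ('CH-1', 'S1', 'chiller')")
    print(sorted(required_params(conn, "iot", "get_sensor_data")))
    print(asset_exists(conn, "CH-1"), asset_exists(conn, "CH-9"))
```

Running the module prints `['asset_id', 'start']` and then `True False`. Keeping tool metadata and domain state in one store means argument grounding can be checked with a join or lookup rather than by calling the real tools.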

The goal is for the simulator to take planner outputs, virtually instantiate possible execution trajectories, estimate intermediate states and failure risks, and use these estimates to decide whether to execute the plan, request replanning, or insert validation steps.
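A minimal sketch of the trajectory-sampling and decision step, under strong simplifying assumptions: each step carries a hypothetical standalone failure probability (which the real simulator would have to estimate), failures are otherwise independent, and a failed step causes every step that depends on it to fail. Monte Carlo rollouts estimate per-step and whole-plan failure rates, which are then mapped to execute, insert-validation, or replan. `SimStep`, `rollout`, `estimate_risk`, `decide`, and the thresholds are all placeholders.

```python
import random
from dataclasses import dataclass


@dataclass
class SimStep:
    step_id: int
    fail_prob: float                  # hypothetical standalone failure estimate for this step
    depends_on: tuple[int, ...] = ()  # ids of upstream steps whose outputs this step uses


def rollout(plan: list[SimStep], rng: random.Random) -> set[int]:
    """Sample one virtual trajectory and return the ids of the steps that fail.
    A step fails on its own with probability fail_prob, or whenever an upstream step failed."""
    failed: set[int] = set()
    for step in plan:
        if any(dep in failed for dep in step.depends_on) or rng.random() < step.fail_prob:
            failed.add(step.step_id)
    return failed


def estimate_risk(plan: list[SimStep], n_samples: int = 1000,
                  seed: int = 0) -> tuple[float, dict[int, float]]:
    """Monte Carlo estimate of the whole-plan and per-step failure rates."""
    rng = random.Random(seed)
    plan_failures = 0
    step_failures = {step.step_id: 0 for step in plan}
    for _ in range(n_samples):
        failed = rollout(plan, rng)
        plan_failures += bool(failed)
        for step_id in failed:
            step_failures[step_id] += 1
    return plan_failures / n_samples, {k: v / n_samples for k, v in step_failures.items()}


def decide(plan: list[SimStep], execute_below: float = 0.1, replan_above: float = 0.5,
           n_samples: int = 1000, seed: int = 0) -> tuple[str, float, list[int]]:
    """Map estimated plan risk to execute / insert_validation / replan (placeholder thresholds)."""
    plan_risk, _ = estimate_risk(plan, n_samples, seed)
    if plan_risk < execute_below:
        return "execute", plan_risk, []
    if plan_risk > replan_above:
        return "replan", plan_risk, []
    # Moderate risk: validate right after the step with the highest standalone
    # failure probability, so its errors are caught before they propagate downstream.
    riskiest = max(plan, key=lambda step: step.fail_prob)
    return "insert_validation", plan_risk, [riskiest.step_id]


if __name__ == "__main__":
    plan = [SimStep(1, 0.25), SimStep(2, 0.02, depends_on=(1,))]
    print(estimate_risk(plan, n_samples=5000))
    print(decide(plan, n_samples=5000))
```

For this two-step plan the whole-plan failure rate should come out near 1 - 0.75 * 0.98 = 0.265, so the sketch suggests inserting a validation step after step 1. A learned world model would replace the fixed `fail_prob` values with estimates conditioned on the tracked intermediate state.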
