[Proposal] Plan-level world model simulator for AssetOpsBench
Hi @DhavalRepo18 and AssetOpsBench team,
I am opening this issue to share a simulator I am currently building for AssetOpsBench. The main idea is to extend SPIN-style plan evaluation by treating the simulator not only as a tool emulator but also as a plan-level world model: it virtually executes an agent's plan before real execution, tracks intermediate states, and estimates failure risk.
Background
In the current Plan-Execute workflow, the agent first generates a plan, then resolves tool arguments, calls MCP tools, and finally summarizes the results. Some failures only become visible after execution has started, for example incorrect argument grounding, missing upstream information, or errors that propagate from an early step to later steps. A simulator could help assess these risks before real execution.
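To make two of these failure modes concrete, here is a minimal sketch of a dry-run check that flags missing upstream steps and ungrounded arguments without calling any tool. The `PlanStep` shape, the `dry_run_check` helper, and the tool names (`list_sensors`, `get_history`) are hypothetical and only for illustration; they are not the actual AssetOpsBench plan format or MCP tool set.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    # Hypothetical plan-step shape, assumed for this sketch only.
    step_id: str
    tool: str
    args: dict
    depends_on: list = field(default_factory=list)


def dry_run_check(plan, required_params):
    """Return (step_id, problem) pairs found without executing any tool."""
    problems = []
    completed = set()
    for step in plan:
        # Missing upstream information: a dependency that no earlier step produces.
        for dep in step.depends_on:
            if dep not in completed:
                problems.append((step.step_id, f"missing upstream step {dep}"))
        # Incorrect argument grounding: a required parameter with no value.
        for param in required_params.get(step.tool, []):
            if param not in step.args:
                problems.append((step.step_id, f"ungrounded argument {param}"))
        completed.add(step.step_id)
    return problems


# Illustrative plan: step s2 depends on a non-existent step and lacks sensor_id.
plan = [
    PlanStep("s1", "list_sensors", {"asset_id": "asset-1"}),
    PlanStep("s2", "get_history", {"asset_id": "asset-1"}, depends_on=["s1", "s0"]),
]
required = {"list_sensors": ["asset_id"], "get_history": ["asset_id", "sensor_id"]}
issues = dry_run_check(plan, required)
```

A check like this only covers structural problems; estimating how an early error propagates to later steps would need the state tracking described in the proposal.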
Proposed contribution
I am currently designing an AssetOpsBench simulator and the supporting database it needs. The database is intended to hold two kinds of information: tool metadata (server names, tool names, descriptions, and parameter schemas) and domain state (sites, assets, sensors, work orders, events, and other simulated environment states).
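As a rough illustration of how these two kinds of information could sit side by side, the sketch below uses in-memory dataclasses. All class, field, and tool names here (`ToolSpec`, `WorldState`, `SimulatorDB`, the `iot` server, `get_history`) are assumptions made for the example, not an existing AssetOpsBench schema; the real design may well use a proper database instead.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    # Tool metadata: one entry per MCP tool (field names are assumed).
    server: str
    name: str
    description: str
    parameters: dict  # parameter name -> type name, e.g. {"asset_id": "string"}


@dataclass
class WorldState:
    # Domain state: simulated environment entities keyed by their IDs.
    sites: dict = field(default_factory=dict)
    assets: dict = field(default_factory=dict)
    sensors: dict = field(default_factory=dict)
    work_orders: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


@dataclass
class SimulatorDB:
    tools: dict = field(default_factory=dict)  # (server, name) -> ToolSpec
    state: WorldState = field(default_factory=WorldState)

    def register_tool(self, spec):
        self.tools[(spec.server, spec.name)] = spec

    def lookup_tool(self, server, name):
        # Returns None when the plan references an unknown tool.
        return self.tools.get((server, name))


# Illustrative usage with made-up entries.
db = SimulatorDB()
db.register_tool(ToolSpec("iot", "get_history", "Fetch sensor history",
                          {"asset_id": "string", "sensor_id": "string"}))
db.state.assets["asset-1"] = {"site": "site-1"}
```

Keeping tool metadata and domain state in one object lets a simulated step validate its arguments against the schema and then read or update the environment in the same place.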
The goal is to use this simulator to take planner outputs, virtually instantiate possible execution trajectories, estimate intermediate states and failure risks, and support decisions such as whether to execute the plan, request replanning, or insert validation steps.
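The decision step could look something like the sketch below, which combines per-step failure estimates into a plan-level risk and maps it to one of the three outcomes. The thresholds, the function names, and especially the assumption that steps fail independently are placeholders chosen for this example; modeling error propagation between steps would need something richer.

```python
def plan_failure_risk(step_risks):
    """Probability that at least one step fails, assuming independent steps."""
    survive = 1.0
    for p in step_risks:
        survive *= 1.0 - p
    return 1.0 - survive


def decide(step_risks, execute_below=0.2, replan_above=0.6):
    """Map the estimated plan risk to an action (thresholds are illustrative)."""
    risk = plan_failure_risk(step_risks)
    if risk < execute_below:
        return "execute", risk
    if risk > replan_above:
        return "replan", risk
    # Intermediate risk: keep the plan but add validation steps.
    return "insert_validation", risk


# Illustrative per-step failure estimates for three candidate plans.
low = decide([0.05, 0.05])
medium = decide([0.2, 0.2])
high = decide([0.5, 0.5])
```

In practice the per-step estimates would come from the simulated trajectories rather than being supplied by hand.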