Describe the bug
The behavior observed so far is as follows:
When load_artifacts_tool or preload_memory_tool is used, the tool generates instruction text to guide the LLM.
The problem is that this instruction text contains dynamically generated content.
load_artifacts:

You have a list of artifacts: {json.dumps(artifact_names)}

preload_memory:

<PAST_CONVERSATIONS>
{full_memory_text}
</PAST_CONVERSATIONS>

This is where the problem arises.
These tools use append_instructions, which appends their results to the top-level system_prompt.
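To make the mechanics concrete, here is a small self-contained sketch of that flow (the LlmRequestStub class is a toy stand-in, not ADK's real LlmRequest, and the template strings are paraphrased from the two tools):

```python
# Toy stand-in illustrating how per-request text ends up inside the system
# instruction; LlmRequestStub is NOT ADK's LlmRequest.
import json


class LlmRequestStub:
    def __init__(self, system_instruction: str):
        self.system_instruction = system_instruction

    def append_instructions(self, instructions: list[str]) -> None:
        # Mirrors the observed behavior: appended text becomes part of the
        # top-level system instruction for this request.
        self.system_instruction += "\n\n" + "\n\n".join(instructions)


request = LlmRequestStub("You are a helpful agent.")  # the static instruction

artifact_names = ["test.txt"]             # changes whenever artifacts change
full_memory_text = "Time: ...\nuser: hi"  # grows as the conversation grows

request.append_instructions([
    f"You have a list of artifacts:\n  {json.dumps(artifact_names)}",
    f"<PAST_CONVERSATIONS>\n{full_memory_text}\n</PAST_CONVERSATIONS>",
])
print(request.system_instruction)  # dynamic content is now baked into the
                                   # system instruction sent for this turn
```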
Example:
system_instruction: "
// omitted
You have a list of artifacts:
["test.txt"]
When the user asks questions about any of the artifacts, you should call the
`load_artifacts` function to load the artifact. Always call load_artifacts
before answering questions related to the artifacts, regardless of whether the
artifacts have been loaded before. Do not depend on prior answers about the
artifacts.
The following content is from your previous conversations with the user.
They may be useful for answering the user's current query.
<PAST_CONVERSATIONS>
Time: 2025-10-20T21:15:13.686996
user: hi
</PAST_CONVERSATIONS>
"
This behavior defeats every form of prompt caching, and with explicit caching it leads to severe cost inefficiency.
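For the explicit-caching case, the usual pattern is to cache the stable system instruction once and reference it from later requests, roughly as in the sketch below (google-genai SDK; the model name, TTL, and token minimums are placeholder assumptions, check the current docs). Any per-turn text that these tools splice into the system instruction means the actual requests no longer line up with what was cached, so the cache is paid for but never usable.

```python
# Sketch of explicit caching with the google-genai SDK (illustrative values).
from google import genai
from google.genai import types

client = genai.Client()

STATIC_SYSTEM_PROMPT = "You are a helpful agent. ..."  # the stable part only

# Cache the stable system instruction once (subject to the API's minimum
# token requirements).
cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config=types.CreateCachedContentConfig(
        system_instruction=STATIC_SYSTEM_PROMPT,
        ttl="3600s",
    ),
)

# Later requests are supposed to reuse the cached prefix...
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="What is in test.txt?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)

# ...but if load_artifacts / preload_memory have appended per-turn text to the
# system instruction, the request no longer matches the cached content, so the
# cache storage is paid for while every request is billed at full price.
```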
Consistent with how both static_instruction and instruction are handled in the agent, these tool-generated prompts should instead be appended to the bottom of the prompt.
One concern is that moving them to the bottom may confuse the LLM about context. However, this could be mitigated by making the placement optional, or by emitting a warning when prompt caching is enabled together with these tools.
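To illustrate what the opt-in could look like from the caller's side (the append_to parameter below is purely hypothetical and does not exist in ADK today; it is only meant to sketch the proposal):

```python
# Hypothetical API sketch for the proposed opt-in; the append_to parameter
# does not exist in ADK today.
from google.adk.tools.load_artifacts_tool import LoadArtifactsTool
from google.adk.tools.preload_memory_tool import PreloadMemoryTool

tools = [
    # "end" (hypothetical) would place the dynamic artifact list / memory
    # block after the static portion of the prompt, keeping the cached prefix
    # stable across turns; "system_instruction" would keep today's behavior.
    LoadArtifactsTool(append_to="end"),
    PreloadMemoryTool(append_to="end"),
]
```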
To Reproduce
Use load_artifacts or preload_memory; the issue reproduces on every request.
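A minimal setup that reproduces it (sketch; exact import paths may differ between ADK versions, and the session needs some stored memory plus at least one saved artifact):

```python
# Minimal reproduction sketch; import paths follow the module names used
# above and may differ between ADK versions.
from google.adk.agents import Agent
from google.adk.tools.load_artifacts_tool import load_artifacts_tool
from google.adk.tools.preload_memory_tool import preload_memory_tool

root_agent = Agent(
    name="cache_repro_agent",
    model="gemini-2.0-flash",
    instruction="Answer the user's questions about their files.",
    tools=[load_artifacts_tool, preload_memory_tool],
)

# Run a few turns in a session that has stored memory and at least one saved
# artifact: on every turn both tools append freshly rendered text to the
# system instruction, so the prefix sent to the model differs from the
# previous turn and prompt caching never applies.
```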
I’m ready to contribute to fixing this issue.