Problem/Motivation
(Solution inspired by memvid.)
The `eat_agent_experiences` collection is designed to grow indefinitely. Over time, this will lead to:
- Increased MongoDB Costs: Storing millions of detailed experience documents, including large vector embeddings, can become expensive.
- Performance Degradation: Querying a massive "hot" collection can become slower, even with proper indexing.
While old experiences are less frequently needed for real-time decisions, they remain valuable for long-term analysis, auditing, and potential system rollbacks. We need a strategy to move this "cold" data to a cheaper, more compact storage format. The memvid project demonstrates an innovative approach using video compression that is well suited to this use case.
Proposed Solution
We will create a background process (an `ArchivingAgent` or a dedicated script) that periodically archives old experiences from the `eat_agent_experiences` collection into highly compressed video files. This process will serialize each experience document to JSON, encode the JSON into a QR code, and write that QR code as a frame in a video file. Once successfully archived, the old records will be removed from MongoDB.
This issue does not include implementing the reader for these archives; it focuses solely on the creation and cleanup process.
Implementation Details
- Create an Archive Encoder Utility:
  - Create a new utility file: `evolving_agents/memory/archive_encoder.py`.
  - Implement a function `encode_experiences_to_video(experiences: List[dict]) -> bytes` (a minimal sketch appears after this list):
    - This function will take a list of experience documents (as Python dicts).
    - It will iterate through the experiences:
      - For each experience, serialize it to a JSON string using `json.dumps(..., default=str)`.
      - Encode the JSON string into a QR code image using the `qrcode` library.
      - Convert the `PIL.Image` into an OpenCV frame (a numpy array).
    - Use `opencv-python`'s `cv2.VideoWriter` to write each frame to a video file (in practice a temporary file, since `cv2.VideoWriter` writes to a filesystem path). A widely supported codec is sufficient (e.g., MPEG-4 via the `'mp4v'` FourCC, or H.264 where the build supports it).
    - The function will return the final video file as a `bytes` object.
- Create the Archiving Script:
  - Create a new script: `scripts/archive_experiences.py`.
  - This script will contain the main logic (a sketch appears after this list):
    a. Define an `ARCHIVE_THRESHOLD_DAYS` constant (e.g., 90).
    b. Connect to MongoDB.
    c. Query for Old Experiences: Find all documents in `eat_agent_experiences` where the `timestamp` is older than the threshold.
    d. Fetch Documents: Retrieve the documents to be archived. If there are none, exit gracefully.
    e. Encode to Video: Pass the list of documents to the `encode_experiences_to_video` utility function.
    f. Save the Archive: Write the returned video bytes to a file with a timestamped name (e.g., `archives/experiences_archive_2024-08-15.mkv`) in a designated directory.
    g. Verification (Important!): As a basic check, ensure the saved file is not zero-sized.
    h. Delete from MongoDB: Once the archive file is successfully saved, use the `_id`s of the archived documents to perform a `deleteMany` operation on the `eat_agent_experiences` collection.
- Update Documentation:
  - Add a section to `docs/ARCHITECTURE.md` explaining the new cold storage and archiving process.
  - Add a new document in `docs/guides/` explaining how to run the archiving script.
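
A minimal sketch of what the encoder utility could look like, assuming `qrcode`, `opencv-python`, and `numpy` as dependencies. The frame size, FPS, temporary-file approach, and `'mp4v'` codec are illustrative choices, not requirements, and very large documents (e.g., ones containing full embeddings) may exceed single-QR capacity and need chunking, which this sketch does not handle:

```python
# evolving_agents/memory/archive_encoder.py -- sketch, not a final implementation
import json
import os
import tempfile
from typing import List

import cv2
import numpy as np
import qrcode


def encode_experiences_to_video(
    experiences: List[dict], frame_size: int = 512, fps: int = 1
) -> bytes:
    """Encode each experience document as one QR-code frame of a video, returned as bytes."""
    if not experiences:
        return b""

    # cv2.VideoWriter needs a real path, so write to a temporary file and read it back.
    tmp = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False)
    tmp.close()
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # MPEG-4; another codec could be substituted
    writer = cv2.VideoWriter(tmp.name, fourcc, fps, (frame_size, frame_size))

    try:
        for experience in experiences:
            # Serialize the document; default=str handles ObjectId and datetime values.
            payload = json.dumps(experience, default=str)

            # Render the JSON payload as a QR code (PIL image), then convert to a BGR frame.
            qr_image = qrcode.make(payload).convert("RGB")
            frame = cv2.cvtColor(np.array(qr_image), cv2.COLOR_RGB2BGR)
            frame = cv2.resize(frame, (frame_size, frame_size), interpolation=cv2.INTER_NEAREST)

            writer.write(frame)
    finally:
        writer.release()

    with open(tmp.name, "rb") as f:
        video_bytes = f.read()
    os.remove(tmp.name)
    return video_bytes
```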
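
A sketch of the script's main flow, assuming `pymongo` and the encoder utility above. The connection URI, database name, archive directory, file extension, and the assumption that `timestamp` is stored as a BSON datetime are all placeholders to adapt to the project's actual configuration:

```python
# scripts/archive_experiences.py -- sketch, not a final implementation
import os
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient

from evolving_agents.memory.archive_encoder import encode_experiences_to_video

ARCHIVE_THRESHOLD_DAYS = 90
ARCHIVE_DIR = "archives"  # placeholder output directory


def main() -> None:
    # a/b. Connect to MongoDB (URI and database name are placeholders).
    client = MongoClient(os.environ.get("MONGODB_URI", "mongodb://localhost:27017"))
    collection = client["evolving_agents"]["eat_agent_experiences"]

    # c/d. Fetch experiences older than the threshold; exit gracefully if there are none.
    cutoff = datetime.now(timezone.utc) - timedelta(days=ARCHIVE_THRESHOLD_DAYS)
    old_experiences = list(collection.find({"timestamp": {"$lt": cutoff}}))
    if not old_experiences:
        print("No experiences older than the threshold; nothing to archive.")
        return

    # e. Encode the documents into a video archive.
    video_bytes = encode_experiences_to_video(old_experiences)

    # f. Save the archive with a timestamped name.
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    archive_path = os.path.join(
        ARCHIVE_DIR, f"experiences_archive_{datetime.now(timezone.utc):%Y-%m-%d}.mp4"
    )
    with open(archive_path, "wb") as f:
        f.write(video_bytes)

    # g. Verification: never delete if the archive is missing or empty.
    if not os.path.exists(archive_path) or os.path.getsize(archive_path) == 0:
        raise RuntimeError(f"Archive {archive_path} was not written correctly; aborting cleanup.")

    # h. Delete only the documents that were actually archived.
    archived_ids = [doc["_id"] for doc in old_experiences]
    result = collection.delete_many({"_id": {"$in": archived_ids}})
    print(f"Archived {len(archived_ids)} experiences to {archive_path}; "
          f"deleted {result.deleted_count} documents.")


if __name__ == "__main__":
    main()
```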
Acceptance Criteria
- The `archive_encoder.py` utility is created and can convert a list of dictionaries into a video file (a test sketch for this criterion follows this list).
- The `scripts/archive_experiences.py` script successfully identifies and fetches old experience documents from MongoDB.
- The script correctly generates and saves a video archive file.
- After the archive is saved, the corresponding documents are deleted from the `eat_agent_experiences` collection.
- The process is robust and does not delete data if the archive creation fails.
- Project documentation is updated to reflect this new feature.