Implement memvid-inspired Archiving for Cold Storage #142

@matiasmolinas

Problem/Motivation

(Solution inspired by memvid.)

The eat_agent_experiences collection is designed to grow indefinitely. Over time, this will lead to:

  1. Increased MongoDB Costs: Storing millions of detailed experience documents, including large vector embeddings, can become expensive.
  2. Performance Degradation: Querying a massive "hot" collection can become slower, even with proper indexing.

While old experiences are less frequently needed for real-time decisions, they remain valuable for long-term analysis, auditing, and potential system rollbacks. We need a strategy to move this "cold" data to a cheaper, more compact storage format. The memvid project offers an approach that suits this use case: it stores text as QR codes in the frames of a video file, so that standard video codecs do the compression.

Proposed Solution

We will create a background process (an ArchivingAgent or a dedicated script) that periodically archives old experiences from the eat_agent_experiences collection into highly compressed video files. This process will serialize each experience document to JSON, encode the JSON into a QR code, and write that QR code as a frame in a video file. Once successfully archived, the old records will be removed from MongoDB.

This issue does not include implementing the reader for these archives; it focuses solely on the creation and cleanup process.
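A minimal sketch of the serialization step in the pipeline above, assuming experiences arrive as plain dicts (as pymongo returns them). It shows what `json.dumps(..., default=str)` does with values JSON cannot represent natively (a `datetime` here; a bson `ObjectId` would likewise become its hex string) and checks a payload against the byte-mode capacity of the largest QR code. The helper names `serialize_experience` and `fits_in_one_qr` are hypothetical, not part of the issue.

```python
import json
from datetime import datetime

# Byte-mode capacity of a version-40 QR code (the largest size) per error-correction level.
QR_V40_BYTE_CAPACITY = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}


def serialize_experience(doc: dict) -> str:
    """Serialize one experience document; values JSON can't encode fall back to str()."""
    return json.dumps(doc, default=str)


def fits_in_one_qr(payload: str, level: str = "M") -> bool:
    """Return True if the UTF-8 payload fits in a single QR code at this level."""
    return len(payload.encode("utf-8")) <= QR_V40_BYTE_CAPACITY[level]


if __name__ == "__main__":
    small = {"_id": 1, "timestamp": datetime(2024, 8, 15, 12, 0)}
    print(serialize_experience(small))  # {"_id": 1, "timestamp": "2024-08-15 12:00:00"}

    # Hypothetical document carrying a 1,536-dimensional embedding.
    large = {"_id": 2, "embedding": [0.123456789] * 1536}
    print(fits_in_one_qr(serialize_experience(large)))  # False
```

`default=str` is one-way: a future reader would have to parse timestamps back from strings. The embedding case shows why documents with large vectors need chunking or compression before QR encoding.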

Implementation Details

  1. Create an Archive Encoder Utility:

    • Create a new utility file: evolving_agents/memory/archive_encoder.py.
    • Implement a function encode_experiences_to_video(experiences: List[dict]) -> bytes:
      • This function will take a list of experience documents (as Python dicts).
      • It will iterate through the experiences:
        • For each experience, serialize it to a JSON string using json.dumps(..., default=str).
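        • Encode the JSON string into a QR code image using the qrcode library. A single QR code holds at most 2,953 bytes in byte mode (version 40 at error-correction level L, and less at higher levels), so documents carrying large vector embeddings will not fit in one code and must be split across several frames or compressed first.
        • Convert the PIL.Image into an OpenCV frame (a BGR numpy array) and resize it to a fixed frame size, because cv2.VideoWriter requires every frame to match the dimensions it was opened with.
      • Use opencv-python's cv2.VideoWriter to write each frame to a temporary file; cv2.VideoWriter takes a file path rather than an in-memory stream. Note that the 'mp4v' FourCC selects MPEG-4 Part 2, not H.264 (H.264 is usually requested as 'avc1' and depends on the codecs available to the OpenCV build); either is sufficient here.
      • The function will read the finished file back and return it as a bytes object.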
  2. Create the Archiving Script:

    • Create a new script: scripts/archive_experiences.py.
    • This script will contain the main logic:
      a. Define an ARCHIVE_THRESHOLD_DAYS constant (e.g., 90).
      b. Connect to MongoDB.
      c. Query for Old Experiences: Find all documents in eat_agent_experiences where the timestamp is older than the threshold.
      d. Fetch Documents: Retrieve the documents to be archived. If there are none, exit gracefully.
      e. Encode to Video: Pass the list of documents to the encode_experiences_to_video utility function.
      f. Save the Archive: Write the returned video bytes to a file with a timestamped name (e.g., archives/experiences_archive_2024-08-15.mkv) in a designated directory. The file extension should match the container the encoder actually produces.
      g. Verification (Important!): As a basic check, ensure the saved file is not zero-sized.
      h. Delete from MongoDB: Once the archive file is successfully saved, use the _ids of the archived documents to perform a deleteMany operation on the eat_agent_experiences collection.
  3. Update Documentation:

    • Add a section to docs/ARCHITECTURE.md explaining the new cold storage and archiving process.
    • Add a new document in docs/guides/ explaining how to run the archiving script.

Acceptance Criteria

  • The archive_encoder.py utility is created and can convert a list of dictionaries into a video file.
  • The scripts/archive_experiences.py script successfully identifies and fetches old experience documents from MongoDB.
  • The script correctly generates and saves a video archive file.
  • After the archive is saved, the corresponding documents are deleted from the eat_agent_experiences collection.
  • The process is robust and does not delete data if the archive creation fails.
  • Project documentation is updated to reflect this new feature.
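The steps in item 2 and the safety criteria above can be sketched as one function, assuming `timestamp` is stored as a BSON date and that the caller passes a pymongo `Collection` (for example `MongoClient(uri)[db_name]["eat_agent_experiences"]`) together with an encoder such as `encode_experiences_to_video`. Passing the collection and encoder in as arguments is a design assumption that keeps the logic testable without a live database; `archive_old_experiences` and its parameters are hypothetical names.

```python
import datetime as dt
from pathlib import Path
from typing import Callable, List, Optional

ARCHIVE_THRESHOLD_DAYS = 90  # example value from the issue


def archive_old_experiences(
    collection,  # a pymongo Collection, or any object with find() / delete_many()
    encode: Callable[[List[dict]], bytes],
    archive_dir: Path,
    now: Optional[dt.datetime] = None,
    threshold_days: int = ARCHIVE_THRESHOLD_DAYS,
    suffix: str = ".mkv",  # should match the container the encoder produces
) -> int:
    """Archive experiences older than the threshold; return how many were deleted."""
    now = now or dt.datetime.now(dt.timezone.utc)
    cutoff = now - dt.timedelta(days=threshold_days)

    # c/d. Query and fetch old experiences; exit quietly if there are none.
    docs = list(collection.find({"timestamp": {"$lt": cutoff}}))
    if not docs:
        return 0

    # e. Encode; any exception propagates before anything is deleted.
    video = encode(docs)
    if not video:
        raise RuntimeError("encoder returned no data; nothing deleted")

    # f. Save with a timestamped name; mode "xb" fails instead of overwriting an archive.
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / f"experiences_archive_{now:%Y-%m-%d}{suffix}"
    with open(path, "xb") as fh:
        fh.write(video)

    # g. Basic verification: the file on disk must not be empty.
    if path.stat().st_size == 0:
        raise RuntimeError(f"{path} is empty; nothing deleted")

    # h. Delete exactly the documents that went into the archive.
    ids = [doc["_id"] for doc in docs]
    result = collection.delete_many({"_id": {"$in": ids}})
    return result.deleted_count
```

Deletion runs only after the encoder returned non-empty bytes and the file is on disk with a non-zero size, so any earlier failure leaves MongoDB untouched. For very large backlogs, fetching everything at once and deleting with a single `$in` list may need batching.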

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
