-
Notifications
You must be signed in to change notification settings - Fork 18
feat: log runs jsonl before db removal #295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,6 +5,7 @@ | |
| from time import time | ||
| from datetime import datetime, timezone | ||
| from typing import Union | ||
| import json | ||
|
|
||
| # Explicit adapter for datetime -> ISO string, replacing the deprecated default | ||
| # behaviour removed in Python 3.12+. Compatible with Python 3.8+. | ||
|
|
@@ -13,14 +14,15 @@ | |
|
|
||
|
|
||
| class DatabaseProcessor(AbstractDatabaseProcessor): | ||
| def __init__(self, database_path: Path): | ||
| def __init__(self, database_path: Path, run_rm_log_path=None): | ||
| """This function should handle the connection to the database | ||
| And if required the creation of the tables""" | ||
| self.database_path = database_path | ||
| # handle possible subdirectories before creating database with sqlite | ||
| path = Path(self.database_path) | ||
| path.parent.mkdir(exist_ok=True, parents=True) | ||
| self.connection: sqlite3.Connection | ||
| self.run_rm_log_path = run_rm_log_path | ||
| # create tables if required | ||
| self.open_database() | ||
| self._create_tables() | ||
|
|
@@ -383,6 +385,24 @@ def _get_run_paths(self): | |
| run_paths[entry["run_start"]] = entry.get("path") or "" | ||
| return run_paths | ||
|
|
||
| def _get_run_data(self, run_start): | ||
| """Helper function to get dict of a run """ | ||
| cursor = self.connection.cursor() | ||
| cursor.execute(GET_RUN_INFO_BY_RUN_START, (run_start,)) | ||
|
|
||
| row = cursor.fetchone() | ||
| if not row: | ||
| print(f"Could not find run with run_start {run_start}") | ||
| return None | ||
|
|
||
| columns = [col[0] for col in cursor.description] | ||
| return dict(zip(columns, row)) | ||
|
|
||
| def _log_run_jsonl(self, logpath, run_data): | ||
| """ Logs json of a run row to logpath """ | ||
| with Path(logpath).open("a", encoding="utf-8") as f: | ||
| _ = f.write(json.dumps(run_data, default=str) + "\n") | ||
|
|
||
| def remove_runs(self, remove_runs: list): | ||
| """This function removes all provided runs and all their corresponding data""" | ||
| run_starts, run_names, run_aliases, run_tags, _ = self._get_runs() | ||
|
|
@@ -492,13 +512,17 @@ def _remove_by_limit(self, run: str, run_starts: list): | |
|
|
||
| def _remove_run(self, run_start: str): | ||
| """Helper function to remove the data from all tables""" | ||
| self.connection.cursor().execute(DELETE_FROM_RUNS.format(run_start=run_start)) | ||
| self.connection.cursor().execute(DELETE_FROM_SUITES.format(run_start=run_start)) | ||
| self.connection.cursor().execute(DELETE_FROM_TESTS.format(run_start=run_start)) | ||
| self.connection.cursor().execute( | ||
| DELETE_FROM_KEYWORDS.format(run_start=run_start) | ||
| ) | ||
| self.connection.commit() | ||
| # grab data of the run for later to log in jsonl if flag set | ||
| run_data = self._get_run_data(run_start) if self.run_rm_log_path else None | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm just wondering if this is the nicest solution. Here you implement it only for the runs, but what if people want to use other data than the run data, like test/keyword/suite data? Isn't it a good idea to add a flag for the datatypes you want to write to a file? Maybe a nice addition could be to just add a default .jsonl path, like removed_data.jsonl, in which we can then add the data like: { run: "...", suite: "...", test: "...", keyword: "..."}
{ run: "...", suite: "...", test: "...", keyword: "..."}
{ run: "...", suite: "...", test: "...", keyword: "..."}Then in the CLI maybe we could do something like mapping the values This is quite a change in the implementation, but I think this would just be a lot more flexible for future use instead of only making it for the run data. I do recognize that this could also make the .jsonl a lot bigger but that is up to the user then.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That could be a nice addition for some people, I will change it, thank you.
did you mean to type
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah the flag updated comment I made later, but obviously that's the better name!
In that example I did actually mean
So valid examples would be (but there could obviously be more different combinations): Does this make sense?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay yeah I read too fast >.> You meant this line: And obviously I meant to type run 😅
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for looking it over and the quick answer! |
||
| with self.connection: | ||
| cursor = self.connection.cursor() | ||
| cursor.execute(DELETE_FROM_RUNS.format(run_start=run_start)) | ||
| if cursor.rowcount > 0: # only cleanup related tables if a run would be deleted | ||
| cursor.execute(DELETE_FROM_SUITES.format(run_start=run_start)) | ||
| cursor.execute(DELETE_FROM_TESTS.format(run_start=run_start)) | ||
| cursor.execute(DELETE_FROM_KEYWORDS.format(run_start=run_start)) | ||
| if self.run_rm_log_path and run_data: | ||
| self._log_run_jsonl(self.run_rm_log_path, run_data) | ||
|
|
||
| def vacuum_database(self): | ||
| """This function vacuums the database to reduce the size after removing runs""" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -34,6 +34,7 @@ def __init__( | |
| timezone: str = "", | ||
| log_url: Optional[str] = None, | ||
| custom_filters: str = "", | ||
| run_rm_log_path: Optional[str] = None, | ||
| ): | ||
| """Sets the parameters provided in the command line""" | ||
| self.database_path = database_path | ||
|
|
@@ -57,6 +58,7 @@ def __init__( | |
| self.timezone = timezone | ||
| self.log_url = log_url | ||
| self.custom_filters = custom_filters | ||
| self.run_rm_log_path = run_rm_log_path | ||
|
|
||
| def initialize_database(self, suppress=True): | ||
| """Function that initializes the database if it does not exist | ||
|
|
@@ -66,7 +68,7 @@ def initialize_database(self, suppress=True): | |
| if not suppress: | ||
| console += self._print_console(f" 1. Database preparation") | ||
| if not self.database_class: | ||
| self.database = DatabaseProcessor(self.database_path) | ||
| self.database = DatabaseProcessor(self.database_path, run_rm_log_path=self.run_rm_log_path) | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For consistency let's just do |
||
| else: | ||
| if not suppress: | ||
| console += self._print_console( | ||
|
|
@@ -80,6 +82,13 @@ def initialize_database(self, suppress=True): | |
| module = importlib.util.module_from_spec(spec) | ||
| spec.loader.exec_module(module) | ||
| self.database = module.DatabaseProcessor(self.database_path) | ||
| is_rm_log_unsupported = not hasattr(self.database, "run_rm_log_path") | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why did you add this logic? I'm curious when this can actually happen? Because if the flag exists, the logic should be correct. If it does not exist the flag will fail when tried in the CLI right?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah it just landed that this could be for custom database class implementations maybe?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that is exactly what its for. |
||
| setattr(self.database, "run_rm_log_path", self.run_rm_log_path) | ||
| if is_rm_log_unsupported and self.run_rm_log_path and not suppress: | ||
| console += self._print_console( | ||
| " WARNING The custom database class does not explicitly support '--log-removed-runs'. " | ||
| "Logging might be skipped if the custom class overrides the deletion logic." | ||
| ) | ||
| if not suppress: | ||
| console += self._print_console( | ||
| f" created database: '{self.database_path}'" | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency again I think it would be good to make it similar to other CLI arguments and apart from my 1 mistake with the ssl arguments I never use any '-' dashes. So maybe it would be better to not do it here either: logremovedruns or with my comment from below logdata, or even logremoveddata I'm not sure what a good name would be, open to suggestions here.
Surprisingly AI had a good suggestion for this: --logremoved, which I think is quite clear in it's function and is still readable!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you and your AI colleague, I didn't like the flag name either. I will change this.
(Btw it will probably be at the very least until next week wednesday until I have this PR ready)