feat: log runs jsonl before db removal#295
Conversation
Signed-off-by: HuntTheSun <HuntTheSun@users.noreply.github.com>
|
I will look into .jsonl and your implementation tonight. I havent used it before so I can check if it makes sense then. Regarding the other features let me review and then answer those as well. Thanks for starting the implementation and you are of course free to do so! 😄 |
| console += self._print_console(f" 1. Database preparation") | ||
| if not self.database_class: | ||
| self.database = DatabaseProcessor(self.database_path) | ||
| self.database = DatabaseProcessor(self.database_path, run_rm_log_path=self.run_rm_log_path) |
There was a problem hiding this comment.
For consistency let's just do self.run_rm_log_path here instead of the run_rm_log_path=self.run_rm_log_path, or update the other one, I don't have a preference as long as it's the same.
| module = importlib.util.module_from_spec(spec) | ||
| spec.loader.exec_module(module) | ||
| self.database = module.DatabaseProcessor(self.database_path) | ||
| is_rm_log_unsupported = not hasattr(self.database, "run_rm_log_path") |
There was a problem hiding this comment.
Why did you add this logic? I'm curious when this can actually happen? Because if the flag exists, the logic should be correct. If it does not exist the flag will fail when tried in the CLI right?
There was a problem hiding this comment.
Ah it just landed that this could be for custom database class implementations maybe?
There was a problem hiding this comment.
Yes, that is exactly what its for.
It justs prints a warning if their custom db doesn't implement the attribute "run_rm_log_path" and they use the flag. setattr prevents an AttributeError if run_rm_log_path is accessed but not present on the custom db. I think I check if it exists before using, but I just want to make sure if any changes are made I don't introduce random crashes on custom db
At least I hope that is what will happen, I will have to check, I haven't yet played around with custom db.
|
As always I reviewed the code but that all looks fine so I just added some ideas regarding the implementation. Let me know what you think of the suggestions! Regarding your questions
I think it can already be used in the server right? That part of the code should also pass through the same remove_run functions in the database and So once you call the server like |
| nargs="*", | ||
| default=None, | ||
| ) | ||
| db_group.add_argument( |
There was a problem hiding this comment.
For consistency again I think it would be good to make it similar to other CLI arguments and apart from my 1 mistake with the ssl arguments I never use any '-' dashes. So maybe it would be better to not do it here either: logremovedruns or with my comment from below logdata, or even logremoveddata I'm not sure what a good name would be, open to suggestions here.
Surprisingly AI had a good suggestion for this: --logremoved, which I think is quite clear in it's function and is still readable!
There was a problem hiding this comment.
Thank you and your AI colleague, I didn't like the flag name either. I will change this.
(Btw it will probably be at the very least until next week wednesday until I have this PR ready)
| ) | ||
| self.connection.commit() | ||
| # grab data of the run for later to log in jsonl if flag set | ||
| run_data = self._get_run_data(run_start) if self.run_rm_log_path else None |
There was a problem hiding this comment.
I'm just wondering if this is the nicest solution. Here you implement it only for the runs, but what if people want to use other data than the run data, like test/keyword/suite data?
Isn't it a good idea to add a flag for the datatypes you want to write to a file?
I'm just thinking about how that could work in the CLI, since that would require 2 arguments, a path and a folder.
Maybe a nice addition could be to just add a default .jsonl path, like removed_data.jsonl, in which we can then add the data like:
{ run: "...", suite: "...", test: "...", keyword: "..."}
{ run: "...", suite: "...", test: "...", keyword: "..."}
{ run: "...", suite: "...", test: "...", keyword: "..."}Then in the CLI maybe we could do something like mapping the values all, run, suite, test and keyword combined with a possible path, or a default if not provided. And if no argument is provided at all just use all and a set filename in the root folder.
--log-removed-runs # no argument, assume all datatypes + default .jsonl file: `removed_data.jsonl`
--log-removed-runs all # assume default path again but for only run data
--log-removed-runs custom/path/to/file.jsonl # assume all
--log-removed-runs "run:path/to/custom file.jsonl"
--log-removed-runs "run:test:suite:keyword:path/to/custom file.jsonl"
--log-removed-runs test:keyword
This is quite a change in the implementation, but I think this would just be a lot more flexible for future use instead of only making it for the run data. I do recognize that this could also make the .jsonl a lot bigger but that is up to the user then.
There was a problem hiding this comment.
That could be a nice addition for some people, I will change it, thank you.
--log-removed-runs all # assume default path again but for only run data
did you mean to type runs instead of all there? (and I would name the flag --logremoved if thats okay?)
There was a problem hiding this comment.
Yeah the flag updated comment I made later, but obviously that's the better name!
did you mean to type runs instead of all there?
In that example I did actually mean all, since that's a valid option flag that I would suggest. My suggestion is pretty much the following:
- Name it
logremoved - This cli take 1 optional input
- When the input is not provided, assume the option is
allso log all data types, and use a default .jsonl file path in the root - When an option is provided it should be one of the following:
- Data type(s), split by a colon
- File path, split by a colon
- Valid data types: all, run, suite, test, keyword
- Assume anything else is a file path
So valid examples would be (but there could obviously be more different combinations):
--logremoved
--logremoved run:keyword:my/path/to.jsonl
--logremoved my/path/to.jsonl
--logremoved all
--logremoved run:suite
Does this make sense?
There was a problem hiding this comment.
Okay yeah I read too fast >.>
You meant this line:
--log-removed-runs all # assume default path again but for only run data
And obviously I meant to type run 😅
Although both could be valid:
--logremoved all # assume default path for all data
--logremoved run # assume default path but only run data
There was a problem hiding this comment.
Thank you for looking it over and the quick answer!
Makes sense, I will implement it that way and get back to you when its done!
Have a nice weekend!
Implementation of #282
Hi @timdegroot1996 , i hoped it's alright I moved forward with a draft implementation, I'm not sure how elegantly I handled arg passing and the custom-db tbh.
Questions
Is .jsonl an okay format?
I chose .jsonl because runs are removed one by one, so logging them one by one makes the most sense, which jsonl was invented to make easier. Batch collecting runs to log could cause inconsistencies if something is cancelled/exited during the execution. Also .jsonl is way easier to get the format right compared to .json when logging runs one by one.
Should I explore other formats or is jsonl alright?
Should I also implement feature in the server?
Thank you in advance!