Conversation
| if __name__ == "__main__": | ||
| output_file = sys.argv[1] | ||
| input_files = sys.argv[2:] | ||
|
|
There was a problem hiding this comment.
One of the first entries in the metadata.json file is version, which is currently on 2. We could check the version and throw an warning (or maybe an error) if its not something we can handle; for instance, if CodeChecker starts emitting new versions of this, we should know about it.
As a fun fact, there is a version 1 to version 2 converter in CodeChecker:
https://github.com/Ericsson/codechecker/blob/1d5c4ffc6cc45f4f6eb61756faee92b8699ff39a/web/client/codechecker_client/metadata.py#L18
There was a problem hiding this comment.
This could be a good use of the common lib mentioned in #81 (fail, warn)
src/metadata_merge.py
Outdated
| # We append info from json2 to json1 from here on out | ||
| json1_root["result_source_files"].update(json2_root["result_source_files"]) | ||
| json1_root["skipped"] = json1_root["skipped"] + json2_root["skipped"] | ||
| # Merge time |
There was a problem hiding this comment.
| # Merge time | |
| # Merge time; we assume here both json files describe jobs in | |
| # the same analysis invocation, implying that the analysis start | |
| # time is the lowest timestamp, and the end is the highest. |
There was a problem hiding this comment.
This is a wrong assumption; every cached action will contain its original times.
Should we leave this time part out of the file? (storing the time is not really hermetic or reproducible)
There was a problem hiding this comment.
Oooh an excellent observation!! We should totally check how incremental analysis works in CodeChecker, lets not be too smart ourselves.
| json1_root = json1["tools"][0] | ||
| json2_root = json2["tools"][0] | ||
| # We append info from json2 to json1 from here on out | ||
| json1_root["result_source_files"].update(json2_root["result_source_files"]) |
There was a problem hiding this comment.
We should comment here that if the runs came from the same bazel invocation then (and lets make sure that this is actually true) the following fields must be exactly teh same:
- name
- version (of codechecker, not the version of the metadata.json file)
- working_directory
- output_path
On the other hand, how about these fields? Are we sure these are the same for every job?:
- command
Also, action_num should be aggregated up to actually reflect the amount of jobs, shouldn't it?
There was a problem hiding this comment.
I tried my best to cover everything here.
There are a couple of misses...
The analyzer tools, for example, have their own version number, but we theoretically could tolerate different versions of them running (?).
Why:
We currently do not produce a
metadata.jsonwith our distributed rule.What:
Added metadata output for all steps, then added a final step merging all metadata files into one.
Addresses:
Fixes: #45