Butler.py: Adds preprocess command for local preprocess #5266
return uworker_env
def _early_setup(args):
nit: any reason not to name it just setup?
Since most of this script is just environment setup, I thought it would be easier to understand what makes this method different, and what its purpose is, if I called it early_setup.
I think setup by itself makes more sense, and it is also more common throughout the codebase, but I am also okay with going ahead as is.
Did you use a

Does it need to be a subcommand of
uworker_env = _get_job_environment(args.job)
uworker_env.update(_get_fuzzer_environment(args.fuzzer, args.job))
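The two lines above build the uworker environment from the job environment and then overlay the fuzzer environment. Because dict.update() overwrites keys that already exist, a fuzzer-level value wins whenever both define the same variable. A self-contained illustration follows; fake_job_environment, fake_fuzzer_environment, and the variable names they return are hypothetical stand-ins, not the real lookups.

```python
from typing import Dict


def fake_job_environment(job: str) -> Dict[str, str]:
  """Hypothetical stand-in for the job environment lookup."""
  return {'JOB_NAME': job, 'FUZZ_TEST_TIMEOUT': '25'}


def fake_fuzzer_environment(fuzzer: str, job: str) -> Dict[str, str]:
  """Hypothetical stand-in for the fuzzer environment lookup.

  The (fuzzer, job) signature only mirrors the call in the diff.
  """
  del job  # Unused in this toy version.
  return {'FUZZER_NAME': fuzzer, 'FUZZ_TEST_TIMEOUT': '60'}


def build_uworker_env(job: str, fuzzer: str) -> Dict[str, str]:
  """Mirrors the merge order in the diff: job first, then fuzzer."""
  uworker_env = fake_job_environment(job)
  # dict.update() overwrites keys that already exist, so fuzzer values win.
  uworker_env.update(fake_fuzzer_environment(fuzzer, job))
  return uworker_env


if __name__ == '__main__':
  print(build_uworker_env('example_job', 'example_fuzzer'))
```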
# Replicate what process_command_impl does in a real tworker
Could we use process_command_impl() then instead?
At the end of said method we call run_command():
https://github.com/google/clusterfuzz/blob/master/src/clusterfuzz/_internal/bot/tasks/commands.py#L482
Which in turn triggers a workflow in which the preprocess step either immediately queues the main task for remote execution when it finishes, or simply executes all three steps on the same machine (depending on the setup). We don't want that: we want to stop right after preprocess finishes so we can manually trigger the main portion wherever and whenever we need to.
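The difference described above (process_command_impl() ends in run_command(), which continues past preprocess) can be shown with a toy model. Everything here is hypothetical: the stage functions and both runner helpers are stand-ins, not ClusterFuzz's API. The only point is that a preprocess-only entry point runs the first stage and returns its output (a placeholder signed URL) without ever invoking main or postprocess.

```python
"""Toy model of the utask flow (hypothetical stand-ins, not ClusterFuzz code)."""
from typing import Callable, List

Stage = Callable[[], str]


def run_full_task(stages: List[Stage]) -> List[str]:
  """Runs every stage in order, like a flow that continues past preprocess."""
  return [stage() for stage in stages]


def run_preprocess_only(stages: List[Stage]) -> str:
  """Runs only the first (preprocess) stage and stops there.

  main and postprocess are never called, so the returned value can be
  handed to whichever backend we choose, whenever we choose.
  """
  preprocess = stages[0]
  return preprocess()


# Hypothetical stand-ins for the preprocess, main and postprocess stages.
def fake_preprocess() -> str:
  return 'https://storage.example.com/uworker-input?signature=placeholder'


def fake_main() -> str:
  return 'main done'


def fake_postprocess() -> str:
  return 'postprocess done'


STAGES = [fake_preprocess, fake_main, fake_postprocess]

if __name__ == '__main__':
  print(run_preprocess_only(STAGES))
```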
Not in this case, but it's possible to use a service account: you just need to generate a key, save it locally, and set it as the default credentials for any gcloud library and CLI operation. This is done using the

Added more context to the description so future reviewers can easily understand this.
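A minimal sketch of the service-account route, assuming the key JSON was already created (for example with gcloud iam service-accounts keys create). Google client libraries resolve Application Default Credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable, so pointing it at the key is enough for library calls. The helper name use_service_account_key is hypothetical; it only validates the file and exports the variable.

```python
"""Point Application Default Credentials at a local service account key."""
import json
import os


def use_service_account_key(key_path: str) -> str:
  """Validates a service account key file and exports it for ADC.

  Returns the service account email found in the key.
  """
  with open(key_path, encoding='utf-8') as f:
    key = json.load(f)
  # Service account key files carry "type": "service_account".
  if key.get('type') != 'service_account':
    raise ValueError(f'{key_path} is not a service account key')
  # Google client libraries read this variable when resolving
  # Application Default Credentials.
  os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.abspath(key_path)
  return key['client_email']


if __name__ == '__main__':
  import sys
  print(use_service_account_key(sys.argv[1]))
```

For the gcloud CLI itself, gcloud auth activate-service-account --key-file=KEY_FILE plays the same role.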
Yes, it needs to be, as butler already handles a lot of bootstrapping operations for the same purpose; for example, if we didn't use
I think we are referring to different things. What I mean is that we could have it as a standalone butler script, so that we can run it as:
python butler.py run <name_of_script> --non-dry-run --config $MY_DIR
That way we don't have to handle all of that ourselves. Does that make sense?
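A rough sketch of what the standalone script suggested above might look like. It assumes butler's run command imports the module named on the command line and calls its execute(args), passing a non_dry_run attribute on args; that is an assumption about butler internals worth verifying. The constants and _preprocess() are hypothetical placeholders for the logic in this PR.

```python
"""Hypothetical skeleton for a standalone butler 'preprocess' script."""
import argparse

# Hypothetical defaults; a real script would take these from its arguments.
JOB_NAME = 'libfuzzer_asan_example'
FUZZER_NAME = 'libFuzzer'


def _preprocess(job: str, fuzzer: str) -> str:
  """Placeholder for the real preprocess step (hypothetical)."""
  return f'preprocessed {fuzzer} for {job}'


def execute(args: argparse.Namespace) -> str:
  """Entry point that butler's run command is assumed to call."""
  if not getattr(args, 'non_dry_run', False):
    # Dry run: report intent without touching GCS or other real services.
    message = f'[dry run] would preprocess {FUZZER_NAME} for {JOB_NAME}'
  else:
    message = _preprocess(JOB_NAME, FUZZER_NAME)
  print(message)
  return message


if __name__ == '__main__':
  execute(argparse.Namespace(non_dry_run=False))
```

Gating the real side effects (GCS upload, signed URL) behind --non-dry-run keeps an accidental invocation harmless.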
Thanks!
Adds preprocess butler script

This command allows developers to trigger the preprocess portion of a fuzz task and, in consequence, generate the serialized and compressed uworker_input payload, upload it to real GCS, and get the signed download URL, exactly as happens remotely. We can then use the resulting URL to trigger a task in any backend we want:
- utask_main queue

This accelerates local debugging of the tworker preprocessing phase without relying on remote execution queues, which have proven to take multiple hours to "ACK" a task request.

Note: To use this command you need the Secret Manager Secret Accessor role for Dev, or a local service account key (set up with the gcloud auth CLI) that has that role and any other role required for a tworker's preprocess.

Changes
Tests performed
Executed the following command in dev:
Successfully creates and uploads the payload and returns a valid signed URL. This signed URL was later used to trigger a Swarming task through pRPC; here are the logs
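For readers new to this flow, the payload side of what the test exercises (serialize, compress, upload, sign) can be sketched as below. json and zlib are stand-ins: the real uworker_input format is not shown here, and both function names are hypothetical. Uploading the bytes and signing the URL are left out; with the google-cloud-storage library that last step would typically go through Blob.generate_signed_url().

```python
"""Serialize + compress round trip for a stand-in uworker input payload."""
import json
import zlib
from typing import Any, Dict


def serialize_uworker_input(payload: Dict[str, Any]) -> bytes:
  """Serializes (JSON stand-in) and compresses a hypothetical payload."""
  raw = json.dumps(payload, sort_keys=True).encode('utf-8')
  return zlib.compress(raw)


def deserialize_uworker_input(blob: bytes) -> Dict[str, Any]:
  """Inverse of serialize_uworker_input, as the main stage would need."""
  return json.loads(zlib.decompress(blob).decode('utf-8'))


if __name__ == '__main__':
  data = serialize_uworker_input({'job': 'example_job', 'fuzzer': 'example'})
  print(len(data), deserialize_uworker_input(data))
```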