-
Notifications
You must be signed in to change notification settings - Fork 7
Support for running transform scripts during result updates. #6196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Container container, | ||
| User user, | ||
| List<Map<String, Object>> rows, | ||
| List<Map<String, Object>> oldKeys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: looks like this param is unused and can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, done.
| { | ||
| List<Map<String, Object>> rowsForTransform = resolveRows(container, user, rows); | ||
| AssayTransformContext context = new AssayTransformContext(container, user, rowsForTransform, _schema.getProtocol(), _schema.getProvider()); | ||
| TransformResult result = DataTransformService.get().transformAndValidate(context, null, DataTransformService.TransformOperation.UPDATE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that the run 2nd param isn't needed here, but maybe it is just for writing the transform run properties (which aren't supported for update).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, lucky for us it isn't. On the way out, it's to create some of the run properties info. Coming back in, we use it to create a protocol application so that in the experiment run graph we see the transform as an additional step (which is very cool). Unfortunately on update, there is no lineage graph that we can tie the operation to so it's not apparent that a transform has been run. I think this is okay for now but something we may want to consider in the future. I think having some derivation protocol would be the wrong way to go here but maybe an audit record?
| if (dataTypeHandled) | ||
| throw new BatchValidationException(new ValidationException(String.format("There was more than one transformed file found for the data type : %s.", context.getProvider().getDataType()))); | ||
| dataTypeHandled = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is an example of when this exception would be hit? The ExpData data object here is the specific run data row right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be if for some reason the update transform script wrote out the same row primary key value to multiple rows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of a direct way a script could intentionally do this but if we made a mistake in the processing of script outputs, it would mean we had multiple files that were generated for the result data (and we wouldn't have a good way to pick the correct one), so this feels like it is an unrecoverable state.
Rationale
This PR includes two main changes:
Related Pull Requests
Changes
AssayResultUpdateServicewhich is where updates happen. If the transform script has altered any rows, those will be merged with the original set of results.AbstractAssayTsvDataHandler.insertRowDatawhich would be after any transform scripts may have been run.DataTransformService 94:95.transformOperationso that a script author can conditionally add logic for different transform operations:Run info properties for Update
During update, a
runProperties.tsvfile is still written out for the transform script to parse. While no new properties have been introduced, the properties that are available during update are a subset of the properties available during insert. The list of run properties during update include (see the spec for more details):