
Conversation


@labkey-klum (Contributor) commented Jan 7, 2025

Rationale

This PR includes two main changes:

  • Adds the ability to run transform scripts when assay results are updated. In the app this currently applies only to grid and bulk edits (updates via file import are not yet supported).
  • Fixes the issue, discovered while implementing support for adding plate data after the initial import (via re-import), where transform scripts wouldn't run in the proper sequence. The problem was that merging of existing and new data occurred before the transform script ran, so if a transform script was needed to parse the new data, the re-imported run would end up containing only the new rows.

Related Pull Requests

Changes

  • Transform script invocation now also occurs in AssayResultUpdateService, which is where updates happen. If the transform script has altered any rows, those rows are merged with the original set of results.
  • Merging of previous and new run data is now handled in AbstractAssayTsvDataHandler.insertRowData, which runs after any transform scripts have executed.
  • Note: by default, existing transform scripts will only run during insert. Until the new UI for configuring a script is implemented, a transform script can be made to run on update by commenting out lines 94-95 of DataTransformService.
  • Introduced a new replacement token, transformOperation, so that a script author can add conditional logic for different transform operations:
```r
# demo usage of the operation replacement token by raising an error on update
if ("${transformOperation}" == "UPDATE") {
    # writing this data frame to the path given by the errorsFile run property reports the error
    errorsDf <- data.frame(type=c('error'), prop=c('test property'), msg=c('Invalid operation for update'))
}
```

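Note: the ${transformOperation} token is substituted into the script text before execution, so at run time the comparison above is against a plain string literal such as "UPDATE".
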
Run info properties for Update

During update, a runProperties.tsv file is still written out for the transform script to parse. No new properties have been introduced, but the properties available during update are a subset of those available during insert. The run properties available during update include the following (see the spec for more details; a minimal parsing sketch follows the list):

  • assayName
  • assayType
  • baseUrl
  • containerPath
  • errorsFile
  • protocolDescription
  • protocolId
  • protocolLsid
  • runDataFile
  • userName
  • workingDir
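
For illustration, here is a minimal sketch of how a transform script might read these properties and branch on the operation. It assumes the ${runInfo} substitution token resolves to the runProperties.tsv path and that the file is tab-delimited name/value rows; the getProp helper is local to the sketch, not part of any API:

```r
# minimal sketch: parse runProperties.tsv, a tab-delimited file of name/value rows
# (the runDataFile row carries extra fields, hence fill=TRUE)
runProps <- read.delim("${runInfo}", header=FALSE, fill=TRUE, stringsAsFactors=FALSE,
                       col.names=c("name", "value", "type", "transformedData"))

# local helper: look up a single property value by name
getProp <- function(name) runProps$value[runProps$name == name][1]

# branch on the operation using the new replacement token
if ("${transformOperation}" == "UPDATE") {
    # during update, the rows being updated are available via the runDataFile property
    resultData <- read.delim(getProp("runDataFile"), stringsAsFactors=FALSE)
    message(sprintf("Updating %d rows for assay %s", nrow(resultData), getProp("assayName")))
}
```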

```java
Container container,
User user,
List<Map<String, Object>> rows,
List<Map<String, Object>> oldKeys
```
Contributor

minor: looks like this param is unused and can be removed

Contributor Author

good catch, done.

```java
{
    List<Map<String, Object>> rowsForTransform = resolveRows(container, user, rows);
    AssayTransformContext context = new AssayTransformContext(container, user, rowsForTransform, _schema.getProtocol(), _schema.getProvider());
    TransformResult result = DataTransformService.get().transformAndValidate(context, null, DataTransformService.TransformOperation.UPDATE);
```
Contributor

I'm surprised that the 2nd param (the run) isn't needed here, but maybe it's just used for writing the transform run properties (which aren't supported for update).

Contributor Author

Yes, lucky for us it isn't. On the way out, it's used to create some of the run properties info. Coming back in, we use it to create a protocol application so that the experiment run graph shows the transform as an additional step (which is very cool). Unfortunately, on update there is no lineage graph we can tie the operation to, so it's not apparent that a transform has been run. I think this is okay for now, but it's something we may want to consider in the future. I think a derivation protocol would be the wrong way to go here, but maybe an audit record?

Comment on lines +161 to +163
```java
if (dataTypeHandled)
    throw new BatchValidationException(new ValidationException(String.format("There was more than one transformed file found for the data type : %s.", context.getProvider().getDataType())));
dataTypeHandled = true;
```
Contributor

What is an example of when this exception would be hit? The ExpData object here is the specific run data row, right?

Contributor

Would this be if, for some reason, the update transform script wrote out the same row primary key value to multiple rows?

Contributor Author

I can't think of a direct way a script could intentionally do this, but if we made a mistake in processing the script outputs, it would mean multiple files had been generated for the result data (and we wouldn't have a good way to pick the correct one), so this feels like an unrecoverable state.
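
For context on the single-file expectation, a transform script normally produces exactly one transformed output per data type, written to the path the server supplies in runProperties.tsv. A minimal sketch of that normal case, assuming the ${runInfo} token and a four-field runDataFile row (name, input path, type, transformed-output path); the added column is purely illustrative:

```r
# minimal sketch of the normal single-output case: read the original results,
# modify them, and write back exactly one transformed file for the server
runProps <- read.delim("${runInfo}", header=FALSE, fill=TRUE, stringsAsFactors=FALSE,
                       col.names=c("name", "value", "type", "transformedData"))
fileRow <- runProps[runProps$name == "runDataFile", ][1, ]

df <- read.delim(fileRow$value, stringsAsFactors=FALSE)  # original result rows
df$flagged <- FALSE                                      # illustrative change only
# exactly one output per data type; generating a second file for the same
# data type is the state the BatchValidationException above guards against
write.table(df, file=fileRow$transformedData, sep="\t", quote=FALSE, row.names=FALSE)
```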

@labkey-klum merged commit 19b4390 into develop on Jan 9, 2025
2 checks passed
@labkey-klum deleted the fb_transform_on_update_2 branch on January 9, 2025 at 18:17