Support for running transform scripts during result updates. #6196

labkey-klum · 2025-01-07T19:59:54Z

Rationale

This PR includes two main changes:

Adds the ability to run transform scripts when assay results are updated. In the app, this currently applies only to grid and bulk updates (updates via file are not yet supported).
Fixes an issue, discovered while implementing the addition of plate data after the initial import (via re-import), where transform scripts wouldn't run in the proper sequence. Any merging of existing and new data occurred before the transform script ran, so if a transform script was needed to parse the new data, only the new rows would appear in the result of the re-import.
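
As a rough illustration of the corrected ordering (not the code in this PR), the sketch below transforms only the incoming rows and merges them with the existing run data afterwards. The class name, the `reimport` helper, and the use of plain strings as rows are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: rows are modeled as plain strings for brevity.
final class ReimportOrderSketch
{
    // Runs the transform on the incoming rows first, then appends the
    // transformed rows to a copy of the rows already stored for the run.
    static List<String> reimport(
            List<String> existingRows,
            List<String> incomingRows,
            UnaryOperator<List<String>> transform)
    {
        List<String> transformed = transform.apply(incomingRows);
        List<String> merged = new ArrayList<>(existingRows);
        merged.addAll(transformed);
        return merged;
    }
}
```

Because the transform never sees the previously imported rows, it can parse the raw incoming data without discarding what is already in the run.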

Related Pull Requests

Changes

Transform scripts are now also invoked in AssayResultUpdateService, which is where updates happen. If the transform script has altered any rows, those rows are merged with the original set of results.
Merging of previous and new run data is now handled in AbstractAssayTsvDataHandler.insertRowData, which runs after any transform scripts.
Note: By default, existing transform scripts will only run during insert. Until the new UI to configure a script is implemented, a transform script can be run on update by commenting out lines 94-95 of DataTransformService.
Introduced a new replacement token, transformOperation, so that a script author can conditionally add logic for different transform operations:

# demo usage of the operation replacement token by raising an error on update
if ("${transformOperation}" == "UPDATE") {
    errorsDf <- data.frame(type=c('error'), prop=c('test property'), msg=c('Invalid operation for update'));
}
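
The merge of altered rows back into the original results, mentioned in the Changes list above, could look roughly like the following. This is a minimal sketch under stated assumptions: the `mergeByKey` helper and the `RowId` key field used in the usage example are hypothetical and are not taken from AssayResultUpdateService.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of merging transformed rows into the original rows.
final class UpdateMergeSketch
{
    // Overlays each transformed row onto the original row with the same key.
    // Originals without a transformed counterpart are returned unchanged, and
    // transformed rows with an unknown key are ignored.
    static List<Map<String, Object>> mergeByKey(
            List<Map<String, Object>> originalRows,
            List<Map<String, Object>> transformedRows,
            String keyField)
    {
        Map<Object, Map<String, Object>> byKey = new LinkedHashMap<>();
        for (Map<String, Object> row : originalRows)
            byKey.put(row.get(keyField), new LinkedHashMap<>(row));

        for (Map<String, Object> row : transformedRows)
        {
            Map<String, Object> target = byKey.get(row.get(keyField));
            if (target != null)
                target.putAll(row);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Calling `mergeByKey(originals, transformed, "RowId")` keeps the original row order and copies the inputs, so neither list is modified.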

Run info properties for Update

During update, a runProperties.tsv file is still written out for the transform script to parse. No new properties have been introduced, but the properties available during update are a subset of those available during insert. The run properties available during update are (see the spec for more details):

assayName
assayType
baseUrl
containerPath
errorsFile
protocolDescription
protocolId
protocolLsid
runDataFile
userName
workingDir
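
For readers who want to inspect these properties outside of a transform script, a minimal sketch of reading the file follows. It assumes each line is tab-separated with the property name in the first column and its value in the second, and that any further columns can be ignored; check the spec before relying on that layout. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: parses name/value pairs from runProperties.tsv lines.
final class RunPropertiesSketch
{
    // Assumes column 0 is the property name and column 1 is its value.
    // Blank lines are skipped; a name with no value maps to an empty string.
    static Map<String, String> parse(List<String> lines)
    {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : lines)
        {
            if (line.trim().isEmpty())
                continue;
            String[] cols = line.split("\t", -1);
            props.put(cols[0], cols.length > 1 ? cols[1] : "");
        }
        return props;
    }
}
```

The lines can be supplied with `java.nio.file.Files.readAllLines`, after which a lookup such as `props.get("workingDir")` returns the value written for that run.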


cnathe · 2025-01-08T22:28:06Z

assay/api-src/org/labkey/api/assay/AssayResultUpdateService.java

+            Container container,
+            User user,
+            List<Map<String, Object>> rows,
+            List<Map<String, Object>> oldKeys


minor: looks like this param is unused and can be removed

good catch, done.

cnathe · 2025-01-08T22:31:51Z

assay/api-src/org/labkey/api/assay/AssayResultUpdateService.java

+        {
+            List<Map<String, Object>> rowsForTransform = resolveRows(container, user, rows);
+            AssayTransformContext context = new AssayTransformContext(container, user, rowsForTransform, _schema.getProtocol(), _schema.getProvider());
+            TransformResult result = DataTransformService.get().transformAndValidate(context, null, DataTransformService.TransformOperation.UPDATE);


I'm surprised that the 2nd param (the run) isn't needed here, but maybe it is just for writing the transform run properties (which aren't supported for update).

Yes, lucky for us it isn't. On the way out, it's to create some of the run properties info. Coming back in, we use it to create a protocol application so that in the experiment run graph we see the transform as an additional step (which is very cool). Unfortunately on update, there is no lineage graph that we can tie the operation to so it's not apparent that a transform has been run. I think this is okay for now but something we may want to consider in the future. I think having some derivation protocol would be the wrong way to go here but maybe an audit record?

cnathe · 2025-01-08T22:38:19Z

assay/api-src/org/labkey/api/assay/AssayResultUpdateService.java

+                        if (dataTypeHandled)
+                            throw new BatchValidationException(new ValidationException(String.format("There was more than one transformed file found for the data type : %s.", context.getProvider().getDataType())));
+                        dataTypeHandled = true;


what is an example of when this exception would be hit? The ExpData data object here is the specific run data row right?

would this be if for some reason the update transform script wrote out the same row primary key value to multiple rows?

I can't think of a direct way a script could intentionally do this but if we made a mistake in the processing of script outputs, it would mean we had multiple files that were generated for the result data (and we wouldn't have a good way to pick the correct one), so this feels like it is an unrecoverable state.

labkey-klum added 7 commits December 31, 2024 16:34

wire up transform scripts for assay result updates

561ab0f

additional validation for supported transform operations

1dc081f

Merge branch 'develop' into fb_transform_on_update_2

4ca2412

transform operation replacement token

1a738c7

Merge branch 'develop' into fb_transform_on_update_2

9050095

handle merging of re-run assay data to be transform script compatible

f8998f8

remove debug code

22c748d

labkey-klum mentioned this pull request Jan 7, 2025

Support for running transform scripts during result updates. LabKey/commonAssays#833

Merged

labkey-klum requested a review from cnathe January 7, 2025 20:28

Restore the handling of the maximumSeverity prop from transformedRunPropertiesFile

7210b8d

cnathe approved these changes Jan 8, 2025


code review feedback

2defde6

labkey-klum merged commit 19b4390 into develop Jan 9, 2025
2 checks passed

labkey-klum deleted the fb_transform_on_update_2 branch January 9, 2025 18:17
