Skip to content

Conversation

@bbimber
Copy link
Collaborator

@bbimber bbimber commented Dec 12, 2025

This addresses a Pipeline-related issue we encountered when updating to 25.11. I believe our issue is an unintended consequence of this commit, which was probably designed to tighten handling of files under the LK file root:

0a066b5

Short Background:

  • The change above is largely fine. The intention of that change is to disallow files to access across containers using relative paths. That's a good idea. There is already a check in AbstractFileXarSource.canonicalizeDataFileURL() to skip this enforcement if the code provides an absolute filepath. These would be fine for our use cases.
  • The problem is that separate code relativizes the filepath when creating the LSID. This is old behavior. The problem is that the absolute filepath recorded by the pipeline code is converted to a relative path when making the LSID, and it throws an exception here. I think this is just an unintended consequence of that change. There should be no security-related issues around the file string that goes into the LSID. This PR addresses the relative path problem, and does not relativize the LSID filepath unless the incoming Path is under the file root..

Specific stacktrace here. In this example, the PipelineJob JSON is providing absolute filepaths, which should be allowable. In 25.11 they throw this exception:

11 Dec 2025 09:58:10,890 ERROR: Path to parent not allowed: ../../../../../../../exacloud/gscratch/prime-seq/workDir/10f37793-b6a1-103e-82cb-f619086abb6f/SequenceA.work/adapters.fasta
java.nio.file.InvalidPathException: Path to parent not allowed: ../../../../../../../exacloud/gscratch/prime-seq/workDir/10f37793-b6a1-103e-82cb-f619086abb6f/SequenceA.work/adapters.fasta
    at org.labkey.api.exp.AbstractFileXarSource.canonicalizeDataFileURL(AbstractFileXarSource.java:126)
    at org.labkey.api.exp.XarSource.getCanonicalDataFileURL(XarSource.java:116)
    at org.labkey.experiment.xar.AutoFileLSIDReplacer.getReplacement(AutoFileLSIDReplacer.java:64)
    at org.labkey.api.exp.xar.Replacer$CompoundReplacer.getReplacement(Replacer.java:48)
    at org.labkey.api.exp.xar.LsidUtils.doReplacements(LsidUtils.java:154)
    at org.labkey.api.exp.xar.LsidUtils.resolveStringFromTemplate(LsidUtils.java:131)
    at org.labkey.api.exp.xar.LsidUtils.resolveLsidFromTemplate(LsidUtils.java:95)
    at org.labkey.api.exp.xar.LsidUtils.resolveLsidFromTemplate(LsidUtils.java:206)
    at org.labkey.experiment.api.ExperimentServiceImpl.createData(ExperimentServiceImpl.java:793)
    at org.labkey.experiment.api.ExperimentServiceImpl.createData(ExperimentServiceImpl.java:327)
    at org.labkey.experiment.pipeline.ExpGeneratorHelper.addData(ExpGeneratorHelper.java:112)
    at org.labkey.experiment.pipeline.ExpGeneratorHelper.addData(ExpGeneratorHelper.java:87)
    at org.labkey.experiment.pipeline.ExpGeneratorHelper._insertRun(ExpGeneratorHelper.java:420)
    at org.labkey.experiment.pipeline.ExpGeneratorHelper.insertRun(ExpGeneratorHelper.java:265)
    at org.labkey.experiment.pipeline.ExpGeneratorHelper.insertRun(ExpGeneratorHelper.java:186)
    at org.labkey.experiment.pipeline.XarGeneratorTask.run(XarGeneratorTask.java:173)
    at org.labkey.api.pipeline.PipelineJob.runActiveTask(PipelineJob.java:831)
    at org.labkey.api.pipeline.PipelineJob.run(PipelineJob.java:1079)
    at org.labkey.pipeline.mule.PipelineJobRunner.run(PipelineJobRunner.java:40)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.mule.impl.model.resolvers.DynamicEntryPoint.invokeMethod(DynamicEntryPoint.java:312)
    at org.mule.impl.model.resolvers.DynamicEntryPoint.invoke(DynamicEntryPoint.java:259)
    at org.mule.impl.DefaultLifecycleAdapter.intercept(DefaultLifecycleAdapter.java:193)
    at org.mule.impl.InterceptorsInvoker.execute(InterceptorsInvoker.java:47)
    at org.mule.impl.model.DefaultMuleProxy.run(DefaultMuleProxy.java:470)
    at org.mule.impl.work.WorkerContext.run(WorkerContext.java:310)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1486)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:391)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:865)
    at org.mule.impl.work.ScheduleWorkExecutor.doExecute(ScheduleWorkExecutor.java:39)
    at org.mule.impl.work.MuleWorkManager.executeWork(MuleWorkManager.java:277)
    at org.mule.impl.work.MuleWorkManager.scheduleWork(MuleWorkManager.java:244)
    at org.mule.impl.model.seda.SedaComponent.run(SedaComponent.java:483)
    at org.mule.impl.work.WorkerContext.run(WorkerContext.java:310)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)

Below is a segment of the pipeline job JSON:

   } ], [ "org.labkey.api.pipeline.RecordedAction", {
      "_inputs" : [ "java.util.LinkedHashSet", [ [ "org.labkey.api.pipeline.RecordedAction$DataFile", {
        "_uri" : "file:\/\/\/home\/groups\/WallLab\/primeseq\/8\/@files\/sequenceImportPipeline\/SequenceImport_2025-12-05_13-09-15\/UAM-46401_R1.fastq.gz",
        "_role" : "Input FASTQ",
        "_transient" : false,
        "_generated" : false
      } ], [ "org.labkey.api.pipeline.RecordedAction$DataFile", {
        "_uri" : "file:\/\/\/home\/groups\/WallLab\/primeseq\/8\/@files\/sequenceImportPipeline\/SequenceImport_2025-12-05_13-09-15\/UAM-46401_R2.fastq.gz",
        "_role" : "Input FASTQ",
        "_transient" : false,
        "_generated" : false
      } ] ] ],
      "_outputs" : [ "java.util.LinkedHashSet", [ [ "org.labkey.api.pipeline.RecordedAction$DataFile", {
        "_uri" : "file:\/\/\/home\/exacloud\/gscratch\/prime-seq\/workDir\/10f37793-b6a1-103e-82cb-f619086abb6f\/SequenceA.work\/adapters.fasta",
        "_role" : "FASTQ Processing Output",
        "_transient" : false,
        "_generated" : false
      } ], [ "org.labkey.api.pipeline.RecordedAction$DataFile", {
        "_uri" : "file:\/\/\/home\/exacloud\/gscratch\/prime-seq\/workDir\/10f37793-b6a1-103e-82cb-f619086abb6f\/SequenceA.work\/UAM-46401_R1\/Preprocessing\/UAM-46401_R1.Trimmomatic.fastq.gz",
        "_role" : "Processed FASTQ",
        "_transient" : false,
        "_generated" : false
      } ], [ "org.labkey.api.pipeline.RecordedAction$DataFile", {
        "_uri" : "file:\/\/\/home\/exacloud\/gscratch\/prime-seq\/workDir\/10f37793-b6a1-103e-82cb-f619086abb6f\/SequenceA.work\/UAM-46401_R1\/Preprocessing\/UAM-46401_R2.Trimmomatic.fastq.gz",
        "_role" : "Processed FASTQ",
        "_transient" : false,
        "_generated" : false
      } ] ] ],

Here is some longer background, which I'm writing in case LabKey decides to further tighten up security on cross-container files. I am all for this, but we have some longstanding use-cases I hope can be considered:

We have PipelineJobs that execute on a cluster, where the working directory is a completely different filesystem from the LabKey server. Inputs and outputs are recorded using those filepaths, which are saved into RecordedActions. That is serialized back to the server, which processes it into the Experiment/ExpRun system. From a permission perspective, I would be supportive of having a site- or container-level mechanism for the admin to provide an inclusion list of allowable file paths. This would be a relatively easy thing to administer (they do not change much), and would be a step forward on security.

In addition to instances where these reference a file outside of the LK file root, there are longstanding use cases where a file is referenced across containers. This includes a file in the /Shared container (which is commonly a special-case for permissions). This also includes instances where a workbook references a file in the parent container, or across sibling workbooks. Workbook/Parent relationships are also longstanding allowable cross-container interactions. When thinking about how to enforce filesystem permissions, I would suggest taking notes from how Query works (and also ContainerFilter). The problem of deciding what query rows to show is conceptually similar to what pipeline roots should or should not be allowable.

@bbimber
Copy link
Collaborator Author

bbimber commented Dec 12, 2025

Also - my user cannot create branches on the LK side of the fork. Would someone be able to copy this feature branch here so that TeamCity will run?

@labkey-adam
Copy link
Contributor

Also - my user cannot create branches on the LK side of the fork. Would someone be able to copy this feature branch here so that TeamCity will run?

Feature branch has been created and tests are running.

@XingY
Copy link
Contributor

XingY commented Dec 13, 2025

Code scan passed: #7264

@bbimber bbimber merged commit 0be8652 into LabKey:release25.11-SNAPSHOT Dec 13, 2025
8 checks passed
@bbimber
Copy link
Collaborator Author

bbimber commented Dec 13, 2025

Thanks @XingY.

I just saw your other PR with the Copilot comment about a null check - do you want me to add this? https://github.com/LabKey/platform/pull/7264/changes

@bbimber bbimber deleted the 25.11_fb_expdata_lsid branch December 13, 2025 22:27
@XingY
Copy link
Contributor

XingY commented Dec 14, 2025

Thanks @XingY.

I just saw your other PR with the Copilot comment about a null check - do you want me to add this? https://github.com/LabKey/platform/pull/7264/changes

Sure. thanks for doing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants