TIKA-4705 -- resourceName of nested tarball should not contain the pa…#2730

Draft
iachimoe wants to merge 1 commit into apache:branch_3x from iachimoe:feat/TIKA-4705_draft

Conversation

iachimoe commented Apr 2, 2026

A detailed description of the issue is at https://issues.apache.org/jira/browse/TIKA-4705

if (StringUtils.isBlank(name)) {
    return;
}

Contributor

Maybe use FilenameUtils.getName(name)
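For illustration, the effect of that suggestion can be sketched without any dependencies. `FilenameUtils.getName` in Apache Commons IO returns the text after the last forward slash or backslash; the `baseName` helper and the sample paths below are hypothetical stand-ins for it, not code from this PR.

```java
// Stdlib-only approximation of FilenameUtils.getName(name).
// ResourceNameSketch and baseName are hypothetical names used for illustration only.
final class ResourceNameSketch {

    /** Returns the text after the last '/' or '\', or the input itself if it has no separator. */
    static String baseName(String name) {
        if (name == null) {
            return null;
        }
        // lastIndexOf returns -1 when the separator is absent, so substring(0) keeps the whole name.
        int lastSeparator = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        return name.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("outer/inner/testTXT.txt")); // prints testTXT.txt
        System.out.println(baseName("testTXT.txt"));             // prints testTXT.txt
    }
}
```

A name that ends in a separator yields an empty string here, so a blank check like the one in the diff above would still be needed after stripping the path.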

@Test
public void testNestedTarball() throws Exception {
    List<Metadata> list = getRecursiveMetadata("test-nested-tarball.tar");
    List<String> actualInternalPaths =
Contributor

This should be actualResourceNames.

            .map(m -> m.get(TikaCoreProperties.RESOURCE_NAME_KEY))
            .collect(Collectors.toList());

    List<String> expectedInternalPaths = Arrays.asList("test-nested-tarball.tar",
Contributor

These should be just the file names, right?

            "/nested.tgz/nested.tar/testTXT.txt",
            "/nested.tgz/nested.tar",
            "/nested.tgz"), actualEmbeddedPaths);
}
Contributor

Add test for internalPaths?

There are three things we care about with this change: the resource names (which should be just the file name), the embedded resource path, and the internal path.
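As a rough sketch of the first two of those checks, the snippet below takes the embedded resource paths quoted in the test above and derives the bare file name each embedded document would be expected to report. It assumes the resource name is the last segment of the embedded resource path; the class and method names are hypothetical. Expected internal paths are left out because this conversation does not state them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not part of the Tika test suite.
final class NestedTarballExpectations {

    // Embedded resource paths copied from the assertion in the test under review.
    static final List<String> EMBEDDED_PATHS = Arrays.asList(
            "/nested.tgz/nested.tar/testTXT.txt",
            "/nested.tgz/nested.tar",
            "/nested.tgz");

    /** Assumption: the resource name is the last '/'-separated segment of the embedded path. */
    static String lastSegment(String embeddedPath) {
        return embeddedPath.substring(embeddedPath.lastIndexOf('/') + 1);
    }

    /** Derives the bare file names that the resource-name assertion would compare against. */
    static List<String> expectedResourceNames() {
        return EMBEDDED_PATHS.stream()
                .map(NestedTarballExpectations::lastSegment)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(expectedResourceNames()); // prints [testTXT.txt, nested.tar, nested.tgz]
    }
}
```

Keeping one assertion per property (resource name, embedded resource path, internal path) would make a regression in any single one show up as its own failure.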

Contributor

tballison left a comment

Please aim this against main. I'll cherry-pick it back to 3.x.

I think this makes sense and is a good catch.

2 participants