NIFI-15758: Add fragment attribute support to UnpackContent and remov… #11058

Open
Scrooge-McDucks wants to merge 1 commit into apache:main from Scrooge-McDucks:NIFI-15758

Scrooge-McDucks commented Mar 27, 2026

…e fragment attributes from MergeContent in Defragment mode

Summary

This change adds optional fragment attribute support to UnpackContent so unpacked FlowFiles can be regrouped downstream using MergeContent in Defragment mode.

It also updates MergeContent to remove reassembly-related attributes from the merged FlowFile once defragmentation has completed successfully, including:

  • fragment.identifier
  • fragment.index
  • fragment.count
  • segment.original.filename

Motivation

A common dataflow pattern is:

  1. Data is packaged to optimise transport
  2. UnpackContent extracts individual FlowFiles
  3. Files are enriched or transformed independently
  4. Files are regrouped and repackaged

This works well conceptually, but today UnpackContent does not provide a built-in way to assign the fragment attributes needed for downstream reassembly across formats such as ZIP, TAR, and FlowFile Package.

Without those attributes, users need custom logic to preserve grouping and ordering, which adds complexity and can lead to inconsistent behaviour.

This change makes that workflow easier by allowing UnpackContent to optionally generate fragment attributes, while ensuring MergeContent removes the temporary reassembly metadata once the final merged FlowFile has been produced.

Changes Included

UnpackContent

Added optional support for assigning fragment attributes to unpacked FlowFiles.

New Properties

Add Fragment Attributes

  • When enabled, assigns:
    • fragment.identifier
    • fragment.index
    • fragment.count
    • segment.original.filename

Fragment Identifier Value

  • Specifies the value used for fragment.identifier
  • Supports Expression Language evaluated against the incoming packed FlowFile
  • Evaluated once per source FlowFile, with the resulting value applied to all unpacked FlowFiles derived from that source
  • Default: ${UUID()}

Examples:

  • ${UUID()} for a unique grouping per archive (default)
  • ${filename} for grouping based on the original filename
  • ${archive.filename} when an explicit archive attribute is available

Behaviour

When enabled:

  • All FlowFiles produced from a single archive share the same fragment.identifier
  • fragment.index is assigned based on entry order within the archive
  • fragment.count is set to the total number of unpacked entries
  • The identifier expression is evaluated once per parent FlowFile

When disabled:

  • No change to current UnpackContent behaviour
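
The enabled behaviour described above can be sketched in plain Java, without the NiFi API. This is only an illustration of the attribute assignment, not the code in this PR: the class and method names are hypothetical, FlowFile attributes are modelled as simple maps, and the identifier expression is modelled as a function of the parent's attributes. It assumes a 0-based fragment.index (as in the example table in this description) and assumes segment.original.filename carries the parent's filename.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: models the attribute assignment only, not the NiFi processor.
final class FragmentAttributeSketch {

    // Attribute names as listed in this PR description.
    static final String FRAGMENT_ID = "fragment.identifier";
    static final String FRAGMENT_INDEX = "fragment.index";
    static final String FRAGMENT_COUNT = "fragment.count";
    static final String SEGMENT_ORIGINAL_FILENAME = "segment.original.filename";

    static List<Map<String, String>> assignFragmentAttributes(
            Map<String, String> parentAttributes,
            List<String> entryNames,
            Function<Map<String, String>, String> identifierExpression) {

        // Evaluate the identifier once per parent so every child shares the same value.
        final String identifier = identifierExpression.apply(parentAttributes);
        final String count = String.valueOf(entryNames.size());

        final List<Map<String, String>> children = new ArrayList<>();
        for (int i = 0; i < entryNames.size(); i++) {
            final Map<String, String> attributes = new LinkedHashMap<>();
            attributes.put("filename", entryNames.get(i));
            attributes.put(FRAGMENT_ID, identifier);
            // Index follows entry order within the archive (0-based is an assumption).
            attributes.put(FRAGMENT_INDEX, String.valueOf(i));
            attributes.put(FRAGMENT_COUNT, count);
            // Assumption: the parent's filename is recorded here.
            attributes.put(SEGMENT_ORIGINAL_FILENAME, parentAttributes.get("filename"));
            children.add(attributes);
        }
        return children;
    }

    public static void main(String[] args) {
        final Map<String, String> parent = Map.of("filename", "archive.zip");
        final List<String> entries = List.of("file1.txt", "file2.txt", "file3.txt");
        // Stand-in for the ${filename} expression.
        assignFragmentAttributes(parent, entries, a -> a.get("filename"))
                .forEach(System.out::println);
    }
}
```

Evaluating the expression before the loop is what keeps a value such as ${UUID()} stable across all children of one archive; evaluating it per entry would give each child a different identifier and break regrouping.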

MergeContent

Updated MergeContent so that after a successful defragmentation, the merged FlowFile no longer retains temporary reassembly metadata.

When operating in Defragment mode, the merged FlowFile now removes:

  • fragment.identifier
  • fragment.index
  • fragment.count
  • segment.original.filename

This ensures the final merged output reflects the completed repackaged artifact rather than the intermediate fragmentation state used to drive regrouping.
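
As a rough illustration of the cleanup step, the following sketch removes the four reassembly attributes from a map of merged attributes. It is a plain-Java model under stated assumptions: the class and method names are hypothetical, attributes are represented as a map, and the actual change in MergeContent is not shown in this description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: models the attribute cleanup only, not the MergeContent code.
final class DefragmentCleanupSketch {

    // The reassembly attributes this PR description says are removed.
    static final Set<String> REASSEMBLY_ATTRIBUTES = Set.of(
            "fragment.identifier",
            "fragment.index",
            "fragment.count",
            "segment.original.filename");

    // Returns a copy of the merged attributes without the reassembly metadata.
    static Map<String, String> removeReassemblyAttributes(Map<String, String> mergedAttributes) {
        final Map<String, String> cleaned = new LinkedHashMap<>(mergedAttributes);
        cleaned.keySet().removeAll(REASSEMBLY_ATTRIBUTES);
        return cleaned;
    }

    public static void main(String[] args) {
        final Map<String, String> merged = new LinkedHashMap<>();
        merged.put("filename", "repackaged.zip");
        merged.put("fragment.identifier", "archive.zip");
        merged.put("fragment.index", "0");
        merged.put("fragment.count", "3");
        merged.put("segment.original.filename", "archive.zip");
        System.out.println(removeReassemblyAttributes(merged));
    }
}
```

All other attributes, such as filename, are left untouched; only the four keys named above are dropped.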

Compatibility

  • Fully backward compatible
  • Fragment attribute generation in UnpackContent is opt-in
  • Existing flows are unchanged unless the new property is enabled
  • The MergeContent cleanup only applies after successful defragmentation

Example

Input:

  • archive.zip containing 3 files

Unpack output when enabled with ${filename} as the identifier:

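| filename  | fragment.identifier | fragment.index | fragment.count |
|-----------|---------------------|----------------|----------------|
| file1.txt | archive.zip         | 0              | 3              |
| file2.txt | archive.zip         | 1              | 3              |
| file3.txt | archive.zip         | 2              | 3              |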
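
To show why these three attributes are enough for downstream regrouping, here is a simplified plain-Java sketch of the grouping idea behind defragmentation: bin by fragment.identifier, require fragment.count members, and order by fragment.index. It is not MergeContent's implementation; the class and method names are hypothetical, and timeouts, failure routing, and duplicate-index handling are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: simplified regrouping, not the MergeContent implementation.
final class DefragmentGroupingSketch {

    // Returns, per identifier, the filenames of complete groups in fragment.index order.
    static Map<String, List<String>> regroup(List<Map<String, String>> flowFiles) {
        // Bin incoming attribute maps by their shared identifier.
        final Map<String, List<Map<String, String>>> bins = new LinkedHashMap<>();
        for (Map<String, String> attributes : flowFiles) {
            final String identifier = attributes.get("fragment.identifier");
            if (identifier == null) {
                continue; // Not a fragment; ignored in this sketch.
            }
            bins.computeIfAbsent(identifier, key -> new ArrayList<>()).add(attributes);
        }

        final Map<String, List<String>> complete = new LinkedHashMap<>();
        for (Map.Entry<String, List<Map<String, String>>> bin : bins.entrySet()) {
            final List<Map<String, String>> fragments = bin.getValue();
            final int expected = Integer.parseInt(fragments.get(0).get("fragment.count"));
            if (fragments.size() != expected) {
                continue; // Incomplete group; a real flow would keep waiting or fail it.
            }
            // Restore the original entry order.
            fragments.sort(Comparator.comparingInt(
                    (Map<String, String> a) -> Integer.parseInt(a.get("fragment.index"))));
            final List<String> names = new ArrayList<>();
            for (Map<String, String> fragment : fragments) {
                names.add(fragment.get("filename"));
            }
            complete.put(bin.getKey(), names);
        }
        return complete;
    }

    public static void main(String[] args) {
        final List<Map<String, String>> arrived = List.of(
                Map.of("filename", "file3.txt", "fragment.identifier", "archive.zip",
                        "fragment.index", "2", "fragment.count", "3"),
                Map.of("filename", "file1.txt", "fragment.identifier", "archive.zip",
                        "fragment.index", "0", "fragment.count", "3"),
                Map.of("filename", "file2.txt", "fragment.identifier", "archive.zip",
                        "fragment.index", "1", "fragment.count", "3"));
        System.out.println(regroup(arrived));
    }
}
```

Because independent processing can reorder FlowFiles, the sketch feeds the fragments in shuffled order; the index attribute is what allows the original entry order to be recovered.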

After processing and successful defragmentation in MergeContent, the merged FlowFile no longer retains:

  • fragment.identifier
  • fragment.index
  • fragment.count
  • segment.original.filename

Summary

NIFI-15758

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files
