HDDS-14043. Fix ls -e UnsupportedOperationException on ofs/o3fs #10209

Open
chihsuan wants to merge 8 commits into apache:master from chihsuan:HDDS-14043

@chihsuan chihsuan commented May 7, 2026

What changes were proposed in this pull request?

Why

ozone fs -ls -e <path> fails with "UnsupportedOperationException: FileSystem ofs://om does not support Erasure Coding" against any Ozone cluster, on both ofs:// and o3fs://. The message is misleading, because Ozone does support EC. The real cause: Hadoop's upstream Ls -e reads ContentSummary.getErasureCodingPolicy() and rejects null, and Ozone's filesystem implementations never set the field.
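The failure mode can be modelled in a few lines. This is a simplified sketch, not Hadoop's actual Ls source: SummarySketch and LsEcCheckSketch are hypothetical stand-ins, and only the null check and the message text are taken from the description above.

```java
/** Hypothetical stand-in for a content summary; models only the EC-policy field. */
final class SummarySketch {
  // Stays null when the filesystem never populated the field (the pre-fix Ozone behaviour).
  private final String erasureCodingPolicy;

  SummarySketch(String erasureCodingPolicy) {
    this.erasureCodingPolicy = erasureCodingPolicy;
  }

  String getErasureCodingPolicy() {
    return erasureCodingPolicy;
  }
}

/** Hypothetical model of the "-ls -e" check: a null policy is treated as "EC unsupported". */
final class LsEcCheckSketch {
  private LsEcCheckSketch() { }

  static String ecColumn(String fsUri, SummarySketch summary) {
    String policy = summary.getErasureCodingPolicy();
    if (policy == null) {
      throw new UnsupportedOperationException(
          "FileSystem " + fsUri + " does not support Erasure Coding");
    }
    // Any non-null value, including the empty string, is accepted and printed.
    return policy;
  }
}
```

Under this model, any non-null value (including "") is enough to get past the check, which is why setting the field at all removes the exception.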

What

Populate the EC-policy field on the ContentSummary returned by both ofs:// and o3fs://, mirroring HDFS's ContentSummaryComputationContext.getErasureCodingPolicyName(INode):

  • Files
    • Replicated for non-EC, the canonical EC scheme name (e.g. rs-3-2-1024k) for EC.
  • Directories
    • When listing a volume, each bucket entry reports the bucket's EC scheme name (if EC-configured) or "" otherwise.
    • Descendant directories follow the underlying OmKeyInfo's replication config: FSO bucket directories carry their own config (so an EC-configured intermediate dir reports its scheme); OBS/LEGACY synthetic directories report "" because they have no real key entry.
    • The OFS root, volumes, and snapshot indicators report "".
    • Known gap not addressed by this PR: direct fs.getFileStatus(bucketPath) / fs.getContentSummary(bucketPath) go through OzoneBucket#getFileStatus(""), whose synthesized OmKeyInfo carries OM's default replication config rather than the bucket's. As a result, these calls report "" for EC buckets even though listing the parent volume reports the bucket's EC scheme correctly. Aligning the two paths is a follow-up; this PR fixes the original UnsupportedOperationException by setting the field at all (non-null).

The reported policy is always for the queried path itself, not aggregated from descendants.
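The rules above reduce to a small mapping. The following is a minimal sketch under stated assumptions: EcPolicyNameSketch and policyName are hypothetical names, and the EC scheme is passed as a plain string (null when the path has no EC config) rather than as Ozone's ReplicationConfig.

```java
/** Illustrative helper only; class and method names are hypothetical, not Ozone code. */
final class EcPolicyNameSketch {
  static final String REPLICATED = "Replicated";
  static final String NONE = "";

  private EcPolicyNameSketch() { }

  /**
   * @param isFile          true for a file, false for any directory-like entry
   * @param ecSchemeOrNull  canonical EC scheme name (e.g. "rs-3-2-1024k"), or null when
   *                        the entry is not EC-configured or has no real key behind it
   */
  static String policyName(boolean isFile, String ecSchemeOrNull) {
    if (ecSchemeOrNull != null) {
      // EC files, EC buckets listed under a volume, and EC-configured FSO directories.
      return ecSchemeOrNull;
    }
    // Non-EC files report "Replicated"; non-EC and synthetic directories report "".
    return isFile ? REPLICATED : NONE;
  }

  public static void main(String[] args) {
    System.out.println(policyName(true, null));            // Replicated
    System.out.println(policyName(true, "rs-3-2-1024k"));  // rs-3-2-1024k
    System.out.println("[" + policyName(false, null) + "]"); // []
  }
}
```

In this sketch the result is never null, so every case passes the null check in Ls -e.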

How

The policy is plumbed through a new ecPolicy field on FileStatusAdapter — per-key ReplicationConfig for real keys, the bucket's own ReplicationConfig for the synthetic bucket / bucket-snapshot adapters used when listing a volume, and "" for synthetic root / volume / snapshot-indicator adapters. getContentSummary then sets the field on the ContentSummary.Builder using the path's own FileStatusAdapter. BasicOzoneFileSystem also gains a getContentSummary override so o3fs no longer falls through to the FileSystem default, which left the field null.

ContentSummary.Builder.erasureCodingPolicy(String) does not exist in Hadoop 2.10.2, so the new builder call is isolated behind a protected applyEcPolicy hook, overridden only in the Hadoop 3 subclasses (ozonefs/, ozonefs-hadoop3/). ozonefs-hadoop2 inherits a no-op default — and Hadoop 2.10.2's Ls has no -e flag anyway, so there is no functional regression.
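The hook pattern described here can be sketched as follows. All class names are hypothetical, and the applyEcPolicy signature is an assumption; the PR's real signature is not shown in this description. SummaryBuilderSketch stands in for ContentSummary.Builder and has the setter in this single-file model, whereas in the real build the Hadoop 2 builder lacks it, which is why only the subclass may call it.

```java
/** Hypothetical stand-in for ContentSummary.Builder; models only the EC-policy field. */
final class SummaryBuilderSketch {
  private String ecPolicy; // stays null unless a hook sets it

  SummaryBuilderSketch erasureCodingPolicy(String policy) {
    this.ecPolicy = policy;
    return this;
  }

  String builtEcPolicy() {
    return ecPolicy;
  }
}

/** Models the shared code that must compile against both Hadoop 2 and Hadoop 3. */
class CommonFsSketch {
  /** Default hook is a no-op, so shared code never references the Hadoop 3-only setter. */
  protected void applyEcPolicy(SummaryBuilderSketch builder, String ecPolicy) {
  }

  final String summarize(String ecPolicy) {
    SummaryBuilderSketch builder = new SummaryBuilderSketch();
    applyEcPolicy(builder, ecPolicy); // the single extension point
    return builder.builtEcPolicy();
  }
}

/** Models a Hadoop 3 subclass, the only place that calls the newer builder method. */
class Hadoop3FsSketch extends CommonFsSketch {
  @Override
  protected void applyEcPolicy(SummaryBuilderSketch builder, String ecPolicy) {
    if (ecPolicy != null) { // null-guard, as the commit notes describe for the override
      builder.erasureCodingPolicy(ecPolicy);
    }
  }
}
```

With this shape, new CommonFsSketch().summarize("rs-3-2-1024k") leaves the field null (the Hadoop 2 behaviour), while new Hadoop3FsSketch().summarize("rs-3-2-1024k") returns the scheme name.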

The PR is split into three commits (plumbing → fix → tests) for review; squash on merge as usual.

Notes for reviewers

There is some repeated shape between the two getContentSummary methods and across the four toFileStatusAdapter call sites (single-file ContentSummary.Builder block + the ecPolicy ternary). I have kept it as-is to keep this PR scoped to the -ls -e fix — happy to extract helpers in this PR if reviewers prefer, or to address it as a follow-up ticket.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14043

How was this patch tested?

  • New regression tests in AbstractOzoneFileSystemTest and AbstractRootedOzoneFileSystemTest exercise the fix end-to-end through TestO3FS, TestO3FSWithFSO, TestOFS, and TestOFSWithFSO. testLsDashEDoesNotThrow runs Hadoop FsShell directly with -ls -e and asserts return code 0 — the literal regression guard for the original exception. testContentSummaryErasureCodingPolicy asserts the file/directory values described above.
  • All four ozonefs* modules compile cleanly. ozonefs-hadoop2 compiles ozonefs-common against the Hadoop 2.10.2 classpath, which is the load-bearing check that the new code path doesn't reference any Hadoop 3-only API.
  • Manually reproduced the original bug, then confirmed the fix end-to-end against compose/ozone:
# Build the dist with the fix
mvn clean install -DskipTests -DskipShade -DskipRecon -Pdist -q

# Bring up the compose cluster
cd hadoop-ozone/dist/target/ozone-*-SNAPSHOT/compose/ozone
OZONE_REPLICATION_FACTOR=3 docker compose up -d

# Create a volume, bucket, and a single key (single-DN cluster -> RATIS/ONE)
docker compose exec -T scm ozone sh volume create /vol1
docker compose exec -T scm ozone sh bucket create /vol1/buck1
docker compose exec -T scm bash -c \
  'echo hello > /tmp/hi && ozone sh key put -t RATIS -r ONE /vol1/buck1/file1 /tmp/hi'

# Each of these threw UnsupportedOperationException before the fix; all return 0 and
# print a normal listing afterwards (file rows show "Replicated" in the EC-policy
# column; directory rows show an empty cell, matching HDFS).
docker compose exec -T scm bash -c 'ozone fs -ls -e /vol1/buck1/'
docker compose exec -T scm bash -c 'ozone fs -ls -R -e /vol1/buck1'
docker compose exec -T scm bash -c 'ozone fs -ls -e o3fs://buck1.vol1/'

docker compose down -v

chihsuan added 3 commits May 7, 2026 21:58
Adds a nullable erasureCodingPolicy field to FileStatusAdapter and
populates it from each key's ReplicationConfig (canonical EC scheme
name when EC, "Replicated" otherwise) in BasicOzoneClientAdapterImpl
and BasicRootedOzoneClientAdapterImpl. Synthetic adapters for buckets
and bucket snapshots derive the policy from the bucket's own
ReplicationConfig instead of hardcoding "Replicated", which previously
contradicted the existing isErasureCoded flag for EC buckets.

The 15-arg FileStatusAdapter constructor is preserved as a back-compat
overload that delegates with a null policy. No callers read the new
field yet; that change follows.

Hadoop's "fs -ls -e" reads ContentSummary.getErasureCodingPolicy() and
throws UnsupportedOperationException when it is null. ofs and o3fs were
returning a ContentSummary without the field set, producing the
misleading "FileSystem ofs://om does not support Erasure Coding" error
on every "-ls -e" against an Ozone cluster.

BasicOzoneFileSystem and BasicRootedOzoneFileSystem now set the policy
on the builder using the path's own FileStatusAdapter, matching HDFS's
"policy of the nearest ancestor" semantic rather than aggregating
descendants. BasicOzoneFileSystem also gains a getContentSummary
override so o3fs no longer falls through to the FileSystem default
(which left the field null).

The Builder.erasureCodingPolicy(String) call does not exist on Hadoop
2.10.2's ContentSummary.Builder, so it is isolated behind a protected
applyEcPolicy hook overridden only in the Hadoop 3 subclasses
(ozonefs and ozonefs-hadoop3). ozonefs-hadoop2 inherits a no-op
default and is unaffected; Hadoop 2.10.2 has no "-ls -e" flag anyway.

Adds two tests to each of AbstractOzoneFileSystemTest and
AbstractRootedOzoneFileSystemTest:

- testContentSummaryErasureCodingPolicy verifies a Ratis file reports
  "Replicated" and an EC file reports the canonical scheme name (e.g.
  rs-3-2-1024k); on rooted ofs the parent directory of a mixed listing
  also reports the bucket's policy.
- testLsDashEDoesNotThrow runs Hadoop FsShell with "-ls -e" against
  the bucket and asserts return code 0 - the literal regression guard
  for the original UnsupportedOperationException.

Coverage runs through TestO3FS, TestO3FSWithFSO, TestOFS and
TestOFSWithFSO.
@peterxcli peterxcli requested a review from jojochuang May 7, 2026 14:24
chihsuan added 3 commits May 7, 2026 22:32
- Drop redundant inline `no-op` comment in default applyEcPolicy;
  the Javadoc above already explains the Hadoop 2/3 split.
- Drop dead `ecPolicy == null` coercion in getContentSummary; every
  FileStatusAdapter producer this PR touches sets a non-null string
  and the Hadoop 3 override already null-guards.
- Replace `RandomUtils.secure().randomBytes(1)` test fillers with
  `new byte[]{0}`; remove now-unused import in the o3fs test.
- Add `testContentSummaryErasureCodingPolicyOnEcBucket` exercising
  the synthetic-bucket-adapter EC branch (previously unasserted).
@chihsuan chihsuan marked this pull request as ready for review May 7, 2026 15:25
@chihsuan chihsuan marked this pull request as draft May 7, 2026 15:25
@chihsuan chihsuan marked this pull request as ready for review May 8, 2026 14:07