
HDDS-15181. Added Robot smoketests for snapshot defrag on single-node compose #10200

Open
arunsarin85 wants to merge 2 commits into apache:master from arunsarin85:HDDS-15181


@arunsarin85 (Contributor)

What changes were proposed in this pull request?

  • Add snapshot/snapshot-defrag.robot: Robot Framework tests that exercise snapshot behavior while the OM is configured for periodic snapshot defrag in the non-secure compose/ozone environment.

  • Set ozone.snapshot.defrag.service.interval on the OM service only, in docker-compose.yaml, to enable periodic defrag.

Please describe your PR in detail:
The tests cover snapshot defrag basics from the user-visible side: reads, listing, diff, and delete, with waits so that background defrag can run.

Scenarios in snapshot-defrag.robot (8 tests):

  1. Read snapshot data right after create - Reading the key through the new snapshot’s .snapshot path returns the content of the stored file (/etc/hosts).
  2. After waiting, snapshot and live bucket still match - Add another key to the live bucket and wait ~65s; the first snapshot still has the old content, and the live key matches /etc/passwd.
  3. Snapshot list still shows active - ozone sh snapshot ls lists the snapshot with status SNAPSHOT_ACTIVE.
  4. Second snapshot sees all keys so far - Add a third key and create a second snapshot; the older snapshot still reflects only the first key, while the newer snapshot can read all three keys.
  5. snapshot diff starts a new job - Running snapshot diff between the two snapshots from the CLI prints the usual “new job” / --get-report message.
  6. snapshot diff JSON lists added keys - snapshot diff --get-report --json completes with status DONE and lists the keys added after the first snapshot.
  7. Same JSON diff after another wait - After a second ~65s wait, run the JSON diff again and expect the same key paths in the report (checking stability after more defrag time).
  8. Delete older snapshot, younger one still readable - Delete the first snapshot; the listing shows SNAPSHOT_DELETED, and the keys are still readable through the second snapshot’s path.
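The user-visible invariants these scenarios check can be summarized in a toy in-memory model. This is a hypothetical Python sketch, not Ozone's implementation. It ignores defrag, the OM, and on-disk storage entirely. The class, method, key, and value names are illustrative assumptions.

The model encodes only what the tests expect to stay true while defrag runs in the background:
- A snapshot freezes the key set at creation time, so later writes to the live bucket do not leak into it.
- A diff lists the keys added between two snapshots.
- Deleting an older snapshot leaves a younger one readable.

```python
from typing import Dict, List


class ToyBucket:
    """Hypothetical in-memory model of the snapshot invariants the scenarios check.

    Not Ozone's implementation: there is no defrag, no OM, and no on-disk state here.
    """

    def __init__(self) -> None:
        self.live: Dict[str, str] = {}                  # keys in the live bucket
        self.snapshots: Dict[str, Dict[str, str]] = {}  # snapshot name -> frozen keys
        self.status: Dict[str, str] = {}                # snapshot name -> status string

    def put(self, key: str, value: str) -> None:
        self.live[key] = value

    def create_snapshot(self, name: str) -> None:
        # Copy the live key set so later writes do not leak into the snapshot.
        self.snapshots[name] = dict(self.live)
        self.status[name] = "SNAPSHOT_ACTIVE"

    def read(self, snapshot: str, key: str) -> str:
        # Reads are only allowed through snapshots that are still active.
        if self.status.get(snapshot) != "SNAPSHOT_ACTIVE":
            raise KeyError(f"snapshot {snapshot!r} is not active")
        return self.snapshots[snapshot][key]

    def diff_added(self, older: str, newer: str) -> List[str]:
        # Keys present in the newer snapshot but not in the older one.
        return sorted(set(self.snapshots[newer]) - set(self.snapshots[older]))

    def delete_snapshot(self, name: str) -> None:
        # Mark as deleted; other snapshots keep their own frozen copies.
        self.status[name] = "SNAPSHOT_DELETED"


if __name__ == "__main__":
    bucket = ToyBucket()
    bucket.put("key1", "v1")
    bucket.create_snapshot("snap1")
    bucket.put("key2", "v2")
    bucket.put("key3", "v3")
    bucket.create_snapshot("snap2")
    print(bucket.diff_added("snap1", "snap2"))  # ['key2', 'key3']
```

In this model the copy taken at snapshot creation is what makes scenarios 2, 4, and 8 hold. The real system has to preserve the same observable behavior while defrag rewrites snapshot data underneath.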

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15181

How was this patch tested?

Robot / Docker: ran execute_robot_test scm snapshot/snapshot-defrag.robot against hadoop-ozone/dist/target/ozone-*/compose/ozone after building with mvn clean package -DskipTests -Pdist (with the correct /opt/hadoop mount).

Attached: an image and robot-001.xml.

Review comment threads on hadoop-ozone/dist/src/main/compose/ozone/docker-compose.yaml (outdated) and hadoop-ozone/dist/src/main/smoketest/snapshot/snapshot-defrag.robot (one current, one outdated). The comments below are attached to these lines of snapshot-defrag.robot:

```robotframework
... checkpointDir / status) and re-read keys through snapshot paths. We do not rerun snapshot
... diff --get-report here: a completed diff report is served from cache for
... ozone.om.snapshot.diff.job.report.persistent.time, so that call would not retrigger work.
Sleep ${DEFRAG_WAIT_SECONDS}
```
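The caching behavior described in that comment can be sketched as a report store with a persistence window. This is a conceptual Python model under stated assumptions, not Ozone's code. ToyDiffReportCache, its fields, and the injectable clock are hypothetical. The window simply stands in for ozone.om.snapshot.diff.job.report.persistent.time.

A second request for the same snapshot pair inside the window returns the stored report without recomputing it. That is why rerunning --get-report after a wait would not exercise any new diff work.

```python
import time
from typing import Callable, Dict, List, Tuple


class ToyDiffReportCache:
    """Hypothetical model of a diff-report cache with a persistence window.

    Not Ozone's code; it only illustrates that a completed report is reused
    while it is younger than the configured persistence time.
    """

    def __init__(self, persistent_time_s: float,
                 compute: Callable[[str, str], List[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.persistent_time_s = persistent_time_s
        self.compute = compute      # produces the diff report for a snapshot pair
        self.clock = clock          # injectable so the window can be simulated
        self.entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}
        self.computations = 0       # how many times real diff work was done

    def get_report(self, older: str, newer: str) -> List[str]:
        now = self.clock()
        cached = self.entries.get((older, newer))
        if cached is not None and now - cached[0] < self.persistent_time_s:
            return cached[1]        # still inside the window: no new diff work
        self.computations += 1
        report = self.compute(older, newer)
        self.entries[(older, newer)] = (now, report)
        return report
```

In this model, a rerun of the diff only shows whether the cached report is still served. It says nothing new about snapshot data after additional defrag time.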

Could you add a step around here to check the YAML to confirm defrag completion, something like this:

```robotframework
*** Keywords ***
Get Snapshot Local YAML Path
    [Arguments]    ${snapshot_name}
    # Pipe the snapshot info JSON straight into jq instead of echoing it back through a quoted string
    ${snapshot_id} =   Execute    ozone sh snapshot info /${VOLUME}/${BUCKET} ${snapshot_name} | jq -r '.snapshotId'
    [Return]           /data/metadata/db.snapshots/checkpointState/om.db-${snapshot_id}.yaml

Snapshot Local YAML Should Show Defragged
    [Arguments]    ${snapshot_name}
    ${yaml} =          Get Snapshot Local YAML Path    ${snapshot_name}
    # The local YAML file must exist before its fields can be checked
    Execute            test -f ${yaml}
    ${version} =       Execute    awk '/^version:/ {print $2}' ${yaml}
    ${needs_defrag} =  Execute    awk '/^needsDefrag:/ {print $2}' ${yaml}
    # A defragged snapshot has a version above 0 and no pending defrag
    Should Be True     ${version} > 0
    Should Be Equal    ${needs_defrag}    false
```
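The same two field checks can also live in a Python keyword library. Robot Framework exposes module-level functions as keywords, so snapshot_yaml_shows_defragged becomes Snapshot Yaml Shows Defragged. This is a hedged sketch with some caveats:
- The function names are hypothetical.
- The parser only mimics the awk one-liners above (second field of unindented `version:` / `needsDefrag:` lines); it does not parse YAML fully.
- It assumes the YAML file is readable from wherever the check runs, which may not hold if the file exists only on the OM container's local disk.

```python
from pathlib import Path
from typing import Dict


def read_top_level_fields(text: str) -> Dict[str, str]:
    """Return {name: value} for unindented `name: value` lines.

    Mimics `awk '/^name:/ {print $2}'`: the value is the second whitespace-separated
    field, and only the first occurrence of each name is kept. Not a full YAML parser.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and not line[0].isspace() and parts[0].endswith(":"):
            fields.setdefault(parts[0][:-1], parts[1])
    return fields


def yaml_text_shows_defragged(text: str) -> bool:
    """True when `version` is a positive integer and `needsDefrag` is `false`."""
    fields = read_top_level_fields(text)
    try:
        version = int(fields.get("version", "0"))
    except ValueError:
        return False  # a non-numeric version cannot satisfy `version > 0`
    return version > 0 and fields.get("needsDefrag") == "false"


def snapshot_yaml_shows_defragged(path: str) -> bool:
    """Hypothetical keyword: read the snapshot's local YAML file and apply the checks."""
    return yaml_text_shows_defragged(Path(path).read_text())
```

In a .robot file, importing the module with `Library    defrag_checks.py` (a hypothetical file name) would make Snapshot Yaml Shows Defragged callable with the path returned by Get Snapshot Local YAML Path.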


With this check in place, you can poll instead of waiting a fixed amount of time, so the test finishes sooner.
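In Robot itself, polling can be expressed with the BuiltIn keyword `Wait Until Keyword Succeeds`, which retries a keyword until it passes or its retry limit is reached. For example, it could wrap Snapshot Local YAML Should Show Defragged with a timeout and retry interval chosen for the test.

The loop it performs is sketched below in Python as a generic deadline-based poll. wait_until and its parameters are hypothetical names. The clock and sleep hooks exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float, interval_s: float,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll `predicate` every `interval_s` seconds until it returns True.

    Returns the elapsed time in seconds, or raises TimeoutError once `timeout_s`
    has passed without the condition being met.
    """
    start = clock()
    while True:
        if predicate():
            return clock() - start       # condition met: stop waiting immediately
        if clock() - start >= timeout_s:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        sleep(interval_s)                # back off before the next check
```

Compared with a fixed Sleep, the test returns as soon as the predicate holds. It pays the full timeout only when defrag never completes, and in that case it fails explicitly instead of silently moving on.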
