[cmisCDB] Use module-advertised MaxDuration* for CDB FW polling #664

Draft
pavannaregundi wants to merge 1 commit into sonic-net:master from pavannaregundi:cmisCDB
Conversation

@pavannaregundi
Contributor

SONiC bounds every CDB FW status poll either by a fixed 5s tCDBF sleep (CMD 0101h Start, CMD 0107h Complete) or by no delay at all (CMD 0102h/0103h/0104h).

Read the per-command max durations from the CDB 0041h reply (MaxDurationStart/Abort/Write/Complete/Copy at bytes 144-153, scaled by the MaxDurationCoding multiplier at byte 137, bit 3, per CMIS 5.0+ §9.4.2 Table 9-9) and use them as the polling ceiling in cdb1_chkstatus(). Modules that do not implement CMD 0041h fall back to the legacy 60s ceiling.
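
For illustration, the decoding described above could look roughly like the sketch below. It is not the code from this PR: the function name, the field keys, and the assumption that `rpl` is the reply LPL starting at page byte 136 are all hypothetical, and the U16 fields are assumed to be big-endian with a multiplier of 1 or 10 ms selected by MaxDurationCoding.

```python
import struct

# Assumption: `rpl` holds the CDB reply LPL, whose first byte is page byte 136.
LPL_BASE = 136
# Hypothetical key names for the five fields at bytes 144-153, in order.
DURATION_FIELDS = ("start", "abort", "write", "complete", "copy")


def parse_fw_mgmt_durations(rpl):
    """Return {field: max duration in ms}, or None if the reply is too short."""
    if rpl is None or len(rpl) < 154 - LPL_BASE:
        return None
    # Byte 137 bit 3 (MaxDurationCoding): assumed 0 -> 1 ms units, 1 -> 10 ms units.
    multiplier = 10 if rpl[137 - LPL_BASE] & 0x08 else 1
    # Five consecutive big-endian U16 values starting at byte 144.
    raw = struct.unpack_from(">5H", bytes(rpl), 144 - LPL_BASE)
    return {name: value * multiplier for name, value in zip(DURATION_FIELDS, raw)}
```

With this layout, a module that sets bit 3 of byte 137 and reports 0x01F4 in bytes 144-145 would be treated as advertising a 5000 ms MaxDurationStart.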

Changes:

  • cdb1_chkstatus(): new max_wait_ms / poll_interval_ms parameters. Default call (no args) preserves the legacy MAX_WAIT * 100 ms ceiling, so existing callers are unaffected.
  • get_fw_management_features() (CMD 0041h): on success, parse the reply LPL into self._fw_max_duration_ms. Any caller (notably cmis.get_module_fw_mgmt_feature() at the start of every firmware upgrade) pre-populates the cache, avoiding a redundant 0041h round-trip during the download phase.
  • _parse_fw_mgmt_durations(rpl), _get_fw_max_duration_ms(key): new private helpers. The getter is lazy and returns None when the module has no usable advertisement, in which case cdb1_chkstatus() falls back to the legacy ceiling.
  • start_fw_download (0101h), validate_fw_image (0107h): retain the legacy tCDBF pre-sleep to skip the foreground hold-off, then poll cdb1_chkstatus() up to MaxDurationStart / MaxDurationComplete at a coarse cadence (2s / 1s respectively) to minimise NACK events on modules whose advertised duration exceeds tCDBF.
  • abort_fw_download (0102h), block_write_lpl (0103h), block_write_epl (0104h): bound polling by the advertised MaxDurationAbort / MaxDurationWrite.
  • commit_fw_image: per the spec, no maximum duration is advertised for this command. It is assumed to be insignificant, on the order of the time needed to write administrative information to non-volatile memory (tWRITE). tWRITENV is 80ms.
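
The bounded polling with a legacy fallback described in the list above can be sketched as follows. This is a simplified stand-in, not the actual cdb1_chkstatus(): `read_status` is a hypothetical callable returning `(busy, status)`, the 60 000 ms constant mirrors the "legacy 60s ceiling" from the description, and treating a zero duration as "no usable advertisement" is an assumption.

```python
import time

# The "legacy 60s ceiling" from the description, expressed in milliseconds.
LEGACY_MAX_WAIT_MS = 60_000


def poll_until_done(read_status, max_wait_ms=None, poll_interval_ms=100,
                    sleep=time.sleep):
    """Poll read_status() until it stops reporting busy or the ceiling is hit.

    read_status is a hypothetical callable returning (busy, status).
    Returns the final status, or None if still busy when the ceiling is reached.
    """
    if not max_wait_ms:
        # None or 0 (assumed to mean no usable advertisement): legacy ceiling.
        max_wait_ms = LEGACY_MAX_WAIT_MS
    waited_ms = 0
    while True:
        busy, status = read_status()
        if not busy:
            return status
        if waited_ms >= max_wait_ms:
            return None
        sleep(poll_interval_ms / 1000.0)
        waited_ms += poll_interval_ms
```

A caller for CMD 0101h might pass the advertised MaxDurationStart as `max_wait_ms` with `poll_interval_ms=2000`, while a caller with no cached advertisement passes nothing and gets the legacy behaviour. The `sleep` parameter is injectable only so the loop can be exercised without real delays.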

Fixes: sonic-net/sonic-buildimage#26243

Description

Motivation and Context

How Has This Been Tested?

Additional Information (Optional)

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
@pavannaregundi pavannaregundi marked this pull request as draft May 11, 2026 12:08
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Development

Successfully merging this pull request may close these issues.

Bug: MCIA I2C errors during parallel fw-upgrade due to fixed CDB timing not aligned with vendor-specific requirements

2 participants