-
Notifications
You must be signed in to change notification settings - Fork 39
OCPERT-270: Add job command for running image consistency check prow job #845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@tomasdavidorg: This pull request references OCPERT-270 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@tomasdavidorg: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Code Review: Image Consistency Check Prow JobOverviewThis PR adds functionality to trigger image consistency check Prow jobs with custom payload URLs and merge request IDs. The implementation follows existing patterns in the codebase. Key Improvements Recommended1. Replace Fixed Sleep with Polling Logic ⭐Current Issue: Using Better Approach: Enhance the existing Why reuse
Proposed Design: class Jobs:
# Add class constants for polling configuration
JOB_POLL_INITIAL_DELAY = 1 # Start with 1 second
JOB_POLL_MAX_DELAY = 10 # Cap at 10 seconds
JOB_POLL_BACKOFF_FACTOR = 2 # Double each retry
JOB_POLL_TIMEOUT = 60 # Max 60 seconds total
def get_job_results(self, job_id, job_name=None, payload=None,
upgrade_from=None, upgrade_to=None,
poll=False, max_wait_seconds=None):
"""
Get job results with optional polling.
Args:
poll: If True, retry with exponential backoff until job is ready
max_wait_seconds: Maximum time to wait when polling (default: 60s)
"""
if not job_id:
print("No job ID input, exit...")
sys.exit(0)
if max_wait_seconds is None:
max_wait_seconds = self.JOB_POLL_TIMEOUT
# If polling disabled, make single attempt (original behavior)
if not poll:
return self._fetch_job_info_once(job_id, job_name, payload,
upgrade_from, upgrade_to)
# Polling enabled: retry with exponential backoff
delay = self.JOB_POLL_INITIAL_DELAY
elapsed = 0
attempt = 0
while elapsed < max_wait_seconds:
attempt += 1
result = self._fetch_job_info_once(
job_id, job_name, payload, upgrade_from, upgrade_to,
attempt=attempt, silent=True
)
if result is not None:
print(f"Job {job_id} ready after {elapsed}s (attempt {attempt})")
return result
if attempt == 1:
print(f"Job {job_id} not ready yet, polling...")
time.sleep(delay)
elapsed += delay
delay = min(delay * self.JOB_POLL_BACKOFF_FACTOR, self.JOB_POLL_MAX_DELAY)
raise TimeoutError(
f"Job {job_id} not ready after {max_wait_seconds}s ({attempt} attempts)."
)
def _fetch_job_info_once(self, job_id, job_name=None, payload=None,
upgrade_from=None, upgrade_to=None,
attempt=1, silent=False):
"""
Fetch job info once (single HTTP request).
Extracted from get_job_results for reusability.
Args:
silent: If True, suppress error messages (used during polling)
"""
try:
resp = self._get_session().get(url=self.prow_job_url.format(job_id.strip()))
if resp.status_code == 200 and resp.text:
job_result = yaml.safe_load(resp.text)
# Check if job has required fields
if job_result and "status" in job_result and "url" in job_result["status"]:
status = job_result["status"]
spec = job_result["spec"]
job_name = spec["job"]
job_url = status.get("url")
job_state = status.get("state")
job_start_time = status.get("startTime")
if not silent:
print(job_name, payload, job_id, job_start_time, job_url, job_state)
job_dict = {
"jobName": job_name,
"payload": payload,
"upgrade_from": upgrade_from,
"upgrade_to": upgrade_to,
"jobStartTime": job_start_time,
"jobID": job_id,
"jobURL": job_url,
"jobState": job_state,
}
if "completionTime" in status:
job_dict["jobCompletionTime"] = status["completionTime"]
self.save_job_data(job_dict)
if not silent:
print("Done.\n")
return job_dict
else:
if not silent:
print(f"Attempt {attempt}: Job info incomplete")
return None
else:
if not silent:
print(f"Attempt {attempt}: HTTP {resp.status_code}")
return None
except Exception as e:
if not silent:
print(f"Attempt {attempt}: Error: {e}")
return NoneThen update both methods to use polling: def run_image_consistency_check(self, payload_url: str, mr_id: int) -> dict:
# ... validation and job trigger code ...
job_id = json.loads(job_run_res.text)["id"]
print(f"Image consistency check job triggered with ID: {job_id}")
# Use get_job_results with polling enabled!
job_info = self.get_job_results(job_id, poll=True)
print(f"Image consistency check job URL: {job_info['jobURL']}")
return job_info
def run_job(self, job_name, payload, upgrade_from, upgrade_to):
# ... existing code ...
if res.status_code == 200:
job_id = json.loads(res.text)["id"]
print(f"Returned job id: {job_id}")
# Replace time.sleep(5) with polling
try:
self.get_job_results(
job_id, job_name, payload, upgrade_from, upgrade_to,
poll=True # Enable polling!
)
except TimeoutError as e:
print(f"Warning: {e}")
except Exception as e:
print(f"get job result error: {e}")Benefits:
2. Add Payload URL Validation 🔒Security/Reliability Concern: The
Proposed Validation: def _validate_payload_url(self, payload_url: str) -> bool:
"""
Validate OpenShift release payload URL format.
Expected format:
quay.io/openshift-release-dev/ocp-release:<version>-<arch>
Examples:
Valid:
- quay.io/openshift-release-dev/ocp-release:4.19.1-x86_64
- quay.io/openshift-release-dev/ocp-release:4.18.0-0.nightly-2024-01-15-x86_64
Invalid:
- https://quay.io/... (has protocol)
- random-registry.io/... (wrong registry)
"""
pattern = r'^quay\.io/openshift-release-dev/ocp-release:\d+\.\d+\.\d+.*-(x86_64|aarch64|ppc64le|s390x|multi)$'
return re.match(pattern, payload_url) is not None
def run_image_consistency_check(self, payload_url: str, mr_id: int) -> dict:
# Validate payload URL first
if not self._validate_payload_url(payload_url):
raise ValueError(
f"Invalid payload URL format: {payload_url}\n"
f"Expected: quay.io/openshift-release-dev/ocp-release:<version>-<arch>"
)
# ... rest of implementation ...Benefits:
Additional Suggestions3. Extract Job Name Constant# At class level
IMAGE_CONSISTENCY_CHECK_JOB = "periodic-ci-openshift-release-tests-master-image-consistency-check"4. Return Job MetadataConsider returning structured data for programmatic access: return {
"job_id": job_id,
"job_url": job_info["jobURL"],
"payload_url": payload_url,
"mr_id": mr_id
}SummaryCritical (Please Address):
Nice to Have: The core implementation is solid! With these improvements, it will be more robust, reusable, and production-ready. Let me know if you'd like me to provide a working patch for the polling logic. |
https://issues.redhat.com/browse/OCPERT-270