-
Notifications
You must be signed in to change notification settings - Fork 28
Fix local natlab setup checks #1588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| new_connection_raw(conn_tag) | ||
| ) | ||
| except Exception as e: # pylint: disable=broad-exception-caught | ||
| log.warning("Failed to check MAC address for %s: %s", conn_tag.name, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: should we add a check if it's CI? Without a check we will need to backtrack to this place from the consequences that are caused by it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean if we should swallow the exception in CI? I can re raise it if we're on CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I wonder if it's worth to re-raise in CI since then we could find the root cause sooner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and think it wouldn't worth to re-raise it. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My wording might have been imprecise - I think it's worth to re-raise. This should allow to detect the connection failures closer to the root cause :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| """ | ||
| if "GITLAB_CI" not in os.environ: | ||
| log.info("setup_check: skipping ARP cache validation on non-CI environment") | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not all VMs are started should we still continue for the ones that are present instead of skipping all of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Instead of skipping the check, I can reduce the timeout which currently is 300s/5min or set a number of retries for each ARP entry. At the moment the address is pinged until it gets one of the acceptable states: ["REACHABLE", "STALE", "DELAY", "PROBE", "PERMANENT"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively I can check which VMs are running and if the session marks requires them. Then run the checks to only the ones that are running
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of only checking the containers we care about but maybe the easier solution is to just reduce the timeout as you say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the reduced timeout the test will still fail, it's easier but definitely not the best. IMHO would be great if we check that the required containers for the session are running and healthy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my experience running natlab locally is quite a challenge already especially when things don't work so ignoring ARPs locally is not a good choice unless I miss something.
The idea of checking only the containers needed for the test is a nice one
ed0254e to
b80c355
Compare
jjanowsk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I don't have other remarks then Lukas, but there is one place where you could improve the experience. In helpers.py in the if prepare_vpn block we should not prepare nlx if it is not needed. Otherwise I get failures there if I run vpn test without nlx container running
b80c355 to
c343c1a
Compare
c343c1a to
342c043
Compare
342c043 to
622049d
Compare
Added commit to only prepare the server containers/VM that are required for the given pytest run. |
Problem
Some local checks require a virgin natlab environment with all the VMs running. This barely happens locally.
Solution
This PR:
100.64.0.1)☑️ Definition of Done checklist