@crankynetman crankynetman commented May 15, 2025

This one was a doozie. Expired/deleted entries on one SCRAM instance were never unblocked on other instances, because there is no mechanism for Postgres to tell Django that it needs to re-run anything; only the active Django instance re-runs things. We get around that by calling process_updates and re-processing anything changed recently, but before this change the sync logic only looked at creation time (the `when` field), missing any entries that were removed.

To fix this in a lot less dumb way than before, we now use `django-simple-history` to detect **any** entry changes and then reprocess them to match what the DB says they should be.
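To make the "creation time misses deletions" point concrete, here is a plain-Python sketch of the idea (the names are illustrative, not our actual schema; django-simple-history's real historical rows carry `history_date` and `history_type` fields much like these):

```python
# Sketch: every create/change/delete leaves a history row, so filtering
# history by date catches deletions and expirations that a creation-time
# ("when") filter would miss.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class HistoryRow:
    entry_id: int
    history_date: datetime
    history_type: str  # '+' created, '~' changed, '-' deleted

def changed_entry_ids(history, cutoff):
    """Ids of entries with ANY history activity since cutoff, deletes included."""
    return {h.entry_id for h in history if h.history_date >= cutoff}

now = datetime(2025, 5, 15, 12, 0)
history = [
    HistoryRow(1, now - timedelta(minutes=2), "~"),  # recently changed/expired
    HistoryRow(2, now - timedelta(minutes=3), "-"),  # recently deleted
    HistoryRow(3, now - timedelta(days=1), "+"),     # old creation: ignored
]
ids = changed_entry_ids(history, now - timedelta(minutes=10))
```

A creation-time filter over the same window would only ever see `'+'` rows, so entry 2's deletion would never be picked up.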

There is a lot in this PR, but the main changes are:

  • Split up `process_updates()` into:

    • `get_entries_to_process(cutoff_time)`
      • A reusable function that simply finds all recently modified entries from other instances and returns them so you can Do Stuff® on them.
    • `reprocess_entries(entries)`
      • This actually "Does Stuff" by sending websocket messages to translators. We could eventually move this out and make it generic enough to be used everywhere we Do A Websocket®, but it needs to be a bit more dynamic to support future action types (see note below; not dealing with this now).
    • `_check_for_orphaned_history()`
      • This is a total edge case, but since we never delete entries (our `delete()` override), I wanted to make sure we catch and log any orphaned history objects that don't have a corresponding entry (say someone goes into the admin panel and actually "hard" deletes an entry, but the history rows are still there).
  • Made sure to be efficient with DB calls by using `select_related()`, `.exclude()`, and `values_list()` on most of our django/db queries, so that we rely on SQL to do the filtering (using built-in Django stuff, obviously) instead of iterating on the Python side (which matters in prod, where we have a TON of entries).

  • We used to rely on `msg.msg_type` from `WebSocketMessage` (which is always `translator_add`), but now we set it based on `entry.is_active` when we run the sync from DB to translator. Eventually this needs to be more dynamic to support other action types; not sure how to solve that yet.
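The reshaped flow can be sketched in plain Python (no Django); `Entry`, its field names, and the `translator_remove` action name here are illustrative stand-ins, not the PR's actual models or message types:

```python
# Sketch of the process_updates() split: a reusable "find what changed" half
# and a "Does Stuff" half that derives the message type from DB state instead
# of a stored msg_type (which was always translator_add).
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Entry:
    route: str
    is_active: bool
    last_changed: datetime

def get_entries_to_process(entries, cutoff_time):
    """Find entries modified since cutoff_time (the reusable half)."""
    return [e for e in entries if e.last_changed >= cutoff_time]

def reprocess_entries(entries, send):
    """Notify translators, picking the action from what the DB says now."""
    for e in entries:
        msg_type = "translator_add" if e.is_active else "translator_remove"
        send({"type": msg_type, "route": e.route})

# Usage: one active entry, one expired entry, one untouched old entry.
now = datetime(2025, 5, 15)
entries = [
    Entry("192.0.2.1/32", True, now),
    Entry("198.51.100.7/32", False, now),
    Entry("203.0.113.9/32", True, now - timedelta(hours=2)),
]
sent = []
reprocess_entries(
    get_entries_to_process(entries, now - timedelta(minutes=30)), sent.append
)
```

The split is what lets the deactivation case work at all: the second entry produces a remove message because `is_active` is false, regardless of what message type was originally sent for it.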

There are a few caveats here to keep in mind, but for now I think it's fine:

  • This breaks from the existing behave patterns and adds a pattern where we connect directly to the containers running via compose, ignoring the test database. I tried to separate this out into full integration tests. This means that test data is left behind, so we blow away everything in the DB before and after test runs to reset it.
  • It's possible that a docker health check could run while the behave integration tests are running. This is unlikely, and I made the health check run less frequently to help avoid it, but... it's gross.
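The blow-everything-away reset from the first caveat can be sketched like this, using an in-memory SQLite table purely for illustration (the real tests talk to the compose Postgres instance, and the table name is made up); in behave this kind of reset would hang off the `before_all`/`after_all` hooks in `environment.py`:

```python
# Sketch: wipe leftover integration-test data so each run starts clean.
import sqlite3

def reset_tables(conn, tables):
    """Delete everything from the given tables (TRUNCATE ... CASCADE on
    Postgres; plain DELETE here since SQLite has no TRUNCATE)."""
    cur = conn.cursor()
    for table in tables:
        cur.execute(f"DELETE FROM {table}")
    conn.commit()

# Simulate data left behind by a previous run, then reset before this one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, route TEXT)")
conn.execute("INSERT INTO entry (route) VALUES ('192.0.2.1/32')")
conn.commit()
reset_tables(conn, ["entry"])
leftover = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
```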

For now, I think this gets us further, so it's still good. It also switches us from SQLite to Postgres, which is a win overall.

@github-actions

github-actions bot commented May 15, 2025

| File | Coverage |
| --- | --- |
| All files | 81% |
| config/consumers.py | 78% |
| config/urls.py | 69% |
| config/settings/base.py | 69% |
| config/settings/local.py | 72% |
| scram/route_manager/admin.py | 71% |
| scram/route_manager/models.py | 70% |
| scram/route_manager/views.py | 80% |
| scram/route_manager/api/serializers.py | 92% |
| scram/route_manager/api/views.py | 76% |
| scram/shared/shared_code.py | 56% |
| scram/templates/403.html | 91% |
| scram/templates/404.html | 91% |
| scram/templates/base.html | 99% |

Minimum allowed coverage is 50%

Generated by 🐒 cobertura-action against 054086b


@samoehlert samoehlert left a comment


I think it might be worth documenting this a bit, since it's a lot and possibly confusing. Six months from now, Sam will have to start over learning this. I think just a flow of how it works would be beneficial. Maybe in decisions.md?

@crankynetman

I think it might be worth documenting this a bit, since it's a lot and possibly confusing. Six months from now, Sam will have to start over learning this. I think just a flow of how it works would be beneficial. Maybe in decisions.md?

Good call, I will do this.

@crankynetman crankynetman marked this pull request as ready for review December 18, 2025 02:51

@samoehlert samoehlert left a comment


tests/acceptance/steps/multi_instance.py seems to have some duplicate code compared to tests/integration/steps/multi_instance.py, and I'm not sure whether that's fine or not; I can't decide. I get the difference between acceptance and integration, so leaving them separate makes sense, but also DRY.

@crankynetman crankynetman changed the title from "WIP: fix process updates expirations / cleanup syncing" to "fix(process_updates): Fix expirations, cleanup syncing, and add integration tests" on Dec 19, 2025

@samoehlert samoehlert left a comment


This is a really big win. And such good tests too!

@crankynetman crankynetman merged commit f39c4d8 into main Dec 19, 2025
18 of 23 checks passed
@crankynetman crankynetman deleted the topic/chriscummings/fix-process-updates-expirations branch December 19, 2025 15:59
samoehlert added a commit that referenced this pull request Dec 19, 2025
…ration tests (#157)

# Conflicts:
#	scram/route_manager/tests/acceptance/steps/common.py
#	scram/route_manager/tests/acceptance/steps/ip.py
