Pick latest pgjsonb get_fun return by alter_time, not by MAX(jid)#69065
Open
co-cy wants to merge 2 commits intosaltstack:3006.xfrom
Open
Pick latest pgjsonb get_fun return by alter_time, not by MAX(jid)#69065co-cy wants to merge 2 commits intosaltstack:3006.xfrom
co-cy wants to merge 2 commits intosaltstack:3006.xfrom
Conversation
added 2 commits
May 5, 2026 13:12
`get_fun` issued:
SELECT s.id, s.jid, s.full_ret
FROM salt_returns s
JOIN (SELECT MAX(`jid`) AS jid
FROM salt_returns GROUP BY fun, id) max
ON s.jid = max.jid
WHERE s.fun = %s
The backticks around `jid` are MySQL-style identifier quoting. On
PostgreSQL -- which is the only server pgjsonb talks to -- the
parser rejects this with `syntax error at or near "`"`. The query
was copy-pasted from `salt/returners/mysql.py` where the syntax is
valid; the SQL identifier quoting was not adjusted.
The bug is dormant in typical deployments: stock `master_job_cache`
operation goes through `get_load`, `get_jid`, `get_jids`, but not
`get_fun`. It surfaces only when an operator (or an extension)
explicitly calls `pgjsonb.get_fun` -- e.g. through a custom runner
or via `master_job_cache.get_fun` from external tooling.
Drop the backticks. `jid` is not a reserved word in PostgreSQL, so
no quoting is needed. While here, normalise the indentation of the
SQL string for readability; behaviour is otherwise unchanged.
Add a behavioural test that calls `get_fun` against a mocked
cursor, asserts the per-minion mapping is built correctly, and
guards against backticks creeping back into the issued SQL through
future copy-paste from the mysql returner.
Refs: saltstack#69062
`get_fun` chose the latest return per minion with:
SELECT s.id, s.jid, s.full_ret
FROM salt_returns s
JOIN (SELECT MAX(jid) AS jid
FROM salt_returns GROUP BY fun, id) max
ON s.jid = max.jid
WHERE s.fun = %s
`MAX(jid)` is the lexicographic max of the `jid` column, not the
time-latest return. It happens to coincide for Salt's default
`YYYYMMDDHHMMSSffffff` format and the `nano` variant because those
are timestamp-formatted strings of equal length. It silently fails
for:
* Deployments that override `master_job_cache.gen_jid` with a custom
scheme -- random UUIDs, snowflake ids, hash-based identifiers --
none of which sort lexicographically as timestamps.
* `prep_jid(passed_jid="...")` from external publishers (salt-api,
salt-ssh, runners, orchestrate) that hand the master a custom jid.
* Salt-runs that span a `jid_format` config change, leaving rows of
different formats in the same `salt_returns` table.
In all of those cases the lexicographic max is not the time-latest
and `get_fun` returns the wrong row. The function looks like it
worked, the operator gets the wrong answer.
Replace with `alter_time` ordering, which Postgres populates from
`DEFAULT NOW()` and which therefore reflects the actual insertion
order regardless of jid format:
SELECT DISTINCT ON (id)
id, jid, full_ret
FROM salt_returns
WHERE fun = %s
ORDER BY id, alter_time DESC
`DISTINCT ON (id)` is Postgres-specific (the file already targets
Postgres). It returns one row per `id`, picking the first per
`ORDER BY` -- here, the most recent `alter_time` per minion.
Add a behavioural test that pins the algorithm: the issued SQL
must order by `alter_time DESC`, and must not contain `MAX(jid)`.
Note on performance: there is no index on `salt_returns.alter_time`
in the documented schema, so this remains a sequential scan, same
as the previous `MAX(jid) GROUP BY` form. The lack of an
`alter_time` index is a separate, larger schema discussion that
will be tracked as a follow-up RFC.
Depends on saltstack#69063 (which removed the MySQL-style backtick quoting
from this function); this PR builds on that branch and should be
merged after it.
Refs: saltstack#69064
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
salt/returners/pgjsonb.py:get_funchose the latest return per minion withMAX(jid). That is the lexicographic maximum of thejidcolumn, not the time-latest return. It happens to coincide for Salt's default jid format (YYYYMMDDHHMMSSffffff) and thenanovariant — both are timestamp-formatted strings of equal length, so lex order matches chronological. It silently breaks for any deployment where that assumption does not hold:master_job_cache.gen_jidis overridden with a custom scheme — UUIDs, snowflake ids, hash-based identifiers — none lexicographically sortable as timestamps.prep_jid(passed_jid="...")with a custom jid string.jid_formatconfig change and ends up with rows of different formats in the samesalt_returns.In all those cases
MAX(jid)returns a row that is not the time-latest, andget_funreports the wrongfull_retper minion. No exception, no warning — just a silently wrong answer.This PR replaces the lex-max algorithm with
ORDER BY alter_time DESCandDISTINCT ON (id).alter_timeis populated by Postgres viaDEFAULT NOW(), so "latest" reflects actual insertion order regardless of jid format.What issues does this PR fix or reference?
Fixes #69064
Previous Behavior
For two returns of the same fun on the same minion with custom jids
"abc-001"and"000-002"(chronologically the second is newer),MAX(jid)picks"abc-001"— wrong.New Behavior
alter_time DESCorders by Postgres-managed insertion timestamp;DISTINCT ON (id)keeps the first row per minion. The chronologically newest return wins regardless of jid format.Performance
There is no index on
salt_returns.alter_timein the documented schema, so this query remains a sequential scan — same as the previousMAX(jid) GROUP BYform. Adding analter_timeindex, alongside several other schema observations (jids.created_at, redundant GIN indexes,salt_returns.successtyping), will be tracked as a separate follow-up RFC.Merge requirements satisfied?
changelog/69064.fixed.mdtests/pytests/unit/returners/test_pgjsonb.py:test_get_fun_orders_by_alter_time_desc_not_max_jid— pins the algorithm: SQL must order byalter_time DESCand must not containMAX(jid).Commits signed with GPG?
No.