AI+engine heavy board performance optimizations #10507
MostCromulent wants to merge 1 commit into Card-Forge:master
Conversation
Engine and AI hot paths were doing redundant work on cards that have no possible source of the thing being computed, and one AI loop was redoing combat simulation for each of N identical tokens. This PR adds O(1) fast-path guards that skip the work when it provably produces nothing, and batches the AI loop over interchangeable blockers.

- AiAttackController.notNeededAsBlockers: detect runs of identical equivalence-class blockers; batch-release with one predict call and binary-search on MIN_VALUE. Singleton path unchanged.
- Card.getChangedCardTraitsList: skip the Iterables.concat over two empty TreeBasedTables (mirrors the existing pattern in Card.getChangedCardTypes).
- Card.hasRemoveIntrinsic: skip the IterableUtil.any Stream when all three changedCardTypes* tables are empty.
- CardState.getReplacementEffects / getStaticAbilities / getTriggers: return FCollection.getEmpty() when there is no intrinsic, no split sides, no trait overlays, no trait-contributing keywords (and, for REs, no type / subtype / counter-based synthetics).
- KeywordCollection: lazily computed trait bitmask with mutation invalidation, powering the empty-source checks above.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
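The KeywordCollection bitmask mentioned above can be pictured roughly as follows. This is a hypothetical, heavily simplified sketch, not Forge's actual code: the class name, the `Trigger:`/`Static:`/`Replacement:` keyword prefixes, and the method names are all illustrative stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a lazily computed "which trait kinds can these keywords contribute?"
// bitmask with mutation invalidation. All names here are illustrative.
class KeywordTraitMask {
    static final int TRIGGER = 1, STATIC_ABILITY = 1 << 1, REPLACEMENT = 1 << 2;

    private final List<String> keywords = new ArrayList<>();
    private int mask = -1; // -1 = stale; recomputed lazily on the next query

    void add(String keyword)    { keywords.add(keyword);    mask = -1; } // mutation invalidates
    void remove(String keyword) { keywords.remove(keyword); mask = -1; }

    // O(1) once cached; only recomputed after a mutation.
    int traitMask() {
        if (mask < 0) {
            int m = 0;
            for (String kw : keywords) {
                // Stand-in classification; a real mapping would be data-driven.
                if (kw.startsWith("Trigger:"))     m |= TRIGGER;
                if (kw.startsWith("Static:"))      m |= STATIC_ABILITY;
                if (kw.startsWith("Replacement:")) m |= REPLACEMENT;
            }
            mask = m;
        }
        return mask;
    }

    boolean mayContributeTriggers() { return (traitMask() & TRIGGER) != 0; }
}
```

A `getTriggers()`-style accessor could then return an empty collection immediately when `mayContributeTriggers()` is false and the other trigger sources are also empty, skipping the overlay machinery entirely.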
The fast-check methods might currently miss some cases; also, the order of the checks might need to be reversed relative to the layer order to get an early `return true`.
i mean there is

```java
default List<SpellAbility> applySpellAbility(List<SpellAbility> list) { return list; }
default List<Trigger> applyTrigger(List<Trigger> list) { return list; }
default List<ReplacementEffect> applyReplacementEffect(List<ReplacementEffect> list) { return list; }
default List<StaticAbility> applyStaticAbility(List<StaticAbility> list) { return list; }
```

I might do
```java
/**
 * Used by {@link #notNeededAsBlockers} to batch predict calls over
 * groups of identical tokens.
 */
public static boolean isAiEquivalentBlocker(Card a, Card b) {
```
- I don't think "indistinguishable" is strictly true, e.g. `AiBlockController#changesPTWhenBlocked` is based on a trigger check. Though covering only the more common cases seems enough for this heuristic.
- However, I'd still like to see some consideration for the opportunity cost introduced this way: yes, people who encounter lots of token creatures currently complain, but some noise has to be taken into account. Therefore, for this to be worth it, the additional runtime these attacker-grouping checks add to games with almost no tokens needs to be acceptable...
- I'm not even convinced this grouping is currently needed: isn't the remaining life far more often heavily influenced by the number of remaining creatures (for chump blocking) or their total toughness (against tramplers)? So would just always using binary search (maybe with a bias towards smaller board states) make more sense? 🤔
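For context, the equivalence check under discussion reduces to a shallow field comparison. The sketch below is hypothetical: the real Forge `Card` carries far more state (attachments, counters, trigger text), and as noted above, trigger-dependent behaviour can escape any shallow check, so `BlockerView` and its fields are illustrative only.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical, minimal stand-in for the fields such a check might compare.
record BlockerView(String name, int power, int toughness,
                   Set<String> keywords, boolean tapped) {}

class BlockerEquivalence {
    // Conservative shallow check: only group blockers the combat simulation
    // would treat identically on the compared dimensions.
    static boolean aiEquivalent(BlockerView a, BlockerView b) {
        return a.power() == b.power()
            && a.toughness() == b.toughness()
            && a.tapped() == b.tapped()
            && Objects.equals(a.name(), b.name())
            && Objects.equals(a.keywords(), b.keywords());
    }
}
```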
Have run some tests on this, results below:
Method:
Tested two variations against the linear baseline (the code prior to this PR, with no algorithm enabled):

- H003: the shipped fix in this PR: equivalence-class grouping + batch binary search in `notNeededAsBlockers`.
- H003b: the proposed simplification: drop the equivalence-class detection entirely and just binary-search the whole sorted blocker list.

Built H003b on top of the testbed branch and ran (a) an in-flight verify that runs both algorithms on identical pre-decision state at every `notNeededAsBlockers` call and compares release sets; and (b) bucketed per-call ms-by-blocker-count timing on a 20-game precon batch plus a 10-game Commander batch with three variants: linear baseline, shipped H003, and H003b.

Decision-making: 1,313 in-flight verify calls across both formats, 2 divergences (99.85% agreement). Where they diverge, H003b is strictly more conservative: it declines an aggressive trade that H003 takes. Both choices are reasonable AI play; the grouping isn't producing weird decisions.
Performance over 20 random precon games, fixed seed:

| | Linear (no algorithm) | H003 (PR) | H003b (simplification) |
| --- | --- | --- | --- |
| Batch wall | 197.6 s | 131.0 s | 202.0 s |
| ms/call, N ≤ 5 (95% of decisions) | 68 ms | 29 ms | 42 ms |
| ms/call, N 6–15 (5% of decisions) | 198 ms | 203 ms | 144 ms |

Note: each of these tests kept the other fixes in this PR turned on; only H003/H003b was isolated.
The opportunity-cost concern (that the grouping checks add runtime in non-token games) turns out to be empirically unfounded: H003 is ~34% faster in wall time than the same code without it, even in random precon games where token swarms don't appear. H003b is essentially tied with linear at the batch level (a slight regression); it only beats H003 on the rare wide-board calls.
Why: even N ≤ 5 "heterogeneous" precon boards routinely have partial duplicates — two of the same vanilla creature drawn off a 60-card deck, two basic-stat tokens from a single ETB. H003's grouping engages on those runs and batch-probes a 2-duplicate group in 1 predict call instead of 2. H003b can't recognise duplicates so it always does log(N) binary-search probes + up to N damage-trade walks, which on small boards comes out to more predict calls than even plain linear iteration. The grouping mechanism isn't gratuitous overhead; it's a net wall-saver even where there are no token swarms.
Recommendation: keep H003 as designed. The simpler H003b doesn't recover the wins H003 gets from batch-collapsing partial duplicates on small boards, and it's roughly tied with no-algorithm-at-all on the aggregate workload. The ~50 LOC of equivalence-class machinery is doing real work in real games, not just on token swarms.
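The batch-release step both variants share reduces to a largest-true-prefix binary search. The sketch below is illustrative (`maxReleasable` and its predicate are made-up names standing in for Forge's combat-simulation predicate), and it assumes what the grouping is meant to guarantee: monotonicity, i.e. if releasing k blockers from a run still predicts an acceptable outcome, so does releasing fewer.

```java
import java.util.function.IntPredicate;

// Sketch: find the largest k in [0, runSize] such that releasing k blockers
// from a run of interchangeable blockers still predicts an acceptable outcome.
// One simulation per probe -> O(log N) predict calls instead of O(N).
class BatchRelease {
    static int maxReleasable(int runSize, IntPredicate okWhenReleased) {
        int lo = 0, hi = runSize; // releasing 0 is assumed always acceptable
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper midpoint so the loop terminates
            if (okWhenReleased.test(mid)) lo = mid; else hi = mid - 1;
        }
        return lo;
    }
}
```

This is why the per-call cost scales with log(N) probes on wide boards; H003's extra win on small boards comes from collapsing duplicate runs before the search even starts.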
The wall-clock times here aren't particularly helpful; the per-call times are the key measure. Wall times get noisy over a long test cycle (even with seeded starts) because a faster algorithm gives the AI more headroom before it hits its thinking-time cap, which lets it make a different decision; that decision then cascades into a different game state, which can mean a longer game.
On https://github.com/MostCromulent/forge/tree/tokenperf I set Claude AI up in a loop profiling the performance of 4-player AI-vs-AI games with token-heavy board states to identify potential performance optimizations.
The loop went through a process of hypothesis, testing, and verification to confirm that no optimization caused a regression in current AI decision-making or engine results: outputs must be the same before and after the code changes. This was verified by running the master code and the optimizations concurrently and comparing outputs.
This PR reflects six fixes identified as improving performance with no regression in outputs. They focus on early-exiting commonly called methods when they would produce redundant or empty results, plus an improved search algorithm.
These fixes together showed a ~33% improvement in AI batch testing time with 4-player precons and ~27% improvement with 4-player commander precons.
Detailed explanation and methodology here.
🤖 Generated with Claude Code